By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 7.03; (* of search.p 2017 Oct 30}
(* begin module describe.search *)
(*
name
search: search a book for strings
synopsis
search(book: in, searchinst: out, result: out, input: intty,
searchfeatures: out, output: out)
files
book: any book from the Delila system
searchinst: Delila instructions of the form 'get from 56 -5 to same
+5;' that define the location of found strings. One must turn on
printing to the searchinst file to obtain these (see below). If
there are instructions with names inside double quotes, then these
will be put out as Delila name instructions. See the searchfeatures
file; this is turned on at the same time. See examples.
result: a transcript of the results seen on the output file.
Lines not containing numerical data begin with an '*' so that
they can be ignored by other programs such as genhis and xyplo.
input: typed input from the user, or a file of rules.
searchfeatures: features for the lister program.
To start the file, simply provide a name inside double quotes
(eg "EcoRI").
Subsequent searches (eg gaattc) will be labeled with that name.
To turn off the features, use an empty quote string, as "".
The searchfeatures file can be concatenated with other features
to create the features file for lister.
output: messages, results and prompts to the user.
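Because lines of the result file that do not contain numerical data begin
with an '*', a downstream script can skip them with a single test. The
following is a minimal Python sketch of that idea, not part of the Delila
system. The function name data_lines and the sample lines are hypothetical,
and dropping blank lines is an added assumption.

```python
def data_lines(lines):
    """Yield only the lines that do not start with '*'."""
    for line in lines:
        stripped = line.rstrip("\n")
        # Assumption: blank lines carry no data and are dropped as well.
        if stripped and not stripped.startswith("*"):
            yield stripped

# Hypothetical sample; real result files are written by the search program.
sample = ["* search for ggag\n", "12 45\n", "* end\n", "78 90\n"]
kept = list(data_lines(sample))
```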
description
(note: in the following examples, do not type the quote marks.)
The search program allows one to look for simple patterns in a book.
The patterns can be like 'ggag', that is, with particular bases
(always written 5' to 3'), or they can include unknown 'spacing' bases,
as in 'ggagnnnnnnnnnatg'. Any base will be allowed in the n positions.
One can shorten the instruction: 'ggag9natg', and one can make some of
the spacing 'extendable' as in 'ggag5e4natg', which allows a 5 to 9
spacing between the two elements. One can obtain Delila instructions
for the strings found by turning on printing, setting 'from' and 'to'
values and searching. For example: 'd p f -5 t +10 q gga6e3n#atg'
sets up printing, with from=-5, to=+10. The search will result in
instructions for strings centered on the a of the atg (by the # symbol).
The form '(a/g)ct' means to search for both 'act' and 'gct'.
You may specify numbers of mismatches, and control how much is printed.
You can type many commands on one line, separated by spaces.
You can also search for relations between bases. Currently the
allowed relations are: identity, non-identity, complementarity and
non-complementarity. See delman.use.search or type 'help' while inside
the program to get more information.
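The pattern notation above can be illustrated by translating it into a
regular expression. The following Python sketch is not how the search
program itself parses patterns. It assumes, from the 'ggag5e4natg' example,
that 'AeBn' means A to A+B spacer bases, and it treats the markers #, %,
| and ^ as consuming no base. The names TOKEN and to_regex are
hypothetical. Mismatches, relations between bases, ambiguous base letters
such as h, and other forms such as 'gat3etag' are not handled.

```python
import re

# One token of the notation: 'AeBn', 'An', '(x/y)', a single base,
# or a marker (#, %, |, ^) that does not consume a base.
TOKEN = re.compile(r"(\d+)e(\d+)n|(\d+)n|\(([acgt/]+)\)|([acgtn])|([#%|^])")

def to_regex(pattern):
    """Translate a search-style pattern into a Python regular expression."""
    parts = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError("unsupported notation at position %d" % pos)
        fixed, ext, count, alts, base, marker = m.groups()
        if fixed is not None:
            # 'AeBn': A required spacer bases, then up to B optional ones.
            parts.append("[acgt]" * int(fixed) + "[acgt]?" * int(ext))
        elif count is not None:
            # 'An': exactly A spacer bases.
            parts.append("[acgt]" * int(count))
        elif alts is not None:
            # '(a/g)': any one of the listed bases.
            parts.append("[" + alts.replace("/", "") + "]")
        elif base is not None:
            parts.append("[acgt]" if base == "n" else base)
        # A marker contributes nothing to the regular expression.
        pos = m.end()
    return "".join(parts)

rx = re.compile(to_regex("ggag5e4natg"))
```

With this sketch, rx.fullmatch accepts 'ggag' and 'atg' separated by 5 to 9
bases and rejects other spacings, and to_regex("(a/g)ct") gives "[ag]ct".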
NOTE: Many commands now are capital letters to avoid
confusion between commands. See the help function (H, ?)
for details.
2004 July 21: There is a new command, B, which
defines colors on search features as displayed by the
lister program.
examples
If one is working with an odd binding site (one with an odd number
of bases) one should use the # symbol to obtain Delila instructions.
The complement sequence will continue to number the central base.
gaa#nttc complemented becomes gaa#nttc
If one is working with an even binding site (one with an even number
of bases) one should use the % symbol to obtain Delila instructions.
The complement sequence will continue to number the following base.
ga%attc complemented becomes ga#attc
The program is pretty smart about writing the Delila instructions. If
one searches for the complementary sequence, the instructions are
automatically written to extract the complementary pattern found. Thus
if one searches for #gtt in ex0bk (the example book file), there is one
found in the positive direction of the fragment. Then if one takes the
complement with the "~" command, one is searching for aa#c. Two of these
are located in the piece. The instructions are written so that the gtt's
all line up, as is easily checked by extracting the fragments with delila
and looking with alist.
To create searchfeatures, define the name of a search string by typing
the name inside quotes (as: "EcoRI") and then search. Vertical bars or
carets (| or ^) in the search string (as: g|aattc) will carry over to
the feature.
If you have search instructions:
D
from -200
to +200
q
"aceB"
agttatcaagtatttttaattaaaatggaaattgtttttgattttgcattttaaatgagtagtcttagtt#n
q
then the resulting Delila instructions in searchinst will look like:
organism E.coli; chromosome E.coli;
piece U00096;
name "aceB";
get from 4212981 -200 to same +200 direction +;
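A script that post-processes searchinst may want to pull the coordinate and
offsets out of each 'get' instruction. The following Python sketch handles
only the form shown in this manual page ('get from P a to same b', with an
optional direction). It is not a general Delila parser, and the names GET
and parse_get are hypothetical.

```python
import re

# Matches only the instruction form shown above; the direction is optional.
GET = re.compile(
    r"get from (\d+) ([+-]\d+) to same ([+-]\d+)(?: direction ([+-]))?;"
)

def parse_get(line):
    """Return (coordinate, from_offset, to_offset, direction) or None."""
    m = GET.search(line)
    if m is None:
        return None
    # The direction is None when the instruction does not state one.
    return int(m.group(1)), int(m.group(2)), int(m.group(3)), m.group(4)

fields = parse_get("get from 4212981 -200 to same +200 direction +;")
```

Here fields holds the coordinate 4212981, the offsets -200 and 200, and the
direction '+'.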
documentation
delman.use.search
see also
An example search parameter, searchp file: searchp
A search parameter file for most restriction enzymes: enzyme
Program to create input DNA sequences: delila.p
Example book for input: ex0bk
Program that takes the book and the scan features as input
to create a map of the DNA sequence: lister.p
Example colors:
Example of colored boxes in search results:
demo-colors-search.zip
A second colored box example: search-bar-color.zip
author
Thomas D. Schneider, modified by Gary Stormo
bugs
There is overlap between the letters used as commands to the program and
letters used as ambiguous bases. For instance, h can mean (a/c/t) or it
can mean 'help'. The best way to avoid confusion is to always start
search strings with either a,c,g,t,n or (. Warning: if you use a file for
input, be sure that the rules include a quit command and have no errors in
them. It is possible that errors will lead to an infinite loop, though
this has never been observed. (This may be a general problem with
interactive i/o in pascal on your computer.)
The searchfeatures only work with rigid strings since only the first
definition will be accepted by lister. It is not clear how to handle
variable sizes (like gat3etag).
A search for "incorrect.symmetry" cc#gg will be located correctly, but
following that by ~ = (to get the complement and search again) will not
give a correct display. Using cc%gg will work, because the % matches the
symmetry of the site. Likewise, "ok" tc#nga ~ = "not.ok" tc%nga ~ =
It is not yet clear whether the user can be protected against this.
The variables orgchange, chrchange and piechange do not work properly.
Probably they should be done for both the book being read AND for the
delila instructions being written.
HOWEVER it seems that since the book has a certain structure,
the output instructions can follow it to some extent.
1999 June 2: It would be nice to record mismatches on the feature, but
this would require changing the definition for every found piece!
2002 Aug 28: add to wish list - Delila format mutations
---
* piece: U24170, #1, configuration: linear, direction: +, begin: 3158, end: 3238
* "atcctgggaatttctgggaa"
"agactgggcatgtctgggca"
" xx x x x " 5 mismatche(s)
^ 3194 *
---
Have search spit out the "with"-type Delila instructions for these.
2005 Sep 15: The program will go into an infinite loop if one calls it
non-interactively like this:
search < searchp
if there is no final 'q' command. It is not clear how to solve
this because interactive input relies on detecting the eof. So just
make sure that your searchp files end with 'q'!
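One way to guard against this is to check the rules file before running
search non-interactively. The following Python sketch assumes that commands
are separated by white space and that the quit command is the lowercase
'q' named above. The function name ends_with_quit is hypothetical and the
sample text is abbreviated from the example in this manual page.

```python
def ends_with_quit(text):
    """True if the last white-space-separated command in text is 'q'."""
    tokens = text.split()
    # An empty file has no final command and therefore fails the check.
    return bool(tokens) and tokens[-1] == "q"

# Abbreviated version of the example rules shown earlier on this page.
rules = 'D\nfrom -200\nto +200\nq\n"aceB"\nagtt#n\nq\n'
ok = ends_with_quit(rules)
```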
*)
(* end module describe.search *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}