Delila Program: search

Documentation for the search program is below, with links to related programs in the "see also" section.

{   version = 7.03; (* of search.p 2017 Oct 30}

(* begin module *)
      search: search a book for strings

      search(book: in, searchinst: out, result: out, input: intty,
             searchfeatures: out, output: out)

      book: any book from the Delila system

      searchinst: Delila instructions of the form 'get from 56 -5 to same
         +5;' that define the location of found strings.  One must turn on
         printing to the searchinst file to obtain these (see below).  If
         there are instructions with names inside double quotes, then these
         will be put out as Delila name instructions.  See the searchfeatures
         file; this is turned on at the same time.  See examples.

      result: a transcript of the results seen on the output file.
         Lines not containing numerical data begin with an '*' so that
         they can be ignored by other programs such as genhis and xyplo.

      input: typed input from the user, or a file of rules.

      searchfeatures: features for the lister program.
         To start the file, simply provide a name inside double quotes
         (eg "EcoRI").
         Subsequent searches (eg gaattc) will be labeled with that name.
         To turn off the features, use an empty quote string, as "".
         The searchfeatures file can be concatenated with other features
         to create the features file for lister.

      output: messages, results and prompts to the user.
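
The '*' convention described for the result file can be illustrated with a small Python sketch. It is not part of the Delila system; the function name `numeric_lines` and the sample transcript are invented for illustration, and skipping blank lines is an assumption.

```python
def numeric_lines(lines):
    """Return only the lines that do not start with '*'."""
    kept = []
    for line in lines:
        # Lines beginning with '*' carry no numerical data, per the
        # description of the result file.  Blank lines are skipped too
        # (an assumption of this sketch).
        if line.startswith("*") or not line.strip():
            continue
        kept.append(line.rstrip("\n"))
    return kept


# Hypothetical transcript, invented for illustration only.
sample = [
    "* pattern: gaattc\n",
    "* one match found\n",
    "1234 1\n",
    "\n",
    "5678 1\n",
]
print(numeric_lines(sample))
```

A downstream program that wants only the numbers could read the result file through a filter of this kind.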

      (note: in the following examples, do not type the quote marks.)
      the search program allows one to look for simple patterns in a book.
      the patterns can be like 'ggag', that is, with particular bases
      (always written 5' to 3') or it can include unknown 'spacing' bases,
      as in 'ggagnnnnnnnnnatg'.  any base will be allowed in the n positions.
      one can shorten the instruction: 'ggag9natg', and one can make some of
      the spacing 'extendable' as in 'ggag5e4natg' which allows a 5 to 9
      spacing between the two elements.  one can obtain Delila instructions
      for the strings found by turning on printing, setting 'from' and 'to'
      values and searching.  for example: 'd p f -5 t +10 q gga6e3n#atg'
      sets up printing, with from=-5, to=+10.  the search will result in
      instructions for strings centered on the a of the atg (by the # symbol).
      the form '(a/g)ct' means to search for both 'act' and 'gct'.
      you may specify numbers of mismatches, and control how much is printed.
      you can type many commands on one line, separated by spaces.
      you can also search for relations between bases.  currently the
      allowed relations are: identity, non-identity, complementarity and
      non-complementarity.  type 'help' while inside
      the program to get more information.
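
To make the pattern notation concrete, here is a minimal Python sketch that translates a restricted subset of it into a regular expression. It is not how search.p works internally. It handles only the bases a, c, g, t, single n, counted n (as in '9n') and single-base alternatives such as '(a/g)'. Extendable spacing ('e'), the # and % markers, mismatches, relations and ambiguous-base letters are deliberately left out, and the name `pattern_to_regex` is hypothetical.

```python
import re

DIGITS = "0123456789"


def pattern_to_regex(pattern):
    """Translate a restricted search pattern into a Python regex string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "acgt":
            out.append(ch)
            i += 1
        elif ch == "n":
            # Any base is allowed in an n position.
            out.append("[acgt]")
            i += 1
        elif ch in DIGITS:
            # A count must be followed by n, as in '9n'.
            j = i
            while j < len(pattern) and pattern[j] in DIGITS:
                j += 1
            if j == len(pattern) or pattern[j] != "n":
                raise ValueError("only counted n spacers are handled here")
            out.append("[acgt]{%d}" % int(pattern[i:j]))
            i = j + 1
        elif ch == "(":
            # '(a/g)' means either base at this position.
            close = pattern.find(")", i)
            if close == -1:
                raise ValueError("unbalanced parenthesis")
            choices = pattern[i + 1:close].split("/")
            if any(c not in ("a", "c", "g", "t") for c in choices):
                raise ValueError("only single-base alternatives are handled")
            out.append("[" + "".join(choices) + "]")
            i = close + 1
        else:
            raise ValueError("unsupported symbol: %r" % ch)
    return "".join(out)


print(pattern_to_regex("ggag9natg"))
print([m.start() for m in re.finditer(pattern_to_regex("(a/g)ct"), "tactgctcct")])
```

The sketch scans one strand only and reports 0-based offsets, whereas the program itself reports coordinates from the book.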

      NOTE: Many commands are now capital letters to avoid
      confusion between commands.  See the help function (H, ?)
      for details.

      2004 July 21: There is a new command, B, which
      defines colors on search features as displayed by the
      lister program.


      If one is working with an odd binding site (one with an odd number
      of bases) one should use the # symbol to obtain Delila instructions.
      The complement sequence will continue to number the central base.
          gaa#nttc complemented becomes gaa#nttc

      If one is working with an even binding site (one with an even number
      of bases) one should use the % symbol to obtain Delila instructions.
      The complement sequence will continue to number the following base.
          ga%attc complemented becomes ga%attc
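
The difference between odd and even sites can be checked with a little arithmetic. The Python sketch below is illustrative only and does not reproduce the program's coordinate handling. It reverse-complements the two example sites and shows where a marked base lands when the site is read on the other strand. The helper names are hypothetical and positions are 0-based.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mirrored_index(index, length):
    """0-based position of the same base pair when read on the other strand."""
    return length - 1 - index


odd_site = "gaanttc"   # 7 bases; the marker in gaa#nttc precedes index 3
even_site = "gaattc"   # 6 bases; the marker in ga%attc precedes index 2

print(reverse_complement(odd_site), mirrored_index(3, len(odd_site)))
print(reverse_complement(even_site), mirrored_index(2, len(even_site)))
```

Both sites are their own reverse complements. In the odd site the central base maps onto itself (index 3 stays 3). In the even site the base at index 2 lands at index 3, so no single base sits at the center. This appears to be why even sites need a marker with different behavior.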

      The program is pretty smart about writing the Delila instructions.  If
      one searches for the complementary sequence, the instructions are
      automatically written to extract the complementary pattern found.  Thus
      if one searches for #gtt in ex0bk (the example book file), there is one
      found in the positive direction of the fragment.  Then if one takes the
      complement with the "~" command, one is searching for aa#c.  Two of these
      are located in the piece.  The instructions are written so that the gtt's
      all line up, as is easily checked by extracting the fragments with delila
      and looking with alist.

      To create searchfeatures, define the name of a search string by typing
      the name inside quotes (as: "EcoRI") and then search.  Vertical bars or
      carets (| or ^) in the search string (as: g|aattc) will carry over to
      the feature.

If you have search instructions:

from -200
to   +200

then the resulting Delila instructions in searchinst will look like:

organism E.coli; chromosome E.coli;
piece U00096;
name "aceB";
get from 4212981 -200 to same +200 direction +;
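
As a rough illustration of how the 'from' and 'to' settings combine with a found coordinate, the following Python sketch assembles a 'get' line of the form shown above. It is not taken from search.p. The function name `format_get` is hypothetical, and only the two layouts that appear in this document (with and without a direction) are produced.

```python
def format_get(position, from_offset, to_offset, direction="+"):
    """Build a Delila-style 'get' instruction around a found coordinate."""
    # %+d always prints the sign, giving '-200' and '+200'.
    text = "get from %d %+d to same %+d" % (position, from_offset, to_offset)
    if direction is not None:
        text += " direction %s" % direction
    return text + ";"


print(format_get(4212981, -200, 200))
print(format_get(56, -5, 5, direction=None))
```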


see also
   An example search parameter file, searchp: searchp
   A search parameter file for most restriction enzymes: enzyme

   Program to create input DNA sequences: delila.p
   Example book for input: ex0bk
   Program that takes the book and the scan features as input
   to create a map of the DNA sequence: lister.p

   Thomas D. Schneider, modified by Gary Stormo

   There is overlap between the letters used as commands to the program and
   letters used as ambiguous bases.  For instance, h can mean (a/c/t) or it
   can mean 'help'.  The best way to avoid confusion is to always start
   search strings with either a,c,g,t,n or (.  Warning: if you use a file for
   input, be sure that the rules include a quit command and have no errors in
   them.  It is possible that errors will lead to an infinite loop, though
   this has never been observed.   (This may be a general problem with
   interactive i/o in pascal on your computer.)

   The searchfeatures only work with rigid strings since only the first
   definition will be accepted by lister.  It is not clear how to handle
   variable sizes (like gat3etag).

   A search for "incorrect.symmetry" cc#gg will be located correctly, but
   following that by ~ = (to get the complement and search again) will not
   give a correct display.  Using cc%gg will work because the % matches the
   symmetry of the site.  Likewise, "ok" tc#nga ~ = "not.ok" tc%nga ~ =
   It is not yet clear whether the user can be protected against this.

   The variables orgchange, chrchange and piechange do not work properly.
   Probably they should be done for both the book being read AND for the
   delila instructions being written.
   HOWEVER it seems that since the book has a certain structure,
   the output instructions can follow it to some extent.

   1999 June 2:  It would be nice to record mismatches on the feature, but
   this would require changing the definition for every found piece!

   2002 Aug 28: add to wish list - Delila format mutations
* piece: U24170, #1, configuration: linear, direction: +, begin: 3158, end: 3238
* "atcctgggaatttctgggaa"
  " xx     x  x      x " 5 mismatche(s)
   ^ 3194 *
Have search spit out the "with"-type Delila instructions for these.

   2005 Sep 15:  The program will go into an infinite loop if one calls it
   non-interactively like this:

     search < searchp

   if there is no final 'q' command.  It is not clear how to solve
   this because interactive input relies on detecting the eof.  So just
   make sure that your searchp files end with 'q'!
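
One way to guard against this before running non-interactively is to check the command file first. The Python sketch below is a hypothetical helper, not part of Delila. It assumes commands are separated by whitespace and that the quit command is a standalone lowercase 'q'. Since 'q' also appears in mid-line in the example 'd p f -5 t +10 q gga6e3n#atg', a trailing 'q' may only be closing a sub-mode, so passing this check is necessary but not sufficient.

```python
def ends_with_quit(text):
    """Return True if the last whitespace-separated command is 'q'."""
    tokens = text.split()
    return bool(tokens) and tokens[-1] == "q"


# Command text invented for illustration; a real searchp file may differ.
print(ends_with_quit("d p f -5 t +10 q gga6e3n#atg\nq\n"))
print(ends_with_quit("gaattc\n"))
```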

(* end module *)
{This manual page was created by makman 1.45}

{created by htmlink 1.62}