By downloading this code you agree to the Source Code Use License (PDF).
{   version = 1.47; (* of sebo.p 2010 Oct 28}
(* begin module describe.sebo *)
(*
name
   sebo: search for aligned book sequences in another book
synopsis
   sebo(inst: in, book: in, target: in, sebop: in,
        instout: out, list: out, output: out)
files
   inst:  Delila instructions to create a book.
   book:  The Delila book created with inst, containing the fragments to
          search for in target.
   target:  Target book to search, usually a whole genome.  The sequence is
      treated as a circle to guarantee correct results on circular genomes.
      Incorrect results on linear chromosomes should be quite rare, because
      a spurious match would have to span the junction between the two
      ends, which are telomeres.
   sebop:  parameters to control the program.  The file must contain the
      following parameters, one per line:
      1. parameterversion: The version number of the program.  This allows
         the user to be warned if an old parameter file is used.
      2. If the first character of the second line is
         'f' (for 'first') then the sequences are always aligned by their
         first base.
         'i' (for 'instructions') then the sequences are aligned by the
         Delila instructions.  If the inst file is empty, alignment is
         forced to the 'b' mode.
         'b' (for 'book') then the alignment is on the internal zero of
         the book's sequence.  This option is to be used when "default
         coordinate zero" is used in the Delila instructions.
      3. fromdo, todo: (integer) range of aligned book to use in the search.
      4. rangecontrol, frominst, toinst: (char, integer, integer):
         Range of Delila instructions output to instout.
         The first character on the line (rangecontrol) selects the range:
         'a' then the instructions will give the ORIGINAL fragment in
         absolute coordinates.
         'r' then the instructions will give the ORIGINAL fragment in
         relative coordinates.
         For any other initial character, the instructions will give
         the range frominst to toinst.
      5. mismatches: (integer) number of mismatches allowed.
      6. alertinterval: (integer) Many genomes are large, so a search may
         take a while and the program may seem to be doing nothing.  To
         show that the program is still running, set this parameter to a
         number of bases: sebo will then write out the location it is
         currently searching every alertinterval bases.  Nonpositive values
         disable the feature.
   instout:  revised inst giving Delila instructions for finding the
      fragments specified in inst/book in the target.  Full names of pieces
      from the book (such as those assigned by Delila instructions of the
      form "name 'abindingsite';") default to the previous full name if
      there is one.  This follows the alist convention that a blank name
      means the previous name.
   list: display of aligned locations found.
   output: messages to the user
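   The sebop layout above is simple enough to read line by line.  The
   Python sketch below is not sebo's Pascal reader; it assumes that each
   line starts with its values and may continue with free-form comment
   text, as in the example sebop files, and the names SeboParams and
   parse_sebop are hypothetical.

```python
# Hypothetical sketch of a sebop reader, assuming the six-line layout
# described above: values first, free-form comment text after them.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeboParams:
    version: float
    alignment: str            # 'f' first base, 'i' instructions, 'b' book
    fromdo: int
    todo: int
    rangecontrol: str         # 'a', 'r', or any other character
    frominst: Optional[int]   # None when rangecontrol is 'a' or 'r'
    toinst: Optional[int]
    mismatches: int
    alertinterval: int        # nonpositive disables the progress report


def parse_sebop(text: str) -> SeboParams:
    lines = text.splitlines()
    if len(lines) < 6:
        raise ValueError("sebop must contain six parameter lines")
    version = float(lines[0].split()[0])
    alignment = lines[1][:1]  # only the first character matters
    if alignment not in ("f", "i", "b"):
        # Assumption: reject unknown modes rather than guess a default.
        raise ValueError(f"unknown alignment mode {alignment!r}")
    fromdo, todo = (int(v) for v in lines[2].split()[:2])
    rangecontrol = lines[3][:1]
    if rangecontrol in ("a", "r"):
        # The ORIGINAL fragment is written, so no numeric range is needed.
        frominst = toinst = None
    else:
        frominst, toinst = (int(v) for v in lines[3].split()[:2])
    mismatches = int(lines[4].split()[0])
    alertinterval = int(lines[5].split()[0])
    return SeboParams(version, alignment, fromdo, todo, rangecontrol,
                      frominst, toinst, mismatches, alertinterval)
```

   With the Example 1 sebop, the fourth line starts with '-', so in this
   sketch rangecontrol is '-' and frominst/toinst are read as -200 and 200.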
description
   Old GenBank entries are often fused into a single genome sequence.
   Sebo automatically converts Delila instructions written for the old
   entries into instructions for the new genome: it finds each fragment of
   the book in the target and writes the corresponding instructions to
   instout.
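   The conversion rests on finding each book fragment in the target while
   allowing a limited number of mismatches.  A minimal Python sketch of such
   a search follows.  It is a brute-force illustration, not sebo's own
   algorithm, and circular_search and its arguments are hypothetical names.
   It treats the target as a circle by appending its first bases to its
   end, and calls alert every alertinterval positions.

```python
# Hypothetical sketch, not sebo's Pascal code: brute-force search of a
# circular target allowing up to `mismatches` differences.
from typing import Callable, List, Optional


def circular_search(probe: str, target: str, mismatches: int,
                    alertinterval: int = 0,
                    alert: Optional[Callable[[int], None]] = None) -> List[int]:
    """Return the 1-based target positions of the probe's first base for
    every match with at most `mismatches` differences."""
    n, m = len(target), len(probe)
    if m == 0 or m > n:
        return []  # this sketch does not let a probe wrap more than once
    # Appending the first m - 1 bases lets a match run off the end of the
    # sequence and continue at its start, i.e. treats it as a circle.
    extended = target + target[:m - 1]
    hits = []
    for start in range(n):
        if alert is not None and alertinterval > 0 and start % alertinterval == 0:
            alert(start + 1)  # progress report, in the spirit of alertinterval
        diff = 0
        for a, b in zip(probe, extended[start:start + m]):
            if a != b:
                diff += 1
                if diff > mismatches:
                    break  # give up on this position early
        if diff <= mismatches:
            hits.append(start + 1)
    return hits
```

   For example, circular_search("gtac", "acgtacgt", 0) gives [3, 7]; the
   second hit wraps from the end of the target back to its start.  The cost
   grows with the product of the target and probe lengths, which is one
   reason a progress report is useful on whole genomes.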
examples
********************************************************************************
                              Example 1
----- sebo.inst:
title "An OxyR site";
{Example instruction file for sebo }
{ number only the pieces, starting at 1 }
  default numbering piece;
  default numbering 1;
  default out-of-range reduce-range;
organism E.coli;
chromosome E.coli;
name "ahpC";
piece D13187; { ECOD13187 };
get from 116 -200 to 116 +200 direction +;
----- sebop:
1.21       version of sebo that this parameter file is designed for.
i          f: first base, i: inst, b: book alignment
-30 30     fromdo todo: range of aligned book to use in the search.
-200 200   frominst toinst: range of delila instructions output to instout
2          number of mismatches allowed
500000     alertinterval (bases)
----- target:
E. coli genome U00096
----- instout:
title "sebo 1.21 search with book";
{ sebop parameters:
i: use the alignedbase from the book
       -30         30 range of book alignment used for search
      -200        200 range of resulting output instructions
         2            mismatches allowed
    500000            alert interval
}
organism E.coli; chromosome E.coli;
piece U00096; { the target sequence is 4639221 bases }
{ original source probe: }
{ name ""; piece D13187; get from 1 to 316 direction +; }
{ final target instructions: }
name "ahpC"; get from 638089 -200 to same +200 direction +;
----- list:
sebo 1.21 search with book";
sebop parameters:
i: use the alignedbase from the book
       -30         30 range of book alignment used for search
      -200        200 range of resulting output instructions
         2            mismatches allowed
    500000            alert interval
Target sequence:
piece U00096; get from 1 to 4639221 direction +; { 4639221 bases }
   Searching target with probe: name "ahpC"; piece D13187; get from 1 to 316 direction +;
   ---------------------                   +++++++++++++++++++++
   322222222221111111111--------- +++++++++111111111122222222223
   0987654321098765432109876543210123456789012345678901234567890
   ttacgaaggttgtaaggtaaaacttatcgatttgataatggaaacgcattaccggaatcgg probe
                                                      X X        differences
   ttacgaaggttgtaaggtaaaacttatcgatttgataatggaaacgcattagccgaatcgg target
                                 ^ found at 638089 on the target
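The list display above labels each probe column vertically, with a sign
row, a tens row and a units row, and marks differences with X.  The Python
sketch below reproduces that layout and the instout line of this example.
It is a hypothetical reconstruction from the output shown, not sebo's code;
it assumes coordinates stay within -99..99 and the + orientation, and the
function names are invented for illustration.

```python
# Hypothetical reconstruction of the list display shown above; assumes
# coordinates within -99..99 so each label fits in three rows.
from typing import List


def coordinate_rows(fromdo: int, todo: int) -> List[str]:
    """Sign, tens and units rows labelling columns fromdo..todo."""
    if max(abs(fromdo), abs(todo)) > 99:
        raise ValueError("this sketch only handles coordinates -99..99")
    rows = ["", "", ""]
    for n in range(fromdo, todo + 1):
        sign = "-" if n < 0 else "+"
        a = abs(n)
        if a >= 10:
            column = (sign, str(a // 10), str(a % 10))
        elif a >= 1:
            column = (" ", sign, str(a))   # one-digit labels drop a row
        else:
            column = (" ", " ", "0")       # the aligned zero
        for row in range(3):
            rows[row] += column[row]
    return rows


def differences(probe: str, target: str) -> str:
    """Mark each mismatched column with X; trailing blanks are trimmed."""
    return "".join(" " if p == t else "X" for p, t in zip(probe, target)).rstrip()


def instruction(name: str, position: int, frominst: int, toinst: int) -> str:
    """Format an instout line around the found zero (+ orientation only)."""
    return (f'name "{name}"; get from {position} {frominst:+d} '
            f'to same {toinst:+d} direction +;')
```

With fromdo -30 and todo 30, coordinate_rows returns the three label rows
of this example, and instruction("ahpC", 638089, -200, 200) returns the
final line of the instout shown above.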
********************************************************************************
                              Example 2
Suppose that you have some sequences in a file and you would like to locate
them on the chromosome of an organism in the form of Delila instructions.
Here is how to create the input files for sebo:
1.  Convert the sequences into a Delila book using one of the programs
rawbk, makebk, dbbk, or mkdb (see the web page listed under "see also").
This gives you the file 'book' in Directory 1.
2.  In another directory (Directory 2), build a Delila library for your
organism (this is also described on that web page).
3.  Use Delila to extract both the sequence and its complement, for example:
   title "Delila instructions for Bacteriophage T4 and its complement";
   organism c.T4; chromosome c.T4; piece AF158101;
   get all piece;
   get all piece direction -;
4. Move or link the resulting book into the file 'target' in Directory 1.
5. Then set up sebop to align by the first base ('f'), since no Delila
instructions were used to make the book:
1.21       version of sebo that this parameter file is designed for.
f          f: first base, i: inst, b: book alignment
0  40      fromdo todo: range of aligned book to use in the search.
0  40      frominst toinst: range of delila instructions output to instout
10         number of mismatches allowed
500000     alertinterval (bases)
6. Finally, use sebo to create instout.  You may have to allow for some
mismatches.  These instructions should allow you to extract the sequences
from the organism database.
7. Sebo reports its results by adding many comments, in the form {}, to
instout.  To remove them and obtain a clean set of instructions, use the
nocom program, then remove the extra blank lines with noblank.  In Unix:
nocom < instout | sed -e 's/{}//g' | noblank > instclean
You can also remove redundant Delila instructions for organism, chromosome
and piece by hand, especially if there is only one piece in the target.
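Where nocom and noblank are not at hand, a rough Python stand-in for step 7
is sketched below.  It is hypothetical: strip_comments and
clean_instructions are invented names, braces are assumed to delimit only
comments (never to appear inside quoted names), and the real programs may
differ in detail.

```python
# Hypothetical stand-in for the nocom | sed | noblank pipeline of step 7.
# Braces are assumed to delimit only comments; nesting is tracked by depth.
def strip_comments(text: str) -> str:
    kept = []
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def clean_instructions(text: str) -> str:
    """Drop comments, trailing blanks and lines left empty."""
    lines = [line.rstrip() for line in strip_comments(text).splitlines()]
    kept = [line for line in lines if line]
    return "\n".join(kept) + "\n" if kept else ""
```

Applied to the Example 1 instout, this sketch would keep only the title,
organism/chromosome, piece and name/get lines.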
********************************************************************************
documentation
see also
   Sequence extraction program: delila.p
   Simple search engine: search.p
   More details on alignment types: alist.p
   example sebop parameter file: sebop
   example inst file: sebo.inst
   Programs for conversion of raw sequences into a Delila Book:
   rawbk.p makebk.p dbbk.p mkdb.p
   Conversion of a book of sequences into a Delila Library:
   https://alum.mit.edu/www/toms/delilalibraries.html
   Remove the comments created by sebo: nocom.p
   Remove the blank lines created by sebo: noblank.p
   Shift the instructions to any desired position: instshift.p
author
   Thomas Dana Schneider
bugs
technical notes
*  The organism, chromosome and piece names written to the Delila
   instructions come from the current target sequence.
*  Delila instructions for sites frequently come in pairs:
get from 163 -200 to 163 +200 direction +;
get from 163 +200 to 163 -200 direction -;
   So the target will be searched in both orientations.
*  constant dnamax is set to a large value to increase
   the speed of the search.
* A useful parameter would let the user move the coordinate zero to any
position in the piece; the specified base would become the zero base of the
Delila instructions.  This can already be done with the instshift program,
so it is not a high priority.
* 2001 April 17: if the fromdo/todo range given for the book is outside of
the piece in the book (in aligned searching), then the book sequence will not
be found in the target.
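The pairing note above means that both orientations of a site end up in the
book.  To make "both orientations" concrete, the Python sketch below
searches one target strand for a probe and for its reverse complement.  It
is an illustration only, with hypothetical names; it assumes sequences of
a, c, g and t, and for brevity it treats the target as linear, unlike sebo.

```python
# Hypothetical sketch: search a probe in both orientations of one target
# strand.  Assumes a/c/g/t only; the target is treated as linear here.
from typing import List, Tuple

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def search_both_strands(probe: str, target: str,
                        mismatches: int = 0) -> List[Tuple[str, int]]:
    """Return (orientation, position) pairs, where position is the 1-based
    coordinate of the leftmost matched base on the given target strand and
    '-' means the reverse complement of the probe matched."""
    m = len(probe)
    if m == 0 or m > len(target):
        return []
    hits = []
    for orientation, p in (("+", probe), ("-", reverse_complement(probe))):
        for start in range(len(target) - m + 1):
            window = target[start:start + m]
            diff = sum(1 for a, b in zip(p, window) if a != b)
            if diff <= mismatches:
                hits.append((orientation, start + 1))
    return hits
```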
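The 2001 note can be pictured with a sketch of how a probe might be cut from
a book piece using fromdo and todo.  This Python fragment is hypothetical
(probe_window and zero_index are invented names, and zero_index stands for
whatever base the chosen alignment mode selects); it returns None when the
window runs off the piece, the case in which the note says the sequence
will not be found.

```python
# Hypothetical sketch of cutting the search probe out of a book piece.
# zero_index is the 0-based index of the aligned base: 0 for 'f'
# alignment, or the internal zero for 'i' and 'b' alignment.
from typing import Optional


def probe_window(piece: str, zero_index: int,
                 fromdo: int, todo: int) -> Optional[str]:
    """Return bases fromdo..todo relative to the aligned zero, or None
    when that range runs off either end of the piece."""
    if fromdo > todo:
        raise ValueError("fromdo must not exceed todo")
    first = zero_index + fromdo
    last = zero_index + todo
    if first < 0 or last >= len(piece):
        return None  # mirrors the note: such a range yields no match
    return piece[first:last + 1]
```

In Example 1 the book piece runs from 1 to 316 with its zero at base 116,
so in this sketch zero_index would be 115, and fromdo -30 with todo 30
would give the 61-base probe shown in the list output.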
*)
(* end module describe.sebo *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}