Delila Program: index

Documentation for the index program is below, with links to related programs in the "see also" section.

{   version = 9.33; (* of index.p 2017 Sep 14}

(* begin module describe.index *)
      index: make an alphabetic list of oligonucleotides in a book

      index(book: in, ind: out, indexp: in; output: out)

      book: the book of sequences to be indexed
      ind: the alphabetized index to the book
      indexp: parameters to control index.  if this file is empty, then
         default values are used.  otherwise there may be 4 to 6 lines:
         first line: the number of bases in the alphabetizing window
         second line: the number of bases to print before the central window
         third line: the number of bases to print in the central window
         fourth line: the number of bases to print after the central window
         fifth line: if the first letter is a 't', then the index
            will run in a teaching mode.  do not use this mode on large books.
         sixth line: if the first letter is 'f' then only the first
            oligo of each sequence is used for alphabetization.  This produces
            a drastic reduction in the number of oligos sorted.  It is
            meant to be used to sort aligned sequences, to see if there are
            identical copies.
      output: messages to the user
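
      The layout of indexp described above can be read with a few lines of
      code.  The following Python sketch is not part of index.p; the names
      IndexParams and parse_indexp are hypothetical, and it assumes that each
      of the first four lines begins with an integer.  An empty file is
      reported as None, since the default values are not stated here.

```python
from typing import NamedTuple, Optional


class IndexParams(NamedTuple):
    window: int       # bases in the alphabetizing window (line 1)
    before: int       # bases printed before the central window (line 2)
    central: int      # bases printed in the central window (line 3)
    after: int        # bases printed after the central window (line 4)
    teaching: bool    # line 5 starts with 't'
    first_only: bool  # line 6 starts with 'f'


def parse_indexp(text: str) -> Optional[IndexParams]:
    """Parse the contents of an indexp file; None means 'use defaults'."""
    lines = text.splitlines()
    if not any(line.strip() for line in lines):
        return None
    if len(lines) < 4:
        raise ValueError("indexp needs at least 4 lines")
    # Assumption: the number is the first token; anything after it is ignored.
    numbers = [int(line.split()[0]) for line in lines[:4]]
    # The optional lines are recognized by their first letter only.
    teaching = len(lines) > 4 and lines[4][:1] == "t"
    first_only = len(lines) > 5 and lines[5][:1] == "f"
    return IndexParams(numbers[0], numbers[1], numbers[2], numbers[3],
                       teaching, first_only)
```

      For example, parse_indexp("10\n5\n10\n5\nt\n") describes a 10 base
      alphabetizing window with teaching mode on; these values are
      illustrative only.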

      The index program generates an index of oligonucleotide fragments in a
      book.  The first base of the alphabetizing window is stepped across all
      bases of the sequence, creating a list of overlapping oligos and their
      positions.  The oligos are then sorted along with their positions.  Three
      printing windows allow one to look at bases before the first base, from
      the first base some distance on (this is not the alphabetizing window)
      and a third set even further 3'.  It is not inefficient to make the
      alphabetizing window large when there are no long repeats in the
      sequences (as when comparing two similar genes).  Following the printing
      windows are: the sequence number of the piece in the book (provided by
      delila); the position of the first base; the orientation of the oligo;
      and the similarity.  This last item is the number of leading bases that
      an oligo shares with the previous oligo in the index, counted up to the
      first position at which they differ.  High similarity means a repeat.
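
      The stepping, sorting and similarity steps described above can be
      sketched in Python.  This is an illustration, not the Pascal code of
      index.p: the function name build_index is hypothetical, positions are
      assumed to count from 1, oligos near the 3' end are assumed to be
      truncated, and orientation and the printing windows are left out.

```python
def build_index(sequences, window):
    """Return sorted (oligo, sequence number, position, similarity) tuples."""
    entries = []
    for number, seq in enumerate(sequences, start=1):
        # Step the first base of the window across every base of the sequence.
        for start in range(len(seq)):
            oligo = seq[start:start + window]  # shorter near the 3' end
            entries.append((oligo, number, start + 1))
    # Sort the oligos together with their sequence numbers and positions.
    entries.sort()

    index = []
    previous = ""
    for oligo, number, position in entries:
        # Similarity: leading bases shared with the previous oligo.
        similarity = 0
        for a, b in zip(oligo, previous):
            if a != b:
                break
            similarity += 1
        index.append((oligo, number, position, similarity))
        previous = oligo
    return index
```

      With build_index(["GATTACAGATTA"], 5), the oligo GATTA occurs at
      positions 1 and 8; the two entries sort next to each other and the
      second one has similarity 5, which marks the direct repeat.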

      The index can be used to locate restriction enzyme sites, by simply
      'looking them up'.  It has the advantage that when new enzymes become
      available, one does not need the computer to locate their sites.  Direct
      repeats will show up as high similarity oligos, and if one gets the
      complement along with a sequence in a book (using delila) then inverted
      repeats can be found.  The first column of the alphabetizing window
      contains all the mononucleotides; the first two, the di's, etc.
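
      'Looking up' a site in the sorted index amounts to a binary search for
      the first oligo that begins with the site.  The Python sketch below is
      an illustration under assumptions: make_index and find_site are
      hypothetical names, positions count from 1, and the site must be no
      longer than the alphabetizing window.

```python
from bisect import bisect_left


def make_index(seq, window):
    """Sorted list of (oligo, position) pairs for one sequence."""
    return sorted((seq[i:i + window], i + 1) for i in range(len(seq)))


def find_site(index, site):
    """Return the positions of all oligos in the index that begin with site."""
    oligos = [oligo for oligo, _ in index]
    # Oligos sharing a prefix are adjacent in sorted order, so jump to the
    # first candidate and scan forward while the prefix still matches.
    i = bisect_left(oligos, site)
    positions = []
    while i < len(index) and index[i][0].startswith(site):
        positions.append(index[i][1])
        i += 1
    return positions
```

      For the made-up sequence AAGAATTCGGAATTCA indexed with a window of 6,
      find_site(index, "GAATTC") returns [3, 10], the two EcoRI sites.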

      L. J. Korn, C. L. Queen and M. N. Wegman, PNAS 74: 4401-4405 (1977)

see also
      search.p, helix.p, delila.p, delman.use.comparison

      program to analyze the ind file:  indana.p

author
      Gary Stormo and Thomas Schneider

bugs
      One cannot sort more sequence than can fit into the computer memory.

technical notes
      The constant mapmax determines the maximum number of bases indexed.
(* end module describe.index *)
{This manual page was created by makman 1.45}

{created by htmlink 1.62}