By downloading this code you agree to the
Source Code Use License.
{ version = 9.33; (* of index.p 2017 Sep 14}
(* begin module describe.index *)
(*
name
index: make an alphabetic list of oligonucleotides in a book
synopsis
index(book: in, ind: out, indexp: in; output: out)
files
book: the book of sequences to be indexed
ind: the alphabetized index to the book
indexp: parameters to control index. If this file is empty, then
default values are used. Otherwise there may be 4 to 6 lines:
first line: the number of bases in the alphabetizing window
second line: the number of bases to print before the central window
third line: the number of bases to print in the central window
fourth line: the number of bases to print after the central window
fifth line: if the first letter is a 't', then the index
will run in a teaching mode. do not use this mode on large books.
sixth line: if the first letter is an 'f', then only the first
oligo of each sequence is used for alphabetization. This produces
a drastic reduction in the number of oligos sorted. It is
meant to be used to sort aligned sequences, to see if there are
identical copies.
output: messages to the user
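
The indexp layout above can be read with a few lines of code. The
following is a minimal Python sketch, not the program's own parser: the
function name parse_indexp and the setting names are hypothetical, and
because this page does not state the default values, the caller must
supply them. It also assumes that each numeric line begins with the
number itself.

```python
def parse_indexp(text, defaults):
    """Parse the contents of an indexp file into a dict of settings.

    `defaults` is a caller-supplied dict; the real default values are
    not stated in this manual page, so none are assumed here.
    """
    lines = text.splitlines()
    if not any(line.strip() for line in lines):
        return dict(defaults)  # empty file: use the defaults
    if len(lines) < 4:
        raise ValueError("indexp needs at least the four window lines")
    settings = dict(defaults)
    names = ["alphabetizing", "before", "central", "after"]
    for name, line in zip(names, lines):
        # Assumption: the number is the first token on its line.
        settings[name] = int(line.split()[0])
    # Optional fifth and sixth lines are flags keyed on their first letter.
    settings["teaching"] = len(lines) > 4 and lines[4][:1] == "t"
    settings["first_only"] = len(lines) > 5 and lines[5][:1] == "f"
    return settings
```

Treating the fifth and sixth lines as optional matches the description
above, where a file with only the four window sizes is still valid.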
description
The index program generates an index of oligonucleotide fragments in a
book. The first base of the alphabetizing window is stepped across all
bases of the sequence, creating a list of overlapping oligos and their
positions. The oligos are then sorted along with their positions. Three
printing windows show the bases before the first base, the bases from
the first base on for some distance (this is not the alphabetizing
window), and a third set even further 3'. Making the alphabetizing
window large costs little when there are no long repeats in the
sequences (as when comparing two similar genes). Following the printing
windows are: the sequence number of the piece in the book (provided by
delila); the position of the first base; the orientation of the oligo;
and the similarity. The similarity is the number of leading bases that
an oligo shares with the previous oligo in the index, counted up to the
first point at which they differ. High similarity indicates a repeat.
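
The stepping, sorting and similarity count described above can be
sketched in Python. This is an illustration under stated assumptions,
not the program's implementation: the function name build_index is
hypothetical, pieces and positions are numbered from 1, oligos near the
3' end are simply left shorter than the window, and orientation and the
printing windows are omitted.

```python
def build_index(sequences, window):
    """Return sorted (oligo, piece_number, position, similarity) rows.

    `sequences` is a list of strings; pieces are numbered from 1 and
    positions are 1-based offsets, both assumptions of this sketch.
    """
    entries = []
    for piece, seq in enumerate(sequences, start=1):
        # Step the first base of the window across every base; oligos
        # near the 3' end are simply shorter than the window.
        for start in range(len(seq)):
            entries.append((seq[start:start + window], piece, start + 1))
    entries.sort()

    rows = []
    previous = ""
    for oligo, piece, position in entries:
        # Similarity: leading bases shared with the previous oligo.
        similarity = 0
        for a, b in zip(previous, oligo):
            if a != b:
                break
            similarity += 1
        rows.append((oligo, piece, position, similarity))
        previous = oligo
    return rows
```

For the single sequence ACGACG with a window of 3, the two ACG oligos
at positions 1 and 4 sort next to each other, and the second one gets a
similarity of 3, which marks the direct repeat.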
examples
The index can be used to locate restriction enzyme sites, by simply
'looking them up'. It has the advantage that when new enzymes become
available, one does not need the computer to locate their sites. Direct
repeats will show up as high similarity oligos, and if one gets the
complement along with a sequence in a book (using delila) then inverted
repeats can be found. The first column of the alphabetizing window
contains all the mononucleotides; the first two columns contain all the
dinucleotides, and so on.
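
Because the index is sorted, 'looking up' a site amounts to a binary
search followed by a short scan. The Python sketch below shows the
idea; the function name look_up and the list of (oligo, piece,
position) tuples are hypothetical stand-ins for the ind file, whose
actual layout is not reproduced here. The site must be no longer than
the alphabetizing window for the lookup to find it.

```python
from bisect import bisect_left

def look_up(sorted_entries, site):
    """Return (piece, position) pairs whose oligo begins with `site`.

    `sorted_entries` is a sorted list of (oligo, piece, position)
    tuples, a hypothetical in-memory stand-in for the ind file.
    """
    # Every oligo that begins with `site` sorts at or after (site,),
    # and all such oligos are adjacent in the sorted list.
    i = bisect_left(sorted_entries, (site,))
    hits = []
    while i < len(sorted_entries) and sorted_entries[i][0].startswith(site):
        hits.append(sorted_entries[i][1:])
        i += 1
    return hits
```

For example, calling look_up with the site GAATTC (the EcoRI
recognition sequence) returns every piece and position at which an
indexed oligo starts with that site, without scanning the whole index.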
documentation
L. J. Korn, C. L. Queen and M. N. Wegman, PNAS 74: 4401-4405 (1977)
see also
search.p, helix.p, delila.p, delman.use.comparison
program to analyze the ind file: indana.p
author
Gary Stormo and Thomas Schneider
bugs
One cannot sort more sequence than can fit into the computer memory.
technical notes
The constant mapmax determines the maximum number of bases indexed.
*)
(* end module describe.index *)
{This manual page was created by makman 1.45}