{ version = 1.42; (* of encode.p 2007 Jun 22}
(* begin module describe.encode *)
(*
name
encode: encodes a book of sequences into strings of integers
synopsis
encode(inst: in, book: in, encseq: out, encodep: in, output: out)
files
inst: the instructions generating the book; for aligning the sequences
If the inst file is empty, then the sequences are aligned by
the zero coordinate of the book (this allows the use of the
"default coordinate zero" option of Delila) or by the first
base of the piece, as defined by the first parameter.
book: the sequences to be encoded
encseq: the encoded sequences
encodep: parameter file for describing how the sequences are to be
encoded.
The first parameter, the first character on the first line, defines how
to align the pieces. See the alist program for the detailed logic.
There are three choices, as in alist:
'f' (for 'first'): the sequences are always aligned by their
first base.
'i' (for 'inst'): the sequences are aligned by the Delila instructions. If
the inst file is empty, alignment is forced to the 'b' mode.
'b' (for 'book'): the alignment is on the internal zero of
the book's sequence. This option is to be used when "default
coordinate zero" is used in the Delila instructions.
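The choice of alignment mode, including the forced fallback when the
inst file is empty, can be sketched as follows. This is only an
illustration in Python, not the code in encode.p; the function name
resolve_alignment and its arguments are hypothetical.

```python
def resolve_alignment(mode, inst_is_empty):
    """Return the effective alignment mode: 'f', 'i', or 'b'.

    mode is the first character of the first line of encodep;
    inst_is_empty says whether the inst file has no content.
    Both names are illustrative, not taken from encode.p.
    """
    if mode not in ("f", "i", "b"):
        raise ValueError("alignment mode must be 'f', 'i', or 'b'")
    # As described above, 'i' with an empty inst file is forced to 'b'.
    if mode == "i" and inst_is_empty:
        return "b"
    return mode
```

For the exact logic, the alist program remains the reference.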
The remaining parameters are stored as a list of parameter records, of
which there may be any number. Each parameter record has five lines of
information which it must include (all i's and j's are integers):
1. i j specify the nucleotides, relative to the aligned base,
over which this parameter record is to operate; these may
be any integers, but i <= j is required;
2. i is the size of the windows to be encoded; within the window
the number of occurrences of each oligonucleotide of length 'coding'
is determined and printed as part of the total sequence vector;
3. i is the shift to the next window to be encoded;
4. i : j1 j2 j3 ... is the 'coding'-level and arrangement; the
'coding'-level, i, is the number of nucleotides in the oligos we
are counting, i.e., 1 means monos, 2 means dis, ...; if i > 1
then we can also skip bases between the ones we are encoding;
if i is followed by a colon, there must be i-1 integers
(j1..j(i-1)) which specify the number of bases to be skipped
between the ones which are encoded; for example, if we have the
sequence xyz and we are interested in the di-nucleotides we can
get the xy by the parameter '2 : 0', or we could get the xz by
parameter '2 : 1'; if there is no colon all the skips are
assumed to be zero;
5. i is the shift to the next coding site within the window;
this allows us to encode only some of the oligos within a window,
such as only those that are in-frame;
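The 'coding'-level line (item 4) and its skips can be illustrated with
a small Python sketch. It is not the parser used by encode.p: the
function names are hypothetical, and it assumes the line holds nothing
but the integers and the optional colon.

```python
def parse_coding(line):
    """Parse 'i' or 'i : j1 j2 ...' into (level, skips)."""
    head, colon, tail = line.partition(":")
    level = int(head.split()[0])
    if level < 1:
        raise ValueError("coding level must be at least 1")
    if colon:
        skips = [int(tok) for tok in tail.split()]
        if len(skips) != level - 1:
            raise ValueError("a colon must be followed by level-1 skips")
    else:
        # No colon: all skips are assumed to be zero.
        skips = [0] * (level - 1)
    return level, skips


def oligo_offsets(skips):
    """Offsets of the encoded bases relative to the first encoded base."""
    offsets = [0]
    for skip in skips:
        offsets.append(offsets[-1] + skip + 1)
    return offsets


def pick_oligo(seq, start, skips):
    """Return the bases of seq selected at start with the given skips."""
    return "".join(seq[start + off] for off in oligo_offsets(skips))
```

With the sequence xyz from item 4, pick_oligo("xyz", 0, [0]) gives xy
and pick_oligo("xyz", 0, [1]) gives xz, matching the parameters '2 : 0'
and '2 : 1'.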
multiple parameter records can be concatenated in the encodep file
and then each sequence in the book will be encoded according to each
parameter record into a single vector of integers.
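How the five lines of a parameter record combine, and how several
records yield one vector, can be sketched in Python as below. This is
an illustration under stated assumptions, not the algorithm of
encode.p: the names ParamRecord, encode_record and encode_sequence are
hypothetical; bases are assumed to be lowercase a, c, g, t; counts are
assumed to be ordered alphabetically by oligonucleotide; only windows
lying entirely inside the range i..j, and only oligos lying entirely
inside a window, are counted; the 'end of sequence' symbol is omitted.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class ParamRecord:
    first: int         # line 1: i, start of range relative to aligned base
    last: int          # line 1: j, end of range (first <= last)
    window: int        # line 2: window size
    window_shift: int  # line 3: shift to the next window
    level: int         # line 4: coding level (oligo length)
    skips: list        # line 4: level-1 skips between encoded bases
    coding_shift: int  # line 5: shift to the next coding site


def encode_record(seq, zero, rec):
    """Encode seq for one record; zero is the index of the aligned base."""
    if rec.first > rec.last:
        raise ValueError("i <= j is required")
    if rec.window_shift < 1 or rec.coding_shift < 1:
        raise ValueError("shifts must be positive in this sketch")
    if len(rec.skips) != rec.level - 1:
        raise ValueError("need level-1 skips")
    if zero + rec.first < 0 or zero + rec.last >= len(seq):
        raise ValueError("range extends beyond the sequence")
    # Assumed ordering of the counts: aa, ac, ag, at, ca, ...
    oligos = ["".join(p) for p in product("acgt", repeat=rec.level)]
    index = dict((oligo, k) for k, oligo in enumerate(oligos))
    offsets = [0]
    for skip in rec.skips:
        offsets.append(offsets[-1] + skip + 1)
    span = offsets[-1] + 1  # bases covered by one oligo, skips included
    vector = []
    start = rec.first
    while start + rec.window - 1 <= rec.last:
        counts = [0] * len(oligos)
        window_end = start + rec.window - 1
        pos = start
        while pos + span - 1 <= window_end:
            oligo = "".join(seq[zero + pos + off] for off in offsets)
            counts[index[oligo]] += 1
            pos += rec.coding_shift
        vector.extend(counts)
        start += rec.window_shift
    return vector


def encode_sequence(seq, zero, records):
    """Concatenate the encodings of seq under every parameter record."""
    vector = []
    for rec in records:
        vector.extend(encode_record(seq, zero, rec))
    return vector
```

Under these assumptions, the sequence acgtac aligned on its first base
and encoded with the record (0, 5, 3, 3, 1, [], 1) gives two windows of
mononucleotide counts, 1 1 1 0 followed by 1 1 0 1.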
output: for messages to the user
description
This program is used to encode a book of sequences into a string of
integers. Each sequence in the book is encoded into a single string of
integers (ended by an 'end of sequence' symbol) according to the
user-specified parameters, which are in the file 'encodep'.
examples
documentation
@article{Schneider1984,
author = "T. D. Schneider
and G. D. Stormo
and M. A. Yarus
and L. Gold",
title = "Delila system tools",
journal = "Nucl. Acids Res.",
volume = "12",
pages = "129-140",
year = "1984"}
see also
Example parameter file: encodep
delman.use.encode:
https://alum.mit.edu/www/toms/delman1.html#delman.use.encode.1
delman.use.aligned.books:
https://alum.mit.edu/www/toms/delman1.html#delman.use.aligned.books
Before using encode, one should always check the sequences
by looking at them as an aligned list with alist.p.
The output of this program is used by rseq.p.
author
Gary Stormo
bugs
none known
technical notes
*)
(* end module describe.encode *)
{This manual page was created by makman 1.45}