{ version = 1.36; (* of embed.p 2012 Jun 23}
(* begin module describe.embed *)
(*
name
embed: embed an aligned set of DNA sequences into random sequences
synopsis
embed(inst: in, book: in, mkvseqs: in, ranbook: in, embedp: in,
embedbk: out, output: out)
files
inst: delila instructions of the form 'get from 56 -5 to 56 +10;'
book: the book generated by delila using inst
mkvseqs: random sequence output from the markov program
ranbook: book made from random sequences using makebk program; either
mkvseqs or ranbook must contain sequence. If both contain
sequence, then mkvseqs will be used as the source for random
sequences.
embedp: parameters to control the program. The file must contain the
following parameters, one per line:
parameterversion: The version number of the program. This allows the
user to be warned if an old parameter file is used.
alignmenttype: The type of alignment to use. f: first base, i: inst,
b: book alignment
'b' is to be used when 'default coordinate zero;' is used in the
inst file, resulting in a book whose coordinates do not match the
inst coordinates. 'i' is to be used when the book contains a normal
coordinate system corresponding to the inst file. 'f' simply aligns
by the first base in the book. See alist.p for more details on
alignmenttype.
InFrom, InTo: the from-to range of the input sequences to be used.
OutFrom, OutTo: the from-to range of the sequences to output.
This includes the InFrom to InTo range AND the random sequences.
embedbk: book created by the program. Contains the sequences embedded
within random sequences to the specified range.
output: messages to the user
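The embedp layout described above can be read with a few lines of
code. The following Python sketch is an illustration only and is not
part of embed. It assumes that each line begins with its value(s)
followed by free-text commentary, as in the range lines shown in the
examples section; the exact layout of the parameterversion and
alignmenttype lines is an assumption, and parse_embedp is a
hypothetical helper name.

```python
def parse_embedp(text):
    """Parse the four embedp parameter lines (assumed layout).

    Returns (version, alignmenttype, (InFrom, InTo), (OutFrom, OutTo)).
    """
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("embedp needs at least four parameter lines")

    # Line 1: parameter version number (assumed to be the first token).
    version = float(lines[0].split()[0])

    # Line 2: alignment type, one of f, i, b (assumed to be the first token).
    alignment = lines[1].split()[0]
    if alignment not in ("f", "i", "b"):
        raise ValueError("alignmenttype must be f, i or b")

    # Lines 3 and 4: two integers each, followed by commentary.
    in_from, in_to = (int(tok) for tok in lines[2].split()[:2])
    out_from, out_to = (int(tok) for tok in lines[3].split()[:2])

    # The ordering the description requires for correct operation.
    if not (out_from <= in_from <= in_to <= out_to):
        raise ValueError("need OutFrom <= InFrom <= InTo <= OutTo")

    return version, alignment, (in_from, in_to), (out_from, out_to)
```

Checking the range ordering at parse time reports a bad parameter
file before any sequence is processed.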
description
Embed embeds a given set of aligned sequences into random sequences
having a specified range. If there is an incomplete sequence in
the region to be embedded, it is filled in with random sequences as
well.
This allows one to destroy a pattern in the aligned sequences, so
that the sequences can be realigned to find other patterns nearby.
The parameters OutFrom, InFrom, InTo, OutTo in embedp set the range
to do the embedding. In order for the program to function
correctly, the following must be true:
OutFrom <= InFrom <= InTo <= OutTo
The sequence from InFrom to InTo is not changed, and random
sequence is filled in around it from OutFrom on the left to OutTo
on the right. See example below.
If the original sequence is longer than the range OutFrom to OutTo
then the book will contain the embedded sequence with original
sequence on either side of the random sequence.
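The embedding rule described above can be sketched in Python. This is
an illustration of the stated behavior, not the Pascal code of embed.
The function name, the representation of a sequence as a string plus
the coordinate of its first base, and the next_base callable are all
assumptions made for the sketch.

```python
def embed_sequence(seq, first_coord, in_from, in_to, out_from, out_to,
                   next_base):
    """Embed one aligned sequence into random sequence.

    seq         -- original bases as a string
    first_coord -- coordinate of seq[0]
    next_base   -- callable returning one random base per call
    Returns (coordinate of first output base, output string).
    """
    if not (out_from <= in_from <= in_to <= out_to):
        raise ValueError("need OutFrom <= InFrom <= InTo <= OutTo")

    last_coord = first_coord + len(seq) - 1
    start = min(first_coord, out_from)
    stop = max(last_coord, out_to)

    out = []
    for coord in range(start, stop + 1):
        have_original = first_coord <= coord <= last_coord
        inside_out = out_from <= coord <= out_to
        inside_in = in_from <= coord <= in_to
        if inside_out and not inside_in:
            # Flanking region: always replaced by random sequence.
            out.append(next_base())
        elif have_original:
            # InFrom..InTo, or original sequence outside OutFrom..OutTo.
            out.append(seq[coord - first_coord])
        else:
            # Incomplete original inside InFrom..InTo: fill with random.
            out.append(next_base())
    return start, "".join(out)
```

Passing a next_base that always returns a marker character such as
'*' reproduces the star pattern used in the examples section.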
The program stores the random sequence as a string and then uses it
base by base until there is no more in the string. Then it reads
another string of random sequence. In this way, none of the random
sequence is "thrown away".
If the program finds the end of mkvseqs or ranbook before it has
embedded all the sequences, it gives a message that it is out of
random sequence and halts. Why doesn't the program reuse the
random sequence? This is not a good idea because the embedded
sequences are designed to be fed into malign, and malign would pick
up on this reused sequence and find unnatural sequence
conservation.
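The way random sequence is consumed can be illustrated with a small
Python sketch. It is not the embed source; the class name RandomPool
and its interface are hypothetical. It hands out bases one at a time,
reads the next string only when the current one is used up, and stops
with an error instead of reusing sequence.

```python
class RandomPool:
    """Supply random bases one at a time, never reusing any."""

    def __init__(self, strings):
        self._strings = iter(strings)   # e.g. lines read from mkvseqs
        self._current = ""
        self._pos = 0

    def next_base(self):
        # Read another string only when the current one is exhausted,
        # so no random sequence is thrown away.
        while self._pos >= len(self._current):
            try:
                self._current = next(self._strings)
            except StopIteration:
                # Mirrors the program halting with a message.
                raise RuntimeError("out of random sequence") from None
            self._pos = 0
        base = self._current[self._pos]
        self._pos += 1
        return base
```

Raising an error rather than wrapping around keeps repeated random
sequence out of the output, which is the concern raised above for
malign.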
Aligned sequences can be viewed with the alist program.
The random sequences are generated by the markov program. They can
be read from either mkvseqs or ranbook. mkvseqs is directly
generated from markov to a given composition and length. Ranbook
can be made using the makebk program. If both files are present,
mkvseqs is used.
The output of this program is designed to be fed into the malign
program for multiple alignment.
examples
With the following parameters from embedp the sequence would be embedded
as shown below.
-10 10 InFrom, InTo: range of input sequences to be used
-30 30 OutFrom, OutTo: range of the sequences to output
original:
-----|-------------------<---------0--------->-------------------|-----
    -30                 -10                 +10                 +30
  OutFrom             InFrom               InTo                OutTo
embedded:
     ********************<---------0--------->********************
    -30     random      -10    original     +10     random      +30
           sequence            sequence            sequence
Note that if there is any sequence in the original alignment outside
the range OutFrom to OutTo, it will be copied to the embedbk.
Randomizing a Single Patch
Using embed it is possible to cover only one small area with random
sequence instead of two areas. To do this you will need to use the
embed parameters in a certain way.
For example, if you wanted to cover only the zero coordinate with
random sequence, three of the parameters would need to be the same:
-1 -1 InFrom, InTo: range of input sequences to be used
-1 0 OutFrom, OutTo: range of the sequences to output
When InFrom, InTo and OutFrom are the same, the InFrom to InTo range
takes precedence over the OutFrom to OutTo range at that coordinate.
The example parameters given above would keep the base at the -1
coordinate the same, but make the base at the 0 coordinate random.
All bases other than the one at 0 are kept the same.
Another example would be to 'zap' or randomize from -3 to +4. The
parameters would be:
-4 -4 InFrom, InTo: range of input sequences to be used
-4 4 OutFrom, OutTo: range of the sequences to output
These parameters would leave the sequence up to and including the -4
coordinate alone, but make the sequence from -3 to +4 random. The
sequence from +5 and higher would be maintained as well.
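To check a parameter choice before running embed, the randomized
coordinates can be listed directly from the four range values. The
Python helper below is a hypothetical illustration based only on the
rule stated in this description: a coordinate is randomized when it
lies in OutFrom to OutTo but not in InFrom to InTo.

```python
def randomized_coordinates(in_from, in_to, out_from, out_to):
    """Return the coordinates that would be covered by random sequence."""
    if not (out_from <= in_from <= in_to <= out_to):
        raise ValueError("need OutFrom <= InFrom <= InTo <= OutTo")
    return [c for c in range(out_from, out_to + 1)
            if not (in_from <= c <= in_to)]


print(randomized_coordinates(-1, -1, -1, 0))   # -> [0]
print(randomized_coordinates(-4, -4, -4, 4))   # -> [-3, -2, -1, 0, 1, 2, 3, 4]
```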
documentation
see also
alist.p, markov.p, makebk.p, malign.p
author
Elaine Bucheimer
bugs
The program cannot handle sequences longer than dnamax. This is a
fixable bug.
A possible future addition to the program would be to allow the
user to specify whether the original sequence outside the OutFrom
and OutTo coordinates should be kept or chopped off.
It appears that the 'i' option does not embed correctly. The
resulting book does not have the advertised coordinates. A
temporary solution is to use the f option with appropriate ranges.
technical notes
*)
(* end module describe.embed *)
{This manual page was created by makman 1.45}