{ version = 2.53; (* of evd.p 2010 Sep 24}
(* begin module describe.evd *)
(*
name
evd: evolution display
synopsis
evd(all: in, evdp: in,
display: out, sites: out, genomes: out, evfeatures: out,
output: out);
files
all: the all file from program ev. It contains all the genomes of the
evolving creatures, and other parameters and data. It is created by a
binary dump by the ev program for the sake of speed and so is not
readable by people and so probably cannot be successfully run on
another kind of computer. The evd program therefore is needed to
interpret the all file.
evdp: parameters, one per line:
firstcreature: the number of the first creature to display
secondcreature: the number of the last creature to display
non-site features: if the first character is 'n' then non-sites
that are recognized by the weight matrix are shown as features
in evfeatures
If evdp is empty, the defaults are: 1/1/- (first creature only,
no non-site features)
The creature of rank 1 makes the fewest mistakes;
number 2 makes more, etc.
display: a marked display of the genomes and other data.
sites: raw sequences of the sites (and 5 bases around each). The sequences
are separated by periods, and different creatures are separated by
blank lines. The current method for using this to create a sequence
logo is described in the 'see also' section.
genomes: raw sequences of the genomes of the creatures,
separated by periods.
evfeatures: The features in this genome in the form used by the
lister program.
output: messages to the user.
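As an illustration of the evdp layout described above, here is a minimal Python sketch (not part of evd.p) that reads the three parameters. The function name parse_evdp is hypothetical. The sketch assumes that a non-empty file supplies both creature numbers, that only the first token of each numeric line matters, and that a missing third line means "no non-site features"; evd itself may behave differently in those cases.

```python
def parse_evdp(text):
    """Return (firstcreature, secondcreature, show_nonsites) from evdp text."""
    lines = text.splitlines()
    # An empty evdp file selects the documented defaults 1/1/-.
    if not any(line.strip() for line in lines):
        return (1, 1, False)
    # Assumption: the number is the first token on each of the first two lines.
    first = int(lines[0].split()[0])
    last = int(lines[1].split()[0])
    # Only the first character of the third line is examined.
    flag = lines[2][:1] if len(lines) > 2 else "-"
    return (first, last, flag == "n")


print(parse_evdp(""))           # (1, 1, False)
print(parse_evdp("1\n5\nn\n"))  # (1, 5, True)
```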
description
The purpose of the ev program is to evolve sites; the purpose of evd is to
display an intermediate or final result of an evolutionary run. The
genomes are displayed with the locations of the recognizer gene and its
sites:
a-- c-- g-- t-- is the region encoding the weights for
one of the recognizer fingers.
The numbers underneath are the weights for each base.
TTTT marks the threshold for the matrix.
The number underneath is the threshold.
(-------------) is a site. The number underneath is the evaluation
of the site by the weight matrix. If the site is
recognized (the evaluation is greater than or equal
to the threshold) then + signs are used.
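The recognition rule in the legend above can be sketched in Python as follows. This is an illustration only, not code from evd.p: it assumes the evaluation is the sum, over positions, of the weight for the base found there (the usual way to apply a weight matrix), and that each matrix row holds the weights in a, c, g, t order. The names evaluate_site and is_recognized and the example numbers are hypothetical.

```python
BASES = "acgt"  # assumed column order of each weight-matrix row


def evaluate_site(weights, site):
    """Sum the weight of the base present at each position of the site."""
    return sum(row[BASES.index(base)] for row, base in zip(weights, site))


def is_recognized(weights, threshold, site):
    """A site is recognized when its evaluation reaches the threshold."""
    return evaluate_site(weights, site) >= threshold


# Hypothetical two-position matrix; rows are positions, columns are a, c, g, t.
weights = [[3, -1, 0, -2],
           [0, 4, -3, 1]]
print(evaluate_site(weights, "ac"))     # 7
print(is_recognized(weights, 7, "ac"))  # True: equal to the threshold counts
print(is_recognized(weights, 7, "gt"))  # False: the evaluation is only 1
```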
The sequences around each site are written to the file 'sites', and the
entire genomes are written to the file 'genomes'. This allows analysis
by other programs.
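For analysis by other programs, the 'sites' file can be read with a few lines of Python. The sketch below is not part of the Delila system; it relies only on the layout stated in the files section (sequences end with periods, creatures are separated by blank lines) and assumes that whitespace inside one creature's block is not significant. The name parse_sites and the example text are hypothetical.

```python
import re


def parse_sites(text):
    """Return a list of creatures, each a list of site sequences."""
    creatures = []
    # Blank lines separate the creatures.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Assumption: line breaks inside a block carry no meaning.
        joined = "".join(block.split())
        # Periods separate the sequences; drop the empty piece after the last.
        sequences = [s for s in joined.split(".") if s]
        if sequences:
            creatures.append(sequences)
    return creatures


example = "acgt.ttga.\n\nccat.\n"
print(parse_sites(example))  # [['acgt', 'ttga'], ['ccat']]
```

The same function would also work on the 'genomes' file if each genome is treated as one period-terminated sequence, although that use is untested here.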
EXPERIMENTAL: Thermodynamic Probabilities and Information of Weight Matrix
Thermodynamic probabilities are computed by assuming that the weight
matrix values represent energies, which may not be true. However, given
this assumption, we can compute the corresponding probabilities in a
Boltzmann distribution by first computing the partition function Q = sum_b
exp(weight_b) where b is one of the four bases and weight_b is the weight
at some position l for base b. Then the probability for base b is
exp(weight_b)/Q. The uncertainty and information are then computed in the
normal way. As a technical note, weight values are often large integers,
such as 400, which exceed the range that the exponential function can
handle. To avoid this, the absolute values of the weights at all positions
are taken, and the largest one is used to normalize the entire matrix to
the range -1 to +1.
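The computation described above can be sketched in Python. This is an illustration under stated assumptions, not the code in evd.p: it takes "the normal way" to mean the Shannon uncertainty H = -sum_b p_b log2(p_b) at each position and an information of 2 - H bits for four bases, summed over positions, with no small-sample correction. The function name is hypothetical, and the guard for an all-zero matrix is an added assumption. The sketch is not claimed to reproduce the values reported in this section.

```python
import math


def thermodynamic_information(weights):
    """Total information (bits) of a weight matrix treated as energies."""
    # Normalize the whole matrix to the range -1 to +1 by its largest
    # absolute weight, so that the exponentiation stays small.
    largest = max(abs(w) for row in weights for w in row)
    if largest == 0:
        largest = 1.0  # assumption: an all-zero matrix is left unchanged
    total = 0.0
    for row in weights:
        scaled = [w / largest for w in row]
        # Partition function Q = sum_b exp(weight_b) for this position.
        q = sum(math.exp(w) for w in scaled)
        probs = [math.exp(w) / q for w in scaled]
        # Uncertainty of the position, then information relative to 2 bits.
        h = -sum(p * math.log2(p) for p in probs)
        total += 2.0 - h
    return total


# Equal weights give equal probabilities, so no information is computed.
print(thermodynamic_information([[400, 400, 400, 400]]))
# A position that favors one base carries some information.
print(thermodynamic_information([[400, -400, -400, -400]]))
```

Because every weight is divided by the same matrix-wide maximum, the choice of normalization range directly controls how sharply the probabilities separate, so the result depends on that choice.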
2002 April 4: The thermodynamic computation was implemented. Notably, for
the standard evp.selection at 1000 generations, where Rf = 4.0 and Rs =
4.71035+/-0.29733, the computed value is 0.98 bits. I do not yet know
what this discrepancy means. However, if the normalization is set to be
between -0.5 and +0.5, then the computed value increases to 3.08 bits.
This may indicate that the computation is meaningless or that something
else has to be done ...
see also
The "Evolution of Biological Information" paper (with active hypertext
links in references): https://alum.mit.edu/www/toms/paper/ev/
Example parameter file: evdp
Program for the evolution of binding sites (this creates the all file):
ev.p
Program to display the genomes marked by the sites:
* individual information version: lister.p
* public version: listerx.p
These programs require the Delila book format rather than the
simple sequence in the genomes file. To convert, use the
makebk.p program:
cp genomes sequ # copy the genomes file to the sequ file
makebk < makebkp # make the delila book with makebkp and makebk.p
TO MAKE LOGOS:
Briefly, after running ev to evolve binding sites, use this evd.p
program to get the binding site sequences. Then copy the sites file to
the sequ file and use the makebk.p program (preferably in automatic
mode) to create a Delila book. Then use standard delila system programs
to create the logo, as described at
https://alum.mit.edu/www/toms/delila.html.
Assuming that you have all the necessary input files, in detail the
procedure is:
ev # evolve the creatures with ev.p
evd # make the display files with this program, evd.p
cp sites sequ # copy the sites file to the sequ file (a unix command)
makebk < makebkp # make the delila book with makebkp and makebk.p
encode # be sure to use the f mode in encodep with encode.p
rseq # compute information using the rseq.p program
dalvec # make the symvec using the dalvec.p program
makelogo # make the logo from the symvec with makelogo.p.
Descriptions of the required input files are given in the documentation of
each program and a general review is given in the paper
https://alum.mit.edu/www/toms/paper/oxyr/.
An example script that performs the steps above and also creates all the
necessary input files is run.ev.
A movie of binding site evolution created using these steps
is at https://alum.mit.edu/www/toms/paper/ev/movie/
author
Thomas Dana Schneider
bugs
none known (well, actually it's full of them... ;-)
Evd cannot handle non-random site placement in the display file because
the drawing mechanism was not designed to do so. The lister or listerx
program has to be used. A warning is provided to output.
technical notes
*)
(* end module describe.evd *)
{This manual page was created by makman 1.45}