By downloading this code you agree to the
Source Code Use License (PDF).
{version = 1.96; (* of siva.p 1999 Dec 13}
(* begin module describe.siva *)
(*
name
siva: site information variance
synopsis
siva(sorted: in, sivap: in, incu: out, curves: out, list: out,
output: out)
files
sorted: the output of the sites program that contains a sorted
list of sites for each experiment performed.
sivap: parameters to control the program.
first line: two integers, from and to coordinates over which
to do the calculations.
second line: repeats, the number of times to take passes through
the data removing subsets. This improves the statistics.
incu: the xyin input to xyplo, output of this program. Two columns:
first column is the number of sites used to find the information
second column is the amount of information in bits
The curves loop around along the axis, so they remain connected.
curves: another xyin file, for graphing the wiggling info curves
first column is the position across the site
second column is the information
The curves loop around along the axis, so they remain connected.
list: statistical picture of the result. Three columns:
first column is the number of sites used to find the information
second column is the average amount of information (corresponds
to the second column of incu, but is the average)
third column is the variance of the information (corresponds
to what your eye picks out as the thickness of the incu curves)
output: messages to the user
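
As a rough illustration of the sivap layout described above, the following
Python sketch reads the two parameter lines. The function name read_sivap and
the sample values are hypothetical. Siva itself is a Pascal program, and its
actual parsing may differ (for example in how it treats trailing text on a
line).

```python
def read_sivap(text):
    """Parse sivap-style text: line 1 holds from and to, line 2 holds repeats."""
    lines = text.splitlines()
    # First line: two integers, the coordinate range for the calculations.
    first = lines[0].split()
    from_pos, to_pos = int(first[0]), int(first[1])
    # Second line: the number of complete passes to take through the data.
    repeats = int(lines[1].split()[0])
    return from_pos, to_pos, repeats


# Hypothetical parameter values, for illustration only.
params = read_sivap("-20 20\n5\n")
print(params)
```
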
description
Siva calculates the variance of the information in a set of randomized sites
by eliminating each site in turn and keeping track of the increase in the
information content. The information content must increase, since with
fewer samples there must be less variation (this is the small sample bias
effect). The program allows one to graph the information content versus the
number of sites removed (incu). When this is done repeatedly, with
different orders of removing the sites, a thick band of curves is created.
The thickest part of this band shows the greatest possible amount of
variation that could be in the total set of sequences.
To be even-handed, the program removes the first sequence, then randomly
removes the others. This creates the first curve. Then the program removes
the second sequence and randomly removes the others for the second curve.
If there are n sequences, then n removal curves will be generated. This is
one complete repeat of the process. If you want, you can do this a number
of times to get better statistics, using the repeats parameter in sivap.
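
The removal procedure above can be sketched in Python as follows. This is a
minimal sketch and not the siva source. It assumes that the information at
each position is 2 - H bits for a four-letter DNA alphabet with no small
sample correction, that columns are plain 0-based string indices rather than
site coordinates, and that each curve starts with the full set and stops when
one site is left. The names information and removal_curves, the seed argument
and the example sites are all hypothetical.

```python
import math
import random


def information(sites, first_col, last_col):
    """Total information in bits over the columns first_col..last_col.

    Assumption: each column contributes 2 - H bits (four-letter DNA
    alphabet) with no small sample correction.
    """
    n = len(sites)
    total = 0.0
    for col in range(first_col, last_col + 1):
        counts = [0, 0, 0, 0]
        for site in sites:
            counts["ACGT".index(site[col])] += 1
        h = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
        total += 2.0 - h
    return total


def removal_curves(sites, first_col, last_col, repeats, seed=0):
    """Return one curve per starting site per repeat.

    Each curve is a list of (number of sites, bits) points.
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(repeats):
        for first in range(len(sites)):
            # Remove site `first` first, then the others in random order.
            others = [i for i in range(len(sites)) if i != first]
            rng.shuffle(others)
            order = [first] + others
            remaining = list(range(len(sites)))
            curve = [(len(remaining), information(sites, first_col, last_col))]
            for index in order[:-1]:  # stop when a single site is left
                remaining.remove(index)
                subset = [sites[i] for i in remaining]
                curve.append((len(subset), information(subset, first_col, last_col)))
            curves.append(curve)
    return curves


# Four hypothetical aligned sites, four positions each.
sites = ["ACGT", "ACGA", "ACCT", "TCGT"]
curves = removal_curves(sites, 0, 3, repeats=2)
print(len(curves))
```

With n = 4 sites and repeats = 2 the sketch produces 2n = 8 curves, one for
each choice of first site in each repeat.
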
The largest variation in the information content is surely greater than the
variation of the information content in all the sets of removals of sites.
When there are several experiments, the statistics are joined into one set.
Surely the variation of the combined experiments would be less than the
variations found for the individual experiments. If one experiment gives a
greater variation, that will increase the variation siva reports in list,
so the highest value in list is an upper limit on the variation.
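
The pooling of curves into the three columns of list can be sketched in
Python as follows. This is only an illustration under stated assumptions: the
curve values are made up, the function name summarize is hypothetical, the
population variance is used (the source does not say which form of the
variance siva computes), and rows are sorted by increasing number of sites.

```python
from statistics import mean, pvariance


def summarize(curves):
    """Group points by number of sites; return (sites, mean bits, variance) rows."""
    by_count = dict()
    for curve in curves:
        for n_sites, bits in curve:
            by_count.setdefault(n_sites, []).append(bits)
    # Assumption: population variance, rows in increasing order of site count.
    return [(n, mean(v), pvariance(v)) for n, v in sorted(by_count.items())]


# Hypothetical curves pooled from two experiments: (number of sites, bits).
curves = [
    [(3, 4.0), (2, 5.0), (1, 8.0)],
    [(3, 4.0), (2, 7.0), (1, 8.0)],
]
rows = summarize(curves)
for row in rows:
    print("%d %.3f %.3f" % row)

# The largest variance in the table is taken as the upper limit.
upper_limit = max(var for _, _, var in rows)
print(upper_limit)
```
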
documentation
@article{Schneider1989,
author = "T. D. Schneider
and G. D. Stormo",
title = "Excess Information at Bacteriophage {T7} Genomic Promoters
Detected by a Random Cloning Technique",
year = "1989",
journal = "Nucl. Acids Res.",
volume = "17",
pages = "659-674"}
see also
sites.p
author
Thomas Dana Schneider
bugs
none known
*)
(* end module describe.siva *)
{This manual page was created by makman 1.45}