Information Content of Individual Genetic Sequences
@article{Schneider-Ri1997,
author = "T. D. Schneider",
title = "{Information content of individual genetic sequences}",
journal = "J. Theor. Biol.",
volume = "189",
pages = "427--441",
pmid = "9446751",
note = "\htmladdnormallink
{https://doi.org/10.1006/jtbi.1997.0540}
{https://doi.org/10.1006/jtbi.1997.0540},
\htmladdnormallink
{https://alum.mit.edu/www/toms/papers/ri/}
{https://alum.mit.edu/www/toms/papers/ri/}",
year = "1997"}
Html version of the paper.
PDF version of the paper.
PostScript version of the paper.
Material in this paper is covered by US patent 5867402.
However, the programs are now available without needing a license.
This method of analyzing binding sites can be distinguished from other
methods by the following criteria.
-
Consensus sequences can be immediately rejected;
see the paper
Consensus Sequence Zen.
-
A variety of ad hoc methods are non-additive; these can be immediately
rejected.
Shannon chose his function to be additive, and it is the only
one that has this property.
-
Berg and von Hippel's (Stormo's) method
does not give results in bits.
If you flip a coin, according to this method,
you could get thousands of 'bits' of information.
Despite the claims,
it is not information theory and it does not properly connect
to thermodynamics because it confuses non-specific
binding states with specific binding states.
It also ignores the inequality in the Second Law of Thermodynamics.
-
Starting from Berg and von Hippel, if one
sets the genomic frequencies to equiprobable,
one gets
a method that is linearly proportional to Ri. Such methods can be
rejected as not having a natural zero coordinate.
The zero coordinate corresponds to the Second Law of Thermodynamics.
This method, as with the original Berg and von Hippel method,
therefore must use an arbitrary cutoff.
-
Neural network
training methods assume that places
where we do not know anything are not sites.
This has been demonstrated to be wrong.
An example is
the missed Fis sites in the tgt/sec promoter.
Once all these criteria are applied, no other method remains.
Additional discussion is in
Measuring Molecular Information.
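The additivity and natural-zero criteria above can be illustrated with a minimal Python sketch. It assumes the weight Riw(b, l) = 2 + log2 f(b, l) bits for base b at position l, omits the small-sample correction used in the paper, and uses four made-up aligned sites; the function names and the example data are hypothetical and are not taken from the Schneider Lab programs.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical aligned binding sites, invented for illustration only.
SITES = ["ACGT", "ACGA", "ACTT", "AGGT"]


def weight_matrix(sites):
    """Per-position weights Riw(b, l) = 2 + log2 f(b, l), in bits.

    Assumption: no small-sample correction is applied. A base never
    observed at a position gets a weight of minus infinity.
    """
    n = len(sites)
    matrix = []
    for l in range(len(sites[0])):
        counts = Counter(site[l] for site in sites)
        column = {}
        for b in BASES:
            f = counts[b] / n
            column[b] = 2.0 + math.log2(f) if f > 0 else float("-inf")
        matrix.append(column)
    return matrix


def individual_information(matrix, seq):
    """Ri of one sequence: the sum of its per-position weights (additive)."""
    return sum(column[b] for column, b in zip(matrix, seq))


def r_sequence(sites):
    """Average information of the whole set: sum over l of 2 - H(l) bits."""
    n = len(sites)
    total = 0.0
    for l in range(len(sites[0])):
        counts = Counter(site[l] for site in sites)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h
    return total


def is_site(matrix, seq):
    """Use the natural zero of Ri as the cutoff instead of an arbitrary one."""
    return individual_information(matrix, seq) > 0.0


if __name__ == "__main__":
    m = weight_matrix(SITES)
    for s in SITES:
        print(s, round(individual_information(m, s), 3), is_site(m, s))
    print("mean Ri  :", sum(individual_information(m, s) for s in SITES) / len(SITES))
    print("Rsequence:", r_sequence(SITES))
```

Under these assumptions, the mean of Ri over the sites used to build the matrix equals Rsequence for the same set, and the sign of Ri provides the cutoff without any tuned threshold.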
Also, see the paper
@article{Erill.ONeill2009,
author = "I. Erill
and M. C. O'Neill",
title = "{A reexamination of information theory-based methods for
DNA-binding site identification}",
journal = "BMC Bioinformatics",
volume = "10",
pages = "57",
pmid = "19210776",
year = "2009"}
.
See also the companion paper:
Sequence walkers: a graphical method to display how binding proteins
interact with DNA or RNA sequences
For more information see:
Individual Information Theory and Sequence Walkers
Schneider Lab
origin: 1997 December 23
updated: 2018 Jun 19