A Cancer Data Science Laboratory (CDSL) Zoom talk:
Monday, August 16, 2021, 3:00 PM EDT

https://nih.zoomgov.com/j/1614867690

Slides:
cutofftalk.pdf

Video (on Google Drive):
cutofftalk.mp4

Video (from this website):
cutofftalk.mp4

** Abstract **
Information theory is a branch of mathematics initiated by Claude
Shannon in a famous 1948 paper. Shannon developed several important theorems
about information, measured in bits. A bit is the choice between
two equally likely possibilities. In 1986 Tom Schneider showed how
to measure the information in a set of aligned DNA or RNA binding
sites and discovered that the number of bits could be predicted
from the size of the genome and number of sites in the genome. The
binding site information evolves to match the information needed
to find the sites in the genome. While trying to understand why
splicing donor and acceptor consensus sequences were identical,
Mike Stephens and Tom invented the sequence logo graphic. Sequence
logos represent a mathematical average over binding sites, so Tom
eventually realized how to construct a theory of individual binding
sites whose information averages to that shown in the logo. The
corresponding graphic is a sequence walker. Tom compared
binding site information to binding energy and found a constant
ratio. This led to the discovery that many biological systems are
70% efficient. To understand why, a deeper dive into
information theory was needed. In 1949 Shannon published a short,
beautiful paper showing how messaging systems can be modeled by
packed spheres in a high-dimensional space. This led to the channel
capacity equation and theorem (at rates below the capacity, errors
can be made as low as desired) and eventually to essentially all
modern communications. Tom found
that molecules also have a capacity with a similar capacity equation
and corresponding theorem. Following Felker (1954), one can then
derive a relationship between information and energy: the minimum
energy dissipation required to gain a bit is Emin = Kb T ln 2
(joules/bit). Tom realized that the same equation can be derived
from the Second Law of Thermodynamics, so the Emin equation is one
of many versions of the Second Law (Jaynes1988). This can be used
to understand the 70% efficiency of biological systems, but that's
a story for another time. For this talk, the Second Law can be
applied to the individual information measure. Sites above zero
bits are predicted to be bound and sites below zero bits are
predicted not to be bound. Unlike the arbitrary "scoring" systems
that people use, this provides a theoretical basis for a natural
binding site strength cutoff.
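
To make the zero-bit cutoff concrete, here is a minimal Python sketch of individual information. It follows the general form of the ri weight matrix, Riw(b,l) = 2 - (-log2 f(b,l)), summed over the bases of one sequence to give Ri, but it is only an illustration under our own assumptions: a pseudocount of 1 per base, no small-sample correction, a toy alignment we invented, and hypothetical function names (weight_matrix, individual_information, predicted_bound). It is not the lab's software.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0):
    """Return Riw(b, l) = 2 - (-log2 f(b, l)) in bits for every base b and position l.

    Assumptions for this sketch: every base count gets a pseudocount so that
    unseen bases stay finite, and the small-sample correction e(n) is omitted.
    """
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + len(BASES) * pseudocount
        row = {}
        for b in BASES:
            f = (column.count(b) + pseudocount) / total
            row[b] = 2.0 + math.log2(f)  # same as 2 - (-log2 f)
        matrix.append(row)
    return matrix


def individual_information(matrix, sequence):
    """Ri in bits: add up the weights of the bases present in one sequence."""
    return sum(row[base] for row, base in zip(matrix, sequence))


def predicted_bound(ri_bits):
    """Zero-bit cutoff: positive Ri is predicted bound (exactly 0 is treated as not bound here)."""
    return ri_bits > 0


# Toy aligned sites, invented purely for illustration.
sites = ["ACGT", "ACGT", "ACGA", "TCGT"]
matrix = weight_matrix(sites)
for candidate in ["ACGT", "GATC"]:
    ri = individual_information(matrix, candidate)
    verdict = "bound" if predicted_bound(ri) else "not bound"
    print(candidate, round(ri, 2), verdict)
# ACGT 4.64 bound
# GATC -4.0 not bound
```

The cutoff itself is a single comparison against zero bits; no threshold is tuned to the data, which is the contrast with ad hoc scoring that the abstract draws.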

** BIO **
Thomas D. Schneider was a winner in the 1974 Westinghouse Science
Talent Search for work on an artificial life form. He then received
the B.S. degree in biology from the Massachusetts Institute of
Technology in 1978 and the Ph.D. degree in Molecular Biology from the
University of Colorado, Boulder, CO, USA, in 1984, followed by
postdoctoral research in Boulder. He is now a Senior Investigator at
the National Institutes of Health, National Cancer Institute, Center
for Cancer Research, RNA Biology Laboratory in Frederick, MD, USA. He
is interested in discovering the underlying mathematics of biology and
his motto is "Living things are too beautiful for there not to be a
mathematics that describes them." His PhD thesis showed that the
information in genetic binding sites on DNA or RNA, measured in bits,
is just sufficient for them to be found in the genome
(Schneider.Ehrenfeucht1986), which he demonstrated using a computer
model that evolves binding sites in a few seconds (Schneider.ev2000).
After coming to NIH he invented the widely used sequence logo with
then high-school student R. Michael Stephens (Schneider.Stephens1990).
Logos show a graphical picture of the average information of binding
sites; another invention is the sequence walker which shows the
information in single binding sites (Schneider-walker1997,
Schneider-ri1997). He and his lab invented several patented
nanotechnologies: an ATP powered rotating molecular wheel, a molecular
computer, the self-contained molecular DNA or RNA Medusa(TM) sequencer
and a general molecular detector called a nanoprobe. His current work
is to understand the relationship between information and energy,
which is measured as the isothermal efficiency. His website is
permanently pointed to by
https://alum.mit.edu/www/toms
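
The finding that binding site information is just sufficient to locate the sites can be sketched numerically. Rsequence sums 2 - H(l) over the aligned positions, where H is Shannon's uncertainty; Rfrequency = log2(G/gamma) is the information needed to pick gamma sites out of G possible positions; the prediction is that the two are close. The Python below is a minimal illustration assuming no small-sample correction; the symbol names, function names, toy alignment, and the values G = 256 and gamma = 16 are our own illustrative choices, not data from the papers.

```python
import math
from collections import Counter


def uncertainty(column):
    """Shannon uncertainty H = -sum(p * log2 p), in bits, of one aligned column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def r_sequence(sites):
    """Rsequence in bits: sum over aligned positions of 2 - H(l).

    Assumption for this sketch: the small-sample correction e(n) is omitted,
    which overestimates Rsequence when there are few sites.
    """
    return sum(2.0 - uncertainty(column) for column in zip(*sites))


def r_frequency(genome_size, number_of_sites):
    """Rfrequency = log2(G / gamma): bits needed to pick gamma sites out of G positions."""
    return math.log2(genome_size / number_of_sites)


# Illustrative numbers only: 16 sites among 256 possible positions.
print(r_frequency(256, 16))  # 4.0
# Toy alignment, invented for illustration; real comparisons need many sites.
print(round(r_sequence(["ACGT", "ACGT", "ACGA", "TCGT"]), 2))  # 6.38
```

The two numbers above are unrelated toy inputs; the comparison is only meaningful for a real set of natural sites and its genome.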

Related papers:

- primer: Explains uncertainty H formula
- logo: sequence logos
- ev: evolution of binding sites
- edmm: second law Emin = Kb T ln 2 (see also ccmm:)
- ri: individual information theory
- walker: sequence walkers graphic
- flexrbs: flexible sequence walkers, ribosome binding sites
- flexprom: flexible sequence walkers, sigma 70 promoters
- sigma38: flexible sequence walkers, sigma 38 promoters
- emmgeo: builds the 70% efficiency result from the ccmm and edmm paper concepts; gives references 25 and 26 for rhodopsin cooling in picoseconds.
- zen: problems with consensus sequences
- shannonbiologist: How Claude Shannon inadvertently used a biological criterion to get the channel capacity!
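
As a worked example of the edmm bound Emin = Kb T ln 2, the short Python sketch below evaluates it at 25 degrees C (an illustrative temperature we chose, not one from the talk), using the exact 2019 SI values of the Boltzmann and Avogadro constants, and also scales it to a mole of bits in kcal/mol. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K (exact in the 2019 SI)
N_A = 6.02214076e23       # Avogadro constant, 1/mol (exact in the 2019 SI)
JOULES_PER_KCAL = 4184.0  # thermochemical kilocalorie


def e_min(temperature_k):
    """Minimum energy dissipated per bit gained, Kb T ln 2, in joules/bit."""
    return K_B * temperature_k * math.log(2)


def e_min_kcal_per_mol(temperature_k):
    """The same bound scaled to a mole of bits, in kcal/mol."""
    return e_min(temperature_k) * N_A / JOULES_PER_KCAL


T = 298.15  # 25 degrees C; an illustrative choice, not a value from the talk
print(f"{e_min(T):.3e} J/bit")                  # 2.853e-21 J/bit
print(f"{e_min_kcal_per_mol(T):.3f} kcal/mol")  # 0.411 kcal/mol
```

Because the bound is linear in T, doubling the absolute temperature doubles the minimum energy per bit.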

Schneider Lab

origin: 2021 Aug 15

updated: 2021 Aug 17