Information theory was introduced by Claude Shannon in 1948 to precisely characterize data flows in communication systems. The same mathematics can also be applied fruitfully to problems in molecular biology. We start with the problem of understanding how proteins interact with DNA at specific sequences called binding sites. Information theory lets us build an average picture of a set of binding sites, which can be displayed as a computer graphic called a sequence logo (https://alum.mit.edu/www/toms/glossary.html#sequence_logo).
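The "average picture" behind a sequence logo is the information at each position of a set of aligned sites: 2 bits (the maximum for four bases) minus the Shannon uncertainty of the bases observed there. The sketch below computes that per-position value for a few made-up sites. It assumes equiprobable background bases and deliberately omits the small-sample correction used for published logos, so it is an illustration of the idea rather than a reimplementation of the lab's software. The function name `position_information` and the example sites are hypothetical.

```python
import math
from collections import Counter


def position_information(sites):
    """Per-position information (bits) for aligned DNA binding sites.

    Sketch only: R(l) = 2 - H(l), where H(l) = -sum_b f(b,l) * log2 f(b,l).
    Assumption: no small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must be aligned to the same length")
    info = []
    for l in range(length):
        counts = Counter(s[l] for s in sites)   # base counts in column l
        n = sum(counts.values())
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - h)                    # conservation in bits
    return info


# Hypothetical aligned sites, for illustration only.
sites = ["TTGACA", "TTGACT", "TTTACA", "CTGACA"]
r = position_information(sites)
print([round(x, 3) for x in r])   # height of each logo column, in bits
print(round(sum(r), 3))           # total information of the site, in bits
```

Fully conserved columns reach the 2-bit maximum, while mixed columns score lower; summing the columns gives the total conservation of the site.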
Sequence logos show, in bits of information, how strongly each position of a binding site is conserved. They have been used to study a variety of genetic control systems. More recently, the same mathematics has been used to examine individual binding sites with another computer graphic called a sequence walker (https://alum.mit.edu/www/toms/glossary.html#sequence_walker). Sequence walkers are being used to predict whether changes in human genes are harmful mutations or neutral polymorphisms. This method may even make it possible to predict the degree of colon cancer.
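To score a single site instead of the average, each base can be given a weight of 2 + log2 f(b,l) bits, where f(b,l) is the frequency of base b at position l among known sites, and the weights of one candidate are summed. The sketch below builds such a weight table, slides it along a sequence the way a walker does, and compares a site before and after a one-base change. Assumptions: the pseudocount (a stand-in for a proper small-sample correction) and all names (`weight_matrix`, `individual_information`, `walk`) are hypothetical, and only the numbers behind the walker graphic are computed, not the graphic itself.

```python
import math
from collections import Counter

BASES = "ACGT"


def weight_matrix(sites, pseudocount=0.5):
    """Per-position weights 2 + log2 f(b,l), in bits.

    Assumption: a pseudocount keeps unseen bases finite (and negative).
    """
    matrix = []
    for l in range(len(sites[0])):
        counts = Counter(s[l] for s in sites)
        total = len(sites) + 4 * pseudocount
        matrix.append({b: 2.0 + math.log2((counts[b] + pseudocount) / total)
                       for b in BASES})
    return matrix


def individual_information(matrix, seq):
    """Sum the base weights of one candidate site."""
    if len(seq) != len(matrix):
        raise ValueError("site length must match the matrix width")
    return sum(column[base] for column, base in zip(matrix, seq))


def walk(matrix, sequence):
    """Slide along a sequence, returning (offset, bits) at every offset."""
    width = len(matrix)
    return [(i, individual_information(matrix, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]


# Hypothetical training sites and sequences, for illustration only.
m = weight_matrix(["TTGACA", "TTGACT", "TTTACA", "CTGACA"])
best = max(walk(m, "AAATTGACAAAA"), key=lambda p: p[1])
print("best offset:", best[0])

wild, variant = "TTGACA", "TTGTCA"   # one-base change at position 3
drop = individual_information(m, wild) - individual_information(m, variant)
print("bits lost by the change:", round(drop, 2))
```

A large drop in bits after a change suggests the site is weakened, while a small change suggests the variant may be neutral; how such scores are thresholded in practice is not modeled here.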
Information theory can also be used to understand the relationship between the binding energy dissipated when two molecules stick together and the amount of sequence conservation of the molecules, measured in bits. Using the Second Law of Thermodynamics, this relationship can be expressed as the efficiency of the molecular interaction. Surprisingly, many molecular systems, including genetic systems, visual pigments and motility proteins, have efficiencies near 70%. A purely geometrical explanation of this result shows that although biological systems are selected for the highest efficiency, their efficiency is limited to about 70% because having precisely distinguishable molecular states is more important.
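The Second Law sets a floor of k_B T ln 2 joules that must be dissipated for each bit gained, so one way to write the efficiency is the information gained divided by the dissipated energy expressed in bits: ε = R / (E / (k_B T ln 2)). The sketch below computes that ratio. The energy and bit values in the usage lines are hypothetical, chosen only to show the arithmetic, and how R and E are measured for a particular molecular system is outside its scope.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def min_energy_per_bit(temperature_k):
    """Second-Law lower bound k_B * T * ln 2: joules dissipated per bit."""
    return K_B * temperature_k * math.log(2)


def efficiency(bits, energy_j, temperature_k):
    """Bits gained divided by the dissipated energy expressed in bits."""
    if energy_j <= 0:
        raise ValueError("dissipated energy must be positive")
    return bits * min_energy_per_bit(temperature_k) / energy_j


# Hypothetical numbers for illustration only (not measured values).
T = 310.0                               # kelvin, roughly body temperature
energy = 10 * min_energy_per_bit(T)     # dissipation worth 10 bits
print(efficiency(7.0, energy, T))       # 7 bits gained out of 10 possible
```

An efficiency of 1 would mean every joule dissipated was converted into information; the systems described above sit near 0.7.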
Schneider Lab
origin: 2003 July 15
updated: 2006 July 07