The first thing needed is the rectification of names.
-- Confucius, Analects 13:3
Nature Chemical Biology 5, 521 - 525 (2009); The Rectification of Names
I might note also that some of the literature is confused and some of it is just plain wrong.
--- John Pierce, Symbols, Signals and Noise: The Nature and Process of Communication, 1961, preface, x.
Information theory and molecular biology touch on a huge number of topics. As a result there are many ways that one can get into intellectual trouble, and many of these are widely repeated in the literature. This page is devoted to listing the pitfalls that I have come across and needed to solve to create a consistent theory. Not everything in the literature is correct!
Using ambiguous or poor terminology
A sequence logo is not a consensus sequence. Despite the title of our original paper, "Sequence Logos: A New Way to Display Consensus Sequences", a sequence logo is not a consensus. The strictest consensus may be read from the top letters, the anti-consensus from the bottom letters, and any combination from the letters in between.
Confusing a model with reality: consensus sequences. The main example is confusing a consensus sequence (a model) with a binding site (a natural phenomenon). See The Consensus Sequence Hall of Fame and the paper Consensus Sequence Zen.
Using the popular meaning of the term 'information'. In physics it is well understood that the term 'force' has a precise technical definition, and this allows one to write Newton's famous equation F = M A (force is mass times acceleration). This is quite different from the popular use of the term force as in 'the force of my rhetoric'. It is clear that usually the phrase 'my rhetoric' is not meant to be an acceleration applied to the mass of your brain! Likewise, Shannon defined information in a precise technical sense. Beware of writers who slip from the technical definition into the popular one. You can tell when someone is being precise by seeing if they report the amount of information in bits. If what they are saying is not in bits or they don't indicate exactly how to compute the bits, then they are probably using the popular meaning.
Thinking that information (R) is the same as uncertainty (H). Because of noise, after a communication there is always some uncertainty remaining, H_{after}, and this must be subtracted from the uncertainty before the communication is sent, H_{before}; the information is the difference, R = H_{before} - H_{after}. This pitfall has led many authors to conclude that information is randomness. Examples:
'Information' is, of course, not the very opposite of randomness. Elitzur is using the word 'information' in the semantic sense, as a synonym for knowledge or meaning. Everyone knows that a random sequence, that is, one chosen without intersymbol restrictions or influence, carries the most information in the sense used by Shannon and in computer technology. ...to which I (Tom Schneider) responded:
Here you have made the mistake of setting H_{after} to zero. So a random sequence going into a receiver does not decrease the uncertainty of the receiver, and so no information is received. But a message does allow for the decrease. Even the same signal can be information to one receiver and noise to another, depending on the receiver!
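The point can be made concrete with a small sketch (the probabilities below are made up for illustration): information is the decrease in uncertainty, so a random input, which leaves H_{after} equal to H_{before}, delivers zero information.

```python
import math

def uncertainty(probs):
    """Shannon uncertainty H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Before the site is characterized, the four bases are equally likely.
H_before = uncertainty([0.25, 0.25, 0.25, 0.25])   # 2 bits

# After: one base dominates, but noise leaves some uncertainty.
H_after = uncertainty([0.90, 0.05, 0.03, 0.02])

# Information is the DECREASE in uncertainty, not the uncertainty itself.
R = H_before - H_after
print(f"R = {R:.2f} bits")

# A random sequence leaves the receiver as uncertain as before:
assert uncertainty([0.25] * 4) - H_before == 0     # R = 0, no information
```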
Treating uncertainty (H) and entropy (S) as identical OR treating them as completely unrelated. The former philosophy is clearly incorrect because uncertainty has units of bits per symbol while entropy has units of Joules per Kelvin. The latter philosophy is overcome by noting that the two can be related if one can correlate the probabilities of the microstates of the system under consideration with the probabilities of the symbols. See Theory of Molecular Machines. II. Energy Dissipation from Molecular Machines (J. Theor. Biol. 148, 125-137, 1991) for how to do this.
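As a sketch of that connection (assuming, per the paper, that symbol probabilities can be equated with microstate probabilities), one bit of uncertainty corresponds to k_B ln 2 Joules per Kelvin of entropy:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def bits_to_entropy(h_bits):
    """Convert an uncertainty in bits to thermodynamic entropy in J/K.
    Valid only when symbol probabilities match microstate probabilities."""
    return h_bits * K_B * math.log(2)

print(bits_to_entropy(1.0))  # ~9.57e-24 J/K per bit
```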
There's a story told by Tribus about the origin of this confusion:
What's in a name? In the case of Shannon's measure the naming was not accidental. In 1961 one of us (Tribus) asked Shannon what he had thought about when he had finally confirmed his famous measure. Shannon replied: "My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one knows what entropy really is, so in a debate you will always have the advantage.'"
-- M. Tribus and E. C. McIrvine, Energy and Information, Sci. Am., 225, 3, 179-188, September, 1971. https://doi.org/10.1038/scientificamerican0971-179
Using the term "Shannon entropy". Although Shannon himself did this, it was a mistake because it leads to thinking that the thermodynamic entropy is the same as the "Shannon entropy", which invites the two extreme classes of error described above: treating the two as identical or treating them as completely unrelated.
Ignoring the number zero. Molecular biologists often do not include zero in their counting systems. Surprisingly, zero was invented several thousand years ago. Physicists are shocked when I tell them that to some molecular biologists, counting goes like this: -3, -2, -1, +1, +2, +3 ...
"I'm a mathematician. There are counting numbers. We always start counting at zero."
--- Professor Carol Wood from Wesleyan University
Methods for how to treat zero coordinate systems are given in the glossary. If one creates a sequence logo without a zero, then one will be seriously bitten later on when one starts using sequence walkers, because the location of a sequence walker has to be specified and the natural place to do this is the zero base.
Thinking that bits are merely a measure of statistical non-randomness. One can compute the significance of a position in a binding site as the number of z scores above background (e.g., for splice junctions, see splice). However, this prevents one from thinking of the bits as a measure of sequence conservation, which is a different thing. Aside from small-sample effects, which can be corrected, the average number of bits in a binding site does not change as the sample size changes. By contrast, the error bars on a sequence logo show the significance of the conservation.
Maxwell's Demon. There is a huge literature on Maxwell's Demon and it is full of errors, too many to list here. The basic problem is that the people who write about the Demon are not molecular biologists; they are physicists and philosophers who do not know molecular biology, so they are not thinking in realistic molecular terms. If one treats the demon as a real physical being or device, then it is clear that there are natural analogues for the things he has to do, and none of these violate the Second Law of Thermodynamics. If one does not treat the demon as a real physical device, then one has violated known physics already and so violation of the Second Law is not surprising. See nano2 for a detailed debunking of the Demon.
The meaning of ΔS in the ΔG equation.
It is well known from thermodynamics that the free energy is:
ΔG = ΔH - T ΔS
Often people talk about ΔS in this equation as "the" entropy. This is misleading if not downright incorrect.
ΔS in the above equation is the entropy change of the system:
ΔS = ΔS_{system}
ΔH corresponds to the entropy change of the surroundings:
ΔH = ΔH_{system} = -T ΔS_{surroundings}
so the total free energy change is:
ΔG_{system} = ΔH_{system} -T ΔS_{system}
= -T ΔS_{surroundings} -T ΔS_{system}
= -T ΔS_{total}
This is why ΔG_{system} corresponds to the total entropy change, and it is why one can use the sign of ΔG_{system} to predict the direction of a chemical reaction.
So ΔH_{system} is misnamed since it is about what happens outside the system.
The pitfall is to think or say that ΔS_{system} is "the" entropy change. It's not, since it is only part of the total entropy change.
Reference:
@book{Darnell1986, author = "J. Darnell and H. Lodish and D. Baltimore", title = "Molecular Cell Biology", publisher = "Scientific American Books, Inc.", address = "N. Y.", year = "1986"} See pages 36-38.
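A quick numeric check of the bookkeeping above (the entropy values are invented for illustration; entropies in J/K, temperature in K):

```python
T = 298.0               # temperature, K
dS_system = -50.0       # entropy change of the system, J/K
dS_surroundings = 80.0  # entropy change of the surroundings, J/K

# dH_system tracks the surroundings: dH_system = -T * dS_surroundings
dH_system = -T * dS_surroundings
dG_system = dH_system - T * dS_system

# dG_system equals -T times the TOTAL entropy change:
dS_total = dS_system + dS_surroundings
assert abs(dG_system - (-T * dS_total)) < 1e-9

print(dG_system < 0)  # True: total entropy rises, so the reaction proceeds
```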
Entropy is not "disorder"; it is a measure of the dispersal of energy by Dr. Frank L. Lambert. An entropy increase MIGHT lead to disorder (by that I mean the scattering of matter) but then - as in living things - it might not!
How can we relate this idea to molecular information theory? 'Disorder' is the pattern (or mess) left behind after energy dissipates away. The measure Rsequence (the information content of a binding site) is a measure of the residue of energy dissipation left as a pattern in the DNA (by mutation and selection) when a protein binds to DNA. On the other hand, Rfrequency, the information required to find a set of binding sites, corresponds to the decrease of the positional entropy of the protein. To drive this decrease, the entropy of the surroundings must increase more, by dissipation of energy. After the energy has dissipated away, the protein is bound. So the protein bound at the specific genetic control points represents 'ordering'. This concept applies in general to the way life dissipates energy to survive.
Confusing the global ΔG with single-molecule binding. It is common to write the reaction
EcoRI + DNA <--> EcoRI.DNA
and talk about the global ΔG. However, this tells us nothing about how a single molecule binds to the DNA. A single molecule will find a binding site IRRESPECTIVE OF THE TOTAL CONCENTRATIONS OF OTHER MOLECULES IN THE SOLUTION. In other words, the global ΔG is NOT relevant to the problem of how EcoRI finds its binding site. This is widely misunderstood in the literature.
"Per cent identity" does not take into account that amino acids are almost always not equally probable and for this reason leads to illusions. Mutual entropy is the correct measure of "similarity".
--- H. P. Yockey. Information theory, evolution and the origin of life. Information Sciences, 141:219-225, 2002.
The term 'entropy' should not be used, but otherwise the statement is correct. This means that the basis of the widely used phylogenetic tree generating programs, such as Clustal, is unreliable. These programs begin by pairwise comparison of the percent identity of proteins.
We were wondering if you could point us in the right direction. We are doing some SELEX experiments using rounds of selection with random oligos to determine the DNA binding sites of a zinc finger protein. Do you know of any web sites that can easily determine a possible consensus from such sequences?
OK, I can help you with this, thanks for asking, but it is important to understand two things. First, if you create a consensus sequence after having done your beautiful SELEX, you will be throwing out most of your hard-earned data! See the Consensus Sequence Zen paper and also the entry on consensus sequences on this page.
You wouldn't say that you walked 5 today, would you? 5 what?
"The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey."
There are papers that have used the natural log and others that have used log base 2 for measuring information in biology, so it is important to indicate the units. If you don't state the units (as in "3 bits" or "19 bits per site") your paper will not be precise enough for someone to replicate the work. The word 'bits' is very important to have after every number.
The so-called 'relative entropy' (also known as the log likelihood or Kullback-Leibler divergence) has become popular for measuring the distributions of bases or amino acids. This computation has the form
∑_{i} P_{i} log_{2} (P_{i}/Q_{i})
where P_{i} is the frequency of amino acid i at a given position in a protein motif and Q_{i} is the frequency of amino acid i in proteins in general. The problem with this measure is that it gives results that are not consistent with information theory. For example, the maximum information required to identify one protein out of 20 is log_{2} 20 = 4.3 bits. Yet this statistical measure can give more than 5 bits. So it is incorrect to assign the units of bits to the results of this measure.
A simple example makes the situation clearer. Consider a coin, which can show either heads or tails. In basic information theory there are two possible states (ignoring the coin landing on an edge), and so there cannot be more than 1 bit of information stored in the coin. However, by appropriate choice of Q_{i} the relative entropy can give values greater than 1. Of course it is impossible for a coin to store more than 1 bit of information, so relative entropy does not give results in bits and should never be reported that way.
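Here is the coin computed explicitly (the numbers are arbitrary, chosen only so that P and Q are badly mismatched):

```python
import math

def relative_entropy(P, Q):
    """Kullback-Leibler divergence sum(P_i * log2(P_i / Q_i))."""
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

P = [0.99, 0.01]  # observed heads/tails frequencies
Q = [0.01, 0.99]  # a badly mismatched background distribution

D = relative_entropy(P, Q)
print(f"{D:.2f}")  # about 6.50, far above the coin's 1-bit maximum
```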
This does not mean that the log likelihood is not sometimes a useful statistical measure, just that if it is used the results are not compatible with Shannon information theory (except in the case when the Q_{i} are equally likely).
The relative entropy can be rewritten
(-∑_{i} P_{i} log_{2} Q_{i}) - (-∑_{i} P_{i} log_{2} P_{i})
The second half is recognizable as Shannon's "uncertainty" but the first half is not.
Energy is a state function. That is, it is determined by the current state of a system. If P and Q correspond to probabilities of two conditions (e.g. a protein bound to specific DNA sequences or non-specifically on DNA, as in a sequence logo) then it is clear that the first term (-∑ P_{i} log_{2} Q_{i}) is a mixture of the two states and therefore not a state function. So it is not reasonable to compare the relative entropy to energy.
If one insists on using relative entropy, then the computed values cannot be related to energy. Shannon's channel capacity and the rest of molecular information theory slip out of one's grasp, and one cannot study the efficiency of molecular machines, because 'relative entropy' is the wrong measure and the results do not fit the theory.
Probably the most convenient one is a Sequence Logo [752], in which the height of each letter indicates the degree of its conservation, whereas the total height of each column represents the statistical importance of the given position (Figure 3.2)
--- Eugene V. Koonin and Michael Y. Galperin, Sequence - Evolution - Function, page 67
Information is not energy! When a coin is set on a table, it can store 1 bit of information (in noisy thermal motion conditions it cannot stay on its edge). Before the coin is set on the table, when it could be set to either state, it has some potential and/or kinetic energy. Setting the coin on the table in a stable state requires that this energy be removed from the coin, and ultimately it will be dissipated as heat into the environment. If the coin initially has a greater height or a higher velocity, then more energy will have to be dissipated to stabilize it on one face or the other. However, in either the low or the high energy case the coin can store only 1 bit of information. So there is an inequality relationship between energy and information. It turns out that this relationship can be expressed as a version of the second law of thermodynamics. See Theory of Molecular Machines. II. Energy Dissipation from Molecular Machines for two derivations of the exact relationship. The bottom line is that energy is not the same as information, since there is a minimum energy dissipation required per bit but the actual energy dissipation can be larger.
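The minimum in that inequality can be sketched numerically: the second-law bound is k_B T ln 2 Joules dissipated per bit gained (the temperature value below is chosen only for illustration):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def min_joules_per_bit(T):
    """Lower bound on energy dissipated per bit gained: kB * T * ln(2)."""
    return K_B * T * math.log(2)

E_min = min_joules_per_bit(298.0)  # ~2.85e-21 J/bit at room temperature

# A real machine may dissipate far more and still gain only the same bit:
E_actual = 10 * E_min
print(E_actual >= E_min)  # True: the relation is an inequality, not equality
```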
This opens the important question of what the actual relationship is between energy and information in molecular systems. This was solved and published in 2010, see: 70% efficiency of bistate molecular machines explained by information theory, high dimensional geometry and evolutionary convergence.
Examples of this pitfall:
@article{Stormo2000, author = "G. D. Stormo", title = "{DNA binding sites: representation and discovery}", journal = "Bioinformatics", volume = "16", pages = "16--23", pmid = "10812473", year = "2000"}
@article{Wasserman.Sandelin2004, author = "W. W. Wasserman and A. Sandelin", title = "{Applied bioinformatics for the identification of regulatory elements}", journal = "Nat Rev Genet", volume = "5", pages = "276--287", pmid = "15131651", year = "2004"}
We pause to assure the reader that there is nothing mysterious about n-dimensional space. A point in n-dimensional space R^{n} is simply a string of n real numbers ...
-- J. H. Conway and N. J. A. Sloane, "Sphere Packings, Lattices and Groups" Springer-Verlag, third edition, New York, ISBN 0-387-98585-9, 1998, page 3.
http://neilsloane.com/doc/splag.html
Modeling or depicting free energy surfaces as two-dimensional. Such surfaces are high dimensional, and this has severe effects on the shape of the path. If the individual valleys are Gaussian, the final shape is a sphere. See Theory of Molecular Machines. I. Channel Capacity of Molecular Machines.
Here is a simple case that blows away the silly ideas about 'erasure' that the physicists talk about.
Consider a coin. Set it on a table. It can store 1 bit of information.
Now to set it there from a point above the table, you have to ALLOW the potential and kinetic energy of the coin to dissipate out as noise/heat into the rest of the universe. If you don't dissipate the potential energy, the coin is not yet on the table. If you don't dissipate the kinetic energy, the coin will bounce around and so it can't store the information yet.
Setting a coin to Heads or Tails by placing it on a table dissipates energy. For some reason this simple idea escapes physicists who use the term 'erasure'.
Now what is it to 'erase'? Well, call heads 1 and tails 0. Then when we set a bunch of coins to all 0 (by setting them all to have tails pointing up), we have 'erased' whatever might have been stored in them before. But that costs just as much dissipation as storing a pattern.
'Erasure' is no different than storing a pattern of bits; it costs just as much.
For this reason I dump the term 'erasure'.
Despite a lot of silly language to the contrary, you cannot capture the dissipated energy. The entropy of the coin goes down while the entropy of the universe goes up, but more so.
FURTHERMORE, if you start the coin 1m above the table, you will dissipate a certain amount of energy when you store the 1 bit. If you start at 2m above the table, you will dissipate twice as much energy. However, in both cases the information stored in the final state of the coin on the table is the same: 1 bit. This demonstrates clearly that information is not the same thing as energy as some people seem to think. Furthermore, the process in which the coin starts higher is less efficient than when the coin starts lower. This is the key to understanding the isothermal efficiency of molecules which is discussed in the paper emmgeo: 70% efficiency of bistate molecular machines explained by information theory, high dimensional geometry and evolutionary convergence.
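A sketch of that arithmetic (the coin mass is a made-up value):

```python
g = 9.81       # gravitational acceleration, m/s^2
mass = 0.005   # kg, a hypothetical coin

def dissipated(height_m):
    """Potential energy that must be dissipated to settle the coin."""
    return mass * g * height_m

E1, E2 = dissipated(1.0), dissipated(2.0)
bits_stored = 1  # the same single bit is stored in either case

print(E2 / E1)   # 2.0: twice the dissipation from 2 m, same 1 bit stored
print(E1 / E2)   # 0.5: the higher drop is half as efficient
```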
Schneider Lab
origin: 2002 March 13
updated: version = 1.66 of pitfalls.html 2018 Jan 30