# Information Is Not Entropy, Information Is Not Uncertainty!

Thomas D. Schneider

There are many, many statements in the literature which say that information is the same as entropy. The reason for this was told by Tribus. The story goes that Shannon didn't know what to call his measure, so he asked von Neumann, who said "You should call it entropy ... [since] ... no one knows what entropy really is, so in a debate you will always have the advantage" (Tribus and McIrvine 1971).

Shannon called his measure not only the entropy but also the "uncertainty". I prefer this term because it does not have physical units associated with it. If you correlate information with uncertainty, then you get into deep trouble. Suppose that:

information ~ uncertainty

but since they have almost identical formulae:

uncertainty ~ physical entropy

so

information ~ physical entropy

BUT as a system gets more random, its entropy goes up:

randomness ~ physical entropy

so

information ~ physical randomness

How could that be? Information is the very opposite of randomness!

The confusion comes from neglecting to do a subtraction:

Information is always a measure of the decrease of uncertainty at a receiver (or molecular machine).

If you use this definition, it will clarify all the confusion in the literature.
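A minimal Python sketch of this definition, assuming the receiver's state before and after reception can be described by discrete probability distributions. The function names and the four-symbol receiver are hypothetical, chosen only for illustration:

```python
import math

def uncertainty(probs):
    """Shannon uncertainty H = -sum p log2 p, in bits (p == 0 terms contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information(before, after):
    """Information R = H_before - H_after: the decrease in the receiver's uncertainty."""
    return uncertainty(before) - uncertainty(after)

# Hypothetical receiver that must pick one of 4 equally likely symbols.
before = [0.25, 0.25, 0.25, 0.25]

# Noiseless reception: afterwards the receiver is certain which symbol arrived.
clean = information(before, [1.0, 0.0, 0.0, 0.0])

# Pure noise: afterwards the receiver is exactly as uncertain as before.
noisy = information(before, [0.25, 0.25, 0.25, 0.25])

print(clean, noisy)  # → 2.0 0.0
```

The uniform ("random") distribution appears in both cases. Before reception it is what gives the receiver room to learn something; after reception it is noise, and it cancels the information. Looking at only one of the two terms is what produces the information ~ randomness paradox above.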

Note: Shannon understood this distinction and called the uncertainty which is subtracted the 'equivocation'. Shannon (1948) said on page 20:

$R = H(x) - H_y(x)$

"The conditional entropy $H_y(x)$ will, for convenience, be called the equivocation. It measures the average ambiguity of the received signal."

The mistake is almost always made by people who are not actually trying to use the measure. As a practical example, consider sequence logos. Further discussion of this topic is in the bionet.info-theory FAQ (https://alum.mit.edu/www/toms/bionet.info-theory.faq.html) under the topic "I'm Confused: How Could Information Equal Entropy?"
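A rough sketch of the same subtraction as used for sequence logos, assuming DNA (4 bases, so 2 bits of uncertainty before looking at a position) and a small hypothetical set of aligned sites. Real sequence logo software also applies a small-sample correction, which is left out here:

```python
import math
from collections import Counter

def column_information(column, alphabet="ACGT"):
    """R = H_before - H_after for one aligned position, in bits.

    H_before = log2(4) = 2 bits (any base possible before looking).
    H_after  = -sum f(b) log2 f(b) over the observed base frequencies.
    No small-sample correction is applied (a simplifying assumption).
    """
    counts = Counter(column)
    n = len(column)
    h_after = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(len(alphabet)) - h_after

# Hypothetical aligned binding sites, made up for illustration.
sites = [
    "TATAAT",
    "TATGAT",
    "TACAAT",
    "TATACT",
]

per_position = [column_information(col) for col in zip(*sites)]
print([round(r, 3) for r in per_position])
print("R_sequence =", round(sum(per_position), 3))
```

The fully conserved positions score the maximum of 2 bits. The bare uncertainty $-\sum f \log_2 f$ would give those same positions 0, which is exactly the inversion in several of the examples listed below.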

For a more mathematical approach, see the Information Theory Primer.

Some questions and answers might make these issues clearer.

## References: Examples of the error
• @article{Machta1999,
author = "J. Machta",
title = "{Entropy, Information, and Computation}",
journal = "Am. J. Phys.",
volume = "67",
pages = "1074-1077",
year = "1999"}

"The results of random processes usually have high information content". "Randomness and information are formally the same thing." He also shows an equation relating "Shannon information" to the uncertainty function. This is a perfect example of total confusion on this issue!

• @article{Padian2002,
title = "{EVOLUTION AND CREATIONISM:
Waiting for the Watchmaker}",
journal = "Science",
volume = "295",
pages = "2373-2374",
year = "2002"}

"In information theory, the term can imply increasing predictability or increasing entropy, depending on the context." Kevin Padian, who wrote the review, reports that the error came from the book he was reviewing:
Intelligent Design Creationism and Its Critics: Philosophical, Theological, and Scientific Perspectives
Robert T. Pennock, Ed.
MIT Press, Cambridge, MA, 2001. 825 pp. $110, ISBN 0-262-66124-1.

• @article{Allahverdyan.Nieuwenhuizen2001,
author = "A. E. Allahverdyan and T. H. Nieuwenhuizen",
title = "{Breakdown of the Landauer bound for information erasure in the quantum regime}",
journal = "Phys. Rev. E",
volume = "64",
pages = "056117-1--056117-9",
year = "2001"}

This is an example of the typical physicists' muddle about "erasure", in which they set a device to one of several states and call this a "loss of information". But setting a device to one state (no matter which one) decreases the entropy and increases the information. The main mistake that the physicists make is not having any real working examples; it's entirely theoretical for them. (These people believe that they can beat the Second Law. I would simply ask them to build the perpetual motion machine and run the world power grid from it before making such a claim.)

• @article{Crow2001,
author = "J. F. Crow",
title = "{Shannon's brief foray into genetics}",
journal = "Genetics",
volume = "159",
pages = "915--917",
year = "2001"}

He confounded information with uncertainty, and forgot the minus sign in the $H = -\sum p \log p$ formula. He also confounded information with entropy. Finally, he claimed that "a noisy system can send an undistorted signal provided that the appropriate error corrections or redundancy are built in". This is incorrect, since there will always be some error: Shannon's channel capacity theorem shows that the error can be made as low as desired, but not zero as this author claims.

• "Entropy measures lack of information; it also measures information. These two conceptions are complementary." Jean-Bernard Brissaud, "The meanings of entropy", Entropy 2005, 7(1), 68-96.

• 2006 Oct 19: Martin Van Staveren pointed out that at the top of page 22 of Shannon's 1948 paper, it seems to be suggested that part of the received information is due to noise.
This is obviously a slip of the pen by Shannon: he is merely trying to explain, in words, that the information rate R is the initial uncertainty minus the uncertainty due to the noise, but he calls H "information" instead of "entropy". Van Staveren further pointed out that much of the confusion may have come from Weaver; see https://web-beta.archive.org/web/20141211160317/http://pages.uoregon.edu/felsing/virtual_asia/info.html. This is part of the introduction that Weaver wrote for "The Mathematical Theory of Communication". Some people even refer to "Shannon-Weaver theory" because of this introduction. In section 2.5 of this introduction, noise generates "spurious" or "undesirable" information, whatever that may mean. The section also introduces the esoteric notion of "meaningless information", contrary to what Shannon himself says in the body of the text. I think that Weaver's arrogance in thinking that he had to "explain" Shannon has done a big disservice to information theory, which really is only probability theory.

• 2009 Jan 21: 6.050J Information and Entropy (Spring 2008), an MIT OpenCourseWare course. In the Syllabus there is a "Text" PDF. The last sentence of the second paragraph of the Preface reads: "Only recently has entropy been widely accepted as a form of information." This is, of course, backwards. Also, the statement "Second Law states that entropy never decreases as time goes on" is wrong, since the entropy of a system can decrease if heat leaves the system; that's how snowflakes form! At least they admit: "In fact, we may not yet have it right."!!

• @article{Sheth.Sachidanandam2006,
author = "N. Sheth and X. Roca and M. L. Hastings and T. Roeder and A. R. Krainer and R. Sachidanandam",
title = "{Comprehensive splice-site analysis using comparative genomics}",
journal = "Nucleic Acids Res.",
volume = "34",
pages = "3955--3967",
pmid = "16914448",
pmcid = "PMC1557818",
year = "2006"}

"the information content is $-\sum_i p_i \lg p_i$ ..."
They knew about this but didn't think through the fact that their "information" measure goes to zero at the most conserved positions, so every graph shows more "information" outside the binding sites than inside them!

• 2011 Jul 14: @book{Gleick2011,
author = "James Gleick",
title = "The Information: A History, a Theory, a Flood",
publisher = "Pantheon",
note = "NPR story http://www.npr.org/2011/03/08/134366651/bit-by-bit-the-information-reveals-everything",
comment = "Gleick is trapped in pitfalls: H = information and common language use of 'information'",
year = "2011"}

Review of Gleick2011

• 2011 Nov 04: @article{Sarkar1996,
author = "S. Sarkar",
title = "{``Decoding Coding'': Information and DNA}",
journal = "BioScience",
volume = "46",
pages = "857--864",
year = "1996"}

On page 862:

"Recently, Thomas Schneider and his collaborators (starting with Schneider et al. 1986) have made promising use of information theory to find the most functionally relevant parts of long DNA sequences when these are all that are available. The basic idea, which goes back to Kimura (1961), is that functional portions of sequences are most likely to be conserved through natural selection. These will therefore have low information content (in Shannon's sense)."

That's incorrect: he made the standard error of confusing information with uncertainty, and did not grasp the essence of the original 1986 paper! He goes on in total confusion:

"Whether Schneider's methods will live up to their initial promise remains to be seen."

They did; see the rest of this web site, and note the widespread use of sequence logos, which do NOT make the error.

"Nevertheless, for conceptual reasons alone, this notion of 'information' (i.e., Shannon information) is irrelevant in the present context. According to this notion, for DNA sequences the 'information' content is a property of a set of sequences: the more varied a set, the greater the 'information' content at individual positions of the DNA sequence."
No, that's backwards, and confused because of the error.

"But 'information' in this scheme is not actually what an individual DNA sequence contains, that is, not what would be decoded by the cellular organelles. Worse, what Kimura's (1961) argument suggests is that what should be regarded as biologically informative -- functional sequences -- are exactly those that have low 'information' content."

He's completely lost. Note: I believe that he got it right later in his book Doubting Darwin? Creationist Designs on Evolution, Sahotra Sarkar, Doubleday & Company, Inc., Garden City, New York, 2007. (google: doubting darwin creationist designs on evolution sahotra sarkar)

• 2018 Mar 11: A confusion with harmonies. The SciShow video https://www.youtube.com/watch?v=HicAnFGE9bA&t=1m44s cites https://www.ncbi.nlm.nih.gov/pubmed/21981535. Is there more information for a regular signal? No; SciShow and the authors made a fundamental error: confusing uncertainty with information. Suppose that there is a constant amount of noise received by a person. The uncertainty of the person before receiving the signal, $H_{before} = -\sum_{i=1}^{M} P_i \log_2 P_i$, is higher for the irregular signal. Subtracting the uncertainty caused by the noise, $H_{after}$, gives the information received, $R = H_{before} - H_{after}$. Because it has higher uncertainty before, the irregular signal would provide more "information". The authors did not realize that they needed to subtract the uncertainty remaining after the signal has been received (the noise) to get the information. This is because they called the uncertainty by the confused term "information-entropy". Perhaps the non-harmonic sound is simply harder to process, and that is what makes it "annoying".
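The subtraction in this last example can be sketched with made-up numbers. The distributions below and the constant noise level `H_after` are hypothetical and do not come from the cited study; the point is only that, with the noise held fixed, the signal that starts with more uncertainty delivers more information:

```python
import math

def uncertainty(probs):
    """H = -sum_i P_i log2 P_i, in bits (zero-probability terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical listener expectations before hearing the next sound.
regular = [0.9, 0.1]                  # predictable (harmonic) signal
irregular = [0.25, 0.25, 0.25, 0.25]  # unpredictable (non-harmonic) signal

# Assumed constant noise: the same residual uncertainty after either signal.
H_after = 0.2

R_regular = uncertainty(regular) - H_after
R_irregular = uncertainty(irregular) - H_after

print(f"regular:   H_before = {uncertainty(regular):.3f}, R = {R_regular:.3f} bits")
print(f"irregular: H_before = {uncertainty(irregular):.3f}, R = {R_irregular:.3f} bits")
```

Calling $H_{before}$ alone "information-entropy" skips the second term; once $H_{after}$ is subtracted, the comparison between the two signals is between received information, not raw uncertainty.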