There are many many statements in the literature which say that information is the same as entropy. The reason for this was told by Tribus. The story goes that Shannon didn't know what to call his measure so he asked von Neumann, who said `You should call it entropy ... [since] ... no one knows what entropy really is, so in a debate you will always have the advantage' (Tribus.McIrvine1971).
Shannon called his measure not only the entropy but also the "uncertainty". I prefer this term because it does not have physical units associated with it. If you equate information with uncertainty, then you get into deep trouble. Suppose that:

    information is the same as uncertainty,

but, since they have almost identical formulae:

    uncertainty is the same as physical entropy,

so

    information is the same as physical entropy.

But a highly random system has high physical entropy, so by this chain of reasoning it would also have high information. How could that be? Information is the very opposite of randomness! The confusion comes from neglecting to do a subtraction: information is always a measure of the decrease in uncertainty at a receiver, $R = H_{before} - H_{after}$ (the uncertainty before reception minus the uncertainty, due to noise, that remains afterwards). If you use this definition, it will clarify all the confusion in the literature.
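A small worked example may make the subtraction concrete (the numbers here are illustrative, not taken from any of the papers discussed below). Suppose a receiver must identify one of $M = 4$ equally likely symbols:

    $H_{before} = \log_2 4 = 2$ bits
    noiseless channel: $H_{after} = 0$, so $R = H_{before} - H_{after} = 2$ bits
    noisy channel leaving 2 symbols equally likely: $H_{after} = 1$ bit, so $R = 2 - 1 = 1$ bit

A random source by itself merely has a large $H_{before}$; until that uncertainty is reduced at a receiver, no information has been gained.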
Note: Shannon understood this distinction and called the uncertainty which is subtracted the 'equivocation'. Shannon (1948) defines it on page 20, where the rate of actual transmission is $R = H(x) - H_y(x)$ and the subtracted conditional entropy $H_y(x)$ is named the equivocation, which measures the average ambiguity of the received signal.
The mistake is almost always made by people who are not actually trying to use the measure. As a practical example, consider sequence logos. Further discussion of this topic is in the bionet.info-theory FAQ (https://alum.mit.edu/www/toms/bionet.info-theory.faq.html) under the topic "I'm Confused: How Could Information Equal Entropy?"
For a more mathematical approach, see the Information Theory Primer.
Some questions and answers may make these issues clearer.
@article{Tribus.McIrvine1971, author = "M. Tribus and E. C. McIrvine", title = "Energy and Information", journal = "Sci. Am.", volume = "225", number = "3", pages = "179--188", month = "September", note = "(Note: the table of contents in this volume incorrectly lists this as volume 224.) http://dx.doi.org/10.1038/scientificamerican0971-179, http://www.nature.com/scientificamerican/journal/v225/n3/pdf/scientificamerican0971-179.pdf", year = "1971"}
@article{Machta1999, author = "J. Machta", title = "{Entropy, Information, and Computation}", journal = "Am. J. Phys.", volume = "67", pages = "1074--1077", year = "1999"}
"The results of random processes usually have high information content". "Randomness and information are formally the same thing." He also shows an equation relating "Shannon information" to the uncertainty function. This is a perfect example of total confusion on this issue!
@article{Padian2002, author = "K. Padian", title = "{EVOLUTION AND CREATIONISM: Waiting for the Watchmaker}", journal = "Science", volume = "295", pages = "2373--2374", year = "2002"}
"In information theory, the term can imply increasing predictability or increasing entropy, depending on the context." Kevin Padian, who wrote the review, reports that the error came from the book he was reviewing:
Intelligent Design Creationism and Its Critics Philosophical, Theological, and Scientific Perspectives Robert T. Pennock, Ed. MIT Press, Cambridge, MA, 2001. 825 pp. $110, ISBN 0-262-66124-1.
@article{Allahverdyan.Nieuwenhuizen2001, author = "A. E. Allahverdyan and T. H. Nieuwenhuizen", title = "{Breakdown of the Landauer bound for information erasure in the quantum regime}", journal = "Phys. Rev. E", volume = "64", pages = "056117-1--056117-9", year = "2001"}
This is an example of the typical physicists' muddle about "erasure" in which they set the state of a device to one of several states and call this a "loss of information". But setting a device to one state (no matter what it is) decreases the entropy and increases the information. The main mistake that the physicists make is not having any real working examples. It's entirely theoretical for them. (These people believe that they can beat the Second Law. I would simply ask them to build the perpetual motion machine and run the world power grid from it before making such a claim.)
@article{Crow2001, author = "J. F. Crow", title = "{Shannon's brief foray into genetics}", journal = "Genetics", volume = "159", pages = "915--917", year = "2001"}
He confounds information with uncertainty and omits the minus sign from the $\sum p \log p$ formula. He also confounds information with entropy. Finally, he claims that "a noisy system can send an undistorted signal provided that the appropriate error corrections or redundancy are built in". This is incorrect: there will always be some error, but Shannon's channel capacity theorem shows that the error rate can be made as low as desired (though not zero, as this author claims).
"Entropy measures lack of information; it also measures information. These two conceptions are complementary. " The meanings of entropy, Jean-Bernard Brissaud, Entropy 2005, 7[1], 68-96.
2006 Oct 19: Martin Van Staveren pointed out that
at the top of page 22 of Shannon's 1948 paper it seems to be suggested that part of the received information is due to noise. This is obviously a slip of Shannon's pen, as he merely tries to explain, in words, that the information rate R is the initial uncertainty minus the uncertainty due to the noise; but he calls H "information" instead of "entropy". He further pointed out that much of the confusion may have come from Weaver:
see this: https://web-beta.archive.org/web/20141211160317/http://pages.uoregon.edu/felsing/virtual_asia/info.html
This is part of the intro that Weaver wrote for "The Mathematical Theory of Communication". Some people even refer to "Shannon-Weaver theory" because of this intro.
In section 2.5 of this intro, noise is said to generate "spurious", or "undesirable", information, whatever that may mean. The section also introduces the esoteric notion of "meaningless information", contrary to what Shannon himself says in the body of the text. I think that Weaver's arrogance in thinking that he had to "explain" Shannon has done a big disservice to information theory, which really is only probability theory.
2009 Jan 21: 6.050J Information and Entropy (Spring 2008), an MIT OpenCourseWare course. In the Syllabus there is a "Text" PDF. The last sentence of the second paragraph of the Preface reads: "Only recently has entropy been widely accepted as a form of information." This is, of course, backwards.
Also, the statement "Second Law states that entropy never decreases as time goes on" is wrong, since the entropy of a system can decrease if heat leaves the system - that's how snowflakes form!
At least they admit: "In fact, we may not yet have it right."!!
@article{Sheth.Sachidanandam2006, author = "N. Sheth and X. Roca and M. L. Hastings and T. Roeder and A. R. Krainer and R. Sachidanandam", title = "{Comprehensive splice-site analysis using comparative genomics}", journal = "Nucleic Acids Res.", volume = "34", pages = "3955--3967", pmid = "16914448", pmcid = "PMC1557818", year = "2006"}
"the information content is $-\sum_i p_i \lg p_i$ ..." They knew about this but didn't think through that their "information" measure goes to zero for the most conserved positions, so every graph shows more "information" outside the binding sites!
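The contrast can be seen numerically with a minimal sketch (in Python) comparing the uncertainty $-\sum_i p_i \log_2 p_i$ with the information measure used for sequence logos, $R_{sequence} = 2 - H$ bits for DNA; the small-sample correction is omitted for simplicity, and the base frequencies below are made up for illustration:

    import math

    def uncertainty(freqs):
        # H = sum of -p * log2(p) in bits; zero frequencies contribute nothing
        return sum(-p * math.log2(p) for p in freqs if p > 0)

    # Hypothetical base frequencies (A, C, G, T) at two positions:
    conserved = [1.00, 0.00, 0.00, 0.00]   # every sequence has the same base
    variable  = [0.25, 0.25, 0.25, 0.25]   # all four bases equally likely

    for name, freqs in [("conserved", conserved), ("variable", variable)]:
        H = uncertainty(freqs)
        R = 2.0 - H   # information in bits; small-sample correction omitted
        print(f"{name}: uncertainty H = {H:.2f} bits, Rsequence = {R:.2f} bits")

    # conserved: uncertainty H = 0.00 bits, Rsequence = 2.00 bits
    # variable:  uncertainty H = 2.00 bits, Rsequence = 0.00 bits

A fully conserved position has zero uncertainty but the maximum two bits of information; their measure reports exactly the opposite.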
2011 Jul 14:
@book{Gleick2011, author = "James Gleick", title = "The Information: A History, a Theory, a Flood", publisher = "Pantheon", note = "NPR story http://www.npr.org/2011/03/08/134366651/bit-by-bit-the-information-reveals-everything", comment = "Gleick is trapped in pitfalls: H = information and common language use of 'information'", year = "2011"}
Review of Gleick2011
2011 Nov 04:
@article{Sarkar1996, author = "S. Sarkar", title = "{Decoding ``Coding'': Information and DNA}", journal = "BioScience", volume = "46", pages = "857--864", year = "1996"}
On page 862:

    "Recently, Thomas Schneider and his collaborators (starting with Schneider et al. 1986) have made promising use of information theory to find the most functionally relevant parts of long DNA sequences when these are all that are available. The basic idea, which goes back to Kimura (1961), is that functional portions of sequences are most likely to be conserved through natural selection. These will therefore have low information content (in Shannon's sense)."

That's incorrect: he made the standard error of confusing information with uncertainty and did not grasp the essence of the original 1986 paper! He goes on in total confusion:

    "Whether Schneider's methods will live up to their initial promise remains to be seen."

They did: see the rest of this web site and note the widespread use of sequence logos, which do NOT make the error.

    "Nevertheless, for conceptual reasons alone, this notion of "information" (i.e., Shannon information) is irrelevant in the present context. According to this notion, for DNA sequences the "information" content is a property of a set of sequences: the more varied a set, the greater the "information" content at individual positions of the DNA sequence."

No, that's backwards and confused because of the error.

    "But "information" in this scheme is not actually what an individual DNA sequence contains, that is, not what would be decoded by the cellular organelles. Worse, what Kimura's (1961) argument suggests is that what should be regarded as biologically informative-functional sequences-are exactly those that have low "information" content."

He's completely lost.
Note: I believe that he got it right later in his book Doubting Darwin? Creationist Designs on Evolution, Sahotra Sarkar, Doubleday & Company, Inc., Garden City, New York, 2007. (google: doubting darwin creationist designs on evolution sahotra sarkar)
2018 Mar 11: A confusion with harmonies.
https://www.youtube.com/watch?v=HicAnFGE9bA&t=1m44s
cites
https://www.ncbi.nlm.nih.gov/pubmed/21981535
Is there more information for a regular signal? No, SciShow and the authors made a fundamental error: confusing uncertainty with information. Let's say that there is a constant amount of noise received by a person. The uncertainty of the person before receiving the signal, $H_{before} = -\sum_{i=1}^{M} P_i \log_2 P_i$, is higher for the irregular signal. Subtracting the uncertainty caused by the noise ($H_{after}$) gives the information received, $R = H_{before} - H_{after}$. Because it has higher uncertainty before, the irregular signal would provide more "information". The authors did not realize that they needed to subtract the uncertainty remaining after the signal has been received (the noise) to get the information; this is because they called the uncertainty by the confused term "information-entropy". Perhaps the non-harmonic sound is harder to process and that is "annoying".
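The arithmetic can be sketched with made-up numbers (the symbol distributions and the noise level below are assumptions for illustration, not taken from the paper):

    import math

    def uncertainty(probs):
        # H = sum of -p * log2(p) in bits
        return sum(-p * math.log2(p) for p in probs if p > 0)

    # Hypothetical symbol distributions as anticipated by the listener:
    regular   = [0.85, 0.05, 0.05, 0.05]   # harmonic: mostly the expected tone
    irregular = [0.25, 0.25, 0.25, 0.25]   # inharmonic: anything is equally likely

    H_after = 0.3   # assumed constant residual uncertainty due to noise, in bits

    for name, probs in [("regular", regular), ("irregular", irregular)]:
        H_before = uncertainty(probs)
        R = H_before - H_after          # information received
        print(f"{name}: H_before = {H_before:.2f} bits, R = {R:.2f} bits")

    # The irregular signal has the larger H_before, so with the same noise it
    # delivers more information R, not less.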
Schneider Lab
origin: 1997 January 4
updated:
2020 Mar 27: Rename Tribus1971 to Tribus.McIrvine1971