Brian D. Harper (bharper@magnus.acs.ohio-state.edu)
Mon, 26 Feb 1996 14:07:40 -0500

Yockey#2 Reply to Y#1 by Tom Schneider. TS is unofficial
"moderator" of bionet.info-theory

From: toms@fcsparc6... (Tom Schneider)
Subject: Re: Book Reviews of Information Theory and Molecular Biology
Date: Thu, 26 Jan 1995 00:39:52 GMT

In article <3g37hp$e7b@newsbf02.news.aol.com> hpyockey@aol.com (HPYockey) writes:

| Review of book reviews of Information Theory and Molecular Biology by
| Hubert P. Yockey published by Cambridge University Press 1992
| Why is information theory and coding theory important in molecular
| biology? Gregor Mendel proved that inheritance is particulate and does not
| blend. Morgan showed that inheritance is linear. Watson and Crick
| demonstrated that inheritance is digital. Thus the genome is a message
| recorded in digital fashion in sequences of nucleotides in DNA. While
| Nature has been recording the life message digitally and using a code for
| at least 3.8 billion years, it is only recently that modern communication
| engineers have discovered the benefits of recording and transmitting
| messages in digitized form. The genetic message is isomorphic with
| messages in general and the genetic code is isomorphic with all codes.
| This means that information theory and coding theory are essential to
| molecular biology.

Nicely put!

| I asked Professor Haken for a list of his publications. ...
| My English translation: "The Shannon information concept says nothing
| about whether the message is meaningful or meaningless, valuable or
| valueless, that is, it goes in every sense of the words, or in other words
| semantics is lacking. In the field of biology this fault means a
| substantial deficiency."

We have made enormous progress in understanding DNA/RNA binding sites and molecular machines in
general without needing to use the terms "meaning" or "value".

| My 'polemic against the seminal work of Manfred Eigen' exposes his
| confusion of philosophical notions of semantics and information measured
| in bits as well as a number of other basic faults. Eigen feels free to
| introduce conjectures cooked up ad hoc to suit each problem. One can solve
| (sic) any problem with enough ad hoc conjectures. To remedy what he sees
| as an inadequacy in "classical information theory" he calls for a purely
| empirical "value parameter" that is characteristic of "valued
| information". He states that this "valued information" is reflected by
| increased "order". On the contrary, it is well known in information theory
| that 'increased order' decreases the information content of a message.

Here we differ significantly. This is going to get you into lots of trouble at
some point Hubert! You haven't followed our discussions on the point, so I'll
recap here briefly. If it doesn't make sense, just ask questions (on the
net). I take information to be the decrease of uncertainty in a physical
system. Uncertainty is a state function defined:

H = -\sum_i Pi log_2 Pi (bits per state)

so information is:

R = Hbefore - Hafter.

Subtracting Hafter corresponds to Shannon's subtraction of the uncertainty due
to noise. In human communications, such as in a good phone connection, Hafter
is often so small that we tend to neglect it. That is, Hafter = 0. But then
we have a SPECIAL CASE:

R = Hbefore (only when Hafter is zero!)
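
The two formulas above can be tried out numerically. What follows is only a
minimal sketch in Python: the 4-state system and its probabilities are made up
for illustration, and the function names are not from the original post.

```python
import math

def uncertainty(probs):
    """H = -sum_i Pi log2 Pi, in bits. States with Pi == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information(before, after):
    """R = Hbefore - Hafter, in bits."""
    return uncertainty(before) - uncertainty(after)

# Hypothetical system: four equally likely states before the state change.
before = [0.25, 0.25, 0.25, 0.25]

# General case: noise leaves some uncertainty afterwards (Hafter = 1 bit).
noisy_after = [0.5, 0.5, 0.0, 0.0]

# Special case: no uncertainty remains afterwards (Hafter = 0).
clean_after = [1.0, 0.0, 0.0, 0.0]

r_noisy = information(before, noisy_after)  # Hbefore - Hafter = 2 - 1
r_clean = information(before, clean_after)  # equals Hbefore only because Hafter = 0
```

Only in the second case does R coincide with Hbefore; in the first, treating
Hbefore as the information would overstate it by the remaining uncertainty.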

The caveat is often ignored. The consequence is confusion. Consider that H
corresponds to the entropy S under the special condition that the probabilities
Pi refer to the states of a molecular machine. That is,

S = -k \sum_i Pi ln Pi (joules per degree kelvin)

So when working with molecules,

S = k ln(2) H.

That is, uncertainty corresponds to entropy, an idea that has been around for a
while. If one calls H the entropy then one is using mixed terminology - note
the different units!
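
As a quick check of the conversion, the sketch below computes S both directly
from the probabilities and from H via S = k ln(2) H. The numerical value of
the Boltzmann constant and the example distribution are assumptions added
here; neither appears in the original post.

```python
import math

# Boltzmann constant in joules per kelvin (SI value; not given in the post).
K_BOLTZMANN = 1.380649e-23

def uncertainty_bits(probs):
    """H = -sum_i Pi log2 Pi (bits per state)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_direct(probs):
    """S = -k sum_i Pi ln Pi (joules per kelvin)."""
    return -K_BOLTZMANN * sum(p * math.log(p) for p in probs if p > 0)

def entropy_from_bits(h_bits):
    """S = k ln(2) H, converting bits to joules per kelvin."""
    return K_BOLTZMANN * math.log(2) * h_bits

# Hypothetical probabilities for the states of a molecular machine.
probs = [0.5, 0.25, 0.125, 0.125]

h = uncertainty_bits(probs)        # in bits
s_direct = entropy_direct(probs)   # in J/K
s_converted = entropy_from_bits(h) # in J/K, should agree with s_direct
```

The two routes agree up to floating-point rounding, while h and s_direct
differ by the factor k ln(2), which is where the change of units comes from.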

Here's the fast track to confusion:

If H corresponds to S and Hbefore corresponds to R, then R corresponds to S.
But wait! The second law of thermo says that closed systems tend to have
increasing disorder, S rises. So information R corresponds to disorder S.
Lower information corresponds to order. Or in your words:

| ... 'increased order' decreases the information content of a message.

It is completely confusing to say that the more information there is in a
newspaper the less order there is in it! Clearly a newspaper carries a large
amount of information (in bits) to a reader and has low disorder. Burning the
newspaper destroys the ability of the reader to lower their own uncertainty.

All this confusion can be avoided by always treating information as a measure
of a state change between two states. Then it becomes clear that 'increased
order' corresponds to increased information and that increased order is a
decrease in entropy.

If you do not follow this course, then you will be completely confused by the
case of binding sites in DNA. (I was very confused for about 6 months when I
started using information theory to work on binding sites because of the
question of how to do the calculation and the confounding problem of small
sample size. But if we work with large sample sizes that question can be set

| Anyone who is computer literate knows that, in the context of computer
| technology, the word information does not mean knowledge. Along with many
| other authors, Eigen makes a play on words by using information in the
| sense of knowledge, meaning and specificity. For example, in
| Naturwissenschaften (1971) volume 58 465-523 (in English) he states with
| reference to sequences in DNA that: "Such sequences cannot yet contain
| any appreciable amount of information." He means knowledge or specificity.
| Eigen uses the word 'information' in two different senses in one
| sentence: "Information theory as we understand it today is more a
| communication theory. It deals with problems of processing information
| rather than of "generating" information."

Yes, that is the problem with his work, you have stated it precisely.

| On the other hand, Elitzur, who is a professor in the Department of
| Chemical Physics at The Weizmann Institute, in Rehovot Israel is alarmed
| by my remark (page 313) "it is easy to see that thermodynamics has
| nothing to do with Darwin's theory of evolution. Upon reading this
| uncompromising statement the bewildered reader may recall several
| discussions he or she has previously read concerning the apparent conflict
| between the Second Law and biological order-growth." Elitzur calls the
| Second Law of Thermodynamics an explanation of evolution.

I agree here with Elitzur. The basis is the confusion that information
is the same as entropy. They are not.

| The context in which this remark was made is in Section 12.1 where I
| discussed the assertion made by creationists today and by critics of
| Darwin in the nineteenth century, that there is a conflict between
| evolution and the second law. Creationists say that, since the second law
| cannot be challenged, Darwin's theory of evolution must be abandoned in
| favor of special creation. Even a scientist as eminent as Eddington
| believed there was such a conflict. Had Elitzur overcome his bewilderment
| and read the next paragraph on page 313 he would have found the
| explanation: "In fact, evolution requires an increase in
| Kolmogorov-Chaitin algorithmic entropy of the genome in order to generate
| the complexity necessary for higher organisms".

Shouldn't that be the Kolmogorov-Chaitin algorithmic information?

| Elitzur is dismayed by my finding (Yockey, 1974, 1977) that there is no
| relation between Shannon entropy in information theory and
| Maxwell-Boltzmann-Gibbs entropy in statistical mechanics.

And so am I!

| Let us see why
| that is so. According to modern theory of probability one cannot speak of
| a probability without first establishing a probability space and setting
| up a probability distribution of the random variables, appropriate to the
| problem. The axioms of probability theory must be satisfied in order to
| avoid a "Dutch book" and to be sure that we are not using knowledge we do
| not have (Yockey, 1992, p20-33).
| The probability space in statistical mechanics, called phase space by
| theoretical physicists, is six dimensional and is defined by the position
| and momentum vectors of the ensemble of particles. The values of these
| position and momentum vectors are random variables and the pi form
| probability vectors referring to particle i. The function for entropy in
| statistical mechanics, S, has the dimensions of the Boltzmann constant k
| and has to do with energy, not information. Shannon entropy has no thermal
| or mechanical dimensions.

No. Shannon's work was based rather solidly on thermal and mechanical
considerations. Look at his neglected but beautiful 1949 paper. He uses
Nyquist's equation (which is N = WKT, note the K!!) to apply the temperature to
the electrical circuit. He recognized that, although (as you say) large parts
of the theory are without connection to the physical world, the communications
systems that one builds are in the physical world and these must deal with
thermal noise. Indeed, if there were no thermal noise, the channel capacity
would go to infinity and there would be no problem to communicate perfectly
over any channel.
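
The dependence on thermal noise can be illustrated with a short sketch. It
assumes the standard capacity formula for a band-limited channel with white
Gaussian noise, C = W log2(1 + P/N), together with the thermal noise power
N = kTW; the post names Nyquist's equation but does not quote the capacity
formula, and the bandwidth, signal power and temperatures below are arbitrary
illustrative numbers.

```python
import math

# Boltzmann constant in joules per kelvin (SI value; not given in the post).
K_BOLTZMANN = 1.380649e-23

def thermal_noise_power(bandwidth_hz, temperature_k):
    """N = k T W, thermal noise power in watts."""
    return K_BOLTZMANN * temperature_k * bandwidth_hz

def channel_capacity(bandwidth_hz, signal_power_w, temperature_k):
    """C = W log2(1 + P/N) in bits per second, with N = k T W.

    temperature_k must be > 0; at exactly zero the noise power vanishes
    and the expression diverges.
    """
    noise = thermal_noise_power(bandwidth_hz, temperature_k)
    return bandwidth_hz * math.log2(1.0 + signal_power_w / noise)

W = 3000.0   # hypothetical bandwidth in hertz
P = 1e-12    # hypothetical received signal power in watts

# Capacity at progressively lower temperatures (kelvin).
temperatures = (300.0, 30.0, 3.0, 0.3)
capacities = [channel_capacity(W, P, t) for t in temperatures]
```

With W and P held fixed, each tenfold drop in temperature raises the capacity,
and nothing in the formula bounds it as T approaches zero.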

| Information theory is concerned with messages expressed in sequences of
| letters selected from a finite alphabet by a Markov process. The
| probability space is defined by the letters of the alphabet under
| consideration, which are random variables and the pi form probability
| vectors accordingly.

That is only a portion of the theory.

| To illustrate this point further, one may consider the probability space
| of a dice game that consists of the numbers 2 through 12 and calculate the
| corresponding entropy (Yockey, see exercise on page 88). Clearly, the
| entropy of a dice game has nothing to do with statistical mechanics and
| thermodynamics.

No. If we always consider differences between states of systems, then the
microstates drop out. Rather than repeat it here, see the discussion in

| It may have something to do with information theory since
| a sequence of letters selected from the alphabet is generated as a Markov
| process by a series of tosses of the dice. Such a sequence of letters
| forms a message in which some gamblers find meaning. For these reasons
| entropy in statistical mechanics and entropy in information theory are
| different concepts that have no relationship that enables us to make an
| equivalence of one to the other.

See above.
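
For concreteness, the dice example from the quoted passage can be worked
numerically. The sketch assumes the game is the sum of two fair six-sided
dice (the quoted text only says "the numbers 2 through 12"), and the
before/after comparison, in which one die is revealed, is an illustration
added here rather than something taken from either author.

```python
import math
from collections import Counter
from itertools import product

def uncertainty(probs):
    """H = -sum_i Pi log2 Pi, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Distribution of the sum of two fair dice over the outcomes 2..12.
faces = range(1, 7)
counts = Counter(a + b for a, b in product(faces, repeat=2))
sum_probs = [counts[s] / 36 for s in range(2, 13)]

# Uncertainty about the sum before anything is known (about 3.27 bits).
h_before = uncertainty(sum_probs)

# Once one die has been seen, the sum is uniform over six values,
# whichever face that die shows.
h_after = uncertainty([1 / 6] * 6)

# Information gained about the sum by seeing one die.
r = h_before - h_after
```

Here the information is again a difference between two states of knowledge
about the same system, not the value of H in either state alone.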

| Elitzur gets off a number of bloopers: "Information theory, according to
| Yockey, 'shows that it is fundamentally undecidable whether a given
| sequence has been generated by a stochastic process or by a highly
| organized process' (p82). This must be an amazing statement for anyone
| familiar with the basic concepts of information theory where information
| is defined as the very opposite of randomness."
| 'Information' is, of course, not the very opposite of randomness. Elitzur
| is using the word 'information' in the semantic sense as synonym for
| knowledge or meaning. Everyone knows that a random sequence, that is, one
| chosen without intersymbol restrictions or influence, carries the most
| information in the sense used by Shannon and in computer technology. Note:
| For a brief explanation of randomness, complexity, order and information
| see Yockey Nature 344 p823 (1990).

Here you have made the mistake of setting Hafter to zero. So a random sequence
going into a receiver does not decrease the uncertainty of the receiver and so
no information is received. But a message does allow for the decrease. Even
the same signal can be information to one receiver and noise to another,
depending on the receiver!
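
One way to see this numerically is to borrow the binding-site setting
mentioned earlier in the reply. The sketch below is an illustration under
stated assumptions, not the calculation from any particular paper: it assumes
the four bases are equally likely before binding (Hbefore = 2 bits), ignores
small-sample corrections, and uses made-up base frequencies for three
hypothetical positions.

```python
import math

def uncertainty(probs):
    """H = -sum_i Pi log2 Pi, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed uncertainty before binding: four equally likely bases.
H_BEFORE = uncertainty([0.25, 0.25, 0.25, 0.25])

def position_information(base_freqs):
    """R = Hbefore - Hafter for one position, from observed base frequencies."""
    return H_BEFORE - uncertainty(base_freqs.values())

# Hypothetical positions (frequencies invented for illustration).
random_position = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved_position = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
invariant_position = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}

r_random = position_information(random_position)        # Hafter = Hbefore
r_conserved = position_information(conserved_position)  # partial decrease
r_invariant = position_information(invariant_position)  # Hafter = 0
```

The random position has the largest H yet yields R = 0, because the
uncertainty after is the same as the uncertainty before; only the invariant
position, where Hafter is zero, gives R equal to Hbefore.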

| Give me your opinion but after you read the book!

I will write more at that point ...

Tom Schneider

Brian Harper |
Associate Professor | "It is not certain that all is uncertain,
Applied Mechanics | to the glory of skepticism" -- Pascal
Ohio State University |