One cannot measure information content from a single sequence. Dyad
symmetries in DNA (palindromes) are an exception, because both the sequence of
the palindrome and its complement are available as examples of the site. This
allows us to estimate how much information is in the lac operator
(Beckwith, 1978; Goeddel et al., 1978;
Sadler et al., 1983a). Gilbert and Maxam (1973) found that the
tetrameric lac repressor protein protects 24 base pairs
from DNase digestion.
This protected region extends from -13 to +10, where position 0 is the central base. More
recently, exonuclease III digestion gave the range -14 to +16
(Shalloway et al., 1980).
To analyze the site,
we extended the range -16 to +16 by 5 bases on each
side (Fig. 5).
This range includes the
"extended operator" (Dickson et al., 1975;
Heyneker et al., 1976). As with
other operators, the sequence was compared to its complement using the program
Rseq. Including the central position gives
Rsequence = 19.2 bits per site. Because there are only two examples (the
operator and its complement), the sampling error is large.
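To make the two-example calculation concrete, the following Python sketch reads a palindromic site on both strands (the sequence and its reverse complement) and sums, over positions l, the quantity 2 - H(l) - e(n), where H(l) is the observed entropy at position l and e(n) is a small-sample correction for n sequences. This is not the Rseq program. It assumes equiprobable bases when computing e(n) exactly by enumerating base compositions, and the function names and toy sequences are hypothetical; it does not reproduce the 19.2-bit figure.

```python
import math
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def reverse_complement(seq: str) -> str:
    """Read the site on the opposite strand (5' to 3')."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def entropy(counts) -> float:
    """Shannon entropy, in bits, of one aligned position given base counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def expected_entropy(n: int) -> float:
    """Exact E[H] for n bases drawn independently and equiprobably (assumption)."""
    total = 0.0
    for counts in product(range(n + 1), repeat=4):
        if sum(counts) != n:
            continue
        # Multinomial probability of this composition under equal base frequencies.
        prob = math.factorial(n) / math.prod(math.factorial(c) for c in counts) / 4 ** n
        total += prob * entropy(counts)
    return total


def r_sequence(aligned) -> float:
    """Sum of 2 - H(l) - e(n) over all aligned positions (hypothetical helper)."""
    n = len(aligned)
    e_n = 2.0 - expected_entropy(n)  # small-sample correction, 1.25 bits for n = 2
    total = 0.0
    for l in range(len(aligned[0])):
        column = [s[l] for s in aligned]
        counts = [column.count(b) for b in BASES]
        total += 2.0 - entropy(counts) - e_n
    return total


# Toy odd-length palindrome; its central base can never match its own complement.
site = "GAACTTC"
print(r_sequence([site, reverse_complement(site)]))
```

With only two sequences, e(2) = 1.25 bits, so a position where both strands agree contributes 0.75 bits and a mismatched position contributes -0.25 bits. The central base of an odd-length palindrome is always such a mismatch, which is why including or excluding it shifts the total.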
If there is only one functional lac repressor
binding site in the E. coli
genome, then
Rfrequency = 21.9 bits per site.
"Pseudo"-operator sequences, for which no function is known, also exist
(Reznikoff et al., 1974; Winter and
von Hippel, 1981). If we include the strong secondary "pseudo"-operator,
Rsequence = 16.2 ± 2.6 bits per site and
Rfrequency = 20.9 bits per site.
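The drop from 21.9 to 20.9 bits follows from the usual definition Rfrequency = log2(G/γ), where γ is the number of sites and G is the number of possible positions in the genome. G is not stated in this section, so the minimal sketch below back-calculates a hypothetical G from the one-site figure of 21.9 bits; it is an assumption, not a measured genome size. Whatever G is, counting the pseudo-operator doubles γ and therefore lowers Rfrequency by exactly log2 2 = 1 bit.

```python
import math


def r_frequency(genome_positions: float, n_sites: int) -> float:
    """Bits needed to locate n_sites among genome_positions possible positions."""
    return math.log2(genome_positions / n_sites)


# Hypothetical G, back-calculated so that a single site gives 21.9 bits.
G = 2 ** 21.9

one_site = r_frequency(G, 1)   # operator only
two_sites = r_frequency(G, 2)  # operator plus the strong secondary pseudo-operator
print(f"{one_site:.1f} {two_sites:.1f} {one_site - two_sites:.1f}")
```

The one-bit difference is independent of the assumed G, since log2(G/1) - log2(G/2) = log2 2.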