We showed how to estimate the information contained in several binding sites (Rsequence), and we determined values for different kinds of sites. But what determines how much information is in a site? One way to approach this question is to make a different measurement, based on "how much information should be needed to locate the sites?" (Rfrequency) and then compare this to the first measurement. The results of each analysis are summarized by the ratio of Rsequence to Rfrequency and their difference (Table 1). For ribosomes, LexA, TrpR, LacI, ArgR and cI/cro, the ratio is close to 1. The sum of the differences for the same six systems is -0.7 bits (out of more than 100 bits of total Rsequence).
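As an illustration of the two measurements being compared, the following minimal sketch computes both quantities for a set of aligned sites. It rests on assumptions not stated in this section: Rfrequency is taken as log2(G/γ) for γ sites among G possible positions, Rsequence is taken as the sum over aligned positions of the drop in uncertainty from 2 bits (equiprobable bases), and no small-sample correction is applied. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def r_frequency(num_sites, num_positions):
    """Bits needed to pick num_sites sites out of num_positions positions.

    Assumes every position is an equally likely candidate site.
    """
    return math.log2(num_positions / num_sites)


def r_sequence(aligned_sites):
    """Total information (bits) in the patterns of equal-length aligned sites.

    Assumes equiprobable bases (2 bits of uncertainty before binding)
    and applies no small-sample correction.
    """
    n = len(aligned_sites)
    total = 0.0
    for column in zip(*aligned_sites):  # one aligned position at a time
        counts = Counter(column)
        uncertainty = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - uncertainty
    return total


# Toy data, not taken from any real binding site.
toy_sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
ratio = r_sequence(toy_sites) / r_frequency(num_sites=4, num_positions=256)
print(ratio)
```

In this toy case three positions are fully conserved and one is fully variable, so the pattern carries just enough information to find 4 sites among 256 positions. A real analysis would need the sampling correction and the actual number of sites and positions for each system.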
The large amount of information at T7 polymerase promoters is surprising. We cannot account for this result by using a different size genome, by changing the number of sites, by sampling error, by overspecification to avoid host sites, or by comparison to E. coli promoters. However, there is a simple explanation. The sites have twice as much information as is necessary to locate them in a genome the size of E. coli. Therefore, a second recognizer could be using the extra bits. The sites have symmetry elements that by themselves contain roughly half the information of the entire site. Since T7 RNA polymerase transcribes T7 DNA strictly in one direction (Chamberlin et al., 1970; Summers and Siegel, 1970; Carter et al., 1981; Zavriev and Shemyakin, 1982), it is surprising to find such strong symmetry elements in the promoter sequences. Because the polymerase acts asymmetrically, we assign it to the asymmetric portion of the site.
The symmetric elements could then be the binding site for the second recognizer. Symmetric elements in promoters suggest the presence of operators (Chamberlin, 1974; Dickson et al., 1975; Dykes et al., 1975; Smith, 1979; Ptashne et al., 1980; Gicquel-Sanzey and Cossart, 1982; Joachimiak et al., 1983). With this in mind, it is intriguing that wild type T7 bacteriophage decreases late mRNA synthesis around 10 minutes after infection, while an amber mutation in gene 3.5 prevents the shutoff; therefore gene product 3.5 is a candidate repressor of late T7 transcription (McAllister and Wu, 1978; McAllister et al., 1981; Studier, 1972; Inouye et al., 1973; Jensen and Pryme, 1974; Kerr and Sadowski, 1975; Silberstein et al., 1975; Kleppe et al., 1977; Miyazaki et al., 1978; Krüger and Schroeder, 1981; Dunn and Studier, 1983).
The Rsequence to Rfrequency ratio of 2 suggests that there are likely to be two sites at T7 late promoters. In almost all the examples other than T7, a ratio of 1 for Rsequence/Rfrequency suggested one site. The exceptional case now becomes the λ operators, where we know that two different proteins bind: cI repressor and cro. (The effects of the third protein that binds these regions, E. coli RNA polymerase, are probably blurred out when Rsequence is measured.) The existing biochemical and genetic data show that cI and cro bind to the same nucleotides (Johnson et al., 1981). Both λ repressor and cro are dimers that can bind symmetrically and so may share binding site information. If the two proteins used identical information, the ratio would be 1. If they had used different information the ratio could have been as high as 2, as occurs in the T7 promoter/operator sites. In T7, the proposed repressor would bind symmetrically, and so it could not depend only on information in the asymmetric promoter. Conversely, the polymerase could not depend entirely on symmetrical patterns. That is, asymmetric and symmetric sites must have some separate information.
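The two limiting cases described above (a ratio of 1 for fully shared information, a ratio of 2 for fully separate information) can be connected by a simple sketch. The linear interpolation between the limits is an assumption made here for illustration, as is the premise that each recognizer needs exactly Rfrequency bits; the function name and the `shared_fraction` parameter are hypothetical.

```python
def expected_ratio(shared_fraction):
    """Illustrative Rsequence/Rfrequency ratio for two recognizers at one site.

    shared_fraction is the assumed fraction of binding-site information
    that both recognizers use (1.0 = identical, 0.0 = fully separate).
    The linear form is an assumption, not a result from the text.
    """
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must lie between 0 and 1")
    return 2.0 - shared_fraction


for fraction in (1.0, 0.5, 0.0):
    print(fraction, expected_ratio(fraction))
```

Under this toy model, a ratio near 1 at a site bound by two proteins corresponds to almost complete sharing, while a ratio near 2 corresponds to almost no sharing.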