Many authors have estimated the frequency of a binding site by considering the site size (Gilbert and Müller-Hill, 1970; Riggs et al., 1970; Müller-Hill et al., 1977; Nei and Li, 1979; Pribnow, 1979; von Hippel, 1979; Harel, 1980). Rsequence, the sum of Rsequence(L) over all positions L of a binding site, is similar to counting the number of bases recognized by a macromolecule. Unlike a simple count, however, it also takes into account the variation among individual sequences. The sampling error correction prevents one from overestimating the amount of information in the sequences, but it can lead to underestimation in some circumstances (see Fig. 1 and Appendix I).
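The summation described above can be illustrated with a short sketch. It is not the calculation used in this work. It makes three assumptions:

- The genomic base composition is equiprobable, so Hg = 2 bits. The value is exposed as the hypothetical parameter `h_g`.
- The sites are already aligned and contain only A, C, G and T.
- The sampling correction e(n) uses the common large-sample approximation (s - 1)/(2 n ln 2) with s = 4, rather than any exact small-sample calculation.

All function names are illustrative.

```python
import math
from collections import Counter

BASES = "ACGT"


def small_sample_correction(n: int) -> float:
    """Approximate sampling bias e(n) in bits for n sequences.

    Assumption: large-sample approximation (s - 1) / (2 n ln 2), s = 4.
    """
    return (len(BASES) - 1) / (2 * n * math.log(2))


def r_sequence_at(column: str, h_g: float = 2.0) -> float:
    """Rsequence(L) = Hg - (Hs(L) + e(n)) for one aligned position.

    Assumption: h_g = 2.0 bits (equiprobable genome composition).
    """
    n = len(column)
    counts = Counter(column)
    # Shannon uncertainty Hs(L) of the observed bases at this position.
    h_s = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h_g - (h_s + small_sample_correction(n))


def r_sequence(sites: list[str], h_g: float = 2.0) -> float:
    """Rsequence: sum of Rsequence(L) over every position of the aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    return sum(
        r_sequence_at("".join(s[pos] for s in sites), h_g)
        for pos in range(length)
    )


if __name__ == "__main__":
    # Four identical 3-base sites: each position is fully conserved.
    print(r_sequence(["ACG", "ACG", "ACG", "ACG"]))
```

As the number of sequences grows, e(n) shrinks toward zero. A fully conserved position then approaches 2 bits, which connects Rsequence to counting recognized bases. With only a few sequences, the correction can drive a position below zero. For example, a column containing each of A, C, G and T once gives -e(4). This matches the underestimation mentioned above.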
Rsequence does not tell us anything about the physical mechanisms a recognizer uses to contact the nucleic acid. For example, the ribosome prefers a particular base composition in the Shine and Dalgarno region, and it recognizes that region through RNA/RNA contacts. In contrast, regA, the translational repressor of bacteriophage T4 (Wiberg and Karam, 1983), uses protein/RNA contacts. It is possible for two such recognizers to have the same base preferences. Because we use sequences to estimate the probabilities of bases at each position, the analysis would then give the same information content for two entirely distinct mechanisms. That is, not only is the mechanism irrelevant to the analysis, but one also cannot infer anything about the mechanism from the sequence data, the base frequencies or the information content, because several mechanisms may give the same results. How physical and chemical contacts determine the preferred base frequencies is a separate question (Pabo and Sauer, 1984).