Sequence logos are a graphical technique for summarizing a set of aligned sequences. They were invented by Tom Schneider and his first high school student, Mike Stephens, to resolve a paradox (Nucleic Acids Res. 18: 6097-6100, 1990). Weblogo is a web-based server for creating sequence logos, written and supported by the groups of Steven Brenner and Gavin Crooks. Like any other tool, it can be misused.
Below are recommendations for proper use of the logos so that they provide useful data for further studies. Links to the Glossary are provided.
The discovery of base flipping to initiate DNA replication and RNA transcription would not have occurred using frequency logos.
Under most circumstances one should not use "flat" frequency logos (left figure) because important biological information is lost. A 'frequency logo' or 'equallogo' is like a regular sequence logo, but all stacks have the same height, so each letter height is proportional to the frequency of the corresponding nucleotide or amino acid. In contrast, the height of each stack in the standard sequence logo (right figure) represents the sequence conservation at that position, measured in bits, a precise and unique unit that is related (but not proportional) to the binding energy (see the papers edmm and emmgeo for more on this important relationship). This is a biologically important summary; if it is not given, a reader cannot easily tell which parts are more important than others. Furthermore, the user will miss the beautiful sine wave on many conventional DNA sequence logos (see the gallery, which was published as figure 6 of the paper Information Analysis of Sequences that Bind the Replication Initiator RepA. Papp, P. P., D. K. Chattoraj, and T. D. Schneider. 1993. J. Mol. Biol. 233: 219-230).
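The difference between the two kinds of stack height can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name stack_heights and the three toy columns are made up for this example, and the small-sample correction used in published logos is omitted, so the numbers are upper estimates for small alignments.

```python
import math
from collections import Counter

def stack_heights(column, alphabet="ACGT"):
    """Return (stack height in bits, {letter: letter height in bits}) for one
    aligned column. Assumption: no small-sample correction is applied."""
    counts = Counter(column)
    n = len(column)
    freqs = {b: counts[b] / n for b in alphabet}
    # Shannon uncertainty of the column, in bits
    uncertainty = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    # Conservation = maximum possible uncertainty minus observed uncertainty
    conservation = math.log2(len(alphabet)) - uncertainty
    # Within a stack, each letter's share is its frequency
    return conservation, {b: f * conservation for b, f in freqs.items()}

# Hypothetical toy columns, not real binding-site data
for name, column in [("conserved", "AAAAAAAA"),
                     ("mixed", "AAAACCGT"),
                     ("uniform", "ACGTACGT")]:
    bits, letters = stack_heights(column)
    print(name, round(bits, 2), {b: round(h, 2) for b, h in letters.items()})
```

In a frequency logo all three columns would be drawn at the same total height, so the distinction between the fully conserved column and the uniform one would be visible only in the letter proportions.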
For example, the figure of a "flat" logo for the RepA DNA binding protein from bacteriophage P1 (helixrepa) does not give much indication of anything special. However, the corresponding sequence logo shows two major clusters of sequence conservation at positions -1 to +3 and +11 to +13, with additional strong conservation at +7 and +8. Placing a sine wave with a wavelength of 10.6 bases over the logo makes it clear that the two big patches of sequence conservation are one turn of double-stranded DNA apart. That RepA protein binds to the face of the DNA with the two strong conservation patches was subsequently confirmed experimentally (Papp et al. 1993, Papp and Chattoraj 1994), as indicated by the solid (instead of dashed) sine wave. A protein binding to DNA through the major groove can distinguish up to 2 bits of information, and this is consistent with the two large conservation patches. A protein binding to DNA through the minor groove cannot specify more than 1 bit (baseflip), so the T at position +7 violates B-form DNA. We proposed that RepA flips a base out of the DNA to initiate DNA replication and partially confirmed this experimentally (repan3).
When a crystal structure of RNA polymerase in the initiation complex was obtained, it showed base flipping as predicted from the information type sequence logo:
'Base-specific interactions occur primarily with A(-11) and T(-7), which are flipped out of the single-stranded DNA base stack and buried deep in protein pockets.' and
'the entire T-7 nucleotide is flipped out of the base stack (as predicted by Schneider, 2001)' (A. Feklistov and S. A. Darst, Structural basis for promoter -10 element recognition by the bacterial RNA polymerase sigma subunit, Cell 147: 1257-1269, pmid 22136875, 2011)
Indeed, using a frequency logo means that one will miss the original reason we invented sequence logos. We observed that human donor and acceptor splice sites have the same consensus sequence: CAG|GT (see figure to the right). Yet the information conservation across the binding sites is not the same at each position. How could two RNA binding sites have the same consensus but be different? We invented sequence logos to visualize and resolve this paradox (Stephens.Schneider-splice1992).
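The paradox can be reproduced with a small Python sketch. The two alignments below are invented toy data chosen only to share the consensus CAGGT; they are not real donor or acceptor sites. The helper names are hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter

def consensus(seqs):
    """Most frequent letter at each aligned position."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def info_bits(seqs, max_bits=2.0):
    """Per-position conservation in bits (uncorrected) for a DNA/RNA alignment."""
    result = []
    for col in zip(*seqs):
        n = len(col)
        uncertainty = -sum((c / n) * math.log2(c / n)
                           for c in Counter(col).values())
        result.append(max_bits - uncertainty)
    return result

# Hypothetical toy alignments with the same consensus
site_a = ["CAGGT", "CAGGT", "CAGGT", "CAGGA"]  # variation on the right
site_b = ["CAGGT", "AAGGT", "CTGGT", "CACGT"]  # variation on the left

print(consensus(site_a), [round(b, 2) for b in info_bits(site_a)])
print(consensus(site_b), [round(b, 2) for b in info_bits(site_b)])
```

Both alignments print the same consensus, yet the conservation profile is concentrated on opposite sides of the site, which a consensus sequence alone cannot show.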
As a final example that comes full circle to the original observation of base flipping in RepA iterons, in 2018 the bacteriophage D6 iterons were reported as a consensus, and so the flipping base, which is easily identified in a sequence logo, was missed (pubmed 29304472). The sequence logo shows that bases +2, +3 and +4 are fully conserved as TGT, as in the RepA site. Bases +13, +14 and +15 are fully conserved as CCC, the complement of the RepA GGG. This is consistent with the proposed 180 degree rotations of the GGG binding subunit during evolution, which are observed in the F and pCU1 plasmids (Chattoraj.Schneider1997). The block of conservation around the CCC, 3 bases on the 5' side and 3 bases on the 3' side, suggests assigning the middle C at +14 to the center of the major groove, represented as the peak of a sine wave with wavelength 10.6 bases. With that assignment, the TGT is exactly in the major groove, just as in RepA. This places an A at position +8 exactly into the center of the minor groove.
The point of these examples is not to discourage people from using flat logos, but rather to show that there is a risk of missing important biological phenomena if the conservation of the binding site is not presented to the user quantitatively using the standard sequence logo.
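The groove assignment can be checked numerically with a cosine of wavelength 10.6 bases whose peak is placed at +14. The sketch below is an assumption-laden illustration: the function name wave and the convention that +1 marks the major-groove center and -1 the minor-groove center are choices made here, and positions are treated as consecutive integers.

```python
import math

WAVELENGTH = 10.6  # bases per helical turn, the value used in the text

def wave(position, peak, wavelength=WAVELENGTH):
    """Cosine in [-1, 1]: +1 at the major-groove center (the peak),
    -1 at the minor-groove center half a turn away."""
    return math.cos(2 * math.pi * (position - peak) / wavelength)

# Peak assigned to the middle C of the conserved CCC at +14
for pos in (2, 3, 4, 8, 13, 14, 15):
    print(f"{pos:+d}", round(wave(pos, peak=14), 2))
```

Under these assumptions the middle of the TGT (+3) lies close to a crest one turn before +14, while +8 lies close to the trough between the two crests.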
Examples of "flat" logos:
@article{Jager.Schmitz2009, author = "D. Jager and C. M. Sharma and J. Thomsen and C. Ehlers and J. Vogel and R. A. Schmitz", title = "{Deep sequencing analysis of the \emph{Methanosarcina mazei} G\"{o}1 transcriptome in response to nitrogen availability}", journal = "Proc. Natl. Acad. Sci. USA", volume = "106", pages = "21878--21882", pmid = "19996181", pmcid = "PMC2799843", comment = "2013/03/05 15:24:00", year = "2009"} Reference,Figure with legend, Image (jpg)
Differences in microRNA detection levels are technology and sequence dependent. Leshkowitz D, Horn-Saban S, Parmet Y, Feldmesser E. RNA. 2013 Feb 19. [Epub ahead of print] PMID: 23431331
@article{Humphreys.Preiss2012, author = "D. T. Humphreys and C. J. Hynes and H. R. Patel and G. H. Wei and L. Cannon and D. Fatkin and C. M. Suter and J. L. Clancy and T. Preiss", title = "{Complexity of murine cardiomyocyte miRNA biogenesis, sequence variant expression and function}", journal = "PLoS One", volume = "7", pages = "e30933", pmid = "22319597", pmcid = "PMC3272019", year = "2012"}
@article{Chou.Schwartz2011, author = "M. F. Chou and D. Schwartz", title = "{Biological sequence motif discovery using motif-x}", journal = "Curr Protoc Bioinformatics", volume = "Chapter 13", pages = "Unit 13.15--24", pmid = "21901740", year = "2011"}
@article{Oman.vanderDonk2010, author = "T. J. Oman and W. A. {van der Donk}", title = "{Follow the leader: the use of leader peptides to guide natural product biosynthesis}", journal = "Nat Chem Biol", volume = "6", pages = "9--18", pmid = "20016494", pmcid = "PMC3799897", year = "2010"} Figure 2
@article{Viola.Gonzalez2012, author = "I. L. Viola and R. Reinheimer and R. Ripoll and N. G. Manassero and D. H. Gonzalez", title = "{Determinants of the DNA binding specificity of class I and class II TCP transcription factors}", journal = "J Biol Chem", volume = "287", pages = "347--356", pmid = "22074922", pmcid = "PMC3249086", year = "2012"} Figure 1
@article{Ugolev.Schuldiner2013, author = "Y. Ugolev and T. Segal and D. Yaffe and Y. Gros and S. Schuldiner", title = "{Identification of conformationally sensitive residues essential for inhibition of vesicular monoamine transport by the noncompetitive inhibitor tetrabenazine}", journal = "J Biol Chem", volume = "288", pages = "32160--32171", pmid = "24062308", pmcid = "PMC3820856", year = "2013"} Figure 4 and Figure 6
@article{Ranjani.Goh2014, author = "V. Ranjani and S. Janecek and K. P. Chai and S. Shahir and R. N. {Abdul Rahman} and K. G. Chan and K. M. Goh", title = "{Protein engineering of selected residues from conserved sequence regions of a novel Anoxybacillus alpha-amylase}", journal = "Sci Rep", volume = "4", pages = "5850", pmid = "25069018", year = "2014"} Figure 1
@article{Borrok.Tsui2015, author = "M. J. Borrok and Y. Wu and N. Beyaz and X. Q. Yu and V. Oganesyan and W. F. Dall'Acqua and P. Tsui", title = "{pH-dependent Binding Engineering Reveals an FcRn Affinity Threshold That Governs IgG Recycling}", journal = "J Biol Chem", volume = "290", pages = "4282--4290", pmid = "25538249", pmcid = "PMC4326836", comment = "2015/03/14 17:04:41", year = "2015"} Figure 3
@article{Jolma.Taipale2013, author = "A. Jolma and J. Yan and T. Whitington and J. Toivonen and K. R. Nitta and P. Rastas and E. Morgunova and M. Enge and M. Taipale and G. Wei and K. Palin and J. M. Vaquerizas and R. Vincentelli and N. M. Luscombe and T. R. Hughes and P. Lemaire and E. Ukkonen and T. Kivioja and J. Taipale", title = "{DNA-binding specificities of human transcription factors}", journal = "Cell", volume = "152", pages = "327--339", pmid = "23332764", year = "2013"}
@article{Schneider.Stephens1990, author = "T. D. Schneider and R. M. Stephens", title = "Sequence Logos: A New Way to Display Consensus Sequences", journal = "Nucleic Acids Res.", volume = "18", pages = "6097--6100", pmid = "2172928", pmcid = "PMC332411", note = "\htmladdnormallink {http://dx.doi.org/10.1093/nar/18.20.6097} {http://dx.doi.org/10.1093/nar/18.20.6097}, \htmladdnormallink {https://alum.mit.edu/www/toms/papers/logopaper/} {https://alum.mit.edu/www/toms/papers/logopaper/}", year = "1990"}
For example, two papers on malarial proteins were published back-to-back in Science in which sequence logos were given for similar data. One paper (Hiller.Haldar2004) apparently used relative entropy and so showed an impossible amount of sequence conservation, near 5 bits for the 20 amino acids (see their Figure 2). Choosing one object in 20 never takes more than log2(20) = 4.3 bits. The other paper (Marti.Cowman2004) did not cite the source of their method, but it was presumably the original logo paper, since the height of a fully conserved position is around 4.3 bits (see their Figure 1). The two logos therefore show inconsistent heights, and a reader could be left puzzled by the discrepancy. (Note also the lack of error bars on the figures.)
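The 4.3 bit ceiling, and one way a relative-entropy scale could exceed it, can be illustrated in Python. This is a sketch under assumptions: the function names are made up here, and the background frequency of 0.03 is a hypothetical value chosen for illustration, not a figure taken from either paper.

```python
import math

def max_information_bits(n_symbols):
    """Upper bound on the information needed to choose one of n_symbols."""
    return math.log2(n_symbols)

def relative_entropy_fully_conserved(background_freq):
    """Relative entropy (bits) of a fully conserved position against a
    background in which the conserved symbol has the given frequency."""
    return -math.log2(background_freq)

print(round(max_information_bits(4), 2))    # DNA or RNA
print(round(max_information_bits(20), 2))   # amino acids
# Hypothetical background frequency for a rare amino acid
print(round(relative_entropy_fully_conserved(0.03), 2))
```

With a sufficiently rare conserved residue the relative-entropy value rises above log2(20), which is why heights on the two scales should not be compared directly.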
@article{Marti.Cowman2004, author = "M. Marti and R. T. Good and M. Rug and E. Knuepfer and A. F. Cowman", title = "{Targeting malaria virulence and remodeling proteins to the host erythrocyte}", journal = "Science", volume = "306", pages = "1930--1933", pmid = "15591202", year = "2004"}
@article{Hiller.Haldar2004, author = "N. L. Hiller and S. Bhattacharjee and C. {van Ooij} and K. Liolios and T. Harrison and C. Lopez-Estrano and K. Haldar", title = "{A host-targeting signal in virulence proteins reveals a secretome in malarial infection}", journal = "Science", volume = "306", pages = "1934--1937", pmid = "15591203", year = "2004"}
These recommendations are available online at https://alum.mit.edu/www/toms/logorecommendations.html