By downloading this code you agree to the Source Code Use License (PDF).
{ version = 1.28; (* of diffribl.p 2005 Jun 3}

(* begin module describe.diffribl *)
(*
name
   diffribl: calculate the difference between two ribls

synopsis
   diffribl(ribla: in, riblb: in, diffriblp: in,
            posdiff: out, drxyin: out, output: out)

files

   ribla: The ribl output of the Ri program for the first of 2
      compared ribls

   riblb: The ribl output of the Ri program for the second of 2
      compared ribls

   posdiff: An output file which can be used with the xyplo program.
      This file lists all the difference or distance values for each
      position. The columns are as follows:

         1. This coordinate is the relative position of ribl A.
         2. This value is the sum of differences or distances
            at that position.

      This file is only useful for the non-scrolling calculations.

   diffriblp: parameters to control the program. The file must
      contain the following parameters, one per line:

      parameterversion: The version number of the program. This
         allows the user to be warned if an old parameter file is
         used.

      range of calculation (char OR char with integers): This allows
         the user to specify the range of the matrix. The user can
         use the range of the matrix by using an 'r'. The user can
         specify their own range by using a 'u' and then the desired
         range. The user range must be smaller than the range of the
         ribl.

      scrolling function (char OR char with integers): The matrix
         can be scrolled over itself by using 'v' and the range of
         the scroll. To not use the scrolling function, use 'n'.

      calctype: type of calculation (char): The user can use one of
         several types of calculations. In many cases, the units
         reported are in bits.

         (e) The first, specified by "e", is a measurement of the
         Euclidean distance between two positions in two matrices.
         This is done with the following equation:

            Positional distance =
               square root( (A1 - A2)^2 + (C1 - C2)^2
                          + (G1 - G2)^2 + (T1 - T2)^2 )

         This positional distance is then summed for all positions
         giving the total sum of positional distances.
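The "e" calculation can be illustrated with a minimal Python sketch. It assumes each ribl has already been read into a list with one (A, C, G, T) weight tuple per position. The function names and the example values are invented for this illustration and are not taken from diffribl.p.

```python
import math

def positional_distance(w1, w2):
    """Euclidean distance between two (A, C, G, T) weight tuples."""
    return math.sqrt(sum((x1 - x2) ** 2 for x1, x2 in zip(w1, w2)))

def total_distance(ribl_a, ribl_b):
    """Sum of the positional distances over two same-sized matrices."""
    if len(ribl_a) != len(ribl_b):
        raise ValueError("this sketch only handles same-sized ribls")
    return sum(positional_distance(p, q) for p, q in zip(ribl_a, ribl_b))

# Hypothetical two-position matrices; the weights (in bits) are made up.
ribl_a = [(2.0, 0.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]
ribl_b = [(0.0, 2.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]

print(positional_distance(ribl_a[0], ribl_b[0]))  # first position only
print(total_distance(ribl_a, ribl_b))             # summed over positions
```

Only the first position differs in this example, so the total equals the first positional distance, the square root of 8.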
         When "e" is used with the scrolling function, it calculates
         only for the overlapping part of the matrices. This feature
         can be used with both symmetric and asymmetric models.

         (o) The second, specified by "o", is a measurement of the
         Euclidean distance between two matrices. As opposed to the
         calculation done in "e", this treats each matrix as a point
         in 4^(l) dimensional space. Since there is only one point,
         in this case there are no positional differences and so the
         values in posdiff are reported the same as for the "e"
         option.

         (d) The third, specified by "d", is a measurement of
         difference between the two matrices. This is done with the
         following equation:

            Positional difference =
               (A1 - A2) + (C1 - C2) + (G1 - G2) + (T1 - T2)

         This positional difference is then summed for all positions
         giving the total sum of positional differences.

         When "d" is used with the scrolling function, it calculates
         only for the overlapping part of the matrices. This feature
         cannot be used with an asymmetric model.

         (s) The fourth, specified by "s", is a measurement of the
         average response that ribla should make to passing across
         sites in riblb. This is computed as:

            $\sum_l \sum_b f_b(b,l-offset)*Ri_a(b,l)$

         (This is LaTeX typesetting notation, \sum is sum; "_" means
         subscript.) NOTE: the frequency f is computed from the
         number of bases at the given position.

         (z) The fifth, specified by "z", is a measurement of the
         three dimensional distance between the two matrices,
         following Zhang.Zhang1991a and Zhang.Zhang1991b. Base
         frequencies are computed from the ribl data file. Then each
         set of frequencies, a, c, g, and t, for which, by
         definition,

            a + c + g + t = 1                              (1)

         can be represented in three dimensions as:

            x = (a+g) - (t+c) = 2(a+g) - 1                 (2)
            y = (a+c) - (t+g) = 2(a+c) - 1
            z = (a+t) - (g+c) = 2(a+t) - 1

         These are three independent variables defined by Zhang.
         They map into a tetrahedron in three dimensions. Zhang
         considers the above to be a 'reduced' coordinate system.
         The non-reduced system is:

            X = [sqrt(3)/4] x                              (3)
            Y = [sqrt(3)/4] y
            Z = [sqrt(3)/4] z

         The positional distance is calculated as:

            positional distance = sqrt ( (X2-X1)^2         (4)
                                       + (Y2-Y1)^2
                                       + (Z2-Z1)^2 )

         where sqrt is the square root. From Zhang.Zhang1991a
         (pages 46 and 47), this simplifies to:

            positional distance = [sqrt(3)/2]              (5)
               * sqrt( (a2-a1)^2 + (g2-g1)^2
                     + (c2-c1)^2 + (t2-t1)^2 )

         where (a1, g1, c1, t1) and (a2, g2, c2, t2) are the
         probabilities of two different matrices at one position.

         This positional distance is then summed for all positions
         giving the total sum of positional distances.

         Because frequencies sum to 1, there really are only three
         independent degrees of freedom and therefore only three
         dimensions. So equations for three and four dimensions give
         the same results.

         (y) The sixth, specified by "y", is computed as "z" and then
         it is normalized by the maximum possible distance between
         points in the tetrahedron. With frequencies, the largest
         distance in the Zhang tetrahedron is sqrt(3/2), along the
         edge. For L positions, the largest possible distance is
         therefore L * sqrt(3/2). All values are divided by this
         maximum.

         The following shows that the maximum distance in the
         non-reduced coordinate system is sqrt(3/2). Using equations
         (2) and (3), for the case of all G, the point is at

            (X, Y, Z) = (sqrt(3)/4, -sqrt(3)/4, -sqrt(3)/4)

         while for all C, the point is

            (X, Y, Z) = (-sqrt(3)/4, sqrt(3)/4, -sqrt(3)/4)

         The distance between these points is sqrt(3/2).

   drxyin: This gives the total sum of distance or difference,
      depending on which calculation function is being used. When the
      scrolling function is being used, it will report the total sum
      value along with the position of the scroll. The position of
      the scroll is the distance between the zero coordinates of the
      two matrices.
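The "z" and "y" calculations described under calctype can be checked numerically with a short Python sketch. It follows equations (2) through (5) as written in this page, and it assumes that base frequencies are supplied as (a, c, g, t) tuples summing to 1. All function names are hypothetical and do not come from diffribl.p.

```python
import math

def zhang_xyz(freqs):
    """Map (a, c, g, t) frequencies to non-reduced (X, Y, Z)."""
    a, c, g, t = freqs
    k = math.sqrt(3) / 4            # scale factor of equation (3)
    x = 2 * (a + g) - 1             # equation (2)
    y = 2 * (a + c) - 1
    z = 2 * (a + t) - 1
    return (k * x, k * y, k * z)

def zhang_distance(f1, f2):
    """Equation (4): distance between two positions in X, Y, Z space."""
    p1, p2 = zhang_xyz(f1), zhang_xyz(f2)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p1, p2)))

def shortcut_distance(f1, f2):
    """Equation (5): the same distance taken directly from frequencies."""
    squares = sum((u - v) ** 2 for u, v in zip(f1, f2))
    return math.sqrt(3) / 2 * math.sqrt(squares)

def normalized_total(freqs_a, freqs_b):
    """The "y" idea: summed distance divided by L * sqrt(3/2)."""
    total = sum(zhang_distance(p, q) for p, q in zip(freqs_a, freqs_b))
    return total / (len(freqs_a) * math.sqrt(3 / 2))

all_g = (0.0, 0.0, 1.0, 0.0)        # tuple order is (a, c, g, t)
all_c = (0.0, 1.0, 0.0, 0.0)

print(zhang_distance(all_g, all_c), math.sqrt(3 / 2))
print(shortcut_distance(all_g, all_c))
print(normalized_total([all_g, all_g], [all_c, all_g]))
```

For the all G versus all C case, equations (4) and (5) both give sqrt(3/2), up to floating-point rounding. In the two-position example only one position differs maximally, so the normalized value is one half.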
   output: messages to the user

description

   This program looks at the differences in two individual
   information weight matrices (ribls) by finding the difference in
   information at each position, for each base, and then summing the
   differences. Then all of the differences at each position are
   summed to express a diffribl value.

   The program now also has a number of other ways of comparing the
   ribls, depending on a user parameter.

examples

   examples of diffriblp

1.24         version of diffribl that this parameter file is designed for.
r u -10 +10  r use the from/to coords from ribl, u means use user specified
n v -21 +21  v and coords=move riblB across riblA for the range, n=no
eodszy       e:Euclid, o:Euclid4d, d:difference, s:scan, z:Zhang, y:znorm

documentation

@article{Shultzaberger.Schneider2001,
author = "R. K. Shultzaberger and R. E. Bucheimer and K. E. Rudd
 and T. D. Schneider",
title = "{Anatomy of \emph{Escherichia coli} Ribosome Binding Sites}",
journal = "J. Mol. Biol.",
volume = "313",
pages = "215-228",
comment = "Shultzaberger.Schneider.flexrbs",
note = "\htmladdnormallink
{https://alum.mit.edu/www/toms/paper/flexrbs/}
{https://alum.mit.edu/www/toms/paper/flexrbs/}",
year = "2001"}

see also

   example parameter file: diffriblp

   Description of use is in Shultzaberger.Schneider2001:
   https://alum.mit.edu/www/toms/paper/flexrbs/

   program that generates ribls: ri.p
   program that uses ribls to find sites: scan.p
   graphics program for xyin: xyplo.p
   source of program modules: lister.p

author

   Ryan Shultzaberger
   Thomas D. Schneider
   Zehua Chen

bugs

   There is a problem with comparing different sized ribls. I (Ryan?)
   need to fix this. For now, only use this program with same sized
   ribls. The result will be wrong if done otherwise.

   Comparisons in 4 dimensional space are not appropriate because the
   4 probabilities are not independent. To avoid this, one can
   replace the 4 dimensional space with a 3 dimensional one according
   to Zhang's methods:

@article{Zhang.Zhang1991a,
author = "C.-T. Zhang and R. Zhang",
title = "Diagrammatic representation of the distribution of {DNA}
bases and its applications",
journal = "Int. J. Biol. Macromol.",
volume = "13",
pages = "45-49",
note = "tetrahedron method",
year = "1991"}

@article{Zhang.Zhang1991b,
author = "C.-T. Zhang and R. Zhang",
title = "Analysis of distribution of bases in the coding sequences
by a diagrammatic technique",
journal = "Nucleic Acids Res.",
volume = "19",
pages = "6313-6317",
note = "tetrahedron method",
year = "1991"}

@article{Zhang1997,
author = "C.-T. Zhang",
title = "A Symmetrical Theory of {DNA} Sequences and Its Applications",
journal = "J. Theor. Biol.",
volume = "187",
pages = "297-306",
year = "1997"}

   To do this, the Ri can be converted to probabilities according to
   Ri = 2 + log2(Pi). Then the probabilities are converted to the
   Zhang XYZ space. Distances are then measured in that XYZ space.
   However it is better to use the Pi directly from the ribl file.

technical notes

*)
(* end module describe.diffribl *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}
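The conversion mentioned in the bugs section, Ri = 2 + log2(Pi), can be inverted as Pi = 2^(Ri - 2). The Python sketch below shows only that inversion for one position. The function name is invented for this illustration, and, as the bugs section says, taking the Pi directly from the ribl file is the better approach.

```python
def ri_to_probabilities(ri_weights):
    """Invert Ri = 2 + log2(Pi) for one position's (A, C, G, T) weights."""
    return tuple(2.0 ** (ri - 2.0) for ri in ri_weights)

# Equiprobable bases: Pi = 0.25 corresponds to Ri = 2 + log2(0.25) = 0 bits.
print(ri_to_probabilities((0.0, 0.0, 0.0, 0.0)))

# A base with Ri = 1 bit corresponds to Pi = 0.5.
print(ri_to_probabilities((1.0, 0.0, 0.0, 0.0))[0])
```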