APPENDIX

Calculation of Sampling Uncertainty and Variance
Thomas D. Schneider, Jeffrey S. Haemer and Gary D. Stormo

Using sampling frequencies in place of population probabilities leads to a bias in the uncertainty measurement H (Basharin, 1959). Here we discuss two methods to find the correction factor when estimating H from a few examples. The first method uses an exact calculation of the average uncertainty for small samples. The probability of obtaining a particular combination of n bases, nb, can be found from a multinomial distribution. The information for the combination, H_nb, is calculated and weighted by the probability of obtaining the combination. The weighted information summed for all combinations is the desired result, the expectation of H_nb, E(H_nb). The second method uses a formula to approximate the correction factor.
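The exact calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the function names (`compositions`, `uncertainty_bits`, `multinomial_probability`, `expected_uncertainty`) are hypothetical, uncertainty is measured in bits, and the population probabilities default to four equiprobable bases.

```python
from math import factorial, log2


def compositions(n, parts):
    """Yield every tuple of `parts` non-negative integers that sums to n.

    Each tuple is one combination nb = (na, nc, ng, nt) of n bases.
    """
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def uncertainty_bits(counts):
    """H_nb: uncertainty (bits) computed from the sample frequencies."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def multinomial_probability(counts, probs):
    """Probability of drawing exactly `counts` given population `probs`."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # stays an exact integer at every step
    p = float(coeff)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p


def expected_uncertainty(n, probs=(0.25, 0.25, 0.25, 0.25)):
    """E(H_nb): H_nb weighted by its probability, summed over all nb.

    The default `probs` (equiprobable bases) is an assumption of this sketch.
    """
    return sum(
        multinomial_probability(counts, probs) * uncertainty_bits(counts)
        for counts in compositions(n, len(probs))
    )
```

For example, with n = 2 and equiprobable bases, the two sampled bases match with probability 1/4 (H_nb = 0) and differ with probability 3/4 (H_nb = 1 bit), so `expected_uncertainty(2)` gives 0.75 bits, well below the 2 bits of the underlying population. That gap is the small-sample bias the correction factor is meant to account for. The number of combinations grows quickly with n, so this brute-force enumeration is practical only for small samples.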

(a) Exact method
(b) Approximate method
(c) Use of the Correction Factor
(d) Variance of the Correction Factor

Tom Schneider
2002-10-16