

Calculation of Sampling Uncertainty and Variance
Thomas D. Schneider, Jeffrey S. Haemer and Gary D. Stormo

Using sampling frequencies in place of population probabilities leads to a bias in the uncertainty measurement H (Basharin, 1959). Here we discuss two methods to find the correction factor when estimating H from a few examples. The first method uses an exact calculation of the average uncertainty for small samples. The probability of obtaining a particular combination of n bases, nb, can be found from a multinomial distribution. The information for the combination, H_nb, is calculated and weighted by the probability of obtaining the combination. The weighted information summed for all combinations is the desired result, the expectation of H_nb, E(H_nb). The second method uses a formula to approximate the correction factor.
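The first method can be illustrated with a minimal Python sketch. It is not the authors' program: the function names uncertainty and expected_uncertainty are hypothetical, a four-letter alphabet (a, c, g, t) is assumed, and the population probabilities default to equiprobable bases, which is also an assumption. The sketch enumerates every combination of counts that sums to n, weights the uncertainty of each combination by its multinomial probability, and sums the results.

```python
from itertools import product
from math import factorial, log2


def uncertainty(counts):
    """Uncertainty H (in bits) of one combination of base counts."""
    n = sum(counts)
    # Terms with a zero count contribute nothing and are skipped.
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def expected_uncertainty(n, probs=(0.25, 0.25, 0.25, 0.25)):
    """Expected uncertainty over all samples of n bases.

    probs holds the assumed population probabilities of a, c, g, t
    (equiprobable by default; this default is an illustrative assumption).
    """
    total = 0.0
    # Enumerate the counts of a, c and g; the count of t is what remains.
    for na, nc, ng in product(range(n + 1), repeat=3):
        nt = n - na - nc - ng
        if nt < 0:
            continue
        counts = (na, nc, ng, nt)
        # Multinomial coefficient n! / (na! nc! ng! nt!), kept as an integer.
        coeff = factorial(n)
        for c in counts:
            coeff //= factorial(c)
        # Probability of drawing this particular combination.
        prob = float(coeff)
        for c, p in zip(counts, probs):
            prob *= p ** c
        total += prob * uncertainty(counts)
    return total
```

Under these assumptions, expected_uncertainty(2) is 0.75 bits: two bases drawn from an equiprobable population match with probability 1/4 (H = 0) and differ with probability 3/4 (H = 1 bit). The gap between this value and the population uncertainty of 2 bits is the small-sample bias that the correction factor has to account for. The number of combinations grows roughly as the cube of n, so the enumeration is only practical for small samples.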

