By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 2.29; (* of calhnb.p 2005 Jul 16}
(* begin module describe.calhnb *)
(*
name
calhnb: small-sample correction for information and uncertainty
synopsis
calhnb(fin: in, fout: out, output: out)
files
fin: the genomic composition (integers) on one line followed by
a set of integers, one per line representing values of n
fout: a table showing n, e(hnb), ae(hnb) and their difference.
The variances var(hnb) and avar(hnb) are tabulated along with
the difference between their square roots, that is, the difference
between the standard deviations. e(n) is the genomic uncertainty
minus e(hnb). Finally, sd(n) = sqrt(var(hnb)) is given.
output: messages to the user.
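As an illustration of the fin layout described above, here is a minimal
Python sketch of a reader. It is not part of calhnb; the function name
parse_fin is hypothetical, and the sketch assumes that blank lines can be
skipped and that only the first integer on each later line is the value of n.
The real calhnb.fin may carry details this sketch does not handle.

```python
def parse_fin(text):
    """Split fin-style text into (composition, list of n values).

    Assumed layout: the first non-blank line holds the genomic
    composition as integers; each later non-blank line starts with
    one integer n.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    composition = [int(token) for token in rows[0]]
    sample_sizes = [int(fields[0]) for fields in rows[1:]]
    return composition, sample_sizes
```

Calling parse_fin on a text with an equal composition on the first line and
the values 2 and 3 on the following lines returns the four composition
counts and the list of the two sample sizes.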
describe
Given a genomic composition and a series of integers (n) that represent
the number of sample sites, calhnb calculates the sampling error as e(hnb)
and the variance var(hnb). It also finds the approximations ae(hnb) and
avar(hnb). These values are presented in a table along with the
differences between the exact and approximate calculations. This table
will allow a user to decide when to use the approximations. Beware that
the exact calculation becomes very expensive for large n. For this
reason, I use the approximate computation for n > 20 in rseq and alpro.
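The two calculations described above can be sketched in Python as follows.
This is an illustrative sketch under stated assumptions, not the code of
calhnb.p: the function names are hypothetical, the exact value is taken to
be the multinomial-weighted average of the sample uncertainty over every way
of distributing n sites among four bases, and the approximation is taken to
be the genomic uncertainty minus (s - 1)/(2 ln(2) n) with s = 4 symbols.
The approximate variance avar(hnb) is not reproduced here.

```python
import math

def uncertainty(probs):
    """Shannon uncertainty in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def exact_hnb(composition, n):
    """Return (e(hnb), var(hnb)) by full enumeration of base counts.

    Assumes four symbols. The number of count combinations grows
    roughly as n cubed, which is why this is costly for large n.
    """
    total = sum(composition)
    p = [c / total for c in composition]
    mean = 0.0
    mean_sq = 0.0
    for na in range(n + 1):
        for nc in range(n - na + 1):
            for ng in range(n - na - nc + 1):
                nt = n - na - nc - ng
                counts = (na, nc, ng, nt)
                # Multinomial coefficient n! / (na! nc! ng! nt!)
                coef = math.factorial(n)
                for k in counts:
                    coef //= math.factorial(k)
                # Probability of seeing exactly these counts
                prob = float(coef)
                for pk, k in zip(p, counts):
                    prob *= pk ** k
                h = uncertainty([k / n for k in counts])
                mean += prob * h
                mean_sq += prob * h * h
    return mean, mean_sq - mean * mean

def approx_hnb(composition, n):
    """Return ae(hnb), assuming the (s - 1)/(2 ln(2) n) correction."""
    total = sum(composition)
    hg = uncertainty([c / total for c in composition])
    s = len(composition)
    return hg - (s - 1) / (2 * math.log(2) * n)
```

For an equiprobable composition and n = 2, the enumeration gives
e(hnb) = 0.75 bits: the two sites match with probability 1/4 (uncertainty 0)
and differ with probability 3/4 (uncertainty 1 bit). Comparing exact_hnb and
approx_hnb over a range of n gives the kind of difference column that fout
tabulates.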
examples
When calhnb.fin is used as fin, the program should write output identical
to calhnb.fout into fout. The data should be identical to those given in
Figure A.2 on page 428 of the Appendix of Schneider et al. 1986.
documentation
"Information content of binding sites on nucleotide sequences"
T. D. Schneider, G. D. Stormo, L. Gold, and A. Ehrenfeucht
JMB 188:415-431 (1986) [see link below]
see also
Example input file, fin: calhnb.fin
Corresponding output file, fout: calhnb.fout
fin file for values up to n = 50: calhnb.50.fin
fout file for values up to n = 50: calhnb.50.fout
Discussion about correcting for small sample size:
https://alum.mit.edu/www/toms/small.sample.correction.html
Schneider et al. (1986):
https://alum.mit.edu/www/toms/paper/schneider1986
related programs: rseq.p, alpro.p
author
Thomas D. Schneider
bugs
It would be nice to have a generalized algorithm for any number
of symbols.
*)
(* end module describe.calhnb *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}