{version = 1.53; (* of encfrq.p 1994 sep 5}
(* begin module describe.encfrq *)
(*
name
      encfrq: encoded sequence frequency analysis

synopsis
      encfrq(encseq: in, cmp: in, fout: out, output: out)

files
      encseq:  the output of the encode program.
      cmp:  a composition from the comp program.
      fout:  frequency tables for each parameter set, each followed by
         a table of z values, one for each frequency.  if cmp is empty,
         equal frequencies are assumed.
      output:  messages to the user.

description
      the frequency of each n-tide (mononucleotide, dinucleotide, etc.)
      is displayed in fout.  each frequency takes into account the actual
      number of sequences that pass through a particular n-tide and
      position (ie, a parameter window).

      a second set of tables gives z values.  these are calculated from
      three quantities: p, the probability of obtaining the n-tide, taken
      from the composition in cmp; b, the actual number of occurrences;
      and n, the number of sequences at that position.  b follows a
      binomial distribution with mean m = np and standard deviation
      s = sqrt(npq), where q = 1 - p.  b is then normalized to obtain z:

            z = (b - m)/s

      if n is large, z is approximately normally distributed, and its
      probability can be found in any table of the normal distribution
      (use a two-tailed test).  a rule of thumb for when the normal
      approximation can be used is that both np and n(1 - p) should be
      greater than 5.  locations that violate this rule are marked with
      a '*'.

      locations of the z table that contain z values of 3 or greater are
      displayed to the right of the z table.  since these look somewhat
      like a dna footprint, they are called z-footprints.  the output for
      dinucleotide z-footprints is very wide, so it must be split up with
      the split program.  recommended values for splitp are

            p/14/112/4

      where each slash means "start a new line".

see also
      encode.p, comp.p, split.p

author
      thomas d. schneider

bugs
      none known
*)
(* end module describe.encfrq *)
{This manual page was created by makman 1.45}{created by htmlink 1.62}
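the z-table arithmetic described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not encfrq's own pascal code: the names equal_probability, frequency, z_entry, two_tailed_p and ZEntry are hypothetical; "equal frequencies" is assumed to mean 1/4 per base (so p = 1/16 for a dinucleotide); and leaving z undefined when s = 0 is an assumed way of handling an empty or degenerate window.

```python
import math
from typing import NamedTuple, Optional

# hypothetical sketch of the z-table arithmetic described in the encfrq
# manual page; this is NOT the pascal source of encfrq.


class ZEntry(NamedTuple):
    z: Optional[float]  # None when s = 0 (assumption: z left undefined)
    approx_ok: bool     # False where encfrq would print a '*'
    footprint: bool     # True when z >= 3 (part of a z-footprint)


def equal_probability(ntide_length: int, alphabet_size: int = 4) -> float:
    """p for one n-tide when cmp is empty.

    assumption: "equal frequencies" means every base has probability
    1/alphabet_size, so a dinucleotide has p = 1/16.
    """
    return 1.0 / alphabet_size ** ntide_length


def frequency(b: int, n: int) -> Optional[float]:
    """observed frequency: b occurrences among the n sequences that
    actually cover this position (None if no sequence covers it)."""
    return b / n if n > 0 else None


def z_entry(b: int, n: int, p: float) -> ZEntry:
    """binomial z value for one n-tide at one position.

    b: actual number of occurrences of the n-tide
    n: number of sequences at that position
    p: probability of obtaining the n-tide (from the composition)
    """
    q = 1.0 - p
    m = n * p                  # mean, m = np
    s = math.sqrt(n * p * q)   # standard deviation, s = sqrt(npq)
    # rule of thumb for the normal approximation: np > 5 and n(1-p) > 5
    approx_ok = n * p > 5 and n * q > 5
    if s == 0.0:
        return ZEntry(None, approx_ok, False)
    z = (b - m) / s
    return ZEntry(z, approx_ok, z >= 3)


def two_tailed_p(z: float) -> float:
    """two-tailed probability under the standard normal distribution,
    2 * (1 - Phi(|z|)), written with the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    e = z_entry(40, 100, 0.25)
    print(round(e.z, 4), e.approx_ok, e.footprint)  # 3.4641 True True
```

for example, with n = 100 sequences, p = 0.25 and b = 40 occurrences, m = 25, s = sqrt(18.75) and z = 2 sqrt(3), about 3.46; both np = 25 and n(1 - p) = 75 exceed 5, and z is above 3, so such a window would appear in the z-footprint.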