{version = 1.53; (* of encfrq.p 1994 sep 5}
(* begin module describe.encfrq *)
(*
name
encfrq: encoded sequence frequency analysis
synopsis
encfrq(encseq: in, cmp: in, fout: out, output: out)
files
encseq: the output of the encode program
cmp: a composition from the comp program. if cmp is empty, then equal
frequencies are assumed.
fout: frequency tables for each parameter set. these are followed
by z values for each frequency.
output: messages to the user.
description
the frequency of each n-tide (mono-, di-, etc.) is displayed in
fout. the actual number of sequences passing through a particular
n-tide and position (i.e., a parameter window) is taken into account.
a second set of tables, giving z values, is also presented.
these are calculated from the composition provided in cmp (p, the
probability of obtaining the n-tide), the actual number of
occurrences (b) and the number of sequences at that position (n).
the distribution of b can be described as a binomial distribution,
with mean m = np and standard deviation s = sqrt(npq), where q = 1-p.
b is then normalized to obtain z: z = (b-m)/s. if n is large, then z is
normally distributed, and the probabilities can be found on any
table for the normal distribution (use a two-tailed test). a rule
of thumb for when the normal distribution can be used is that
both np and n(1-p) should be greater than 5. locations that violate
this rule are marked with a '*'.
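
the z calculation and the rule of thumb above can be sketched as
follows. this is a minimal illustration in python, not the code of
encfrq itself: the function name z_value and its return convention
are assumptions made for the example. b, n and p have the meanings
given above, and the second return value is false where the page
would mark the location with a '*'.

```python
import math


def z_value(b, n, p):
    """Return (z, normal_ok) for b occurrences among n sequences.

    p is the probability of the n-tide taken from the composition.
    normal_ok is False when the rule of thumb (np > 5 and n(1-p) > 5)
    is violated, i.e. where the z table would carry a '*'.
    """
    q = 1.0 - p
    m = n * p                      # binomial mean
    s = math.sqrt(n * p * q)       # binomial standard deviation
    if s == 0.0:
        # degenerate case (n = 0, p = 0 or p = 1): z is undefined
        return float("nan"), False
    z = (b - m) / s
    normal_ok = n * p > 5 and n * q > 5
    return z, normal_ok


# 40 occurrences in 100 sequences with p = 0.25: z is about 3.46
z, ok = z_value(40, 100, 0.25)

# 6 occurrences in 12 sequences with p = 0.25: z = 2.0, but np = 3,
# so the normal approximation is not trusted (ok_small is False)
z_small, ok_small = z_value(6, 12, 0.25)
```

with p = 0.25 the first call corresponds to a mononucleotide under
equal frequencies. its z exceeds 3, so that location would appear in
the z-footprint described below.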
locations of the z table that contain z values of 3 or greater are
displayed to the right of the z table. since these look somewhat
like a dna footprint, they are called z-footprints. the output
for dinucleotide z-footprints is very wide, so one must split
it up using the split program. recommended values for splitp are
p/14/112/4, where the slash means "start a new line".
see also
encode.p, comp.p, split.p
author
thomas d. schneider
bugs
none known
*)
(* end module describe.encfrq *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}