{version = 5.27; (* of comp.p 1999 Oct 13}
(* begin module describe.comp *)
(*
name
comp: determine the composition of a book.
synopsis
comp(book: in, cmp: out, compp: in, output: out)
files
book: the sequences;
cmp: the composition, determined for mononucleotides up to
oligonucleotides of length "compmax", see file compp;
compp: parameter file used to set the length of the oligonucleotides for
which the composition is to be determined ("compmax"); that number
must be the first thing in the file; if the file is empty
compmax is set by default to the constant "defcompmax";
output: for messages to the user.
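The rule for compp can be sketched as follows. This is an illustrative Python sketch, not the Pascal source: the function name read_compmax is hypothetical, the value given to DEFCOMPMAX is a placeholder (this page names the constant "defcompmax" but does not state its value), and treating a whitespace-only file as empty is an assumption.

```python
DEFCOMPMAX = 4  # placeholder only; the real value of "defcompmax" is not given here


def read_compmax(text, default=DEFCOMPMAX):
    """Return the first whitespace-delimited token of text as an integer.

    If text holds no tokens at all (an empty parameter file), return default.
    """
    tokens = text.split()
    if not tokens:
        return default
    return int(tokens[0])
```

For example, read_compmax("3") gives 3, and read_compmax("") falls back to the default.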
description
Comp counts the occurrences of each oligonucleotide (from length 1 to
compmax) in the book and prints the counts to file "cmp". The output is
printed in order of increasing oligonucleotide length (i.e., first
the monos, then the dis, ...). If there are no occurrences of an
oligonucleotide but its one-shorter parent did occur, it is printed with
a count of zero, and none of its descendants are printed in the
composition file.
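The counting and the printing rule above can be sketched in Python. This is a brute-force illustration, not the algorithm comp uses. The names composition and report_lines are hypothetical, and the sketch assumes that the "parent" of an oligonucleotide is the oligonucleotide with its last base removed and that all four mononucleotides are always listed.

```python
from collections import Counter


def composition(sequence, compmax):
    """Count every oligonucleotide of length 1..compmax in sequence."""
    counts = Counter()
    for length in range(1, compmax + 1):
        for start in range(len(sequence) - length + 1):
            counts[sequence[start:start + length]] += 1
    return counts


def report_lines(counts, compmax, alphabet="acgt"):
    """List (oligo, count) pairs in order of increasing oligo length.

    An oligo is listed only if its parent occurred; an oligo with a zero
    count is listed, but none of its descendants are.
    """
    lines = []
    parents = [""]  # the empty oligo is the parent of every mononucleotide
    for length in range(1, compmax + 1):
        children = []
        for parent in parents:
            for base in alphabet:
                oligo = parent + base
                lines.append((oligo, counts[oligo]))
                if counts[oligo] > 0:
                    children.append(oligo)
        parents = children
    return lines
```

With the sequence "aacg" and compmax 2, "t" is listed with a count of zero, and no dinucleotide starting with "t" is listed.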
examples
As an example of the output format, the composition to depth 3 of E. coli
(U00096, 16-OCT-1997) is:
comp 5.27: composition of
* 1999/05/04 14:41:13, 1999/05/04 14:38:08, dbbk 3.33
3 is the longest oligo counted
*
0-long oligos (the total number of bases)
4639221
*
1-long oligos
a 1142136 c 1179433 g 1176775 t 1140877
*
2-long oligos
aa 337835 ac 256658 ag 237851 at 309792
ca 325118 cc 271649 cg 346636 ct 236029
ga 267234 gc 383865 gg 270083 gt 255593
ta 211948 tc 267261 tg 322205 tt 339463
*
3-long oligos
aaa 108901 aac 82578 aag 63364 aat 82992
aca 58633 acc 74899 acg 73263 act 49863
aga 56618 agc 80848 agg 50611 agt 49774
ata 63692 atc 86476 atg 76229 att 83395
caa 76607 cac 66752 cag 104785 cat 76974
cca 86442 ccc 47764 ccg 87031 cct 50412
cga 70934 cgc 115673 cgg 86870 cgt 73159
cta 26762 ctc 42714 ctg 102900 ctt 63653
gaa 83490 gac 54737 gag 42460 gat 86547
gca 96010 gcc 92961 gcg 114609 gct 80285
gga 56199 ggc 92123 ggg 47470 ggt 74291
gta 52670 gtc 54225 gtg 66108 gtt 82590
taa 68837 tac 52591 tag 27241 tat 63279
tca 84033 tcc 56025 tcg 71733 tct 55469
tga 83483 tgc 95221 tgg 85132 tgt 58369
tta 68824 ttc 83846 ttg 76968 ttt 109825
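A file in this format can be read back and checked for internal consistency. The Python sketch below assumes the layout is exactly as shown in the example (lines made up only of oligo/count pairs, everything else ignored). The names parse_counts and extensions_total are hypothetical and are not part of comp.

```python
def parse_counts(text):
    """Collect oligo/count pairs from lines that consist only of such pairs."""
    counts = dict()
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or len(tokens) % 2 != 0:
            continue
        names, values = tokens[0::2], tokens[1::2]
        names_ok = all(set(name) <= set("acgt") for name in names)
        values_ok = all(value.isdigit() for value in values)
        if names_ok and values_ok:
            for name, value in zip(names, values):
                counts[name] = int(value)
    return counts


def extensions_total(counts, parent):
    """Sum the counts of the four one-longer oligos that start with parent."""
    return sum(counts.get(parent + base, 0) for base in "acgt")
```

For the E. coli example, the four mononucleotide counts sum to 4639221, the 0-long total. The dinucleotides starting with "a" sum to 1142136, the count of "a", while those starting with "c" sum to 1179432, one less than the count of "c". That is what one would expect if the sequence is treated as linear and its last base is c.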
see also
compan.p, histan.p, markov.p
authors
Gary Stormo and Tom Schneider
bugs
none known
technical note
The algorithm is an interesting application of linked lists. The
composition is stored as a tree, and a number of "spiders" climb the
tree during its construction.
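One possible reading of this note is sketched below in Python. It is an interpretation, not the Pascal source: the names Node, build_tree and lookup are hypothetical, and the behavior of the "spiders" is assumed. In this reading a new spider starts at the root for every base read, every live spider steps to the child for that base (creating it if needed) and increments its count, and a spider retires once it reaches depth compmax. At most compmax spiders are alive at a time, so the sequence is scanned only once.

```python
class Node:
    """One tree node: the count of an oligo plus links to its one-longer children."""

    def __init__(self):
        self.count = 0
        self.children = dict()  # maps a base to the child Node


def build_tree(sequence, compmax):
    """Build the composition tree in a single pass over sequence."""
    root = Node()
    spiders = []  # (node, depth) pairs, one per oligo still being extended
    for base in sequence:
        root.count += 1  # the root holds the 0-long count (total bases)
        if compmax > 0:
            spiders.append((root, 0))  # a new spider starts at every base
        advanced = []
        for node, depth in spiders:
            child = node.children.get(base)
            if child is None:
                child = Node()
                node.children[base] = child
            child.count += 1
            if depth + 1 < compmax:
                advanced.append((child, depth + 1))  # keep climbing next base
        spiders = advanced
    return root


def lookup(root, oligo):
    """Return the stored count for oligo, or 0 if it has no node."""
    node = root
    for base in oligo:
        node = node.children.get(base)
        if node is None:
            return 0
    return node.count
```

For the sequence "aacg" with compmax 2, lookup(root, "") is 4, lookup(root, "a") is 2, and lookup(root, "aa") is 1.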
*)
(* end module describe.comp *)
{This manual page was created by makman 1.45}