By downloading this code you agree to the Source Code Use License.
{version = 8.09; (* of sites.p 2002 Mar 6}
(* begin module describe.sites *)
(*
name
sites: analyse sites from randomized sequence database
synopsis
sites(database: in, standard: in,
caps: out, latex: out, list: out, sorted: out,
stats: out, tables: out, rsdata: out,
sequ: out, makebkp: out, output: out)
files
database: database consisting of DNA sequence data.
The first line is the name of the database.
The remaining lines consist of experimental packages.
The start of a package is a line like:
@ -27 11 -21 5 0.85
The '@' must be left justified as the first character on the
line. The numbers are defined to be:
@ FROM.range TO.range FROM.random TO.random fraction.canonical
FROM.range: the coordinate of the first base reported in the database
TO.range: the coordinate of the last base reported in the database
FROM.random: the coordinate of the first randomized base
TO.random: the coordinate of the last randomized base
fraction.canonical: the fraction of the canonical base during
chemical synthesis.
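The header format can be illustrated with a minimal Python sketch (sites itself is written in Pascal). The class PackageHeader, the function parse_header and the range check are hypothetical and are not taken from sites.p. The check that the randomized region lies inside the reported range is an assumption about what makes a header sensible.

```python
from dataclasses import dataclass


@dataclass
class PackageHeader:
    # Field names mirror FROM.range, TO.range, FROM.random, TO.random and
    # fraction.canonical, renamed to valid Python identifiers.
    from_range: int
    to_range: int
    from_random: int
    to_random: int
    fraction_canonical: float


def parse_header(line: str) -> PackageHeader:
    """Parse a package header such as '@ -27 11 -21 5 0.85'."""
    if not line.startswith("@"):
        raise ValueError("a package header must start with '@' in column 1")
    fields = line[1:].split()
    if len(fields) != 5:
        raise ValueError("expected 5 numbers after '@'")
    header = PackageHeader(int(fields[0]), int(fields[1]),
                           int(fields[2]), int(fields[3]),
                           float(fields[4]))
    # ASSUMPTION: the randomized region must lie inside the reported range.
    if not (header.from_range <= header.from_random
            <= header.to_random <= header.to_range):
        raise ValueError("randomized region lies outside the reported range")
    if not 0.0 <= header.fraction_canonical <= 1.0:
        raise ValueError("fraction.canonical must be between 0 and 1")
    return header


h = parse_header("@ -27 11 -21 5 0.85")
# Bases reported per sequence: 11 - (-27) + 1
print(h.to_range - h.from_range + 1)  # 39
```

For this header each sequence line should carry 39 bases, which is the length of the example sequences shown in the description section.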
The next line defines the canonical sequence which was 'randomized'. It
is in the format of the remaining sequences. The first sequence in the
package is always the standard, so do not forget to include it!
The randomized sequences follow the standard. Each line, whether the
standard or a randomized sequence, consists of these fields:
DNA sequence, plasmid name, primer, experiment, date (year, month, day)
separated by single spaces rather than commas.
The sequence may contain any of the characters: "acgtxd.".
"x" means that the base is not known. "d" means that the base was
deleted. The program rejects sequences containing x or d (so that only
clean data are analyzed), but these codes allow such sequences to be
stored in the database. "." means 'the same as the standard sequence in
this position'. This allows one to enter sequences as a set of changes
from the standard.
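The record layout and the meaning of ".", "x" and "d" can be sketched in Python as follows. This is a minimal sketch assuming fields are separated by exactly one space; Record, parse_record and expand are hypothetical names, and returning None for a rejected sequence is an illustrative choice rather than the behavior of sites.p.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    sequence: str
    plasmid: str
    primer: str
    experiment: str
    date: tuple  # (year, month, day) as written, e.g. ("87", "nov", "2")


def parse_record(line: str) -> Record:
    # Fields: DNA sequence, plasmid, primer, experiment, year, month, day,
    # separated by single spaces.
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 7:
        raise ValueError("expected 7 space-separated fields")
    seq, plasmid, primer, experiment, year, month, day = parts
    if not set(seq) <= set("acgtxd."):
        raise ValueError("sequence may only contain the characters acgtxd.")
    return Record(seq, plasmid, primer, experiment, (year, month, day))


def expand(record: Record, standard: Record) -> Optional[str]:
    """Replace each '.' by the standard base at that position.

    Returns None when the sequence contains 'x' (unknown) or 'd' (deleted):
    such sequences may be stored but are kept out of the analysis.
    """
    if len(record.sequence) != len(standard.sequence):
        raise ValueError("sequence and standard differ in length")
    if "x" in record.sequence or "d" in record.sequence:
        return None
    return "".join(s if c == "." else c
                   for c, s in zip(record.sequence, standard.sequence))


std = parse_record("gaattcaaattaatacgactcactatagggagaaagctt pTS37 kc7 ex100 87 nov 2")
rec = parse_record("..............t.........t......a....... pTS331 1204 ex394 87 nov 2")
print(expand(rec, std))  # gaattcaaattaattcgactcactttagggaaaaagctt
```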
The next experimental package begins with another '@'. The data from
each experimental package are gathered as frequencies and normalized by
using the given canonical base frequency. The normalized frequencies
from all the packages are averaged to produce the final results. This
allows one to combine several experiments; however, all experiments are
given the same weight. This is reasonable if the experiments have similar
canonical frequencies and numbers of sequences, but is probably not
correct if one experiment carries more "importance" than another. A method
for accounting for these different weightings is not known.
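The manual does not spell out the normalization formula, so the following Python sketch assumes one plausible reading. It assumes the synthesis mixture held the canonical base at fraction.canonical and each of the other three bases at (1 - fraction.canonical)/3, and it divides each observed frequency by that synthesis probability before rescaling to sum to 1. The equal-weight averaging follows the text above. All function names are hypothetical.

```python
BASES = "acgt"


def base_frequencies(sequences, position):
    # Observed frequency of a, c, g, t at one position (0-based index),
    # counted over the clean, expanded sequences of one package.
    counts = [0, 0, 0, 0]
    for s in sequences:
        counts[BASES.index(s[position])] += 1
    total = sum(counts)
    return [c / total for c in counts]


def normalize(freqs, canonical, fraction_canonical):
    # ASSUMPTION: at synthesis the canonical base had probability
    # fraction_canonical and each other base (1 - fraction_canonical) / 3.
    # Each observed frequency is divided by its synthesis probability and
    # the results are rescaled to sum to 1.
    if not 0.0 < fraction_canonical < 1.0:
        raise ValueError("fraction.canonical must lie strictly between 0 and 1")
    other = (1.0 - fraction_canonical) / 3.0
    weighted = [f / (fraction_canonical if b == canonical else other)
                for f, b in zip(freqs, BASES)]
    total = sum(weighted)
    return [w / total for w in weighted]


def average_packages(per_package):
    # Every package gets the same weight, regardless of how many
    # sequences it contains.
    n = len(per_package)
    return [sum(p[i] for p in per_package) / n for i in range(4)]


# With no selection the observed frequencies equal the doping levels,
# so the normalized result is (to rounding) uniform.
print(normalize([0.85, 0.05, 0.05, 0.05], "a", 0.85))
```

Under this assumption a position that was not selected at all normalizes to about 0.25 for every base, and any departure from uniformity reflects selection.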
standard: Use the rsdata output of the rseq program from the natural
sequences as your standard. It is used for statistical comparison of the
experiment to wild-type sequences.
caps: listing of the database sorted and with capital letters showing
changes from the standard and database errors.
latex: just like list, but in a form that can be run through the typesetting
program LaTeX.
list: listing of the database in an easy-to-read format showing only the
changes from the standard. Also gives the tables of numbers of bases.
sorted: the list sorted by sequence
stats: frequency statistics of the database differences and a
summary of information results.
tables: frequency tables for various stages of the normalization.
rsdata: This simulates the output of the rseq program by giving the
numbers of bases (b) at each position (i). When the frequency tables are
normalized in this program, the effective number of sequences is lost.
To make sure that the numbers reported in rsdata are accurate, they are
multiplied by the constant scaleup. The table can be run through dalvec
and makelogo to make a sequence logo. The variance, varhnb, is set to a
negative value to indicate that no method is known for calculating it.
An earlier version of the program gave the minimum error based on the
number of sequences in the database, but people tended to overlook this
when looking at the final sequence logo and so were unduly impressed by
the data.
sequ: raw sequences (after processing) ready for makebk
makebkp: input for makebk to create the book
output: messages to the user
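The effect of scaleup on the rsdata numbers can be sketched in Python. This is a minimal sketch: the value 1000 for SCALEUP is hypothetical (sites.p defines its own constant), and the functions cover only the arithmetic, not the rsdata file layout. In rsdata, varhnb would additionally be written as a negative number.

```python
SCALEUP = 1000  # HYPOTHETICAL value; sites.p defines its own constant scaleup


def scaled_counts(freqs, scaleup=SCALEUP):
    # Turn the normalized frequencies of one position (order a, c, g, t)
    # into integer base numbers by multiplying by scaleup and rounding.
    return [round(f * scaleup) for f in freqs]


def frequencies_from_counts(counts):
    # What a downstream program recovers from the scaled numbers.
    total = sum(counts)
    return [c / total for c in counts]


freqs = [0.7, 0.1, 0.1, 0.1]
counts = scaled_counts(freqs)
print(counts)  # [700, 100, 100, 100]
```

A large scaleup keeps rounding error small when the frequencies are recovered, but the resulting total no longer reflects how many sequences were actually observed. This is why no error estimate is reported.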
description
The function of the sites program is to gather, collate and analyze
data from a randomization experiment. See the reference given below.
It was designed to help enter sequence data. One may enter several copies
of a particular sequence, and they will be joined together by merging their
data. Sequences of the same clone are identified by their common plasmid
names. Inconsistent data are flagged.
First the program sorts the data and checks that multiple entries are
consistent with one another. If they are not, the program halts and you
should look in the caps file to figure out what is wrong.
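Merging several entries for the same plasmid can be sketched in Python as below. The exact merge rules in sites.p are not described here, so this sketch assumes an unknown base "x" yields to a known base and any other mismatch is an inconsistency. The primer rule (two different primers give the name 'both') follows this section. InconsistentData and merge_entries are hypothetical names.

```python
class InconsistentData(Exception):
    """Raised when two entries for the same plasmid disagree."""


def merge_entries(seq1, primer1, seq2, primer2):
    # Both sequences are assumed to be already expanded against the standard.
    if len(seq1) != len(seq2):
        raise InconsistentData("entries differ in length")
    merged = []
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if a == b or b == "x":
            merged.append(a)   # agreement, or the second read is unknown here
        elif a == "x":
            merged.append(b)   # the first read is unknown here
        else:
            raise InconsistentData("position " + str(i) + ": " + a + " versus " + b)
    # Data from two different primers: the primer name becomes 'both'.
    primer = primer1 if primer1 == primer2 else "both"
    return "".join(merged), primer


merged, primer = merge_entries("acxt", "kc7", "acgx", "1204")
print(merged, primer)  # acgt both
```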
The program converts the database into a more readable form in list, and
provides statistical analysis. If the standard is:
gaattcaaattaatacgactcactatagggagaaagctt pTS37 kc7 ex100 87 nov 2
and one of the database lines is:
gaattcaaattaattcgactcactttagggaaaaagctt pTS331 1204 ex394 87 nov 2
the program presents the data in file list as:
..............t.........t......a....... pTS331 1204 ex394 87 nov 2
which is more readable. Data can thus be entered as complete sequences
but displayed in a form that is easy to understand.
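The conversion into the list display can be sketched in Python. This minimal sketch uses a hypothetical function, to_dots, and assumes the sequence and the standard have the same length. It reproduces the example above.

```python
def to_dots(sequence: str, standard: str) -> str:
    """Show only the changes from the standard; '.' marks an identical base."""
    if len(sequence) != len(standard):
        raise ValueError("sequence and standard must have the same length")
    return "".join("." if c == s else c for c, s in zip(sequence, standard))


standard = "gaattcaaattaatacgactcactatagggagaaagctt"
clone = "gaattcaaattaattcgactcactttagggaaaaagctt"
print(to_dots(clone, standard))  # ..............t.........t......a.......
```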
If two primers are used and data are found for both, then the primer
name becomes 'both'.
The stats file contains tables of the wild type frequencies and
the experimental frequencies.
examples
See database.t7 and standard.t7.
documentation
@article{Schneider1989,
author = "T. D. Schneider
and G. D. Stormo",
title = "Excess Information at Bacteriophage {T7} Genomic Promoters
Detected by a Random Cloning Technique",
year = "1989",
journal = "Nucl. Acids Res.",
volume = "17",
pages = "659-674"}
see also
Examples: database.t7 and standard.t7
Related programs:
siva.p, dalvec.p, makelogo.p, makebk.p
author
Tom Schneider
bugs
For sorting, plasmid initials are ignored: sorting is by the plasmid
number only.
A correction for small sample size is not known for the normalized
experimental data. Certainly the method given in program Calhnb is not
right. Therefore, the program does not report the expected variation.
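The sorting limitation can be seen in a small Python sketch of such a sort key. plasmid_number is a hypothetical helper that, as an assumption, uses the first run of digits in the plasmid name. Two plasmids with different initials but the same number therefore compare equal.

```python
import re


def plasmid_number(name: str) -> int:
    # ASSUMPTION: the first run of digits is the plasmid number; letters
    # such as the 'pTS' initials are ignored.
    match = re.search(r"\d+", name)
    return int(match.group()) if match else 0


names = ["pTS331", "pAB37", "pTS37", "pXY5"]
print(sorted(names, key=plasmid_number))  # ['pXY5', 'pAB37', 'pTS37', 'pTS331']
```

Python's sort is stable, so the tie between pAB37 and pTS37 is broken only by their input order.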
*)
(* end module describe.sites *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}