By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 3.01; (* of tod.p 2016 Jan 24}
(* begin module describe.tod *)
(*
name
tod: to database format for sites program
synopsis
tod(abi: in, thedate: in, ssb: in, todp: in,
results: out, summary: out, db: out, output: out)
files
abi: Raw sequences from the ABI sequencing machine. The
files called *_??.Seq.txt are manipulated under Unix by:
more *_??.Seq.txt | cat > abi
echo "" >> abi
When its output is piped, the more program prints each file name followed by
that file's contents, and cat joins the results together. Thus the abi
file contains the sample names followed by the sequences. The echo appends
a single newline to the end of the file so that it ends cleanly.
thedate: Date that the sequences were run, output of makedate program.
ssb: This must be a copy of the "Sample_Sheet.bin" file from the ABI
machine. It contains:
lane number
plasmid name
primer name
in a binary (non-ASCII) format from which this program extracts them. (The
program reads the data from their known fixed locations.) The sample
name column of the sample sheet must contain the plasmid name. Any
number of spaces, slashes (/) or null characters are then skipped and the
next non-null word (ending in a null or a space) is taken as the primer
name. Thus the format is "plasmid/primer". For example: pTS421/pTS37f1
A bug in the ABI code sometimes replaces the first letter
of the 24th lane with a null character. To get around
the bug, an attempt is made to rewrite the sample sheet when this occurs.
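The "plasmid/primer" rule above can be sketched in Python as follows. This is
only an illustration, not the code in tod.p: the function name
split_sample_name is hypothetical, and the sketch assumes that the plasmid
name itself ends at the first space, slash or null character.

```python
def split_sample_name(field):
    """Split a sample name field into (plasmid, primer).

    Assumption: the plasmid name ends at the first space, slash or null.
    """
    separators = " /\x00"
    n = len(field)
    i = 0
    # The plasmid name is the leading word.
    while i < n and field[i] not in separators:
        i += 1
    plasmid = field[:i]
    # Skip any number of spaces, slashes or null characters.
    while i < n and field[i] in separators:
        i += 1
    # The primer name is the next word, ending in a null or a space.
    j = i
    while j < n and field[j] not in " \x00":
        j += 1
    return plasmid, field[i:j]
```

For the example above, split_sample_name("pTS421/pTS37f1") gives the pair
("pTS421", "pTS37f1").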
todp: parameters to control the program
first line: a string of characters, called R1, which represents
a restriction site or other sequence.
NOTE: it should be self complementary.
second line: a string of characters, called R2, which represents
a restriction site or other sequence.
NOTE: it should be self complementary.
following lines: Editing commands for sequences. There is one editing
command per line, and each consists of three integers (called N, P1
and P2) followed by a string (S). N is the number of the lane to edit.
P1 and P2 define two positions in the sequence. The sequence
between these positions is deleted and replaced by the string S. The
string must contain only the letters 'acgt' or the single letter 'd'.
This allows one to insert sequence (make P1 = P2 + 1, with the string in
'acgt' form), to delete (P1 > P2 + 1, with the string 'd') and to replace
(make P1 = P2 + 1 + length of the string, in 'acgt' form). Lines that
begin with '*' are comments, copied to the results file. Comments
for each edit must be placed just below the edit command line.
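A minimal Python sketch of one reading of the editing commands is given here.
It is not the code in tod.p. The names parse_edit and apply_edit are
hypothetical, and the sketch assumes that positions are 1-based, that the
bases at the two positions themselves are kept (only the bases strictly
between them are removed), and that the two positions may be given in either
order.

```python
def parse_edit(line):
    """Parse 'N P1 P2 S'; return None for a comment line starting with '*'."""
    if line.startswith("*"):
        return None
    fields = line.split()
    lane, p1, p2 = (int(x) for x in fields[:3])
    text = fields[3]
    # S must be the single letter 'd' or consist only of a, c, g and t.
    if text != "d" and not set(text) <= set("acgt"):
        raise ValueError("bad edit string: " + text)
    return lane, p1, p2, text


def apply_edit(seq, p1, p2, text):
    """Replace the bases strictly between the two positions with S.

    Assumptions: 1-based positions, exclusive boundaries, either order.
    """
    low, high = sorted((p1, p2))
    replacement = "" if text == "d" else text
    # seq[:low] keeps bases 1..low; seq[high - 1:] keeps bases high..end.
    return seq[:low] + replacement + seq[high - 1:]
```

Under these assumptions, apply_edit("acgtacgt", 5, 4, "tt") inserts 'tt'
between bases 4 and 5, while apply_edit("acgtacgt", 5, 2, "d") deletes
bases 3 and 4.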
results: running commentary of the processing of the sequences.
The following changes are made:
1. Each sequence is edited according to instructions in todp.
2. Each sequence is converted to lower case.
3. The letter 'n' is converted to 'x'.
4. When there is exactly one copy of R1 and one copy of R2,
the region between R1 and R2 is printed (including R1 and R2).
Otherwise, the entire original sequence is printed.
5. The complement of the sequence is printed if necessary to ensure that R1
is printed before R2. The program prints the original
sequence if R1 and R2 cannot be found on the complement.
The sequence is then joined to the data from ssb, and the results printed.
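Changes 2 to 5 above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the code in tod.p: the function
names are hypothetical, "complement" is taken to mean the reverse complement,
'x' is taken to complement to 'x', and the "original sequence" is taken to be
the lower-cased sequence with 'n' converted to 'x'.

```python
def normalize(seq):
    """Convert to lower case and turn 'n' into 'x'."""
    return seq.lower().replace("n", "x")


def count_sites(seq, site):
    """Count occurrences of site in seq, including overlapping ones."""
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def reverse_complement(seq):
    """Reverse complement; 'x' is assumed to pair with 'x'."""
    pairs = str.maketrans("acgtx", "tgcax")
    return seq.translate(pairs)[::-1]


def region_between(seq, r1, r2):
    """Return the region from R1 through R2, or the whole sequence."""
    seq = normalize(seq)
    if count_sites(seq, r1) != 1 or count_sites(seq, r2) != 1:
        return seq
    if seq.find(r1) > seq.find(r2):
        # R2 comes first, so try the other strand.
        flipped = reverse_complement(seq)
        if count_sites(flipped, r1) != 1 or count_sites(flipped, r2) != 1:
            return seq
        seq = flipped
    return seq[seq.find(r1):seq.find(r2) + len(r2)]
```

With R1 = 'gaattc' and R2 = 'ggatcc', both self-complementary, the call
region_between("TTGAATTCACGNGGATCCAA", "gaattc", "ggatcc") gives
'gaattcacgxggatcc'. Because both sites are self-complementary, the reverse
complement of that input gives the same region.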
summary: summary of the results.
db: The sequences from abi are reformatted into the database format needed
by the sites program.
output: messages to the user
description
Convert the output sequences from the ABI sequencing machine into a format
usable by the sites program.
examples
documentation
see also
sites.p, makedate.p, dotod
author
Thomas Dana Schneider
bugs
technical notes
*)
(* end module describe.tod *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}