By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 3.50; (* of dbbk.p 2018 Jan 06}
(* begin module describe.dbbk *)
(*
name
dbbk: database to delila book conversion program
synopsis
dbbk(db: in, l1: out, changes: out, output: out)
files
db: contains one or more complete entries from either the EMBL
or GenBank genetic sequence data bases. These entries may be
obtained by using the original libraries or by using an entry
extraction program. Dbpull is the delila program for data base
accessing; to get complete entries the instruction 'all' must
have been used in the dbpull fin file. (See delman.use.dbpull)
l1: each db entry is represented in l1 by a delila style
entry containing information extracted from the db entry.
All of l1 has the biologically oriented structure of
a standard delila book. The first line of l1 is not part
of an entry, but contains the computer system date and the
title of the book.
changes: Delila programs cannot handle sequences that have
ambiguities because Delila was designed on the assumption
that people would finish their sequences. Unfortunately
this is not true, and the databases contain bases other
than acgt to indicate ambiguity. These are converted to
"a" and the cases are reported in this file as "unknown".
NOTE: "u" is converted to "t".
The format is the one that the lister program uses for
features. In the lister map the unknown region is
marked by a string of question marks: "???????????".
output: messages to the user.
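
The handling of ambiguous bases described under "changes" can be
sketched roughly as follows. This is an illustrative Python sketch, not
the Pascal code in dbbk.p. The function name clean_sequence is
hypothetical, and three details are assumptions rather than facts from
the source: consecutive ambiguous bases are grouped into one region,
coordinates are 1-based and inclusive, and "u" to "t" conversions are
not reported as unknown.

```python
def clean_sequence(seq):
    """Return (cleaned, regions) for a raw database sequence.

    cleaned contains only a, c, g, t. regions lists the 1-based,
    inclusive (start, end) runs that held ambiguity codes.
    Hypothetical helper; dbbk itself may differ in the details.
    """
    cleaned = []
    regions = []
    run_start = None  # start of the ambiguous run in progress, if any
    for position, base in enumerate(seq.lower(), start=1):
        if base == "u":
            base = "t"  # "u" becomes "t" (assumed not to be reported)
        if base in "acgt":
            if run_start is not None:
                # an ambiguous run ended on the previous base
                regions.append((run_start, position - 1))
                run_start = None
            cleaned.append(base)
        else:
            if run_start is None:
                run_start = position
            cleaned.append("a")  # any other code is replaced by "a"
    if run_start is not None:
        # the sequence ended inside an ambiguous run
        regions.append((run_start, len(seq)))
    return "".join(cleaned), regions
```

Each (start, end) pair would correspond to one "unknown" feature in the
changes file.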
description
This program converts GenBank and EMBL data base entries into a
book of delila entries. The organism name is fused together
with a period and is used for both the organism and chromosome
names. The organism and chromosome change only when the name
changes in db.
Piece names were originally taken from the ACCESSION number (as
of 1994 June 10), but this does not track versions. On 2008 Nov
03 I switched to the VERSION identifier, which looks like:
J04553.1. This works with catal and delila.
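
The naming rules above can be illustrated with a short Python sketch.
It is not taken from dbbk.p: the helper names fuse_organism_name and
piece_name are hypothetical, the sketch assumes "fused together with a
period" means joining the words of the name with periods, it ignores
the genuslimit truncation, and it only looks at GenBank-style VERSION
lines (EMBL entries are not handled).

```python
def fuse_organism_name(organism):
    """Join the words of an organism name with periods.

    Assumed reading of "fused together with a period".
    """
    return ".".join(organism.split())


def piece_name(entry_text):
    """Return the identifier on the first VERSION line, or None.

    Works on GenBank flat-file text, where the line looks like
    "VERSION     J04553.1"; anything after the identifier is ignored.
    """
    for line in entry_text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == "VERSION":
            return fields[1]
    return None
```

For an entry containing the line "VERSION     J04553.1", piece_name
would give J04553.1 as the piece name, and "Escherichia coli" would
become Escherichia.coli.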
examples
The changes file looks like:
define "unknown:1220-4867" "?" "[]" "[]" 0 3646
@ AC012525 1220.0 +1 "unknown:1220-4867" ""
Lister displays this as:
* *1210 * *1220
5' c g t g g a a c a a g g a a g a a t t a a a a a 3'
[????????? ... unknown:1220-4867
[for brevity the middle part is skipped]
*4850 * *4860 * *4870
5' a a a a a a a a a a a a a a a a a a a a t a g a 3'
... ??????????????????????????????????] unknown:1220-4867
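
For readers who want to inspect a changes file outside of lister, the
two lines shown above can be split into tokens with a few lines of
Python. This is only a sketch: the helper names are hypothetical, the
meaning of the individual fields is defined by lister and is not
interpreted here, and reading the coordinates out of the
"unknown:start-end" name is an assumption based on the example.

```python
import re
import shlex


def split_changes_line(line):
    """Split one changes line into tokens, honoring double quotes."""
    return shlex.split(line)


def unknown_region(name):
    """Return (start, end) from a name like unknown:1220-4867, or None."""
    match = re.match(r"^unknown:(\d+)-(\d+)$", name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

The second token of the define line from the example, passed to
unknown_region, gives the pair (1220, 4867).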
see also
delila.p, dbpull.p, catal.p, libdef, lister.p
author
Matthew Yarus and Tom Schneider (modifications)
bugs
Databases do not have enough data on genes within each piece to make
a book with gene sections.
The changes file is a design bug in Delila.
Genus names are limited to genuslimit (a constant) to avoid
names longer than the standard Delila limit.
If a name is longer than idlength the program simply stops
reading the name and then dies when it reads the number of bases
in the entry. This is currently worked around by making the name
100 characters, but it should be done better later.
technical notes
dbbk is known to convert GenBank entries from July 1989.
It may not work on later versions.
*)
(* end module describe.dbbk *)
{This manual page was created by makman 1.45}