By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 1.36; (* of orf.p 2022 Dec 16}
(* begin module describe.orf *)
(*
name
orf: find ORFs for ribosome binding sites
synopsis
orf(genomebook:in, scanfeatures: in, orfp: in, orffeatures: out,
output: out)
files
genomebook: a book from the Delila system containing a genome.
If a genome is not available, a regular Delila book works ok.
scanfeatures: output of the multiscan program containing
locations of Initiation Region (ir) features.
orfp: parameters to control the program. The file must contain the
following parameters, one per line:
parameterversion: The version number of the program. This allows the
user to be warned if an old parameter file is used.
shortestorf: the shortest orf to report
longestorf: the longest orf to report
orffeatures: ORF features reported as features for the lister program
This file REPLACEs the scanfeatures in the features file.
output: messages to the user
description
The orf program reads a book containing a complete genome and,
given the start points of translation (initiation regions, ir, the
AUG, GUG or UUG starts) it finds the corresponding stop codons.
These are called 'orf' features. They reside at the end of the
open reading frame and report the length of the frame in codons.
The length includes the initiation and termination codons.
Be sure to turn on 'predict peptides' in listerp!
The procedure used by the program is:
* Read in a whole genome book (or Delila book).
* Read in the scanfeatures file.
* Look through the scanfeatures for ir features. Determine the orf
for each ir feature in the given orientation at that position in
the genome.
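The procedure above can be illustrated with a minimal Python sketch. It
is not the Pascal code of orf.p. The function names, the 1-based
coordinate convention, the use of DNA letters (T in place of U), the
standard stop codons TAA, TAG and TGA, and the treatment of the genome
as linear are all assumptions made for this illustration.

```python
# Hypothetical sketch only: this is not the code used by orf.p.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: standard stop codons, DNA letters
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_orf(genome, start, orientation=1):
    """Scan in frame from an initiation codon to the first stop codon.

    genome      -- DNA string, treated as linear (assumption)
    start       -- 1-based coordinate of the first base of the initiation codon
    orientation -- +1 for the given strand, -1 for the complementary strand

    Returns (codons, last_base): the length in codons, counting both the
    initiation and the stop codon, and the 1-based genome coordinate of
    the last base of the stop codon. Returns None if no in-frame stop
    codon is found before the sequence ends.
    """
    n = len(genome)
    if orientation == 1:
        seq = genome
        i = start - 1          # 0-based index of the initiation codon
    else:
        seq = reverse_complement(genome)
        i = n - start          # same base, indexed on the other strand
    codons = 0
    while i + 3 <= n:
        codons += 1
        if seq[i:i + 3] in STOP_CODONS:
            last = i + 3       # 1-based position of the stop's last base in seq
            if orientation == -1:
                last = n - last + 1   # convert back to genome coordinates
            return codons, last
        i += 3
    return None


print(find_orf("CCATGAAACCCTAGGG", 3, 1))
```

In this toy sequence the ATG at position 3 is followed by AAA, CCC and
TAG, so the sketch reports 4 codons and a last base of 14. The
out-of-frame TGA at positions 4 to 6 is ignored because only codons in
the frame of the initiation codon are examined.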
Note that the orffeatures REPLACE the scanfeatures in a features
file and are given to the lister.p program for display. This is
because the scanfeatures are read in and then modified: the total
lines corresponding to the ir get orf data added to them and an orf
definition and features are inserted into the list.
The orffeatures have special parameters for total features:
The Aparam of the total feature is the information (in bits).
The Bparam of the total feature is the orf length (in codons,
including the stop).
The Cparam of the total feature is the last base of the orf.
The Dparam: see below.
The parameters shortestorf and longestorf only affect one thing,
the Dparam of the total feature of the ribosome binding site. If
the number of codons (including initiation codon and stop codon) is
greater than or equal to shortestorf and less than or equal to
longestorf then the total Dparam is set to '1', otherwise it is 0
(zero). This allows selection of the total features to generate
tables.
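The Dparam rule described above amounts to an inclusive range test. A
minimal Python sketch follows; the function name total_dparam is
hypothetical and is not taken from orf.p.

```python
# Hypothetical sketch of the Dparam rule; not the code used by orf.p.
def total_dparam(codons, shortestorf, longestorf):
    """Return 1 if shortestorf <= codons <= longestorf, otherwise 0.

    codons counts the initiation codon and the stop codon.
    """
    return 1 if shortestorf <= codons <= longestorf else 0


# Both limits are inclusive.
print(total_dparam(10, 10, 50), total_dparam(50, 10, 50), total_dparam(51, 10, 50))
```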
examples
Example parameters to use in listerp:
39 basesperline: number of bases per line in the listing
1 aastate: 0=no aa; 1=predict peptides; 2=translate all frames
7 frameallowed: binary; highest bit is highest frame on, etc.
1 codelength: 1 or 3 letters per amino acid
basesperline: must be a multiple of 3 for peptides
aastate: 1=predict peptides
frameallowed: 7 for all frames
codelength: 1 or 3 letters per amino acid; either works.
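The frameallowed and basesperline settings can be checked with a small
Python sketch. It assumes that frameallowed is a 3-bit mask in which
value 1 enables frame 1, value 2 enables frame 2 and value 4 enables
frame 3. That is one reading of "highest bit is highest frame", not a
confirmed detail of lister. Both function names are hypothetical.

```python
# Hypothetical sketch; the bit-to-frame mapping is an assumption.
def frames_enabled(frameallowed):
    """Decode a 3-bit frameallowed value into a list of frame numbers (1..3)."""
    if not 0 <= frameallowed <= 7:
        raise ValueError("frameallowed must be in the range 0..7")
    return [frame for frame in (1, 2, 3) if frameallowed & (1 << (frame - 1))]


def basesperline_ok(basesperline):
    """Return True when whole codons fit on each line (a multiple of 3)."""
    return basesperline % 3 == 0


print(frames_enabled(7), basesperline_ok(39))
```

With the example parameters above, 7 turns on all three frames and 39
bases per line holds exactly 13 codons.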
documentation
We used the orf program to identify small proteins:
@article{Hemm.Rudd2008,
author = "M. R. Hemm
and B. J. Paul
and T. D. Schneider
and G. Storz
and K. E. Rudd",
title = "{Small membrane proteins found by comparative genomics and
ribosome binding site models}",
journal = "Mol. Microbiol.",
volume = "70",
pages = "1487--1501",
pmid = "19121005",
pmcid = "PMC2614699",
note = "\htmladdnormallink
{https://doi.org/10.1111/j.1365-2958.2008.06495.x}
{https://doi.org/10.1111/j.1365-2958.2008.06495.x}, \htmladdnormallink
{https://alum.mit.edu/www/toms/papers/smallproteins/}
{https://alum.mit.edu/www/toms/papers/smallproteins/}",
year = "2008"}
see also
program for displaying the results: lister.p
author
Thomas Dana Schneider
bugs
Orf is designed for a single bacterial genome at the moment, since
it only handles one piece at a time.
Can we use the ir color for the stop codon color bar? No! The
color is not available yet because mkpetal happens LATER - there is no
way to get that color now! This function can be done only in
lister when there are colors available after mkpetals has been run.
Method for lister: search the features for ir. If the ir is
followed by an orf, assign the color of the ir to the orf.
When two ribosome binding sites are in frame, two orf stops are
generated that are in the same place. They differ only in the
'other' string, so in the current lister they are considered to be
duplicates. Two solutions are (1) make lister check the 'other'
string for identity and (2) do an 'other' string check only when
the namestring is 'orf'.
technical notes
*)
(* end module describe.orf *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}