A Tutorial on Delila Instructions

with examples

by Tom Schneider

Outline

Introduction to Delila Instructions
Making a Delila Library
Title
Specification
Requests
Relative Coordinate Requests
Making Mutations
Controlling Lister
Mutation Analysis: example
Comments
Making Delila Instructions for Symmetric Sites
Setting Parameters
Full Definition of Delila
Automatic Generation of Delila Instructions

Introduction to Delila Instructions

Terminology used here is described in a glossary.

The concept of the Delila system is to extract fragments of sequence from a library (database) of sequences before beginning any analysis of the sequences. This has a number of advantages, including automating the analysis process, avoiding editing sequences (which will lead to mistakes!), the ability to permanently record the sequences used in a compact form (instructions) and therefore the ability to repeat an analysis. The extraction is done by a librarian program named Delila. One gives Delila instructions for what fragments to obtain and how to mutate them. The returned result given by the librarian is -- of course! -- a book.

An important feature of Delila is that the coordinate system of each sequence in the book corresponds to that in the parent library. This way you won't go crazy trying to figure out the locations of bases - all output has the same coordinate system. (The exception is if you make mutations, in which case coordinates get renumbered on the 'downstream' side.)

Making a Delila Library

If you already have a Delila Library (i.e. the 6 files lib1, lib2, lib3, cat1, cat2, cat3) then you can skip this section. If not, you need to create one.

The first step is to create a Delila book containing the genomic or artificial sequence you want to manipulate. There are a number of programs you can use to do this:

makebk Make a book from a raw sequence.
dbbk Make a book from a GenBank entry.

Next, you need to create the Delila library. In a Unix system:

cp book l1 # copy your book to the file l1
touch l2 l3 catalp # make empty files
catal # run the catal program

Delila can now be run using the 6 library files and an instruction file.

Title

Since Delila produces a book, it is natural that the first instruction in a set of Delila instructions is the title to be given to the book:

title "An example book";

Note that delila will accept both single (') and double (") quotes.

You can have any title you like. I would, however, recommend this format:

title 'Fis sites version = 1.81 of fis.inst 2002 Apr 24';

This includes four important components:

Fis sites: the name of the sites
1.81: the version number, which can be used by the ver program. All Delila programs pass the title on to the next program (though it may end up in a comment). So by changing the version every time you change anything in the file, you will always know exactly what is happening.
fis.inst: the file name (note that the type is 'inst')
2002 Apr 24: the date.

If you use this format, then you can save backup copies in the form fis.inst.1.81 by using the save script.
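The recommended title is regular enough to be read by a program. As a purely hypothetical illustration (this is not the ver program or the save script, whose internals are not described here), the Python sketch below pulls the four components out of a title written exactly in the format above and builds a backup name in the fis.inst.1.81 style. The regular expression is my own assumption about the layout.

```python
import re

# Hypothetical regular expression for the recommended title layout:
#   title '<name> version = <number> of <file> <year> <Mon> <day>';
TITLE_RE = re.compile(
    r"""title\s+['"](?P<name>.+?)\s+version\s*=\s*(?P<version>[\d.]+)\s+of\s+"""
    r"""(?P<file>\S+)\s+(?P<date>\d{4}\s+[A-Z][a-z]{2}\s+\d{1,2})['"];"""
)


def parse_title(line):
    """Return the title's four components as a dict, or None if it does not match."""
    match = TITLE_RE.match(line.strip())
    return match.groupdict() if match else None


def backup_name(fields):
    """Build a backup file name such as fis.inst.1.81 (hypothetical helper)."""
    return f"{fields['file']}.{fields['version']}"


fields = parse_title("title 'Fis sites version = 1.81 of fis.inst 2002 Apr 24';")
print(fields["name"], fields["version"], fields["file"], fields["date"], sep=" | ")
print(backup_name(fields))  # fis.inst.1.81
```

A title that does not follow the format, such as "An example book", simply yields None.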

Specification

Next, the desired source sequence must be specified. Delila was built before GenBank existed and it assumes that the database is organized by organism and chromosome (as opposed to the current mess of entries). So one defines these:

organism H.sapiens;
chromosome H.sapiens;

Next one needs to choose the particular sequence of DNA, called a piece:

piece LINEAR;

where LINEAR would usually be the GenBank ACCESSION number.

Requests

Having specified the sequence we want, we now can make a series of requests to get particular parts of the sequence. Suppose that the wild-type sequence named LINEAR begins with the EcoRI site 5' gaattc 3', with bases numbered 1 to 180. Then to obtain the entire sequence we can say:

get all piece;

[Figure: Lister output of 5' gaattc 3', with the asterisk over the second t at position 5.]

To get the first 6 bases (containing just the EcoRI site) we say:

get from 1 to 6;

The lister program puts an asterisk ('*') every 5th base, and numbers every 10th base. (This way you won't go crazy counting bases - you never need to count more than 3 positions to identify a base.)

[Figure: Lister output of 5' aattc 3', with the asterisk over the second t (coordinate 5, the fourth base shown).]

To get the second to sixth bases one can say:

get from 2 to 6;

which gives 5' aattc 3'.

[Figure: Lister output of 5' gaatt 3', with the asterisk over the first a (coordinate 5, the second base shown).]

One can also get the complementary strand:

get from 6 to 2 direction -;

which gives 5' gaatt 3'. Note that the asterisk in the figure is still over coordinate 5. Delila retains the original coordinate system, which means that you can compare output from different extractions and the coordinates of the bases remain the same.
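The rule that coordinates travel with the bases can be sketched in a few lines of Python. This is a toy model, not Delila itself: it assumes a piece stored as a lowercase string whose first character is coordinate 1, and it uses only the first 6 bases of LINEAR, the only ones given in this example. Reading in direction - walks the coordinates downward and complements each base.

```python
# Toy model of Delila extraction; not Delila's implementation.
# Assumption: the piece is a lowercase string and its first character is coordinate 1.
PIECE = "gaattc"  # the first 6 bases of LINEAR from the example
COMPLEMENT = str.maketrans("acgt", "tgca")


def get(piece, frm, to, direction="+"):
    """Model 'get from FRM to TO direction D;'.

    Returns (coordinate, base) pairs written 5' to 3', so every base keeps
    its coordinate from the parent piece.
    """
    if direction == "+":
        return [(c, piece[c - 1]) for c in range(frm, to + 1)]
    # direction -: coordinates decrease and each base is complemented
    return [(c, piece[c - 1].translate(COMPLEMENT)) for c in range(frm, to - 1, -1)]


def bases(fragment):
    """Drop the coordinates and return just the sequence."""
    return "".join(base for _, base in fragment)


print(bases(get(PIECE, 1, 6)))  # gaattc
print(bases(get(PIECE, 2, 6)))  # aattc
minus = get(PIECE, 6, 2, "-")
print(bases(minus))             # gaatt
print(minus[1])                 # (5, 'a'): coordinate 5 is now the second base
```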

Here's a puzzler:

[Figure: Lister output of 5' aatt 3', with the asterisk over the second t (coordinate 5).]

get from 2 to 5 direction +;

[Figure: Lister output of 5' aatt 3', with the asterisk over the first a (coordinate 5).]

get from 5 to 2 direction -;

Why are these the same?

[Figure: Lister output of 5' aaagtcaactaactgaattc 3', with coordinates decreasing from 20; every 5th base, starting at 20, carries an asterisk, and positions 20 and 10 are numbered.]

An example of a longer sequence is:

get from 20 to 1 direction -;

giving 5' aaagtcaactaactgaattc 3', which shows how the coordinate system decreases. (Note the EcoRI site at the 3' end.)

Having obtained the sequence(s) we want, Delila's job is over. Other programs are used to display and analyze the sequence. For these examples I used the Lister program for the figures. Lister gives the sequence, carefully labeled with 5' and 3' on the ends. Every 5th base is marked by an asterisk, and every 10th base is numbered. This way you will never need to count more than 3 bases to determine the coordinate of any base.
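The marking rule can be expressed as simple arithmetic on the original coordinates. The sketch below is a hypothetical illustration, not Lister's code: it assumes an asterisk goes over every coordinate divisible by 5 and a number over every coordinate divisible by 10, which reproduces the examples above (an asterisk at 5 for bases 1 to 6; asterisks at 20, 15, 10 and 5 and numbers at 20 and 10 for the 20-to-1 request).

```python
# Hypothetical model of Lister's ruler marks (not Lister's source):
# an asterisk over every coordinate divisible by 5, a number over every
# coordinate divisible by 10.


def asterisks(coords):
    """Marker line with '*' above each coordinate divisible by 5."""
    return "".join("*" if c % 5 == 0 else " " for c in coords)


def numbered(coords):
    """Coordinates that also receive a printed number."""
    return [c for c in coords if c % 10 == 0]


forward = list(range(1, 7))       # get from 1 to 6;
reverse = list(range(20, 0, -1))  # get from 20 to 1 direction -;
print(repr(asterisks(forward)))   # '    * '
print(repr(asterisks(reverse)))   # '*    *    *    *    '
print(numbered(reverse))          # [20, 10]
```

Because the marks depend only on the original coordinates, they land on the same bases no matter which strand or sub-range was extracted.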

Relative Coordinate Requests

[Figure: Lister output of 5' gaatt 3', with the asterisk over the second t (position 5).]

A powerful way to get sequences is relative to a particular point:

get from 3 -2 to 3 +2;

which gets from 2 bases before coordinate 3 to 2 bases after coordinate 3, that is, from base 1 to base 5: 5' gaatt 3'. Generally one does not want to type the central coordinate twice, so one can use the word 'same' instead:

[Figure: Lister output of 5' gaatt 3', with the asterisk over the second t (position 5).]

get from 3 -2 to same +2;

where 'same' refers to the coordinate given after the word 'from'. This is the most convenient form for specifying binding site locations. For more examples, see: Making Delila Instructions for Symmetric Sites.
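Resolving a relative request is plain addition: each offset is added to the coordinate written after 'from'. A minimal Python sketch, assuming only the 'get from N OFFSET to same OFFSET' form shown above (the regular expression is my own and does not cover the full Delila grammar):

```python
import re

# Covers only the 'get from N OFFSET to same OFFSET' form used in this tutorial.
RELATIVE = re.compile(r"get from (\d+) ([+-]\d+) to same ([+-]\d+)")


def resolve(instruction):
    """Return the absolute (from, to) coordinates of a relative request."""
    match = RELATIVE.match(instruction.strip())
    if match is None:
        raise ValueError(f"not a relative request: {instruction!r}")
    base, off_from, off_to = (int(g) for g in match.groups())
    return base + off_from, base + off_to


print(resolve("get from 3 -2 to same +2;"))                   # (1, 5)
print(resolve("get from 2718 +20 to same -20 direction -;"))  # (2738, 2698)
```

The second call shows why this form suits binding sites: only the site coordinate changes from request to request, while the offsets stay fixed.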

Making Mutations

There are three ways to make changes.

[Figure: 5' gaattc 3' changed to 5' taattc 3'.]

1. A CHANGE requires the original base, the coordinate to change, and then the new base:

get from 1 to 6 with g1t;

gives taattc. The base that changes from a G at 1 to a T is marked by the tail and head of an arrow. The figure is produced by first running Delila to extract the sequence(s) and to produce the marking information; Lister then uses this information to create the PostScript.

How do I write my instructions if I want the complementary sequence?
Glad you asked. Coordinates of changes are always given on the original wild-type coordinate system. The rule is:

The coordinates given in the mutation and the sequences given refer always to the sequence written 5' to 3' in the *positive* coordinate direction.

The reason for doing things this way is that you would go absolutely crazy if you had to change the definition of the mutation merely if you wanted the complementary sequence!

[Figure: 5' gaattc 3' changed to 5' gaatta 3'.]

For example, starting again from 5' gaattc 3':

get from 6 to 1 with g1t;

Delila makes the mutation and then complements the sequence to give 5' gaatta 3'. Note that the first (wild-type) sequence in the illustration is already complemented. You can see this because the asterisk ('*') still marks coordinate 5, which is now the second base shown.

[Figure: 5' gaattc 3' changed to 5' gaccattc 3'.]

2. An INSERTION uses two coordinates and a sequence. The sequence BETWEEN the coordinates is removed and the given sequence is inserted.

get from 1 to 6 with i2,3cc;

gives gaccattc. No bases lie between coordinates 2 and 3, so nothing is removed and cc is simply inserted.

[Figure: 5' gaattc 3' changed to 5' gccttc 3'.]

Changing that to:

get from 1 to 6 with i1,4cc;

replaces the bases between 1 and 4 (aa) with cc to give gccttc.

[Figure: 5' gaattc 3' changed to 5' gttc 3'.]

Finally,

get from 1 to 6 with i1,4;

deletes the bases between 1 and 4 to give gttc.

Note that any change can be made with the insertion form; the other methods are available for convenience.

[Figure: 5' gaattc 3' changed to 5' gc 3'.]

3. A DELETION takes two coordinates. The sequence INCLUDING the coordinates is removed.

get from 1 to 6 with d2,5;

gives gc. Coordinates outside the end of the piece are allowed.

[Figure: 5' gaattc 3' changed to 5' tccttc 3' in two steps.]

Combined changes are possible. Separate the changes with periods:

get from 1 to 6 with g1t.i1,4cc;

gives tccttc.
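The three kinds of change, the positive-strand rule, and combinations joined by periods can all be modelled on plain strings. The Python sketch below is hypothetical and only mirrors the examples in this section. Coordinates are on the original wild-type numbering with the first base as 1, and combined changes are assumed not to overlap. They are applied from right to left so that one edit never shifts the coordinates another edit refers to. Delila's own ordering rules and error checks are not described here; in particular, rejecting a CHANGE whose stated original base does not match is this sketch's assumption.

```python
import re

COMPLEMENT = str.maketrans("acgt", "tgca")


def parse(spec):
    """Split a 'with' clause such as 'g1t.i1,4cc' into tuples."""
    muts = []
    for part in spec.split("."):
        if m := re.fullmatch(r"([acgt])(\d+)([acgt])", part):
            muts.append(("change", int(m[2]), m[1], m[3]))
        elif m := re.fullmatch(r"i(-?\d+),(-?\d+)([acgt]*)", part):
            muts.append(("insert", int(m[1]), int(m[2]), m[3]))
        elif m := re.fullmatch(r"d(-?\d+),(-?\d+)", part):
            muts.append(("delete", int(m[1]), int(m[2])))
        else:
            raise ValueError(f"unrecognized change: {part!r}")
    return muts


def first_base_touched(mut):
    """Leftmost original coordinate that a change can alter."""
    return mut[1] + 1 if mut[0] == "insert" else mut[1]


def mutate(seq, spec):
    """Apply all changes on the + strand, rightmost first, in original coordinates."""
    for mut in sorted(parse(spec), key=first_base_touched, reverse=True):
        if mut[0] == "change":
            _, pos, old, new = mut
            if seq[pos - 1] != old:  # this check is an assumption of the sketch
                raise ValueError(f"expected {old} at {pos}, found {seq[pos - 1]}")
            seq = seq[: pos - 1] + new + seq[pos:]
        elif mut[0] == "insert":
            # remove the bases strictly BETWEEN a and b, then insert the new bases
            _, a, b, new = mut
            seq = seq[: max(a, 0)] + new + seq[max(b - 1, 0):]
        else:
            # remove bases a..b INCLUDING both ends; clamping allows coordinates
            # outside the piece
            _, a, b = mut
            seq = seq[: max(a - 1, 0)] + seq[max(b, 0):]
    return seq


def reverse_complement(seq):
    """The other strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


wild = "gaattc"
for spec in ["g1t", "i2,3cc", "i1,4cc", "i1,4", "d2,5", "g1t.i1,4cc"]:
    print(spec, mutate(wild, spec))
# g1t taattc / i2,3cc gaccattc / i1,4cc gccttc / i1,4 gttc / d2,5 gc / g1t.i1,4cc tccttc

# 'get from 6 to 1 with g1t;': mutate on the + strand, then complement
print(reverse_complement(mutate(wild, "g1t")))  # gaatta
```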

Mutation Analysis: example

title "ABCR mutation";
organism H.sapiens;
chromosome H.sapiens;
set doubling on;
piece Y15651;
name "mutation at exon 17 acceptor";
get from 63 -25 to same +7 with g64a;

Two new commands are introduced here:

set doubling on;

which tells Delila to give both the original sequence and the mutated sequence, and

name "mutation at exon 17 acceptor";

which tells Delila to name the new sequence. The result, when displayed by the lister program, is:

[Figure: Sequence from GenBank locus Y15651 (the ABCR gene) above the mutated sequence. In the top sequence the tail of an arrow points at position 64, a g that is the middle base of the codon gga, coding for G (glycine). The bottom sequence shows this base changed to an a, giving gaa, which codes for E (glutamate). Below the top sequence are two sequence walkers for human splice acceptor sites: one of 11.6 bits exactly at the end of the exon, and one of 3.9 bits 3 bases to the left of the exon end. After the mutation the first walker becomes 10.7 bits and the second becomes 5.6 bits.]

Note how the mutation affects both walkers simultaneously. (See ABCR Mutation G863A for more information about this curious mutation.)

Controlling Lister

In all of the examples above, the book was given to the lister program, which generated PostScript output. Lister has a special mode for displaying sequences along with their mutations: the 'pagetrigger' parameter is set to 'd'. To use this feature, create a mutation instruction using 'with' and be sure to 'set doubling on' before that point in the instructions. In the book, Delila will put the original sequence along with the mutated sequence. Delila will also create a 'marksdelila' file which contains information about how to mark the mutation. Concatenate the arrow definition file (marks.arrow) and the 'marksdelila' file into a file named marks, and run lister:

delila
cat marks.arrow marksdelila > marks
lister

The resulting 'map' file is in PostScript and can be sent to a printer, displayed on your screen or converted to PDF.

Comments

The Delila language provides two ways to create comments in the instruction files. Both are 'Pascal-like' since the same form is used in the computer language Pascal:

(* Two character comments *)

and

{ One character comments }

Material inside comments is ignored by Delila. Comments of one type can be nested inside the other type. I commonly make my comments using (* and *) and then use the braces { and } to temporarily block off instructions I don't want.
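Because a comment of one kind ends only at its own closing mark, a scanner only has to remember which kind it is inside. A minimal Python sketch of that rule follows. It is an illustration rather than Delila's parser, and as a simplification it gives quoted titles no special treatment.

```python
def strip_comments(text):
    """Remove (* ... *) and { ... } comments from Delila instructions.

    A (* comment ends only at *) and a { comment ends only at }, so
    either kind may contain the other. Quoted titles get no special
    treatment in this simplified sketch.
    """
    kept = []
    i = 0
    while i < len(text):
        if text.startswith("(*", i):
            end = text.find("*)", i + 2)
            if end == -1:
                raise ValueError("unterminated (* comment")
            i = end + 2
        elif text[i] == "{":
            end = text.find("}", i + 1)
            if end == -1:
                raise ValueError("unterminated { comment")
            i = end + 1
        else:
            kept.append(text[i])
            i += 1
    return "".join(kept)


source = """get from 1 to 6; (* the EcoRI site {gaattc} *)
{ get from 2 to 6; (* switched off for now *) }
get from 6 to 2 direction -;"""
print(strip_comments(source))
```

The braces on the second line block off a whole instruction, including its (* *) comment, which is exactly the habit described above.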

I strongly recommend putting in the date and the file name in the title, and at least a short description of what the instruction set is about in a comment. It is also useful to add citations for evidence that the sequence is a binding site, and to mention the kind of data that supports this (e.g. footprinting, gel shift assay, mutations).

Making Delila Instructions for Symmetric Sites

Binding sites can have three kinds of symmetry, as discussed in the glossary entry on binding site symmetry. The corresponding Delila instructions are of increasing difficulty:

asymmetric. There is one instruction per site:
```
get from 2718 -20 to same +30 direction +;
get from 3141 -20 to same +30 direction +;
get from 6931 +20 to same -30 direction -;
```
Note how the last one would be in the opposite orientation relative to the first two.

odd symmetric. There are two instructions per site. The second instruction switches the direction but keeps the 'from' coordinate the same:

```
get from 2718 -20 to same +20 direction +;
get from 2718 +20 to same -20 direction -;

get from 3141 -20 to same +20 direction +;
get from 3141 +20 to same -20 direction -;

get from 6931 -20 to same +20 direction +;
get from 6931 +20 to same -20 direction -;
```

Note how the pattern of the ranges switches between positive and negative.

even symmetric. There are two instructions per site. The second instruction switches the direction AND CHANGES the 'from' coordinate by one base. This follows our convention that the center of symmetry is between bases 0 and 1:
```
get from 2718 -20 to same +20 direction +;
get from 2719 +20 to same -20 direction -;

get from 3141 -20 to same +20 direction +;
get from 3142 +20 to same -20 direction -;

get from 6931 -20 to same +20 direction +;
get from 6932 +20 to same -20 direction -;
```
Note 1: The pattern of the ranges switches between positive and negative.
Note 2: These instructions would make the last base 'ragged' - missing half of the sequences. I'll leave it as an exercise to the reader to create the example which would have smooth edges.

Note: the ranges given above are only examples. We generally take a very large range such as -200 to +200 for our initial analysis to get a feeling for the background noise of the information curve.
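Since the three patterns differ only in the direction - line, they are easy to generate. The hypothetical Python helper below (not one of the Delila programs) reproduces the instructions shown above for a single site. Here `left` and `right` stand for the number of bases taken before and after the site in its own reading direction; the symmetric examples use equal values. The helper does not attempt the smooth-edged exercise.

```python
def site_instructions(coord, symmetry, strand="+", left=20, right=20):
    """Return the Delila 'get' lines for one site (hypothetical helper).

    symmetry: 'asymmetric', 'odd' or 'even'.
    strand is only used for asymmetric sites.
    """
    plus = f"get from {coord} -{left} to same +{right} direction +;"
    if symmetry == "asymmetric":
        if strand == "+":
            return [plus]
        return [f"get from {coord} +{left} to same -{right} direction -;"]
    if symmetry == "odd":
        # same 'from' coordinate, opposite direction
        return [plus, f"get from {coord} +{left} to same -{right} direction -;"]
    if symmetry == "even":
        # the center lies between bases 0 and 1, so the - strand starts one base later
        return [plus, f"get from {coord + 1} +{left} to same -{right} direction -;"]
    raise ValueError(f"unknown symmetry: {symmetry!r}")


for line in site_instructions(2718, "even"):
    print(line)
# get from 2718 -20 to same +20 direction +;
# get from 2719 +20 to same -20 direction -;
```

For the large initial range mentioned above, call it with left=200 and right=200.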

Setting Parameters

Delila has a number of parameters that have preset values which you can change. You can use the word 'default' or 'set' to change them.

Numbering Pieces: The default is to number pieces in the book and to start numbering at 1:

set numbering piece; (* number the pieces *)
set numbering 1;     (* start numbering at 1 *)
set numbering off;   (* turn off or on numbering *)
set numbering all;   (* number all book parts *)

Range: What should Delila do if one requests sequence beyond the end of the piece?

set out-of-range reduce-range; (* reduce to the nearest end *)
set out-of-range halt;         (* stop *)
set out-of-range continue;     (* keep on going *)

Coordinate: Set the zero point of the coordinate system.

set coordinate zero; (* the 'from' base becomes coordinate zero
                            in the resulting book *)

For example, the instruction 'get from 20 -10 to same +10;' results in a new coordinate system that runs from -10 to +10.

set coordinate 5; (* the 'from' base becomes coordinate 5
                         in the resulting book *)
set coordinate normal; (* return to using the original coordinates *)

Doubling: When creating a mutation one often wants the wild-type sequence followed by the mutation for comparison. This parameter is used in conjunction with the 'with' instruction. When it is turned on, the wild-type sequence is given in the book first, followed by the mutated sequence. The lister program takes advantage of this with its 'pagetrigger' doubling parameter for pieces, in which case pairs of sequences are displayed on each page, with the wild type on top and the mutation underneath.
```
set doubling on; (* double pieces *)
```
An example is Medical Applications of Sequence Walkers: ABCR Mutation G863A.
Arrow Length: The length of the arrows that point at mutations can be controlled with this parameter.
```
set arrowlength 1.5; (* Default arrow length is just a triangle *)
```
Examples are shown above.
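Two of these parameters, out-of-range and coordinate, amount to simple arithmetic on coordinates. The Python sketch below models them under stated assumptions, and both function names are hypothetical. `fit_request` covers the reduce-range and halt policies for a piece numbered 1 to 180 like LINEAR; what 'continue' does with the out-of-range part is not spelled out here, so it is left out. `renumber` treats 'set coordinate' as a shift that gives the 'from' base the chosen value, for direction + only, since the tutorial does not show how direction - requests are renumbered.

```python
def fit_request(frm, to, piece_start, piece_end, policy="reduce-range"):
    """Model two 'set out-of-range' policies (hypothetical helper)."""
    if all(piece_start <= c <= piece_end for c in (frm, to)):
        return frm, to
    if policy == "halt":
        raise RuntimeError(f"{frm}..{to} leaves the piece {piece_start}..{piece_end}")
    if policy == "reduce-range":
        # move each coordinate to the nearest end of the piece
        return tuple(max(piece_start, min(c, piece_end)) for c in (frm, to))
    raise ValueError(f"policy {policy!r} is not modelled in this sketch")


def renumber(coord, ref, zero=0):
    """Coordinate after 'set coordinate <zero>' for a direction + request.

    ref is the coordinate written after 'from'; that base becomes <zero>.
    """
    return coord - ref + zero


print(fit_request(170, 190, 1, 180))   # (170, 180)
print(fit_request(10, -5, 1, 180))     # (10, 1)

request = range(20 - 10, 20 + 10 + 1)  # get from 20 -10 to same +10;
new = [renumber(c, 20) for c in request]
print(new[0], new[-1])                 # -10 10
print(renumber(20, 20, zero=5))        # 5 (set coordinate 5;)
```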

Full Definition of Delila

If you would like to know more about the Delila language, then you can look at the LIBrary DEFinition, LIBDEF.

Automatic Generation of Delila Instructions

The delila system has a number of ways to automatically generate delila instructions:

dbinst takes a GenBank flat file and creates instructions for features. It is similar to exon.
exon takes a GenBank flat file and creates instructions for exons. It is similar to dbinst.
search searches a Delila book for patterns and can create instructions (searchinst) to extract the regions around the locations that are found.
scan is part of the individual information theory package. Scan will search a Delila book using an individual information weight matrix and produce Delila instructions (scaninst) for the (putative) sites that are found.
malin takes as input the set of multiple alignments produced by malign and extracts one of the alignments in the form of Delila instructions. To do this you have to start with a set of Delila instructions, create a book, use malign to align the sequences in the book (to maximize the information content), and then use malin.
instshift shifts Delila instructions and changes the ranges. For example, if you have made a sequence logo and have defined the zero coordinate in a location you don't like, instshift can be used to alter the delila instructions to put the zero coordinate in a better place, depending on the binding site symmetry.
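To give a feel for what automatic generation involves, here is a hypothetical Python sketch in the spirit of search and searchinst. It is not their code, and real searchinst output may be formatted differently. It finds every occurrence of a literal pattern on the positive strand, assuming the first character of the string is coordinate 1, and writes one relative request per hit.

```python
import re


def pattern_instructions(seq, pattern, left=20, right=20):
    """Write one relative 'get' request per occurrence of a literal pattern.

    Hypothetical helper: only the + strand is searched, overlapping hits are
    allowed, and seq[0] is taken to be coordinate 1.
    """
    lines = []
    # a zero-width lookahead lets overlapping occurrences be found
    for match in re.finditer(f"(?={re.escape(pattern)})", seq):
        coord = match.start() + 1  # 0-based string index to 1-based coordinate
        lines.append(f"get from {coord} -{left} to same +{right} direction +;")
    return lines


for line in pattern_instructions("gaattcaagaattc", "gaattc", left=5, right=10):
    print(line)
# get from 1 -5 to same +10 direction +;
# get from 9 -5 to same +10 direction +;
```

Writing the requests in the relative 'same' form keeps them easy to shift or widen later.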

Original References:

Schneider TD, Stormo GD, Haemer JS, Gold L. A design for computer nucleic-acid-sequence storage, retrieval, and manipulation.
Nucleic Acids Res. 1982 May 11;10(9):3013-24.
PMID: 7099972
This paper describes the basic Delila system.
Stormo GD, Schneider TD, Gold LM. Characterization of translational initiation sites in E. coli.
Nucleic Acids Res. 1982 May 11;10(9):2971-96.
PMID: 7048258
This paper shows a use of the Delila system.
Stormo GD, Schneider TD, Gold L, Ehrenfeucht A. Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli.
Nucleic Acids Res. 1982 May 11;10(9):2997-3011.
PMID: 7048259
This paper shows a use of the Delila system.
Schneider TD, Stormo GD, Yarus MA, Gold L. Delila system tools.
Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):129-40.
PMID: 6694897
This paper describes additional tools of the Delila system.


Schneider Lab
origin: 1999 May 2
updated: version = 2.05 of delilainstructions.html 2009 Jan 27