SIRENS

SImulating REcombination in Nucleotide Sequences

© Copyright 2002, Dirk Husmeier, Biomathematics and Statistics Scotland (BioSS)


Preliminaries

SIRENS is a free program written in MATLAB for simulating recombination in DNA sequence alignments of four sequences. Recall that for four taxa, you have three different tree topologies:

[Figure: Tree topologies]

State_1 is the "true" topology, which applies to those parts of the alignment that are not subject to recombination. The four sequences are evolved along the interior branch and then along the exterior branches, using the Kimura model of nucleotide substitution, until the exterior branches have reached RLa times their final length. (Here: RLa=0.25).
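The substitution step can be illustrated with a minimal sketch. It is written in Python rather than MATLAB and is not taken from the SIRENS source. It assumes that "Kimura model" means the two-parameter (K80) model; the transition-transversion rate ratio kappa, its default value of 2.0, and the function names are hypothetical.

```python
import math
import random

# Hypothetical sketch, not the SIRENS code: assumes the Kimura
# two-parameter (K80) model with transition/transversion rate ratio kappa.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k80_probabilities(d, kappa=2.0):
    """Return (p_same, p_transition, p_each_transversion) for a branch of
    length d, measured in expected substitutions per site."""
    e1 = math.exp(-4.0 * d / (kappa + 2.0))
    e2 = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_transition = 0.25 + 0.25 * e1 - 0.5 * e2
    p_transversion = 0.25 - 0.25 * e1  # applies to each of the two transversions
    return p_same, p_transition, p_transversion


def evolve(seq, d, rng, kappa=2.0):
    """Evolve every site of seq independently along a branch of length d."""
    p_same, p_ts, p_tv = k80_probabilities(d, kappa)
    out = []
    for base in seq:
        u = rng.random()
        if u < p_same:
            out.append(base)                    # no change
        elif u < p_same + p_ts:
            out.append(TRANSITION[base])        # A<->G or C<->T
        elif u < p_same + p_ts + p_tv:
            out.append(TRANSVERSIONS[base][0])  # first transversion target
        else:
            out.append(TRANSVERSIONS[base][1])  # second transversion target
    return "".join(out)
```

For example, evolve("ACGT" * 250, 0.05, random.Random(17)) returns a mutated copy of a 1000-site sequence; the seed and branch length are illustrative values only.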

At this point the subsequence between n1 and n1+nlengthA-1 in Strain_3 is replaced by the corresponding subsequence in Strain_1.

The sequences then continue to evolve along the exterior branches until their length is RLb (> RLa) times the final exterior branch length. (Here: RLb=0.75).

This is followed by a second recombination event, where the subsequence between n2 and n2+nlengthB-1 in Strain_2 replaces the corresponding subsequence in Strain_3.

The sequences then continue to evolve along the exterior branches for the remaining length.
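The staging described above can be sketched in a few lines. This is an illustrative Python sketch, not the SIRENS implementation: the function names are hypothetical, and the check 0 < RLa < RLb < 1 is an assumption (the text only states RLb > RLa).

```python
def stage_lengths(exterior_length, rla=0.25, rlb=0.75):
    """Split an exterior branch into the three segments evolved before,
    between, and after the two recombination events."""
    # Assumed constraint; the source only states RLb > RLa.
    if not 0.0 < rla < rlb < 1.0:
        raise ValueError("need 0 < RLa < RLb < 1")
    return (rla * exterior_length,
            (rlb - rla) * exterior_length,
            (1.0 - rlb) * exterior_length)


def splice(recipient, donor, start, length):
    """Return recipient with positions start .. start+length-1
    (1-based, inclusive) replaced by the same positions of donor."""
    lo = start - 1           # convert to a 0-based slice start
    hi = start - 1 + length  # exclusive slice end
    return recipient[:lo] + donor[lo:hi] + recipient[hi:]
```

With the default values, an exterior branch is evolved in segments of 25%, 50%, and 25% of its final length. In this notation the first event corresponds to splice(strain3, strain1, n1, nlengthA) and the second to splice(strain3, strain2, n2, nlengthB).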

In the main part of the alignment, Strain_3 is most closely related to Strain_4. However, in the region between n1 and n1+nlengthA-1 (here: 201-400) it is most closely related to Strain_1, and in the region between n2 and n2+nlengthB-1 (here: 601-800) it is most closely related to Strain_2.

Thus, the first, more ancient, recombination event corresponds to a transition from State_1 into State_2. The second, more recent, recombination event corresponds to a transition from State_1 into State_3. This simulates a realistic scenario where an ancestor of Strain_3 incorporates genetic material from ancestors of other extant strains, which in each case is followed by subsequent evolution.
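The resulting mapping from alignment position to topology can be written down directly. The following Python sketch is illustrative and not part of SIRENS; the function name is hypothetical, and the defaults nlengthA=200 and nlengthB=200 are inferred from the example regions 201-400 and 601-800. It assumes the two regions do not overlap.

```python
def topology_state(site, n1=201, nlength_a=200, n2=601, nlength_b=200):
    """Return the topology state (1, 2 or 3) at a 1-based alignment position."""
    if n1 <= site <= n1 + nlength_a - 1:
        return 2  # first recombinant region: Strain_3 groups with Strain_1
    if n2 <= site <= n2 + nlength_b - 1:
        return 3  # second recombinant region: Strain_3 groups with Strain_2
    return 1      # elsewhere: Strain_3 groups with Strain_4
```

Such a per-site state sequence is the kind of reference against which a recombination detection method can be compared on the simulated alignment.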


Download the software

To download the software, click here. This gives you a gzipped TAR file called Sirens.tar.gz. To extract the files under UNIX, proceed as follows:

gunzip Sirens.tar.gz
tar xvf Sirens.tar

You can now delete the TAR file:

rm Sirens.tar


Program files

The main function is Sirens.m, which calls a number of auxiliary functions.

Information about each function is obtained by typing help plus the function name (without the extension ".m") at the MATLAB prompt.


Run the program

Here are some examples of how to run the program. For further details, type

help Sirens

at the MATLAB prompt.

Sirens

This runs the program with the default options.

Sirens(17)

This runs the program with the default options, but with the random number generator seed (17) set by the user.

Sirens(17,0.05)

Here, the user has not only set the random number generator seed (17), but also the unit branch length (0.05) of the phylogenetic tree.


Output

Use the program job_write_out_ascii_data.m to write the DNA sequence alignment out as ASCII files in FastA and RecPars formats. The respective file names are dna.fastA and dna.recPars. The FastA file can easily be converted into any other format with the program readseq. For example, to convert the file into interleaved PHYLIP format, give the command (in UNIX):

readseq -f=Phylip dna.fastA -out=dna.phy -a

To convert the file into sequential PHYLIP format (needed for BARCE), type (in UNIX):

readseq -f=Phylip3.2 dna.fastA -out=stercus -a
sed 's/ YF//g' stercus > dna.phy

In both cases, the name of your new file is dna.phy.
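For orientation, the two text layouts involved can be sketched as follows. This is an illustrative Python sketch of generic FASTA and sequential PHYLIP layouts with hypothetical function names; it is not the code of job_write_out_ascii_data.m or readseq, and whether BARCE accepts exactly this layout has not been verified here.

```python
def to_fasta(seqs):
    """seqs maps a sequence name to its sequence string.
    FASTA: a '>' header line followed by the sequence, for each entry."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


def to_sequential_phylip(seqs):
    """Sequential PHYLIP: a header with the number of sequences and sites,
    then one line per sequence with the name padded to 10 characters."""
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    lines = [f" {len(seqs)} {lengths.pop()}"]
    for name, seq in seqs.items():
        lines.append(f"{name[:10]:<10}{seq}")  # truncate or pad name to 10
    return "\n".join(lines) + "\n"
```

In the sequential layout each sequence is written in full before the next one starts, whereas the interleaved layout alternates blocks of all sequences.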


Last modified: February 2002