SERAD is a free software package for the detection of recombination in multiple DNA sequence alignments with hidden Markov models. The algorithm is described in the following paper:
Note: Click here for a more recent software implementation of a related algorithm, which has recently been incorporated in the more user-friendly softeware package TOPALi.
Legal Agreement
The SERAD software package is provided without guarantee
of maintenance or support, and without warranty.
The copyright holder is not
liable for any damages which may result in
any manner from the use of this software.
Permission is granted to copy and use this software package
for personal academic use.
Downloading and Installing the Software
The programs and documentation files are contained
in a zipped tar file, called SERAD.tar.gz.
To extract them under UNIX,
give the following commands:
% gunzip SERAD.tar.gz
% tar xvf SERAD.tar
If you use WINDOWS,
use a program like WINZIP. Decide on a name for the
directory in which you want to keep the new files,
and add this name to your MATLAB path. For example,
if you use UNIX, put the following
command into your .login file:
setenv MATLABPATH new_path
where new_path
is the name of the directory into which you have copied
the program files.
Running the Programs
To run the software, carry out the following two steps:
Data
The following data files are included in the package:
| DataNeisseria.dat | Ascii file with a 787 base-pair alignment of four strains of Neisseria |
| make_data.m | MATLAB program for generating the synthetic data. This gives the three files listed below. |
| Data.mat | Synthetic DNA sequence alignments |
| Data_true_states.mat | True states of the synthetic DNA sequence alignments |
| Data_true_w.mat | True branch lengths for the synthetic DNA sequence alignments |
Your own data must be in file mydata.dat.
Overview of the Software Package and the Program Files
The programs are written in MATLAB 5, making use of the
object-oriented elements of the language.
Functions belonging to a given class are kept
in a directory @Object_Name .
There are currently three such classes.
@Tree4 is the class of 4-species trees.
@Phylo is the class of emission probabilities
of the HMM, which is the set of all possible
@Tree4-classes (note that for 4-species trees there are
3 different topologies). @EmissionModel is an
alternative class of emission probabilities, simulating
a fair and a loaded die. To find out about the methods
of each class, give the command
methods Object_Name. The backbone of the HMM
has (unfortunately!) not been programmed as an object.
Programs acting on the HMM backbone, like those
carrying out the forward-backward algorithm and the
Viterbi path, make extensive use of
global variables
(follow this link to find a complete list of all the global
variables used). It would have been better to program the
HMM backbone as an object, as
this would make the code more
modular and would make it easier to add new
algorithms or
to combine algorithms in novel ways.
For this reason you will find some functions that require
a global variable to be passed to it
in the argument list. This is a first step towards
rewriting part of the code, should I ever have the time to
do it ...
For finding out more about the functions,
make use of the help
command.
Future Work
The main shortcoming of the program package in its current
version is that the
branch-length optimization is slow, much slower than,
for instance, with DNAML of the
PHYLIP
package.
This has two reasons. First, I wrote the programs as a
model exploration tool and was mainly interested in a
clear and transparent program code.
I have not made any attempts to optimize the programs
with respect to execution time, and I assume a lot
could be gained with a better vectorization of the loops.
Second, the iterative optimization of the branch lengths
follows a simple
gradient ascent scheme.
I did not find any significant improvements by
using second-order methods .
However, I think that a considerable speed-up can be
achieved by incorporating DNAML of the
PHYLIP
package in my programs.
The only modification required would be a small
extension of DNAML so as to include a weighting
scheme for the different sites in the alignment.
I would be grateful for any comments if you
know of a way how this can easily be accomplished.
Back to my homepage