Statistical Bioinformatics

Statistical analysis of molecular sequence alignments using web services and cluster computing

A growing number of biological questions can be tackled by aligning homologous regions of DNA from different organisms, or from related genes within the same organism. We have extended our TOPALi Java application so that several statistical analyses of multiple alignment data can be launched from the user's desktop. The analyses run as “web services” on powerful remote computer clusters, while the remote job is monitored and the results are displayed locally. Some features of TOPALi v2 are described below.

Recombination breakpoint location estimation

DNA sequences can recombine during evolution. The resulting recombinant sequence comprises regions, separated by recombination breakpoints, that have different evolutionary histories. Testing for breakpoints first is crucial because many subsequent analyses assume no recombination. Our methods use a parametric bootstrapping approach to assess statistical significance; because the bootstrap replicates can be analysed independently of one another, they make efficient use of cluster computing resources.
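
To illustrate the general idea of parametric bootstrapping, here is a minimal Python sketch. It is not the method implemented in TOPALi: the test statistic (the difference in mismatch proportion either side of a candidate breakpoint between two aligned sequences), the null model (a single mismatch rate for all sites) and all function names are simplifying assumptions made for this example only.

```python
import random


def mismatch_flags(seq_a, seq_b):
    """Mark each aligned site with 1 if the two sequences differ, else 0."""
    return [1 if a != b else 0 for a, b in zip(seq_a, seq_b)]


def split_statistic(flags, split):
    """Absolute difference in mismatch proportion either side of `split`."""
    left, right = flags[:split], flags[split:]
    return abs(sum(left) / len(left) - sum(right) / len(right))


def parametric_bootstrap_pvalue(flags, split, n_replicates=1000, seed=0):
    """Estimate a p-value by simulating data sets from the fitted null model."""
    rng = random.Random(seed)
    observed = split_statistic(flags, split)
    # Null model (an assumption of this sketch): one mismatch rate for all sites.
    p_hat = sum(flags) / len(flags)
    exceed = 0
    for _ in range(n_replicates):
        # Each replicate is independent, so this loop could be split across nodes.
        simulated = [1 if rng.random() < p_hat else 0 for _ in flags]
        if split_statistic(simulated, split) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_replicates + 1)
```

The loop over replicates is the expensive part in a real analysis, where each replicate would involve simulating and re-analysing a full alignment. Since no replicate depends on another, the work divides naturally across the nodes of a cluster.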

Model selection, tree and ancestral sequence estimation

Screenshot of our TOPALi software.

Model-based methods to construct phylogenetic trees require parameters in the evolutionary model to be optimised prior to tree estimation. TOPALi v2 has a model selection web service (ModelGenerator software) which ranks substitution models (88 models for proteins or 55 for DNA) according to statistical criteria.
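
As an illustration of ranking models by statistical criteria, the Python sketch below scores a few candidate models with two widely used criteria, AIC and BIC. The choice of these two criteria, the function name, the use of alignment length as the BIC sample size and the log-likelihood values are all assumptions made for the example; they are not taken from ModelGenerator. Parameter counts cover the substitution model only, since branch lengths are shared by every candidate and do not affect the ranking.

```python
import math


def rank_models(fits, n_sites):
    """Rank fitted models by AIC and by BIC (lower scores are better).

    `fits` maps a model name to (log_likelihood, n_free_parameters).
    """
    scores = {}
    for name, (log_lik, k) in fits.items():
        aic = 2 * k - 2 * log_lik
        bic = k * math.log(n_sites) - 2 * log_lik
        scores[name] = (aic, bic)
    by_aic = sorted(scores, key=lambda name: scores[name][0])
    by_bic = sorted(scores, key=lambda name: scores[name][1])
    return by_aic, by_bic, scores


# Illustrative values only, not results from any real alignment.
example_fits = {
    "JC": (-2100.0, 0),
    "HKY": (-2050.0, 4),
    "GTR": (-2048.0, 8),
}
by_aic, by_bic, scores = rank_models(example_fits, n_sites=500)
print(by_aic)
print(by_bic)
```

Both criteria trade goodness of fit against the number of parameters, so a richer model is only preferred when its gain in log-likelihood outweighs the penalty.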

Tree estimation web services include implementations of Maximum Likelihood (PhyML software) and Bayesian Inference (MrBayes software) methods. Ancestral sequences are predicted using a FASTML web service.

Positive selection analysis

TOPALi v2 has a “branch model” web service (using PAML software) to test for differences in evolutionary rates among groups of sequences (e.g. after a past gene duplication event). It also has a “sites model” web service (also using PAML) to test for sites that are evolving faster than expected under the neutral model and which may therefore be functionally important.
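
Tests of this kind are commonly carried out as likelihood ratio tests between a simpler (null) model and a more general nested model. The Python sketch below shows that calculation under stated assumptions: the function name is hypothetical, the chi-square approximation to the null distribution is assumed to hold, and only one or two degrees of freedom are handled so that the example needs nothing beyond the standard library. It does not reproduce how TOPALi or PAML report their results.

```python
import math


def lrt_pvalue(log_lik_null, log_lik_alt, df):
    """Likelihood ratio test p-value for nested models (df of 1 or 2 only)."""
    # A slightly negative statistic can arise from imperfect optimisation.
    statistic = max(0.0, 2.0 * (log_lik_alt - log_lik_null))
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")
```

For example, `lrt_pvalue(-1000.0, -995.0, 2)` compares two fits whose log-likelihoods differ by 5 units, giving a test statistic of 10 on two degrees of freedom.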

Further details from: Frank Wright
