We have continued the development of the TOPALi software package for the statistical analysis of DNA and protein multiple sequence alignment data, with particular emphasis on improved model selection and phylogenetic tree estimation for protein-coding DNA. Modern statistical methods for estimating phylogenetic trees from molecular sequence data perform better when the assumed model of evolution is well suited to the data. We have implemented an improved model selection protocol in TOPALi that greatly simplifies the procedure for biologists, especially for the analysis of protein-coding DNA. The software automatically fits eighty-eight evolutionary models and displays the results graphically, with a suggested choice based on three statistical criteria. Unlike existing phylogenetic model selection approaches, ours jointly estimates a tree for each model, which improves the estimates of the likelihood and of quantities derived from it, such as the Akaike and Bayesian information criteria (AIC, BIC).
The graphical display shows the magnitude of the parameter values and also the estimated tree, allowing the user to see the influence of model choice on the tree topology. The user can accept the choice or choose an alternative before proceeding to full tree estimation using Bayesian or Maximum Likelihood approaches.
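
As an illustration of how likelihood-based criteria such as AIC and BIC can rank candidate models, the following minimal Python sketch applies the standard textbook formulas, AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL. It is not TOPALi's implementation. The function names, the three example models, their log-likelihoods and parameter counts, and the alignment length are all invented for illustration, and the third criterion mentioned above is not named in this summary, so it is omitted. The parameter counts are assumed to include branch lengths, because a tree is estimated jointly with each model.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def rank_models(fits, n_sites):
    """Score every fitted model and return the records sorted by BIC."""
    scored = [
        {"model": name, "aic": aic(lnl, k), "bic": bic(lnl, k, n_sites)}
        for name, (lnl, k) in fits.items()
    ]
    return sorted(scored, key=lambda record: record["bic"])


# Hypothetical fits: model name -> (maximised log-likelihood, free parameters).
# Counts assume 17 branch lengths (a 10-taxon unrooted tree) plus the
# substitution-model parameters; the values are made up, not TOPALi output.
fits = {
    "JC": (-5230.4, 17),
    "HKY": (-5102.9, 21),
    "GTR+G": (-5090.0, 26),
}

ranking = rank_models(fits, n_sites=1200)  # hypothetical alignment length
best_by_bic = ranking[0]["model"]
best_by_aic = min(ranking, key=lambda record: record["aic"])["model"]

for record in ranking:
    print(f'{record["model"]:6s} AIC={record["aic"]:.1f} BIC={record["bic"]:.1f}')
print("best by AIC:", best_by_aic, "| best by BIC:", best_by_bic)
```

With these invented numbers the two criteria disagree: AIC prefers the richer model, while BIC, which penalises each extra parameter more heavily at this alignment length, prefers the simpler one. Disagreements of this kind are one reason to show several criteria side by side and leave the final choice to the user.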
Further details from: Frank Wright