Biomathematics & Statistics Scotland

KNOWLEDGE TRANSFER: User Friendly Software

Algorithms developed as part of our research and consultancy programme can be made available as stand-alone programs or as Web services, enabling them to be widely used.

 

TOPALi v2

Conventional phylogenetic tree estimation methods, applied to DNA multiple alignments, assume that all sites have the same evolutionary history. This assumption is violated if recombination has occurred among any of the sequences. Recombination produces mosaic sequences, which may cause errors in phylogenetic tree estimation. If recombination is possible, a check for mosaic sequences is essential prior to phylogenetic analysis.

Our software, TOPALi, provides an interface to three methods of recombination detection (Difference of Sums of Squares, Probabilistic Divergence Measures, and a Hidden Markov Model) that look for changes in phylogenetic tree topology as a window is moved along an alignment. These methods differ in the number of sequences that can be analysed and in their computational speed. TOPALi provides a complete graphical analysis tool for detecting recombinants in DNA multiple alignments. All tasks can be automated, requiring minimal user intervention.
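The moving-window idea can be illustrated with a minimal Python sketch. It is a deliberately simplified stand-in and not any of the three methods TOPALi implements: instead of fitting trees, it compares the pairwise distance matrices of the left and right halves of each window and reports the sum of squared differences. The function names (p_distance, distance_matrix, window_scores) and the use of plain proportion-of-differences distances are assumptions made for illustration only.

```python
def p_distance(a, b):
    """Proportion of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs, start, end):
    """Pairwise p-distances over alignment columns start..end-1."""
    cols = [s[start:end] for s in seqs]
    n = len(cols)
    return [[p_distance(cols[i], cols[j]) for j in range(n)] for i in range(n)]


def window_scores(seqs, window, step=1):
    """Slide a window along the alignment.

    For each position, return (midpoint, score), where score is the sum of
    squared differences between the distance matrices of the two window
    halves. A peak suggests that the relationships among the sequences
    change near that midpoint (hypothetical, simplified criterion).
    """
    half = window // 2
    length = len(seqs[0])
    scores = []
    for start in range(0, length - 2 * half + 1, step):
        mid = start + half
        left = distance_matrix(seqs, start, mid)
        right = distance_matrix(seqs, mid, mid + half)
        score = sum(
            (l - r) ** 2
            for left_row, right_row in zip(left, right)
            for l, r in zip(left_row, right_row)
        )
        scores.append((mid, score))
    return scores


# Toy alignment: the second sequence is a mosaic that matches the first
# sequence over columns 0-7 and the third sequence over columns 8-15.
alignment = [
    "AAAAAAAA" + "AAAAAAAA",
    "AAAAAAAA" + "CCCCCCCC",
    "CCCCCCCC" + "CCCCCCCC",
]
scores = window_scores(alignment, window=8)
best_midpoint, best_score = max(scores, key=lambda item: item[1])
```

In this toy alignment the score peaks where the window midpoint coincides with the breakpoint of the mosaic sequence. A real analysis would use model-based distances and a significance assessment, which this sketch omits.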

Recent development of TOPALi (in collaboration with the University of Dundee and the European Bioinformatics Institute in Cambridge) has concentrated on increasing its efficiency by programming many of its analyses for parallel computation, and by accessing and running them via one or more high-performance computing clusters. As part of this work, new ways of interacting with the statistical programs running remotely have been designed, primarily through the use of web services. The ultimate goal is for the user to benefit from accessing distributed, high-performance facilities (such as the 28-CPU cluster at SCRI) from their normal desktop environment via a seamless, high-quality graphical interface.

TetraploidMap

The TetraploidMap program has been developed for calculating linkage maps and mapping quantitative trait loci (QTLs) in species with four copies of each chromosome, such as potato. The program separates molecular markers into linkage groups, orders the markers within each linkage group, and performs interval mapping to locate QTLs. TetraploidMap is based on methodology developed by BioSS in collaboration with SCRI scientists since 1996, partly funded by a grant from the BBSRC GAIT initiative. It is the only autotetraploid linkage software currently available, and is being used by groups in Europe and North and South America working on potato, leek and alfalfa.

TetraploidMap can be used to locate QTLs in tetraploid crops such as potato.

Imagin

Image of parsnip roots after annotation.

Measuring plant part characteristics by image analysis has several advantages over traditional manual methods: it is potentially quicker and more accurate, and it allows objective measurement of features that are otherwise difficult to quantify. A permanent record of the appearance of each sample can also be kept for future reference. BioSS, in collaboration with SASA, has built an easy-to-use software tool, Imagin, to make these image analysis methods routinely available to scientists involved in plant breeding and crop variety registration.

SEGS - segmentation of genomic sequences

The successful annotation of a genomic sequence (consisting of the four bases A, C, G and T) requires a wide range of techniques, and many methods exist to assist in this process. One feature of interest is the G+C content (the proportion of bases that are either C or G), which in many organisms is known to be higher in regions that code for proteins than in the rest of the sequence. We are developing a new program, SEGS, to assist in segmenting sequences of different lengths according to their G+C content. The program incorporates a flexible algorithm based on the cumulative sums (cusums) of G+C content. SEGS produces graphical output of a sequence with appropriate segment boundaries marked. The statistical significance of observed changes in G+C content is assessed using a Kolmogorov-Smirnov type statistic.

Cusum plot showing segments of differing G+C content, based on moving windows of 5% of the sequence length.
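As an illustration of the cusum idea, the Python sketch below computes a mean-centred cumulative sum of G+C indicators and takes the position of its largest absolute excursion as a single candidate segment boundary. This is a generic textbook-style cusum under stated assumptions, not the SEGS algorithm itself: the function names (gc_cusum, candidate_boundary) are hypothetical, only one boundary is sought, and neither the moving windows nor the Kolmogorov-Smirnov type significance test is reproduced.

```python
from itertools import accumulate


def gc_cusum(seq):
    """Cumulative sum of G+C indicators, centred on the overall G+C proportion.

    The cusum falls through regions with below-average G+C content and
    rises through regions with above-average G+C content, so turning
    points mark candidate segment boundaries.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    flags = [1 if base in "GC" else 0 for base in seq.upper()]
    mean = sum(flags) / len(flags)
    return list(accumulate(flag - mean for flag in flags))


def candidate_boundary(seq):
    """Return (n_bases_before_boundary, max_abs_cusum) for one candidate boundary."""
    cusum = gc_cusum(seq)
    k = max(range(len(cusum)), key=lambda i: abs(cusum[i]))
    return k + 1, abs(cusum[k])


# Toy sequence: ten A/T-only bases followed by ten G/C-only bases.
position, deviation = candidate_boundary("ATATATATAT" + "GCGCGCGCGC")
```

For this toy sequence the largest excursion occurs after the tenth base, where the composition switches. Finding several boundaries would require applying a procedure like this repeatedly within segments, together with a significance test to decide when to stop.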

STAR: Sheep Tomogram Analysis Routines

Computed tomography (CT) scans are used to aid sheep breeding programmes by identifying the best stock. Based on the scans, the proportions of different tissue types in cross-sections of live sheep can be estimated. An interactive, Windows-based package called STAR has been developed to automate this process, incorporating the dynamic programming methods developed in our research programme. The latest techniques for 3-D scans are now being added.

STAR is used at the SAC-BioSS CT unit, which offers a scanning service for sheep breeders.
