Tutorial A: Introduction to phylogenetics

The first part of the tutorial discusses the reconstruction of the evolutionary history of a group of species, depicted in a so-called phylogenetic tree, from a DNA sequence alignment. Besides being of fundamental importance in itself - aiming to estimate, for instance, the ancestry of the human race or to infer the whole tree of life - this methodology has recently become of practical relevance in epidemiology and forensic science. The tutorial will start with a brief discussion of the shortcomings of the 'classical' clustering methods, which will then be contrasted with the newer probabilistic approach. Based on a concrete model of the evolutionary process in terms of a homogeneous Markov chain, a phylogenetic tree can be interpreted as a probabilistic generative model that allows the calculation of the likelihood of the observed DNA sequence alignment. The practical computation draws on well-established algorithms for directed acyclic graphs, which pass 'messages' from the external nodes along the branches and inner nodes down to the root. This, in principle, enables the optimization of both the parameters and the model, that is, the branch lengths and the tree topology, in a maximum likelihood sense. The tutorial discusses the question of statistical significance of the results, and contrasts the two predominant methods of significance estimation: bootstrapping versus the Bayesian approach with Markov chain Monte Carlo.
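
As an illustration of the message-passing computation described above, here is a minimal sketch of the likelihood calculation for a fixed tree. It assumes the Jukes-Cantor substitution model (one simple homogeneous Markov chain, chosen here only for brevity) and a hypothetical tree encoding in which a leaf is a sequence string and an inner node is a tuple of (child, branch_length) pairs. The function names are illustrative and do not come from the tutorial.

```python
import math

BASES = "ACGT"

def jc69_transition(t):
    """Jukes-Cantor probabilities for branch length t (expected substitutions per site).
    Returns (P[same base], P[one specific different base])."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial_likelihood(node, site):
    """Message passed upward from `node`: for each base x, the probability of the
    observed leaves below `node` at this site, given that `node` carries base x."""
    if isinstance(node, str):  # leaf: the observed nucleotide is known
        return [1.0 if node[site] == b else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:  # inner node: combine the messages of all children
        same, diff = jc69_transition(t)
        child_msg = partial_likelihood(child, site)
        total = sum(child_msg)
        for x in range(4):
            # sum over child bases y of P(y | x, t) * child_msg[y]
            result[x] *= same * child_msg[x] + diff * (total - child_msg[x])
    return result

def log_likelihood(tree, n_sites):
    """Log-likelihood of the alignment, assuming independent sites and a
    uniform base distribution at the root."""
    ll = 0.0
    for s in range(n_sites):
        root_msg = partial_likelihood(tree, s)
        ll += math.log(sum(0.25 * v for v in root_msg))
    return ll

# Hypothetical three-taxon example with made-up sequences and branch lengths.
example_tree = (((("ACGT", 0.1), ("ACGA", 0.2)), 0.05), ("AGGT", 0.3))
example_ll = log_likelihood(example_tree, 4)
```

Maximum likelihood estimation would then vary the branch lengths and the topology so as to maximize the value returned by `log_likelihood`; that optimization is not shown here.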

The methods described will be used in the second part of the tutorial.


Tutorial B: Detecting recombination in DNA sequence alignments

The recent emergence of multi-resistant pathogens has led to increased interest in recombination as an important, and previously underestimated, source of genetic diversification in bacteria and viruses. In the second part of the tutorial, I will describe a statistical method for detecting recombination in multiple DNA sequence alignments. This approach is based on the combination of two probabilistic graphical models: (1) a taxon graph (phylogenetic tree) representing the relationships between the taxa, and (2) a site graph (hidden Markov model) representing interactions between different sites in the DNA sequence alignment. I will compare three different parameter estimation techniques and discuss the results obtained on various synthetic and real-world DNA sequence alignments.
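
To make the combination of the two graphs more concrete, the following sketch runs the standard forward algorithm of a hidden Markov model along the alignment. It assumes that the hidden state at each site is one of several candidate tree topologies, and that the likelihood of each alignment column under each topology has already been computed from the corresponding tree. The function name, the single `stay_prob` parameter, and the uniform switching scheme are simplifying assumptions for illustration, not necessarily the parameterization used in the tutorial.

```python
import math

def forward_log_likelihood(site_liks, stay_prob):
    """Forward algorithm with per-site rescaling.

    site_liks[s][k]: likelihood of alignment column s under topology k
                     (assumed to be precomputed; at least two topologies).
    stay_prob:       probability that adjacent sites share the same topology;
                     the remaining mass is split evenly over the other topologies.
    """
    n_states = len(site_liks[0])
    switch = (1.0 - stay_prob) / (n_states - 1)
    # Uniform prior over topologies at the first site.
    alpha = [lik / n_states for lik in site_liks[0]]
    log_lik = 0.0
    for s in range(len(site_liks)):
        if s > 0:
            total = sum(alpha)
            # Propagate one site along the alignment, then weight by the column likelihood.
            alpha = [
                (stay_prob * alpha[k] + switch * (total - alpha[k])) * site_liks[s][k]
                for k in range(n_states)
            ]
        scale = sum(alpha)
        log_lik += math.log(scale)  # accumulate in log space to avoid underflow
        alpha = [a / scale for a in alpha]
    return log_lik

# Made-up column likelihoods for two topologies and two sites.
toy_liks = [[0.2, 0.1], [0.3, 0.4]]
toy_ll = forward_log_likelihood(toy_liks, 0.9)
```

A change in the most probable topology along the alignment would then be read as a candidate recombination breakpoint; recovering that state sequence needs a backward or Viterbi pass, which is omitted here.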

Lecture: Interpreting microarray data and modelling genetic regulatory interactions with Bayesian networks

Molecular pathways consisting of interacting proteins underlie the major functions of living cells. A central goal of molecular biology is therefore to understand the regulatory mechanisms that govern protein synthesis and activity.

While traditional methods in molecular biology could only report the expression levels of single genes, microarrays measure the abundance of thousands of mRNA targets simultaneously. This provides rich new data for understanding gene expression and regulation.

In my talk, I will start with a concise yet self-contained introduction to probabilistic modelling with Bayesian networks. I will then show how these models can be applied to the analysis of microarray experiments to infer gene regulatory interactions: groups of genes that correlation analysis alone would simply cluster together can be organized into clear functional subnetworks. These subnetworks provide a much richer context for regulatory and functional analysis and help us to understand the roles of genes and to assign them putative novel functions.


Last update: 24 July 2002.