Dirk Husmeier - Research


Systems Biology

(Biomathematics and Statistics Scotland, since 2002)

Inferring genetic networks from microarray gene expression data.

Bioinformatics

(Biomathematics and Statistics Scotland, since 1999)

I have recently moved into bioinformatics, where my main interests are the development of statistical methods for analysing DNA sequences and the application of machine learning techniques in phylogenetics. The objective of phylogenetics is to reconstruct the evolutionary history of species, expressed in a so-called phylogenetic tree, from a DNA sequence alignment. Besides being of fundamental importance in itself (aiming to estimate, for instance, the ancestry of the human race or to derive the whole tree of life), this methodology has recently become of immense practical relevance in epidemiology (suggesting, e.g., cross-infection between humans and apes in the emergence of AIDS) and forensic science (e.g., proving that a dentist in Florida infected several of his patients with HIV). Evolution is driven by stochastic forces that act on genomes, and phylogenetics essentially tries to discern significant similarities between diverged sequences amidst a chaos of random mutation, natural selection, and genetic drift. Faced with a poor signal-to-noise ratio, the most powerful methods make use of probability theory.

I am currently working on a project to detect sporadic recombination in multiple DNA sequence alignments. Conventional phylogenetic tree estimation methods assume that all sites in a multiple DNA sequence alignment have the same evolutionary history. This is a reasonable assumption for DNA sequences obtained from most species. However, it is violated in certain bacteria and viruses due to sporadic recombination, a process that transfers DNA subsequences between different strains. The resulting mixing of genetic material, which produces so-called mosaic sequences, is an important source of genetic variation and a mechanism through which many disease-causing bacteria may acquire resistance to antibiotics.

The detection of recombination is known to be important in its own right for many medical applications: HIV-1, for instance, shows a high recombination frequency, and the existence of mosaic sequences needs to be considered during the design of a potential vaccine. It is also a crucial prerequisite for consistently inferring the evolutionary history of a set of DNA sequences.
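The idea of a mosaic sequence can be illustrated with a toy sketch. This is not the probabilistic method developed in the project, only a naive sliding-window comparison, and the strain names, sequences, and function names are hypothetical. For each window of the alignment, it reports which reference strain is closest to the query in Hamming distance; a change of the closest strain along the alignment hints at a possible recombination breakpoint.

```python
def hamming(a, b):
    """Number of mismatching sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def closest_reference_by_window(query, references, window, step):
    """For each window, return (start, name of the reference closest to the query)."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        distances = {
            name: hamming(query[start:end], seq[start:end])
            for name, seq in references.items()
        }
        results.append((start, min(distances, key=distances.get)))
    return results


# Hypothetical toy alignment: the query copies strain_A on the left half
# and strain_B on the right half, i.e. it is a mosaic of the two.
references = {
    "strain_A": "AAAAAAAAAAAAAAAAAAAA",
    "strain_B": "CCCCCCCCCCCCCCCCCCCC",
}
mosaic = "AAAAAAAAAACCCCCCCCCC"

windows = closest_reference_by_window(mosaic, references, window=10, step=10)
print(windows)
```

On real data the signal is far noisier than in this toy case, which is why a probabilistic treatment is needed rather than a raw distance count.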

Bayesian Machine Learning and Medical Applications

(Imperial College London, 1997-1999)

Recently there has been much interest in the use of Bayesian methods in problems of machine learning and inference, particularly in combination with powerful nonlinear function approximators such as neural networks. Since nonlinear models induce complicated probability densities, approximations become necessary, and it is not clear how much of the elegance of the Bayesian framework is lost in the presence of these approximations. In a recent empirical study I carried out an extensive evaluation on a set of benchmark classification problems. The objective was to study the sensitivity of the Bayesian scheme to changes in the prior distribution of the parameters and hyperparameters, and to evaluate the efficiency of the so-called automatic relevance determination (ARD) method.

On the practical side, I am applying Bayesian neural networks to predict the development and progression of Kaposi's sarcoma (KS). This is a joint project between the Department of Electrical and Electronic Engineering, Imperial College, and the Department of Genito-urinary Medicine, St. Mary's Hospital. KS is a vascular tumour that is more common and often aggressive in patients with underlying immunosuppression (post-transplant KS and AIDS-associated KS). The aim is to determine, by multi-variable analysis, the factors that influence the variable progression rate of KS in HIV-infected individuals, in order to define clinical end-points and provide guidelines for better patient management. To this end I apply the ARD method for Bayesian neural networks as well as the determination of vertices on receiver operating characteristic (ROC) curves.
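As a generic illustration of ROC vertices (not the analysis used in the KS project), the following sketch computes the points of an empirical ROC curve from classifier scores. The scores and labels are made-up toy values, the function names are hypothetical, and the code assumes that all scores are distinct, so that lowering the decision threshold admits one case at a time and each case contributes one vertex.

```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) vertices of the ROC curve.

    Assumes binary labels (1 = positive, 0 = negative) and distinct scores.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    # Sweep the threshold from the highest score downwards.
    for _, label in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up toy data: predicted probabilities and true class labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

vertices = roc_points(scores, labels)
auc = area_under_curve(vertices)
print(vertices, auc)
```

Each vertex corresponds to one choice of decision threshold, so choosing a vertex amounts to choosing a trade-off between sensitivity and the false positive rate.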

Neural Computation

(King's College London, 1994-1997)

My research during my PhD studies at King's College London focused on time series prediction and the estimation of conditional probability densities with neural networks. An overview of this work can be found in the synopsis of my recently published book:
Conventional applications of neural networks usually predict a single value as a function of given inputs. In forecasting, for example, a standard objective is to predict the future value of some entity of interest on the basis of a time series of past measurements or observations. Typical training schemes aim to minimise the sum of squared deviations between predicted and actual values (the 'targets'), by which, ideally, the network learns the conditional mean of the target given the input. If the underlying conditional distribution is Gaussian or at least unimodal, this may be a satisfactory approach. However, for a multimodal distribution, the conditional mean does not capture the relevant features of the system, and the prediction performance will, in general, be very poor. This calls for a more powerful and sophisticated model, which can learn the whole conditional probability distribution. Chapter 1 demonstrates that even for a deterministic system and 'benign' Gaussian observational noise, the conditional distribution of a future observation, conditional on a set of past observations, can become strongly skewed and multimodal. In Chapter 2, a general neural network structure for modelling conditional probability densities is derived, and it is shown that a universal approximator for this extended task requires at least two hidden layers. A training scheme is developed from a maximum likelihood approach in Chapter 3, and the performance of this method is demonstrated on three stochastic time series in Chapters 4 and 5. Several extensions of this basic paradigm are studied in the following chapters, aiming at both an increased training speed and a better generalisation performance.
Chapter 7 shows that a straightforward application of the Expectation Maximisation (EM) algorithm does not lead to any improvement in the training scheme, but that in combination with the random vector functional link (RVFL) net approach, reviewed in Chapter 6, the training process can be accelerated by about two orders of magnitude. An empirical corroboration for this 'speed-up' can be found in Chapter 8. Chapter 9 discusses a simple Bayesian approach to network training, where a conjugate prior distribution on the network parameters naturally results in a penalty term for regularisation. However, the hyperparameters still need to be set by intuition or cross-validation, so a natural extension is presented in Chapters 10 and 11, where the Bayesian evidence scheme, introduced to the neural network community by MacKay for regularisation and model selection in the simple case of Gaussian homoscedastic noise, is generalised to arbitrary conditional probability densities. The Hessian matrix of the error function is calculated with an extended version of the EM algorithm. The resulting update equations for the hyperparameters and the expression for the model evidence are found to reduce to MacKay's results in the above limit of Gaussian noise and thus provide a consistent generalisation of these earlier results. An empirical test of the evidence-based regularisation scheme, presented in Chapter 12, confirms that the problem of overfitting can be considerably reduced, and that the training process is stabilised with respect to changes in the length of training time. A further improvement of the generalisation performance can be achieved by employing network committees, for which two weighting schemes (based on either the evidence or the cross-validation performance) are derived in Chapter 13.
Chapters 14 and 16 report the results of extensive simulations on a synthetic and a real-world problem, where the intriguing observation is made that in network committees, overfitting of the individual models can be useful and may lead to better prediction results than obtained with an ensemble of properly regularised networks. An explanation for this curiosity can be given in terms of a modified bias-variance dilemma, as expounded in Chapter 13. The subject of Chapter 15 is the problem of feature selection and the identification of irrelevant inputs. To this end, the automatic relevance determination (ARD) scheme of MacKay and Neal is adapted to learning in committees of probability-predicting RVFL networks. This method is applied in Chapter 16 to a real-world benchmark problem, where the objective is the prediction of housing prices in the Boston metropolitan area on the basis of various socio-economic explanatory variables. The book concludes in Chapter 17 with a brief summary.
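The point about multimodal distributions can be made concrete with a small numerical sketch. It is not taken from the book: the toy targets are invented, the function names are hypothetical, and the mixture parameters are set by hand rather than learned by a network. The sketch compares a single Gaussian centred on the conditional mean (the quantity a least-squares fit aims for) with a two-component Gaussian mixture, for targets that fall into two clusters at one fixed input.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a univariate Gaussian."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, stds):
    """Density of a Gaussian mixture with the given component parameters."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, stds))


def mean_log_likelihood(targets, pdf):
    """Average log-density of the targets under the model `pdf`."""
    return sum(math.log(pdf(y)) for y in targets) / len(targets)


# Invented targets observed for one fixed input: two clusters near -1 and +1.
targets = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1]

# Single Gaussian: the mean is the least-squares prediction.
mean = sum(targets) / len(targets)
std = math.sqrt(sum((y - mean) ** 2 for y in targets) / len(targets))
single = mean_log_likelihood(targets, lambda y: gaussian_pdf(y, mean, std))

# Two-component mixture with hand-set (not learned) parameters.
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1]
mixture = mean_log_likelihood(
    targets, lambda y: mixture_pdf(y, weights, means, stds)
)

# The conditional mean lies between the clusters, where targets almost never occur.
density_at_mean = mixture_pdf(mean, weights, means, stds)
print(single, mixture, density_at_mean)
```

The mixture assigns a much higher likelihood to the data, while the least-squares prediction sits in a region that the mixture regards as practically impossible.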

Theoretical Biophysics

(RUB Bochum, 1989-1991)

During my Diplomarbeit (diploma thesis) at the Department of Biophysics at the University of Bochum (RUB) I worked on molecular dynamics in proteins. The objective was to numerically solve the Hamiltonian equations of motion of a complex biological system (hemoglobin and solvent) and to compute the trajectories in the high-dimensional phase space of all atomic coordinates and momenta. This allowed the simulation of the dynamic and kinetic processes in hemoglobin and the analysis of the relationship between its structure and function. Of particular interest was an intramolecular reaction in which a covalent bond is formed between the N-epsilon of HisE7 and the heme Fe. This blocks the active site of the protein and disables its physiological function as a transport molecule for ligands (like oxygen). By coupling the system to a heat bath and applying the method of thermodynamic integration I computed the thermodynamic quantities (enthalpy and entropy) of this reaction, which were found to be in reasonable agreement with earlier temperature-jump and EPR experiments. The results of this study contributed to a deeper understanding of the role of entropy in the physiological function of proteins (entropy-enthalpy compensation).
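To indicate what numerically solving Hamiltonian equations of motion involves, here is a minimal sketch under strong simplifying assumptions. A one-dimensional harmonic oscillator stands in for the protein, and the velocity Verlet scheme is used because it is a common integrator in molecular dynamics; the text above does not state which integrator was used in the original study, and all names and parameter values are illustrative.

```python
import math


def velocity_verlet(position, momentum, force, mass, dt, steps):
    """Integrate dq/dt = p/m, dp/dt = F(q) and return the phase-space trajectory."""
    trajectory = [(position, momentum)]
    current_force = force(position)
    for _ in range(steps):
        # Half-step in momentum, full step in position, second half-step in momentum.
        momentum_half = momentum + 0.5 * dt * current_force
        position = position + dt * momentum_half / mass
        current_force = force(position)
        momentum = momentum_half + 0.5 * dt * current_force
        trajectory.append((position, momentum))
    return trajectory


# Illustrative parameters: unit mass and unit spring constant.
mass = 1.0
spring_constant = 1.0


def energy(q, p):
    """Hamiltonian of the harmonic oscillator: kinetic plus potential energy."""
    return p * p / (2.0 * mass) + 0.5 * spring_constant * q * q


trajectory = velocity_verlet(
    position=1.0,
    momentum=0.0,
    force=lambda q: -spring_constant * q,
    mass=mass,
    dt=0.01,
    steps=1000,
)

energies = [energy(q, p) for q, p in trajectory]
max_energy_drift = max(abs(e - energies[0]) for e in energies)
final_position = trajectory[-1][0]
print(max_energy_drift, final_position, math.cos(10.0))
```

For this toy system the exact solution is q(t) = cos(t), so the final position can be checked against cos(10), and the near-constant energy shows that the integrator respects the Hamiltonian structure over the simulated time span.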

Last updated: July 2002
