Statistical Bioinformatics

Systems biology

Understanding the mechanisms of gene transcriptional regulation through analysis of high-throughput postgenomic data is one of the central problems of computational systems biology. Various approaches have been proposed, but most of them fail to address at least one of the following objectives: (1) allow for the fact that transcription factors (TFs) are potentially subject to post-transcriptional regulation; (2) allow for the fact that transcription factors cooperate as a functional complex in regulating gene expression; and (3) provide a model and a learning algorithm with manageable computational complexity. The objective of our work has been to propose and test a method that addresses all three issues. The model we employ is a mixture of factor analyzers (FAs), in which the latent variables correspond to different transcription factors, grouped into complexes or modules. We pursue inference in a Bayesian framework, using the Variational Bayesian Expectation Maximization (VBEM) algorithm. We have evaluated the performance of the proposed method on three criteria: activity profile reconstruction, gene clustering, and network inference. The first criterion assesses whether the activity profiles of the transcriptional regulatory modules can be reconstructed from gene expression data. The second criterion tests whether the method can discover biologically meaningful groupings of genes, indicated by significant enrichment for known gene ontologies. The third criterion addresses the question of whether the proposed scheme can make a useful contribution to computational systems biology, in which one is interested in the reconstruction of gene regulatory networks from diverse sources of postgenomic data. On all three criteria, our model performed better than established state-of-the-art methods.
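To make the term "mixture of factor analyzers" concrete, the following is a minimal sketch of the textbook generative model only: a mixture component is chosen, a low-dimensional latent vector is drawn from a standard normal distribution, and the observation is a linear function of that latent vector plus independent noise. It is not the parameterization or the VBEM inference used in this work. The function name `sample_mfa` and all parameter values are hypothetical and chosen purely for illustration.

```python
import random


def sample_mfa(weights, means, loadings, noise_sd, rng):
    """Draw one observation from a textbook mixture of factor analyzers.

    Generative steps (standard formulation, assumed here):
      k ~ Categorical(weights)
      z ~ N(0, I)                      latent factors
      x = means[k] + loadings[k] z + noise, noise_d ~ N(0, noise_sd[d]^2)
    """
    # Pick a mixture component according to the mixing weights.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    loading = loadings[k]          # D x Q matrix as a list of rows
    n_factors = len(loading[0])
    # Latent factor vector for this observation.
    z = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
    # Linear map of the factors plus component mean plus diagonal noise.
    x = [
        means[k][d]
        + sum(loading[d][j] * z[j] for j in range(n_factors))
        + rng.gauss(0.0, noise_sd[d])
        for d in range(len(loading))
    ]
    return k, z, x


# Hypothetical toy parameters: 2 components, 3 observed dimensions, 2 factors.
weights = [0.6, 0.4]
means = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]]
loadings = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4]],
]
noise_sd = [0.1, 0.1, 0.1]

rng = random.Random(0)
component, factors, observation = sample_mfa(weights, means, loadings, noise_sd, rng)
```

Learning reverses this process: given only observations, one infers the component assignments, the latent factors and the loadings, which is the role VBEM plays in the approach described above.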

A schematic transcriptional regulatory network. The left panel illustrates the concept of a transcriptional regulatory network in the form of a bipartite graph, in which a small number of transcription factors (TFs), represented by circles, regulate a large number of genes, represented by squares, by binding to their promoter regions. The right panel shows a more accurate representation of transcriptional regulation that allows for the cooperation of several TFs forming functional complexes; this complex formation is particularly common in higher eukaryotes.

TF regulatory network reconstruction for yeast. The figure shows ROC (receiver operating characteristic) curves obtained from microarray gene expression profiles and immunoprecipitation data for S. cerevisiae (baker’s yeast) with three different methods: [red] our Bayesian mixture of FAs model; [orange] Bayesian FA; [green] maximum likelihood FA. A larger area under the curve indicates better network reconstruction accuracy.
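As a rough illustration of how an area under the ROC curve can be computed for a network reconstruction, the sketch below assumes that every candidate TF-gene edge has a confidence score from some method and a binary label from some reference network. The function names `roc_points` and `area_under_curve`, and the scores and labels, are hypothetical and invented for the example; they are not results from this work.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    Edges are ranked by decreasing score; tied scores are handled as one step.
    Assumes labels contain at least one positive and one negative.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for _, label in ranked if label)
    n_neg = len(ranked) - n_pos
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all edges that share the current score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical edge scores and reference labels (1 = edge in reference network).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
```

An area of 1.0 corresponds to a ranking that places every true edge above every false one, while 0.5 is the value expected from a random ranking.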

Further details from: Dirk Husmeier
