Interpreting gene expression data

Dirk Husmeier
Biomathematics and Statistics Scotland (BioSS)
JCMB, The King's Buildings, Edinburgh EH9 3JZ, United Kingdom

Lectures and seminar given as part of the GTI module on Postgenomic and Pathway Biology for postgraduate students in biology.


Lectures

Lecture 1: Clustering gene expression data (unsupervised learning)

Keywords: Partitive versus hierarchical clustering, divisive (top-down) versus agglomerative (bottom-up) clustering, K-means and soft/fuzzy K-means, UPGMA and hierarchical average linkage clustering, tree-structured vector quantization, Euclidean versus correlation distance, pitfalls of clustering, estimating the number of clusters with the gap statistic, application to cancer research: molecular subtypes of diffuse large B-cell lymphoma.
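
As a minimal sketch of two of the keywords above, the following Python code contrasts Euclidean with correlation distance and runs a plain (hard) K-means on a few toy profiles. The expression values, the function names and the deterministic initialisation from the first k profiles are assumptions made for illustration; they are not taken from the lecture material.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """One minus the Pearson correlation; undefined for constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

def k_means(profiles, k, n_iter=20):
    """Hard K-means with Euclidean distance.

    Assumption for illustration: the first k profiles serve as the
    initial centroids, which keeps the result deterministic.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean_distance(p, centroids[j]))
            for p in profiles
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy profiles (four conditions each).
gene_a = [1.0, 2.0, 3.0, 2.0]
gene_b = [10.0, 20.0, 30.0, 20.0]  # same shape as gene_a, ten times the amplitude
gene_c = [3.0, 2.0, 1.0, 2.0]      # mirror image of gene_a

print(euclidean_distance(gene_a, gene_b), correlation_distance(gene_a, gene_b))
print(euclidean_distance(gene_a, gene_c), correlation_distance(gene_a, gene_c))

labels, centroids = k_means([gene_a, gene_b, gene_c], k=2)
print(labels)
```

With Euclidean distance gene_a lies closer to gene_c than to gene_b, whereas with correlation distance gene_a and gene_b are practically identical and gene_c is maximally distant. This is one reason why the choice of distance measure matters before any clustering is run.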

Lecture 2: Visualizing gene expression data

Keywords: Principal component analysis (PCA), self-organizing map (SOM), generative topographic map (GTM), application to gene expression profiles from yeast, prediction of novel gene functions.
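
To give a rough idea of what PCA computes, here is a minimal sketch in plain Python that finds the first principal component of a small data set by power iteration on the sample covariance matrix. The data, the function name and the use of power iteration (rather than a full eigendecomposition from a numerical library) are assumptions chosen to keep the example self-contained.

```python
import math

def first_principal_component(data, n_iter=100):
    """Return the leading principal direction and the projected scores.

    Sketch only: assumes the data are not constant and that the start
    vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(data), len(data[0])
    # Centre each variable (column) at zero mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centred data onto the principal direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Hypothetical data: two variables, the second exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

Because the toy points lie exactly on a line, one component captures all of the variance; for real expression data one would typically keep the first two or three components for plotting.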

Lecture 3: Discriminating gene expression data (supervised learning)

Keywords: Linear discriminant analysis versus non-linear methods, neural networks, backpropagation algorithm, training set, test set, overfitting, generalization performance, n-fold cross-validation, classification and diagnostic prediction of cancer from gene expression profiles with neural networks.
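
The idea of n-fold cross-validation can be sketched as follows. To keep the example short, a nearest-centroid classifier stands in for the neural network; this substitution, the toy data and all function names are assumptions made for illustration only.

```python
def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices); each sample is tested exactly once."""
    indices = list(range(n_samples))
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def fit_centroids(samples, labels):
    """Training: compute the mean profile of each class."""
    groups = {}
    for x, label in zip(samples, labels):
        groups.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

def predict(centroids, x):
    """Prediction: the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )

def cross_validated_accuracy(samples, labels, n_folds):
    """Fraction of held-out samples classified correctly over all folds."""
    correct = 0
    for train, test in n_fold_splits(len(samples), n_folds):
        centroids = fit_centroids(
            [samples[i] for i in train], [labels[i] for i in train]
        )
        correct += sum(predict(centroids, samples[i]) == labels[i] for i in test)
    return correct / len(samples)

# Hypothetical, well-separated toy data with two classes.
samples = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels = ["A", "A", "A", "B", "B", "B"]
accuracy = cross_validated_accuracy(samples, labels, n_folds=3)
print(accuracy)
```

The point of the scheme is that every sample is used for testing exactly once and is never part of the training set of the model that classifies it, so the resulting accuracy estimates generalization performance rather than the fit to the training data.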

Lecture 4: Modelling gene expression data with Bayesian networks

Summary: This lecture will give an introduction to Bayesian networks and their application to the reverse engineering of genetic networks. I will focus on the basic concepts, strengths and limitations of the methodology. I will also discuss an application to gene expression data from the yeast cell cycle.
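
As a small illustration of the basic concept, the sketch below encodes a three-node Bayesian network with the chain structure A -> B -> C, factorizes the joint distribution according to the graph, and computes a posterior probability by brute-force enumeration. The structure, the probability values and the reading of the nodes as "gene is up-regulated" indicators are hypothetical and are not taken from the lecture.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {
    True: {True: 0.9, False: 0.1},   # P(B | A = True)
    False: {True: 0.2, False: 0.8},  # P(B | A = False)
}
p_c_given_b = {
    True: {True: 0.8, False: 0.2},   # P(C | B = True)
    False: {True: 0.1, False: 0.9},  # P(C | B = False)
}

def joint(a, b, c):
    """P(A, B, C) = P(A) * P(B | A) * P(C | B), as implied by the graph."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def posterior_a_given_c(c_observed):
    """P(A = True | C = c_observed), summing the joint over the hidden node B."""
    numerator = sum(joint(True, b, c_observed) for b in (True, False))
    evidence = sum(
        joint(a, b, c_observed) for a, b in product((True, False), repeat=2)
    )
    return numerator / evidence

total = sum(joint(a, b, c) for a, b, c in product((True, False), repeat=3))
print(total)
print(posterior_a_given_c(True))
```

Enumeration is only feasible for very small networks, and the sketch assumes that the graph is already known; the reverse engineering task mentioned above is the harder problem of inferring the graph itself from data.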

Seminar

The objective of the seminar is for you to learn to critically evaluate recent publications in bioinformatics and machine learning related to the topics covered in my lectures. You are not required to understand all the mathematical details of the algorithms; these are for researchers in machine learning who work on improving the presented schemes. You should, however,

Here is a list of papers that I would like to discuss:

Please read all three papers and be prepared to summarize them in an informal presentation of about 10 minutes (no need for a glossy PowerPoint presentation!), addressing the points mentioned above. Each presentation will be followed by a discussion of about 20 minutes, which will involve the whole group. The decision about who is to present which paper will be made at random - I will bring some dice ...
Last update: 29 January 2004.