Interpreting gene expression data
Dirk Husmeier
Biomathematics and Statistics Scotland (BioSS)
JCMB, The King's Buildings, Edinburgh EH9 3JZ,
United Kingdom
Lectures and seminar given as part of the GTI module on Postgenomic and Pathway Biology for postgraduate students in biology.
Lectures
Keywords:
Partitive versus hierarchical clustering, divisive (top-down) versus agglomerative (bottom-up) clustering, K-means and soft/fuzzy K-means, UPGMA and hierarchical average linkage clustering, tree-structured vector quantization, Euclidean versus correlation distance, pitfalls of clustering, estimating the number of clusters with the gap statistic, application to cancer research: molecular subtypes of diffuse large B-cell lymphoma.
Keywords:
Principal component analysis (PCA), self-organizing map (SOM), generative topographic map (GTM), application to gene expression profiles from yeast, prediction of novel gene functions.
Keywords:
Linear discriminant analysis versus non-linear methods, neural networks, backpropagation algorithm, training set, test set, overfitting, generalization performance, n-fold cross-validation, classification and diagnostic prediction of cancer from gene expression profiles with neural networks.
Summary:
This lecture will give an introduction to Bayesian
networks and their application to the reverse engineering
of genetic networks. I will focus on the basic concepts,
strengths and limitations of the methodology.
I will also discuss an application to gene expression
data from the yeast cell cycle.
Seminar
The objective of the seminar is for you to learn to
critically evaluate recent publications in
bioinformatics and machine learning related to the
topics covered in my lectures.
You are not required to understand all the mathematical details of the algorithms; these matter mainly to researchers in machine learning who work on improving the presented schemes. You should, however, be able to
- understand the concepts of the methodology,
- summarize the concepts in appropriate "technical" language,
- describe how the novel concepts are related to the methods presented in my lectures,
- critically evaluate their advantages and shortcomings, and
- state your personal assessment of the applicability of these schemes to the biological problems you are working on and to the data you have produced or are about to produce.
Here is a list of papers that I would like to discuss:
Please read all three papers and be prepared to summarize each of them in an informal presentation of about 10 minutes (no need for a glossy PowerPoint presentation!), addressing the points mentioned above. Each presentation will be followed by a discussion of about 20 minutes involving the whole group.
Who presents which paper will be decided at random: I will bring some dice.
Last update: 29 January 2004.