Determining Gene Expression Profiles with cDNA Microarrays
Dirk Husmeier
August 2001
-
Objective
-
Applications
-
Biological facts
-
Normalisation
-
Analysis
-
References
Determine the pattern of gene expression
to understand the following:
-
The function of a gene.
-
Gene regulation:
Genes with similar expression profiles
(coexpressed genes) may share common elements in their
regulatory mechanisms (coregulation).
Cluster together genes with similar
expression profiles --> find groups of potentially coregulated
genes --> search for putative regulatory signals.
-
Gene-gene and gene-environment interactions.
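The clustering step mentioned under gene regulation can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular study: Pearson correlation is assumed as the similarity measure, genes are grouped by simple single linkage, the threshold of 0.9 is arbitrary, and the gene names and values are invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coexpression_clusters(profiles, threshold=0.9):
    """Group genes whose profiles correlate at or above `threshold`.

    `profiles` maps a gene name to its expression values over conditions.
    Two genes end up in the same cluster if they are linked by a chain of
    highly correlated pairs (single linkage). The threshold is an
    illustrative assumption.
    """
    genes = list(profiles)
    cluster_of = {g: i for i, g in enumerate(genes)}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                old, new = cluster_of[h], cluster_of[g]
                if old != new:
                    # Merge the two clusters by relabelling one of them.
                    for k in genes:
                        if cluster_of[k] == old:
                            cluster_of[k] = new
    clusters = {}
    for g in genes:
        clusters.setdefault(cluster_of[g], []).append(g)
    return sorted(clusters.values())

# Invented toy profiles over four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
clusters = coexpression_clusters(profiles)
print(clusters)
```

Each resulting group is a candidate set of coregulated genes whose upstream sequences could then be searched for shared regulatory signals.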
-
Study gene expression under different environmental
stress conditions (e.g. in yeast).
-
Compare gene expression in normal and disease cells to
identify disease genes, e.g.,
genes implicated in breast cancer.
-
Pharmaceutical applications:
study the effects of a drug
on the level of expression.
-
Compare expression profiles between tumorous and healthy
cells:
etiology of cancer.
-
Samples of
DNA clones with
known sequence content
are spotted and
immobilised onto a glass slide or membrane.
This is the microarray.
-
Pools of mRNA from the investigated
cell population are reverse-transcribed into cDNA
and are labelled with a fluorescent dye.
-
The microarray is simultaneously probed with cDNA pools
from test and reference cells,
labelled with
different dyes (usually red versus green).
-
cDNA in the pool hybridises
to complementary sequences on the array.
Unhybridised cDNA is washed off.
-
Measure the relative
abundance of a particular mRNA:
direct comparison between a test cell state and a
reference cell state.
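As a small illustration of the last point, the relative abundance for each spot is commonly summarised as a log ratio of the two dye intensities. The sketch below assumes base-2 logarithms and that the red channel carries the test sample and the green channel the reference; neither choice is fixed by the text above, and the intensities are invented.

```python
from math import log2

def log_ratio(test_intensity, reference_intensity):
    """log2 of the test/reference intensity ratio for one spot.

    Positive values mean higher abundance in the test cells,
    negative values higher abundance in the reference cells.
    """
    return log2(test_intensity / reference_intensity)

# Invented spot intensities: (red = test, green = reference).
spots = {
    "gene1": (2000.0, 500.0),
    "gene2": (300.0, 1200.0),
    "gene3": (800.0, 800.0),
}
ratios = {gene: log_ratio(red, green) for gene, (red, green) in spots.items()}
print(ratios)
```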
-
Sources of systematic variation in microarray experiments:
-
Bias:
Different fluorescent intensities of the two dyes.
-
Variance:
Relative gene expression levels from replicate experiments
may have different spreads due to different experimental
conditions.
-
Normalisation = process of removing such variation
to allow the comparison of expression levels
across experiments.
-
Global normalisation is inadequate because dye biases can depend on
overall intensity and on location on the array (pin-tip effects).
-
Intensity-dependent normalisation:
Robust regression of the log intensity ratio
log(R/G) on the mean log intensity
log(sqrt(RG))
(where R is the red and G the green intensity).
-
Location normalisation:
Repeat this procedure separately for different locations
to allow for systematic differences
between the tips in the print-head of the arrayer.
-
Scale adjustment:
There is a trade-off between the gains achieved
by scale normalisation and the possible increase in
variability by this additional step.
-
Decide on the genes to be used in normalisation.
-
If only a small proportion of the genes are expected to
be differentially expressed, use all the genes.
-
If a large fraction of the genes are expected to change,
use a designated subset of constantly expressed
(`housekeeping') genes.
For further details, see Yang et al.
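The intensity-dependent and location steps above can be sketched as follows. This is a simplified illustration, not the procedure of Yang et al.: a Theil-Sen straight line (median of pairwise slopes) is swapped in as the robust regression, base-2 logarithms are assumed, and the print-tip labels and intensities are invented. The names `ma_values`, `theil_sen` and `normalise` are hypothetical.

```python
from math import log2, sqrt
from statistics import median

def ma_values(red, green):
    """Return (M, A) with M = log2(R/G) and A = log2(sqrt(R*G))."""
    return log2(red / green), log2(sqrt(red * green))

def theil_sen(xs, ys):
    """Robust straight-line fit: median pairwise slope, then median intercept."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    slope = median(slopes) if slopes else 0.0
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

def normalise(spots):
    """Intensity-dependent normalisation, fitted separately per print tip.

    `spots` is a list of (tip, red, green) tuples. Returns the normalised
    log ratios in the same order: M minus the fitted trend of M on A
    within the spot's own print-tip group.
    """
    by_tip = {}
    for idx, (tip, red, green) in enumerate(spots):
        m, a = ma_values(red, green)
        by_tip.setdefault(tip, []).append((idx, m, a))
    normalised = [0.0] * len(spots)
    for members in by_tip.values():
        a_vals = [a for _, _, a in members]
        m_vals = [m for _, m, _ in members]
        slope, intercept = theil_sen(a_vals, m_vals)
        for idx, m, a in members:
            normalised[idx] = m - (slope * a + intercept)
    return normalised

# Invented data: tip1 has a twofold red bias, tip2 a twofold green bias.
# The fourth spot is differentially expressed on top of the tip1 bias.
spots = [
    ("tip1", 200.0, 100.0),
    ("tip1", 800.0, 400.0),
    ("tip1", 3200.0, 1600.0),
    ("tip1", 1600.0, 100.0),
    ("tip2", 100.0, 200.0),
    ("tip2", 400.0, 800.0),
    ("tip2", 900.0, 1800.0),
]
normalised = normalise(spots)
print(normalised)
```

Because the fit is robust, the single differentially expressed spot does not pull the fitted trend, so its log ratio stays clearly non-zero after the tip-specific bias has been removed. Here all spots are used for the fit; restricting `spots` to housekeeping genes before fitting would correspond to the second option in the list above.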
The high dimensionality of gene expression
data calls for new methods that automatically detect
interesting structures in the data.
After image pre-processing and normalisation,
which in itself is the subject of current microarray research,
one has to solve the following information extraction problems:
1. Dimension reduction: We want to do better than PCA.
2. Variable selection: Which genes are biologically relevant?
3. Classification: Assign tissue samples to phenotypically
characterised categories, e.g. different types of tumour.
4. Clustering: Detect previously unknown relationships among
genes, among tissues, or between genes and tissues.
An interesting approach addressing these problems was
suggested by
v. Heydebreck (MPI Berlin).
On point 2 (variable selection):
Use a score based on diagonal linear discriminant analysis
to find genes whose expression levels strongly correlate with
a known class distinction.
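A gene score of this kind might look like the sketch below. The exact score used in the approach cited above is not given here, so the formula is an assumption: the squared difference of the two class means divided by the pooled within-class variance, which is the per-gene quantity that diagonal linear discriminant analysis relies on. Gene names, values and class labels are invented.

```python
from statistics import mean, variance

def dlda_score(values, labels):
    """Squared class-mean difference over pooled within-class variance.

    `values` holds one gene's expression across samples and `labels` the
    known class (0 or 1) of each sample. Needs at least two samples per
    class and a non-zero pooled variance.
    """
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) \
        / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) ** 2 / pooled

def rank_genes(expression, labels):
    """Return gene names ordered from most to least discriminating."""
    scores = {gene: dlda_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Invented data: three samples of class 0 followed by three of class 1.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "geneX": [1.0, 1.2, 0.8, 3.0, 3.1, 2.9],   # strongly class-dependent
    "geneY": [2.0, 1.0, 3.0, 2.1, 0.9, 3.2],   # unrelated to the classes
    "geneZ": [5.0, 5.5, 4.5, 4.0, 3.5, 4.4],   # weakly class-dependent
}
ranking = rank_genes(expression, labels)
print(ranking)
```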
On points 1 and 3 (dimension reduction and classification):
Project the expression profiles onto discriminant axes
determined by subsets of selected genes and test
if a priori known classes are well separated
(supervised training).
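The projection step can be illustrated with a minimal sketch, assuming a two-class diagonal linear discriminant: each selected gene gets the weight (difference of class means) / (pooled variance), and a sample is projected as the weighted sum of its deviations from the class midpoints. The function names and data are invented, and the separation check is run on the training samples themselves, which is optimistic compared with a check on held-out samples.

```python
from statistics import mean, variance

def dlda_axis(samples, labels):
    """Per-gene weights and midpoints of a diagonal LDA discriminant axis.

    `samples` is a list of expression vectors (one per tissue sample)
    restricted to the selected genes; `labels` holds the class (0 or 1).
    """
    weights, midpoints = [], []
    for g in range(len(samples[0])):
        a = [s[g] for s, lab in zip(samples, labels) if lab == 0]
        b = [s[g] for s, lab in zip(samples, labels) if lab == 1]
        pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) \
            / (len(a) + len(b) - 2)
        weights.append((mean(b) - mean(a)) / pooled)
        midpoints.append((mean(a) + mean(b)) / 2)
    return weights, midpoints

def project(sample, weights, midpoints):
    """Project one sample onto the axis; positive values point to class 1."""
    return sum(w * (x - m) for w, x, m in zip(weights, sample, midpoints))

# Invented data: two selected genes, three samples per class.
samples = [
    [1.0, 5.0], [1.2, 5.5], [0.8, 4.5],   # class 0
    [3.0, 4.0], [3.1, 3.5], [2.9, 4.4],   # class 1
]
labels = [0, 0, 0, 1, 1, 1]

weights, midpoints = dlda_axis(samples, labels)
projections = [project(s, weights, midpoints) for s in samples]

# The known classes are separated if their projections do not overlap.
class0 = [p for p, lab in zip(projections, labels) if lab == 0]
class1 = [p for p, lab in zip(projections, labels) if lab == 1]
separated = max(class0) < min(class1)
print(projections, separated)
```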
On point 4 (clustering):
Unsupervised exploration: Look for new, better separated clusters
by combining a heuristic initialisation and a greedy
search in the graph of all bipartitions.
The last step offers a "quality check" of existing
classifications (by measuring the distance to the local
maximum on the bipartition graph) and may detect new
biologically relevant structures related to cell type,
mutational status, response to a drug, tumour
progression etc.
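A greedy search of this kind can be sketched as follows. Everything specific here is an assumption for illustration: the separation score (sum over genes of squared class-mean difference over pooled variance), the neighbourhood (bipartitions that differ in the class of exactly one sample), the minimum class size, and the invented data. The heuristic initialisation is not modelled; the search simply starts from a bipartition supplied by the caller.

```python
from statistics import mean, variance

def separation(samples, labels, eps=1e-6):
    """Score a bipartition: summed per-gene squared mean difference
    over pooled variance (eps guards against a zero variance)."""
    total = 0.0
    for g in range(len(samples[0])):
        a = [s[g] for s, lab in zip(samples, labels) if lab == 0]
        b = [s[g] for s, lab in zip(samples, labels) if lab == 1]
        pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) \
            / (len(a) + len(b) - 2)
        total += (mean(a) - mean(b)) ** 2 / (pooled + eps)
    return total

def greedy_bipartition(samples, start, min_size=2):
    """Greedy ascent on the graph of bipartitions.

    From the current bipartition, try moving each single sample to the
    other class, take the best-scoring move, and stop when no move
    improves the score. Returns (labels, score, steps), where `steps`
    is the number of moves from the start to the local maximum.
    """
    labels = list(start)
    best = separation(samples, labels)
    steps = 0
    while True:
        candidates = []
        for i in range(len(labels)):
            trial = labels[:]
            trial[i] = 1 - trial[i]
            if min(trial.count(0), trial.count(1)) >= min_size:
                candidates.append((separation(samples, trial), i))
        if not candidates:
            break
        score, i = max(candidates)
        if score <= best:
            break
        labels[i] = 1 - labels[i]
        best = score
        steps += 1
    return labels, best, steps

# Invented data: two clear groups of three samples, two genes each.
samples = [
    [1.0, 5.0], [1.2, 5.5], [0.8, 4.5],
    [3.0, 4.0], [3.1, 3.5], [2.9, 4.4],
]
# An existing classification in which the third sample is misassigned.
start = [0, 0, 1, 1, 1, 1]
labels, score, steps = greedy_bipartition(samples, start)
print(labels, steps)
```

In this sketch, `steps` plays the role of the distance between an existing classification and the nearby local maximum: a value of zero means the classification is already locally optimal under the chosen score, while a large value suggests that the data support a different grouping.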