Dirk Husmeier - Essays
Determining Gene Expression Profiles with cDNA Microarrays
August 2001
- Objective
- Applications
- Biological facts
- Normalisation
- Analysis
- References
Objective
Determine the pattern of gene expression to understand the following:
- The function of a gene.
- Gene regulation: Genes with similar expression profiles (coexpressed genes) may share a common regulatory mechanism (coregulation). Clustering genes with similar expression profiles yields groups of potentially coregulated genes, which can then be searched for putative regulatory signals.
- Gene-gene and gene-environment interactions.
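The clustering step mentioned under gene regulation can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a simple threshold-linkage grouping; neither choice is prescribed by the text, and the function name `coexpression_groups`, the threshold `min_corr` and the toy profiles are hypothetical.

```python
import numpy as np

def coexpression_groups(profiles, min_corr=0.9):
    """Group genes that are linked by a Pearson correlation >= min_corr.

    profiles: array of shape (n_genes, n_conditions).
    Returns one integer group label per gene (connected components of
    the thresholded correlation graph; an illustrative choice only).
    """
    corr = np.corrcoef(profiles)          # gene-by-gene correlation matrix
    linked = corr >= min_corr             # adjacency of "similar" genes
    n_genes = profiles.shape[0]
    labels = np.full(n_genes, -1)
    current = 0
    for start in range(n_genes):
        if labels[start] != -1:
            continue                      # gene already assigned to a group
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first walk over linked genes
            g = stack.pop()
            for h in np.flatnonzero(linked[g]):
                if labels[h] == -1:
                    labels[h] = current
                    stack.append(h)
        current += 1
    return labels

# Toy data (made up): genes 0 and 1 rise together, genes 2 and 3 fall together.
profiles = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.1, 4.0, 6.2, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.1, 4.0, 2.0],
])
labels = coexpression_groups(profiles)
```

Each resulting group is a candidate set of coregulated genes; searching their upstream sequences for shared signals is a separate step not shown here.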
Applications
- Study gene expression under different environmental stress conditions (e.g. in yeast).
- Compare gene expression in normal and diseased cells to identify disease genes, e.g. genes implicated in breast cancer.
- Pharmaceutical applications: study the effects of a drug on the level of expression.
- Compare expression profiles between tumorous and healthy cells to study the etiology of cancer.
Biological facts
- Samples of DNA clones with known sequence content are spotted and immobilised onto a glass slide or membrane. This is the microarray.
- Pools of mRNA from the investigated cell population are reverse-transcribed into cDNA and labelled with a fluorescent dye.
- The microarray is probed simultaneously with cDNA pools from test and reference cells, labelled with different dyes (usually red versus green).
- cDNA in the pool hybridises to complementary sequences on the array. Unhybridised cDNA is washed off.
- The ratio of the two dye intensities at each spot measures the relative abundance of a particular mRNA: a direct comparison between a test cell state and a reference cell state.
Normalisation
- Sources of systematic variation in microarray experiments:
  - Bias: Different fluorescent intensities of the two dyes.
  - Variance: Relative gene expression levels from replicate experiments may have different spreads due to different experimental conditions.
- Normalisation = process of removing such variation to allow the comparison of expression levels across experiments.
- Global normalisation is inadequate, as dye biases can depend on overall intensity and on location on the array (pin-tip effects).
- Intensity-dependent normalisation: Robust regression of the log intensity ratio log(R/G) on the mean log intensity log sqrt(RG), where R is the red and G the green intensity. The fitted trend is then subtracted from the log ratios.
- Location normalisation: Repeat this procedure separately for different locations to allow for systematic differences between the tips in the print-head of the arrayer.
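- Scale adjustment: Rescale the normalised log ratios so that their spread is comparable across print-tip groups or slides. There is a trade-off between the gains achieved by scale normalisation and the possible increase in variability introduced by this additional step.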
- Decide on the genes to be used in normalisation.
  - If only a small proportion of the genes are expected to be differentially expressed, use all the genes.
  - If a large fraction of the genes are expected to change, use a designated subset of constantly expressed (`housekeeping') genes.
For further details, see Yang et al. (2000).
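The intensity-dependent and location normalisation steps can be sketched as follows. This is a simplified illustration, not the procedure of Yang et al.: it assumes base-2 logarithms, writes M for log(R/G) and A for log sqrt(RG), and fits a robust straight line (iteratively reweighted least squares with Tukey bisquare weights) where a local robust smoother could be used instead. The names `robust_line`, `normalise` and `tip`, and the simulated intensities, are hypothetical.

```python
import numpy as np

def robust_line(a, m, n_iter=20, c=4.685):
    """Robust straight-line fit of m on a (IRLS, Tukey bisquare weights)."""
    X = np.column_stack([np.ones_like(a), a])
    w = np.ones_like(a)
    beta = np.zeros(2)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], m * sw, rcond=None)
        resid = m - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745 + 1e-12   # robust spread
        u = resid / (c * scale)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)   # outliers get 0
    return beta

def normalise(R, G, tip=None):
    """Subtract the fitted intensity trend from the log ratios.

    If tip (print-tip group per spot) is given, the fit is repeated
    separately for each group (location normalisation).
    """
    M = np.log2(R / G)            # log intensity ratio
    A = 0.5 * np.log2(R * G)      # mean log intensity, log2 sqrt(RG)
    if tip is None:
        tip = np.zeros(len(M), dtype=int)
    M_norm = np.empty_like(M)
    for t in np.unique(tip):
        idx = tip == t
        b0, b1 = robust_line(A[idx], M[idx])
        M_norm[idx] = M[idx] - (b0 + b1 * A[idx])
    return A, M_norm

# Simulated spots (made up): dye bias depends on intensity and on print tip;
# the first 20 genes are truly up-regulated by 2 log2 units.
rng = np.random.default_rng(0)
n = 400
A_true = rng.uniform(6.0, 14.0, n)
tip = rng.integers(0, 4, n)
M_true = 0.5 + 0.1 * (A_true - 10.0) + 0.2 * tip + rng.normal(0.0, 0.1, n)
M_true[:20] += 2.0
R = 2.0 ** (A_true + M_true / 2.0)
G = 2.0 ** (A_true - M_true / 2.0)
A, M_norm = normalise(R, G, tip)
```

Here all genes are used for the fit, which matches the case where only a small proportion is expected to be differentially expressed; the robust weights keep the few changed genes from distorting the trend.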
Analysis
The high dimensionality of gene expression data calls for new methods that automatically detect interesting structures in the data. After image pre-processing and normalisation, which are themselves subjects of current microarray research, one has to solve the following information extraction problems:
1. Dimension reduction: We want to do better than PCA.
2. Variable selection: Which genes are biologically relevant?
3. Classification: Assign tissue samples to phenotypically characterised categories, e.g. different types of tumour.
4. Clustering: Detect previously unknown relationships among genes, among tissues, or between genes and tissues.
An interesting approach addressing these problems was suggested by von Heydebreck et al. (MPI Berlin).
On point 2 (variable selection):
Use a score based on diagonal linear discriminant analysis to find genes whose expression levels strongly correlate with a known class distinction.
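As a minimal sketch of such a score, the code below assumes one plausible form: the squared difference of class means divided by the pooled within-class variance, computed gene by gene. The exact score used by von Heydebreck et al. is not reproduced here; `dlda_gene_scores` and the simulated data are hypothetical.

```python
import numpy as np

def dlda_gene_scores(X, y):
    """Per-gene separation score for a two-class problem.

    X: (n_samples, n_genes) expression matrix, y: array of 0/1 class labels.
    Score = (difference of class means)^2 / pooled within-class variance.
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    pooled_var = ((n0 - 1) * X0.var(axis=0, ddof=1)
                  + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    return diff**2 / (pooled_var + 1e-12)   # small constant avoids 0/0

# Simulated data (made up): 50 genes, of which genes 0-2 differ between classes.
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 50))
X[y == 1, :3] += 6.0
scores = dlda_gene_scores(X, y)
top = np.argsort(scores)[::-1][:3]          # indices of the 3 best genes
```

Genes are then ranked by score and the highest-scoring ones retained; how many to keep is a tuning decision not fixed by the text.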
On points 1 and 3 (dimension reduction and classification):
Project the expression profiles onto discriminant axes determined by subsets of selected genes and test if a priori known classes are well separated (supervised training).
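The projection step might look like the sketch below. It assumes a two-class diagonal linear discriminant axis, in which each selected gene is weighted by its class-mean difference divided by its pooled variance; the function names, the gene subset and the simulated data are hypothetical.

```python
import numpy as np

def dlda_axis(X, y, genes):
    """Discriminant axis and class midpoint for the selected genes."""
    X0, X1 = X[y == 0][:, genes], X[y == 1][:, genes]
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    var = ((n0 - 1) * X0.var(axis=0, ddof=1)
           + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    w = (mu1 - mu0) / (var + 1e-12)     # diagonal covariance: per-gene weights
    midpoint = 0.5 * (mu0 + mu1)
    return w, midpoint

def project(X, genes, w, midpoint):
    """One-dimensional coordinate of each sample along the axis."""
    return (X[:, genes] - midpoint) @ w

# Simulated data (made up): genes 0-2 carry the class difference.
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 50))
X[y == 1, :3] += 6.0

genes = np.array([0, 1, 2])             # assumed output of a selection step
w, midpoint = dlda_axis(X, y, genes)
z = project(X, genes, w, midpoint)
predicted = (z > 0).astype(int)         # positive side of the axis = class 1
```

Checking separation on the same samples used to build the axis is optimistic; a fair test of whether the known classes are well separated would use held-out samples or cross-validation.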
On point 4 (clustering):
Unsupervised exploration: Look for new, better separated clusters by combining a heuristic initialisation and a greedy search in the graph of all bipartitions.
The last step offers a "quality check" of existing classifications (by measuring the distance to the local maximum on the bipartition graph) and may detect new biologically relevant structures related to cell type, mutational status, response to a drug, tumour progression etc.
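A greedy search of this kind can be sketched as hill climbing on the graph whose nodes are bipartitions of the samples and whose edges connect bipartitions differing in one sample. The sketch makes several assumptions that are not taken from the paper: the separation score is the sum of the k largest squared two-sample t-statistics over genes, the search starts from an existing (partly wrong) classification rather than from a heuristic initialisation, and the distance between two splits is their Hamming distance up to label swap. All names and data are hypothetical.

```python
import numpy as np

def split_score(X, y, k=5):
    """Sum of the k largest squared t-statistics for the bipartition y."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    var = ((n0 - 1) * X0.var(axis=0, ddof=1)
           + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    t2 = diff**2 / (var * (1.0 / n0 + 1.0 / n1) + 1e-12)
    return np.sort(t2)[-k:].sum()

def greedy_bipartition(X, y_init, k=5, min_size=2):
    """Move one sample at a time to the best neighbouring bipartition."""
    y = y_init.copy()
    best = split_score(X, y, k)
    while True:
        candidates = []
        for i in range(len(y)):
            y_new = y.copy()
            y_new[i] = 1 - y_new[i]                 # flip one sample
            if min(np.sum(y_new == 0), np.sum(y_new == 1)) < min_size:
                continue                            # keep both sides non-trivial
            candidates.append((split_score(X, y_new, k), i))
        if not candidates:
            return y, best
        s, i = max(candidates)
        if s <= best:
            return y, best                          # local maximum reached
        y[i] = 1 - y[i]
        best = s

def split_distance(y_a, y_b):
    """Hamming distance between two bipartitions, ignoring label swap."""
    d = int(np.sum(y_a != y_b))
    return min(d, len(y_a) - d)

# Simulated data (made up): a true split carried by genes 0-2, and an
# "existing classification" in which three samples are mislabelled.
rng = np.random.default_rng(2)
y_true = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 30))
X[y_true == 1, :3] += 6.0
y_start = y_true.copy()
y_start[[0, 1, 10]] = 1 - y_start[[0, 1, 10]]

start_score = split_score(X, y_start)
y_best, best_score = greedy_bipartition(X, y_start)
distance = split_distance(y_start, y_best)
```

A small `distance` would suggest that the existing classification already sits near a well-separated split, while a large one hints that a different grouping of the samples is better supported by the data.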
References
- Yang, Dudoit, Luu, Speed (2000). Normalization of cDNA microarray data. UC Berkeley technical report.
- von Heydebreck, Huber, Poustka, Vingron (2001). Identifying splits with clear separation: a new class discovery method for gene expression data. Bioinformatics 17, pp. S107-S114.
- A detailed list of references can be found here.