Microarray and other similar high-throughput experiments can be costly. However, there is usually little prior information available to decide how many samples / arrays are needed to give the scientist sufficient information on the question of interest.
In order to reduce sample size and thus costs in microarray experiments, BioSS has investigated group-sequential / adaptive designs for microarray studies involving a comparison of two groups (e.g. treatment vs control). In such designs the total sample size is not chosen in advance; instead, the experiment is conducted in several stages. After each stage a decision is made as to whether more samples should be added or whether the experiment has already generated sufficient information.
Figure: Histograms of p-values after different stages of a sequentially designed microarray experiment. From these histograms we can estimate the sensitivity, i.e. the percentage of truly differential genes that have been discovered. A pronounced peak around 0 indicates high sensitivity. Based on the sensitivity we decide whether to add more samples or stop the experiment.
We studied stopping criteria based on the distribution of p-values across all genes, which can be visualised in a p-value histogram. A peak close to zero indicates that the experiment managed to detect many differentially expressed genes. Using mixture models it is possible to estimate the numbers of true positives / negatives (TP/TN) and false positives / negatives (FP/FN) from this histogram for a given p-value cutoff. These estimates can be used to define several possible measures of success of the experiment. We were particularly interested in sensitivity, i.e. the expected percentage of truly differential genes that have been detected. For example, one might choose to stop the experiment if sensitivity exceeds a specific threshold of, say, 80%, i.e. if the vast majority of differential genes have been detected.
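To make the stopping rule concrete, the following is a minimal sketch of how TP, FP, TN, FN and sensitivity might be estimated from a list of p-values. It is not the mixture model used in the study: as an assumption, it swaps in a simple two-component view (uniform null p-values plus an unspecified alternative) and estimates the null proportion from the flat right-hand part of the histogram. The function names, the tuning parameter `lam` and the default cutoff are hypothetical.

```python
def estimate_outcomes(pvalues, cutoff=0.05, lam=0.5):
    """Rough TP/FP/TN/FN estimates from a p-value histogram.

    Assumption (not from the original study): null p-values are uniform
    on [0, 1] and almost all p-values above `lam` come from null genes.
    """
    m = len(pvalues)
    # The share of p-values above lam, rescaled by (1 - lam), estimates
    # the proportion of non-differential (null) genes.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    m0 = pi0 * m              # estimated number of null genes
    m1 = m - m0               # estimated number of differential genes
    rejected = sum(p <= cutoff for p in pvalues)
    fp = min(m0 * cutoff, rejected)   # nulls expected below the cutoff
    tp = min(rejected - fp, m1)
    fn = m1 - tp
    tn = m0 - fp
    sensitivity = tp / m1 if m1 > 0 else float("nan")
    return {"pi0": pi0, "TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "sensitivity": sensitivity}


def should_stop(pvalues, target=0.8, cutoff=0.05):
    """Stop once the estimated sensitivity reaches the target (e.g. 80%)."""
    sensitivity = estimate_outcomes(pvalues, cutoff)["sensitivity"]
    # A NaN sensitivity (no differential genes estimated) compares as
    # False, so this sketch would continue sampling in that case.
    return sensitivity >= target
```

Calling `should_stop(pvalues)` after each stage, with `pvalues` holding one p-value per gene, would then give the add-samples-or-stop decision under these simplifying assumptions.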
Microarrays hybridised at the same stage are likely to be more similar than those hybridised at different stages. We hence use a p-value combination approach to combine the data from the different experimental stages. This is a simple but very efficient approach developed for meta-analyses in which the results of different studies are to be combined.
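The text does not say which combination rule was used, so the sketch below assumes Fisher's method, a standard choice in meta-analysis, purely for illustration. Each stage is analysed on its own, and the per-gene p-values from the stages are then combined; the function names are hypothetical.

```python
import math


def fisher_combine(stage_pvalues):
    """Combine one gene's independent per-stage p-values (Fisher's method)."""
    k = len(stage_pvalues)
    # Under the null, -2 * sum(log p) is chi-square with 2k degrees of
    # freedom. We work with half the statistic; p is clamped away from 0
    # so that log() is always defined.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in stage_pvalues)
    # Closed-form chi-square tail for even degrees of freedom 2k:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def combine_stages(pvalues_by_stage):
    """pvalues_by_stage[s][g] is the p-value of gene g at stage s."""
    return [fisher_combine(gene_ps) for gene_ps in zip(*pvalues_by_stage)]
```

Because each stage contributes only its own p-value, systematic differences between hybridisation batches do not enter the comparison directly, which is the motivation given above for combining p-values rather than pooling the raw data.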
The most striking result of our research is that in this very high-dimensional situation the stopping decision does not bias the p-values obtained at later stages. Such bias is a fundamental problem in classical sequential designs, where only one or a few variables are being considered.
Further details from: Claus-Dieter Mayer