ANALYSIS OF MOLECULAR VARIANCE
As the name suggests, Analysis of Molecular Variance (AMOVA) is a method for studying molecular variation within a species.
Electrophoresis, one of the most widely used methods for studying the structure of DNA, produces data in the form of 0s and 1s, where 1 denotes the presence of a band and 0 its absence, e.g.
| Sample | Band |
|---|---|
| A | 0101... |
| B | 1000... |
| C | 0110... |
AMOVA works on such data to create a distance matrix between samples in order to measure the genetic structure of the population from which the samples are drawn. In statistical terms, AMOVA is a testing procedure based on permutational analysis and involves few assumptions about the statistical properties of the data.
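As a minimal sketch of how such a distance matrix might be built, the code below counts the band positions at which two samples differ, which for 0/1 data equals the squared Euclidean distance. The band patterns are hypothetical: only the first four digits shown in the table are used, because the full patterns are truncated there. The function names are illustrative and are not taken from any AMOVA software.

```python
from itertools import combinations

# Hypothetical band patterns: only the first four visible digits of the
# truncated patterns in the table are used here.
bands = {"A": "0101", "B": "1000", "C": "0110"}


def squared_distance(x, y):
    """Count the band positions at which two 0/1 strings differ.

    For presence/absence data this count equals the squared Euclidean
    distance between the two band vectors.
    """
    if len(x) != len(y):
        raise ValueError("band patterns must have the same length")
    return sum(a != b for a, b in zip(x, y))


def distance_matrix(patterns):
    """Return a dict mapping each ordered sample pair to its squared distance."""
    names = sorted(patterns)
    matrix = {(s, s): 0 for s in names}
    for s, t in combinations(names, 2):
        d = squared_distance(patterns[s], patterns[t])
        matrix[(s, t)] = matrix[(t, s)] = d
    return matrix


matrix = distance_matrix(bands)
for s in sorted(bands):
    print(s, [matrix[(s, t)] for t in sorted(bands)])
```

The matrix is symmetric with zeros on the diagonal, so only the pairs above the diagonal carry information.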
USES OF AMOVA
Study of human evolution.
Investigating genetic variation in buffalograss.
Monitoring Dall's porpoise in the Pacific.
HUMAN mtDNA HAPLOTYPES
Restriction haplotypes of human mtDNA were sampled from ten populations,
grouped into five regions. Altogether 672 mtDNAs were assayed, yielding 62
restriction sites of which 34 were polymorphic.
Excoffier, Smouse & Quattro (1992) used this example to find significant differences
RAPD VARIATION IN BUFFALOGRASS
Buffalograss is native to the Great Plains of North America, where it is
important for rangeland forage, soil conservation and as turf.
A survey by Huff, Peakall and Smouse (1993) aimed to determine the pattern and extent of RAPD marker variation within and among natural populations of diploid buffalograss.
They examined 12 plants from each of two populations in each of two regions (Texas and Central Mexico), i.e. 48 plants in all.
They found
DALL'S PORPOISE mtDNA VARIATION
mtDNA samples were taken from 101 Dall's porpoises, collected from three regions
in the northern Pacific: the Bering Sea, the Aleutians and the western North
Pacific.
Digestion with 11 restriction enzymes yielded 34 distinct restriction site haplotypes.
McMillan & Bermingham (1996) used AMOVA to find a significant difference among the three populations, to which the western North Pacific population made the greatest contribution.
They concluded that separate management policies should be used for the three populations.
ANALOGY WITH ANOVA
AMOVA is like a hierarchical analysis of variance in that it
separates and tests tiers of genetic diversity:
AMOVA differs from analysis of variance in that:
HAPLOID DATA
With haploid data each individual is represented by one haplotype.
A haplotype may be obtained from:
For instance
DISTANCES BETWEEN HAPLOTYPES
We are interested in the genetic distance between pairs of haplotypes.
Let Xj be a Boolean vector, of length S, representing the jth
haplotype.
S is the number of restriction sites or marker bands.
The squared distance between haplotypes hj and hk is:
The total sum of squares of the distances between all pairs of haplotypes is equivalent to the conventional sum of squares of deviations of multidimensional vectors from the centroid of a multidimensional space.
The analysis of molecular variance is essentially an analysis of variance using squared distances between pairs of haplotypes as the data.
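The equivalence between pairwise squared distances and deviations from the centroid can be checked numerically. The sketch below assumes unweighted squared Euclidean distances and uses the identity that the sum of squared distances over all unordered pairs, divided by the number of haplotypes N, equals the sum of squared deviations from the centroid. The four haplotype vectors and the function names are hypothetical.

```python
def sq_dist(x, y):
    """Unweighted squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def ssd_from_pairs(vectors):
    """Total sum of squares from pairwise distances: (1/N) * sum over pairs j < k."""
    n = len(vectors)
    total = sum(
        sq_dist(vectors[j], vectors[k])
        for j in range(n)
        for k in range(j + 1, n)
    )
    return total / n


def ssd_from_centroid(vectors):
    """Conventional sum of squared deviations of each vector from the centroid."""
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    return sum(sq_dist(v, centroid) for v in vectors)


# Hypothetical haplotypes with S = 4 sites (1 = presence, 0 = absence).
haplotypes = [(0, 1, 0, 1), (1, 0, 0, 0), (0, 1, 1, 0), (1, 1, 0, 0)]

pairwise = ssd_from_pairs(haplotypes)
centroid_based = ssd_from_centroid(haplotypes)
print(pairwise, centroid_based)
```

Because the two quantities agree, the analysis never needs the centroid itself; the matrix of squared distances is enough.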
HIERARCHICAL MODEL
Now extend the notation to include the population structure.
If X(jig) indexes haplotype j in subpopulation i in group g, it can be represented by the linear model
where
ANALYSIS

The sums of squares of deviations (SSD) are functions of the squared distances between pairs of haplotypes. The mean squared deviation is MSD = SSD/df, where df is the degrees of freedom.
The coefficients n, n' and n'' in the expected MSD represent the average sample sizes at particular hierarchical levels.
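The following is a hedged sketch of how the SSD, MSD and variance components might be computed from a matrix of squared distances. It assumes the standard nested analysis of variance layout (among groups, among subpopulations within groups, within subpopulations) and the usual unbalanced-design coefficients for n, n' and n''; the AMOVA table itself is not reproduced in this text, so these formulas are an assumption rather than a transcription. The function name `amova`, the dictionary keys and the eight haplotypes are all hypothetical.

```python
def amova(dist, groups):
    """Two-level AMOVA sketch.

    dist   : square list-of-lists of squared distances between individuals
    groups : list of groups; each group is a list of subpopulations;
             each subpopulation is a list of row indices into dist
    Needs at least two groups, more subpopulations than groups, and
    more individuals than subpopulations.
    """
    def ssd(idx):
        # (1 / 2n) * sum of squared distances over all ordered pairs in idx
        return sum(dist[j][k] for j in idx for k in idx) / (2 * len(idx))

    pops = [pop for grp in groups for pop in grp]
    N = sum(len(pop) for pop in pops)
    P, G = len(pops), len(groups)

    ssd_total = ssd([j for pop in pops for j in pop])
    ssd_within_groups = sum(ssd([j for pop in grp for j in pop]) for grp in groups)
    ssd_wp = sum(ssd(pop) for pop in pops)        # within subpopulations
    ssd_ag = ssd_total - ssd_within_groups        # among groups
    ssd_ap = ssd_within_groups - ssd_wp           # among subpopulations within groups

    df = {"ag": G - 1, "ap": P - G, "wp": N - P}
    msd = {"ag": ssd_ag / df["ag"], "ap": ssd_ap / df["ap"], "wp": ssd_wp / df["wp"]}

    # Average sample size coefficients (assumed standard nested-ANOVA forms).
    sizes = [[len(pop) for pop in grp] for grp in groups]
    s_g = sum(sum(m * m for m in grp) / sum(grp) for grp in sizes)
    s_all = sum(m * m for grp in sizes for m in grp) / N
    n = (N - s_g) / (P - G)
    n1 = (s_g - s_all) / (G - 1)
    n2 = (N - sum(sum(grp) ** 2 for grp in sizes) / N) / (G - 1)

    # Solve the expected-MSD equations for the variance components.
    var_c = msd["wp"]
    var_b = (msd["ap"] - var_c) / n
    var_a = (msd["ag"] - var_c - n1 * var_b) / n2
    return {
        "ssd": {"ag": ssd_ag, "ap": ssd_ap, "wp": ssd_wp, "total": ssd_total},
        "df": df,
        "msd": msd,
        "coef": {"n": n, "n1": n1, "n2": n2},
        "var": {"a": var_a, "b": var_b, "c": var_c},
    }


# Hypothetical data: 2 groups x 2 subpopulations x 2 haplotypes, 4 sites each.
haps = ["0000", "0001", "0011", "0010", "1100", "1101", "1111", "1110"]
dist = [[sum(a != b for a, b in zip(x, y)) for y in haps] for x in haps]
groups = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]

result = amova(dist, groups)
print(result["ssd"])
print(result["var"])
```

In this balanced toy design the three coefficients reduce to the subpopulation size (2), the subpopulation size again (2) and the group size (4), which is a quick sanity check on the coefficient formulas.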
DISTANCE METRICS
The analysis of molecular variance is performed using the distances between
pairs of haplotypes as data, computed with W as a weight matrix.
Distance can be defined in a variety of ways.
AMOVA is not strictly rigorous with a non-Euclidean metric.
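A weighted squared distance of this kind can be sketched as a quadratic form, (x - y)' W (x - y), where x and y are the 0/1 vectors of two haplotypes. This form is the usual one in the AMOVA literature but is stated here as an assumption, since the formula is not reproduced in this text. The function names and the example weights are hypothetical.

```python
def weighted_sq_distance(x, y, W):
    """Quadratic form (x - y)' W (x - y) for vectors x, y and an S x S matrix W."""
    d = [a - b for a, b in zip(x, y)]
    S = len(d)
    return sum(d[r] * W[r][c] * d[c] for r in range(S) for c in range(S))


def identity(S):
    """S x S identity matrix: every site gets equal weight, no cross terms."""
    return [[1 if r == c else 0 for c in range(S)] for r in range(S)]


x = (0, 1, 0, 1)  # hypothetical haplotype
y = (1, 0, 0, 0)  # hypothetical haplotype

# With W = identity the result is the plain squared Euclidean distance,
# i.e. the number of sites at which the two haplotypes differ.
euclidean = weighted_sq_distance(x, y, identity(4))

# A hypothetical diagonal W that gives the first site double weight.
W_diag = [[2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
weighted = weighted_sq_distance(x, y, W_diag)

print(euclidean, weighted)
```

Choosing W as the identity matrix keeps the metric Euclidean, which is the case for which the analysis is strictly valid.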
WEIGHT MATRIX
W is a weight matrix of dimension S × S, where S is the number of restriction sites
or marker bands. W has several forms:
EQUIDISTANCE
Use this method for allozymes and other protein systems, for antigenic
systems and for molecular fingerprint analysis.
Peakall, Smouse and Huff (1995) use this method to analyse allozyme data with multiple alleles.
In this case AMOVA is mathematically identical to Weir and Cockerham's (1984) analysis of variance for multiple alleles except that they use a different testing procedure.
EXAMPLES
Two ways of constructing haplotypes from restriction sites.
Example 1: three restriction sites, 1 = presence, 0 = absence
There are five distinct haplotypes in the data set.
Example 2: Four mutational events obtained from the network constructed from the p vectors, 1 = presence, 0 = absence of a mutational event. There are still five distinct haplotypes.
PHI STATISTICS
The variance components from the analysis are used to estimate Φ statistics, which are similar to F statistics.
Φ(CT) is the correlation of random pairs of haplotypes drawn from a group relative to the correlation of pairs of random haplotypes drawn from the whole population.
Φ(SC) is the correlation of random pairs of haplotypes drawn from a subpopulation relative to the correlation of pairs of random haplotypes drawn from the whole region, averaged over all subpopulations.
Φ(ST) is the correlation of random pairs of haplotypes drawn from within subpopulations relative to the correlation of pairs of random haplotypes drawn from the whole population.
Note that
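One relationship among the three statistics can be illustrated with a short sketch. It assumes the standard AMOVA definitions as ratios of variance components, which are not reproduced in this text: the among-group statistic (CT) is var_a / total, the within-group statistic (SC) is var_b / (var_b + var_c), and the overall statistic (ST) is (var_a + var_b) / total. Here `var_a`, `var_b` and `var_c` are the among-group, among-subpopulation-within-group and within-subpopulation components; the function name and the numerical values are hypothetical.

```python
def phi_statistics(var_a, var_b, var_c):
    """Phi statistics from variance components (assumed standard definitions)."""
    total = var_a + var_b + var_c
    return {
        "phi_ct": var_a / total,              # groups relative to the whole
        "phi_sc": var_b / (var_b + var_c),    # subpopulations relative to their group
        "phi_st": (var_a + var_b) / total,    # subpopulations relative to the whole
    }


# Hypothetical variance components, chosen only for illustration.
phi = phi_statistics(var_a=0.75, var_b=0.25, var_c=0.5)
print(phi)

# Under these definitions the statistics satisfy
# (1 - phi_st) = (1 - phi_sc) * (1 - phi_ct).
lhs = 1 - phi["phi_st"]
rhs = (1 - phi["phi_sc"]) * (1 - phi["phi_ct"])
print(abs(lhs - rhs) < 1e-12)
```

The identity follows directly from the definitions, since both sides reduce to var_c / total.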
POPULATION STRUCTURE
Consider several subpopulations of the same species separated geographically.
In the absence of selection and with random mating, two trends are expected:
Cockerham (1973) showed how F statistics could be estimated from an analysis of variance comparing mean gene frequencies.
F statistics apply to two possible alleles at a particular locus.
Long (1986) extended Cockerham's analysis to several loci each having two or more possible alleles. This method was particularly suited to the analysis of allozyme data.
POPLAR TREE EXAMPLE
Poplar trees are grown commercially throughout the United Kingdom.
The Forestry Commission has the task of ensuring that clones can be
unambiguously identified. In the past, this identification was done using
morphological characters but nowadays is performed using molecular data.
[Figure: A 4-year-old Beaupre hybrid]
[Figure: A mature poplar]
Cottrell, Forrest and White (1996) have provided RAPD data for three species and several hybrids.
Unweighted Euclidean distances have been used in the analysis.
POPLAR TREE DATA
The data show the presence (1) or absence (0) of
RAPD bands in each clone examined. Fourteen clones for which some band data were
missing have been excluded.
Species and hybrids for which only one sample was available have not been used
in this analysis.
The first column is a code for the clone. The second column is an abbreviation
of either the species name or the cross used to produce a hybrid. There is one
column for each marker band.
POPLAR TREE EUCLIDEAN DISTANCES
The matrix below lists the clones across the top and down the left side.
The body of the matrix contains the squared Euclidean distances between pairs
of clones.
So the squared distance between clones 1 and 3 is 8, while that between clones 3 and 4 is 5. The distance between a clone and itself is 0, so all diagonal elements are 0.
Some observations:
The first nine clones are all P. trichocarpa and are closer to one another than to any of clones 11-15 (P. nigra) or clones 16-17 (P. deltoides).
Clones 19-23 and 27-29 are formed from crosses between P. trichocarpa and P. deltoides. These are closer to each of the parent species than the parents are
to one another.
SOFTWARE FOR AMOVA ANALYSIS
AMOVA is a computer program developed by Laurent Excoffier at the Genetics and Biometry Laboratory of the Department of Anthropology & Ecology in the University of Geneva.
The program is a very useful tool for the analysis of population
genetic structure at the molecular level. It runs under PC Windows and
is currently available as freeware from:
http://acasun1.unige.ch/LGB/software/win/amova/
To run the AMOVA program it is necessary to construct
Using the input files and settings as illustrated, an output file is obtained.
Since developing AMOVA, the Genetics & Biometry Laboratory has produced a much more sophisticated program, ARLEQUIN, which can handle data from RFLPs, DNA sequences and microsatellites, as well as standard multi-locus or allele frequency data.
SUMMARY
Analysis of Molecular Variance is a method for analysing population variation
using molecular data.
It may be used on RFLP, RAPD, protein or allozyme data.
A variety of distance metrics may be used for establishing distances between pairs of chromosomes but AMOVA is strictly valid for squared Euclidean distances only.
AMOVA assumes that the restriction sites, RAPD markers, or allozymes are independent.
AMOVA provides a way of estimating Φ statistics, which are an extension of
Wright's F statistics.