Improving MCMC
by
Metropolis coupling of different trajectories:
Simulation study 1
Dirk Husmeier
Biomathematics and Statistics Scotland (BioSS)
SCRI, Dundee DD2 5DA, United Kingdom
March 2002
We simulated recombination in a DNA sequence alignment
of four taxa with the program
SIRENS, giving the command
SIRENS(17,0.1).
To download the data (in sequential PHYLIP format),
click here.
The total length of the alignment is 1000 nucleotides,
with two recombinant regions of length 200 nucleotides each.
For a detailed description of the simulation process,
click here.
Note that the first recombination
event is more ancient and therefore more difficult to detect.
The true mosaic structure is shown in the figure below,
where the horizontal axis shows the position in
the DNA sequence alignment, and the vertical axis represents the
three possible tree topologies.
We applied BARCE to the DNA
sequence alignment, using the three variations
of the default
parameter settings described below.
Each of the figures below contains three graphs, which show
the posterior probabilities for the
three possible tree topologies,
plotted
along the DNA sequence alignment (the horizontal axis
represents sites in the alignment).
| Row |
Posterior probability for |
| top
|
topology 1
|
|
middle
|
topology 2
|
|
right
|
topology 3
|
For the first simulation, no
annealing was used.
The respective setting in the run settings menu
is:
Annealing scheme for lambda
|
Q
|
none
|
The prediction is obviously suboptimal, since no recombination
event is detected.
For the second simulation,
annealing by mixing distributions was used,
with the following modification of
the run settings menu:
Annealing scheme for lambda
|
Q
|
PROB
|
As a result of the annealing scheme, the prediction
has improved, and both recombinant regions have been detected.
However, the results are still suboptimal,
since the first region is misclassified as
topology 3, whereas in fact it is
topology 2.
Finally, in the third simulation, we applied
annealing by mixing parameters.
The respective setting in the run settings menu
is:
Annealing scheme for lambda
|
Q
|
PAR
|
Now, both recombinant regions have been detected and
classified correctly.
The Markov chains in the first two simulations have
not converged properly, but rather got stuck
in some meta-stable regions of the parameter space.
However, in a real-world application, you obviously do
not know
that the last simulation gives the true prediction.
With three different predictions, how are you going
to proceed?
There are four possible strategies you can adopt:
-
Increase the length of the simulations so that the
trajectories will eventually leave the meta-stable
regions and converge to the
main support of the posterior distribution.
This, however, may take very, very long ...
-
Take a naive average over the predictions.
This is obviously suboptimal, since the wrong predictions
corresponding to the meta-stable regions
will not be suppressed.
-
Analyze the trajectories, for instance, by
inspecting the log unnormalized posterior probabilities
plotted against the simulation time (MCMC steps).
However, it is not particularly
easy to tell which simulations correspond to meta-stable,
suboptimal states (
try it).
-
Use Metropolis coupling
of the MCMC trajectories.
This is what you get:
Now, the mosaic structure of the DNA sequence alignment
is predicted correctly, with a proper location and
identification of the recombinant regions
in agreement with the
true recombination scenario,
and the two sub-optimal predictions corresponding to the
meta-stable domains of the parameter space have been
successfully suppressed.
Back to the BARCE manual.
Back to the discussion of MCMCMC.
Back to the comparison between BARCE and RECPARS.