Improving MCMC by Metropolis coupling of different trajectories:
Simulation study 1

Dirk Husmeier
Biomathematics and Statistics Scotland (BioSS)
SCRI, Dundee DD2 5DA, United Kingdom
March 2002


Data

We simulated recombination in a DNA sequence alignment of four taxa with the program SIRENS, giving the command SIRENS(17,0.1). To download the data (in sequential PHYLIP format), click here. The total length of the alignment is 1000 nucleotides, with two recombinant regions of length 200 nucleotides each. For a detailed description of the simulation process, click here. Note that the first recombination event is more ancient and therefore more difficult to detect. The true mosaic structure is shown in the figure below, where the horizontal axis shows the position in the DNA sequence alignment, and the vertical axis represents the three possible tree topologies.


Simulations

We applied BARCE to the DNA sequence alignment, using the three variations of the default parameter settings described below. Each of the figures below contains three graphs, which show the posterior probabilities for the three possible tree topologies, plotted along the DNA sequence alignment (the horizontal axis represents sites in the alignment).

Row Posterior probability for
top
topology 1
middle
topology 2
right
topology 3

MCMC simulation 1

For the first simulation, no annealing was used. The respective setting in the run settings menu is:

Annealing scheme for lambda Q none

The prediction is obviously suboptimal, since no recombination event is detected.


MCMC simulation 2

For the second simulation, annealing by mixing distributions was used,
with the following modification of the run settings menu:

Annealing scheme for lambda Q PROB

As a result of the annealing scheme, the prediction has improved, and both recombinant regions have been detected. However, the results are still suboptimal, since the first region is misclassified as topology 3, whereas in fact it is topology 2.


MCMC simulation 3

Finally, in the third simulation, we applied annealing by mixing parameters.
The respective setting in the run settings menu is:

Annealing scheme for lambda Q PAR

Now, both recombinant regions have been detected and classified correctly.


Metropolis coupling of MCMC trajectories

The Markov chains in the first two simulations have not converged properly, but rather got stuck in some meta-stable regions of the parameter space. However, in a real-world application, you obviously do not know that the last simulation gives the true prediction. With three different predictions, how are you going to proceed? There are four possible strategies you can adopt:

  1. Increase the length of the simulations so that the trajectories will eventually leave the meta-stable regions and converge to the main support of the posterior distribution. This, however, may take very, very long ...
  2. Take a naive average over the predictions. This is obviously suboptimal, since the wrong predictions corresponding to the meta-stable regions will not be suppressed.
  3. Analyze the trajectories, for instance, by inspecting the log unnormalized posterior probabilities plotted against the simulation time (MCMC steps). However, it is not particularly easy to tell which simulations correspond to meta-stable, suboptimal states ( try it).
  4. Use Metropolis coupling of the MCMC trajectories. This is what you get:

Now, the mosaic structure of the DNA sequence alignment is predicted correctly, with a proper location and identification of the recombinant regions in agreement with the true recombination scenario, and the two sub-optimal predictions corresponding to the meta-stable domains of the parameter space have been successfully suppressed.


Back to the BARCE manual.
Back to the discussion of MCMCMC.
Back to the comparison between BARCE and RECPARS.