In principle, the initialization of the hidden states is unimportant, since the Markov chain forgets its initial configuration and converges to the equilibrium distribution irrespective of its starting point. In practice, however, extreme starting values can slow down the mixing of the chain and lengthen the burn-in, in which case the MCMC sampler may fail to reach the main support of the posterior distribution within the available simulation time. In general, it may therefore be advisable to start from an informative initialization, such as the output of a prediction with RECPARS. However, we found that the annealing scheme improves mixing very effectively during the burn-in period and thereby helps the Markov chain to lose its memory of the initialization quickly. Consequently, the initialization does not seem to matter much if annealing is used. If a strong dependence on the initialization is found, this indicates that the equilibration and sampling periods are too short and should be increased.
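One way to check for a dependence on the initialization is to compare the per-site posterior probabilities obtained from two runs that differ only in their starting point. The following is a minimal sketch, not part of BARCE: the function name, the toy probabilities, and the tolerance of 0.1 are all assumptions chosen for illustration.

```python
def max_posterior_discrepancy(run_a, run_b):
    """Largest absolute difference between two runs' posterior probabilities.

    run_a, run_b: lists of per-site rows, one probability per topology.
    """
    if len(run_a) != len(run_b):
        raise ValueError("runs cover a different number of sites")
    worst = 0.0
    for row_a, row_b in zip(run_a, run_b):
        if len(row_a) != len(row_b):
            raise ValueError("runs use a different number of topologies")
        for p_a, p_b in zip(row_a, row_b):
            worst = max(worst, abs(p_a - p_b))
    return worst


# Toy posteriors for three sites and three topologies (made-up numbers).
run_from_init_1 = [[0.90, 0.05, 0.05], [0.40, 0.50, 0.10], [0.10, 0.10, 0.80]]
run_from_init_2 = [[0.88, 0.07, 0.05], [0.10, 0.85, 0.05], [0.12, 0.08, 0.80]]

discrepancy = max_posterior_discrepancy(run_from_init_1, run_from_init_2)

TOLERANCE = 0.1  # hypothetical threshold, not a BARCE setting
needs_longer_run = discrepancy > TOLERANCE
```

If the discrepancy exceeds the chosen tolerance, the two runs disagree, which in the sense of the paragraph above suggests lengthening the burn-in and sampling periods.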
As an example, take the application of BARCE to the sequence alignment of mosaic structure B, tree height 0.2. We tried two initializations:
In the first, mosaic.in looks as follows:

    2
    500
    0 0

In the second, mosaic.in looks as follows:

    5
    200 400 600 800
    0 1 0 2 0
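The two files appear to share a three-line layout: the number of segments, the breakpoints between them, and one topology label per segment. The sketch below expands such a file into one label per site. This reading of the format is an assumption and is not confirmed here. The 0-based breakpoint convention and the alignment length of 1000 are likewise hypothetical.

```python
def expand_initialization(text, alignment_length):
    """Expand a three-line initialization into one topology label per site.

    Assumed layout (an interpretation, not a documented format):
      line 1: number of segments
      line 2: breakpoints, taken as the first site (0-based) of each
              segment after the first
      line 3: one topology label per segment
    """
    lines = [line.split() for line in text.strip().splitlines()]
    n_segments = int(lines[0][0])
    breakpoints = [int(x) for x in lines[1]]
    labels = [int(x) for x in lines[2]]
    if len(labels) != n_segments or len(breakpoints) != n_segments - 1:
        raise ValueError("segment count does not match breakpoints or labels")

    bounds = [0] + breakpoints + [alignment_length]
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("breakpoints must increase and lie inside the alignment")

    states = []
    for label, lo, hi in zip(labels, bounds, bounds[1:]):
        states.extend([label] * (hi - lo))
    return states


ALIGNMENT_LENGTH = 1000  # hypothetical; the text does not state the length

flat = expand_initialization("2\n500\n0 0", ALIGNMENT_LENGTH)
mosaic = expand_initialization("5\n200 400 600 800\n0 1 0 2 0", ALIGNMENT_LENGTH)
```

Under this reading, the first file assigns topology 0 to every site, while the second assigns topologies 1 and 2 to two separate regions.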
We start with a reminder of the true mosaic structure, where the horizontal axis shows the position in the DNA sequence alignment, and the vertical axis represents the three possible tree topologies.
Each of the figures below contains three graphs, which show the posterior probabilities for the three possible tree topologies, plotted along the DNA sequence alignment (the horizontal axis represents sites in the alignment).
| Parameter | Symbol | Value |
|---|---|---|
| Length of the burn-in period | B | 100000 |
| Length of the sampling period | . | 100000 |
| Number of points to return | N | 1000 |
| Thinning interval | I | 100 |
Otherwise, the default settings were used.
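In the settings listed in this section, the number of points to return equals the length of the sampling period divided by the thinning interval (100000 / 100 = 1000 and 200000 / 100 = 2000). The short sketch below reproduces that arithmetic. Whether BARCE enforces this relationship is an assumption, and the function name is hypothetical.

```python
def points_returned(sampling_steps, thinning_interval):
    """Number of samples kept when every thinning_interval-th step is stored."""
    if thinning_interval <= 0:
        raise ValueError("thinning interval must be positive")
    return sampling_steps // thinning_interval


short_run = points_returned(100000, 100)  # sampling period of 100000 steps
long_run = points_returned(200000, 100)   # sampling period of 200000 steps
```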
The results are shown in the figure below. When classifying the sites of the alignment by assigning them to the topology with the highest probability, the true mosaic structure is predicted. However, the classification of the second recombinant region shows a high degree of uncertainty.
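The classification rule described above, assigning each site to the topology with the highest posterior probability, can be sketched as follows. The probabilities are made-up toy values, and the function names are hypothetical. The maximum probability per site is reported alongside as a simple indicator of how uncertain each call is.

```python
def classify_sites(posteriors):
    """Return, for each site, the index of the most probable topology."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


def site_confidence(posteriors):
    """Return, for each site, the posterior probability of the chosen topology."""
    return [max(row) for row in posteriors]


# Toy posteriors: one row per site, one column per topology (made-up numbers).
posteriors = [
    [0.90, 0.05, 0.05],  # clear call for topology 0
    [0.20, 0.70, 0.10],  # fairly clear call for topology 1
    [0.40, 0.15, 0.45],  # topology 2 wins, but only narrowly
]

predicted = classify_sites(posteriors)
confidence = site_confidence(posteriors)
```

A site such as the third one is classified correctly by the argmax rule even though its winning probability is well below 1, which is the kind of uncertainty the text describes for the second recombinant region.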
| Parameter | Symbol | Value |
|---|---|---|
| Length of the burn-in period | B | 100000 |
| Length of the sampling period | . | 100000 |
| Number of points to return | N | 1000 |
| Thinning interval | I | 100 |
Otherwise, the default settings were used.
The results are shown in the figure below. The uncertainty in the prediction of the second recombinant region is significantly reduced.
| Parameter | Symbol | Value |
|---|---|---|
| Length of the burn-in period | B | 100000 |
| Length of the sampling period | . | 200000 |
| Number of points to return | N | 2000 |
| Thinning interval | I | 100 |
Otherwise, the default settings were used.
Result:
| Initialization | Sampling time | Sensitivity | Specificity | Relative entropy | Average log likelihood plus log prior |
|---|---|---|---|---|---|