Question (22 October 04)
What is the meaning of the thinning and tuning interval parameters in the run settings dialog box?
Answer
The thinning interval determines how many configurations are written out to file. In a Markov chain, adjacent configurations are highly correlated; consequently, keeping every single configuration adds little extra information while wasting disk space. A thinning interval of 100 means that only every 100th configuration in the Markov chain is saved, that is, 99 in 100 configurations are discarded.
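As a minimal illustration of thinning (not BARCE's actual code), the following Python sketch keeps every 100th element of a chain. The function name thin and the choice to save steps 100, 200, ... (rather than some other offset) are assumptions made for the example.

```python
def thin(chain, interval):
    """Return every `interval`-th configuration of `chain`.

    Hypothetical helper: steps are counted from 1, so with interval=100
    the configurations at steps 100, 200, 300, ... are kept.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return chain[interval - 1::interval]


# A stand-in for an MCMC chain: the "configuration" at step i is just i.
chain = list(range(1, 1001))
saved = thin(chain, 100)

print(len(saved))   # number of configurations written out
print(saved[:3])    # the first few saved steps
```

With 1000 steps and a thinning interval of 100, 10 configurations are kept and 990 are discarded.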
The tuning interval determines how often the parameters of the proposal distribution are adjusted. A tuning interval of 100 means that the adjustment is done every 100 MCMC steps; see Molecular Biology and Evolution 20(3):315-337 for further details.
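To make the idea of a tuning interval concrete, here is a generic Python sketch of a random-walk Metropolis sampler whose proposal width is adjusted every tuning_interval steps. This is not the scheme used in BARCE (see the paper cited above for that); the standard normal target, the acceptance-rate threshold of 0.4, and the scaling factors 1.1 and 0.9 are all assumptions chosen only for illustration.

```python
import math
import random


def metropolis_with_tuning(n_steps, tuning_interval, step=1.0, seed=0):
    """Toy random-walk Metropolis sampler for a standard normal target.

    Every `tuning_interval` steps the proposal width `step` is adjusted
    according to the acceptance rate observed since the last adjustment.
    """
    rng = random.Random(seed)
    x = 0.0
    accepted = 0
    samples = []
    step_history = [step]
    for i in range(1, n_steps + 1):
        proposal = x + rng.gauss(0.0, step)
        # log of the target density ratio p(proposal) / p(x)
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        samples.append(x)
        if i % tuning_interval == 0:
            rate = accepted / tuning_interval
            # Toy rule (assumption): widen the proposal if too many moves
            # are accepted, narrow it otherwise.
            step *= 1.1 if rate > 0.4 else 0.9
            accepted = 0
            step_history.append(step)
    return samples, step_history


samples, step_history = metropolis_with_tuning(n_steps=1000, tuning_interval=100)
print(len(step_history) - 1)  # number of adjustments performed
```

With 1000 steps and a tuning interval of 100, the proposal width is adjusted 10 times.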
Question (8 December 03)
I believe that when you run the HMM analysis the results should be HMM-bayes, but the results I get look more like HMM-heuristic results.
Answer
HMM-heuristic graphs are usually characterized by frequent oscillations as a consequence of suboptimal parameter settings. This can occasionally happen for HMM-bayes as well if the model is misspecified or if the MCMC simulation has not converged. Misspecification may arise in the presence of strong rate heterogeneity, which is not captured by the model. To check for sufficient convergence, try different initializations, different initial values of lambda, different initial sequence segmentations, and different annealing options. Monitor the log likelihood trace plots (strictly speaking, trace plots of the log unnormalized posterior) of the MCMC simulations. Suboptimal predictions, giving rise to HMM-heuristic-like graphs, are indicated by lower log posterior scores.
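One simple way to compare runs started from different initializations is to look at the log posterior scores they settle on. The Python sketch below is only an illustration under stated assumptions: the traces are taken to be plain lists of log posterior values read from each run's output, and the function name, the use of the second half of each trace, and the tolerance are all hypothetical choices, not part of BARCE.

```python
def flag_suboptimal_runs(traces, tolerance):
    """Return indices of runs whose late-stage mean log posterior lies
    more than `tolerance` below that of the best run.

    `traces` is a list of non-empty lists of log posterior values,
    one list per MCMC run.
    """
    means = []
    for trace in traces:
        tail = trace[len(trace) // 2:]  # ignore the first half as burn-in
        means.append(sum(tail) / len(tail))
    best = max(means)
    return [i for i, m in enumerate(means) if best - m > tolerance]


# Made-up traces from three runs with different initializations.
traces = [
    [-120.0, -105.0, -100.0, -100.0],
    [-130.0, -125.0, -120.0, -120.0],
    [-110.0, -101.0, -100.0, -99.0],
]
print(flag_suboptimal_runs(traces, tolerance=5.0))
```

Here the second run (index 1) settles about 20 log units below the others and is flagged. A flagged run is a candidate for a prediction that has not converged; a visual check of the trace plots is still advisable.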
Question (30 October 03)
I am trying to use BARCE on a Unix station. The compilation seems to work (with the g++ compiler), but when I try to submit an alignment I get the message:
"HmmCalc::ReadInitialMosaicStructure(): Cannot open file mosaic.in"
Indeed, I have no file called mosaic.in. Can you help me?
Answer
When you start BARCE, select
1) Model submenu
where you find the following option:
Read in initial hidden state sequence? J YES
By default, BARCE reads in an initial sequence of hidden states from file mosaic.in.
If you don't want to initialize the hidden state sequence, type J, which toggles between the following two options:
Read in initial hidden state sequence? J YES
Read in initial hidden state sequence? J NO
If you select NO, then the initial hidden state sequence will be selected randomly.
Tip: For a "reasonable" initialization of the hidden state sequences, use TOADS.
Answer
The idea of the HMM method is to sample the model parameters
from the posterior distribution - so in principle there is no need
for any bootstrapping etc., since the posterior probabilities already
indicate the degree of confidence in or uncertainty about the prediction.
Since direct sampling from the posterior distribution is intractable,
the method of Markov chain Monte Carlo (MCMC) is employed.
Theoretically, the Markov chain converges to the desired posterior
distribution irrespective of the initialization. In practice, however,
this convergence can take an enormous amount of time.
Hence if the results vary substantially as a consequence of changing
the initialization, this is an indication that the Markov chain has not
yet converged.
The straightforward way to proceed in this case is to re-run the MCMC simulations with longer burn-in and sampling times. However, if this doesn't make any difference - and there is a good chance that it won't - then the convergence of the Markov chain is too slow, and heuristic, although theoretically unsatisfactory, procedures have to be adopted.
One possible approach is to combine the results obtained from different simulations and to compute an average posterior probability profile. This approach makes sense when the (high-dimensional) log-likelihood or log-posterior landscape in parameter space contains many deep valleys, which are unlikely to be crossed by a single MCMC simulation (which therefore is effectively no longer ergodic). Hence by combining different trajectories, a larger part of the parameter space can be covered.
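Averaging profiles from several runs can be sketched in a few lines of Python. This is a minimal illustration, assuming each run yields one posterior probability per alignment site for a given hidden state; the function name and the example numbers are made up.

```python
def average_profiles(profiles):
    """Average per-site posterior probabilities over several MCMC runs.

    `profiles` is a non-empty list of equal-length lists; profiles[r][s]
    is the posterior probability at site s from run r.
    """
    n_sites = len(profiles[0])
    if any(len(p) != n_sites for p in profiles):
        raise ValueError("all profiles must cover the same sites")
    n_runs = len(profiles)
    return [sum(p[s] for p in profiles) / n_runs for s in range(n_sites)]


# Made-up profiles from two runs over three sites.
run_a = [1.0, 0.5, 0.0]
run_b = [0.5, 0.5, 0.25]
averaged = average_profiles([run_a, run_b])
print(averaged)
```

Each run contributes equally to the average here. Whether runs should instead be weighted, for example by their log posterior scores, is a design choice this sketch does not address.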
A diametrically opposed approach is to reduce the variability by always starting the MCMC simulations from the same "reasonable" initialization of the hidden state sequences, as obtained, for instance, with TOADS. This approach can be interpreted in a Bayesian way as an introduction of prior knowledge via the initialization rather than the prior (although mathematically this is unsatisfactory). The risk one takes in this way, however, is that a good amount of trust is placed in the initialization. TOADS, for instance, is based on parsimony principles; hence the results could be misleading if the DNA sequence alignment contains a large number of homoplasious sites.