Questions and answers about BARCE


Question (22 October 04)
What is the meaning of the thinning and tuning interval parameters in the run settings dialog box?

Answer
The thinning interval determines how many configurations are written to file. In a Markov chain, adjacent configurations are highly correlated; consequently, keeping every single configuration adds little extra information while wasting disk space. A thinning interval of 100 means that only every 100th configuration in the Markov chain is saved, that is, 99 in 100 configurations are discarded. The tuning interval determines how often the parameters of the proposal distribution are adjusted. A tuning interval of 100 means that the adjustment is done every 100 MCMC steps; see Molecular Biology and Evolution 20(3):315-337 for further details.
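The following minimal sketch illustrates both settings with a generic random-walk Metropolis sampler on a toy one-dimensional target. It is not BARCE's code: the function run_mcmc, its arguments, the target, and the tuning rule (widen the proposal when more than half of the recent proposals were accepted, narrow it when fewer than a fifth were) are all hypothetical. Tuning is restricted to the burn-in phase here, since adapting the proposal while collecting samples would make the chain depend on its history and break the Markov property.

```python
import math
import random


def run_mcmc(log_target, x0, n_burn_in, n_samples, thin=100,
             tune_interval=100, step=1.0, seed=0):
    """Random-walk Metropolis with thinning and periodic proposal tuning.

    Hypothetical sketch, not BARCE's implementation: `thin` and
    `tune_interval` only mirror the run settings of the same names.
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    accepted = 0  # acceptances since the last tuning step
    kept = []     # thinned configurations ("written to file")
    for t in range(1, n_burn_in + n_samples + 1):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        # Metropolis test on the log scale; 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
            accepted += 1
        if t <= n_burn_in:
            # Every tune_interval burn-in steps, adjust the proposal width
            if t % tune_interval == 0:
                rate = accepted / tune_interval
                if rate > 0.5:
                    step *= 1.2   # accepting too often: take bigger steps
                elif rate < 0.2:
                    step /= 1.2   # rejecting too often: take smaller steps
                accepted = 0
        elif (t - n_burn_in) % thin == 0:
            # Keep every thin-th configuration; the other thin - 1 are discarded
            kept.append(x)
    return kept, step


# Toy target: standard normal log density, up to an additive constant
samples, final_step = run_mcmc(lambda x: -0.5 * x * x, x0=0.0,
                               n_burn_in=5_000, n_samples=100_000)
print(len(samples))  # 1000
```

With a thinning interval of 100, the 100,000 post-burn-in steps yield 1,000 stored configurations.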


Question (8 December 03)
I believe that running the HMM analysis should give HMM-bayes results, but the results I get look more like HMM-heuristic results.

Answer
HMM-heuristic graphs are usually characterized by frequent oscillations as a consequence of sub-optimal parameter settings. This can occasionally happen for HMM-bayes as well if the model is misspecified or if the MCMC simulation has not converged. Misspecification may arise in the presence of strong rate heterogeneity, which is not captured by the model. To check for sufficient convergence, try different initializations, different initial values of lambda, different initial sequence segmentations, and different annealing options. Monitor the log likelihood trace plots (strictly speaking, trace plots of the log unnormalized posterior) of the MCMC simulations. Suboptimal predictions, giving rise to HMM-heuristic-like graphs, are indicated by lower log posterior scores.
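Comparing runs by their log posterior scores can be automated along the following lines. This is a minimal sketch assuming the log posterior traces of several runs have already been extracted into Python lists; the function flag_suboptimal_runs, the run labels, the trace values, and the tolerance threshold are all made up for illustration and are not part of BARCE.

```python
from statistics import mean


def flag_suboptimal_runs(traces, burn_in, tolerance=10.0):
    """Return (scores, flagged) for a set of MCMC runs.

    traces: dict mapping a run label to its log (unnormalized) posterior trace.
    A run is flagged when its mean post-burn-in score lies more than
    `tolerance` log units below the best run (an arbitrary threshold).
    """
    scores = {label: mean(trace[burn_in:]) for label, trace in traces.items()}
    best = max(scores.values())
    flagged = sorted(label for label, s in scores.items() if best - s > tolerance)
    return scores, flagged


# Toy traces from three hypothetical initializations
traces = {
    "lambda=0.1": [-5200, -4100, -4010, -4005, -4008],
    "lambda=0.5": [-5300, -4200, -4011, -4006, -4004],
    "random-init": [-5100, -4500, -4480, -4475, -4478],
}
scores, flagged = flag_suboptimal_runs(traces, burn_in=2)
print(flagged)  # ['random-init']
```

A flagged run is a candidate for the HMM-heuristic-like behaviour described above; its predictions would normally be discarded in favour of the higher-scoring runs.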


Question (30 October 03)
I am trying to use BARCE on a Unix workstation. The compilation seems to work (with the g++ compiler), but when I try to submit an alignment I get the message: "HmmCalc::ReadInitialMosaicStructure(): Cannot open file mosaic.in". Indeed, I have no file called mosaic.in. Can you help me?

Answer
When you start BARCE, select

1) Model submenu

where you find the following option:

Read in initial hidden state sequence? J YES

By default, BARCE reads in an initial sequence of hidden states from file mosaic.in. If you don't want to read in an initial hidden state sequence from file, type J, which toggles between the following two options:

Read in initial hidden state sequence? J YES
Read in initial hidden state sequence? J NO

If you select NO, then the initial hidden state sequence will be selected randomly.
Tip: For a "reasonable" initialization of the hidden state sequences, use TOADS.
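For illustration, a random initialization and its translation into sequence segments might look as follows. This is a hypothetical sketch: it assumes the hidden states are tree topologies indexed 0 to 2 (three unrooted topologies, as for an alignment of four sequences) and that each alignment column is assigned a state uniformly at random. The functions random_hidden_states and to_segments are illustrative names; BARCE's internal procedure and the format of mosaic.in are not reproduced here.

```python
import random
from itertools import groupby


def random_hidden_states(n_sites, n_states=3, seed=None):
    """Assign each alignment column a hidden state (topology index) uniformly at random.

    Hypothetical sketch; uniform, independent draws are an assumption.
    """
    rng = random.Random(seed)
    return [rng.randrange(n_states) for _ in range(n_sites)]


def to_segments(states):
    """Collapse per-site states into (first_site, last_site, state) blocks, 1-based."""
    segments, pos = [], 1
    for state, run in groupby(states):
        length = sum(1 for _ in run)
        segments.append((pos, pos + length - 1, state))
        pos += length
    return segments


print(to_segments([0, 0, 0, 2, 2, 1]))  # [(1, 3, 0), (4, 5, 2), (6, 6, 1)]
init = random_hidden_states(n_sites=500, seed=1)
# Expected segment count is 1 + 499 * 2/3, i.e. roughly 333 on average
print(len(to_segments(init)))
```

Under these assumptions, a uniformly random initialization breaks the alignment into many short segments, which is why a segmentation with a few long blocks, such as one obtained from TOADS, is likely to be a more reasonable starting point.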


Question (20 October 03)
I just found TOPALi on the net and it is GREAT! Excellent program and an excellent interface. I am using HMM, and if I repeat my analyses I get minor to major differences in the predictions without changing any parameter. Though the predictions agree in general, it would be perfect if a quantitative measure (like a bootstrap or a repeat of different predictions) could be computed. Otherwise the selection of a single analysis would be a little bit arbitrary. Is there any possibility to compare several HMM analyses?

Answer
The idea of the HMM method is to sample the model parameters from the posterior distribution; in principle, therefore, there is no need for bootstrapping or similar procedures, since the posterior probabilities already indicate the degree of confidence in, or uncertainty about, the prediction. Since direct sampling from the posterior distribution is intractable, Markov chain Monte Carlo (MCMC) is employed. Theoretically, the Markov chain converges to the desired posterior distribution irrespective of the initialization. In practice, however, this convergence can take an enormous amount of time. Hence, if the results vary substantially when the initialization is changed, this indicates that the Markov chain has not yet converged.
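One standard way to quantify whether results "vary substantially" is the Gelman-Rubin potential scale reduction factor (R-hat), computed from several runs started from different initializations. This diagnostic is swapped in here as a general MCMC tool; BARCE is not stated to compute it. The sketch assumes each run has been reduced to an equal-length, post-burn-in trace of one scalar quantity, such as the log posterior or the posterior probability of a topology at a given site; the function gelman_rubin and the example values are made up.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for m >= 2 equal-length chains."""
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance B (scaled by n) and mean within-chain variance W
    b = n * sum((cm - grand_mean) ** 2 for cm in chain_means) / (m - 1)
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


# Made-up post-burn-in traces of one scalar from two runs each
agree = [[0.10, 0.20, 0.15, 0.12], [0.11, 0.18, 0.16, 0.13]]
disagree = [[0.10, 0.20, 0.15, 0.12], [0.90, 0.85, 0.95, 0.88]]
print(round(gelman_rubin(agree), 2), round(gelman_rubin(disagree), 1))
```

Values close to 1 are consistent with the runs sampling the same distribution; values well above 1 (a common rule of thumb is above about 1.1) point to the lack of convergence described above. Real traces should of course be much longer than these toy examples.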

The straightforward way to proceed in this case is to re-run the MCMC simulations with longer burn-in and sampling times. However, if this doesn't make any difference (and there is a good chance that it doesn't), then the Markov chain converges too slowly, and heuristic, although theoretically unsatisfactory, procedures have to be adopted.

One possible approach is to combine the results obtained from different simulations and to compute an average posterior probability profile. This approach makes sense when the (high-dimensional) log-likelihood or log-posterior landscape in parameter space contains many deep valleys, which are unlikely to be crossed by a single MCMC simulation (which therefore is effectively no longer ergodic). Hence by combining different trajectories, a larger part of the parameter space can be covered.
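Averaging is straightforward once each run's posterior probability profile is available as a list of per-site values. In this minimal sketch, the function average_profile and the toy numbers are hypothetical; the site-wise spread is an extra, optional diagnostic that shows where the individual runs disagree.

```python
def average_profile(profiles):
    """Site-wise mean and spread of posterior probability profiles from several runs.

    profiles: one list per MCMC run, holding the posterior probability of a
    given hidden state (e.g. one topology) at each alignment column.
    """
    n_sites = len(profiles[0])
    if any(len(p) != n_sites for p in profiles):
        raise ValueError("all runs must cover the same alignment columns")
    columns = list(zip(*profiles))  # one tuple of per-run values per site
    mean_profile = [sum(col) / len(col) for col in columns]
    spread = [max(col) - min(col) for col in columns]  # disagreement between runs
    return mean_profile, spread


# Made-up profiles for three runs over three sites
runs = [[0.9, 0.8, 0.1], [0.7, 0.6, 0.3], [0.8, 0.7, 0.2]]
avg, spread = average_profile(runs)
print([round(p, 3) for p in avg])     # [0.8, 0.7, 0.2]
print([round(s, 3) for s in spread])  # [0.2, 0.2, 0.2]
```

Sites with a large spread are those where a single run should not be trusted in isolation.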

A diametrically different approach is to reduce the variability by always starting the MCMC simulations from the same "reasonable" initialization of the hidden state sequences, as obtained, for instance, with TOADS. This approach can be interpreted in a Bayesian way as an introduction of prior knowledge via the initialization rather than the prior (although mathematically this is unsatisfactory). The risk one takes in this way, however, is to place a good amount of trust in the initialization. TOADS, for instance, is based on parsimony principles; hence the results could be misleading if the DNA sequence alignment contains a large number of homoplasious sites.

