CHOICE OF THE MODEL


Exceptional words are words that are over- or under-represented with respect to a model. The model determines what is expected, given the information about the sequence that it takes into account. Exceptional words are not necessarily the same under different models (see the comparison of exceptional words under two models, for instance).

The choice of the model is therefore important.

Models with/without phase

The main purpose of using a model with phase, Mm_3, is to distinguish word occurrences on a given phase in a coding DNA sequence.

Indeed, a word may have an expected overall count under Mm or Mm_3, but an unexpected count on a particular phase. Such a word cannot be identified with a model Mm, only with a model Mm_3 (see the comparison of exceptional words on two phases, for instance).
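To make the notion of a count on a particular phase concrete, here is a minimal Python sketch that splits the overlapping occurrences of a word by phase. The function name count_by_phase is hypothetical, and the sketch assumes that the sequence starts on phase 1 and that the phase of an occurrence is the codon position of its first letter; the convention used by a given program may differ.

```python
def count_by_phase(seq, word):
    """Count overlapping occurrences of `word` in `seq`, split by phase 1, 2, 3.

    Assumption (not from the original text): the sequence starts on phase 1,
    and the phase of an occurrence is the codon position of its first letter.
    """
    counts = {1: 0, 2: 0, 3: 0}
    h = len(word)
    for i in range(len(seq) - h + 1):
        if seq[i:i + h] == word:
            counts[i % 3 + 1] += 1
    return counts


# Toy coding sequence: ATG occurs three times overall, always on phase 1.
print(count_by_phase("ATGGCGATGGCAATGTAA", "ATG"))  # {1: 3, 2: 0, 3: 0}
```

In this toy case the overall count of ATG could look unremarkable while its distribution over the phases is extremely skewed, which is the kind of signal a model with phase is meant to capture.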

Order of the model

The order of the Markov model determines which part of the sequence composition is taken into account.
For instance, M0 is exactly fitted to the counts of letters, M1 is exactly fitted to the counts of 2-words, and so on. Similarly, M0_3 is exactly fitted to the counts of letters on phase 1, on phase 2 and on phase 3; M1_3 is exactly fitted to the counts of 2-words on phase 1, on phase 2 and on phase 3, and so on. More generally, a model of order m is exactly fitted to the counts of (m+1)-words, so a model of order h-1 would be fitted to the counts of h-words themselves and no h-word could be unexpected. Therefore, when we are interested in identifying exceptional h-words, the highest-order Markov chain model to consider, called the maximal model, is of order h-2.
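The role of the order can be illustrated with the classical plug-in estimate of the expected count of an h-word under a model of order m: the product of the observed counts of its (m+1)-subwords divided by the product of the observed counts of its interior m-subwords. The following Python sketch is only an illustration under that assumption; the function names are hypothetical, the model is without phase, and nothing here is claimed to be the exact computation performed by any particular program.

```python
def count(seq, word):
    """Number of overlapping occurrences of `word` in `seq`."""
    h = len(word)
    return sum(1 for i in range(len(seq) - h + 1) if seq[i:i + h] == word)


def expected_count(seq, word, m):
    """Plug-in estimate of the expected count of `word` under a model of order m.

    Illustrative sketch: the numerator multiplies the counts of the
    (m+1)-subwords of `word`, the denominator multiplies the counts of its
    interior m-subwords. The order must satisfy 0 <= m <= h - 2.
    """
    h = len(word)
    if not 0 <= m <= h - 2:
        raise ValueError("order must satisfy 0 <= m <= h - 2")
    num = 1.0
    for i in range(h - m):
        num *= count(seq, word[i:i + m + 1])
    den = 1.0
    for i in range(1, h - m):
        # For m = 0 the count of the empty word is taken to be the sequence length.
        den *= count(seq, word[i:i + m]) if m > 0 else len(seq)
    return num / den if den else 0.0


seq = "ACGTACGTAC"
print(count(seq, "ACG"))              # observed count: 2
print(expected_count(seq, "ACG", 0))  # under M0: 3 * 3 * 2 / 10**2 = 0.18
print(expected_count(seq, "ACG", 1))  # under M1, the maximal model: 3 * 2 / 3 = 2.0
```

For a 3-word the maximal model is M1 (order h-2 = 1). Asking for order 2 raises an error in this sketch, reflecting the fact that such a model would already be fitted to the counts of 3-words.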

Finding words with unexpected frequencies in DNA sequences. 11.9.98 Page: 10 of 21