CHOICE OF THE MODEL
Exceptional words are words that are over- or under-represented with respect to a model. The model determines what is expected, given the information about the sequence that it takes into account. Exceptional words are not necessarily the same under different models (see the comparison of exceptional words under two models, for instance).
The choice of the model is therefore important. Indeed, a word may have an expected overall count under Mm or Mm_3, but an unexpected count on a particular phase. Such a word cannot be identified with a model Mm, but only with a model Mm_3 (see the comparison of exceptional words on two phases, for instance).
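As a rough illustration of what counting a word "on a particular phase" can look like, the sketch below splits the occurrences of a word by phase. The function name `count_by_phase`, the toy sequence, and the convention that the phase of an occurrence is its start position modulo 3 (numbered 1 to 3) are assumptions made for this example; the software may define the phase of a word differently.

```python
def count_by_phase(seq, word, period=3):
    """Count overlapping occurrences of `word` in `seq`, split by phase.

    Assumption (illustration only): the phase of an occurrence is
    (0-based start index mod period) + 1, so phases run from 1 to period.
    """
    counts = {phase: 0 for phase in range(1, period + 1)}
    h = len(word)
    for i in range(len(seq) - h + 1):
        if seq[i:i + h] == word:
            counts[i % period + 1] += 1
    return counts


# Toy sequence in which every ATG starts on phase 1.
seq = "ATGGCAATGGCTATGAAA"
phase_counts = count_by_phase(seq, "ATG")
print(phase_counts)
print(sum(phase_counts.values()))  # overall count, all phases pooled
```

The pooled count hides the fact that all occurrences fall on one phase, which is the kind of signal only a phased model such as Mm_3 can take into account.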
Order of the model
The order of the Markov model defines the sequence composition that one wants to take into account. For instance, M0 is exactly fitted to the counts of letters, M1 is exactly fitted to the counts of 2-words, and so on. Similarly, M0_3 is exactly fitted to the counts of letters on phase 1, on phase 2 and on phase 3; M1_3 is exactly fitted to the counts of 2-words on phase 1, on phase 2 and on phase 3, and so on. In general, a model of order m is exactly fitted to the counts of (m+1)-words, so a model of order h-1 would reproduce the counts of h-words exactly and no h-word could be exceptional. Therefore, when we are interested in identifying exceptional h-words, the largest Markov chain model to consider, called the maximal model, is of order h-2.
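The effect of the order can be sketched numerically. The code below uses the usual plug-in estimate of the expected count of an h-word under a model of order m, built from the observed counts of its (m+1)-subwords and m-subwords. This estimator, the function names `count` and `expected_count`, and the toy sequence are assumptions made for illustration; they are not taken from this document and do not describe how the software computes its statistics.

```python
def count(seq, word):
    """Number of overlapping occurrences of `word` in `seq`."""
    if not word:
        # Convention for this sketch: the empty word occurs at every position.
        return len(seq)
    h = len(word)
    return sum(seq[i:i + h] == word for i in range(len(seq) - h + 1))


def expected_count(seq, word, m):
    """Plug-in estimate of the expected count of `word` under order m.

    Numerator: product of the counts of the (m+1)-subwords of `word`.
    Denominator: product of the counts of its inner m-subwords.
    """
    h = len(word)
    if not 0 <= m <= h - 1:
        raise ValueError("order m must satisfy 0 <= m <= h-1")
    numerator = 1.0
    for i in range(h - m):
        numerator *= count(seq, word[i:i + m + 1])
    denominator = 1.0
    for i in range(1, h - m):
        denominator *= count(seq, word[i:i + m])
    return numerator / denominator if denominator else 0.0


seq = "ACGTACGGACGT"
word = "ACG"  # a 3-word, so h = 3 and the maximal model is M1
print("observed:", count(seq, word))
for m in range(len(word)):
    print("order", m, "expected:", expected_count(seq, word, m))
```

With m = h-1 the estimate reduces to the observed count of the word itself, so observed and expected counts can never differ; h-2 is the highest order at which a comparison is still informative.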
Finding words with unexpected frequencies in DNA sequences. 11.9.98