MODELS FOR DNA SEQUENCES


We use Markov chain models of order m ≥ 0, denoted by Mm. The order m of the model means that the probability that a letter (A, C, G or T) occurs at a given position in the sequence depends only on the m preceding letters.
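The definition of the order can be written as a formula. The notation below is an assumption made for illustration and does not come from the original slides: $X_i$ denotes the letter at position $i$, and $x_i$ a particular value in {A, C, G, T}.

```latex
% Order-m Markov property: only the m preceding letters matter.
\[
  P(X_i = x_i \mid X_1 = x_1, \dots, X_{i-1} = x_{i-1})
  = P(X_i = x_i \mid X_{i-m} = x_{i-m}, \dots, X_{i-1} = x_{i-1}),
  \qquad i > m .
\]
```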

For instance, in a first-order Markov chain (model M1), the probability of observing an A at position i depends on the letter at position i - 1; we then have the probabilities of observing an A after an A, an A after a G, an A after a C and an A after a T, denoted respectively $\pi(A,A)$, $\pi(G,A)$, $\pi(C,A)$ and $\pi(T,A)$, and known as transition probabilities.
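As an illustration, the transition probabilities of an M1 model could be estimated from an observed sequence by counting adjacent letter pairs. This is only a minimal sketch: the function name `estimate_m1_transitions`, the counting estimator and the example sequence are assumptions made here, not something specified in the text.

```python
from collections import Counter

ALPHABET = "ACGT"


def estimate_m1_transitions(seq):
    """Estimate M1 transition probabilities by counting adjacent pairs.

    Returns a dict keyed by (previous_letter, current_letter).
    Hypothetical helper: the source does not prescribe an estimator.
    """
    # Count every pair (letter at i - 1, letter at i).
    pair_counts = Counter(zip(seq, seq[1:]))
    transitions = {}
    for prev in ALPHABET:
        # Number of times `prev` is followed by any letter.
        total = sum(pair_counts[(prev, cur)] for cur in ALPHABET)
        for cur in ALPHABET:
            # Rows never observed are set to 0.0 rather than guessed.
            transitions[(prev, cur)] = (
                pair_counts[(prev, cur)] / total if total else 0.0
            )
    return transitions


if __name__ == "__main__":
    probs = estimate_m1_transitions("ACGTACGGTA")  # made-up sequence
    # Probability of observing an A after a T:
    print(probs[("T", "A")])
```

The key order (previous letter, current letter) follows the convention of the text, where the first argument is the preceding letter.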

M0 is the simplest model, in which the letters of the sequence are independent of one another.
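Under M0, independence means that the probability of a whole sequence is simply the product of the probabilities of its letters. The sketch below computes this on a log scale to avoid numerical underflow on long sequences; the function name `m0_log_probability` and the uniform letter probabilities are illustrative assumptions.

```python
import math


def m0_log_probability(seq, letter_probs):
    """Log-probability of `seq` under M0 (independent letters).

    `letter_probs` maps each letter to its probability.
    Hypothetical helper, named here for illustration only.
    """
    # Independence: log P(seq) is the sum of the per-letter log-probabilities.
    return sum(math.log(letter_probs[letter]) for letter in seq)


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed values
    print(m0_log_probability("ACGT", uniform))
```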

To take into account the phase (the position within a codon) in coding DNA sequences, we may use Markov chains with 3-periodic transition probabilities, meaning that we distinguish the probability of observing an A (for instance) on phase 1 from that of an A on phase 2 and that of an A on phase 3. This class of models is denoted by Mm_3.

For instance, in the model M1_3, the probability of observing an A at position i depends both on the phase associated with the i-th letter of the sequence and on the letter at position i - 1.

We then have the probabilities of observing an A on phase 1 after an A ($\pi_1(A,A)$), an A on phase 1 after a G ($\pi_1(G,A)$), an A on phase 1 after a C ($\pi_1(C,A)$), and so on up to an A on phase 3 after a T ($\pi_3(T,A)$).
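The phased model can be sketched in the same way as M1, with one set of transition probabilities per phase. This is a minimal illustration under stated assumptions: the function name `estimate_m1_3_transitions`, the `first_phase` parameter (the phase of the first letter, taken to be 1 by default) and the counting estimator are all hypothetical and not taken from the text.

```python
from collections import Counter

ALPHABET = "ACGT"


def estimate_m1_3_transitions(seq, first_phase=1):
    """Estimate 3-periodic M1 transition probabilities (model M1_3).

    Returns a dict keyed by (phase, previous_letter, current_letter),
    where `phase` (1, 2 or 3) is the phase of the current letter.
    `first_phase` is the assumed phase of seq[0].
    """
    counts = Counter()
    for i in range(1, len(seq)):
        # Phase of the letter at index i, cycling 1, 2, 3, 1, 2, 3, ...
        phase = (first_phase - 1 + i) % 3 + 1
        counts[(phase, seq[i - 1], seq[i])] += 1

    transitions = {}
    for phase in (1, 2, 3):
        for prev in ALPHABET:
            total = sum(counts[(phase, prev, cur)] for cur in ALPHABET)
            for cur in ALPHABET:
                # Unobserved (phase, prev) contexts are set to 0.0.
                transitions[(phase, prev, cur)] = (
                    counts[(phase, prev, cur)] / total if total else 0.0
                )
    return transitions


if __name__ == "__main__":
    probs = estimate_m1_3_transitions("ATGATGATG")  # made-up coding sequence
    # Probability of observing an A on phase 1 after a G:
    print(probs[(1, "G", "A")])
```

Each phase has its own 4 x 4 table, so M1_3 carries three times as many transition probabilities as M1.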

Finding words with unexpected frequencies in DNA sequences. 11.9.98 Page: 3 of 21