SUMMARY

Statistical methods have been developed to detect whether a word occurs with an unexpected frequency in a DNA sequence.

DNA sequences are modelled by m-order Markov chains with either homogeneous or 3-periodic transition probabilities. The latter class of models is of interest for coding DNA sequences, where the transition probabilities may depend on the position within the codon.
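As an illustration of the two model classes, the sketch below estimates m-order transition probabilities by simple counting. It is a minimal example and not the R'MES code: the function name `transition_probs` and its parameters `m` and `period` are hypothetical, and the phase is assumed to be the position of the emitted letter modulo 3. With `period=1` a single homogeneous table is estimated; with `period=3` one table is estimated per phase.

```python
from collections import Counter, defaultdict

def transition_probs(seq, m=1, period=1):
    """Estimate m-order Markov transition probabilities by counting.

    period=1 gives a homogeneous chain; period=3 gives one table per
    phase (assumed here: index of the emitted letter modulo 3).
    Returns {phase: {context: {letter: probability}}}.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for i in range(m, len(seq)):
        context, letter = seq[i - m:i], seq[i]
        counts[i % period][context][letter] += 1

    probs = {}
    for phase, by_context in counts.items():
        probs[phase] = {}
        for context, letters in by_context.items():
            total = sum(letters.values())
            # Relative frequency of each letter following this context.
            probs[phase][context] = {a: n / total for a, n in letters.items()}
    return probs
```

For example, `transition_probs("AACA", m=1)` returns a single phase `0` in which the context `"A"` is followed by `"A"` and `"C"` with probability 0.5 each.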

For rare words, the statistical analysis is based on a compound Poisson approximation of the counts. Otherwise, the analysis is based on an asymptotically Gaussian z-score.
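The z-score idea can be sketched for the simplest case, the homogeneous model of maximal order (order equal to the word length minus 2). This is an illustrative sketch under stated assumptions, not the R'MES implementation: the function names are hypothetical, the expected count is the usual plug-in estimate built from the counts of the word's prefix, suffix and core, and the variance is a commonly quoted approximation for this maximal-order case. The compound Poisson approximation used for rare words is not covered here.

```python
import math

def count_overlapping(seq, word):
    """Count occurrences of word in seq, overlapping ones included."""
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq.startswith(word, i))

def zscore_max_order(seq, word):
    """Approximate z-score of a word (length >= 3) under the
    homogeneous Markov model of order len(word) - 2.

    Returns None when the score is undefined (zero core count or
    non-positive estimated variance).
    """
    n_word = count_overlapping(seq, word)
    n_prefix = count_overlapping(seq, word[:-1])   # w1 .. w(l-1)
    n_suffix = count_overlapping(seq, word[1:])    # w2 .. wl
    n_core = count_overlapping(seq, word[1:-1])    # w2 .. w(l-1)
    if n_core == 0:
        return None
    # Plug-in estimate of the expected count.
    expected = n_prefix * n_suffix / n_core
    # Assumed approximation of the variance for the maximal-order model.
    variance = (expected * (n_core - n_prefix) * (n_core - n_suffix)
                / n_core ** 2)
    if variance <= 0:
        return None
    return (n_word - expected) / math.sqrt(variance)
```

A large positive score suggests an over-represented word and a large negative score an under-represented one; since the Gaussian approximation is asymptotic, scores computed on short sequences or for rare words should be read with caution.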

R'MES is a set of programs that detect exceptional words in DNA sequences. It provides the word statistics and supports a detailed analysis of the DNA sequence vocabulary through dedicated graphical displays.

Finding words with unexpected frequencies in DNA sequences. 11.9.98 Page: 18 of 21