SUMMARY

Statistical methods have been developed to detect whether a word occurs with an unexpected frequency in DNA sequences.
DNA sequences are modelled by m-order Markov chains with either homogeneous or 3-periodic transition probabilities. The latter class of models is of interest for coding DNA sequences.
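
As an illustration of the two model classes, the following minimal sketch estimates the transition probabilities of an m-order Markov chain from a sequence by maximum likelihood (relative frequencies). The function name `estimate_transitions` and its parameters are hypothetical and are not taken from R'MES; the 3-periodic case is assumed here to mean that the transition probabilities depend on the position of the emitted letter modulo 3.

```python
from collections import Counter, defaultdict

def estimate_transitions(seq, m=1, periodic=False):
    """Relative-frequency estimates of m-order Markov transition probabilities.

    Returns {phase: {context: {letter: probability}}}. With periodic=False
    there is a single phase 0 (homogeneous model). With periodic=True the
    phase is the index of the emitted letter modulo 3 (assumed convention).
    """
    counts = defaultdict(Counter)
    for i in range(m, len(seq)):
        phase = i % 3 if periodic else 0
        context = seq[i - m:i]          # the m letters preceding position i
        counts[(phase, context)][seq[i]] += 1

    probs = defaultdict(dict)
    for (phase, context), letters in counts.items():
        total = sum(letters.values())
        probs[phase][context] = {a: n / total for a, n in letters.items()}
    return dict(probs)

homogeneous = estimate_transitions("ACGACGACG", m=1)
three_periodic = estimate_transitions("ACGACGACG", m=1, periodic=True)
```

In the 3-periodic case each phase has its own transition table, so three times as many parameters are estimated from the same sequence.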
For rare words, the statistical analysis is based on a compound Poisson approximation of the counts. Otherwise, the analysis is based on an asymptotically Gaussian z-score.
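
To make the idea of comparing an observed count with its expectation concrete, here is a minimal sketch under simplifying assumptions. It computes the plug-in expected count of a word under a first-order Markov model (m = 1) and a crude standardized score that divides by the square root of the expected count. This is not the asymptotic z-score itself, whose variance also accounts for parameter estimation and word overlaps, and it is not the compound Poisson approximation; the function names are hypothetical.

```python
from math import sqrt

def count_overlapping(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

def expected_count_m1(seq, word):
    """Plug-in expected count of word under a first-order Markov model.

    Estimator: N(w1w2) * N(w2w3) * ... * N(w_{l-1}w_l) / (N(w2) * ... * N(w_{l-1})),
    where N(.) is the observed count in seq. Intended for words of length >= 3.
    """
    numerator = 1.0
    for j in range(len(word) - 1):
        numerator *= count_overlapping(seq, word[j:j + 2])
    denominator = 1.0
    for j in range(1, len(word) - 1):
        denominator *= count_overlapping(seq, word[j])
    return numerator / denominator if denominator > 0 else 0.0

def crude_score(seq, word):
    """(observed - expected) / sqrt(expected): a simplified stand-in, not the z-score."""
    expected = expected_count_m1(seq, word)
    if expected == 0:
        raise ValueError("expected count is zero; score undefined")
    return (count_overlapping(seq, word) - expected) / sqrt(expected)

score = crude_score("ACGACGACG", "ACG")
```

A large positive score suggests an over-represented word and a large negative score an under-represented one, but with this simplified normalization the value should be treated as a rough ranking aid rather than as a calibrated statistic.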
R'MES is a set of programs that detect exceptional words in DNA sequences. It provides the word statistics and allows a detailed analysis of the DNA sequence vocabulary by means of suitable graphical displays.
Finding words with unexpected frequencies in DNA sequences. 11.9.98