EXCEPTIONALITY OF A WORD AND ITS SUBWORDS

The exceptionality of a word under a given model may be related to the exceptionality of some of its subwords under the same model (see the comparison of exceptional words under two models); in this case, it results from a contamination phenomenon. In other cases, it corresponds to an additional constraint on the sequence.

The pyramidal display is therefore very convenient for studying the exceptionality of a word and of its subwords simultaneously.

A word W of length h can be represented by a pyramid of h-2 stages; each stage corresponds to a word length. The top stage consists of a single square coloured according to the statistic of W. The stage beneath is made of two squares corresponding to the 2 subwords of length h-1 of W, and so on down to the bottom stage, made of h-2 squares associated with the subwords of length 3.
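The layout of the stages can be sketched in a few lines of Python. This is only an illustration of the description above, not the code used to draw the pyramids; the function name pyramid_stages and the example word are hypothetical, and colouring by the statistic is left out.

```python
def pyramid_stages(word, min_len=3):
    """Return the stages of the pyramid for `word`, from top to bottom.

    Each stage is the list of subwords of one length, in order of position.
    The top stage holds the word itself; the bottom stage holds the
    subwords of length `min_len` (3 in the description above).
    """
    h = len(word)
    if h < min_len:
        raise ValueError("word must have at least %d letters" % min_len)
    stages = []
    for k in range(h, min_len - 1, -1):
        # A word of length h has h - k + 1 subwords of length k.
        stages.append([word[i:i + k] for i in range(h - k + 1)])
    return stages


if __name__ == "__main__":
    # Hypothetical 6-word, chosen only to show the layout.
    for stage in pyramid_stages("gctggt"):
        print(" ".join(stage))
```

For a 6-word this gives 4 stages containing 1, 2, 3 and 4 subwords, in agreement with the h-2 stages described above.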

A pyramid can be built either under a single model for all stages (M1, for instance) or under the maximal model for each stage (M1 for 3-words, M2 for 4-words, ...). Models with phase can also be used.
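The two choices of model can be written down as a small sketch. It assumes, following the examples above (M1 for 3-words, M2 for 4-words), that the maximal model for words of length k is the Markov model of order k-2; the function name model_orders is hypothetical, and the sketch does not check whether a fixed order is admissible for the shortest words.

```python
def model_orders(h, fixed_order=None, min_len=3):
    """Map each stage (word length) of a pyramid to a Markov model order.

    With fixed_order=None, each stage uses its maximal model, taken here
    to be order k - 2 for words of length k (assumption based on the
    examples M1 for 3-words and M2 for 4-words).
    Otherwise every stage uses the same order, e.g. fixed_order=1 for M1.
    """
    orders = {}
    for k in range(h, min_len - 1, -1):
        orders[k] = k - 2 if fixed_order is None else fixed_order
    return orders


if __name__ == "__main__":
    print(model_orders(6))                 # maximal model at each stage
    print(model_orders(6, fixed_order=1))  # single model M1 for all stages
```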

For instance, let us look at the pyramids of some of the most under-represented 6-words (the first, the fifth and the sixth) in a sequence of E. coli (111,402 bases) under maximal models without phase.