Christoforos

Phone: 0030 210 654 1951

Motif discovery: methods and software


28 May 2004

Problem
Identify co-regulated genes on the basis of their promoter sequences.

Task
Embed known motifs in background sequences. Vary the noise level of the motifs (e.g., include motif instances that deviate from the original motif by a pre-specified Hamming distance). Try to distinguish these sequences from pure background sequences.
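The embedding step could be sketched as follows. This is a minimal illustration, not a prescribed procedure: the uniform i.i.d. background, the example motif TGACTCA, and the helper names random_background, mutate, plant and hamming are all assumptions made for the example.

```python
import random

ALPHABET = "ACGT"


def random_background(length, rng):
    """Draw a background sequence with i.i.d. uniform bases (assumed noise model)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mutate(motif, distance, rng):
    """Return a copy of `motif` that differs from it in exactly `distance` positions."""
    chars = list(motif)
    for pos in rng.sample(range(len(motif)), distance):
        # Choose a replacement base that is guaranteed to differ from the original.
        chars[pos] = rng.choice([c for c in ALPHABET if c != chars[pos]])
    return "".join(chars)


def plant(motif, length, distance, rng):
    """Embed one noisy motif instance at a random offset; return (sequence, offset)."""
    background = random_background(length, rng)
    start = rng.randrange(length - len(motif) + 1)
    instance = mutate(motif, distance, rng)
    sequence = background[:start] + instance + background[start + len(motif):]
    return sequence, start


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
motif = "TGACTCA"  # hypothetical example motif
positive, offset = plant(motif, length=50, distance=2, rng=rng)
negative = random_background(50, rng)  # pure background sequence
```

Raising `distance` makes the planted instances noisier, so the positive and negative sets become harder to separate.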

Seeding
Local greedy search is susceptible to getting stuck in suboptimal local optima. Therefore, explore whether seeding gives a better initialization. To extract binding motifs, use MEME or Friedman's hypergeometric approach.
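One way to obtain seeds is to rank short subsequences by how strongly they are enriched in the co-regulated set. The sketch below uses a plain hypergeometric tail probability for this. It is a generic enrichment test written for illustration and is not claimed to reproduce the exact procedure of the hypergeometric approach mentioned above; the function names and the toy sequences are assumptions.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def rank_seeds(positives, negatives, width):
    """Score every k-mer occurring in the positive set; smallest p-value first."""
    all_seqs = positives + negatives
    kmers = {s[i:i + width] for s in positives for i in range(len(s) - width + 1)}
    scored = []
    for kmer in kmers:
        K = sum(kmer in s for s in all_seqs)   # sequences containing the k-mer
        k = sum(kmer in s for s in positives)  # of which are in the positive set
        scored.append((hypergeom_tail(len(all_seqs), K, len(positives), k), kmer))
    return sorted(scored)


# Toy data (hypothetical): ACGT occurs in every positive and in no negative.
positives = ["AAACGT", "CCACGT", "ACGTGG"]
negatives = ["TTTTTT", "GGGGGG", "CCCCCC"]
best_pvalue, best_seed = rank_seeds(positives, negatives, width=4)[0]
```

The top-ranked k-mers can then serve as starting points for the local search instead of random initializations.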

Data
Here is a website with transcription factor binding sites. It was used in the paper "Modeling Dependencies in Protein-DNA Binding Sites" by Barash, Elidan, Friedman and Kaplan.


21 May 2004

Here is the paper by Segal, Yelensky and Koller:
Genome-wide discovery of transcriptional modules from DNA sequence and gene expression.

Start encoding the model bottom-up, starting from the sequences (or top-down, if you refer to Figure 2 of the paper).

Try to learn motifs from synthetic sequences. Generate sequences as follows:

Training is supervised: for the training set, you know the class membership. You test the generalization performance on an independent test set.

Investigate how the performance depends on the following settings:

Now assume that the actual motifs were generated from a more complex dependency model, say a first-order Markov chain. How does this affect the prediction accuracy of the model?
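To make the question concrete, motif instances with first-order dependencies can be sampled as sketched below. This is an illustrative assumption about what such a generator might look like: the function name, the table layout (one transition table per position after the first), and the toy probabilities are all invented for the example.

```python
import random

ALPHABET = "ACGT"


def sample_markov_motif(initial, transitions, rng):
    """Sample one motif instance from a position-specific first-order Markov chain.

    initial:     dict base -> probability for the first position
    transitions: list with one entry per later position; each entry maps the
                 previous base to a dict base -> probability
    """
    bases = [rng.choices(ALPHABET, weights=[initial[b] for b in ALPHABET])[0]]
    for table in transitions:
        row = table[bases[-1]]  # distribution conditioned on the previous base
        bases.append(rng.choices(ALPHABET, weights=[row[b] for b in ALPHABET])[0])
    return "".join(bases)


# Toy two-position motif (hypothetical): A is always followed by C, T by G.
uniform = {b: 0.25 for b in ALPHABET}
initial = {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5}
transitions = [{
    "A": {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    "C": uniform,
    "G": uniform,
    "T": {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
}]

rng = random.Random(0)
samples = [sample_markov_motif(initial, transitions, rng) for _ in range(200)]
```

In this toy case the second position is C or G with probability 0.5 each, so a model that treats positions independently would also assign probability to AG and TC, which the generator never emits. That mismatch is the kind of effect the prediction accuracy can be checked against.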

Minutes


Journals and conferences

