[BMI776] HW #2 ok

Colin Dewey cdewey at biostat.wisc.edu
Thu Mar 8 23:59:10 CST 2007


Hi all,

I checked both the EM and Gibbs for finding the motif in HW #2 and I  
had success with both methods.  For EM, you should definitely use the  
substring starting point method (lecture 5, slide 35).  I confirmed  
that if you try all substrings of width 14 in the first sequence as  
starting point initializers, you *will* find the motif.  In my simple  
Python implementation, it takes under 30 minutes to try all of the
substrings in the first sequence, so this is definitely doable.
Therefore, the due date for HW #2 will remain the same.
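For anyone unsure how the substring starting points work, here is a minimal sketch in Python. It assumes a MEME-style scheme in which each profile column gives the substring's own base a fixed probability and splits the rest evenly among the other three bases. The value match_prob=0.5 and the function names are hypothetical, so check lecture 5, slide 35 for the exact scheme used in class.

```python
BASES = "ACGT"

def profile_from_substring(substring, match_prob=0.5):
    """Build an initial motif profile (one dict per column) from a substring.

    Assumption (hypothetical value): each column gives match_prob to the
    substring's base and splits the remaining mass evenly over the other
    three bases, so every column sums to 1.
    """
    other = (1.0 - match_prob) / 3
    return [{b: (match_prob if b == c else other) for b in BASES}
            for c in substring]

def starting_points(first_seq, width=14):
    """Yield one initial profile for every width-`width` substring of first_seq."""
    for start in range(len(first_seq) - width + 1):
        yield profile_from_substring(first_seq[start:start + width])
```

Each profile yielded by starting_points() would seed a separate EM run; you would then keep the run whose final log likelihood is highest.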

One key point to remember is that you need to be calculating the  
likelihood after every iteration in both EM and Gibbs.  With Gibbs,  
you will output the motif positions (and profile) that gave the  
maximum likelihood.  For EM, choose the final profile from the run  
that gave the maximum likelihood and predict the most likely motif  
positions using this profile.  In both EM and Gibbs, you will need to  
use the *log* likelihood, because the likelihood will be too small  
for a floating point number (but you don't need to use log  
probabilities for any other part of the algorithm).
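Here is a rough Python sketch of the log likelihood calculations. It assumes the OOPS model (exactly one motif occurrence per sequence, start position uniform), a profile stored as a list of per-column dicts, and a 0th-order background dict. For Gibbs, it assumes the likelihood is evaluated with the motif positions fixed at the current samples. All function names are hypothetical. To be safe with long sequences, each per-position term is computed as a sum of logs, and the terms are combined with log-sum-exp; that is a conservative choice, not a requirement of the assignment.

```python
import math

def log_seq_given_start(seq, start, profile, background):
    """log P(seq | motif starts at `start`): profile columns inside the
    motif window, background probabilities everywhere else."""
    width = len(profile)
    total = 0.0
    for k, base in enumerate(seq):
        if start <= k < start + width:
            total += math.log(profile[k - start][base])
        else:
            total += math.log(background[base])
    return total

def log_sum_exp(values):
    """Compute log(sum(exp(v) for v in values)) without underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def em_log_likelihood(seqs, profile, background):
    """OOPS log likelihood: sum over sequences of the log of the
    start-position-marginalized likelihood, with a uniform prior on starts."""
    total = 0.0
    for seq in seqs:
        n_starts = len(seq) - len(profile) + 1
        terms = [log_seq_given_start(seq, j, profile, background)
                 for j in range(n_starts)]
        total += log_sum_exp(terms) - math.log(n_starts)
    return total

def gibbs_log_likelihood(seqs, starts, profile, background):
    """Log likelihood of the sequences with motif starts fixed at `starts`."""
    return sum(log_seq_given_start(seq, s, profile, background)
               for seq, s in zip(seqs, starts))

def most_likely_starts(seqs, profile, background):
    """For each sequence, the start position with the highest likelihood."""
    return [max(range(len(seq) - len(profile) + 1),
                key=lambda j: log_seq_given_start(seq, j, profile, background))
            for seq in seqs]
```

To see why the log matters, note that 0.25 ** 1000 already underflows to 0.0 as a Python float, while 1000 * math.log(0.25) is an ordinary number. For EM, you would call most_likely_starts() with the final profile from the best run.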

Happy motif finding!

Colin


