[Bmi776] answers to various questions

Mark Craven craven at biostat.wisc.edu
Thu Feb 16 23:01:48 CST 2006


1) I updated the Genscan lecture notes with a slide that perhaps better 
explains how an MDD model is used to determine the probability of a given
sequence.  It's the first slide entitled "Explaining a Sequence with an
MDD Tree".  I've tried to illustrate the partial PWMs that are associated
with each node in the tree.  The subsequent slide describes the algorithm
for using these PWMs.  I can go over this in class if it's still not clear.

2) Someone asked how we calculate the "d" values that are used for
determining lambdas in GLIMMER.  Here's how it's done:
 - A chi-square test is a standard statistical test for comparing two
   empirical distributions of discrete-valued data.  The output of the
   test is a number called the chi-square statistic.
 - From this number, and knowledge of how many values the data can take
   (i.e. 4 values for DNA), we can look up a p-value.  This is the
   probability that we would see counts differing at least as much as the
   ones we observed even if both sets of data came from the same
   underlying distribution.
 - We set d = 1 - p.  We can think of this as our confidence that the
   data is coming from different distributions.
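
The three steps above can be sketched in a few lines of Python.  This is
only an illustration, not GLIMMER's actual code: the function names are
made up, and it assumes we are comparing two vectors of base counts
(A, C, G, T) with every base seen at least once.  With two count vectors
over 4 values the test has (2-1)*(4-1) = 3 degrees of freedom, and for
that case the p-value has a closed form, so no table lookup is needed.

```python
import math

def chi_square_statistic(counts_a, counts_b):
    """Chi-square statistic for two vectors of counts over the same values."""
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must have the same length")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each count vector needs at least one observation")
    total = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            # Assumption of this sketch: every value occurs at least once.
            raise ValueError("a value with zero total count is not handled")
        # Expected counts if both vectors came from one shared distribution.
        expected_a = total_a * column / total
        expected_b = total_b * column / total
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat

def p_value_df3(stat):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(stat / 2))
            + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))

def confidence_d(counts_a, counts_b):
    """d = 1 - p for two vectors of DNA base counts (exactly 4 values each)."""
    if len(counts_a) != 4 or len(counts_b) != 4:
        raise ValueError("this sketch only handles 4 values (3 degrees of freedom)")
    return 1.0 - p_value_df3(chi_square_statistic(counts_a, counts_b))
```

For example, confidence_d([90, 5, 3, 2], [5, 90, 2, 3]) is very close
to 1, while two identical count vectors give d = 0.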

Mark


