[BMI776] Clarification of AMAP weights and heap ordering
Colin Dewey
cdewey at biostat.wisc.edu
Mon Mar 19 13:15:28 CDT 2007
Hi all,
I just wanted to clarify briefly the issue of column pair weights and
the heap ordering in AMAP. The algorithm for finding the column pair
with highest weight is as follows:
1. Pop the top column pair (p_cand) off of the heap.
2. Recalculate the weight of p_cand, given the other columns with
which its member columns may have merged
3. If this recalculated weight of p_cand is greater than or equal to
the (non-recalculated) weight of the top column pair of the heap
(p_next), then this is a maximally weighted pair. Otherwise, it may
not be, so reinsert p_cand into the heap with its recalculated weight
and repeat starting at step #1.
The correctness of 3 follows from the fact that if weights can only
decrease with merges, the (non-recalculated) weight of p_next is
greater than the weights of all other pairs below it in the heap.
The AMAP paper does not discuss the worst-case runtime of this
algorithm (i.e., how many times might we have to reinsert p_cand
before finding a maximally-weighted pair?). This would be
interesting to explore.
Hope that helps,
Colin
More information about the BMI776
mailing list