The Median Maintenance Problem: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 14: Line 14:
Every time a new element arrives, it will be inserted in H<sub>low</sub>.  
Every time a new element arrives, it will be inserted in H<sub>low</sub>.  


While processing an odd element, H<sub>low</sub> contains (k + 1)/2 elements and H<sub>high</sub> contains (k-1)/2 elements. The median, which according to the definition, is the (k + 1)/2 element in the sequence of sorted k elements, is at the top of the H<sub>low</sub> heap and can be retrieved with the MAXIMUM() operation.  
While processing an element with an odd index, H<sub>low</sub> contains (k + 1)/2 elements and H<sub>high</sub> contains (k-1)/2 elements. The median, which according to the definition, is the (k + 1)/2 element in the sequence of sorted k elements, is at the top of the H<sub>low</sub> heap and can be retrieved with the MAXIMUM() operation.  


While processing an even element, H<sub>low</sub> contains immediately after insertion a larger number of elements than it should, so both heaps should be rebalanced by extracting the maximum element from H<sub>low</sub> with REMOVE-MAX() and inserting it in H<sub>high</sub> with INSERT(). After the sequence of operations is complete, H<sub>low</sub> and H<sub>high</sub> will both contain the same number of elements k/2. The current median will still be at the top of H<sub>low</sub> and can be retrieved with the MAXIMUM() operation.  
While processing an element with an even index, H<sub>low</sub> contains immediately after insertion a larger number of elements than it should, so both heaps should be rebalanced by extracting the maximum element from H<sub>low</sub> with REMOVE-MAX() and inserting it in H<sub>high</sub> with INSERT(). After the sequence of operations is complete, H<sub>low</sub> and H<sub>high</sub> will both contain the same number of elements k/2. The current median will still be at the top of H<sub>low</sub> and can be retrieved with the MAXIMUM() operation.  
 
<font size=-1>
while there is an x element:
  if x has an odd index
    H<sub>low</sub>.INSERT(x)
    record running median as H<sub>low</sub>.MAXIMUM()
  else <font color=teal># x as an even index</font>
    H<sub>low</sub>.INSERT(x)
    H<sub>high</sub>.INSERT(H<sub>low</sub>.REMOVE-MAX()) <font color=teal># rebalance the heaps</font>
    record running median as H<sub>low</sub>.MAXIMUM()
</font>


More details: {{External|https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications}}
More details: {{External|https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications}}


=Playground Implementation=
=Playground Implementation=

Revision as of 23:13, 13 October 2021

Internal

Problem

Give a sequence of numbers x1, x2, .... xn, report at each step i the median number among the numbers seen so far. For a given k, the and assuming that x1, x2, .... xk are sorted in an ascending order, the median will be on (k + 1)/2 position if k is odd and on k/2 position is k is even.

Discussion

The problem can be resolved by repeatedly solving the selection problem on the set of numbers seen so far, but the selection problem has a running time of O(n), so repeating the selection algorithm for each number will have a running time of O(n2). The median maintenance problem is a canonical use case for a heap.

Solution

Use two heaps: a max heap Hlow that supports the REMOVE-MAX operation with O(1) running time and a min heap Hhigh that supports the REMOVE-MIN operation, also with O(1) running time.

The idea is to maintain the first smallest half elements in Hlow and the largest half of the elements in Hhigh.

Every time a new element arrives, it will be inserted in Hlow.

While processing an element with an odd index, Hlow contains (k + 1)/2 elements and Hhigh contains (k-1)/2 elements. The median, which according to the definition, is the (k + 1)/2 element in the sequence of sorted k elements, is at the top of the Hlow heap and can be retrieved with the MAXIMUM() operation.

While processing an element with an even index, Hlow contains immediately after insertion a larger number of elements than it should, so both heaps should be rebalanced by extracting the maximum element from Hlow with REMOVE-MAX() and inserting it in Hhigh with INSERT(). After the sequence of operations is complete, Hlow and Hhigh will both contain the same number of elements k/2. The current median will still be at the top of Hlow and can be retrieved with the MAXIMUM() operation.

while there is an x element:
  if x has an odd index
    Hlow.INSERT(x)
    record running median as Hlow.MAXIMUM()
  else # x as an even index
    Hlow.INSERT(x)
    Hhigh.INSERT(Hlow.REMOVE-MAX()) # rebalance the heaps
    record running median as Hlow.MAXIMUM()

More details:

https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications

Playground Implementation