The Median Maintenance Problem: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 8: Line 8:


=Solution=
=Solution=
Use two heaps: a [[Heap#Min_Heap_and_Max_Heap|max heap]] H<sub>low</sub> that supports the <tt>[[Heap#REMOVE-MAX|REMOVE-MAX]]</tt> operation and a [[Heap#Min_Heap_and_Max_Heap|min heap]] H<sub>high</sub> that supports the <tt>[[Heap#REMOVE-MIN|REMOVE-MIN]]</tt> operation.  
Use two heaps: a [[Heap#Min_Heap_and_Max_Heap|max heap]] H<sub>low</sub> that supports the <tt>[[Heap#REMOVE-MAX|REMOVE-MAX]]</tt> operation with O(1) running time and a [[Heap#Min_Heap_and_Max_Heap|min heap]] H<sub>high</sub> that supports the <tt>[[Heap#REMOVE-MIN|REMOVE-MIN]]</tt> operation, also with O(1) running time.  


Maintain the first smallest half elements in H<sub>low</sub> and the largest half of the elements in H<sub>high</sub>. When a new element comes, if its smaller than the current median insert it in H<sub>low</sub>, if it is higher than the current median, insert it in H<sub>high</sub>. Keep track of heap sizes and rebalance when necessary, so the total number of elements is evenly split among the heaps.
The idea is to maintain the first smallest half elements in H<sub>low</sub> and the largest half of the elements in H<sub>high</sub>.
 
Every time a new element arrives, it will be inserted in H<sub>low</sub>.
 
While processing an odd element, H<sub>low</sub> contains (k + 1)/2 elements and H<sub>high</sub> contains (k-1)/2 elements. The median, which according to the definition, is the (k + 1)/2<sup>th</sup> element, is at the top of the H<sub>low</sub> heap and can be retrieved with the MAXIMUM() operation.
 
While processing an even element, H<sub>low</sub> contains immediately after insertion a larger number of elements than it should, so both heaps should be rebalanced by extracting the maximum element from H<sub>low</sub> with REMOVE-MAX() and inserting it in H<sub>high</sub> with INSERT(). After the sequence of operations is complete, H<sub>low</sub> will contain (k + 1)/2 elements and H<sub>high</sub> (k-1)/2 elements - the same number of elements. The current median will still be at the top of H<sub>low</sub> and can be retrieved with the MAXIMUM() operation.  


More details: {{External|https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications}}
More details: {{External|https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications}}
=Playground Implementation=
=Playground Implementation=

Revision as of 23:05, 13 October 2021

Internal

Problem

Give a sequence of numbers x1, x2, .... xn, report at each step i the median number among the numbers seen so far. For a given k, the and assuming that x1, x2, .... xk are sorted in an ascending order, the median will be on (k + 1)/2 position if k is odd and on k/2 position is k is even.

Discussion

The problem can be resolved by repeatedly solving the selection problem on the set of numbers seen so far, but the selection problem has a running time of O(n), so repeating the selection algorithm for each number will have a running time of O(n2). The median maintenance problem is a canonical use case for a heap.

Solution

Use two heaps: a max heap Hlow that supports the REMOVE-MAX operation with O(1) running time and a min heap Hhigh that supports the REMOVE-MIN operation, also with O(1) running time.

The idea is to maintain the first smallest half elements in Hlow and the largest half of the elements in Hhigh.

Every time a new element arrives, it will be inserted in Hlow.

While processing an odd element, Hlow contains (k + 1)/2 elements and Hhigh contains (k-1)/2 elements. The median, which according to the definition, is the (k + 1)/2th element, is at the top of the Hlow heap and can be retrieved with the MAXIMUM() operation.

While processing an even element, Hlow contains immediately after insertion a larger number of elements than it should, so both heaps should be rebalanced by extracting the maximum element from Hlow with REMOVE-MAX() and inserting it in Hhigh with INSERT(). After the sequence of operations is complete, Hlow will contain (k + 1)/2 elements and Hhigh (k-1)/2 elements - the same number of elements. The current median will still be at the top of Hlow and can be retrieved with the MAXIMUM() operation.

More details:

https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications

Playground Implementation