The Median Maintenance Problem: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
 
(10 intermediate revisions by the same user not shown)
Line 12: Line 12:
The idea is to maintain the first smallest half elements in H<sub>low</sub> and the largest half of the elements in H<sub>high</sub>.
The idea is to maintain the first smallest half elements in H<sub>low</sub> and the largest half of the elements in H<sub>high</sub>.


Every time a new element arrives, it will be automatically inserted in H<sub>low</sub>.  
This is implemented by appropriately inserting each new element in the corresponding heap, and then rebalancing the heaps so H<sub>low</sub> and H<sub>high</sub> have the same number of elements, or H<sub>low</sub> has with at most 1 element more than H<sub>high</sub>. Rebalancing means extracting the top of the heap from the oversized heap and inserting the element in the other heap.


While processing an element with an odd index, H<sub>low</sub> contains (k + 1)/2 elements and H<sub>high</sub> contains (k-1)/2 elements. The median, which according to the definition, is the (k + 1)/2 element in the sequence of sorted k elements, is at the top of the H<sub>low</sub> heap and can be retrieved with the MAXIMUM() operation.  
Maintaining this load for both heaps makes possible to read the current median, at any moment, as the top of H<sub>low</sub>.


While processing an element with an even index, H<sub>low</sub> contains immediately after insertion a larger number of elements than it should, so both heaps should be rebalanced by extracting the maximum element from H<sub>low</sub> with REMOVE-MAX() and inserting it in H<sub>high</sub> with INSERT(). After the sequence of operations is complete, H<sub>low</sub> and H<sub>high</sub> will both contain the same number of elements k/2. The current median will still be at the top of H<sub>low</sub> and can be retrieved with the MAXIMUM() operation.
<font size=-1>
while each x:
  if H<sub>low</sub> is empty || x ≤ H<sub>low</sub>.MAXIMUM()
    H<sub>low</sub>.INSERT(x)
  else
    H<sub>high</sub>.INSERT(x)
  REBALANCE(H<sub>low</sub>, H<sub>high</sub>)
  record running median as H<sub>low</sub>.MAXIMUM()
</font>


<font size=-1>
<font size=-1>
  while there is an x element:
  REBALANCE(H<sub>low</sub>, H<sub>high</sub>) {
   H<sub>low</sub>.INSERT(x)
  <font color=teal># Because we're rebalancing at each step and we are maintaining the halves either equal,
   if x has an even odd index
  # or H<sub>low</sub> larger than H<sub>high</sub> with one element the difference in size can only be 2, 1, 0 or -1</font>
     <font color=teal># rebalance the heaps</font>
   difference = H<sub>low</sub>.SIZE() - H<sub>high</sub>.SIZE()
   if difference == 2
     <font color=teal># Shift H<sub>low</sub> maximum element to H<sub>high</sub></font>
     H<sub>high</sub>.INSERT(H<sub>low</sub>.REMOVE-MAX())
     H<sub>high</sub>.INSERT(H<sub>low</sub>.REMOVE-MAX())
   record running median as H<sub>low</sub>.MAXIMUM()
   else if difference == 1 || difference == 0
    <font color=teal># Balanced, nothing to do</font>
  else if difference == -1
    <font color=teal># Shift H<sub>high</sub> minimum element to H<sub>low</sub></font>
    H<sub>low</sub>.INSERT(H<sub>high</sub>.REMOVE-MIN())
  else
    <font color=teal># We're not supposed to be in this situation</font>
    ERROR()
}
</font>
</font>
More details: {{External|https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications}}
More details: {{External|https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications}}


=Playground Implementation=
=Playground Implementation=
{{External|https://github.com/ovidiuf/playground/blob/master/learning/stanford-algorithms-specialization/06-median-maintenance/src/main/java/playground/stanford/mm/MedianMaintenance.java}}

Latest revision as of 01:30, 14 October 2021

Internal

Problem

Give a sequence of numbers x1, x2, .... xn, report at each step i the median number among the numbers seen so far. For a given k, the and assuming that x1, x2, .... xk are sorted in an ascending order, the median will be on (k + 1)/2 position if k is odd and on k/2 position is k is even.

Discussion

The problem can be resolved by repeatedly solving the selection problem on the set of numbers seen so far, but the selection problem has a running time of O(n), so repeating the selection algorithm for each number will have a running time of O(n2). The median maintenance problem is a canonical use case for a heap.

Solution

Use two heaps: a max heap Hlow that supports the REMOVE-MAX operation with O(1) running time and a min heap Hhigh that supports the REMOVE-MIN operation, also with O(1) running time.

The idea is to maintain the first smallest half elements in Hlow and the largest half of the elements in Hhigh.

This is implemented by appropriately inserting each new element in the corresponding heap, and then rebalancing the heaps so Hlow and Hhigh have the same number of elements, or Hlow has with at most 1 element more than Hhigh. Rebalancing means extracting the top of the heap from the oversized heap and inserting the element in the other heap.

Maintaining this load for both heaps makes possible to read the current median, at any moment, as the top of Hlow.

while each x:
  if Hlow is empty || x ≤ Hlow.MAXIMUM()
    Hlow.INSERT(x)
  else
    Hhigh.INSERT(x)
  REBALANCE(Hlow, Hhigh)
  record running median as Hlow.MAXIMUM()

REBALANCE(Hlow, Hhigh) {
  # Because we're rebalancing at each step and we are maintaining the halves either equal, 
  # or Hlow larger than Hhigh with one element the difference in size can only be 2, 1, 0 or -1
  difference = Hlow.SIZE() - Hhigh.SIZE()
  if difference == 2 
    # Shift Hlow maximum element to Hhigh
    Hhigh.INSERT(Hlow.REMOVE-MAX())
  else if difference == 1 || difference == 0
    # Balanced, nothing to do
  else if difference == -1
    # Shift Hhigh minimum element to Hlow
    Hlow.INSERT(Hhigh.REMOVE-MIN())
  else
    # We're not supposed to be in this situation
    ERROR()
}

More details:

https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/iIzo8/heaps-operations-and-applications

Playground Implementation

https://github.com/ovidiuf/playground/blob/master/learning/stanford-algorithms-specialization/06-median-maintenance/src/main/java/playground/stanford/mm/MedianMaintenance.java