Heap: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 38: Line 38:


Also see: {{Internal|Data_Structures#INSERT|Data Structures &#124; <tt>INSERT</tt>}}
Also see: {{Internal|Data_Structures#INSERT|Data Structures &#124; <tt>INSERT</tt>}}
===<tt>INSERT</tt> Implementation===
When a new key is presented, place it on the first available position at the end of the array. We need to do this to maintain the tree as balanced as possible.
If the new key is larger or equal to its parent, that's all.
If the new key is smaller than its parent, leaving it on the current position would violate the [[#The_Heap_Property|heap property]], so we need to work to restore the invariant. We do this by '''bubbling it up'''  through the array: repeatedly comparing the key with its parent, and swapping their positions, until we find a parent that is smaller than the key. It the worst case, we need to advance all the way to the root of the tree and swap that. Since the tree is balanced, in the worst case we need to perform log n steps.


==<tt>REMOVE-MIN</tt>==
==<tt>REMOVE-MIN</tt>==

Revision as of 20:12, 12 October 2021

External

Internal

Overview

This article is about binary heaps. A binary heap data structure is an array where data is placed to form a complete binary tree, plus the index of the last node in the heap. Each element of the array contains a pointer to the actual tree nodes. Each node contains at least a key. The heap works by comparing key and placing the pointer of the associated nodes in the right position in the array. Duplicate key values are supported. For more details on representation in memory, see the Representation in Memory section below.

The heap data structure maintains an invariant, which is called the heap property.

Another name used for heap is priority queue.

More details CLRS page 151, page 1177.

The Min Heap Property


For every node x, key[x] ≤ all keys of its children.

In case of a max heap, the invariant inequality is reversed.

Canonical Uses of a Heap

The primary reason to use a heap is if you notice that your program does repeated minimum (maximum) computations, usually via exhaustive search. The heap naturally maintains the minimum (maximum) in reach of an O(1) lookup, while insertions and deletions are O(log n).

Also see:

Supported Operations

INSERT

Insert a node in the tree. The running time is O(log n) where n is the number of nodes. For implementation, see INSERT Implementation below.

Also see:

Data Structures | INSERT

INSERT Implementation

When a new key is presented, place it on the first available position at the end of the array. We need to do this to maintain the tree as balanced as possible.

If the new key is larger or equal to its parent, that's all.

If the new key is smaller than its parent, leaving it on the current position would violate the heap property, so we need to work to restore the invariant. We do this by bubbling it up through the array: repeatedly comparing the key with its parent, and swapping their positions, until we find a parent that is smaller than the key. It the worst case, we need to advance all the way to the root of the tree and swap that. Since the tree is balanced, in the worst case we need to perform log n steps.

REMOVE-MIN

Remove the node from the heap with minimum key value. The running time is O(log n) where n is the number of nodes. For implementation see REMOVE-MIN Implementation below.

Also see:

Data Structures | DELETE

Min Heap and Max Heap

A heap is constructed in such a way that it maintains a minimum value or a maximum value, but not both. A heap that maintains the minimum value is called a min heap and supports the REMOVE-MIN operation. A heap that maintains the maximum value is called a max heap and supports the REMOVE-MAX operation. If both extracting minimum and maximum are required, use a binary search tree instead.

HEAPIFY

Initialize the heap in linear time O(n). Using INSERT would take O(n log n), but HEAPIFY does it in O(n) when invoked on a batch of n elements.

DELETE

Delete an arbitrary node in the middle of the heap in O(log n) time. This is useful when implementing Dijkstra's Algorithm

Also see:

Data Structures | DELETE

Representation in Memory

A heap has two conceptual representations: one is a rooted binary tree, as complete as possible, and the other one is an array.

Heap Representation.png

For each node i, the left children is maintained at 2i+1 and the right children is maintained at 2i + 2.

Conversely, for a node i, the parent is maintained at ⌊(i - 1)/2⌋.

The advantage of the heap array representation is that rather than traversing pointers, it is very fast to just go from a node to its parent or to either one of its children by doing index arithmetic with bit shifting. Also, we don't need the overhead of storing references.

The fact that the binary tree is as complete as possible (the nodes are added layer by layer, and left to right) is essential for the heap array representation.

Implementation

REMOVE-MIN Implementation

Remove the root and return it, but not before fixing the tree. The tree is fixed as follows:

  • take the latest node added as leaf and promoted it to root.
  • this may break the heap property - if the new key is larger than any of its children.
    • If this happens does, swap the key with the smallest child and repeat the procedure by bubbling the key down through the tree until there is no more heap property violation.