Selection Problem

From NovaOrdis Knowledge Base

Internal

Overview

The ith order statistic of a set of n numbers is the ith smallest number in the set. Finding the ith order statistic of a set of n distinct numbers (the numbers are assumed distinct for simplicity) is known as the selection problem.

Order statistics are counted here from 0: the minimum of the set is the 0th order statistic and the maximum is the (n - 1)th order statistic. Neither requires the set to be sorted. Finding the median is a particular case of the selection problem, where i = ⌊(n - 1)/2⌋ (the lower median when n is even).

The selection problem is exposed as the SELECT() operation of totally ordered sets.

The selection problem can be solved generically by reducing it to sorting: sort the entire set, then pick the element at position i. However, comparison-based sorting cannot run faster than Ω(n log n), and faster, more specialized algorithms exist for selection.
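For comparison, here is a minimal Python sketch of the reduction to sorting. It assumes 0-based order statistics, and the function name select_by_sorting is hypothetical. The built-in sorted() is a comparison sort, so this approach costs O(n log n) time and O(n) extra space no matter which order statistic is requested.

```python
def select_by_sorting(values, i):
    """Return the ith order statistic (0-based) by sorting a copy: O(n log n)."""
    if not 0 <= i < len(values):
        raise IndexError("order statistic out of range")
    # sorted() returns a new sorted list; the input is left untouched.
    return sorted(values)[i]
```

For example, `select_by_sorting([7, 2, 9, 4, 1], 2)` returns 4.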

Fundamentally, selection is an easier problem than sorting, so if only a certain order statistic is required for an unsorted set, we do not need to sort the set.

The general selection problem can be solved with a randomized divide-and-conquer algorithm with an expected running time of Θ(n). The algorithm is similar to randomized Quicksort, except that after partitioning it recurses into only one side. There is also a linear-time selection algorithm that does not use randomization. It chooses the pivot deterministically and very carefully, using a method called "median of medians".

TODO CLRS page 213.

Randomized Selection

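RSelect works in place, has small hidden constants, and runs in O(n) expected time. The expectation is over the random pivot choices and holds for every input. Its worst case is Θ(n²), which happens when the pivots are consistently bad (for example, always the minimum), but this is very unlikely. The pseudocode is: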

RSelect(array A, length n, order statistic i) {  # i is 0-based: i = 0 selects the minimum
  if n == 1 return A[0]
  choose pivot p from A, uniformly at random
  j = partition(A, p) # partition places p at A[j], so p is the jth order statistic of A
  if i == j return p
  if j > i return RSelect(first part of A, i.e. A[0..j-1], length j, i)
  if j < i return RSelect(second part of A, i.e. A[j+1..n-1], length n - j - 1, i - j - 1)
}
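A runnable Python sketch of RSelect follows. It assumes 0-based order statistics and a Partition subroutine that swaps the pivot to the front and sweeps left to right, which is one common variant. The helper names partition and rselect are hypothetical. The recursive call is a tail call, so the sketch writes it as a loop that narrows the absolute bounds lo..hi of the current subarray. Because the bounds are absolute, i never needs to be re-based when the search moves into the second part.

```python
import random


def partition(a, lo, hi, pivot_index):
    """Partition a[lo..hi] around a[pivot_index] in place; return the pivot's final index k.

    Afterwards a[lo..k-1] < pivot, a[k] == pivot, and a[k+1..hi] >= pivot.
    """
    a[lo], a[pivot_index] = a[pivot_index], a[lo]  # move the pivot to the front
    pivot = a[lo]
    boundary = lo + 1  # a[lo+1..boundary-1] holds the elements seen so far that are < pivot
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            a[j], a[boundary] = a[boundary], a[j]
            boundary += 1
    a[lo], a[boundary - 1] = a[boundary - 1], a[lo]  # drop the pivot into its sorted slot
    return boundary - 1


def rselect(a, i, rng=random):
    """Return the ith order statistic (0-based) of list a, rearranging a in place."""
    if not 0 <= i < len(a):
        raise IndexError("order statistic out of range")
    lo, hi = 0, len(a) - 1  # invariant: lo <= i <= hi
    while True:
        if lo == hi:                 # base case: one element left, and it must be at index i
            return a[lo]
        p = rng.randint(lo, hi)      # pivot index chosen uniformly at random (inclusive bounds)
        j = partition(a, lo, hi, p)  # the pivot now sits at its sorted position j
        if i == j:
            return a[j]
        if j > i:
            hi = j - 1               # continue in the first part
        else:
            lo = j + 1               # continue in the second part
```

For example, `rselect([7, 2, 9, 4, 1], 2)` returns 4. The partition sends elements equal to the pivot to the right, so duplicates are handled as well, even though the text assumes distinct numbers.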

Also see:

The Partition Subroutine

Randomized Selection Analysis

https://www.coursera.org/learn/algorithms-divide-conquer/lecture/obhKq/randomized-selection-analysis
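The following is a sketch of the standard phase-based argument for RSelect's O(n) expected running time. It is a reconstruction, not a transcript of the linked lecture. Assume RSelect does at most cn work outside its recursive call on a subarray of length n. Say RSelect is in phase j when the current subarray length lies between (3/4)^{j+1} n and (3/4)^j n, and let X_j be the number of recursive calls made during phase j. Suppose a pivot falls in the middle 50% of the current subarray, which happens with probability about 1/2. Then both parts have at most 3/4 of its length, so that call ends the phase. Hence X_j is dominated by a geometric random variable with success probability 1/2.

```latex
\begin{align*}
T(n) &\le \sum_{j \ge 0} X_j \cdot c \left(\tfrac{3}{4}\right)^{j} n
  && \text{(each call in phase } j \text{ does at most } c\left(\tfrac{3}{4}\right)^{j} n \text{ work)} \\
\mathbb{E}[X_j] &\le 2
  && \text{(expected number of trials until a ``good'' pivot)} \\
\mathbb{E}[T(n)] &\le 2cn \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j} = 2cn \cdot 4 = 8cn = O(n)
  && \text{(linearity of expectation, geometric series)}
\end{align*}
```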

Deterministic Selection

https://www.coursera.org/learn/algorithms-divide-conquer/lecture/vtehr/deterministic-selection-algorithm-advanced-optional
https://www.coursera.org/learn/algorithms-divide-conquer/lecture/2wmHr/deterministic-selection-analysis-i-advanced-optional
https://www.coursera.org/learn/algorithms-divide-conquer/lecture/vOjvG/deterministic-selection-analysis-ii-advanced-optional

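DSelect is a deterministic O(n) selection algorithm that chooses its pivot very carefully, using a method called "median of medians". Its worst-case guarantee has a cost. DSelect has larger constant factors than RSelect and does not work in place, so RSelect is usually faster in practice.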
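A compact Python sketch of median-of-medians selection follows. It assumes 0-based order statistics and groups of 5, which is the usual choice. The name dselect is hypothetical. For clarity, this version builds new lists instead of partitioning in place. The pivot is guaranteed to be larger than roughly 30% of the elements and smaller than roughly 30%. This gives a recurrence of the form T(n) ≤ cn + T(n/5) + T(7n/10), which is O(n) because 1/5 + 7/10 < 1.

```python
def dselect(values, i):
    """Return the ith order statistic (0-based) of values in worst-case O(n) time."""
    if not 0 <= i < len(values):
        raise IndexError("order statistic out of range")
    if len(values) <= 5:
        return sorted(values)[i]  # small base case: sort directly
    # 1. Split into groups of 5 and take each group's median (sorting 5 items is O(1)).
    groups = [sorted(values[k:k + 5]) for k in range(0, len(values), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # 2. Recursively select the median of the medians and use it as the pivot.
    pivot = dselect(medians, (len(medians) - 1) // 2)
    # 3. Three-way partition around the pivot (also copes with duplicate values).
    less = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    greater = [x for x in values if x > pivot]
    # 4. Recurse only into the part that contains the ith order statistic.
    if i < len(less):
        return dselect(less, i)
    if i < len(less) + len(equal):
        return pivot
    return dselect(greater, i - len(less) - len(equal))
```

For example, `dselect([7, 2, 9, 4, 1], 2)` returns 4. Both less and greater exclude the pivot, so every recursive call works on a strictly smaller list and the recursion terminates.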

Selection in Binary Search Trees

SELECT Implementation for Binary Search Trees