Selection Problem
Overview
The ith order statistic of a set of n numbers is the ith smallest number in the set.
Finding the ith order statistic of a set of n distinct numbers (distinctness is assumed for simplicity) is known as the selection problem. Finding the median is a particular case of the selection problem. The selection problem can be solved generically by reducing it to sorting: sort the entire set, then pick the element at position i. However, comparison-based sorting requires Ω(n log n) comparisons in the worst case, and faster, specialized algorithms exist for selection. Fundamentally, selection is an easier problem than sorting.
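As a baseline, the sort-then-pick reduction can be sketched in a few lines of Python. This is a minimal sketch, assuming the set is given as a Python list and i is a 1-based rank as in the definition above; the name select_by_sorting is hypothetical.

```python
def select_by_sorting(a: list[int], i: int) -> int:
    """Return the ith order statistic (1-based) of a by sorting a copy of it."""
    if not 1 <= i <= len(a):
        raise ValueError("i must be in 1..len(a)")
    # sorted() is a comparison sort, so this baseline costs O(n log n) time.
    return sorted(a)[i - 1]
```

For example, select_by_sorting([7, 2, 9, 4], 2) returns 4. The algorithms below avoid paying for a full sort.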
The general selection problem can be solved with a randomized divide-and-conquer algorithm whose expected running time is Θ(n). The algorithm closely resembles randomized Quicksort, except that after partitioning it recurses into only one side. There is also a linear-time algorithm for selection that does not use randomization; the idea is to choose the pivot deterministically and very carefully, using a method called "median of medians".
TODO CLRS page 213.
Randomized Selection
RSelect has small constants, works in place, and runs in O(n) expected time (its worst case is Θ(n²), reached only when pivots are repeatedly chosen badly). The pseudocode is:
RSelect(array A, length n, order statistic i) {
    if n == 1
        return A[0]                      # the only element
    choose pivot p from A, uniformly at random
    j = partition(A, p)                  # p is the jth order statistic of A (1-based)
    if j == i
        return p
    if j > i
        return RSelect(first part of A, length j - 1, i)
    if j < i
        return RSelect(second part of A, length n - j, i - j)
}
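A runnable Python version of the RSelect pseudocode might look like the following. It is a sketch rather than a reference implementation: the function names, the Lomuto-style partition, and writing the tail recursion as a loop over an index range [lo, hi] (so the array is never copied) are assumptions of this sketch. Ranks are 1-based at the interface and 0-based inside.

```python
import random


def partition(a: list[int], lo: int, hi: int, pivot_index: int) -> int:
    """Lomuto-style partition of a[lo..hi] around a[pivot_index].

    Returns the pivot's final index k; afterwards a[lo..k-1] < pivot
    and a[k+1..hi] >= pivot.
    """
    a[lo], a[pivot_index] = a[pivot_index], a[lo]    # move pivot to the front
    pivot = a[lo]
    boundary = lo + 1                                 # a[lo+1..boundary-1] < pivot
    for k in range(lo + 1, hi + 1):
        if a[k] < pivot:
            a[k], a[boundary] = a[boundary], a[k]
            boundary += 1
    a[lo], a[boundary - 1] = a[boundary - 1], a[lo]   # pivot to its final slot
    return boundary - 1


def rselect(a: list[int], i: int) -> int:
    """Return the ith smallest element (1-based) of a; reorders a in place."""
    if not 1 <= i <= len(a):
        raise ValueError("i must be in 1..len(a)")
    lo, hi = 0, len(a) - 1
    target = i - 1                                    # 0-based rank we want
    while True:                                       # loop replaces tail recursion
        if lo == hi:
            return a[lo]                              # one element left
        p = random.randint(lo, hi)                    # uniform random pivot
        j = partition(a, lo, hi, p)
        if j == target:
            return a[j]
        if j > target:
            hi = j - 1                                # keep only the left part
        else:
            lo = j + 1                                # keep only the right part
```

For example, rselect([7, 2, 9, 4, 1, 8], 3) returns 4. Because the list is reordered in place, pass a copy if the original order matters.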
Randomized Selection Analysis
Deterministic Selection
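DSelect is a deterministic selection algorithm that runs in O(n) time even in the worst case. It chooses its pivot with the "median of medians" method, which guarantees that each recursive call discards a constant fraction of the input.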
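The median-of-medians pivot rule can be sketched in Python as below. This sketch assumes groups of five (the classic choice) and builds new lists for a three-way partition, so unlike RSelect it is not in place; the name dselect and the 1-based rank argument are assumptions for illustration.

```python
def dselect(a: list[int], i: int) -> int:
    """Return the ith smallest element (1-based) of a, with a deterministic pivot."""
    if not 1 <= i <= len(a):
        raise ValueError("i must be in 1..len(a)")
    if len(a) <= 5:
        return sorted(a)[i - 1]            # base case: sort a tiny list directly
    # 1. Split into groups of 5 and take the median of each group.
    groups = [a[k:k + 5] for k in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # 2. Recursively find the median of those medians and use it as the pivot.
    pivot = dselect(medians, (len(medians) + 1) // 2)
    # 3. Three-way partition around the pivot (also copes with duplicates).
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    # 4. Recurse only into the side that contains the ith order statistic.
    if i <= len(less):
        return dselect(less, i)
    if i <= len(less) + len(equal):
        return pivot
    return dselect(greater, i - len(less) - len(equal))
```

For example, dselect(list(range(20, 0, -1)), 5) returns 5. Since the pivot is always an element of the input, each recursive call works on a strictly smaller list.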