# Selection Problem

# Overview

The i^{th} **order statistic** of a set of n numbers is the i^{th} smallest number in the set. Finding the i^{th} order statistic of a set of n **distinct** numbers (distinctness is for simplicity) is known as the **selection problem**.

Finding the median is a particular case of the selection problem, where i is about n / 2. With zero-based indexing, finding the minimum is finding the 0^{th} order statistic and finding the maximum is finding the (n - 1)^{th} order statistic: these are the elements that would sit at positions 0 and n - 1 if the set were sorted.

The selection problem is exposed as the SELECT() operation of totally ordered sets.

The selection problem can be solved generically by reducing it to the sorting problem: sort the entire set, then pick the element at the desired position. However, comparison-based sorting requires Ω(n log n) comparisons in the worst case, and faster, more specialized algorithms exist for the selection problem.
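As a baseline, the reduction to sorting can be sketched in a few lines of Python. The function name `select_by_sorting` and the sample data are illustrative assumptions, and the sketch uses zero-based order statistics.

```python
def select_by_sorting(values, i):
    """Return the i-th order statistic (zero-based) by sorting a copy."""
    if not 0 <= i < len(values):
        raise IndexError("order statistic out of range")
    # sorted() costs O(n log n); the lookup afterwards is O(1).
    return sorted(values)[i]


data = [31, 4, 15, 9, 26]
minimum = select_by_sorting(data, 0)
median = select_by_sorting(data, len(data) // 2)
maximum = select_by_sorting(data, len(data) - 1)
```

The sort dominates the cost, which is the overhead the linear-time selection algorithms avoid.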

Fundamentally, selection is an easier problem than sorting, so if only a certain order statistic is required for an unsorted set, we **do not need to sort the set**.

The general selection problem can be solved with a randomized divide-and-conquer algorithm that has an expected running time of Θ(n). The algorithm is similar to randomized Quicksort, except that it recurses into only one side of the partition. There is also a linear-time algorithm for selection that does not use randomization; the idea is to choose the pivot deterministically and carefully, using a method called "median of medians".

TODO CLRS page 213.

# Randomized Selection

RSelect works in place, has small constant factors, and runs in O(n) time on average. The pseudocode is:

    RSelect(array A, length n, order statistic i) {
      # i is zero-based: i == 0 selects the minimum
      if n == 1 return A[0]
      choose pivot p from A, uniformly at random
      j = partition(A, p)   # j is the zero-based position where p lands, so p is the j^{th} order statistic
      if i == j return A[j]
      if j > i return RSelect(first part of A, length j, i)
      if j < i return RSelect(second part of A, length n - j - 1, i - j - 1)
    }
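The pseudocode above can be turned into a runnable Python sketch along the following lines. This is one possible rendering, not a reference implementation: it assumes distinct elements and zero-based order statistics, the names `partition` and `rselect` are illustrative, and the recursion is replaced by a loop over absolute indices `lo` and `hi`, so `i` never needs to be adjusted.

```python
import random


def partition(a, lo, hi, pivot_index):
    """Partition a[lo..hi] around a[pivot_index]; return the pivot's final index."""
    a[lo], a[pivot_index] = a[pivot_index], a[lo]
    pivot = a[lo]
    boundary = lo + 1  # a[lo+1 .. boundary-1] holds the elements smaller than the pivot
    for k in range(lo + 1, hi + 1):
        if a[k] < pivot:
            a[k], a[boundary] = a[boundary], a[k]
            boundary += 1
    # Move the pivot between the smaller and the larger elements.
    a[lo], a[boundary - 1] = a[boundary - 1], a[lo]
    return boundary - 1


def rselect(a, i):
    """Return the i-th order statistic (zero-based) of a; reorders a in place."""
    if not 0 <= i < len(a):
        raise IndexError("order statistic out of range")
    lo, hi = 0, len(a) - 1
    while lo < hi:
        # random.randint is inclusive on both ends.
        j = partition(a, lo, hi, random.randint(lo, hi))
        if i == j:
            return a[j]
        if i < j:
            hi = j - 1  # continue in the first part
        else:
            lo = j + 1  # continue in the second part
    return a[lo]
```

For example, `rselect([31, 4, 15, 9, 26], 2)` returns the median of the five values. Pass a copy if the caller needs the original order preserved.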


## Randomized Selection Analysis

# Deterministic Selection

DSelect is a deterministic O(n) selection algorithm that chooses the pivot very carefully, using a method called the "median of medians".
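A minimal sketch of the median-of-medians idea in Python is shown below. It assumes distinct elements and zero-based order statistics, uses the common group size of five, and builds new lists instead of partitioning in place to keep the code short; the name `dselect` is illustrative.

```python
def dselect(values, i):
    """Return the i-th order statistic (zero-based) of distinct values."""
    if not 0 <= i < len(values):
        raise IndexError("order statistic out of range")
    if len(values) <= 5:
        return sorted(values)[i]

    # Median of each group of five (the last group may be smaller).
    groups = [values[k:k + 5] for k in range(0, len(values), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]

    # Recursively select the median of the medians and use it as the pivot.
    pivot = dselect(medians, len(medians) // 2)

    smaller = [v for v in values if v < pivot]
    larger = [v for v in values if v > pivot]

    j = len(smaller)  # the pivot is the j-th order statistic
    if i == j:
        return pivot
    if i < j:
        return dselect(smaller, i)
    return dselect(larger, i - j - 1)
```

The pivot chosen this way is guaranteed to leave a constant fraction of the elements on each side, which is what removes the dependence on randomness.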