Selection Problem: Difference between revisions
(5 intermediate revisions by the same user not shown) | |||
Line 8: | Line 8: | ||
=Overview= | =Overview= | ||
<span id='Order_Statistic'></span><span id='Order_Statistics'></span>The i<sup>th</sup> '''order statistic''' of a set of n numbers is the i<sup>th</sup> smallest number in the set. | <span id='Order_Statistic'></span><span id='Order_Statistics'></span>The i<sup>th</sup> '''order statistic''' of a set of n numbers is the i<sup>th</sup> smallest number in the set. Finding the i<sup>th</sup> order statistic of a set of n '''distinct''' numbers (distinctness is for simplicity) is known as the '''selection problem'''. | ||
Finding the median is a particular case of the selection problem, where i is n / 2. Finding the minimum in a sorted array is finding the 0 order statistics and finding the maximum in a sorted array is finding the (n - 1)<sup>th</sup> order statistic. | |||
Finding the median is a particular case of the selection problem, where i is n / 2. | |||
Finding the minimum in a sorted array is finding the 0 order statistics and finding the maximum in a sorted array is finding the (n - 1)<sup>th</sup> order statistic. | |||
The selection problem is exposed as the [[Data_Structures#SELECT|SELECT()]] operation of totally ordered sets. | The selection problem is exposed as the [[Data_Structures#SELECT|SELECT()]] operation of totally ordered sets. | ||
Line 32: | Line 28: | ||
if n == 1 return A[0] | if n == 1 return A[0] | ||
choose pivot p from A, uniformly at random | choose pivot p from A, uniformly at random | ||
j = partition(A, p) <font color=teal># j is the order statistic that p is</font> | j = partition(A, p) <font color=teal># j is the order statistic that p is, use the [[The_Partition_Subroutine|partition subroutine]] for that</font> | ||
if i == j return A[i] | if i == j return A[i] | ||
if j > i return RSelect(first part of A, length j - 1, i) | if j > i return RSelect(first part of A, length j - 1, i) | ||
Line 38: | Line 34: | ||
} | } | ||
</font> | </font> | ||
Also see: {{Internal|The_Partition_Subroutine#Overview|The Partition Subroutine}} | |||
==Randomized Selection Analysis== | ==Randomized Selection Analysis== | ||
Line 47: | Line 44: | ||
{{External|https://www.coursera.org/learn/algorithms-divide-conquer/lecture/vOjvG/deterministic-selection-analysis-ii-advanced-optional}} | {{External|https://www.coursera.org/learn/algorithms-divide-conquer/lecture/vOjvG/deterministic-selection-analysis-ii-advanced-optional}} | ||
DSelect is a deterministic O(n) selection algorithm that uses the pivot very carefully using a method called the "median of medians". | DSelect is a deterministic O(n) selection algorithm that uses the pivot very carefully using a method called the "median of medians". | ||
=Selection in Binary Search Trees= | |||
{{Internal|Binary_Search_Trees#SELECT_Implementation|SELECT Implementation for Binary Search Trees}} |
Latest revision as of 19:50, 13 October 2021
Internal
Overview
The ith order statistic of a set of n numbers is the ith smallest number in the set. Finding the ith order statistic of a set of n distinct numbers (distinctness is for simplicity) is known as the selection problem.
Finding the median is a particular case of the selection problem, where i is n / 2. Finding the minimum in a sorted array is finding the 0 order statistics and finding the maximum in a sorted array is finding the (n - 1)th order statistic.
The selection problem is exposed as the SELECT() operation of totally ordered sets.
The selection problem can be resolved generically by sorting the entire set and then selecting the desired element, by reducing the selection problem to the sorting problem. However, key comparison sorting cannot be done more efficiently than Ω(n log n), and more specialized and faster algorithms exist for the selection problem.
Fundamentally, selection is an easier problem than sorting, so if only a certain order statistic is required for an unsorted set, we do not need to sort the set.
The general selection problem can be resolved with a randomized divide-and-conquer algorithm with an expected running time of Θ(n). The algorithm is somewhat similar to the one used by randomized Quicksort. There is also a linear time algorithm for selection that does not use randomization; the idea is to use the pivot deterministically in a very careful way using a method called "median of medians".
TODO CLRS page 213.
Randomized Selection
RSelect has small constants and works in place, and it has a time complexity of O(n) on average. The pseudocode is:
RSelect(array A, length n, order statistics i) { if n == 1 return A[0] choose pivot p from A, uniformly at random j = partition(A, p) # j is the order statistic that p is, use the partition subroutine for that if i == j return A[i] if j > i return RSelect(first part of A, length j - 1, i) if j < i return RSelect(second part of A, n - j, i - j) }
Also see:
Randomized Selection Analysis
Deterministic Selection
DSelect is a deterministic O(n) selection algorithm that uses the pivot very carefully using a method called the "median of medians".