Inversions in an Array: Difference between revisions
(26 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
=External= | |||
* https://www.coursera.org/learn/algorithms-divide-conquer/lecture/IUiUk/o-n-log-n-algorithm-for-counting-inversions-ii | |||
=Internal= | =Internal= | ||
* [[ | * [[Recursive_Algorithms#Divide_and_Conquer|Divide and Conquer Algorithms]] | ||
=Problem= | =Problem= | ||
Given an array containing n numbers, in an arbitrary order, find all '''inversions''', where an inversion is defined as a pair (i, j) of array elements where i < j and A[i] > A[j]. Note that i and j need not be adjacent (in this case the inversion is called "out of order"). | Given an array containing n numbers, in an arbitrary order, find all '''inversions''', where an inversion is defined as a pair (i, j) of array elements where i < j and A[i] > A[j]. Note that i and j need not be adjacent (in this case the inversion is called "out of order"). | ||
Line 14: | Line 17: | ||
A better method is using [[Algorithms#Divide_and_Conquer|divide and conquer]]. | A better method is using [[Algorithms#Divide_and_Conquer|divide and conquer]]. | ||
= | First key idea behind the algorithm is to categorize the inversions in the array in three categories: | ||
* Left inversions: all inversions between the elements in the left half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j ≤ n/2. | |||
* Right inversions: all inversions between the elements of the right half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j > n/2. | |||
* Split inversions: all inversions where the first element is in the left half of the array and the second element is in the right half of the array: i ≤ n/2 < j. | |||
The second key idea is to piggyback on [[Merge_Sort|merge sort]] and also '''sort''' the sub-arrays while counting inversions. In the [[Algorithms#Combine|combine]] phase of the divide and conquer routine, we merge the already sorted sub-arrays in the same way we do it for merge sort, but we also we count inversions in an additional variable: when we find an element of the left array that is larger that the current element of the right array, we found inversions: all remaining elements of the left array, including the current one, are inversions and we increment the inversion counter. This "merge and count inversion" is a O(n). | |||
<font size='-1'> | |||
sortAndCountInversions(A, 0, n-1) { | |||
<font color=teal>// return the number of left inversions and sorted left subarray</font> | |||
int leftInversions, B = sortAndCountInversions(A, 0, n/2) | |||
<font color=teal>// return the number of right inversions and sorted right subarray</font> | |||
int rightInversions, C = sortAndCountInversions(A, n/2, n) | |||
int splitInversions = 0 | |||
<font color=teal>// scan B and C from left to right and place the element in order in D while counting inversions</font> | |||
int i = 0 <font color=teal>// current element in B</font> | |||
int j = 0 <font color=teal>// current element in C</font> | |||
int k = 0 <font color=teal>// current element in merged array D</font> | |||
D = new int[B.length + C.length] | |||
<font color=teal>// scan B and C merging into D and also counting inversions</font> | |||
while(k < D.length) { | |||
if (i < B.length && (j == C.length || B[i] <= C[j])) { | |||
D[k] = B[i] | |||
i ++; | |||
} | |||
else { | |||
<font color=teal>// all elements from i to the end of the B array are inversions</font> | |||
<font color=teal>// if we went over the edge, then B.length - i is 0 so the edge case is handled</font> | |||
splitInversions += (B.length - i) | |||
D[k] = C[j] | |||
j ++ | |||
} | |||
k ++ | |||
} | |||
return (leftInversions + rightInversions + splitInversions, D) | |||
} | |||
</font> | |||
=Complexity= | |||
= | This is a case where we apply [[Master_Method|Master Method]]. | ||
< | |||
The number of sub-problems (a) is 2. Each sub-problem processes half of the current problem's elements, so b = 2. The "combine" phase runs in linear time O(n<sup>1</sup>) so d is 1. | |||
a/b<sup>d</sup> = 2/2<sup>1</sup> = 1 so the algorithm complexity is O(n<sup>1</sup> log n) = O(n log n). | |||
</ | |||
=Playground= | |||
{{External|https://github.com/ovidiuf/playground/tree/master/learning/stanford-algorithms-specialization/02-array-inversions}} | |||
<br> | <br> | ||
<center>[[[Algorithms#b3slp3|Next]]]</center> | <center>[[[Algorithms#b3slp3|Next]]]</center> |
Latest revision as of 19:12, 9 November 2021
External
Internal
Problem
Given an array containing n numbers, in an arbitrary order, find all inversions, where an inversion is defined as a pair (i, j) of array elements where i < j and A[i] > A[j]. Note that i and j need not be adjacent (in this case the inversion is called "out of order").
This problem is interesting because it provides a "numerical similarity" measure that quantifies how close two ranked lists are to each other. If two friends rank the same ten movies to least favorite to most favorite. Computing the number of inversions between these two arrays gives a measure of "dissimilarity" between the preference in movies: more inversion, more dissimilar the preferences.
The result can be used in "collaborative filtering".
Algorithm
The brute force algorithm has an O(n2) complexity.
A better method is using divide and conquer.
First key idea behind the algorithm is to categorize the inversions in the array in three categories:
- Left inversions: all inversions between the elements in the left half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j ≤ n/2.
- Right inversions: all inversions between the elements of the right half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j > n/2.
- Split inversions: all inversions where the first element is in the left half of the array and the second element is in the right half of the array: i ≤ n/2 < j.
The second key idea is to piggyback on merge sort and also sort the sub-arrays while counting inversions. In the combine phase of the divide and conquer routine, we merge the already sorted sub-arrays in the same way we do it for merge sort, but we also we count inversions in an additional variable: when we find an element of the left array that is larger that the current element of the right array, we found inversions: all remaining elements of the left array, including the current one, are inversions and we increment the inversion counter. This "merge and count inversion" is a O(n).
sortAndCountInversions(A, 0, n-1) { // return the number of left inversions and sorted left subarray int leftInversions, B = sortAndCountInversions(A, 0, n/2) // return the number of right inversions and sorted right subarray int rightInversions, C = sortAndCountInversions(A, n/2, n) int splitInversions = 0 // scan B and C from left to right and place the element in order in D while counting inversions int i = 0 // current element in B int j = 0 // current element in C int k = 0 // current element in merged array D D = new int[B.length + C.length] // scan B and C merging into D and also counting inversions while(k < D.length) { if (i < B.length && (j == C.length || B[i] <= C[j])) { D[k] = B[i] i ++; } else { // all elements from i to the end of the B array are inversions // if we went over the edge, then B.length - i is 0 so the edge case is handled splitInversions += (B.length - i) D[k] = C[j] j ++ } k ++ } return (leftInversions + rightInversions + splitInversions, D) }
Complexity
This is a case where we apply Master Method.
The number of sub-problems (a) is 2. Each sub-problem processes half of the current problem's elements, so b = 2. The "combine" phase runs in linear time O(n1) so d is 1.
a/bd = 2/21 = 1 so the algorithm complexity is O(n1 log n) = O(n log n).
Playground