Inversions in an Array

From NovaOrdis Knowledge Base
Revision as of 18:32, 20 September 2021

External

Internal

Problem

Given an array A containing n numbers in arbitrary order, count the inversions, where an inversion is defined as a pair of indices (i, j) such that i < j and A[i] > A[j]. Note that i and j need not be adjacent: any two elements that are "out of order" relative to each other form an inversion.

This problem is interesting because it provides a "numerical similarity" measure that quantifies how close two ranked lists are to each other. For example, if two friends rank the same ten movies from least favorite to most favorite, computing the number of inversions between the two rankings gives a measure of "dissimilarity" between their preferences in movies: the more inversions, the more dissimilar the preferences.
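As a rough illustration of the ranking comparison, the sketch below rewrites one ranking in terms of positions in the other and counts the inversions of the resulting sequence. The function name `ranking_dissimilarity` and the movie titles are hypothetical, chosen only for this example, and the count uses a simple quadratic scan for brevity.

```python
def ranking_dissimilarity(ranking_a, ranking_b):
    """Count the pairs of items that the two rankings order differently.

    Assumes both rankings contain exactly the same items, without repeats.
    """
    # Position of each item in the second ranking.
    position_in_b = {item: pos for pos, item in enumerate(ranking_b)}
    # Express the first ranking as a sequence of positions in the second one.
    relabeled = [position_in_b[item] for item in ranking_a]
    n = len(relabeled)
    # Every inversion in the relabeled sequence is a pair ranked in opposite order.
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if relabeled[i] > relabeled[j]
    )


# Hypothetical rankings, from least favorite to most favorite.
alice = ["Alien", "Brazil", "Casablanca", "Dune"]
bob = ["Brazil", "Alien", "Casablanca", "Dune"]

print(ranking_dissimilarity(alice, alice))                  # 0: identical rankings
print(ranking_dissimilarity(alice, bob))                    # 1: one swapped pair
print(ranking_dissimilarity(alice, list(reversed(alice))))  # 6: every pair disagrees
```

Identical rankings give zero inversions, and fully reversed rankings give the maximum, n(n-1)/2.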

The result can be used in "collaborative filtering".

Algorithm

The brute force algorithm checks every pair of indices (i, j) with i < j and has O(n²) complexity.
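A minimal sketch of the brute force approach, useful mainly as a reference implementation for checking faster versions. The function name `count_inversions_brute_force` is made up for this example.

```python
def count_inversions_brute_force(a):
    """Count pairs (i, j) with i < j and a[i] > a[j] by testing every pair."""
    count = 0
    n = len(a)
    for i in range(n):
        # Only look at positions to the right of i, so that i < j always holds.
        for j in range(i + 1, n):
            if a[i] > a[j]:
                count += 1
    return count


print(count_inversions_brute_force([1, 3, 5, 2, 4, 6]))  # 3: (3,2), (5,2), (5,4)
```

The two nested loops examine n(n-1)/2 pairs, which is where the quadratic running time comes from.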

A better approach uses divide and conquer and runs in O(n log n).

Key idea #1 behind the algorithm is to split the inversions in the array into three categories:

  • Left inversions: all inversions between elements in the left half of the array. They are pairs of indices (i, j) that correspond to an inversion where i, j ≤ n/2.
  • Right inversions: all inversions between elements in the right half of the array. They are pairs of indices (i, j) that correspond to an inversion where i, j > n/2.
  • Split inversions: all inversions where the first element is in the left half of the array and the second element is in the right half of the array: i ≤ n/2 < j.
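To make the three categories concrete, the sketch below enumerates every inversion by brute force and tallies it as left, right, or split. It uses 0-based indices, so the left half is indices 0 to mid-1; the function name `classify_inversions` is hypothetical.

```python
def classify_inversions(a):
    """Return (left, right, split) inversion counts relative to the midpoint."""
    mid = len(a) // 2  # 0-based: indices 0..mid-1 form the left half
    left = right = split = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if a[i] > a[j]:
                if j < mid:
                    left += 1    # both indices fall in the left half
                elif i >= mid:
                    right += 1   # both indices fall in the right half
                else:
                    split += 1   # i in the left half, j in the right half
    return left, right, split


print(classify_inversions([4, 1, 6, 2, 5, 3]))  # (1, 1, 5)
```

Every inversion falls into exactly one category, so the three counts always add up to the total number of inversions (7 in this example).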

Key idea #2 is to piggyback on merge sort and also sort the sub-arrays while counting inversions. In the combine phase of the divide and conquer routine, we merge the already sorted sub-arrays in the same way as in merge sort, but we also count the split inversions in an additional variable: when the current element of the left array is larger than the current element of the right array, the current left element and every element after it in the left array form an inversion with that right element, so we increment the inversion counter by the number of elements remaining in the left array. This "merge and count inversions" step is O(n).
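The merge step described above could look roughly like the following. This is a sketch that assumes both inputs are already sorted; the name `merge_and_count_split` is made up for illustration.

```python
def merge_and_count_split(left, right):
    """Merge two sorted lists and count the split inversions between them.

    Returns (split_inversions, merged_sorted_list).
    """
    merged = []
    split = 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            # In order: no inversion involves left[i] and the rest of right.
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than left[i] and than everything after it
            # in left, so it forms one inversion with each remaining element.
            split += len(left) - i
            merged.append(right[j])
            j += 1
    # At most one of the two lists still has elements; they add no inversions.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return split, merged


print(merge_and_count_split([1, 3, 5], [2, 4, 6]))  # (3, [1, 2, 3, 4, 5, 6])
```

Each element is appended to the output exactly once, so the merge is linear in the combined length of the two inputs. Using `<=` rather than `<` keeps equal elements from being counted as inversions.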

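// initial call: sortAndCountInversions(A, 0, n-1); lo and hi are inclusive bounds
sortAndCountInversions(A, lo, hi) {
  // base case: a sub-array with at most one element has no inversions
  if (hi <= lo) return (0, A[lo..hi])
  int mid = lo + (hi - lo) / 2
  // return the number of left inversions and the sorted left sub-array
  int leftInversions, B = sortAndCountInversions(A, lo, mid)
  // return the number of right inversions and the sorted right sub-array
  int rightInversions, C = sortAndCountInversions(A, mid + 1, hi)
  // merge B and C into sorted D while counting split inversions, in O(n)
  // (mergeAndCountSplitInversions is a placeholder name for the merge step)
  int splitInversions, D = mergeAndCountSplitInversions(B, C)
  return (leftInversions + rightInversions + splitInversions, D)
}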
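For reference, a runnable Python sketch of the whole divide and conquer routine might look like this. It is not necessarily the implementation this page intends: it slices lists instead of passing index bounds, and the name `sort_and_count_inversions` is chosen only for this example.

```python
def sort_and_count_inversions(a):
    """Return (inversion_count, sorted_copy) for the list a."""
    # Base case: zero or one element means no inversions.
    if len(a) <= 1:
        return 0, list(a)

    mid = len(a) // 2
    left_inversions, b = sort_and_count_inversions(a[:mid])
    right_inversions, c = sort_and_count_inversions(a[mid:])

    # Combine: merge the sorted halves while counting split inversions.
    merged = []
    split_inversions = 0
    i = j = 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            merged.append(b[i])
            i += 1
        else:
            # c[j] is smaller than b[i] and every element after it in b.
            split_inversions += len(b) - i
            merged.append(c[j])
            j += 1
    merged.extend(b[i:])
    merged.extend(c[j:])

    return left_inversions + right_inversions + split_inversions, merged


count, ordered = sort_and_count_inversions([4, 1, 6, 2, 5, 3])
print(count)    # 7
print(ordered)  # [1, 2, 3, 4, 5, 6]
```

The input list is left unmodified; the function returns a sorted copy together with the count.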

Playground

TODO

  • problem statement
  • algorithm
  • complexity with Master Method

