Inversions in an Array

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

External

Internal

Problem

Given an array containing n numbers, in an arbitrary order, find all inversions, where an inversion is defined as a pair (i, j) of array elements where i < j and A[i] > A[j]. Note that i and j need not be adjacent (in this case the inversion is called "out of order").

This problem is interesting because it provides a "numerical similarity" measure that quantifies how close two ranked lists are to each other. If two friends rank the same ten movies to least favorite to most favorite. Computing the number of inversions between these two arrays gives a measure of "dissimilarity" between the preference in movies: more inversion, more dissimilar the preferences.

The result can be used in "collaborative filtering".

Algorithm

The brute force algorithm has an O(n2) complexity.

A better method is using divide and conquer.

First key idea behind the algorithm is to categorize the inversions in the array in three categories:

  • Left inversions: all inversions between the elements in the left half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j ≤ n/2.
  • Right inversions: all inversions between the elements of the right half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j > n/2.
  • Split inversions: all inversions where the first element is in the left half of the array and the second element is in the right half of the array: i ≤ n/2 < j.

The second key idea is to piggyback on merge sort and also sort the sub-arrays while counting inversions. In the combine phase of the divide and conquer routine, we merge the already sorted sub-arrays in the same way we do it for merge sort, but we also we count inversions in an additional variable: when we find an element of the left array that is larger that the current element of the right array, we found inversions: all remaining elements of the left array, including the current one, are inversions and we increment the inversion counter. This "merge and count inversion" is a O(n).

sortAndCountInversions(A, 0, n-1) {

  // return the number of left inversions and sorted left subarray
  int leftInversions, B = sortAndCountInversions(A, 0, n/2) 
  // return the number of right inversions and sorted right subarray
  int rightInversions, C = sortAndCountInversions(A, n/2, n)

  int splitInversions = 0
  // scan B and C from left to right and place the element in order in D while counting inversions
  int i = 0 // current element in B
  int j = 0 // current element in C
  int k = 0 // current element in merged array D
  D = new int[B.length + C.length]
  // scan B and C merging into D and also counting inversions
  while(k < D.length) {
    if (i < B.length && (j == C.length || B[i] <= C[j])) {
        D[k] = B[i]
        i ++;
    }
    else {
        // all elements from i to the end of the B array are inversions
        // if we went over the edge, then B.length - i is 0 so the edge case is handled
        splitInversions += (B.length - i)
        D[k] = C[j]
        j ++
    }
    k ++
  }

  return (leftInversions + rightInversions + splitInversions, D)
}

Complexity

This is a case where we apply Master Method.

The number of sub-problems (a) is 2. Each sub-problem processes half of the current problem's elements, so b = 2. The "combine" phase runs in linear time O(n1) so d is 1.

a/bd = 2/21 = 1 so the algorithm complexity is O(n1 log n) = O(n log n).

Playground

https://github.com/ovidiuf/playground/tree/master/learning/stanford-algorithms-specialization/02-array-inversions


[Next]