Inversions in an Array: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 18: Line 18:
* Right inversions: all inversions between the elements of the right half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j > n/2.
* Right inversions: all inversions between the elements of the right half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j > n/2.
* Split inversions: all inversions where the first element is in the left half of the array and the second element is in the right half of the array: i ≤ n/2 < j.
* Split inversions: all inversions where the first element is in the left half of the array and the second element is in the right half of the array: i ≤ n/2 < j.
The key idea #2 is to piggyback on [[Merge_Sort|merge sort]] and sort the sub-arrays, then to a linear scan over the both arrays and find inversions.
The key idea #2 is to piggyback on [[Merge_Sort|merge sort]] and also '''sort''' the sub-arrays while counting inversions. In the [[Algorithms#Combine|combine]] phase of the divide and conquer routine, we sort and we count the inversions across sorted sub-arrays in a linear scan.


=Playground=
=Playground=

Revision as of 18:12, 20 September 2021

Internal

Problem

Given an array containing n numbers, in an arbitrary order, find all inversions, where an inversion is defined as a pair (i, j) of array elements where i < j and A[i] > A[j]. Note that i and j need not be adjacent (in this case the inversion is called "out of order").

This problem is interesting because it provides a "numerical similarity" measure that quantifies how close two ranked lists are to each other. If two friends rank the same ten movies to least favorite to most favorite. Computing the number of inversions between these two arrays gives a measure of "dissimilarity" between the preference in movies: more inversion, more dissimilar the preferences.

The result can be used in "collaborative filtering".

Algorithm

The brute force algorithm has an O(n2) complexity.

A better method is using divide and conquer.

The key idea #1 behind the algorithm is to categorize the inversions in the array in three categories:

  • Left inversions: all inversions between the elements in the left half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j ≤ n/2.
  • Right inversions: all inversions between the elements of the right half of the array. They are pair of indices (i, j) that correspond to an inversion where i, j > n/2.
  • Split inversions: all inversions where the first element is in the left half of the array and the second element is in the right half of the array: i ≤ n/2 < j.

The key idea #2 is to piggyback on merge sort and also sort the sub-arrays while counting inversions. In the combine phase of the divide and conquer routine, we sort and we count the inversions across sorted sub-arrays in a linear scan.

Playground

TODO

  • problem statement
  • algorithm
  • complexity with Master Method


[Next]