Table of Contents
Chapter 5.3: Counting Inversions
Consider a problem that arises in the analysis of rankings. For example, many sites use a technique known as collaborative filtering, in which they try to match your preferences with those of other people out on the internet. The main issue in these types of applications is that of comparing to rankings. If you rank a set of n movies and then a collaborative filtering system consults its database to look for other people who had similar rankings, what is a good way to measure how similar two people’s rankings are? A natural way would be to label the movies from 1 to n according to your ranking and the stranger’s ranking and then see how many pairs are out of order. In other words, we’d count the number of inversions. Two indices i<j form an inversion if a_i>a_j.
The Algorithm
If we were to look at every pair of numbers (a_i,a_j) and determine whether they are inverted, it would take O(n^2) time. However, there is an algorithm that runs in O(nlogn) time. It will count the number of inversions between two halves of the information and also recursively sort the numbers in the two halves as well. We will use the Merge-and-Count and Sort-and-Count algorithms.
Merge-and-Count
Suppose we have recursively sorted the first and second halves of the list and counted the inversions in each. So we have two sorted lists, A and B. Our goal is to produce a single sorted list, C, which is a union of A and B, while also counting the number of inversions between elements in both lists. In other words, an inversion in this case is when a in A, b in B, and a>b.
Merge-and-Count(A,B):
- i=0,j=0
- numInv=0
- C=[]
- while both lists are not empty:
- append min(a_i,b_j) to C
- if min=b_j, count += 1, j += 1
- else: i += 1
- end while
- once one list is empty, append the remainder of the other list to C
- return count and C
runs in O(n) time
Sort-and-Count
Sort-and-Count(L)
- if L has one element then there are no inversions
- else:
- divide L into two halves: A contains the first ceil(n/2) elements, B contains the remaining floor(n/2) elements
- (r_A, A) = Sort-and-Count(A)
- (r_B, B) = Sort-and-Count(B)
- (r, L) = Merge-and-Count(A,B)
- return r=r_A+r_B+r and L
runs in O(nlogn) time
I would rate this section an 8. I understand the algorithm, but it's hard to truly see what's going on without running through an example, since everything (particularly Sort-and-Count) is recursive.