Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision |
| courses:cs211:winter2018:journals:melkersonr:chapter5 [2018/03/13 01:50] – [Section 5.3] melkersonr | courses:cs211:winter2018:journals:melkersonr:chapter5 [2018/03/13 01:52] (current) – [Section 5.3] melkersonr |
|---|
| |
| * **Summary & Motivations:** Section 5.3 is Counting Inversions. Enter the problem of comparing rankings. This problem arises in collaborative filtering, where an algorithm tries to match your preferences to someone else's preferences and make a recommendation, and in //meta-search tools//, where a query is run on multiple search engines and the results are synthesized (p 221-2). In the context of collaborative filtering, we can ask how many rankings of yours are out of order with with respect to another person's rankings. More concisely, we want to find a measure for how out of order a sequence of distinct numbers is. The metric will equal 0 only if a_1 < a_2 < ... < a_n for all n numbers. We can quantify the out-of-order-ness by counting the number of inversions in the ranking/list, where an inversion exists between indices i < j if a_i > a_j. | * **Summary & Motivations:** Section 5.3 is Counting Inversions. Enter the problem of comparing rankings. This problem arises in collaborative filtering, where an algorithm tries to match your preferences to someone else's preferences and make a recommendation, and in //meta-search tools//, where a query is run on multiple search engines and the results are synthesized (p 221-2). In the context of collaborative filtering, we can ask how many rankings of yours are out of order with with respect to another person's rankings. More concisely, we want to find a measure for how out of order a sequence of distinct numbers is. The metric will equal 0 only if a_1 < a_2 < ... < a_n for all n numbers. We can quantify the out-of-order-ness by counting the number of inversions in the ranking/list, where an inversion exists between indices i < j if a_i > a_j. |
| * **About the Algorithms:** The brute-force approach to this problem, determining if each and every pair is inverted, is O(n^2). We suspect we can do better, though. The algorithm uses the recursion relation of Mergesort by dividing the list into two, counting the number of inversions in each half, and counting the number of inversions between halves. We make the recursive step also sort the two halves so that the combination step is "easier" (p 223). We employ a Merge-and-Count algorithm, which assumes that we've already counted the inversions in each half of the list and sorted them. This is nearly merging two sorted lists into one, except that we're also counting inversions. That's easy enough. Merge-and-Count iterates through the two halves with pointers, appending the front element to the full sorted list of whichever half has a smaller front element. Each time we append to the full list, we ask if the smaller element comes from list B, and if so, we increase the inversion counter by the number of elements remaining in list A. If one of the lists becomes empty, we add the rest of the other list to the full list. We effectively count the number of inversions in constant time, which we can do because both lists are sorted. If the smaller element comes from list B, it is smaller than //everything// remaining in list A. We also know that we can simply append the remaining list when the other becomes empty because it is also already sorted and the list item added to the full list was smaller than its front item. The full algorithm is called Sort-and-Count, and it has a O(n log n) runtime. | * **About the Algorithms:** The brute-force approach to this problem, determining if each and every pair is inverted, is O(n^2). We suspect we can do better, though. The algorithm uses the recursion relation T(n) \leq 2T(n/2) + cn when n > 2 and T(2) \leq c (same as for Mergesort) by dividing the list into two, counting the number of inversions in each half, and counting the number of inversions between halves. We make the recursive step also sort the two halves so that the combination step is "easier" (p 223). We employ a Merge-and-Count algorithm, which assumes that we've already counted the inversions in each half of the list and sorted them. This is nearly merging two sorted lists into one, except that we're also counting inversions. That's easy enough. Merge-and-Count iterates through the two halves with pointers, appending the front element to the full sorted list of whichever half has a smaller front element. Each time we append to the full list, we ask if the smaller element comes from list B, and if so, we increase the inversion counter by the number of elements remaining in list A. If one of the lists becomes empty, we add the rest of the other list to the full list. We effectively count the number of inversions in constant time, which we can do because both lists are sorted. If the smaller element comes from list B, it is smaller than //everything// remaining in list A. We also know that we can simply append the remaining list when the other becomes empty because it is also already sorted and the last item added to the full list was smaller than its front item. The full algorithm is called Sort-and-Count, and it has an O(n log n) runtime. |
| * **My Questions:** I'm still a little confused about the Sort-and-Count algorithm, especially the "(r_A, A) = Sort-and-Count(A)" etc. notation. I suppose it's pattern matching. It's just hard to think about recursive algorithms without running through an example. | * **My Questions:** I'm still a little confused about the Sort-and-Count algorithm, especially the "(r_A, A) = Sort-and-Count(A)" etc. notation. I suppose it's pattern matching. It's just hard to think about recursive algorithms without running through an example. |
| * **Second Time Around:** I really liked Figure 5.4 for visualizing inversions with crossed lines. | * **Second Time Around:** I really liked Figure 5.4 for visualizing inversions with crossed lines. |