Chapter 5
Divide and Conquer: class of algorithmic techniques in which one breaks the input into several parts, solves the problem in each part recursively, and then combines the solutions to these subproblems into an overall solution
Recurrence Relation: bounds the running time recursively in terms of the running time on smaller instances
A divide-and-conquer strategy can reduce the running time from the polynomial of the brute-force approach to a lower polynomial.
5.1 A First Recurrence: The Mergesort Algorithm
Mergesort: sorts a given list of numbers by first dividing it into two equal halves, sorting each half separately by recursion, and then combining the results of these recursive calls using the linear-time algorithm for merging sorted lists
Base Case: when the input has been reduced to size 2; T(n) is equal to a constant when n is a constant.
For some constant c, T(n) ≤ 2T(n/2) + cn when n > 2, and T(2) ≤ c
To gain an explicit bound, we need to solve the recurrence relation so that T appears only on the left-hand side of the inequality, not the right-hand side as well.
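To see the structure the recurrence is describing, here is a minimal Python sketch of mergesort (my own code, not the book's pseudocode): two recursive calls on halves, plus a linear-time merge.

```python
def merge_sort(a):
    """Sort a list by divide and conquer: T(n) <= 2T(n/2) + cn."""
    if len(a) <= 1:                    # base case: constant work
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])         # first recursive call, size n/2
    right = merge_sort(a[mid:])        # second recursive call, size n/2
    # Merge step: the O(n) combine that contributes the "+ cn" term.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```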
Approaches to Solving Recurrences
“Unroll” the recursion, accounting for the running time across the first few levels and identifying a pattern that can be continued as the recursion expands; sum up the running times over all levels → total running time
Start with a guess, substitute it into the recurrence relation, and check that it works; use an argument by induction on n to formally justify the guess
Unrolling the MergeSort Recurrence
Analyzing the first few levels: single problem of size n
level 0: takes at most cn plus the time spent in all subsequent recursive calls
level 1: two problems of size n/2; each takes at most cn/2 time
level 2: four problems of size n/4; each takes at most cn/4 time
Identifying a pattern: level j: number of subproblems has doubled j times: 2^j; each has shrunk in size by a factor of two j times, and so each has size n/2^j; each level contributes at most 2^j(cn/2^j) = cn to the total running time.
Summing over all levels of recursion: we've found that the recurrence in (5.1) has the property that the same upper bound of cn applies to the total amount of work performed at each level. Number of levels: log_2 n (when n is a power of 2). Total running time = O(n log n)
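A quick numeric check of the "cn per level" pattern (my own sketch, with the constant c = 1 assumed and n a power of 2):

```python
# Level j of the unrolled recursion: 2**j subproblems, each of size n / 2**j.
n = 1024
total = 0
for j in range(n.bit_length() - 1):       # levels 0 .. log2(n) - 1
    level_work = (2 ** j) * (n / 2 ** j)  # each level contributes exactly n
    total += level_work
print(total, n * (n.bit_length() - 1))    # 10240.0 vs n * log2(n) = 10240
```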
Substituting a Solution into the Mergesort Recurrence
An Approach Using Partial Substitution
One guesses the overall form of the solution without pinning down the exact values of all the constants and other parameters at the outset.
Suppose we believe that T(n) = O(n log n)
First write T(n) ≤ kn log_b n for some constant k and base b, leaving their exact values to be determined later
Try out one level of the induction as follows: T(n) ≤ 2T(n/2) + cn ≤ 2k(n/2) log_b(n/2) + cn = kn log_b(n/2) + cn
Choose 2 as the base to help with simplification, since log_2(n/2) = (log_2 n) − 1: T(n) ≤ kn(log_2 n − 1) + cn = kn log_2 n − kn + cn
k must be at least as large as c for the −kn + cn terms to cancel, so choosing k = c gives T(n) ≤ cn log_2 n = O(n log n)
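As a sanity check on the substitution (again my own sketch, assuming c = 1 and n a power of 2), we can evaluate the recurrence exactly and confirm the bound T(n) ≤ cn log_2 n holds:

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    """Exact value of T(n) = 2*T(n//2) + n with T(2) = 1 (c = 1 assumed)."""
    if n <= 2:
        return 1
    return 2 * T(n // 2) + n

for n in (4, 64, 1024, 2 ** 20):
    print(n, T(n), n * log2(n), T(n) <= n * log2(n))  # bound holds each time
```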
Personal Thoughts
Using mergesort as the example was helpful, as I am pretty familiar with that algorithm at this point. We went over it in class which helped me follow along with the step-by-step process of coming up with an appropriate recurrence relation. Still though, this material is a little difficult for me and I know I'll need more practice with its application before I really understand it.
Readability: 6.0
Interesting: 6.0
5.2 Further Recurrence Relations
Divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time
For some constant c, T(n) ≤ qT(n/2) + cn when n > 2, and T(2) ≤ c.
The Case of q>2 Subproblems
Unrolling the recurrence here shows that the work per level increases geometrically, so the total is dominated by the bottom level and T(n) = O(n^(log_2 q)).
The Case of One Subproblem
Consider the case of q = 1
Unrolling the recurrence:
Analyzing the first few levels: level 0 is a single problem of size n that takes at most cn time; level 1 is one problem of size n/2, contributing cn/2; level 2 is one problem of size n/4, contributing cn/4
Identifying a pattern: at level j there is a single problem of size n/2^j, contributing at most cn/2^j to the running time
Summing over all levels of recursion: There are log_2 n levels of recursion, but the per-level work decays geometrically, so the total amount of work performed is T(n) ≤ cn(1 + 1/2 + 1/4 + …) ≤ 2cn = O(n)
Any function T(.) satisfying “for some constant c, T(n) ≤ qT(n/2) + cn when n > 2, and T(2) ≤ c” with q=1 is bounded by O(n)
Geometric series with a decaying exponent: fully half the work performed by the algorithm is done at the top level of the recursion
The Effect of Parameter q: when q=1, the resulting running time is linear; when q=2, it's O(n log n); when q>2, it's a polynomial bound O(n^(log_2 q)), with an exponent larger than 1 that grows with q
When q=1, the total running time is dominated by the top level, whereas when q>2, it's dominated by the work done on constant-size subproblems at the bottom of the recursion
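The contrast is easy to see numerically; this sketch (mine, with c = 1 assumed) evaluates T(n) = qT(n/2) + n for q = 1, 2, 3 and compares each against its claimed growth rate:

```python
from math import log2

def T(n, q):
    """T(n) = q*T(n//2) + n with T(2) = 1; c = 1 is an assumption."""
    if n <= 2:
        return 1
    return q * T(n // 2, q) + n

n = 2 ** 16
print("q=1:", T(n, 1), "vs 2n =", 2 * n)                       # linear: top level dominates
print("q=2:", T(n, 2), "vs n*log2 n =", n * log2(n))           # O(n log n): all levels equal
print("q=3:", T(n, 3), "vs n^log2(3) =", round(n ** log2(3)))  # bottom level dominates
```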
A Related Recurrence: T(n) ≤ 2T(n/2) + O(n^2)
Divide the input into two pieces of equal size, solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending quadratic time for the initial division and final recombining.
A first reaction is to guess that the solution will be T(n) = O(n^2 log n), by analogy with the mergesort analysis
Unrolling:
Analyzing the first few levels: level 0 is a single problem of size n that takes at most cn^2 time plus the time spent in all subsequent recursive calls; at level 1 we have two problems, each of size n/2, and each takes at most c(n/2)^2 = cn^2/4 time, for a total of at most cn^2/2
Identifying a pattern: at level j, there are 2^j subproblems, each of size n/2^j; each takes at most c(n/2^j)^2 = cn^2/4^j time, so the level contributes at most 2^j(cn^2/4^j) = cn^2/2^j
Summing over all levels of recursion: we've arrived at almost the same sum that we had for q=1 in the previous recurrence; it converges to at most 2cn^2, so T(n) = O(n^2) and the initial guess of O(n^2 log n) was an overestimate
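The decaying geometric series is visible directly if we tabulate the per-level work (a small sketch of my own, with c = 1 assumed):

```python
# Level j of T(n) <= 2T(n/2) + cn^2: 2**j subproblems, each costing (n/2**j)**2.
n = 1024
levels = [(2 ** j) * (n / 2 ** j) ** 2 for j in range(n.bit_length() - 1)]
print(levels[:4])               # n^2, n^2/2, n^2/4, ... halving at each level
print(sum(levels), 2 * n ** 2)  # the total stays below 2n^2, so T(n) = O(n^2)
```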
Personal Thoughts
This section took a while to digest as there are mathematical equations at each step. I think I will have to refer back to this section and the examples until I get the hang of it, but for the most part the section laid out the recurrence relations in a very step-by-step manner that made it easier to follow.
Readability: 6.0
Interesting: 6.0
5.3 Counting Inversions
The Problem
Rankings: becoming important to a number of current applications
Collaborative Filtering: match your preferences with those of other people out on the Internet
Meta-Search Tools: execute the same query on many different search engines and then try to synthesize the results by looking for similarities and differences among the various rankings that the search engines return
Comparing Two Rankings: count the number of inversions between them, i.e., pairs of positions i < j where ai > aj; the more inversions, the farther apart the two rankings are
Designing and Analyzing the Algorithm
Look at every pair of numbers (ai, aj) and determine whether they constitute an inversion → O(n^2)
Desired Time: O(nlogn)
Set m = ⌈n/2⌉ and divide the list into two pieces a1,…,am and am+1,…,an.
Count the number of inversions in each of these two halves separately
Then count the number of inversions where the two numbers belong to different halves → must do this in O(n) time
the pairs (ai,aj) where ai is in the first half, aj is in the second half and ai>aj
Recursively sort the numbers in the two halves
Merge-and-Count: left with two sorted lists A and B
produce a single sorted list C from their union, while counting the number of pairs (a, b) with a in A, b in B, and a > b; each such pair is an inversion
Walk through the sorted lists A and B, removing elements from the front and appending them to the sorted list C. In a given step, we have a current pointer into each list, showing our current position.
Every time ai is appended to C, no new inversions are encountered, since ai is smaller than everything left in list B, and it comes before all of them. If bj is appended, then there are inversions: we increase our count by the number of elements remaining in A, since bj is smaller than all of them yet comes after them. Each appended element is handled in constant time, so all inversions are accounted for during the single merge pass; see the sketch below.
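Putting the pieces together, here is a minimal Python sketch of the whole Sort-and-Count idea (my own code and names, not the book's pseudocode): recursively sort and count each half, then count the inversions between halves during the merge.

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a) in O(n log n) time."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])     # inversions within first half
    right, inv_right = sort_and_count(a[mid:])   # inversions within second half
    merged, i, j, inv_split = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1       # no new inversions
        else:
            merged.append(right[j]); j += 1
            inv_split += len(left) - i           # bj inverts with all remaining in A
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + inv_split

# Example: [2, 4, 1, 3, 5] has 3 inversions: (2,1), (4,1), (4,3)
print(sort_and_count([2, 4, 1, 3, 5])[1])  # prints 3
```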
Personal Thoughts
I thought this section provided a very interesting algorithm to count the number of inversions in O(nlogn) time. The combination of this section and the classroom discussion did a pretty good job of helping me understand this material. It also helped that we were introduced to inversions in another chapter/section. This made it so that the concept as a whole wasn't as difficult to understand. The algorithm and runtime both make sense to me.
Readability: 7.0
Interesting: 7.0