Chapter 5
Divide and Conquer: class of algorithmic techniques in which one breaks the input into several parts, solves the problem in each part recursively, and then combines the solutions to these subproblems into an overall solution
Recurrence Relation: bounds the running time recursively in terms of the running time on smaller instances
A divide-and-conquer strategy can reduce the running time from the polynomial of the brute-force approach to a lower polynomial.
5.1 A First Recurrence: The Mergesort Algorithm
Mergesort: sorts a given list of numbers by first dividing it into two equal halves, sorting each half separately by recursion, and then combining the results of these recursive calls using the linear-time algorithm for merging sorted lists
Base Case: when the input has been reduced to size 2; T(n) is equal to a constant when n is a constant.
For some constant c, T(n) ≤ 2T(n/2) + cn when n > 2, and T(2) ≤ c
To gain an explicit bound, we need to solve the recurrence relation so that T appears only on the left-hand side of the inequality, not the right-hand side as well.
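To see the structure the recurrence is describing, here is a minimal Python sketch of mergesort (my own code, not the book's pseudocode): two recursive calls on halves, plus a linear-time merge.

```python
def merge_sort(a):
    """Sort a list by divide and conquer: T(n) <= 2T(n/2) + cn."""
    if len(a) <= 1:                    # base case: constant work
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])         # first recursive call, size n/2
    right = merge_sort(a[mid:])        # second recursive call, size n/2
    # Merge step: the O(n) combine that contributes the "+ cn" term.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```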
Approaches to Solving Recurrences
“Unroll” the recursion, accounting for the running time across the first few levels and identifying a pattern that can be continued as the recursion expands; sum up the running times over all levels → total running time
Start with a guess, substitute it into the recurrence relation, and check that it works; use an argument by induction on n to formally justify the guess
Unrolling the MergeSort Recurrence
Analyzing the first few levels: single problem of size n
level 0: takes at most cn plus the time spent in all subsequent recursive calls
level 1: two problems of size n/2; each takes at most cn/2 time
level 2: four problems of size n/4; each takes at most cn/4 time
Identifying a pattern: level j: number of subproblems has doubled j times: 2^j; each has shrunk in size by a factor of two j times, and so each has size n/2^j; each level contributes at most 2^j(cn/2^j) = cn to the total running time.
Summing over all levels of recursion: we've found that the recurrence in (5.1) has the property that the same upper bound of cn applies to the total amount of work performed at each level. Number of levels: log_2 n (when n is a power of 2). Total running time = O(n log n)
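A quick numeric check of the "cn per level" pattern (my own sketch, with the constant c = 1 assumed and n a power of 2):

```python
# Level j of the unrolled recursion: 2**j subproblems, each of size n / 2**j.
n = 1024
total = 0
for j in range(n.bit_length() - 1):       # levels 0 .. log2(n) - 1
    level_work = (2 ** j) * (n / 2 ** j)  # each level contributes exactly n
    total += level_work
print(total, n * (n.bit_length() - 1))    # 10240.0 vs n * log2(n) = 10240
```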
Substituting a Solution into the Mergesort Recurrence
An Approach Using Partial Substitution
One guesses the overall form of the solution without pinning down the exact values of all the constants and other parameters at the outset.
Suppose we believe that T(n) = O(n log n)
First write T(n) ≤ kn log_b n for some constant k and base b, leaving their exact values to be determined later
Try out one level of the induction as follows: T(n) ≤ 2T(n/2) + cn ≤ 2k(n/2) log_b(n/2) + cn = kn log_b(n/2) + cn
Choose 2 as the base to help with simplification, since log_2(n/2) = (log_2 n) − 1: T(n) ≤ kn(log_2 n − 1) + cn = kn log_2 n − kn + cn
k must be at least as large as c for the −kn + cn terms to cancel, so choosing k = c gives T(n) ≤ cn log_2 n = O(n log n)
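As a sanity check on the substitution (again my own sketch, assuming c = 1 and n a power of 2), we can evaluate the recurrence exactly and confirm the bound T(n) ≤ cn log_2 n holds:

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    """Exact value of T(n) = 2*T(n//2) + n with T(2) = 1 (c = 1 assumed)."""
    if n <= 2:
        return 1
    return 2 * T(n // 2) + n

for n in (4, 64, 1024, 2 ** 20):
    print(n, T(n), n * log2(n), T(n) <= n * log2(n))  # bound holds each time
```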
Personal Thoughts
Using mergesort as the example was helpful, as I am pretty familiar with that algorithm at this point. We went over it in class which helped me follow along with the step-by-step process of coming up with an appropriate recurrence relation. Still though, this material is a little difficult for me and I know I'll need more practice with its application before I really understand it.
Readability: 6.0
Interesting: 6.0
5.2 Further Recurrence Relations
Divide-and-conquer algorithms that create recursive calls on q subproblems of size n/2 each and then combine the results in O(n) time
For some constant c, T(n) ≤ qT(n/2) + cn when n > 2, and T(2) ≤ c.
The Case of q>2 Subproblems
Unrolling the recurrence here shows that the work per level increases geometrically, so the total is dominated by the bottom level and T(n) = O(n^(log_2 q)).
The Case of One Subproblem
Consider the case of q = 1
Unrolling the recurrence:
Analyzing the first few levels: level 0 is a single problem of size n that takes at most cn time; level 1 is one problem of size n/2, contributing cn/2; level 2 is one problem of size n/4, contributing cn/4
Identifying a pattern: at level j there is a single problem of size n/2^j, contributing at most cn/2^j to the running time
Summing over all levels of recursion: There are log_2 n levels of recursion, but the per-level work decays geometrically, so the total amount of work performed is T(n) ≤ cn(1 + 1/2 + 1/4 + …) ≤ 2cn = O(n)
Any function T(.) satisfying “for some constant c, T(n) ≤ qT(n/2) + cn when n > 2, and T(2) ≤ c” with q=1 is bounded by O(n)
Geometric series with a decaying exponent: fully half the work performed by the algorithm is done at the top level of the recursion
The Effect of Parameter q: when q=1, the resulting running time is linear; when q=2, it's O(n log n); when q>2, it's a polynomial bound O(n^(log_2 q)), with an exponent larger than 1 that grows with q
When q=1, the total running time is dominated by the top level, whereas when q>2, it's dominated by the work done on constant-size subproblems at the bottom of the recursion
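The contrast is easy to see numerically; this sketch (mine, with c = 1 assumed) evaluates T(n) = qT(n/2) + n for q = 1, 2, 3 and compares each against its claimed growth rate:

```python
from math import log2

def T(n, q):
    """T(n) = q*T(n//2) + n with T(2) = 1; c = 1 is an assumption."""
    if n <= 2:
        return 1
    return q * T(n // 2, q) + n

n = 2 ** 16
print("q=1:", T(n, 1), "vs 2n =", 2 * n)                       # linear: top level dominates
print("q=2:", T(n, 2), "vs n*log2 n =", n * log2(n))           # O(n log n): all levels equal
print("q=3:", T(n, 3), "vs n^log2(3) =", round(n ** log2(3)))  # bottom level dominates
```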
A Related Recurrence: T(n) ≤ 2T(n/2) + O(n^2)
Divide the input into two pieces of equal size, solve the two subproblems on these pieces separately by recursion; and then combine the two results into an overall solution, spending quadratic time for the initial division and final recombining.
A first reaction is to guess that the solution will be T(n) = O(n^2 log n), by analogy with the mergesort analysis
Unrolling:
Analyzing the first few levels: level 0 is a single problem of size n that takes at most cn^2 time plus the time spent in all subsequent recursive calls; at level 1 we have two problems, each of size n/2, and each takes at most c(n/2)^2 = cn^2/4 time, for a total of at most cn^2/2
Identifying a pattern: at level j, there are 2^j subproblems, each of size n/2^j; each takes at most c(n/2^j)^2 = cn^2/4^j time, so the level contributes at most 2^j(cn^2/4^j) = cn^2/2^j
Summing over all levels of recursion: we've arrived at almost the same sum that we had for q=1 in the previous recurrence; it converges to at most 2cn^2, so T(n) = O(n^2) and the initial guess of O(n^2 log n) was an overestimate
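The decaying geometric series is visible directly if we tabulate the per-level work (a small sketch of my own, with c = 1 assumed):

```python
# Level j of T(n) <= 2T(n/2) + cn^2: 2**j subproblems, each costing (n/2**j)**2.
n = 1024
levels = [(2 ** j) * (n / 2 ** j) ** 2 for j in range(n.bit_length() - 1)]
print(levels[:4])               # n^2, n^2/2, n^2/4, ... halving at each level
print(sum(levels), 2 * n ** 2)  # the total stays below 2n^2, so T(n) = O(n^2)
```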
Personal Thoughts
This section took a while to digest as there are mathematical equations at each step. I think I will have to refer back to this section and the examples until I get the hang of it, but for the most part the section laid out the recurrence relations in a very step-by-step manner that made it easier to follow.
Readability: 6.0
Interesting: 6.0
5.3 Counting Inversions
The Problem
Rankings: becoming important to a number of current applications
Collaborative Filtering: match your preferences with those of other people out on the Internet
Meta-Search Tools: execute the same query on many different search engines and then try to synthesize the results by looking for similarities and differences among the various rankings that the search engines return
Comparing Two Rankings: count the number of inversions between them, i.e., pairs of positions i < j where ai > aj; the more inversions, the farther apart the two rankings are
Designing and Analyzing the Algorithm
Look at every pair of numbers (ai, aj) and determine whether they constitute an inversion → O(n^2)
Desired Time: O(nlogn)
Set m = ⌈n/2⌉ and divide the list into two pieces a1,…,am and am+1,…,an.
Count the number of inversions in each of these two halves separately
Then count the number of inversions where the two numbers belong to different halves → must do this in O(n) time
the pairs (ai,aj) where ai is in the first half, aj is in the second half and ai>aj
Recursively sort the numbers in the two halves
Merge-and-Count: left with two sorted lists A and B
produce a single sorted list C from their union, while counting the number of pairs (a, b) with a in A, b in B, and a > b; each such pair is an inversion
Walk through the sorted lists A and B, removing elements from the front and appending them to the sorted list C. In a given step, we have a current pointer into each list, showing our current position.
Every time ai is appended to C, no new inversions are encountered, since ai is smaller than everything left in list B, and it comes before all of them. If bj is appended, then there are inversions: we increase our count by the number of elements remaining in A, since bj is smaller than all of them yet comes after them. Each appended element is handled in constant time, so all inversions are accounted for during the single merge pass; see the sketch below.
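Putting the pieces together, here is a minimal Python sketch of the whole Sort-and-Count idea (my own code and names, not the book's pseudocode): recursively sort and count each half, then count the inversions between halves during the merge.

```python
def sort_and_count(a):
    """Return (sorted copy of a, number of inversions in a) in O(n log n) time."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])     # inversions within first half
    right, inv_right = sort_and_count(a[mid:])   # inversions within second half
    merged, i, j, inv_split = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1       # no new inversions
        else:
            merged.append(right[j]); j += 1
            inv_split += len(left) - i           # bj inverts with all remaining in A
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + inv_split

# Example: [2, 4, 1, 3, 5] has 3 inversions: (2,1), (4,1), (4,3)
print(sort_and_count([2, 4, 1, 3, 5])[1])  # prints 3
```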
Personal Thoughts
I thought this section provided a very interesting algorithm to count the number of inversions in O(nlogn) time. The combination of this section and the classroom discussion did a pretty good job of helping me understand this material. It also helped that we were introduced to inversions in another chapter/section. This made it so that the concept as a whole wasn't as difficult to understand. The algorithm and runtime both make sense to me.
Readability: 7.0
Interesting: 7.0