
Overview

A good recap before moving on:

Algorithmic Paradigms
• Greedy. Build up a solution incrementally, myopically optimizing some local criterion.
• Divide-and-conquer. Break up a problem into independent subproblems, solve each subproblem recursively, and combine the solutions to form a solution to the original problem. (The variant with q subproblems of size n/2 is still divide-and-conquer; its subproblems are disjoint, not overlapping.)
• Dynamic programming. Break up a problem into a series of overlapping subproblems, and build up solutions to larger and larger subproblems.

The essence: implicitly explore the space of all possible solutions by carefully decomposing the problem into a series of subproblems, and then building up correct solutions to larger and larger subproblems.

6.1 Weighted Interval Scheduling: A Recursive Procedure

Summary: The earlier interval scheduling problem is just the special case of this problem in which all the weights are 1. Greedy won't work here.

Dynamic Programming

Memoization process: create a table indexed by the possible inputs.
• If the value is already in the table, return it without recomputing it.
• Otherwise, call the function recursively and store the result in the table.

Problem: Goal: find a maximum-weight subset of mutually compatible jobs. Sort jobs by finish time and let p(j) be the largest index i < j such that job i is compatible with job j. Assume we have an optimal solution.
• OPT(j) = value of the optimal solution to the problem consisting of job requests 1, 2, …, j. Then for job j there are two choices:

OPT selects job j, or OPT does not select job j,

which makes the algorithm easy to state:

Case 1: OPT(j) = vj + OPT(p(j))
Case 2: OPT(j) = OPT(j−1)

So OPT(j) = max(vj + OPT(p(j)), OPT(j−1)), with OPT(0) = 0.

Memoization. Store results of each subproblem in a cache; lookup as needed.
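A minimal Python sketch of the memoized recurrence (the example jobs list, the tuple format (start, finish, weight), and the brute-force helper p are my own illustrations, not from the slides):

```python
import functools

# Jobs as (start, finish, weight), pre-sorted by finish time.
jobs = [(1, 3, 2), (2, 5, 4), (4, 6, 4), (6, 8, 7), (5, 9, 2), (7, 10, 1)]

def p(j):
    """Largest index i < j whose job finishes before job j starts (0 if none).
    Jobs are 1-indexed so that OPT(0) = 0 is the empty base case."""
    start_j = jobs[j - 1][0]
    for i in range(j - 1, 0, -1):
        if jobs[i - 1][1] <= start_j:
            return i
    return 0

@functools.lru_cache(maxsize=None)   # the cache *is* the memoization table
def opt(j):
    """OPT(j) = max total weight of mutually compatible jobs among 1..j."""
    if j == 0:
        return 0
    v_j = jobs[j - 1][2]
    return max(v_j + opt(p(j)),      # Case 1: take job j
               opt(j - 1))           # Case 2: skip job j

print(opt(len(jobs)))                # 13 for this toy instance
```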

Example run-through on the slides.

interesting/readable: 7/7

6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems

Summary:

A Basic Outline of Dynamic Programming:

From slides:

1. Determine the optimal substructure of the problem → define the recurrence relation
2. Define the algorithm to find the value of the optimal solution
3. Optionally, change the algorithm to an iterative rather than recursive solution
4. Define the algorithm to find the optimal solution itself
5. Analyze the running time of the algorithms

The big idea: Memoization
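To make steps 2 and 3 concrete, here is a tiny Python sketch contrasting the two styles on a stand-in recurrence (Fibonacci, which is not from the notes, just the smallest example of overlapping subproblems):

```python
import functools

# Top-down: recursion plus a cache (memoization).
@functools.lru_cache(maxsize=None)
def fib_memo(n):
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up: iterate over subproblems from smallest to largest (step 3 above).
def fib_iter(n):
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

assert fib_memo(30) == fib_iter(30) == 832040
```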

interesting/readable: 7/7

6.3 Segmented Least Squares: Multi-way Choices

Summary

The Problem (formal description): Suppose our data consists of a set P of n points in the plane, (x1, y1), (x2, y2), …, (xn, yn), and suppose x1 < x2 < … < xn. Given a line L defined by the equation y = ax + b, we say that the error of L with respect to P is the sum of its squared "distances" to the points in P:

Error(L, P) = Σ i=1..n (yi − a·xi − b)²

Simply put, we want a set of line segments that minimizes the total cost of the lines plus the total error.

Property that makes things a little simpler: the last point pn belongs to exactly one segment in the optimal partition, and that segment begins at some earlier point pi.

Some definitions:

OPT(j) = minimum cost for points p1 up to pj
e(i, j) = minimum sum of squared errors for a single line fit to points pi through pj

Base case: OPT(0) = 0 (at the beginning, nothing is included). If the last segment of the optimal partition runs from pi to pj, then

OPT(j) = min over 1 ≤ i ≤ j of [ e(i, j) + c + OPT(i−1) ]

• e(i, j): the cost of the error from i to j
• c: the cost of adding this line from i to j
• OPT(i−1): all the cost before i
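A Python sketch of this recurrence, assuming the closed-form least-squares fit for e(i, j); the function names and the toy point set are mine:

```python
def least_squares_error(points, i, j):
    """e(i, j): min sum of squared errors for one line through points i..j
    (1-indexed, inclusive), using the closed-form least-squares fit."""
    pts = points[i - 1:j]
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom else 0.0  # flat line if degenerate
    b = (sy - a * sx) / n
    return sum((y - a * x - b) ** 2 for x, y in pts)

def segmented_ls(points, c):
    """OPT(j) = min over i of e(i, j) + c + OPT(i-1), with OPT(0) = 0."""
    n = len(points)
    e = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            e[i][j] = least_squares_error(points, i, j)
    opt = [0.0] * (n + 1)
    for j in range(1, n + 1):
        opt[j] = min(e[i][j] + c + opt[i - 1] for i in range(1, j + 1))
    return opt[n]

# Two perfectly fit lines (cost c each) beat one bad line here.
pts = [(1, 1), (2, 2), (3, 3), (4, 10), (5, 20), (6, 30)]
print(segmented_ls(pts, c=2.0))   # 4.0 = two segments at cost 2.0, zero error
```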

interesting/readable: 7/7

6.4 Subset Sums and Knapsacks: Adding a Variable

Summary. Problem: Given n objects and a "knapsack" with weight limit W. Item i weighs wi > 0 kilograms and has value vi > 0. Goal: pick a subset of items with total weight at most W that maximizes total value.

One-dimensional dynamic programming (OPT(i) alone) won't work.

Recurrence: Either select item i or not

Case 1: OPT does not select item i • OPT selects best of { 1, 2, …, i-1 }

Case 2: OPT selects item i
• Accepting item i does not immediately imply that we will have to reject other items → no known conflicts
• Without knowing what other items were selected before i, we don't even know if we have enough room for i
We need more memoization state!

Add a variable.

Def. OPT(i, w) = max profit subset of items 1, …, i with weight limit w.
• Case 1: OPT does not select item i → OPT selects the best of { 1, 2, …, i−1 } using weight limit w.
• Case 2: OPT selects item i → new weight limit = w − wi; OPT selects the best of { 1, 2, …, i−1 } using weight limit w − wi.
So OPT(i, w) = max(OPT(i−1, w), vi + OPT(i−1, w − wi)), where only Case 1 applies when wi > w.

Iterate through the n × W table.

Runtime: O(nW) (pseudo-polynomial: it depends on the magnitude of W, not just the input length)
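A bottom-up Python sketch of the table; the small example instance is my own choice:

```python
def knapsack(values, weights, W):
    """OPT(i, w) = max(OPT(i-1, w), v_i + OPT(i-1, w - w_i)); O(nW) time."""
    n = len(values)
    # M[i][w] = best value using items 1..i with weight limit w.
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            M[i][w] = M[i - 1][w]                  # Case 1: skip item i
            if weights[i - 1] <= w:                # Case 2: take item i
                M[i][w] = max(M[i][w],
                              values[i - 1] + M[i - 1][w - weights[i - 1]])
    return M[n][W]

print(knapsack(values=[1, 6, 18, 22, 28], weights=[1, 2, 5, 6, 7], W=11))  # 40
```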

Interesting/readable:7/7

6.5 RNA Secondary Structure

Summary

Problem: RNA. A string B = b1b2…bn over the alphabet { A, C, G, U }.
• Secondary structure. RNA is single-stranded, so it tends to loop back and form base pairs with itself.

A secondary structure is a set of pairs S = { (bi, bj) } that satisfies the following constraints:

• [Watson-Crick] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C.
• [No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (bi, bj) ∈ S, then i < j − 4.
• [Non-crossing] If (bi, bj) and (bk, bl) are two pairs in S, then we cannot have i < k < j < l.
• Free energy. The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy.
• Goal. Given an RNA molecule B = b1b2…bn, find a secondary structure S that maximizes the number of base pairs.

OPT(i, j) = maximum number of base pairs in a secondary structure of the substring bibi+1…bj.
• Case 1. If i ≥ j − 4: OPT(i, j) = 0 by the no-sharp-turns condition.
• Case 2. Base bj is not involved in a pair: OPT(i, j) = OPT(i, j−1).
• Case 3. Base bj pairs with bt for some i ≤ t < j − 4: the non-crossing constraint decouples the resulting subproblems, so OPT(i, j) = 1 + max over t { OPT(i, t−1) + OPT(t+1, j−1) }.

Runtime: O(n³): two nested loops over (i, j) plus a max over t.
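An interval-DP sketch of this in Python; the toy string and 0-indexed bookkeeping are my own choices:

```python
def rna_max_pairs(b):
    """OPT(i, j) = max base pairs in b[i..j]; O(n^3) interval DP."""
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    n = len(b)
    # opt[i][j], 0-indexed; entries with j - i <= 4 stay 0 (no sharp turns).
    opt = [[0] * n for _ in range(n)]
    for k in range(5, n):              # interval length offset: j = i + k
        for i in range(n - k):
            j = i + k
            best = opt[i][j - 1]       # Case 2: b_j unpaired
            for t in range(i, j - 4):  # Case 3: b_j pairs with b_t
                if (b[t], b[j]) in pairs:
                    left = opt[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt[0][n - 1] if n else 0

print(rna_max_pairs("ACCGGUAGU"))      # 2 pairs for this toy string
```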

Interesting/readable:7/7

6.6-6.7 Sequence Alignment

Some general context of the problem:

Google search and dictionaries can recognize what the user probably means when they type a word that does not exist in the dataset. To do this, how should we define similarity between two words or strings?

Dissimilarity is measured by gaps and mismatches.

Formalized:

Sequence Alignment
• Goal: Given two strings X = x1 x2 … xm and Y = y1 y2 … yn, find an alignment of minimum cost.
• An alignment M is a set of ordered pairs xi-yj such that each item occurs in at most one pair and there are no crossings.
• The pairs xi-yj and xi'-yj' cross if i < i' but j > j'.

The three cases for the last positions:

• Case 1: xm and yn are aligned. Pay the mismatch cost for xm-yn plus the min cost of aligning the rest of the strings: OPT(m, n) = α(xm, yn) + OPT(m−1, n−1).
• Case 2: xm is not matched. Pay the gap cost for xm plus the min cost of aligning the rest: OPT(m, n) = δ + OPT(m−1, n).
• Case 3: yn is not matched. Pay the gap cost for yn plus the min cost of aligning the rest: OPT(m, n) = δ + OPT(m, n−1).
OPT(m, n) is the minimum over the three cases.

The table structure is similar to the knapsack problem's: a 2-D array indexed by prefixes of the two inputs.

Space: O(mn); time is also O(mn).
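A full-table Python sketch of the three-case recurrence (the cost settings in the demo, gap = mismatch = 1, are my own edit-distance-style choice):

```python
def alignment_cost(X, Y, delta, alpha):
    """OPT via the three cases above; O(mn) time and O(mn) space.
    delta = gap penalty, alpha(a, b) = mismatch cost (0 when a == b)."""
    m, n = len(X), len(Y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * delta            # X's prefix aligned against all gaps
    for j in range(n + 1):
        M[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(alpha(X[i - 1], Y[j - 1]) + M[i - 1][j - 1],  # Case 1
                          delta + M[i - 1][j],                          # Case 2
                          delta + M[i][j - 1])                          # Case 3
    return M[m][n]

print(alignment_cost("ocurrance", "occurrence",
                     delta=1, alpha=lambda a, b: int(a != b)))  # 2
```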

Can do this in linear space: collapse into an m × 2 array → M[i, 0] represents the previous row; M[i, 1] the current one.
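A sketch of that collapse in Python, keeping only two rows at a time (reusing the delta/alpha conventions from the previous snippet):

```python
def alignment_cost_linear_space(X, Y, delta, alpha):
    """Same recurrence, but only the previous and current rows: O(n) space.
    Gives the optimal value only; recovering the alignment itself needs
    the divide-and-conquer trick discussed next."""
    m, n = len(X), len(Y)
    prev = [j * delta for j in range(n + 1)]   # row i-1
    for i in range(1, m + 1):
        cur = [i * delta] + [0] * n            # row i
        for j in range(1, n + 1):
            cur[j] = min(alpha(X[i - 1], Y[j - 1]) + prev[j - 1],
                         delta + prev[j],
                         delta + cur[j - 1])
        prev = cur
    return prev[n]
```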

T(m, n) ≤ cmn + T(q, n/2) + T(m − q, n/2)
       ≤ cmn + kqn/2 + k(m − q)n/2      (inductive hypothesis: T ≤ kmn)
       = cmn + kqn/2 + kmn/2 − kqn/2
       = (c + k/2)mn,
which equals kmn when k = 2c, so T(m, n) = O(mn).

This is a combination of divide-and-conquer and dynamic programming (Hirschberg's algorithm).

We can map the problem to a shortest-path problem on a weighted grid graph.

Divide: find the index q that minimizes f(q, n/2) + g(q, n/2) using DP, where f and g are the shortest-path distances from the two opposite corners of the grid. Conquer: recursively compute the optimal alignment in each piece.

Interesting/readable:7/7

6.8 Shortest Paths in a Graph

Negative weights change everything that made the greedy approach appropriate: we cannot make decisions relying only on local information at each step, because a big negative edge that comes later may drastically change the picture.

Constraint: no negative-weight cycles. If some path from s to t contains a negative-cost cycle, there does not exist a cheapest s-t path: we could loop forever and get an ever smaller cost.

Recurrence:

OPT(i, v): minimum cost of a v-t path P using at most i edges → this formulation eases the later discussion. • The original problem is OPT(n−1, s).

DP

• Case 1: P uses at most i−1 edges → OPT(i, v) = OPT(i−1, v).
• Case 2: P uses exactly i edges → if (v, w) is the first edge, then OPT uses (v, w) and then selects the best w-t path using at most i−1 edges. Cost: cvw + OPT(i−1, w), minimized over all edges (v, w).

Implementation (see the sketch after the complexity note below):
for i = 1 to n−1:
    for each node v: M[i, v] = M[i−1, v]
    for each edge (v, w) ∈ E:
        M[i, v] = min(M[i, v], M[i−1, w] + cvw)

Runtime: O(mn); the full table M takes O(n²) space.
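A Python sketch of the table-filling version above (the graph, node count, and edge list are made-up toy data):

```python
def bellman_ford(n, edges, t):
    """M[i][v] = min cost of a v-t path using at most i edges; O(mn) time.
    n = number of nodes (0..n-1), edges = list of (v, w, cost)."""
    INF = float("inf")
    M = [[INF] * n for _ in range(n)]
    M[0][t] = 0
    for i in range(1, n):
        for v in range(n):
            M[i][v] = M[i - 1][v]                 # Case 1: at most i-1 edges
        for v, w, c in edges:                     # Case 2: first edge is (v, w)
            M[i][v] = min(M[i][v], M[i - 1][w] + c)
    return M[n - 1]                               # M[n-1][s] answers OPT(n-1, s)

# Toy graph: a negative edge but no negative cycle.
edges = [(0, 1, 6), (0, 2, 7), (1, 3, 5), (2, 3, -4), (1, 2, 8)]
print(bellman_ford(4, edges, t=3)[0])             # cheapest 0-to-3: 7 + (-4) = 3
```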

General process of DP

Review: Dynamic Programming Process
1. Determine the optimal substructure of the problem → define the recurrence relation
2. Define the algorithm to find the value of the optimal solution
3. Optionally, change the algorithm to an iterative rather than recursive solution
4. Define the algorithm to find the optimal solution itself
5. Analyze the running time of the algorithms

Interesting/readable: 8/8

6.9 Shortest Paths and Distance Vector Protocols

Problem Motivation: One important application of the Shortest-Path Problem is for routers in a communication network to determine the most efficient path to a destination.

Attempt: Dijkstra's algorithm requires global information about the network, which is unrealistic here; we need to work with only local information.

Bellman-Ford uses only local knowledge of neighboring nodes → a distributed algorithm: each node v maintains its value M[v] and updates it after getting its neighbors' values.
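A sketch of one synchronous update round in Python (the node names, link costs, and the round loop are my own toy setup; real routers update asynchronously):

```python
INF = float("inf")

def distance_vector_round(M, neighbors, cost, t):
    """One synchronous round: every node recomputes M[v] purely from its
    neighbors' currently advertised values (no global knowledge)."""
    return {v: 0 if v == t else
               min((cost[(v, w)] + M[w] for w in neighbors[v]), default=INF)
            for v in M}

# Hypothetical 3-node network: s -- a -- t, plus a direct (pricier) s -- t link.
neighbors = {"s": ["a", "t"], "a": ["s", "t"], "t": ["s", "a"]}
cost = {("s", "a"): 1, ("a", "s"): 1, ("a", "t"): 2,
        ("t", "a"): 2, ("s", "t"): 5, ("t", "s"): 5}
M = {"s": INF, "a": INF, "t": 0}
for _ in range(len(M) - 1):        # n-1 rounds suffice if costs stay fixed
    M = distance_vector_round(M, neighbors, cost, "t")
print(M)                           # {'s': 3, 'a': 2, 't': 0}
```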

Problems with the Distance Vector Protocol. One of the major problems with the distributed implementation of Bellman-Ford on routers (the protocol we have been discussing above) is that it is derived from a dynamic programming algorithm that assumes edge costs remain constant during the execution. So we might get into a situation where there is infinite looping due to mutual dependency.

For example, the protocol can fail if a node is deleted: its former neighbors may keep updating through each other, "counting to infinity".

Solution:

• Each router stores the entire path, not just the distance and the first hop
• Based on Dijkstra's algorithm
• Avoids the "counting-to-infinity" problem and related difficulties
• Tradeoff: requires significantly more storage

Interesting/readable: 5/5