Overview
A good recap before moving on:
Algorithmic Paradigms
• Greedy. Build up a solution incrementally, myopically optimizing some local criterion.
• Divide-and-conquer. Break up a problem into sub-problems, solve each sub-problem independently (in the standard form the sub-problems are disjoint; we also saw recurrences with q sub-problems of size n/2, which can overlap), and combine the solutions to the sub-problems into a solution to the original problem.
• Dynamic programming. Break up a problem into a series of overlapping sub-problems, and build up solutions to larger and larger sub-problems.
The essence: one implicitly explores the space of all possible solutions by carefully decomposing the problem into a series of subproblems, and then building up correct solutions to larger and larger subproblems.
6.1 Weighted Interval Scheduling: A Recursive Procedure
Summary: The earlier interval scheduling problem is just the special case of this problem where all the weights are 1. With general weights, greedy won't work here.
Dynamic Programming
Memoization process: create a table indexed by the possible inputs.
• If the value is already in the table, return it without recomputing it.
• Otherwise, call the function recursively and store the result in the table.
Problem: Goal: find a maximum-weight subset of mutually compatible jobs. Assume we have an optimal solution and let OPT(j) = value of the optimal solution to the problem consisting of job requests 1, 2, …, j. Then for job j there are only two choices:
OPT selects job j, or OPT does not select job j.
That binary choice makes the algorithm easy:
Case 1 (OPT selects job j): OPT(j) = vj + OPT(p(j)), where p(j) is the largest index i < j such that job i is compatible with job j.
Case 2 (OPT does not select job j): OPT(j) = OPT(j-1).
So OPT(j) = max( vj + OPT(p(j)), OPT(j-1) ), with OPT(0) = 0.
Memoization. Store results of each subproblem in a cache; lookup as needed.
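A minimal memoized sketch of this recurrence (the function name and the (start, finish, weight) tuple format are my choices, not from the slides):

```python
from bisect import bisect_right

def max_weight_schedule(jobs):
    """jobs: list of (start, finish, weight) tuples; returns the OPT value."""
    jobs = sorted(jobs, key=lambda job: job[1])   # sort by finish time
    finishes = [f for _, f, _ in jobs]
    # p[j] = number of jobs finishing no later than job j starts,
    # i.e. the 1-based index of the last job compatible with job j
    p = [bisect_right(finishes, s) for s, _, _ in jobs]
    memo = {0: 0}                                 # cache of computed OPT values

    def opt(j):                                   # OPT(j) for jobs 1..j
        if j not in memo:
            _, _, w = jobs[j - 1]
            # Case 1: take job j; Case 2: skip it
            memo[j] = max(w + opt(p[j - 1]), opt(j - 1))
        return memo[j]

    return opt(len(jobs))
```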
Example run through on slide.
interesting/readable: 7/7
6.2 Principles of Dynamic Programming: Memoization or Iteration over Subproblems
Summary:
A Basic Outline of Dynamic Programming:
From slides:
1. Determine the optimal substructure of the problem and define the recurrence relation.
2. Define the algorithm to find the value of the optimal solution.
3. Optionally, change the algorithm from a recursive to an iterative solution.
4. Define the algorithm to find the optimal solution itself.
5. Analyze the running time of the algorithms.
The big idea: Memoization
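As an illustration of step 3 of the outline, here is the 6.1 recurrence with the table filled in bottom-up (iteratively) instead of by memoized recursion. A sketch; names are mine, not from the slides.

```python
from bisect import bisect_right

def max_weight_schedule_iterative(jobs):
    """Bottom-up version: M[j] = OPT(j) for jobs 1..j sorted by finish time."""
    jobs = sorted(jobs, key=lambda job: job[1])
    finishes = [f for _, f, _ in jobs]
    p = [bisect_right(finishes, s) for s, _, _ in jobs]  # last compatible job
    n = len(jobs)
    M = [0] * (n + 1)
    for j in range(1, n + 1):                     # fill in increasing j
        M[j] = max(jobs[j - 1][2] + M[p[j - 1]],  # take job j
                   M[j - 1])                      # skip job j
    return M[n]
```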
interesting/readable: 7/7
6.3 Segmented Least Squares: Multi-way Choices
Summary
The Problem (formal description): Suppose our data consists of a set P of n points in the plane, (x1, y1), (x2, y2), …, (xn, yn), and suppose x1 < x2 < … < xn. Given a line L defined by the equation y = ax + b, we say that the error of L with respect to P is the sum of its squared "distances" to the points in P: Error(L, P) = Σi=1..n (yi − axi − b)².
Simply put, we want a set of line segments that minimizes the total cost: a fixed cost c for each line plus the total squared error.
A property that makes things a little simpler: the last point pn belongs to exactly one segment of the optimal solution, and that segment begins at some earlier point pi.
Some definitions:
OPT(j) = minimum cost for points from 1 up to point j
e(i, j) = minimum sum of squared errors for a single line fit to points i through j
At the beginning, nothing is included: OPT(0) = 0. If the last segment runs from point i to point j, then Cost = e(i, j) + c + OPT(i-1), where
e(i, j): the error of the best line through points i to j
c: the cost of adding this line
OPT(i-1): the optimal cost of all the points before i
So OPT(j) = min over 1 ≤ i ≤ j of [ e(i, j) + c + OPT(i-1) ].
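A sketch of this recurrence in code (names are mine; e(i, j) is computed by a straightforward least-squares fit, which makes this O(n³) overall without the usual prefix-sum speedup):

```python
def segmented_least_squares(points, c):
    """points: list of (x, y) sorted by x; c: cost per segment. Returns OPT(n)."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def err(i, j):
        # least-squares error of the single best line through points i..j (0-based)
        m = j - i + 1
        sx, sy = sum(xs[i:j + 1]), sum(ys[i:j + 1])
        sxx = sum(x * x for x in xs[i:j + 1])
        sxy = sum(x * y for x, y in zip(xs[i:j + 1], ys[i:j + 1]))
        denom = m * sxx - sx * sx
        a = (m * sxy - sx * sy) / denom if denom else 0.0
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2
                   for x, y in zip(xs[i:j + 1], ys[i:j + 1]))

    OPT = [0.0] * (n + 1)                  # OPT[0] = 0: no points, no cost
    for j in range(1, n + 1):
        OPT[j] = min(err(i - 1, j - 1) + c + OPT[i - 1]   # last segment i..j
                     for i in range(1, j + 1))
    return OPT[n]
```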
interesting/readable: 7/7
6.4 Subset Sums and Knapsacks: Adding a Variable
Summary Problem: Given n objects and a "knapsack" of weight capacity W. Item i weighs wi > 0 kilograms and has value vi > 0. Goal: fill the knapsack with a subset of items of maximum total value whose total weight is at most W.
One-dimensional dynamic programming (subproblems indexed by i alone) won't work here.
Recurrence: Either select item i or not
Case 1: OPT does not select item i • OPT selects best of { 1, 2, …, i-1 }
Case 2: OPT selects item i • Accepting item i does not immediately imply that we will have to reject other items (there are no known conflicts) • But without knowing what other items were selected before i, we don't even know if we have enough room for i. Needs more memoization power!
Add a variable.
Def. OPT(i, w) = max profit subset of items 1, …, i with weight limit w.
Case 1: OPT does not select item i • OPT selects best of { 1, 2, …, i-1 } using weight limit w.
Case 2: OPT selects item i • new weight limit = w − wi • OPT selects best of { 1, 2, …, i-1 } using the new weight limit w − wi.
Recurrence: OPT(i, w) = OPT(i-1, w) if wi > w; otherwise max( OPT(i-1, w), vi + OPT(i-1, w − wi) ).
Iterate through the (n+1) × (W+1) table, as sketched below.
Runtime: O(nW), which is pseudo-polynomial: it grows with the numeric value of W rather than with the input size.
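A minimal sketch of the table fill (function and variable names are my choices):

```python
def knapsack(values, weights, W):
    """0/1 knapsack: OPT[i][w] = best value from items 1..i with weight limit w."""
    n = len(values)
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        vi, wi = values[i - 1], weights[i - 1]
        for w in range(W + 1):
            if wi > w:
                OPT[i][w] = OPT[i - 1][w]                 # item i can't fit
            else:
                OPT[i][w] = max(OPT[i - 1][w],            # Case 1: skip item i
                                vi + OPT[i - 1][w - wi])  # Case 2: take item i
    return OPT[n][W]
```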
Interesting/readable:7/7
6.5 RNA SECONDARY STRUCTURE
Summary
Problem: RNA. A string B = b1b2…bn over the alphabet { A, C, G, U } • Secondary structure. RNA is single-stranded, so it tends to loop back and form base pairs with itself.
A secondary structure is a set of pairs S = { (bi, bj) } that satisfies the following constraints:
[Watson-Crick] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C.
[No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (bi, bj) ∈ S, then i < j - 4.
[Non-crossing] If (bi, bj) and (bk, bl) are two pairs in S, then we cannot have i < k < j < l.
• Free energy. The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy.
• Goal. Given an RNA molecule B = b1b2…bn, find a secondary structure S that maximizes the number of base pairs.
OPT(i, j) = maximum number of base pairs in a secondary structure of the substring bibi+1…bj.
Case 1. If i ≥ j - 4: OPT(i, j) = 0 by the no-sharp-turns condition.
Case 2. Base bj is not involved in a pair: OPT(i, j) = OPT(i, j-1).
Case 3. Base bj pairs with bt for some i ≤ t < j - 4 • the non-crossing constraint decouples the resulting subproblems • OPT(i, j) = 1 + max over t { OPT(i, t-1) + OPT(t+1, j-1) }.
OPT(i, j) is the best of cases 2 and 3.
Runtime: O(n³): there are O(n²) subproblems, i.e. two for loops over (i, j), and each takes O(n) for the max() over t.
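A sketch of this DP (Nussinov-style; function and variable names are mine). Intervals are filled in order of increasing length, so every subproblem on the right-hand side is already computed:

```python
def max_base_pairs(B):
    """B: string over {A, C, G, U}; returns the max number of base pairs."""
    n = len(B)
    complement = {('A', 'U'), ('U', 'A'), ('C', 'G'), ('G', 'C')}
    # OPT[i][j] = max pairs in b_i..b_j (0-based); short intervals stay 0
    OPT = [[0] * n for _ in range(n)]
    for length in range(6, n + 1):           # no sharp turns: need j - i >= 5
        for i in range(n - length + 1):
            j = i + length - 1
            best = OPT[i][j - 1]             # Case 2: b_j unpaired
            for t in range(i, j - 4):        # Case 3: b_j pairs with b_t
                if (B[t], B[j]) in complement:
                    left = OPT[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + OPT[t + 1][j - 1])
            OPT[i][j] = best
    return OPT[0][n - 1] if n else 0
```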
Interesting/readable:7/7
6.6-6.7 SEQUENCE ALIGNMENT
Some general context of the problem:
Google search and dictionaries have functions that recognize what the user probably meant when they type a word that does not exist in the dataset. To do this, how should we define similarity between two words or strings?
Strings are dissimilar to the extent that aligning them requires gaps and mismatches.
Formalized:
Sequence Alignment
• Goal: Given two strings X = x1 x2 … xm and Y = y1 y2 … yn, find an alignment of minimum cost.
• An alignment M is a set of ordered pairs xi-yj such that each item occurs in at most one pair and there are no crossings.
• The pairs xi-yj and xi'-yj' cross if i < i' but j > j'.
The different cases (δ = gap penalty, αpq = mismatch cost):
Case 1: xm and yn are aligned • pay the mismatch cost for xm-yn + the min cost of aligning the rest of the strings • OPT(m, n) = αxmyn + OPT(m-1, n-1)
Case 2: xm is not matched • pay the gap cost for xm + the min cost of aligning the rest • OPT(m, n) = δ + OPT(m-1, n)
Case 3: yn is not matched • pay the gap cost for yn + the min cost of aligning the rest • OPT(m, n) = δ + OPT(m, n-1)
OPT(m, n) is the minimum of the three.
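A bottom-up sketch of this recurrence (treating δ and α as given parameters; the function name is mine):

```python
def alignment_cost(X, Y, delta, alpha):
    """OPT[i][j] = min cost of aligning x_1..x_i with y_1..y_j."""
    m, n = len(X), len(Y)
    OPT = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        OPT[i][0] = i * delta                    # align x_1..x_i with nothing
    for j in range(n + 1):
        OPT[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            OPT[i][j] = min(alpha(X[i - 1], Y[j - 1]) + OPT[i - 1][j - 1],  # Case 1
                            delta + OPT[i - 1][j],                          # Case 2
                            delta + OPT[i][j - 1])                          # Case 3
    return OPT[m][n]
```

For example, alignment_cost("ocurrance", "occurrence", 1, lambda a, b: 0 if a == b else 1) gives the edit distance under unit gap and mismatch costs.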
The table structure is similar to the knapsack problem: a two-dimensional array, here indexed by a prefix of each string.
Space: O(mn).
We can compute the value in linear space: collapse the table into an m × 2 array, where M[i, 0] represents the previous column and M[i, 1] the current one. (This gives the optimal cost but not the alignment itself; recovering the alignment in linear space is what the divide-and-conquer idea below is for.)
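A sketch of the space-efficient pass (names are mine). It returns the whole last column, which the divide step below needs:

```python
def last_column(X, Y, delta, alpha):
    """Costs of aligning every prefix of X with all of Y, in O(m) space."""
    m = len(X)
    prev = [i * delta for i in range(m + 1)]     # column j = 0
    for j in range(1, len(Y) + 1):
        cur = [j * delta] + [0.0] * m            # column j
        for i in range(1, m + 1):
            cur[i] = min(alpha(X[i - 1], Y[j - 1]) + prev[i - 1],  # aligned
                         delta + prev[i],                          # gap: y_j unmatched
                         delta + cur[i - 1])                       # gap: x_i unmatched
        prev = cur
    return prev                                  # prev[m] is the optimal cost
```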
This is a combination of divide-and-conquer and dynamic programming. We can map the problem to a shortest-path problem on a weighted grid graph: the optimal alignment corresponds to a shortest corner-to-corner path.
Divide: find the index q that minimizes f(q, n/2) + g(q, n/2) using DP, where f is the forward shortest-path cost to node (q, n/2) and g is the backward cost from it; some optimal alignment passes through (q, n/2). (See the sketch below.)
Conquer: recursively compute the optimal alignment in each piece.
Running time: T(m, n) ≤ cmn + T(q, n/2) + T(m - q, n/2). Guessing T(m, n) ≤ kmn and substituting gives cmn + kqn/2 + k(m - q)n/2 = cmn + kqn/2 + kmn/2 - kqn/2 = (c + k/2)mn, which is at most kmn once k ≥ 2c. So T(m, n) = O(mn).
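A sketch of the divide step, reusing last_column from above: f comes from a forward pass on the first half of Y, g from a pass on the reversed strings (which computes the backward costs), and q is the best crossing point.

```python
def best_split(X, Y, delta, alpha):
    """Index q such that some optimal alignment passes through (q, n/2)."""
    half = len(Y) // 2
    f = last_column(X, Y[:half], delta, alpha)               # forward costs
    g = last_column(X[::-1], Y[half:][::-1], delta, alpha)   # backward costs
    m = len(X)
    return min(range(m + 1), key=lambda q: f[q] + g[m - q])
```

The conquer step then recurses on (X[:q], Y[:half]) and (X[q:], Y[half:]).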
Interesting/readable:7/7