Table of Contents
Entry Five
Chapter 4.2
Scheduling to Minimize Lateness: An Exchange Argument
In this section we look at solving another scheduling problem where there is one resource and a set of requests to use that resource for a certain amount of time. Each request has a deadline and a time interval length and can be scheduled any time before the deadline. Different requests cannot overlap. An example of this problem is scheduling jobs to minimize lateness. The algorithm must determine a start and a finish time for each request. If the finish time of a request is greater than the deadline then we note that the job is late. The lateness of a job is calculated by the finish time minus the deadline. If the request is not late then the lateness is 0.
The Algorithm:
There are two greedy ways to approach the problem and sort the data:
- order the jobs smallest to largest by the length of the job
- order the jobs smallest to largest by their slack times (deadline minus the length of the job)
Both of these approaches fail and will not produce the optimal solution. Our approach will be to sort the jobs by their deadlines in increasing order. The jobs will be scheduled in this order with each job's start time beginning at the previous jobs finish time.
Order the jobs in order of their deadlines Assume the notation that d<sub>1</sub> =< ... =< d<sub>n</sub> Initially f = s Consider the jobs i = 1,...,n in this order Assign job i to the time interval from s(i)=f to f(i)=f + t<sub>i</sub> Let f = f + t<sub>i</sub> End Return the set of scheduled intervals [s(i),f(i)] for i = 1,...,n
Analysis
The solution that is produced from this algorithm does not have any idle time, meaning that there are no gaps between jobs where the machine is not doing anything and there are still jobs to complete.
(4.7) There is an optimal schedule with no idle time.
To prove that our solution is optimal and the maximum lateness is minimized, we use the exchange argument. We consider an optimal schedule O and modify O to transform it to the schedule that our greedy algorithm produced. If a schedule puts a job with a greater deadline ahead of a job with a smaller deadline, then we say that schedule has an inversion. Our greedy algorithm produces a schedule with NO inversions.
(4.8) All schedules with no inversions and no idle time have the same maximum lateness
Proof
If there are two schedules without idle time or inversions, they can only be different in the order they schedule jobs with the same deadline. Jobs with the same deadline are scheduled consecutively and the last one as the maximum lateness which is not affected by the order of the jobs. To prove this we take an optimal schedule with no idle time and convert it to a schedule with out any inversions. The changes we make to the schedule will not increase the maximum lateness. So both schedules are optimal
(4.9) There is an optimal schedule that has no inversions and no idle time
Proof
Start with an optimal solution O with no idle time. The schedule can have at most n/2 inversions if all of the pairs are inverted.
- if O has an inversion, there is a pair of jobs where the one with the later deadline is scheduled before the one with the earliest deadline
- after swapping the jobs we get a schedule with one less inversion
- the new swapped schedule has a maximum lateness no greater than that of O
To prove the new schedule's lateness is no greater than the original solution, we show that the by swapping the two jobs their lateness is either the same or it is less. Swapping them cannot make a greater lateness.
(4.10) The schedule A produced by the greedy algorithm has optimal maximum lateness L.
The exchange argument is still pretty confusing to me I'd like to see a detailed outline of it. The book confused me after the lecture we had in class.
Readability 8/10
Chapter 4.4
Shortest Paths in a Graph
We want to be able to find the shortest path from one node to a goal node with a simple algorithm. One example of how we might use this is navigating to a webpage or finding the shortest route to a location. Given a directed graph with a start node we assume that there is a path to every other node in the graph and each edge has a length or weight greater than 0. The length of a path is the sum of the weight of all of the edges in the path. We want to find the minimum path.
Design
We start with Dijkstras algorithm. We determine the length of the shortest path by exploring out from our start node and adding to the set each node that has the shortest path distance from s.
Dijkstra's Algorithm (G, l)
Let S be the set of explored nodes
For each u in S, we store a distance d(u)
Initially S is empty and d(s)=0
While S is not equal to V
Select a node v not in S with at least one edge from S for which d'(v) = mine=(u,v):u in Sd(u)+le is as small as possible
Add v to s and define d(v) = d'(v)
Endwhile
Analysis
To prove the correctness of this algorithm by showing the paths the algorithms produce are the shortest distances. The algorithm is greedy!!! Because we are always forming the shortest path we can make. Every time a node is added to the path we always know that it is the shortest distance.
Proof
By induction we prove the size of S is the smallest it can be.
The algorithm does not work if there are edges with negative weights!!!
Implementation and Running Time
The runtime of this algorithm is O(m*n). The while loop is executed n-1 times. To compute the edge with the minimum weight for m edges takes O(m) time so the overall runtime is O(mn). If we keep the minimum values for each node in a priority queue we improve our runtime. We add the nodes into the PQ and extract the minimum to add to the set of the shortest path nodes. When we do this our runtime is improved to O(m log n) time.
Readability 8/10
Chapter 4.5
The Minimum Spanning Tree Problem
An example of this problem would be if we wanted to connect a neighborhood's communication network. All of the houses must be connected but we want to do this in the least cost/shortest way possible. We represent the set of possible links using a graph with a cost associated with each edge. We want to find a subset of the edges so the graph is connected with minimum total cost. The solution is a tree!
Design There are three different approaches to greedy algorithms that solve this problem
- Kruskal's
- builds a tree by inserting edges in order of increasing cost
- don't add the edge if it creates a cycle
- Primm's
- start with a root node and greedily grow a tree outward
- add the node that is the cheapest edge to attach
- Reverse Delete
- backwards version of Kruskal's
- start with a full graph and delete edges that are most expensive
- delete as long as it doesn't disconnect the graph
Analysis
The cut property: Assume that all edge costs are distinct. Let S be any subset of nodes that is neither empty nor equal to all of V, and let edge e = (v,w) be the minimum cost edge with one end in S and the other in V-S. Then every minimum spanning tree contains the edge e.
Proof We use the exchange argument. If we replace the least cost edge in a tree with another edge that is more cost we result in a tree with less expense. Both Kruskal's and Primm's algorithms only include an edge when it is justified by the cut property. They also both produce a minimum spanning tree.
Implementation of Primm's
- both prime's and kruskal's are implemented in O(m log n) time.
- implementations of prime's and dijkstra's almost identical
- if we use a priority queue we can implement in O(m log n)
This section was difficult to get past all of the obscure variables which made it more difficult to follow than with concrete examples in class. I wish the book had more examples. Readability: 7/10
Chapter 4.6
Implementing Kruskal's Algorithm: The Union-Find Data Structure
We want to store a representation of connected components so we can easily and quickly update the graph and search the graph. The data structure we use lets us keep disjoint sets. The find operation will return the set that has that node in it. We use this to compare two nodes and see if they're in the same set. The union operation merges two sets into one. This data structure only supports adding edges to the graph, it does not allow us to delete edges because it would split the component in two.
The Data Structure
We could use an array that contains the name of the set containing each element. This would make lookup be O(1) but the union function would take O(n) time. To improve the runtime we will explicitly maintain the list of elements in each set. We also choose the name for the union to be the name of one of the sets so we only have to update the values of the component in the other set. We bound the average running time of k Union operations to have O(k log k) time to update component values.
Another data structure we can use uses pointers. Each node will point to the set it belongs to. We will still use the elements of the set as possible set names. MakeUnion operation will initialize a record for each element in the set with a pointer that points to itself to show its in its own set. This confuses me!!!! Why is it pointing to itself?* With this data structure the Union function is constant time and to reduce the linear Find operations we keep the larger set as the name of the union to make it O(long n) and the MakeUnionFind operation is O(n).
This section was a lot more confusing to me and I thought was pretty difficult to read. It took me multiple times to read over before I felt like I could take notes on it. Rate 5-6/10.