Course summary for 307 DS/Algorithms II with Valentine Kabanets
Breadth First Search
- has the property that each layer is one edge farther from the source than the previous layer
- runs in O(n + m) time with an adjacency list representation
- can be used to check if a graph is 2-colorable / bipartite: build a BFS tree G' and check whether any edge of the original graph connects two nodes in the same layer of G'. if there is one, there is an odd cycle, and the graph cannot be 2-colorable
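a minimal Python sketch of this layer-based check (function and variable names are my own, not from the course):
    from collections import deque

    def is_bipartite(adj):  # adj: dict node -> list of neighbours; every node is a key
        layer = {}
        for s in adj:                        # handle disconnected graphs
            if s in layer:
                continue
            layer[s] = 0
            q = deque([s])
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if v not in layer:
                        layer[v] = layer[u] + 1
                        q.append(v)
                    elif layer[v] == layer[u]:   # same-layer edge -> odd cycle
                        return False
        return True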
Depth First Search
- useful for directed acyclic graphs (DAGs), e.g. checking if there is a cycle
- has the property that every edge of the original graph G connects an ancestor-descendant pair in the DFS tree (no cross edges in an undirected graph; in a directed graph, a back edge to an ancestor means there is a cycle)
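a small sketch of cycle detection via back edges in a directed graph (names mine; assumes every node appears as a key in adj):
    def has_cycle(adj):  # adj: dict node -> list of successors
        WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on current DFS path / finished
        color = {u: WHITE for u in adj}
        def dfs(u):
            color[u] = GRAY
            for v in adj[u]:
                if color[v] == GRAY:      # back edge to an ancestor -> cycle
                    return True
                if color[v] == WHITE and dfs(v):
                    return True
            color[u] = BLACK
            return False
        return any(color[u] == WHITE and dfs(u) for u in adj)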
DAG and Topological ordering
- uses DFS to find the topological ordering: repeatedly trace back from a starting node until a node with no entering edge is found, output ("pop") that node, remove it, and repeat
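a short sketch of this pop-the-source idea (also known as Kahn's algorithm; names mine, and every node must appear as a key in adj):
    from collections import deque

    def topo_order(adj):  # adj: dict node -> list of successors
        indeg = {u: 0 for u in adj}
        for u in adj:
            for v in adj[u]:
                indeg[v] += 1
        q = deque(u for u in adj if indeg[u] == 0)   # nodes with no entering edge
        order = []
        while q:
            u = q.popleft()
            order.append(u)
            for v in adj[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    q.append(v)
        return order if len(order) == len(adj) else None   # None -> graph has a cycle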
Dijkstra's Algorithm
- finds the shortest path from a starting node s to every other node
- can't be used on graphs with negative edge weights
- explores the graph BFS-style, repeatedly adding the closest unexplored node to the set of explored nodes
- uses a min-heap (priority queue) keyed by tentative distance to order the nodes
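a compact sketch using Python's heapq as the min-heap (names mine):
    import heapq

    def dijkstra(adj, s):  # adj: dict u -> list of (v, weight) pairs
        dist = {s: 0}
        pq = [(0, s)]                      # min-heap of (distance, node)
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float('inf')):
                continue                   # stale heap entry, skip
            for v, w in adj[u]:
                if d + w < dist.get(v, float('inf')):
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        return dist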
Kruskal Algorithm
- finds a minimum spanning tree by repeatedly adding the minimum-weight edge that does not create a cycle (see the union-find + Kruskal sketch after the Union Find section below)
- when running this algo we should sort the edges from low to high weight
Prim's Algorithm
- think of it as a modified Dijkstra's Algorithm: finds a minimum spanning tree by repeatedly adding the minimum-weight edge from an unexplored vertex to the set of explored vertices - which can never create a cycle
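a sketch of Prim's with a heap of candidate edges (names mine):
    import heapq

    def prim(adj, s):  # adj: dict u -> list of (v, weight); returns MST edge list
        explored = {s}
        pq = [(w, s, v) for v, w in adj[s]]    # candidate edges leaving the explored set
        heapq.heapify(pq)
        mst = []
        while pq and len(explored) < len(adj):
            w, u, v = heapq.heappop(pq)
            if v in explored:
                continue                        # would create a cycle, skip
            explored.add(v)
            mst.append((u, v, w))
            for x, wx in adj[v]:
                if x not in explored:
                    heapq.heappush(pq, (wx, v, x))
        return mst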
Disjoint Data Structure AKA Union Find
- this is used for kruskal algo to find edges that do not create cycle.
- this is used in Kruskal's algo to find edges that do not create a cycle.
- we group all the "connected" vertices into disjoint sets, so in Kruskal, when checking whether an edge (u, v) creates a cycle, we simply look for the set containing u and see if v is in it.
- the "find" operation returns the representative of the set containing a node; the "union" operation combines two disjoint sets together
- this allows Kruskal to be implemented in O(m log n) time
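a sketch combining union-find (with link-by-size and path compression) and Kruskal on top (names mine):
    def kruskal(n, edges):  # edges: list of (weight, u, v); nodes are 0..n-1
        parent = list(range(n))
        size = [1] * n

        def find(u):                      # returns the set representative
            while parent[u] != u:
                parent[u] = parent[parent[u]]   # path compression
                u = parent[u]
            return u

        def union(u, v):                  # link by size: small root under big root
            ru, rv = find(u), find(v)
            if ru == rv:
                return False              # already connected -> edge makes a cycle
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            return True

        mst = []
        for w, u, v in sorted(edges):     # low to high weight
            if union(u, v):
                mst.append((u, v, w))
        return mst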
Link By Size
- the idea of keeping trees wide and shallow: linking by size maintains the invariant \(size(root) \ge 2^{height(root)}\), i.e. \(height(root) \le \log_2(size(root))\)
- when we link, the root of the smaller tree becomes another branch under the root of the bigger tree.
Interval Scheduling
- to minimize maximum lateness, we do a greedy algorithm (GA) that schedules jobs with the earliest deadline first
- to get the maximum number of jobs in a schedule, we do a GA that selects the job with the earliest finish time first
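a sketch of the earliest-finish-time greedy (names mine):
    def max_jobs(intervals):  # intervals: list of (start, finish)
        chosen = []
        last_finish = float('-inf')
        for s, f in sorted(intervals, key=lambda iv: iv[1]):  # earliest finish first
            if s >= last_finish:          # compatible with everything picked so far
                chosen.append((s, f))
                last_finish = f
        return chosen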
Interval Partitioning given n classes with different start and finish times, find the minimum number of classrooms m needed to create the schedule.
- for this, greedily assign classes in order of earliest start time
we can verify the lower bound of the algorithm (the minimum classrooms needed) with the idea of "depth":
if x classes overlap at some point in time, the depth is x, and no algorithm can use fewer than x classrooms to accommodate them.
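a sketch of earliest-start-time partitioning with a min-heap of room finish times (names mine); the heap size ends up equal to the depth:
    import heapq

    def min_classrooms(classes):  # classes: list of (start, finish)
        finish_heap = []                  # finish time of the last class in each room
        for s, f in sorted(classes):      # earliest start time first
            if finish_heap and finish_heap[0] <= s:
                heapq.heapreplace(finish_heap, f)   # reuse a freed-up room
            else:
                heapq.heappush(finish_heap, f)      # open a new room
        return len(finish_heap)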
Divide and conquer - refers to algorithms that divide the problem into subproblems, i.e. smaller chunks that are easier to solve (like merge sort)
- the run times usually involve a logarithmic factor, e.g. O(n log n): we can visualize the calls in a tree structure where each level divides the input size by some factor, so there are O(log n) such levels.
Counting Inversions - uses the divide and conquer approach: divide the list into two halves, sort and count inversions in each half, then count the inversions between the halves while merging - inversions come up in job scheduling as well
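a sketch of merge-and-count (names mine):
    def sort_and_count(a):  # returns (sorted list, number of inversions)
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_l = sort_and_count(a[:mid])
        right, inv_r = sort_and_count(a[mid:])
        merged, inv = [], inv_l + inv_r
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
                inv += len(left) - i      # right[j] inverts with every remaining left item
        merged.extend(left[i:]); merged.extend(right[j:])
        return merged, inv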
*Integer Multiplication / Karatsuba - a divide and conquer technique: split each n-digit number in half and replace the 4 recursive multiplications with 3, plus some additions and subtractions, giving \(O(n^{\log_2 3}) \approx O(n^{1.585})\) time
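a sketch of Karatsuba for non-negative integers (names mine):
    def karatsuba(x, y):  # assumes x, y are non-negative integers
        if x < 10 or y < 10:
            return x * y
        m = max(len(str(x)), len(str(y))) // 2
        p = 10 ** m
        xh, xl = divmod(x, p)             # split each number in half
        yh, yl = divmod(y, p)
        a = karatsuba(xh, yh)             # 3 recursive products instead of 4
        b = karatsuba(xl, yl)
        c = karatsuba(xh + xl, yh + yl) - a - b
        return a * p * p + c * p + b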
Weighted Interval Scheduling - a type of dynamic programming (DP) - use an array where entry j stores the max total weight achievable using the first j jobs (sorted by finish time); the recurrence compares taking job j (plus the best over jobs compatible with it) against skipping it
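a sketch using binary search to find the last job compatible with job j (names mine):
    import bisect

    def weighted_interval(jobs):  # jobs: list of (start, finish, weight)
        jobs = sorted(jobs, key=lambda j: j[1])     # sort by finish time
        finishes = [f for _, f, _ in jobs]
        A = [0] * (len(jobs) + 1)                   # A[j]: best weight using first j jobs
        for j, (s, f, w) in enumerate(jobs, 1):
            p = bisect.bisect_right(finishes, s, 0, j - 1)  # jobs finishing by start s
            A[j] = max(A[j - 1], A[p] + w)          # skip job j vs. take it
        return A[len(jobs)]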
Knapsack problem - given a knapsack of capacity W, find the max value of items to put in the knapsack. - the DP array is a table with items i as rows and capacities 0..W as columns.
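a sketch of that \(O(nW)\) table (names mine):
    def knapsack(items, W):  # items: list of (weight, value); W: capacity
        A = [[0] * (W + 1) for _ in range(len(items) + 1)]   # rows: items, cols: capacity
        for i, (w, v) in enumerate(items, 1):
            for c in range(W + 1):
                A[i][c] = A[i - 1][c]                        # skip item i
                if w <= c:
                    A[i][c] = max(A[i][c], A[i - 1][c - w] + v)   # or take it
        return A[len(items)][W]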
Floyd Warshall (All Pairs Shortest Paths) - the intuition is that if a path from \(s\) to \(t\) has intermediate nodes, let \(a\) be one of those nodes; we can then break the question down into subproblems:
- shortest path from \(s\) to \(a\), plus
- shortest path from \(a\) to \(t\)
the DP table stores, for every pair (i, j), the length of the shortest path from i to j using only the first k nodes as intermediates; the final table holds all-pairs shortest paths
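a sketch of the classic triple loop (names mine):
    def floyd_warshall(n, w):  # w: dict (u, v) -> edge weight; nodes are 0..n-1
        INF = float('inf')
        d = [[0 if i == j else w.get((i, j), INF) for j in range(n)] for i in range(n)]
        for a in range(n):                 # allow node a as an intermediate
            for i in range(n):
                for j in range(n):
                    if d[i][a] + d[a][j] < d[i][j]:
                        d[i][j] = d[i][a] + d[a][j]
        return d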
Bellman Ford - a DP algorithm that handles negative edge weights (and can detect negative cycles)
- initialize the distance from \(s\) to all other nodes as infinity (and \(d(s) = 0\))
- it runs for n-1 rounds (because a path has at most n-1 edges)
- each round, we update the distance from s to each node v by looking at every edge (u, v) and "relaxing" it:
    if d(v) > d(u) + w(u, v): d(v) = d(u) + w(u, v)
- the algo ends after n-1 rounds; to check for negative cycles, run the algo one more round and see if any distance changes - if it does, there's a negative cycle
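a sketch with the extra negative-cycle pass (names mine):
    def bellman_ford(n, edges, s):  # edges: list of (u, v, w); nodes are 0..n-1
        d = [float('inf')] * n
        d[s] = 0
        for _ in range(n - 1):             # a shortest path has at most n-1 edges
            for u, v, w in edges:
                if d[u] + w < d[v]:        # relax edge (u, v)
                    d[v] = d[u] + w
        for u, v, w in edges:              # one extra pass: any change -> negative cycle
            if d[u] + w < d[v]:
                return None
        return d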
Longest Common Subsequence - we can use a DP algorithm to solve this problem.
given two strings A and B, let L(i, j) be the LCS length of the first i characters of A and the first j characters of B; the recurrence is
- if A[i] = B[j] (the last characters match), L(i, j) = 1 + L(i-1, j-1)
- otherwise, L(i, j) = max(L(i-1, j), L(i, j-1))
so the memoization array is a table, and the bottom-right cell stores the length of the common subsequence
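a sketch of the table fill (names mine):
    def lcs_length(A, B):
        L = [[0] * (len(B) + 1) for _ in range(len(A) + 1)]
        for i in range(1, len(A) + 1):
            for j in range(1, len(B) + 1):
                if A[i - 1] == B[j - 1]:              # last characters match
                    L[i][j] = 1 + L[i - 1][j - 1]
                else:
                    L[i][j] = max(L[i - 1][j], L[i][j - 1])
        return L[len(A)][len(B)]                      # bottom-right cell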
Longest Increasing Subsequence
- DP again: let \(A(i)\) be the length of the longest increasing subsequence ending at index i, with recurrence \(A(i) = 1 + \max\{A(j) : j < i,\ x_j < x_i\}\); the answer is the max over all \(A(i)\) (\(O(n^2)\) time)
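a sketch of that \(O(n^2)\) DP (names mine):
    def lis_length(x):
        if not x:
            return 0
        A = [1] * len(x)                   # A[i]: LIS length ending at index i
        for i in range(len(x)):
            for j in range(i):
                if x[j] < x[i]:
                    A[i] = max(A[i], A[j] + 1)
        return max(A)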
Min Cut Max Flow a network flow idea: in an s-t cut, the graph is separated into two sides, with s on one side and t on the other. the minimum cut's capacity = the max flow from s to t.
Ford Fulkerson Algorithm Residual Graph -> a graph \(G_f\) with the exact same nodes as G, whose edges carry the unused capacity, plus added "back edges" that allow flow to be undone.
Augmenting Path -> a path from s to t in the residual graph along which you can add flow
using the idea of residual graphs, the algo keeps augmenting while there is an augmenting path in \(G_f\)
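a sketch that finds each augmenting path by BFS (that BFS choice is the Edmonds-Karp variant, my pick here rather than anything from the course; names mine):
    from collections import deque

    def max_flow(cap, s, t):  # cap: dict u -> dict v -> capacity (mutated into residual)
        total = 0
        while True:
            parent = {s: None}             # BFS for an augmenting path in G_f
            q = deque([s])
            while q and t not in parent:
                u = q.popleft()
                for v, c in cap.get(u, {}).items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        q.append(v)
            if t not in parent:
                return total               # no augmenting path left -> done
            bottleneck, v = float('inf'), t
            while parent[v] is not None:   # walk back to find the bottleneck
                u = parent[v]
                bottleneck = min(bottleneck, cap[u][v])
                v = u
            v = t
            while parent[v] is not None:   # push flow; add back edges for undoing
                u = parent[v]
                cap[u][v] -= bottleneck
                rev = cap.setdefault(v, {})
                rev[u] = rev.get(u, 0) + bottleneck
                v = u
            total += bottleneck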
Choosing Better Path - Capacity Scaling instead of using any augmenting path, only augment along paths whose bottleneck capacity is at least a threshold \(\Delta\); start \(\Delta\) at a large power of 2 and halve it whenever no such path remains. this bounds the number of augmentations.
matching in a bipartite graph network flow can help find a matching in a bipartite graph: connect the left side L to s and the right side R to t. the edges incident to s (and to t) each have capacity 1.
then run the Ford Fulkerson algo: if the max flow equals the size of L then there is a perfect matching; otherwise the max flow is the max number of matchings possible.
this works because of conservation of flow, which says for all nodes except s, t, what goes in = what goes out, and the integrality invariant, which says the flow values are integers - you can't subdivide an integer (if the in-edge carries 1 and there are two out-edges, you can't send 0.5 on each; only one out-edge is used)
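a sketch of building the matching network, reusing the max_flow sketch above (names mine; assumes node labels don't collide with 's'/'t'):
    def bipartite_matching(L, R, edges):  # edges: list of (u in L, v in R)
        cap = {'s': {u: 1 for u in L}}     # s -> left side, capacity 1 each
        for u, v in edges:
            cap.setdefault(u, {})[v] = 1   # original edges, left -> right
        for v in R:
            cap.setdefault(v, {})['t'] = 1 # right side -> t, capacity 1 each
        return max_flow(cap, 's', 't')     # value = size of the maximum matching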
project selection network flow can solve project selection questions too: given a set of projects with their revenues, a set of components with their costs, and a list of dependencies between projects and components, we can create a network to find the optimum set of projects to take on.
- connect s to the components; each edge (s, c) has weight equal to the cost of the component
- connect the projects to t; each edge (p, t) has weight equal to the revenue of the project
- connect components and projects by the list of dependencies; each (c, p) has weight infinity
then run the Ford Fulkerson algorithm, which will create a graph with some edges saturated and some not.
take a min cut whose capacity is finite (the infinite dependency edges can never be cut); the t-side of this cut is the optimum set of projects selected
the intuition is that the finite min cut will contain outgoing edges that are either
- the revenue of each project we skip
- the cost of each component we buy
and then the max profit is just the total possible revenue minus the capacity of those edges (the min cut)
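a sketch of the construction, again reusing the max_flow sketch above (names mine; 's'/'t' assumed not to collide with project/component names):
    def project_selection(projects, components, deps):
        # projects: dict p -> revenue; components: dict c -> cost
        # deps: list of (c, p), meaning project p needs component c
        INF = float('inf')
        cap = {'s': dict(components)}          # s -> component, capacity = cost
        for c, p in deps:
            cap.setdefault(c, {})[p] = INF     # dependency edges can never be cut
        for p, rev in projects.items():
            cap.setdefault(p, {})['t'] = rev   # project -> t, capacity = revenue
        return sum(projects.values()) - max_flow(cap, 's', 't')   # max profit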
Edge Disjoint Paths the number of paths from s to t that do not share any edges
Menger's theorem says the max number of edge-disjoint paths equals the min cut from s to t, assuming each edge has capacity 1
Konig-Egervary Theorem this theorem says that in a bipartite graph, the maximum size of a matching equals the minimum size of a vertex cover (in general graphs, max matching <= min vertex cover)
Randomization this is for contention resolution: when multiple processes want to access a shared resource, we can randomize which one attempts access
Approximation
there's a linear-time way of finding a vertex cover whose size is at most twice the optimal:
\(|S| \le 2|S_{optimal}|\)
def find_vc(edges):  # edges: a set of frozenset({u, v}) pairs
    S = set()  # our vertex cover
    while edges:  # while there are edges left in G
        u, v = next(iter(edges))  # take an edge (u, v)
        S.update((u, v))  # add both u and v into set S
        edges = {e for e in edges if u not in e and v not in e}  # remove all edges touching u, v
    return S
this works because the chosen edges form a matching (a set of pairwise non-adjacent edges): any vertex cover must contain at least one endpoint of each matched edge, so \(|S| = 2|M| \le 2|S_{optimal}|\)
knapsack problem is NP Complete, but we can approximate the best value by proportionally shrinking (rounding) the item values, because finding the optimal value takes \(O(nW)\) time,
which is not polynomial in the input size, as W is a number that can be written in \(\log_2(W)\) bits
redefine the problem as: the minimum total weight with which we can obtain a solution of value >= v
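a rough sketch of that value-rounding idea (the eps parameter and rounding scheme are my assumptions about the standard FPTAS, not taken from the course notes):
    def approx_knapsack(items, W, eps=0.5):  # items: list of (weight, value); eps: error
        vmax = max(v for _, v in items)
        scale = eps * vmax / len(items)         # shrink values proportionally
        scaled = [(w, max(1, int(v / scale))) for w, v in items]
        V = sum(v for _, v in scaled)
        INF = float('inf')
        minw = [0] + [INF] * V                  # minw[val]: min weight achieving value val
        for w, v in scaled:
            for val in range(V, v - 1, -1):     # 0/1 knapsack, DP over value not weight
                if minw[val - v] + w < minw[val]:
                    minw[val] = minw[val - v] + w
        best = max(val for val in range(V + 1) if minw[val] <= W)
        return best * scale                     # approximate value, roughly >= (1-eps)*OPT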
Proof strategies
Greedy Algorithm
think in terms of promising solutions: for each round of the iteration, the invariant is \(S_{greedy} \subseteq S_{promising} \subseteq S_{greedy} +\) some future choices
the greedy algorithm always chooses something that belongs to \(S_{promising}\)
Dynamic Programming
- describe an array \(A(n)\) that stores the answer at \(n\); this will be the max, min, true/false, etc. that the question asks for
- build a recurrence that defines \(A(n)\) in terms of earlier entries; caching these results is called memoization
- build the algorithm that populates \(A(n)\) based on the recurrence
- retrieve the answer by looking at \(A(n)\): depending on the problem, the answer can be at the last spot of the array (for things like the max number of something), or we need to reconstruct the answer from the array by iterating through it
NP Completeness
- first show that the problem z is in NP; this involves checking that there is a certifier that can validate an answer in polytime
- find an NP-Complete problem X that can be reduced to z; this involves converting an instance of X into an instance with the properties of z - let's call this new converted instance Y
- we then need to prove that Y is a yes-instance of z if and only if the original instance of X is a yes-instance