Discrete Math & Algorithms

Discrete Math & Algorithms
I took these notes in winter 2017 for CME 305, taught by Professor Reza Zadeh.
\tableofcontents
Discrete math
Graph theory
A graph $G$ consists of a set of $n$ vertices and $m$ edges.
A graph is bipartite iff it contains no odd cycles iff its vertices can be partitioned into two independent sets.
A graph $G$ is a tree iff any two of the three conditions hold:
- $G$ is connected
- $G$ has no cycles
- $G$ has $n-1$ edges.
Every tree has a leaf. When removed, the result is another tree, making this useful for induction.
Trees are bipartite.
Handshaking lemma: $\sum_{v \in V} \deg v = 2m.$
Hall’s theorem: for a bipartite graph, there is a matching that covers one side $X$ if and only if for every subset $W$ of $X$ , we have $|W| \le |N(W)|$ , where $N(W)$ is the neighborhood of $W$ . (Proof uses induction on $|X|$ , weak if possible, strong in the remaining case.)
A clique is a subset of vertices whose induced subgraph is complete.
A cut is a partition of the vertices into two sets; its cut size is the number of edges that cross it.
A vertex cover is a subset of vertices such that every edge has at least one endpoint in the set.
An independent set is a subset of vertices in which no two vertices are adjacent.
A dominating set is a subset of vertices in which every vertex is adjacent to a vertex in the set (or is in the set).
A matching (independent edge set) is a subset of edges in which no two edges are incident.
- A set is a vertex cover iff its complement is an indepedent set.
- A set is an independent set iff it is a clique in the complement graph.
- König’s theorem: in a bipartite graph, the size of the minimum vertex cover is the size of the maximum matching. [1]
A Hamiltonian path/cycle is a path/cycle that visits each vertex exactly once.
- Hamiltonian paths on even-degree graphs give perfect matchings (take alternating edges).
A Eulerian path/circuit is a path/circuit that visits each edge exactly once.
- An Eulerian circuit exists iff $G$ is connected and every vertex has even degree.
  - Hierholzer’s algorithm is a proof by construction: Start at an arbitrary vertex $v$ and traverse edges until we return to $v$ (we can’t get stuck because every vertex has even degree). Repeat for every vertex with unselected edges.
  - This is a decomposition into edge-disjoint cycles.
submodularity
Asymptotic analysis
- $f(n) \in O(g(n))$ iff asymptotically, $f(n)$ is bounded above by a multiple of $g(n)$
- $f(n) \in \Theta(g(n))$ iff asymptotically, $f(n)$ is bounded above and below by multiples of $g(n)$
- $f(n) \in \Omega(g(n))$ iff asymptotically, $f(n)$ is bounded below by a multiple of $g(n)$
Proof techniques
Pigeonhole principle: if $n$ items are put into fewer than $n$ containers, then at least one of the containers contains more than one.
Probabilistic method
- A random variable $X$ assumes at least one value at least $\E[X]$ , and at least one value at most $\E[X]$ .
- Less formally, there exists something above average and something below average.
- One important case: if we split a set $S$ into two disjoint subsets, then one of them will contain at least half of $S$ .
- If a random variable $X$ satisfies a property with positive probability, then there exists some value of $X$ with that property.
Important bounds
- The max is at least the average: $\max_i x_i \ge \frac{1}{n}\sum_i x_i.$
- Union bound: $\Pr[\text{at least one of }A_i] \le \sum_i p(A_i),$ $\Pr[\text{all of }A_i] \ge 1 - \sum_i (1 - p(A_i)).$
- Exponential bound: $e^x \ge 1 + x$
  - Try to bound $\prod_i f(x_i)$ with $\prod_i \exp(x_i) = \exp(\sum_i x_i)$
- Markov inequality: for $X$ and $a$ non-negative, $p(X \ge a) \le \frac{\E[X]}{a}.$
- Chernoff bound: let $X$ be the sum of independent Bernoulli random variables with $\mu = \E[X]$ . Then $\begin{aligned} p(X > (1+\delta)\mu) &< \Big(\frac{e^\delta}{(1+\delta)^{1+\delta}}\Big)^\mu < e^{-\delta^2 \mu/3}, \\ p(X < (1-\delta)\mu) &< \Big(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\Big)^\mu < e^{-\delta^2 \mu/2}, \end{aligned}$ the simplified bounds requiring that $0 < \delta < 1$ .
  - Proof: see [3].
An Erdős-Rényi graph $G(n,p)$ is a random graph on $n$ vertices, with each edge selected independently with probability $p$ .
- The distribution of degrees is binomial, which may be approximated as a Poisson distribution $\frac{(np)^ke^{-{np}}}{k!}$ .
- If $p > (1+ \epsilon)\frac{\log n}{n}$ , then $G(n,p)$ is almost surely connected.
- Ramsey theory
Algorithmic paradigms
Greedy algorithms
A matroid is a generalization of linear independence in vector spaces. It’s a ground set $E$ with a family of subsets (independent sets) of $E$ that satisfy the following axioms:
- There exists at least one independent set (the empty set).
- Every subset of an independent set is independent.
- If $A$ and $B$ are independent sets with $|A| > |B|$ , then $B \sqcup \{ x \}$ is also independent for at least one element $x \in A$ (and $x \notin B$ ).
A maximal independent set is called a basis. All bases have the same cardinality.
Examples of matroids abound:
- First, note that a graph’s vertices and its independent sets do not form a matroid.
- Graphic matroid: A graph’s edges and acyclic sets. The bases are spanning trees (when the graph is connected)
- Vector matroid: A finite subset of a vector space and its linearly independent sets. The bases are bases. (Note: we can present the finite subset as a matrix, hence “matroid”.)
- Uniform matroid: A finite set and its subsets of cardinality $k$ or less. The bases are sets of cardinality $k$ .
The greedy matroid algorithm, a generalization of Kruskal’s algorithm, optimizes weighted matroids.
Divide and conquer
Dynamic programming
Randomized algorithms
hitting time: $h_{ij}$ is expected number of steps to go from vertex $i$ to vertex $j$
commute time: $C_{ij} = h_{ij} + h_{ji}$
$C_i$ is the expected number of steps to visit all nodes, starting from $i$
Cover time $C(G) = \max_i C_i$
$C_{ij} = 2mR_{ij}$ , where $m$ is the number of edges, $R_{ij}$ is effective resistance
$h_{ij}$ is voltage from $i$ to $j$ if we inject $\deg u$ units of current in to each node $u$ and extract $2m$ units of current from $j$ .
$C(G) \le 2m(n-1)$ look at a spanning tree, and in order traversal $C(G) \le \sum h_{ab} = \sum C_{ab} \le 2m (n-1)$
$mR(G) \le C(G)$ where $R(G) = \max_{a,b} R_{ab},$ $C(G) \ge \max(h_{ij}, h_{ji}) \ge \tfrac{1}{2}C_{ij} = mR_{ij} = mR(G)$
$C(G) \le 2e^3 m R(G) \log n + n$
- Random walks
- Spectral graph theory
- Spectral sparsification
Graph algorithms
Search algorithms
Breadth-first search
Depth-first search
Shortest path
Dijkstra’s algorithm
A* search
Bellman-Ford
Floyd-Warshall
Maximum matching
Edmonds’s blossom algorithm
For bipartite graphs, use max-flow. There is a bijection between (non-infinite) cuts and vertex covers. See [1].
Minimum spanning tree
The minimum spanning tree problem finds a spanning tree with minimum weight. A spanning tree is a cycleless connected subgraph with $n$ vertices. Adding any edge to a spanning tree yields a cycle; deleting any edge from a spanning tree partitions the vertices into two sets.
Some important properties of minimum spanning trees:
- If the edge weights of a graph are distinct, there is a unique minimum spanning tree.
- Cut property: out of all edges crossing a particular cut, each minimum-weight edge is part of some MST.
- Cycle property: in any cycle of the graph, the maximum-weight edge (if unique) cannot be part of a MST.
Kruskal’s algorithm
Kruskal’s algorithm is an algorithm that builds a minimum spanning tree by greedily taking increasingly heavy edges that don’t create a cycle. Concretely:
- Sort the list of edges in order of increasing weight.
- Go through the list once, incrementally building a tree, taking an edge if it does not create a cycle.
The dominating cost is the sort, with $O(m \log m)$ complexity. The building of the tree can be done in $O(n \log n)$ or $O(m \alpha(n))$ time with the union-find data structure, where $\alpha(n)$ is the inverse Ackermann function.
Prim’s algorithm
Max-flow/min-cut
The maximum flow problem involves finding the maximum possible $s$ - $t$ flow in a weighted graph. An $s$ - $t$ flow is a flow from a source $s$ to a sink $t$ in a weighted graph (with weights interpreted as capacities), and its size is the amount leaving $s$ (or indeed, the net amount leaving any subset of vertices containing $s$ ).
The minimum cut problem involves finding an $s$ - $t$ cut of minimum cut size. An $s$ - $t$ cut is a partition of vertices $(S,T)$ with $s \in S$ and $t \in T$ , and its cut size is the sum of the capacities of the directed edges from $S$ to $T$ .
Notice that the size of any $s$ - $t$ flow is at most the size of any $s$ - $t$ cut, since all the flow must cross the cut and cannot exceed the capacity of the edges crossing the cut. In particular, this means that if the size of an $s$ - $t$ flow equals the size of an $s$ - $t$ cut, the flow is a maximum flow, and the cut is a minimum cut.
Ford-Fulkerson
Ford-Fulkerson is the following algorithm for computing the max flow:
- Start with zero flow.
- For each edge, add an opposing edge if it doesn’t already exist, with $0$ capacity.
- While there is a directed path $\gamma$ from $s$ to $t$ with positive capacity, we:
  - send as much additional flow $f$ along $\gamma$ as possible,
  - reduce the capacities of the edges of $\gamma$ by $f$ ,
  - increase the capacities of edges that oppose $\gamma$ by $f$ .
The modified graph is called the residual graph, and the backwards edges allows us to “undo” choices of flow. When this terminates, return the resulting flow.
If the capacities are integers, then the algorithm always terminates, since the flow size (the amount leaving $s$ ) increases but is bounded by the total capacity of the edges leaving $s$ . (In this case, we see that the max flow is always integral.)
If the algorithm terminates, then it returns a maximum flow. To show this, let $S$ be the nodes reachable from $s$ in the residual graph at the end of the algorithm, ignoring edges of capacity $0$ . Consider the edges crossing the cut $(S,T)$ in the original graph.
- The edges from $S$ to $T$ are fully saturated with flow, since the corresponding edges in the residual graph have been reduced to $0$ capacity.
- The edges from $T$ to $S$ are empty, since the residual edges from $S$ to $T$ are empty.
Hence the net flow (the flow size) equals this cut’s cut size, so this flow is a maximum flow, and this cut is a minimum cut.
If $C$ is the maximum flow, then there are at most $C$ iterations, each with $O(m)$ work. This algorithm is $O(mC)$ .
Edmonds-Karp
Edmonds-Karp is Ford-Fulkerson, where the paths are chosen in order of increasing length using breadth-first search. It runs in $O(m^2n)$ time, which comes from [todo].
Push-relabel
Note: in applications, use $\infty$ ’s liberally! They make analysis easier.
Karger’s algorithm
Karger’s algorithm for global min-cut is the following probabilistic algorithm. Generate random cuts as follows, keeping track of the minimum one seen so far. Until there are only two supervertices, contract a uniformly chosen edge (preserving multi-edges but removing self-edges). Return the cut that partitions the two supervertices.
(This is equivalent to assigning random weights and cutting a random edge of the MST.)
Let’s calculate the probability of returning a global minimum cut. Let $(S,T)$ be a global minimum cut, with cut size $k$ . Note that the degree of each vertex is at least $k$ (otherwise we could take just that vertex to find a minimum cut), and $m \ge \frac{1}{2}kn$ by the handshake lemma. The probability that we manage to avoid contracting an edge in $(S,T)$ on the first step is $1 - \frac{k}{m} \ge 1 - \frac{2}{n}$ , and $1 - \frac{k}{m-j} \ge 1 - \frac{2}{n+1-j}$ on the $j$ th step, combining into a probability of at least $\binom{n}{2}^{-1}$ when there are only $2$ nodes left. Thus we require $O(n^2)$ iterations of $O(n^2)$ work each.
Instead of generating each sequence of cuts independently, we can generate $n^2$ sequences at once with recursion. This process yields a $O(n^2 \log^2 n)$ algorithm.
NP-complete problems
Important complexity classes of decision problems:
- P: easy to solve (polynomial time).
- NP: easy to check (polynomial time verifiable). (Important: NP includes P!)
- NP-hard: can solve any NP problem (any NP problem reduces to it in polynomial time).
- NP-complete: NP-hard and NP.
When $A$ reduces to $B$ in polynomial time (we can use $B$ to solve $A$ ), we write $A \le B$ , since we consider $B$ to be harder than $A$ . To prove that an algorithm is NP-hard, show that an NP-hard problem reduces to it. If a problem is NP-hard, there are two main strategies to make it feasible: approximation algorithms, or restricting to an easy case.
Satisfiability
The boolean satisfiability problem (SAT) is the problem of determining whether a boolean statement is satisfiable. The Cook-Levin theorem says that SAT is NP-complete. SAT reduces to 3SAT, where the boolean statement is restricted to $3$ -conjunctive normal form, e.g. $(x_1 \vee x_2 \vee \lnot x_3) \wedge \cdots \wedge (\lnot x_2 \vee x_3 \vee x_4),$ by noting that $a \vee b \vee c \vee d$ can be satisfied iff $(a \vee b \vee x) \wedge (\lnot x \vee c \vee d)$ can be.
There is a $\frac{1}{2}$ -approximation for Max-3SAT: choose all true or all false. Each clause is satisfied by at least one of the two assignments; hence at least half of the clauses are satisfied by one of two assignments. This can be improved to $\frac{7}{8}$ by choosing randomly, if each clause has exactly $3$ .
There is also a polynomial-time randomized algorithm for 2SAT: at each step, choose an unsatisfied clause and flip one literal in it (choose either literal with probability $\frac{1}{2}$ ). If we compare the current assignment to a correct assignment, one or both literals will be wrong, so flipping one of them will increase the number of correct variables by $1$ with probability at least $\frac{1}{2}$ . Thus this is a random walk on a chain from $0$ to $n$ , where $n$ is the number of variables and the current position is the number of correct variables. The cover time is polynomial, making this algorithm polynomial-time. This algorithm isn’t the most efficient, but is easily generalizable to 3SAT (although, in that case, the algorithm becomes exponential, since the transition probabilities becomes $\frac{1}{3}$ and $\frac{2}{3}$ ).
Covering and packing
The following problems are all related and NP-complete:
- Vertex Cover: is there a vertex cover of (at most) size $k$ ?
- Independent Set: is there an independent set of (at least) size $k$ ?
- Clique: is there a clique of (at least) size $k$ ?
- Dominating Set: is there a dominating set of (at most) size $k$ ?
- Set Cover: given a universe $U$ and a set of subsets of $U$ , is there a cover of $U$ using $k$ of the subsets?
The reductions are:
- 3SAT to Independent Set: introduce a triangle for each clause and connect $a$ and $\neg a$ .
- Vertex Cover, Independent Set, and Clique: straightforward.
- Vertex Cover to Dominating Set: introduce an intermediate node $w$ for each edge $(u,v)$ , and connect them into a triangle.
- Dominating Set to Set Cover: closed neighborhoods for each vertex are the subsets; set of vertices is the universe.
- Vertex Cover to Set Cover: sets of incident edges for each vertex are the subsets; set of edges is the universe.
Minimum vertex cover/maximum independent set
There is a $2$ -approximation to Minimum Vertex Cover: find a maximal matching. An optimal cover contains at least one endpoint for each matching edge, so the maximal matching can only have twice as many.
Another $2$ -approximation is by integer program relaxation: write the associated integer program; solve the linear program by relaxing the integer constraint; round the variables. Interestingly, the dual to this linear program is corresponds to maximum matching [2].
Minimum vertex cover cannot be approximated better than $2$ (unless unique games conjecture is false). Maximum independent set cannot be approximated to within a constant factor (unless P = NP).
For a bipartite graph, use max-flow (see above). For a forest (which is bipartite!), you can also take a leaf as part of the cover, remove it and its parent, and recurse.
Set cover/dominating set
Minimum Set Cover has an $H_n$ -approximation: greedily choose the set that contains the most uncovered elements.
Proof #1: it’s more intuitive to work with the weighted version, where the sets have weights (“prices”), and at each step, we choose (“purchase”) the set with the lowest price per uncovered (“unobtained”) element. We try to minimize the total price (in set cover, the price is $1$ per set). Number the elements in the order they are chosen by the algorithm. Then the $k$ th item is purchased for some effective price (the cheapest possible, by greediness). However, we could have purchased that item for an effective price of $\frac{\text{OPT}}{K}$ , where $K$ is the number of remaining items, by just purchasing all the sets in the optimal set cover. Thus the cost of the $k$ th item is $\text{ALG}_k \le \frac{\text{OPT}}{K} \le \frac{\text{OPT}}{n-k+1},$ so $\text{ALG} \le H_n\, \text{OPT}$ . Minimum Dominating Set can use the same algorithm.
Proof #2: At a particular step, suppose there are $K$ uncovered elements. Because choosing all the optimal sets would cover those $K$ elements with $\text{OPT}$ sets, there exists one set that covers at least $\frac{K}{\text{OPT}}$ elements (probabilistic method). Therefore, the algorithm’s choice covers at least that many elements, so that after the step there are at most $K - \frac{K}{\text{OPT}}$ elements remaining. After $m$ steps, there are at most $n\Big( 1 - \frac{1}{\text{OPT}}\Big)^{m}$ elements remaining. Setting $m = \text{OPT}\log n$ makes this number less than $1$ , indicating that the algorithm will have terminated by then.
Sequencing
The following problems are NP-hard:
- Hamiltonian Cycle: Is there a Hamiltonian cycle in the graph?
- Hamiltonian Path: Is there a Hamiltonian cycle in the graph?
- Traveling Salesman: In a weighted complete graph, find a tour of all the nodes of minimum distance.
The reductions are:
- 3SAT to Hamiltonian Cycle: complicated.
- Hamiltonian Cycle to Hamiltonian Path: pick an edge $(u,v)$ , adjoin $u'$ to $u$ and $v'$ to $v$ . Repeat for all edges.
- Hamiltonian Cycle to Traveling Salesman: weight the edges with $1$ and complete the graph, weighting the added edges with $2$ . There is a Hamiltonian cycle iff there is a tour of length $n$ .
Traveling Salesman cannot be approximated, since any approximation will still find Hamiltonian cycles. If the distance is metric (triangle inequality and symmetric), then the Christofides algorithm is a $\frac{3}{2}$ -approximation algorithm. It consists of the following steps:
- Compute a MST a $T$ . Note that $w(T) \le \text{OPT}$ , since $\text{OPT}$ contains a spanning tree, so going around the MST and skipping (shortcutting) duplicate nodes is already a $2$ -approximation.
- Let $O$ be the odd-degree vertices in the subgraph $T$ . By the handshaking lemma, $|O|$ is even. Compute a minimum-weight perfect matching $M$ of the vertices in $O$ . The intuition is that it might be cheaper to connect these up another way instead of just blindly shortcutting. Indeed, shortcutting the optimal tour to a tour on just $O$ yields two disjoint perfect matchings on $O$ , so $w(M) \le \frac{\text{OPT}}{2}$ .
- Combine $M$ and $T$ into a multigraph and compute an Eulerian cycle on it. Shortcut it to yield the final tour.
Numerical
Integer linear programming?
Partitioning
Max cut
Subset Sum
Bin packing - any fit
Randomized max-cut approx, deterministic max-cut approx, ILP, Goemans-Williamson
Matroid/submodularity approx
Linear programming (totally unimodular)
References
[1] http://www.cs.toronto.edu/~siavosh/csc373h/files/TN6.pdf
[2] http://theory.epfl.ch/courses/topicstcs/Lecture52015.pdf
[3] Motwani and Raghavan, pg. 68.