COMS 4995: Randomized Algorithms
- Sept 4: Welcome to COMS 4995!
- Nov 7: The final (non-cumulative) exam will be in class during the last meeting time, Monday December 9, 10:10-11:25am.
Instructor:
- Tim Roughgarden (Office hours: Mondays 11:30am-12:30pm in Mudd 410. Email: tr@cs.columbia.edu.)
Course Assistants:
- David Cheikhi (Office hours: Mondays 12:30-2:30pm and Tuesdays noon-2pm in the TA room, Mudd first floor. Email: d.cheikhi@columbia.edu.)
- Chengyu Lin (Office hours: Tuesdays 2-3pm. Email: chengyu@cs.columbia.edu.)
Time/location: 10:10-11:25 AM Mon/Wed in Mudd 545.
Discussion site: We'll be using Piazza.
Prerequisites: Undergraduate algorithms (COMS 4231) or equivalent.
Course Description:
Randomness pervades the natural processes around us, from the
formation of networks, to genetic recombination, to quantum
physics. Randomness is also a powerful tool that can be leveraged to
create algorithms and data structures which, in many cases, are more
efficient and simpler than their deterministic counterparts. This
course covers the key tools of probabilistic analysis, and
applications of these tools to understand the behaviors of random
processes and algorithms. Emphasis is on theoretical foundations,
though we will apply this theory broadly, discussing applications in
machine learning and data analysis, networking, and systems.
Textbook: There is no required textbook for the course;
lectures are the sole required source of content.
Supplementary reading will be posted as part of the lecture schedule, below.
You might, however,
find one or more of the following books helpful:
- Probability and Computing: Randomized Algorithms and
Probabilistic Analysis by Mitzenmacher/Upfal.
- Randomized Algorithms by Motwani/Raghavan.
- The Probabilistic Method by Alon/Spencer.
- Concentration of Measure for the Analysis of Randomized
Algorithms by Dubhashi/Panconesi.
- Exercise Sets (0%):
Exercise sets will be handed out weekly and are meant to help
you keep up with the course material.
Do not turn in your solutions. Think them over, and discuss any
that you cannot answer with fellow students or the course staff.
At least half of each exam will consist of
problems from these exercise sets or variations thereof.
- Problem Sets (60%):
There will be 5 problem sets. Here is the schedule:
- Problem Set Policies:
- These are challenging, and you are strongly encouraged to form
groups of up to three students. Each group should turn in a single
write-up (all students in the group receive the same score).
- You can form different groups for different problem sets.
- You can discuss problems with students from other groups
verbally and at a high level only.
- Except where otherwise noted, you may refer only to your course
notes and to the textbooks and research papers listed on the course
Web page. You may not refer to textbooks, handouts, or research
papers that are not listed on the course home page. If you do use
any approved sources, make sure you cite them appropriately, and
make sure that all your words are your own.
- You are strongly encouraged to use LaTeX to typeset your write-up.
Here's a LaTeX template that you can use to
type up solutions. Here
and here are good
introductions to LaTeX.
- Honor code: We expect you to abide by the computer science department's
policies and procedures regarding academic honesty.
- Submission instructions:
We are using Gradescope for the homework submissions. Go to
www.gradescope.com to either login or create a new
account. Use the
course code 9D6V5E to register for this class. Only one group member
needs to submit the assignment. When submitting, please remember to
add all group member names onto Gradescope.
- Late Days:
- One late day equals a 24-hour extension.
- Each student has three free late days.
- At most two late days can be applied to a single assignment;
after the Friday following the original due date, no solutions
will be accepted.
- Each late day used beyond a student's free late days results in
a 25% penalty.
- Example: a student has one free late day remaining, but their
group uses two late days on a Problem Set. If the group's write-up
earns p points, the student receives a final score of .75*p points
for the assignment.
- Exams (40%): There will be two exams.
The first one is during class on Wed Oct 23, the second one
during class on Mon Dec 9.
The second exam will be non-cumulative, 75 minutes long, and
weighted equally with the first.
- At least half of the questions will be exercise set
questions or variations thereof.
- The exam is closed-book/closed-computer; however, you are allowed
to bring one double-sided sheet (2 pages) of notes, which must be
handwritten and prepared by yourself.
Lecture Schedule
- Lecture 1 (Wed Sept 4): Course goals and deliverables.
Random sampling and abundance.
Randomized primality testing and the Miller-Rabin test.
Supplementary reading/videos:
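To make the lecture's headline algorithm concrete, here is a minimal Python sketch of the Miller-Rabin test; the round count k and the small-prime pre-check are illustrative choices, not part of the lecture's presentation:

    import random

    def is_probably_prime(n, k=20):
        # Miller-Rabin: returns False only if n is composite; a composite n
        # survives each random round with probability at most 1/4, so k
        # rounds drive the error probability below 4**(-k).
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        # Write n - 1 = 2^r * d with d odd.
        d, r = n - 1, 0
        while d % 2 == 0:
            d //= 2
            r += 1
        for _ in range(k):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x == 1 or x == n - 1:
                continue
            for _ in range(r - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False  # a is a witness that n is composite
        return True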
- Lecture 2 (Mon Sept 9):
Lightning review of basic probability concepts. Linearity of expectation. A 7/8-approximation for MAX 3SAT. Other constraint satisfaction problems (CSPs). The Goemans-Williamson randomized hyperplane rounding algorithm for the Maximum Cut problem.
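The 7/8-approximation for MAX 3SAT is simply "pick a uniformly random assignment": a clause with three distinct literals is falsified with probability exactly 1/8, so by linearity of expectation a random assignment satisfies 7/8 of the clauses in expectation. A minimal sketch, assuming a DIMACS-style encoding (literal v means variable |v| is True, -v means it is False):

    import random

    def random_assignment_value(clauses, n):
        # Each clause is a tuple of nonzero ints over variables 1..n.
        assignment = [random.random() < 0.5 for _ in range(n + 1)]
        satisfied = 0
        for clause in clauses:
            if any((lit > 0) == assignment[abs(lit)] for lit in clause):
                satisfied += 1
        return satisfied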
- Lecture 3 (Wed Sept 11): Karger's randomized
contraction algorithm for the Min Cut problem.
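A minimal sketch of one run of Karger's contraction algorithm, assuming a connected graph given as a list of (u, v) edge pairs on vertices 0..n-1; the union-find bookkeeping is an implementation convenience, not part of the algorithm's description:

    import random

    def karger_cut(edges, n):
        parent = list(range(n))

        def find(x):  # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        components = n
        while components > 2:
            # Sampling from the original list and skipping self-loops is
            # equivalent to contracting a uniformly random remaining edge.
            u, v = edges[random.randrange(len(edges))]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
        # Edges whose endpoints lie in different supervertices cross the cut.
        return sum(1 for u, v in edges if find(u) != find(v))

A single run finds a fixed min cut with probability at least 2/(n(n-1)), so repeating O(n^2 log n) times and keeping the best cut succeeds with high probability.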
- Lecture 4 (Mon Sept 16): Hash functions as approximate random oracles. Hash table review (chaining and open addressing). Bloom filters: implementation and heuristic analysis.
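A minimal Bloom filter sketch; salted SHA-256 stands in for the k independent random-oracle hash functions assumed by the heuristic analysis, and the default parameters are illustrative:

    import hashlib

    class BloomFilter:
        # False positives are possible; false negatives are not.
        def __init__(self, m=1 << 20, k=5):
            self.m, self.k = m, k
            self.bits = bytearray(m // 8 + 1)

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))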
- Lecture 5 (Wed Sept 18): Heavy hitters. The count-min sketch. Relaxing the random oracle assumption (theory and practice).
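A minimal count-min sketch, with Python's built-in hash plus per-row seeds standing in for d independent hash functions; in the standard analysis, width w = e/eps and depth d = ln(1/delta) give estimates that overshoot the true count by at most eps times the stream length, with probability at least 1 - delta:

    import random

    class CountMinSketch:
        def __init__(self, w=1000, d=5, seed=0):
            rng = random.Random(seed)
            self.w, self.d = w, d
            self.table = [[0] * w for _ in range(d)]
            self.seeds = [rng.getrandbits(64) for _ in range(d)]

        def _index(self, row, item):
            return hash((self.seeds[row], item)) % self.w

        def add(self, item, count=1):
            for row in range(self.d):
                self.table[row][self._index(row, item)] += count

        def estimate(self, item):
            # Collisions only inflate counters, so the minimum over rows
            # is the least-contaminated (over)estimate.
            return min(self.table[row][self._index(row, item)]
                       for row in range(self.d))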
- Lecture 6 (Mon Sept 23):
The AMS streaming algorithm for F_2 estimation.
Supplementary references:
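A minimal sketch of the AMS estimator for F_2 = sum_i f_i^2. For clarity it stores a full random-sign table per trial, which forgoes the algorithm's space savings; the actual algorithm draws the signs from a 4-wise independent hash family:

    import random

    def ams_f2_estimate(stream, trials=100, seed=0):
        # Each trial keeps one counter Z = sum_i s(i) * f_i with random
        # signs s(i) in {-1, +1}; E[Z^2] = F_2, and averaging independent
        # trials reduces the variance.
        rng = random.Random(seed)
        signs = [{} for _ in range(trials)]
        z = [0] * trials
        for item in stream:
            for t in range(trials):
                if item not in signs[t]:
                    signs[t][item] = rng.choice((-1, 1))
                z[t] += signs[t][item]
        return sum(v * v for v in z) / trials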
- Lecture 7 (Wed Sept 25):
Pseudorandom data and hashing. Why simple hash functions work as well
as fully random ones in practice.
References:
- Lecture 8 (Mon Sept 30):
Locality-sensitive hashing. Near-duplicate detection.
Jaccard similarity and min-wise hashing. Cosine similarity and a corresponding LSH family.
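A minimal min-wise hashing sketch, with salted built-in hashing standing in for random permutations of the universe: two sets' signatures agree in any fixed coordinate with probability equal to their Jaccard similarity, so the fraction of agreeing coordinates is an unbiased estimate of it:

    import random

    def minhash_signature(items, num_hashes=100, seed=0):
        rng = random.Random(seed)
        salts = [rng.getrandbits(64) for _ in range(num_hashes)]
        return [min(hash((salt, x)) for x in items) for salt in salts]

    def estimate_jaccard(sig_a, sig_b):
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)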
- Lecture 9 (Wed Oct 2):
Concentration inequalities and the Gaussian baseline.
From Chebyshev to the Weak Law of Large Numbers.
Beating Chebyshev: inspiration from the Central Limit Theorem.
Large deviation bounds for Gaussians.
- This material appears in many textbooks, including
Chapter 9 of the Mitzenmacher-Upfal book listed above.
- Lecture 10 (Mon Oct 7):
Chernoff bounds and their proofs. Applications: correctness amplification for randomized algorithms with two-sided error; the expected maximum search time in a hash table with chaining.
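The amplification application in code form: a hedged sketch that wraps any randomized decision procedure whose error probability on each input is below 1/2 (say 1/3) and majority-votes over independent runs; a Chernoff bound makes the majority's error probability exponentially small in the number of trials:

    def amplify(randomized_decider, x, trials=101):
        # randomized_decider(x) returns a (possibly wrong) boolean; with
        # per-run error 1/3, the majority vote errs with probability
        # exp(-Omega(trials)).
        votes = sum(1 for _ in range(trials) if randomized_decider(x))
        return votes > trials // 2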
- Lecture 11 (Wed Oct 9): The Johnson-Lindenstrauss (JL) Lemma.
Dimensionality reduction and its applications. A tail bound for sums of chi-squared random variables.
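A minimal sketch of JL-style dimensionality reduction with a random Gaussian matrix, written in plain Python (rather than a linear-algebra library) for self-containedness:

    import math
    import random

    def jl_project(points, k, seed=0):
        # Map each d-dimensional point x to (1/sqrt(k)) * G x, where G is
        # a k x d matrix of independent N(0, 1) entries. With
        # k = O(log(n) / eps^2), all pairwise distances among n points are
        # preserved up to a 1 +/- eps factor with high probability.
        rng = random.Random(seed)
        d = len(points[0])
        g = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
        scale = 1.0 / math.sqrt(k)
        return [[scale * sum(g[i][j] * x[j] for j in range(d))
                 for i in range(k)]
                for x in points]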
- Lecture 12 (Mon Oct 14): Killer apps of Chernoff bounds.
Randomized rounding and low-congestion routing. Erdos-Renyi random graphs and recovering planted bisections.
- For lecture notes on randomized rounding, see Section 4 of these lecture notes.
- Random graphs and planted bisections are discussed in Sections 2.1--2.3 of these lecture notes.
- Lecture 13 (Wed Oct 16):
Killer apps of Chernoff bounds in complexity theory. Newman's theorem in communication complexity (simulating public coins with private coins with logarithmic overhead). Adleman's theorem in computational complexity (i.e., BPP in P/poly).
- For the randomized one-way EQUALITY protocol, see Section 1 of these lecture notes.
- For Newman's theorem, see Section 3.2 of these lecture notes.
- For Adleman's theorem, see this video by Ryan O'Donnell (background starting at 1:00:55, Adleman's theorem at 1:07:15).
- Lecture [N/A] (Mon Oct 21):
Midterm review (optional).
- Lecture [N/A] (Wed Oct 23):
In-class midterm exam.
- Lecture 14 (Mon Oct 28):
PAC learning. Uniform convergence for finite hypothesis classes.
- Lecture 15 (Wed Oct 30):
Growth functions. Uniform convergence for hypothesis classes with polynomially bounded growth functions.
- Lecture [N/A] (Mon Nov 4):
No class (academic holiday).
- Lecture 16 (Wed Nov 6):
VC dimension. Sauer's Lemma. Intro to online learning.
- For VC dimension and Sauer's Lemma, see
these lecture notes by Nina Balcan.
- For an intro to online learning, see Section 2 of these
lecture notes.
- Lecture 17 (Mon Nov 11):
Regret-minimization in online learning.
Geometric random variables and the FTPL (Follow-the-Perturbed-Leader) algorithm.
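A hedged sketch of Follow-the-Perturbed-Leader over n actions, in the Kalai-Vempala style: each action's cumulative loss is perturbed once, up front, and the algorithm always plays the perturbed leader. Exponential perturbations stand in here for the geometric ones from lecture, and the rate epsilon is an illustrative tuning parameter:

    import random

    def ftpl_plays(loss_rounds, epsilon=0.1, seed=0):
        # loss_rounds is a sequence of per-round loss vectors, one entry
        # per action; yields the action played in each round.
        rng = random.Random(seed)
        n = len(loss_rounds[0])
        perturbation = [rng.expovariate(epsilon) for _ in range(n)]
        totals = [0.0] * n
        for losses in loss_rounds:
            yield min(range(n), key=lambda i: totals[i] - perturbation[i])
            for i in range(n):
                totals[i] += losses[i]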
- Lecture 18 (Wed Nov 13): Introduction to Markov chains. Existence and uniqueness of stationary distributions. Application: a simple sublinear-time algorithm for computing a perfect matching in a regular bipartite graph.
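As a small numeric companion to the existence/uniqueness discussion, here is a sketch that approximates the stationary distribution by repeatedly applying the transition matrix to a starting distribution; convergence assumes the chain is irreducible and aperiodic:

    def stationary_distribution(P, iters=1000):
        # P is a row-stochastic transition matrix (rows sum to 1); the
        # result approximates the unique pi satisfying pi = pi P.
        n = len(P)
        pi = [1.0 / n] * n
        for _ in range(iters):
            pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        return pi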
- Lecture 19 (Mon Nov 18): More on random walks. Cover times.
A randomized log-space algorithm for undirected connectivity. Introduction to Markov Chain Monte Carlo. Supplementary reading:
- Lecture 20 (Wed Nov 20): Expansion
characterizes rapid mixing of random walks on undirected graphs.
- Lecture 21 (Mon Nov 25): Power of two choices.
See Section 17.1 of the Mitzenmacher-Upfal book, listed above.
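The phenomenon fits in a few lines. In the sketch below, each ball picks d bins uniformly at random and lands in the least loaded one; the maximum load drops from roughly log n / log log n at d=1 to roughly log log n at d=2:

    import random

    def max_load(n_balls, n_bins, d, seed=0):
        rng = random.Random(seed)
        loads = [0] * n_bins
        for _ in range(n_balls):
            choices = [rng.randrange(n_bins) for _ in range(d)]
            loads[min(choices, key=loads.__getitem__)] += 1
        return max(loads)

Comparing max_load(10**5, 10**5, 1) against max_load(10**5, 10**5, 2) makes the gap easy to see.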
- Lecture [N/A] (Wed Nov 27):
No class (Thanksgiving).
- Lecture 22 (Mon Dec 2): Probabilistic method/Lovasz Local Lemma.
- Lecture 23 (Wed Dec 4): Schoning's randomized algorithm for 3-SAT.
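A minimal sketch of Schoning's algorithm, with literals encoded as in the MAX 3SAT sketch above and an illustrative restart budget; on a satisfiable instance each restart succeeds with probability roughly (3/4)^n, so O((4/3)^n) restarts suffice in expectation:

    import random

    def schoning_3sat(clauses, n, restarts=1000, seed=0):
        rng = random.Random(seed)
        for _ in range(restarts):
            # Random restart, then at most 3n random local flips.
            assign = [rng.random() < 0.5 for _ in range(n + 1)]
            for _ in range(3 * n):
                unsat = [c for c in clauses
                         if not any((lit > 0) == assign[abs(lit)] for lit in c)]
                if not unsat:
                    return assign  # satisfying assignment found
                # Flip a uniformly random literal of some unsatisfied clause.
                lit = rng.choice(rng.choice(unsat))
                assign[abs(lit)] = not assign[abs(lit)]
        return None  # no guarantee of unsatisfiability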
- Lecture [N/A] (Mon Dec 9):
In-class final exam (non-cumulative).