COMS 4995: Randomized Algorithms
- Sept 4: Welcome to COMS 4995!
- Nov 7: The final (non-cumulative) exam will be in class during the last meeting time, Monday December 9, 10:10-11:25am.
Instructor:
- Tim Roughgarden (Office hours: Mondays 11:30am-12:30pm in Mudd 410. Email: tr@cs.columbia.edu.)
Course Assistants:
- David Cheikhi (Office hours: Mondays 12:30-2:30pm and Tuesdays noon-2pm in the TA room, Mudd first floor. Email: d.cheikhi@columbia.edu.)
- Chengyu Lin (Office hours: Tuesdays 2-3pm. Email: chengyu@cs.columbia.edu.)
Time/location: 10:10-11:25 AM Mon/Wed in Mudd 545.
Discussion site: We'll be using Piazza.
Prerequisites: Undergraduate algorithms (COMS 4231) or equivalent.
Course Description:
Randomness pervades the natural processes around us, from the
formation of networks, to genetic recombination, to quantum
physics. Randomness is also a powerful tool that can be leveraged to
create algorithms and data structures which, in many cases, are more
efficient and simpler than their deterministic counterparts. This
course covers the key tools of probabilistic analysis, and
applications of these tools to understand the behaviors of random
processes and algorithms. Emphasis is on theoretical foundations,
though we will apply this theory broadly, discussing applications in
machine learning and data analysis, networking, and systems.
Textbook: There is no required textbook for the course;
lectures are the sole required source of content.
Supplementary reading will be posted as part of the lecture schedule, below.
You might, however,
find one or more of the following books helpful:
- Probability and Computing: Randomized Algorithms and
Probabilistic Analysis by Mitzenmacher/Upfal.
- Randomized Algorithms by Motwani/Raghavan.
- The Probabilistic Method by Alon/Spencer.
- Concentration of Measure for the Analysis of Randomized
Algorithms by Dubhashi/Panconesi.
- Exercise Sets (0%):
Exercise sets will be handed out weekly and are meant to help
you keep up with the course material.
Do not turn in your solutions. Think them over, and discuss any
that you cannot answer with fellow students or the course staff.
At least half of each exam will consist of
problems from these exercise sets or variations thereof.
- Problem Sets (60%):
There will be 5 problem sets. Here is the schedule:
- Problem Set Policies:
- These are challenging, and you are strongly encouraged to form
groups of up to three students. Each group should turn in a single
write-up (all students in the group receive the same score).
- You can form different groups for different problem sets.
- You can discuss problems with students from other groups
verbally and at a high level only.
- Except where otherwise noted, you may refer only to your course
notes and to the textbooks and research papers listed on the course
Web page. You may not refer to textbooks, handouts, or research
papers that are not listed on the course home page. If you do use
any approved sources, make sure you cite them appropriately, and
make sure that all your words are your own.
- You are strongly encouraged to use LaTeX to typeset your write-up.
Here's a LaTeX template that you can use to
type up solutions. Here
and here are good
introductions to LaTeX.
- Honor code: We expect you to abide by the computer science department's
policies and procedures regarding academic honesty.
- Submission instructions:
We are using Gradescope for the homework submissions. Go to
www.gradescope.com to either login or create a new
account. Use the
course code 9D6V5E to register for this class. Only one group member
needs to submit the assignment. When submitting, please remember to
add all group member names onto Gradescope.
- Late Days:
- One late day equals a 24-hour extension.
- Each student has three free late days.
- At most two late days can be applied to a single assignment;
after the Friday following the original due date, no solutions
will be accepted.
- Each late day used beyond a student's free late days results in
a 25% penalty.
- Example: a student has one free late day remaining, but their
group uses two late days on a Problem Set. If the group's write-up
earns p points, the student receives a final score of .75*p points
for the assignment.
- Exams (40%): There will be two exams.
The first one is during class on Wed Oct 23, the second one
during class on Mon Dec 9.
The second exam will be non-cumulative, 75 minutes long, and
weighted equally with the first.
- At least half of the questions will be exercise set
questions or variations thereof.
- The exam is closed-book/closed-computer; however, you are allowed
to bring one double-sided sheet (2 pages) of notes, which must be
handwritten and prepared by yourself.
Lecture Schedule
- Lecture 1 (Wed Sept 4): Course goals and deliverables.
Random sampling and abundance.
Randomized primality testing and the Miller-Rabin test.
Supplementary reading/videos:
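To make the lecture's headline algorithm concrete, here is a minimal Python sketch of the Miller-Rabin test; the round count k and the small-prime pre-check are illustrative choices, not part of the lecture's presentation:

    import random

    def is_probably_prime(n, k=20):
        # Miller-Rabin: returns False only if n is composite; a composite n
        # survives each random round with probability at most 1/4, so k
        # rounds drive the error probability below 4**(-k).
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        # Write n - 1 = 2^r * d with d odd.
        d, r = n - 1, 0
        while d % 2 == 0:
            d //= 2
            r += 1
        for _ in range(k):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x == 1 or x == n - 1:
                continue
            for _ in range(r - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False  # a is a witness that n is composite
        return True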
- Lecture 2 (Mon Sept 9):
Lightning review of basic probability concepts. Linearity of expectation. A 7/8-approximation for MAX 3SAT. Other constraint satisfaction problems (CSPs). The Goemans-Williamson randomized hyperplane rounding algorithm for the Maximum Cut problem.
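The 7/8-approximation for MAX 3SAT is simply "pick a uniformly random assignment": a clause with three distinct literals is falsified with probability exactly 1/8, so by linearity of expectation a random assignment satisfies 7/8 of the clauses in expectation. A minimal sketch, assuming a DIMACS-style encoding (literal v means variable |v| is True, -v means it is False):

    import random

    def random_assignment_value(clauses, n):
        # Each clause is a tuple of nonzero ints over variables 1..n.
        assignment = [random.random() < 0.5 for _ in range(n + 1)]
        satisfied = 0
        for clause in clauses:
            if any((lit > 0) == assignment[abs(lit)] for lit in clause):
                satisfied += 1
        return satisfied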
- Lecture 3 (Wed Sept 11): Karger's randomized
contraction algorithm for the Min Cut problem.
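A minimal sketch of one run of Karger's contraction algorithm, assuming a connected graph given as a list of (u, v) edge pairs on vertices 0..n-1; the union-find bookkeeping is an implementation convenience, not part of the algorithm's description:

    import random

    def karger_cut(edges, n):
        parent = list(range(n))

        def find(x):  # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        components = n
        while components > 2:
            # Sampling from the original list and skipping self-loops is
            # equivalent to contracting a uniformly random remaining edge.
            u, v = edges[random.randrange(len(edges))]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
        # Edges whose endpoints lie in different supervertices cross the cut.
        return sum(1 for u, v in edges if find(u) != find(v))

A single run finds a fixed min cut with probability at least 2/(n(n-1)), so repeating O(n^2 log n) times and keeping the best cut succeeds with high probability.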
- Lecture 4 (Mon Sept 16): Hash functions as approximate random oracles. Hash table review (chaining and open addressing). Bloom filters: implementation and heuristic analysis.
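A minimal Bloom filter sketch; salted SHA-256 stands in for the k independent random-oracle hash functions assumed by the heuristic analysis, and the default parameters are illustrative:

    import hashlib

    class BloomFilter:
        # False positives are possible; false negatives are not.
        def __init__(self, m=1 << 20, k=5):
            self.m, self.k = m, k
            self.bits = bytearray(m // 8 + 1)

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))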
- Lecture 5 (Wed Sept 18): Heavy hitters. The count-min sketch. Relaxing the random oracle assumption (theory and practice).
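A minimal count-min sketch, with Python's built-in hash plus per-row seeds standing in for d independent hash functions; in the standard analysis, width w = e/eps and depth d = ln(1/delta) give estimates that overshoot the true count by at most eps times the stream length, with probability at least 1 - delta:

    import random

    class CountMinSketch:
        def __init__(self, w=1000, d=5, seed=0):
            rng = random.Random(seed)
            self.w, self.d = w, d
            self.table = [[0] * w for _ in range(d)]
            self.seeds = [rng.getrandbits(64) for _ in range(d)]

        def _index(self, row, item):
            return hash((self.seeds[row], item)) % self.w

        def add(self, item, count=1):
            for row in range(self.d):
                self.table[row][self._index(row, item)] += count

        def estimate(self, item):
            # Collisions only inflate counters, so the minimum over rows
            # is the least-contaminated (over)estimate.
            return min(self.table[row][self._index(row, item)]
                       for row in range(self.d))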
- Lecture 6 (Mon Sept 23):
The AMS streaming algorithm for F_2 estimation.
Supplementary references:
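A minimal sketch of the AMS estimator for F_2 = sum_i f_i^2. For clarity it stores a full random-sign table per trial, which forgoes the algorithm's space savings; the actual algorithm draws the signs from a 4-wise independent hash family:

    import random

    def ams_f2_estimate(stream, trials=100, seed=0):
        # Each trial keeps one counter Z = sum_i s(i) * f_i with random
        # signs s(i) in {-1, +1}; E[Z^2] = F_2, and averaging independent
        # trials reduces the variance.
        rng = random.Random(seed)
        signs = [{} for _ in range(trials)]
        z = [0] * trials
        for item in stream:
            for t in range(trials):
                if item not in signs[t]:
                    signs[t][item] = rng.choice((-1, 1))
                z[t] += signs[t][item]
        return sum(v * v for v in z) / trials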
- Lecture 7 (Wed Sept 25):
Pseudorandom data and hashing. Why simple hash functions work as well
as fully random ones in practice.
References:
- Lecture 8 (Mon Sept 30):
Locality-sensitive hashing. Near-duplicate detection.
Jaccard similarity and min-wise hashing. Cosine similarity and a corresponding LSH family.
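A minimal min-wise hashing sketch, with salted built-in hashing standing in for random permutations of the universe: two sets' signatures agree in any fixed coordinate with probability equal to their Jaccard similarity, so the fraction of agreeing coordinates is an unbiased estimate of it:

    import random

    def minhash_signature(items, num_hashes=100, seed=0):
        rng = random.Random(seed)
        salts = [rng.getrandbits(64) for _ in range(num_hashes)]
        return [min(hash((salt, x)) for x in items) for salt in salts]

    def estimate_jaccard(sig_a, sig_b):
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)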
- Lecture 9 (Wed Oct 2):
Concentration inequalities and the Gaussian baseline.
From Chebyshev to the Weak Law of Large Numbers.
Beating Chebyshev: inspiration from the Central Limit Theorem.
Large deviation bounds for Gaussians.
- This material appears in many textbooks, including
Chapter 9 of the Mitzenmacher-Upfal book listed above.
- Lecture 10 (Mon Oct 7):
Chernoff bounds and their proofs. Applications: correctness amplification for randomized algorithms with two-sided error; the expected maximum search time in a hash table with chaining.
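The amplification application in code form: a hedged sketch that wraps any randomized decision procedure whose error probability on each input is below 1/2 (say 1/3) and majority-votes over independent runs; a Chernoff bound makes the majority's error probability exponentially small in the number of trials:

    def amplify(randomized_decider, x, trials=101):
        # randomized_decider(x) returns a (possibly wrong) boolean; with
        # per-run error 1/3, the majority vote errs with probability
        # exp(-Omega(trials)).
        votes = sum(1 for _ in range(trials) if randomized_decider(x))
        return votes > trials // 2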
- Lecture 11 (Wed Oct 9): The Johnson-Lindenstrauss (JL) Lemma.
Dimensionality reduction and its applications. A tail bound for sums of chi-squared random variables.
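A minimal sketch of JL-style dimensionality reduction with a random Gaussian matrix, written in plain Python (rather than a linear-algebra library) for self-containedness:

    import math
    import random

    def jl_project(points, k, seed=0):
        # Map each d-dimensional point x to (1/sqrt(k)) * G x, where G is
        # a k x d matrix of independent N(0, 1) entries. With
        # k = O(log(n) / eps^2), all pairwise distances among n points are
        # preserved up to a 1 +/- eps factor with high probability.
        rng = random.Random(seed)
        d = len(points[0])
        g = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
        scale = 1.0 / math.sqrt(k)
        return [[scale * sum(g[i][j] * x[j] for j in range(d))
                 for i in range(k)]
                for x in points]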
- Lecture 12 (Mon Oct 14): Killer apps of Chernoff bounds.
Randomized rounding and low-congestion routing. Erdos-Renyi random graphs and recovering planted bisections.
- For lecture notes on randomized rounding, see Section 4 of these lecture notes.
- Random graphs and planted bisections are discussed in Sections 2.1--2.3 of these lecture notes.
- Lecture 13 (Wed Oct 16):
Killer apps of Chernoff bounds in complexity theory. Newman's theorem in communication complexity (simulating public coins with private coins with logarithmic overhead). Adleman's theorem in computational complexity (i.e., BPP in P/poly).
- For the randomized one-way EQUALITY protocol, see Section 1 of these lecture notes.
- For Newman's theorem, see Section 3.2 of these lecture notes.
- For Adleman's theorem, see this video by Ryan O'Donnell (background starting at 1:00:55, Adleman's theorem at 1:07:15).
- Lecture [N/A] (Mon Oct 21):
Midterm review (optional).
- Lecture [N/A] (Wed Oct 23):
In-class midterm exam.
- Lecture 14 (Mon Oct 28):
PAC learning. Uniform convergence for finite hypothesis classes.
- Lecture 15 (Wed Oct 30):
Growth functions. Uniform convergence for hypothesis classes with polynomially bounded growth functions.
- Lecture [N/A] (Mon Nov 4):
No class (academic holiday).
- Lecture 16 (Wed Nov 6):
VC dimension. Sauer's Lemma. Intro to online learning.
- For VC dimension and Sauer's Lemma, see
these lecture notes by Nina Balcan.
- For an intro to online learning, see Section 2 of these
lecture notes.
- Lecture 17 (Mon Nov 11):
Regret-minimization in online learning.
Geometric random variables and the FTPL (Follow-the-Perturbed-Leader) algorithm.
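A hedged sketch of Follow-the-Perturbed-Leader over n actions, in the Kalai-Vempala style: each action's cumulative loss is perturbed once, up front, and the algorithm always plays the perturbed leader. Exponential perturbations stand in here for the geometric ones from lecture, and the rate epsilon is an illustrative tuning parameter:

    import random

    def ftpl_plays(loss_rounds, epsilon=0.1, seed=0):
        # loss_rounds is a sequence of per-round loss vectors, one entry
        # per action; yields the action played in each round.
        rng = random.Random(seed)
        n = len(loss_rounds[0])
        perturbation = [rng.expovariate(epsilon) for _ in range(n)]
        totals = [0.0] * n
        for losses in loss_rounds:
            yield min(range(n), key=lambda i: totals[i] - perturbation[i])
            for i in range(n):
                totals[i] += losses[i]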
- Lecture 18 (Wed Nov 13): Introduction to Markov chains. Existence and uniqueness of stationary distributions. Application: a simple sublinear-time algorithm for computing a perfect matching in a regular bipartite graph.
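As a small numeric companion to the existence/uniqueness discussion, here is a sketch that approximates the stationary distribution by repeatedly applying the transition matrix to a starting distribution; convergence assumes the chain is irreducible and aperiodic:

    def stationary_distribution(P, iters=1000):
        # P is a row-stochastic transition matrix (rows sum to 1); the
        # result approximates the unique pi satisfying pi = pi P.
        n = len(P)
        pi = [1.0 / n] * n
        for _ in range(iters):
            pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        return pi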
- Lecture 19 (Mon Nov 18): More on random walks. Cover times.
A randomized log-space algorithm for undirected connectivity. Introduction to Markov Chain Monte Carlo. Supplementary reading:
- Lecture 20 (Wed Nov 20): Expansion
characterizes rapid mixing of random walks on undirected graphs.
- Lecture 21 (Mon Nov 25): Power of two choices.
See Section 17.1 of the Mitzenmacher-Upfal book, listed above.
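The phenomenon fits in a few lines. In the sketch below, each ball picks d bins uniformly at random and lands in the least loaded one; the maximum load drops from roughly log n / log log n at d=1 to roughly log log n at d=2:

    import random

    def max_load(n_balls, n_bins, d, seed=0):
        rng = random.Random(seed)
        loads = [0] * n_bins
        for _ in range(n_balls):
            choices = [rng.randrange(n_bins) for _ in range(d)]
            loads[min(choices, key=loads.__getitem__)] += 1
        return max(loads)

Comparing max_load(10**5, 10**5, 1) against max_load(10**5, 10**5, 2) makes the gap easy to see.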
- Lecture [N/A] (Wed Nov 27):
No class (Thanksgiving).
- Lecture 22 (Mon Dec 2): Probabilistic method/Lovasz Local Lemma.
- Lecture 23 (Wed Dec 4): Schoning's randomized algorithm for 3-SAT.
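A minimal sketch of Schoning's algorithm, with literals encoded as in the MAX 3SAT sketch above and an illustrative restart budget; on a satisfiable instance each restart succeeds with probability roughly (3/4)^n, so O((4/3)^n) restarts suffice in expectation:

    import random

    def schoning_3sat(clauses, n, restarts=1000, seed=0):
        rng = random.Random(seed)
        for _ in range(restarts):
            # Random restart, then at most 3n random local flips.
            assign = [rng.random() < 0.5 for _ in range(n + 1)]
            for _ in range(3 * n):
                unsat = [c for c in clauses
                         if not any((lit > 0) == assign[abs(lit)] for lit in c)]
                if not unsat:
                    return assign  # satisfying assignment found
                # Flip a uniformly random literal of some unsatisfied clause.
                lit = rng.choice(rng.choice(unsat))
                assign[abs(lit)] = not assign[abs(lit)]
        return None  # no guarantee of unsatisfiability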
- Lecture [N/A] (Mon Dec 9):
In-class final exam (non-cumulative).