SEEM5020 Algorithms for Big Data

Offered by Sibo Wang, Fall 2023.

 
Brief Description

In this course, we will focus on important topics to deal with massive datasets.  The topics will cover basic tail bounds frequently used for approximation algorithm design, various sketches for efficient summarization of big data, similarity search techniques like LSH for large-scale data, and topics on dealing with matrix data and graph data. Next, we cover different models like the external memory model (for I/O algorithms), the MPC model (for distributed algorithms), the work-depth model (for shared-memory parallel algorithms), and the corresponding algorithms fitted to such modeling. Finally, some topics on sampling-based large-scale graph algorithm designs will be covered.

Time and Venues

Class:  Friday 1:30pm - 4:15pm, William M W Mong Eng Bldg 407

Office hour: Friday 4:30pm - 5:15pm, William M W Mong Eng Bldg 507

 Lecture Notes
Week(s) Lecture Notes Slides
1-2 A review of probability concepts and concentration bounds Lec 1
3 Streaming algorithms (i) Lec 2
4 Streaming algorithms (ii) Lec 3
5 Nearest Neighbor Search: Locality Sensitivity Hashing Lec 4
6 Dimension Reduction:Johnson-Lindenstrauss Lemma Lec 5
7 Midterm Exam Midterm exam
8 Dimension Reduction: PCA and SVD Lec 6
9 Matrix Sketching Lec 7
10-11 Application of Sketches in Graphs Lec 8
12 Sampling-based Graph Algorithm Design Lec 9
13 Final Exam Final exam
 
Homework
Project
  • Total weight: 20%
  • Due date: 8:00 pm, 26/Nov/2023
  • Project details: Project