SEEM 5380: Optimization Methods for High-Dimensional Statistics
2018-19 Second Term
Announcements
General Information
Course Description
The prevalence of high-dimensional data has motivated active research on efficient methods for tackling optimization problems that arise in statistical analysis. In this course, we will give an introduction to this exciting area of research, with emphasis on the theory of structured regularizers for high-dimensional statistics and the design and analysis of statistically and computationally efficient optimization algorithms. Applications in various areas of science and engineering, such as machine learning, signal processing, and statistics, will also be discussed. Prerequisite: ENGG 5501 or equivalent.
Course Outline (subject to change)
Course Requirements
Homework sets (60%) and a take-home final examination (40%)
Open problems will be introduced throughout the semester. Students who solve any one of the open problems will automatically receive an A for the course.
General References
Schedule and Reading
Jan 8: Linear regression with strong convexity: statistical error and convergence analysis of the gradient method.
Reading:
Negahban, Ravikumar, Wainwright, Yu. A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers. Statistical Science 27(4): 538-557, 2012. Supplementary Material
Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.
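As a quick illustration of the Jan 8 topic, the following toy example (illustrative only, not taken from the readings; all problem sizes are arbitrary) runs the gradient method with constant step size 1/L on a strongly convex least-squares objective and exhibits the geometric (linear) convergence discussed in class.

```python
# Toy sketch: gradient descent on f(x) = ||Ax - b||^2 / (2n), which is
# strongly convex when n >> d. With step size 1/L, the iterates contract
# toward x* at a geometric rate governed by the ratio mu/L.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10                      # n >> d keeps A^T A / n well conditioned
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                      # noiseless, for a clean convergence check

H = A.T @ A / n
eigs = np.linalg.eigvalsh(H)
L, mu = eigs.max(), eigs.min()      # smoothness and strong convexity constants

x = np.zeros(d)
errors = []
for _ in range(100):
    grad = A.T @ (A @ x - b) / n
    x = x - grad / L                # constant step size 1/L
    errors.append(np.linalg.norm(x - x_star))

# Geometric decay: roughly err_{t+1} <= (1 - mu/L) * err_t
print(f"mu/L = {mu / L:.3f}, final error = {errors[-1]:.2e}")
```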
Reading:
Raskutti, Wainwright, Yu. Restricted Eigenvalue Properties for Correlated Gaussian Designs. Journal of Machine Learning Research 11: 2241-2259, 2010.
Jan 22: Establishing the restricted strong convexity property for sparse linear regression.
Reading:
Foucart, Rauhut. A Mathematical Introduction to Compressive Sensing. Springer Science+Business Media, New York, 2013.
Raskutti, Wainwright, Yu. Minimax Rates of Estimation for High-Dimensional Linear Regression over $\ell_q$-Balls. IEEE Transactions on Information Theory 57(10): 6976-6994, 2011.
Rudelson, Zhou. Reconstruction from Anisotropic Random Measurements. IEEE Transactions on Information Theory 59(6): 3434-3447, 2013.
Zhou. Restricted Eigenvalue Conditions on Subgaussian Random Matrices. Technical Report, 2009.
Negahban, Wainwright. Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise. Journal of Machine Learning Research 13: 1665-1697, 2012.
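At toy scale, the restricted eigenvalue phenomenon studied in the readings above can be probed numerically. The sketch below (my own illustration; the sizes are arbitrary and chosen so that exhaustive search is feasible) computes the smallest eigenvalue over all s-column submatrices of a Gaussian design, a simple proxy for the restricted eigenvalue constant over exactly sparse vectors.

```python
# For a small Gaussian design X, compute min over supports S of size s of
# lambda_min(X_S^T X_S / n). The theory above shows this stays bounded
# away from zero with high probability, even in regimes where p > n.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 40, 12, 2                 # tiny sizes: C(12, 2) = 66 supports
X = rng.standard_normal((n, p))

re_const = min(
    np.linalg.eigvalsh(X[:, S].T @ X[:, S] / n).min()
    for S in itertools.combinations(range(p), s)
)
print(f"empirical {s}-sparse restricted eigenvalue: {re_const:.3f}")
```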
Reading:
Milman, Schechtman. Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, volume 1200, Springer-Verlag, 2001.
Jan 30: Algorithmic aspects of regularized loss minimization problems - The proximal gradient method.
Reading:
Combettes, Wajs. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Modeling and Simulation 4(4): 1168-1200, 2005.
Rockafellar, Wets. Variational Analysis. Grundlehren der Mathematischen Wissenschaften 317, Springer-Verlag, 2009. (Particularly Chapters 1.G and 2.D)
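To make the Jan 30 topic concrete, here is a minimal sketch (an illustrative toy, not code from the papers above) of the proximal gradient method (ISTA) applied to $\ell_1$-regularized least squares; the soft-thresholding step is the proximal map of the $\ell_1$ norm. All parameter choices below are arbitrary.

```python
# Proximal gradient for min_x ||Ax - b||^2 / (2n) + lam * ||x||_1:
# a gradient step on the smooth part, then the prox of the l1 term.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, iters=500):
    n, d = A.shape
    L = np.linalg.eigvalsh(A.T @ A / n).max()   # Lipschitz constant of the gradient
    x = np.zeros(d)
    for _ in range(iters):
        grad = A.T @ (A @ x - b) / n
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
n, d = 100, 50
A = rng.standard_normal((n, d))
x_star = np.zeros(d)
x_star[:3] = [2.0, -1.5, 1.0]                    # 3-sparse ground truth
b = A @ x_star + 0.05 * rng.standard_normal(n)   # small additive noise
x_hat = ista(A, b, lam=0.1)
print(f"estimation error: {np.linalg.norm(x_hat - x_star):.3f}")
```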
Reading:
Hou, Zhou, So, Luo. On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization. Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference (NIPS 2013), pp. 710-718, 2013.
See, in particular, Theorem 3.2.
WARNING: The conclusion of Theorem 3.1 of the above paper is not correct. This is due to a flaw in the proof. The correct conclusion appears in Proposition 12 of
Zhou, So. A Unified Approach to Error Bounds for Structured Convex Optimization Problems. Mathematical Programming, Series A 165(2): 689-728, 2017.
Sra. Scalable Nonconvex Inexact Proximal Splitting. Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference (NIPS 2012), pp. 530-538, 2012.
However, to obtain the desired conclusion, an envelope theorem should be invoked in the proof. In fact, the mere uniqueness of the solution to displayed equation (18) of the above paper is not sufficient to guarantee the differentiability of the function $E_g$. The envelope theorem mentioned on p.8 of the notes comes from Proposition 1 of
Oyama, Takenawa. On the (Non-)Differentiability of the Optimal Value Function when the Optimal Solution is Unique. Journal of Mathematical Economics 76: 21-32, 2018.
Envelope theorems have many applications in economics; see the above paper and the pointers therein for details.
Agarwal, Negahban, Wainwright. Fast Global Convergence of Gradient Methods for High-Dimensional Statistical Recovery. Annals of Statistics 40(5): 2452-2482, 2012. Supplementary Material
The above paper shows that for several classes of regularized loss minimization problems, certain first-order methods enjoy global linear convergence up to the statistical precision of the model. Modulo the global vs. local nature of the linear convergence, this is weaker than the convergence result derived from the error bound theory. Nevertheless, the approach developed in the above paper can be used to tackle problems that are not known to possess an error bound.
Feb 13: Algorithmic aspects of regularized loss minimization problems - Error Bounds.
Reading:
Zhou, So. A Unified Approach to Error Bounds for Structured Convex Optimization Problems. Mathematical Programming, Series A 165(2): 689-728, 2017.
Güler. Foundations of Optimization. Graduate Texts in Mathematics 258, Springer Science+Business Media, LLC, 2010.
Bauschke, Borwein. On Projection Algorithms for Solving Convex Feasibility Problems. SIAM Review 38(3): 367-426, 1996.
Dontchev, Rockafellar. Implicit Functions and Solution Mappings. Springer Monographs in Mathematics, Springer Science+Business Media, LLC, 2009.
Reading:
Yue, Zhou, So. A Family of SQA Methods for Non-Smooth Non-Strongly Convex Minimization with Provable Convergence Guarantees. Accepted for publication in Mathematical Programming, Series A, 2018.
Reading:
Loh, Wainwright. High-Dimensional Regression with Noisy and Missing Data: Provable Guarantees with Nonconvexity. Annals of Statistics 40(3): 1637-1664, 2012. Supplementary Material
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
Fan, Li. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association 96(456): 1348-1360, 2001.
Zhang. Nearly Unbiased Variable Selection under Minimax Concave Penalty. Annals of Statistics 38(2): 894-942, 2010.
For an in-depth analysis of the role of various non-convex regularizers in sparse estimation problems, see
Zhang, Zhang. A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems. Statistical Science 27(4): 576-593, 2012. Supplementary Material
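For concreteness, the SCAD and MCP penalties referenced above can be written in a few lines; the formulas follow Fan and Li (2001) and Zhang (2010), and the parameter values in this sketch are illustrative only.

```python
# Both penalties agree with lam * |t| near zero and flatten to a constant
# for large |t|, which removes the l1 penalty's bias on large coefficients.
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty (Zhang, 2010); requires gamma > 1."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def scad(t, lam, gamma):
    """SCAD penalty (Fan and Li, 2001); requires gamma > 2."""
    t = np.abs(t)
    quad = (2 * gamma * lam * t - t**2 - lam**2) / (2 * (gamma - 1))
    return np.where(t <= lam, lam * t,
                    np.where(t <= gamma * lam, quad, lam**2 * (gamma + 1) / 2))

lam, gamma = 1.0, 3.0
# Sanity checks: zero at the origin, constant beyond gamma * lam.
assert np.isclose(mcp(0.0, lam, gamma), 0.0)
assert np.isclose(mcp(10.0, lam, gamma), gamma * lam**2 / 2)
assert np.isclose(scad(10.0, lam, gamma), lam**2 * (gamma + 1) / 2)
```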
Reading:
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
Reading:
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
Mar 19: Phase Synchronization: Formulation and preliminary observations.
Reading:
Liu, Yue, So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization 27(4): 2426-2446, 2017.
Mar 26: Phase Synchronization: Convergence analysis of the Generalized Power Method.
Reading:
Liu, Yue, So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization 27(4): 2426-2446, 2017.
Absil, Mahony, Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
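The generalized power method analyzed in the Liu-Yue-So paper above admits a very short implementation. The sketch below is a toy illustration under the noisy rank-one model $C = zz^* + \sigma W$; note that the paper uses a spectral initialization, whereas this toy uses a random one for brevity, and all parameter values are arbitrary.

```python
# Generalized power method (GPM) for phase synchronization: multiply by C,
# then project each entry back onto the unit circle.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 0.5
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))       # ground-truth phases
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = (W + W.conj().T) / 2                             # Hermitian noise
C = np.outer(z, z.conj()) + sigma * W

x = np.exp(1j * rng.uniform(0, 2 * np.pi, n))       # random initialization
for _ in range(100):
    y = C @ x
    x = y / np.abs(y)                                # entrywise phase projection

# Up to a global phase, x should align with z: correlation near 1.
corr = np.abs(np.vdot(z, x)) / n
print(f"correlation with ground truth: {corr:.3f}")
```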
Apr 2: Error bound for Phase Synchronization.
Reading:
Zhong, Boumal. Near-Optimal Bounds for Phase Synchronization. SIAM Journal on Optimization 28(2): 989-1016, 2018.
Reading:
Liu, Yue, So, Ma. A Discrete First-Order Method for Large-Scale MIMO Detection with Provable Guarantees. Proceedings of the 18th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2017), pp. 669-673, 2017.
So, Ye. Probabilistic Analysis of Semidefinite Relaxation Detectors for Multiple-Input Multiple-Output Systems. Convex Optimization in Signal Processing and Communications (Palomar and Eldar eds.), pp. 166-191, Cambridge University Press, 2010.
Vershynin. Introduction to the Non-Asymptotic Analysis of Random Matrices. Compressed Sensing: Theory and Applications (Eldar and Kutyniok eds.), pp. 210-268, Cambridge University Press, 2012.
Reading:
Amini, Levina. On Semidefinite Relaxations for the Block Model. Annals of Statistics 46(1): 149-179, 2018.
Fei, Chen. Exponential Error Rates of SDP for Block Models: Beyond Grothendieck’s Inequality. IEEE Transactions on Information Theory 65(1): 551-571, 2019.
Homework Sets
Last Updated: April 23, 2019