SEEM 5380: Optimization Methods for High-Dimensional Statistics

2021-22 Second Term


Announcements


General Information

  • Instructor: Anthony Man-Cho So (manchoso at se.cuhk.edu.hk)
  • Office Hours: By appointment, in ERB 604 or online
  • Lecture Time/Location:
    • Tuesdays 9:30am - 11:15am, online via ZOOM (originally scheduled in KKB 101)
    • Fridays 9:30am - 11:15am, online via ZOOM (originally scheduled in ERB 804)
  • Online Q&A Forum: Follow this link.

Course Description

The prevalence of high-dimensional data has motivated active research on efficient methods for tackling optimization problems that arise in statistical analysis. In this course, we will give an introduction to this exciting area of research, with emphasis on the theory of structured regularizers for high-dimensional statistics and the design and analysis of statistically and computationally efficient optimization algorithms. Applications in various areas of science and engineering, such as machine learning, signal processing, and statistics, will also be discussed. Prerequisite: ENGG 5501 or equivalent.

Course Outline (subject to change)

Course Requirements

Homework sets (60%) and a take-home final examination (40%)

Open problems will be introduced throughout the semester. Students who solve any one of the open problems will automatically receive an A for the course.

General References

Schedule and Reading

  • Week 1 - Jan 11: Introduction. Notes
                   Jan 14: Linear regression in the over-determined case: Estimation error and spectrum of random Gaussian matrix. Notes
    Reading:
    • The material for the lecture on Jan 14 is taken from Chapter 4 of Vershynin: High-Dimensional Probability: An Introduction with Applications in Data Science; read Sections 4.1, 4.2, 4.4.1 and 4.6. You can also refer to my lecture summary. The result in Section 4.6 applies to a more general family of random matrices than that covered in class.
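    • Illustration (not part of the assigned reading): a minimal numpy sketch of the Jan 14 result that the singular values of an n x d standard Gaussian matrix concentrate around sqrt(n) when n >> d; the dimensions and random seed below are arbitrary illustrative choices.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 2000, 50                      # over-determined regime: n >> d
      A = rng.standard_normal((n, d))      # entries i.i.d. N(0, 1)

      # With high probability, all singular values lie roughly in the
      # interval [sqrt(n) - sqrt(d), sqrt(n) + sqrt(d)].
      s = np.linalg.svd(A, compute_uv=False)
      print("largest  singular value:", s[0])
      print("smallest singular value:", s[-1])
      print("sqrt(n) + sqrt(d)      :", np.sqrt(n) + np.sqrt(d))
      print("sqrt(n) - sqrt(d)      :", np.sqrt(n) - np.sqrt(d))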

  • Week 2 - Jan 18: Linear regression in the over-determined case: Optimization error and convergence analysis of gradient method for smooth strongly convex minimization. Notes
                   Jan 21: Linear regression in the under-determined case: Regularization and notion of decomposability. Notes
    Reading:
    • The material for the lecture on Jan 18 is taken from Chapter 2 of Nesterov: Lectures on Convex Optimization (2nd Edition); read Sections 2.1.3 and 2.1.5. You can also refer to my lecture summary.
    • The material for the lecture on Jan 21 is taken from Chapter 9 of Wainwright: High-Dimensional Statistics: A Non-Asymptotic Viewpoint; read Sections 9.1 and 9.2.1. You can also refer to my lecture summary.
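    • Illustration (not part of the assigned reading): a minimal sketch of the Jan 18 algorithm, gradient descent with constant step size 1/L on the least-squares objective f(theta) = 0.5*||X theta - y||^2, which is mu-strongly convex and L-smooth with mu, L the extreme eigenvalues of X^T X; problem sizes and the noise level are illustrative.

      import numpy as np

      rng = np.random.default_rng(1)
      n, d = 500, 20
      X = rng.standard_normal((n, d))
      theta_true = rng.standard_normal(d)
      y = X @ theta_true + 0.1 * rng.standard_normal(n)

      # Smoothness and strong convexity constants of f(theta) = 0.5*||X theta - y||^2.
      eigs = np.linalg.eigvalsh(X.T @ X)   # eigenvalues in ascending order
      mu, L = eigs[0], eigs[-1]

      theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # exact minimizer, for reference

      theta = np.zeros(d)
      for _ in range(200):
          grad = X.T @ (X @ theta - y)
          theta = theta - grad / L                      # constant step size 1/L

      print("condition number L/mu                :", L / mu)
      print("distance to minimizer after 200 steps:", np.linalg.norm(theta - theta_ls))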

  • Week 3 - Jan 25: General models: Localization of error vector and the notion of restricted strong convexity. Notes
                   Jan 28: General models: Bound on the estimation error. Notes
    Reading:
    • The material for the lecture on Jan 25 is taken from Chapter 9 of Wainwright: High-Dimensional Statistics: A Non-Asymptotic Viewpoint; read Sections 9.2 and 9.3.
    • The material for the lecture on Jan 28 is taken from Chapter 9 of Wainwright: High-Dimensional Statistics: A Non-Asymptotic Viewpoint; read Section 9.4.1. We will discuss an application of the estimation error bound to LASSO with hard sparsity in the next lecture. If you want to read ahead, see Sections 7.3.1-7.3.2.
    • Lecture summary.
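    • Illustration (not part of the assigned reading): schematically, and up to absolute constants, the Jan 28 estimation error bound in the exactly decomposable case has the form below; see Wainwright, Chapter 9 (Theorem 9.19) for the precise statement, constants, and the extra terms in the approximately sparse case.

      % Assumptions: the regularizer \mathcal{R} is decomposable with respect to a subspace pair
      % (\mathcal{M}, \bar{\mathcal{M}}^\perp) with \theta^* \in \mathcal{M}, the empirical loss
      % \mathcal{L}_n satisfies restricted strong convexity with curvature \kappa over the relevant
      % cone, and \lambda_n \ge 2\,\mathcal{R}^*\!\big(\nabla \mathcal{L}_n(\theta^*)\big).
      \[
        \|\widehat{\theta} - \theta^*\|_2 \;\lesssim\; \frac{\lambda_n}{\kappa}\,\Psi(\bar{\mathcal{M}}),
        \qquad
        \Psi(\bar{\mathcal{M}}) := \sup_{\theta \in \bar{\mathcal{M}} \setminus \{0\}} \frac{\mathcal{R}(\theta)}{\|\theta\|_2}.
      \]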

  • Week 4 - Feb 8: Estimation error bound for LASSO with hard sparsity. Notes
                   Feb 11: Establishing the restricted strong convexity property for sparse linear regression - Part I. Notes
    Reading:
    • The material for the lecture on Feb 8 is taken from Chapter 7 of Wainwright: High-Dimensional Statistics: A Non-Asymptotic Viewpoint; read Sections 7.3.1-7.3.2. You can also refer to my lecture summary.
    • For other applications of the general estimation error bound, you can read Sections 9.5-9.7 of Wainwright: High-Dimensional Statistics: A Non-Asymptotic Viewpoint.

    • The material for the lecture on Feb 11 is taken from Chapter 7 of Wainwright: High-Dimensional Statistics: A Non-Asymptotic Viewpoint; read Theorem 7.16 and its proof in Section 7.6. You can also refer to my lecture summary.
    • The proof of Gordon's inequality can be found in Section 8.7 of
      Foucart, Rauhut. A Mathematical Introduction to Compressive Sensing. Springer Science+Business Media, New York, 2013.
    • Further techniques for dealing with Gaussian random processes can be found in Chapter 7 of Vershynin: High-Dimensional Probability: An Introduction with Applications in Data Science; read in particular Sections 7.1-7.3.
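    • Illustration (not part of the assigned reading): a small numerical check, using scikit-learn's Lasso, of the Feb 8 bound, which for an s-sparse theta* predicts an estimation error of order sigma*sqrt(s*log(d)/n); the constant in the choice of the regularization parameter below is a heuristic, not the one from the lecture.

      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(2)
      n, d, s, sigma = 400, 1000, 10, 1.0

      theta_star = np.zeros(d)
      theta_star[:s] = rng.choice([-1.0, 1.0], size=s)    # hard-sparse ground truth
      X = rng.standard_normal((n, d))
      y = X @ theta_star + sigma * rng.standard_normal(n)

      # sklearn's Lasso minimizes (1/(2n))*||y - X theta||^2 + alpha*||theta||_1,
      # so alpha plays the role of lambda_n.
      alpha = 2.0 * sigma * np.sqrt(np.log(d) / n)
      theta_hat = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(X, y).coef_

      print("||theta_hat - theta_star||_2:", np.linalg.norm(theta_hat - theta_star))
      print("sigma*sqrt(s*log(d)/n) scale:", sigma * np.sqrt(s * np.log(d) / n))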

  • Week 5 - Feb 15: Establishing the restricted strong convexity property for sparse linear regression - Part II. Notes
                   Feb 18: Algorithmic aspects of regularized loss minimization problems - Part I: The proximal gradient method. Notes
    Reading:
    • The material for the lecture on Feb 15 is taken from Chapter 7 of Wainwright: High-Dimensional Statistics: A Non-Asymptotic Viewpoint; read Theorem 7.16 and its proof in Section 7.6. Lecture summary.
    • The proof of measure concentration of Lipschitz functions of Gaussian random variables can be found in Appendix V, Theorem V.1 of
      Milman, Schechtman. Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, volume 1200, Springer-Verlag, 2001.
    • It is possible to obtain comparison inequalities for sub-Gaussian processes. The interested reader can refer to Chapter 8 of Vershynin: High-Dimensional Probability: An Introduction with Applications in Data Science (particularly Section 8.6) for the setup and results.
    • Concentration inequalities for Lipschitz functions can be obtained for various probability measures that exhibit the concentration phenomenon. The interested reader can refer to Ledoux: The Concentration of Measure Phenomenon, American Mathematical Society, 2001 (particularly Chapter 1) for details.

    • Here is my lecture summary for the Feb 18 lecture.
    • Some useful properties of the proximal map can be found in
      Combettes, Wajs. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Modeling and Simulation 4(4): 1168-1200, 2005.
      Rockafellar, Wets. Variational Analysis. Grundlehren der Mathematischen Wissenschaften 317, Springer-Verlag, 2009. (particularly Sections 1.G and 2.D)
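    • Illustration (not part of the assigned reading): a minimal sketch of the proximal gradient method from the Feb 18 lecture, applied to l1-regularized least squares; the proximal map of tau*||.||_1 is entrywise soft-thresholding, and the step size 1/L and regularization level below are illustrative choices.

      import numpy as np

      def soft_threshold(v, tau):
          """Proximal map of tau*||.||_1 (entrywise soft-thresholding)."""
          return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

      def proximal_gradient(X, y, lam, num_iters=1000):
          """Proximal gradient method for min_theta 0.5*||X theta - y||^2 + lam*||theta||_1."""
          L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the smooth part's gradient
          theta = np.zeros(X.shape[1])
          for _ in range(num_iters):
              grad = X.T @ (X @ theta - y)                         # gradient step on the smooth part ...
              theta = soft_threshold(theta - grad / L, lam / L)    # ... followed by the proximal step
          return theta

      rng = np.random.default_rng(3)
      n, d, s, sigma = 200, 500, 5, 0.1
      theta_star = np.zeros(d)
      theta_star[:s] = 1.0
      X = rng.standard_normal((n, d))
      y = X @ theta_star + sigma * rng.standard_normal(n)

      lam = sigma * np.sqrt(2 * n * np.log(d))     # heuristic choice of order sigma*sqrt(n*log d)
      theta_hat = proximal_gradient(X, y, lam)
      print("estimation error     :", np.linalg.norm(theta_hat - theta_star))
      print("nonzeros in theta_hat:", int(np.sum(np.abs(theta_hat) > 1e-3)))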

  • Week 6 - Feb 22: Algorithmic aspects of regularized loss minimization problems - Part II: Basic convergence analysis of the proximal gradient method. Notes
                   Feb 25: Algorithmic aspects of regularized loss minimization problems - Part III: Local linear convergence of the proximal gradient method under error bound condition. Notes
    Reading:
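    • Illustration (not part of the assigned reading): the Feb 22 lecture's basic guarantee for the proximal gradient method with constant step size 1/L takes the standard textbook form below (the constants used in class may differ); the Feb 25 lecture refines this to local linear convergence under an error bound condition.

      % f convex with L-Lipschitz gradient, g closed convex, F = f + g, x^* any minimizer of F,
      % and x_k the iterates of the proximal gradient method with step size 1/L:
      \[
        F(x_k) - F(x^*) \;\le\; \frac{L\,\|x_0 - x^*\|_2^2}{2k}, \qquad k = 1, 2, \ldots
      \]
      % Under an error bound condition (see Weeks 7-8), one further obtains
      % F(x_{k+1}) - F(x^*) \le c\,(F(x_k) - F(x^*)) for some c \in (0,1) once the iterates
      % are close enough to the solution set, i.e., local linear convergence.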
  • Week 7 - Mar 1: Error bounds for regularized smooth convex minimization. Notes
                   Mar 4: Error bounds for regularized smooth convex minimization (cont'd). Notes
    Reading:
  • Week 8 - Mar 8: Error bounds for regularized smooth convex minimization (cont'd). Notes
                   Mar 11: Error bounds for regularized smooth convex minimization (cont'd). Notes
    Reading:
    • The material for the lectures on Mar 8 and Mar 11 can be found in my lecture summary.
    • The outer Lipschitz continuity property used in the lecture is discussed in Sections 3C and 3D (particularly Theorem 3D.1) of
      Dontchev, Rockafellar. Implicit Functions and Solution Mappings. Springer Monographs in Mathematics, Springer Science+Business Media, LLC, 2009.

    • The statistical nature of regularized loss minimization problems can be exploited to develop an alternative convergence analysis of first-order methods. This is done, e.g., in
      Agarwal, Negahban, Wainwright. Fast Global Convergence of Gradient Methods for High-Dimensional Statistical Recovery. Annals of Statistics 40(5): 2452-2482, 2012. Supplementary Material

      The above paper shows that for several classes of regularized loss minimization problems, certain first-order methods enjoy global linear convergence up to the statistical precision of the model. Modulo the global vs. local nature of the linear convergence, this is weaker than the convergence result derived from the error bound theory. Nevertheless, the approach developed in the above paper can be used to tackle problems that are not known to possess an error bound.
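    • Illustration (not part of the assigned reading): one common (Luo-Tseng-type) formulation of the error bound condition studied in the Mar 1-11 lectures, stated for min_x F(x) = f(x) + g(x) with f smooth and g closed convex; the exact variant and constants used in class may differ, see the lecture summary.

      % X^* denotes the optimal solution set of F = f + g and
      % R(x) := x - prox_g(x - \nabla f(x)) the proximal-gradient residual, which vanishes exactly on X^*.
      % The error bound asks for constants \kappa, \epsilon, \zeta > 0 such that
      \[
        \operatorname{dist}(x, X^*) \;\le\; \kappa\,\|R(x)\|
        \quad \text{whenever } F(x) \le \zeta \text{ and } \|R(x)\| \le \epsilon .
      \]

      Under such a condition, the proximal gradient method converges linearly once its iterates enter the region where the bound holds, which is the mechanism behind the local linear convergence discussed in Week 6.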

  • Week 9 - Mar 15: Non-convex regularized loss minimization: Motivation and examples. Notes
                   Mar 18: Non-convex regularizers and their basic properties. Notes
    Reading:
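    • Illustration (not part of the assigned reading): a small numerical sketch using the MCP penalty as a concrete example of the non-convex regularizers from the Mar 18 lecture; it checks that rho is not convex but that rho(t) + t^2/(2*gamma) is, i.e., rho is weakly convex. The parameters lam and gamma below are illustrative.

      import numpy as np

      lam, gamma = 1.0, 3.0

      def mcp(t):
          """MCP penalty: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, constant gamma*lam^2/2 beyond."""
          a = np.abs(t)
          return np.where(a <= gamma * lam, lam * a - a**2 / (2 * gamma), gamma * lam**2 / 2)

      def is_midpoint_convex(vals):
          """Discrete convexity check on a uniform grid: vals[i] <= (vals[i-1] + vals[i+1]) / 2."""
          return bool(np.all(vals[1:-1] <= (vals[:-2] + vals[2:]) / 2 + 1e-12))

      grid = np.linspace(-6.0, 6.0, 2001)
      rho = mcp(grid)
      rho_plus_quad = rho + grid**2 / (2 * gamma)   # weak-convexity surrogate

      print("rho convex on the grid?                ", is_midpoint_convex(rho))            # expect False
      print("rho + t^2/(2*gamma) convex on the grid?", is_midpoint_convex(rho_plus_quad))  # expect True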
  • Week 10 - Mar 22: Non-convex regularized loss minimization: Bound on estimation error. Notes
                     Mar 25: Weakly convex optimization: Basic setup and stationarity measure. Notes
    Reading:
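    • Illustration (not part of the assigned reading): the stationarity measure for weakly convex minimization introduced on Mar 25 is commonly taken to be the gradient norm of the Moreau envelope; the normalization used in class may differ from the generic form below.

      % For a \rho-weakly convex f and 0 < \lambda < 1/\rho, the Moreau envelope
      \[
        f_\lambda(x) := \min_{y}\; f(y) + \frac{1}{2\lambda}\|y - x\|^2
      \]
      % is differentiable with \nabla f_\lambda(x) = \tfrac{1}{\lambda}(x - \hat{x}),
      % where \hat{x} := \operatorname{prox}_{\lambda f}(x). A small value of \|\nabla f_\lambda(x)\|
      % certifies near-stationarity: \|\hat{x} - x\| = \lambda\,\|\nabla f_\lambda(x)\| and
      % \operatorname{dist}\big(0, \partial f(\hat{x})\big) \le \|\nabla f_\lambda(x)\|.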
  • Week 11 - Mar 29: Weakly convex optimization: Iteration complexity analysis of projected stochastic subgradient method. Notes
                     Apr 1: Synchronization problems: Examples. Phase synchronization: Basic setup. Notes
    Reading:
  • Week 12 - Apr 5: Ching Ming Festival.
                     Apr 8: Phase synchronization: Spectral estimator and its estimation error. Notes
    Reading:
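    • Illustration (not part of the assigned reading): a small numpy sketch of the Apr 8 spectral estimator for phase synchronization under the model C = z z^H + sigma*W with Hermitian Gaussian noise W; the problem size and noise level are illustrative.

      import numpy as np

      rng = np.random.default_rng(4)
      n, sigma = 300, 1.0

      # Ground-truth phases z (|z_i| = 1) and noisy observation matrix C = z z^H + sigma * W.
      z = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=n))
      G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
      W = (G + G.conj().T) / np.sqrt(2)             # Hermitian Gaussian noise
      C = np.outer(z, z.conj()) + sigma * W

      # Spectral estimator: leading eigenvector of C, with each entry normalized to unit modulus.
      u = np.linalg.eigh(C)[1][:, -1]               # eigenvector of the largest eigenvalue
      x = u / np.abs(u)

      # Estimation error up to a global phase: min over phi of ||x - e^{i phi} z||_2.
      phi = np.angle(np.vdot(z, x))                 # optimal global phase alignment
      print("error up to global phase:", np.linalg.norm(x - np.exp(1j * phi) * z))
      print("correlation |z^H x| / n :", np.abs(np.vdot(z, x)) / n)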
  • Week 13 - Apr 12: Phase Synchronization: Entrywise error bound of spectral estimator. Notes
                     Apr 15: Easter holiday.
    Reading:
    • The material for the lecture on Apr 12 is based on
      Zhong, Boumal. Near-Optimal Bounds for Phase Synchronization. SIAM Journal on Optimization 28(2): 989-1016, 2018.

      The version of the Davis-Kahan theorem mentioned in the lecture is taken from the above paper. For further reading on the Davis-Kahan theorem and Weyl's inequality, you can refer to
      Stewart, Sun. Matrix Perturbation Theory. Academic Press, 1990.

  • Week 14 - Apr 19: Phase Synchronization: Entrywise error bound of spectral estimator (cont'd). Notes
                     Apr 22: Analysis of the projected gradient method for phase synchronization. Notes
    Reading:
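    • Illustration (not part of the assigned reading): a minimal sketch of a projected-gradient-type iteration for phase synchronization in the spirit of the Apr 22 lecture: take a gradient ascent step on z^H C z and project each entry back onto the unit circle. The step size, iteration count, and spectral initialization below are illustrative choices, not the exact scheme analyzed in class.

      import numpy as np

      def project_entrywise(v):
          """Project each entry of v onto the complex unit circle (entries assumed nonzero)."""
          return v / np.abs(v)

      def projected_gradient_phase_sync(C, z0, step, num_iters=100):
          """Projected gradient ascent on z^H C z over vectors with unit-modulus entries."""
          z = z0.copy()
          for _ in range(num_iters):
              # The Euclidean gradient of z^H C z is proportional to C z; constants are absorbed into step.
              z = project_entrywise(z + step * (C @ z))
          return z

      rng = np.random.default_rng(5)
      n, sigma = 300, 1.0
      z_true = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=n))
      G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
      C = np.outer(z_true, z_true.conj()) + sigma * (G + G.conj().T) / np.sqrt(2)

      # Initialize from the entrywise-normalized leading eigenvector (the Week 12 spectral estimator).
      u = np.linalg.eigh(C)[1][:, -1]
      z_hat = projected_gradient_phase_sync(C, project_entrywise(u), step=1.0 / n)

      phi = np.angle(np.vdot(z_true, z_hat))
      print("error after refinement (up to global phase):",
            np.linalg.norm(z_hat - np.exp(1j * phi) * z_true))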

Homework Sets

  • Homework 1: Do Problems 1, 2, 3, and 5 in the Exercise Sheet (due March 1, 2022)


Last Updated: April 22, 2022