SEEM 5380: Optimization Methods for High-Dimensional Statistics
2021-22 Second Term
Announcements
General Information
KKB 101 (moved online via Zoom)
ERB 804 (moved online via Zoom)
Course Description
The prevalence of high-dimensional data has motivated active research on efficient methods for tackling optimization problems that arise in statistical analysis. In this course, we will give an introduction to this exciting area of research, with emphasis on the theory of structured regularizers for high-dimensional statistics and the design and analysis of statistically and computationally efficient optimization algorithms. Applications in various areas of science and engineering, such as machine learning, signal processing, and statistics, will also be discussed. Prerequisite: ENGG 5501 or equivalent.
Course Outline (subject to change)
Course Requirements
Homework sets (60%) and a take-home final examination (40%)
Open problems will be introduced throughout the semester. Students who solve any one of the open problems will automatically receive an A for the course.
General References
Schedule and Reading
Jan 14: Linear regression in the over-determined case: Estimation error and spectrum of random Gaussian matrix. Notes
Reading:
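As a quick numerical illustration of the second topic (my own sketch, not part of the course materials): for an n x p standard Gaussian matrix with n >> p, the singular values concentrate in the interval [sqrt(n) - sqrt(p), sqrt(n) + sqrt(p)], which is what controls the estimation error in the over-determined case. A minimal Python check:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 2000, 50                      # over-determined regime: n >> p
    X = rng.standard_normal((n, p))      # entries i.i.d. N(0, 1)

    s = np.linalg.svd(X, compute_uv=False)
    print(f"observed singular values in [{s.min():.1f}, {s.max():.1f}]")
    print(f"predicted interval          [{np.sqrt(n) - np.sqrt(p):.1f}, {np.sqrt(n) + np.sqrt(p):.1f}]")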
Jan 21: Linear regression in the under-determined case: Regularization and notion of decomposability. Notes
Reading:
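For reference, the decomposability condition of the Negahban et al. framework can be stated as follows (notation mine): a regularizer R is decomposable with respect to a pair of subspaces (M, \bar{M}) with M \subseteq \bar{M} if

    R(u + v) = R(u) + R(v)   for all u \in M, v \in \bar{M}^{\perp}.

For example, the \ell_1 norm is decomposable with respect to the subspace of vectors supported on a fixed index set S, since \|u + v\|_1 = \|u\|_1 + \|v\|_1 whenever u and v have disjoint supports.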
Jan 28: General models: Bound on the estimation error. Notes
Reading:
Feb 11: Establishing the restricted strong convexity property for sparse linear regression - Part I. Notes
Reading:
Foucart, Rauhut. A Mathematical Introduction to Compressive Sensing. Springer Science+Business Media, New York, 2013.
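As a reminder of the shape of the statement to be established (written in the style of the Negahban et al. framework; the constants \kappa_1, \kappa_2 are placeholders, not necessarily those from the lecture): for an n x p random Gaussian design X, with high probability,

    \frac{1}{n}\|X\Delta\|_2^2 \ge \kappa_1\|\Delta\|_2^2 - \kappa_2 \frac{\log p}{n}\|\Delta\|_1^2   for all \Delta \in \mathbb{R}^p,

which gives restricted strong convexity of the least-squares loss over the cone of approximately s-sparse directions \|\Delta\|_1 \le c\sqrt{s}\|\Delta\|_2.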
Feb 18: Algorithmic aspects of regularized loss minimization problems - Part I: The proximal gradient method. Notes
Reading:
Milman, Schechtman. Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, volume 1200, Springer-Verlag, 2001.
Combettes, Wajs. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Modeling and Simulation 4(4): 1168-1200, 2005.
Rockafellar, Wets. Variational Analysis. Grundlehren der Mathematischen Wissenschaften 317, Springer-Verlag, 2009. (Particularly Chapters 1.G and 2.D)
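The basic iteration from this lecture, specialized to \ell_1-regularized least squares (a minimal Python sketch in my own notation; the step size uses the exact Lipschitz constant, whereas backtracking variants work as well):

    import numpy as np

    def soft_threshold(v, t):
        """Proximal map of t * ||.||_1: entrywise soft-thresholding."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def proximal_gradient(X, y, lam, iters=500):
        """Minimize (1/2n)||y - Xw||_2^2 + lam * ||w||_1 by proximal gradient."""
        n, p = X.shape
        alpha = n / np.linalg.norm(X, 2) ** 2   # step size 1/L, L = ||X||_2^2 / n
        w = np.zeros(p)
        for _ in range(iters):
            grad = X.T @ (X @ w - y) / n        # gradient of the smooth part
            w = soft_threshold(w - alpha * grad, alpha * lam)
        return w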
Feb 25: Algorithmic aspects of regularized loss minimization problems - Part III: Local linear convergence of the proximal gradient method under error bound condition. Notes
Reading:
Sra. Scalable Nonconvex Inexact Proximal Splitting. Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference (NIPS 2012), pp. 530-538, 2012.
However, to obtain the desired conclusion, an envelope theorem should be invoked in the proof. In fact, mere uniqueness of the solution to the displayed equation (18) of the above paper is not sufficient to guarantee the differentiability of the function E_g. The envelope theorem mentioned on p. 8 of the lecture summary comes from Proposition 1 of
Oyama, Takenawa. On the (Non-)Differentiability of the Optimal Value Function when the Optimal Solution is Unique. Journal of Mathematical Economics 76: 21-32, 2018.
Envelope theorems have many applications in economics; see the above paper and the pointers therein for details.
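For concreteness, the flavor of envelope theorem needed here, stated loosely (see the Oyama-Takenawa paper above for the precise hypotheses): if V(\theta) = \max_{x \in K} f(x, \theta) and the maximizer x^*(\theta) is unique and suitably well-behaved, then V is differentiable with

    V'(\theta) = \frac{\partial f}{\partial \theta}(x^*(\theta), \theta).

Uniqueness of x^*(\theta) alone does not imply differentiability of V, which is precisely the gap in the proof discussed above.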
Hou, Zhou, So, Luo. On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization. Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference (NIPS 2013), pp. 710-718, 2013.
See, in particular, Theorem 3.2. You can also refer to my lecture summary.
WARNING: The conclusion of Theorem 3.1 of the above paper is not correct. This is due to a flaw in the proof. The correct conclusion appears in Proposition 12 of
Zhou, So. A Unified Approach to Error Bounds for Structured Convex Optimization Problems. Mathematical Programming, Series A 165(2): 689-728, 2017.
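The error bound condition in this lecture's title can be stated in the following common form (notation mine). For F = f + g with f smooth and g convex, let \mathcal{X}^* denote the set of minimizers and define the proximal residual

    R(x) = x - \mathrm{prox}_{g}(x - \nabla f(x)).

The error bound holds on a set U if there exists \kappa > 0 such that

    \mathrm{dist}(x, \mathcal{X}^*) \le \kappa \|R(x)\|   for all x \in U.

This is the property that drives the local linear convergence of the proximal gradient method established in the papers above.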
Mar 4: Error bounds for regularized smooth convex minimization (cont'd). Notes
Reading:
Güler. Foundations of Optimization. Graduate Texts in Mathematics 258, Springer Science+Business Media, LLC, 2010.
Bauschke, Borwein. On Projection Algorithms for Solving Convex Feasibility Problems. SIAM Review 38(3): 367-426, 1996.
Mar 11: Error bounds for regularized smooth convex minimization (cont'd). Notes
Reading:
Dontchev, Rockafellar. Implicit Functions and Solution Mappings. Springer Monographs in Mathematics, Springer Science+Business Media, LLC, 2009.
Agarwal, Negahban, Wainwright. Fast Global Convergence of Gradient Methods for High-Dimensional Statistical Recovery. Annals of Statistics 40(5): 2452-2482, 2012. Supplementary Material
The above paper shows that for several classes of regularized loss minimization problems, certain first-order methods enjoy global linear convergence up to the statistical precision of the model. Modulo the global vs. local nature of the linear convergence, this is weaker than the convergence result derived from the error bound theory. Nevertheless, the approach developed in the above paper can be used to tackle problems that are not known to possess an error bound.
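Schematically, the guarantee in that paper has the following shape (constants suppressed; \theta^t are the iterates, \hat{\theta} the regularized estimator, \theta^* the true parameter):

    \|\theta^t - \hat{\theta}\|^2 \le \gamma^t \|\theta^0 - \hat{\theta}\|^2 + c\,\|\hat{\theta} - \theta^*\|^2,   \gamma \in (0, 1),

so the iterates contract geometrically until they reach the statistical precision \|\hat{\theta} - \theta^*\|, beyond which additional optimization accuracy yields no statistical benefit.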
Mar 18: Non-convex regularizers and their basic properties. Notes
Reading:
Loh, Wainwright. High-Dimensional Regression with Noisy and Missing Data: Provable Guarantees with Nonconvexity. Annals of Statistics 40(3): 1637-1664, 2012. Supplementary Material
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
You can also refer to my lecture summary.
Fan, Li. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association 96(456): 1348-1360, 2001.
Zhang. Nearly Unbiased Variable Selection under Minimax Concave Penalty. Annals of Statistics 38(2): 894-942, 2010.
For an in-depth analysis of the role of various non-convex regularizers in sparse estimation problems, see
Zhang, Zhang. A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems. Statistical Science 27(4): 576-593, 2012. Supplementary Material
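For quick reference, one of the regularizers covered in this lecture, the minimax concave penalty (MCP) of Zhang (2010):

    \rho_{\lambda,\gamma}(t) = \lambda|t| - \frac{t^2}{2\gamma}  for |t| \le \gamma\lambda,   and   \rho_{\lambda,\gamma}(t) = \frac{\gamma\lambda^2}{2}  for |t| > \gamma\lambda.

Since \rho_{\lambda,\gamma}(t) + t^2/(2\gamma) is convex, the MCP is weakly convex; this is the structural property exploited in the next lectures.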
Mar 25: Weakly convex optimization: Basic setup and stationarity measure. Notes
Reading:
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
You can also refer to my lecture summary.
Vial. Strong and Weak Convexity of Sets and Functions. Mathematics of Operations Research 8(2): 231-259, 1983.
Davis, Drusvyatskiy. Stochastic Model-Based Minimization of Weakly Convex Functions. SIAM Journal on Optimization 29(1): 207-239, 2019.
Li, So, Ma. Understanding Notions of Stationarity in Nonsmooth Optimization. IEEE Signal Processing Magazine 37(5): 18-31, 2020.
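The stationarity measure from this lecture, stated for a \rho-weakly convex f and 0 < \lambda < 1/\rho (following Davis and Drusvyatskiy): the Moreau envelope and proximal point

    f_\lambda(x) = \min_y \Bigl\{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \Bigr\},   \hat{x} = \mathrm{prox}_{\lambda f}(x)

satisfy \nabla f_\lambda(x) = (x - \hat{x})/\lambda. Hence a small \|\nabla f_\lambda(x)\| certifies that x is close to the point \hat{x} and that \hat{x} is nearly stationary: \mathrm{dist}(0, \partial f(\hat{x})) \le \|\nabla f_\lambda(x)\|.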
Apr 1: Synchronization problems: Examples. Phase synchronization: Basic setup. Notes
Reading:
Davis, Drusvyatskiy. Stochastic Model-Based Minimization of Weakly Convex Functions. SIAM Journal on Optimization 29(1): 207-239, 2019.
The above paper establishes a sublinear rate of convergence to near-stationarity: the iterate is guaranteed to be close to an approximately stationary point of the problem, in the sense of the Moreau-envelope measure from the Mar 25 lecture.
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
Fan, Liu, Sun, Zhang. I-LAMM for Sparse Learning: Simultaneous Control of Algorithmic Complexity and Statistical Error. Annals of Statistics 46(2): 814-841, 2018.
Apr 8: Phase synchronization: Spectral estimator and its estimation error. Notes
Reading:
Liu, Yue, So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization 27(4): 2426-2446, 2017.
You can also refer to my lecture summary.
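A small simulation of the model and the spectral estimator (my own illustrative notation and noise normalization: z has unit-modulus entries and C = zz^* + \sigma W with W a Hermitian Gaussian noise matrix):

    import numpy as np

    rng = np.random.default_rng(1)
    n, sigma = 200, 1.0
    z = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))   # ground-truth phases

    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    W = (G + G.conj().T) / 2.0                          # Hermitian Gaussian noise
    C = np.outer(z, z.conj()) + sigma * W               # observed matrix

    # Spectral estimator: leading eigenvector, projected entrywise to the unit circle.
    _, vecs = np.linalg.eigh(C)
    v = vecs[:, -1]
    z_hat = v / np.abs(v)

    # Estimation error modulo the global phase ambiguity.
    inner = np.vdot(z_hat, z)                           # <z_hat, z>
    err = np.linalg.norm(z_hat * (inner / abs(inner)) - z)
    print(f"relative error: {err / np.sqrt(n):.3f}")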
Apr 15: Easter holiday.
Reading:
Zhong, Boumal. Near-Optimal Bounds for Phase Synchronization. SIAM Journal on Optimization 28(2): 989-1016, 2018.
The version of the Davis-Kahan theorem mentioned in the lecture is taken from the above paper. For further reading on the Davis-Kahan theorem and Weyl's inequality, you can refer to
Stewart, Sun. Matrix Perturbation Theory. Academic Press, 1990.
Apr 22: Analysis of the projected gradient method for phase synchronization. Notes
Reading:
Zhong, Boumal. Near-Optimal Bounds for Phase Synchronization. SIAM Journal on Optimization 28(2): 989-1016, 2018.
Liu, Yue, So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization 27(4): 2426-2446, 2017.
You can also refer to my lecture summary.
Absil, Mahony, Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
Boumal. An Introduction to Optimization on Smooth Manifolds. Book draft, 2022.
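The projected gradient iteration analyzed in this lecture (a generalized power method), in the same illustrative notation as the Apr 8 sketch; the entrywise projection maps each coordinate to the unit circle, and the step size alpha below is a placeholder, not the one from the analysis:

    import numpy as np

    def projected_gradient(C, z0, alpha=0.5, iters=100):
        """Projected gradient ascent for max z^H C z over unit-modulus z (sketch)."""
        z = z0.copy()
        for _ in range(iters):
            w = z + alpha * (C @ z)                   # ascent step; grad of z^H C z is 2Cz
            z = w / np.maximum(np.abs(w), 1e-12)      # entrywise projection onto the unit circle
        return z

In the papers above, the method is initialized at the spectral estimator from Apr 8; the two together yield the estimation-error and convergence guarantees discussed in class.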
Homework Sets
Last Updated: April 22, 2022