SEEM 5380: Optimization Methods for High-Dimensional Statistics
2018-19 Second Term
Announcements
General Information
Course Description
The prevalence of high-dimensional data has motivated active research on efficient methods for tackling optimization problems that arise in statistical analysis. In this course, we will give an introduction to this exciting area of research, with emphasis on the theory of structured regularizers for high-dimensional statistics and the design and analysis of statistically and computationally efficient optimization algorithms. Applications in various areas of science and engineering, such as machine learning, signal processing, and statistics, will also be discussed. Prerequisite: ENGG 5501 or equivalent.
Course Outline (subject to change)
Course Requirements
Homework sets (60%) and a take-home final examination (40%)
Open problems will be introduced throughout the semester. Students who solve any one of the open problems will automatically receive an A for the course.
General References
Schedule and Reading
Jan 8: Linear regression with strong convexity: statistical error and convergence analysis of the gradient method.
Reading:
Negahban, Ravikumar, Wainwright, Yu. A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers. Statistical Science 27(4): 538-557, 2012. Supplementary Material
Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.
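As a quick illustration of the Jan 8 topic, the following toy example (illustrative only, not taken from the readings; all problem sizes are arbitrary) runs the gradient method with constant step size 1/L on a strongly convex least-squares objective and exhibits the geometric (linear) convergence discussed in class.

```python
# Toy sketch: gradient descent on f(x) = ||Ax - b||^2 / (2n), which is
# strongly convex when n >> d. With step size 1/L, the iterates contract
# toward x* at a geometric rate governed by the ratio mu/L.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10                      # n >> d keeps A^T A / n well conditioned
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                      # noiseless, for a clean convergence check

H = A.T @ A / n
eigs = np.linalg.eigvalsh(H)
L, mu = eigs.max(), eigs.min()      # smoothness and strong convexity constants

x = np.zeros(d)
errors = []
for _ in range(100):
    grad = A.T @ (A @ x - b) / n
    x = x - grad / L                # constant step size 1/L
    errors.append(np.linalg.norm(x - x_star))

# Geometric decay: roughly err_{t+1} <= (1 - mu/L) * err_t
print(f"mu/L = {mu / L:.3f}, final error = {errors[-1]:.2e}")
```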
Reading:
Raskutti, Wainwright, Yu. Restricted Eigenvalue Properties for Correlated Gaussian Designs. Journal of Machine Learning Research 11: 2241-2259, 2010.
Jan 22: Establishing the restricted strong convexity property for sparse linear regression.
Reading:
Foucart, Rauhut. A Mathematical Introduction to Compressive Sensing. Springer Science+Business Media, New York, 2013.
Raskutti, Wainwright, Yu. Minimax Rates of Estimation for High-Dimensional Linear Regression over $\ell_q$-Balls. IEEE Transactions on Information Theory 57(10): 6976-6994, 2011.
Rudelson, Zhou. Reconstruction from Anisotropic Random Measurements. IEEE Transactions on Information Theory 59(6): 3434-3447, 2013.
Zhou. Restricted Eigenvalue Conditions on Subgaussian Random Matrices. Technical Report, 2009.
Negahban, Wainwright. Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise. Journal of Machine Learning Research 13: 1665-1697, 2012.
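At toy scale, the restricted eigenvalue phenomenon studied in the readings above can be probed numerically. The sketch below (my own illustration; the sizes are arbitrary and chosen so that exhaustive search is feasible) computes the smallest eigenvalue over all s-column submatrices of a Gaussian design, a simple proxy for the restricted eigenvalue constant over exactly sparse vectors.

```python
# For a small Gaussian design X, compute min over supports S of size s of
# lambda_min(X_S^T X_S / n). The theory above shows this stays bounded
# away from zero with high probability, even in regimes where p > n.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 40, 12, 2                 # tiny sizes: C(12, 2) = 66 supports
X = rng.standard_normal((n, p))

re_const = min(
    np.linalg.eigvalsh(X[:, S].T @ X[:, S] / n).min()
    for S in itertools.combinations(range(p), s)
)
print(f"empirical {s}-sparse restricted eigenvalue: {re_const:.3f}")
```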
Reading:
Milman, Schechtman. Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, volume 1200, Springer-Verlag, 2001.
Jan 30: Algorithmic aspects of regularized loss minimization problems - The proximal gradient method.
Reading:
Combettes, Wajs. Signal Recovery by Proximal Forward-Backward Splitting. Multiscale Modeling and Simulation 4(4): 1168-1200, 2005.
Rockafellar, Wets. Variational Analysis. Grundlehren der Mathematischen Wissenschaften 317, Springer-Verlag, 2009. (Particularly Chapters 1.G and 2.D)
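To make the Jan 30 topic concrete, here is a minimal sketch (an illustrative toy, not code from the papers above) of the proximal gradient method (ISTA) applied to $\ell_1$-regularized least squares; the soft-thresholding step is the proximal map of the $\ell_1$ norm. All parameter choices below are arbitrary.

```python
# Proximal gradient for min_x ||Ax - b||^2 / (2n) + lam * ||x||_1:
# a gradient step on the smooth part, then the prox of the l1 term.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, iters=500):
    n, d = A.shape
    L = np.linalg.eigvalsh(A.T @ A / n).max()   # Lipschitz constant of the gradient
    x = np.zeros(d)
    for _ in range(iters):
        grad = A.T @ (A @ x - b) / n
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
n, d = 100, 50
A = rng.standard_normal((n, d))
x_star = np.zeros(d)
x_star[:3] = [2.0, -1.5, 1.0]                    # 3-sparse ground truth
b = A @ x_star + 0.05 * rng.standard_normal(n)   # small additive noise
x_hat = ista(A, b, lam=0.1)
print(f"estimation error: {np.linalg.norm(x_hat - x_star):.3f}")
```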
Reading:
Hou, Zhou, So, Luo. On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization. Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference (NIPS 2013), pp. 710-718, 2013.
See, in particular, Theorem 3.2.
WARNING: The conclusion of Theorem 3.1 of the above paper is not correct. This is due to a flaw in the proof. The correct conclusion appears in Proposition 12 of
Zhou, So. A Unified Approach to Error Bounds for Structured Convex Optimization Problems. Mathematical Programming, Series A 165(2): 689-728, 2017.
Sra. Scalable Nonconvex Inexact Proximal Splitting. Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference (NIPS 2012), pp. 530-538, 2012.
However, to obtain the desired conclusion, an envelope theorem should be invoked in the proof. In fact, the mere uniqueness of the solution to displayed equation (18) of the above paper is not sufficient to guarantee the differentiability of the function $E_g$. The envelope theorem mentioned on p.8 of the notes comes from Proposition 1 of
Oyama, Takenawa. On the (Non-)Differentiability of the Optimal Value Function when the Optimal Solution is Unique. Journal of Mathematical Economics 76: 21-32, 2018.
Envelope theorems have many applications in economics; see the above paper and the pointers therein for details.
Agarwal, Negahban, Wainwright. Fast Global Convergence of Gradient Methods for High-Dimensional Statistical Recovery. Annals of Statistics 40(5): 2452-2482, 2012. Supplementary Material
The above paper shows that for several classes of regularized loss minimization problems, certain first-order methods enjoy global linear convergence up to the statistical precision of the model. Modulo the global vs. local nature of the linear convergence, this is weaker than the convergence result derived from the error bound theory. Nevertheless, the approach developed in the above paper can be used to tackle problems that are not known to possess an error bound.
Feb 13: Algorithmic aspects of regularized loss minimization problems - Error Bounds.
Reading:
Zhou, So. A Unified Approach to Error Bounds for Structured Convex Optimization Problems. Mathematical Programming, Series A 165(2): 689-728, 2017.
Güler. Foundations of Optimization. Graduate Texts in Mathematics 258, Springer Science+Business Media, LLC, 2010.
Bauschke, Borwein. On Projection Algorithms for Solving Convex Feasibility Problems. SIAM Review 38(3): 367-426, 1996.
Dontchev, Rockafellar. Implicit Functions and Solution Mappings. Springer Monographs in Mathematics, Springer Science+Business Media, LLC, 2009.
Reading:
Yue, Zhou, So. A Family of SQA Methods for Non-Smooth Non-Strongly Convex Minimization with Provable Convergence Guarantees. Accepted for publication in Mathematical Programming, Series A, 2018.
Reading:
Loh, Wainwright. High-Dimensional Regression with Noisy and Missing Data: Provable Guarantees with Nonconvexity. Annals of Statistics 40(3): 1637-1664, 2012. Supplementary Material
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
Fan, Li. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties. Journal of the American Statistical Association 96(456): 1348-1360, 2001.
Zhang. Nearly Unbiased Variable Selection under Minimax Concave Penalty. Annals of Statistics 38(2): 894-942, 2010.
For an in-depth analysis of the role of various non-convex regularizers in sparse estimation problems, see
Zhang, Zhang. A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems. Statistical Science 27(4): 576-593, 2012. Supplementary Material
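For concreteness, the SCAD and MCP penalties referenced above can be written in a few lines; the formulas follow Fan and Li (2001) and Zhang (2010), and the parameter values in this sketch are illustrative only.

```python
# Both penalties agree with lam * |t| near zero and flatten to a constant
# for large |t|, which removes the l1 penalty's bias on large coefficients.
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty (Zhang, 2010); requires gamma > 1."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def scad(t, lam, gamma):
    """SCAD penalty (Fan and Li, 2001); requires gamma > 2."""
    t = np.abs(t)
    quad = (2 * gamma * lam * t - t**2 - lam**2) / (2 * (gamma - 1))
    return np.where(t <= lam, lam * t,
                    np.where(t <= gamma * lam, quad, lam**2 * (gamma + 1) / 2))

lam, gamma = 1.0, 3.0
# Sanity checks: zero at the origin, constant beyond gamma * lam.
assert np.isclose(mcp(0.0, lam, gamma), 0.0)
assert np.isclose(mcp(10.0, lam, gamma), gamma * lam**2 / 2)
assert np.isclose(scad(10.0, lam, gamma), lam**2 * (gamma + 1) / 2)
```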
Reading:
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
Reading:
Loh, Wainwright. Regularized M-estimators with Nonconvexity: Statistical and Algorithmic Theory for Local Optima. Journal of Machine Learning Research 16(Mar): 559-616, 2015.
Mar 19: Phase Synchronization: Formulation and preliminary observations.
Reading:
Liu, Yue, So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization 27(4): 2426-2446, 2017.
Mar 26: Phase Synchronization: Convergence analysis of the Generalized Power Method.
Reading:
Liu, Yue, So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization 27(4): 2426-2446, 2017.
Absil, Mahony, Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
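The generalized power method analyzed in the Liu-Yue-So paper above admits a very short implementation. The sketch below is a toy illustration under the noisy rank-one model $C = zz^* + \sigma W$; note that the paper uses a spectral initialization, whereas this toy uses a random one for brevity, and all parameter values are arbitrary.

```python
# Generalized power method (GPM) for phase synchronization: multiply by C,
# then project each entry back onto the unit circle.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 0.5
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))       # ground-truth phases
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = (W + W.conj().T) / 2                             # Hermitian noise
C = np.outer(z, z.conj()) + sigma * W

x = np.exp(1j * rng.uniform(0, 2 * np.pi, n))       # random initialization
for _ in range(100):
    y = C @ x
    x = y / np.abs(y)                                # entrywise phase projection

# Up to a global phase, x should align with z: correlation near 1.
corr = np.abs(np.vdot(z, x)) / n
print(f"correlation with ground truth: {corr:.3f}")
```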
Apr 2: Error bound for Phase Synchronization.
Reading:
Zhong, Boumal. Near-Optimal Bounds for Phase Synchronization. SIAM Journal on Optimization 28(2): 989-1016, 2018.
Reading:
Liu, Yue, So, Ma. A Discrete First-Order Method for Large-Scale MIMO Detection with Provable Guarantees. Proceedings of the 18th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2017), pp. 669-673, 2017.
So, Ye. Probabilistic Analysis of Semidefinite Relaxation Detectors for Multiple-Input Multiple-Output Systems. Convex Optimization in Signal Processing and Communications (Palomar and Eldar eds.), pp. 166-191, Cambridge University Press, 2010.
Vershynin. Introduction to the Non-Asymptotic Analysis of Random Matrices. Compressed Sensing: Theory and Applications (Eldar and Kutyniok eds.), pp. 210-268, Cambridge University Press, 2012.
Reading:
Amini, Levina. On Semidefinite Relaxations for the Block Model. Annals of Statistics 46(1): 149-179, 2018.
Fei, Chen. Exponential Error Rates of SDP for Block Models: Beyond Grothendieck’s Inequality. IEEE Transactions on Information Theory 65(1): 551-571, 2019.
Homework Sets
Last Updated: April 23, 2019