Continuous and Discrete Optimization are two important pillars of machine learning. Continuous
optimization typically arises when learning model parameters, while discrete optimization problems
appear in inference and in auxiliary tasks such as feature selection, data subset selection, and
model compression.
In the first part of this course, we will cover continuous optimization with applications in
machine learning.
Topics will include Convexity, Gradient Methods, Proximal Algorithms, Stochastic and Online
Variants of these methods, Coordinate Descent Methods, Subgradient Methods, Non-Convex
Optimization, Frank-Wolfe, Accelerated Methods, Lagrange and Fenchel Duality, Second-Order
Methods, Quasi-Newton Methods, and Gradient-Free and Zeroth-Order Optimization.
We will also cover advanced topics, including non-convex optimization, alternating
minimization algorithms, and parallel/distributed variants of these algorithms.
We will ground all these algorithms in machine learning applications and loss functions
(from logistic regression, linear regression, SVMs, and matrix factorization right up to
deep neural networks).
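As a concrete illustration of this grounding, here is a minimal sketch (not course-provided code) of batch gradient descent applied to l2-regularized logistic regression; the synthetic data, step size, and regularization strength are assumptions chosen just for the example.

import numpy as np

def logistic_grad(w, X, y, lam=0.1):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)) + (lam/2) * ||w||^2,
    # with labels y_i in {-1, +1}.
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))   # equals -y_i * sigmoid(-margin_i)
    return X.T @ coeffs / len(y) + lam * w

def gradient_descent(X, y, step=0.5, iters=200):
    # Fixed-step-size gradient descent starting from the zero vector.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= step * logistic_grad(w, X, y)
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100))
w_hat = gradient_descent(X, y)

The same template (compute a gradient of the loss, take a step) is what the stochastic, proximal, and accelerated variants listed above modify.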
In summary, the continuous optimization part will cover topics such as Basics of Continuous
Optimization, Convexity, Gradient Descent, Projected/Proximal GD, Subgradient Descent,
Accelerated Gradient Descent, Newton & Quasi-Newton Methods, Duality (Lagrange and Fenchel),
Coordinate Descent, Frank-Wolfe, and Continuous Optimization in Practice.
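The Projected/Proximal GD topic has a convenient concrete special case: the proximal operator of the l1 norm is soft-thresholding, which turns gradient descent into ISTA for the lasso. The sketch below is an illustration of that idea, not material distributed with the course; the default values are assumptions.

import numpy as np

def soft_threshold(v, tau):
    # prox_{tau * ||.||_1}(v): shrink every coordinate toward zero by tau.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, y, lam=0.1, iters=500):
    # Proximal gradient descent on (1/2n) * ||Xw - y||^2 + lam * ||w||_1.
    n, d = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()   # 1/L for the smooth quadratic part
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n                     # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)  # proximal (shrinkage) step
    return w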
The second part of this course will cover the fundamentals of discrete optimization.
We will start with basic forms of combinatorial optimization (knapsack, s-t cuts/paths,
matchings and matroids) and then discuss submodular functions and their applications.
We will cover algorithms for submodular minimization and maximization under various
combinatorial constraints.
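As a small preview of the maximization side (an illustrative sketch, not course code): for monotone submodular functions under a cardinality constraint, the classic greedy algorithm of Nemhauser, Wolsey, and Fisher achieves a (1 - 1/e) approximation. Below it is applied to a toy coverage function; the ground set and coverage sets are made up.

def greedy_max(f, ground_set, k):
    # Repeatedly add the element with the largest marginal gain f(S + e) - f(S).
    selected = set()
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for e in ground_set - selected:
            gain = f(selected | {e}) - f(selected)
            if gain > best_gain:
                best, best_gain = e, gain
        selected.add(best)
    return selected

# Toy coverage function: each element covers some items, f(S) = number of items covered.
covers = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}, 4: {"a", "c", "d"}}
f = lambda S: len(set().union(*(covers[e] for e in S))) if S else 0
print(greedy_max(f, set(covers), k=2))   # picks element 4 first, then 1 or 2; all items covered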
We will also discuss discrete probabilistic models such as Determinantal Point Processes (DPPs).
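The defining property of a DPP is easy to state; here is a minimal illustrative sketch with a made-up kernel (not course code): for an L-ensemble with positive semidefinite kernel L, the probability of a subset S is proportional to det(L_S), so subsets of mutually dissimilar items are more likely.

import numpy as np

# Kernel over three items; items 0 and 1 are nearly identical, item 2 is different.
L = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

def unnormalized_prob(L, S):
    # det(L_S), the principal minor indexed by S (the empty set gets 1).
    idx = list(S)
    return np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0

Z = np.linalg.det(L + np.eye(len(L)))      # normalizer: sum of det(L_S) over all subsets S
print(unnormalized_prob(L, {0, 1}) / Z)    # small probability: a redundant pair
print(unnormalized_prob(L, {0, 2}) / Z)    # larger probability: a diverse pair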
We will ground all these algorithmic and theoretical results in real-world applications such as
feature selection, summarization and diversified search, structured prediction, data subset
selection, and model compression.
In summary, the discrete optimization part will cover topics such as Linear Cost Problems with
Applications in ML; Matroids, Spanning Trees, s-t Paths, s-t Cuts, and Matchings; Covers (Set
Covers, Vertex Covers, Edge Covers); Non-Linear Discrete Optimization; Submodular Functions and
Applications in Machine Learning; Submodularity and Convexity; Submodular Minimization;
Submodular Maximization; Beyond Minimization and Maximization: Other Forms of Submodular
Optimization; and Discrete Optimization in Practice.
References
Bubeck, Sébastien. "Convex Optimization: Algorithms and Complexity." Foundations and Trends
in Machine Learning 8.3-4 (2015): 231-357. Online copy: https://arxiv.org/abs/1405.4980
Zhang, Aston, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning,
Chapter 11. Draft (2019). Online copy: http://d2l.ai/
Bach, Francis. "Learning with Submodular Functions: A Convex Optimization Perspective."
Foundations and Trends in Machine Learning 6.2-3 (2013): 145-373. Online copy: https://arxiv.org/pdf/1111.6453.pdf