Hyper-parameters are ubiquitous in optimization algorithms, the two most common being the step-size and the momentum. While their careful selection is crucial to get the most out of these algorithms, the usual automatic selection techniques (e.g., line-search) often fail in modern applications. In this talk we present new automatic hyper-parameter tuning methods based on theoretical considerations.
We mainly focus on Nesterov's algorithm. It features the intriguing momentum parameter $(k-1)/(k+2)$, which depends on the iteration index $k$. This means that the initial iteration index is itself a hyper-parameter that affects the performance of the algorithm.
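For concreteness, one standard form of the method for an $L$-smooth convex objective $f$ reads (the indexing and notation below are one common textbook convention, not taken from the talk)
\begin{align*}
  x_k &= y_{k-1} - \tfrac{1}{L}\nabla f(y_{k-1}), \\
  y_k &= x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1}),
\end{align*}
with $y_0 = x_0$; shifting the starting index changes the momentum schedule and hence the trajectory of the iterates.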
From a dynamical systems perspective, we overcome this issue by replacing the momentum parameter with the square root of a Lyapunov function, thereby coupling the momentum with the speed of convergence of the system. We show that the resulting method achieves a convergence rate arbitrarily close to the optimal one while getting rid of two hyper-parameters.
A general introduction to optimization for machine learning will be given, and, time permitting, the case of step-size tuning for deep learning will also be discussed.
This is joint work with S. Maier and P. Ochs.