Disputation Maren Mahsereci
on Monday, 23 July 2018, at 10:30 a.m. in the Magnetresonanzzentrum, Max Planck Campus, seminar room
Probabilistic Approaches to Stochastic Optimization
Reviewer 1: Prof. Dr. Philipp Hennig
Reviewer 2: Prof. Dr. Ulrike von Luxburg
Short summary of the talk:
Optimization is a cardinal concept in the sciences, and viable algorithms are of high practical relevance since they solve optimization problems. Empirical risk minimization is a major workhorse, in particular in machine learning applications, where an input-target relation is learned in a supervised manner. Empirical risks with high-dimensional inputs are mostly optimized by greedy, gradient-based, and possibly stochastic optimization routines, such as stochastic gradient descent.
Though popular and practically successful, this setup has certain downsides which often make it finicky to work with. Sometimes it is also the bottleneck, time-sink, or major cost factor in a larger chain of learning procedures. Typical issues include:
• Overfitting of a parametrized model to the data. This may lead to poor generalization
performance on unseen data.
• The manual or semi-automated tuning of algorithmic parameters, such as learning rates, is
tedious, inefficient, and costly, especially when tuning is done in an outer loop that requires
multiple runs of the same optimizer.
• Stochastic losses and gradients occur due to sub-sampling of a large dataset. They only
yield incomplete information about the empirical risk and are thus difficult to handle from a
decision-making point of view.
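To make the last point concrete, here is a minimal numpy sketch (a toy least-squares problem; all names and sizes are illustrative, not from the talk) of how sub-sampling turns the exact empirical-risk gradient into a noisy estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy empirical risk: mean squared error over N data points (illustrative).
N, d = 1000, 5
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)
w = np.zeros(d)

def grad(indices):
    """Gradient of the mean squared error over the given data subset."""
    Xb, yb = X[indices], y[indices]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(indices)

full = grad(np.arange(N))  # exact empirical-risk gradient
batches = [grad(rng.choice(N, size=32, replace=False)) for _ in range(200)]

# Each mini-batch gradient is an unbiased but noisy estimate of `full`:
# averaging many of them recovers it, while a single one deviates noticeably.
print(np.linalg.norm(np.mean(batches, axis=0) - full))
print(np.mean([np.linalg.norm(g - full) for g in batches]))
```

An optimizer that sees only one such mini-batch gradient per step must make decisions under exactly this kind of incomplete information.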
The talk will mainly focus on the first two of the three points listed above. In particular, I will
motivate and introduce a probabilistic line search algorithm, a novel way of setting learning
rates during the runtime of a stochastic optimizer at low cost and in a fully automated way. The
method removes the need for any exploratory experiments: only one run is required to train the
models to convergence in test accuracy.
The second part of the talk will focus on an innovative way of early-stopping an optimizer in order to
improve generalization performance. The method does not require a held-out validation set and
thus has the potential to improve test accuracy for small to mid-sized datasets. For this, I will introduce a
statistical estimator which assesses whether stochastic gradients can be fully explained by the noise
induced by conditioning on a finite dataset drawn from a richer data distribution.
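To illustrate the idea behind such an estimator: if the true gradient were zero, the mini-batch gradient would be pure sampling noise with a predictable scale, and that scale can be checked against the observed gradient. The sketch below assumes access to per-sample gradients; the scaling and threshold are my simplification for illustration, not necessarily the estimator from the thesis:

```python
import numpy as np

def evidence_ratio(per_sample_grads):
    """Average squared signal-to-noise ratio of a mini-batch gradient.

    If the true gradient were zero, each mini-batch gradient component g_k
    would be zero-mean noise with variance var_k / B, so B * g_k**2 / var_k
    has expectation 1 and the average below fluctuates around 1.  Values
    well above 1 indicate a gradient that sampling noise alone cannot
    explain.  (Assumed simplification, not the thesis' exact criterion.)
    """
    B, D = per_sample_grads.shape
    g = per_sample_grads.mean(axis=0)            # mini-batch gradient
    var = per_sample_grads.var(axis=0, ddof=1)   # per-component sample variance
    return (B / D) * np.sum(g**2 / var)

rng = np.random.default_rng(1)
noise_only = rng.normal(size=(256, 10))   # zero-mean: gradient is pure noise
with_signal = noise_only + 0.5            # clearly nonzero mean gradient
print(evidence_ratio(noise_only))         # fluctuates around 1
print(evidence_ratio(with_signal))        # far above 1: keep optimizing
```

Once the ratio is no longer distinguishable from its pure-noise level, further optimization steps chase sampling noise rather than signal, which is the moment a validation-free early-stopping rule would halt.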