FDS Special Seminar: Kasper Green Larsen, “Bagging is an Optimal PAC Learner”


Yale Institute for Foundations of Data Science Common Area, Kline Tower 13th Floor, Common Area 1307, New Haven, CT 06511

Kasper Green Larsen (Aarhus University)

Lunch begins at 12:00 pm (13th floor Kitchen)
Talk begins at 12:15 pm
at the Yale institute for Foundations of Data Science, Kline Tower, 13th Floor

Speaker: Kasper Green Larsen, Ph.D.
Head of Algorithms, Data Structures and Foundations of Machine Learning
Department of Computer Science, Aarhus University

Title: Bagging is an Optimal PAC Learner

Abstract: Determining the optimal sample complexity of PAC learning in the realizable setting was a central open problem in learning theory for decades. Finally, the seminal work by Hanneke (2016) gave an algorithm with a provably optimal sample complexity. His algorithm is based on a careful and structured sub-sampling of the training data and then returning a majority vote among hypotheses trained on each of the sub-samples. While being a very exciting theoretical result, it has not had much impact in practice, in part due to inefficiency, since it constructs a polynomial number of sub-samples of the training data, each of linear size.

In this talk, we prove the surprising result that the practical and classic heuristic Bagging (a.k.a. bootstrap aggregation), due to Breiman (1996), is in fact also an optimal PAC learner. Bagging pre-dates Hanneke’s algorithm by twenty years and is taught in most undergraduate machine learning courses. Moreover, we show that it only requires a logarithmic number of sub-samples to reach optimality.

This work was published at COLT’23 and received the Best Paper Award.

Speaker Bio: I’m a Full Professor at the Department of Computer Science at Aarhus University where I head the Algorithms, Data Structures and Foundations of Machine Learning research group. My research interests span many areas of theoretical computer science, including theory of machine learning, data structures, algorithms, lower bounds and cryptography. I also serve as a Senior Scientific Expert at the Quantum and HPC startup Kvantify.

I received my Ph.D. from Aarhus University in 2013, under the supervision of Professor Lars Arge. I had the pleasure of being a member of the Young Academy under the Royal Danish Academy of Sciences and Letters from 2017-2022.

Website: https://cs.au.dk/~larsen/