S&DS Seminar: Theodor Misiakiewicz (Stanford)


Mason Lab 211 with remote access option, 9 Hillhouse Avenue, New Haven, CT 06520

Speaker: Theodor Misiakiewicz, Stanford University

In-person seminars will be held at Mason Lab 211 with optional remote access:
(https://yale.hosted.panopto.com/Panopto/Pages/Sessions/List.aspx?folderID=f8b73c34-a27b-42a7-a073-af2d00f90ffa)

New Statistical and Computational Phenomena From Deep Learning

Abstract: Deep learning methodology has presented major challenges for statistical learning theory. Indeed, deep neural networks often operate in regimes outside the realm of classical statistics and optimization wisdom. In this talk, we will consider two illustrative examples that clarify some of these new challenges. The first example considers an instance where kernel ridge regression with a simple RBF kernel achieves optimal test error while perfectly fitting the noisy training data. Why can we interpolate noisy data and still generalize well? Why can overfitting be benign in kernel ridge regression? The second example, computational in nature, considers fitting two different smooth ridge functions with deep neural networks (DNNs). Both can be estimated at the same near-parametric rate by DNNs trained with unbounded computational resources. However, empirically, learning becomes much harder for one of these functions when restricted to DNNs trained using SGD. Why does SGD succeed on some functions and fail on others? The goal of this talk will be to understand these two simulations. In particular, we will present quantitative theories that precisely capture both phenomena.
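A toy version of the first simulation can be run in a few lines. The sketch below is our own illustration, not the speaker's code: it fits kernel ridge regression with an RBF kernel to noisy labels at near-zero regularization, so the fit essentially interpolates the training data, and compares its test error to a conventionally regularized fit. The target function, noise level, dimension, and sample sizes are illustrative assumptions.

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
d, n_train, n_test = 10, 500, 2000

def target(X):
    # simple smooth target function; purely illustrative
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

X_train = rng.standard_normal((n_train, d))
y_train = target(X_train) + 0.5 * rng.standard_normal(n_train)  # noisy labels
X_test = rng.standard_normal((n_test, d))
y_test = target(X_test)  # noiseless targets to measure test error

for alpha in [1e-8, 1.0]:  # near-interpolation vs. a standard ridge penalty
    model = KernelRidge(alpha=alpha, kernel="rbf", gamma=1.0 / d)
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"alpha={alpha:g}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")

With alpha near zero the training error is driven to (almost) zero even though the labels are noisy; the interesting question raised in the talk is when such interpolating fits nonetheless achieve good, or even optimal, test error.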

Bio: My interests lie broadly at the intersection of statistics, machine learning, probability, and computer science. Lately, I have been focusing on the statistical and computational aspects of deep learning, and on the performance of kernel and random feature methods in high dimension. Some of the questions I am currently interested in: When can we expect neural networks to outperform kernel methods? When can neural networks beat the curse of dimensionality? On the other hand, what are the computational limits of gradient-trained neural networks? What structures in real data allow for efficient learning? When is overfitting benign? How much overparametrization is optimal? When can we expect universal or non-universal behavior in empirical risk minimization? Website: https://misiakie.github.io/

Monday, March 06, 2023

3:30pm – Pre-talk meet-and-greet teatime – Dana House, 24 Hillhouse Avenue

4:00pm – 5:00pm – Talk – Mason Lab 211, 9 Hillhouse Avenue, with the option of virtual participation