Newsroom
Deep Learning
-
FDS Colloquium: Jinchao Xu (KAUST), “Finite Element versus Finite Neuron Methods”
Talk summary: This talk presents a unified framework connecting Barron and Sobolev spaces to analyze the approximation properties of ReLU$^k$ neural networks. It establishes both classical and new sharp approximation rates, showing that for functions in the relevant Barron space, ReLU$^k$ networks can achieve high accuracy without the curse of dimensionality. The same convergence rate…
-
S&DS Seminar: Jingfeng Wu (Berkeley), “Gradient Descent Dominates Ridge: A Statistical View on Implicit Regularization”
Talk summary: A key puzzle in deep learning is how simple gradient methods find generalizable solutions without explicit regularization. This talk discusses the implicit regularization of gradient descent (GD) through the lens of statistical dominance. Using least squares as a clean proxy, we present two surprising findings. First, GD dominates ridge regression. For any well-specified…
-
S&DS Seminar: Zhuoran Yang (Yale), “Unveiling In-Context Learning: Provable Training Dynamics and Feature Learning in Transformers”
Abstract: In-context learning (ICL) is a cornerstone of large language model (LLM) functionality, yet its theoretical foundations remain elusive due to the complexity of transformer architectures. In particular, most existing work only theoretically explains how the attention mechanism facilitates ICL under certain data models. It remains unclear how the other building blocks of the transformer contribute…
-
S&DS Seminar: Blake Bordelon (Harvard), “Scaling Limits and Scaling Laws of Deep Learning”
Abstract: Scaling up the size and training horizon of deep learning models has enabled breakthroughs in computer vision and natural language processing. Empirical evidence suggests that these neural network models are described by regular scaling laws in which the performance of finite-parameter models improves as model size increases, eventually approaching a limit described by the performance of…
