FDS Statistics & Data Science Seminar

Unveiling In-Context Learning: Provable Training Dynamics and Feature Learning in Transformers

Speaker: Zhuoran Yang (Yale)

Assistant Professor of Statistics and Data Science and Computer Science

Yale University

Monday, March 31, 2025

3:30PM - 5:00PM

3:30pm - Tea and snacks in 1307
4:00pm - Talk in 1327

and via Webcast: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=23d16765-e107-4f8d-992a-b233012bcdb3

Abstract: In-context learning (ICL) is a cornerstone of large language model (LLM) functionality, yet its theoretical foundations remain elusive due to the complexity of transformer architectures. In particular, most existing theoretical work explains only how the attention mechanism facilitates ICL under certain data models. It remains unclear how the other building blocks of the transformer contribute to ICL. To address this question, we study how a simple softmax transformer is trained to perform ICL on two synthetic tasks: (multi-task) linear regression and n-gram Markov chains. We show that the transformer successfully learns these tasks in-context. More importantly, we interpret the estimator represented by the learned transformer, show how transformers are trained by gradient-based dynamics, and describe how features emerge during training. Our theory is further validated by experiments.

This is joint work with Siyu Chen, Jianliang He, Xintian Pan, Heejune Sheen, and Tianhao Wang.

Speaker bio: Zhuoran Yang is an Assistant Professor of Statistics and Data Science and Computer Science at Yale University. He is also affiliated with the Yale Institute for Foundations of Data Science and the Center for Algorithms, Data, and Market Design (CADMY) at Yale. His research lies at the intersection of machine learning, statistics, game theory, and optimization.

Yang’s recent work focuses on the foundations of reinforcement learning, particularly in multi-agent systems where agents interact strategically. Additionally, he explores the foundations of artificial intelligence, investigating the emergent behaviors of large language models during pre-training and post-training and their relationship to model architecture. His research is supported by NSF DMS 2413243.

Before joining Yale, Yang was a postdoctoral researcher at the University of California, Berkeley, under the mentorship of Michael I. Jordan. He earned his Ph.D. in Operations Research and Financial Engineering from Princeton University, co-advised by Jianqing Fan and Han Liu. He completed his bachelor’s degree in Mathematics at Tsinghua University in 2015.

Website: https://zhuoranyang.github.io/
