FDS Colloquium

Learning Hard Problems with Neural Networks and Language Models


Speaker: Eran Malach
Research Fellow, Kempner Institute, Harvard University

Wednesday, January 15, 2025

11:30 AM - 1:00 PM

Lunch at 11:30 AM in Room 1307
Talk from 12:00 to 1:00 PM in Room 1327A

Location: Yale Institute for Foundations of Data Science, Kline Tower 13th Floor, Room 1327, New Haven, CT 06511 and via Webcast: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=6f3e7e6c-c1ef-4066-b4f2-b2590138c39

Speaker bio: Eran Malach is a postdoctoral Research Fellow at the Kempner Institute at Harvard University. Previously, he completed his PhD at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, advised by Prof. Shai Shalev-Shwartz. His research focuses on machine learning and the theoretical foundations of deep learning and language models, with a particular interest in the computational aspects of learning and optimization. He also worked at Mobileye, where he developed machine learning and computer vision algorithms for driver-assistance systems and self-driving cars. His research is supported by the Rothschild Fellowship, the William F. Milton Fund, and the OpenAI Superalignment Fast Grant.

Abstract: Modern machine learning models, and in particular large language models, can now solve surprisingly complex mathematical reasoning problems. In this talk I will explore how neural networks and autoregressive language models learn to solve computationally hard reasoning tasks. I will begin by discussing the sparse parity problem, a theoretical proxy for studying the challenges of learning complex functions with stochastic gradient descent (SGD). I will show that the computational resources required to learn sparse parities with SGD scale exponentially with the “sparsity” of the problem, making it computationally hard to learn. Next, I will demonstrate how introducing step-by-step supervision through autoregressive language models overcomes these barriers, enabling simple models trained on next-token prediction to efficiently learn any Turing-computable function. These results serve as a basis for studying machine learning with language models, with implications for data structure, architecture design, and training paradigms.
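To make the sparse parity setup concrete, here is a minimal illustrative sketch in Python/NumPy (names are hypothetical, not code from the talk): the label is the XOR of k hidden "relevant" coordinates among n uniformly random input bits, and a learner must recover that hidden support from labeled samples.

import numpy as np

# Illustrative sketch of the k-sparse parity learning task from the abstract.
# The label is the parity (XOR) of k hidden coordinates out of n input bits;
# per the abstract, the resources SGD needs to learn this scale exponentially
# with the sparsity k.

def sample_sparse_parity(n_samples, n_bits=50, k=3, seed=0):
    rng = np.random.default_rng(seed)
    relevant = rng.choice(n_bits, size=k, replace=False)  # hidden support
    X = rng.integers(0, 2, size=(n_samples, n_bits))      # uniform random bits
    y = X[:, relevant].sum(axis=1) % 2                    # XOR of the k relevant bits
    return X, y, relevant

X, y, support = sample_sparse_parity(1000)
print("hidden support:", sorted(support), "label balance:", y.mean())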
