Events
FDS Colloquium
Learning Hard Problems with Neural Networks and Language Models
Speaker: Eran Malach, Research Fellow, Harvard University
Wednesday, January 15, 2025, 11:30AM - 1:00PM
Lunch at 11:30am in room 1307; talk from 12:00-1:00pm in room 1327A
Location: Yale Institute for Foundations of Data Science, Kline Tower 13th Floor, Room 1327, New Haven, CT 06511
Webcast: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=6f3e7e6c-c1ef-4066-b4f2-b2590138c39
Speaker bio: Eran Malach is a postdoctoral Research Fellow at the Kempner Institute at Harvard University. Previously, he completed his PhD at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, advised by Prof. Shai Shalev-Shwartz. His research focuses on machine learning and the theoretical foundations of deep learning and language models, with a particular interest in the computational aspects of learning and optimization. He has also worked at Mobileye, where he developed machine learning and computer vision algorithms for driver-assistance systems and self-driving cars. His research is supported by the Rothschild Fellowship, the William F. Milton Fund, and the OpenAI Superalignment Fast Grant.
Abstract: Modern machine learning models, and in particular large language models, can now solve surprisingly complex mathematical reasoning problems. In this talk I will explore how neural networks and autoregressive language models can learn to solve computationally hard reasoning tasks. I will begin by discussing the sparse parity problem, a theoretical proxy for studying the challenges of learning complex functions with Stochastic Gradient Descent (SGD). I will show that the computational resources required for learning sparse parities with SGD scale exponentially with the “sparsity” of the problem, making them computationally hard to learn. Next, I will demonstrate how introducing step-by-step supervision through autoregressive language models overcomes these barriers, enabling simple models trained on next-token prediction to efficiently learn any Turing-computable function. These results serve as a basis for studying machine learning with language models, with implications for data structure, architecture design, and training paradigms.
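For readers unfamiliar with the sparse parity problem mentioned in the abstract, the sketch below illustrates the task: the label of each example is the XOR (parity) of a small hidden subset of the input bits. This is an illustrative setup only (function names and parameters are hypothetical), not the speaker's code or formulation.

```python
import random

def make_sparse_parity_dataset(n_bits=20, sparsity=3, n_samples=1000, seed=0):
    """Generate a sparse parity dataset.

    Each input is a random vector of `n_bits` bits; its label is the
    parity (XOR) of a hidden subset of `sparsity` coordinates. A learner
    sees only (x, y) pairs and must discover which bits matter.
    """
    rng = random.Random(seed)
    support = rng.sample(range(n_bits), sparsity)  # hidden relevant coordinates
    data = []
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        y = sum(x[i] for i in support) % 2  # XOR of the hidden bits
        data.append((x, y))
    return data, support

data, support = make_sparse_parity_dataset()
```

Because each label depends only on `sparsity` out of `n_bits` coordinates, no single bit correlates with the label on its own, which is the intuition behind why gradient-based learners need resources growing with the sparsity.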
Submit an Event
Interested in creating your own event, or have an event to share? Please fill out the form to have your event added to the calendar.