Student Led Seminar

Dissertation Defense: "Learning Theory in the Wild: Foundations of Missing Data and Language Generation"

Speaker: Anay Mehrotra (Yale)

PhD Candidate

Yale University

Thursday, November 13, 2025

2:00PM - 3:00PM

Location: Yale Institute for Foundations of Data Science, Kline Tower 13th Floor, Room 1327, New Haven, CT 06511 and via Webcast: https://yale.zoom.us/j/98819552424

Abstract: What can be learned from data? This fundamental question in machine learning takes on new complexity in modern pipelines where classical assumptions fail—both in how data is generated and in how learning objectives are defined. This thesis develops foundations for learning under these complex conditions, revealing how violations of traditional assumptions transform not just the difficulty of learning, but the very nature of what is learnable. 

Part I addresses learning under selection filters that break the symmetry between training and test distributions. We complete the characterization of positive-only learning—a problem dating back to early developments in probably approximately correct (PAC) learning—and develop a smoothed analysis of positive-only learning that bypasses classical impossibility results to yield algorithms with the potential to improve practical performance. These theoretical tools enable faster algorithms for truncated statistics and establish surprising connections between causal inference and learning theory, culminating in a characterization of when treatment effects are identifiable from observational data. 

Part II studies language generation, a task that does not fit well into traditional loss-minimization frameworks. On the theoretical side, we study a recent model of language generation and prove results that uncover both the potential and the limitations of large language models (LLMs), including an inherent trade-off between hallucinations and mode collapse. On the empirical side, we introduce an effective “jailbreaking” method that has subsequently been used for safety testing of frontier LLMs in industry.

Across both parts, missingness emerges as the unifying principle: language models cannot avoid hallucinations precisely because they never see explicitly labeled invalid examples during training—the same fundamental challenge that underlies the statistical problems in Part I.

Advisors: 
Manolis Zampetakis (Yale) and Amin Karbasi (now at Cisco)

Other committee members:
Yang Cai (Yale), Pravesh K. Kothari (Princeton), Daniel A. Spielman (Yale)

Alternate ways to join the Zoom meeting:

Join from PC, Mac, Linux, iOS or Android: https://yale.zoom.us/j/98819552424
Telephone: 203-432-9666 (2-ZOOM if on-campus) or 646 568 7788
One Tap Mobile: +12034329666,,98819552424# US (Bridgeport)
Meeting ID: 988 1955 2424
International numbers available: https://yale.zoom.us/u/abQA0UzyMS

Bio: Anay Mehrotra is a Ph.D. candidate in Computer Science at Yale University, advised by Professors Amin Karbasi and Manolis Zampetakis. His research applies learning-theoretic tools to problems involving missing data—spanning causal inference, truncated statistics, and omissions shaped by societal biases. Through this work, he develops principled methods to understand both the potential and the limitations of modern AI systems, often contributing new insights back to the foundations of learning theory.

Anay’s work has received the Best Paper Award at the Conference on Learning Theory (COLT), been featured in WIRED, and earned him the Sir Binay Kumar Sinha Award from the Indian Institute of Technology Kanpur, where he completed his undergraduate studies. A former ICPC World Finalist representing IIT Kanpur, he now shares his enthusiasm for algorithms and problem-solving as an instructor with the Yale ICPC Club.

Beyond his research accomplishments, Anay is a wonderful colleague and a delightful presence in the FDS community—curious, generous with his ideas, and deeply engaged in collaborative work. We are thrilled to count him as part of FDS.
