FDS Statistics & Data Science Seminar
Learning Large Softmax Mixtures with Warm Start EM
Speaker: Florentina Bunea (Cornell), Professor, Department of Statistics and Data Science, Cornell University
Monday, October 28, 2024, 4:00 PM - 5:00 PM
3:30 PM – Pre-talk meet and greet teatime at 219 Prospect Street, 13th floor. There will be light snacks and beverages in the kitchen area.
Location: Yale Institute for Foundations of Data Science, Kline Tower 13th Floor, Room 1327, New Haven, CT 06511, and via Webcast: https://yale.zoom.us/j/94223816617
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute x_j ∈ ℝ^L from p possible candidates in heterogeneous populations. The model has recently attracted attention in the AI literature under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number p of vectors in ℝ^L to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in L, are not known. This paper provides a solution to this problem for contemporary applications, such as LLMs (Large Language Models), in which the mixture has a large number p of support points and the size N of the sample observed from the mixture is also large. Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both estimator types have been studied from a theoretical perspective for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis of a MoM-based procedure in softmax mixtures. Although consistent as N, p → ∞, MoM for softmax mixtures can exhibit poor numerical performance, an empirical observation in line with those made for other mixture models. Nevertheless, because the MoM estimator is provably in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study the EM algorithm in detail and provide its first theoretical analysis for softmax mixtures, extending the only other class of similar results, which is valid for Gaussian mixtures. Our final proposal for parameter estimation is the EM algorithm with a MoM warm start.
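For reference, the model and the generic EM updates can be written as follows. The notation is an assumption made for this sketch and is not taken from the talk: K components with weights α_k and parameters β_k ∈ ℝ^L, candidates x_1, …, x_p ∈ ℝ^L, and observations Y_1, …, Y_N taking values in {1, …, p}. The E- and M-steps below are the textbook EM updates for a finite mixture specialized to softmax components; they do not reproduce the specific analysis presented in the talk.

```latex
\begin{aligned}
\mathbb{P}(Y = j) &= \sum_{k=1}^{K} \alpha_k \, \pi_j(\beta_k), \qquad
\pi_j(\beta) = \frac{\exp(x_j^\top \beta)}{\sum_{l=1}^{p} \exp(x_l^\top \beta)}, \qquad j = 1, \dots, p,\\
\text{E-step:}\quad \gamma_{ik} &= \frac{\alpha_k \, \pi_{Y_i}(\beta_k)}{\sum_{m=1}^{K} \alpha_m \, \pi_{Y_i}(\beta_m)}, \qquad i = 1, \dots, N,\\
\text{M-step:}\quad \alpha_k &\leftarrow \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}, \qquad
\beta_k \leftarrow \arg\max_{\beta \in \mathbb{R}^L} \sum_{i=1}^{N} \gamma_{ik} \log \pi_{Y_i}(\beta).
\end{aligned}
```

The β-update has no closed form, but its objective is concave in β, because log π_j(β) = x_j^⊤β − log Σ_l exp(x_l^⊤β) is a linear function minus a convex log-sum-exp.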
In addition to achieving the desired parametric estimation rates, this combined procedure provides computational savings relative to the standard practice of selecting one of the outputs of multiple EM runs, each initialized at random. These findings are supported by our simulation studies. Concrete examples illustrating the model's broad applicability will be given throughout the talk.
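To make the two strategies concrete, here is a minimal plain-Python sketch of generalized EM for a K-component softmax mixture, together with the random-restart baseline described above. It is an illustration under stated assumptions, not the estimator analyzed in the talk. The MoM estimator is not reproduced: the `warm` list is a hypothetical stand-in for a MoM estimate. The function names (`em_softmax_mixture`, `best_of_random_restarts`), the use of a few gradient-ascent steps in place of an exact β-maximization, the step size, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def softmax_probs(X, beta):
    """pi_j(beta) = exp(<x_j, beta>) / sum_l exp(<x_l, beta>) for every row x_j of X."""
    scores = [sum(a * b for a, b in zip(x, beta)) for x in X]
    top = max(scores)  # subtract the largest score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(counts, X, alphas, betas):
    """sum_j counts[j] * log(sum_k alpha_k * pi_j(beta_k))."""
    comps = [softmax_probs(X, b) for b in betas]
    return sum(
        c * math.log(sum(a * pk[j] for a, pk in zip(alphas, comps)))
        for j, c in enumerate(counts)
        if c > 0
    )


def em_softmax_mixture(counts, X, alphas, betas, n_iter=100, lr=0.5, inner_steps=5):
    """Generalized EM for a softmax mixture (hypothetical helper, not the talk's code).

    counts[j] = number of the N observations equal to candidate j. Assumes every row
    of X has Euclidean norm <= 1, so lr = 0.5 keeps each beta step an ascent step.
    """
    n_total = sum(counts)
    p, K, L = len(X), len(alphas), len(X[0])
    alphas, betas = list(alphas), [list(b) for b in betas]
    for _ in range(n_iter):
        comps = [softmax_probs(X, b) for b in betas]
        # E-step: gamma[j][k] = posterior probability that an observation of j came from k.
        gamma = []
        for j in range(p):
            w = [alphas[k] * comps[k][j] for k in range(K)]
            s = sum(w)
            gamma.append([v / s for v in w])
        # M-step for the mixing weights: closed form.
        nk = [sum(counts[j] * gamma[j][k] for j in range(p)) for k in range(K)]
        alphas = [v / n_total for v in nk]
        # M-step for each beta_k: the weighted log-softmax objective is concave with no
        # closed-form maximizer, so take a few gradient-ascent steps (generalized EM).
        for k in range(K):
            for _ in range(inner_steps):
                pi = softmax_probs(X, betas[k])
                grad = [0.0] * L
                for j in range(p):
                    coef = (counts[j] * gamma[j][k] - nk[k] * pi[j]) / n_total
                    for d in range(L):
                        grad[d] += coef * X[j][d]
                betas[k] = [b + lr * g for b, g in zip(betas[k], grad)]
    return alphas, betas


def best_of_random_restarts(counts, X, K, n_starts, seed=0, **em_kwargs):
    """Baseline: run EM from several random starts and keep the highest-likelihood fit."""
    rng = random.Random(seed)
    best_ll, best_fit = -math.inf, None
    for _ in range(n_starts):
        init = [[rng.gauss(0.0, 1.0) for _ in X[0]] for _ in range(K)]
        fit = em_softmax_mixture(counts, X, [1.0 / K] * K, init, **em_kwargs)
        ll = log_likelihood(counts, X, *fit)
        if ll > best_ll:
            best_ll, best_fit = ll, fit
    return best_fit


# Synthetic example: p = 40 unit vectors in R^2, K = 2 components, N = 5000 draws.
rng = random.Random(1)
p = 40
X = [[math.cos(2 * math.pi * j / p), math.sin(2 * math.pi * j / p)] for j in range(p)]
true_alphas, true_betas = [0.4, 0.6], [[3.0, 0.0], [-3.0, 0.0]]
true_comps = [softmax_probs(X, b) for b in true_betas]
probs = [sum(a * c[j] for a, c in zip(true_alphas, true_comps)) for j in range(p)]
draws = rng.choices(range(p), weights=probs, k=5000)
counts = [draws.count(j) for j in range(p)]

warm = [[2.0, 0.5], [-2.0, -0.5]]  # hypothetical stand-in for a MoM estimate
alphas_hat, betas_hat = em_softmax_mixture(counts, X, [0.5, 0.5], warm)
```

With a warm start, EM is run once; the baseline runs it `n_starts` times and compares likelihoods, which is the cost a good initialization is meant to avoid. Because each gradient step increases the M-step objective, the log-likelihood is non-decreasing across iterations (the usual generalized-EM guarantee), but, as with any EM variant, only a local maximum is guaranteed.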
Bio: Florentina Bunea is a Professor in the Department of Statistics and Data Science at Cornell University, where she is also a member of the Graduate Fields of Statistics, Applied Mathematics, and Computer Science. As a member of the Diversity and Inclusion Council of the Bowers College of Computing and Information Science, she is dedicated to promoting diversity within data science disciplines.
Professor Bunea’s research spans statistical machine learning theory and high-dimensional statistical inference, with a focus on developing new methodologies and sharp theoretical insights for addressing a range of data science challenges. Her recent projects include estimation and theory for softmax mixtures to deepen the understanding of large language models (LLMs) and AI algorithms, optimal transport for high-dimensional mixture distributions, and inference for the Wasserstein distance in topic models. She is also working on high-dimensional latent-space clustering, cluster-based inference, network modeling, and latent structure inference in high-dimensional models.
Her research interests extend to model selection, sparsity, and dimension reduction, with applications in fields such as genetics, systems immunology, neuroscience, sociology, and economics. Professor Bunea’s work is supported by the National Science Foundation (NSF-DMS). She is a Fellow of the Institute of Mathematical Statistics (IMS) and a recipient of the prestigious IMS Medallion Award. She has served as an Associate Editor for leading statistical journals, including Annals of Statistics, Bernoulli, JASA, JRSS-B, and EJS, and is a co-editor for the Chapman and Hall Statistics and Applied Probability Monograph Series.