BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//6.6.3//EN
TZID:America/New_York
X-WR-TIMEZONE:America/New_York
BEGIN:VEVENT
UID:644@fds.yale.edu
DTSTART;TZID=America/New_York:20241028T160000
DTEND;TZID=America/New_York:20241028T170000
DTSTAMP:20241028T175657Z
URL:https://fds.yale.edu/events/sds-seminar-florentina-bunea-cornell/
SUMMARY:S&DS Seminar: Florentina Bunea (Cornell)\, "Learning Large Sof
 tmax Mixtures with Warm Start EM"
DESCRIPTION:\nMixed multinomial logits are discrete mixtures introduced sev
 eral decades ago to model the probability of choosing an attri
 bute x_j ∈ R^L from p possible candidates\, in heterogeneous popula
 tions. The model ha
s recently attracted attention in the AI literature\, under the name soft
 max mixtures\, where it is routinely used in the final layer of a neural n
 etwork to map a large number p of vectors in R^L to a probability vec
 tor. De
spite its wide applicability and empirical success\, statistically optimal
estimators of the mixture parameters\, obtained via algorithms whose run
ning time scales polynomially in L\, are not known. This paper provides a
solution to this problem for contemporary applications\, such as LLMs (L
arge Language Models)\, in which the mixture has a large number p of suppo
rt points\, and the size N of the sample observed from the mixture is als
o large. Our proposed estimator combines two classical estimators\, obtain
 ed respectively via a method of moments (MoM) and the expectation-maximiza
 tion (EM) algorithm. Although both estimator types have been studied\, fr
om a theoretical perspective\, for Gaussian mixtures\, no similar results
exist for softmax mixtures for either procedure. We develop a new MoM pa
rameter estimator based on latent moment estimation that is tailored to ou
 r model\, and provide the first theoretical analysis for a MoM-based proce
 dure in softmax mixtures. Although consistent\, as N\, p → ∞\, MoM f
 or so
ftmax mixtures can exhibit poor numerical performance\, an empirical obser
vation that is in line with those made for other mixture models. Neverthel
 ess\, as MoM is provably in a neighborhood of the target\, it can be use
 d as a warm start for any iterative algorithm. We study in detail the E
 M algorithm\, and provide its first theoretical analysis for softmax mix
 tures\, e
xtending the only other class of similar results\, valid for Gaussian mix
 tures. Our final proposal for parameter estimation is the EM algorithm wit
 h a MoM warm start. In addition to leading to the desired parametric estim
ation rates\, this combined procedure provides computational savings rela
tive to the standard practice of selecting one of the outputs of multiple
EM runs\, each initialized at random. These facts are supported by our s
 imulation studies. Concrete examples that substantiate the broad applicab
 ility of the model will be given throughout the talk.\n\n\n\n3:30pm: Pre-
 talk meet-and-greet teatime at 219 Prospect Street\, 13th floor. Light sn
 acks and beverages will be available in the kitchen area.\n\n\n\nBio: Flo
 rentin
a Bunea is a Professor in the Department of Statistics and Data Science at
Cornell University\, where she is also a member of the Graduate Fields of
Statistics\, Applied Mathematics\, and Computer Science. As a member of t
he Diversity and Inclusion Council of the Bowers College of Computing and
Information Science\, she is dedicated to promoting diversity within data
science disciplines.\n\n\n\nProfessor Bunea's research spans statistical m
achine learning theory and high-dimensional statistical inference\, with a
focus on developing new methodologies and sharp theoretical insights for
addressing a range of data science challenges. Her recent projects include
estimation and theory for soft-max mixtures to deepen the understanding o
f large language models (LLMs) and AI algorithms\, optimal transport for h
igh-dimensional mixture distributions\, and inference for the Wasserstein
distance in topic models. She is also working on high-dimensional latent-s
pace clustering\, cluster-based inference\, network modeling\, and latent
structure inference in high-dimensional models.\n\n\n\nHer research intere
sts extend to model selection\, sparsity\, and dimension reduction\, with
applications in fields such as genetics\, systems immunology\, neuroscienc
e\, sociology\, and economics. Professor Bunea's work is supported by the
National Science Foundation (NSF-DMS). She is a Fellow of the Institute of
Mathematical Statistics (IMS) and a recipient of the prestigious IMS Meda
llion Award. She has served as an Associate Editor for leading statistical
journals\, including Annals of Statistics\, Bernoulli\, JASA\, JRSS-B\, a
nd EJS\, and is a co-editor for the Chapman and Hall Statistics and Applie
 d Probability Monograph Series.
CATEGORIES:Statistics & Data Science Seminar
LOCATION:Yale Institute for Foundations of Data Science\, Kline Tower 13th
Floor\, Room 1327\, New Haven\, CT\, 06511\, United States
X-APPLE-STRUCTURED-LOCATION;VALUE=URI;X-ADDRESS=Kline Tower 13th Floor\, Ro
om 1327\, New Haven\, CT\, 06511\, United States;X-APPLE-RADIUS=100;X-TITL
E=Yale Institute for Foundations of Data Science:geo:0,0
END:VEVENT
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
END:VCALENDAR