This Event has Passed
Project Match

FDS Data Science Project Match

Wednesday, August 27, 2025

4:00PM - 5:00PM

We start on time! Don't be late!

Location: Yale Institute for Foundations of Data Science, Kline Tower 13th Floor, Room 1327, New Haven, CT 06511 and via Webcast: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=deaeca36-307e-498f-b86e-b331010d1107

Speaker: Albert Higgins-Chen (YSM)

Assistant Professor of Psychiatry

Yale School of Medicine

Talk Title: "Supercharging Aging and Longevity Biomarker Research with AI and LLMs"

Project Description: A massive effort is underway to bring various interventions, known to slow the aging process in animals, to human clinical trials to prevent numerous age-related diseases. Because humans live much longer than laboratory animals, numerous aging biomarkers have been developed as a proposed means to assess efficacy. We have built TranslAGE, a massive knowledgebase of 180 harmonized datasets and nearly 2,000 aging biomarkers, which allows us to systematically benchmark aging biomarkers across all datasets based on A) Prognostic value for various age-related diseases, B) Stability in the absence of interventions or health changes, and C) Responsiveness to 87 different pro-longevity interventions and 25 different pro-aging events. Projects in the lab fall into two categories. First, lab members may leverage this unique TranslAGE knowledgebase to investigate targeted questions – for example, how do stress, mental health, or environmental exposures affect aging and what are the consequences for cancer or dementia risk? Second, lab members may train LLMs and AI agents to work with TranslAGE and supercharge aging research, develop the next generation of intelligent aging biomarkers that shed light on causal mechanisms of aging, and guide personalized interventions to prevent age-related diseases in humans.


Speaker: Jamie Tucker-Foltz (SOM)

Assistant Professor of Operations and Computer Science

Yale School of Management

Talk Title: "Faster Sampling of Redistricting Maps Using Planar Embedding Data"

Project Description: Independent redistricting commissions and courts are increasingly relying on algorithms to assess whether a given map is gerrymandered (meaning intentionally drawn to disadvantage a political party or group). The dominant, data-driven approach relies on sampling a massive ensemble of “random” maps, formally represented as partitions of a graph, where the vertices represent indivisible geographic units like census blocks or voting precincts. This is a challenging computational problem involving up to hundreds of thousands of vertices. We have developed a new, theoretically faster algorithm leveraging additional topological information about the embedding of the underlying graph in the plane. Existing algorithms do not take this structural data into account, and it is not currently available in any data repository (though it should be inferable from the geographic data). We would like to implement this algorithm and contribute it to the open-source GerryChain Python package. The ideal candidate would be someone with exceptional programming skills and an ability to reason about difficult geometric/topological problems.


Speaker: Quanquan Liu (Yale)

Assistant Professor of Computer Science

Yale University

Talk Title: "Large-Language-Model-Assisted Graph Analysis"

Project Description: This project explores how large language models (LLMs) can augment and accelerate graph analysis by integrating natural-language reasoning with algorithmic graph computation. We investigate methods for using LLMs to interpret graph structures, suggest relevant analysis techniques, and generate efficient code for large-scale graph algorithms. Target applications include social network analysis, biological interaction networks, and knowledge graph exploration, with the goal of lowering the barrier to advanced graph analytics for non-expert users while enhancing productivity for domain specialists.


Speaker: Xiang Zhou (Yale)

Professor of Statistics and Data Science

Yale University

Talk Title: "Statistical and Computational Challenges in Spatial Transcriptomics"

Project Description: Spatial transcriptomics represents an exciting frontier in genomic research, offering powerful new technologies that enable the profiling of gene expression within intact tissues while preserving spatial context. By combining spatial localization with gene expression data, these technologies provide a richer, more nuanced understanding of cellular function and tissue organization. Spatial transcriptomics has already been successfully applied across a wide range of tissue types, uncovering novel biological insights and transforming our understanding of genome biology. Despite these rapid experimental advances, computational and statistical methods have lagged behind. The complexity, scale, and multi-modal nature of modern single-cell and spatial omics data present significant challenges for data analysis and interpretation. Existing tools often fall short in addressing high-resolution, large-scale datasets that integrate multiple layers of biological information. In this session, we will briefly introduce several open biological questions that demand the development of new computational and statistical approaches. These include subcellular spatial modeling to localize RNA within cellular compartments, cell segmentation and annotation in dense tissue structures, integration of multi-omics data such as transcriptomics, proteomics, and epigenomics, and modeling of spatial cell-cell interactions and networks, among others. This is a unique opportunity to contribute to cutting-edge statistical and computational methods development at the intersection of statistics, computation, and spatial genomics.


Speaker: Shuangping Li (Yale)

Assistant Professor of Statistics and Data Science

Yale University

Talk Title: "Algorithms and Data Problems in the Perceptron Model"

Project Description: The binary perceptron is a deceptively simple model at the crossroads of machine learning theory, statistical physics, and computational complexity. It asks: can we find a vector of {1, -1}’s that satisfies a large number of random constraints? The mystery is that solutions live in rare, “frozen” regions of the search space, seemingly inaccessible to most algorithms, yet somehow discoverable by certain ingenious methods. In this project, we will study these frozen structures, analyze the behavior of different classes of algorithms on them, and investigate the “information–computation gap,” where problems are theoretically solvable but computationally challenging. We will also explore the perceptron’s connection to real-world tasks in causal inference, where balancing covariates and designing experiments are closely related to perceptron-style constraints. Students will gain hands-on experience in probability, algorithms, computational experiments, and causal inference, while engaging with questions that connect deep theory to impactful applications.
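As a concrete illustration of the question posed above, the following minimal Python sketch counts satisfying sign vectors by brute force. It assumes one common formulation that the description does not spell out: constraint vectors with i.i.d. standard Gaussian entries, and the requirement that each inner product be at least kappa * sqrt(n). The function names and the parameters n, m, and kappa are hypothetical, chosen for illustration only.

```python
import itertools
import math
import random


def sample_constraints(m, n, rng):
    """Draw m constraint vectors of length n with i.i.d. standard Gaussian entries (an assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def satisfies(x, constraints, kappa=0.0):
    """Return True if <g, x> >= kappa * sqrt(n) for every constraint vector g."""
    threshold = kappa * math.sqrt(len(x))
    return all(
        sum(gi * xi for gi, xi in zip(g, x)) >= threshold for g in constraints
    )


def count_solutions(constraints, n, kappa=0.0):
    """Count satisfying vectors among all 2^n sign vectors (feasible only for small n)."""
    return sum(
        satisfies(x, constraints, kappa)
        for x in itertools.product((-1, 1), repeat=n)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n = 12
    # Each run draws a fresh set of m constraints and reports how many
    # of the 2^n sign vectors satisfy all of them.
    for m in (0, 3, 6, 9):
        constraints = sample_constraints(m, n, rng)
        print(f"m={m}: {count_solutions(constraints, n)} of {2 ** n} vectors satisfy all constraints")
```

Brute force is only practical for small n. The sketch shows how the number of solutions shrinks as constraints are added; it does not represent the algorithms studied in the project.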


Speaker: Harsh Parikh (YSPH)

Assistant Professor of Biostatistics

Yale School of Public Health

Talk Title: "Methods for Estimating Effects of Health Shock on Social and Economic Outcomes"

Project Description: This project develops new causal inference methodologies, specifically focusing on event-study approaches, to estimate the effects of health shocks on socio-economic outcomes. The methodological work will address key challenges in identifying causal relationships between adverse health events (such as breast cancer or alcohol use disorder) and subsequent economic trajectories. Using Denmark’s comprehensive registry data as our empirical setting, we will develop and validate these methods to account for complex temporal dynamics and potential confounding factors. The resulting methodological framework will enable health economists at the University of Southern Denmark (SDU) to more precisely estimate how health shocks affect economic outcomes, and how these effects are mediated by private insurance and social security systems. In collaboration with researchers from SDU, this theoretical work will advance the toolkit available for causal inference in health economics research.


Speaker: Manolis Zampetakis (Yale)

Assistant Professor of Computer Science

Yale University

Talk Title: "Learning-Augmented Statistical Inference"

Project Description: Capitalizing on the widespread success of machine learning (ML) models in predictive tasks, the field of learning-augmented algorithms seeks to answer a key question: Can we enhance the performance of classical algorithms by incorporating the predictions of ML models trained on historical data? A critical challenge in this context is that the accuracy of these predictions is unknown. Therefore, learning-augmented algorithms are designed with two main objectives: achieving optimal performance when predictions are accurate and maintaining the worst-case guarantees of classical algorithms when predictions are arbitrarily poor. The goal of this project is to apply recent theoretical developments in this area to real-world applications.
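The two objectives described above can be made concrete with ski rental, a standard textbook example from the learning-augmented algorithms literature. It is not taken from this project description and is not a statistical inference problem. In the sketch below, renting costs 1 per day, buying costs buy_cost, and a prediction of the number of ski days decides whether to buy early or late. The function names and the trust parameter (a value in (0, 1], where smaller values rely on the prediction more) are hypothetical names used for illustration.

```python
import math


def buy_day(prediction, buy_cost, trust):
    """Day on which to buy skis, given a predicted number of ski days."""
    if prediction >= buy_cost:
        # Prediction suggests a long season: buy early.
        return math.ceil(trust * buy_cost)
    # Prediction suggests a short season: delay buying.
    return math.ceil(buy_cost / trust)


def cost(actual_days, prediction, buy_cost, trust):
    """Total cost paid when the season lasts actual_days."""
    k = buy_day(prediction, buy_cost, trust)
    if actual_days < k:
        return actual_days           # season ended before the purchase day
    return (k - 1) + buy_cost        # rented for k - 1 days, then bought


def optimal_cost(actual_days, buy_cost):
    """Cost of the best decision in hindsight."""
    return min(actual_days, buy_cost)


if __name__ == "__main__":
    buy_cost, trust = 10, 0.5
    for prediction in (3, 30):
        for actual in (3, 30):
            ratio = cost(actual, prediction, buy_cost, trust) / optimal_cost(actual, buy_cost)
            print(f"predicted={prediction:2d} actual={actual:2d} cost/optimal={ratio:.2f}")
```

With this rule, the cost is at most (1 + trust) times optimal when the prediction is exact, and at most (1 + 1/trust) times optimal no matter how wrong the prediction is. Setting trust to 1 ignores the prediction and recovers the classical break-even strategy.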


Speaker: Emma Zang (Yale)

Associate Professor of Sociology, Biostatistics and Global Affairs

Yale University

Talk Title: "Mapping Neighborhood Disadvantage with AI: Building the Next-Generation Area Deprivation Index"

Project Description: Neighborhood conditions are powerful predictors of health, education, and economic opportunity, yet widely used measures like the Area Deprivation Index (ADI) are based on survey data updated infrequently. This project uses Google Street View imagery, satellite data, and U.S. Census indicators to train machine learning models that quantify neighborhood disinvestment at fine geographic scales.

Students will help develop computer vision and geospatial models, engineer features from imagery, and test model performance against existing indices. They will also explore bias detection and validation using real-world health and education outcomes. The goal is to produce a more timely, precise, and equitable measure of neighborhood disadvantage that can inform policy and research.


Speaker: Joel Rozowsky for Mark Gerstein (Yale)

Research Scientist, Molecular Biophysics and Biochemistry

Yale University

Talk Title: "Genomics & Bioinformatics Research Opportunities in the Gerstein Lab"

Project Description: The Gerstein lab conducts bioinformatics/data science research in the biomedical and genomic fields. We use various computational analytics methods including machine learning and AI techniques to analyze large biomedical datasets. The lab has particular focuses on the following areas of research: genomic privacy, personal genomes, genome annotation and neurogenomics.

Mark Gerstein is the Albert L. Williams Professor of Biomedical Informatics, Molecular Biophysics & Biochemistry, Computer Science and Statistics and Data Science.


Speaker: Maria Rodriguez Martinez (YSM)

Associate Professor of Biomedical Informatics and Data Science

Yale School of Medicine

Talk Title: "Rank-and-refine: Structure-aware antibody affinity ranking for gastric cancer"

Project Description: Gastric cancer remains a leading cause of cancer mortality, and CLDN18.2 is an emerging therapeutic target. Prioritizing antibody candidates for follow-up experiments is still slow and labor-intensive. Our lab has developed a deep-learning framework for antibody structural modeling that represents antibody–antigen interfaces as graphs and learns to rank pairs by relative affinity from 3D geometric and contact features. In this project, we will adapt that framework to gastric cancer and benchmark it against alternatives that rely only on sequence information, as well as current structure-prediction and generative approaches (described methodologically, without naming specific models).

Methodology: The student will (1) reproduce our baseline structure-aware ranking pipeline on a curated antibody–antigen dataset, verifying quality metrics and generating sanity-check plots; (2) apply the pipeline to CLDN18.2 by first building structural representations of the target with state-of-the-art prediction tools, assessing confidence with standard quality metrics, and, if needed, refining with geometric and energy-based procedures; (3) generate a candidate list using established sequence-based methods for antibody–antigen interaction scoring; and (4) construct structural models for the top candidates and apply our structure-aware ranking framework to prioritize antibodies for experimental validation. All code and results will be delivered as reproducible notebooks with clear evaluation tables and ablation notes.

Conclusion: The project will deliver a tested, end-to-end pipeline and a prioritized shortlist of CLDN18.2 antibody candidates, along with quantitative evidence of when structure adds value beyond sequence, directly advancing our gastric-cancer goals.

Desired skills: Python, machine learning, familiarity with deep learning for protein modeling (or willingness to learn), data wrangling, and Git; interest in immunology/structural biology is a plus.


Speaker: Nils Rudi (SOM)

Professor of Operations Management

Yale School of Management

Talk Title: "How can I be a better teacher when not teaching?"

Project Description: This project explores how students can better learn probability—and quantitative subjects more generally—outside the classroom, where most learning is believed to occur. Building on tools such as QR-linked course booklets, semi-automated feedback and grading, two-stage homework, solution videos, and gamified practice problems, the goal is to investigate how such resources can be designed and integrated to improve learning outcomes.

Speaker: (Not Presenting) Purushottam Dixit (Yale)

Assistant Professor, Department of Biomedical Engineering

Yale University

Talk Title: "Detecting dimensionality of microbiomes"

Project Description: Microbiomes are some of the most complex ecosystems we know, with thousands of microbial species interacting in ways we don’t fully understand. In this project, we plan to develop new statistical methods to estimate the effective niche dimensionality of these communities, that is, the number of hidden environmental variables that shape species coexistence. We’ll use latent-variable models inspired by consumer–resource theory, which makes this ecological problem closely related to nonlinear matrix factorization and shared latent embedding. The exciting part is that the hidden factors we uncover won’t just be abstract; they correspond to real ecological constraints. We’re looking for students interested in statistics and data science who want to help us design these models, test them on large-scale microbiome datasets, and explore how modern statistical tools can reveal the “invisible axes” of biodiversity.

Speaker: (Not Presenting) Brian Macdonald (Yale)

Senior Lecturer and Research Scientist; Co-Director of Undergraduate Studies, Department of Statistics and Data Science

Yale University

Talk Title: "Opportunities in the Yale Sports Analytics Lab"

Project Description: The sports analytics lab at Yale has a variety of opportunities for research projects. Topics include player personnel decisions, strategy, player impact and player performance analysis, schedule optimization problems, and many others. Some projects involve working with organizations like the United States Olympic and Paralympic Committee (several possible Olympic sports, including figure skating and curling), a prominent MLB team, a prominent WNBA team, or a team at Yale. Many projects involve working with spatial data, or “player tracking data”, which is spatiotemporal data giving locations of players and the ball several times a second through gameplay.

Speaker: (Not Presenting) Soheil Ghili (SOM)

Associate Professor of Marketing

Yale School of Management

Talk Title: "AI Negotiators Trained on eBay Marketplace Data"

Project Description: AI has mastered structured games like chess, Go, Diplomacy, and poker. But real-world bargaining—central to commerce—remains unsolved. In negotiations, buyers conceal willingness to pay and sellers hide costs. These values are not only hidden from the bargaining opponent but also from researchers trying to design negotiation agents, creating a fundamental challenge that distinguishes this setting from AI for board games.

Our interdisciplinary group is developing AI negotiators: large-language-model agents that learn to bargain under these conditions. We combine economic theory with modern machine learning, and we are collaborating with eBay, where millions of peer-to-peer negotiations provide a unique testing ground. Projects include analyzing negotiation transcripts from eBay data, developing reinforcement learning methods to train bargaining agents under hidden information, and evaluating performance in terms of trade frequency, gains from trade, and strategic behavior.

Students will gain experience in reinforcement learning, LLM fine-tuning, and empirical analysis of bargaining data. This is an opportunity to contribute at the intersection of economics, AI, and real-world marketplaces.

Speaker: (Not Presenting) Arianna Salazar-Miranda (YSE)

Assistant Professor

Yale School of the Environment

Talk Title: "The Potential of Underutilized Gray Spaces"

Project Description: With the increasing threats from climate change, cities need to rethink how they use space to protect people from rising environmental risks. This project focuses on identifying and assessing the potential of underutilized gray spaces, such as parking lots, to be transformed into climate-resilient areas. By applying computer vision techniques to satellite and street-view imagery from cities across the globe, the goal is to map these gray spaces, assess their potential for transformation, and quantify the resulting benefits, such as reduced urban heat and flood risks.

Students will gain hands-on experience in spatial data analysis, computer vision, and machine learning while contributing to solutions for climate resilience and sustainable urban development.

Requisite Skills and Qualifications: Ideal candidates should have a good understanding of GIS and be proficient in handling large datasets. Experience with Python and computer vision techniques would be beneficial.


The FDS Data Science Project Match, organized by the Yale Institute for Foundations of Data Science, offers a dynamic opportunity for Yale faculty to connect with students from the Departments of Statistics and Data Science, Applied Mathematics, and Computer Science. This event brings together researchers and aspiring data scientists through a fast-paced series of lightning talks, where faculty members each have five minutes to present a research problem that could benefit from student involvement. The event is held in person and streamed via Panopto to the Yale community only; a NetID is required to access the stream. It will not be recorded.

Photo: Jennifer Marlon presenting at the 2023 FDS Project Match

The event is designed to spark collaboration. Faculty gain access to a pool of highly skilled students eager to apply their knowledge to real-world challenges, while students are introduced to a wide array of cutting-edge research taking place across Yale. It’s an ideal setting for students seeking hands-on experience and for faculty looking to expand their teams with motivated collaborators.

Participants leave not only with potential research partners but also with a broader sense of the diverse, data-driven work happening throughout the university. Whether you’re a student curious about applying your skills to meaningful projects or a faculty member hoping to connect with talented researchers-in-training, the Project Match offers an efficient, collegial, and inspiring way to forge new academic partnerships.

If you’re a faculty member interested in presenting a project, please contact Emily Hau, Associate Director of FDS. Students can attend the event to learn more about available opportunities and find the right fit for their interests and expertise.

To explore previous Project Match events, including project descriptions and speaker lists, please visit:


Please join our mailing list for future announcements.
