FDS Data Science Project Match

Monday, December 8, 2025

3:00PM - 4:00PM

Project presentations 3:00-4:00pm in 1327
Snacks and conversations to follow in 1307

Location: Yale Institute for Foundations of Data Science, Kline Tower 13th Floor, Room 1327, New Haven, CT 06511 and via Webcast: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=7c2eea03-86f9-4088-8380-b398014d9d6e

Speaker: Soheil Ghili (SOM)

Associate Professor (Quantitative Marketing Group)

Yale School of Management

Talk Title: Market Design for Agentic Commerce: Pay-Per-Crawl Pricing for the AI Economy

For decades, the internet’s business model has been built around search: websites get indexed by search engines and monetize the human traffic that search sends to them. With the rise of large-language-model agents, this model is breaking. AI systems increasingly fetch web content directly in order to answer questions or perform tasks for users. When AI agents replace human visits, websites lose the traffic that historically generated revenue.

This shift creates the need for new business models for online content—models in which AI agents pay websites for the content they consume, whether for grounding, retrieval, or training. One emerging approach is Pay-Per-Crawl (PPC), where websites charge AI models for access to their pages. Companies like Cloudflare have begun building the infrastructure for this market, but the economics remain entirely undeveloped.

Our project designs the market mechanisms that should govern PPC. We evaluate pricing rules—fixed-fee crawls, content-sensitive pricing (e.g., politics vs. sports), crawler-identity pricing, and hybrid mechanisms—using three criteria: (1) revenue for content creators, (2) minimal degradation to AI answer quality relative to a zero-price world, and (3) design simplicity.

Requirements: strong interest in market design for emerging AI economies; comfort with ML methods such as embeddings-based content valuation; ability to run simulations of multi-sided markets; and willingness to engage with cutting-edge industry questions involving Cloudflare, OpenAI, and others.

Speaker: Leandros Tassiulas (SEAS)

John C. Malone Professor of Electrical & Computer Engineering

Yale Engineering

Talk Title: An AI assistant for Regulatory Insight and Data-Driven Analysis in the Power Industry (AIReD)

Presented by Ibrahim Ibne Alam, Postdoctoral Associate

We propose a next-generation AI assistant for Regulatory Insight and Data-Driven Analysis (AIReD), designed to support a wide range of power-industry stakeholders, including consumers, generators, system operators, and management. The agent combines retrieval-augmented generation (RAG) for transparent, context-aware answers with tool-calling for advanced analytical tasks such as statistical evaluation, forecasting, and optimization. The project has completed a comprehensive landscape review of existing power-domain LLM efforts, evaluated baseline model performance on ElecBench with and without LoRA adaptation, curated key regulatory documents for domain grounding, and implemented a working RAG prototype. An ML model for EV charging-demand prediction is currently being explored as part of the envisioned tool-calling framework. By integrating citation-based knowledge access with computational reasoning, AIReD aims to become a reliable, domain-grounded AI assistant that strengthens decision-making and operational efficiency across the power sector.

Speaker: Brian Macdonald (Yale)

Senior Lecturer and Research Scientist, Statistics and Data Science

Yale University

Talk Title: Sports and Environmental Data Science Projects

We will discuss projects in two areas, environmental data science and sports, many of which involve working with industry partners. One project works with an industry expert in large-scale lithium-ion battery energy storage systems (BESS) to model and simulate thermal runaway and ensuing fires in BESS. In another project, the goal is to develop an R package that streamlines incorporating active learning into the manual labeling of landcover types for remote-sensing data projects. The sports analytics lab at Yale has a variety of opportunities, including using spatiotemporal bat and pitch tracking data to analyze batter-pitcher interactions (with the Boston Red Sox MLB team); analyzing the value of draft picks (with the New York Liberty WNBA team); developing a Stuff+ metric for evaluating pitches and pitchers (with the Yale baseball team’s pitching coach); using data to understand why horse racing is losing owner participation despite higher earnings, and to find improvements that make horse racing more appealing and sustainable for owners in the long run (with the Jockey Club and Joseph Appelbaum, Yale ’90); various projects in lacrosse analytics (with the Yale lacrosse team); and projects with the United States Olympic and Paralympic Committee (several possible Olympic sports).

Speaker: Purushottam Dixit (SEAS)

Assistant Professor of Biomedical Engineering

Yale Engineering

Talk Title: Quantifying dimensionality of microbiomes

Many high-dimensional biological systems, from microbial communities to gene expression dynamics, appear complicated on the surface but often evolve on low-dimensional manifolds governed by latent ecological or biophysical constraints. Our group is developing new statistical tools to quantify this effective dimensionality directly from data and to identify signatures of low-dimensional organization in mechanistic models, specifically in microbiomes. We want you to analyze real and simulated time-series datasets, test algorithms for dimensionality estimation, and compare theoretical models against empirical patterns. The project is ideal for students interested in statistics, machine learning, or dynamical systems; no biology background is required. You’ll be working at a conceptual frontier: figuring out when and why messy biological systems actually behave in surprisingly simple, structured ways.

Speaker: Mark Gerstein (Yale)

Albert L Williams Professor of Biomedical Informatics and Professor of Molecular Biophysics & Biochemistry, of Computer Science, and of Statistics & Data Science

Yale University

Talk Title: Genomics & Bioinformatics Research Opportunities in the Gerstein Lab

Presented by Joel Rozowsky

The Gerstein lab conducts computational biology and bioinformatics research in the biomedical and genomic fields. We use a variety of computational methods, including artificial-intelligence and machine-learning techniques, to analyze large biomedical datasets and develop bioinformatics tools. The lab focuses in particular on neurogenomics, personal genomes, genomic privacy, and genome annotation.

Speaker: Zongming Ma (Yale)

Professor, Statistics and Data Science

Yale University

Talk Title: Data Integration in Spatial and Single-cell Biology

Spatial and single-cell technologies have generated massive datasets through consortia-level efforts. However, these data remain fragmented by differences in biological modalities, disease conditions, spatial resolutions, and technology-induced artifacts. To tame this “Wild West” of data from cutting-edge biotech sensors, our goal is to leverage advanced machine learning and AI techniques to develop rigorous, scalable integration methods and standardize them into a unified framework for ingesting, processing, and harmonizing diverse spatial omics data.

Seeking students with an interest in:
  • AI / ML: Multi-modal representation learning, graph neural networks (no prior biology background required).
  • Coding: Python.
  • The Challenge: Learning robust representations for high-dimensional, noisy, and heterogeneous spatial omics data.

Speaker: Phillip Atiba Solomon (fka Goff) (Yale)

Carl I. Hovland Professor of Black Studies and Professor of Psychology; Co-Founder & CEO, Center for Policing Equity

Yale University

Talk Title: A Model-Based National Estimate of Police Use-of-Force

In the United States, police regularly use force against civilians. Much of the prior research in this area has focused on lethal force, and far less is known about the vast majority of incidents that do not result in death. This project aims to generate the first national estimate of the annual number of use-of-force incidents in the US, overall and by race/ethnicity. Leveraging newly available use-of-force data for thousands of agencies and a national dataset of predictors, we use Bayesian and machine learning approaches to estimate use-of-force counts for the entire US.

Speaker: Meg Urry (Yale)

Israel Munson Professor of Physics

Yale University

Talk Title: Finding Merging Galaxies and Supermassive Black Holes in Large Astronomical Surveys

We have data covering large areas of the sky that need to be scanned automatically to find candidates for merging galaxies, dual AGN (Active Galactic Nuclei; a dual AGN occurs when both black holes in the merging galaxies are growing rapidly and thus shining brightly), and all combinations in between. Right now, we have trained a first-generation CNN that does the job okay, but at a minimum we need to update it to incorporate multiple images taken at different wavelengths, to query the full data set (currently we start with known AGN and look for a companion), and to distinguish stars from AGN. More ambitious goals: Is there a better approach than training a CNN (we use a large number of simulated images and then a more limited number of bona fide cases)? Can we train it more efficiently and/or improve the code in other ways?

The FDS Data Science Project Match, hosted by the Yale Institute for Foundations of Data Science (FDS), is an opportunity for Yale faculty from any department or school within the university to connect with talented students from the departments of Statistics and Data Science, Applied Mathematics, and Computer Science. In a series of lightning-round talks, faculty will have exactly five minutes to pitch a current research problem, aiming to team up with students interested in tackling complex data challenges. This event facilitates collaboration on current research projects, offering a platform for faculty to present their data-driven initiatives and find skilled undergraduate and/or graduate students eager to contribute. It’s also a wonderful way to learn about the research of many Yale faculty.
