
Data Science Project Match

Thursday, November 30, 2023    
4:00PM – 5:00PM


Remote access is Yale-only.

Project Presenters:

Jonathan Reuning-Scherer
Senior Lecturer in Statistics
Yale Dept of Statistics / School of the Environment

Dramatists Guild Membership Survey Analysis

During spring of 2023, the largest-ever survey of the membership of the Dramatists Guild of America (DG) was conducted. The resulting database contains demographic, compensation, attitudinal, and career history data for 2000 respondents. In addition, data was collected on the details surrounding the creation of 1000 creative works, including major Broadway shows. There is an opportunity for 2-3 students to complete senior projects or graduate practical work during the spring of 2024, working with DG leadership and Jonathan Reuning-Scherer. Results will likely be published in DG materials around the 2024 Tony Awards.

Mark Gerstein 
Albert L Williams Professor of Biomedical Informatics 
Professor of Molecular Biophysics & Biochemistry, of Computer Science, and of Statistics & Data Science 
Project presented by Joel Rozowsky
Research Scientist in Molecular Biophysics and Biochemistry, Gerstein Lab

Genomics & Bioinformatics Research in the Gerstein Lab

The Gerstein lab conducts bioinformatics research in the biomedical and genomic fields. We use various computational data analytics methods, including machine learning techniques, to analyze large biomedical datasets. The lab is particularly focused on the following areas of research: genomic privacy, personal genomes, genome annotation, and neurogenomics.

Kim R. M. Blenman, Ph.D., M.S.
Assistant Professor
Department of Internal Medicine, Section of Medical Oncology, School of Medicine
Department of Computer Science, School of Engineering and Applied Science
Yale Cancer Center

Statistical Analysis and Data Visualization for Predictive and Prognostic Tools for Proteomics

Statistical analysis and data visualization for predictive and prognostic tools are the cornerstones of omics analysis in medicine. Although we have progressed through the age of genomics, as a field we are now also moving into the age of proteomics. The biological assays that generate the data for genomics and proteomics are not the same. Therefore, new statistical analysis tools are required for this new proteomics revolution. Students who are interested in being part of this revolution are welcome to join my research group. There are many projects available.

Leandros Tassiulas
John C. Malone Professor of Electrical Engineering & Computer Science
Project Presented by Georgios Palaiokrassas
Postdoctoral Associate, Electrical Engineering, Tassiulas Lab

Blockchain Analytics: A Machine Learning Approach

The inception of permissionless blockchains with Bitcoin in 2008 was followed by the development of Ethereum and other blockchain platforms, which offer new solutions by enabling the implementation and execution of smart contracts. This project focuses on applying machine learning techniques, including statistical methods, GNNs, and LLMs, to an extensive transaction dataset spanning multiple blockchain platforms. The project aims to uncover patterns, trends, and anomalies within blockchain transactions for use cases such as identifying fraudulent activities, predicting cryptocurrency price fluctuations, and understanding the network’s growth dynamics. Another direction combines data processing, feature engineering, and machine learning to estimate the risk of transactions, assess the credit scores of users, and recommend strategies to mitigate risk.

We are looking for students who have a background in applied machine learning. Any experience in the areas of blockchain and decentralized finance is a plus!

Professor of Therapeutic Radiology; Director of Physics Research, Therapeutic Radiology; Associate Director of Medical Physics Residency Program, Therapeutic Radiology

Enabling Digital Twins for Predictive Oncology

The human body is a complex, multiscale, dynamical system with constant interactions within itself and with the environment. Many new technologies have been used for health profiling, such as functional and molecular imaging, liquid biopsies, digital pathology, genomic profiling, fitness trackers and wearables, and implantable sensors. While each of these technologies sheds light on one’s health state, the resulting multimodal datasets are scattered and disconnected, and are not amenable to AI/ML analysis at scale.

Predictive oncology aims to anticipate likely patient outcomes and health status from multimodal data by modeling the dynamics and trajectory of individual cancer patients. One promising approach to predictive oncology is creating digital twins of cancer patients. A person’s digital twin may aid in monitoring health status, simulating patient outcome trajectories, developing tailored therapeutic strategies, preventing adverse effects, and improving lifestyle. In this project, we aim to develop novel AI/ML algorithms by modeling existing clinical, imaging, and radiotherapy datasets to enable cancer patient digital twins in radiation oncology.

We are looking for students to join our lab and help enable digital twins for predictive oncology via statistical, computational, mathematical, and mechanistic modeling of spatiotemporal patient data.

Victor S. Batista, FRSC
John Gamble Kirkwood Professor of Chemistry
Yale Quantum Institute & Yale Energy Sciences Institute
ACS Associate Editor, JCTC

Quantum and Classical Machine Learning Models for Molecular Design

The capabilities of generative machine learning models and recent advances in quantum computing have the potential to revolutionize the field of molecular design and drug discovery. My group is working on the development and implementation of generative algorithms for the design of drugs and retrosynthetic pathways. We are currently working on state-of-the-art transformers, quantum convolutional neural networks, and quantum variational autoencoders for de novo molecular design, and on the development of a cloud server interface to make our methods available to external users from pharmaceutical companies.

David van Dijk, Ph.D.
Assistant Professor of Medicine, Yale School of Medicine
Assistant Professor of Computer Science 
Project presented by Daniel Levine
Postdoctoral Associate

Using Machine Learning to Understand the Language of Biology

Recent advances in large language models provide new opportunities for decoding biology. Single-cell omics data encodes complex cellular behaviors and processes into high-dimensional molecular profiles. By treating these data as textual representations, we can apply and fine-tune neural language models to uncover the underlying grammatical rules governing biological systems. We have demonstrated that these models can learn to translate between species, matching cell types and gene expression programs between mice and humans in a completely unsupervised fashion. This cross-species translation highlights how fundamental aspects of biology form a universal language translatable across organisms. More broadly, interpreting single cell data as “biological text” enables leveraging powerful natural language processing approaches to find patterns, generate hypotheses, and gain conceptual understanding of biology.
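To make the idea of "biological text" concrete, here is a minimal sketch of one way a cell's expression profile could be turned into a sentence: list the names of the expressed genes from highest to lowest count. The ranking scheme, the function name `cell_to_sentence`, and the toy counts are all assumptions for illustration; the abstract does not specify how the lab encodes cells as text.

```python
def cell_to_sentence(expression, top_k=3):
    """Turn one cell's gene-expression profile into a space-separated 'sentence'.

    Assumed encoding (not taken from the abstract): gene names ordered by
    descending count, ties broken alphabetically, unexpressed genes dropped.
    """
    expressed = [(gene, count) for gene, count in expression.items() if count > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in ranked[:top_k])


# Hypothetical toy counts for a single cell
cell = {"CD3E": 12, "MS4A1": 0, "GAPDH": 40, "ACTB": 25, "CD8A": 7}
print(cell_to_sentence(cell))  # GAPDH ACTB CD3E
```

Sentences of this kind could then be passed to an ordinary text tokenizer, which is what would allow a language model to be fine-tuned on them.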

Rohan Khera, MD, MS
Director, Cardiovascular Data Science (CarDS) Lab
Assistant Professor, Cardiovascular Medicine, Yale School of Medicine
Presented by Lovedeep Dhingra and Arya Aminorroaya
Postdoctoral Associates

Innovating Cardiovascular Care with Multimodality Data Science

The Cardiovascular Data Science (CarDS) Lab at Yale leverages advances in deep learning and AI to enhance and automate care. The work uses numerous data streams in the electronic health record and focuses on natural language processing, federated learning, signal processing, and computer vision for enhanced inference. The lab develops and deploys novel convolutional neural networks and transformer models to address care challenges. The experience is ideal for students who are interested in health tech and/or medicine and who want a longitudinal research experience.

Eduardo Fernandez-Duque
Professor of Anthropology, School of the Environment

Querying a Social Evolution Research Video Database for Research and Teaching

Eduardo Fernandez-Duque (Anthropology and School of the Environment) has been co-organizing the international remote Frontiers in Social Evolution Seminar Series (FINE). Researchers from more than 20 countries and all continents have given 125 one-hour talks on their “social evolution” research, each followed by a one-hour Q&A session. All weekly seminars were recorded live and made publicly available on the FINE YouTube channel.

Data set: 2,500 hours of videos on social evolution research and follow-up discussions.

Specific possible objectives:

1- To develop search tools to query the video collection and to extract “material” (e.g. graphs, tables, images) from the videos.
2- To produce a series of short video clips illustrating topics that cut across many of the talks.

Reza Yaesoubi
Associate Professor of Public Health
Associate Professor, Institution for Social and Policy Studies

Generating and evaluating simple classification rules to predict local surges in COVID-19 hospitalizations

Low rates of vaccination, the emergence of novel variants of SARS-CoV-2, and increasing transmission related to seasonal changes and the relaxation of mitigation measures leave many US communities at risk for surges of COVID-19 that might strain hospital capacity. The trajectories of COVID-19 hospitalizations differ across communities, but existing predictive models of COVID-19 hospitalizations are almost exclusively focused on state-level predictions. We are interested in developing and evaluating methods to generate simple, interpretable classification rules to predict whether COVID-19 hospitalizations will exceed the local hospital capacity in the short term.
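As an illustration of what a "simple, interpretable classification rule" might look like, the sketch below flags a surge when both the current hospitalization level and its weekly change pass fixed cutoffs, then scores the rule by sensitivity and specificity. The feature choice, the thresholds, and the toy records are hypothetical; they are not taken from the project description.

```python
def surge_rule(hosp_per_100k, weekly_change,
               level_threshold=10.0, change_threshold=0.2):
    """Hypothetical rule: predict a surge if both the level and the trend are high."""
    return hosp_per_100k >= level_threshold and weekly_change >= change_threshold


def evaluate(rule, records):
    """Return (sensitivity, specificity) of a rule on labeled records.

    Each record is (hosp_per_100k, weekly_change, exceeded_capacity).
    """
    tp = fp = tn = fn = 0
    for hosp, change, exceeded in records:
        predicted = rule(hosp, change)
        if predicted and exceeded:
            tp += 1
        elif predicted and not exceeded:
            fp += 1
        elif not predicted and exceeded:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up community-weeks, for illustration only
records = [
    (12.0, 0.30, True),
    (15.0, 0.05, False),
    (8.0, 0.40, True),
    (11.0, 0.25, False),
    (3.0, -0.10, False),
]
sensitivity, specificity = evaluate(surge_rule, records)
```

A rule of this shape is easy to state in one sentence, which is the kind of interpretability the abstract emphasizes; choosing the cutoffs from real data is the open question.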

Elena Grewal
Lecturer, Yale School of the Environment

Informing policy decisions to increase affordable housing

New Haven has an affordable housing crisis. The number of homeless students has doubled in the past year. Residents cannot afford to stay in their homes because of rent increases and a general shortage of affordable housing. While new apartments are being built, many are high-end and out of reach for the people who are being pushed out of their homes. A policy allowing homeowners to convert attics, basements, and buildings attached to their own homes into rental units (ADUs) resulted in no additional housing being built. It would be helpful to have data on the current housing stock and rental market to inform policymakers' decisions.

The Fair Rent Commission is tasked with reviewing rent increases and with reducing rents when tenants live in poor conditions. Recently the commission has seen cases involving parents with children and elderly residents on fixed incomes who do not have other options. A staff member is tasked with knocking on doors to raise awareness of the commission and to inspect housing conditions. It would be helpful to use data to target these efforts. In addition, the commission is supposed to use the availability of other housing as a factor in its decisions, and there is no database available for this. The Fair Rent Commission can also make housing policy recommendations.
