Workshop: Healthcare Data Science


Healthcare Data Science Workshop on 5/1

This workshop, hosted by the Yale Institute for Foundations of Data Science, aims to channel the growing interest in healthcare among data scientists and help bridge the information gap between data scientists and clinical investigators, who have complementary expertise that can enhance scientific discovery and patient care.

Our target audience spans Yale University and Yale-New Haven Hospital, representing a multidisciplinary group of clinicians, research scientists, and students across Yale. Our plan is to define the broad value of data science in healthcare, identify the various data streams within healthcare settings, highlight the barriers to successful scientific investigations in this area, and discuss ways in which clinicians and data scientists can collaborate. We plan to achieve these goals through short talks and panel discussions in the morning, followed by hands-on workshops highlighting some of the key takeaways for participants.

Organized by Rohan Khera
Assistant Professor of Medicine (Cardiovascular Medicine) and of Biostatistics (Health Informatics);
Clinical Director, Center for Health Informatics and Analytics, YNHH/Yale Center for Outcomes Research & Evaluation (CORE)
Director, Cardiovascular Data Science (CarDS) Lab
Yale University


8:00-8:05 – Rohan Khera, MD, MS: Introduction

8:05-9:00 – Lucila Ohno-Machado, MD, PhD: “Keynote Presentation: The unique challenges and opportunities in healthcare data science”

Session notes: The session will define why healthcare increasingly relies on data science and how there is an incredible opportunity to improve the health of people and societies by appropriately leveraging the various data streams. It will also identify what makes healthcare data science unique and the need for building specific expertise to translate scientific discoveries to healthcare.

9:00-9:15 – Coffee Break

9:15-10:15 – Marc Suchard, MD, PhD: “Data science across silos in healthcare”

Session notes: This solution-oriented talk by Dr. Suchard will demonstrate the power of collaborative science for data-driven discoveries while successfully tackling data silos in healthcare. The work will highlight successful federated approaches that have proven to be a highway to successful multicenter and multinational studies.

10:15-10:45 – Smita Krishnaswamy, PhD: “Strategies to Tackle Multimodal Sparse Data in the Electronic Health Record”

Session notes: Dr. Krishnaswamy will present methodological approaches to overcoming the challenges of working with real-world healthcare data, which spans multiple domains and often has informative missingness. The overview will show how innovation in methods can solve key challenges in healthcare data science.

10:45-11:00 – Break

11:00-12:30 – Harlan Krumholz, MD, SM; Puneet Batra, PhD; & Panelists: “How to build successful clinician-investigator and data scientist collaborations”

Session notes: This interactive panel discussion will tackle how best to bridge the gap between clinician investigators and data scientists to enable successful discovery and have a major impact. The panel will discuss the morning talks, and the panel chairs will share their experiences building these bridges as a clinician (Dr. Krumholz) and a data scientist (Dr. Batra), with participation from the panel and audience.

12:30-13:30 – Lunch Break

13:30-15:00 – “Hands-on EHR Workshop by the CarDS Lab”

Session notes: Members of the Cardiovascular Data Science (CarDS) Lab at Yale School of Medicine will lead a hands-on demonstration of working with structured data in the electronic health record, the most widely available data stream across health systems. The EHR workshop will focus on applying research best practices when working with the EHR. The workshop will be interactive and will follow an instructor-led session format with the following learning objectives:

• key issues when working with structured data from the EHR

• how to work with common data models to design scalable studies

• how to apply statistical best practices to data from the EHR

The demonstration will use real-world de-identified EHR data in a web-based format that allows participants to learn key analytic principles and to design and conduct a mini-study.


Lucila Ohno-Machado, MD, MBA, PhD
Waldemar von Zedtwitz Professor of Medicine and Biomedical Informatics and Data Science; Deputy Dean for Biomedical Informatics; Chair, Section of Biomedical Informatics and Data Science

Lucila Ohno-Machado is Deputy Dean for Biomedical Informatics at Yale and Chair of Biomedical Informatics and Data Science. She holds an MD, a PhD, and an MBA, has received numerous awards for her leadership in informatics, and is an elected member of the National Academy of Medicine, the American Society for Clinical Investigation, and the American College of Medical Informatics. Her research focuses on predictive models, data sharing, and innovative algorithms to distribute computation with local data.

Marc Suchard, MD, PhD
Professor of Biostatistics, Biomathematics, & Human Genetics

Marc Suchard is a professor in the Departments of Biostatistics, Biomathematics and Human Genetics at UCLA. He has a medical degree from UCLA and a PhD in Biomathematics from the same university. Dr. Suchard is helping to develop the nascent field of evolutionary medicine. This field harnesses the power of methods and theory from evolutionary biology to advance our understanding of human disease processes. Just as phylogenetic approaches have stimulated the field of evolution at large, they possess the potential to revolutionize evolutionary medicine, particularly in the study of rapidly evolving pathogens. To bridge the gap between phylogenetics and human-pathogen biology, Dr. Suchard’s interests focus on the development of novel reconstruction methods drawing heavily on statistical, mathematical and computational techniques. Some of his current projects involve jointly estimating alignments and phylogenies from molecular sequence data and mapping recombination hot-spots in the HIV genome.

Puneet Batra, PhD
Senior Principal, Flagship Pioneering

Puneet is a senior principal at Flagship Pioneering where he works as part of a venture-creation team leading machine learning strategy and helping advance Flagship portfolio companies working in generative drug design and materials.

Prior to joining Flagship Pioneering, Puneet was the Director of Machine Learning at the Broad Institute of Harvard & MIT, where he helped found the Machine Learning for Health group that broke new ground in the development and application of deep learning architectures for biological discovery in cardiovascular disease, metabolic disease, and brain health. Prior to the Broad, Puneet was lead scientist at Aster Data (acquired by Teradata).

He has published in Nature Genetics, The New England Journal of Medicine, The Lancet, and Physical Review D, and has served as Co-PI on grants from the American Heart Association, the Department of Energy, the National Institutes of Health, National Heart, Lung, and Blood Institute, and the Impetus Foundation. He serves on the advisory board of Our Health, a non-profit initiative to research the root causes of atherosclerotic disease in South Asians.

Puneet completed his B.A. at Harvard University and has a Ph.D. from Stanford University, both in theoretical physics.

Harlan Krumholz, MD, SM
Harold H. Hines, Jr. Professor of Medicine (Cardiology) and Professor in the Institute for Social and Policy Studies, of Investigative Medicine and of Public Health (Health Policy); Director, Center for Outcomes Research and Evaluation (CORE); Yale University

Harlan Krumholz is a cardiologist and scientist at Yale University and Yale New Haven Hospital. He has been honored by membership in the National Academy of Medicine, the Association of American Physicians, and the American Society for Clinical Investigation for his work to improve the quality and efficiency of care and eliminate disparities. He also co-founded the Yale University Open Data Access (YODA) Project, medRxiv, HugoHealth, Refactor Health and the American Heart Association’s Quality of Care and Outcomes Research Council. He has published more than 1400 articles and three books with an h-index of more than 220.

Smita Krishnaswamy, PhD
Associate Professor of Genetics and of Computer Science

Smita Krishnaswamy is an Associate Professor in the departments of Computer Science (SEAS) and Genetics (YSM). She is part of the programs in Applied Mathematics, Computational Biology & Bioinformatics and Interdisciplinary Neuroscience. She is also affiliated with the Yale Institute for Foundations of Data Science, the Wu-Tsai Institute, and the Yale Cancer Center. Smita’s lab works on fundamental deep learning and machine learning developments for representing and learning from big data. Her techniques incorporate mathematical priors from graph spectral theory, manifold learning, signal processing, and topology into machine learning and deep learning frameworks, in order to denoise and model the underlying systems faithfully for predictive insight. Currently her methods are being widely used for data denoising, visualization, generative modeling, dynamics modeling, comparative analysis and domain transfer in datasets arising from stem cell biology, cancer, immunology and structural biology (among others).

Smita teaches several courses, including Deep Learning Theory and Applications, Unsupervised Learning, and Geometric and Topological Methods in Machine Learning. Prior to joining Yale, Smita completed her postdoctoral training at Columbia University in the systems biology department, where she focused on learning computational models of cellular signaling from single-cell mass cytometry data. She obtained her Ph.D. from the EECS department at the University of Michigan, where her research focused on algorithms for automated synthesis and probabilistic verification of nanoscale logic circuits. Following her time in Michigan, Smita spent 2 years at IBM’s TJ Watson Research Center as a researcher in the systems division, where she worked on automated bug finding and error correction in logic. Smita’s work over the years has won several awards, including the NSF CAREER Award, Sloan Faculty Fellowship, and Blavatnik Fund for Innovation.