Harsh Parikh, PhD
Assistant Professor of Biostatistics
Harsh Parikh is an Assistant Professor in the Department of Biostatistics at Yale University, where he develops machine learning–aided causal inference approaches for solving high-stakes problems. His research focuses on methods that are accurate, enabling the estimation of heterogeneous treatment effects in complex scenarios with limited data; trustworthy, allowing domain experts to interpret the underlying machinery, validate assumptions, and ensure safety; and domain-conscious, leveraging contextual knowledge to design applicable solutions that narrow the gap between research and practice. Harsh earned his PhD in Computer Science from Duke University, where he received the Outstanding PhD Dissertation Award in 2023 and was an Amazon Graduate Fellow in 2021. He also holds an MS in Economics and Computation from Duke University and a BTech in Computer Science from the Indian Institute of Technology Delhi.
What do you do with Data Science?
I use data science to design machine learning–aided causal inference methods that help answer high-stakes questions in health, policy, and other applied domains. My work focuses on creating approaches that are accurate, able to estimate heterogeneous treatment effects in complex, small-data settings; trustworthy, transparent enough for domain experts to interpret the methods, validate assumptions, and ensure safety; and domain-conscious, incorporating contextual knowledge to bridge the gap between research and practice. I have developed methods for integrating experimental and observational data, generalizing trial findings to new populations, and learning optimal treatment regimes, with applications ranging from evaluating anti-seizure treatments in acute brain injury patients to characterizing underrepresented groups in clinical trials. My publications appear in venues such as the Journal of the American Statistical Association, Harvard Data Science Review, Journal of Machine Learning Research, Nature Communications, and The Lancet Digital Health. Going forward, I aim to extend these methods to settings with complex data structures, such as networks, longitudinal records, and multi-modal health data, to enable safer and more trustworthy decision-making.
