Smita Krishnaswamy

Smita Krishnaswamy is an Associate Professor in the departments of Computer Science (SEAS) and Genetics (YSM). She is part of the programs in Applied Mathematics, Computational Biology & Bioinformatics, and Interdisciplinary Neuroscience. She is also affiliated with the Yale Center for Biomedical Data Science, the Yale Cancer Center, and the Wu-Tsai Institute. Smita’s lab works at the intersection of computer science, applied math, computational biology, and signal processing to develop representation-learning and deep learning methods that enable exploratory analysis, scientific inference, and prediction from big biomedical datasets. She has applied her methods to datasets generated from single-cell sequencing, structural biology, biomedical imaging, brain activity recordings, and electronic health records, across a wide variety of biological, cellular, and disease systems. Her techniques generally incorporate mathematical priors from graph spectral theory, manifold learning, signal processing, and topology into machine learning and deep learning frameworks in order to denoise the underlying systems and model them faithfully for predictive insight. Her methods are currently in wide use for data denoising, visualization, generative modeling, dynamics modeling, comparative analysis, and domain transfer.

Smita teaches several courses, including Deep Learning Theory and Applications, Unsupervised Learning, and Geometric and Topological Methods in Machine Learning. Prior to joining Yale, Smita completed her postdoctoral training in the Department of Systems Biology at Columbia University, where she focused on learning computational models of cellular signaling from single-cell mass cytometry data. She obtained her Ph.D. from the EECS department at the University of Michigan, where her research focused on algorithms for the automated synthesis and probabilistic verification of nanoscale logic circuits. Following her time in Michigan, Smita spent two years as a researcher in the systems division at IBM’s TJ Watson Research Center, where she worked on automated bug finding and error correction in logic. Over the years, Smita’s work has earned several awards, including the NSF CAREER Award, the Sloan Faculty Fellowship, and the Blavatnik Fund for Innovation.

What do you do with data science?

The primary focus of my research is machine learning for extracting patterns and insights from scientific data in order to drive biomedical discovery. While much of AI has focused on matching known patterns for classification, there is a great need to use AI to find unknown patterns and to generate plausible scientific hypotheses. My work sits at the intersection of several fields, including applied math, deep learning, data geometry, topology, manifold learning, and graph signal processing, all serving to tackle key challenges in data science. The problems I address are motivated by the ubiquity of high-throughput, high-dimensional data in the biomedical sciences, a result of breakthroughs in measurement technologies such as single-cell sequencing, proteomics, and fMRI, along with vast improvements in the collection and storage of health record data. These large datasets, containing millions of cellular or patient observations, hold great potential for understanding the generative mechanisms, the state space of the data, and the causal interactions driving development, disease, and disease progression. However, they also pose new challenges in terms of noise, missing data, measurement artifacts, and the so-called “curse of dimensionality.” My research addresses these issues by developing denoised data representations designed for data exploration, mechanistic understanding, and hypothesis generation.