Gerstein Lab: Diego Garrido (Univ of Barcelona) “A multivariate approach to study the genetic determinants of phenotypic traits”

Diego Garrido-Martín, PhD

Speaker: Diego Garrido-Martín, PhD
Assistant Professor
Department of Genetics, Microbiology and Statistics
University of Barcelona (Spain)

Title:               “A multivariate approach to study the genetic determinants of phenotypic traits”

Date:               Monday, April 3rd 2023

Time:              2:30 – 3:30 PM

Place:              Yale Science Building, Room 352 

Host:               Mark Gerstein

Abstract: The increasing availability of phenotypic data at multiple levels – from the organismal to the molecular – in large cohorts of genotyped individuals enables genetic association studies (GWAS, molecular QTL mapping). These studies often test association with genetic variants one trait at a time, even though many biological phenotypes are intrinsically multi-trait: size and connectivity of brain regions, levels of blood lipids, facial and allometric traits, composition of the gut microbiota, abundances of alternative splicing isoforms, single-cell gene expression across cell types, or even features learnt automatically from histological images by deep convolutional autoencoder networks. Because these traits are correlated, joint (multivariate) analysis often increases statistical power to detect genetic associations, even when the variants tested affect only a small fraction of the traits. However, commonly used multivariate methods either lack interpretability, make strong assumptions about the distribution of the traits of interest, or do not scale well to the size of current datasets. In this context, PERMANOVA offers a powerful non-parametric approach, but it relies on permutations to assess significance, which hinders the analysis of large datasets. Here, we derive the limiting null distribution of the PERMANOVA test statistic, providing a framework for the fast computation of asymptotic p-values. We show that the asymptotic test controls the type I error and has high power, comparable to or higher than that of parametric approaches. We illustrate the applicability of our method in a number of use cases. Using the GTEx cohort, we perform the first population-biased splicing QTL mapping study across multiple tissues. We identify thousands of genetic variants that affect alternative splicing differently depending on ethnicity, including potential disease markers.
Using the UK Biobank cohort, we perform the largest GWAS to date of MRI-derived volumes of hippocampal subfields. Most of the identified loci had not previously been linked to the hippocampus, but many are associated with cognition or brain disorders, thus helping us understand the intermediate traits through which genetic variants impact complex organismal phenotypes.
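The abstract notes that PERMANOVA assesses significance by permutation, and that this is what makes it slow on large datasets. The sketch below shows that classic permutation-based version, not the asymptotic test presented in the talk. It assumes Euclidean distances between individuals' trait vectors and a one-way grouping of individuals by genotype class (0/1/2). The function names and the simulated data are hypothetical illustrations.

```python
import random


def squared_distances(traits):
    """Pairwise squared Euclidean distances between rows (individuals) of a trait matrix."""
    n = len(traits)
    d2 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2[i][j] = d2[j][i] = sum((a - b) ** 2 for a, b in zip(traits[i], traits[j]))
    return d2


def pseudo_f(d2, groups):
    """One-way PERMANOVA pseudo-F computed from squared distances and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: sum of all pairwise squared distances divided by N.
    ss_total = sum(d2[i][j] for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d2[i][j] for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(traits, groups, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    d2 = squared_distances(traits)
    f_obs = pseudo_f(d2, groups)
    perm = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any trait-genotype association
        if pseudo_f(d2, perm) >= f_obs:
            hits += 1
    # Add one to numerator and denominator so the observed labelling counts as a permutation.
    return f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical usage: 60 individuals, 3 correlated-by-genotype traits.
if __name__ == "__main__":
    rng = random.Random(1)
    genotypes = [0] * 20 + [1] * 20 + [2] * 20
    traits = [[rng.gauss(0.5 * g, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.3 * g, 1.0)]
              for g in genotypes]
    print(permanova(traits, genotypes, n_perm=199))
```

Every permutation recomputes the statistic over all N(N−1)/2 pairs, and the smallest attainable p-value is 1/(n_perm + 1). Reaching the very small p-values needed when many variants are tested therefore requires many permutations on every test. This is the cost that an asymptotic null distribution, as described in the abstract, is meant to avoid.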