Mobile element insertions and structural variation in a large human population reference database
Alex Yenkin 1, 2, 3 @, Xuefang Zhao 2, 3, 4, Ryan Collins 3, 5, Mark Walker 2, 3, Jack Fu 2, 3, 4, Christopher Whelan 2, 3, Emma Pierce-Hoffman 2, 3, Cal Liao 2, 3, Chelsea Lowther 2, 3, 4, Vahid Jallili 3, Nehir Kurtas 2, 3, 4, Daniel Ben-Isvy 1, 2, 3, Lily Wang 1, 2, 3, Alba Sanchis-Juan 2, 3, 4, Stephanie Hao 2, 3, 4, Harrison Brand 2, 3, 4, Michael Talkowski 2, 3, 4
1 : Division of Medical Sciences, Harvard Medical School
260 Longwood Ave, TMEC 435, Boston, MA 02115 - United States
2 : Center for Genomic Medicine, Massachusetts General Hospital
185 Cambridge Street, Boston, MA 02114 - United States
3 : Broad Institute of MIT and Harvard
Cambridge, MA 02142 - United States
4 : Department of Neurology, Massachusetts General Hospital
55 Fruit Street, Boston, MA 02114 - United States
5 : Dana-Farber Cancer Institute [Boston]
450 Brookline Ave., Boston, MA 02215 - United States

Structural variants (SVs) are a numerous and inadequately characterized source of human genetic variation that arises from a diverse set of molecular mechanisms. They include large deletions, duplications, inversions, mobile element insertions (MEIs), and more. Using our state-of-the-art SV calling pipeline, GATK-SV, we identified over 1.1 million high-quality SVs in whole-genome sequencing (WGS) data from 63,046 unrelated individuals, newly released as part of the Genome Aggregation Database (gnomAD). Within the gnomAD v4 callset, we identified over 200,000 MEIs, comprising 173,374 Alu, 30,223 L1, and 17,607 SINE-VNTR-Alu (SVA) element insertions, the largest high-quality callset of MEIs released to date. We examined the distribution of MEIs across the genome and relative to gene content, as well as their frequency within different functional annotation categories. We also characterized variant distributions within many sets of noncoding annotations from numerous sources. Additionally, we used GATK-SV variant calls from the newest release of the GTEx dataset, which consists of paired WGS and transcriptome data from 851 individuals across 54 tissues. After identifying expression and splicing quantitative trait loci, we put forth the largest set of common MEIs that are likely causal for changes in expression and splicing.

