use of the genome aggregation database (gnomad)

52
Use of the Genome Aggregation Database (gnomAD) Anne O’Donnell-Luria, MD, PhD Broad Institute of MIT and Harvard Boston Children’s Hospital H3Africa Rare Disease Workshop February 2021 Twitter: @AnneOtation

Upload: others

Post on 26-Jun-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Use of the Genome Aggregation Database (gnomAD)

Use of the Genome Aggregation Database (gnomAD)

Anne O’Donnell-Luria, MD, PhDBroad Institute of MIT and Harvard

Boston Children’s Hospital

H3Africa Rare Disease Workshop February 2021

Twitter: @AnneOtation

Page 2: Use of the Genome Aggregation Database (gnomAD)

Every genome contains many rare, potentially functional variants

o ~2 million high quality variants in a variant call file (vcf)o ~20,000 variants within coding (exome) regionso ~500 rare missense variants (1/3 of which are predicted

damaging by in silico predictors)o ~100 LoF variants: ~20 homozygous, ~20 rareo ~100 rare variants in known disease geneso ~50 reported disease-causing mutations (!)o 1-2 de novo coding mutationso Unknown number of sequencing errors

In Mendelian disease analysis, how can we identify the pathogenic genetic variant(s) in

the sea of benign variation?

Page 3: Use of the Genome Aggregation Database (gnomAD)

Given known mutation rates, it is almost certain that every possible single base change compatible with

life exists in a living human

The power of 7.7 billion people

Page 4: Use of the Genome Aggregation Database (gnomAD)

Variant aggregation at Broad

• 20+ M pageviews from 184 countries• Aided in the diagnosis of over 200,000 patients with rare disease

Exome Aggregation Consortium (ExAC) – v160,076 exomesGenome Build 37released October 2014

Genome Aggregation Database (gnomAD) – v2125,748 exomes + 15,708 genomesGenome Build 37released October 2016

Genome Aggregation Database (gnomAD) – v371,702 genomesGenome Build 38released October 2019

Matt Solomonson

Page 5: Use of the Genome Aggregation Database (gnomAD)

Who’s in the gnomAD database?• Case-control studies of complex adult-onset diseases (e.g. type 2 diabetes, heart

attack, migraine, bipolar)

• Depleted as much as possible of people known to have severe pediatric disease, and their first-degree relatives

• 55% Male, Mean age 54 years

Grace Tiao

Page 6: Use of the Genome Aggregation Database (gnomAD)

Thank you to the PIs who share data in gnomADMaria AbreuCarlos A Aguilar SalinasTariq AhmadChristine M. AlbertNicholette AllredDavid AltshulerDiego ArdissinoGil AtzmonJohn BarnardLaurent BeaugerieGary BeechamEmelia J. BenjaminMichael BoehnkeLori BonnycastleErwin BottingerDonald BowdenMatthew BownSteven BrantHannia CamposJohn ChambersJuliana ChanDaniel ChasmanRex ChisholmJudy ChoRajiv ChowdhuryMina Chung

Wendy ChungBruce CohenAdolfo CorreaDana DabeleaMark DalyJohn DaneshDawood DarbarJoshua DennyRavindranath DuggiralaJosée DupuisPatrick T. EllinorRoberto ElosuaJeanette ErdmannTõnu EskoMartt FärkkiläDiane FatkinJose FlorezAndre FrankeGad GetzDavid GlahnBen GlaserStephen GlattDavid GoldsteinClicerio GonzalezLeif GroopChristopher Haiman

Ira HallCraig HanisMatthew HarmsMikko HiltunenMatti HoliChristina HultmanChaim JalasMikko KallelaJaakko KaprioSekar KathiresanEimear KennyBong-Jo KimYoung Jin KimGeorge KirovJaspal KoonerSeppo KoskinenHarlan M. KrumholzSubra KugathasanSoo Heon KwakMarkku LaaksoTerho LehtimäkiRuth LoosSteven A. LubitzRonald MaDaniel MacArthurGregory M. Marcus

Jaume MarrugatKari MattilaSteve McCarrollMark McCarthyJacob McCauleyDermot McGovernRuth McPhersonJames MeigsOlle MelanderDeborah MeyersLili MilaniBraxton MitchellAliya NaheedSaman NazarianBen NealePeter NilssonMichael O'DonovanDost OngurLorena OrozcoMichael OwenColin PalmerAarno PalotieKyong Soo ParkCarlos PatoAnn PulverDan Rader

Nazneen RahmanAlex ReinerAnne RemesStephen RichJohn D. RiouxSamuli RipattiDan RodenJerome I. RotterDanish SaleheenVeikko SalomaaNilesh SamaniJeremiah ScharfHeribert SchunkertSvati ShahMoore ShoemakerTai ShyongEdwin K. SilvermanPamela SklarGustav Smith

Broad Genomics PlatformBroad Data Sciences Platform

Hilkka SoininenHarry SokolTim SpectorNathan StitzielPatrick SullivanJaana SuvisaariKent TaylorYik Ying TeoTuomi TiinamaijaMing TsuangDan TurnerTeresa Tusie LunaErkki VartiainenJames WareHugh WatkinsRinse WeersmaMaija WessmanJames WilsonRamnik Xavier

Data processed through same pipeline and called jointly

Page 7: Use of the Genome Aggregation Database (gnomAD)

http://gnomad.broadinstitute.orgMPG Primers on using gnomAD

https://www.youtube.com/watch?v=J29_Pf8Ti1shttps://www.youtube.com/watch?v=Bh8AKkI-DhY

gnomAD v2 – GRCh37exomes and genomes

gnomAD v3 – GRCh38genomes only (for now)

Page 8: Use of the Genome Aggregation Database (gnomAD)
Page 9: Use of the Genome Aggregation Database (gnomAD)
Page 10: Use of the Genome Aggregation Database (gnomAD)

Additional features in gnomAD v3 (GRCh38)

In silico predictors

Released last week (also in gnomAD v2)

Page 11: Use of the Genome Aggregation Database (gnomAD)

Additional features in gnomAD v3 (GRCh38)

Two open access datasetsHuman Genome Diversity

Project (HGDP)

1000Genomes (1KG)

Page 12: Use of the Genome Aggregation Database (gnomAD)

Additional features in gnomAD v3 (GRCh38)

Two open access datasetsHuman Genome Diversity

Project (HGDP)

1000Genomes (1KG)

Page 13: Use of the Genome Aggregation Database (gnomAD)

Constraint

• Where to we see depletion of variation across human populations?

Conservation is between speciesConstraint is within the human species

Page 14: Use of the Genome Aggregation Database (gnomAD)

Developed a mutational model to calculate expected number of variants per gene

Kaitlin Samocha

(Samocha et al. 2014; Lek et al. 2016)

Daniel MacArthurMark Daly

Konrad Karczewski

Page 15: Use of the Genome Aggregation Database (gnomAD)

• Used z-scores to assess depletion from expectation• For pLoF variants, correlation with gene length (short genes under powered)

Constraint metrics

LOEUF

Page 16: Use of the Genome Aggregation Database (gnomAD)

Constraint metrics

LOEUF

Samocha et al. 2014; Lek et al. 2016

pLI

# of

gen

es

pLI = Probability of loss function intoleranceDichotomous metric (>0.9 LoF constrained)

Karczewski et al., Nature, 2020

LOEUF = Loss-of-function observed/expected upper bound fractionContinuous metric (lower scores more LoF constrained)

0.9

Page 17: Use of the Genome Aggregation Database (gnomAD)

• Binning this spectrum into deciles

Resolving the spectrum of LoF intolerance

More depletedMore constrained

More tolerantLess constrained

Page 18: Use of the Genome Aggregation Database (gnomAD)

• Known haploinsufficient genes where loss of one copy results in disease

Resolving the spectrum of LoF intolerance

ClinGen gene dosage map

Page 19: Use of the Genome Aggregation Database (gnomAD)

• Autosomal recessive genes are centered around 60% of expected

Resolving the spectrum of LoF intolerance

Gene list from:Blekhman et al., 2008

Berg et al., 2013

Page 20: Use of the Genome Aggregation Database (gnomAD)

• Some genes, e.g. olfactory receptors, are unconstrained

Resolving the spectrum of LoF intolerance

Page 21: Use of the Genome Aggregation Database (gnomAD)

Important notes on constraint• Primarily for dominant disease genes; do not expect to see strong

constraint signal (depletion of pLoF variation) for recessive disease genes• Not necessarily a sign of a strong phenotype – can see evidence of

constraint from mild degrees of negative selection • Selection occurs through reproduction so deleterious effect must be

before/during reproductive years • Example: BRCA1 does not show evidence of loss of function constraint

BRCA1Autosomal dominant

Breast and ovarian cancer

Page 22: Use of the Genome Aggregation Database (gnomAD)

Regional missense constraint (Samocha et al.)https://www.biorxiv.org/content/early/2017/06/12/148353

Page 23: Use of the Genome Aggregation Database (gnomAD)

Regional missense constraint (Samocha et al.)https://www.biorxiv.org/content/early/2017/06/12/148353

Page 24: Use of the Genome Aggregation Database (gnomAD)

Regional missense constraint (Samocha et al.)https://www.biorxiv.org/content/early/2017/06/12/148353

Page 25: Use of the Genome Aggregation Database (gnomAD)

DECIPHER Protein View: NSD1https://decipher.sanger.ac.uk/gene/NSD1/overview/protein-info

Page 26: Use of the Genome Aggregation Database (gnomAD)

Expression data from GTEx in gnomAD

Page 27: Use of the Genome Aggregation Database (gnomAD)

GTEx

NSD1

Showing only Pathogenic pLoF variants

Page 28: Use of the Genome Aggregation Database (gnomAD)

GTEx

NSD1

Page 29: Use of the Genome Aggregation Database (gnomAD)

GTEx Consortium et al., Nature, 2017https://www.gtexportal.org/home/

Gene expressionTranscript expressioneQTLs

Page 30: Use of the Genome Aggregation Database (gnomAD)

Example gene expression across tissues: NSD1

https://www.gtexportal.org/home/gene/NSD1

Page 31: Use of the Genome Aggregation Database (gnomAD)

Example gene expression across tissues: NSD1

https://www.gtexportal.org/home/gene/NSD1

Page 32: Use of the Genome Aggregation Database (gnomAD)

Transcript aware expression analysis - pext score

Cummings et al., Nature, 2020

Page 33: Use of the Genome Aggregation Database (gnomAD)

Exploring subsets

Page 34: Use of the Genome Aggregation Database (gnomAD)

Exploring subsets

Cancer means germline sample (typically blood) for individual recruited to a study for a non-blood cancer

Page 35: Use of the Genome Aggregation Database (gnomAD)

Gene page region view

Page 36: Use of the Genome Aggregation Database (gnomAD)

• gnomAD-SV callset from 10,847 whole genomes• Multiple algorithms followed by integration step• 433,371 unique structural variants with frequency data

RyanCollins

HarrisonBrand

MikeTalkowski

Page 37: Use of the Genome Aggregation Database (gnomAD)

gnomAD Structural Variants: RBM8A

Collins et al., Nature, 2020 gnomad.broadinstitute.org

Thrombocytopenia absent radius (TAR) syndrome (recessive)

Page 38: Use of the Genome Aggregation Database (gnomAD)

Previously reported “common” deletion in Thrombocytopenia absent radius (TAR) syndrome

Page 39: Use of the Genome Aggregation Database (gnomAD)
Page 40: Use of the Genome Aggregation Database (gnomAD)

Mitochondrial variants in gnomAD v3

Siekevitz P, Scientific American (Wikipedia)

37 genes

Page 41: Use of the Genome Aggregation Database (gnomAD)
Page 42: Use of the Genome Aggregation Database (gnomAD)

Vamsi MoothaSarah CalvoKristen LaricchiaMonkol LekNicole LakeDaniel MacArthur

Mitochondrialvariant page

Page 43: Use of the Genome Aggregation Database (gnomAD)

pLoF curations: predictions not the full story

Manually curated (gnomAD v2)• All homozygous pLOF variants

• 28% not LoF• Heterozygous pLoF variants in

some recessive genes• 25% not LoF

• Working on pLoF genes in haploinsufficient genes

• 60% not LoF

https://gnomad.broadinstitute.org/blog/2020-10-loss-of-function-curations-in-gnomad/

LoF Curation column if the gene has variants curated by the gnomAD team

Moriel Singer-Berk

Page 44: Use of the Genome Aggregation Database (gnomAD)

Genetics as a lever to understand biology

Discovery of novel disease genes Phenotyping humans with LoF variantsSlide from Daniel MacArthur

Page 45: Use of the Genome Aggregation Database (gnomAD)

Making sense of one exome requires tens of thousands of exomes (or genomes) to reveal rare variants

vs

Harnessing the power of allele frequency

Page 46: Use of the Genome Aggregation Database (gnomAD)

Five-fold reduction in number of very rare variants with large

reference databases

# variants remaining in an exome after applying a 0.1% filter across all populations

Both size and ancestral diversity increase filtering power6K 60K

East AsianSouth AsianLatinoAfricanEuropean

# va

riant

s rem

aini

ng a

fter

filte

ring

Lek et al., Nature, 2016

Page 47: Use of the Genome Aggregation Database (gnomAD)

2015

Evaluating rare variant pathogenicity

Page 48: Use of the Genome Aggregation Database (gnomAD)

Richards et al., Genet Med,

2015

Page 49: Use of the Genome Aggregation Database (gnomAD)

Conclusions• Reference population databases are public resources that are

critically important to evaluate variant rarity, which is necessary but not sufficient for pathogenicity for rare disease

• These datasets provide multiple tools (constraint, pext) to help prioritize variants to understand human biology

• The power of reference population datasets will increase as they grow in size and diversity

• gnomAD v4 with >300,000 exomes on GRCh38 anticipated in 2021 or 2022

Page 50: Use of the Genome Aggregation Database (gnomAD)

• Exome and genome data is in separate files

• Exome data is 125K exomes

• Genome data is 15K genomes

Available on• Google Cloud Public Datasets• Registry of Open Data on AWS• Azure Open Datasets

Page 51: Use of the Genome Aggregation Database (gnomAD)
Page 52: Use of the Genome Aggregation Database (gnomAD)

Acknowledgements

Steering CommitteeHeidi Rehm (co-PI)Mark Daly (co-PI)Daniel MacArthurBen NealeMike TalkowskiAnne O’Donnell-LuriaKonrad KarczewskiGrace TiaoMatt SolomonsonSamantha Baxter

Analysis teamKonrad KarczewskiLaurent FrancioliGrace TiaoKristen LaricchiaBeryl CummingsEric MinikelIrina ArmeanJames WareKaitlin SamochaNicola WhiffinQingbo WangRyan Collins

Structural VariationRyan CollinsHarrison BrandKonrad KarczewskiLaurent FrancioliNick WattsMatthew SolomonsonXuefang ZhaoLaura GauthierHarold WangChelsea LowtherMark WalkerChristopher WhelanTed BrookingsTed SharpeJack FuEric BanksMichael Talkowski

Browser teamMatthew SolomonsonNick WattsBen Weisburd

Ethics teamAndrea SaltzmanMolly SchleicherNamrata GuptaStacey Donnelly

Broad Genomics PlatformStacey GabrielKristen ConnollySteven Ferriera

Ben NealeCotton SeedTim PoterbaArcturus WangChris Vittal

Production teamGrace TiaoJulia GoodrichKatherine ChaoMichael WilsonWilliam PhuEric BanksCharlotte TolonenChristopher LlanwarneDavid RoazenDiane KaplanGordon WadeJeff GentryJose SotoKathleen TibbettsKristian CibulskisLaura GauthierLouis BergelsonMiguel CovarrubiasNikelle PetrilloRuchi MunshiSam NovodThibault JeandetValentin Ruano-RubioYossi Farjoun

https://www.nature.com/collections/afbgiddedegnomAD papers at:

https://gnomad.broadinstitute.org/aboutContact for ?s: [email protected] or [email protected]