

Background

Methods

Abstract

(1) Long DNA molecules are labeled with Bionano reagents through (2) incorporation of fluorophore-labeled nucleotides at a specific sequence motif throughout the genome. (3) The labeled genomic DNA is then linearized in the Saphyr Chip using NanoChannel arrays. (4) Single molecules are imaged and digitized by the Saphyr instrument. (5) Each molecule carries a label pattern that serves as a uniquely identifiable signature and is used to assemble molecules into genome maps. (6) Bionano maps may be used in a variety of downstream analyses using Bionano Access software.
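The labeling in steps (1) and (2) can be mimicked in silico, which is also how reference maps for comparison are derived from a sequence. The Python sketch below is illustrative only: it scans a sequence for a nickase recognition motif on both strands and turns the label positions into a list of inter-label distances, the kind of signature pattern described in step (5). The motif GCTCTTC (the Nt.BspQI site) is an assumed example, since this poster does not name the enzyme, and the sketch reports motif start positions rather than the exact nick offset. `label_positions` and `interval_signature` are hypothetical helper names.

```python
import re

# Complement table for A/C/G/T (N maps to itself)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def label_positions(seq: str, motif: str) -> list[int]:
    """0-based start positions of the motif on either strand, sorted.

    Motif hits on the reverse strand appear as the reverse complement
    on the forward strand, so both forms are searched.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        # Zero-width lookahead so overlapping occurrences are also found
        for match in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.add(match.start())
    return sorted(hits)


def interval_signature(positions: list[int]) -> list[int]:
    """Distances between consecutive labels: the map's signature pattern."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Toy sequence; GCTCTTC is an assumed example motif (Nt.BspQI site)
    toy = "AA" + "GCTCTTC" + "AAAA" + "GAAGAGC" + "AAAA" + "GCTCTTC"
    sites = label_positions(toy, "GCTCTTC")
    print(sites, interval_signature(sites))  # [2, 13, 24] [11, 11]
```

On a real chromosome the same interval list, measured in base pairs, is what a measured molecule or assembled map is compared against.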

(1) Extraction of long DNA molecules
(2) Label DNA at specific sequence motifs
(3) Saphyr Chip linearizes DNA in NanoChannel arrays
(4) Saphyr automates imaging of single molecules in NanoChannel arrays
(5) Molecules and labels detected in images by instrument software
(6) Bionano Access software assembles optical maps

[Figure: Bionano NGM workflow. Panel labels: sample sources (blood, cells, tissue, microbes); DNA conformation in free solution (Gaussian coil), in a microchannel (partially elongated) and in a nanochannel (linearized); labeling chemistry (nickase recognition motif, nick site, polymerase, displaced strand).]

© 2017 Bionano Genomics. All rights reserved.

Structural Variation Landscape Across 26 Human Populations Reveals Population-Specific Variation Patterns in Complex Genomic Regions

Structural variation (SV) studies of different ethnic groups at the population level lead to greater insight into genomic and trait diversity and into differences in disease etiology. While SV maps based on short-read sequences and statistical phasing have been constructed for the samples comprising the 1000 Genomes Project [1], the sensitivity of detection and localization of some classes of SVs (such as long insertions, inversions, copy number variations, and duplications spanning tens of kbp or more) is suboptimal. We have constructed genome optical maps [2] using Bionano next-generation mapping (NGM) for 146 unrelated individuals from 26 human populations, using long DNA molecules (>150 kbp) fluorescently labeled at specific sequence motifs (nickase recognition sites). These samples consist of 6 individuals (3 males and 3 females) from each of 26 human populations of the 1000 Genomes Collection. As the data are generated from native DNA without amplification and assembled without the use of the human reference genome, the genome maps are de novo assemblies of the 146 genomes. All SVs >1.5 kbp are visualized and analyzed by algorithms developed by Bionano and the team that participated in this study.

When the motif patterns from these genome optical maps were compared against in silico maps digitally derived from the human reference genome, and against each other, we found clear, specific SV patterns among different ethnic groups and among individuals in the population. These population SV patterns are most pronounced in complex regions of the genome, where large (>50 kbp) inversions and tandem duplications are mixed together at the same loci. These regions include the loci for microdeletion syndromes (such as 7q11.23, 15q13.3, 16p11.2 and 22q11.2) and subtelomeric regions, in which near-identical long repeats create hotspots for SV formation and make the regions intractable for short-read sequences to assemble into unique contigs.
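Once a sample map has been aligned to an in silico reference map, an insertion or deletion shows up as a change in the distance between two consecutive matched labels. The sketch below assumes the alignment has already been done (matched label pairs are given as input) and flags intervals whose size differs by more than 1.5 kbp, the SV size threshold reported in this study. It is a simplified illustration, not the Bionano algorithm; `call_indels` and `IndelCall` are hypothetical names, and inversions and duplications, which need label order and orientation, are not handled.

```python
from dataclasses import dataclass

MIN_SV_SIZE = 1_500  # bp; mirrors the >1.5 kbp threshold stated in the text


@dataclass
class IndelCall:
    ref_start: int     # reference label position opening the interval
    ref_end: int       # reference label position closing the interval
    size_change: int   # sample interval minus reference interval, in bp

    @property
    def kind(self) -> str:
        return "insertion" if self.size_change > 0 else "deletion"


def call_indels(aligned: list[tuple[int, int]],
                min_size: int = MIN_SV_SIZE) -> list[IndelCall]:
    """Flag size changes between consecutive matched labels.

    aligned: (reference_position, sample_position) pairs for labels that an
    (assumed, external) aligner has matched in the same order and orientation.
    """
    calls = []
    for (r0, s0), (r1, s1) in zip(aligned, aligned[1:]):
        delta = (s1 - s0) - (r1 - r0)
        if abs(delta) > min_size:
            calls.append(IndelCall(r0, r1, delta))
    return calls


if __name__ == "__main__":
    # Toy alignment: a 3,000 bp gain, then a 2,050 bp loss
    pairs = [(0, 0), (10_000, 10_050), (20_000, 23_050), (30_000, 31_000)]
    for c in call_indels(pairs):
        print(c.kind, c.ref_start, c.ref_end, c.size_change)
    # insertion 10000 20000 3000
    # deletion 20000 30000 -2050
```

The 50 bp change in the first interval stays below the threshold and is treated as sizing noise rather than an SV.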

Generating high-quality finished genomes replete with accurate identification of structural variation and high completion (minimal gaps) remains challenging using short-read sequencing technologies alone. Bionano NGM provides direct visualization of long DNA molecules in their native state, bypassing the statistical inference needed to align paired-end reads with an uncertain insert size distribution. These long labeled molecules are de novo assembled into physical maps spanning the whole genome. The resulting order and orientation of sequence elements in the map can be used for anchoring NGS contigs and for structural variation detection.

HR Cao4, C. Chu1, A. Leung3, L. Li3, C. Lin1, J. McCaffrey2, Y. Mostovoy1, A. Naguib4, E. Lam4, A. Poon1, S. Pastor2, R. Rajagopalan2, J. Sibert2, M. Sakin1, W. Wang4, A. Hastie4, E. Young2, T. Chan3, K. Yip3, M. Xiao2, P. Kwok1

Conclusions

We have constructed genome optical maps using Bionano NGM for 146 unrelated individuals from 26 human populations, using long DNA molecules (>150 kbp) fluorescently labeled at specific sequence motifs (nickase recognition sites). These samples consist of 6 individuals (3 males and 3 females) from each of 26 human populations of the 1000 Genomes Collection.

Here we demonstrate the ability of long single-molecule mapping to resolve complex long-range SVs, sometimes with multiple haplotypes, in the human genome, and we provide new "alternative" population-based human references for these regions, which are associated with important human diseases. The population-specific SV patterns are present in relatively "well-behaved" as well as in variable complex regions, shedding light on the origins of the complex regions and on the patterns more closely associated with human disease. In conclusion, Bionano NGM may prove to be the one cost-effective, fast and comprehensive platform for population-level study of functionally relevant large structural variants, paving the way for the era of precision genomics and medicine.

References
1. Sudmant PH et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75-81.
2. Mak AC et al. Genome-Wide Structural Variation Detection by Genome Mapping on NanoChannel Arrays. Genetics. 2016;202:351-62.
3. Cao H et al. Rapid detection of structural variation in a human genome using NanoChannel-based genome mapping technology. GigaScience. 2014;3(1):34.
4. Lam ET et al. Genome mapping on NanoChannel arrays for structural variation analysis and sequence assembly. Nature Biotechnology. 2012;30:771-6.

1) University of California, San Francisco, San Francisco, CA; 2) Drexel University, Philadelphia, PA; 3) CUHK, Shatin, Hong Kong; 4) Bionano Genomics, Inc., La Jolla, CA

De Novo Assembled Genome Maps of 146 Unrelated Individuals from 26 Human Populations Analyzed for SVs

Summary of SV Statistics

[Figure: 1000 Genomes population map. Image source: http://www.1000genomes.org/sites/1000genomes.org/files/documents/1000-genomes-map_11-6-12-2_750.jpg]

•  5.6% of the reference genome is not present in the maps
•  ~20 Mbp of new genomic content not found in the reference genome
•  5% of the reference genome is covered in <20% of the assemblies
•  ~70% of the genome is "well-behaved" and covered by most individuals
•  ~1,800 SVs are common to all super-populations (black)
•  ~1,500 SVs are shared by at least 2 super-populations (grey)
•  Large proportion of unique SVs in AFR (~2,100; 42%) (yellow)
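The sharing categories in these bullets can be computed from per-super-population SV sets. The sketch below assumes each SV has already been matched across samples and given a shared identifier, and it treats the categories as disjoint (common to all, shared by two or more but not all, unique to one), which is an assumption about how the poster's counts were tallied. `classify_svs` is a hypothetical helper.

```python
def classify_svs(
    sv_sets: dict[str, set[str]],
) -> tuple[set[str], set[str], dict[str, set[str]]]:
    """Split SV identifiers into disjoint sharing categories.

    sv_sets maps a super-population label (e.g. "AFR") to the set of SV IDs
    observed in it. Returns (common_to_all, shared_2_plus, unique_by_pop).
    """
    all_svs = set().union(*sv_sets.values())
    common: set[str] = set()   # present in every super-population
    shared: set[str] = set()   # present in 2 or more, but not all
    unique: dict[str, set[str]] = {pop: set() for pop in sv_sets}
    for sv in all_svs:
        carriers = [pop for pop, svs in sv_sets.items() if sv in svs]
        if len(carriers) == len(sv_sets):
            common.add(sv)
        elif len(carriers) >= 2:
            shared.add(sv)
        else:
            unique[carriers[0]].add(sv)
    return common, shared, unique


if __name__ == "__main__":
    # Toy data with hypothetical SV IDs
    toy = {
        "AFR": {"sv1", "sv2", "sv3", "sv4"},
        "EUR": {"sv1", "sv2"},
        "EAS": {"sv1", "sv5"},
    }
    common, shared, unique = classify_svs(toy)
    print(sorted(common), sorted(shared), {p: sorted(s) for p, s in unique.items()})
    # ['sv1'] ['sv2'] {'AFR': ['sv3', 'sv4'], 'EUR': [], 'EAS': ['sv5']}
```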

Variable Complexity Observed in the MHC Region (chr6:28.5-33.5 Mb)

•  The whole region spans a long range (5 Mbp)
•  An overview of contig-to-reference mapping shows different degrees of variation among sub-regions

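[Figure: contig-to-reference mapping across the MHC region, divided into sub-regions A to G, with a high-complexity sub-region highlighted. Each yellow line is one contig; each sample may have multiple contigs; unmapped regions are shown in green.]

Contig patterns shown in the figure (order of sub-regions relative to the reference):
Pattern 1: A->B->C->E->F->G
Pattern 2: A->B<-D->G
Pattern 3: A<-C->F->G
Pattern 4: C<-G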
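The contig patterns above list sub-regions in contig order, so simple bookkeeping can summarize how each one departs from the reference order A to G. The sketch below reports, for each pattern, which reference sub-regions it does not traverse and how many `<-` links it contains. Reading `<-` as a change of orientation is an assumption about the figure's notation, and a missing sub-region may reflect a deletion or simply a contig that does not span that part of the region. The helper names are hypothetical.

```python
import re

# Sub-regions as labeled in the figure, in reference order
REFERENCE_ORDER = list("ABCDEFG")

PATTERNS = {
    "Pattern 1": "A->B->C->E->F->G",
    "Pattern 2": "A->B<-D->G",
    "Pattern 3": "A<-C->F->G",
    "Pattern 4": "C<-G",
}


def pattern_segments(pattern: str) -> list[str]:
    """Sub-region labels in the order they appear, ignoring the links."""
    return re.findall(r"[A-Z]", pattern)


def missing_segments(pattern: str, reference: list[str] = REFERENCE_ORDER) -> list[str]:
    """Reference sub-regions that the pattern does not traverse."""
    present = set(pattern_segments(pattern))
    return [seg for seg in reference if seg not in present]


def reversed_links(pattern: str) -> int:
    """Count '<-' links (read here, as an assumption, as orientation changes)."""
    return pattern.count("<-")


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, missing_segments(pattern), reversed_links(pattern))
    # Pattern 1 ['D'] 0
    # Pattern 2 ['C', 'E', 'F'] 1
    # Pattern 3 ['B', 'D', 'E'] 1
    # Pattern 4 ['A', 'B', 'D', 'E', 'F'] 1
```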

Segmental Duplication Region: 16p12

AFR is the deepest split among

Population structure study: phylogenetic tree (Fst)
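Pairwise Fst between populations is the usual input to a population tree like this one. The poster does not say which estimator was used; the sketch below assumes Nei's formulation, Fst = (H_T - H_S) / H_T, combined across loci as a ratio of sums, and treats each SV's frequency in a population as a biallelic allele frequency. It applies no sample-size correction, and building the tree itself (for example by neighbor-joining on the resulting matrix) is not shown. `fst_two_pops` and `pairwise_fst` are hypothetical names.

```python
from itertools import combinations


def fst_two_pops(freqs_a: list[float], freqs_b: list[float]) -> float:
    """Multi-locus Fst = sum(H_T - H_S) / sum(H_T) for two populations.

    freqs_a[i], freqs_b[i]: frequency (0..1) of SV i in each population.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency vectors must cover the same SVs")
    num = den = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)                      # pooled expected heterozygosity
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-population heterozygosity
        num += h_t - h_s
        den += h_t
    # Loci fixed for the same state in both populations carry no information
    return num / den if den > 0 else 0.0


def pairwise_fst(freqs: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Fst for every unordered pair of populations, keyed by sorted name pair."""
    return {
        (a, b): fst_two_pops(freqs[a], freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }


if __name__ == "__main__":
    # Toy SV frequencies for three super-populations (illustrative only)
    toy = {"AFR": [0.9, 0.4, 0.7], "EUR": [0.2, 0.5, 0.6], "EAS": [0.3, 0.5, 0.5]}
    for pair, value in pairwise_fst(toy).items():
        print(pair, round(value, 3))
```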
