advanced gene mapping coursemath.ucdenver.edu/~spaul/empty/hostedfiles/... · • chi-square test...

Advanced Gene Mapping Course Rockefeller University, NY Subrata Paul

Post on 29-Mar-2020

Advanced Gene Mapping Course

Rockefeller University, NY

Subrata Paul

Genetics for Statistics is what Physics is for Mathematics

Genetics is a leading motivation for the development of new basic statistics.

Topics Covered

• Population and family based association studies
• Data QC
• Rare variant association analysis
• Detecting interaction
• Imputation
• Meta analysis
• Linear mixed model
• eQTL mapping
• Evolutionary genetics
• Incorporate functionality in rare variant association
• Missing heritability

• PLINK
• VAT
• GenABEL
• BEAM3
• CASSI
• MACH
• MINIMAC
• METAL
• GCTA-MLMA
• GERP
• GenMAPP
• Etc.

Instructors

• Heather Cordell, Institute of Genetic Medicine, Newcastle University, UK
• Suzanne M. Leal, Baylor College of Medicine
• Goncalo Abecasis, Univ of Michigan School of Public Health
• Nancy J. Cox, Vanderbilt Genetics Institute
• Shamil Sunyaev, Department of Medicine, Harvard Medical School

GWAS: WTCCC

Wellcome Trust Case Control Consortium

• 7 different diseases: bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 and type 2 diabetes
• 2000 cases for each disease
• Common population-based controls
• Found signals in 6 out of 7 diseases
• Expanded to WTCCC2 and WTCCC3 with 5200 common controls

Data QC:

• Low call rates; excess heterozygosity
• X chromosome markers useful for checking gender
• Checking relationship and ethnicity
• Mendelian misinheritances
• Hardy-Weinberg disequilibrium
• Minor Allele Frequency

Data QC: Call rates and Heterozygosity

Inbreeding

Sample Contamination

Assessing Sex

• Males with an excess of heterozygous SNPs on the X chromosome can denote
  • Females mislabeled as males
  • Males with Klinefelter syndrome

• Females with an excess of homozygous genotypes on the X chromosome can denote
  • Males mislabeled as females
  • Females with Turner syndrome

• Can be observed due to sample mix-ups
• Samples for which the sex is incorrect should be removed from the analysis (probably not the person you think it is)

Data QC:

• Ethnicity

QQ Plots (good)

QQ Plots (bad)

Genomic Inflation Factor

• The genomic inflation factor is the ratio of the median of the observed test statistics to the expected median, and is usually denoted $\lambda$
• No inflation of the test statistics: $\lambda = 1$
• Inflation: $\lambda > 1$
• Deflation: $\lambda < 1$
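As a rough illustration of the definition above, the following sketch computes $\lambda$ from a list of p-values. It assumes the test statistics are 1-df chi-square statistics (which the slide does not state); the function name and the example p-values are hypothetical.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic over its expected median."""
    nd = NormalDist()
    # Assumption: each p-value comes from a 1-df chi-square test, so the
    # statistic can be recovered as z^2 with z = Phi^{-1}(1 - p/2).
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Expected median of a 1-df chi-square under the null (about 0.455).
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

# Hypothetical p-values spread evenly over (0, 1), as in a well-behaved scan.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam_null = genomic_inflation(null_like)

# Hypothetical inflated scan: the same p-values pushed toward zero.
lam_inflated = genomic_inflation([p ** 2 for p in null_like])
```

With evenly spread p-values the ratio is close to 1, while the shifted set gives a value well above 1.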

Population Stratification

• The population sampled actually consists of several sub-populations that do not intermix
• Can lead to spurious false positives (type 1 errors) in case/control studies
• Solutions:
  • PCA
  • MDS (Multidimensional Scaling), also known as principal coordinates analysis

Population Stratification (PCA)

• Compute the eigenvectors and eigenvalues of the matrix of correlations between individuals (based on IBD or IBS)
• Include principal component scores from the top 10 (say) eigenvectors as covariates in a logistic regression analysis
• Plotting the first principal components (first two), you can visualize ethnic outliers
• Linear Mixed Model
  • Estimate kinship matrix (IBD sharing) between pairs of individuals using genome-wide genotype data
  • Use this to model their (extra) correlation, in a linear regression type analysis
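The PCA step can be sketched in plain Python. This is only an illustration under stated assumptions: similarity between individuals is taken as the average product of standardized genotypes (a stand-in for the IBD/IBS-based matrix on the slide), the leading eigenvector is found by power iteration, and the simulated two-population data and all function names are hypothetical.

```python
import random

def standardize_columns(G):
    """Center and scale each SNP column of an individuals-by-SNPs matrix."""
    n, m = len(G), len(G[0])
    X = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [G[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = (sum((g - mean) ** 2 for g in col) / n) ** 0.5
        for i in range(n):
            X[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return X

def similarity_matrix(X):
    """N x N matrix of average standardized-genotype products."""
    n, m = len(X), len(X[0])
    return [[sum(X[a][j] * X[b][j] for j in range(m)) / m for b in range(n)]
            for a in range(n)]

def top_component(K, iters=200):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    n = len(K)
    v = [float(i) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iters):
        w = [sum(K[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical data: two sub-populations with very different allele frequencies.
rng = random.Random(0)

def simulate(n, m, freq):
    return [[(rng.random() < freq) + (rng.random() < freq) for _ in range(m)]
            for _ in range(n)]

G = simulate(20, 200, 0.2) + simulate(20, 200, 0.8)
pc1 = top_component(similarity_matrix(standardize_columns(G)))
```

In this toy setting the sign of each individual's `pc1` score separates the two simulated groups, which is the kind of score one would carry into the regression as a covariate.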

Pop Stratification (Variance components models)

• An alternative approach based on variance components models has been proposed
  • Kang et al. (2010) Nat Genet 42:348-354
  • Zhang et al. (2010) Nat Genet 42:355-360

• Based on methods designed to test for genotype associations with quantitative traits: linear regression

$$y = \mu + \beta x + \epsilon$$

where
$y$ is the trait value
$x$ is a variable coding for genotype
$\epsilon \sim N(0, \sigma^2)$ is the residual error

Variance Components (mixed) models

• Linear mixed models allow this idea to be applied to related individuals
• $\epsilon \sim MVN(0, V)$ where the variance/covariance matrix $V$ follows the standard variance components model, accounting for known kinship
  • $V_{ij} = \sigma_a^2 + \sigma_e^2$ for $i = j$
  • $V_{ij} = 2\Phi_{ij}\sigma_a^2$ for $i \neq j$
• $\sigma_a^2, \sigma_e^2$ represent the additive polygenic variance (due to all loci) and the environmental (= error) variance respectively
• $\Phi_{ij}$ is half the expected IBD sharing between individuals $i$ and $j$ (= their kinship coefficient)
• Closely related to QTDT (Abecasis et al. 2000a;b), which implements a slightly more general/complex model
• Software to implement: GenABEL, EMMAX, FaST-LMM, GEMMA, MMM
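To make the structure of $V$ concrete, here is a small sketch that fills in the matrix from a kinship matrix and the two variance components. The function name, the toy pedigree, and the variance values are hypothetical and chosen only for illustration.

```python
def covariance_from_kinship(phi, var_a, var_e):
    """V_ij = var_a + var_e on the diagonal, 2 * phi_ij * var_a elsewhere."""
    n = len(phi)
    return [[var_a + var_e if i == j else 2.0 * phi[i][j] * var_a
             for j in range(n)]
            for i in range(n)]

# Hypothetical sample: two full siblings (kinship 1/4) and one unrelated person.
phi = [
    [0.50, 0.25, 0.00],
    [0.25, 0.50, 0.00],
    [0.00, 0.00, 0.50],
]
V = covariance_from_kinship(phi, var_a=0.6, var_e=0.4)
```

Here the siblings share covariance $2 \times 0.25 \times 0.6 = 0.3$, while the unrelated individual has zero covariance with both.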

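Linear Mixed Model (detailed)

$$Y_i = \sum_j \beta_j X_{ij} + \epsilon$$

$X_{ij}$: normalized genotype of individual $i$ at SNP $j$

In matrix form:

$$\bar{y} = X\bar{\beta} + \epsilon$$

Two important matrices:

$$LD = \frac{1}{M} X^T X, \qquad GRM = \frac{1}{N} X X^T$$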
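The two matrices can be sketched directly from a normalized genotype matrix. The slide does not say which of $N$ and $M$ counts individuals, so this sketch assumes each product is divided by the dimension it sums over (individuals for LD, SNPs for GRM); the helper names and the tiny genotype matrix are hypothetical.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def ld_and_grm(X):
    """X is individuals x SNPs; each column is assumed to have mean 0, variance 1."""
    n_individuals, n_snps = len(X), len(X[0])
    Xt = transpose(X)
    # SNP-by-SNP matrix: X^T X averaged over individuals.
    ld = [[v / n_individuals for v in row] for row in matmul(Xt, X)]
    # Individual-by-individual matrix: X X^T averaged over SNPs.
    grm = [[v / n_snps for v in row] for row in matmul(X, Xt)]
    return ld, grm

# Hypothetical normalized genotypes: 4 individuals, 2 uncorrelated SNPs.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
ld, grm = ld_and_grm(X)
```

For this toy matrix `ld` is the 2 x 2 identity (the SNPs are uncorrelated) and `grm` is 4 x 4, with individuals 0 and 3 maximally dissimilar.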

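Linear Mixed Model (detailed)

Our model:

$$Y_i = \sum_j \beta_j X_{ij} + \epsilon$$

We have to fit markers individually:

$$Y_i = \beta_1 X_{i1} + \sum_{j \ge 2} \beta_j X_{ij} + \epsilon \sim \beta_1 X_{i1} + \epsilon'$$

For each SNP we can fit the model

$$Y_i = \beta X_i + u_i + \epsilon, \qquad \epsilon \sim N(0, I\sigma^2), \qquad u \sim MVN(0, GRM)$$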

ROADTRIPS

• Robust Association-Detection Test for Related Individuals with Population Substructure
  • Thornton and McPeek (2010) AJHG 86:172-184

• Extension of MQLS (Maximum Quasi-Likelihood Statistic)
• Both methods construct an adjusted version of the case/control $\chi^2$ (or Armitage trend) test
• Using known pedigree relationships to correct for relatedness
• ROADTRIPS also uses a covariance matrix based on kinship/IBD sharing to correct for unknown relatedness/population stratification

Complex Trait: Rare Variants

• MRV (Multiple Rare Variant hypothesis): complex traits are the result of multiple rare variants with a large phenotypic effect
• Large effect size compared to common variants
• Although these variants are rare, collectively they may be quite common
• Strong evidence that rare variants play an important role

Functional Rare Variants

Kiezun, Garimella, Do, Stitziel et al. Nature Genetics 2012

Analysis of Rare Variants

• Difficulties
  • Lack of a rare variant catalog with reference genotypes
  • Large sample size needed
    • Sampling an allele with frequency 0.5% or 0.05% with probability 99% needs 460 or 4600 individuals respectively
  • Better analytical toolbox needed to gain power
  • Common variants have only a limited capacity to tag rare variants

• Single Marker Test
  • Chi-square test
  • Cochran-Armitage test for trend

• Multiple Marker Test
  • Hotelling's $T^2$
  • Logistic Regression
  • Minimal p-value
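The sample sizes quoted under "Large sample size needed" can be reproduced with a short calculation. The sketch assumes each individual contributes two independent chromosomes, so the chance of seeing an allele of frequency $f$ at least once among $N$ individuals is $1 - (1 - f)^{2N}$; the function name is hypothetical.

```python
import math

def individuals_needed(allele_freq, prob=0.99):
    """Smallest N with P(at least one copy among 2N chromosomes) >= prob."""
    # Solve 1 - (1 - f)^(2N) >= prob for the number of chromosomes 2N.
    chromosomes = math.log(1.0 - prob) / math.log(1.0 - allele_freq)
    return math.ceil(chromosomes / 2.0)

n_for_half_percent = individuals_needed(0.005)
n_for_twentieth_percent = individuals_needed(0.0005)
```

Under this assumption the calculation gives 460 individuals for a 0.5% allele and roughly 4,600 for a 0.05% allele, in line with the figures on the slide.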

Single Marker Test

• For case-control data, possible methods: chi-squared, Fisher's exact, Cochran-Armitage trend, logistic regression (linear regression)
• Fisher's exact test is recommended when there are small counts
• Regression analysis controlling for confounders
• Correction for multiple comparisons needed
  • Controlling FWER results in a loss of power
  • Obtain empirical p-values by random permutation or control FDR (sequential Bonferroni-type procedure)
• Sample size must be very large for sufficient power
  • Need 6,400, 54,000 and 540,000 samples for MAF 0.1, 0.01 and 0.001 to get 80% power
• Success example: insulin processing; sample of 8000; variants in SGSM2 with MAF = 1.4%, $p = 8.7 \times 10^{-1?}$, and MADD with MAF = 3.7%, $p = 7.6 \times 10^{-1?}$ (the final exponent digit is illegible in the source)
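The contrast between FWER and FDR control can be illustrated with a small sketch. It assumes the "sequential Bonferroni-type procedure" refers to the Benjamini-Hochberg step-up procedure; the function names and p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices rejected when controlling FWER with a Bonferroni threshold."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= k*q/m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])

# Hypothetical p-values from ten single-marker tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fwer_hits = bonferroni(pvals)
fdr_hits = benjamini_hochberg(pvals)
```

On these values Bonferroni keeps only the smallest p-value, while the FDR procedure also keeps the second, which is the power difference the slide alludes to.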

Multiple Marker Tests

• Multiple regression: reduced degrees of freedom
• Hotelling's two sample $T^2$ test:
  • Reduction of power with number of variants
  • Greatly affected by MAF
  • Identified risk allele (direction) is needed

• MDMR (Multivariate Distance Matrix Regression)
  • Uses genetic similarity of individuals
  • Don't need to identify risk allele at each variant

• KBAT (Kernel-Based Association Test)
  • Based on genotype similarity score between individuals measured by a kernel function
  • No assumption about direction
  • Can handle correlated and/or independent SNPs

Gene based Aggregation Tests

• Regression based tests
• Burden tests (collapsing)
• Adaptive burden tests
• Variance component tests
• Combination of the above

• Evaluate cumulative effects of multiple variants
• CMC (Combined Multivariate and Collapsing)
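The collapsing idea behind burden tests can be sketched as follows. This shows only a collapsing step (one carrier indicator per individual across rare sites), not the full CMC method, which also handles common variants in a multivariate test; the MAF cutoff, function names, and data are hypothetical.

```python
def collapse_rare(genotypes, mafs, maf_cutoff=0.01):
    """1 if the individual carries any minor allele at a rare site, else 0."""
    rare_sites = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    return [1 if any(row[j] > 0 for j in rare_sites) else 0
            for row in genotypes]

def carrier_table(collapsed, is_case):
    """2x2 counts: rows = case/control, columns = carrier/non-carrier."""
    table = [[0, 0], [0, 0]]
    for carrier, case in zip(collapsed, is_case):
        table[0 if case else 1][0 if carrier else 1] += 1
    return table

# Hypothetical minor-allele counts for 6 individuals at 3 sites in one gene.
genotypes = [[1, 0, 0], [0, 1, 1], [0, 2, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
mafs = [0.004, 0.3, 0.008]          # the middle site is common and is ignored
is_case = [True, True, True, False, False, False]

collapsed = collapse_rare(genotypes, mafs)
table = carrier_table(collapsed, is_case)
```

The resulting 2 x 2 table of carrier status by disease status could then be tested with any single-marker method, such as Fisher's exact test when counts are small.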

CMC

Recent Methods and Summary

• CMC: jointly assesses role of common and rare variants
• WSS: weighted sum statistic
• KBAC: kernel based adaptive cluster test (weighting scheme)
• SKAT: sequence kernel association test
• Power to detect association depends on
  • The number and proportion of causal variants
  • Population frequency
  • Their effect sizes and directionality
  • Number of genes contributing to the trait
  • The fraction of causal variants located (by sequencing, e.g. exome seq)

Recent Methods and Summary

• Statistical tests are sensitive to disease architecture
• Different tests show strength for different effect size distributions:
  • WSS: $1/[x(1 - x)]$, where $x$ is the population frequency
  • SKAT: $\beta(x; a_1, a_2)$ for pre-specified $a_1, a_2$

• Allow opposite effects on traits
  • Step-up, C-alpha, the replication-based test, SKAT
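The two frequency-based weighting schemes listed above can be compared numerically. This is a sketch that takes the slide's expressions at face value; the parameter choice $a_1 = 1$, $a_2 = 25$ is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

def wss_weight(x):
    """Weight 1 / (x * (1 - x)) for population allele frequency x."""
    return 1.0 / (x * (1.0 - x))

def beta_density(x, a1, a2):
    """Beta(a1, a2) density at x, used as a frequency-based weight."""
    norm = math.gamma(a1 + a2) / (math.gamma(a1) * math.gamma(a2))
    return norm * x ** (a1 - 1.0) * (1.0 - x) ** (a2 - 1.0)

# Hypothetical allele frequencies from rare to common.
freqs = [0.001, 0.01, 0.05, 0.3]
wss = [wss_weight(x) for x in freqs]
# Assumed parameters a1 = 1, a2 = 25 (not given on the slide).
skat = [beta_density(x, 1.0, 25.0) for x in freqs]
```

Both schemes give rarer variants larger weights over this range, but the rate at which weight falls off with frequency differs, which is why each test favors a different effect size distribution.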

Software Packages

Gene × Gene Interaction

Testing for Interaction

• Logistic (linear) regression for case/control data
  • --epistasis in PLINK
• More powerful: case-only analysis
  • Interaction ⟺ correlation between relevant predictors
  • Test null hypothesis: two loci are independent (no correlation)
  • Chi-square test of independence
  • Gains power with the assumption that the two loci are independent in the population
• Preferable to incorporate case-only and case-control estimators into a single test (greater power than logistic); --fast-epistasis in PLINK performs such a test
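The case-only analysis can be sketched as a chi-square test of independence on the genotype counts at two loci among cases. The counts below are hypothetical, and the function is a generic Pearson chi-square, not PLINK's implementation.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two loci were independent.
            exp = row_tot[i] * col_tot[j] / total
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df

# Hypothetical case-only counts: rows = genotype at locus A (0/1/2 copies),
# columns = genotype at locus B (0/1/2 copies).
cases = [
    [40, 30, 10],
    [30, 40, 10],
    [10, 10, 20],
]
stat, df = chi_square_independence(cases)
```

The statistic is compared with a chi-square distribution on `df` degrees of freedom (the 5% critical value for 4 df is about 9.49). A large value suggests correlation between the loci among cases, which is read as interaction only under the assumption that the loci are independent in the general population.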

PLINK --fast-epistasis

Other Techniques

• Exhaustive search: use GPUs, suffers from multiple testing
• Data mining approach: use cross-validation to avoid overfitting
• Multifactor Dimensionality Reduction (Ritchie et al. (2001) AJHG)
• Random Forest (CART)
• Penalized regression methods (Zhu et al. (2014))
• Entropy based methods
• BEAM (Zhang et al. (2007))
• Bayesian model selection
• MCMC, MECPM (Jiang and Neapolitan (2015))

THANK YOU