How can quantitative traits be mapped? CH927 Quantitative Genomics Lecture 2

Upload: phungthu

Post on 06-Apr-2018



How can quantitative traits be mapped?

CH927 Quantitative Genomics Lecture 2

Lecture objectives

By the end of this lecture you should be able to explain:

• What the main steps in QTL mapping are

• What the different methods for QTL analysis are, and under which experimental conditions they should be used:

- Single marker, Interval/Flanking Mapping
- Composite Interval Mapping (CIM), Multiple Interval Mapping (MIM)

• What the different statistical methods for QTL analysis are, and what they can predict about QTLs:

- t-test, ANOVA, multiple regression, linear regression

The basis for QTL mapping has been known for over 70 years, but a lack of genetic markers prevented widespread use until the mid-1980s

• With DNA sequencing, the number & density of markers have grown

• Also, more statistically-sophisticated mapping methods have been developed

1. Score a population for (i) a trait, and (ii) distribution of genome markers

2. Identify regions of the genome containing QTLs based on the occurrence of a phenotype-marker association that is significantly more likely than chance

Association of phenotypes with markers

• Results from marker A/a: suggest that the gene is very close to the marker

• Results from marker B/b: suggest that the gene is not linked to the marker

[Figure: marker and phenotype scores for individual progeny, e.g. aGb, AgB, aGB, Agb. Allele a always occurs with G and A with g, whereas B and b occur with both G and g]

A/a and B/b = molecular scores

G/g = phenotypic score
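The comparison on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes each progeny is written as a three-letter score (marker A/a, phenotype G/g, marker B/b, in that order) and uses eight of the scores shown in the figure. The function name `cosegregation` is hypothetical.

```python
# Progeny scores taken from the figure: marker A/a, phenotype G/g, marker B/b.
progeny = ["aGb", "AgB", "aGb", "Agb", "Agb", "aGB", "aGb", "AgB"]


def cosegregation(scores, marker_idx, trait_idx=1):
    """Count progeny for each (marker allele, phenotype) combination."""
    counts = {}
    for score in scores:
        key = (score[marker_idx], score[trait_idx])
        counts[key] = counts.get(key, 0) + 1
    return counts


table_a = cosegregation(progeny, 0)  # marker A/a against phenotype
table_b = cosegregation(progeny, 2)  # marker B/b against phenotype
print("A/a vs G/g:", table_a)
print("B/b vs G/g:", table_b)
```

For marker A/a only two of the four possible combinations occur (a with G, A with g), which is what tight linkage looks like. For marker B/b all four combinations occur, as expected when the marker is not associated with the phenotype.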

This is a generalisation of the principle... but for only one gene. We need to consider Quantitative Trait Loci (multiple)

[Figure: multi-locus marker and phenotype scores (loci A/a, B/b, C/c, D/d, G/g, H/h, J/j, K/k) for individual progeny, e.g. aGcHBJkD, AgcHbjkd, aGcHBJKd]

Objectives of QTL analysis

1. Score a population for (i) a trait, and (ii) distribution of genome markers

2. Identify regions of the genome containing QTLs based on the occurrence of a phenotype-marker association that is significantly more likely than chance

3. Estimate the effects of the QTLs on the quantitative trait:

- many genes with small effect each, or few genes with large effect each?
- their effects on the trait: is gene action additive or dominant?
- their positions in the genome: linkage and association, epistasis
- their interaction with the environment

4. Identify candidate genes underlying the QTL and thus the trait

QTL analysis can be classified by the type of progeny used

• All of the different progenies are derived from the same reference population: P1 (MMQQ) x P2 (mmqq), giving the F1 (MmQq)

M = marker genotype
Q = QTL genotype

• From this reference population different progenies can be produced

[Figure: crossing scheme. Labels: P1 (MMQQ) x P2 (mmqq); F1 (MmQq); self; F2; self x 5; F7 (RILs); x P3; x P4; TC1, TC2, TC3, TC4; SI lines]

Backcrosses and Near Isogenic Lines (NILs)

• BC1 (Backcross 1) F1: use for QTL mapping

• Near Isogenic Lines: isolate part of genome A of interest

• Rapid generation of material for QTL analysis

[Figure: backcrossing scheme. Labels: P2; F2; BC1 F1, BC1 F2, BC1 F3; BC2 F1, BC2 F2, BC2 F3; self; Near Isogenic Lines]

To map a quantitative trait:

1. Make a cross and generate marker data
- Type of mapping population (e.g. RIL)

2. Generate linkage maps
- Genome size, genome coverage

3. Collect phenotypic measurements
- Evaluate in a uniform environment
- Evaluate in multiple environments
- Data transformation (to approach a normal distribution)

Total variance = $V_T = V_G + V_E$ (genetic variance + environmental variance)

Environmental variance includes: heterogeneous environment, stochastic events, measurement error

[Figure: frequency distributions of trait value for genotypes A2/A2, A1/A2 and A1/A1]

This assumes genes act additively (i.e. no epistasis) and that their effects are not conditional on environment; otherwise $V_T = V_G + V_E + V_{G \times G} + V_{G \times E}$


4. The statistical machinery for QTL mapping

Several analysis frameworks for marker-QTL associations:

- Single marker tests (t-test, F-test or linear regression)

- Interval/Flanking Mapping (IM) (pair of markers simultaneously)

- Composite Interval Mapping (CIM) (analysis of a marker interval, flanked by adjacent markers, ML-based)

- Multiple Interval Mapping (MIM)

4. The statistical machinery for QTL mapping

Four main analysis techniques:

Simple t-test: used to evaluate the presence of a QTL through statistical differences between two marker genotypes

ANOVA (marker regression): detects marker differences when there are more than two marker genotypes. Produces a ranking of genotypes, in order of phenotypic effect for the trait of interest, and tests for significant differences between each genotype

Multiple regression: simple remodelling of the ANOVA technique in regression terms, with the same ranking and testing for differences

Linear regression: most complex point analysis method, allowing different characteristics of the QTL to be investigated, including:

- dominance effects, additive effects
- genotype-environment interactions, epistasis

Probabilities and t-tests

Basic mapping format: conditional probabilities

• The conditional probability that the QTL genotype is $Q_k$, given that the marker genotype is $M_j$:

$\Pr(Q_k \mid M_j) = \Pr(Q_k M_j) / \Pr(M_j)$

• Calculate this in an F2 from: gamete frequencies, marker genotype probabilities

• Consider a QTL linked to a marker (recombination fraction = c)

• In the F2, the gamete frequencies are: freq(MQ) = freq(mq) = (1-c)/2; freq(mQ) = freq(Mq) = c/2

[Figure: P1 (MMQQ) x P2 (mmqq) gives the F1 (MmQq), which is selfed to give the F2]

Marker genotypes = observed

QTL genotypes = missing

Basic mapping format: conditional probabilities

• In the F2, freq(MQ) = freq(mq) = (1-c)/2; freq(mQ) = freq(Mq) = c/2

• Hence,

$\Pr(MMQQ) = \Pr(MQ)\Pr(MQ) = (1-c)^2/4$

$\Pr(MMQq) = 2\Pr(MQ)\Pr(Mq) = 2c(1-c)/4$

$\Pr(MMqq) = \Pr(Mq)\Pr(Mq) = c^2/4$

• Since Pr(MM) = 1/4, the conditional probabilities become:

$\Pr(QQ \mid MM) = \Pr(MMQQ)/\Pr(MM) = (1-c)^2$

$\Pr(Qq \mid MM) = \Pr(MMQq)/\Pr(MM) = 2c(1-c)$

$\Pr(qq \mid MM) = \Pr(MMqq)/\Pr(MM) = c^2$
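The calculation can be checked numerically. The Python sketch below builds the joint F2 genotype probabilities from the four gamete frequencies and then conditions on the marker genotype. The function name `qtl_given_marker_f2` and the choice c = 0.1 are illustrative assumptions, not part of the lecture.

```python
def qtl_given_marker_f2(c):
    """Pr(QTL genotype | marker genotype) in an F2 for recombination fraction c."""
    # F1 gamete frequencies: parental types (1-c)/2 each, recombinants c/2 each.
    gametes = {"MQ": (1 - c) / 2, "mq": (1 - c) / 2, "Mq": c / 2, "mQ": c / 2}

    # Joint probabilities of (marker genotype, QTL genotype) from random union of gametes.
    joint = {}
    for g1, f1 in gametes.items():
        for g2, f2 in gametes.items():
            marker = "".join(sorted(g1[0] + g2[0]))  # "MM", "Mm" or "mm"
            qtl = "".join(sorted(g1[1] + g2[1]))     # "QQ", "Qq" or "qq"
            joint[(marker, qtl)] = joint.get((marker, qtl), 0.0) + f1 * f2

    # Marginal marker genotype probabilities: Pr(MM) = Pr(mm) = 1/4, Pr(Mm) = 1/2.
    marginal = {}
    for (marker, _), p in joint.items():
        marginal[marker] = marginal.get(marker, 0.0) + p

    # Conditional probability = joint / marginal.
    return {key: p / marginal[key[0]] for key, p in joint.items()}


cond = qtl_given_marker_f2(0.1)
for qtl in ("QQ", "Qq", "qq"):
    print(f"Pr({qtl} | MM) = {cond[('MM', qtl)]:.4f}")
```

With c = 0.1 the three values agree with (1-c)^2, 2c(1-c) and c^2. The same dictionary also holds the probabilities conditional on Mm and mm.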

Using a t-test to probe a QTL

• e.g. backcross with two genes: marker (alleles M, m), and QTL (alleles Q, q)

• These two genes are linked with a recombination fraction of c

Genotype:     MmQq       Mmqq    mmQq    mmqq
Frequency:    (1-c)/2    c/2     c/2     (1-c)/2
Mean effect:  m+a        m       m+a     m

• Mean of marker genotype Mm (each frequency divided by Pr(Mm) = 1/2):

$m_1 = (1-c)(m+a) + c\,m = m + (1-c)a$

• Mean of marker genotype mm (each frequency divided by Pr(mm) = 1/2):

$m_0 = c(m+a) + (1-c)m = m + ca$

If the trait mean is significantly different for the genotypes at a marker locus, it is linked to a QTL

• A small difference between the marker genotype means can arise from either:
- a QTL of small effect with tight linkage
- a QTL of large effect with loose linkage
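As a hedged illustration of the marker-mean argument, the Python sketch below computes the expected means of the two backcross marker classes and runs a t-test on simulated data. The simulation settings (n = 200, m = 10, a = 2, c = 0.1, residual s.d. = 1), the use of Welch's unequal-variance t statistic, and all function names are assumptions made for the example.

```python
import math
import random
import statistics


def expected_marker_means(m, a, c):
    """Expected trait means of marker classes Mm and mm in the backcross."""
    return m + (1 - c) * a, m + c * a


def simulate_backcross(n, m, a, c, sd, rng):
    """Simulate trait values grouped by observed marker genotype."""
    groups = {"Mm": [], "mm": []}
    for _ in range(n):
        marker = rng.choice(["Mm", "mm"])
        recombinant = rng.random() < c
        # Non-recombinant Mm carries Qq; a recombinant Mm carries qq (reverse for mm).
        has_q_allele = (marker == "Mm") != recombinant
        groups[marker].append(rng.gauss(m + a if has_q_allele else m, sd))
    return groups


def welch_t(x, y):
    """Welch's t statistic for a difference in means between two samples."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


m1, m0 = expected_marker_means(m=10.0, a=2.0, c=0.1)
print(f"expected m1 - m0 = {m1 - m0:.2f}")  # equals a(1 - 2c)

groups = simulate_backcross(200, 10.0, 2.0, 0.1, 1.0, random.Random(1))
t_stat = welch_t(groups["Mm"], groups["mm"])
print(f"t = {t_stat:.2f}")
```

The expected difference m1 - m0 = a(1 - 2c) shows why effect size and linkage are confounded in a single-marker test: halving a, or increasing c, shrinks the difference in the same way.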

ANOVA and single marker regression

Partitioning of variance: a simple ANOVA model

• Partition variance: genetically-determined and environmental components

• Model (there is a QTL linked to a marker) is tested against the null hypothesis of no QTL

[Figure: trait value plotted against genotype (A1/A1, A1/A2, A2/A2)]

Partitioning of variance methodology

• Total sum of squares, SST: calculate the grand mean and the deviation of each individual from it, square each deviation, and sum the squared deviations over the population

• SST degrees of freedom = n - 1 (here n = 23)

• Total mean square, MST = SST / (n - 1) = total variance

[Figure: trait value against genotype (A1/A1, A1/A2, A2/A2), with the grand mean marked]

Partitioning of variance: fitting the model

• Calculate the mean for each genotype group

• SSR = residual sum of squares = sum of (deviation of each individual from its genotype mean)^2

• SSR degrees of freedom = n - (number of genotypes)

• Residual mean square, MSR = SSR / (n - number of genotypes) = variance not explained by the model (i.e. not explained by this QTL)

[Figure: trait value against genotype (A1/A1, A1/A2, A2/A2), with the grand mean marked]

Genetic variance and testing the model

• Model sum of squares, SSM = sum over genotypes of (grand mean - genotype mean)^2 x (number of individuals with that genotype)

• SSM degrees of freedom = (number of genotypes) - 1 = 2

• Genetic variance, MSM = SSM / 2

• But since SST = SSM + SSR, it is easier to calculate SSM = SST - SSR

Genetic variance and testing the model

• To test whether the QTL explains a significant amount of the variation, calculate the ratio of model to residual variance: F-ratio = MSM / MSR, where MSM = SSM / 2 and SSM = SST - SSR

• Look up the minimum value of F that is unlikely to have occurred by chance, given 2 d.f. for MSM and 20 for MSR (F ≥ 3.49 for p ≤ 0.05 in this case)

• If F exceeds this value, we can reject the null hypothesis of no QTL

• Proportion of variance explained by the QTL = SSM / SST
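The partition described on the last four slides can be sketched as follows. This is an illustration only: the 23 trait values are hypothetical numbers chosen to give three genotype groups, and `one_way_anova` is an illustrative name. It relies on the sums of squares (not the mean squares) being additive, SST = SSM + SSR, and uses n minus the number of genotypes as the residual degrees of freedom, which gives the 20 d.f. quoted for n = 23.

```python
def one_way_anova(groups):
    """One-way ANOVA for trait values grouped by marker genotype."""
    values = [v for group in groups.values() for v in group]
    n = len(values)
    grand_mean = sum(values) / n

    sst = sum((v - grand_mean) ** 2 for v in values)  # total sum of squares
    ssr = 0.0  # residual: deviations from each genotype mean
    ssm = 0.0  # model: genotype means around the grand mean
    for group in groups.values():
        group_mean = sum(group) / len(group)
        ssr += sum((v - group_mean) ** 2 for v in group)
        ssm += len(group) * (group_mean - grand_mean) ** 2

    df_model = len(groups) - 1
    df_resid = n - len(groups)
    f_ratio = (ssm / df_model) / (ssr / df_resid)
    return {"SST": sst, "SSM": ssm, "SSR": ssr, "df_model": df_model,
            "df_resid": df_resid, "F": f_ratio, "explained": ssm / sst}


# Hypothetical trait values for n = 23 individuals.
trait = {
    "A1/A1": [10.1, 9.8, 10.5, 9.6, 10.2, 9.9, 10.4],
    "A1/A2": [11.0, 11.4, 10.8, 11.2, 11.6, 10.9, 11.1, 11.3, 11.5],
    "A2/A2": [12.2, 12.0, 12.5, 11.9, 12.4, 12.1, 12.3],
}
result = one_way_anova(trait)
print(f"F({result['df_model']}, {result['df_resid']}) = {result['F']:.1f}")
print(f"proportion of variance explained = {result['explained']:.2f}")
```

The F value is then compared with the tabulated critical value (3.49 for 2 and 20 d.f. at p = 0.05).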

• Incorporate terms into the model to estimate:

The additive effect of the alleles, a = half the difference between the averages for the two homozygotes; it can be positive or negative, depending on which allele is being considered

The dominance deviation, d = the difference between the average of the hets and the mid-point of the homs; it can also be positive or negative

If d = ±a, one allele is completely dominant

If |d| > |a|, one allele shows over-dominance

This is essentially a least-squares regression

Estimation of additive and dominance effects

With $m_2$, $m_1$ and $m_0$ the trait means of marker genotypes MM, Mm and mm in an F2:

Additive effects (a):

$(m_2 - m_0)/2 = a(1-2c) = a^*$

Dominance effects (d):

$m_1 - (m_2 + m_0)/2 = d(1-2c)^2 = d^*$

• a* = estimated additive effects

• d* = estimated dominance effects

In the backcross only two marker genotypes are present:

Genotype:     MmQq       Mmqq    mmQq    mmqq
Frequency:    (1-c)/2    c/2     c/2     (1-c)/2
Mean effect:  m+a        m       m+a     m

• Mean of marker genotype Mm: $m_1$

• Mean of marker genotype mm: $m_0$

so that $m_1 - m_0 = a(1-2c)$, and the dominance effect cannot be estimated
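The effect of recombination on the estimated effects can be illustrated with a short Python sketch. It assumes an F2 in which the QTL genotypes QQ, Qq and qq have values mu + a, mu + d and mu - a (a standard parameterisation, not the backcross table above), and it uses the F2 conditional probabilities of QTL genotype given marker genotype. Function names and the numbers are illustrative.

```python
def f2_marker_means(mu, a, d, c):
    """Expected trait means of marker genotypes MM, Mm, mm in an F2."""
    value = {"QQ": mu + a, "Qq": mu + d, "qq": mu - a}
    # Pr(QTL genotype | marker genotype) for recombination fraction c.
    cond = {
        "MM": {"QQ": (1 - c) ** 2, "Qq": 2 * c * (1 - c), "qq": c ** 2},
        "Mm": {"QQ": c * (1 - c), "Qq": (1 - c) ** 2 + c ** 2, "qq": c * (1 - c)},
        "mm": {"QQ": c ** 2, "Qq": 2 * c * (1 - c), "qq": (1 - c) ** 2},
    }
    return {marker: sum(p * value[q] for q, p in probs.items())
            for marker, probs in cond.items()}


def apparent_effects(means):
    """Estimated additive (a*) and dominance (d*) effects from marker means."""
    a_star = (means["MM"] - means["mm"]) / 2
    d_star = means["Mm"] - (means["MM"] + means["mm"]) / 2
    return a_star, d_star


for c in (0.0, 0.1, 0.3, 0.5):
    a_star, d_star = apparent_effects(f2_marker_means(mu=10.0, a=2.0, d=1.0, c=c))
    print(f"c = {c:.1f}: a* = {a_star:.3f}, d* = {d_star:.3f}")
```

Under these assumptions a* shrinks as a(1 - 2c) and d* as d(1 - 2c)^2, so both vanish for an unlinked marker (c = 0.5).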

Linear Models for QTL Detection

$y_{mk} = \mu + b_m + e_{mk}$

where $y_{mk}$ is the value of the trait in the kth individual of marker genotype m, $\mu$ is the overall mean, $b_m$ is the effect of marker genotype m on trait value, and $e_{mk}$ is the residual

• Uses the linear relationship between the apparent effects of a marker on a quantitative character and the effects of all QTLs that are linked to that marker

• Differences in the distance between the QTL and the markers alter factors in this relationship

• Detection: a QTL is linked to the marker if at least one of the $b_m$ is significantly different from zero

• Estimation (QTL effect and position): have to relate the $b_m$ to the QTL effects and map position

Detecting epistasis

• One major advantage of linear models is their flexibility

• Test for epistasis between two QTLs: use an ANOVA with an interaction term:

$y = \mu + a_i + b_k + d_{ik} + e$

where $a_i$ is the effect from marker genotype i at the first marker set (can be > 1 locus), $b_k$ is the effect from marker genotype k at the second marker set, and $d_{ik}$ is the interaction between marker genotypes i in the 1st marker set and k in the 2nd marker set

• At least one of the $a_i$ significantly different from 0: QTL linked to first marker set

• At least one of the $b_k$ significantly different from 0: QTL linked to second marker set

• At least one of the $d_{ik}$ significantly different from 0: interactions between QTL in sets 1 and 2
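To make the interaction term concrete, the sketch below decomposes a 2 x 2 table of cell means into the overall mean, the two main effects and the interaction terms. It assumes a balanced design with two genotypes at each marker set and hypothetical cell means. It only decomposes the means and does not perform the significance test, which would need the ANOVA F-ratio for the interaction term.

```python
def two_marker_effects(cell_means):
    """Decompose 2 x 2 cell means into mu, a_i, b_k and interaction d_ik.

    cell_means[i][k] is the mean trait value for genotype i at the first
    marker set and genotype k at the second (balanced design assumed).
    """
    mu = sum(sum(row) for row in cell_means) / 4
    a = [sum(cell_means[i]) / 2 - mu for i in range(2)]
    b = [(cell_means[0][k] + cell_means[1][k]) / 2 - mu for k in range(2)]
    d = [[cell_means[i][k] - mu - a[i] - b[k] for k in range(2)] for i in range(2)]
    return mu, a, b, d


# Hypothetical cell means: purely additive, then with an epistatic boost.
additive = [[10.0, 12.0], [11.0, 13.0]]
epistatic = [[10.0, 12.0], [11.0, 16.0]]

for name, table in (("additive", additive), ("epistatic", epistatic)):
    mu, a, b, d = two_marker_effects(table)
    print(name, "interaction terms:", d)
```

In the additive table every interaction term is zero. In the second table the double genotype exceeds the sum of the two main effects, so the interaction terms are non-zero.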

Interval mapping and marker regression

Problems with single marker mapping using ANOVA

• If marker density is high, ANOVA with individual marker genotypes is effective: “single marker analysis” or “single marker regression”

Three important weaknesses:

• We do not obtain separate estimates of QTL location and QTL effect

• Must discard individuals whose genotypes are missing at the marker

• When markers are sparse, the QTL may be quite far from all markers, and so the power for QTL detection will decrease

Interval mapping

• Can use probability estimates for the genotypes in intervals between markers

• Move the QTL position every 2 cM from M1 to M2 and draw the profile of the F value. The peak of the profile corresponds to the best estimate of the QTL position

[Figure: F-value profile against testing position along markers M1 to M5]
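The probability estimates used between markers can be sketched for the simplest case. The Python example below assumes a backcross, no crossover interference, and Haldane's map function to convert cM to recombination fractions. The 20 cM interval and the function names are illustrative choices, not values from the lecture.

```python
import math


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))


def prob_qtl_het(r1, r2, m1_het, m2_het):
    """Pr(QTL genotype is Qq | flanking marker genotypes) in a backcross.

    r1: recombination fraction between marker M1 and the QTL
    r2: recombination fraction between the QTL and marker M2
    m1_het, m2_het: True if the marker genotype is heterozygous
    """
    # Likelihood of the observed marker genotypes if the QTL genotype is Qq ...
    like_het = ((1 - r1) if m1_het else r1) * ((1 - r2) if m2_het else r2)
    # ... and if it is qq. The prior of 1/2 for each QTL genotype cancels.
    like_hom = (r1 if m1_het else (1 - r1)) * (r2 if m2_het else (1 - r2))
    return like_het / (like_het + like_hom)


interval_cm = 20
for pos in range(0, interval_cm + 1, 2):  # step the putative QTL every 2 cM
    r1, r2 = haldane(pos), haldane(interval_cm - pos)
    p = prob_qtl_het(r1, r2, m1_het=True, m2_het=False)
    print(f"{pos:2d} cM: Pr(Qq | M1 het, M2 hom) = {p:.3f}")
```

For an individual that is recombinant between the flanking markers, the probability of carrying Q falls from 1 at M1 to 0 at M2 as the test position moves across the interval. These probabilities replace the unobserved QTL genotype in the regression at each test position.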

Interval mapping implementation

[Figure: interval mapping by regression (QTL Express); F-ratio profile, axis from 0 to 15]

• Carry out a QTL scan step-wise: once a significant QTL has been identified, other markers are tested for their ability to explain the residual variation

• Known QTL are said to be “fixed” or “co-factors” in the regression

Interval mapping with regression approach

• Consider a marker interval M1-M2. We assume that a QTL is located at a particular position between the two markers (r1 and θ are fixed)

• The phenotypic value for individual i affected by a QTL can be expressed as a regression model with response variable $y_i$ and independent variable $x^*_i$:

$y_i = \mu + a^* x^*_i + e_i, \quad i = 1, \ldots, n$ (latent model)

• $\mu$ is the overall mean
• $x^*_i$ is the indicator variable for QTL genotypes: $x^*_i = 1$ for Qq; 0 for qq
• $a^*$ is the additive effect of the putative QTL on the trait
• $e_i$ is the residual error, $e_i \sim N(0, \sigma^2)$

Advantages and disadvantages of interval mapping

• Advantages:

- the position of the QTL can be inferred by a support interval

- the estimated position and effects of the QTL tend to be asymptotically unbiased if there is only one segregating QTL on a chromosome

- method requires fewer individuals

• Disadvantages:

- this is not an interval test

- even when there is no QTL within an interval, the likelihood profile on the interval can still exceed the threshold if there is a QTL nearby

- if there is more than one QTL on a chromosome, the test statistic at the position being tested will be affected by all QTL, as will the estimated positions

- not efficient to use only two markers at a time for testing

Flanking methods and maximum likelihood

Flanking marker methods have been the most popular analysis techniques over recent years

• Due to their accuracy and level of characterisation of the putative QTL: they combine both detection and estimation of QTL effects and position

• Two basic techniques:
- Maximum likelihood
- Maximum likelihood estimation through regression

• Three methods for estimating likelihood:
- Single marker maximum likelihood (least power)
- Flanking marker maximum likelihood (most versatile)
- Order restricted interval mapping (most power)

Estimating the QTL position (θ): Likelihood maps

In each method a likelihood map is produced:

• View θ as a fixed parameter, assume the QTL is located at a particular position

• View θ as a variable being estimated (derive log-likelihood equation for MLE of θ)

• $L_A / L_0$ = ratio of the likelihood of the alternative hypothesis (QTL present) to the likelihood of the null hypothesis (no QTL in the marker interval)

LOD (Log of the Odds) = $\log_{10}(L_A / L_0)$

[Figure: LOD score (0 to 8) against chromosome position, showing the significance threshold, the estimated QTL location at the peak, and the support interval]
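The LOD score and the likelihood-ratio (LR) statistic used later in the lecture are two scalings of the same quantity. The short Python sketch below converts between them. It takes the ratio as alternative over null, so that larger values favour a QTL, and defines LR as twice the natural-log likelihood ratio. Both conventions are assumptions of the sketch, as are the function names and example log-likelihoods.

```python
import math


def lod_from_log_likelihoods(loglik_alt, loglik_null):
    """LOD = log10(L_A / L_0), computed from natural-log likelihoods."""
    return (loglik_alt - loglik_null) / math.log(10)


def lr_from_lod(lod):
    """Likelihood-ratio statistic LR = 2 ln(L_A / L_0) = 2 ln(10) x LOD."""
    return 2 * math.log(10) * lod


# Hypothetical log-likelihoods at one test position.
lod = lod_from_log_likelihoods(loglik_alt=-120.0, loglik_null=-126.9)
print(f"LOD = {lod:.2f}, LR = {lr_from_lod(lod):.2f}")
```

Since 2 ln(10) is about 4.6, a LOD of 3 corresponds to an LR of roughly 13.8.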

Composite interval mapping (CIM)

• Uses multiple markers as additional factors (marker cofactors)

[Figure: markers i-1, i, i+1 and i+2 along a chromosome, with the interval being mapped indicated]

Method:
• Predict QTL marker genotype every x cM
• Carry out an LR test for QTL effect every x cM
• Combines MLE and multiple regression methods

• Five different types of markers are considered for the regression model, depending on the characteristics of the chromosome region:
- markers surrounding the QTL of interest
- linked & unlinked markers within the QTL region
- linked & unlinked markers outside the QTL region

Permutation testing to determine experiment-wide significance thresholds

• Multiple testing problem: how often are random QTL effects of a certain magnitude detected in similar datasets?

• Method:
- create a large number of ‘random empirical’ datasets: take your marker data and randomly reassign the phenotypes back to the marker genotypes
- repeat the QTL detection process
- record the highest LR produced for a ‘random QTL’ anywhere in the map
- repeat the whole process > 500 times
- record the magnitude of the lowest ‘random QTL’ observed in the top 5% of LR results = threshold

[Figure: distribution of the highest LR from the random datasets, divided into the lower 95% and the top 5%]
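The permutation procedure can be sketched in Python as follows. To keep the example short, the test statistic is swapped for the largest absolute difference in mean phenotype between the two marker classes, rather than the LR from a full QTL scan. The marker and phenotype data are randomly generated placeholders, and all names are illustrative.

```python
import random


def max_marker_stat(genotypes, phenotypes):
    """Largest |difference in mean phenotype| between marker classes, over all markers."""
    best = 0.0
    for marker in genotypes:  # one list of 0/1 scores per marker
        class1 = [p for g, p in zip(marker, phenotypes) if g == 1]
        class0 = [p for g, p in zip(marker, phenotypes) if g == 0]
        if class1 and class0:
            diff = abs(sum(class1) / len(class1) - sum(class0) / len(class0))
            best = max(best, diff)
    return best


def permutation_threshold(genotypes, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wide threshold: lowest value in the top alpha of permuted maxima."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign phenotypes to marker genotypes at random
        maxima.append(max_marker_stat(genotypes, shuffled))
    maxima.sort()
    n_top = max(1, round(alpha * n_perm))
    return maxima[-n_top]


# Placeholder data: 5 markers scored 0/1 in 40 individuals, no true QTL.
data_rng = random.Random(42)
genotypes = [[data_rng.randint(0, 1) for _ in range(40)] for _ in range(5)]
phenotypes = [data_rng.gauss(10.0, 1.0) for _ in range(40)]

threshold = permutation_threshold(genotypes, phenotypes)
observed = max_marker_stat(genotypes, phenotypes)
print(f"observed maximum = {observed:.3f}, 5% threshold = {threshold:.3f}")
```

Recording only the maximum over the whole map in each permutation is what makes the threshold experiment-wide: it accounts for the fact that many positions are tested.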

Multiple interval mapping

• Uses multiple marker intervals simultaneously

• Aims to map multiple QTLs in a single step

• Allows simultaneous detection and estimation of additive, dominance & epistatic effects

Method:

• Build regression models which include all QTLs (detected first by CIM)

• Use information content (IC) theory to evaluate alternative models

Some examples of the final output

Genetics and genomics of post harvest senescence in broccoli
Vicky Buchanan-Wollaston and Dave Pink (Warwick HRI)

2010 broccoli linkage map (JoinMap): 232 plant lines, 211 loci (189 SSR, 22 AFLP)

[Figure: linkage map of chromosomes 1 to 9]

QTLs for senescence traits in broccoli

• Two major QTL for ‘time to yellowing’ confirmed on 2010 broccoli map

• REML calculated: 64.4 % of line mean variation is genetic

[Figure: MapQTL output, permutation test with 10,000 iterations. Labels: Chr 1; Chr 9; 30.6% of variation; 22.4% of variation; 3.8 LOD, p >0.01 (shown twice); 4.8 LOD, p >0.001; 7 cM; 1.7 cM]