breedr - an open statistical package to analyse genetic ... conference... ·...

41
breedR An open statistical package to analyse genetic data (WP6) http://famuvie.github.io/breedR/ Facundo Muñoz 4th–6th april 2016 Outline 1. Introduction 2. Additive-genetic effects 3. Environmental effects 4. Competition effects 5. Longitudinal data 6. Genomic data 7. Multi-environment trials 8. Multi-trait models 9. Simulation framework 10. Remote computing

Upload: others

Post on 31-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

breedRAn open statistical package to analysegenetic data (WP6)http://famuvie.github.io/breedR/

Facundo Muñoz4th–6th april 2016

Outline

1. Introduction2. Additive-genetic effects3. Environmental effects4. Competition effects5. Longitudinal data6. Genomic data7. Multi-environment trials8. Multi-trait models9. Simulation framework10. Remote computing

Page 2: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

1 Introduction

What is breedR

R-package implementing statistical models specificallytailored to the analysis of forest genetic resourcesA inference tool for Linear Mixed Models, with facilities fortypical needsbreedR acts as an interface providing the means to:

1 Combine any number of prefabricated model componentsinto a larger model

2 Compute automatically incidence and covariance matricesfrom a few input parameters

3 Fit the model4 Plot data and results, and perform model diagnostics

Page 3: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Installation

2 alternatives:

Project web page http://famuvie.github.io/breedR/Set up this URL as a package repository in .Rprofile(detailed instructions on the web)install.packages('breedR')Not possible to use CRAN (or yes?) due to closed-sourceBLUPF90 programsStable version, with automatic updates

GitHub dev-site https://github.com/famuvie/breedRif( !require(devtools) )install.packages('devtools')devtools::install_github('famuvie/breedR')Development version, latest features, more inestable, manualupdates

Getting started

These slides show WHAT can be done with breedRFor HOW to perform these analyses, refer to the website:

http://famuvie.github.io/breedR/

Page 4: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Where to find help

Package’s help: help(package = breedR)Help pages ?remlf90Code demos demo(topic, package = 'breedR') (omittopic for a list)Vignettes vignette(package = 'breedR') (pkg and wiki)

Wiki pagesGuides, tutorials, FAQ

Mailing list http://groups.google.com/group/breedrQuestions and debates about usage and interface

Issues pageBug reportsFeature requests

License

breedR is FOSS. Licensed GPL-3RShowDoc('LICENSE', package = 'breedR')

You can use and distribute breedR for any purposeYou can modify it to suit your needs

we encourage to!please consider contributing your improvementsyou can distribute your modified version under the GPL

However, breedR makes (intensive) use of the BLUPF90 suiteof Fortran programs, which are for free but not free (rememberCRAN?)

Page 5: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Fitting Linear Mixed Models

y =Xβ + Zu+ ε

u ∼N (0,G)ε ∼N (0,R)

A quantitative variable y is modelled as a linear function offixed effects β and random effects u, with unaccountedresiduals εThe function remlf90() yields a REML fit of a model to adatasetAdditional functions (e.g. summary(), fixef(), ranef(),plot(), etc.) extract and present specific results

I/O example

ped <- globulus[,1:3]

res <- remlf90(fixed = phe_X ~ gg,genetic = list(

model = 'add_animal',pedig = ped,id = 'self'),

data = globulus)

summary(res)

## Linear Mixed Model with pedigree and spatial## effects fit by AI-REMLF90 ver. 1.122## Data: globulus## AIC BIC logLik## 5799 5809 -2898#### Variance components:## Estimated variances S.E.## genetic 3.397 1.595## Residual 14.453 1.529#### Estimate S.E.## Heritability 0.1887 0.08705#### Fixed effects:## value s.e.## gg.1 13.591 0.5014## gg.2 14.085 0.7984## ...

Page 6: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

2 Additive-genetic effects

What is an additive genetic effect

Random effect at individual levelBased on a pedigree (determining the relationship matrix A)

Zu, u ∼ N (0, σ2aA)

BLUP of Breeding Values from own and relatives’ phenotypesRepresents the additive component of the genetic valueMore general:

family effect is a particular caseaccounts for more than one generationmixed relationships

More flexible: allows to select individuals within familiesMore accurate: direct inference over the additive-geneticvariance of the base population

Page 7: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Pedigrees

A 3-column data.frame ormatrix with the codes for eachindividual and its parentsA family effect is easily translatedinto a pedigree:

use the family code as theidentification of a fictitiousmotheruse 0 or NA as codes for theunknown fathers

self dad mum

69 0 6470 0 4171 0 5672 0 5573 0 2274 0 50

Predicted Breeding Values

0

25

50

75

−2 0 2PBV

Page 8: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Predicted Breeding Values by Family

−3

−2

−1

0

1

2

295544668750216119311214366318482453523351571045402722374703946265935423868176511416495449435620303262286058341561325166723Family

PB

V

3 Environmental effects

Page 9: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

What is spatial autocorrelation

The residuals of any LMM must be noiseHowever, most times there are environmental factors thataffect the responseThis causes that observations that are close to each other tendto be more similar that observations that are far awayThis is called spatial autocorrelationIt may affect both the estimations and their accuracyThis is why experiments are randomized into spatial blocks

Diagnosing spatial autocorrelationresiduals spatial plot

You can plot() the spatialarrangement of variousmodel components(e.g. residuals)Look like independentgaussian observations(i.e. noise)?Do you see any signal in thebackground?

0

30

60

90

0 25 50 75x

y

−10

−5

0

5

z

Page 10: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Diagnosing spatial autocorrelationvariograms of residuals

Plot the variogram of residuals with variogram()

9

10

11

12

10 20 30 40distance

vario

gram

0

10

20

30

40

0 10 20 30 40x

y

9

10

11

12

13

row disp.

010

203040

col disp.

010

2030

400

5

10

010203040

−50 −25 0 25 50x

y10

12

14z

Interpreting the variogramsIsotropic variogram

γ(h) = 12V [Z(u)− Z(v)], dist(u,v) = h

The empirical isotropic variogram is built by aggregating all thepairs of points separated by h, no matter the direction.

9

10

11

12

10 20 30 40distance

vario

gram

Page 11: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Interpreting the variogramsRow/Column variogram

γ(x, y) = 12V [Z(u)− Z(v)], dist(u,v) = (x, y)

The empirical row/col variogram is built by aggregating all thepairs of points separated by exactly x rows and y columns.

0

10

20

30

40

0 10 20 30 40x

y

9

10

11

12

13

z

Interpreting the variogramsAnisotropic variogram

γ(x) = 12V [Z(u)− Z(v)], u = v± x

The empirical anisotropic variogram is built by aggregating all thepairs of points in the same direction separated by |x|.

010203040

−50 −25 0 25 50x

y

10

12

14z

Page 12: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Accounting for spatial autocorrelation

Include an explicit spatial effect in the modelI.e., a random effect with a specific covariance structure thatreflects the spatial relationship between individuals

Spatial modellingBlocks

Zu, u ∼ N (0, σ2sI)

u is the vector of random effects for the blocksZ is an indicator matrix such that Z[i, j] = 1 if the observationi belongs to block jσ2s is the spatial variance parameter

The block effect, is a very particular case of spatial effect:It is designed from the begining, possibly using prior knowledgeCan account for non-spatial effects (e.g. operator)Introduces independent effects between blocksMost neighbours are within the same block (i.e. share the sameeffect)

Page 13: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Spatial modellingBidimensional Splines

A cubic B-spline B(x):

Piecewise curve defined in the intervals determined by 5 knotsEach piece is a polynomial of 3rd degree

Spatial modellingBidimensional Splines

A cubic B-spline B(x) with regularly spaced knots:

The curve is constrained for C2 continuity at each knotOnly 1 degree of freedom controls the scale

Page 14: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Spatial modellingBidimensional Splines

A number of overlapping curves form a base of B-splines {Bj(x)}

Spatial modellingBidimensional Splines

Each, can be scaled using a coefficient {ujBj(x)}

Page 15: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Spatial modellingBidimensional Splines

And summed to a linear combination f(x) =∑j ujBj(x)

Spatial modellingBidimensional Splines in a Mixed Model

f(x) =∑j ujBj(x) provides a spline representation of a

wide family of curves, in terms of a vector of coefficients uFor any set of points x = {xi}, the vector of values f(xi) canbe written as a matrix operation f =

[Bj(xi)

]u

breedR extends this to two dimensions and defines a randomeffect

Bu, u ∼ N (0, σ2sRs)

u is the vector of spline effectsB is the matrix of spline bases evaluated at the observationsσ2

s is the spatial variance parameterRs imposes a fixed positive correlation between coefficients ofneighbouring spline bases

Page 16: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Spatial modellingNumber of knots of a splines model

The smoothness of the spatial surface can be controlledmodifying the number of base functionsThis is directly determined by the number of knots (nok) ineach dimensionIf not explicitly set, it is determined heuristically by breedR as afunction of the number of observations

4

6

8

10

0 250 500 750 1000x

nok

Spatial modellingBidimensional First-Order Autoregressive Process

An AR1(ρ) on the line is a collection of random variables {xi}where

xt = ρxt−1 + εt, εt ∼ N (0, 1), |ρ| < 1A few random simulations with ρ = 0.5:

Page 17: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Spatial modellingBidimensional First-Order Autoregressive Process

breedR extends this model to the plane using and defines acomponent

Zu, u ∼ N (0, σ2sRAR)

u is the vector of random effects for each individual locationon a regular gridZ is an indicator matrix such that Z[i, j] = 1 if theobservation i is at site jσ2s is the spatial variance parameterRAR defines a separable correlation structure based on thekronecker product of two AR1 processes

Spatial modellingAutoregressive parameters of a AR model

The smoothness of the AR effects can be controlled by theautoregressive parameters (ρx, ρy) in each dimensionThey can be given explicitlyOtherwise, breedR fits a model for each combination ofparameters in a default grid and returns the most likely

−1.0

−0.5

0.0

0.5

1.0

−1.0 −0.5 0.0 0.5 1.0rho_r

rho_

c

−2925

−2900

−2875

−2850

loglik

Page 18: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Predicted spatial effectsby spatial model

All capture a similar underlying environmental patternwith somewhat increasing ranges of variability

Model residualsby spatial model

The spatial variability is taken mostly from the model residualswhich increasingly look like pure noise

Page 19: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

4 Competition effects

Competition model assumptions

Each individual have two(unknown) Breeding Values(BV):

direct BV affects itsown phenotype,competition BV affectsits neghbours’

The total effect of theneighbouring competitionBVs is given by theirdistance-weighted sum

Page 20: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Weighted neighbour competition effect

i kdk

Let ∂i be the set of neighbouringlocations of tree i, anduc = (uc,k)′ the vector ofcompetition BVs

ωi(α) =∑k∈∂i

zik(α)uc,k

where zik(α) ∝ 1/dαik, such that∑k∈∂i

zik(α)2 = 1.

This condition is variance-estabilizer ensuring ∀i:

Var(ωi) = Var(uc) = σ2c

The decay parameter

The decay parameter α controls the relative intensity ofcompetition of the neighbours

α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1α = 1

α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 2α = 20

1

2

3

4

0 1 2 3 4Relative distance

Rel

ativ

e w

eigh

t

The weights zik arescale-invariante.g. a tree twice as far isweighted 1/2α as muchhigher values of αconcentrate the weights onthe closest trees

Page 21: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Random-effect representation

Zdud + Zc(α)uc,(uduc

)∼ N

(0,Σa ⊗A

), Σa =

(σ2d σdc

σdc σ2c

)

Each set of BVs is modelled as a zero-mean random effectwith structure matrix given by the pedigree and independentvariances σ2

d and σ2c

Both random effects are modelled jointly with covariance σdcZd is an indicator matrix linking observations and individualsZc(α) weights the competition effect of the neighbours with(fixed) decay parameter α

Permanent Environmental Competition Effect

Zpup, Zp = Zc, u ∼ N (0, σ2pI)

Optional companion effect with environmental (rather thangenetic) basisModelled as an individual independent random effect thataffects neighbouring trees in the same (weighted) way

Page 22: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Diagnosis of competition

additive−genetic only competition

12000

13000

14000

10 20 30 40 50 10 20 30 40 50distance

vario

gram

Competition is often observable in the first lag of thevariogram of residuals

increased antagonism between neighbouring phenotypescorrelation between neighbours pulled towards neg. values

In addition, use model comparison criteria (AIC, BIC, loglik)

Breeding for growth and collaboration

−100

−50

0

50

100

−100 −50 0 50 100direct

com

petit

ion

direct and competitionBV are usually negativelycorrelatedselection based only ondirect genetic merit tends tofavour competitiveindividuals, hampering theglobal performancethe competition model allowsfor selection based on a jointassessment

Page 23: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

5 Longitudinal data

What is longitudinal data

Measurements repeated in time or along some climatic orgeographical variable (e.g. temperature, precipitation,latitude, altitude, . . . )All model parameters (e.g. variances, random effects) can befunctions of the longitudinal variableIncreased complexity: from estimating numerical values(dimension 0) to estimating (infinte-dimensional) functions(with finite data)Strategies:

assume parametric shape (e.g. linear regression)nonparametric components (e.g. splines, Legendre polynomials)

Page 24: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Repeated measurements in timeMean response evolution by replicate - Larix dataset

4

6

8

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16Year

Mea

n rin

g le

ngth

There is a strong temporal trend that needs to be accountedfor

Climatic gradientMean ring-length by Mean Annual Precipitation

34.72 37.24 38.64

39.15 40.91 41.19

43.93 44.66 44.92

20

30

20

30

20

30

20 30 40 20 30 40 20 30 40

3

6

9

12LAS

Page 25: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Legendre Polynomials

Family of orthogonal polynomials dense in L2Any regular curve can be approximated as much as needed bytaking a linear combination of polynomials up to a sufficientlyhigh order

−2

−1

0

1

2

−1.0 −0.5 0.0 0.5 1.0x

y

order

0

1

2

3

Random-Regression model

For each observation of an individual i at j

yij =Xiβ + Σordk=0aikLk(j) + εij

(a′0, . . . , a′ord)′ ∼N (0,Σ⊗A)ε ∼N (0, σ2

e)

The Breeding Value of an individual is a function of anenvironmental variableThis function is parameterised as a linear combination ofLegendre orthogonal polynomials of order up to a fixed ordEach individual is described by ord + 1 correlated coefficients

Page 26: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Example of results

Functional Breeding Values for each individual

6 Genomic data

Page 27: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Using genomic markers

breedR allows random effects with an arbitrary covariancestructure (generic)This can be used to leverage genomic information (GBLUP)

The Generic model

This additional component allows to introduce a random effect ψwith arbitrary incidence and covariance matrices Z and Σ:

Zψ, ψ ∼ N(0, σ2

ψΣψ

)Applications:

include additional not-predefined components e.g. Dominance,Hybrid populations, Genomic evaluation, etc.

Page 28: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

GBLUP

Zu, u ∼ N(0, σ2

GG)

Use markers to compute a relationship matrix G forindividuals

Several methods availablee.g. VanRaden et al. 2009

G = XX ′/∑

2p(1− p)

Replace the additive-genetic model, which uses thepedigree-based relationship matrix A with a generic modelwith a genomic relationship matrix GZ is an indicator matrix linking observations with individualsPredicts genetic value of individuals, not markersImproved accuracy wrt pedigree-pased evaluation

Relationship matricespedigree-based vs. genomic

Note the increased level of detail in the relationship structuure

Page 29: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

7 Multi-environment trials

Douglas dataset

s1 s2 s3

Environments may refer to sites, but also to years or climates

Page 30: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Particularities

Fixed effects

some can be transversal accross environmentse.g. origin, provenance

other take different values at each environmente.g. the intercept, replicatemodelled as a fixed interaction

Random effects

some can be transversal accross environmentse.g. main effect of the genotype

other can be environment-specifice.g. blocks, GxE, residualsmodelled as a group of (correlated or not) random effects

Statistical model

y =X0β0 + Z0u0 + X∑e

1{e}βe + Z∑e

1{e}ue + ε

u0 ∼N (0,G0)(u1, . . .)′ ∼N (0,ΣG ⊗G)

ε ∼N (0,DR ⊗ I)

Particular case of the general LMMFor each environment e, there is a group of random effectsue, each with covariance structure G, possibly cross-correlatedthrough ΣGIndependent site-specific residual variances

Page 31: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Practical case 1GxE study

C13 =orig + site + fam +3∑e=1

fe1e + ε

fam ∼N(0, σ2

f I)

(f1, f2, f3)′ ∼N(0,ΣG×E ⊗ I

)ε ∼N

(0, D3 ⊗ I

)One global family effect (fam)One group of three site-specific family effects (fi, i = 1, 2, 3)Jointly, they represent the G× E interaction with geneticcross-covariation ΣG×E

Predicted genetic effectsby family

Sum of main and interaction effectsNote:

different variances per sitehigh genetic correlationsome families are more interactive

Page 32: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Genetic correlations

EcovalenceMeasure of interactivity of families

For each family x

ϕ(x) ∝∑i∈x

∑e

f2e

Page 33: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Practical case 2Site-specific spatial effects

8 Multi-trait models

Page 34: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Multivariate Linear Mixed Models2-trait case

Y1 = Xβ1 + Zu1 + ε1

Y2 = Xβ2 + Zu2 + ε2,

(u1, u2)′ ∼ N(0,Σu ⊗G)(ε1, ε2)′ ∼ N(0,Σ⊗ In).

Σu and Σ either diagonal or fully-parameterized 2× 2matricesSome of the fixed or random effects can affect only a subsetof the traits

e.g. fixed effect of operator

Limitationof breedR’s implementation

All fixed and random effects are assumed to be trait-specifictransversal effects not directly supported (ultimately byPROGSF90)

Simpler covariance structures not supportede.g. independent effects with shared variance, exchangeablestructure

A workaround is to reshape the dataset to long-layout

Page 35: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Multi-trait with reshapingwide to long-layout

Reshaping operation:Stack traits into a single variable valueAdditional variable traitDuplicate individual information and other variables

Use single-trait models with MET syntaxtrait instead of site

This overcomes the limitations breedR’s multi-traitimplementation

more complex models like multi-trait and multisite becomecumbersome

9 Simulation framework

Page 36: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

breedR’s simulation functionSimulating data under competition

Simulate datasets of any size, from any most supported modelsSee ?simulation for details on the syntax

Source: local data frame [500 x 14]

self sire dam beta x y spatial BV1(dbl) (dbl) (dbl) (dbl) (int) (int) (dbl) (dbl)

1 41 14 40 1 1 19 -1.0738194 -1.0236542 42 18 33 1 18 22 1.6654315 2.4774883 43 11 31 1 19 19 0.7047348 1.9613054 44 5 38 1 1 23 0.4698724 1.2071125 45 16 24 1 7 12 -0.7233131 -1.9197426 46 9 38 1 3 5 -0.1197804 1.174054.. ... ... ... ... ... ... ... ...Variables not shown: BV2 (dbl), wnc (dbl), pec (dbl), wnp (dbl), resid

(dbl), phenotype (dbl)

Applicationsa.k.a. what the heck I want a simulator for?

check models under ideal scenariosBootstrapping:

compute heritability (and it s.e.) for complex modelscompute more accurate s.e. for fixed and random effectsinference on arbitrary hypotheses (involving any combination ofmodel parameters)

Page 37: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Heritability of complex models

e.g. competition or splines models fitted by EM-REML (rather thanAI)

Thus, heritability not availableOther methods (e.g. Delta) not feasibleEven when available, is approximate (relies on asymptoticnormality of parameters)

Example

1 Fit the model to your data2 Write a function to simulate data from your fitted model

parameters3 Write a function to fit a simulated dataset and return

realised heritability

Repeated calls to thisfunction yields the samplingdistribution of heritabilityCompute SE and CI fromnumerical summaries

0

20

40

60

0.00 0.25 0.50 0.75 1.00h2

coun

t

Page 38: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

More accurate Standard Errors

SE in breedR’s output are approximateRely on asymptotic normality (same as heritability)Same Bootstrapping procedure applies

bl fam Residual

0

50

100

150

0 5 10 15 0 5 10 15 0 5 10 15value

coun

t

10 Remote computing

Page 39: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Remote computation

If you have access to a Linux server through SSH, you can performcomputations remotely

Take advantage of more memory or faster processorsParallelize jobsFree local resources while fitting modelsSee ?remote for details

Configuring a server

1 Windows users: install cygwin with ssh beforehand(http://cygwin.org/)

2 configure the client and server machines so that passwordlessSSH authentication works

3 Set breedR options remote.host, remote.user,remote.port and remote.bin (see ?breedR.setOption)

Optionally, set these options permanently in $HOME/.breedRrc

writeLines(c("remote.host = '123.45.678.999'",

"remote.user = 'uname'","remote.bin = 'remote/path/to/breedR/bin/linux'"),

con = file.path(Sys.getenv('HOME'), '.breedRrc'))

Page 40: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Fitting models remotely

res <- remlf90(..., breedR.bin = "remote")

Fit model remotelyR-console stays in stand-by until job is finishedWhen job finishes (provided that connection keeps alive),results are automatically retrieved

Identical in use to local computing, but without theprocessor/memory burden

Submitting jobs

res <- remlf90(..., breedR.bin = "submit")

Fit model remotelyConnection is closed in the meanwhileR-console is activeTyping res queries the server for the job status(Running/Finished/Aborted)

Page 41: breedR - An open statistical package to analyse genetic ... conference... · FittingLinearMixedModels y=Xβ+ Zu+ ε u∼N(0,G) ε∼N(0,R) Aquantitativevariable y ismodelledasalinearfunctionof

Parallel computing

After you submit a job, you are free to submit more (speciallywith multiple-processor servers)Query the status of all jobs with breedR.qstat()Kill some job with breedR.qdel(res) or all jobs withbreedR.qnuke()

breedRhttp://famuvie.github.io/breedR/