sampling based approximation of confidence intervals for functions of genetic covariance matrices
TRANSCRIPT
Sampling based appr©ximation of
confidence intervals for functions of
genetic covariance matrices
Karin Meyer 1 David Houle 2
1Animal Genetics and Breeding Unit, University of New England, Armidale NSW 2351
2Department of Biological Science, Florida State University, Tallahassee, FL 32306-4295
AAABG 2013
Sampling standard errors | Introduction
REML sampling variances
REML estimates of covariance components
– multivariate normal distribution: θ̂θθ ∼ N (θθθ, I(θθθ)−1)
– inverse of information matrix −→ sampling errors– large sample theory; asymptotic lower bounds
Linear functions of estimates
– sampling variances readily obtained
Non-linear functions
– obtain 1st order Taylor series expansion– evaluate sampling variance of linear approximation– needs partial derivatives w.r.t. all variables−→ can be complicated / tedious−→ options for evaluating in REML software limited
Confidence intervals: ±zα s.e.
– misleading at boundary of parameter space?
K. M. | 2 / 12
“Delta method”
Sampling standard errors | Introduction
Alternatives
Dealing with boundary conditions
– Derive confidence intervals from profile likelihood– Bayesian estimation
General procedure
– Sample data, repeat analysis −→ distribution over reps– slow & laborious!
Objectives
1 Propose new scheme
– sample from (theoretical) distribution of estimates– simple & fast
2 Examine quality of approximation of sampling errors
K. M. | 3 / 12
Sampling standard errors | Introduction
Alternatives
Dealing with boundary conditions
– Derive confidence intervals from profile likelihood– Bayesian estimation
General procedure
– Sample data, repeat analysis −→ distribution over reps– slow & laborious!
Objectives
1 Propose new scheme
– sample from (theoretical) distribution of estimates– simple & fast
2 Examine quality of approximation of sampling errors
K. M. | 3 / 12
Sampling standard errors | Method
Sampling scheme
Large sample theory– (RE)ML estimates have MVN distribution– Sampling covariance ∝ inverse of information matrix
Sample from this distribution
θ̃θθ ∼ N
�
θ̂θθ, H(θ̂θθ)−1
�
Information matrix
– Use same parameterisation as REML analysis,→ eliminate linear approx., account for constraints
– Evaluate function(s) of interest for θ̃θθ– Examine distribution over replicates
Mandel, M. (2013) Simulation-based confidence intervals forfunctions with complicated derivatives. American Statistician67, 76–81.
K. M. | 4 / 12
Sampling standard errors | Method
Sampling scheme
Large sample theory– (RE)ML estimates have MVN distribution– Sampling covariance ∝ inverse of information matrix
Sample from this distribution
θ̃θθ ∼ N
�
θ̂θθ, H(θ̂θθ)−1
�
Information matrix
– Use same parameterisation as REML analysis,→ eliminate linear approx., account for constraints
– Evaluate function(s) of interest for θ̃θθ– Examine distribution over replicates
Mandel, M. (2013) Simulation-based confidence intervals forfunctions with complicated derivatives. American Statistician67, 76–81.
K. M. | 4 / 12
Sampling standard errors | Simulation
Does it work?
Simulate two data sets
– 4000 animals, 6 traits– h2 = 2× (0.2,0.3,0.4)
– σ2P
= 100– rE = 0.3– a) rG = 0.5, b) rG = |0.7||i−j|
REML analysis
– AI algorithm– Cholesky factor
Estimates
– θ̂θθ– H(θ̂θθ)
Compare estimates of sampling variances
REML Based on H(θ̂θθ), “Delta” method
Empirical Re-sample data using estimates as popul.values, repeat analysis; 10000 replicates
Approx. Sample from MVN distribution, N(θ̂θθ,H(θ̂θθ)−1)
200000 replicates
K. M. | 5 / 12
Sampling standard errors | Results
Sampling covariances for Σ̂ΣΣG - a∗
Empirical vs. REML Approximate vs. REML Approximate vs. Empirical
●●
●
●●●
●
●●
●
●
●
●
●
●●
●●
●●●●●●●●●●●
●
●●
●
●
●●●
●
●●
●●●
●
●●
●
●
●●
●
●●
●
●●●
●
●●
●
●●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●●
●
●
●●
●●
●●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●●●●
●
●●
●●
●
●●●●●●●●●●
●
●
●●
●
●
●●●
●
●
●●●
●●●
●●●
●
●●
●●
●
●●●
●
●●
●
●
●●
●●●
●
●●
●●
●
●●●
●
●●
●
●
●
●
●
●
●●●
REML
●●
●
●●●
●
●●
●
●
●
●
●
●●
●●
●●●●
●●●●●●●
●
●●
●
●
●●●
●●●●●●
●
●●
●
●
●●
●
●●
●
●●●
●
●●
●
●●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●●●●
●
●●
●●
●
●●●●
●●●●●●
●
●
●●
●
●
●●●
●
●
●
●●
●●●
●●●
●
●●
●●
●
●●●
●
●●
●
●
●●
●●●
●
●●●●
●
●●●
●
●●
●
●
●
●
●
●
●●●
REML
●●
●
●●●
●
●●
●
●
●
●
●
●●
●●
●●●●
●●●●●●●
●
●●
●
●
●●●
●●●●●●
●
●●
●
●
●●
●
●●
●
●●●
●
●●
●
●●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●●●●
●
●●●●
●
●●●●
●●●●●●
●
●
●●
●
●
●●●
●
●
●
●●
●●●
●●●
●
●●
●●
●
●●●
●
●●
●
●
●●
●●●
●
●●●●
●
●●●
●
●●
●
●
●
●
●
●
●●●
Empirical0
5
10
15
0 5 10 15 0 5 10 15 0 5 10 15
6 traits, 21 (co)variance components, 231 sampling (co)variances� variance, ◦ covariance
∗Case a: all genetic eigenvalues > 0
K. M. | 6 / 12
Sampling standard errors | Results
Sampling covariances for Σ̂ΣΣG - b†
Rank 6 Rank 5
●●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●●●
●●●
●●
●
●
●
●
●
●
●
●●●
●●●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●●
●●
●●●●
●●
●
●●
●
●●
●
●
●●
●
●●
●●●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●●●●
●
●●●
●
●●
●
●●
●
●
●●●●
●●
●
●●●
●
●●
●
●
●
●
●
●●
●
●●●
●●
●●●
●●●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●●
●
●●
●●
●●
●●●
●
●●
●
●●
●
●●
●
●
●●
●
●●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●●
●
●●●●
●
●●●
●
●●
●
●●
●
●
●●●●
0
5
10
15
0 5 10 15 0 5 10 15Empirical
Appro
xim
ate
Approximation unreliable if model is over-parameterised
†Case b: one genetic eigenvalue ≈ 0
K. M. | 7 / 12
Sampling standard errors | Results
Delta method for r̂ijEstimate elements of Cholesky L factor of ΣΣΣ = LL′
– H(θ̂θθ)−1 gives Cov(̂lij, l̂mn)
– covariances between σij
Cov(σ̂ij, σ̂kl) ≈f(i,j)∑
t=1
f(k,m)∑
s=1
�
l̂jt̂lms Cov�
l̂it, l̂ks�
+ l̂jt̂lks Cov�
l̂it, l̂ms
�
+ l̂it̂lms Cov�
l̂jt, l̂ks�
+ l̂it̂lks Cov�
l̂jt, l̂ms
�
�
For r̂ij = σ̂ij/Ç
σ̂2iσ̂2j
Var(r̂ij) ≈�
4σ̂4iσ̂4j
Var(σ̂ij) + σ̂2ijσ̂4j
Var(σ̂2i) + σ̂2
ijσ̂4i
Var(σ̂2j)
− 4σ̂ijσ̂2iσ̂4j
Cov(σ̂ij, σ̂2i)− 4σ̂ijσ̂
4iσ̂2j
Cov(σ̂ij, σ̂2j)
+ 2σ̂2ijσ̂2iσ̂2j
Cov(σ̂2i, σ̂2
j)
�
/�
4σ̂6iσ̂6j
�
K. M. | 8 / 12
Sampling standard errors | Results
Approximation for r̂ij
Let ΣΣΣ = LL′ and θθθ = vech(L)
For many replicates
– Sample θ̃θθ ∼ N(θ̂θθ,H(θ̂θθ)−1)
– Construct L̃ from θ̃θθ– Calculate Σ̃ΣΣ = L̃L̃
′
– Calculate correlation r̃ij = σ̃ij/Ç
σ̃2iσ̃2j
Evaluate Var(r̂ij) as emprical variance of r̃ij acrossreplicates
K. M. | 9 / 12
Sampling standard errors | Results
Distribution of r̂G12 - bEmpirical
0.5 0.6 0.7 0.8 0.9 1.0Correlation
Approximate
0.5 0.6 0.7 0.8 0.9 1.0Correlation
REML Empirical Approxim.r̂G12 0.897 0.873 0.866s.e. 0.059 0.066 0.063
K. M. | 10 / 12
Sampling standard errors | Results
Distribution of second eigenvalueEmpirical
20 30 40Eigenvalue
Approximate
20 30 40Eigenvalue
REML Empirical Approxim.
λ̂2 32.93 33.25 33.84s.e. – 3.27 3.30
K. M. | 11 / 12
Sampling standard errors | Results | Conclusions
Conclusions
Sampling from MVN distribution
– accommodates arbitrary functions– yields good approximation of sampling variances– easier than Delta method for complicated derivatives– more appropriate confidence interval at boundary of
parameter space– but:
−→ relies on large sample theory−→ information matrix needs to be safely p.d.−→ assumes θ̂θθ ≈ θθθ
Simple but useful addition to our toolkit
– implemented in WOMBAT
K. M. | 12 / 12