
A General Modeling Framework for Studying Candidate Genes

Copy files from f:\edwin\example

Why general modeling framework?

• Candidate genes for quantitative traits are usually studied in terms of a “main effect” on the mean.

• Genetic advantage of a more extensive modeling framework
– Some candidate genes may be more likely to be detected
   • One reason is power, e.g. pleiotropic genes are easier to detect in a multivariate study
   • Some genes may not work in a simple “main effect” fashion, e.g. they exert their effects in severely deprived environments only, or influence the sensitivity to environmental fluctuations (variance)

• Correct tests? e.g. different genotypic variances in selected samples

• Substantive advantage of a general modeling framework
– More extensive picture of genetic effects
– Shed new light on traditional research questions
   • Continuity, change, and heterotypy
   • Comorbidity/pleiotropy
   • Complex traits: causal mechanisms involving multiple factors
– New issues: the interplay between genotypes and environment
   • Vulnerability, resilience, and protective factors
   • Risk behavior and the construction of favorable environments
   • Sensitivity to environmental fluctuations
– Instrumental function due to unique properties

Requirements modeling framework

• Genetic effects on the means, variances, and relations between variables

• Stratification effects on all these components

• Nuclear families of various sizes

• Interpretable parameterization

• Di- and multi-allelic loci, marker haplotypes, multiple loci simultaneously, and parental genotypes

• Easy to fit in existing (Mx) software

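LISREL based model

Structural part:
$\eta(s) = \alpha_{jk(s)} + B_{jk(s)}\,\eta(s) + \Gamma_{jk(s)}\,\xi + \zeta_{jk(s)}$

Measurement part:
$y(s) = \nu^{y}_{jk(s)} + \Lambda^{y}_{jk(s)}\,\eta(s) + \varepsilon^{y}_{jk(s)}$

y: subject variables

x: family variables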

Names, Symbols and Function of Model Matrices

Name       Symbol               Function

Subject (= y) variables
  Structural part
    Alpha    $\alpha_{jk}$        Means
    Beta     $B_{jk}$             Causal effects of subject variables on each other
    Gamma    $\Gamma_{jk}$        Causal effects of family variables on subject variables
    Psi      $\Psi^{y}_{jk}$      Diagonal: residual variances; off-diagonal: residual covariances
  Measurement part
    Nu       $\nu^{y}_{jk}$       Intercepts or means of indicators
    Lambda   $\Lambda^{y}_{jk}$   Factor loadings of indicators
    Theta    $\Theta^{y}_{jk}$    Diagonal: variances of errors of measurement; off-diagonal: covariances between errors of measurement
                                  Covariances between y variables of subjects from the same family

Family (= x) variables
    Psi      $\Psi^{x}_{k}$       Diagonal: variances; off-diagonal: covariances
    Nu       $\nu^{x}_{k}$        Intercepts or means of indicators
    Lambda   $\Lambda^{x}_{k}$    Factor loadings of indicators
    Theta    $\Theta^{x}_{k}$

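Alternative Models

• Conditional model

$\eta(s) = \alpha_{jk(s)} + B_{jk(s)}\,\eta(s) + \Gamma_{jk(s)}\,x_s + \zeta_{jk(s)}$

$y(s) = \nu_{jk(s)} + \Lambda_{jk(s)}\,\eta(s) + K_{jk(s)}\,x_s + \varepsilon_{jk(s)}$

(the symbol of the coefficient matrix for $x_s$ in the second equation is not legible in the source; it is written $K$ here)

• The x-variables are independent variables: subject plus family variables
– relaxes the assumption of full multivariate normality
– allows curvilinear or non-linear effects of the x-variables

• Disadvantages:
– Optimization
– Measurement model for the x-variables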

Other modeling frameworks

Partitioning parameter matrices

• Most matrices are partitioned into three parts:

– a) general matrices, which are not subscripted and represent the overall model in all genotype groups and population strata

– b) genetic matrices, subscripted j, which represent deviations from the general model caused by locus effects

– c) stratification matrices, subscripted k, which represent deviations from the general model caused by population stratification

• Example matrix Beta: causal effects of subject variables on each other

$B_{jk(s)} = B + B_j(g_s \otimes I) + B_k(f \otimes I)$

• Main effects are in $B$, which has dimension $n \times n$

• Genetic effects are in the term $B_j(g_s \otimes I)$
– The $n_g \times 1$ vector $g_s$ contains the $n_g$ dummy variables coding the genotype (haplotype) of subject s
   • the maximum number of deviations from $B$ is thus #genotypes - 1
   • sets of dummy variables can be used to study multiple loci simultaneously or the effects of parental genotypes
– $B_j = [\,B_j^{(1)} \mid B_j^{(2)} \mid \dots \mid B_j^{(n_g)}\,]$ has dimension $n \times (n_g n)$, where $B_j^{(1)}$ is the $n \times n$ submatrix containing the effects of the first dummy variable, etc.

Coding of the genotype dummy variables:

        A1A1   A1A2   A2A2
G1        1      0     -1
G2        0      1      0

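Example

Two subject variables ($n = 2$), two dummy variables ($n_g = 2$), genetic effects on $\beta_{21}$ only:

$B_j = \begin{pmatrix} 0 & 0 & 0 & 0 \\ \beta_{21}^{(1)} & 0 & \beta_{21}^{(2)} & 0 \end{pmatrix}$

A1A1 subjects, $g_s = (1, 0)^t$:

$B_j(g_s \otimes I) = B_j \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ \beta_{21}^{(1)} & 0 \end{pmatrix}$

A1A2 subjects, $g_s = (0, 1)^t$:

$B_j(g_s \otimes I) = B_j \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ \beta_{21}^{(2)} & 0 \end{pmatrix}$

A2A2 subjects, $g_s = (-1, 0)^t$:

$B_j(g_s \otimes I) = B_j \begin{pmatrix} -1 & 0 \\ 0 & -1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ -\beta_{21}^{(1)} & 0 \end{pmatrix}$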
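The selection mechanism in this example can be checked numerically. The sketch below is only an illustration in plain Python: the helper names (`kron`, `matmul`, `genetic_deviation`) and the numeric values 0.25 and 0.125 for the two genetic deviations of $\beta_{21}$ are hypothetical and not taken from the slides.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Dummy coding from the slide: columns G1, G2 written as column vectors g_s.
CODING = {"A1A1": [[1], [0]], "A1A2": [[0], [1]], "A2A2": [[-1], [0]]}
IDENTITY = [[1, 0], [0, 1]]  # I for n = 2 subject variables

# Hypothetical values for beta21(1) and beta21(2), for illustration only.
B21_1, B21_2 = 0.25, 0.125
B_J = [[0, 0, 0, 0],
       [B21_1, 0, B21_2, 0]]  # n x (ng * n) = 2 x 4


def genetic_deviation(genotype):
    """Return B_j (g_s kron I), the n x n deviation from B for one genotype."""
    return matmul(B_J, kron(CODING[genotype], IDENTITY))


for genotype in CODING:
    print(genotype, genetic_deviation(genotype))
```

The Kronecker product stacks one scaled identity matrix per dummy variable, so the product picks out (and signs) the submatrix of $B_j$ that belongs to the subject's genotype.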

Stratification effects are in the term $B_k(f \otimes I)$

• The $n_f \times 1$ vector $f$ contains the $n_f$ dummy variables used to code family types
– the maximum number of deviations is thus #family types - 1

• $B_k = [\,B_k^{(1)} \mid B_k^{(2)} \mid \dots \mid B_k^{(n_f)}\,]$ has dimension $n \times (n_f n)$
– where $B_k^{(1)}$ is the $n \times n$ submatrix containing the effects of the first dummy variable, etc.

• $\otimes$ and $I$ select the proper matrix for each dummy variable

Sibling pairs

                                     Subject A  Subject B   F1  F2  F3  F4  F5
Not informative of stratification        2          2        1   0   0   0   0
                                         1          1        0   1   0   0   0
                                         0          0        0   0   1   0   0
Informative of stratification            2          1        0   0   0   1   0
                                         2          0        0   0   0   0   1
                                         1          0        0   0   0   0   0
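The sibling-pair coding can be written as a small lookup. This is a sketch under assumptions: the cells are read as the number of A1 alleles per sibling (as stated for the later family-type table), the order of the two siblings is taken not to matter, and the function names are hypothetical.

```python
# Family types F1..F5 from the sibling-pair table, as (larger, smaller)
# allele counts. The pair (1, 0) is the reference type with all dummies 0.
FAMILY_TYPES = [(2, 2), (1, 1), (0, 0), (2, 1), (2, 0)]


def family_dummies(count_a, count_b):
    """Return the dummy vector f = [F1, ..., F5] for one sibling pair."""
    for count in (count_a, count_b):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    pair = (max(count_a, count_b), min(count_a, count_b))
    return [1 if pair == family_type else 0 for family_type in FAMILY_TYPES]


def is_informative(count_a, count_b):
    """A pair is informative of stratification when the genotypes differ."""
    return count_a != count_b


print(family_dummies(2, 2))  # first row of the table
print(family_dummies(1, 2))  # same family type as (2, 1)
print(family_dummies(1, 0))  # reference type
```

Six family types need only five dummy variables, which matches the rule that the number of deviations is at most #family types - 1.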

Two parents, one “child”

                                     Parent A  Parent B  Subject A   F1  F2  F3  F4  F5
Not informative of stratification        2         2         2        1   0   0   0   0
                                         2         0         1        0   1   0   0   0
                                         0         0         0        0   0   1   0   0
Informative of stratification            2         1         2        0   0   0   1   0
                                                             1        0   0   0   1   0
                                         1         1         2        0   0   0   0   1
                                                             1        0   0   0   0   1
                                                             0        0   0   0   0   1
                                         1         0         1        0   0   0   0   0
                                                             0        0   0   0   0   0

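Family Types in a Sample of Singletons and Pairs of Siblings With or Without Genotyped Parents (a)

Column groups, left to right:
– Parent not genotyped, one subject: Subject 1 (S1)
– Parent not genotyped, two subjects: Subject 1, Subject 2 (S1, S2)
– Parent genotyped, one subject: Parent 1, Parent 2, Subject 1 (P1, P2, S1)
– Parent genotyped, two subjects: Parent 1, Parent 2, Subject 1, Subject 2 (P1, P2, S1, S2)

Family types not informative of stratification:

 S1 | S1 S2 | P1 P2 S1 | P1 P2 S1 S2
  2 |  2  2 |  2  2  2 |  2  2  2  2
  1 |  1  1 |  2  0  1 |  2  0  1  1
  0 |  0  0 |  0  0  0 |  0  0  0  0

Family types informative of stratification (rows as extracted; empty cells were dropped, so the column positions are not recoverable):

2 1 2 1 2 2 1 2 2
2 0 1 2 1
1 0 1 1 2 1 1
1 1 1 2 2
1 0 1 2 0
1 0 1 1

(a) The cells list the number of A1 alleles.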

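Subject (= y) variables, structural part

$\alpha_{jk(s)} = \alpha + \alpha_j g_s + \alpha_k f$

with dimension $\alpha$ is $n \times 1$, $\alpha_j$ is $n \times n_g$, $\alpha_k$ is $n \times n_f$

$B_{jk(s)} = B + B_j(g_s \otimes I) + B_k(f \otimes I)$

with dimension $B$ is $n \times n$, $B_j$ is $n \times (n_g n)$, $B_k$ is $n \times (n_f n)$

The other structural matrices are partitioned in the same way, as a general part plus $(g_s \otimes I)$ and $(f \otimes I)$ deviation terms.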

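Subject (= y) variables, measurement part

$\nu^{y}_{jk(s)} = \nu^{y} + \nu^{y}_j g_s + \nu^{y}_k f$

with dimension $\nu^{y}$ = $n_y \times 1$, $\nu^{y}_j$ = $n_y \times n_g$, $\nu^{y}_k$ = $n_y \times n_f$

$\Lambda^{y}_{jk(s)} = \Lambda^{y} + \Lambda^{y}_j(g_s \otimes I) + \Lambda^{y}_k(f \otimes I)$

with dimension $\Lambda^{y}$ = $n_y \times n$, $\Lambda^{y}_j$ = $n_y \times (n_g n)$, $\Lambda^{y}_k$ = $n_y \times (n_f n)$

$\Theta^{y}_{jk(s)} = \Theta^{y} + \Theta^{y}_j(g_s \otimes I_{n_y}) + \Theta^{y}_k(f \otimes I_{n_y})$

with dimension $\Theta^{y}$ = $n_y \times n_y$, $\Theta^{y}_j$ = $n_y \times (n_g n_y)$, $\Theta^{y}_k$ = $n_y \times (n_f n_y)$

Covariance between subjects from the same family:

$\Sigma^{y}_{k(s=A,s=B)} = C + C_k(f \otimes I_{n_y})$

with dimension $C$ = $n_y \times n_y$, $C_k$ = $n_y \times (n_f n_y)$.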

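Family (= x) variables (general part and stratification part):

$\Psi^{x}$ is $n \times n$, $\Psi^{x}_k$ is $n \times (n_f n)$

$\nu^{x}$ = $n_x \times 1$, $\nu^{x}_k$ = $n_x \times n_f$

$\Lambda^{x}$ = $n_x \times n$, $\Lambda^{x}_k$ = $n_x \times (n_f n)$

$\Theta^{x}$ = $n_x \times n_x$, $\Theta^{x}_k$ = $n_x \times (n_f n_x)$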

General interpretation

• Genetic effects on:

– means are “main” effects

– relations between variables are interaction effects

– residuals are variance effects

[Path diagram: Genotype dummy variables G1 and G2, with effects labeled $\alpha_1^{(1)}$, $\alpha_1^{(2)}$, $\alpha_2^{(1)}$, $\alpha_2^{(2)}$]

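Simple example

$y = \alpha + \alpha_j g + B y + \zeta$, with $E(\zeta\zeta^{t}) = \Psi^{y}$

$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \alpha_1^{(1)} & \alpha_1^{(2)} \\ \alpha_2^{(1)} & \alpha_2^{(2)} \end{pmatrix} \begin{pmatrix} G_1 \\ G_2 \end{pmatrix} + \begin{pmatrix} 0 & \beta_{12} \\ \beta_{21} & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} + \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$

$y_1 = \alpha_1 + \alpha_1^{(1)} G_1 + \alpha_1^{(2)} G_2 + \beta_{12} y_2 + \zeta_1$

$y_2 = \alpha_2 + \alpha_2^{(1)} G_1 + \alpha_2^{(2)} G_2 + \beta_{21} y_1 + \zeta_2$

Genetic effects on $y_1$:

Genotype     A2A2                 A1A2                A1A1
Effect       $-\alpha_1^{(1)}$    $\alpha_1^{(2)}$    $\alpha_1^{(1)}$

with the dummy variables coded as

        A1A1   A1A2   A2A2
G1        1      0     -1
G2        0      1      0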
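A short numeric sketch of how this coding turns the two parameters into three genotype effects. The function name and the values 0.5 and 0.25 are hypothetical; only the direct term $\alpha_1^{(1)} G_1 + \alpha_1^{(2)} G_2$ is computed, with the other terms of the equation for $y_1$ held fixed.

```python
# Dummy coding (G1, G2) per genotype, as in the coding table.
CODING = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}


def genetic_effect(genotype, alpha_1, alpha_2):
    """Direct genetic effect on the mean: alpha_1 * G1 + alpha_2 * G2."""
    g1, g2 = CODING[genotype]
    return alpha_1 * g1 + alpha_2 * g2


# Hypothetical parameter values, for illustration only.
ALPHA_1, ALPHA_2 = 0.5, 0.25

effects = {g: genetic_effect(g, ALPHA_1, ALPHA_2) for g in CODING}
print(effects)

# The two homozygotes sit symmetrically around the general mean, so
# alpha_1 is half the difference between them, and alpha_2 is the
# deviation of the heterozygote from their midpoint.
midpoint = (effects["A1A1"] + effects["A2A2"]) / 2
print(midpoint, effects["A1A2"] - midpoint)
```

Under this coding the first parameter behaves like an additive effect and the second like a deviation of the heterozygote from the homozygote midpoint, which follows directly from the +1 / 0 / -1 pattern of G1.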

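Example models (each shown as a path diagram with Genotype as the exogenous variable):

• Additive model

• Mediator model

• Reversed effect model

• Common gene model (diagram labels: $\alpha_1^{(1)}$, $\alpha_2^{(2)}$)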

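Interactions

$y = \alpha + \alpha_j g + B y + B_j(g \otimes I)\,y + \zeta$

Applied to additive model:

$y_1 = \alpha_1 + \zeta_1$

$y_2 = \alpha_2 + \alpha_2^{(1)} G_1 + \alpha_2^{(2)} G_2 + \beta_{21} y_1 + \beta_{21}^{(1)} y_1 G_1 + \beta_{21}^{(2)} y_1 G_2 + \zeta_2$

$B_j = \begin{pmatrix} 0 & 0 & 0 & 0 \\ \beta_{21}^{(1)} & 0 & \beta_{21}^{(2)} & 0 \end{pmatrix}$

Cases shown:

• $\beta_{21}^{(1)} > 0$ and $\beta_{21}^{(2)} = 0$

• $\beta_{21}^{(1)}$ and $\beta_{21}^{(2)} > 0$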

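Estimation and specification in Mx

Expected means and covariances, single subject

$\mu^{y}_{jk(s)} = \nu^{y}_{jk(s)} + \Lambda^{y}_{jk(s)}(I - B_{jk(s)})^{-1}\alpha_{jk(s)}$

$E[(y(s) - \mu^{y}_{jk(s)})(y(s) - \mu^{y}_{jk(s)})^{t}] = \Sigma^{y}_{jk(s)}$

$\Sigma^{y}_{jk(s)} = \Lambda^{y}_{jk(s)}(I - B_{jk(s)})^{-1}(\Gamma_{jk(s)}\Psi^{x}_{k}\Gamma^{t}_{jk(s)} + \Psi^{y}_{jk(s)})(I - B_{jk(s)})^{-t}\Lambda^{y\,t}_{jk(s)} + \Theta^{y}_{jk(s)}$

$E[(x - \mu^{x}_{k})(x - \mu^{x}_{k})^{t}] = \Sigma^{x}_{k} = \Lambda^{x}_{k}\Psi^{x}_{k}\Lambda^{x\,t}_{k} + \Theta^{x}_{k}$

$E[(y(s) - \mu^{y}_{jk(s)})(x - \mu^{x}_{k})^{t}] = \Sigma^{yx}_{jk(s)} = \Lambda^{y}_{jk(s)}(I - B_{jk(s)})^{-1}\Gamma_{jk(s)}\Psi^{x}_{k}\Lambda^{x\,t}_{k}$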
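The single-subject moments follow from the reduced form of the structural equation. The derivation below is a sketch under assumptions that the slides do not state explicitly: $E(\xi) = 0$, $E(\zeta) = 0$, $\xi$ uncorrelated with $\zeta$ and with the measurement errors, $\mathrm{Cov}(\xi) = \Psi^{x}_{k}$, and $I - B_{jk(s)}$ nonsingular. Subscripts $jk(s)$ are dropped in the last three lines for readability.

```latex
\begin{aligned}
\eta(s) &= \alpha_{jk(s)} + B_{jk(s)}\,\eta(s) + \Gamma_{jk(s)}\,\xi + \zeta_{jk(s)} \\
(I - B_{jk(s)})\,\eta(s) &= \alpha_{jk(s)} + \Gamma_{jk(s)}\,\xi + \zeta_{jk(s)} \\
\eta(s) &= (I - B_{jk(s)})^{-1}\bigl(\alpha_{jk(s)} + \Gamma_{jk(s)}\,\xi + \zeta_{jk(s)}\bigr) \\
E[\eta(s)] &= (I - B)^{-1}\alpha \\
\mathrm{Cov}[\eta(s)] &= (I - B)^{-1}\bigl(\Gamma\,\Psi^{x}\,\Gamma^{t} + \Psi^{y}\bigr)(I - B)^{-t} \\
\mathrm{Cov}[y(s)] &= \Lambda^{y}\,\mathrm{Cov}[\eta(s)]\,\Lambda^{y\,t} + \Theta^{y}
\end{aligned}
```

Substituting $E[\eta(s)]$ into the measurement equation $y(s) = \nu^{y} + \Lambda^{y}\eta(s) + \varepsilon^{y}$ gives the expected mean vector of the indicators.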

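Expected means and covariances, whole family

Complete data vector $z^{t} = (x^{t}, y^{t})$:

$E(z^{t}) = \mu^{t}_{jk} = [(\mu^{x}_{k})^{t}, (\mu^{y}_{jk(s=1)})^{t}, \dots, (\mu^{y}_{jk(s=n_s)})^{t}]$

$E[(z - \mu_{jk})(z - \mu_{jk})^{t}] = \Sigma_{jk}$, with lower triangle

$\Sigma_{jk} = \begin{pmatrix} \Sigma^{x}_{k} & & & \\ \Sigma^{yx}_{jk(s=1)} & \Sigma^{y}_{jk(s=1)} & & \\ \vdots & \Sigma^{y}_{k(s=A,s=B)} & \ddots & \\ \Sigma^{yx}_{jk(s=n_s)} & \Sigma^{y}_{k(s=A,s=B)} & \cdots & \Sigma^{y}_{jk(s=n_s)} \end{pmatrix}$

where $\Sigma^{y}_{k(s=A,s=B)}$ contains the covariances between subjects from the same family.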

Maximize the log-likelihood function given the observed data by raw maximum likelihood:

$\ln L(\theta; z) = \sum_{i=1}^{N} \ln L_i$

where the individual log-likelihoods equal

$\ln L_i = -\tfrac{1}{2}\{\, n_{z_i}\log(2\pi) + \log|\Sigma_{jk}| + (z_i - \mu_{jk})^{t}\,\Sigma_{jk}^{-1}\,(z_i - \mu_{jk}) \,\}$

Minus two times the difference between the log-likelihoods of two nested models is asymptotically chi-square distributed, with the difference in the number of estimated parameters as the degrees of freedom.
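To make the raw maximum likelihood criterion concrete, the sketch below evaluates the multivariate normal log-likelihood family by family, so that each family can have its own dimension, mean vector and covariance matrix. It is plain Python for illustration and is not how Mx implements the computation; all function names are hypothetical.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L L^t = sigma (sigma must be positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sigma[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low


def family_loglik(z, mu, sigma):
    """ln L_i = -1/2 {n log(2 pi) + log|Sigma| + (z - mu)^t Sigma^-1 (z - mu)}."""
    n = len(z)
    low = cholesky(sigma)
    # Forward substitution solves L w = (z - mu); then w^t w is the quadratic form.
    w = []
    for i in range(n):
        resid = z[i] - mu[i] - sum(low[i][k] * w[k] for k in range(i))
        w.append(resid / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in w)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def sample_loglik(families):
    """Sum of the family log-likelihoods; families is a list of (z, mu, sigma)."""
    return sum(family_loglik(z, mu, sigma) for z, mu, sigma in families)


def likelihood_ratio(loglik_full, loglik_restricted, n_par_full, n_par_restricted):
    """Return the test statistic -2 (ln L_restricted - ln L_full) and its df."""
    return -2.0 * (loglik_restricted - loglik_full), n_par_full - n_par_restricted
```

A usage example with made-up data: `sample_loglik([([0.3], [0.0], [[1.0]]), ([1.0, 0.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])])` combines a singleton and a pair in one sample, which is the situation raw maximum likelihood is meant to handle.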

Specification

– In most instances, specification amounts to a selection of matrices

– Working out the dimensions of the matrices is boring and error-prone

– Get started

Therefore a simple program, used in batch mode or by answering questions:

MxScript

• Data structure
– Number of (latent) subject variables?
– Number of subjects in largest family?
– Number of dummy variables for genotypes?

• Matrices to be used
– Do the subject variables have causal effects on each other? BETA?
– GENETIC: causal relations between subject variables? BETA?
– STRATIFICATION: means of subject variables? ALPHA?

• File names
– Name of file with your data? (DOS name)
– Name of the file for the Mx script? (DOS name)

Structure Mx script

• In most instances four groups

Group   Function                  Free parameters   Starting values
1       General part              yes               yes
2       Genetic effects           yes
3       Stratification effects    yes
4       Fit model to data

Type from DOS-prompt: MxScript <ENTER>

Type from DOS-prompt: MxScript input.dat <ENTER>

Example

• Name data file: example.dat

• Sibling pairs, no parents

• Three genotype groups

• Family variables in data file (indicate that you want to specify admixture effects)

• Starting values: sample drawn from multivariate distribution with means 0 and variances 1.5

General part

[Path diagram; labels: BMD, exercise, Spine, Duration, Intensity]

Identification measurement model:

0 00 10 42

Genetic + Stratification effects

[Path diagram; labels: BMD, exercise, Spine, Duration, Intensity]

Common pathway?

Independent pathway?

Common pathway: Estimate a model with genetic and stratification effects on the means of the second latent variable and test for significance of:

1. Genetic effects

2. Stratification effects

3. Genetic + stratification effect

Independent pathway: Estimate a model with genetic and stratification effects on the means of the indicators of the second latent variable and test for significance of:

1. Genetic effects

2. Stratification effects

3. Genetic + stratification effect

Free elements

a Full 2 1 Free          [in the Matrices ... End matrices section]

Free a 1 1 a 2 1         [after End matrices: free individual elements]

Free a 1 1 to a 2 1      [after End matrices: free a range of elements]


Solution

Copy files from f:\edwin\solution
