human chromosomes male xy x y female xx x xx xy daughter son
TRANSCRIPT
Human Chromosomes
Male Xy
X yFemaleXX X XX Xy Daughter Son
Gene, Allele, Genotype, Phenotype
Chromosomes from Father Mother
Gene A,with twoalleles Aand a
Genotype Phenotype
AA 185 100AA 182 104
Aa 175 103Aa 171 102
aa 155 101aa 152 103
Height IQ
Regression model for estimatingthe genotypic effect
Phenotype = Genotype + Error yi = xij + ei
xi is the indicator for QTL genotypej is the mean for genotype j
ei ~ N(0, 2)
The genotypes for the trait are not observable and should be predicted from linked neutral molecular markers (M)
Uniqueness for our genetic problem
M1
M2
M3
Mm
QTL
.
.
. Our task is to construct a statistical model that connects the QTL genotypes and marker genotypes through observed phenotypes
The genes that lead to the phenotypic variation are called Quantitative Trait Loci (QTL)
Subject
Marker (M) Genotype frequency
M1 M2 … Mm
Phenoty
pe (y)
QQ(2) Qq(1) qq(0)
¼ ½ ¼
1 AA(2) BB(2) … y1
¼ ½ ¼
2 AA(2) BB(2) ... y2
¼ ½ ¼ 3 Aa(1) Bb(1) ... y
3¼ ½ ¼
4 Aa(1) Bb(1) ... y4
¼ ½ ¼5 Aa(1) Bb(1) ... y
5¼ ½ ¼
6 Aa(1) bb(0) ... y6
¼ ½ ¼7 aa(0) Bb(1) ... y
7¼ ½ ¼
8 aa(0) bb(0) … y8
¼ ½ ¼
Data Structure
n = n22 + n21 + n20 + n12 + n00 + n02 + n01 + n00
Parents AA aa
F1 Aa
F2 AA Aa aa ¼ ½ ¼
Finite mixture model forestimating genotypic effects
yi ~ p(yi|,) = ¼ f2(yi) + ½ f1(yi) + ¼ f0(yi)QTL genotype (j) QQ Qq qq Code 2 1 0
fj(yi) is a normal distribution densitywith mean j and variance 2
= (2, 1, 0), = (2)
where
n
iiiiiii
n
iiii
n
iiii
n
iiii
n
iiii
yfyfyf
yfyfyf
yfyfyf
yfyfyf
yfyfyf
1|||
100|00|00|
121|21|21|
122|22|22|
141
21
41
)()()(
)()()(
...
)()()(
)()()(
)()()(
00
21
22
0011222
001122
001122
001122
012
j|i is the conditional (prior) probability of QTL genotype j (= 2, 1, 0) given marker genotypes for subject i (= 1, …, n).
Likelihoodfunctionbasedonthemixturemodel
L(, , |M, y)
QTL genotype frequency: j|i = gj(p)Mean: j = hj(m)Variance: = l(v)
We model the parameters contained within the mixture model using particular functions
p contains the population genetic parameters
q = (m, v) contains the quantitative genetic parameters
n
i jijijqp yfyL
1
2
0| )(log),|,(log MΘΘ
n
iij
qp
ij
ij
ij
n
iij
qp
ij
ijj ijij
ijij
n
ij ijij
ijij
j ijij
ij
qp
yf
yfyf
yf
yf
yf
yf
yf
yL
qp
ij
1
|
|
|
1
|
|
2
0 |
|
12
0 |
|
2
0 |
)(log1
)(log1
)(
)(
)(
)(
)(
)(
)|,(log
|
ΘΘ
ΘΘ
ΘΘ
ΘΘ
Log-LikelihoodFunction
The EM algorithm
2
0 |
||
)(
)(
j ijij
ijijij
yf
yf
M step 0)|,(
yL qp ΘΘ
E step
Iterations are made between the E and M steps until convergence
Calculate the posterior probability of QTL genotype j for individual i that carries a known marker genotype
Solve the log-likelihood equations
Three statistical issues
Modeling mixture proportions, i.e., genotype frequencies at a putative QTL
Modeling the mean vector
Modeling the (co)variance matrix
An innovative model for genetic dissection of complex traits by incorporating mathematical aspects of biological principles into a mapping framework
Functional Mapping
Provides a tool for cutting-edge research at the interplay between gene action and development
Developmental Pattern of Genetic Effects
Subject
Marker (M) Phenotype (y) Genotype frequency
1 2 … m
1 2 … T
QQ(2) Qq(1) qq(0)
¼ ½ ¼
1 2 2 … y1(1) y
1(2) … y
1(T) ¼ ½ ¼
2 2 2 ... y2(1) y
2(2) … y
2(T) ¼ ½ ¼
3 1 1 … y3(1) y
3(2) … y
3(T) ¼ ½ ¼
4 1 1 … y4(1) y
4(2) … y
4(T)
¼ ½ ¼
5 1 1 … y5(1) y
5(2) … y
5(T) ¼ ½ ¼
6 1 0 … y6(1) y
6(2) … y
6(T) ¼ ½ ¼
7 0 1 … y7(1) y
7(2) … y
7(T) ¼ ½ ¼
8 0 0 ... y8(1) y
8(2) … y
8(T) ¼ ½ ¼
Data Structure
n = n22 + n21 + n20 + n12 + n00 + n02 + n01 + n00
Parents AA aa
F1 Aa
F2 AA Aa aa ¼ ½ ¼
The Finite Mixture Model
n
iiiiiii
qp
fff
L
1||| )()()(
),|,,(
yyy
yMΘΘ
0011222
Observation vector, yi = [yi(1), …, yi(T)] ~ MVN(uj, )
Mean vector, uj = [uj(1), uj(2), …, uj(T)],
(Co)variance matrix,
)()2,()1,(
),2()2()1,2(
),1()2,1()1(
2
2
2
TTT
T
T
Σ
Modeling the Mean Vector
• Parametric approach Growth trajectories – Logistic curve HIV dynamics – Bi-exponential function Biological clock – Van Der Pol equation Drug response – Emax model
• Nonparametric approach Lengedre function (orthogonal polynomial) B-spline (Xueli Liu & R. Wu: Genetics, to be submitted)
Stem diameter growth in poplar trees
Ma, Casella& Wu: Genetics2002
Logistic Curve of Growth – A Universal Biological Law
Logistic Curve of Growth – A Universal Biological Law (West et al.: Nature 2001)
Modeling the genotype-dependent mean vector,uj = [uj(1), uj(2), …, uj(T)]
= [ , , …, ]
Instead of estimating uj, we estimate curveparametersm = (aj, bj, rj)jr
j
j
eb
a1 jr
j
j
eb
a2
1 jTr
j
j
eb
a1
Number of parameters to be estimated in the mean vectorTime points Traditional approach Our approach 5 3 5 = 15 3 3 = 910 3 10 = 30 3 3 = 950 3 50 = 150 3 3 = 9
Modeling the Variance Matrix
Stationary parametric approachAutoregressive (AR) model
Nonstationary parameteric approachStructured antedependence (SAD) modelOrnstein-Uhlenbeck (OU) process
Nonparametric approachLengendre function
Differences in growth across agesUntransformed Log-transformed
Poplardata
Functional mapping incorporated by logistic curves and AR(1) model
QTL
Developmental pattern of genetic effects
Wu,Ma,Lin,Wang&Casella:Biometrics2004
Timing at which the QTL is switched on
The implications of functional mapping for high-dimensional biology
• Multiple genes – Epistatic gene-gene interactions
• Multiple environments – Genotype environment interactions
• Multiple traits – Trait correlations• Multiple developmental stages• Complex networks among genes,
products and phenotypes
High-dimensional biology deals with
Functional mapping for epistasis in poplar
QTL 1QTL 2
Wu,Ma,Lin&CasellaGenetics2004
The growth curves of four different QTL genotypes for two QTL detected on the same linkage group D16
Genotype environment interaction in rice
Zhao,Zhu,Gallo-Meagher&Wu:Genetics2004
Plant height growth trajectories in rice affected by QTL in two contrasting environments
Red: Subtropical HangzhouBlue: Tropical Hainan
QQqq
Functional mapping: Genotype sex interaction
Zhao,Ma,Cheverud&WuPhysiological Genomics2004
Red: Male mice
Blue: Female mice
QQQqqq
Body weight growth trajectories affected by QTL in male and female mice
Functional mapping for trait correlation
Zhao,Hou,Littell&Wu: Biometricssubmitted
Growth trajectories for stem height and diameter affected by a pleiotropic QTL
Red: DiameterBlue: Height
QQ Qq
Statistical Challenges
Logistic curves provide a bad fit of the volume data!
Stem volume=CoefficientHeightDiameter2
Modeling the mean vector using the Legendre function
The general form of a Legendre polynomial of order r is given by the sum,
K
k
kr
r
kr x
krkrkkr
xP0
2
)!2()!(!2)!22(
)1()(
where K = r/2 or (r-1)/2 is an integer. We have first few polynomials:
P0(x) = 1 P5(x) = 1/8(63x5-70x3+15x)P1(x) = x P6(x) = 1/16(231x6-315x4+105x2-5)P2(x) = ½(3x2-1)P3(x) = ½(5x3-3x)P4(x) = 1/8(35x4-30x2+3)
New fit by the Legendre function
Lin,Hou&Wu:JASA,underrevision
Functional Mapping:toward high-dimensional biology
• A new conceptual model for genetic mapping of complex traits
• A systems approach for studying sophisticated biological problems
• A framework for testing biological hypotheses at the interplay among genetics, development, physiology and biomedicine
Functional Mapping:Simplicity from complexity
• Estimating fewer biologically meaningful parameters that model the mean vector,
• Modeling the structure of the variance matrix by developing powerful statistical methods, leading to few parameters to be estimated,
• The reduction of dimension increases the power and precision of parameter estimation
Prospects
Teosinte and Maize
Teosinte branched 1(tb1) is found to affect the differentiation in branch architecture from teosinte to maize (John Doebley 2001)
Biomedical breakthroughs in cancer, next?
Single Nucleotide Polymorphisms (SNPs)
no cancer
cancer
Liu, Johnson, Casella & Wu:Genetics 2004Lin & Wu: PharmacogenomicsJournal 2005