temporal configuration analysis 1 · order markov chain for the latent variable assumes that the...
TRANSCRIPT
Temporal Configuration Analysis
1
1
Running head: TEMPORAL CONFIGURATION ANALYSIS
Temporal Configuration Analysis of Developmental Trajectories in Young Children of Heavy Episodic Drinking Mothers
Edward H. Ip
Wake Forest University School of Medicine, Department of Biostatistical Sciences
Alison Snow Jones
1 Wake Forest University School of Medicine, Department of Social Sciences and Health Policy
Qiang Zhang
Wake Forest University School of Medicine, Department of Biostatistical Sciences
Frank Rijmen
VU Medical Center, Amsterdam, The Netherlands
Correspondence concerning this article should be addressed to Edward Ip, Wake Forest University School of Medicine, Department of Biostatistical Sciences, Division of Public Health Sciences, MRI Building 3/F.,Medical Center Boulevard, Winston-Salem, NC 27157. Phone: 336.716.9833. Fax: 336.716.6427. Electronic mail may be sent to [email protected] This research was made possible by research grants #R01 AA11071 (Jones (PI)) from the National Institute of Alcohol Abuse and Alcoholism and the Office of Research on Women’s Health and #0532185 from the National Science Foundation (Ip (PI), Jones (co-PI)).
Temporal Configuration Analysis
2
Abstract
Most models for longitudinal analysis focus on a single outcome variable. In this paper, we
develop a method for examining patterns of evolution of cognitive and behavioral profiles of
children of heavy episodic drinking (HED) mothers. A developmental profile contains multiple
outcome variables. Our objectives are to delineate clusters of outcome trajectories and to study
the effect of maternal HED on these trajectories. Several analytical methods, including the
functional principal component analysis and the K-mean algorithm, are adapted to achieve this
aim. The proposed method, which we call Temporal Configuration Analysis (TCA), is applied to
a sample of young children drawn from the National Longitudinal Survey of Youth (NLSY).
While most of our results are consistent with previous findings, we demonstrate how the method
can lead to nuanced yet important differences in developmental trajectories for children of HED
mothers.
Temporal Configuration Analysis
3
Temporal Configuration Analysis of Developmental Trajectories in Young Children of Heavy
Episodic Drinking Mothers
Motivated by the increasing use of longitudinal data for answering important clinical
questions, many analytical tools such as the mixed model, generalized estimation equations, and
the growth curve model have been developed and became widely accessible to researchers.
However, most models for longitudinal analysis focus on a single outcome variable. These models
also tend to view developmental outcomes as relatively stationary over time. Trends and
heterogeneity in developmental trajectories are routinely summarized by linear regression
coefficients and p-values. As a result, subtle nonlinear changes and nuanced interaction between
outcomes over time are mostly neglected. To overcome these shortcomings, in this article we
propose a method - the Temporal Configuration Analysis (TCA) - for analyzing multiple outcomes
in a longitudinal context and for examining evolution of outcomes over time. The method is
illustrated by an application to a sample of young women and their children drawn from the
National Longitudinal Survey of Youth (NLSY). We study a collection of cognitive and social
outcomes and their evolution over time for children that have been exposed to heavy episodic
drinking mothers.
Approximately 1 in 4 children in the United States under 18 years is exposed to heavy,
abusive or dependent drinking in the family 1. The scale of the problem has important implications
for the development of millions of children raised in these families. Previous studies 2-6 have
found a positive association between parents’ or maternal alcohol consumption and children’s
behavioral and cognitive problems. However, most of these studies viewed developmental
outcomes as relatively stationary over time. They have not examined the extent to which dynamic
Temporal Configuration Analysis
4
child development trajectories differ among children with heterogeneous cognitive and behavioral
profiles exposed to heavy drinking and non-heavy drinking mothers.
TCA expands the discrete hidden Markov model (HMM) 7, 8 which allows the analysis of a
portfolio of multiple outcomes, to include a clustering component for the analysis of the
distribution and structure of individual developmental trajectories. The multiple outcomes used in
the HED example include cognitive scores in math, reading ability, and reading comprehension
and measures for anti-social, anxious, headstrong, hyperactive and peer conflict behaviors. Data
analysis using TCA proceeds in two stages. In Stage I, a HMM and a latent trait regression model 9
are simultaneously estimated. In Stage II, model parameters obtained from Stage I are held fixed,
and the most probable developmental trajectory for each child is estimated. Relatively
homogeneous groups of trajectories are then identified through the method of data clustering.
The structure of HMM reported in this paper is similar to that described in Scott, James, &
Sugar 10, in which HMM is applied to continuous medical data under Bayesian methods. However,
no covariate is used in their model. Other examples of biomedical application of HMM have also
been reported. 11-16
The remainder of this article is organized as follows. In the next section, we describe the
two-stage models of TCA. It is followed by a TCA analysis of the NLSY data. We conclude the
article with a brief discussion.
Model
Stage I Components
We begin by describing the two Stage I components of TCA: HMM and the latent trait
Temporal Configuration Analysis
5
regression model. The HMM model for ordinal outcome is described for ease of exposition. The
model specification and estimation procedure for categorical outcomes are similar.
Hidden Markov Model
The ordinal outcome response HMM is defined as follows. Let yitj denote the response of child i
at occasion t on outcome j, i =1,…,n; j = 1,…,J; t = 1,…,T . Specifically,
( )1, , , , ′= … …it it itj itJy y yy denotes the response vector of child i at occasion t, and
( )1, , , ,i i it iT′′ ′ ′=y y y y… … represents the complete response pattern of child i. Additionally, zit =
1,…,s,…. S, denotes the categorical latent state of child i at occasion t. Thus,
( )1, , , ,i i it iTz z z ′=z … … is the trajectory of child i through the latent space over time. A first-
order Markov chain for the latent variable assumes that the latent state at occasion t + 1 depends
upon the past latent states through the latent state at occasion t only,
( ) ( )1 1 1Pr , Pri t i i t i t itz z z z z+ +=… . Accordingly, the basic hidden Markov model contains the
following three basic sets of parameters:
(a) ( )= ττ rs , the matrix of time-homogeneous transition probabilities between latent
states, ( )1Pr it i t rsZ s Z r−= = = τ for t = 2,…, T.
(b) ( )1 11 1, , S′= α αα … , the vector of marginal-state probabilities at occasion 1 (initial
state probabilities). Marginal-state probabilities at occasion t = 2,…, T are given
by the recursive formula −′ ′= τ1t tα α .
(c) ( )Π = π jsk , the matrix of state-conditional response probabilities, where k is the
index of outcome category, k = 1,…,K, and for any t,
Temporal Configuration Analysis
6
( )π = = =Prjsk ijt itY k Z s . (1)
Assuming conditional independence between responses, given the latent state, the
marginal probability of a response pattern yi is given by
( ) ( ) ( )∈ =
= =∑ ∏D 1
Pr Pr Pr ,i
T
i i it it itt
Z zz
y z y (2)
with the summation over the entire state space D of size ST. Furthermore,
( ) δ
= =
= = π∏∏ ( , )
1 1Pr ijt
J Ky k
it it jskj k
Z sy , (3)
where
1,( , )
0, .ijt
ijtijt
y ky k
y kδ
=⎧= ⎨ ≠⎩
The distribution of latent state is given by:
( )−
=
= α τ∏1 112
Pri i t it
T
i z z zt
z (4)
Latent trait regression HMM
A latent trait regression model 9 is embedded within the HMM. The latent trait model, used in
conjunction with discrete HMM, is motivated by several factors. First, in traditional HMM, at a
specific time point the outcomes within a state are assumed to be conditionally independent
given state. The assumption of conditional independence often leads to a large number of latent
states because of possible residual correlation among responses. The latent trait regression model
models dependency among multiple ordinal responses the way it is handled in item response
theory (IRT) (e.g., Lord & Novick) 17. Specifically, an extended one parameter logistic model
Temporal Configuration Analysis
7
(1PL), or the Rasch model, is designed to capture conditional dependence between items given
state and time. According to the Rasch model, an underlying latent trait, θ , or random effect, is
posited to influence the conditional probabilities of how a child positively responds to an item. A
second motivation of using an IRT based model is the ability to incorporate both item-level and
individual-level contextual information.18 The incorporation of individual-level random effects
and covariates necessitates a slight change in notation. For any given time point, define for child
i, i=1,.., n, the following conditional probability:
( )π = = = θPr | , .ijsk ijt it iY k Z s (5)
Note that compared to (1), the probability ijskπ in (5) now has an index for child as well.
Formally, the latent trait regression model for ordinal response can be expressed as follows:
11
1
log , 1
ijsjs i
ijs
πβ θ
π⎛ ⎞
= +⎜ ⎟⎜ ⎟−⎝ ⎠and
1 1 ( 1)
1 1 ( 1)
log log log , 2,..., -1,1 ( ) 1 ( )
ijs ijsk ijs ijs kjsk
ijs ijsk ijs ijs k
k Kπ π π π
βπ π π π
−
−
⎧ ⎫⎛ ⎞ ⎛ ⎞+ + + +⎪ ⎪− = =⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟ ⎜ ⎟− + + − + +⎪ ⎪⎝ ⎠ ⎝ ⎠⎩ ⎭
(6)
with ( )2~ ,i itNθ σ′w γ , (7)
where γ is the vector of fixed background effects and itw is a vector of background and family
characteristics for child, i, at time t. The special log-log form in Equation (6) enforces the
constraint that the cumulative probabilities are in a monotonically increasing order (Fahrmeir and
Tutz, p.81) 19, thus avoiding the more difficult task of explicitly implementing the constraint
1 , 1... ...js jsk js Kβ β β −< < < < .. To identify the model in (6), no jskβ coefficient is estimated for the
Temporal Configuration Analysis
8
last category K and 1
11
K
itjK itjkk
π π−
=
= −∑ . The introduction of the latent trait regression model leads
to the following modification of the likelihood equation (3):
( )β β β β −
β β β β −
δθ +β + + + θ +β + + +
θ +β + + + θ +β + + += =
⎛ ⎞⎜ ⎟= = − θ γ σ θ⎜ ⎟+ +⎝ ⎠
∏ ∏∫1 1 , 1
1 1
1 1 , 11 1
( , )... ...
2
... ...1 1
ˆ ˆPr ( | , ˆ )1 1
ijtjs jsk js js ki js i js
js jsk js js ki js i js
y ke e e eJ K
it it i it ie e e ej k
e eZ s N de e
y w
(8)
Stage I Estimation and Algorithm
The EM algorithm (Dempster, Laird, & Rubin 1977) is used to estimate parameters in
Stage I. The objective here is to maximize the expected log-likelihood of the complete data,
which include both the observed y and the latent states z, given some provisional estimates for
the parameter. For each individual, the expected loglikelihood can be expressed as:
ˆ[log( ( , | )] log ( , | ) ( | , )i i i i i i i iE p p pθ θ∈
Θ = Θ Θ∑z
y , z y , z z yD
, (9)
where ( , , , )α τ β γΘ = and ˆˆ ˆ ˆˆ( , , , )α τ β γΘ = respectively denote the true parameter and the current
estimate. Given a particular sequence of latent states, one can write out ( | )i ip Θy , z as:
1 1 , 11 1
1 1 , 11 , 11 1
( , )... ...
21, , ... ...
1 1 1
ˆ ˆ( | ) ( | , )1 1
ijtjs jsk js js ki js i js
js jsk js js ki i t iti js i js
y ke e e eT J K
i i z z z i it ie e e et j k
e ep N de e
β β β β
β β β β
δθ β θ β
θ β θ βα τ θ σ θ
−
−−
+ + + + + + + +
+ + + + + + + += = =
⎛ ⎞⎜ ⎟Θ = −⎜ ⎟+ +⎝ ⎠
∏ ∏ ∏∫y , z w γ
(10)
Therefore, the overall expected log-likelihood is given by:
Temporal Configuration Analysis
9
1
1
, 1
1 1 , 11 1
1 1 , 11 1
1 11 1
,1 1
... ...
... ...
ˆlog( ) ( | , )
ˆlog( ) ( | , )
log1 1
i
i t it
js jsk js js ki js i js
js jsk js js ki js i js
n S
z i ii z
n T
z z i ii t
e e e e
e e e e
p z
p
e e
e e
β β β β
β β β β
θ β θ β
θ β θ β
α
τ−
−
−
= =
= ∈ =
+ + + + + + + +
+ + + + + + + +
Θ
⎛ ⎞+ Θ⎜ ⎟⎝ ⎠
⎛ ⎞⎜ ⎟+ −⎜ ⎟+ +⎝ ⎠
∑∑
∑∑ ∑z
y
z yD
( )
( , )
1 1 1
2
1
ˆ ˆ( | , , ) ( | , ),
ˆ ˆlog ( | , ) ( | , , ) ( | , )
ijty kKn J
i i i i i ii j k
n
i it i i i i i ii
p d p
N p d p
δ
θ θ
θ σ θ θ
= ∈ = =
= ∈
⎡ ⎤⎢ ⎥Θ Θ⎢ ⎥⎢ ⎥⎣ ⎦
⎡ ⎤+ Θ Θ⎣ ⎦
∑∑∑ ∏∫
∑∑ ∫
z
z
z y z y
w γ z y z y
D
D
(11)
where ˆ( | , , )i i ip θ Θz y is computed with Bayes Theorem,
ˆ ˆ ˆ ˆ1 1 , 11 1
ˆ ˆ ˆ ˆ1 1 , 11 1
ˆ ˆ11
( , )ˆ ˆ... ...
2ˆ ˆ ˆ ˆ... ...1
ˆ ...
ˆ ˆ( | , )1 1ˆ( | , , )
ijtjs jsk js js ki js i js
js jsk js js ki js i js
js jski js
y ke e e eK
i ite e e ek
i i ie e
e e Ne e
pe
β β β β
β β β β
β β
δθ β θ β
θ β θ β
θ β
θ σ
θ
−
−
+ + + + + + + +
+ + + + + + + +=
+ + + +
⎛ ⎞⎜ ⎟−⎜ ⎟+ +⎝ ⎠Θ =
∏ w γ
z yˆ ˆ1 , 1
1
ˆ ˆ ˆ ˆ1 1 , 11 1
( , )ˆ ...
2ˆ ˆ ˆ ˆ... ...1
.
ˆ ˆ( | , )1 1
ijtjs js ki js
js jsk js js ki js i js
y ke eK
i it ie e e ek
e N de e
β β
β β β β
δθ β
θ β θ βθ σ θ
−
−
+ + + +
+ + + + + + + +=
⎛ ⎞⎜ ⎟−⎜ ⎟+ +⎝ ⎠
∏∫ w γ
and so is ˆ( | , )i ip Θz y ,
ˆ ˆ ˆ ˆ( | , , ) ( | , )ˆ( | , ) .ˆ ˆ( | , )i i jsk i i
i ii jsk i
p pp
pβ θ
β θΘ =
y z z α τz y
y
By adding a Lagrangian term into each of the first two terms in (11) to respectively reflect the
constraints 1
1
11
1i
S
zz
α=
=∑ and ,1
1S
ir issτ
=
=∑ , we solve, in the M-step, the maximum likelihood
estimates through the following equations:
1
11
ˆ( | , )n
i ii
s
p z s
Nα =
= Θ=∑ y
, (11a)
, 1
1 2
, 11 2
ˆ( , | , )
ˆ( | , )
n T
i t it iti t
rs n T
i t iti t
p z r z s
p z rτ
−= =
−= =
= = Θ=
= Θ
∑∑
∑∑
y
y (11b)
Temporal Configuration Analysis
10
Equation (11a) and (11b) are the updating equations for parameters α and τ . For the
regression parameters in the latent trait model, we do not have a closed form updating equation
for β and γ . Instead, we use Gauss-Hermite quadrature to approximate the integrals in (11) and
an indirect maximization algorithm to update β and γ (section 7.4.3 of Fahrmeir and Tutz) 19.
The E-step, which involves the computation for the joint probabilities in (11), can be
efficiently implemented using graphical model theory 21. The essence of this implementation is a
decomposition of a high-dimensional joint probability distribution into modules of lower
dimensions. Briefly, we create a directed acyclic graph for the associated statistical model. The
directed graph, which consists of nodes (variables) and arcs, is a pictorial representation of the
conditional dependence relations of the statistical model. The directed acyclic graph is then
transformed into a so-called junction tree, which forms the basis of efficient computational
schemes 22.
To construct the directed acyclic graph, and to transform the graph into a junction tree,
we adapted the Bayes Net Toolbox of K. Murphy 23, which can be downloaded from
http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html (accessed November 1, 2006). Note
that this publicly available toolbox does not contain a user-friendly graphical interface, nor does
it contain the latent trait regression model. This set of Matlab functions for ordinal data and
latent trait regression, together with a point-and-click graphical user interface, can be
downloaded from our website (provided at production stage).
In fitting HMM models, the number of hidden (latent) states is usually not known a priori
and a model selection step prior to parameter estimation is required. Model selection has been an
important and actively researched area in HMMs 24 and in finite mixture models generally 25. In
this application, we follow an approach suggested by Huang & Bandeen-Roche 26. We employed
Temporal Configuration Analysis
11
the Bayesian Information Criterion (BIC) 27 to determine the optimal number of states, by fitting
a range of HMMs to the data without including any covariates.
Stage II
Stage II consists of a global optimization procedure for finding the most probable trajectory for
each individual, functional principal component analysis (FPCA) of the trajectories, and K-mean
clustering for grouping similar trajectories.
Global optimization of trajectory
Here we define a child’s most probable trajectory as the sequence of hidden states that
best explains the child’s responses:
11
*
,...,,max ( ,..., | )
ti iti iz z i iP z z= Θz y
To obtain child i’s most likely trajectory, *iz , a naïve method is to identify the most probable
state at each time point and then link the selected states to form the trajectory. However, this
method may lead to suboptimal solutions. For example, the most likely state at time t does not
transition to the most likely state at time t+1. We applied Viterbi’s algorithm 28 to evaluate the
globally optimal solution. The Viterbi algorithm calculates the most likely state sequence of the
HMM given an observation sequence. The most likely sequences up to each time point are
stored. By tracing back stored probability tables from the last time point, the algorithm can
determine the most probable trajectory up to the present point.
Functional Principal Component Analysis (FPCA) and Trajectory Clustering
Consider a most probable trajectory for each individual in the form * * *1 2i i iTz z z→ → →
produced by the global optimization procedure. Our objective is to sort the collection of
Temporal Configuration Analysis
12
individual trajectories * * *1 2{ , 1, , }i i iTz z z i n→ → → = into groups through some form of
clustering algorithm. Because the states are qualitative variables, it is challenging, if not
impossible, to directly use traditional clustering methods such as K-mean, which are designed to
handle quantitative information. To address this problem, we propose to project each individual
trajectory into its respective conditional probability space. In this way, we are able to use
probabilities in defining a distance metric between different states. In other words, when the data
space for clustering is a sequence of conditional probability tables, we can treat the data as
functional data and apply functional data analysis methods.
The clustering of functional data proceeds in two steps: (1) functional principal
component analysis (FPCA), and (2) K-mean clustering. The FPCA 29 aims to identify a
parsimonious set of coefficients that explain a high percentage of the variance among a
potentially large number of trajectories in a high dimensional space. For each child i, given the
person’s most probable trajectory, * * *1 2i i iTz z z→ → → , “filtered” model-based conditional
probabilities of the response vector at each time point t is given by:
* JKitj it itjit
p = {P(y = k |z , ) (y ,k), 1,..., , 1,..., } I , 1, ...,k K j J i nδΘ = = ∈ = , (13)
where I = [0, 1], * denotes the term-by-term product between two vectors, and the hat notation is
used to indicate that the parameter is estimated from the data. Designed for clustering purpose,
the filtered conditional probability vector in (13) implies that for two different responses for the
same item at the same time point, their Euclidean distance will necessarily be the root sum of
squares of the two conditional probabilities for the respective observed response. Compared to
other versions of conditional probabilities (such as without “filtering”), our preliminary
Temporal Configuration Analysis
13
experiments showed that the filtered conditional probabilities in (13) led to better separated
clusters.
Formally, the first step of clustering, FPCA, proceeds as follows: Form the trajectory
profile of each individual by concatenating the vectors it
p over time points:
JKTi i1 i2 it = {p , p , ..., p } I , 1, ..., ,p t T∈ =
Next, compose a trajectory matrix using n trajectory profiles as its columns.
JKT n1 2 n
= {p ; p ; ... p } I ×Μ ∈ .
Then, apply a spectral decomposition to the nonnegative definite variance-covariance matrix of
Μ :
TV = Cov (M) = R R ,Λ
where 1 kR = {u , ..., u } , iu is the ith eigenfunction and 1 k= diag( , ... )λ λΛ , λi is the ith
eigenvalue. The ith column vector iv of V can be expressed as a linear combination of the
eigenfunctions:
JKT
i i1
v = u ,ij jj
μ ε=
+∑
Finally, extract the coefficient ikε for principal component k from trajectory i by first subtracting
each column vector iv of V from the mean vector iμ and then projecting the residuals onto the
eigenfunctions,
JKT
T Ti i k j k
1(v - ) u =( u )u .ij ik
jμ ε ε
=
=∑ (14)
An individual’s trajectory profile is characterized by the directions and magnitudes of the
coefficients of the eigenfunctions associated with each principal component.
Temporal Configuration Analysis
14
In order to reduce the dimension of the vector that enters into the subsequent clustering
algorithm, we need to determine the number of principal components, m, that will explain a
sufficiently large proportion of the variance-covariance matrix, V. There are two benefits in
using principal components. First, it increases the computational efficiency of the subsequent
clustering algorithm. More importantly, it will significantly reduce the noise level inherent in the
data. The final determination of m was through an examination of a scree-plot of the
eigenvalues. Analogous to a scree-plot in factor analysis, the eigenvalues typically drop off
rather quickly after a few principal components. We use this information to select the appropriate
value for m. Once the value for m is determined, the first m entries of the coefficients for iv in
the eigenspace will enter into the K-mean clustering algorithm. The K-mean algorithm will not
be described here. Readers are referred to Hartigan 30.
An important issue that arises from using K-mean is that the number of desired clusters,
c, must be determined prior to implementing the algorithm. There is no standard statistical
method for choosing c. We apply a metric, the CH index 31, which is recommended by Milligan
and Cooper 32. Using simulated data, Milligan and Cooper reported that the CH index provided
the best performance among 30 indexes. The CH index is defined by the following equation:
( ) /( 1)( )( ) /( )
B c cCH kW c n c
−=
−,
where ( )B c is the between-cluster variance, ( )W c is the within-cluster variance.
Data
Data sources are the 1989 and 1994 waves of the National Longitudinal Survey of Youth
(NLSY79) and the 1990-1994 Children of the NLSY (CoNLSY). The data used in this report
Temporal Configuration Analysis
15
was collected in 1990, 1992, and 1994. This analysis focuses on cognitive and social
development in children five to nine years old. In this sample of N=367 (52% female, 29%
African-American and 20% Hispanic), the majority of children were 5 years old in 1990.
Eight measures, selected from several instruments, were compiled together to form the
developmental profiles. Five variables were selected from the Behavior Problem Index (BPI) 33 :
anti-social, anxious, peer conflict, headstrong, and hyperactive behaviors. Three variables that
measure academic acheivement were selected from Peabody Individual Achievement Test
(PIAT) scores: math, oral reading, and reading comprehension. For the purpose of convenient
interpretation for clinicians and social scientists we trichomtomized BPI subscales and PIAT into
three categories (0,1,2) based on nationally normed percentiles with cutpoints at 15% and 85%.
BPI categories were then re-ordered so that movement from 0 to 2 represented increasingly
better behavior and conformed to the direction of the PIAT classification.
Because characteristics of the child’s home may also be associated with child behavior
problems 34, we controlled for the home environment in the latent trait regression using the Home
Observation for Measurement of the Environment (HOME) Inventory 35. Two HOME scores (low,
medium, or high) measuring cognitive stimulation and emotional support were used in the latent
trait regression. Another important potential predictor of behavior problems is the child’s exposure
to alcohol during pregnancy, which has been linked to child behavior problems. A binary indicator
for maternal drinking during pregnancy of once a month or more was created. Together with the
HOME scores, drinking during pregnancy is included as a covariate in latent trait regression
models described below.
Variables used to classify maternal drinking status were only asked in 1989 and 1994 of
individuals who reported drinking within 30 days of the NLSY interview. Heavy episodic drinking
Temporal Configuration Analysis
16
(HED) is typically classified as 5 or more drinks on one occasion (Naimi et al., 2003), however
recent evidence suggests that for women, the HED drinking amount should be 4 or more drinks on
a single occasion (Wechsler & Austin, 1998). Unfortunately, the only cut point reported in the
NLSY in both years was 6 or more drinks. Consequently, the HED measure,a binary indicator that
six or more drinks were consumed on at least one day in the month prior to the 1989 NLSY
interview date, likely misclassifies some HED mothers as non-HED.
Results
Insert Figure 1 here
Figure 1 shows the BIC values for models with 2 to 6 hidden states. Based on the BIC
criterion, the 5-state model was selected. Figure 2 displays the profiles of the five states. Each
bar shows the conditional probability distribution of the response categories for a specific item
within each state. As noted above, BPI was rescaled such that higher values for items imply
fewer behavior problems. For PIAT items, higher values indicate better academic achievement.
The prevalence of states 1 through 5 at the first time point are respectively 16, 27, 31, 17, and 10
percent.
Insert Figure 2 here
To aid interpretation of the five states, we order them approximately from the best to the worst.
Beginning with State 1, we see high probabilities of few behavior problems and high
probabilities of good academic achievement across all items. We denote this state as “no
problems” or NP. Children in State 2 demonstrate behavior problem probabilities that are quite
similar to children in State 1. However, their probabilities of low academic achievement are
higher. We denote this state as “good behavior, poor academics” or GB/PA. State 3 is almost the
Temporal Configuration Analysis
17
reverse of GB/PA in that children in this state demonstrate elevated probabilities of behavior
problems, but their academic achievement probabilities are similar to those of NP. We denote
this state as “poor behavior, good academics” or PB/GA. Children whose profiles fit that of State
4 demonstrate high probabilities of behavior problems and academic achievement probabilities
similar to that of GB/PA. We denote this group as “very poor behavior, poor academics”
(VPB/PA). Children whose profiles match those of State 5 show slightly lower behavior problem
probabilities, but a poorer academic achievement than children in state 4 or VPB/PA. We denote
this group as “poor behavior, very poor academics” (PB/VPA). The profiles for VPB/PA and
VPB/PA suggest that children who belong to these two states may be most at risk for future
problems. Table 1 summarizes these interpretations.
Insert Table 1 here
Table 2 reports latent trait regression coefficient estimates, standard errors and p-values.
Only maternal drinking during pregnancy does not show statistical significance at conventional
levels (p = 0.30). The two HOME scores show strong positive impacts on the conditional
response probabilities (p < 0.001), indicating that more supportive home environments are
associated with better behavior and academic performance.
Insert Table 2 here
Table 3 contains the transition probability from one state to another over time. Row
probabilities indicate the probability of transition from any state (row) to another state (column).
The diagonal entries therefore show the probability of remaining in each state over time. As can
be seen, all but one state are comparatively stable (~85%). The probability of staying in GB/PA
is smallest (67%), with a 15% chance of transitioning into PB/GA and an 11% chance of
transitioning into PB/VPA, and only 4% chance of transitioning into the best state, i.e. NP.
Temporal Configuration Analysis
18
Figure 3 provides a “picture” to visualize the dynamics occurring between the different states.
Problematic academic performance and behavior are arranged from low to high respectively
along the x- and y- axis. To highlight the most important features of the dynamic analysis, only
paths with a higher-than-five-percent probability are displayed.
Insert Table 3 and Figure 3 here
Using the global optimization routine, we obtained the frequency distributions of the
many trajectories. The trajectory distribution indicates that 5-9 year old children in this sample
tend to stay within the same developmental state over the 4-year period, which is consistent with
the information from the transition probability matrix. Out of a total of 367 trajectories, 268 (73
%) of them showed a stable pattern of not changing to another state at any given time point. For
the remaining trajectories, the most frequent trajectories include 2 3 3→ → (3.5%), and
2 5 5→ → (3%).
The FPCA and K-mean clustering provides summarized information about the patterns of
the trajectories. Figure 4 shows the eigenvalues of the variance-covariance matrix V. From the
scree-plot, it can be seen that using more than 5 principal components leads to very little
marginal gain in total variance explained. We further compared the results using all 24 principal
components and the first five principal components and found that the results for clustering were
not sensitive to the number of components used. However, this conclusion may not generalize to
cases in which the dimension of the problem is much higher.
Insert Figure 4 here
Figure 5 shows the how the CH index varies with the number of clusters. The plot shows
that the greatest gain in CH-index occurs at 4k = . Based on this result, we identify four distinct
clusters of trajectories.
Temporal Configuration Analysis
19
Insert Figure 5 here
Figure 6 includes two views of three-dimensional plots of the first three functional
coefficients for all subjects. The solid dots represent those subjects with stable trajectories. It can
be seen that the four K-means identified clusters are rather well separated.
Insert Figure 6 here
Table 4 shows the relative proportions of states belonging to different clusters of the
trajectories. In this table, a state is counted in the numerator of the ratio whenever it appears in a
trajectory within the cluster. By inspecting Table 4 and the actual trajectories within each cluster,
we provide the following interpretation of the four clusters.
Cluster 1. mainly the stable and transitioning trajectories into NP, representing a group of well-
behaved and academically sound children;
Cluster 2. mainly the stable and transitioning trajectories in and out of GB/PA, some PB/VPA
and some PB/GA, representing a group of children mostly with poor school
performance but diverse behaviors.
Cluster 3. mainly the trajectories with good academic performance but poor behavior from
PB/GA.
Cluster 4. mainly the stable trajectories in and out of VPB/PA and PB/VPA, representing a
group of children with both a high level of behavioral problems and weak school
performance;
Insert Table 4 here
Maternal HED
In the sample, there are 90 (25%) children identified as having an HED mother. In this
subsection, we provide several perspectives regarding the association between maternal HED
Temporal Configuration Analysis
20
status and a child’s development through separate analyses of latent state and trajectory
distributions by maternal HED status. First, Figure 7 shows the relative proportion of the five
behavior/academic performance states between 1990 and 1994 (three time points) for two groups
of children, one with HED mothers (Figure 6a) and the other without (Figure 6b). The states are
ordered from bottom (State 1) to top (State 5).
Insert Figure 7 here
By following the probabilities of the least at-risk group, State 1 (NP), we see children of
non-HED mothers show a slightly increasing trend over time to be in this group, whereas
children of HED mothers demonstrate a slightly decreasing trend. In contrast, the proportion of
PB/VPA children is much higher for the group of HED mothers and this proportion also tends to
increase over time. Table 5 shows the result of two-sample z-tests between the proportions of the
first two groups combined (NP and GB/PA), and the worst group (PB/VPA) within two distinct
groups characterized by maternal alcohol status. There is a significantly higher proportion of
children that have no or little problem in group of non HED mothers. On the other hand, for the
PB/VPA state, while the tests between non HED and HED mothers are only marginally
significant, the higher level of PB/VPA children in the HED group is consistent across the three
time points.
Insert Table 5 here
This finding is supported by a closer inspection of the distributions of the trajectories and
the trajectory clusters. Table 6 shows the distribution of trajectories by cluster. Note that,
because the trajectories are individual-specific, the same trajectories of states may have slightly
different conditional probabilities, and therefore be classified into different clusters. Cluster 1,
which is dominated by trajectories 1 1 1→ → (all NP), and those starting from State 2 (GB/PA)
Temporal Configuration Analysis
21
or State 3 (PB/GA) but ending up in State 1 (NP), does not show any dependency on the
mother’s HED. The proportion of children with HED mothers is no different from that of the
larger sample, i.e. 25%. Clusters 4 and 2 show slightly higher proportions of HED mothers than
the average, i.e. 27% and 29% respectively, and they both are characterized by poor academic
performance. This is mainly due to the majority trajectories in these two clusters are 2 2 2→ →
(all GB/PA) and 5 5 5→ → (all PB/VPA), which are both characterized by 35% HED mothers.
In contrast, cluster 3, dominated by State 3 (PB/GA), has the smallest percent of HED mothers.
Based on these patterns, maternal HED seems to have a negative association with both child
behavior and academic performance, but the negative association with academic performance is
especially pronounced.
Insert Table 6 here
Discussion
In this paper, we describe the TCA methodology and its usefulness in discerning patterns
of developmental trajectories over time. This extended HMM approach allowed us to obtain
more nuanced insight into multiple child outcomes. The relationship between maternal alcohol
consumption and child behavior problems is consistent with previous research. However, a
slightly more nuanced finding is that children of HED mothers appear to be at greater risk for
both behavior problems and poor academic performance, but the latter seems to be more strongly
linked to maternal HED, either with or without behavior problems. Although a relatively large
proportion of children whose mothers do not have alcohol problems demonstrate elevated
behavior problems scores, we found that their academic achievement (State 3 PB/GA) remains in
Temporal Configuration Analysis
22
the average to good range. Children of alcohol mothers were found to be more likely to fall into
States 2 (GB/PA) and 5 (PB/VPA), characterized by poor academic achievement problems.
It is also worth noting that children with poor academic performance, but no behavior
problems (State 2) are more like to transition to states characterized by poor behavior regardless
of maternal drinking status. If this result were to persist in larger, more representative samples,
further investigation of causal links would be warranted. Taken at face value, the finding
suggests that policies that enhance early academic performance may have positive behavioral
effects.
The findings also suggest that children of HED mothers may be at risk for a double
handicap in later life since poor academic performance is likely to impair later educational
attainment and elevated behavior problems have been linked to later drug and alcohol problems.
A finding from the trajectory analysis that may be of interest to developmental psychologists and
other social scientists is that children with more severe behavioral problems tend to remain that
way during this period of their life, regardless of maternal alcohol status. This is consistent with
policies that intervene earlier in a child’s life rather than after the child has started school.
There are several technical limitations to the current form of TCA approach. The first
limitation is the uncertainty ignored in the two-stage estimation procedure. Literature on
comparison between multi-stage and simultaneous estimation (e.g., Scott and Ip) 36 show that the
significance of tests is often over-reported in multi-stage procedure. The extent to which the
clustering of trajectories will be affected by uncertainty in the parameters from the first stage is
not clear and merits further investigation such as using a Bayesian approach.
A second limitation of TCA is the absence of convariates in modeling the transition
probabilities. For example, one might expect that home environment variables affect transition
Temporal Configuration Analysis
23
from poorer developmental states to better states. We note that Equation (11b), which estimates
transition probabilities, can be modified in a way that is analogous to Equation (11c) and (12) to
incorporate covariates. However, because the number of transition probabilities quickly increases
with the number of latent states, the solution for modeling transition is likely to be dictated by
sample sizes. We are currently exploring constrained regression-type models to address this
issue.
.
Temporal Configuration Analysis
24
Reference List
1. Grant BF. Estimates of US children exposed to alcohol abuse and dependence in the family.
American Journal of Public Health 2000; 90(1):112-115.
2. Snow JA, Miller DJ, Salkever DS. Parental use of alcohol and children's behavioural health:
a household production analysis. Health Econ. 1999; 8(8):661-683.
3. Wall TL, Garcia-Andrade C, Wong V, Lau P, Ehlers CL. Parental history of alcoholism and
problem behaviors in native-American children and adolescents. Alcoholism-Clinical
and Experimental Research 2000; 24(1):30-34.
4. Jones AS. Maternal Alcohol Abuse/Dependence, Children's Behavior Problems, and Home
Environment: Estimates from the National Longitudinal Survey of Youth Using
Propensity Score Matching*. J.Stud.Alcohol Drugs 2007; 68(2):266-275.
5. Schuckit MA, Chiles JA. Family History As A Diagnostic Aid in 2 Samples of Adolescents.
Journal of Nervous and Mental Disease 1978; 166(3):165-176.
6. Aronson M, Kyllerman M, Sabel KG, Sandin B, Olegard R. Children of Alcoholic Mothers -
Developmental, Perceptual and Behavioral-Characteristics As Compared to Matched
Controls. Acta Paediatrica Scandinavica 1985; 74(1):27-35.
Temporal Configuration Analysis
25
7. Rabiner LR. A Tutorial on Hidden Markov-Models and Selected Applications in Speech
Recognition. Proceedings of the Ieee 1989; 77(2):257-286.
8. MacDonald IL. Hidden Markov and other models for discrete-valued time series /. London:
Chapman & Hall, 1997.
9. Rijmen, F., Vansteelandt, K., and DeBoeck, P. Latent class modesl for diary method data:
parameter estimation by local computations. Psychometrika . 2006.
Ref Type: In Press
10. Scott SL, James GM, Sugar CA. Hidden Markov models for longitudinal comparisons.
Journal of the American Statistical Association 2005; 100(470):359-369.
11. Bureau A, Shiboski S, Hughes JP. Applications of continuous time hidden Markov models to
the study of misclassified disease outcomes. Stat.Med. 2003; 22(3):441-462.
12. Jackson CH, Sharples LD. Hidden Markov models for the onset and progression of
bronchiolitis obliterans syndrome in lung transplant recipients. Stat.Med. 2002;
21(1):113-128.
13. LeStrat Y., Carrat F. Monitoring epidemiologic surveillance data using hidden Markov
models. Stat.Med. 1999; 18(24):3463-3478.
Temporal Configuration Analysis
26
14. Smith T, Vounatsou P. Estimation of infection and recovery rates for highly polymorphic
parasites when detectability is imperfect, using hidden Markov models. Stat.Med.
2003; 22(10):1709-1724.
15. Dunson DB, Herring AH. Bayesian latent variable models for mixed discrete outcomes.
Biostatistics. 2005; 6(1):11-25.
16. Altman RM, Petkau AJ. Application of hidden Markov models to multiple sclerosis lesion
count data. Stat.Med. 2005; 24(15):2335-2344.
17. Lord FM, joint a. Statistical theories of mental test scores. Reading, Mass., Addison-Wesley
Pub. Co.: 1968.
18. Fischer GH. Linear Logistic Test Model As An Instrument in Educational Research. Acta
Psychologica 1973; 37(6):359-374.
19. Fahrmeir L, Tutz G. Multivariate statistical modelling based on generalized linear models.
Springer-Verlag, New York, 1994.
20. Dempster AP, Laird NM, Rubin DB. Maximum Likelihood from Incomplete Data Via Em
Algorithm. Journal of the Royal Statistical Society Series B-Methodological 1977;
39(1):1-38.
Temporal Configuration Analysis
27
21. Lauritzen SL. The Em Algorithm for Graphical Association Models with Missing Data.
Computational Statistics & Data Analysis 1995; 19(2):191-201.
22. Jensen FV, Lauritzen SL, Olesen KG. Bayesian updating in causal probabilistic networks by
local computation. Computational Statistics Quarterly 1990; 4:269-282.
23. Murphy, K. The Bayesian net toolbox for Matlab. Computing sciences and statistics:
proceedings of the 33rd symposium on the interface . 2001.
Ref Type: Journal (Full)
24. Scott SL. Bayesian methods for hidden Markov models: Recursive computing in the 21st
century. Journal of the American Statistical Association 2002; 97(457):337-351.
25. McLachlan G, Peel D. Finite mixture models. Wiley: 2000.
26. Huang GH, Bandeen-Roche K. Building an identifiable latent class model with covariate
effects on underlying and measured variables. Psychometrika 2004; 69(1):5-32.
27. Schwarz G. Estimating Dimension of A Model. Annals of Statistics 1978; 6(2):461-464.
28. Viterbi AJ. Error bounds for convolutional codes and an asymptotically optimal decoding
algorithm. IEEE Transaction, Information Theory 1967; 13:260-269.
29. Ramsey JO, Silverman BW. Functional Data Anslysis. Springer: New York, 1997.
Temporal Configuration Analysis
28
30. Hartigan JA. Clustering algorithms. New York, Wiley: 1975.
31. Calinski T, Harabasz J. A dendrite method for cluster analysis. Communications in statistics
1974; 3:1-27.
32. Milligan GW, Cooper MC. An Examination of Procedures for Determining the Number of
Clusters in A Data Set. Psychometrika 1985; 50(2):159-179.
33. Achenbach TM. Manual for the child behavior checklist and revised child behavior profile /.
Burlington, VT : T.M. Achenbach, T.M. Achenbach, 1983.
34. McCarty CA, Zimmerman FJ, Digiuseppe DL, Christakis DA. Parental emotional support
and subsequent internalizing and externalizing problems among children. Journal of
Developmental and Behavioral Pediatrics 2005; 26(4):267-275.
35. Caldwell BM, University of Arkansas at Little Rock., Center for Child Development and
Education. HOME observation for measurement of the environment /. Little Rock,
Ark. : University of Arkansas at Little Rock, University of Arkansas at Little Rock,
1984.
36. Scott SL, Ip EH. Empirical Bayes and item-clustering effects in a latent variable hierarchical
model: A case study from the national assessment of educational progress. Journal of
the American Statistical Association 2002; 97(458):409-419.
Temporal Configuration Analysis
29
Table 1.
Interpretation of states
State Interpretation
State 1 No problems (NP)
State 2 Good behavior, poor academics (GB/PA)
State 3 Bad behavior, good academics (PB/GA)
State 4 Very bad behavior, poor academics (VPB/PA)
State 5 Bad behavior, very poor academics (PB/VPA)
Temporal Configuration Analysis
30
Table 2
Coefficients for latent trait regression covariates and their significances
Covariates Coefficient SE t value Pr > |t|
HOME cognitive stimulation 0.30 0.06 5.12 <.001
HOME emotional support 0.30 0.06 4.71 <.002
Maternal pregnancy drinking -0.05 0.29 -0.18 0.30
Temporal Configuration Analysis
31
Table 3
Transition probability table between latent states
State 1
NP
State 2
GB/PA
State 3
PB/GA
State 4
VPB/PA
State 5
PB/VPA
State 1 0.87 0.05 0.06 0.02 0.00
State 2 0.04 0.67 0.15 0.03 0.11
State 3 0.06 0.05 0.83 0.05 0.02
State 4 0.00 0.07 0.11 0.83 0.00
State 5 0.00 0.12 0.00 0.00 0.88
Temporal Configuration Analysis
32
Table 4
Proportions of latent states within each cluster of trajectories
State 1
(NP)
State 2
(GB/PA)
State 3
(PB/GA)
State 4
(VPB/PA)
State 5
(PB/VPA)
HED(%) Non-
HED(%) Total
Cluster 1 0.90 0.05 0.05 0.00 0.00 15 (25) 45 (75) 60
Cluster 2 0.01 0.72 0.11 0.04 0.12 27 (28) 71 (72) 98
Cluster 3 0.04 0.04 0.88 0.03 0.01 22 (18) 98 (82) 120
Cluster 4 0.00 0.06 0.01 0.58 0.34 26 (29) 63 (71) 89
Temporal Configuration Analysis
33
Table 5
Two sample z-test results by maternal alcoholism status for states 1 (NP) and 5 (PB/VPA)
Time
Proportion of
specific state non-HED mother HED mother z-value P>|z|
1 State 1 & 2 (NP+GB/PA)
0.49 0.36 -2.24 0.01
State 5 (PB/VPA) 0.08 0.14 1.51 0.07
2 State 1 & 2 (NP+GB/PA)
0.53 0.42 -1.74 0.04
State 5 (PB/VPA) 0.11 0.18 1.56 0.06
3 State 1 & 2 (NP+GB/PA)
0.54 0.40 -2.37 0.009
State 5 (PB/VPA) 0.12 0.19 1.23 0.11
Temporal Configuration Analysis
34
Table 6
Distribution of trajectories within different clusters by maternal alcoholism status
Cluster 1 Cluster 2 Cluster 3 Cluster 4 C* Trajectory T** A % C Trajectory T A % C Trajectory T A % C Trajectory T A % o 1 1 1 47 12 26 o 2 2 2 57 20 35 o 3 3 3 87 14 16 o 5 5 5 25 9 36+ 2 1 1 4 1 25 o 5 5 5 6 2 33 - 2 3 3 10 5 50 o 4 4 4 45 10 22+ 3 1 1 2 1 50 - 2 2 3 6 0 0 + 4 3 3 5 1 20 - 4 2 5 3 1 33+ 2 3 1 1 0 0 - 2 5 5 4 1 25 + 3 3 1 3 0 0 - 2 5 5 7 2 29+ 2 2 1 1 0 0 o 3 3 3 3 0 0 - 3 3 4 3 1 33 - 2 4 4 3 1 33- 1 2 3 1 0 0 + 3 2 2 3 0 0 + 3 2 2 2 0 0 + 4 4 3 2 0 0 + 3 3 1 2 0 0 + 5 2 2 3 1 33 - 1 3 3 2 0 0 - 3 4 4 1 0 0 - 1 1 2 1 1 100 - 2 3 3 3 1 33 - 3 3 5 2 0 0 o 4 2 4 1 1 100- 1 1 3 1 0 0 0 4 4 4 2 0 0 - 3 5 5 1 1 100 + 4 4 2 1 1 100 + 5 5 2 2 1 50 - 1 1 4 1 0 0 - 1 4 4 1 1 100 - 1 3 3 1 1 100 + 3 1 1 1 0 0 + 3 3 2 1 0 0 + 4 3 1 1 0 0 - 3 5 5 1 0 0 - 1 1 3 1 0 0 - 2 2 4 1 0 0 - 1 2 4 1 0 0 - 2 2 5 1 0 0 + 2 3 1 1 0 0 + 4 4 2 1 0 0 + 4 3 3 1 0 0 + 4 2 2 1 0 0 - 3 - 17 - 21 - 15 + 10 + 13 + 12 + 3
Total 60 15 25 98 17 17 120 22 18 89 26 29
*C indicates how a trajectory change : + improving, - deteriorating, o no change
** T= total count A= number of HED mother, %= percent HED mother
Temporal Configuration Analysis
35
Figure 1. BIC values for models with 2 to 6 latent states
Temporal Configuration Analysis
36
Figure 2. Conditional probability profile for the 5-State model. Within each bar, dark color
represents the lowest category of outcome.
Temporal Configuration Analysis
37
Figure 3. Graphical representation of the transition matrix. The x- and y-axis include respectively
problems in academic achievement and behavior. Arrows are only shown for probabilities higher
than 8%.
Good Behavior
Good Academic
State 1
10%
Poor Behavior
Good Academic
State 3
Very PoorBehavior
Poor Academic
State 4
GoodBehavior
PoorAcademic
State 2
Poor Behavior
Very poor Academic
State 5
15%6% 11% 12% 6%
11%
7%
5%
Temporal Configuration Analysis
38
Figure 4. Plot of eigenvalues versus number of principal components.
Temporal Configuration Analysis
39
Figure 5. Plot of CH index versus the number of clusters used in K-mean algorithm.
Temporal Configuration Analysis
40
Figure 6 Plots of two views for the first three functional coefficients ( 1 2,i iε ε and 3iε in Equation (14)). Clusters 1-4 are respectively represented as circle, square, diamond, and triangle.
Temporal Configuration Analysis
41
(a) (b)
Figure 7. Prevalence of states across three time points (left to right) for two groups (a) non-HED
mother, and (b) HED mother. The states are ordered from the bottom (state 1) to the top (state 5).