temporal configuration analysis 1 · order markov chain for the latent variable assumes that the...

Temporal Configuration Analysis

1

1

Running head: TEMPORAL CONFIGURATION ANALYSIS

Temporal Configuration Analysis of Developmental Trajectories in Young Children of Heavy Episodic Drinking Mothers

Edward H. Ip

Wake Forest University School of Medicine, Department of Biostatistical Sciences

Alison Snow Jones

1 Wake Forest University School of Medicine, Department of Social Sciences and Health Policy

Qiang Zhang

Wake Forest University School of Medicine, Department of Biostatistical Sciences

Frank Rijmen

VU Medical Center, Amsterdam, The Netherlands

Correspondence concerning this article should be addressed to Edward Ip, Wake Forest University School of Medicine, Department of Biostatistical Sciences, Division of Public Health Sciences, MRI Building 3/F.,Medical Center Boulevard, Winston-Salem, NC 27157. Phone: 336.716.9833. Fax: 336.716.6427. Electronic mail may be sent to [email protected] This research was made possible by research grants #R01 AA11071 (Jones (PI)) from the National Institute of Alcohol Abuse and Alcoholism and the Office of Research on Women’s Health and #0532185 from the National Science Foundation (Ip (PI), Jones (co-PI)).


2

Abstract

Most models for longitudinal analysis focus on a single outcome variable. In this paper, we

develop a method for examining patterns of evolution of cognitive and behavioral profiles of

children of heavy episodic drinking (HED) mothers. A developmental profile contains multiple

outcome variables. Our objectives are to delineate clusters of outcome trajectories and to study

the effect of maternal HED on these trajectories. Several analytical methods, including the

functional principal component analysis and the K-mean algorithm, are adapted to achieve this

aim. The proposed method, which we call Temporal Configuration Analysis (TCA), is applied to

a sample of young children drawn from the National Longitudinal Survey of Youth (NLSY).

While most of our results are consistent with previous findings, we demonstrate how the method

can lead to nuanced yet important differences in developmental trajectories for children of HED

mothers.


3

Temporal Configuration Analysis of Developmental Trajectories in Young Children of Heavy

Episodic Drinking Mothers

Motivated by the increasing use of longitudinal data for answering important clinical

questions, many analytical tools such as the mixed model, generalized estimation equations, and

the growth curve model have been developed and became widely accessible to researchers.

However, most models for longitudinal analysis focus on a single outcome variable. These models

also tend to view developmental outcomes as relatively stationary over time. Trends and

heterogeneity in developmental trajectories are routinely summarized by linear regression

coefficients and p-values. As a result, subtle nonlinear changes and nuanced interaction between

outcomes over time are mostly neglected. To overcome these shortcomings, in this article we

propose a method - the Temporal Configuration Analysis (TCA) - for analyzing multiple outcomes

in a longitudinal context and for examining evolution of outcomes over time. The method is

illustrated by an application to a sample of young women and their children drawn from the

National Longitudinal Survey of Youth (NLSY). We study a collection of cognitive and social

outcomes and their evolution over time for children that have been exposed to heavy episodic

drinking mothers.

Approximately 1 in 4 children in the United States under 18 years is exposed to heavy,

abusive or dependent drinking in the family 1. The scale of the problem has important implications

for the development of millions of children raised in these families. Previous studies 2-6 have

found a positive association between parents’ or maternal alcohol consumption and children’s

behavioral and cognitive problems. However, most of these studies viewed developmental

outcomes as relatively stationary over time. They have not examined the extent to which dynamic


4

child development trajectories differ among children with heterogeneous cognitive and behavioral

profiles exposed to heavy drinking and non-heavy drinking mothers.

TCA expands the discrete hidden Markov model (HMM) 7, 8 which allows the analysis of a

portfolio of multiple outcomes, to include a clustering component for the analysis of the

distribution and structure of individual developmental trajectories. The multiple outcomes used in

the HED example include cognitive scores in math, reading ability, and reading comprehension

and measures for anti-social, anxious, headstrong, hyperactive and peer conflict behaviors. Data

analysis using TCA proceeds in two stages. In Stage I, a HMM and a latent trait regression model 9

are simultaneously estimated. In Stage II, model parameters obtained from Stage I are held fixed,

and the most probable developmental trajectory for each child is estimated. Relatively

homogeneous groups of trajectories are then identified through the method of data clustering.

The structure of HMM reported in this paper is similar to that described in Scott, James, &

Sugar 10, in which HMM is applied to continuous medical data under Bayesian methods. However,

no covariate is used in their model. Other examples of biomedical application of HMM have also

been reported. 11-16

The remainder of this article is organized as follows. In the next section, we describe the

two-stage models of TCA. It is followed by a TCA analysis of the NLSY data. We conclude the

article with a brief discussion.

Model

Stage I Components

We begin by describing the two Stage I components of TCA: HMM and the latent trait


5

regression model. The HMM model for ordinal outcome is described for ease of exposition. The

model specification and estimation procedure for categorical outcomes are similar.

Hidden Markov Model

The ordinal outcome response HMM is defined as follows. Let yitj denote the response of child i

at occasion t on outcome j, i =1,…,n; j = 1,…,J; t = 1,…,T . Specifically,

( )1, , , , ′= … …it it itj itJy y yy denotes the response vector of child i at occasion t, and

( )1, , , ,i i it iT′′ ′ ′=y y y y… … represents the complete response pattern of child i. Additionally, zit =

1,…,s,…. S, denotes the categorical latent state of child i at occasion t. Thus,

( )1, , , ,i i it iTz z z ′=z … … is the trajectory of child i through the latent space over time. A first-

order Markov chain for the latent variable assumes that the latent state at occasion t + 1 depends

upon the past latent states through the latent state at occasion t only,

( ) ( )1 1 1Pr , Pri t i i t i t itz z z z z+ +=… . Accordingly, the basic hidden Markov model contains the

following three basic sets of parameters:

(a) ( )= ττ rs , the matrix of time-homogeneous transition probabilities between latent

states, ( )1Pr it i t rsZ s Z r−= = = τ for t = 2,…, T.

(b) ( )1 11 1, , S′= α αα … , the vector of marginal-state probabilities at occasion 1 (initial

state probabilities). Marginal-state probabilities at occasion t = 2,…, T are given

by the recursive formula −′ ′= τ1t tα α .

(c) ( )Π = π jsk , the matrix of state-conditional response probabilities, where k is the

index of outcome category, k = 1,…,K, and for any t,


6

( )π = = =Prjsk ijt itY k Z s . (1)

Assuming conditional independence between responses, given the latent state, the

marginal probability of a response pattern yi is given by

( ) ( ) ( )∈ =

= =∑ ∏D 1

Pr Pr Pr ,i

T

i i it it itt

Z zz

y z y (2)

with the summation over the entire state space D of size ST. Furthermore,

( ) δ

= =

= = π∏∏ ( , )

1 1Pr ijt

J Ky k

it it jskj k

Z sy , (3)

where

1,( , )

0, .ijt

ijtijt

y ky k

y kδ

=⎧= ⎨ ≠⎩

The distribution of latent state is given by:

( )−

=

= α τ∏1 112

Pri i t it

T

i z z zt

z (4)

Latent trait regression HMM

A latent trait regression model 9 is embedded within the HMM. The latent trait model, used in

conjunction with discrete HMM, is motivated by several factors. First, in traditional HMM, at a

specific time point the outcomes within a state are assumed to be conditionally independent

given state. The assumption of conditional independence often leads to a large number of latent

states because of possible residual correlation among responses. The latent trait regression model

models dependency among multiple ordinal responses the way it is handled in item response

theory (IRT) (e.g., Lord & Novick) 17. Specifically, an extended one parameter logistic model


7

(1PL), or the Rasch model, is designed to capture conditional dependence between items given

state and time. According to the Rasch model, an underlying latent trait, θ , or random effect, is

posited to influence the conditional probabilities of how a child positively responds to an item. A

second motivation of using an IRT based model is the ability to incorporate both item-level and

individual-level contextual information.18 The incorporation of individual-level random effects

and covariates necessitates a slight change in notation. For any given time point, define for child

i, i=1,.., n, the following conditional probability:

( )π = = = θPr | , .ijsk ijt it iY k Z s (5)

Note that compared to (1), the probability ijskπ in (5) now has an index for child as well.

Formally, the latent trait regression model for ordinal response can be expressed as follows:

11

1

log , 1

ijsjs i

ijs

πβ θ

π⎛ ⎞

= +⎜ ⎟⎜ ⎟−⎝ ⎠and

1 1 ( 1)

1 1 ( 1)

log log log , 2,..., -1,1 ( ) 1 ( )

ijs ijsk ijs ijs kjsk

ijs ijsk ijs ijs k

k Kπ π π π

βπ π π π

−

−

⎧ ⎫⎛ ⎞ ⎛ ⎞+ + + +⎪ ⎪− = =⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟ ⎜ ⎟− + + − + +⎪ ⎪⎝ ⎠ ⎝ ⎠⎩ ⎭

(6)

with ( )2~ ,i itNθ σ′w γ , (7)

where γ is the vector of fixed background effects and itw is a vector of background and family

characteristics for child, i, at time t. The special log-log form in Equation (6) enforces the

constraint that the cumulative probabilities are in a monotonically increasing order (Fahrmeir and

Tutz, p.81) 19, thus avoiding the more difficult task of explicitly implementing the constraint

1 , 1... ...js jsk js Kβ β β −< < < < .. To identify the model in (6), no jskβ coefficient is estimated for the


8

last category K and 1

11

K

itjK itjkk

π π−

=

= −∑ . The introduction of the latent trait regression model leads

to the following modification of the likelihood equation (3):

( )β β β β −

β β β β −

δθ +β + + + θ +β + + +

θ +β + + + θ +β + + += =

⎛ ⎞⎜ ⎟= = − θ γ σ θ⎜ ⎟+ +⎝ ⎠

∏ ∏∫1 1 , 1

1 1

1 1 , 11 1

( , )... ...

2

... ...1 1

ˆ ˆPr ( | , ˆ )1 1

ijtjs jsk js js ki js i js

js jsk js js ki js i js

y ke e e eJ K

it it i it ie e e ej k

e eZ s N de e

y w

(8)

Stage I Estimation and Algorithm

The EM algorithm (Dempster, Laird, & Rubin 1977) is used to estimate parameters in

Stage I. The objective here is to maximize the expected log-likelihood of the complete data,

which include both the observed y and the latent states z, given some provisional estimates for

the parameter. For each individual, the expected loglikelihood can be expressed as:

ˆ[log( ( , | )] log ( , | ) ( | , )i i i i i i i iE p p pθ θ∈

Θ = Θ Θ∑z

y , z y , z z yD

, (9)

where ( , , , )α τ β γΘ = and ˆˆ ˆ ˆˆ( , , , )α τ β γΘ = respectively denote the true parameter and the current

estimate. Given a particular sequence of latent states, one can write out ( | )i ip Θy , z as:

1 1 , 11 1

1 1 , 11 , 11 1

( , )... ...

21, , ... ...

1 1 1

ˆ ˆ( | ) ( | , )1 1


js jsk js js ki i t iti js i js

y ke e e eT J K

i i z z z i it ie e e et j k

e ep N de e

β β β β

β β β β

δθ β θ β

θ β θ βα τ θ σ θ

−

−−

+ + + + + + + +

+ + + + + + + += = =

⎛ ⎞⎜ ⎟Θ = −⎜ ⎟+ +⎝ ⎠

∏ ∏ ∏∫y , z w γ

(10)

Therefore, the overall expected log-likelihood is given by:


9

1

1

, 1

1 1 , 11 1

1 1 , 11 1

1 11 1

,1 1

... ...

... ...

ˆlog( ) ( | , )

ˆlog( ) ( | , )

log1 1

i

i t it



n S

z i ii z

n T

z z i ii t

e e e e

e e e e

p z

p

e e

e e

β β β β

β β β β

θ β θ β

θ β θ β

α

τ−

−

−

= =

= ∈ =

+ + + + + + + +

+ + + + + + + +

Θ

⎛ ⎞+ Θ⎜ ⎟⎝ ⎠

⎛ ⎞⎜ ⎟+ −⎜ ⎟+ +⎝ ⎠

∑∑

∑∑ ∑z

y

z yD

( )

( , )

1 1 1

2

1

ˆ ˆ( | , , ) ( | , ),

ˆ ˆlog ( | , ) ( | , , ) ( | , )

ijty kKn J

i i i i i ii j k

n

i it i i i i i ii

p d p

N p d p

δ

θ θ

θ σ θ θ

= ∈ = =

= ∈

⎡ ⎤⎢ ⎥Θ Θ⎢ ⎥⎢ ⎥⎣ ⎦

⎡ ⎤+ Θ Θ⎣ ⎦

∑∑∑ ∏∫

∑∑ ∫

z

z

z y z y

w γ z y z y

D

D

(11)

where ˆ( | , , )i i ip θ Θz y is computed with Bayes Theorem,

ˆ ˆ ˆ ˆ1 1 , 11 1

ˆ ˆ ˆ ˆ1 1 , 11 1

ˆ ˆ11

( , )ˆ ˆ... ...

2ˆ ˆ ˆ ˆ... ...1

ˆ ...

ˆ ˆ( | , )1 1ˆ( | , , )



js jski js

y ke e e eK

i ite e e ek

i i ie e

e e Ne e

pe

β β β β

β β β β

β β

δθ β θ β

θ β θ β

θ β

θ σ

θ

−

−

+ + + + + + + +

+ + + + + + + +=

+ + + +

⎛ ⎞⎜ ⎟−⎜ ⎟+ +⎝ ⎠Θ =

∏ w γ

z yˆ ˆ1 , 1

1

ˆ ˆ ˆ ˆ1 1 , 11 1

( , )ˆ ...

2ˆ ˆ ˆ ˆ... ...1

.

ˆ ˆ( | , )1 1

ijtjs js ki js


y ke eK

i it ie e e ek

e N de e

β β

β β β β

δθ β

θ β θ βθ σ θ

−

−

+ + + +

+ + + + + + + +=

⎛ ⎞⎜ ⎟−⎜ ⎟+ +⎝ ⎠

∏∫ w γ

and so is ˆ( | , )i ip Θz y ,

ˆ ˆ ˆ ˆ( | , , ) ( | , )ˆ( | , ) .ˆ ˆ( | , )i i jsk i i

i ii jsk i

p pp

pβ θ

β θΘ =

y z z α τz y

y

By adding a Lagrangian term into each of the first two terms in (11) to respectively reflect the

constraints 1

1

11

1i

S

zz

α=

=∑ and ,1

1S

ir issτ

=

=∑ , we solve, in the M-step, the maximum likelihood

estimates through the following equations:

1

11

ˆ( | , )n

i ii

s

p z s

Nα =

= Θ=∑ y

, (11a)

, 1

1 2

, 11 2

ˆ( , | , )

ˆ( | , )

n T

i t it iti t

rs n T

i t iti t

p z r z s

p z rτ

−= =

−= =

= = Θ=

= Θ

∑∑

∑∑

y

y (11b)


10

Equation (11a) and (11b) are the updating equations for parameters α and τ . For the

regression parameters in the latent trait model, we do not have a closed form updating equation

for β and γ . Instead, we use Gauss-Hermite quadrature to approximate the integrals in (11) and

an indirect maximization algorithm to update β and γ (section 7.4.3 of Fahrmeir and Tutz) 19.

The E-step, which involves the computation for the joint probabilities in (11), can be

efficiently implemented using graphical model theory 21. The essence of this implementation is a

decomposition of a high-dimensional joint probability distribution into modules of lower

dimensions. Briefly, we create a directed acyclic graph for the associated statistical model. The

directed graph, which consists of nodes (variables) and arcs, is a pictorial representation of the

conditional dependence relations of the statistical model. The directed acyclic graph is then

transformed into a so-called junction tree, which forms the basis of efficient computational

schemes 22.

To construct the directed acyclic graph, and to transform the graph into a junction tree,

we adapted the Bayes Net Toolbox of K. Murphy 23, which can be downloaded from

http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html (accessed November 1, 2006). Note

that this publicly available toolbox does not contain a user-friendly graphical interface, nor does

it contain the latent trait regression model. This set of Matlab functions for ordinal data and

latent trait regression, together with a point-and-click graphical user interface, can be

downloaded from our website (provided at production stage).

In fitting HMM models, the number of hidden (latent) states is usually not known a priori

and a model selection step prior to parameter estimation is required. Model selection has been an

important and actively researched area in HMMs 24 and in finite mixture models generally 25. In

this application, we follow an approach suggested by Huang & Bandeen-Roche 26. We employed


11

the Bayesian Information Criterion (BIC) 27 to determine the optimal number of states, by fitting

a range of HMMs to the data without including any covariates.

Stage II

Stage II consists of a global optimization procedure for finding the most probable trajectory for

each individual, functional principal component analysis (FPCA) of the trajectories, and K-mean

clustering for grouping similar trajectories.

Global optimization of trajectory

Here we define a child’s most probable trajectory as the sequence of hidden states that

best explains the child’s responses:

11

*

,...,,max ( ,..., | )

ti iti iz z i iP z z= Θz y

To obtain child i’s most likely trajectory, *iz , a naïve method is to identify the most probable

state at each time point and then link the selected states to form the trajectory. However, this

method may lead to suboptimal solutions. For example, the most likely state at time t does not

transition to the most likely state at time t+1. We applied Viterbi’s algorithm 28 to evaluate the

globally optimal solution. The Viterbi algorithm calculates the most likely state sequence of the

HMM given an observation sequence. The most likely sequences up to each time point are

stored. By tracing back stored probability tables from the last time point, the algorithm can

determine the most probable trajectory up to the present point.

Functional Principal Component Analysis (FPCA) and Trajectory Clustering

Consider a most probable trajectory for each individual in the form * * *1 2i i iTz z z→ → →

produced by the global optimization procedure. Our objective is to sort the collection of


12

individual trajectories * * *1 2{ , 1, , }i i iTz z z i n→ → → = into groups through some form of

clustering algorithm. Because the states are qualitative variables, it is challenging, if not

impossible, to directly use traditional clustering methods such as K-mean, which are designed to

handle quantitative information. To address this problem, we propose to project each individual

trajectory into its respective conditional probability space. In this way, we are able to use

probabilities in defining a distance metric between different states. In other words, when the data

space for clustering is a sequence of conditional probability tables, we can treat the data as

functional data and apply functional data analysis methods.

The clustering of functional data proceeds in two steps: (1) functional principal

component analysis (FPCA), and (2) K-mean clustering. The FPCA 29 aims to identify a

parsimonious set of coefficients that explain a high percentage of the variance among a

potentially large number of trajectories in a high dimensional space. For each child i, given the

person’s most probable trajectory, * * *1 2i i iTz z z→ → → , “filtered” model-based conditional

probabilities of the response vector at each time point t is given by:

* JKitj it itjit

p = {P(y = k |z , ) (y ,k), 1,..., , 1,..., } I , 1, ...,k K j J i nδΘ = = ∈ = , (13)

where I = [0, 1], * denotes the term-by-term product between two vectors, and the hat notation is

used to indicate that the parameter is estimated from the data. Designed for clustering purpose,

the filtered conditional probability vector in (13) implies that for two different responses for the

same item at the same time point, their Euclidean distance will necessarily be the root sum of

squares of the two conditional probabilities for the respective observed response. Compared to

other versions of conditional probabilities (such as without “filtering”), our preliminary


13

experiments showed that the filtered conditional probabilities in (13) led to better separated

clusters.

Formally, the first step of clustering, FPCA, proceeds as follows: Form the trajectory

profile of each individual by concatenating the vectors it

p over time points:

JKTi i1 i2 it = {p , p , ..., p } I , 1, ..., ,p t T∈ =

Next, compose a trajectory matrix using n trajectory profiles as its columns.

JKT n1 2 n

= {p ; p ; ... p } I ×Μ ∈ .

Then, apply a spectral decomposition to the nonnegative definite variance-covariance matrix of

Μ :

TV = Cov (M) = R R ,Λ

where 1 kR = {u , ..., u } , iu is the ith eigenfunction and 1 k= diag( , ... )λ λΛ , λi is the ith

eigenvalue. The ith column vector iv of V can be expressed as a linear combination of the

eigenfunctions:

JKT

i i1

v = u ,ij jj

μ ε=

+∑

Finally, extract the coefficient ikε for principal component k from trajectory i by first subtracting

each column vector iv of V from the mean vector iμ and then projecting the residuals onto the

eigenfunctions,

JKT

T Ti i k j k

1(v - ) u =( u )u .ij ik

jμ ε ε

=

=∑ (14)

An individual’s trajectory profile is characterized by the directions and magnitudes of the

coefficients of the eigenfunctions associated with each principal component.


14

In order to reduce the dimension of the vector that enters into the subsequent clustering

algorithm, we need to determine the number of principal components, m, that will explain a

sufficiently large proportion of the variance-covariance matrix, V. There are two benefits in

using principal components. First, it increases the computational efficiency of the subsequent

clustering algorithm. More importantly, it will significantly reduce the noise level inherent in the

data. The final determination of m was through an examination of a scree-plot of the

eigenvalues. Analogous to a scree-plot in factor analysis, the eigenvalues typically drop off

rather quickly after a few principal components. We use this information to select the appropriate

value for m. Once the value for m is determined, the first m entries of the coefficients for iv in

the eigenspace will enter into the K-mean clustering algorithm. The K-mean algorithm will not

be described here. Readers are referred to Hartigan 30.

An important issue that arises from using K-mean is that the number of desired clusters,

c, must be determined prior to implementing the algorithm. There is no standard statistical

method for choosing c. We apply a metric, the CH index 31, which is recommended by Milligan

and Cooper 32. Using simulated data, Milligan and Cooper reported that the CH index provided

the best performance among 30 indexes. The CH index is defined by the following equation:

( ) /( 1)( )( ) /( )

B c cCH kW c n c

−=

−,

where ( )B c is the between-cluster variance, ( )W c is the within-cluster variance.

Data

Data sources are the 1989 and 1994 waves of the National Longitudinal Survey of Youth

(NLSY79) and the 1990-1994 Children of the NLSY (CoNLSY). The data used in this report


15

was collected in 1990, 1992, and 1994. This analysis focuses on cognitive and social

development in children five to nine years old. In this sample of N=367 (52% female, 29%

African-American and 20% Hispanic), the majority of children were 5 years old in 1990.

Eight measures, selected from several instruments, were compiled together to form the

developmental profiles. Five variables were selected from the Behavior Problem Index (BPI) 33 :

anti-social, anxious, peer conflict, headstrong, and hyperactive behaviors. Three variables that

measure academic acheivement were selected from Peabody Individual Achievement Test

(PIAT) scores: math, oral reading, and reading comprehension. For the purpose of convenient

interpretation for clinicians and social scientists we trichomtomized BPI subscales and PIAT into

three categories (0,1,2) based on nationally normed percentiles with cutpoints at 15% and 85%.

BPI categories were then re-ordered so that movement from 0 to 2 represented increasingly

better behavior and conformed to the direction of the PIAT classification.

Because characteristics of the child’s home may also be associated with child behavior

problems 34, we controlled for the home environment in the latent trait regression using the Home

Observation for Measurement of the Environment (HOME) Inventory 35. Two HOME scores (low,

medium, or high) measuring cognitive stimulation and emotional support were used in the latent

trait regression. Another important potential predictor of behavior problems is the child’s exposure

to alcohol during pregnancy, which has been linked to child behavior problems. A binary indicator

for maternal drinking during pregnancy of once a month or more was created. Together with the

HOME scores, drinking during pregnancy is included as a covariate in latent trait regression

models described below.

Variables used to classify maternal drinking status were only asked in 1989 and 1994 of

individuals who reported drinking within 30 days of the NLSY interview. Heavy episodic drinking


16

(HED) is typically classified as 5 or more drinks on one occasion (Naimi et al., 2003), however

recent evidence suggests that for women, the HED drinking amount should be 4 or more drinks on

a single occasion (Wechsler & Austin, 1998). Unfortunately, the only cut point reported in the

NLSY in both years was 6 or more drinks. Consequently, the HED measure,a binary indicator that

six or more drinks were consumed on at least one day in the month prior to the 1989 NLSY

interview date, likely misclassifies some HED mothers as non-HED.

Results

Insert Figure 1 here

Figure 1 shows the BIC values for models with 2 to 6 hidden states. Based on the BIC

criterion, the 5-state model was selected. Figure 2 displays the profiles of the five states. Each

bar shows the conditional probability distribution of the response categories for a specific item

within each state. As noted above, BPI was rescaled such that higher values for items imply

fewer behavior problems. For PIAT items, higher values indicate better academic achievement.

The prevalence of states 1 through 5 at the first time point are respectively 16, 27, 31, 17, and 10

percent.


To aid interpretation of the five states, we order them approximately from the best to the worst.

Beginning with State 1, we see high probabilities of few behavior problems and high

probabilities of good academic achievement across all items. We denote this state as “no

problems” or NP. Children in State 2 demonstrate behavior problem probabilities that are quite

similar to children in State 1. However, their probabilities of low academic achievement are

higher. We denote this state as “good behavior, poor academics” or GB/PA. State 3 is almost the


17

reverse of GB/PA in that children in this state demonstrate elevated probabilities of behavior

problems, but their academic achievement probabilities are similar to those of NP. We denote

this state as “poor behavior, good academics” or PB/GA. Children whose profiles fit that of State

4 demonstrate high probabilities of behavior problems and academic achievement probabilities

similar to that of GB/PA. We denote this group as “very poor behavior, poor academics”

(VPB/PA). Children whose profiles match those of State 5 show slightly lower behavior problem

probabilities, but a poorer academic achievement than children in state 4 or VPB/PA. We denote

this group as “poor behavior, very poor academics” (PB/VPA). The profiles for VPB/PA and

VPB/PA suggest that children who belong to these two states may be most at risk for future

problems. Table 1 summarizes these interpretations.

Insert Table 1 here

Table 2 reports latent trait regression coefficient estimates, standard errors and p-values.

Only maternal drinking during pregnancy does not show statistical significance at conventional

levels (p = 0.30). The two HOME scores show strong positive impacts on the conditional

response probabilities (p < 0.001), indicating that more supportive home environments are

associated with better behavior and academic performance.

Insert Table 2 here

Table 3 contains the transition probability from one state to another over time. Row

probabilities indicate the probability of transition from any state (row) to another state (column).

The diagonal entries therefore show the probability of remaining in each state over time. As can

be seen, all but one state are comparatively stable (~85%). The probability of staying in GB/PA

is smallest (67%), with a 15% chance of transitioning into PB/GA and an 11% chance of

transitioning into PB/VPA, and only 4% chance of transitioning into the best state, i.e. NP.


18

Figure 3 provides a “picture” to visualize the dynamics occurring between the different states.

Problematic academic performance and behavior are arranged from low to high respectively

along the x- and y- axis. To highlight the most important features of the dynamic analysis, only

paths with a higher-than-five-percent probability are displayed.

Insert Table 3 and Figure 3 here

Using the global optimization routine, we obtained the frequency distributions of the

many trajectories. The trajectory distribution indicates that 5-9 year old children in this sample

tend to stay within the same developmental state over the 4-year period, which is consistent with

the information from the transition probability matrix. Out of a total of 367 trajectories, 268 (73

%) of them showed a stable pattern of not changing to another state at any given time point. For

the remaining trajectories, the most frequent trajectories include 2 3 3→ → (3.5%), and

2 5 5→ → (3%).

The FPCA and K-mean clustering provides summarized information about the patterns of

the trajectories. Figure 4 shows the eigenvalues of the variance-covariance matrix V. From the

scree-plot, it can be seen that using more than 5 principal components leads to very little

marginal gain in total variance explained. We further compared the results using all 24 principal

components and the first five principal components and found that the results for clustering were

not sensitive to the number of components used. However, this conclusion may not generalize to

cases in which the dimension of the problem is much higher.


Figure 5 shows the how the CH index varies with the number of clusters. The plot shows

that the greatest gain in CH-index occurs at 4k = . Based on this result, we identify four distinct

clusters of trajectories.


19


Figure 6 includes two views of three-dimensional plots of the first three functional

coefficients for all subjects. The solid dots represent those subjects with stable trajectories. It can

be seen that the four K-means identified clusters are rather well separated.


Table 4 shows the relative proportions of states belonging to different clusters of the

trajectories. In this table, a state is counted in the numerator of the ratio whenever it appears in a

trajectory within the cluster. By inspecting Table 4 and the actual trajectories within each cluster,

we provide the following interpretation of the four clusters.

Cluster 1. mainly the stable and transitioning trajectories into NP, representing a group of well-

behaved and academically sound children;

Cluster 2. mainly the stable and transitioning trajectories in and out of GB/PA, some PB/VPA

and some PB/GA, representing a group of children mostly with poor school

performance but diverse behaviors.

Cluster 3. mainly the trajectories with good academic performance but poor behavior from

PB/GA.

Cluster 4. mainly the stable trajectories in and out of VPB/PA and PB/VPA, representing a

group of children with both a high level of behavioral problems and weak school

performance;

Insert Table 4 here

Maternal HED

In the sample, there are 90 (25%) children identified as having an HED mother. In this

subsection, we provide several perspectives regarding the association between maternal HED


20

status and a child’s development through separate analyses of latent state and trajectory

distributions by maternal HED status. First, Figure 7 shows the relative proportion of the five

behavior/academic performance states between 1990 and 1994 (three time points) for two groups

of children, one with HED mothers (Figure 6a) and the other without (Figure 6b). The states are

ordered from bottom (State 1) to top (State 5).


By following the probabilities of the least at-risk group, State 1 (NP), we see children of

non-HED mothers show a slightly increasing trend over time to be in this group, whereas

children of HED mothers demonstrate a slightly decreasing trend. In contrast, the proportion of

PB/VPA children is much higher for the group of HED mothers and this proportion also tends to

increase over time. Table 5 shows the result of two-sample z-tests between the proportions of the

first two groups combined (NP and GB/PA), and the worst group (PB/VPA) within two distinct

groups characterized by maternal alcohol status. There is a significantly higher proportion of

children that have no or little problem in group of non HED mothers. On the other hand, for the

PB/VPA state, while the tests between non HED and HED mothers are only marginally

significant, the higher level of PB/VPA children in the HED group is consistent across the three

time points.

Insert Table 5 here

This finding is supported by a closer inspection of the distributions of the trajectories and

the trajectory clusters. Table 6 shows the distribution of trajectories by cluster. Note that,

because the trajectories are individual-specific, the same trajectories of states may have slightly

different conditional probabilities, and therefore be classified into different clusters. Cluster 1,

which is dominated by trajectories 1 1 1→ → (all NP), and those starting from State 2 (GB/PA)


21

or State 3 (PB/GA) but ending up in State 1 (NP), does not show any dependency on the

mother’s HED. The proportion of children with HED mothers is no different from that of the

larger sample, i.e. 25%. Clusters 4 and 2 show slightly higher proportions of HED mothers than

the average, i.e. 27% and 29% respectively, and they both are characterized by poor academic

performance. This is mainly due to the majority trajectories in these two clusters are 2 2 2→ →

(all GB/PA) and 5 5 5→ → (all PB/VPA), which are both characterized by 35% HED mothers.

In contrast, cluster 3, dominated by State 3 (PB/GA), has the smallest percent of HED mothers.

Based on these patterns, maternal HED seems to have a negative association with both child

behavior and academic performance, but the negative association with academic performance is

especially pronounced.

Insert Table 6 here

Discussion

In this paper, we describe the TCA methodology and its usefulness in discerning patterns

of developmental trajectories over time. This extended HMM approach allowed us to obtain

more nuanced insight into multiple child outcomes. The relationship between maternal alcohol

consumption and child behavior problems is consistent with previous research. However, a

slightly more nuanced finding is that children of HED mothers appear to be at greater risk for

both behavior problems and poor academic performance, but the latter seems to be more strongly

linked to maternal HED, either with or without behavior problems. Although a relatively large

proportion of children whose mothers do not have alcohol problems demonstrate elevated

behavior problems scores, we found that their academic achievement (State 3 PB/GA) remains in


22

the average to good range. Children of alcohol mothers were found to be more likely to fall into

States 2 (GB/PA) and 5 (PB/VPA), characterized by poor academic achievement problems.

It is also worth noting that children with poor academic performance, but no behavior

problems (State 2) are more like to transition to states characterized by poor behavior regardless

of maternal drinking status. If this result were to persist in larger, more representative samples,

further investigation of causal links would be warranted. Taken at face value, the finding

suggests that policies that enhance early academic performance may have positive behavioral

effects.

The findings also suggest that children of HED mothers may be at risk for a double

handicap in later life since poor academic performance is likely to impair later educational

attainment and elevated behavior problems have been linked to later drug and alcohol problems.

A finding from the trajectory analysis that may be of interest to developmental psychologists and

other social scientists is that children with more severe behavioral problems tend to remain that

way during this period of their life, regardless of maternal alcohol status. This is consistent with

policies that intervene earlier in a child’s life rather than after the child has started school.

There are several technical limitations to the current form of TCA approach. The first

limitation is the uncertainty ignored in the two-stage estimation procedure. Literature on

comparison between multi-stage and simultaneous estimation (e.g., Scott and Ip) 36 show that the

significance of tests is often over-reported in multi-stage procedure. The extent to which the

clustering of trajectories will be affected by uncertainty in the parameters from the first stage is

not clear and merits further investigation such as using a Bayesian approach.

A second limitation of TCA is the absence of convariates in modeling the transition

probabilities. For example, one might expect that home environment variables affect transition


23

from poorer developmental states to better states. We note that Equation (11b), which estimates

transition probabilities, can be modified in a way that is analogous to Equation (11c) and (12) to

incorporate covariates. However, because the number of transition probabilities quickly increases

with the number of latent states, the solution for modeling transition is likely to be dictated by

sample sizes. We are currently exploring constrained regression-type models to address this

issue.

.


24

Reference List

1. Grant BF. Estimates of US children exposed to alcohol abuse and dependence in the family.

American Journal of Public Health 2000; 90(1):112-115.

2. Snow JA, Miller DJ, Salkever DS. Parental use of alcohol and children's behavioural health:

a household production analysis. Health Econ. 1999; 8(8):661-683.

3. Wall TL, Garcia-Andrade C, Wong V, Lau P, Ehlers CL. Parental history of alcoholism and

problem behaviors in native-American children and adolescents. Alcoholism-Clinical

and Experimental Research 2000; 24(1):30-34.

4. Jones AS. Maternal Alcohol Abuse/Dependence, Children's Behavior Problems, and Home

Environment: Estimates from the National Longitudinal Survey of Youth Using

Propensity Score Matching*. J.Stud.Alcohol Drugs 2007; 68(2):266-275.

5. Schuckit MA, Chiles JA. Family History As A Diagnostic Aid in 2 Samples of Adolescents.

Journal of Nervous and Mental Disease 1978; 166(3):165-176.

6. Aronson M, Kyllerman M, Sabel KG, Sandin B, Olegard R. Children of Alcoholic Mothers -

Developmental, Perceptual and Behavioral-Characteristics As Compared to Matched

Controls. Acta Paediatrica Scandinavica 1985; 74(1):27-35.


25

7. Rabiner LR. A Tutorial on Hidden Markov-Models and Selected Applications in Speech

Recognition. Proceedings of the Ieee 1989; 77(2):257-286.

8. MacDonald IL. Hidden Markov and other models for discrete-valued time series /. London:

Chapman & Hall, 1997.

9. Rijmen, F., Vansteelandt, K., and DeBoeck, P. Latent class modesl for diary method data:

parameter estimation by local computations. Psychometrika . 2006.

Ref Type: In Press

10. Scott SL, James GM, Sugar CA. Hidden Markov models for longitudinal comparisons.

Journal of the American Statistical Association 2005; 100(470):359-369.

11. Bureau A, Shiboski S, Hughes JP. Applications of continuous time hidden Markov models to

the study of misclassified disease outcomes. Stat.Med. 2003; 22(3):441-462.

12. Jackson CH, Sharples LD. Hidden Markov models for the onset and progression of

bronchiolitis obliterans syndrome in lung transplant recipients. Stat.Med. 2002;

21(1):113-128.

13. LeStrat Y., Carrat F. Monitoring epidemiologic surveillance data using hidden Markov

models. Stat.Med. 1999; 18(24):3463-3478.


26

14. Smith T, Vounatsou P. Estimation of infection and recovery rates for highly polymorphic

parasites when detectability is imperfect, using hidden Markov models. Stat.Med.

2003; 22(10):1709-1724.

15. Dunson DB, Herring AH. Bayesian latent variable models for mixed discrete outcomes.

Biostatistics. 2005; 6(1):11-25.

16. Altman RM, Petkau AJ. Application of hidden Markov models to multiple sclerosis lesion

count data. Stat.Med. 2005; 24(15):2335-2344.

17. Lord FM, joint a. Statistical theories of mental test scores. Reading, Mass., Addison-Wesley

Pub. Co.: 1968.

18. Fischer GH. Linear Logistic Test Model As An Instrument in Educational Research. Acta

Psychologica 1973; 37(6):359-374.

19. Fahrmeir L, Tutz G. Multivariate statistical modelling based on generalized linear models.

Springer-Verlag, New York, 1994.

20. Dempster AP, Laird NM, Rubin DB. Maximum Likelihood from Incomplete Data Via Em

Algorithm. Journal of the Royal Statistical Society Series B-Methodological 1977;

39(1):1-38.


27

21. Lauritzen SL. The Em Algorithm for Graphical Association Models with Missing Data.

Computational Statistics & Data Analysis 1995; 19(2):191-201.

22. Jensen FV, Lauritzen SL, Olesen KG. Bayesian updating in causal probabilistic networks by

local computation. Computational Statistics Quarterly 1990; 4:269-282.

23. Murphy, K. The Bayesian net toolbox for Matlab. Computing sciences and statistics:

proceedings of the 33rd symposium on the interface . 2001.

Ref Type: Journal (Full)

24. Scott SL. Bayesian methods for hidden Markov models: Recursive computing in the 21st

century. Journal of the American Statistical Association 2002; 97(457):337-351.

25. McLachlan G, Peel D. Finite mixture models. Wiley: 2000.

26. Huang GH, Bandeen-Roche K. Building an identifiable latent class model with covariate

effects on underlying and measured variables. Psychometrika 2004; 69(1):5-32.

27. Schwarz G. Estimating Dimension of A Model. Annals of Statistics 1978; 6(2):461-464.

28. Viterbi AJ. Error bounds for convolutional codes and an asymptotically optimal decoding

algorithm. IEEE Transaction, Information Theory 1967; 13:260-269.

29. Ramsey JO, Silverman BW. Functional Data Anslysis. Springer: New York, 1997.


28

30. Hartigan JA. Clustering algorithms. New York, Wiley: 1975.

31. Calinski T, Harabasz J. A dendrite method for cluster analysis. Communications in statistics

1974; 3:1-27.

32. Milligan GW, Cooper MC. An Examination of Procedures for Determining the Number of

Clusters in A Data Set. Psychometrika 1985; 50(2):159-179.

33. Achenbach TM. Manual for the child behavior checklist and revised child behavior profile /.

Burlington, VT : T.M. Achenbach, T.M. Achenbach, 1983.

34. McCarty CA, Zimmerman FJ, Digiuseppe DL, Christakis DA. Parental emotional support

and subsequent internalizing and externalizing problems among children. Journal of

Developmental and Behavioral Pediatrics 2005; 26(4):267-275.

35. Caldwell BM, University of Arkansas at Little Rock., Center for Child Development and

Education. HOME observation for measurement of the environment /. Little Rock,

Ark. : University of Arkansas at Little Rock, University of Arkansas at Little Rock,

1984.

36. Scott SL, Ip EH. Empirical Bayes and item-clustering effects in a latent variable hierarchical

model: A case study from the national assessment of educational progress. Journal of

the American Statistical Association 2002; 97(458):409-419.


29

Table 1.

Interpretation of states

State Interpretation

State 1 No problems (NP)

State 2 Good behavior, poor academics (GB/PA)

State 3 Bad behavior, good academics (PB/GA)

State 4 Very bad behavior, poor academics (VPB/PA)

State 5 Bad behavior, very poor academics (PB/VPA)


30

Table 2

Coefficients for latent trait regression covariates and their significances

Covariates Coefficient SE t value Pr > |t|

HOME cognitive stimulation 0.30 0.06 5.12 <.001

HOME emotional support 0.30 0.06 4.71 <.002

Maternal pregnancy drinking -0.05 0.29 -0.18 0.30


31

Table 3

Transition probability table between latent states

State 1

NP

State 2

GB/PA

State 3

PB/GA

State 4

VPB/PA

State 5

PB/VPA

State 1 0.87 0.05 0.06 0.02 0.00

State 2 0.04 0.67 0.15 0.03 0.11

State 3 0.06 0.05 0.83 0.05 0.02

State 4 0.00 0.07 0.11 0.83 0.00

State 5 0.00 0.12 0.00 0.00 0.88


32

Table 4

Proportions of latent states within each cluster of trajectories

State 1

(NP)

State 2

(GB/PA)

State 3

(PB/GA)

State 4

(VPB/PA)

State 5

(PB/VPA)

HED(%) Non-

HED(%) Total

Cluster 1 0.90 0.05 0.05 0.00 0.00 15 (25) 45 (75) 60

Cluster 2 0.01 0.72 0.11 0.04 0.12 27 (28) 71 (72) 98

Cluster 3 0.04 0.04 0.88 0.03 0.01 22 (18) 98 (82) 120

Cluster 4 0.00 0.06 0.01 0.58 0.34 26 (29) 63 (71) 89


33

Table 5

Two sample z-test results by maternal alcoholism status for states 1 (NP) and 5 (PB/VPA)

Time

Proportion of

specific state non-HED mother HED mother z-value P>|z|

1 State 1 & 2 (NP+GB/PA)

0.49 0.36 -2.24 0.01

State 5 (PB/VPA) 0.08 0.14 1.51 0.07


0.53 0.42 -1.74 0.04

State 5 (PB/VPA) 0.11 0.18 1.56 0.06


0.54 0.40 -2.37 0.009

State 5 (PB/VPA) 0.12 0.19 1.23 0.11


34

Table 6

Distribution of trajectories within different clusters by maternal alcoholism status

Cluster 1 Cluster 2 Cluster 3 Cluster 4 C* Trajectory T** A % C Trajectory T A % C Trajectory T A % C Trajectory T A % o 1 1 1 47 12 26 o 2 2 2 57 20 35 o 3 3 3 87 14 16 o 5 5 5 25 9 36+ 2 1 1 4 1 25 o 5 5 5 6 2 33 - 2 3 3 10 5 50 o 4 4 4 45 10 22+ 3 1 1 2 1 50 - 2 2 3 6 0 0 + 4 3 3 5 1 20 - 4 2 5 3 1 33+ 2 3 1 1 0 0 - 2 5 5 4 1 25 + 3 3 1 3 0 0 - 2 5 5 7 2 29+ 2 2 1 1 0 0 o 3 3 3 3 0 0 - 3 3 4 3 1 33 - 2 4 4 3 1 33- 1 2 3 1 0 0 + 3 2 2 3 0 0 + 3 2 2 2 0 0 + 4 4 3 2 0 0 + 3 3 1 2 0 0 + 5 2 2 3 1 33 - 1 3 3 2 0 0 - 3 4 4 1 0 0 - 1 1 2 1 1 100 - 2 3 3 3 1 33 - 3 3 5 2 0 0 o 4 2 4 1 1 100- 1 1 3 1 0 0 0 4 4 4 2 0 0 - 3 5 5 1 1 100 + 4 4 2 1 1 100 + 5 5 2 2 1 50 - 1 1 4 1 0 0 - 1 4 4 1 1 100 - 1 3 3 1 1 100 + 3 1 1 1 0 0 + 3 3 2 1 0 0 + 4 3 1 1 0 0 - 3 5 5 1 0 0 - 1 1 3 1 0 0 - 2 2 4 1 0 0 - 1 2 4 1 0 0 - 2 2 5 1 0 0 + 2 3 1 1 0 0 + 4 4 2 1 0 0 + 4 3 3 1 0 0 + 4 2 2 1 0 0 - 3 - 17 - 21 - 15 + 10 + 13 + 12 + 3

Total 60 15 25 98 17 17 120 22 18 89 26 29

*C indicates how a trajectory change : + improving, - deteriorating, o no change

** T= total count A= number of HED mother, %= percent HED mother


35

Figure 1. BIC values for models with 2 to 6 latent states


36

Figure 2. Conditional probability profile for the 5-State model. Within each bar, dark color

represents the lowest category of outcome.


37

Figure 3. Graphical representation of the transition matrix. The x- and y-axis include respectively

problems in academic achievement and behavior. Arrows are only shown for probabilities higher

than 8%.

Good Behavior

Good Academic

State 1

10%

Poor Behavior

Good Academic

State 3

Very PoorBehavior

Poor Academic

State 4

GoodBehavior

PoorAcademic

State 2

Poor Behavior

Very poor Academic

State 5

15%6% 11% 12% 6%

11%

7%

5%


38

Figure 4. Plot of eigenvalues versus number of principal components.


39

Figure 5. Plot of CH index versus the number of clusters used in K-mean algorithm.


40

Figure 6 Plots of two views for the first three functional coefficients ( 1 2,i iε ε and 3iε in Equation (14)). Clusters 1-4 are respectively represented as circle, square, diamond, and triangle.


41

(a) (b)

Figure 7. Prevalence of states across three time points (left to right) for two groups (a) non-HED

mother, and (b) HED mother. The states are ordered from the bottom (state 1) to the top (state 5).

temporal configuration analysis 1 · order markov chain for the latent variable assumes that the...

Documents