A Model for Fetal Growth
and Diagnosis of Intrauterine Growth Restriction
Peter M. Hooper1
Damon C. Mayes2
Nestor N. Demianczuk3
1Department of Mathematical Sciences, 2Perinatal Research Centre, and 3Department of
Obstetrics and Gynecology, University of Alberta.
Summary
A model for fetal growth is developed and used to construct tools for diagnosis of intrauterine
growth restriction. Fetal weight estimates are first transformed to normally distributed
z-scores. The covariance structure over gestational ages is then estimated using a novel
regression model. The diagnostic tools include individual growth curves with error bounds,
probabilities to assess whether a fetus is small for its gestational age, and residual scores
to determine whether current growth rates are unusual. The methods were developed using
data from 13593 ultrasound examinations involving 7888 fetal subjects. The model shows
that median fetal growth velocity increases up to a gestational age of 35 weeks and then
decreases during the final weeks of pregnancy. When growth is expressed as change in log
weight, or equivalently as change proportional to current weight, the model reveals a constant
deceleration as gestational age increases from 14 to 42 weeks.
Keywords: Estimated Fetal Weight, Flexible Regression, Gestational Age, Intrauterine
Growth Restriction, Kriging, Transformation, Variogram, z-score.
No. of Figures: 8.
No. of Tables: 4.
No. of References: 23.
1Correspondence to: Peter M. Hooper, Department of Mathematical Sciences, University of
Alberta, Edmonton, Alberta, T6G 2G1, Canada (email: [email protected]).
1 Introduction
Improved models for fetal growth are needed for more accurate assessment of intrauter-
ine growth restriction. A growth restricted newborn is defined as an infant that has not
achieved its genetic growth potential in utero.1 This definition is not immediately appli-
cable as a diagnostic tool because the determination of growth potential is at present not
feasible. Consequently, many studies of fetal growth have employed a related definition of
the small-for-gestational-age (SGA) fetus; i.e., a fetus that has failed to achieve a specific
anthropometric or weight threshold by a specific gestational age. The SGA threshold is
often set at the 10th percentile of a weight distribution, conditioned on gestational age, for
some reference population.2 In some studies the weight distribution is further conditioned
on additional characteristics, such as maternal height and weight.3
Intrauterine growth restriction is associated with a variety of health risks, including
antepartum stillbirth4, and cold stress and hypoglycemia among newborns.1 Furthermore,
recent research suggests that several of the major diseases of later life, including coronary
heart disease, hypertension, type 2 diabetes, and schizophrenia, originate in impaired in-
trauterine growth and development.5−8 Evidence of growth restriction is assessed at birth,
through direct measurement of weight and other criteria, and before birth through ultrasound
measurements. In the conclusion of their review article on growth restriction, Goldenberg
and Cliver1 emphasized the potential importance of antenatal assessment and treatment:
In summary, determining whether fetuses in utero are growth restricted has be-
come one of the major tasks involved in prenatal care. Because decisions related
to surveillance, the use of invasive monitoring procedures, and ultimately delivery
of the infant itself are based on these definitions, the impact of using particular
standards for the diagnosis of growth restriction in utero potentially has a far
greater impact than using a particular set of standards at birth. The fact that
a relatively small number of studies have defined important outcomes related to
various in utero ultrasound measurements and that there is relatively little evi-
dence that making the diagnosis of growth restriction in utero changes pregnancy
outcome suggests that a substantial amount of research is necessary to better de-
fine not only the standards by which SGA is diagnosed in utero, but also the
management of those fetuses for whom a diagnosis of SGA has been made.
SGA is a dichotomous categorization of growth and is thus of limited use in quantifying
risk. Continuous diagnostic measures, such as z-scores and probabilities, may be more useful
for clinical assessment. In this article we develop tools for diagnosing growth restriction based
on a new model for the conditional distribution of estimated fetal weight given estimated
gestational age. We proceed in three steps. In Section 2 we introduce a transformation
from fetal weights to z-scores, defined so that the conditional distribution given estimated
gestational age is standard normal. The transformation uses a quadratic fit of log weight
against gestational age, plus a nonlinear transformation of the residuals to induce normal-
ity. The log quadratic fit shows that fetal growth velocity peaks at 35 weeks, consistent
with findings of Bertino et al.9 In Section 3 we model the covariance of the z-scores across
gestational ages using a novel nonlinear regression technique10. Our approach is similar to
a variogram or kriging analysis11, common in geostatistics, but it allows nonstationarity in
the covariance function. The model provides a breakdown of the variance into components,
attributable to measurement error and other factors, that vary with gestational age. In Sec-
tion 4 we introduce several growth diagnostics. Smoothed z-score and fetal weight curves
permit prediction of future values, with error bounds. SGA probabilities estimate the prob-
ability that fetal weight will fall below the 10th percentile at future points in time. Residual
z-scores allow one to assess whether current ultrasound measurements are abnormal, given
previous measurements. The diagnostics can be calculated using measurements from one or
several ultrasound examinations. The examinations can be scheduled at regular or irregular
intervals.
Our results are based on data from 13593 ultrasound examinations at the Royal Alexandra
Hospital in Edmonton, Canada. The examinations were carried out over the period 1991–
1998 and involve 7888 singleton pregnancies. For all subjects, gestational age was based on
the start of the last menstrual period, assumed to be two weeks before conception. Fetal
weight estimates were calculated from ultrasound measurements of fetal head circumference,
abdominal circumference, and femur length, using standard formulae.12,13 Table 1 shows the
distribution of the number of examinations per fetal subject, as well as summary statistics
for gestational ages and z-scores. In Section 5 we comment on these statistics and other
matters pertaining to our reference population. We also describe exclusion criteria used in
defining our data set.
Pan and Goldstein14 developed diagnostics for postnatal growth based on a covariance
model for z-scores. Their z-score transformation and covariance model are entirely different
from ours. In Section 6 we briefly describe their methods and explain why we adopted a
different approach when modeling fetal growth. In particular, we comment on a problem
inherent in applying hierarchical linear models to z-scores. In Section 7 we briefly indicate
two extensions of our methods.
2 Transformation to z-scores
Figure 1 presents a histogram of the 13593 gestational age estimates in our study and a
scatter plot of estimated fetal weight versus gestational age. A random sample of 3000
points was used for the scatter plots in Figures 1 and 2 because the pattern of variation
becomes obscured when too many points are plotted. The scatter plot of weight versus age
shows that the mean weight is a highly nonlinear function of age, and the variance of weight
increases with age. A transformation permits a tractable characterization of the conditional
weight distribution.
Let wij be the estimated fetal weight (in grams) obtained from the jth ultrasound ex-
amination of the ith subject, and let aij be the corresponding estimated gestational age (in
weeks). We employ a transformation from weights to z-scores zij = f(wij, aij) defined so
that the conditional z-score distribution, given gestational age, is well approximated by the
standard normal distribution, for ages between 14 and 42 weeks. Our transformation
$$z = f(w, a) = \frac{g(\log(w) - q(a))}{h(a)} \qquad (1)$$
is constructed in four steps.
First, transform to the natural logarithm of weight. A plot of log weight versus gestational
age is displayed in Figure 2(a). A logarithmic transformation simplifies the analysis by
reducing curvature in the growth curve and by stabilizing the variance about the mean. The
variance of log weight remains nearly constant as a function of age.
Second, subtract an estimated quadratic curve from log weight. The function
$$q(a) = 0.718 + 0.322a - 0.00339a^2 \qquad (2)$$
was obtained by a weighted least squares regression of log(wij) against aij, with weights
inversely proportional to the square root of the number of examinations per subject. This
weighted fit provides a compromise between the fit obtained by weighting each subject
equally ($0.726 + 0.321a - 0.00338a^2$) and the fit obtained by weighting each examination
equally ($0.709 + 0.322a - 0.00340a^2$). Differences among the fitted values are negligible,
varying from those in (2) by at most 0.003. Figure 2(b) shows that the quadratic curve
provides a highly accurate representation of growth on the log scale.
Third, transform the residuals r = log(w) − q(a) to approximate normal scores g(r).
This transformation was motivated by the observation that the distribution of the residuals
is approximately the same for all gestational ages. The residual distribution has heavy tails,
compared with the normal distribution, and is slightly skewed to the left. Normal scores are
approximated by fitting a linear spline with six knots to a normal probability plot of the
residuals:
$$g(r) = -1.787 + 1.435r + 0.883(r + 0.6)_+ + 1.598(r + 0.4)_+ + 2.951(r + 0.2)_+ - 0.884(r - 0.2)_+ - 1.810(r - 0.4)_+ - 2.903(r - 0.6)_+ \qquad (3)$$
Here $x_+ = \max\{0, x\}$. The plot of g in Figure 3(a) closely approximates the normal
probability plot of the residuals {rij}. A linear spline approximation was selected because its
inverse is easily calculated; i.e.,
$$g^{-1}(x) = 1.245 + 0.697x - 0.265(x + 2.648)_+ - 0.176(x + 2.184)_+ - 0.110(x + 1.401)_+ + 0.0215(x - 1.346)_+ + 0.0725(x - 2.543)_+ + 0.547(x - 3.378)_+. \qquad (4)$$
Fourth, divide the approximate normal score g(r) by an estimate of its standard deviation
h(a). An examination of the values g(rij) shows that the mean remains close to zero as
gestational age aij varies, but the standard deviation varies more substantially from one. A
quadratic spline with one knot provides a good fit when regressing $g(r_{ij})^2$ against $a_{ij}$. As
in the regression of log fetal weight against gestational age at (2), different weighted least
squares fits yield similar results. The standard deviation is estimated using the square root
of the fitted spline:
$$h(a) = \left[ 0.342 + 0.0285a - 0.000166a^2 - 0.00495\{(a - 32)_+\}^2 \right]^{1/2}. \qquad (5)$$
Usually it would be preferable to standardize the residuals using a robust estimate of scale
before transforming to normality. This preliminary rescaling is not required here because
the scale of the residuals varies only slightly. Figure 3(b) shows that h(a) varies from 0.85
to 1.04. Further data analysis (not shown) indicates that the standard normal distribution
provides a good approximation to the empirical distribution of the zij when conditioning on
gestational age groups of one or more weeks.
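The four steps can be collected into a short function. The following is a minimal sketch, assuming only the published coefficients in (2), (3), and (5); the function names are illustrative and not from the original analysis. Because the published coefficients are rounded, the output will not reproduce the worked examples in Section 4 to three decimals.

```python
import math

# Coefficients copied from equations (2), (3), and (5). They are rounded as
# published, so results can differ slightly from values quoted in the text.
KNOT_TERMS = [(0.883, -0.6), (1.598, -0.4), (2.951, -0.2),
              (-0.884, 0.2), (-1.810, 0.4), (-2.903, 0.6)]


def q(a):
    """Quadratic fit of log weight against gestational age, equation (2)."""
    return 0.718 + 0.322 * a - 0.00339 * a ** 2


def g(r):
    """Linear spline mapping residuals to approximate normal scores, equation (3)."""
    return -1.787 + 1.435 * r + sum(c * max(0.0, r - k) for c, k in KNOT_TERMS)


def h(a):
    """Estimated standard deviation of g(r) at age a, equation (5)."""
    knot = max(0.0, a - 32.0)
    return math.sqrt(0.342 + 0.0285 * a - 0.000166 * a ** 2 - 0.00495 * knot ** 2)


def z_score(w, a):
    """Transformation (1): weight in grams and age in weeks to a z-score."""
    return g(math.log(w) - q(a)) / h(a)
```

For a fixed age the z-score increases with weight, so `z_score(1200, 30)` exceeds `z_score(1100, 30)`.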
For fixed gestational age a, the transformation (1) from w to z is invertible; i.e.,
$$w = f^{-1}(z, a) = \exp\{g^{-1}(h(a)z) + q(a)\}. \qquad (6)$$
Percentiles and predictions can thus be calculated in terms of the z-scores, then re-expressed
in terms of the original weights. In particular, the median of the normal distribution is
0, and $g^{-1}(0) \approx 0.005$, so the conditional median weight can be estimated as
$\exp\{0.005 + q(a)\}$. Plots of this median growth curve and its derivative are given in Figure 4. We
observe that the weight velocity increases up to 35 weeks and decreases thereafter. This
phenomenon of velocity peaking a few weeks before birth has a simpler expression in terms
of proportional growth. If growth is expressed as change in log weight, or equivalently as
change proportional to current weight, then the phenomenon can be viewed as a consequence
of constant deceleration; i.e., the log median weight velocity dq/da = 0.322 − 0.00678a
decreases linearly with gestational age between 14 and 42 weeks.
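The location of the velocity peak can be checked numerically from the median curve. The sketch below assumes the rounded coefficients in (2) and the approximation $g^{-1}(0) \approx 0.005$; the function names are illustrative. Setting the derivative of the velocity to zero gives $q'(a)^2 = -q''(a)$, which yields the closed form used for `analytic_peak`.

```python
import math


def q(a):
    """Quadratic log-weight curve, equation (2)."""
    return 0.718 + 0.322 * a - 0.00339 * a ** 2


def median_weight(a):
    """Median weight exp{0.005 + q(a)}, using g^{-1}(0) ~ 0.005 from the text."""
    return math.exp(0.005 + q(a))


def median_velocity(a):
    """Derivative of the median weight, q'(a) * median_weight(a), in grams per week."""
    return (0.322 - 0.00678 * a) * median_weight(a)


def peak_velocity_age(lo=14.0, hi=42.0, step=0.01):
    """Grid search for the age at which the median velocity is largest."""
    n = int(round((hi - lo) / step))
    ages = [lo + i * step for i in range(n + 1)]
    return max(ages, key=median_velocity)


# Closed form: q'(a)^2 = -q''(a), i.e. 0.322 - 0.00678 a = sqrt(0.00678).
analytic_peak = (0.322 - math.sqrt(0.00678)) / 0.00678
```

Both the grid search and the closed form place the peak a little after 35 weeks, in line with the value reported above.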
Bertino et al.9 studied growth patterns for several one-dimensional morphometric traits
and found that growth velocities peak earlier: at 18 weeks for head measures, such as
biparietal diameter and head circumference, 20 weeks for femur length, and 22 weeks for
abdomen circumference. Velocities for three-dimensional traits peak later. They found that
the velocity of head volume (assumed proportional to cubic biparietal diameter) peaks at 31
weeks. Estimated fetal weight is a three-dimensional trait related to both head and body
measurements, so a peak velocity at 35 weeks is consistent with these earlier findings.
3 Covariance function for z-scores
Variation in the z-scores can be attributed to several factors: (i) between-subject variation
due to differences in growth rates averaged over time, (ii) error in the estimated date of
conception used to determine gestational age, (iii) within-subject variation due to changes
in fetal growth rates over time, and (iv) error in the measurements used to calculate the
estimated fetal weight. The first two factors contribute to long-term dependence among
z-scores for a randomly sampled fetus. Subjects with higher over-all growth rates tend to be
larger and so have positive z-scores. A late estimate for date of conception yields estimated
gestational ages that are less than the true ages, and so increases a subject’s z-scores. The
third factor results in a decrease in the correlation between a subject’s z-scores as the interval
between examinations increases. The effects of the fourth factor can be usefully modeled
as random errors distributed independently in successive examinations. This independence
assumption appears reasonable given sufficient time between examinations. In our data set,
at least four days elapsed between examinations.
Let Wj denote the weight obtained at gestational age aj for a subject randomly sampled
from our population. Let Zj = f(Wj, aj) be the corresponding z-score. We model the
joint distribution of (Z1, . . . , Zn), conditioned on (a1, . . . , an), as multivariate normal with
E(Zj) = 0 and Var(Zj) = 1. We further model Zj as a sum of two independent, normally
distributed random variables:
Zj = Tj + Uj. (7)
The Uj are interpreted as measurement errors and the Tj as latent scores; i.e., the observed
z-scores with the unobserved errors removed. The Uj are assumed to be independent random
variables with E(Uj) = 0. The Tj are modeled as correlated random variables with E(Tj) =
0. We have
Var(Tj) + Var(Uj) = 1, (8)
but each variance component is a function of aj. The variables Tj and Uj, while not directly
observable, play an important role in interpretation and prediction.
Our growth diagnostics are based on a model for the covariance of the latent scores. The
covariance is related to the variogram; i.e., the variance of the z-score differences:
$$\mathrm{Var}(Z_1 - Z_2) = E\{(Z_1 - Z_2)^2\} = 2 - 2\,\mathrm{Cov}(T_1, T_2) \quad \text{for } a_1 \neq a_2. \qquad (9)$$
It is advantageous to work with differences because this removes inter-subject variability.
We initially investigated a variogram model, similar to that of Donnelly et al.11, where
Var(Z1 − Z2) is assumed to be a function of |a1 − a2|. This approach yields a poor fit to
empirical covariance estimates because of nonstationarity. As we shall see below, the latent
score variance increases substantially with gestational age. Diggle and Verbyla15 modeled
nonstationary covariance functions using kernel weighted local linear regression modeling
of the sample variogram. We also adopt a nonstationary covariance model, but employ an
alternative regression technique.
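Identity (9) follows in a few lines from decomposition (7) and constraint (8). The derivation below assumes, as in the model, that the errors $U_1$ and $U_2$ are independent of each other and of the latent scores, and that $E(Z_j) = 0$.

$$\begin{aligned}
\mathrm{Var}(Z_1 - Z_2) &= \mathrm{Var}(Z_1) + \mathrm{Var}(Z_2) - 2\,\mathrm{Cov}(Z_1, Z_2) \\
&= 1 + 1 - 2\,\mathrm{Cov}(T_1 + U_1,\ T_2 + U_2) \\
&= 2 - 2\,\mathrm{Cov}(T_1, T_2).
\end{aligned}$$

Rearranging gives $\mathrm{Cov}(T_1, T_2) = E\{1 - \tfrac{1}{2}(Z_1 - Z_2)^2\}$, so the quantity $1 - \tfrac{1}{2}(z_{ij} - z_{ik})^2$ can serve as a regression response for the latent covariance.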
We model the covariance Cov(T1, T2) as a function of the ages a1 ≤ a2 using a linear
combination of logistic basis functions:
$$c_T(a_1, a_2) = \frac{\sum_{k=1}^{K} \delta_k \exp(\beta_{0k} + \beta_{1k} a_1 + \beta_{2k} a_2)}{\sum_{k=1}^{K} \exp(\beta_{0k} + \beta_{1k} a_1 + \beta_{2k} a_2)}. \qquad (10)$$
This model, referred to as an adaptive logistic basis (ALB) model, can be viewed as a
neural network.10 The potential complexity of the fit is controlled by K, the number of
basis functions. As K increases, cT (a1, a2) can approximate arbitrary continuous functions.
Choosing K is analogous to choosing the bandwidth in kernel smoothing15. Our parameter
estimates are based on identity (9). For fixed K, the parameters in (10) are obtained by
minimizing
$$\sum d_{ijk} \left\{ 1 - \tfrac{1}{2}(z_{ij} - z_{ik})^2 - c_T(a_{ij}, a_{ik}) \right\}^2, \qquad (11)$$
where the summation is over all subjects i with at least two ultrasound examinations, and
all pairs of examinations (j, k) with aij < aik. We employ equal weights dijk = 1 for reasons
discussed below. The complexity parameter K is selected by cross-validation. We find that
K = 3 provides a good fit. Parameter estimates are listed in Table 2. The parameters are
not interpretable because model (10) is over-parameterized; i.e., without loss of generality,
one vector (β0k, β1k, β2k) can be set to zero.10
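Model (10) and criterion (11) can be written compactly as a softmax-weighted average of the coefficients $\delta_k$. The sketch below is illustrative only: the parameter values `DELTA` and `BETA` are hypothetical and are not the Table 2 estimates, and no optimizer is included.

```python
import math


def alb_covariance(a1, a2, delta, beta):
    """Evaluate model (10): a softmax-weighted average of the delta_k.

    delta: list of K coefficients delta_k.
    beta:  list of K triples (beta_0k, beta_1k, beta_2k).
    """
    scores = [b0 + b1 * a1 + b2 * a2 for b0, b1, b2 in beta]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return sum(d * w for d, w in zip(delta, weights)) / sum(weights)


def alb_objective(pairs, delta, beta):
    """Criterion (11) with equal weights d_ijk = 1.

    pairs: iterable of (z_j, z_k, a_j, a_k) examination pairs with a_j < a_k.
    """
    total = 0.0
    for zj, zk, aj, ak in pairs:
        response = 1.0 - 0.5 * (zj - zk) ** 2
        total += (response - alb_covariance(aj, ak, delta, beta)) ** 2
    return total


# Hypothetical parameters for illustration only; NOT the Table 2 estimates.
DELTA = [0.5, 0.7, 0.9]
BETA = [(0.0, 0.0, 0.0), (-2.0, 0.05, 0.02), (-6.0, 0.15, 0.05)]
```

Adding the same constant triple to every entry of `BETA` leaves `alb_covariance` unchanged, which is the over-parameterization noted above.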
Our ALB covariance model is recommended on an empirical basis. A comparison of
values from (10) with averages of $1 - \tfrac{1}{2}(z_{ij} - z_{ik})^2$ over subsets of the data in (aij, aik)
neighborhoods indicates a good, parsimonious fit. We used an ALB model primarily because
of our familiarity with this method. We expect that other flexible regression methods, such as
multivariate adaptive regression splines16 and triogram models17, would give roughly similar
results.
A potential problem in using a regression model to estimate the covariance function is
that the function can fail to be nonnegative definite; e.g., the estimated covariance matrix
for (T1, . . . , Tn) may have a negative eigenvalue. Diggle and Verbyla15 described an example
where this occurs. We investigated nonnegative definiteness for the ALB model using the
parameter estimates in Table 2 and ages 14.0, 14.1, . . . , 42.0. All eigenvalues for the 281×281
matrix are positive, so our estimated covariance function appears to be nonnegative definite.
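A check of this kind can be sketched without an eigenvalue routine. The example below swaps in a Cholesky factorization, which succeeds only when the matrix is positive definite (a slightly stricter condition than nonnegative definiteness); it is not the eigenvalue computation described above. The covariance function `toy_cov` is a hypothetical stand-in, since the Table 2 estimates are not reproduced here.

```python
import math


def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only for positive definite input."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: not positive definite
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


def covariance_matrix(cov, ages):
    """Build the symmetric matrix with entries cov(min(a, b), max(a, b))."""
    return [[cov(min(a, b), max(a, b)) for b in ages] for a in ages]


def toy_cov(a1, a2):
    """Hypothetical stand-in for the fitted c_T; not the Table 2 model."""
    return 0.8 * math.exp(-(a2 - a1) / 20.0)


ages = [14.0 + 0.5 * i for i in range(57)]  # 14.0, 14.5, ..., 42.0
```

Calling `is_positive_definite(covariance_matrix(toy_cov, ages))` returns `True` for this stand-in; replacing `toy_cov` with a fitted covariance function and using a finer age grid would mirror the check described in the text.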
One may consider alternative criteria for fitting the covariance model. An approach based
on the joint likelihood of the z-scores is appealing from a theoretical perspective but does
not appear to be tractable. A weighted least squares criterion (11) is easily implemented.
We have followed Diggle and Verbyla15 in assigning equal weights dijk. The contribution of
subject i to the fitted model is thus proportional to ni(ni − 1)/2, where ni is the number
of ultrasound examinations. Equal weights dijk can be motivated as follows. The ALB
model is flexible enough, allowing larger K, to estimate cT (a1, a2) locally within the region
14 ≤ a1 < a2 ≤ 42. Given a small neighbourhood in this region, each subject i has at most
one pair of ages (aij, aik) in the neighbourhood. Subjects are weighted equally at this local
level, for all subjects contributing information for the neighbourhood. Subjects with more
examinations contribute information about cT (a1, a2) for more local neighbourhoods, and so
should be assigned greater overall weight.
The preceding argument is not conclusive. The effect of dependencies among the z-score
differences has been ignored, and a justification based on local fitting is less compelling when
a simple ALB model with K = 3 is selected. One might prefer to define dijk as a decreasing
function of ni, so that subjects contribute more equally to the fitted model. The choice of
weights is likely not of great importance in this application, given the large sample size, but
the issue warrants further study.
The estimated covariance function is plotted in Figure 5(a). The upper boundary of
the plot determines the variance estimate $s_T^2(a) = c_T(a, a)$. The error variance estimate is
$s_U^2(a) = 1 - s_T^2(a)$. We observe from Figure 5(a) or expression (10) that $s_T^2(a)$ increases
from 0.64 at a = 14 weeks to 0.92 at a = 42 weeks, so $s_U^2(a)$ decreases from 0.36 to 0.08.
This decrease in the error variance appears reasonable. We would expect that morphometric
traits are measured with greater proportional accuracy as the fetus develops.
The effect of measurement error on estimated fetal weight can be summarized in terms
of average proportional error, defined as follows. Let T and U be independent zero mean
normal random variables with respective variances $s_T^2(a)$ and $s_U^2(a)$, put Z = T + U, and
define
$$\text{average proportional error} = E\left\{ \frac{|f^{-1}(Z, a) - f^{-1}(T, a)|}{f^{-1}(T, a)} \right\}. \qquad (12)$$
Evaluation of (12) via simulation shows that average proportional error is about 6% at 14
weeks, 5% at 29 weeks, 4% at 36 weeks, and 3% at 42 weeks. These values are substantially
smaller than the lowest error rates of about 8% reported in the literature.18 This discrepancy
might be explained as follows. The 8% error rate is based on comparisons of estimated fetal
weight with birth weight, and so accounts for variation from several sources: error in measur-
ing ultrasound parameters, error in estimating gestational age, and error in predicting weight
from ultrasound measurements due to limitations of the prediction formula. The average
proportional error (12) reflects measurement error only. It would appear that measurement
error accounts for roughly half of the 8% error reported in earlier studies.
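The simulation behind (12) can be sketched as follows. This is a minimal Monte Carlo illustration, assuming the rounded coefficients in (2), (4), and (5) and the latent variances quoted above (0.64 at 14 weeks, 0.92 at 42 weeks); the sample size and seed are arbitrary choices, not those of the original study.

```python
import math
import random

INV_TERMS = [(-0.265, -2.648), (-0.176, -2.184), (-0.110, -1.401),
             (0.0215, 1.346), (0.0725, 2.543), (0.547, 3.378)]


def q(a):
    """Quadratic log-weight curve, equation (2)."""
    return 0.718 + 0.322 * a - 0.00339 * a ** 2


def h(a):
    """Standard deviation function, equation (5)."""
    knot = max(0.0, a - 32.0)
    return math.sqrt(0.342 + 0.0285 * a - 0.000166 * a ** 2 - 0.00495 * knot ** 2)


def g_inv(x):
    """Inverse spline, equation (4)."""
    return 1.245 + 0.697 * x + sum(c * max(0.0, x - k) for c, k in INV_TERMS)


def f_inv(z, a):
    """Inverse transformation (6): z-score and age to weight in grams."""
    return math.exp(g_inv(h(a) * z) + q(a))


def average_proportional_error(a, var_t, n=20000, seed=0):
    """Monte Carlo estimate of (12) given the latent variance s_T^2(a)."""
    rng = random.Random(seed)
    sd_t, sd_u = math.sqrt(var_t), math.sqrt(1.0 - var_t)
    total = 0.0
    for _ in range(n):
        t = rng.gauss(0.0, sd_t)  # latent score T
        u = rng.gauss(0.0, sd_u)  # measurement error U
        w_latent = f_inv(t, a)
        total += abs(f_inv(t + u, a) - w_latent) / w_latent
    return total / n
```

Calling `average_proportional_error(14, 0.64)` and `average_proportional_error(42, 0.92)` gives estimates near the 6% and 3% figures reported above, up to simulation noise and coefficient rounding.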
Since z-scores have variance one, the covariances in Figure 5(a) can be interpreted as
correlations between z-scores at different gestational ages. The correlations quantify the
limitations in our ability to predict future z-scores, and hence to predict future weights. For
example, our error in predicting Z2 at age a2 given Z1 at age a1 has variance $1 - c_T^2(a_1, a_2)$.
These limitations are due in part to measurement error, but are also the result of variation
inherent in the latent scores; i.e., the apparently random variation over time in actual growth
rates. This important source of variation can be quantified through the estimated latent score
correlation function
$$r_T(a_1, a_2) = \frac{c_T(a_1, a_2)}{s_T(a_1)\, s_T(a_2)}, \qquad (13)$$
plotted in Figure 5(b). If the latent score T1 at age a1 were observable, then our error in
predicting T2 at age a2 would have variance $s_T^2(a_2)\{1 - r_T^2(a_1, a_2)\}$.
4 Growth diagnostics
Our diagnostic measures are based on a kriging method similar to that of Donnelly et
al.11. The method employs the following well-known property of the multivariate normal
distribution.19 Suppose Y is a multivariate normal random vector with mean vector µ and
covariance matrix Σ. Let Y , µ, and Σ be partitioned,
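$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \sigma_{12} \\ \sigma_{12}^T & \sigma_{22} \end{pmatrix}, \qquad (14)$$
where $Y_2$, $\mu_2$, and $\sigma_{22}$ are scalars. If $\Sigma_{11}$ is nonsingular, then the conditional distribution of $Y_2$
given $Y_1 = y_1$ is normal with mean $\mu_2 + \sigma_{12}^T \Sigma_{11}^{-1}(y_1 - \mu_1)$ and variance $\sigma_{22.1} = \sigma_{22} - \sigma_{12}^T \Sigma_{11}^{-1} \sigma_{12}$.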
We apply this result to predict latent scores and corresponding weights at various gestational
ages, to estimate the probability that a fetus is small for its gestational age (SGA), and to
detect rapid changes in growth through residual scores.
Prediction of latent scores and weights
To illustrate the construction of prediction intervals, suppose we want to predict the latent
score of a fetus at 40 weeks. If we have no information about the fetus then we predict a latent
score of T = 0. An unconditional 80% prediction interval can be defined as
$\pm z_{.10}\, s_T(40) = \pm 1.28\sqrt{0.908}$, or −1.22 ≤ T ≤ 1.22. We have set the coverage probability to 80%, rather than
95%, because the 10th percentile is often used as the SGA threshold. Taking measurement
error into account, it seems reasonable to regard a fetus as SGA if its latent score falls
below the 10th percentile; i.e., below the unconditional 80% prediction interval. Applying
transformation (6), we predict a weight of $W = f^{-1}(0, 40) = 3472$ grams at 40 weeks, with
80% bounds 2934 ≤ W ≤ 4109.
Now suppose we have estimated fetal weights from examinations at 28, 30, and 32 weeks.
Let Y1 be the corresponding vector of z-scores and let Y2 be the latent score T at a = 40
weeks. We then have $\mu = (0, 0, 0, 0)^T$,
$$\Sigma = \begin{pmatrix} 1.000 & 0.815 & 0.794 & 0.609 \\ 0.815 & 1.000 & 0.836 & 0.672 \\ 0.794 & 0.836 & 1.000 & 0.734 \\ 0.609 & 0.672 & 0.734 & 0.908 \end{pmatrix}, \qquad (15)$$
and $\sigma_{22.1} = 0.358$. If the estimated fetal weights at 28, 30, and 32 weeks are 1100, 1200, and
1300 grams, respectively, then the corresponding z-scores are −0.413, −1.441, and −1.952,
and the predicted (conditional mean) latent score at 40 weeks is −1.407. The conditional 80%
prediction bounds are $-1.407 \pm 1.28\sqrt{0.358}$, or −2.173 ≤ T ≤ −0.641. We predict a weight
of $W = f^{-1}(-1.407, 40) = 2859$ grams at 40 weeks, with 80% bounds 2393 ≤ W ≤ 3178.
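The conditional mean and variance in this example can be reproduced from matrix (15) with a small linear solve. The sketch below uses plain Gaussian elimination to stay self-contained; the helper names are illustrative, and the numerical inputs are copied from the worked example.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def conditional_normal(sigma11, sigma12, sigma22, y1):
    """Conditional mean and variance of Y2 given Y1 = y1 for zero-mean normals."""
    weights = solve(sigma11, sigma12)  # Sigma11^{-1} sigma12
    mean = sum(w * y for w, y in zip(weights, y1))
    variance = sigma22 - sum(w * s for w, s in zip(weights, sigma12))
    return mean, variance


# Values copied from matrix (15) and the worked example in the text.
SIGMA11 = [[1.000, 0.815, 0.794],
           [0.815, 1.000, 0.836],
           [0.794, 0.836, 1.000]]
SIGMA12 = [0.609, 0.672, 0.734]
SIGMA22 = 0.908
Z_SCORES = [-0.413, -1.441, -1.952]

mean, variance = conditional_normal(SIGMA11, SIGMA12, SIGMA22, Z_SCORES)
half_width = 1.28 * math.sqrt(variance)  # 80% bounds use z_.10 = 1.28
lower, upper = mean - half_width, mean + half_width
```

The computed mean, variance, and bounds agree with the values −1.407, 0.358, and (−2.173, −0.641) given above to within rounding.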
The estimated covariance function can be used to predict latent scores at all gestational
ages between 14 and 42 weeks. The term “prediction” as used here includes prediction of
unobserved random variables realized in the past, present, or future. The predicted scores
provide an estimate (with error bounds) for the individual fetal growth curve. Such plots
are easily automated and updated to reflect new information. Figure 6 shows plots for
the hypothetical example described above. The plot suggests normal growth prior to 28
weeks and abnormally slow growth between 28 and 32 weeks. The plot also demonstrates a
“regression to the mean” effect; i.e., the predicted growth curves bend toward the population
mean as one extrapolates beyond the available ultrasound data. This behavior reflects a real
phenomenon modeled in the covariance function; e.g., periods of relatively slow growth are
often followed by periods of relatively fast growth.
SGA probabilities
A fetus is typically defined to be SGA if its weight falls below the 10th percentile for its
population. We suggest that a fetus be viewed as SGA if its latent score falls below the 10th
percentile; i.e., below the lower dashed line in Figure 6. This alternative definition takes
into account the effects of measurement error on estimated fetal weight and allows one to
estimate the probability that a fetus is SGA at various gestational ages, conditioning on
available ultrasound data. In the example considered above, the probability that the fetus
is SGA at 40 weeks, given the three ultrasound observations, is estimated as
$$P\{Y_2 \le -1.22 \mid Y_1 = y_1\} = \Phi\left(\{-1.22 - (-1.407)\}/\sqrt{0.358}\right) = 0.62, \qquad (16)$$
where Φ is the standard normal cumulative distribution function. Figure 7 shows a plot
of the SGA probabilities. The SGA plot re-expresses information in the z-score plot on
a probability scale that clinicians may find easier to interpret. The rapid increase in the
SGA probability between 28 and 32 weeks again indicates unusually slow growth during this
period.
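The probability in (16) needs only the standard normal distribution function. A minimal sketch, with illustrative function names and the inputs taken from the worked example:

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sga_probability(threshold, cond_mean, cond_var):
    """P(latent score <= threshold | data) under a normal conditional distribution."""
    return normal_cdf((threshold - cond_mean) / math.sqrt(cond_var))


# Threshold -1.22, conditional mean -1.407, and variance 0.358 from the text.
p_40_weeks = sga_probability(-1.22, -1.407, 0.358)
```

As a sanity check, with no ultrasound data the conditional mean is 0 and the variance is 0.908, and `sga_probability(-1.22, 0.0, 0.908)` returns approximately 0.10, as expected for a 10th percentile threshold.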
Residual scores
The preceding diagnostic measures involve predictions conditioned on all available z-scores.
It is also of interest to check whether a new z-score is unusual, given previous observations.
A simple approach would be to compare the apparent growth rate since the last ultrasound
with the typical rate indicated in the velocity curve of Figure 4(b). In our example, the
apparent growth rate between 28 and 32 weeks is 50 grams per week, very low compared
with the typical rate of about 175 grams per week. This comparison leaves open the question
whether differences of this magnitude are truly abnormal compared with typical fluctuations
in growth rates.
The degree of abnormality can be assessed by calculating residual scores. These are
essentially the same as the “conditional norms” suggested by Pan and Goldstein14. In ex-
pression (14), let Y2 be the new z-score and let Y1 be the vector of previous z-scores. The
residual score is obtained by standardizing Y2; i.e., subtract its conditional mean and then
divide by its conditional standard deviation. In our example, the residual score at 28 weeks
is −0.413, the same as the z-score since there are no earlier observations. The residual
score at 30 weeks is $\{-1.441 - (-0.336)\}/\sqrt{0.188} = -2.55$. The residual score at 32 weeks is
$\{-1.952 - (-0.947)\}/\sqrt{0.134} = -2.75$. In theory, the residual scores for a randomly sampled
fetus are distributed independently with a standard normal distribution. The validity of this
result was examined by cross-validation, as described below. Scores as low as −2.55 or −2.75
are highly unlikely, especially when occurring in succession, and provide strong evidence of
growth restriction between 28 and 32 weeks.
Cross-validation
We investigated the validity of our model and diagnostics by randomly dividing the fetal
subjects into a training group and a test group of roughly equal size, fitting the model using
the training group, and applying the diagnostics to both groups. There were 3982 subjects
with 6868 examinations in the training group and 3906 subjects with 6725 examinations in
the test group. We found essentially no differences when comparing diagnostics for training
and test groups. This result was expected given the simplicity of the model (only 24 param-
eters estimated for the z-score transformation and covariance function) relative to the size of
the training set. The empirical distribution of the z-scores, conditioned on gestational age, is
close to standard normal for both groups. We were unable to examine coverage of prediction
intervals for latent scores because latent scores are not observable. We did, however, investi-
gate coverage of prediction intervals for z-scores by examining the empirical distribution of
the residual scores.
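The split described above is made at the subject level, so that all examinations for one fetus fall in the same group. A minimal sketch of such a split follows; the record layout `(subject_id, age, z_score)` and the exact half-and-half assignment are assumptions for illustration, since the original randomization scheme is not specified.

```python
import random


def split_subjects(exams, seed=0):
    """Randomly assign each subject to the training or test group.

    exams: list of (subject_id, age, z_score) tuples. All examinations for a
    subject go to the same group, so the two groups share no subjects.
    """
    rng = random.Random(seed)
    subjects = sorted({subject for subject, _, _ in exams})
    rng.shuffle(subjects)
    train_ids = set(subjects[: len(subjects) // 2])
    train = [e for e in exams if e[0] in train_ids]
    test = [e for e in exams if e[0] not in train_ids]
    return train, test
```

Splitting by subject rather than by examination avoids leaking a fetus's correlated z-scores across the two groups.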
Summary statistics for residual scores, listed in Table 3, are based on all examinations
that are preceded by at least one examination; i.e., examinations where the residual score
differs from the z-score. Correlations between sequential pairs of residual scores are not
significantly different from zero (P-value > 0.1). The distribution of the residual scores is
close to standard normal, but with slightly larger variance and heavier tails. Further analysis
suggests that the conditional variance given gestational age increases slightly with age,
consistent with the larger kurtosis. (Skewness and excess kurtosis are zero for normal distributions.)
Prediction intervals for z-scores are slightly liberal; e.g., a nominal 80% interval would have
roughly 78% probability of coverage.
5 Reference Population
Our data originated from the computerized files of the Obstetrical Ultrasound Information
System in the Fetal Assessment Unit of the Royal Alexandra Women’s Health Program.
The Royal Alexandra Women’s Health Program is located at the Royal Alexandra Women’s
Centre, the sole tertiary care referral facility for high risk pregnancies in Edmonton. As a
result, our data likely include a somewhat higher proportion of cases with low fetal weight
than would be seen at other hospitals in the region. We would thus expect our growth
curves and weight distributions to be slightly lower than those obtained from studies of
healthier populations. This downward bias can be seen, for example, in a comparison with
distributions described by Ott.18
Some authors argue that reference norms should be based on populations displaying
normal variation in health, while others contend that norms should be developed and in-
terpreted separately for individual institutions.3 Clearly our presented norms have the best
applicability to the local region that originated the raw data. Failing the availability of local
data, the norms can be applied by other tertiary care centers, since most ultrasounds are
performed for a similar mix of indications in such units. To date, no other norm addressing
the full spectrum of anticipated growth is available. Goldenberg and Cliver1 recommend use
of common standards to permit more meaningful comparisons of studies relating SGA to
adverse outcomes. While we agree with this view, we note that a z-score transformation and
covariance model developed in one region may be usefully applied in others when attempting
to quantify health risks associated with growth restriction. Further studies relating z-scores
to outcomes are needed to allow more meaningful interpretation of these diagnostics.
Accurate estimates of gestational age are extremely important when assessing growth.
Table 4 shows the effect of error in gestational age on the evaluation of z-scores. Overestimating
the gestational age reduces the z-score and may result in a false SGA assessment.
The effects of errors are more pronounced earlier in the pregnancy because, as Figure 2
shows, the rate of change in log weight is greatest at early gestational ages. Ultrasound scans
performed late in the first trimester or early in the second provide an accurate method for dating
pregnancies.3,20 In our study, using last menstrual period date for calculation of gestational
age was deemed suitable because all subjects had an ultrasound prior to 20 weeks gestation
with ultrasound measurements corresponding to within one week of measurements expected
based on the last menstrual period date. Most of these early scans were not used to estimate
fetal weight because the required measurements were not sufficiently reliable. The routine
performance of early scans, however, may account for the bimodal distribution of gestational
ages seen in Figure 1.
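The qualitative pattern in Table 4 can be illustrated with a simplified stand-in for the transformation f. The following Python sketch assumes a quadratic median curve for log weight and a constant standard deviation. The coefficients `C0`, `C1`, `C2`, the value `SD_LOG_WEIGHT`, and the function names are hypothetical choices made for illustration; they are not the fitted transformation, so the output is not expected to reproduce the entries of Table 4.

```python
import math

# Hypothetical coefficients for a concave quadratic median of log weight,
# q(a) = C0 + C1*a + C2*a**2, and a hypothetical constant standard deviation.
# These are NOT the fitted values from this paper.
C0, C1, C2 = 0.63, 0.324, -0.0034
SD_LOG_WEIGHT = 0.13


def q(age_weeks):
    """Assumed median log weight at a given gestational age in weeks."""
    return C0 + C1 * age_weeks + C2 * age_weeks ** 2


def simple_z_score(weight_g, age_weeks):
    """Simplified stand-in for z = f(w, a): standardized log-weight residual."""
    return (math.log(weight_g) - q(age_weeks)) / SD_LOG_WEIGHT


def z_under_age_error(true_age, error_weeks):
    """Score assigned to a median-weight fetus when its age is misjudged."""
    median_weight = math.exp(q(true_age))
    return simple_z_score(median_weight, true_age + error_weeks)


errors = [-3, -2, -1, 0, 1, 2, 3]
table = {a0: [round(z_under_age_error(a0, e), 2) for e in errors]
         for a0 in (14, 27, 40)}
```

Because the assumed median curve is increasing and concave, overestimated ages give negative scores, and a given error in gestational age shifts the score more at 14 weeks than at 40 weeks.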
Table 1 lists frequencies and summary statistics for groups of subjects receiving different
numbers of ultrasound examinations. Variation in the number of examinations was due to
concerns related to the health of the mother and/or fetus. This raises the possibility of bias
in the estimated z-score transformation and covariance model caused by differing patterns of
growth among the groups. We found essentially no differences among the groups, however,
so this potential source of bias appears to have had little effect. Average gestational age,
average z-score, and standard deviation of z-scores do not vary significantly among groups.
The average time between examinations is related inversely to the number of examinations.
The standard deviations reported in Table 1 refer to variation among examinations, not
subjects, and thus reflect both between-subject and within-subject variation. A one-way
analysis of variance of the z-scores was carried out for each of groups 2 through 8. The
within-subject variance was nearly the same for all groups: 0.25 ± 0.03. Further investigation
(data not shown) revealed no differences among groups with regard to the joint distribution
of z-scores and fetal weights.
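The within-subject variance from a one-way analysis of variance is the pooled mean square of z-scores about each subject's own mean. A minimal Python sketch follows, assuming the z-scores are stored in a dictionary keyed by a subject identifier; the function name and the example data are hypothetical.

```python
from statistics import fmean


def within_subject_variance(z_by_subject):
    """Pooled within-subject mean square from a one-way ANOVA by subject."""
    ss_within = 0.0
    df_within = 0
    for scores in z_by_subject.values():
        subject_mean = fmean(scores)
        # Squared deviations of each z-score from that subject's mean.
        ss_within += sum((z - subject_mean) ** 2 for z in scores)
        df_within += len(scores) - 1
    if df_within == 0:
        raise ValueError("need at least one subject with two or more examinations")
    return ss_within / df_within


# Hypothetical z-scores for three subjects with two examinations each.
example = {"s1": [0.2, 0.8], "s2": [-1.1, -0.5], "s3": [1.0, 0.4]}
example_variance = within_subject_variance(example)
```

Applying such a calculation separately to each group of subjects with the same number of examinations gives one within-subject variance per group, which can then be compared across groups.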
In constructing our data set, we adopted the following exclusion criteria. We excluded
multiple gestations. We excluded subjects exhibiting severe growth restriction in association
with congenital abnormalities. We excluded all examinations before 14 weeks because
ultrasonic measurements at early gestational ages are unreliable. We excluded subjects with
more than eight ultrasound examinations because these subjects tend to have more severe
health problems. During a preliminary data analysis, we excluded 407 examinations where
gross errors in recorded gestational ages were suspected. Specifically, it appeared that the
order of some gestational ages had been permuted relative to the fetal weights. We excluded
an additional 139 cases, comprising all examinations that were followed by a second examination
in less than four days. This latter exclusion was motivated by an unanticipated
effect observed during preliminary data analysis: squared differences in z-scores for the same
subject tended to be substantially larger when pairs of examinations were separated by less
than four days than when pairs were separated by one to two weeks. This contradicted our
expectation that the covariance function decreases with elapsed time, as in Figure 5(a). An
explanation warranting exclusion of the cases is that an immediate callback may indicate an
unusual, and often erroneous, observation in the first examination.
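The four-day rule can be expressed as a simple filter on each subject's examination ages. The Python sketch below assumes ages are recorded in weeks (possibly fractional) and uses a hypothetical function name; it illustrates the rule as stated above and is not the procedure actually used to prepare the data.

```python
def drop_exams_with_quick_callback(exam_ages_weeks, min_gap_days=4):
    """Keep examinations that are not followed by another within min_gap_days.

    exam_ages_weeks: gestational ages (weeks) of one subject's examinations.
    """
    ages = sorted(exam_ages_weeks)
    kept = []
    for i, age in enumerate(ages):
        is_last = i == len(ages) - 1
        # The last examination has no follow-up, so it is always kept.
        if is_last or (ages[i + 1] - age) * 7 >= min_gap_days:
            kept.append(age)
    return kept


# Hypothetical subject: the first examination is followed by a callback
# about two days later, so only the first examination is dropped.
example_kept = drop_exams_with_quick_callback([28.0, 28.3, 30.0, 32.0])
```

Note that the rule removes the earlier examination of a close pair and retains the callback, in line with the explanation that the first observation is the one likely to be erroneous.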
6 Alternative models
Pan and Goldstein14 developed growth diagnostics based on a covariance model for z-scores.
In this section we briefly describe their methods and explain our preference for alternative
methods when modeling fetal growth. Their z-score transformation used the LMS method
developed by Cole and Green21, which models three quantities as smooth functions of age:
the Box-Cox power that removes skewness from the data (L), the median (M), and the
coefficient of variation (S). These smooth functions are approximated with cubic splines and
estimated via maximum penalized likelihood. The LMS method is highly flexible and appears
well suited for modeling postnatal growth. The method may also be suitable for modeling prenatal growth, but
we prefer a much simpler and more easily interpreted transformation. The quadratic fit of
log weight is highly accurate and provides a clear description of how growth velocity changes
over the pregnancy. The log transformation also stabilizes the variance.
Pan and Goldstein14 modeled the z-score covariance function by fitting a hierarchical
linear model (often called a random coefficient model) to the z-scores. Bertino et al.9 also
used hierarchical linear models to construct individual growth curves for morphometric traits.
In such models, the latent scores Tj in (7) are expressed in terms of a second level of latent
variables,
\[
T_j = \sum_{k=0}^{m} B_k \, \phi_k(a_j). \tag{17}
\]
Here the φk are specified basis functions, such as polynomials or spline functions. The
coefficient vector (B0, . . . , Bm) is random, varying from subject to subject, and is modeled
as multivariate normal with mean 0 and unknown covariance matrix ΣB. The covariances
among latent scores are determined by the elements σBkl of ΣB as
\[
\operatorname{Cov}(T_1, T_2) = \sum_{k=0}^{m} \sum_{l=0}^{m} \sigma_{Bkl} \, \phi_k(a_1) \, \phi_l(a_2). \tag{18}
\]
The measurement errors Uj are assumed to be independent of the coefficient vector. The
error variance is determined by ΣB from expression (8):
\[
\sigma_U^2(a) = 1 - \sum_{k=0}^{m} \sum_{l=0}^{m} \sigma_{Bkl} \, \phi_k(a) \, \phi_l(a). \tag{19}
\]
The covariance structure of the latent scores and measurement errors is thus determined by
ΣB, given specified basis functions φk.
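Expressions (18) and (19) take a compact form in matrix notation. The restatement below introduces the vector \(\phi(a) = (\phi_0(a), \ldots, \phi_m(a))^{\top}\) as shorthand, and assumes, as the reference to expression (8) suggests, that a z-score is the sum of a latent score and an independent measurement error and has unit variance.

```latex
\begin{align*}
\operatorname{Cov}(T_1, T_2) &= \phi(a_1)^{\top} \Sigma_B \, \phi(a_2), \\
\operatorname{Var}(T_j)      &= \phi(a_j)^{\top} \Sigma_B \, \phi(a_j), \\
\sigma_U^2(a)                &= 1 - \phi(a)^{\top} \Sigma_B \, \phi(a).
\end{align*}
```

Since ΣB is positive semidefinite, the quadratic form is non-negative, and the error variance is non-negative only if the quadratic form does not exceed one at every age considered.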
In applications of hierarchical linear models, the error variance σ_U^2(a) is assigned a specified
form, usually constant. We note a potential problem in adopting a hierarchical linear
model with constant error variance when the response variable is a z-score. The problem is
that z-scores are defined to have variance one, and σ_U^2(a) in expression (19) is usually not
constant. This problem may be inconsequential if the estimated error variance is nearly constant.
We investigated whether this was the case in an example from Pan and Goldstein14 involving
weights of males recorded at postnatal ages from 2 to 18.5 years. Using the estimated
covariances for random coefficients provided in their Table II, we calculated covariances among
the latent scores using expression (18) with m = 4 and φk(a) = (a − 11)^k. Plots of the latent
score covariance function and error variance function (19) are shown in Figure 8. We observe
substantial variability in the error variance and complex behaviour (curves crossing) in the
covariance plot for a1 < 4 and a1 > 14. These effects might be due to heteroscedasticity in
the z-scores, but are more likely artifacts associated with the fitted model; i.e., polynomial
curves tend to provide the most accurate fits at ages close to the average age.
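The dependence of the error variance on age under expression (19) is easy to see numerically. The Python sketch below evaluates expressions (18) and (19) for the basis φk(a) = (a − 11)^k with m = 4. The matrix `SIGMA_B` is a hypothetical placeholder chosen for illustration; it is not the set of estimates from Table II of Pan and Goldstein, and the function names are likewise assumptions.

```python
def basis(age, center=11.0, degree=4):
    """Polynomial basis values phi_k(a) = (a - center)**k for k = 0..degree."""
    return [(age - center) ** k for k in range(degree + 1)]


def latent_covariance(sigma_b, a1, a2):
    """Expression (18): sum over k, l of sigma_b[k][l] * phi_k(a1) * phi_l(a2)."""
    size = len(sigma_b)
    phi1 = basis(a1, degree=size - 1)
    phi2 = basis(a2, degree=size - 1)
    return sum(sigma_b[k][l] * phi1[k] * phi2[l]
               for k in range(size) for l in range(size))


def error_variance(sigma_b, age):
    """Expression (19): one minus the latent variance at the given age."""
    return 1.0 - latent_covariance(sigma_b, age, age)


# Hypothetical covariance matrix for (B_0, ..., B_4): only an intercept
# variance and a slope variance are non-zero. Placeholder values only.
SIGMA_B = [[0.0] * 5 for _ in range(5)]
SIGMA_B[0][0] = 0.8
SIGMA_B[1][1] = 0.002

error_curve = {age: error_variance(SIGMA_B, age) for age in (2, 11, 18.5)}
```

Even with only an intercept variance and a slope variance, the implied error variance is not constant: in this example it is largest at the centering age of 11 years and smaller toward either end of the age range.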
We initially tried to fit a random coefficient model to our fetal weight data but encountered
difficulties in specifying appropriate basis functions and an appropriate error variance
function. The assumption that the coefficient vector (B0, . . . , Bm) is multivariate normal
may not be justified. Its validity may depend on how the basis functions φk are defined. It
is difficult to validate the choice of a particular family of basis functions, given that few
ultrasound examinations are available for most fetal subjects. Postnatal growth data typically
have considerably more observations per subject. The results of Section 3 show that
the error variance is not constant, but decreases substantially over the pregnancy. It is not
clear how this behaviour can be incorporated into a hierarchical linear model. These
considerations led us to the tentative conclusion that, when employing z-scores to model
prenatal growth, an analysis based on identity (9) is more direct, simpler to interpret, and
easier to validate.
7 Customized norms and multivariate diagnostics
We complete our discussion of models and diagnostics for growth restriction by mentioning
two extensions currently under investigation. The first addresses a limitation of the SGA
criterion as implemented in Figure 7: a fetus may be small for reasons unrelated to its
health. Various studies have examined growth norms based on subpopulations relevant
to individual subjects.3,22 Our methods can be extended in this direction by developing
regression models that relate z-scores of birth weight to characteristics known to affect
growth; e.g., maternal height, maternal weight on admission, ethnic group, parity, and sex of
the fetus. Using such a model, one could condition on relevant characteristics when predicting
latent scores, calculating SGA probabilities, and evaluating residual scores. Conditioning
would account for some of the long-term dependence in the z-scores. Opinions vary as to
which characteristics are appropriate for developing standards of comparison.1 Most would
agree that the primary goal of further conditioning is to improve diagnosis of serious health
problems. One would thus want to condition on characteristics affecting growth potential but
not health. These effects are often confounded, however, so the issue is not easily resolved.
This article has focussed on a single measure of growth, estimated fetal weight. Our
approach can be extended to develop models and diagnostics for multivariate measures of
growth.9,23 First, develop a separate z-score transformation for each variable. Second, extend
the ALB model to describe covariances between variables and across gestational ages. Third,
construct prediction regions for latent score vectors, conditioning on relevant information.
Multivariate approaches are needed to better understand growth patterns among various
morphometric traits and their connection with health problems.
Acknowledgements
We thank Nancy Bott of the Central and Northern Alberta Perinatal Audit and Education
Program for advice and facilitation of data management and transfer. We thank Xu Xiong
of the Perinatal Research Centre, University of Alberta, for helpful comments during the
initial stage of this project. We also thank two referees for comments that led to clarification
of many points. Financial support from the Natural Sciences and Engineering Research
Council of Canada is gratefully acknowledged.
References
1. Goldenberg, R. L. and Cliver, S. P. ‘Small for gestational age and intrauterine growth restriction: definitions and standards’, Clinical Obstetrics and Gynecology, 40, 704–714 (1997).
2. Goldenberg, R. L., Cutter, G. R., Hoffman, H. J., Foster, J. M., Nelson, K. G. and Hauth, J. C. ‘Intrauterine growth retardation: standards for diagnosis’, American Journal of Obstetrics and Gynecology, 161, 271–277 (1989).
3. Gardosi, J. ‘Customized growth curves’, Clinical Obstetrics and Gynecology, 40, 715–722 (1997).
4. Gardosi, J., Mul, T. and Mongelli, M. ‘Application of a fetal weight standard to study associations between growth retardation and stillbirth’, Ultrasound Obstetrics and Gynecology, 66, 6 (2 Suppl) (1995).
5. Godfrey, K. M. and Barker, D. J. ‘Fetal nutrition and adult disease’, American Journal of Clinical Nutrition, 71, 1344S–1352S (2000).
6. Yiu, V., Buka, S., Zurakowski, D., McCormick, M., Brenner, B. and Jabs, K. ‘Relationship between birthweight and blood pressure in children’, American Journal of Kidney Disease, 33, 253–260 (1999).
7. Susser, E., Neugebauer, R., Hoek, H. W., Brown, A. S., Lin, S., Labovitz, D. and Gorman, J. M. ‘Schizophrenia after prenatal famine: further evidence’, Archives of General Psychiatry, 53, 25–31 (1996).
8. Susser, E. S., Schaefer, C. A., Brown, A. S., Begg, M. D. and Wyatt, R. J. ‘Design for the prenatal determinants of schizophrenia study’, Schizophrenia Bulletin, 26, 257–273 (2000).
9. Bertino, E., Battista, E. D., Bossi, A., Pagliano, M., Fabris, C., Aicardi, G. and Milani, S. ‘Fetal growth velocity: kinetic, clinical, and biological aspects’, Archives of Disease in Childhood, 74, F10–F15 (1996).
10. Hooper, P. M. ‘Flexible regression modeling with adaptive logistic basis functions’, Canadian Journal of Statistics, to appear.
11. Donnelly, C. A., Laird, N. M. and Ware, J. H. ‘Prediction and creation of smooth curves for temporally correlated longitudinal data’, Journal of the American Statistical Association, 90, 984–989 (1995).
12. Shepard, M. J., Richards, V. A., Berkowitz, R. L., Warsof, S. L. and Hobbins, J. C. ‘An evaluation of two equations for predicting fetal weight by ultrasound’, American Journal of Obstetrics and Gynecology, 142, 47–54 (1982).
13. Hadlock, F. P., Harrist, R. B., Sharman, R. S., Deter, R. L. and Park, S. K. ‘Estimation of fetal weight with the use of head, body, and femur measurements - a prospective study’, American Journal of Obstetrics and Gynecology, 151, 333–337 (1985).
14. Pan, H. and Goldstein, H. ‘Multi-level models for longitudinal growth norms’, Statistics in Medicine, 16, 2665–2678 (1997).
15. Diggle, P. J. and Verbyla, A. R. ‘Nonparametric estimation of covariance structure in longitudinal data’, Biometrics, 54, 401–415 (1998).
16. Friedman, J. H. ‘Multivariate adaptive regression splines’ (with discussion), Annals of Statistics, 19, 1–141 (1991).
17. Hansen, M., Kooperberg, C. and Sardy, S. ‘Triogram models’, Journal of the American Statistical Association, 93, 101–119 (1998).
18. Ott, W. J. ‘Sonographic diagnosis of intrauterine growth restriction’, Clinical Obstetrics and Gynecology, 40, 787–795 (1997).
19. Johnson, R. A. and Wichern, D. W. Applied Multivariate Statistical Analysis, Fourth Edition, Prentice-Hall, Upper Saddle River NJ (1998).
20. Mul, T., Mongelli, M. and Gardosi, J. ‘A comparative analysis of second-trimester ultrasound dating formulae in pregnancies conceived with artificial reproductive techniques’, Ultrasound in Obstetrics and Gynecology, 8, 397–402 (1996).
21. Cole, T. J. and Green, P. J. ‘Smoothing reference centile curves: the LMS method and penalized likelihood’, Statistics in Medicine, 11, 1305–1319 (1992).
22. Amini, S. B., Catalano, P. M., Hirsch, V. and Mann, L. I. ‘An analysis of birth weight by gestational age using a computerized perinatal data base, 1975–1992’, Obstetrics and Gynecology, 83, 342–352 (1994).
23. Owen, P. and Khan, K. S. ‘Fetal growth velocity in the prediction of intrauterine growth retardation in a low risk population’, British Journal of Obstetrics and Gynaecology, 105, 536–540 (1998).
Table 1: Distribution of number of ultrasound examinations per subject and summary
statistics by group.

Number     Frequency   Average     Average weeks   Average   Std. Dev.
of exams               Gest. Age   between exams   z-score   z-score
   1         4998        31.4                        0.02      0.99
   2         1509        32.0          5.6          −0.02      1.01
   3          632        32.2          3.9           0.00      1.02
   4          369        31.7          3.4          −0.07      1.01
   5          189        30.9          3.1           0.08      1.00
   6          108        30.1          3.0           0.01      1.01
   7           52        29.8          2.5           0.07      0.91
   8           31        28.7          2.4          −0.19      0.86
Table 2: Parameter estimates for the covariance model c_T(a1, a2) in expression (10).

k     δk        βk0       βk1       βk2
1    0.9770    −4.796     0.1661    0.0343
2    0.3327    −7.252    −0.0506    0.2543
3    0.5910     0         0         0
Table 3: Summary statistics describing the distribution of the residual scores for training
and test groups, using all examinations preceded by one or more examinations.

                                     Train     Test
Number of examinations used          2886      2819
Correlation with preceding exam      0.023     0.017
Mean                                 0.013     0.031
Variance                             1.091     1.130
Skewness                             0.226     0.012
Kurtosis                             1.056     0.950
Coverage for 80% interval (%)        79.6      78.1
Coverage for 95% interval (%)        93.8      93.7
Table 4: Effect of error in gestational age on evaluation of z-score. The table entries are
scores z = f(w, a) evaluated using estimated gestational age a, where w = f^{-1}(0, a0) is the
median weight at true gestational age a0.

             Error a − a0 in gestational age (weeks)
             −3      −2      −1       0       1       2       3
a0 = 14      4.4     3.5     1.8     0.0    −1.7    −2.6    −3.1
a0 = 27      2.8     1.9     1.0     0.0    −1.0    −1.6    −2.1
a0 = 40      1.2     0.8     0.4     0.0    −0.4    −0.7    −1.0
[Figures 1 and 2: graphics not reproduced. Each figure has panels (a) and (b); horizontal axes are labelled Gestational Age.]
Figure 1: (a) Histogram of estimated gestational age in weeks. (b) Scatter plot of estimated
fetal weight in grams versus estimated gestational age.
Figure 2: (a) Scatter plot of log weight versus gestational age. (b) Weekly averages of log
weight plotted against gestational age, with quadratic fit q(a) from expression (2).
[Figures 3 and 4: graphics not reproduced. Each figure has panels (a) and (b); axis labels are Residual and Gestational Age.]
Figure 3: (a) Normal scores of residuals approximated by linear spline (3). (b) Standard
deviation of the approximate normal scores as a function of gestational age, modeled by
quadratic spline (5).
Figure 4: (a) Median growth curve, a plot of the median weight estimate exp[0.005+q(a)] as
a function of gestational age. (b) Velocity curve, the derivative of the median growth curve.
[Figure 5: graphic not reproduced. Panels (a) and (b); horizontal axes are labelled Gestational Age.]
Figure 5: Estimated covariances c_T(a1, a2) and correlations r_T(a1, a2) between latent scores
at gestational ages a1 and a2, with a1 ≤ a2. In both plots, separate curves are plotted
(starting from the bottom) for a1 = 14, 16, . . . , 40, with a2 represented on the horizontal
axis. The upper boundary in the covariance plot represents the estimated variance function
s_T^2(a).
[Figures 6 and 7: graphics not reproduced. Figure 6 has panels (a) and (b); horizontal axes are labelled Gestational Age.]
Figure 6: (a) Predicted latent score plot. The dashed lines represent unconditional 80%
prediction bounds for latent scores. The middle solid line represents the predicted latent
score as a function of gestational age for a particular fetus, given z-scores from examinations
at ages 28, 30, and 32 weeks. The lower and upper solid lines represent conditional 80%
prediction bounds, given the z-scores. (b) Predicted weight plot, obtained by applying the
f^{-1} transformation in (6) to the curves in (a).
Figure 7: Probability that a particular fetus is SGA, evaluated as a function of gestational
age. The probabilities are conditioned on z-scores from examinations at 28, 30, and 32 weeks,
as described in Figure 6.
[Figure 8: graphic not reproduced. Panels (a) and (b); horizontal axes are labelled Age.]
Figure 8: Pan and Goldstein14 example. Here age is years since birth, not gestational
age. (a) Estimated covariances from (18) between latent scores at ages a1 and a2, with
a1 ≤ a2. Separate curves are plotted (starting from the bottom) for a1 = 2, 3, . . . , 17, with
a2 represented on the horizontal axis. (b) Estimated error variances from (19).