
A Model for Fetal Growth

and Diagnosis of Intrauterine Growth Restriction

Peter M. Hooper1

Damon C. Mayes2

Nestor N. Demianczuk3

1Department of Mathematical Sciences, 2Perinatal Research Centre, and 3Department of

Obstetrics and Gynecology, University of Alberta.

Summary

A model for fetal growth is developed and used to construct tools for diagnosis of intrauterine

growth restriction. Fetal weight estimates are first transformed to normally distributed

z-scores. The covariance structure over gestational ages is then estimated using a novel

regression model. The diagnostic tools include individual growth curves with error bounds,

probabilities to assess whether a fetus is small for its gestational age, and residual scores

to determine whether current growth rates are unusual. The methods were developed using

data from 13593 ultrasound examinations involving 7888 fetal subjects. The model shows

that median fetal growth velocity increases up to a gestational age of 35 weeks and then

decreases during the final weeks of pregnancy. When growth is expressed as change in log

weight, or equivalently as change proportional to current weight, the model reveals a constant

deceleration as gestational age increases from 14 to 42 weeks.

Keywords: Estimated Fetal Weight, Flexible Regression, Gestational Age, Intrauterine

Growth Restriction, Kriging, Transformation, Variogram, z-score.

No. of Figures: 8.

No. of Tables: 4.

No. of References: 23.

1Correspondence to: Peter M. Hooper, Department of Mathematical Sciences, University of

Alberta, Edmonton, Alberta, T6G 2G1, Canada (email: [email protected]).

1 Introduction

Improved models for fetal growth are needed for more accurate assessment of intrauterine

growth restriction. A growth restricted newborn is defined as an infant that has not

achieved its genetic growth potential in utero.1 This definition is not immediately applicable

as a diagnostic tool because the determination of growth potential is at present not

feasible. Consequently, many studies of fetal growth have employed a related definition of

the small-for-gestational-age (SGA) fetus; i.e., a fetus that has failed to achieve a specific

anthropometric or weight threshold by a specific gestational age. The SGA threshold is

often set at the 10th percentile of a weight distribution, conditioned on gestational age, for

some reference population.2 In some studies the weight distribution is further conditioned

on additional characteristics, such as maternal height and weight.3

Intrauterine growth restriction is associated with a variety of health risks, including

antepartum stillbirth4, and cold stress and hypoglycemia among newborns.1 Furthermore,

recent research suggests that several of the major diseases of later life, including coronary

heart disease, hypertension, type 2 diabetes, and schizophrenia, originate in impaired intrauterine

growth and development.5−8 Evidence of growth restriction is assessed at birth,

through direct measurement of weight and other criteria, and before birth through ultrasound

measurements. In the conclusion of their review article on growth restriction, Goldenberg

and Cliver1 emphasized the potential importance of antenatal assessment and treatment:

In summary, determining whether fetuses in utero are growth restricted has become

one of the major tasks involved in prenatal care. Because decisions related

to surveillance, the use of invasive monitoring procedures, and ultimately delivery

of the infant itself are based on these definitions, the impact of using particular

standards for the diagnosis of growth restriction in utero potentially has a far

greater impact than using a particular set of standards at birth. The fact that

a relatively small number of studies have defined important outcomes related to

various in utero ultrasound measurements and that there is relatively little evidence

that making the diagnosis of growth restriction in utero changes pregnancy

outcome suggests that a substantial amount of research is necessary to better define

not only the standards by which SGA is diagnosed in utero, but also the

management of those fetuses for whom a diagnosis of SGA has been made.


SGA is a dichotomous categorization of growth and is thus of limited use in quantifying

risk. Continuous diagnostic measures, such as z-scores and probabilities, may be more useful

for clinical assessment. In this article we develop tools for diagnosing growth restriction based

on a new model for the conditional distribution of estimated fetal weight given estimated

gestational age. We proceed in three steps. In Section 2 we introduce a transformation

from fetal weights to z-scores, defined so that the conditional distribution given estimated

gestational age is standard normal. The transformation uses a quadratic fit of log weight

against gestational age, plus a nonlinear transformation of the residuals to induce normality.

The log quadratic fit shows that fetal growth velocity peaks at 35 weeks, consistent

with findings of Bertino et al.9 In Section 3 we model the covariance of the z-scores across

gestational ages using a novel nonlinear regression technique10. Our approach is similar to

a variogram or kriging analysis11, common in geostatistics, but it allows nonstationarity in

the covariance function. The model provides a breakdown of the variance into components,

attributable to measurement error and other factors, that vary with gestational age. In Section

4 we introduce several growth diagnostics. Smoothed z-score and fetal weight curves

permit prediction of future values, with error bounds. SGA probabilities estimate the probability

that fetal weight will fall below the 10th percentile at future points in time. Residual

z-scores allow one to assess whether current ultrasound measurements are abnormal, given

previous measurements. The diagnostics can be calculated using measurements from one or

several ultrasound examinations. The examinations can be scheduled at regular or irregular

intervals.

Our results are based on data from 13593 ultrasound examinations at the Royal Alexandra

Hospital in Edmonton, Canada. The examinations were carried out over the period 1991–

1998 and involve 7888 singleton pregnancies. For all subjects, gestational age was based on

the start of the last menstrual period, assumed to be two weeks before conception. Fetal

weight estimates were calculated from ultrasound measurements of fetal head circumference,

abdominal circumference, and femur length, using standard formulae.12,13 Table 1 shows the

distribution of the number of examinations per fetal subject, as well as summary statistics

for gestational ages and z-scores. In Section 5 we comment on these statistics and other

matters pertaining to our reference population. We also describe exclusion criteria used in

defining our data set.

Pan and Goldstein14 developed diagnostics for postnatal growth based on a covariance

model for z-scores. Their z-score transformation and covariance model are entirely different


from ours. In Section 6 we briefly describe their methods and explain why we adopted a

different approach when modeling fetal growth. In particular, we comment on a problem

inherent in applying hierarchical linear models to z-scores. In Section 7 we briefly indicate

two extensions of our methods.

2 Transformation to z-scores

Figure 1 presents a histogram of the 13593 gestational age estimates in our study and a

scatter plot of estimated fetal weight versus gestational age. A random sample of 3000

points was used for the scatter plots in Figures 1 and 2 because the pattern of variation

becomes obscured when too many points are plotted. The scatter plot of weight versus age

shows that the mean weight is a highly nonlinear function of age, and the variance of weight

increases with age. A transformation permits a tractable characterization of the conditional

weight distribution.

Let wij be the estimated fetal weight (in grams) obtained from the jth ultrasound examination

of the ith subject, and let aij be the corresponding estimated gestational age (in

weeks). We employ a transformation from weights to z-scores zij = f(wij, aij) defined so

that the conditional z-score distribution, given gestational age, is well approximated by the

standard normal distribution, for ages between 14 and 42 weeks. Our transformation

z = f(w, a) = g(log(w) − q(a))/h(a) (1)

is constructed in four steps.

First, transform to the natural logarithm of weight. A plot of log weight versus gestational

age is displayed in Figure 2(a). A logarithmic transformation simplifies the analysis by

reducing curvature in the growth curve and by stabilizing the variance about the mean. The

variance of log weight remains nearly constant as a function of age.

Second, subtract an estimated quadratic curve from log weight. The function

q(a) = 0.718 + 0.322a − 0.00339a^2 (2)

was obtained by a weighted least squares regression of log(wij) against aij, with weights

inversely proportional to the square root of the number of examinations per subject. This

weighted fit provides a compromise between the fit obtained by weighting each subject


equally (0.726 + 0.321a − 0.00338a^2) and the fit obtained by weighting each examination

equally (0.709 + 0.322a − 0.00340a^2). Differences among the fitted values are negligible,

varying from those in (2) by at most 0.003. Figure 2(b) shows that the quadratic curve

provides a highly accurate representation of growth on the log scale.
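A weighted quadratic fit of this kind can be sketched in Python. This is an illustration on synthetic data, not the code used for the study: the simulated ages, the noise level of 0.12, the number of subjects, and the names `fit_log_quadratic` and `published_curve` are all assumptions. Note that `numpy.polyfit` applies its `w` argument to the unsquared residuals, so the square root of the least squares weight is passed.

```python
import numpy as np

def fit_log_quadratic(age, log_weight, exams_per_subject):
    """Weighted least squares fit of log weight on a quadratic in age.

    Each examination receives least squares weight 1/sqrt(n_i), where n_i is
    the number of examinations for its subject. np.polyfit multiplies the
    unsquared residuals by w, so the square root of that weight is passed.
    Returns (c0, c1, c2) for the curve c0 + c1*a + c2*a**2.
    """
    ls_weight = 1.0 / np.sqrt(exams_per_subject)
    c2, c1, c0 = np.polyfit(age, log_weight, deg=2, w=np.sqrt(ls_weight))
    return c0, c1, c2

def published_curve(a):
    """Quadratic (2), using the rounded coefficients printed in the text."""
    return 0.718 + 0.322 * a - 0.00339 * a ** 2

# Synthetic data (hypothetical): 3000 subjects with 1 to 3 examinations each,
# ages uniform on 14-42 weeks, log weights scattered around curve (2).
rng = np.random.default_rng(0)
n_i = rng.integers(1, 4, size=3000)
exams_per_subject = np.repeat(n_i, n_i)  # one entry per examination
age = rng.uniform(14.0, 42.0, size=exams_per_subject.size)
log_weight = published_curve(age) + rng.normal(0.0, 0.12, size=age.size)

c0, c1, c2 = fit_log_quadratic(age, log_weight, exams_per_subject)
```

Replacing `ls_weight` by `1.0 / exams_per_subject` or by a constant gives the two extreme weightings (each subject equal, each examination equal) between which this fit is a compromise.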

Third, transform the residuals r = log(w) − q(a) to approximate normal scores g(r).

This transformation was motivated by the observation that the distribution of the residuals

is approximately the same for all gestational ages. The residual distribution has heavy tails,

compared with the normal distribution, and is slightly skewed to the left. Normal scores are

approximated by fitting a linear spline with six knots to a normal probability plot of the

residuals:

g(r) = −1.787 + 1.435r + 0.883(r + 0.6)_+ + 1.598(r + 0.4)_+ + 2.951(r + 0.2)_+ − 0.884(r − 0.2)_+ − 1.810(r − 0.4)_+ − 2.903(r − 0.6)_+. (3)

Here x_+ = max{0, x}. The plot of g in Figure 3(a) closely approximates the normal probability

plot of the residuals {rij}. A linear spline approximation was selected because its

inverse is easily calculated; i.e.,

g^{-1}(x) = 1.245 + 0.697x − 0.265(x + 2.648)_+ − 0.176(x + 2.184)_+ − 0.110(x + 1.401)_+ + 0.0215(x − 1.346)_+ + 0.0725(x − 2.543)_+ + 0.547(x − 3.378)_+. (4)

Fourth, divide the approximate normal score g(r) by an estimate of its standard deviation

h(a). An examination of the values g(rij) shows that the mean remains close to zero as

gestational age aij varies, but the standard deviation varies more substantially from one. A

quadratic spline with one knot provides a good fit when regressing g(rij)^2 against aij. As

in the regression of log fetal weight against gestational age at (2), different weighted least

squares fits yield similar results. The standard deviation is estimated using the square root

of the fitted spline:

h(a) = [0.342 + 0.0285a − 0.000166a^2 − 0.00495{(a − 32)_+}^2]^{1/2}. (5)

Usually it would be preferable to standardize the residuals using a robust estimate of scale

before transforming to normality. This preliminary rescaling is not required here because


the scale of the residuals varies only slightly. Figure 3(b) shows that h(a) varies from 0.85

to 1.04. Further data analysis (not shown) indicates that the standard normal distribution

provides a good approximation to the empirical distribution of the zij when conditioning on

gestational age groups of one or more weeks.
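The four steps can be collected into a short Python sketch. The coefficients are the rounded values printed in (2) to (5); the function names (`pos`, `z_score`, `weight_from_z`) are ours, and the inverse simply undoes the four steps in reverse order. Because the printed coefficients are rounded, values computed this way may differ somewhat from the z-scores and weights quoted elsewhere in the text.

```python
import math

def pos(x):
    """Positive part: x_+ = max{0, x}."""
    return max(0.0, x)

def q(a):
    """Quadratic fit (2) of log weight against gestational age a (weeks)."""
    return 0.718 + 0.322 * a - 0.00339 * a ** 2

def g(r):
    """Linear spline (3) mapping log-scale residuals to approximate normal scores."""
    return (-1.787 + 1.435 * r
            + 0.883 * pos(r + 0.6) + 1.598 * pos(r + 0.4) + 2.951 * pos(r + 0.2)
            - 0.884 * pos(r - 0.2) - 1.810 * pos(r - 0.4) - 2.903 * pos(r - 0.6))

def g_inv(x):
    """Inverse spline (4); agrees with g only up to coefficient rounding."""
    return (1.245 + 0.697 * x
            - 0.265 * pos(x + 2.648) - 0.176 * pos(x + 2.184)
            - 0.110 * pos(x + 1.401) + 0.0215 * pos(x - 1.346)
            + 0.0725 * pos(x - 2.543) + 0.547 * pos(x - 3.378))

def h(a):
    """Standard deviation estimate (5) for the normal scores at age a."""
    return math.sqrt(0.342 + 0.0285 * a - 0.000166 * a ** 2
                     - 0.00495 * pos(a - 32.0) ** 2)

def z_score(w, a):
    """Transformation (1): weight w (grams) at age a (weeks) to a z-score."""
    return g(math.log(w) - q(a)) / h(a)

def weight_from_z(z, a):
    """Inverse of (1) for fixed age: z-score back to weight in grams."""
    return math.exp(g_inv(h(a) * z) + q(a))
```

For instance, `weight_from_z(z_score(1100.0, 28.0), 28.0)` should return a value very close to 1100, and `weight_from_z(-1.28, a)` gives an approximate 10th percentile weight at age `a`.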

For fixed gestational age a, the transformation (1) from w to z is invertible; i.e.,

w = f^{-1}(z, a) = exp{g^{-1}(h(a)z) + q(a)}. (6)

Percentiles and predictions can thus be calculated in terms of the z-scores, then re-expressed

in terms of the original weights. In particular, the median of the normal distribution is

0, and g^{-1}(0) ≈ 0.005, so the conditional median weight can be estimated as exp{0.005 +

q(a)}. Plots of this median growth curve and its derivative are given in Figure 4. We

observe that the weight velocity increases up to 35 weeks and decreases thereafter. This

phenomenon of velocity peaking a few weeks before birth has a simpler expression in terms

of proportional growth. If growth is expressed as change in log weight, or equivalently as

change proportional to current weight, then the phenomenon can be viewed as a consequence

of constant deceleration; i.e., the log median weight velocity dq/da = 0.322 − 0.00678a

decreases linearly with gestational age between 14 and 42 weeks.
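The location of the velocity peak can be checked directly from the quadratic q(a). The following short derivation is a sketch using the rounded coefficients; the symbols m(a) for the median weight curve, v(a) for its velocity, and c for the constant g^{-1}(0) ≈ 0.005 are introduced here for illustration only.

```latex
\begin{align*}
m(a) &= \exp\{c + q(a)\}, \qquad v(a) = m'(a) = q'(a)\, m(a), \\
v'(a) &= \{q''(a) + q'(a)^2\}\, m(a) = 0
  \;\Longleftrightarrow\; q'(a)^2 = -q''(a) = 0.00678, \\
0.322 - 0.00678\,a &= \sqrt{0.00678} \approx 0.0823
  \;\Longrightarrow\; a \approx \frac{0.322 - 0.0823}{0.00678} \approx 35.3 \text{ weeks}.
\end{align*}
```

The positive root is taken because q'(a) > 0 throughout the range 14 to 42 weeks.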

Bertino et al.9 studied growth patterns for several one-dimensional morphometric traits

and found that growth velocities peak earlier: at 18 weeks for head measures, such as

biparietal diameter and head circumference, 20 weeks for femur length, and 22 weeks for

abdomen circumference. Velocities for three-dimensional traits peak later. They found that

the velocity of head volume (assumed proportional to cubic biparietal diameter) peaks at 31

weeks. Estimated fetal weight is a three-dimensional trait related to both head and body

measurements, so a peak velocity at 35 weeks is consistent with these earlier findings.

3 Covariance function for z-scores

Variation in the z-scores can be attributed to several factors: (i) between-subject variation

due to differences in growth rates averaged over time, (ii) error in the estimated date of

conception used to determine gestational age, (iii) within-subject variation due to changes

in fetal growth rates over time, and (iv) error in the measurements used to calculate the

estimated fetal weight. The first two factors contribute to long-term dependence among


z-scores for a randomly sampled fetus. Subjects with higher overall growth rates tend to be

larger and so have positive z-scores. A late estimate for date of conception yields estimated

gestational ages that are less than the true ages, and so increases a subject’s z-scores. The

third factor results in a decrease in the correlation between a subject’s z-scores as the interval

between examinations increases. The effects of the fourth factor can be usefully modeled

as random errors distributed independently in successive examinations. This independence

assumption appears reasonable given sufficient time between examinations. In our data set,

at least four days elapsed between examinations.

Let Wj denote the weight obtained at gestational age aj for a subject randomly sampled

from our population. Let Zj = f(Wj, aj) be the corresponding z-score. We model the

joint distribution of (Z1, . . . , Zn), conditioned on (a1, . . . , an), as multivariate normal with

E(Zj) = 0 and Var(Zj) = 1. We further model Zj as a sum of two independent, normally

distributed random variables:

Zj = Tj + Uj. (7)

The Uj are interpreted as measurement errors and the Tj as latent scores; i.e., the observed

z-scores with the unobserved errors removed. The Uj are assumed to be independent random

variables with E(Uj) = 0. The Tj are modeled as correlated random variables with E(Tj) =

0. We have

Var(Tj) + Var(Uj) = 1, (8)

but each variance component is a function of aj. The variables Tj and Uj, while not directly

observable, play an important role in interpretation and prediction.

Our growth diagnostics are based on a model for the covariance of the latent scores. The

covariance is related to the variogram; i.e., the variance of the z-score differences:

Var(Z_1 − Z_2) = E{(Z_1 − Z_2)^2} = 2 − 2Cov(T_1, T_2) for a_1 ≠ a_2. (9)

It is advantageous to work with differences because this removes inter-subject variability.
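For completeness, identity (9) follows in a few lines from (7) and (8). The sketch below assumes, as the model appears to intend, that each measurement error U_j is independent of the other error and of both latent scores.

```latex
\begin{align*}
\mathrm{Var}(Z_1 - Z_2)
  &= \mathrm{Var}(Z_1) + \mathrm{Var}(Z_2) - 2\,\mathrm{Cov}(Z_1, Z_2) \\
  &= 2 - 2\,\mathrm{Cov}(T_1 + U_1,\; T_2 + U_2) \\
  &= 2 - 2\,\mathrm{Cov}(T_1, T_2),
\end{align*}
```

since Cov(T_1, U_2), Cov(U_1, T_2), and Cov(U_1, U_2) all vanish under the stated independence. The first equality in (9) holds because E(Z_1 − Z_2) = 0.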

We initially investigated a variogram model, similar to that of Donnelly et al.11, where

Var(Z1 − Z2) is assumed to be a function of |a1 − a2|. This approach yields a poor fit to

empirical covariance estimates because of nonstationarity. As we shall see below, the latent

score variance increases substantially with gestational age. Diggle and Verbyla15 modeled

nonstationary covariance functions using kernel weighted local linear regression modeling


of the sample variogram. We also adopt a nonstationary covariance model, but employ an

alternative regression technique.

We model the covariance Cov(T1, T2) as a function of the ages a1 ≤ a2 using a linear

combination of logistic basis functions:

c_T(a_1, a_2) = \frac{\sum_{k=1}^{K} \delta_k \exp(\beta_{0k} + \beta_{1k} a_1 + \beta_{2k} a_2)}{\sum_{k=1}^{K} \exp(\beta_{0k} + \beta_{1k} a_1 + \beta_{2k} a_2)}. (10)

This model, referred to as an adaptive logistic basis (ALB) model, can be viewed as a

neural network.10 The potential complexity of the fit is controlled by K, the number of

basis functions. As K increases, cT (a1, a2) can approximate arbitrary continuous functions.

Choosing K is analogous to choosing the bandwidth in kernel smoothing15. Our parameter

estimates are based on identity (9). For fixed K, the parameters in (10) are obtained by

minimizing

\sum d_{ijk} \{1 - (1/2)(z_{ij} - z_{ik})^2 - c_T(a_{ij}, a_{ik})\}^2, (11)

where the summation is over all subjects i with at least two ultrasound examinations, and

all pairs of examinations (j, k) with aij < aik. We employ equal weights dijk = 1 for reasons

discussed below. The complexity parameter K is selected by cross-validation. We find that

K = 3 provides a good fit. Parameter estimates are listed in Table 2. The parameters are

not interpretable because model (10) is over-parameterized; i.e., without loss of generality,

one vector (β0k, β1k, β2k) can be set to zero.10
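To make the form of (10) and (11) concrete, the following Python sketch evaluates the covariance model and the least squares criterion for given parameters. It is illustrative only: the estimates in Table 2 are not reproduced here, so `DELTA` and `BETA` hold hypothetical values, and the function names are ours. The criterion could be handed to a general-purpose numerical optimiser; that choice is an assumption, not a description of how the published fit was obtained.

```python
import numpy as np

def alb_covariance(a1, a2, delta, beta):
    """Evaluate model (10) at paired ages a1 <= a2.

    delta has shape (K,); beta has shape (K, 3) with rows
    (beta_0k, beta_1k, beta_2k). Returns one value per age pair.
    """
    a1 = np.atleast_1d(np.asarray(a1, dtype=float))
    a2 = np.atleast_1d(np.asarray(a2, dtype=float))
    delta = np.asarray(delta, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Linear predictors, one row per basis function: shape (K, n).
    eta = beta[:, [0]] + beta[:, [1]] * a1 + beta[:, [2]] * a2
    # Subtracting the column maximum leaves the ratio in (10) unchanged
    # and avoids overflow in exp.
    eta = eta - eta.max(axis=0)
    basis = np.exp(eta)
    basis = basis / basis.sum(axis=0)
    return delta @ basis

def alb_criterion(z_first, z_second, a_first, a_second, delta, beta):
    """Criterion (11) with equal weights d_ijk = 1, summed over age pairs."""
    target = 1.0 - 0.5 * (z_first - z_second) ** 2
    resid = target - alb_covariance(a_first, a_second, delta, beta)
    return float(np.sum(resid ** 2))

# Hypothetical parameter values for K = 3 (NOT the estimates in Table 2).
# The first beta row is set to zero, reflecting the over-parameterization.
DELTA = np.array([0.55, 0.75, 0.95])
BETA = np.array([[0.0, 0.0, 0.0],
                 [-2.0, 0.04, 0.04],
                 [-6.0, 0.10, 0.10]])
```

Because the basis functions are nonnegative and sum to one, each fitted value is a weighted average of the δk and so lies between the smallest and largest δk.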

Our ALB covariance model is recommended on an empirical basis. A comparison of

values from (10) with averages of 1 − (1/2)(zij − zik)^2 over subsets of the data in (aij, aik)

neighborhoods indicates a good, parsimonious fit. We used an ALB model primarily because

of our familiarity with this method. We expect that other flexible regression methods, such as

multivariate adaptive regression splines16 and triogram models17, would give roughly similar

results.

A potential problem in using a regression model to estimate the covariance function is

that the function can fail to be nonnegative definite; e.g., the estimated covariance matrix

for (T1, . . . , Tn) may have a negative eigenvalue. Diggle and Verbyla15 described an example

where this occurs. We investigated nonnegative definiteness for the ALB model using the

parameter estimates in Table 2 and ages 14.0, 14.1, . . . , 42.0. All eigenvalues for the 281×281

matrix are positive, so our estimated covariance function appears to be nonnegative definite.
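A check of this kind is straightforward to script. In the sketch below, `stand_in_cov` is a hypothetical placeholder for the fitted function (10), since the Table 2 estimates are not reproduced here; the grid of 281 ages matches the one described above, and the helper name `min_eigenvalue` is ours.

```python
import numpy as np

def min_eigenvalue(cov_fn, ages):
    """Smallest eigenvalue of the covariance matrix on a grid of ages.

    cov_fn(a1, a2) is assumed to be defined for a1 <= a2, so each pair of
    grid ages is ordered before evaluation; the result is symmetric.
    """
    a_lo = np.minimum.outer(ages, ages)
    a_hi = np.maximum.outer(ages, ages)
    sigma = cov_fn(a_lo, a_hi)
    return float(np.linalg.eigvalsh(sigma).min())

def stand_in_cov(a1, a2):
    """Hypothetical stand-in covariance, NOT the fitted ALB model."""
    return 0.8 * np.exp(-np.abs(a2 - a1) / 20.0)

ages = np.linspace(14.0, 42.0, 281)  # 14.0, 14.1, ..., 42.0
smallest = min_eigenvalue(stand_in_cov, ages)
```

A positive value of `smallest` indicates positive definiteness on the grid; it does not by itself prove the property for every possible set of ages.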


One may consider alternative criteria for fitting the covariance model. An approach based

on the joint likelihood of the z-scores is appealing from a theoretical perspective but does

not appear to be tractable. A weighted least squares criterion (11) is easily implemented.

We have followed Diggle and Verbyla15 in assigning equal weights dijk. The contribution of

subject i to the fitted model is thus proportional to ni(ni − 1)/2, where ni is the number

of ultrasound examinations. Equal weights dijk can be motivated as follows. The ALB

model is flexible enough, allowing larger K, to estimate cT (a1, a2) locally within the region

14 ≤ a1 < a2 ≤ 42. Given a small neighbourhood in this region, each subject i has at most

one pair of ages (aij, aik) in the neighbourhood. Subjects are weighted equally at this local

level, for all subjects contributing information for the neighbourhood. Subjects with more

examinations contribute information about cT (a1, a2) for more local neighbourhoods, and so

should be assigned greater overall weight.

The preceding argument is not conclusive. The effect of dependencies among the z-score

differences has been ignored, and a justification based on local fitting is less compelling when

a simple ALB model with K = 3 is selected. One might prefer to define dijk as a decreasing

function of ni, so that subjects contribute more equally to the fitted model. The choice of

weights is likely not of great importance in this application, given the large sample size, but

the issue warrants further study.

The estimated covariance function is plotted in Figure 5(a). The upper boundary of

the plot determines the variance estimate s_T^2(a) = c_T(a, a). The error variance estimate is

s_U^2(a) = 1 − s_T^2(a). We observe from Figure 5(a) or expression (10) that s_T^2(a) increases

from 0.64 at a = 14 weeks to 0.92 at a = 42 weeks, so s_U^2(a) decreases from 0.36 to 0.08.

This decrease in the error variance appears reasonable. We would expect that morphometric

traits are measured with greater proportional accuracy as the fetus develops.

The effect of measurement error on estimated fetal weight can be summarized in terms

of average proportional error, defined as follows. Let T and U be independent zero mean

normal random variables with respective variances s_T^2(a) and s_U^2(a), put Z = T + U, and

define

average proportional error = E\left\{ \frac{|f^{-1}(Z, a) - f^{-1}(T, a)|}{f^{-1}(T, a)} \right\}. (12)

Evaluation of (12) via simulation shows that average proportional error is about 6% at 14

weeks, 5% at 29 weeks, 4% at 36 weeks, and 3% at 42 weeks. These values are substantially

smaller than the lowest error rates of about 8% reported in the literature.18 This discrepancy


might be explained as follows. The 8% error rate is based on comparisons of estimated fetal

weight with birth weight, and so accounts for variation from several sources: error in measuring

ultrasound parameters, error in estimating gestational age, and error in predicting weight

from ultrasound measurements due to limitations of the prediction formula. The average

proportional error (12) reflects measurement error only. It would appear that measurement

error accounts for roughly half of the 8% error reported in earlier studies.
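A Monte Carlo evaluation of (12) can be sketched as follows. The sketch uses the rounded coefficients from (4) and (5) and the latent variances quoted above (0.64 at 14 weeks, 0.92 at 42 weeks); the number of draws, the seed, and the function names are assumptions. Since f^{-1}(Z, a)/f^{-1}(T, a) = exp{g^{-1}(h(a)Z) − g^{-1}(h(a)T)}, the quadratic q(a) cancels and is not needed.

```python
import math
import numpy as np

def g_inv(x):
    """Inverse spline (4), vectorised over NumPy arrays."""
    p = lambda v: np.maximum(0.0, v)  # positive part
    return (1.245 + 0.697 * x
            - 0.265 * p(x + 2.648) - 0.176 * p(x + 2.184)
            - 0.110 * p(x + 1.401) + 0.0215 * p(x - 1.346)
            + 0.0725 * p(x - 2.543) + 0.547 * p(x - 3.378))

def h(a):
    """Standard deviation function (5) for a scalar age a."""
    return math.sqrt(0.342 + 0.0285 * a - 0.000166 * a ** 2
                     - 0.00495 * max(0.0, a - 32.0) ** 2)

def average_proportional_error(a, var_t, n_draws=200_000, seed=0):
    """Monte Carlo estimate of (12) at age a with latent variance var_t."""
    rng = np.random.default_rng(seed)
    t = rng.normal(0.0, math.sqrt(var_t), n_draws)        # latent score T
    u = rng.normal(0.0, math.sqrt(1.0 - var_t), n_draws)  # measurement error U
    z = t + u
    # Ratio of weights f^{-1}(Z, a) / f^{-1}(T, a); q(a) cancels.
    ratio = np.exp(g_inv(h(a) * z) - g_inv(h(a) * t))
    return float(np.mean(np.abs(ratio - 1.0)))

err_14 = average_proportional_error(14.0, 0.64)
err_42 = average_proportional_error(42.0, 0.92)
```

With these inputs the estimates should fall near the 6% and 3% figures reported above, although exact agreement is not expected given the rounding of the published coefficients.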

Since z-scores have variance one, the covariances in Figure 5(a) can be interpreted as

correlations between z-scores at different gestational ages. The correlations quantify the

limitations in our ability to predict future z-scores, and hence to predict future weights. For

example, our error in predicting Z2 at age a2 given Z1 at age a1 has variance 1 − c_T^2(a1, a2).

These limitations are due in part to measurement error, but are also the result of variation

inherent in the latent scores; i.e., the apparently random variation over time in actual growth

rates. This important source of variation can be quantified through the estimated latent score

correlation function

r_T(a_1, a_2) = \frac{c_T(a_1, a_2)}{s_T(a_1) s_T(a_2)}, (13)

plotted in Figure 5(b). If the latent score T1 at age a1 were observable, then our error in

predicting T2 at age a2 would have variance s_T^2(a2){1 − r_T^2(a1, a2)}.

4 Growth diagnostics

Our diagnostic measures are based on a kriging method similar to that of Donnelly et

al.11. The method employs the following well-known property of the multivariate normal

distribution.19 Suppose Y is a multivariate normal random vector with mean vector µ and

covariance matrix Σ. Let Y , µ, and Σ be partitioned,

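Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \sigma_{12} \\ \sigma_{12}^T & \sigma_{22} \end{pmatrix}, (14)

where Y_2, µ_2, and σ_22 are scalars. If Σ_11 is nonsingular, then the conditional distribution of Y_2

given Y_1 = y_1 is normal with mean µ_2 + σ_12^T Σ_11^{-1}(y_1 − µ_1) and variance σ_22.1 = σ_22 − σ_12^T Σ_11^{-1} σ_12.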

We apply this result to predict latent scores and corresponding weights at various gestational

ages, to estimate the probability that a fetus is small for its gestational age (SGA), and to

detect rapid changes in growth through residual scores.


Prediction of latent scores and weights

To illustrate the construction of prediction intervals, suppose we want to predict the latent

score of a fetus at 40 weeks. If we have no information about the fetus then we predict a latent

score of T = 0. An unconditional 80% prediction interval can be defined as ±z_.10 s_T(40) =

±1.28√0.908 or −1.22 ≤ T ≤ 1.22. We have set the coverage probability to 80%, rather than

95%, because the 10th percentile is often used as the SGA threshold. Taking measurement

error into account, it seems reasonable to regard a fetus as SGA if its latent score falls

below the 10th percentile; i.e., below the unconditional 80% prediction interval. Applying

transformation (6), we predict a weight of W = f^{-1}(0, 40) = 3472 grams at 40 weeks, with

80% bounds 2934 ≤ W ≤ 4109.

Now suppose we have estimated fetal weights from examinations at 28, 30, and 32 weeks.

Let Y1 be the corresponding vector of z-scores and let Y2 be the latent score T at a = 40

weeks. We then have µ = (0, 0, 0, 0)^T,

Σ =

1.000 0.815 0.794 0.609

0.815 1.000 0.836 0.672

0.794 0.836 1.000 0.734

0.609 0.672 0.734 0.908

, (15)

and σ22.1 = 0.358. If the estimated fetal weights at 28, 30, and 32 weeks are 1100, 1200, and

1300 grams, respectively, then the corresponding z-scores are −0.413, −1.441, and −1.952,

and the predicted (conditional mean) latent score at 40 weeks is −1.407. The conditional 80%

prediction bounds are −1.407 ± 1.28√0.358 or −2.173 ≤ T ≤ −0.641. We predict a weight

of W = f^{-1}(−1.407, 40) = 2859 grams at 40 weeks, with 80% bounds 2393 ≤ W ≤ 3178.
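The z-score part of this worked example can be reproduced with a few lines of NumPy. The sketch below applies the conditional normal result (14) to the matrix in (15); the helper name `condition_last` is ours, and the conversion of scores to weights is omitted because it depends on the rounded transformation coefficients.

```python
import numpy as np

def condition_last(sigma, y1, mu=None):
    """Conditional mean and variance of the last component given the others.

    Implements the multivariate normal result (14): the last row and column
    of sigma play the role of sigma_12 and sigma_22.
    """
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.shape[0]
    mu = np.zeros(n) if mu is None else np.asarray(mu, dtype=float)
    s11, s12, s22 = sigma[:-1, :-1], sigma[:-1, -1], sigma[-1, -1]
    coef = np.linalg.solve(s11, s12)  # Sigma_11^{-1} sigma_12
    mean = mu[-1] + coef @ (np.asarray(y1, dtype=float) - mu[:-1])
    var = s22 - coef @ s12
    return float(mean), float(var)

# Covariance matrix (15): z-scores at 28, 30, 32 weeks and latent score at 40.
SIGMA = np.array([[1.000, 0.815, 0.794, 0.609],
                  [0.815, 1.000, 0.836, 0.672],
                  [0.794, 0.836, 1.000, 0.734],
                  [0.609, 0.672, 0.734, 0.908]])

mean_40, var_40 = condition_last(SIGMA, [-0.413, -1.441, -1.952])
half_width = 1.28 * np.sqrt(var_40)  # 80% prediction interval half-width
lower, upper = mean_40 - half_width, mean_40 + half_width
```

Up to rounding, `mean_40` and `var_40` should agree with the values −1.407 and 0.358 given above, and `lower` and `upper` with the bounds −2.173 and −0.641.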

The estimated covariance function can be used to predict latent scores at all gestational

ages between 14 and 42 weeks. The term “prediction” as used here includes prediction of

unobserved random variables realized in the past, present, or future. The predicted scores

provide an estimate (with error bounds) for the individual fetal growth curve. Such plots

are easily automated and updated to reflect new information. Figure 6 shows plots for

the hypothetical example described above. The plot suggests normal growth prior to 28

weeks and abnormally slow growth between 28 and 32 weeks. The plot also demonstrates a

“regression to the mean” effect; i.e., the predicted growth curves bend toward the population

mean as one extrapolates beyond the available ultrasound data. This behavior reflects a real


phenomenon modeled in the covariance function; e.g., periods of relatively slow growth are

often followed by periods of relatively fast growth.

SGA probabilities

A fetus is typically defined to be SGA if its weight falls below the 10th percentile for its

population. We suggest that a fetus be viewed as SGA if its latent score falls below the 10th

percentile; i.e., below the lower dashed line in Figure 6. This alternative definition takes

into account the effects of measurement error on estimated fetal weight and allows one to

estimate the probability that a fetus is SGA at various gestational ages, conditioning on

available ultrasound data. In the example considered above, the probability that the fetus

is SGA at 40 weeks, given the three ultrasound observations, is estimated as

P{Y_2 ≤ −1.22 | Y_1 = y_1} = Φ({−1.22 − (−1.407)}/√0.358) = 0.62, (16)

where Φ is the standard normal cumulative distribution function. Figure 7 shows a plot

of the SGA probabilities. The SGA plot re-expresses information in the z-score plot on

a probability scale that clinicians may find easier to interpret. The rapid increase in the

SGA probability between 28 and 32 weeks again indicates unusually slow growth during this

period.
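The SGA probability is a one-line calculation once the conditional mean and variance of the latent score are available. A minimal sketch, using only the Python standard library and the numbers from the example (the function names are ours):

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sga_probability(threshold, cond_mean, cond_var):
    """Probability that the latent score falls below the SGA threshold."""
    return std_normal_cdf((threshold - cond_mean) / math.sqrt(cond_var))

# Example from the text: threshold -1.22 at 40 weeks, conditional mean -1.407,
# conditional variance 0.358.
p_conditional = sga_probability(-1.22, -1.407, 0.358)

# With no ultrasound data the latent score has mean 0 and variance 0.908,
# which recovers the nominal 10% rate.
p_unconditional = sga_probability(-1.22, 0.0, 0.908)
```

Repeating the calculation over a grid of gestational ages, with the conditional mean and variance recomputed at each age, would give a curve of the kind shown in Figure 7.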

Residual scores

The preceding diagnostic measures involve predictions conditioned on all available z-scores.

It is also of interest to check whether a new z-score is unusual, given previous observations.

A simple approach would be to compare the apparent growth rate since the last ultrasound

with the typical rate indicated in the velocity curve of Figure 4(b). In our example, the

apparent growth rate between 28 and 32 weeks is 50 grams per week, very low compared

with the typical rate of about 175 grams per week. This comparison leaves open the question

whether differences of this magnitude are truly abnormal compared with typical fluctuations

in growth rates.

The degree of abnormality can be assessed by calculating residual scores. These are

essentially the same as the “conditional norms” suggested by Pan and Goldstein14. In expression

(14), let Y2 be the new z-score and let Y1 be the vector of previous z-scores. The


residual score is obtained by standardizing Y2; i.e., subtract its conditional mean and then

divide by its conditional standard deviation. In our example, the residual score at 28 weeks

is −0.413, the same as the z-score since there are no earlier observations. The residual

score at 30 weeks is {−1.441 − (−0.336)}/√0.188 = −2.55. The residual score at 32 weeks is

{−1.952 − (−0.947)}/√0.134 = −2.75. In theory, the residual scores for a randomly sampled

fetus are distributed independently with a standard normal distribution. The validity of this

result was examined by cross-validation, as described below. Scores as low as −2.55 or −2.75

are highly unlikely, especially when occurring in succession, and provide strong evidence of

growth restriction between 28 and 32 weeks.

Cross-validation

We investigated the validity of our model and diagnostics by randomly dividing the fetal

subjects into a training group and a test group of roughly equal size, fitting the model using

the training group, and applying the diagnostics to both groups. There were 3982 subjects

with 6868 examinations in the training group and 3906 subjects with 6725 examinations in

the test group. We found essentially no differences when comparing diagnostics for training

and test groups. This result was expected given the simplicity of the model (only 24 parameters

estimated for the z-score transformation and covariance function) relative to the size of

the training set. The empirical distribution of the z-scores, conditioned on gestational age, is

close to standard normal for both groups. We were unable to examine coverage of prediction

intervals for latent scores because latent scores are not observable. We did, however, investigate

coverage of prediction intervals for z-scores by examining the empirical distribution of

the residual scores.

Summary statistics for residual scores, listed in Table 3, are based on all examinations

that are preceded by at least one examination; i.e., examinations where the residual score

differs from the z-score. Correlations between sequential pairs of residual scores are not

significantly different from zero (P-value > 0.1). The distribution of the residual scores is

close to standard normal, but with slightly larger variance and heavier tails. Further analysis

suggests that the conditional variance given gestational age increases slightly with age, consistent

with the larger kurtosis. (Skewness and kurtosis are zero for normal distributions.)

Prediction intervals for z-scores are slightly liberal; e.g., a nominal 80% interval would have

roughly 78% probability of coverage.


5 Reference Population

Our data originated from the computerized files of the Obstetrical Ultrasound Information

System in the Fetal Assessment Unit of the Royal Alexandra Women’s Health Program.

The Royal Alexandra Women’s Health Program is located at the Royal Alexandra Women’s

Centre, the sole tertiary care referral facility for high risk pregnancies in Edmonton. As a

result, our data likely include a somewhat higher proportion of cases with low fetal weight

than would be seen at other hospitals in the region. We would thus expect our growth

curves and weight distributions to be slightly lower than those obtained from studies of

healthier populations. This downward bias can be seen, for example, in a comparison with

distributions described by Ott.18

Some authors argue that reference norms should be based on populations displaying

normal variation in health, while others contend that norms should be developed and interpreted

separately for individual institutions.3 Our norms are most applicable to the local region

from which the raw data originated. Where local data are unavailable, the norms can be

applied by other tertiary care centers, since most ultrasound examinations are

performed for a similar mix of indications in such units. To date, no other norm addressing

the full spectrum of anticipated growth is available. Goldenberg and Cliver1 recommend use

of common standards to permit more meaningful comparisons of studies relating SGA to

adverse outcomes. While we agree with this view, we note that a z-score transformation and

covariance model developed in one region may be usefully applied in others when attempting

to quantify health risks associated with growth restriction. Further studies relating z-scores

to outcomes are needed to allow more meaningful interpretation of these diagnostics.

Accurate estimates of gestational age are extremely important when assessing growth.

Table 4 shows the effect of error in gestational age on the evaluation of z-scores. Overesti-

mating the gestational age reduces the z-score and may result in a false SGA assessment.

The effects of errors are more pronounced earlier in the pregnancy because, as Figure 2

shows, the rate of change in log weight is greatest at early gestational ages. Ultrasound scans

performed late in the first trimester or early in the second provide an accurate method for dating

pregnancies.3,20 In our study, using last menstrual period date for calculation of gestational

age was deemed suitable because all subjects had an ultrasound prior to 20 weeks gestation

with ultrasound measurements corresponding to within one week of measurements expected

based on the last menstrual period date. Most of these early scans were not used to estimate

fetal weight because the required measurements were not sufficiently reliable. The routine

performance of early scans, however, may account for the bimodal distribution of gestational

ages seen in Figure 1.

Table 1 lists frequencies and summary statistics for groups of subjects receiving different

numbers of ultrasound examinations. Variation in the number of examinations was due to

concerns related to the health of the mother and/or fetus. This raises the possibility of bias

in the estimated z-score transformation and covariance model caused by differing patterns of

growth among the groups. We found essentially no differences among the groups, however,

so this potential source of bias appears to have had little effect. Average gestational age,

average z-score, and standard deviation of z-scores do not vary significantly among groups.

The average time between examinations is related inversely to the number of examinations.

The standard deviations reported in Table 1 refer to variation among examinations, not

subjects, and thus reflect both between-subject and within-subject variation. A one-way

analysis of variance of the z-scores was carried out for each of groups 2 through 8. The

within-subject variance was nearly the same for all groups: 0.25±0.03. Further investigation

(data not shown) revealed no differences among groups with regard to the joint distribution

of z-scores and fetal weights.
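
The within-subject variance from a one-way analysis of variance can be computed as the pooled within-subject mean square. The sketch below is an illustration under stated assumptions: the function name and the toy data are hypothetical, and the calculation would be applied separately to each group of subjects with the same number of examinations.

```python
import numpy as np


def within_subject_variance(subject_ids, z):
    """Pooled within-subject mean square from a one-way ANOVA."""
    subject_ids = np.asarray(subject_ids)
    z = np.asarray(z, dtype=float)
    # Map each subject label to an index 0..n_subjects-1.
    _, idx = np.unique(subject_ids, return_inverse=True)
    counts = np.bincount(idx)
    means = np.bincount(idx, weights=z) / counts
    ss_within = np.sum((z - means[idx]) ** 2)
    df_within = z.size - counts.size  # observations minus subjects
    return ss_within / df_within


# Toy data (hypothetical): two subjects with two examinations each.
toy_variance = within_subject_variance(["a", "a", "b", "b"], [0.0, 2.0, 1.0, 1.0])
```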

In constructing our data set, we adopted the following exclusion criteria. We excluded

multiple gestations. We excluded subjects exhibiting severe growth restriction in associa-

tion with congenital abnormalities. We excluded all examinations before 14 weeks because

ultrasonic measurements at early gestational ages are unreliable. We excluded subjects with

more than eight ultrasound examinations because these subjects tend to have more severe

health problems. During a preliminary data analysis, we excluded 407 examinations where

gross errors in recorded gestational ages were suspected. Specifically, it appeared that the

order of some gestational ages had been permuted relative to the fetal weights. We excluded

an additional 139 cases, comprising all examinations that were followed by a second exam-

ination in less than four days. This latter exclusion was motivated by an unanticipated

effect observed during preliminary data analysis: squared differences in z-scores for the same

subject tended to be substantially larger when pairs of examinations were separated by less

than four days than when pairs were separated by one to two weeks. This contradicted our

expectation that the covariance function decreases with elapsed time, as in Figure 5(a). An

explanation warranting exclusion of the cases is that an immediate callback may indicate an

unusual, and often erroneous, observation in the first examination.
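
The diagnostic that motivated this exclusion can be sketched as follows. The code is an illustration only: it uses consecutive pairs of examinations for each subject (an assumption; other pairings are possible), and the function name, bin edges, and toy data are hypothetical.

```python
import numpy as np


def squared_differences_by_lag(subject_ids, ages_weeks, z, bin_edges):
    """Mean squared z-score difference for consecutive same-subject exams,
    grouped by the elapsed time (in weeks) between the two exams."""
    subject_ids = np.asarray(subject_ids)
    ages = np.asarray(ages_weeks, dtype=float)
    z = np.asarray(z, dtype=float)
    # Sort by subject, then by gestational age within subject.
    order = np.lexsort((ages, subject_ids))
    subject_ids, ages, z = subject_ids[order], ages[order], z[order]
    same = subject_ids[1:] == subject_ids[:-1]
    lag = (ages[1:] - ages[:-1])[same]
    sq = ((z[1:] - z[:-1]) ** 2)[same]
    which = np.digitize(lag, bin_edges)  # 0 means below the first edge
    n_bins = len(bin_edges) + 1
    return np.array([sq[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(n_bins)])


# Toy data (hypothetical). Bins: under 4 days, 4 days to 3 weeks, over 3 weeks.
ids = np.array([1, 1, 1, 2, 2])
ages = np.array([30.0, 30.4, 32.0, 28.0, 29.5])
zs = np.array([0.0, 1.0, 0.8, -0.5, -0.3])
by_lag = squared_differences_by_lag(ids, ages, zs, bin_edges=[4 / 7, 3.0])
```

If the covariance decreases with elapsed time, the mean squared difference should increase with the lag; a larger value in the shortest-lag bin than in the next bin is the anomaly described above.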

6 Alternative models

Pan and Goldstein14 developed growth diagnostics based on a covariance model for z-scores.

In this section we briefly describe their methods and explain our preference for alternative

methods when modeling fetal growth. Their z-score transformation used the LMS method

developed by Cole and Green21, which models the median, the coefficient of variation, and

the Box-Cox power curve to remove skewness from the data, as smooth functions of age.

These smooth functions are approximated with cubic splines and estimated via maximum

penalized likelihood. The LMS method is highly flexible and appears well-suited for model-

ing postnatal growth. The method may also be suitable for modeling prenatal growth, but

we prefer a much simpler and more easily interpreted transformation. The quadratic fit of

log weight is highly accurate and provides a clear description of how growth velocity changes

over the pregnancy. The log transformation also stabilizes the variance.
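
For comparison, the LMS z-score as it is usually stated can be written in a few lines. This sketch is not the transformation used in this article; the function name is hypothetical, and the values of L, M, and S at a given age are assumed to be supplied by the caller from fitted smooth curves.

```python
import math


def lms_zscore(y, L, M, S):
    """z-score of measurement y under the LMS (Box-Cox) transformation.

    L: Box-Cox power, M: median, S: coefficient of variation, all evaluated
    at the subject's age. Values are supplied by the caller.
    """
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("y, M and S must be positive")
    if abs(L) < 1e-12:
        # Limit L -> 0: the Box-Cox transform reduces to a log transform.
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)
```

In the limit L = 0 the LMS score is a standardized log ratio, so a transformation based on log weight can be viewed as a special case with a fixed power.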

Pan and Goldstein14 modeled the z-score covariance function by fitting a hierarchical

linear model (often called a random coefficient model) to the z-scores. Bertino et al.9 also

used hierarchical linear models to construct individual growth curves for morphometric traits.

In such models, the latent scores Tj in (7) are expressed in terms of a second level of latent

variables,

$$T_j = \sum_{k=0}^{m} B_k \, \phi_k(a_j). \qquad (17)$$

Here the φk are specified basis functions, such as polynomials or spline functions. The

coefficient vector (B0, . . . , Bm) is random, varying from subject to subject, and is modeled

as multivariate normal with mean 0 and unknown covariance matrix ΣB. The covariances

among latent scores are determined by the elements σBkl of ΣB as

$$\mathrm{Cov}(T_1, T_2) = \sum_{k=0}^{m} \sum_{l=0}^{m} \sigma_{Bkl} \, \phi_k(a_1) \, \phi_l(a_2). \qquad (18)$$

The measurement errors Uj are assumed to be independent of the coefficient vector. The

error variance is determined by ΣB from expression (8):

$$\sigma_U^2(a) = 1 - \sum_{k=0}^{m} \sum_{l=0}^{m} \sigma_{Bkl} \, \phi_k(a) \, \phi_l(a). \qquad (19)$$

The covariance structure of the latent scores and measurement errors is thus determined by

ΣB, given specified basis functions φk.
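
Expressions (18) and (19) are quadratic forms in the basis vector, which the following sketch evaluates numerically. It is an illustration under stated assumptions: the polynomial basis centred at a chosen age, the function names, and the 2 x 2 matrix for ΣB are hypothetical and are not estimates from any data set.

```python
import numpy as np


def basis(a, m, center):
    """Polynomial basis phi_k(a) = (a - center)**k for k = 0..m."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    return (a[:, None] - center) ** np.arange(m + 1)  # shape (len(a), m + 1)


def latent_covariance(a1, a2, sigma_b, center):
    """Cov(T1, T2) from expression (18) for all pairs of ages in a1, a2."""
    m = sigma_b.shape[0] - 1
    return basis(a1, m, center) @ sigma_b @ basis(a2, m, center).T


def error_variance(a, sigma_b, center):
    """sigma_U^2(a) from expression (19): one minus the latent variance."""
    m = sigma_b.shape[0] - 1
    phi = basis(a, m, center)
    return 1.0 - np.einsum("ik,kl,il->i", phi, sigma_b, phi)


# Hypothetical covariance matrix for (B0, B1); not estimated from data.
sigma_b = np.array([[0.60, 0.01],
                    [0.01, 0.002]])
ages = np.array([20.0, 28.0, 36.0])
cov = latent_covariance(ages, ages, sigma_b, center=28.0)
err = error_variance(ages, sigma_b, center=28.0)
```

Even with this small linear example the implied error variance changes with age, because the latent variance on the diagonal of `cov` is a polynomial in a while the z-score variance is held at one.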

In applications of hierarchical linear models, the error variance σ_U^2(a) is assigned a specified

form, usually constant. We note a potential problem in adopting a hierarchical linear

model with constant error variance when the response variable is a z-score. Because z-scores

are defined to have variance one, the error variance is fixed by expression (19) as one minus

the latent score variance, and this quantity is usually not constant in a. The problem may be

inconsequential if the estimated error variance is nearly constant.

We investigated whether this was the case in an example from Pan and Goldstein14 involving

weights of males recorded at postnatal ages from 2 to 18.5 years. Using the estimated co-

variances for random coefficients provided in their Table II, we calculated covariances among

the latent scores using expression (18) with m = 4 and φk(a) = (a − 11)^k. Plots of the latent

score covariance function and error variance function (19) are shown in Figure 8. We observe

substantial variability in the error variance and complex behaviour (curves crossing) in the

covariance plot for a1 < 4 and a1 > 14. These effects might be due to heteroscedasticity in

the z-scores, but are more likely artifacts associated with the fitted model; i.e., polynomial

curves tend to provide the most accurate fits at ages close to the average age.

We initially tried to fit a random coefficient model to our fetal weight data but encoun-

tered difficulties in specifying appropriate basis functions and an appropriate error variance

function. The assumption that the coefficient vector (B0, . . . , Bm) is multivariate normal

may not be justified. Its validity may depend on how the basis functions φk are defined. It

is difficult to validate the choice of a particular family of basis functions, given that few ul-

trasound examinations are available for most fetal subjects. Postnatal growth data typically

have considerably more observations per subject. The results of our Section 3 show that

the error variance is not constant, but decreases substantially over the pregnancy. It is not

clear how this behaviour can be incorporated by fitting a hierarchical linear model. These

considerations led us to the tentative conclusion that, when employing z-scores to model

prenatal growth, an analysis based on identity (9) is more direct, simpler to interpret, and

easier to validate.

7 Customized norms and multivariate diagnostics

We complete our discussion of models and diagnostics for growth restriction by mentioning

two extensions currently under investigation. The first addresses a limitation of the SGA

criterion as implemented in Figure 7: a fetus may be small for reasons unrelated to its

health. Various studies have examined growth norms based on subpopulations relevant

to individual subjects.3,22 Our methods can be extended in this direction by developing

regression models that relate z-scores of birth weight to characteristics known to affect

growth; e.g., maternal height, maternal weight on admission, ethnic group, parity, and sex of

the fetus. Using such a model, one could condition on relevant characteristics when predicting

latent scores, calculating SGA probabilities, and evaluating residual scores. Conditioning

would account for some of the long-term dependence in the z-scores. Opinions vary as to

which characteristics are appropriate for developing standards of comparison.1 Most would

agree that the primary goal of further conditioning is to improve diagnosis of serious health

problems. One would thus want to condition on characteristics affecting growth potential but

not health. These effects are often confounded, however, so the issue is not easily resolved.

This article has focussed on a single measure of growth, estimated fetal weight. Our

approach can be extended to develop models and diagnostics for multivariate measures of

growth.9,23 First, develop a separate z-score transformation for each variable. Second, extend

the ALB model to describe covariances between variables and across gestational ages. Third,

construct prediction regions for latent score vectors, conditioning on relevant information.

Multivariate approaches are needed to better understand growth patterns among various

morphometric traits and their connection with health problems.

Acknowledgements

We thank Nancy Bott of the Central and Northern Alberta Perinatal Audit and Education

Program for advice and facilitation of data management and transfer. We thank Xu Xiong

of the Perinatal Research Centre, University of Alberta, for helpful comments during the

initial stage of this project. And we thank two referees for comments that led to clarification

of many points. Financial support from the Natural Sciences and Engineering Research

Council of Canada is gratefully acknowledged.

References

1. Goldenberg, R. L. and Cliver, S. P. ‘Small for gestational age and intrauterine growth restriction: definitions and standards’, Clinical Obstetrics and Gynecology, 40, 704–714 (1997).

2. Goldenberg, R. L., Cutter, G. R., Hoffman, H. J., Foster, J. M., Nelson, K. G. and Hauth, J. C. ‘Intrauterine growth retardation: standards for diagnosis’, American Journal of Obstetrics and Gynecology, 161, 271–277 (1989).

3. Gardosi, J. ‘Customized growth curves’, Clinical Obstetrics and Gynecology, 40, 715–722 (1997).

4. Gardosi, J., Mul, T. and Mongelli, M. ‘Application of a fetal weight standard to study associations between growth retardation and stillbirth’, Ultrasound Obstetrics and Gynecology, 66, 6 (2 Suppl) (1995).

5. Godfrey, K. M. and Barker, D. J. ‘Fetal nutrition and adult disease’, American Journal of Clinical Nutrition, 71, 1344S–1352S (2000).

6. Yiu, V., Buka, S., Zurakowski, D., McCormick, M., Brenner, B. and Jabs, K. ‘Relationship between birthweight and blood pressure in children’, American Journal of Kidney Disease, 33, 253–260 (1999).

7. Susser, E., Neugebauer, R., Hoek, H. W., Brown, A. S., Lin, S., Labovitz, D. and Gorman, J. M. ‘Schizophrenia after prenatal famine: further evidence’, Archives of General Psychiatry, 53, 25–31 (1996).

8. Susser, E. S., Schaefer, C. A., Brown, A. S., Begg, M. D. and Wyatt, R. J. ‘Design for the prenatal determinants of schizophrenia study’, Schizophrenia Bulletin, 26, 257–273 (2000).

9. Bertino, E., Battista, E. D., Bossi, A., Pagliano, M., Fabris, C., Aicardi, G. and Milani, S. ‘Fetal growth velocity: kinetic, clinical, and biological aspects’, Archives of Disease in Childhood, 74, F10–F15 (1996).

10. Hooper, P. M. ‘Flexible regression modeling with adaptive logistic basis functions’, Canadian Journal of Statistics, to appear.

11. Donnelly, C. A., Laird, N. M. and Ware, J. H. ‘Prediction and creation of smooth curves for temporally correlated longitudinal data’, Journal of the American Statistical Association, 90, 984–989 (1995).

12. Shepard, M. J., Richards, V. A., Berkowitz, R. L., Warsof, S. L. and Hobbins, J. C. ‘An evaluation of two equations for predicting fetal weight by ultrasound’, American Journal of Obstetrics and Gynecology, 142, 47–54 (1982).

13. Hadlock, F. P., Harrist, R. B., Sharman, R. S., Deter, R. L. and Park, S. K. ‘Estimation of fetal weight with the use of head, body, and femur measurements - a prospective study’, American Journal of Obstetrics and Gynecology, 151, 333–337 (1985).

14. Pan, H. and Goldstein, H. ‘Multi-level models for longitudinal growth norms’, Statistics in Medicine, 16, 2665–2678 (1997).

15. Diggle, P. J. and Verbyla, A. R. ‘Nonparametric estimation of covariance structure in longitudinal data’, Biometrics, 54, 401–415 (1998).

16. Friedman, J. H. ‘Multivariate adaptive regression splines’ (with discussion), Annals of Statistics, 19, 1–141 (1991).

17. Hansen, M., Kooperberg, C. and Sardy, S. ‘Triogram models’, Journal of the American Statistical Association, 93, 101–119 (1998).

18. Ott, W. J. ‘Sonographic diagnosis of intrauterine growth restriction’, Clinical Obstetrics and Gynecology, 40, 787–795 (1997).

19. Johnson, R. A. and Wichern, D. W. Applied Multivariate Statistical Analysis, Fourth Edition, Prentice-Hall, Upper Saddle River NJ (1998).

20. Mul, T., Mongelli, M. and Gardosi, J. ‘A comparative analysis of second-trimester ultrasound dating formulae in pregnancies conceived with artificial reproductive techniques’, Ultrasound in Obstetrics and Gynecology, 8, 397–402 (1996).

21. Cole, T. J. and Green, P. J. ‘Smoothing reference centile curves: the LMS method and penalized likelihood’, Statistics in Medicine, 11, 1305–1319 (1992).

22. Amini, S. B., Catalano, P. M., Hirsch, V. and Mann, L. I. ‘An analysis of birth weight by gestational age using a computerized perinatal data base, 1975–1992’, Obstetrics and Gynecology, 83, 342–352 (1994).

23. Owen, P. and Khan, K. S. ‘Fetal growth velocity in the prediction of intrauterine growth retardation in a low risk population’, British Journal of Obstetrics and Gynaecology, 105, 536–540 (1998).

Table 1: Distribution of number of ultrasound examinations per subject and summary statistics by group.

Number     Frequency   Average gest.   Average weeks    Average   Std. Dev.
of exams               age (weeks)     between exams    z-score   z-score
1          4998        31.4            n/a              0.02      0.99
2          1509        32.0            5.6              −0.02     1.01
3          632         32.2            3.9              0.00      1.02
4          369         31.7            3.4              −0.07     1.01
5          189         30.9            3.1              0.08      1.00
6          108         30.1            3.0              0.01      1.01
7          52          29.8            2.5              0.07      0.91
8          31          28.7            2.4              −0.19     0.86

Table 2: Parameter estimates for the covariance model c_T(a1, a2) in expression (10).

k    δk       βk0       βk1        βk2
1    0.9770   −4.796    0.1661     0.0343
2    0.3327   −7.252    −0.0506    0.2543
3    0.5910   0         0          0

Table 3: Summary statistics describing the distribution of the residual scores for training and test groups, using all examinations preceded by one or more examinations.

                                   Train    Test
Number of examinations used        2886     2819
Correlation with preceding exam    0.023    0.017
Mean                               0.013    0.031
Variance                           1.091    1.130
Skewness                           0.226    0.012
Kurtosis                           1.056    0.950
Coverage for 80% interval (%)      79.6     78.1
Coverage for 95% interval (%)      93.8     93.7

Table 4: Effect of error in gestational age on evaluation of z-score. The table entries are scores z = f(w, a) evaluated using estimated gestational age a, where w = f^{-1}(0, a0) is the median weight at true gestational age a0.

             Error a − a0 in gestational age (weeks)
             −3      −2      −1      0       1       2       3
a0 = 14      4.4     3.5     1.8     0.0     −1.7    −2.6    −3.1
a0 = 27      2.8     1.9     1.0     0.0     −1.0    −1.6    −2.1
a0 = 40      1.2     0.8     0.4     0.0     −0.4    −0.7    −1.0

Figure 1: (a) Histogram of estimated gestational age in weeks. (b) Scatter plot of estimated

fetal weight in grams versus estimated gestational age.

Figure 2: (a) Scatter plot of log weight versus gestational age. (b) Weekly averages of log

weight plotted against gestational age, with quadratic fit q(a) from expression (2).

Figure 3: (a) Normal scores of residuals approximated by linear spline (3). (b) Standard

deviation of the approximate normal scores as a function of gestational age, modeled by

quadratic spline (5).

Figure 4: (a) Median growth curve, a plot of the median weight estimate exp[0.005+q(a)] as

a function of gestational age. (b) Velocity curve, the derivative of the median growth curve.

Figure 5: Estimated (a) covariances c_T(a1, a2) and (b) correlations r_T(a1, a2) between latent scores

at gestational ages a1 and a2, with a1 ≤ a2. In both plots, separate curves are plotted

(starting from the bottom) for a1 = 14, 16, . . . , 40, with a2 represented on the horizontal

axis. The upper boundary in the covariance plot represents the estimated variance function

s_T^2(a).

Figure 6: (a) Predicted latent score plot. The dashed lines represent unconditional 80%

prediction bounds for latent scores. The middle solid line represents the predicted latent

score as a function of gestational age for a particular fetus, given z-scores from examinations

at ages 28, 30, and 32 weeks. The lower and upper solid lines represent conditional 80%

prediction bounds, given the z-scores. (b) Predicted weight plot, obtained by applying the

f^{-1} transformation in (6) to the curves in (a).

Figure 7: Probability that a particular fetus is SGA, evaluated as a function of gestational

age. The probabilities are conditioned on z-scores from examinations at 28, 30, and 32 weeks,

as described in Figure 6.

Figure 8: Pan and Goldstein14 example. Here age is years since birth, not gestational

age. (a) Estimated covariances from (18) between latent scores at ages a1 and a2, with

a1 ≤ a2. Separate curves are plotted (starting from the bottom) for a1 = 2, 3, . . . , 17, with

a2 represented on the horizontal axis. (b) Estimated error variances from (19).
