
Quantitative Exposure Data: Interpretation, Decision Making, and Statistical Tools

Purpose of Exposure Assessment

• To decide two things:

– Is the SEG’s exposure profile (exposure and its variability) adequately characterized?

– Is the exposure profile acceptable?

• A baseline exposure assessment (or comprehensive exposure

assessment) requires characterization of the SEG’s exposure

profile.

– An exposure profile is a summary “picture” of the exposure

experienced by an SEG.

• A compliance-based program will focus efforts on exposures near

OELs.

Exposure Acceptability Judgments

• A variety of tools and factors are related to the

judgment of exposure acceptability.

– process experience

– material characteristics

– toxicity knowledge

– work force characteristics

– frequency of task

– frequency of peak excursions

– monitoring results

– statistical tools

– confidence in exposure limit

– modeling techniques

– biological monitoring

– availability and adequacy of engineering controls

Statistical Considerations

• Statistical tools are powerful only if their theoretical bases and

limitations are understood by the person using them.

• Statistical issues must be considered early in the assessment

process. They should be included in the development of the

exposure assessment strategy and when determining a monitoring

strategy.

– Difficulties

• random sampling

• sufficient data

• In spite of their limitations, statistical tools are useful because they

help form a picture of the exposure profile. If their limitations are

understood, they will greatly enhance knowledge of the exposure

profile.

Sample Size Estimation

Approximate sample size (n) required to be 95% confident that the true mean exposure is less than the long-term occupational exposure limit (power = 90%):

Ratio:           Low variability               Moderate variability                High variability
true mean/OEL    (GSD = 1.5)      GSD = 2.0    (GSD = 2.5)           GSD = 3.0     (GSD = 3.5)
0.75             25               82           164                   266           384
0.50             7                21           41                    67            96
0.25             3                10           19                    30            43
0.10             2                6            13                    21            30

Exposure Distribution and Parametric or

Nonparametric Statistical Tools

• A population distribution is a description of the relative

frequencies of the elements of that population.

• Parametric statistics

– The most powerful statistical tools require knowledge or assumptions

about the population’s distribution.

• Nonparametric statistics

– When the underlying distribution of exposure is not known,

nonparametric statistics should be used.

– These statistical tools tend to focus on robust measures such as the

distribution median or other percentile because they are less sensitive

to outliers and spurious data.

– They have lower statistical power and require more measurements.

Common Distribution in Industrial Hygiene

• The random sampling and analytical errors associated with an air monitoring result are usually presumed to be normally distributed.

• The random fluctuations in exposure from shift to shift or within

shifts tend to be lognormally distributed.

• Exposure fluctuations account for the vast majority of an exposure

profile’s variability (usually more than 85%).

• If we have resources to commit to exposure monitoring, usually the

most efficient approach would call for putting resources into more

measurements rather than into more precise sampling methods.

Distribution Verification

• A logprobability plot is the simplest and most straightforward way

to check data for lognormality.

• The Shapiro and Wilk Test (W-test) is the most rigorous test for

lognormality.

• If the data form a straight line on the logprobability plot, the data follow a lognormal distribution, and the line can then be used to estimate the distribution’s geometric mean and geometric standard deviation.

[Figure: Logprobability plot, with concentration (mg/m3) on the horizontal axis and cumulative probability on the vertical axis.]

Making a Probability Plot

• Procedures (a minimal sketch in code follows this list):

– Rank order the data, lowest to highest.

– Rank each value from 1 (lowest) to n (highest).

– Calculate the plotting position for each value.

• Plotting position = rank/(n+1)

– Plot the concentrations against the plotting positions.

– Draw a best-fit line through the plotted data.

– Determine whether the data provide a reasonable fit to the straight line.

– Estimate the distribution GM, GSD, and percentiles of interest from the best-fit line.
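Below is a minimal sketch of this plotting procedure in Python. The data values, variable names, and the use of scipy.stats.linregress are illustrative assumptions; any line-fitting routine would serve.

```python
# Sketch of the log-probability plotting procedure described above.
import numpy as np
from scipy import stats

x = np.array([0.24, 0.31, 0.45, 0.52, 0.61, 0.78, 0.95, 1.30])  # hypothetical results (mg/m3)
x = np.sort(x)                              # rank order, lowest to highest
n = len(x)
ranks = np.arange(1, n + 1)                 # rank from 1 (lowest) to n (highest)
plot_pos = ranks / (n + 1)                  # plotting position = rank/(n+1)

# Regress ln(concentration) against the normal quantile of the plotting
# position; a good linear fit suggests lognormality.
z = stats.norm.ppf(plot_pos)
slope, intercept, r, _, _ = stats.linregress(z, np.log(x))

gm = np.exp(intercept)                      # geometric mean from the intercept
gsd = np.exp(slope)                         # geometric standard deviation from the slope
p95 = np.exp(intercept + slope * stats.norm.ppf(0.95))  # estimated 95th percentile
print(f"GM={gm:.3f}  GSD={gsd:.2f}  est. 95th percentile={p95:.3f}  r={r:.3f}")
```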

W-Test for Distribution Goodness-of-Fit

• The W-test is one of the most powerful tests for determining goodness-of-fit for normal or lognormal data when n is fairly small (n ≤ 50).

• The W-test is performed as follows (a SciPy-based sketch follows the formula):

– Order the data, smallest to largest.

– Calculate k: k = n/2 if n is even; k = (n-1)/2 if n is odd.

– Calculate the W statistic:

– The data are considered to come from a normal (or lognormal, if applied to the log-transformed data) population if W is greater than a critical value.

W = \frac{\left[\sum_{i=1}^{k} a_i \left(x_{\,n-i+1} - x_i\right)\right]^2}{(n-1)\,S^2}
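In practice the a_i coefficients are tabulated, so the W-test is usually run from a statistics library. The sketch below applies SciPy's Shapiro-Wilk implementation to the log-transformed data; the data are hypothetical.

```python
# Sketch: checking lognormality with the Shapiro-Wilk W-test via SciPy.
import numpy as np
from scipy import stats

x = np.array([0.24, 0.31, 0.45, 0.52, 0.61, 0.78, 0.95, 1.30])  # hypothetical results
w, p_value = stats.shapiro(np.log(x))       # W statistic and p-value for ln(x)
# A large W (p-value above the chosen alpha, e.g. 0.05) is consistent with the
# log-transformed data coming from a normal population, i.e. x lognormal.
print(f"W = {w:.3f}, p = {p_value:.3f}")
```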

Sampling Randomly from

Stationary Populations (1)

• Random sampling

– Each element in the population must have equal likelihood of being

observed.

– Practical considerations of travel constraints, weather, process

operation parameters, budgetary limits, and the need to characterize

multiple exposure profiles make statistically randomized sampling

extremely difficult in the real world.

– To avoid known bias:

• If possible, avoid clustering your monitoring into consecutive periods.

• Monitor during different seasons to avoid biases introduced by factors that change with weather conditions.

• Understand process cycles and avoid biases they might introduce.

• Include both typical and unusual events.

Sampling Randomly from

Stationary Populations (2)

• Autocorrelation

– Autocorrelation occurs when the contaminant concentration in one

time period is related to the concentration in a previous period.

– Clustering all samples in one period when autocorrelation occurs will

result in an underestimate of variability in the exposure profile and an

imprecise estimate of the mean exposure.

– Autocorrelation can also result in underestimating or overestimating the true degree of exposure, depending on whether a high- or low-concentration cycle happened to have been sampled.

Sampling Randomly from

Stationary Populations (3)

• Stationary population

– Definition of Stationary

• A random process is said to be stationary if its distribution is independent

of the time of observation.

– Stationary population

• An underlying population that does not change during the exposure

assessment period. That is, the mean and variance of this population are

stable over time.

– If the population changes significantly over the random sampling

period, only calculations of sample descriptive statistics and decision

making on the basis of professional judgment are recommended.

– One simple procedure that can help subjectively check for population

stability is to plot the monitoring data chronologically by time of

monitoring. If any trends in the data are apparent, that is a sign the

underlying process is not stationary.
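A minimal sketch of that chronological plot, assuming matplotlib and hypothetical dates and results:

```python
# Sketch of the stationarity check described above: plot monitoring results in
# chronological order and look for trends. Dates and values are hypothetical.
import matplotlib.pyplot as plt

dates = ["2021-03-01", "2021-04-12", "2021-06-03", "2021-08-20", "2021-10-05"]
results = [0.42, 0.38, 0.55, 0.61, 0.75]    # mg/m3

plt.plot(dates, results, marker="o")
plt.xlabel("Date of monitoring")
plt.ylabel("Concentration (mg/m3)")
plt.title("Exposure results in chronological order")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# An apparent upward or downward trend suggests the underlying population
# is not stationary over the assessment period.
```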

Similar Exposure Interval

• A similar exposure interval is defined as a period in which the

distribution of exposures for a SEG would be expected to be

stationary.

• The measurements needed to characterize the exposure profile

would be taken randomly within a similar exposure interval.

Relationship of Averaging Times

• It is inappropriate to average short-term data with full-shift data. Short-term data tend to be distributed differently from full-shift data.

• Mixing of data from different averaging times makes estimates of

variance inaccurate and precludes use of most common statistical

tools.

• Techniques are being developed to predict long-term exposure

profiles based on a time-weighted combination of exposure profiles

for the several short-term tasks. These techniques hold great

promise for providing more detailed characterizations of

exposures and for optimizing sampling efficiency using stratified

random sampling of critical tasks.

Nondetectable Data

• Monitoring results below the analytical limit of detection should not be discarded.

• Several techniques are available for including below detection

limit data in statistical analysis.

• A factor of 0.7 times the detection limit may be most appropriate

for data with relatively low variability (GSD < 3).

• A factor of 0.5 times the detection limit may be best when the

variability is high (GSD > 3). If more than 50% of data are below

the detection limit then special techniques may be required.

• Probability plotting is another way to include data below the detection

the data above the detection limit to account for the data below the

detection limit for determination of a reasonable estimate of the

average and variability.
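A minimal sketch of the simple substitution rules above; the detection limit, the GSD guess, and the split between detects and nondetects are all hypothetical:

```python
# Sketch: substituting a fraction of the LOD for nondetect results.
import numpy as np

lod = 0.05                                   # hypothetical analytical LOD (mg/m3)
detects = np.array([0.07, 0.12, 0.30, 0.09]) # measured results above the LOD (mg/m3)
n_nondetect = 3                              # number of results reported as < LOD

gsd_guess = 2.5                              # rough GSD from the detects or experience
factor = 0.7 if gsd_guess < 3 else 0.5       # 0.7*LOD for GSD < 3, 0.5*LOD for GSD > 3
filled = np.concatenate([detects, np.full(n_nondetect, factor * lod)])

frac_nd = n_nondetect / filled.size
if frac_nd > 0.5:
    print("More than 50% nondetects: special techniques may be required.")
print(f"Mean with substituted nondetects = {filled.mean():.3f} mg/m3")
```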

Statistical Techniques

• There is no ideal statistical technique for evaluating industrial

hygiene monitoring data.

• All measurements to be analyzed statistically should be valid in

that:

– They were collected and analyzed using a reasonably accurate and

reasonably unbiased sampling and analytical method.

– They adequately represent personal exposure.

• Descriptive statistics

– arithmetic mean, standard deviation, median, range, maximum, minimum, and fraction of samples over the OEL.

• Inferential statistics

– quantitative estimate of exposure profile

– arithmetic mean and upper tail

– If a decision must be made with few measurements (for example, 10), confidence is highest for the estimate of the mean, lower for the estimate of variance, and lowest for estimates of lower or upper percentiles.

Focus on the Arithmetic Mean

• For chronic-acting substances, the long-term average exposure (exposure averaged over weeks or months) is a relevant index of dose and, therefore, a useful parameter on which to focus for evaluating the health risk posed by such an exposure.

• For such agents, the body dampens swings in exposure, reinforcing the focus on the long-term average.

• Statistically defined OEL

– definition

• It is an acceptable exposure profile defined by the OEL’s sponsoring

organization.

• It should be clearly stated whether:

– The OEL is interpreted as a long-term average (i.e., the arithmetic mean of the distribution of daily average exposures);

– A permissible exceedance of day-to-day exposures is allowed (e.g., 5%); or

– The OEL is a never-to-be-exceeded maximum daily average (i.e., 100% of the daily average exposures are less than the OEL).

Arithmetic Mean of a Lognormal Distribution

• The arithmetic mean, not the geometric mean, of a lognormal exposure distribution is the best descriptor of average exposure.

• The difference between arithmetic mean and geometric mean of a

lognormal distribution increases when variance in the distribution

increases.
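This can be illustrated numerically with the standard lognormal identity AM = GM · exp[(ln GSD)²/2]; the identity itself is not stated on the slide but follows from the lognormal definition.

```python
# Numeric illustration: for a lognormal distribution the arithmetic mean (AM)
# exceeds the geometric mean (GM), and the gap grows with the GSD.
import numpy as np

gm = 1.0                                    # geometric mean (mg/m3), illustrative
for gsd in (1.5, 2.0, 2.5, 3.0, 3.5):
    am = gm * np.exp(0.5 * np.log(gsd) ** 2)   # AM = GM * exp((ln GSD)^2 / 2)
    print(f"GSD={gsd:.1f}  AM/GM = {am/gm:.2f}")
```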

Estimating the Arithmetic Mean of a

Lognormal Distribution

• The recommended method for all sample sizes and GSDs is the

minimum variance unbiased estimate (MVUE).

– Unbiased and minimum variance

• The maximum likelihood estimate (MLE) is easy to calculate and

is less variable than the simple mean for large data sets (N > 50)

and high GSDs.

Confidence Limits Around the Arithmetic

Mean of a Lognormal Distribution

• Confidence limits allow one to gauge the uncertainty in the parameter estimate. The wider the confidence limits, the less certain the point estimate.

• Land‘s “exact” procedure is suggested for calculating

confidence limits for arithmetic mean estimates.

Focus on the Upper Tail

• For agents causing acute effects, the average exposure is not as important as understanding how high the exposure may get, because those few high exposures might pose a more important risk to health than average exposures at lower levels.

• An examination of the exposure profile’s upper tail will allow an estimate of the relative frequency with which the OEL may be exceeded.


Tolerance Limits

• Tolerance limits statistically demonstrate, with some confidence, that no more than a given percentage of exposures is greater than a standard.

– An industrial hygienist can have 95% confidence that no more than 5% of the exposures exceed the standard.

– In effect, this is an upper one-sided 95% confidence limit on the estimate of the 95th percentile.

• Advantages:

– Tolerance limits are helpful for defining the upper end of an exposure profile.

– The tolerance limit approach may be appropriate for compliance testing.

• Disadvantages:

– The approach is very sensitive to sample size and to the distribution’s standard deviation (a calculation sketch follows this list).
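A minimal sketch of a 95%/95% upper tolerance limit for lognormal data. The noncentral-t route to the tolerance factor K is a common approach but is an assumption here, not a method prescribed by the slides; the data are hypothetical.

```python
# Sketch: one-sided upper tolerance limit (95% confidence that no more than
# 5% of exposures exceed the limit) for lognormally distributed data.
import numpy as np
from scipy import stats

x = np.array([0.24, 0.31, 0.45, 0.52, 0.61, 0.78, 0.95, 1.30])  # hypothetical results
y = np.log(x)
n = y.size
z95 = stats.norm.ppf(0.95)                  # quantile for the 95th percentile

# One-sided tolerance factor K from the noncentral t-distribution
k = stats.nct.ppf(0.95, df=n - 1, nc=z95 * np.sqrt(n)) / np.sqrt(n)
utl = np.exp(y.mean() + k * y.std(ddof=1))  # upper tolerance limit on the original scale
print(f"95%/95% upper tolerance limit = {utl:.3f} mg/m3")
```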

How to Choose: The Mean or the Upper Tail

• In determining compliance with most regulatory and authoritative OELs that exist today, a focus on the upper tail would be most appropriate.

• In 1978, OSHA expressed in the preamble to its lead PEL:

– OSHA recognizes that there will be day-to-day variability in airborne lead exposure experienced by a single employee. The permissible exposure limit is a maximum allowable value which is not to be exceeded; hence exposure must be controlled to an average value well below the permissible exposure limit in order to remain in compliance.

Analysis of Variance to Refine Critical SEGs

• Analysis of variance (ANOVA) is a statistical technique that can be

used to compare the variability of individual workers‘ exposures

with the exposure variability of the overall SEG.

– ANOVA is used to examine the exposure variability for each monitored individual (within-worker variability) and compare it with the worker-to-worker variability in the SEG (between-worker variability).

• This approach can be used to check the homogeneity of the critical

SEGs for which risk of individual misclassification is most severe

and to reassign individuals as necessary.
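A minimal sketch of such a one-way ANOVA on log-transformed results, grouped by worker. The worker labels, values, and the use of scipy.stats.f_oneway are illustrative assumptions.

```python
# Sketch: one-way ANOVA comparing between-worker and within-worker variability
# of log-transformed exposures within an SEG.
import numpy as np
from scipy import stats

# Repeated measurements (mg/m3) keyed by worker; all values hypothetical
data = {
    "worker_A": [0.30, 0.42, 0.55],
    "worker_B": [0.25, 0.33, 0.40],
    "worker_C": [0.80, 1.10, 0.95],
}
groups = [np.log(np.array(v)) for v in data.values()]

f_stat, p_value = stats.f_oneway(*groups)   # between- vs within-worker variance
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# A large F (small p) suggests worker-to-worker differences dominate, i.e. the
# SEG may not be homogeneous and some individuals may need to be reassigned.
```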

Examining the Arithmetic Mean: Mean

Estimates and Confidence Intervals

Arithmetic Mean

• Understanding the mean of the exposure profile may be important when judging exposure:

– Several short-term measurements are used to characterize a daily average.

– Several day-long TWA measurements are being used to estimate the long-term average of a day-to-day exposure profile.

Arithmetic Mean

• The best predictor of dose is the exposure

distribution’s arithmetic mean, not the geometric

mean. The general technique is to:

1. Estimate the exposure distribution’s arithmetic mean.

2. Characterize the uncertainty in the arithmetic mean’s

point estimate by calculating confidence limits for the

true mean.

3. Examine the arithmetic mean’s point estimate and true

mean confidence limit(s) in light of an LTA-OEL or

other information to make a judgment on the exposure

profile.

Confidence Intervals

• Upper confidence limit (UCL):

– conservatively protective of worker health, UCL for the

arithmetic mean estimate is emphasized

• If UCL1,95% (the arithmetic mean’s one-sided 95% UCL) < LTA-OEL:

– the industrial hygienist would be at least 95% sure that the exposure profile’s true mean was below the LTA-OEL

• Place all of the statistical power into characterizing

the single boundary most important to the judgment

95% Upper Confidence Interval for the Arithmetic Mean

[Figure: the arithmetic mean point estimate shown with its 95% upper confidence limit; one can be 95% certain that the exposure profile’s true mean exposure is less than the UCL value.]

Probability Plotting and Goodness-of-Fit

• Parametric methods:

– rely on assumptions about the shape of the underlying population distribution

• Most exposure distributions are right-skewed

and can be reasonably approximated by the

lognormal distribution

– If the probability plotting and goodness-of-fit

techniques verify a lognormal distribution, the tools

for lognormal distributions should be used

Probability Plotting and Goodness-of-Fit (Cont.)

• If the data do not seem to fit a lognormal distribution, but they do seem to fit a normal distribution, the tools for normally distributed data should be used

• If the data do not seem to fit either the normal or the lognormal distribution, consider:

– Whether the SEG has been properly defined

– Whether there has been some systematic change to the underlying exposure distribution

– Falling back on descriptive and nonparametric statistics

Characterizing the Arithmetic Mean of a Lognormal Distribution

• Easy to calculate but less accurate:

– Sample mean and t-distribution confidence limits (more variable for large sample sizes)

– Maximum likelihood estimate and confidence limits (underestimate variability; limits too narrow)

• Accurate but more difficult to calculate:

– Minimum variance unbiased estimate (MVUE): point estimate only

– Land’s “exact” confidence limits: confidence limits only

Which To Use: Point Estimate of the True Mean of the Lognormal Distribution

• If a computer or programmable calculator

is available, the MVUE should be used as

the preferred point estimate of the true

mean of the lognormal distribution

• If not:

– Sample mean: when the GSD is small (<2) or there are few samples (<15-20)

– MLE: when the sample size is large (>15-20)

Which To Use: Confidence Limits for the True Mean of the Lognormal Distribution

• Land’s method:

– exact confidence limits for the true mean

– if a computer is available

• MLE method:

– if a computer is not available

– underestimates the true upper confidence limit

• Easy-to-calculate sample mean and t-distribution confidence interval:

– when many monitoring results are available (>30)

Specific Techniques: Sample Mean and t-Distribution Confidence Limits

• Sample mean as a point estimate for the exposure distribution arithmetic mean

– no computer or programmable calculator available

– few samples (<15-20) and a small GSD (<2)

• Simple t-distribution confidence interval procedure:

– Developed for use with normal distributions

– Also works well for many non-normal distributions (including the lognormal distribution) when sample sizes are large (n > 30, GSD < 1.5)

• Sample mean and t-distribution method:

– exposure distribution is better characterized by a normal distribution than a lognormal distribution

Calculation of the Sample Mean and Confidence Limits

• Step 1: Calculate the sample mean (x̄) and sample standard deviation (s).

• Step 2: Calculate the confidence limits.

• Step 3: Compare the UCL to the LTA-OEL (a calculation sketch follows the formulas below).

\mathrm{UCL}_{1,95\%} = \bar{x} + t_{0.95,\,n-1}\,\frac{s}{\sqrt{n}}

\mathrm{LCL}_{1,95\%} = \bar{x} - t_{0.95,\,n-1}\,\frac{s}{\sqrt{n}}
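A minimal sketch of Steps 1-3, assuming hypothetical monitoring results and a hypothetical LTA-OEL value:

```python
# Sketch: sample mean, one-sided 95% confidence limits from the t-distribution,
# and comparison with a long-term average OEL.
import numpy as np
from scipy import stats

x = np.array([0.24, 0.31, 0.45, 0.52, 0.61, 0.78, 0.95, 1.30])   # mg/m3, hypothetical
lta_oel = 1.0                                                     # hypothetical LTA-OEL

n = x.size
xbar, s = x.mean(), x.std(ddof=1)          # Step 1: sample mean and standard deviation
t95 = stats.t.ppf(0.95, df=n - 1)          # one-sided 95% t value
ucl = xbar + t95 * s / np.sqrt(n)          # Step 2: one-sided 95% UCL
lcl = xbar - t95 * s / np.sqrt(n)          #         one-sided 95% LCL

# Step 3: compare the UCL to the LTA-OEL
print(f"mean={xbar:.3f}  UCL1,95%={ucl:.3f}  acceptable={ucl < lta_oel}")
```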

Maximum Likelihood Estimate and Confidence Limits for the Arithmetic Mean of a Lognormal Distribution

• MLE: a better point estimate than the sample mean when there are more than 15-20 samples or the GSD is high

– Easy to calculate

– Underestimates variability in many cases

– The computed UCL should be interpreted cautiously because it will often be lower than the exact UCL

Maximum Likelihood Estimate and Confidence Limits

• Step 1: Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x).

• Step 2: Calculate the MLE:

\mathrm{MLE} = \exp\!\left[\bar{y} + \frac{1}{2}\left(\frac{n-1}{n}\right)s_y^{2}\right]

• Step 3: Calculate the UCL and/or LCL for the MLE:

\mathrm{CL} = \exp\!\left[\ln(\mathrm{MLE}) \pm t\,\frac{s_y}{\sqrt{n-1}}\right]

• Step 4: Compare the UCL to the LTA-OEL (a sketch of the calculation follows).
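A minimal sketch of the MLE steps. The confidence-limit line follows the reconstructed formula above, with a t quantile on n-1 degrees of freedom as my reading of it; the data are hypothetical.

```python
# Sketch: MLE of the arithmetic mean of a lognormal distribution with
# approximate confidence limits (often narrower than the exact limits).
import numpy as np
from scipy import stats

x = np.array([0.24, 0.31, 0.45, 0.52, 0.61, 0.78, 0.95, 1.30])   # hypothetical data
y = np.log(x)                               # Step 1: log-transform
n = y.size
ybar, sy = y.mean(), y.std(ddof=1)

mle = np.exp(ybar + 0.5 * ((n - 1) / n) * sy**2)                  # Step 2: MLE

t95 = stats.t.ppf(0.95, df=n - 1)          # assumed t quantile for the CL formula
ucl = np.exp(np.log(mle) + t95 * sy / np.sqrt(n - 1))             # Step 3: approx. UCL
lcl = np.exp(np.log(mle) - t95 * sy / np.sqrt(n - 1))             #         approx. LCL
print(f"MLE={mle:.3f}  approx UCL={ucl:.3f}  approx LCL={lcl:.3f}")  # Step 4: compare to LTA-OEL
```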

Minimum Variance Unbiased Estimate of the Arithmetic Mean of a Lognormal Distribution

• MVUE: the preferred point estimate, used routinely unless no computer available

• Calculated iteratively

• Calculation using five terms will give results correct to three significant figures for sample sizes from 5 to 500 and GSDs from 2 to 5

Minimum Variance Unbiased Estimate Procedures

• Step 1: Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x).

• Step 2: Calculate the MVUE (a sketch of the series calculation follows the formula):

\mathrm{MVUE} = \exp(\bar{y})\left[1 + \frac{n-1}{n}\,l + \frac{(n-1)^3}{n^2(n+1)}\,\frac{l^2}{2!} + \frac{(n-1)^5}{n^3(n+1)(n+3)}\,\frac{l^3}{3!} + \dots\right], \quad \text{where } l = \frac{s_y^2}{2}
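A minimal sketch of the MVUE series as reconstructed above, using the terms through l³/3!; further terms follow the same pattern. The data are hypothetical.

```python
# Sketch: minimum variance unbiased estimate (MVUE) of the arithmetic mean of
# a lognormal distribution, using the leading terms of the series above.
import numpy as np

def mvue_lognormal_mean(x):
    """MVUE of the arithmetic mean of a lognormal distribution."""
    y = np.log(x)
    n = y.size
    ybar = y.mean()
    l = y.var(ddof=1) / 2.0                  # l = s_y^2 / 2
    series = (1
              + (n - 1) / n * l
              + (n - 1)**3 / (n**2 * (n + 1)) * l**2 / 2.0              # l^2 / 2!
              + (n - 1)**5 / (n**3 * (n + 1) * (n + 3)) * l**3 / 6.0)   # l^3 / 3!
    return np.exp(ybar) * series

x = np.array([0.24, 0.31, 0.45, 0.52, 0.61, 0.78, 0.95, 1.30])  # hypothetical data
print(f"MVUE of the arithmetic mean = {mvue_lognormal_mean(x):.3f}")
```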

Land’s “Exact” Estimate of the Arithmetic Mean Confidence Limits for a Lognormal Distribution

• Land’s exact method gives the most accurate and least-biased estimate and should be used whenever possible.

• Hewett & Ganser graphic technique:

– Used for interpolating one of the parameters needed for the calculation

– Equations are available to approximate the curves in the graphs

• Step 1: Calculate the mean (ȳ) and standard deviation (s_y) of the log-transformed data, where y = ln(x).

• Step 2: Obtain the C-factor for Land’s formula (C(s_y, n, 0.05) for the 95% LCL and C(s_y, n, 0.95) for the 95% UCL).

• Step 3: Calculate the 95% UCL (or 95% LCL).

• Step 4: Compare the 95% UCL to the LTA-OEL (a sketch follows the formula below).

Land’s “Exact” Estimate Procedure

\mathrm{CL} = \exp\!\left[\ln(\hat{u}) + \frac{C\,s_y}{\sqrt{n-1}}\right], \quad \text{where } \hat{u} = \exp\!\left(\bar{y} + \tfrac{1}{2}s_y^{2}\right)
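A minimal sketch of Land's procedure. The C-factor must be taken from Land's tables or the Hewett & Ganser approximations; the value used here is a placeholder, not a real table entry, and the data are hypothetical.

```python
# Sketch: Land's confidence limit for the arithmetic mean of a lognormal
# distribution, given an externally supplied C-factor.
import numpy as np

def lands_cl(x, c_factor):
    """Confidence limit by Land's method, given C(s_y, n, alpha)."""
    y = np.log(x)
    n = y.size
    ybar, sy = y.mean(), y.std(ddof=1)       # Step 1: log-transformed mean and SD
    u_hat = np.exp(ybar + 0.5 * sy**2)       # point estimate u-hat
    return np.exp(np.log(u_hat) + c_factor * sy / np.sqrt(n - 1))   # Step 3

x = np.array([0.24, 0.31, 0.45, 0.52, 0.61, 0.78, 0.95, 1.30])   # hypothetical data
C_UPPER = 2.2                                # placeholder for C(s_y, n, 0.95) from the tables
print(f"Land's 95% UCL = {lands_cl(x, C_UPPER):.3f}")             # Step 4: compare to LTA-OEL
```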