theory of sampling: general principles for sampling ... · pdf file• th t f li d bthe...
TRANSCRIPT
03.10.2008
1
Theory of Sampling:General principles for sampling
Reliable Data for Waste Management 25-26, 2008, Vienna
General principles for sampling heterogeneous materials
Pentti MinkkinenAtmospheric emissions
Lappeenranta University of TechnologyE-mail: [email protected]
Copyright @ Pentti Minkkinen
WastewaterSolid wastes
03.10.2008
2
Outline
Sources of sampling errorCorrect sampling
Incorrect sampling
Definition of basic concepts (representativity, heterogeneity)
Design and optimization of samplingDesign and optimization of sampling procedures
03.10.2008
3
LOT
Primary Secondary Analysis Resultsample sample
μ
(σ)
x
sample sample
s1 s2 s3 sx
Propagation of errors: ∑= 2ss( ) Propagation of errors:
Example:
GOAL: x = μ
∑= ix ss
%5.5(%)30%)1(%)2(%)5( 2222 ==++=xs
Analytical process usually contains several sampling and samplepreparation steps, which should produce a representative sample
03.10.2008
4
When sample is representative?Definition: sample is representative, if
2222 )()()( refrSEsSEmSEr ≤+=Bias Precision
• Bias is minimized by using correct sampling• Bias is minimized by using correct sampling procedures. Key is probabilistic sampling, i.e., all constituents of target material must have an equal chance to end up into the sample.
• s(SE) standard deviation of sampling errorss(SE), standard deviation of sampling errors caused by random variation is estimated by using methods developed in sampling theory.
03.10.2008
5
Heterogeneity
Heterogeneity generates all sampling errors, except the preparation errors, PE
Two types of heterogeneities defined by Gy:• Constitution heterogeneity, CH
– SD of between smallest fragments of target material
• Distribution heterogeneity DH• Distribution heterogeneity, DH– Non-random distribution of the constituents in
target
03.10.2008
6
15
20
0 10 20 30 40 50 60 70 80 90 1000
5
10
x
y
15
20
0 10 20 30 40 50 60 70 80 90 1000
5
10
15
x
y
Two lots having the same mean aL, CHL, but different CHL
03.10.2008
7
Experimental estimation of heterogeneity, h, at sample scale
N individual fragments (if large enough), or groups of g ( g g ), g pfragments, are collected and analyzed. Heterogeneity is defined as
Ni ,,2,1 K=s
L
Lii M
Ma
aah i−= ,
wheresL Ma
= analytical result (mass fraction) of sample iiais
M = the size of sample iM = mean sample sizesM = mean sample size
∑∑=
is
iis
M
aMLa = estimate of the lot mean
03.10.2008
8
• Standard deviation of the heterogeneity, h,is the estimate of the relative standard deviation of the constitution heterogeneity
• Characterization of the distribution• Characterization of the distribution heterogeneity of flowing streams is shown later
03.10.2008
9
Sample
Dimensionality of sampling targets (lots)
Idealmixing
0-D
Sampling target 0-dimensional, if 1) The whole target can be taken as the sample 2) If the lot to be sampled can be mixed before sampling it can be treated as a 0-dimensional lot.
Fundamental Sampling Error determines the correct sampling error of 0-D targets and it can be estimated by using binomial or Poisson distributions as models, or by using Gy’s fundamental sampling error equations.
03.10.2008
10
1-D
Samples
2-D
If the lot cannot be mixed before sampling the dimensionality of the lot depends on how the samples are delimited and cut from the lot. Auto-correlation has to be taken into account in sampling error estimationcorrelation has to be taken into account in sampling error estimation.
DH is likely the dominating source of sampling errors.
03.10.2008
11
Samples
3 D3-D
Lot is 3-dimensional, if none of the dimensions is completely included in the sample. DH is likely the dominating source of sampling error.
03.10.2008
12
- in real life: waste materials often have extreme- in real life: waste materials often have extreme heterogeneity, both CH and DH
03.10.2008
13
Global Estimation ErrorGEE
Total Sampling ErrorTSE
Total Analytical ErrorTAE
Point Materialization ErrorPME
Weighting ErrorSWE
Increment Delimi-tation Error
IDE
Increment Extraction ErrorIEE
Increment and SamplePreparation Error
IPE
Point Selection ErrorPSE
Long Range Point Selection Error
PSE1
Periodic Point Selection Error
PSE2
Grouping and
Error components of analytical determination according to P. Gy
Fundamental Sampling Error
FSE
Grouping and Segregation Error
GSEGEE=TSE +TAETSE= (PSE+FSE+GSE)+(IDE+IEE+IPE)+SWE
03.10.2008
14
Weighting errorWeighting error is made if the lot mean aL is estimated as simple arithmetic mean,
whena
Lia ∑= , when
• the sampling target consists of several sub-strata of different sizes (or densities)• in process analysis when the flow-rate varies
nLa =
in process analysis, when the flow-rate varies during the sampling campaign
03.10.2008
15
Correct way: Weighted mean
∑∑=
is
iis
M
aMLa
isM
ia
where = sample mass (cut proportionally to flow-rate)or flow-rate at the time of sampling
= concentration of analyte in sample i
03.10.2008
16
5
10R
(%)
Simulation results
-5
0
5
RE
LAT
IVE
ER
RO
R
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-10
CORRELATION COEFFICIENT
Weighting error is systematic with structurallyWeighting error is systematic with structurally (red) and circumstantially (blue) correlated data, but is random for uncorrelated (black) data. 10 % of data sampled.
03.10.2008
17
SAMPLE MATERIALIZATION:Incorrect sample delimitation
Incorrect sample profileCutter movement
03.10.2008
18
Correct sample delimitation
Correct sample profilesCutter movement
03.10.2008
19
v = constant ≤ 0.6 m/s
ba v
if d > 3 mm, b ≥ 3d = b0if d < 3 mm, b ≥ 10 mm = b0
d = diameter of largest particles
Correct design for proportional l
bc
b0 = minimum opening of the sample cutter
sampler:correct increment extraction
03.10.2008
20
Fundamental sampling error, FSE
Th t f li d b• The component of sampling error caused by random variation in an ideally mixed material consisting of different kind of particles, i.e., the error of ideal sampling process.
• FSE is the only sampling error component that• FSE is the only sampling error component that cannot be completely eliminated.
• FSE is function of the number of the analyte particles in the sample (i.e., sample size).
• FSE can be used as the TSE estimate only for 0-• FSE can be used as the TSE estimate only for 0-dimensional sampling targets
03.10.2008
21
P. Gy’s Fundamental Sampling Error ModelSampling constant
Nominal particle size (95 % top size)
)11(32
LSr MM
Cd −=σ .. Relative varianceof FSE
Nominal particle size (95 % top size)
L i
sr M
Cd 32 ≈σ if Ms << ML Simplifies to
Sample sizeLot size
La
ar
σσ = = relative standard deviationwhere
aL = average concentration of the analyte in the lot
03.10.2008
22
ESTIMATION OF SAMPLING CONSTANT C
1) THEORETICALLY, if some material properties can be estimated (particle shape particle size distribution meanestimated (particle shape, particle size distribution, mean concentration of the determinand in the lot, concentration of determinand in the rich or critical particles that contain it, density of critical particles and density of matrix particles)
2) EXPERIMENTALLY f i t i l b l i2) EXPERIMENTALLY for a given material by analyzing, say 30 to 50 randomly sampled particles (if large enough), or by analyzing small groups of particles with mean size of Ms. From the results the experimental heterogeneities, hi, and their standard deviation, sh are calculated.
3
2
dsM
C hs=
03.10.2008
23
Variance of the mean in stratified sampling
22 NN
= size of stratum (= No. of potential samples)= NO of samples taken
iNin
21
22
2
22
1
21
1
112
11 nns
NnN
ns
NnNsx −
−+
−−
=
NO. of samples taken= between strata variances= within-strata variances determination variances)
in21s22s
1jajos 222112 2
2 >>>>== NnNnNs s
If every stratum is sampled the between-strata variance is eliminated from the lot mean:
1ja,jos, 2221121>>>>⋅ NnNnNs nNx
03.10.2008
24
Estimation of point selection error, PSE
PSE is the error of the mean of a continuous lot estimated by using
PSEdepends on sample selection strategy, if consecutive values are autocorrelated. Selection options:
randomf d d
y gdiscrete samples.
stratified randomstratified systematic.
Point selection error has two components: PSE = PSE1 + PSE2PSE1 ... error component caused by random driftPSE d b li d ifPSE2 ... error component caused by cyclic drift
Statistics of correlated series is needed to evaluate the sampling variance.
03.10.2008
25
100
0 5 10 15 20 25 300
50C
ON
CEN
TRAT
ION
TIME
Random selection
TIME
0 5 10 15 20 25 300
50
100
CO
NC
ENTR
ATIO
N
Stratified selection
TIME
0 5 10 15 20 25 300
50
100
CO
NC
ENTR
ATIO
N
Systematic selection0 5 10 15 20 25 30
TIME
03.10.2008
26
When sampling autocorrelated series the same number of samples gives different uncertainties for the mean depending on selection strategy
Random sampling: n
ss p
x=
Stratified sampling:ss str
x =
sp is the process standard deviation, sstr and ssys standard deviation estimates
nx
Systematic sampling:n
ss sys
x =
where the autocor-relation has been taken into account.
Normally sp > sstr > ssys ,
except in periodic processes, where ssys may be the largest
03.10.2008
27
Systematic sampling from periodic process
0 50 100 150 200 250 300 350 400 450 5000
5
ai
TIME
aL = 0asample= 0.996
0 50 100 150 200 250 300 350 400 450 5000
5
ai
TIME
aL = 0asample= 0.689
TIME
If too low sampling frequency is used in sampling periodicprocesses there is always a danger that the mean is biased
03.10.2008
28
Estimation of PSE by variographyVariogaphic experiment: N samples collected at equal distances and analyzed, are analytical results, MMa
isi ,,
Heterogeneity of the process:
Ni ,,2,1 K=s
s
L
Lii M
Ma
aah i−= ,
Mean of the process:∑= iis aMa
sample sizes and mean sample size, respectively.i
Variogram of heterogeneity as function of sampling interval j :
( ) ( )∑−
+ −=jN
ijij hhjN
V 2
21
,,2,1 Nj K=,
Mean of the process: ∑=isMLa
( ) ( )∑=− i
jj jN 12 2,,,j,
To estimate variances the variogram has to be integrated (numericallyin Gy’s method)
03.10.2008
29
0.005
0.01
0.015
0.04V
A
0
0.02
0
0.01
0.02
VV
B
C
0
0 5 10 15 20 25 30 35 40 45 500
0.01
0.02
SAMPLE INTERVAL
V
D
Shapes of variograms: A. Random process; B. Process with non-periodic drift; C. Periodic process; D. Complex process
03.10.2008
30
Estimation of sulfur in wastewater stream
0
0.5
1h i
0 5 10 15 20 25 300.5
DAYS
Standard deviation of the heterogeneity of the process, sp = 0.282 =28.2 %
03.10.2008
31
0.15
0.05
0.1
V i
0 2 4 6 8 10 12 14 160
Sample interval (d)
Variogram of sulfur in wastewater stream
03.10.2008
32
s str20
25
s sys
5
10
15
20s r
(%)
0 5 10 150
5
Sample interval (d)
Relative standard deviation estimates, which take auto-correlation into account
03.10.2008
33
Estimate the uncertainty of the annual mean, if one sample/ week is analyzed by using systematic sample selection
Sampling interval 7 d s 7 8 %Sampling interval = 7 d ssys = 7.8 %Number of samples/y = n =52
Standard deviation of the annual mean = %1.152%8.7===
ns
s sysx
Expanded uncertainty = %2.2295.0 == xsU
Process standard deviation was 28.2 %. If the number of samples is estimated by using normal approximation (or samples are selected completely randomly) the required number of samples is for the samecompletely randomly) the required number of samples is for the sameuncertainty:
657%)1.1(%)2.28(
2
2
2
2
===x
p
ss
n
03.10.2008
34
CONCLUSIONSSampling uncertainty can be, and should be estimated
If the sampling uncertainty is not known it is questionable whether the sample should be analyzed at all
Cut sample in such a way thar you can treat the target as 0-D or 1-D target (probabilistic sampling!!)
Sampling nearly always takes a significant part of thetotal uncertainty budget
Optimization of sampling and analytical procedures mayresult significant savings, or better results, includingresult significant savings, or better results, including scales from laboratory procedures and process sampling to large national surveys
03.10.2008
35
DANKE SCHÖN
THANK YOU
DANKE SCHÖN