Groundwater. Notes on geostatistics
Monica Riva, Alberto Guadagnini
Politecnico di Milano, Italy
Key reference:
de Marsily, G. (1986), Quantitative Hydrogeology. Academic Press, New York, 440 pp
In practice: random spatial variability of hydrogeologic medium properties, and the stochastic nature of corresponding flow (hydraulic head, fluid flux and velocity) and transport (solute concentration, solute flux and velocity) variables, are often ignored.
Instead, the common approach has been to analyse flow and transport in multiscale, randomly heterogeneous soils and rocks deterministically.
Yet with increasing frequency, the popular deterministic approach to hydrogeologic analysis is proving to be inadequate.
Modelling flow and transport in heterogeneous media: motivation and general idea
Understanding the role of heterogeneity
Jan 2000 editorial "It's the Heterogeneity!" (Wood, W.W., It's the Heterogeneity!, Editorial, Ground Water, 38(1), 1, 2000): heterogeneity of chemical, biological, and flow conditions should be a major concern in any remediation scenario.
Many in the groundwater community either failed to "get" the message or were forced by political considerations to provide rapid, untested, site-specific active remediation technology.
"It's the heterogeneity," and it is the Editor's guess that the natural system is so complex that it will be many years before one can effectively deal with heterogeneity on societally important scales.
Panel of experts (DOE/RL-97-49, April 1997): As flow and transport are poorly understood, previous and ongoing computer modelling efforts are inadequate and based on unrealistic and sometimes optimistic assumptions, which render their output unreliable.
Flow and Transport in Multiscale Fields (conceptual)
Field- and laboratory-derived conductivities and dispersivities appear to vary continuously with the scale of observation (conductivity support, plume travel distance). Anomalous transport.
Recent theories attempt to link such scale-dependence to the multiscale structure of Y = ln K.
Predict observed effect of domain size on apparent variance and integral scale of Y.
Predict observed supra-linear growth rate of dispersivity with mean travel distance (time).
Major challenge: develop more powerful/general stochastic theories/models for multiscale random media, and back them with lab/field observation.
Shed some light
Conceptual difficulty:
Data deduced by means of deterministic Fickian models from laboratory and field tracer tests in a variety of porous and fractured media, under varied flow and transport regimes.
Linear regression:
$a_{La} \approx 0.017\, s^{1.5}$
Supra-linear growth
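A minimal sketch of how a power-law regression of this kind can be obtained with a straight-line fit in log-log space. It assumes NumPy; `fit_power_law` is a hypothetical helper name, and the data are synthetic and noise-free, generated from the coefficients 0.017 and 1.5 shown on the slide rather than from the tracer-test data set.

```python
import numpy as np

def fit_power_law(s, a):
    """Fit a = c * s**b by least squares on log10(a) versus log10(s); return (c, b)."""
    b, log_c = np.polyfit(np.log10(s), np.log10(a), 1)
    return 10.0 ** log_c, b

# Synthetic, noise-free points (NOT field data) built from the slide's coefficients
s = np.array([1.0, 10.0, 100.0, 1000.0])   # scale of observation
a = 0.017 * s ** 1.5                       # dispersivity following the power law
c, b = fit_power_law(s, a)                 # recovers the coefficient and the exponent
```

An exponent b larger than 1 is what the slide calls supra-linear growth.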
Neuman S.P., On advective transport in fractal permeability and velocity fields, Water Res. Res., 31(6), 1455-1460, 1995.
Natural Variability. Geostatistics revisited
• Introduction: Few field findings about spatial variability
• Regionalized variables
• Interpolation methods
• Simulation methods
AVRA VALLEY
Clifton and Neuman, 1982
Clifton, P.M., and S.P. Neuman, Effects of Kriging and Inverse Modeling on Conditional Simulation of the Avra Valley Aquifer in southern Arizona, Water Resour. Res., 18(4), 1215-1234, 1982.
Regional Scale
Columbus Air Force [Adams and Gelhar, 1992]
Aquifer Scale
Mt. Simon aquifer
Bakr, 1976
Local Scale
Summary: Variability is present at all scales
But, what happens if we ignore it? We will see in this class that this would lead to interpretation problems in both groundwater flow and solute transport phenomena
Examples in transport:
- Scale effects in dispersion
- New processes arising
Heterogeneous parameters: ALL (T, K, φ, S, v (q), BC, ...)
Most relevant one: T (2D), or K (3D), as they have been shown to vary orders of magnitude in an apparently homogeneous aquifer
Variability in T and/or K
Summary of data from many different places in the world. Careful though! Data are not always obtained with rigorous procedures, and moreover, as we will see throughout the course, data depend on interpretation method and scale of regularization
Data given in terms of mean and variance (dispersion around the mean value)
Variability in T and/or K
Almost always σlnT (or σlnK) < 2 (and in most cases < 1). This can be questioned, but OK for now.
Correlation scales (very important concept later!!)
But, what is the correct treatment for natural heterogeneity?
First of all, what do we know?
- real data at (few) selected points
- Statistical parameters
- A huge uncertainty related to the lack of data in most of the aquifer. If the parameters are continuous (of course they are), then the number of locations without data is infinite
Note: The value of K at any point DOES EXIST. The problem is we do not know it (we could if we measured it, but we could never be exhaustive anyway)
Stochastic approach: K at any given point is RANDOM, coming from a predefined (maybe known, maybe not) pdf, and spatially correlated: REGIONALIZED VARIABLE
Regionalized Variables
T(x,ω) is a Spatial Random Function (SRF) iff:
- If ω = ω0 then T(x,ω0) is a spatial function (continuity?, differentiability?)
- If x = x0 then T(x0) (actually T(x0, ω)) is a random variable
Thus, as a random variable, T(x0) has a univariate distribution (log-normal according to Law, 1944; Freeze, 1975)
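With $Y = \ln T(\mathbf{x}_0)$, mean $\mu$ and variance $\sigma^2$:

$$f(Y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(Y-\mu)^2}{2\sigma^2}\right)$$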
Hoeksema and Kitanidis, 1985
Log-T normal, log-K normal
Both consolidated and unconsolidated deposits
Now we look at T(x), so we are interested in the multivariate distribution of T(x1), T(x2), ... T(xn):
Most frequent hypothesis:
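Y = (Y(x1), Y(x2), ... Y(xn)) = (ln T(x1), ln T(x2), ... ln T(xn)) is multinormal

$$f(\mathbf{Y}) = \frac{1}{(2\pi)^{n/2}\,|\mathbf{C}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{Y}-\boldsymbol{\mu})^t\,\mathbf{C}^{-1}\,(\mathbf{Y}-\boldsymbol{\mu})\right]$$

with

$$\boldsymbol{\mu} = \left(E(Y(\mathbf{x}_1)), \ldots, E(Y(\mathbf{x}_n))\right)$$

and C the variance-covariance matrix

$$C_{ij} = \mathrm{Cov}(Y(\mathbf{x}_i), Y(\mathbf{x}_j)) = E\left[(Y(\mathbf{x}_i)-\mu_i)\,(Y(\mathbf{x}_j)-\mu_j)\right]$$

But most important: NO INDEPENDENCE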
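A minimal sketch of what "multinormal with a full covariance matrix" means in practice: correlated realizations of Y = ln T can be drawn by factoring C. It assumes NumPy, 1-D locations and an exponential covariance with made-up parameters; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def exponential_covariance(x, variance, corr_scale):
    """C_ij = variance * exp(-|x_i - x_j| / corr_scale) for 1-D locations x (assumed model)."""
    lag = np.abs(x[:, None] - x[None, :])
    return variance * np.exp(-lag / corr_scale)

def sample_multinormal(mu, C, n_real, rng):
    """Draw n_real realizations Y = mu + L z, where C = L L^t (Cholesky) and z ~ N(0, I)."""
    L = np.linalg.cholesky(C)
    z = rng.standard_normal((len(mu), n_real))
    return (mu[:, None] + L @ z).T            # shape (n_real, n_points)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 6)                 # illustrative locations
mu = np.zeros(x.size)
C = exponential_covariance(x, variance=1.0, corr_scale=2.0)
Y = sample_multinormal(mu, C, n_real=20000, rng=rng)
C_sample = np.cov(Y, rowvar=False)            # approaches C as n_real grows
```

The off-diagonal terms of `C_sample` are clearly non-zero for nearby points, which is the dependence the slide stresses.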
What if independent?

$$C_{ij} = E\left[(Y(\mathbf{x}_i)-\mu_i)\,(Y(\mathbf{x}_j)-\mu_j)\right] = \sigma^2\,\delta_{ij}$$

and then we are in classical statistics
But here we are not, so we need some way to characterize dependency of one variable at some point with the SAME variable at a DIFFERENT point. This is the concept of the SEMIVARIOGRAM (or VARIOGRAM)
Classification of SRF
• Second-order stationary
E[Z(x)] = const
C(x, y) is not a function of location (only of separation distance, h)
Particular case: isotropic SRF; $C(\mathbf{h}) = C(h)$, with $h = |\mathbf{h}|$
Anisotropic covariance: different correlation scales along different directions
Most important property: if multinormal distribution, first and second order moments are enough to fully characterize the SRF multivariate distribution
Relaxing the stationary assumption
1. The assumption of second-order stationarity with finite variance, C(0), might not be satisfied (The experimental variance tends to increase with domain size)
2. Less stringent assumption: INTRINSIC HYPOTHESIS
The variance of the first-order increments is finite AND these increments are themselves second-order stationary. Very simple example: hydraulic heads ARE non-intrinsic SRF
E[Y(x + h) – Y(x)] = m(h)
var[Y(x + h) – Y(x)] = 2γ(h)
Independent of x; only function of h
Usually: m(h) = 0; if not, just define a new function, Y(x) – m(x), which satisfies this condition
Definition of variogram, γ(h)
E[Y(x + h) – Y(x)] = 0
γ(h) = (1/2) var[Y(x + h) – Y(x)] = (1/2) E[(Y(x + h) – Y(x))²]
Variogram v. Covariance
1. The variogram is the mean quadratic increment of Y between two points separated by h.
2. Compare the INTRINSIC HYPOTHESIS with SECOND-ORDER STATIONARITY
E[Y(x)] = m = constant
γ(h) = (1/2) E[(Y(x + h) – Y(x))²] = (1/2) (E[Y(x + h)²] + E[Y(x)²] – 2 m² – 2 E[Y(x + h) Y(x)] + 2 m²) = C(0) – C(h)
[Figure: covariance and variogram plotted against h (h from 0 to 5, values from 0 to 1)]
The variogram
The definition of the Semi-Variogram is usually given by the following probabilistic formula

$$\gamma(\mathbf{h}) = \frac{1}{2}\, E\left[\left(Y(\mathbf{x}+\mathbf{h}) - Y(\mathbf{x})\right)^2\right]$$

When dealing with real data the semi-variogram is estimated by the Experimental Semi-Variogram.
For a given separation vector, h, there is a set of observation pairs that are approximately separated by this distance. Let the number of pairs in this set be N(h).
The experimental semi-variogram is given by:

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{N(\mathbf{h})} \left(Y_i - Y_j\right)^2$$
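A minimal sketch of the experimental semi-variogram for 1-D locations, assuming NumPy. Pairs are grouped into lag bins because real data are only approximately separated by a given h; the function name, the binning scheme and the tiny data set are illustrative assumptions. The data are a plain linear trend, used here only so the arithmetic can be checked by hand.

```python
import numpy as np

def experimental_semivariogram(x, y, lag_edges):
    """Return (mean lag, gamma_hat, N(h)) for each non-empty lag bin [lo, hi)."""
    i, j = np.triu_indices(len(x), k=1)        # every pair counted once
    dist = np.abs(x[i] - x[j])
    sq_inc = (y[i] - y[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        in_bin = (dist >= lo) & (dist < hi)
        n = int(in_bin.sum())
        if n == 0:
            continue                           # no pairs at this separation
        lags.append(dist[in_bin].mean())
        gammas.append(sq_inc[in_bin].sum() / (2.0 * n))
        counts.append(n)
    return np.array(lags), np.array(gammas), np.array(counts)

# Hand-checkable example: y = 2x sampled at unit spacing
x = np.arange(5.0)
y = 2.0 * x
lags, gamma_hat, n_pairs = experimental_semivariogram(x, y, np.array([0.5, 1.5, 2.5]))
```

At lag 1 there are 4 pairs with increment 2, so the estimate is 4·4/(2·4) = 2; at lag 2 there are 3 pairs with increment 4, giving 8.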
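Some comments on the variogram
If Z(x) and Z(x+h) are totally independent, then

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\left(Z(\mathbf{x}+\mathbf{h}) - Z(\mathbf{x})\right)^2\right] = \frac{1}{2} E\left[Z^2(\mathbf{x}+\mathbf{h})\right] + \frac{1}{2} E\left[Z^2(\mathbf{x})\right] - E\left[Z(\mathbf{x}+\mathbf{h})\right] E\left[Z(\mathbf{x})\right] = \frac{1}{2}\sigma_1^2 + \frac{1}{2}\sigma_2^2 + \frac{1}{2}(m_1 - m_2)^2$$

where $m_1, \sigma_1^2$ and $m_2, \sigma_2^2$ are the mean and variance of Z(x) and Z(x+h).
In the stationary case: $\gamma(\mathbf{h}) = \sigma^2$ (ct; ≠ 0)

If Z(x) and Z(x+h) are totally dependent, then

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\left(Z(\mathbf{x}) - Z(\mathbf{x}+\mathbf{h})\right)^2\right] = 0$$

One particular case is when x = x+h. Therefore, by definition

$$\gamma(\mathbf{0}) = 0$$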
Variogram Models
DEFINITIONS:
•Nugget
•Sill
•Range
•Integral distance or correlation scale
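Models:
• Pure Nugget: $\gamma(h) = C_0$
• Spherical: $\gamma(h) = \sigma^2\left(1.5\,(h/a) - 0.5\,(h/a)^3\right)$ for $h \le a$, and $\gamma(h) = \sigma^2$ for $h > a$
• Exponential: $\gamma(h) = \sigma^2\left(1 - \exp(-h/a)\right)$
• Gaussian: $\gamma(h) = \sigma^2\left(1 - \exp(-h^2/a^2)\right)$
• Power: $\gamma(h) = a\,h^b$
[Figure: γ(h) plotted against h]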
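The model list above can be sketched as plain functions, assuming NumPy and an isotropic lag h ≥ 0. Function and argument names are illustrative; the pure nugget is coded with γ(0) = 0 so that it agrees with the definition of the variogram at zero lag, which is a choice made here rather than something stated on the slide.

```python
import numpy as np

def pure_nugget(h, c0):
    """gamma = c0 for any h > 0, and 0 at h = 0."""
    return np.where(np.abs(h) > 0.0, c0, 0.0)

def spherical(h, sill, a):
    """Reaches the sill exactly at the range a and stays there."""
    r = np.minimum(np.abs(h) / a, 1.0)
    return sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, sill, a):
    """Approaches the sill asymptotically."""
    return sill * (1.0 - np.exp(-np.abs(h) / a))

def gaussian(h, sill, a):
    """Parabolic behaviour near the origin (very smooth fields)."""
    return sill * (1.0 - np.exp(-(np.abs(h) / a) ** 2))

def power(h, a, b):
    """No sill: the variogram keeps growing with h."""
    return a * np.abs(h) ** b

h = np.array([0.0, 1.0, 2.0, 5.0])
gamma_sph = spherical(h, sill=1.0, a=2.0)
gamma_exp = exponential(h, sill=1.0, a=2.0)
```

Only the models with a sill (spherical, exponential, Gaussian) correspond to a second-order stationary field; the power model needs the intrinsic hypothesis.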
Correlation scales: Larger in T than in K. Larger in horizontal than in vertical. Fraction of the domain of interest
Additional comments
• Second-order stationary
E[Z(x)] = constant
γ(h) is not a function of location
Particular case: isotropic SRF; $\gamma(\mathbf{h}) = \gamma(h)$, with $h = |\mathbf{h}|$
Anisotropic variograms: two types of anisotropy depending on correlation scale or sill value
Important property: γ(h) = σ² – C(h)
Most important property: if multinormal distribution, first and second order moments are enough to fully characterize the SRF multivariate distribution
Estimation vs. Simulation
Problem: Few data available, maybe we know mean, variance and variogram
Alternatives:
(1) Estimation (interpolation) problems: KRIGING
Kriging: BLUE (Best Linear Unbiased Estimator)
Extremely smooth
Many possible krigings
Alternative: cokriging
http://www-sst.unil.ch/research/variowin/
The kriging equations - 1
We want to predict the value, Z(x0), at an unsampled location, x0, using a weighted average of the observed values at N neighboring locations, {Z(x1), Z(x2), ..., Z(xN)}. Let Z*(x0) represent the predicted value; a weighted average estimator can be written as

$$Z_0^* = Z^*(\mathbf{x}_0) = \sum_{i=1}^{N} \lambda_0^i\, Z(\mathbf{x}_i) = \sum_{i=1}^{N} \lambda_0^i\, Z_i$$

where $\lambda_0^i$ are the weights. The associated estimation error is

$$Z_0^* - Z_0 = Z^*(\mathbf{x}_0) - Z(\mathbf{x}_0)$$

In general, we do not know the (constant) mean, m, in the intrinsic hypothesis. We impose the additional condition of equivalence between the mathematical expectation of Z* and Z0.

$$E(Z_0^* - Z_0) = 0$$
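The kriging equations - 2

$$E\left[\sum_{i=1}^{N} \lambda_0^i\, Z_i\right] = E[Z_0] = m$$

This condition allows obtaining an unbiased estimator.

$$\sum_{i=1}^{N} \lambda_0^i\, E[Z_i] = \sum_{i=1}^{N} \lambda_0^i\, m = m \quad\Rightarrow\quad \sum_{i=1}^{N} \lambda_0^i = 1$$

Here m is the unknown mathematical expectation of the process Z.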
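The kriging equations - 3

$$E\left[(Z_0^* - Z_0)^2\right] = E\left[\left(\sum_{i=1}^{N} \lambda_0^i Z_i - Z_0\right)^2\right] = E\left[\left(\sum_{i=1}^{N} \lambda_0^i Z_i - \sum_{i=1}^{N} \lambda_0^i Z_0\right)^2\right]$$

(the last step uses $\sum_i \lambda_0^i = 1$).
We wish to determine the set of weights. IMPOSE the condition

$$\mathrm{var}(Z_0^* - Z_0) \quad \text{minimum}$$

$$E\left[\left(\sum_{i=1}^{N} \lambda_0^i (Z_i - Z_0)\right)^2\right] = E\left[\sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j (Z_i - Z_0)(Z_j - Z_0)\right] = \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, E\left[(Z_i - Z_0)(Z_j - Z_0)\right]$$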
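The kriging equations - 4
We then use the definition of variogram

$$\gamma(\mathbf{x}_i - \mathbf{x}_j) = \frac{1}{2} E\left[(Z_i - Z_j)^2\right] = \frac{1}{2} E\left[\left((Z_i - Z_0) - (Z_j - Z_0)\right)^2\right]$$

$$= \frac{1}{2} E\left[(Z_i - Z_0)^2\right] + \frac{1}{2} E\left[(Z_j - Z_0)^2\right] - E\left[(Z_i - Z_0)(Z_j - Z_0)\right]$$

$$= \gamma(\mathbf{x}_i - \mathbf{x}_0) + \gamma(\mathbf{x}_j - \mathbf{x}_0) - E\left[(Z_i - Z_0)(Z_j - Z_0)\right]$$

THEN:

$$E\left[(Z_i - Z_0)(Z_j - Z_0)\right] = \gamma(\mathbf{x}_i - \mathbf{x}_0) + \gamma(\mathbf{x}_j - \mathbf{x}_0) - \gamma(\mathbf{x}_i - \mathbf{x}_j)$$

Which I will use into:

$$E\left[(Z_0^* - Z_0)^2\right] = \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, E\left[(Z_i - Z_0)(Z_j - Z_0)\right]$$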
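The kriging equations - 5
By substitution

$$E\left[(Z_0^* - Z_0)^2\right] = \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma(\mathbf{x}_i - \mathbf{x}_0) + \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma(\mathbf{x}_j - \mathbf{x}_0) - \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma(\mathbf{x}_i - \mathbf{x}_j)$$

Noting that (because $\sum_j \lambda_0^j = 1$):

$$\sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma(\mathbf{x}_i - \mathbf{x}_0) = \sum_{i=1}^{N} \lambda_0^i\, \gamma(\mathbf{x}_i - \mathbf{x}_0)$$

We finally obtain:

$$E\left[(Z_0^* - Z_0)^2\right] = -\sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma(\mathbf{x}_i - \mathbf{x}_j) + 2\sum_{i=1}^{N} \lambda_0^i\, \gamma(\mathbf{x}_i - \mathbf{x}_0)$$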
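The kriging equations - 6
This is a constrained optimization problem. To solve it we use the method of Lagrange Multipliers from the calculus of variations. Writing $\gamma[Z(\mathbf{x}_i), Z(\mathbf{x}_j)]$ for $\gamma(\mathbf{x}_i - \mathbf{x}_j)$, the Lagrangian objective function is

$$L(\lambda_0^1, \ldots, \lambda_0^N, \mu) = 2\sum_{i=1}^{N} \lambda_0^i\, \gamma[Z(\mathbf{x}_i), Z(\mathbf{x}_0)] - \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma[Z(\mathbf{x}_i), Z(\mathbf{x}_j)] - 2\mu\left(\sum_{i=1}^{N} \lambda_0^i - 1\right)$$

To minimize this we must take the partial derivative of the Lagrangian with respect to each of the weights and with respect to the Lagrange multiplier, and set the resulting expressions equal to zero, yielding a system of linear equations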
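The kriging equations - 7
Minimize this:

$$L(\lambda_0^1, \ldots, \lambda_0^N, \mu) = 2\sum_{i=1}^{N} \lambda_0^i\, \gamma[Z(\mathbf{x}_i), Z(\mathbf{x}_0)] - \sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma[Z(\mathbf{x}_i), Z(\mathbf{x}_j)] - 2\mu\left(\sum_{i=1}^{N} \lambda_0^i - 1\right)$$

and get (N+1) linear equations with (N+1) unknowns

$$\begin{cases}
\sum_{j=1}^{N} \lambda_0^j\, \gamma[Z(\mathbf{x}_1), Z(\mathbf{x}_j)] + \mu = \gamma[Z(\mathbf{x}_1), Z(\mathbf{x}_0)] \\
\sum_{j=1}^{N} \lambda_0^j\, \gamma[Z(\mathbf{x}_2), Z(\mathbf{x}_j)] + \mu = \gamma[Z(\mathbf{x}_2), Z(\mathbf{x}_0)] \\
\qquad \vdots \\
\sum_{j=1}^{N} \lambda_0^j\, \gamma[Z(\mathbf{x}_N), Z(\mathbf{x}_j)] + \mu = \gamma[Z(\mathbf{x}_N), Z(\mathbf{x}_0)] \\
\sum_{j=1}^{N} \lambda_0^j = 1
\end{cases}$$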
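The kriging equations - 8
The complete system can be written as: A λ = b

$$\mathbf{A} = \begin{bmatrix}
0 & \gamma[Z(\mathbf{x}_1), Z(\mathbf{x}_2)] & \cdots & \gamma[Z(\mathbf{x}_1), Z(\mathbf{x}_N)] & 1 \\
\gamma[Z(\mathbf{x}_2), Z(\mathbf{x}_1)] & 0 & \cdots & \gamma[Z(\mathbf{x}_2), Z(\mathbf{x}_N)] & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma[Z(\mathbf{x}_N), Z(\mathbf{x}_1)] & \gamma[Z(\mathbf{x}_N), Z(\mathbf{x}_2)] & \cdots & 0 & 1 \\
1 & 1 & \cdots & 1 & 0
\end{bmatrix}, \quad
\boldsymbol{\lambda} = \begin{bmatrix} \lambda_0^1 \\ \lambda_0^2 \\ \vdots \\ \lambda_0^N \\ \mu \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} \gamma[Z(\mathbf{x}_1), Z(\mathbf{x}_0)] \\ \gamma[Z(\mathbf{x}_2), Z(\mathbf{x}_0)] \\ \vdots \\ \gamma[Z(\mathbf{x}_N), Z(\mathbf{x}_0)] \\ 1 \end{bmatrix}$$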
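A minimal sketch of assembling and solving the system A λ = b for 1-D locations, assuming NumPy and an exponential variogram with unit sill and unit range parameter. The variogram choice, the function names and the three data values are all illustrative assumptions, not data from the course.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, a=1.0):
    """Assumed variogram model, used only for illustration."""
    return sill * (1.0 - np.exp(-np.abs(h) / a))

def ordinary_kriging(x_obs, z_obs, x0, variogram):
    """Solve A lam = b; return (estimate, weights, Lagrange multiplier, error variance)."""
    n = len(x_obs)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = variogram(x_obs[:, None] - x_obs[None, :])  # gamma(x_i - x_j), zero diagonal
    A[:n, n] = 1.0            # column that multiplies the Lagrange multiplier mu
    A[n, :n] = 1.0            # unbiasedness row: weights sum to 1
    b = np.append(variogram(x_obs - x0), 1.0)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ z_obs
    variance = sol @ b        # sum_i lam_i gamma(x_i - x0) + mu
    return estimate, weights, mu, variance

x_obs = np.array([0.0, 1.0, 3.0])   # made-up sample locations
z_obs = np.array([1.0, 2.0, 0.5])   # made-up sample values
est_at_datum, w_at_datum, _, var_at_datum = ordinary_kriging(x_obs, z_obs, 1.0, exponential_variogram)
est, w, mu, var = ordinary_kriging(x_obs, z_obs, 2.0, exponential_variogram)
```

With no nugget, kriging at a sampled location returns the measured value with zero error variance, which is a convenient sanity check on the assembled matrix.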
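The kriging equations - 9
We finally get the Variance of the Estimation Error

$$\mathrm{var}(Z_0^* - Z_0) = E\left[(Z_0^* - Z_0)^2\right]$$

$$\mathrm{var}(Z_0^* - Z_0) = \sum_{i=1}^{N} \lambda_0^i\, \gamma(\mathbf{x}_i - \mathbf{x}_0) + \mu$$

$$\mathrm{Var} = \boldsymbol{\lambda}^T \mathbf{b}$$

Hint: just replace

$$\sum_{j=1}^{N} \lambda_0^j\, \gamma[Z(\mathbf{x}_i), Z(\mathbf{x}_j)] = \gamma[Z(\mathbf{x}_i), Z(\mathbf{x}_0)] - \mu$$

into

$$E\left[(Z_0^* - Z_0)^2\right] = -\sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_0^i \lambda_0^j\, \gamma(\mathbf{x}_i - \mathbf{x}_j) + 2\sum_{i=1}^{N} \lambda_0^i\, \gamma(\mathbf{x}_i - \mathbf{x}_0)$$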
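A small numerical check of the hint, assuming NumPy and the same kind of made-up 1-D setting (exponential variogram with unit sill and range parameter, three sample locations). It compares the compact form λᵀb with the full quadratic expression, and evaluates the same expression for equal weights, which also sum to 1 but are not optimal. All names and values are illustrative.

```python
import numpy as np

def gamma(h):
    """Exponential variogram with unit sill and unit range parameter (assumed)."""
    return 1.0 - np.exp(-np.abs(h))

def error_variance(weights, x_obs, x0):
    """-sum_ij lam_i lam_j gamma_ij + 2 sum_i lam_i gamma_i0, valid when weights sum to 1."""
    G = gamma(x_obs[:, None] - x_obs[None, :])
    g0 = gamma(x_obs - x0)
    return -weights @ G @ weights + 2.0 * weights @ g0

x_obs = np.array([0.0, 1.0, 3.0])
x0 = 2.0
n = x_obs.size

# Kriging system A lam = b with the Lagrange multiplier as last unknown
A = np.ones((n + 1, n + 1))
A[:n, :n] = gamma(x_obs[:, None] - x_obs[None, :])
A[n, n] = 0.0
b = np.append(gamma(x_obs - x0), 1.0)
sol = np.linalg.solve(A, b)
lam = sol[:n]

var_compact = sol @ b                                   # lambda^T b
var_quadratic = error_variance(lam, x_obs, x0)          # full expression
var_equal_weights = error_variance(np.full(n, 1.0 / n), x_obs, x0)
```

The two expressions agree for the kriging weights, and any other admissible set of weights gives an error variance that is at least as large.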
Estimation vs. Simulation (ii)
(2) Simulations: try to reproduce the “look” of the heterogeneous variable
Important when extreme values are important
Many (actually infinite) solutions, all of them equally likely (and with probability = 0 to be correct)
For each potential application we are interested in one or the other
Estimation. 1 to 5 (five figure slides): AVRA VALLEY. Regional Scale - Clifton, P.M., and S.P. Neuman, Effects of Kriging and Inverse Modeling on Conditional Simulation of the Avra Valley Aquifer in southern Arizona, Water Resour. Res., 18(4), 1215-1234, 1982.
Monte Carlo approach
NUMERICAL ANALYSIS - MONTE CARLO
[Figure: 2000 simulations of CONDITIONAL CROSS-CORRELATED FIELDS Y = lnT (three sample maps shown, axes from 10 to 90, values from -3 to 3), the corresponding solutions h1, h2, ..., h2000, and the resulting statistical CONDITIONAL moments, first and second order]
- Simple to understand
- Applicable to a wide range of linear and nonlinear problems
- High heterogeneities
- Conditioning

- Heavy calculations
- Fine computational grids
- Reliable convergence criteria (?)
Evaluation of key statistics of medium parameters (K, porosity, …)
Synthetic generation of an ensemble of equally likely fields
Solution of flow/transport problems on each one of these
Ensemble statistics
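The four steps above can be sketched for the simplest possible case: steady 1-D flow between two fixed heads through cells in series, where each realization has a closed-form solution. This is an illustrative toy under stated assumptions (NumPy, unconditional fields, exponential covariance, made-up statistics), not the 2-D conditional setup shown in the slides; all function names are hypothetical.

```python
import numpy as np

def generate_lnK_fields(x, mean, variance, corr_scale, n_real, rng):
    """Steps 1-2: assumed statistics of Y = ln K, then equally likely fields via Cholesky."""
    lag = np.abs(x[:, None] - x[None, :])
    C = variance * np.exp(-lag / corr_scale)
    L = np.linalg.cholesky(C)
    return mean + (L @ rng.standard_normal((x.size, n_real))).T

def solve_heads_1d(K, dx, h_left, h_right):
    """Step 3: heads at cell interfaces for 1-D steady flow, one row per realization."""
    resistance = dx / K                               # shape (n_real, n_cells)
    cum = np.cumsum(resistance, axis=1)
    q = (h_left - h_right) / cum[:, -1]               # flux of each realization
    heads = h_left - q[:, None] * cum
    return np.hstack([np.full((K.shape[0], 1), h_left), heads])

rng = np.random.default_rng(1)
n_cells, dx = 50, 1.0
x = (np.arange(n_cells) + 0.5) * dx                   # cell centres
Y = generate_lnK_fields(x, mean=0.0, variance=1.0, corr_scale=5.0, n_real=2000, rng=rng)
heads = solve_heads_1d(np.exp(Y), dx, h_left=1.0, h_right=0.0)

# Step 4: ensemble statistics at every interface
head_mean = heads.mean(axis=0)
head_var = heads.var(axis=0, ddof=1)
```

The head variance vanishes at the two boundaries, where the head is prescribed, and is largest in the interior of the domain.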
[Figure: hydraulic head variance (0 to 1.2) versus number of Monte Carlo simulations (1 to 1000000); sketch of the flow domain with labels x, y, Q, r, L and h = HL]
Problems: reliable assessment of convergence - Ballio and Guadagnini [2004]
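One simple way to look at convergence is to track a sample statistic as realizations are added, as in the plot of head variance against the number of simulations. The sketch below assumes NumPy and is a generic running-variance calculation; it is not the procedure of the cited paper, and `running_variance` is a hypothetical helper name.

```python
import numpy as np

def running_variance(samples):
    """Unbiased sample variance of the first n samples, for n = 2 .. len(samples)."""
    n = np.arange(1, samples.size + 1)
    s1 = np.cumsum(samples)            # running sum
    s2 = np.cumsum(samples ** 2)       # running sum of squares
    var = (s2[1:] - s1[1:] ** 2 / n[1:]) / (n[1:] - 1)
    return n[1:], var

# Hand-checkable input: variances of [1,2], [1,2,3], [1,2,3,4]
n_used, var_n = running_variance(np.array([1.0, 2.0, 3.0, 4.0]))
```

For this input the running variances are 0.5, 1.0 and 5/3. The sum-of-squares formula can lose precision when the mean is large compared with the spread, so an updating scheme such as Welford's is a safer choice for long runs.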