an adjoint -free approach to parameter estimation … · an adjoint -free approach to parameter...
TRANSCRIPT
1
AN ADJOINT-FREE APPROACH TO PARAMETER ESTIMATION
Arnold Heemink
Delft University of Technology
joint work with: Umer Altaf, Gosia Katela, Remus Hanea, Jan Dirk Jansen, Cong Xiao, Olwijn Leeuwenburgh, Martin Verlaan and Hai Xiang Lin
State space model
The (non linear) physics: where X is the state, p is vector of uncertain parameters, f represents the (numerical) model, G is a noise input matrix and W is zero mean system noise with covariance Q The measurements: where M is the measurement matrix and V is zero mean measurement noise with covariance R
1 1( , , 1) ( 1)k k kX f X p k G k W− −= − + −
kkk VXkMZ += )(
Formulation of the weak constraint data assimilation problem
It is desired to combine the data with the stochasticmodel in order to obtain an optimal estimate of the state and parameters of the system. We define the criterion:
1
1 1
2
1
2 21 0( )
1
( , ) ( )
( , , 1) T
K
k k k Rk
K
k k GQG Ck
J p X Z M k X
X f X p k p pα
−
− −
=
−=
= −
+ − − + −
∑
∑
Non linear deterministic systems. Strong constraint variational data assimilation (4Dvar): Parameter estimation
For Q=0 the error criterion reduces to: Introducing the adjoint state we can rewrite the criterion:
1 1
2 20
1
1
( ) ( )
( ( , , 1))
K
k k R Ck
KTk k k
k
J p Z M k X p p
L X f X p k
α− −
=
−
= − + −
+ − −
∑
∑
20
1
211)()( −− −+−=∑
=C
K
kRkk ppXkMZpJ α
Non linear deterministic systems. Strong constraint variational data assimilation (4Dvar): Parameter estimation
If we solve the uncoupled system: where F(k) is the tangent linear model. The gradient of the criterion can be computed by: Very efficient in combination with a gradient-based optimization scheme. BUT: we need the adjoint implementation!
11
1
0 0 1
( , , 1)
( 1) ( ) ( ( ) ), 0
k kT T
k k k k
K
X f X p kL F k L M k R Z M k XX x L
−
−+
+
= −
= − + −= =
110
1
( , , 1) ( )k K
T kk
k
f X p kJ L C p pp p
α=
−−
=
∂ −∂= − − −
∂ ∂∑
Remarks
Adjoint based methods are attractive for: • Deterministic parameter estimation problems • Data assimilation problems with a lot of data • When conservation constraints are very important • ….. But the development of the adjoint is usually too much work. Other disadvantage: Local minima can hamper the applicability.
6
Model order reduction
The basis idea of model order reduction: - Find a sub space where the action is by using an ensemble of model
simulations - Project the original model onto this sub space A major challenge is to develop efficient and non intrusive (i.e. not requiring access to the code of the original model) methods for non linear models: Ensemble type algorithms
A variational data assimilation approach using model reduction
Non-Intrusive model reduction is: • Generic and model independent • Efficient since the model simulations are used to approximate the
dynamics of the system and not just for one function evaluation
The reduced model is used to speed up the iteration process and to avoid the implementation of the adjoint.
8
Model reduction: Projection based method Suppose the dynamics of a system are described by Petrov-Galerkin projection specifies the dynamics of a variable by
( ) 61( ) ( ), , (10 )h
i i it t h O−= ∈x f x θ R
( ) 21( ) ( ), , (10 )m
i i it t m O−= ∈y h x θ R
( ) 21( ) ( ), , (10 )T k
i i it t k O−= ∈z Ψ f Φz θ R
( ) 2( ) ( ), , (10 )mi i it t m O= ∈y h Φz θ R
1( ) { , , }i kt span ϕ ϕ∈z
Proper Orthogonal Decomposition - POD Suppose we have a set of data , We seek a projection of a fixed rank k such that minimizes the total error The optimal subspace of dimension k is given by the first k eigenvectors of the covariance matrix of state variables generated by the data and the state can be approximated as where consists of k first eigenvectors of and
( ) nj it ∈x
{ }1 1 1 1( ), , ( ), , ( ), , ( )N p p Nt t t t=X x x x x
T=Π ΦΨ
21 1
( ) ( )p N
j i j ij i
t t= =
−∑∑ x Πx
( ) ( )i it t≈x Φr
Φ =Ψ ΦTXX
Tangent linear model The tangent linear approximation of the model is given by Applying a POD approach to we obtain A low-order approximation of the original model with easily available adjoint model.
( )it∆x
1( ) ( )T Ti it t θ−= +xr Ψ F r Ψ FΦ Δθ
1( ) ( )i it t −= +x θΔx ΔxF F Δθ
A non-intrusive model reduction approach The tangent linear approximation of the model is given by Applying a POD approach to we obtain The and are approximated by finite differences: This procedure requires as many simulations of the original nonlinear model as the dimension of the reduced model
( )it∆x
1( ) ( )T Ti it t θ−= +xr Ψ F r Ψ FΦ Δθ
1( ) ( )i it t −= +x θΔx ΔxF F Δθ
xF Φ θΨF
( ) ( ) ( )1
11 1( ) , ( )()
,),(
i i
i
i i j i ij j
ttt
tεε
− −−
−
∂=
+ −≈
∂x
f x Φ θ f x θΦ Φ
f x θF
x
( ) ( ) ( )11 1( ), ( )( ), ,ji i ii i ij j
tt tθθ
θ
εε
− −−+ −
≈∂
Ψ =∂
Ψθ
f f x θ Φ f xx θ θF
θ
A non-intrusive model reduction approach using radial basis functions (RBF) • By generating a number of full order model simulations and a RBF
interpolation scheme the non-linear dynamics can be described in reduced space.(Xiao, D. et al., Computers and Geoscience, 2016)
• By differentiating the RBF representation an approximation of the
tangent linear model can be obtained.
Model reduction for parameter estimation The linear reduced model can now be given in state space form by: So the (variation of the) original parameter vector is still part of the reduced model. The reduction of the number of parameters is done separately. Remark: The reduced model should only be able to reproduce the input-output behavior of the original model
1
1
( ) ( )( ) ( )0
TTi i
i i
t tt tI
θ −
−
=
xr rΨ FΨ F Φ
Δθ Δθ
Balanced truncation
We look for such that Where λ is the adjoint state. Solve the SVD of the matrix
where is the set of snapshots the reservoir model and is the set of snapshots from the adjoint model The balancing transformation is given as
1/ 21
−=Φ XVΣ 1/ 21
T T T−=Ψ Σ U Y
11 2
2
[ ]T
T TT
= =
Σ 0 VY X UΣV U U
0 0 V
T=Π ΦΨ
( ) ( )i it t∆ ≈x Φr ( ) ( )i it t≈λ Ψζ
YX
This is a balanced POD method. We need the adjoint for the state now!
The ensemble type algorithm for parameter estimation
Re-parameterization
High-order Model Simulation
Low-order Model Simulation
Gradient Calculation
Reduced Objective Function Calculation
Initial Parameters
Objective Function Calculation
Converged? Done
Building of The Low-order Model
Converged?
Low-order Adjoint Model Simulation
Sup Optimal Parameters
Parameters Update
Snapshots Simulation
Remarks
The computational effort is dominated by the generation of the reduced model (generating snapshots, computing the sensitivity matrices). The number of model simulations is roughly de dimension of the reduced model.
The number of outer loop iterations is very small: 2-5. For most iterations the reduced model can be the same and only the residuals have to be updated.
The approach is very efficient in case the simulation period of the ensemble of model simulations is very small compared to the calibration period
The approach is very well suited for parallel processing since the ensemble of model simulations can be created completely independent of each other
Illustrative example: Parameter estimation for a simple reservoir model
For more information see: Model-reduced gradient-based history matching”, Kaleta, MP, Hanea, RG, Heemink, AW and Jansen JD, Computational Geosciences, 2011
Reservoir (2D) 21x21x1 grid blocks Reservoir simulator Oil and water; relative permeability curves are known Setup for experiments • Five spot injection - production pattern • Reservoir is operated on rate constraint in the injection well and on bottom hole
pressure constraints in production wells Measurements • Measurements taken each 30 days during 250 days, before water breakthrough • Bottom hole pressure measurements from injection well with 10% error • Flow rate measurements from production wells with 5% error
5 10 15 20
2
4
6
8
10
12
14
16
18
20-31
-30.5
-30
-29.5
-29
-28.5
Synthetic example
5 10 15 20
2
4
6
8
10
12
14
16
18
20-31
-30.5
-30
-29.5
-29
-28.5
Production well
Injection well
The prior knowledge is given by an ensemble Results: Parameters reduction
Results: Gradients comparison
States reconstruction
POD BPOD ADJ BPOD-based gradient
5 10 15 20
2
4
6
8
10
12
14
16
18
20
-40
-30
-20
-10
0
10
20
30
40
POD-based gradient
5 10 15 20
2
4
6
8
10
12
14
16
18
20
-40
-30
-20
-10
0
10
20
30
40
ADJ gradient
5 10 15 20
2
4
6
8
10
12
14
16
18
20
-40
-30
-20
-10
0
10
20
30
40
Original model at timestep 30
5 10 15 20
5
10
15
20 0
5
10
15
x 105 Reduced-order model at timestep 30
5 10 15 20
5
10
15
20 0
5
10
15
x 105Original model at timestep 30
5 10 15 20
5
10
15
200
5
10
15x 10
5 Reduced-order model at timestep 30
5 10 15 20
5
10
15
200
5
10
15x 10
5
POD BPOD
Results: Log permeability fields
Numerical efficiency data Method Objective
function Reduction Time in simulations
Initial (Prior) 1657 - 1
Adjoint-based approach (30 iter) 20.05 441 (sat) + 441 (prf) + 20 (perm) 61 (=1+30*2)
Adjoint-based approach 18.65 441 (sat) + 441 (prf) + 20 (perm) 135 (= 67*2+1)
POD-based approach 20.27 41(99%) + 31(99.9%)+ 20 115 (=1+20+72+20+1+2)
Balanced POD-based approach 18.73 42(99.9%) + 6 (99.9%)+20 114 (=1+2*20+48+20+5)
True Prior POD BPOD ADJ
5 10 15 20
2
4
6
8
10
12
14
16
18
20-31
-30.5
-30
-29.5
-29
-28.5
-28
5 10 15 20
2
4
6
8
10
12
14
16
18
20-31
-30.5
-30
-29.5
-29
-28.5
-28
5 10 15 20
2
4
6
8
10
12
14
16
18
20-31
-30.5
-30
-29.5
-29
-28.5
-28
5 10 15 20
2
4
6
8
10
12
14
16
18
20-31
-30.5
-30
-29.5
-29
-28.5
-28
5 10 15 20
2
4
6
8
10
12
14
16
18
20-31
-30.5
-30
-29.5
-29
-28.5
-28
5 10 15 20
2
4
6
8
10
12
14
16
18
20
22
-31.5
-31
-30.5
-30
-29.5
-29
-28.5
-28
Parameter estimation for a large scale tidal model
For more information see: Efficient identification of uncertain parameters in a large-scale tidal model of the European continental shelf by proper orthogonal decomposition, Altaf, M.U. , Verlaan, M., Heemink, A.W, International Journal for Numerical Methods in Fluids, 2012.
Based on shallow water equations
Grid size: 1.5’ by 1.0’ (~2 km)
Grid dimensions: 1120 x 1260 cells
Active Grid Points: 869544
Time step: 2 minutes
8 main constituents
• Twin experiment: • Estimation of 7 depth parameters using generated data
(noise free)
Experiment with field data
Parameter: Depth Calibration run: 28 Dec 2006 to 30 Jan 2007 Measurement data: 01 Jan 2007 to 30 Jan 2007 Includes two spring-neap cycles Assimilation Stations: 35 Validation Stations: 15 Ensemble of forward model simulations for a period of four days (01
Jan 2007 to 04 Jan 2007)
DCSM
Divide model area in 4 sub domains + 1 overall
parameter No. of snapshots: 132 (Every three hours) 24 POD modes are required to capture 97% energy Same POD modes are used in 2rd iteration
Initial RMS: 25.7 cm
After 2rd iteration: 14.9 cm
Improvement : 42%
DCSM(Validation results)
Similar improvement as in the case of assimilation stations
Computational Cost
Estimation 5 parameters, calibration period 1 month: Effective number of simulations of 1 month: 4.7, reduction criterion 42% (2 iterations, no model update in second iteration) Estimation 20 parameters (4 bottom friction and 16 depth values), calibration period 1 month: Number of simulations of 1 month: 11, reduction criterion 50% (5 iterations, no model update in second and fourth iteration) Estimation 20 parameters, calibration period 1 year: Number of simulations of 1 year: Only 5!
Large scale models: Domain decomposition
For large scale models the use of global basis functions is not attractive. Many of them are required to represent small scale features increasing the computational effort. And more full order simulation are required for the interpolation procedure in order to obtain an accurate reduced order model
It is therefore very attractive to combine model order reduction with domain decomposition. Some illustrative twin experiments
Twin experiment Description Value Dimension 23×23×1
Grid Size 10m×10m×5m
Number of wells 8 production wells, 1 injection well
constant porosity 0.2
Fluid density 1014kg/m3, 859kg/m3
Fluid viscosity waer: 0.4cp, oil: 2cp
Initial pressure 30 MPa
Well control Injection rate: 50m3/day
BHP: 25 MPa
Initial saturaton So=0.80; Sw=0.20
connate water saturation
Swc=0.20
Production time 3 year
Model Timestep 0.03 year
measurement Timestep 5×0.03 year
Model Settings Reservoir model
Production Well
Injection Well
'true' permeability field
Domain Decomposition Global Domain parameter saturation pressure parameter saturation pressure
Domain_1
12
8 3
12 44 18
Domain_2 14 6 Domain_3 8 3 Domain_4 15 6 Domain_5 14 7 Domain_6 14 6 Domain_7 8 4 Domain_8 15 6 Domain_9 8 3
Total 12 104 44 12 44 18
Model reduction using domain decomposition
For each subdomain, the number of parameters and state variables (95%, 95% and 95% energy for parameter, pressure and saturation)
Updating Permeability
5 10 15 20
5
10
15
20
10
12
14
16
18
20
Updating Permeability
5 10 15 20
5
10
15
20
10
12
14
16
18
20
Reference Permeability
5 10 15 20
5
10
15
20
10
12
14
16
18
20
global domain
domain decomposition
In the global domain, local dynamic features cannot be effectively captured
Comparison with a gradient based optimization where the gradient is determined using finite differences
Initial Permeability
5 10 15 20
5
10
15
20
10
12
14
16
18
20Reference Permeability
5 10 15 20
5
10
15
20
10
12
14
16
18
20
Updating Permeability-Proposed method
5 10 15 20
5
10
15
20
10
12
14
16
18
20Updating Permeability-finite difference
5 10 15 20
5
10
15
20
10
12
14
16
18
20
Iteration Num Nsim J
- - 282.15
proposed method 44 25=12×2+1
108.32
finite-difference 25 108.08 325=25×(12+1)
10 20 30 40 50100
150
200
250
300
Iteration Step
Cos
t Fun
ctio
n
10 20 30 40 50 60100
150
200
250
300
Iteration Step
Cos
t Fun
ctio
n
proposed method finite-difference method
A more realistic test case: A 3D view of the wells
The “true” permeability
The subdomains
“true” permeability results reduced model results finite difference
The iterative minimization of the error criterion
Number of parameters: 44 Number of full order simulations: 112
More data and more parameters • Local parameterization (in each domain separately)
is computationally very attractive
• To obtain global and smooth patterns the local parameters are projected onto global patterns
9/26/2019
“true” permeability initial permeability assimilation results
Estimating 282 unknown parameters required 72 full order simulations
Conclusions
• Variational data assimilation is a powerful method for solving large scale parameter estimation problems. But we need the adjoint!
• Using model order reduction in combination with domain decomposition an efficient ensemble method for solving variation data assimilation problems can be obtained
• Local parameterization is very attractive: One local parameter in each sub domain requires only 2 full model simulations. Global patterns can still be estimated by first estimating local parameter patterns and then projecting the local parameter patterns onto global parameter patterns
• Main disadvantage of the approach is that that many choices (generation of
snapshots, number of modes used, interpolation method, .. ) are application dependent.