
AN ADJOINT-FREE APPROACH TO PARAMETER ESTIMATION

Arnold Heemink

Delft University of Technology

joint work with: Umer Altaf, Gosia Katela, Remus Hanea, Jan Dirk Jansen, Cong Xiao, Olwijn Leeuwenburgh, Martin Verlaan and Hai Xiang Lin

State space model

The (nonlinear) physics:

$$X_k = f(X_{k-1}, p, k-1) + G(k-1)\, W_{k-1}$$

where $X$ is the state, $p$ is the vector of uncertain parameters, $f$ represents the (numerical) model, $G$ is a noise input matrix and $W$ is zero-mean system noise with covariance $Q$.

The measurements:

$$Z_k = M(k)\, X_k + V_k$$

where $M$ is the measurement matrix and $V$ is zero-mean measurement noise with covariance $R$.

Formulation of the weak constraint data assimilation problem

It is desired to combine the data with the stochastic model in order to obtain an optimal estimate of the state and parameters of the system. We define the criterion:

$$J(p, X) = \sum_{k=1}^{K} \left\| Z_k - M(k) X_k \right\|^2_{R^{-1}} + \sum_{k=1}^{K} \left\| X_k - f(X_{k-1}, p, k-1) \right\|^2_{(GQG^T)^{-1}} + \alpha \left\| p - p_0 \right\|^2_{C^{-1}}$$

Nonlinear deterministic systems. Strong constraint variational data assimilation (4DVar): Parameter estimation

For $Q = 0$ the error criterion reduces to:

$$J(p) = \sum_{k=1}^{K} \left\| Z_k - M(k) X_k \right\|^2_{R^{-1}} + \alpha \left\| p - p_0 \right\|^2_{C^{-1}}$$

Introducing the adjoint state $L_k$ we can rewrite the criterion:

$$J(p) = \sum_{k=1}^{K} \left\| Z_k - M(k) X_k \right\|^2_{R^{-1}} + \alpha \left\| p - p_0 \right\|^2_{C^{-1}} + \sum_{k=1}^{K} L_k^T \left( X_k - f(X_{k-1}, p, k-1) \right)$$

Nonlinear deterministic systems. Strong constraint variational data assimilation (4DVar): Parameter estimation

If we solve the uncoupled system:

$$X_k = f(X_{k-1}, p, k-1), \qquad X_0 = x_0$$

$$L_k = F(k)^T L_{k+1} + M(k)^T R^{-1} \left( Z_k - M(k) X_k \right), \qquad L_{K+1} = 0$$

where $F(k) = \partial f(X_k, p, k) / \partial X_k$ is the tangent linear model, the gradient of the criterion can be computed (up to a common factor of 2, which does not change the location of the minimum) by:

$$\frac{\partial J}{\partial p} = -\sum_{k=1}^{K} \left( \frac{\partial f(X_{k-1}, p, k-1)}{\partial p} \right)^T L_k + \alpha\, C^{-1} (p - p_0)$$

Very efficient in combination with a gradient-based optimization scheme. BUT: we need the adjoint implementation!
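The forward and backward sweeps can be illustrated on a small toy problem. The sketch below is only an illustration under stated assumptions: the model `step`, its Jacobian `jac_x`, the dimensions and the random data are all hypothetical and not taken from the talk. The factor 2 that comes from differentiating the quadratic terms is kept explicitly, so the adjoint gradient can be checked directly against central finite differences of the criterion.

```python
import numpy as np

def step(x, p, A, B):
    """Toy nonlinear model f(x, p); an assumption made for illustration."""
    return A @ x + 0.1 * np.sin(x) + B @ p

def jac_x(x, A):
    """Tangent linear model F = df/dx of the toy model."""
    return A + 0.1 * np.diag(np.cos(x))

def cost_and_gradient(p, x0, Z, A, B, M, Rinv, Cinv, p0, alpha):
    K = len(Z)
    # Forward run: X_k = f(X_{k-1}, p), storing the whole trajectory.
    X = [x0]
    for _ in range(K):
        X.append(step(X[-1], p, A, B))
    res = [Z[k - 1] - M @ X[k] for k in range(1, K + 1)]  # res[k-1] = Z_k - M X_k
    J = sum(r @ Rinv @ r for r in res) + alpha * (p - p0) @ Cinv @ (p - p0)
    # Backward (adjoint) run, starting from L_{K+1} = 0.
    L = np.zeros_like(x0)
    grad = 2.0 * alpha * Cinv @ (p - p0)
    for k in range(K, 0, -1):
        L = jac_x(X[k], A).T @ L + 2.0 * M.T @ Rinv @ res[k - 1]
        grad -= B.T @ L  # df/dp = B for the toy model
    return J, grad

rng = np.random.default_rng(0)
n, m, q, K = 4, 2, 3, 6
A = 0.3 * rng.standard_normal((n, n))
B = rng.standard_normal((n, q))
M = rng.standard_normal((m, n))
Rinv, Cinv = np.eye(m), np.eye(q)
x0, p0 = rng.standard_normal(n), np.zeros(q)
Z = [rng.standard_normal(m) for _ in range(K)]
p = rng.standard_normal(q)

args = (x0, Z, A, B, M, Rinv, Cinv, p0, 1.0)
J, g = cost_and_gradient(p, *args)

# Central finite differences: 2*q extra model runs versus one adjoint run.
eps = 1e-6
g_fd = np.array([(cost_and_gradient(p + eps * e, *args)[0]
                  - cost_and_gradient(p - eps * e, *args)[0]) / (2 * eps)
                 for e in np.eye(q)])
```

One forward and one backward sweep give the full gradient whatever the number of parameters, which is why the adjoint is attractive when it is available.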

Remarks

Adjoint-based methods are attractive for:
• Deterministic parameter estimation problems
• Data assimilation problems with a lot of data
• Problems where conservation constraints are very important
• …

But the development of the adjoint is usually too much work. Another disadvantage: local minima can hamper the applicability.

Model order reduction

The basic idea of model order reduction:
- Find a subspace "where the action is" by using an ensemble of model simulations
- Project the original model onto this subspace

A major challenge is to develop efficient and non-intrusive (i.e. not requiring access to the code of the original model) methods for nonlinear models: ensemble-type algorithms.

A variational data assimilation approach using model reduction

Non-intrusive model reduction is:
• Generic and model independent
• Efficient, since the model simulations are used to approximate the dynamics of the system and not just for one function evaluation

The reduced model is used to speed up the iteration process and to avoid the implementation of the adjoint.


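Model reduction: Projection based method

Suppose the dynamics of a system are described by

$$\mathbf{x}(t_i) = \mathbf{f}(\mathbf{x}(t_{i-1}), \boldsymbol{\theta}), \qquad \mathbf{x} \in \mathbb{R}^h, \quad h \sim O(10^6)$$

$$\mathbf{y}(t_i) = \mathbf{h}(\mathbf{x}(t_i), \boldsymbol{\theta}), \qquad \mathbf{y} \in \mathbb{R}^m, \quad m \sim O(10^2)$$

Petrov-Galerkin projection specifies the dynamics of a variable $\mathbf{z}(t_i) \in \mathrm{span}\{\varphi_1, \dots, \varphi_k\}$ by

$$\mathbf{z}(t_i) = \boldsymbol{\Psi}^T \mathbf{f}(\boldsymbol{\Phi}\mathbf{z}(t_{i-1}), \boldsymbol{\theta}), \qquad \mathbf{z} \in \mathbb{R}^k, \quad k \sim O(10^2)$$

$$\mathbf{y}(t_i) = \mathbf{h}(\boldsymbol{\Phi}\mathbf{z}(t_i), \boldsymbol{\theta}), \qquad \mathbf{y} \in \mathbb{R}^m, \quad m \sim O(10^2)$$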

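Proper Orthogonal Decomposition - POD

Suppose we have a set of data $\mathbf{x}_j(t_i) \in \mathbb{R}^n$,

$$\mathbf{X} = \{ \mathbf{x}_1(t_1), \dots, \mathbf{x}_1(t_N), \dots, \mathbf{x}_p(t_1), \dots, \mathbf{x}_p(t_N) \}$$

We seek a projection $\boldsymbol{\Pi} = \boldsymbol{\Phi}\boldsymbol{\Psi}^T$ of a fixed rank $k$ that minimizes the total error

$$\sum_{j=1}^{p} \sum_{i=1}^{N} \left\| \mathbf{x}_j(t_i) - \boldsymbol{\Pi}\, \mathbf{x}_j(t_i) \right\|^2$$

The optimal subspace of dimension $k$ is given by the first $k$ eigenvectors of the covariance matrix $\mathbf{X}\mathbf{X}^T$ of state variables generated by the data, and the state can be approximated as

$$\mathbf{x}(t_i) \approx \boldsymbol{\Phi}\, \mathbf{r}(t_i)$$

where $\boldsymbol{\Phi}$ consists of the first $k$ eigenvectors of $\mathbf{X}\mathbf{X}^T$ and $\boldsymbol{\Psi} = \boldsymbol{\Phi}$.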
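As an illustration, the POD basis can be computed from a snapshot matrix with a singular value decomposition, since the left singular vectors of X are the eigenvectors of X X^T. This is a minimal sketch; the function name `pod_basis`, the dimensions and the synthetic snapshot data are assumptions chosen for the example.

```python
import numpy as np

def pod_basis(snapshots, k):
    """Return the first k POD modes of a snapshot matrix (columns = states)
    together with all singular values."""
    # Left singular vectors of X are the eigenvectors of X X^T.
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :k], s

rng = np.random.default_rng(1)
n, N, k = 50, 20, 5
# Synthetic snapshots: rank-3 structure plus small noise (illustrative data only).
X = (rng.standard_normal((n, 3)) @ rng.standard_normal((3, N))
     + 1e-3 * rng.standard_normal((n, N)))

Phi, s = pod_basis(X, k)
r = Phi.T @ X                                   # reduced coordinates r(t_i)
err = np.linalg.norm(X - Phi @ r) ** 2          # total squared projection error
energy = np.sum(s[:k] ** 2) / np.sum(s ** 2)    # fraction of "energy" captured
```

The total projection error equals the sum of the discarded squared singular values, which is what makes the energy fraction a practical criterion for choosing k.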

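Tangent linear model

The tangent linear approximation of the model is given by

$$\Delta\mathbf{x}(t_i) = \mathbf{F}_{\mathbf{x}}\, \Delta\mathbf{x}(t_{i-1}) + \mathbf{F}_{\boldsymbol{\theta}}\, \Delta\boldsymbol{\theta}$$

Applying a POD approach to $\Delta\mathbf{x}(t_i)$ we obtain

$$\mathbf{r}(t_i) = \boldsymbol{\Psi}^T \mathbf{F}_{\mathbf{x}} \boldsymbol{\Phi}\, \mathbf{r}(t_{i-1}) + \boldsymbol{\Psi}^T \mathbf{F}_{\boldsymbol{\theta}}\, \Delta\boldsymbol{\theta}$$

This is a low-order approximation of the original model with an easily available adjoint model.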

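A non-intrusive model reduction approach

The tangent linear approximation of the model is given by

$$\Delta\mathbf{x}(t_i) = \mathbf{F}_{\mathbf{x}}\, \Delta\mathbf{x}(t_{i-1}) + \mathbf{F}_{\boldsymbol{\theta}}\, \Delta\boldsymbol{\theta}$$

Applying a POD approach to $\Delta\mathbf{x}(t_i)$ we obtain

$$\mathbf{r}(t_i) = \boldsymbol{\Psi}^T \mathbf{F}_{\mathbf{x}} \boldsymbol{\Phi}\, \mathbf{r}(t_{i-1}) + \boldsymbol{\Psi}^T \mathbf{F}_{\boldsymbol{\theta}}\, \Delta\boldsymbol{\theta}$$

The matrices $\mathbf{F}_{\mathbf{x}} \boldsymbol{\Phi}$ and $\mathbf{F}_{\boldsymbol{\theta}}$ are approximated by finite differences:

$$\mathbf{F}_{\mathbf{x}}\, \varphi_j = \frac{\partial \mathbf{f}(\mathbf{x}(t_{i-1}), \boldsymbol{\theta})}{\partial \mathbf{x}}\, \varphi_j \approx \frac{\mathbf{f}(\mathbf{x}(t_{i-1}) + \varepsilon \varphi_j, \boldsymbol{\theta}) - \mathbf{f}(\mathbf{x}(t_{i-1}), \boldsymbol{\theta})}{\varepsilon}$$

$$\frac{\partial \mathbf{f}(\mathbf{x}(t_{i-1}), \boldsymbol{\theta})}{\partial \theta_j} \approx \frac{\mathbf{f}(\mathbf{x}(t_{i-1}), \boldsymbol{\theta} + \varepsilon \mathbf{e}_j) - \mathbf{f}(\mathbf{x}(t_{i-1}), \boldsymbol{\theta})}{\varepsilon}$$

where $\varphi_j$ is the $j$-th column of $\boldsymbol{\Phi}$ and $\mathbf{e}_j$ is the $j$-th unit vector in parameter space.

This procedure requires as many simulations of the original nonlinear model as the dimension of the reduced model.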
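The finite-difference construction can be sketched as follows, treating the model as a black box that is only called, never differentiated. The function `reduced_tlm`, the toy model `f` and all dimensions are assumptions made for this illustration; the choice Psi = Phi corresponds to the POD case.

```python
import numpy as np

def reduced_tlm(f, x_prev, theta, Phi, Psi, eps=1e-6):
    """Finite-difference estimates of Psi^T F_x Phi and Psi^T F_theta."""
    base = f(x_prev, theta)
    k, q = Phi.shape[1], theta.size
    Nx = np.empty((k, k))
    Nt = np.empty((k, q))
    for j in range(k):  # one extra model call per retained mode
        Nx[:, j] = Psi.T @ (f(x_prev + eps * Phi[:, j], theta) - base) / eps
    for j in range(q):  # one extra model call per parameter
        d = np.zeros(q)
        d[j] = eps
        Nt[:, j] = Psi.T @ (f(x_prev, theta + d) - base) / eps
    return Nx, Nt

rng = np.random.default_rng(2)
n, k, q = 12, 3, 2
A = 0.3 * rng.standard_normal((n, n))
B = rng.standard_normal((n, q))

def f(x, th):
    """Toy one-step model, used here only as a black box."""
    return A @ x + 0.1 * np.sin(x) + B @ th

Phi, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal basis, Psi = Phi
x_prev, theta = rng.standard_normal(n), rng.standard_normal(q)
Nx, Nt = reduced_tlm(f, x_prev, theta, Phi, Phi)

# Analytic Jacobian of the toy model, available here only for verification.
Fx = A + 0.1 * np.diag(np.cos(x_prev))
```

The perturbed model runs are independent of each other, so in practice they can be executed in parallel.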

A non-intrusive model reduction approach using radial basis functions (RBF)

• By generating a number of full order model simulations and an RBF interpolation scheme, the nonlinear dynamics can be described in reduced space (Xiao, D. et al., Computers and Geoscience, 2016).
• By differentiating the RBF representation, an approximation of the tangent linear model can be obtained.
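A minimal sketch of this idea, assuming a Gaussian kernel and a small regular grid of training inputs (both are choices made for the example, not taken from the talk): the interpolant is fitted to samples of a reduced one-step map, and its analytic derivative serves as an approximate tangent linear model. The class name `GaussianRBF` and the toy map are hypothetical.

```python
import numpy as np

class GaussianRBF:
    """Gaussian RBF interpolant g(s) = sum_i phi(|s - c_i|) w_i."""

    def __init__(self, centers, values, sigma):
        self.c, self.sigma = centers, sigma
        # Interpolation conditions: kernel(centers) @ w = values.
        self.w = np.linalg.solve(self._kernel(centers), values)

    def _kernel(self, pts):
        d2 = ((pts[:, None, :] - self.c[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / self.sigma ** 2)

    def __call__(self, s):
        return (self._kernel(s[None, :]) @ self.w)[0]

    def jacobian(self, s):
        """Analytic derivative dg/ds of the interpolant, shape (outputs, inputs)."""
        phi = self._kernel(s[None, :])[0]
        dphi = -2.0 / self.sigma ** 2 * phi[:, None] * (s[None, :] - self.c)
        return self.w.T @ dphi

# Training inputs s = (reduced state, parameter) on a 3 x 3 grid.
g1, g2 = np.meshgrid(np.arange(3.0), np.arange(3.0))
S = np.column_stack([g1.ravel(), g2.ravel()])
# Toy outputs standing in for projected full-order model results.
Y = np.column_stack([np.sin(S[:, 0]) + S[:, 1], S[:, 0] * S[:, 1]])

rbf = GaussianRBF(S, Y, sigma=1.0)
s = np.array([0.7, 1.2])
J_rbf = rbf.jacobian(s)   # approximate tangent linear model at s
```

The accuracy of the derivative depends on how well the training points cover the region visited during the optimization, which is one of the application-dependent choices of the method.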

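Model reduction for parameter estimation

The linear reduced model can now be given in state space form by:

$$\begin{bmatrix} \mathbf{r}(t_i) \\ \Delta\boldsymbol{\theta} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\Psi}^T \mathbf{F}_{\mathbf{x}} \boldsymbol{\Phi} & \boldsymbol{\Psi}^T \mathbf{F}_{\boldsymbol{\theta}} \\ \mathbf{0} & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{r}(t_{i-1}) \\ \Delta\boldsymbol{\theta} \end{bmatrix}$$

So the (variation of the) original parameter vector is still part of the reduced model. The reduction of the number of parameters is done separately.

Remark: The reduced model should only be able to reproduce the input-output behavior of the original model.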
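The augmented form can be written down directly once the two reduced blocks are known. In the sketch below the blocks `Nx` (standing for Psi^T F_x Phi) and `Nt` (standing for Psi^T F_theta) are filled with random numbers purely for illustration, and the helper name `augmented_matrix` is hypothetical.

```python
import numpy as np

def augmented_matrix(Nx, Nt):
    """Block matrix [[Nx, Nt], [0, I]] acting on the stacked vector (r, dtheta)."""
    k, q = Nt.shape
    return np.block([[Nx, Nt], [np.zeros((q, k)), np.eye(q)]])

rng = np.random.default_rng(3)
k, q = 3, 2
Nx = rng.standard_normal((k, k))   # stands for Psi^T F_x Phi
Nt = rng.standard_normal((k, q))   # stands for Psi^T F_theta
r, dtheta = rng.standard_normal(k), rng.standard_normal(q)

A_aug = augmented_matrix(Nx, Nt)
z_next = A_aug @ np.concatenate([r, dtheta])

# For this small linear system the adjoint model is simply the transpose.
A_adj = A_aug.T
```

Because the reduced system is a small explicit matrix, its adjoint is obtained by transposition, with no adjoint code for the original model.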

Balanced truncation

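We look for $\boldsymbol{\Pi} = \boldsymbol{\Phi}\boldsymbol{\Psi}^T$ such that

$$\Delta\mathbf{x}(t_i) \approx \boldsymbol{\Phi}\, \mathbf{r}(t_i), \qquad \boldsymbol{\lambda}(t_i) \approx \boldsymbol{\Psi}\, \boldsymbol{\zeta}(t_i)$$

where $\boldsymbol{\lambda}$ is the adjoint state. Solve the SVD of the matrix

$$\mathbf{Y}^T \mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T = \begin{bmatrix} \mathbf{U}_1 & \mathbf{U}_2 \end{bmatrix} \begin{bmatrix} \boldsymbol{\Sigma}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{V}_1^T \\ \mathbf{V}_2^T \end{bmatrix}$$

where $\mathbf{X}$ is the set of snapshots of the reservoir model and $\mathbf{Y}$ is the set of snapshots from the adjoint model. The balancing transformation is given as

$$\boldsymbol{\Phi} = \mathbf{X} \mathbf{V}_1 \boldsymbol{\Sigma}_1^{-1/2}, \qquad \boldsymbol{\Psi}^T = \boldsymbol{\Sigma}_1^{-1/2} \mathbf{U}_1^T \mathbf{Y}^T$$

This is a balanced POD method. We need the adjoint for the state now!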
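The balancing transformation can be sketched with a single SVD. The function name `balanced_pod` and the random snapshot matrices below are assumptions for illustration; in an application X would hold model snapshots and Y adjoint snapshots.

```python
import numpy as np

def balanced_pod(X, Y, k):
    """Bases Phi, Psi from primal snapshots X and adjoint snapshots Y (as columns)."""
    U, s, Vt = np.linalg.svd(Y.T @ X, full_matrices=False)
    U1, V1, s1 = U[:, :k], Vt[:k, :].T, s[:k]
    Phi = X @ V1 / np.sqrt(s1)            # X V_1 Sigma_1^{-1/2}
    PsiT = (U1 / np.sqrt(s1)).T @ Y.T     # Sigma_1^{-1/2} U_1^T Y^T
    return Phi, PsiT.T

rng = np.random.default_rng(4)
n, N, Ny, k = 30, 8, 6, 4
X = rng.standard_normal((n, N))    # illustrative primal snapshots
Y = rng.standard_normal((n, Ny))   # illustrative adjoint snapshots

Phi, Psi = balanced_pod(X, Y, k)
biorth = Psi.T @ Phi               # should be the k x k identity
```

The two bases are bi-orthogonal by construction, so Phi Psi^T is a (generally oblique) projection, in contrast to the orthogonal POD projection with Psi = Phi.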

The ensemble type algorithm for parameter estimation

[Flowchart of the nested algorithm. Boxes: Initial Parameters; Re-parameterization; High-order Model Simulation; Objective Function Calculation; Converged? (Done); Snapshots Simulation; Building of the Low-order Model; Low-order Model Simulation; Reduced Objective Function Calculation; Low-order Adjoint Model Simulation; Gradient Calculation; Parameters Update; Converged?; Sub-optimal Parameters.]

Presentation notes

When we simulate the reduced-order reservoir model with the same permeability as the one used for the snapshot simulations, we obtain almost identical states, as long as a sufficient fraction of relative importance is preserved. However, if we strongly alter the permeability field, and therefore the structure of the states, the states of the original model are represented less well by the reduced-order model. Because it is not possible to specify a priori the validity of the reduced-order model, we use the nested approach in the development of the optimization approach.

Remarks

The computational effort is dominated by the generation of the reduced model (generating snapshots, computing the sensitivity matrices). The number of model simulations is roughly the dimension of the reduced model.

The number of outer loop iterations is very small: 2-5. For most iterations the reduced model can be the same and only the residuals have to be updated.

The approach is very efficient when the simulation period of the ensemble of model simulations is very small compared to the calibration period.

The approach is very well suited for parallel processing, since the model simulations in the ensemble can be run completely independently of each other.

Illustrative example: Parameter estimation for a simple reservoir model

For more information see: “Model-reduced gradient-based history matching”, Kaleta, MP, Hanea, RG, Heemink, AW and Jansen JD, Computational Geosciences, 2011

Reservoir (2D)
• 21x21x1 grid blocks

Reservoir simulator
• Oil and water; relative permeability curves are known

Setup for experiments
• Five spot injection - production pattern
• Reservoir is operated on rate constraint in the injection well and on bottom hole pressure constraints in production wells

Measurements
• Measurements taken each 30 days during 250 days, before water breakthrough
• Bottom hole pressure measurements from injection well with 10% error
• Flow rate measurements from production wells with 5% error

Synthetic example

[Figure: field maps of the synthetic example on the 2D grid. Legend: production well, injection well.]

The prior knowledge is given by an ensemble.

Results: Parameters reduction

Results: Gradients comparison

[Figure: gradient maps on the 2D grid. Panels: BPOD-based gradient, POD-based gradient, ADJ gradient.]

States reconstruction

[Figure: state maps for POD and BPOD. Panels: original model at timestep 30, reduced-order model at timestep 30.]

Results: Log permeability fields

[Figure: log permeability fields on the 2D grid. Panels: True, Prior, POD, BPOD, ADJ.]

Numerical efficiency data

| Method | Objective function | Reduction | Time in simulations |
| Initial (Prior) | 1657 | - | 1 |
| Adjoint-based approach (30 iter) | 20.05 | 441 (sat) + 441 (prf) + 20 (perm) | 61 (= 1 + 30*2) |
| Adjoint-based approach | 18.65 | 441 (sat) + 441 (prf) + 20 (perm) | 135 (= 67*2 + 1) |
| POD-based approach | 20.27 | 41 (99%) + 31 (99.9%) + 20 | 115 (= 1 + 20 + 72 + 20 + 1 + 2) |
| Balanced POD-based approach | 18.73 | 42 (99.9%) + 6 (99.9%) + 20 | 114 (= 1 + 2*20 + 48 + 20 + 5) |

Parameter estimation for a large scale tidal model

For more information see: “Efficient identification of uncertain parameters in a large-scale tidal model of the European continental shelf by proper orthogonal decomposition”, Altaf, M.U., Verlaan, M., Heemink, A.W., International Journal for Numerical Methods in Fluids, 2012.

Based on shallow water equations

Grid size: 1.5’ by 1.0’ (~2 km)

Grid dimensions: 1120 x 1260 cells

Active Grid Points: 869544

Time step: 2 minutes

8 main constituents

• Twin experiment: estimation of 7 depth parameters using generated data (noise free)

Experiment with field data

• Parameter: depth
• Calibration run: 28 Dec 2006 to 30 Jan 2007
• Measurement data: 01 Jan 2007 to 30 Jan 2007 (includes two spring-neap cycles)
• Assimilation stations: 35
• Validation stations: 15
• Ensemble of forward model simulations for a period of four days (01 Jan 2007 to 04 Jan 2007)

DCSM

• Model area divided into 4 subdomains + 1 overall parameter
• No. of snapshots: 132 (every three hours)
• 24 POD modes are required to capture 97% of the energy
• The same POD modes are used in the 2nd iteration

Initial RMS: 25.7 cm

After 2nd iteration: 14.9 cm

Improvement: 42%

DCSM (validation results)

Similar improvement as in the case of assimilation stations

Computational Cost

Estimation of 5 parameters, calibration period 1 month:
• Effective number of simulations of 1 month: 4.7
• Reduction of the criterion: 42% (2 iterations, no model update in the second iteration)

Estimation of 20 parameters (4 bottom friction and 16 depth values), calibration period 1 month:
• Number of simulations of 1 month: 11
• Reduction of the criterion: 50% (5 iterations, no model update in the second and fourth iteration)

Estimation of 20 parameters, calibration period 1 year:
• Number of simulations of 1 year: only 5!

Large scale models: Domain decomposition

For large scale models the use of global basis functions is not attractive. Many of them are required to represent small scale features, which increases the computational effort. Moreover, more full order simulations are required for the interpolation procedure in order to obtain an accurate reduced order model.

It is therefore very attractive to combine model order reduction with domain decomposition.

Some illustrative twin experiments

Twin experiment

Model settings

| Description | Value |
| Dimension | 23×23×1 |
| Grid size | 10 m × 10 m × 5 m |
| Number of wells | 8 production wells, 1 injection well |
| Constant porosity | 0.2 |
| Fluid density | 1014 kg/m3, 859 kg/m3 |
| Fluid viscosity | water: 0.4 cp, oil: 2 cp |
| Initial pressure | 30 MPa |
| Well control | Injection rate: 50 m3/day; BHP: 25 MPa |
| Initial saturation | So = 0.80; Sw = 0.20 |
| Connate water saturation | Swc = 0.20 |
| Production time | 3 years |
| Model timestep | 0.03 year |
| Measurement timestep | 5 × 0.03 year |

[Figure: reservoir model with the 'true' permeability field. Legend: production well, injection well.]

Model reduction using domain decomposition

For each subdomain, the number of parameters and state variables (95%, 95% and 95% energy for parameter, pressure and saturation)

| Domain | Parameter | Saturation | Pressure |
| Domain_1 | 12 (shared by all subdomains) | 8 | 3 |
| Domain_2 | | 14 | 6 |
| Domain_3 | | 8 | 3 |
| Domain_4 | | 15 | 6 |
| Domain_5 | | 14 | 7 |
| Domain_6 | | 14 | 6 |
| Domain_7 | | 8 | 4 |
| Domain_8 | | 15 | 6 |
| Domain_9 | | 8 | 3 |
| Total (domain decomposition) | 12 | 104 | 44 |
| Global domain | 12 | 44 | 18 |

[Figure: permeability fields on the 2D grid. Panels: Updating Permeability (two panels, labelled global domain and domain decomposition), Reference Permeability.]

In the global domain, local dynamic features cannot be effectively captured

Comparison with a gradient based optimization where the gradient is determined using finite differences

[Figure: permeability fields on the 2D grid. Panels: Initial Permeability, Reference Permeability, Updating Permeability (proposed method), Updating Permeability (finite difference).]

| Method | Iteration Num | Nsim | J |
| Initial | - | - | 282.15 |
| Proposed method | 44 | 25 (= 12×2 + 1) | 108.32 |
| Finite-difference | 25 | 325 (= 25×(12+1)) | 108.08 |

[Figure: cost function versus iteration step. Panels: proposed method, finite-difference method.]

A more realistic test case: A 3D view of the wells

The “true” permeability

The subdomains

[Figure panels: “true” permeability; results reduced model; results finite difference.]

The iterative minimization of the error criterion

Number of parameters: 44

Number of full order simulations: 112

More data and more parameters

• Local parameterization (in each domain separately) is computationally very attractive
• To obtain global and smooth patterns the local parameters are projected onto global patterns

9/26/2019

[Figure panels: “true” permeability; initial permeability; assimilation results.]

Estimating 282 unknown parameters required 72 full order simulations

Conclusions

• Variational data assimilation is a powerful method for solving large scale parameter estimation problems. But we need the adjoint!

• Using model order reduction in combination with domain decomposition, an efficient ensemble method for solving variational data assimilation problems can be obtained.

• Local parameterization is very attractive: one local parameter in each subdomain requires only 2 full model simulations. Global patterns can still be estimated by first estimating local parameter patterns and then projecting the local parameter patterns onto global parameter patterns.

• The main disadvantage of the approach is that many choices (generation of snapshots, number of modes used, interpolation method, ...) are application dependent.
