10,00 Modelling and analysis of geophysical data using geostatistics and machine learning. Vasily Demyanov, Heriot–Watt Institute, Edinburgh (U.K.). Intelligent Analysis of Environmental Data (S4 ENVISA Workshop 2009)
UNCERTAINTY QUANTIFICATION OF GEOSCIENCE PREDICTION MODELS BASED ON SUPPORT VECTOR REGRESSION
V. Demyanov1, A. Pozdnoukhov2, M. Kanevski3, M. Christie1
1 Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh, UK [email protected]
2 National Centre for Geocomputation, National University of Ireland, Maynooth
3 Institute of Geomatics and Risk Analysis, University of Lausanne
Outline
• Geoscience modelling under uncertainty
• Machine learning based geomodels
• Semi-supervised SVR reservoir model
– Case study
– Robustness to noise
– Predictions with uncertainty
• Conclusions
Uncertainty Quantification (UQ) Framework
[Figure: UQ workflow diagram. Boxes: Natural System; Observed Data; Mathematical Model (parameters, pde); Computer Simulation (discretisation, timestep); Model parameters; Simulated vs Data; MISMATCH; Forecast Uncertainty. The Observed Data, Simulated vs Data and Forecast Uncertainty boxes each show four time-series panels plotted against time (days).]
Computationally expensive
Adaptive Stochastic Optimisation for UQ
[Diagram: iterative sampling loop]
Model 1, Model 2, Model 3, …, Model n
Sampling prior distribution
Reproduction, Ranking
Evaluation: Model simulation, Mismatch calculation
iteration
Inference
New population
Ensemble of Models
Inferred Ensemble of Models for prediction
Sampling algorithms:
• Genetic algorithms
• Particle swarm optimisation
• Ant Colony optimisation
• Neighbourhood approximation
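The loop on this slide (sample the prior, evaluate, rank, reproduce, iterate, then infer from the ensemble) can be sketched as a very small evolutionary search. This is an illustration only, not the sampling algorithm used in the study: the toy `misfit` function, the uniform prior on [0, 1], the "best quarter" selection rule and the Gaussian mutation step are all assumptions.

```python
import random

def misfit(params):
    # Hypothetical stand-in for "model simulation + mismatch calculation":
    # squared distance to an arbitrary target point.
    target = (0.3, 0.7)
    return sum((p - t) ** 2 for p, t in zip(params, target))

def sample_prior(rng, n_params):
    # Uniform prior on [0, 1] for every parameter (an assumption).
    return tuple(rng.random() for _ in range(n_params))

def reproduce(rng, parent, step=0.05):
    # Perturb a parent model and clip the result back into the prior range.
    return tuple(min(1.0, max(0.0, p + rng.gauss(0.0, step))) for p in parent)

def adaptive_search(n_models=20, n_iterations=30, n_params=2, seed=0):
    rng = random.Random(seed)
    population = [sample_prior(rng, n_params) for _ in range(n_models)]
    ensemble = []  # every evaluated model is kept for later inference
    for _ in range(n_iterations):
        scored = sorted((misfit(m), m) for m in population)  # evaluation + ranking
        ensemble.extend(scored)
        parents = [m for _, m in scored[: n_models // 4]]     # keep the best quarter
        population = [reproduce(rng, rng.choice(parents)) for _ in range(n_models)]
    return ensemble

ensemble = adaptive_search()
best_misfit, best_model = min(ensemble)
```

Keeping every evaluated model, not only the final population, is what allows an inference step to be run over the whole ensemble afterwards.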
Search for Matching Models Challenge
• FW simulation of multiple models generated for different combinations of parameter values is computationally expensive
• High-dimensional parameter space remains fairly empty and poorly described despite thousands of generated models
[Figure: region of computational efficiency, 100-10,000 FW runs; axes: Number of points per axis, Number of parameters]
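A short worked calculation makes the "fairly empty" parameter space concrete. Assuming a regular grid (an assumption made here only for the arithmetic; the samplers in the talk are not grid based), the number of runs is the number of points per axis raised to the number of parameters, so a fixed budget buys very few points per axis once the dimension grows.

```python
def grid_size(points_per_axis, n_params):
    # Number of forward runs needed to fill a regular grid.
    return points_per_axis ** n_params

def max_points_per_axis(budget, n_params):
    # Largest regular grid resolution whose run count stays within the budget.
    k = 1
    while grid_size(k + 1, n_params) <= budget:
        k += 1
    return k

budget = 10_000  # upper end of the 100-10,000 run range quoted on the slide
resolution = {d: max_points_per_axis(budget, d) for d in (2, 4, 8, 16)}
# resolution == {2: 100, 4: 10, 8: 3, 16: 1}
```

With 8 parameters, as in the case study later in the talk, 10,000 runs cover only 3 points per axis.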
UQ Framework with fast ML approximation
[Figure: UQ workflow diagram with a Machine Learning box added. Boxes: Natural System; Observed Data; Mathematical Model (parameters, pde); Computer Simulation (discretisation, timestep); Model parameters; Simulated vs Data; MISMATCH; Forecast Uncertainty; Machine Learning. The Observed Data, Simulated vs Data and Forecast Uncertainty boxes each show four time-series panels plotted against time (days).]
Challenges in Geomodelling
• Improve the representation of reality with geologically realistic models based on identifiable parameters.
• Make more effective use of information from various sources by incorporating prior geological and expert knowledge with its associated uncertainty.
• Propagate uncertainty from data into the model without “freezing” assumptions and predefined model dependencies.
Aims
Uncertainty quantification with a geomodel which is able to improve geological realism by more effective use of prior information
• Model petrophysical properties in a fluvial reservoir using a robust machine learning approach – semi-supervised Support Vector Regression (SVR)
• Reproduce realistic geological structures and inherent uncertainty of the geomodel
• Integrate additional spatial data that are non-linearly correlated with reservoir properties.
Support Vector Regression (SVR)
f(x) = w \cdot x + b
Kernel trick projects data into a sufficiently high-dimensional space:
• Linear regression in hyperspace
• Complexity control with training errors:
\min \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i
SVR is formulated in terms of dot products of input data: (x ∙ x') → K(x, x')
f(x) = \sum_{i=1}^{L} \alpha_i y_i K(x, x_i) + b
where K(x, x_i) is a symmetric, positive definite kernel function.
[Figure label: support vectors]
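As a small illustration of the kernel form of the prediction, the sketch below evaluates a weighted sum of Gaussian kernels centred on support vectors, plus an offset b. The support vectors, weights, offset and kernel width are hypothetical values chosen for the example; in a trained model they come out of the optimisation, which is not shown here.

```python
import math

def gaussian_kernel(x, x_i, sigma):
    # K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2)); symmetric in its arguments.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_expansion(x, support_vectors, weights, b, sigma):
    # f(x) = sum_i weight_i * K(x, x_i) + b, where weight_i is the coefficient
    # attached to support vector x_i.
    return sum(w * gaussian_kernel(x, sv, sigma)
               for sv, w in zip(support_vectors, weights)) + b

# Hypothetical values, for illustration only.
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
weights = [0.5, -0.2, 0.1]
b = 0.15
sigma = 0.8

at_sv = kernel_expansion((0.0, 0.0), support_vectors, weights, b, sigma)
far_away = kernel_expansion((50.0, 50.0), support_vectors, weights, b, sigma)
```

Far from every support vector the kernel terms vanish and the prediction falls back to b, which is one reason the kernel width σ acts as a spatial correlation size.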
Semi-supervised Learning Concept
• Supervised learning with a tutor
– Learn from known input and output (e.g. multi-layer perceptron neural network)
• Unsupervised learning without a tutor
– Learn from known inputs only, no outputs are available (e.g. Kohonen classification maps)
• Semi-supervised learning
– Learn from a combination of data:
  • Labelled with both known input and output
  • Unlabelled with only input available (manifold)
Kernel Methods on Geo-manifolds
• Data-driven models incorporate prior knowledge on the domain of the problem using graph models of natural manifolds
• Kernel function enforces continuity along the graph model – manifold – obtained from the prior information
Conventional regression estimate based on labelled data only (●)
Spiral manifold represented by unlabelled points (+)
Semi-supervised regression estimation follows the smoothness along the graph
Semi-supervised Approach
• Manifold assumption: data actually lie on a low-dimensional manifold in the input space
• Geometry of the manifold can be estimated with unlabelled data:
– incorporate natural similarities in data
– enforce smoothness on the manifold
• Manifold carries physical information and incorporates prior physical knowledge
• Geo-manifold can reflect the stochastic nature of the inherent model uncertainty
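The idea of enforcing smoothness along a manifold traced by unlabelled points can be illustrated with a much simpler technique than the kernel construction used in the talk: graph-based label propagation. In the sketch below (a substitute technique, named as such), points on a half circle are joined into a nearest-neighbour graph, only the two end points carry labels, and every unlabelled point is repeatedly set to the mean of its graph neighbours. The half-circle data, the choice k = 2 and the number of sweeps are assumptions.

```python
import math

def knn_graph(points, k):
    # For each point, the indices of its k nearest neighbours (Euclidean distance).
    graph = []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in order[:k]])
    return graph

def propagate(points, labels, k=2, n_sweeps=2000):
    # labels: {point index: known value}; all other points are unlabelled.
    graph = knn_graph(points, k)
    neighbours = [set(g) for g in graph]
    for i, g in enumerate(graph):      # make the graph symmetric
        for j in g:
            neighbours[j].add(i)
    mean_label = sum(labels.values()) / len(labels)
    values = [labels.get(i, mean_label) for i in range(len(points))]
    for _ in range(n_sweeps):
        for i in range(len(points)):
            if i not in labels:        # labelled points stay fixed
                values[i] = sum(values[j] for j in neighbours[i]) / len(neighbours[i])
    return values

# Points on a half circle: a 1D manifold embedded in 2D input space.
n = 21
points = [(math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1)))
          for t in range(n)]
labels = {0: 0.0, n - 1: 1.0}          # only the two end points are labelled
values = propagate(points, labels)
```

The estimate changes gradually from one labelled end to the other along the curve, even though only two outputs are known; the unlabelled points supply the geometry.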
Sources of Geo-manifold for Reservoir Models
Geo-manifold for a reservoir model can be elicited from prior information:
– on-site spatial data (seismic, well logs)
– other relevant data (outcrops, modern analogues, lab experiments)
– expert knowledge in a non-parametric form
– parametric geological models (object shapes, process models)
– training image based models
Semi-supervised SVR Geomodel
[Diagram: SVR Learning Machine; poro & perm labelled data from wells; Seismic data; + geo-manifold unlabelled data; Prior information; Stanford VI synthetic case study; Semi-supervised (SVR)]
Case Study
Stanford VI: a realistic synthetic reservoir data set
S. Castro, J. Caers and T. Mukerji
• Fluvial clastic reservoir:
- sinuous channels
- meandering channels
- delta front
• Geomodel:
- multi-point statistics models
- sedimentation process model
• “Hard” poro/perm data from wells
• Synthetic seismic data:
- 6 attributes: AI, EI, λ, μ, Sw, Poisson ratio
Variability in Facies Modelling
Multi-point simulation realisations
[Figure panels: Training Image; Hard well data; Soft probabilistic data based on seismic]
Case Study
2D layer slices from different geological sections:
• sinuous channels
• delta front
[Figure: porosity truth case]
SVR geomodel (tuneable or fixed parameters):
• Spatial correlation size
– Gaussian kernel width σ
• Continuity strength
– Impact of unlabelled data of the manifold
• Smoothness along the manifold
– Number of unlabelled points in the manifold
– Number of neighbours in kernel regression
• Prior belief level for seismic data
– Weight of additional seismic input (scaling parameter)
• Trade-off between goodness of fit and complexity
– Regularisation term C determines balance between training error and margin maximisation
– Classification error
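To make two of these parameters concrete, the sketch below shows one plausible way a seismic weight and a Gaussian kernel width σ could enter a kernel between two grid cells: the seismic attribute is rescaled by the weight before the distance is taken. The slides do not state how the scaling is applied, so this form, the function name `geomodel_kernel` and all numbers are assumptions.

```python
import math

def geomodel_kernel(a, b, sigma, seismic_weight):
    # a, b: (x, y, seismic attribute) for two grid cells.
    # The seismic attribute is rescaled by seismic_weight before the distance
    # is taken, so a weight of 0 switches the seismic input off entirely.
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    ds = seismic_weight * (a[2] - b[2])
    return math.exp(-(dx * dx + dy * dy + ds * ds) / (2.0 * sigma ** 2))

# Two neighbouring cells with very different (hypothetical) seismic values.
cell_in_channel = (10.0, 5.0, 0.2)
cell_nearby_shale = (11.0, 5.0, 0.9)

ignore_seismic = geomodel_kernel(cell_in_channel, cell_nearby_shale,
                                 sigma=3.0, seismic_weight=0.0)
trust_seismic = geomodel_kernel(cell_in_channel, cell_nearby_shale,
                                sigma=3.0, seismic_weight=10.0)
```

With the weight at zero the two cells look similar because they are spatially close; with a large weight the seismic contrast dominates and their similarity drops, which is the sense in which the weight encodes prior belief in the seismic data.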
Stochastic Sampling for Matched Models
Misfit minimisation:
• 640 models generated in 8D parameter space
• 40 good fitting models with misfit < 250
Generated models home in on the regions of good fit:
[Figure: generated models plotted against parameter axes (channel porosity, shale porosity, channel permeability, shale permeability), with a misfit colour scale from 170 to 5000]
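Selecting good fitting models by a misfit threshold can be sketched as follows. The slides give the threshold (250) but not the misfit definition, so the least-squares form scaled by an assumed noise level is an assumption, as are the model names and the production numbers.

```python
def least_squares_misfit(simulated, observed, noise_std):
    # Sum of squared residuals, each scaled by the assumed data noise.
    return sum(((s - o) / noise_std) ** 2 for s, o in zip(simulated, observed))

def select_good_models(models, observed, noise_std, threshold):
    # models: {name: simulated production series}
    scored = {name: least_squares_misfit(sim, observed, noise_std)
              for name, sim in models.items()}
    good = sorted(name for name, m in scored.items() if m < threshold)
    return scored, good

# Hypothetical observed history and two candidate models.
observed = [100.0, 120.0, 150.0, 140.0]
models = {
    "model_a": [102.0, 118.0, 149.0, 143.0],
    "model_b": [80.0, 100.0, 180.0, 170.0],
}
scored, good = select_good_models(models, observed, noise_std=2.0, threshold=250.0)
# scored == {"model_a": 4.5, "model_b": 650.0}; good == ["model_a"]
```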
Fitted Model: Property Distribution
porosity truth case
Realistic reproduction of geological structures detected from the prior data:
– fluvial channels
– thin mud channel boundaries
– point-bars
Fitted Model Forecast: Fluvial Channels case
Oil and water production from 7 largest producing wells:
● History data (truth case + noise)
○ Validation truth case forecast data
Matched model
Variability of Uncertain Model Properties
• Correlation
- kernel size σ
• Smoothness along the manifold
- number of unlabelled points N
• Impact of additional data (seismic) on the predicted variables
• Seismic interpretation uncertainty
[Figure panels: σ for channel sands and shale; N for channel sands and shale; amplitude threshold for channel/shale boundary; scaling for porosity; scaling for permeability]
Non-uniqueness of Semi-supervised SVR
[Figure panels: Truth case; Realisation 2; Realisation 1]
Stochastic realisations, based on geo-manifolds generated with different random seeds, represent the inherent non-uniqueness of the model for a given combination of parameter values.
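The role of the random seed can be shown with a minimal sketch: the same set of candidate cells and the same number of unlabelled points give a different geo-manifold for each seed, and the same seed reproduces the same manifold. The diagonal "channel" on a 20 x 20 grid and the function name `sample_unlabelled_points` are hypothetical.

```python
import random

def sample_unlabelled_points(candidate_cells, n_points, seed):
    # Draw n_points cell locations at random from the cells that belong to
    # the geological body (for example, channel cells picked from seismic).
    rng = random.Random(seed)
    return sorted(rng.sample(candidate_cells, n_points))

# Hypothetical channel: a diagonal band of cells on a 20 x 20 grid.
channel_cells = [(i, j) for i in range(20) for j in range(20) if abs(i - j) <= 1]

realisation_1 = sample_unlabelled_points(channel_cells, n_points=15, seed=1)
realisation_2 = sample_unlabelled_points(channel_cells, n_points=15, seed=2)
repeat_of_1 = sample_unlabelled_points(channel_cells, n_points=15, seed=1)
```

Each realisation would then be passed to the learning machine with identical parameter values, so any spread in the predictions comes from the manifold sampling alone.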
Impact of Noise in Seismic Data
[Figure panels: Original seismic data with injected noise N(0,σ), ● unlabelled data; Semi-SVM porosity; Truth case porosity; Semi-SVM porosity for N(0,2σ) added noise]
Realisations of a single fitted model with unique set of parameters
Oil production profiles for 10 stochastic realisations for 6 wells:
● History data (truth case + noise)
○ Validation truth case forecast data
Oil production profiles for semi-SVR model realisations
Production: Stochastic Realisations
Multiple matching models vs Truth case porosity
Multiple good fitting models Truth case
The river delta front structure is very similar across the different models because the synthetic seismic data are very clean, with no noise.
Fitted Model Forecast: Delta Front case
Oil and water production from 7 largest producing wells:
● History data (truth case + noise)
Fitted model
Truth case
Fitted Model Forecast: Delta Front case
Oil production from 7 largest producing wells:
● History data (truth case + noise)
Fitted model
Truth case
Confidence P10/P90 interval for production forecast based on multiple models:
Total oil and water production profiles:
● History data (truth case + noise)
○ Validation truth case forecast data
P10/P90 production forecast confidence bounds
Forecast with Uncertainty
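A P10/P90 envelope from multiple matched models can be sketched as a per-time-step percentile calculation. The sketch assumes every model carries equal weight, which is a simplification: bounds based on posterior probability would need weighted percentiles. It also takes P10 and P90 simply as the 10th and 90th percentiles; conventions for these labels differ. The five forecasts are hypothetical numbers.

```python
def percentile(values, q):
    # Linear-interpolation percentile of a list, with q in [0, 100].
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def forecast_envelope(ensemble, low=10, high=90):
    # ensemble: list of production profiles, one per matched model,
    # all sampled at the same time steps.
    per_step = list(zip(*ensemble))
    return ([percentile(step, low) for step in per_step],
            [percentile(step, high) for step in per_step])

# Hypothetical forecasts from 5 models at 3 time steps.
ensemble = [
    [100.0, 90.0, 70.0],
    [110.0, 95.0, 80.0],
    [120.0, 100.0, 90.0],
    [130.0, 105.0, 100.0],
    [140.0, 110.0, 110.0],
]
lower_bound, upper_bound = forecast_envelope(ensemble)
```

In this toy ensemble the envelope widens at the last time step, the kind of growth in forecast uncertainty that such bounds are meant to expose.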
Uncertainty of Model Parameters
Posterior probability distribution of the geomodel parameters:
• Kernel width (correlation) for poro & perm in sand or shale
• Continuity in sand and shale bodies, controlled by the number of unlabelled points N
• Impact of seismic data on poro & perm (weight)
Conclusions
• A novel learning-based model of a petroleum reservoir, based on capturing complex dependencies from data.
• Semi-supervised SVR geomodel takes into account natural similarities in space and data relations:
– Reproduction of geological structures and anisotropy of fluvial systems in a realistic way, based on prior information on the geo-manifold represented by unlabelled data
– Robustness to noise and flexible control of signal/noise levels in data to detect geologically interpretable information
– Stochastic non-uniqueness inherent to the model is represented by the distribution of unlabelled data
• Multiple fitted models match both production history and the validation data in the forecast
• Uncertainty of the SVR model is quantified by inference of the multiple generated models, which provide uncertainty forecast envelope based on posterior probability
Further work
• Extension to 3D case by adding one more input to the SVR model
• Integrate other relevant data from outcrops and lab experiments
• Apply the SVR modelling approach with the Bayesian UQ framework in other fields: environmental and climate modelling, epidemiology, etc.
• 2 PhD positions in the Uncertainty Quantification project:
– Geologist, data integration
– Uncertainty modelling with machine learning
Apply to [email protected]
Acknowledgments
• J. Caers and S. Castro of Stanford University for providing Stanford VI case study
• UK EPSRC grant (GR/T24838/01)
• Swiss National Science Foundation for funding “GeoKernels: kernel-based methods for geo- and environmental sciences”
• Sponsors of Heriot-Watt Uncertainty Quantification project:
Research Summary
• Developed a novel model for a petroleum reservoir based on capturing complex dependencies from data with learning methods.
• Novel model provides multiple history-matched (HM) models for different fluvial reservoirs: sinuous channels, delta front
– both production history and the validation data in the forecast are matched
• Benefits of the novel data-driven geomodelling approach:
– Reproduces realistic geological structure and anisotropy of property distribution
– Robust to noise in prior data
– Relates to identifiable properties: continuity, correlation, prior belief in data, etc.
• Model uncertainty is described by the inference of multiple models
– Posterior confidence interval describes uncertainty forecast
– Uncertainty of the model parameters is quantified by posterior probability distributions
[Diagram: Prior information; Seismic data; Labelled (●) & unlabelled (+) data; Learning Machine (SVR); Multiple good fitting models]
Next Steps
• Production uncertainty forecasting based on the inference of the generated HM models.
• Extension to 3D case by adding one more input to the SVR model
• Integrate other relevant data from outcrops and lab experiments
Aims
• Explore robustness of semi-supervised SVR geomodel to noisy data
• Develop a way to reproduce inherent uncertainty of the semi-supervised SVR geomodel by stochastic realisations
• Integrate semi-supervised SVR geomodel into the Bayesian uncertainty quantification framework
Uncertainty quantification with a geomodel which is able to improve geological realism by more effective use of prior information
Content
• Motivation and Aims
• Semi-supervised learning concept
– Support Vector Machine (SVM) recap
• Machine learning based geomodel
– Noise pollution experiment
– Inherent non-uniqueness of SVR-based model
– SVR geomodel in Bayesian sampling framework
• Conclusions
Impact of Noise in Seismic Data
In a real case additional data (seismic) are usually noisier than in our synthetic case
Channel geo-manifold defined by unlabelled points
Filtering low frequency component from seismic
Elastic impedance
Seismic is processed through a low pass filter to build a manifold of unlabelled points:
Seismic Data Polluted with Noise
Gaussian noise with zero mean and 3 different standard deviations (σ, 2σ, 3σ) is added.
[Figure panels: Truth case; N(0, σ); N(0, 2σ); N(0, 3σ)]
Filtering
Only a low frequency component is left after filtering
[Figure panels: Truth case; N(0, σ); N(0, 2σ); N(0, 3σ)]
Geo-manifold
Unlabelled points are generated only in the cells below the threshold
[Figure panels: Truth case; N(0, σ); N(0, 2σ); N(0, 3σ)]
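The three steps of the noise experiment (add Gaussian noise, keep the low frequency component, generate unlabelled points below a threshold) can be sketched on a 1D profile. A moving average stands in for the low-pass filter, which the slides do not specify; the profile, the noise level, the window and the threshold are hypothetical values.

```python
import random

def add_noise(signal, std, rng):
    # Pollute the signal with zero-mean Gaussian noise.
    return [s + rng.gauss(0.0, std) for s in signal]

def low_pass(signal, half_window):
    # Moving average: a simple stand-in for the low-pass filter.
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half_window): i + half_window + 1]
        out.append(sum(window) / len(window))
    return out

def manifold_cells(signal, threshold):
    # Unlabelled points are generated only where the value is below the threshold.
    return [i for i, s in enumerate(signal) if s < threshold]

# Hypothetical 1D impedance profile: low values mark a channel in cells 40-59.
truth = [0.0 if 40 <= i < 60 else 1.0 for i in range(100)]
rng = random.Random(0)
noisy = add_noise(truth, std=0.5, rng=rng)
filtered = low_pass(noisy, half_window=5)

true_channel = set(manifold_cells(truth, 0.5))
from_noisy = set(manifold_cells(noisy, 0.5))
from_filtered = set(manifold_cells(filtered, 0.5))
```

Comparing `from_noisy` and `from_filtered` against `true_channel` shows how many cells each version places on the wrong side of the channel boundary; thresholding the raw noisy signal scatters points across the whole profile, while filtering first keeps them concentrated around the channel.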
Porosity SVR Estimates for Noisy Data
[Figure panels: Noise level: 1σ; Noise level: 2σ; Noise level: 3σ; Truth case]
Geo-manifold becomes less concentrated and the channel “erodes” as the noise level increases
Prediction with a Large Noise Level
Even with large noise levels, the channel continuity can be traced in the SVR prediction, although it is barely visible in the input data
[Figure panels: Noise level: 3σ; Truth case]
Impact of Inherent Non-uniqueness
Stochastic realisations of water production from 6 largest producing wells
NA Sampling: Misfit Distribution
Misfit of models generated by NA
Lowest misfit = 188
NA Sampling: Parameter Distributions
Histogram of parameter values for the generated models
Models generated by NA home in on the regions of good fit
Support Vector Machine (SVM)
0 < \alpha_i < C: Support Vectors (SV)
\alpha_i = 0: Normal Samples
\alpha_i = C: Support Vectors, untypical or noisy
Trade-off between: margin maximisation & training error minimisation
Linear separation problem
Increase space dimension to solve separation problem linearly
Soft margin: \min \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i
\xi_i \ge 0: slack variables to allow noisy samples & outliers to lie inside or on the outer side of the margin
w \cdot x_1 + b = 1
w \cdot x_2 + b = -1
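The soft margin trade-off can be checked numerically for a fixed linear separator. In the standard formulation the slack of a sample with class label y in {-1, +1} is max(0, 1 - y (w·x + b)): zero outside the margin, between 0 and 1 inside the margin on the correct side, and above 1 when the sample is misclassified. The separator and the four samples below are hypothetical.

```python
def decision_value(w, b, x):
    # w.x + b for one sample.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(w, b, x, y):
    # Slack variable for one sample with label y in {-1, +1}.
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def soft_margin_objective(w, b, samples, C):
    # 1/2 ||w||^2 + C * (sum of slacks)
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + C * sum(slack(w, b, x, y) for x, y in samples)

w, b = (1.0, 0.0), 0.0
samples = [
    ((2.0, 0.0), +1),    # well outside the margin
    ((0.5, 1.0), +1),    # inside the margin, correct side
    ((-0.5, 0.0), +1),   # wrong side of the separating line
    ((-3.0, 2.0), -1),   # well outside the margin
]
slacks = [slack(w, b, x, y) for x, y in samples]
objective = soft_margin_objective(w, b, samples, C=10.0)
# slacks == [0.0, 0.5, 1.5, 0.0]; objective == 20.5
```

A larger C makes the two offending samples more expensive, pushing the optimiser towards fitting them at the cost of a narrower margin.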