TRANSCRIPT
Takemasa Miyoshi
Data Assimilation Research Team, RIKEN
Ph.D. (Meteorology), Data Assimilation Scientist
Big Data, Big Computation, and Machine Learning
in Numerical Weather Prediction
Who am I? http://data-assimilation.riken.jp/~miyoshi/
http://tedxsannomiya.com/en/speakers/takemasa-miyoshi/
B.S. from Kyoto U
→ JMA administration (2 y)
→ JMA NWP (1.25 y)
→ UMD (2 y, M.S. and Ph.D.)
→ JMA NWP (3.5 y)
→ UMD (4 y)
→ RIKEN (7.5 y+)
http://www.data-assimilation.riken.jp/
Big Data Assimilation
Simulations: powerful supercomputers (Big Data, 100x)
Observations: new sensors, IoT (Big Data, 100x)
9/11/2014, sudden local rain
[Figure: dual-polarization phased-array radar with a 100×100-element array antenna]
Multi-parameter phased array weather radar (MP-PAWR) was developed by SIP (Cross-ministerial Strategic Innovation Promotion Program) in 2014–2018 as a research subject of “torrential rainfall and tornado prediction.”
Early forecasting by water vapor, cloud, and precipitation observation
[Figure: storm life cycle: generate, develop, mature]
[Map: ★ Saitama Univ. (MP-PAWR site); ● Olympic and Paralympic venues; radii 60 km and 80 km; Arakawa basin]
Development of MP-PAWR
[Figures: MP-PAWR features, observation area, and antenna]
MP-PAWR installed at Saitama Univ. on Nov 21, 2017, and observation began in July 2018.
Special arrangement for exclusive use of Oakforest-PACS
of the U. of Tokyo and U. of Tsukuba
Nested computational domains
30-min-lead forecast refreshed every 30 seconds
25 August 2019 00:40 JST
[Figure panels: JMA Nowcast (10-min lead) · This study (10-min lead) · MP-PAWR observation]
The process-driven model predicts rapid changes of rain:
• Rapid development (red broken circles)
• Rapid weakening (left of red circles)
Smartphone app by MTI Co. Ltd.
Real-time test in August 2020
Most of the time, the 30-min forecast is ready in ~3 min. after observation.
Data Assimilation (DA)
Simulations (1) + Observations (1) > 2 with Data Assimilation
Process-driven: deduction (cyber world)
Data-driven: induction (real world)
DA workflow
[Diagram: Initial State → Simulation → Simulated State → Sim-to-Obs conversion → Sim-minus-Obs → DA → Initial State (best estimate); Observations enter at the Sim-minus-Obs comparison]
Process-driven: Simulation · Data-driven: Observations
Human knowledge: Science
Observations/Experiments → Scientific methods → Fundamental laws, Knowledge
Scientific methods deal with noisy/missing data.
1st science (experimental): Observations & Experiments
2nd science (theoretical): Modeling → Fundamental laws, Knowledge
3rd science (computational): Simulation (with model errors)
4th science (data-centric): Big Data beyond human ability
Data Assimilation connects data and simulation and brings synergy.
The 5th paradigm? 5th science (data × computation): Statistics × Dynamical systems, joining Simulation, Observations, and Experiments
DA workflow (broad-sense DA)
[Diagram: Initial State → Simulation → Simulated State → Sim-to-Obs conversion → Sim-minus-Obs → DA → Initial State (best estimate); Observations enter at the comparison]
Process-driven: Simulation · Data-driven: Observations
DA = math of errors
[Figure: probability densities of forecast, observation, and analysis]
Merging two pieces of information (Bayesian estimation): merging forecast & observation.
The probability distribution is essential.
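The Bayesian merging of forecast and observation can be made concrete with a scalar sketch (the numbers are illustrative; the formulas are the standard Kalman update for two Gaussians): multiplying a Gaussian forecast pdf and a Gaussian observation pdf gives a Gaussian analysis whose variance is smaller than either input.

```python
# Scalar Bayesian merge of a Gaussian forecast and a Gaussian observation.
# Numbers are illustrative; the formulas are the standard Kalman update.
mu_f, var_f = 2.0, 1.0   # forecast mean and error variance
y, var_o = 4.0, 1.0      # observation and its error variance

K = var_f / (var_f + var_o)     # Kalman gain: weight given to the observation
mu_a = mu_f + K * (y - mu_f)    # analysis mean
var_a = (1.0 - K) * var_f       # analysis variance, smaller than both inputs

print(mu_a, var_a)  # → 3.0 0.5
```

With equal error variances the analysis sits halfway between forecast and observation, and its variance is halved: the "1 + 1 > 2" of the earlier slide.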
Big Ensemble DA
Miyoshi, Kondo, and Terasaki (2014, Computer), doi:10.1109/MC.2015.332
Sample size = resolution of probability
[Figure: sample PDFs at 1.856°N, 176.25°E for ensemble sizes 20 to 10240]
Kondo and Miyoshi (2019, NPG), doi:10.5194/npg-26-211-2019
Non-Gaussian metric (KLD)
[Figure: KLD maps for ensemble sizes 20 to 10240]
Non-Gaussian PDF captured with >1000 members.
Kondo and Miyoshi (2019, NPG), doi:10.5194/npg-26-211-2019
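As a rough sketch of how such a metric works (my own minimal construction, not the paper's implementation): the KLD between an ensemble's histogram and the Gaussian fitted to its mean and variance is near zero for a Gaussian ensemble and grows for a non-Gaussian (e.g., bimodal) one.

```python
import numpy as np

def kld_from_gaussian(sample, bins=40):
    # KLD between the ensemble's histogram and the Gaussian fitted to its
    # mean/variance: near zero for a Gaussian ensemble, large otherwise.
    hist, edges = np.histogram(sample, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = np.diff(edges)
    mu, sd = sample.mean(), sample.std()
    q = np.exp(-0.5 * ((centers - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    m = hist > 0
    return float(np.sum(hist[m] * np.log(hist[m] / q[m]) * width[m]))

rng = np.random.default_rng(1)
gaussian = rng.standard_normal(10240)                 # Gaussian ensemble
bimodal = np.concatenate([rng.normal(-2, 0.5, 5120),  # non-Gaussian ensemble
                          rng.normal(2, 0.5, 5120)])
print(kld_from_gaussian(gaussian) < kld_from_gaussian(bimodal))  # → True
```

The sample sizes (10240) mirror the big-ensemble experiments: resolving such histogram-based metrics is exactly what requires >1000 members.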
Pushing the limits: Big Data × Big Simulations
Big ensemble (10240 ensemble members) · Rapid update (30-second update) · High resolution (100-m mesh)
Future Numerical Weather Prediction
Fugaku: good for both ML and Big DA (e.g., global 3.5-km mesh, 1024 samples)
K (or most other HPCs): not suitable for ML; good for Big DA (e.g., global 112-km mesh, 10240 samples)
Fugaku : K = 100 : 1
Mesh size: 32x (grid points: 1024x); sample size: 0.1x
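The scaling on this slide checks out arithmetically (assuming the grid-point factor refers to the two horizontal directions):

```python
# Fugaku-vs-K scaling from the slide: refining the mesh from 112 km to
# 3.5 km is 32x finer per horizontal direction, i.e. 32**2 = 1024x more
# grid points, while the ensemble shrinks from 10240 to 1024 members.
refinement = 112 / 3.5
print(refinement)        # → 32.0
print(refinement ** 2)   # → 1024.0
print(1024 / 10240)      # → 0.1
```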
DA-AI Integration
[Diagram: broad-sense DA workflow (Initial State → Simulation → Simulated State → Sim-to-Obs conversion → Sim-minus-Obs → DA), with ML applications at each step:]
• Predict high-resolution from low-resolution model
• Predict model error
• Model-obs relationship
• Quality control
• DA algorithm
• Surrogate model
Need to learn big computation data on HPC (cannot move).
Conv-LSTM by Shi et al. (2015), extended to three-dimensional radar data
3D precipitation nowcasting: observations in, forecast out; no input of future data from the weather simulation
Conv-LSTM is effective: 2.5-min prediction with ConvLSTM3D
(Work with Mr. Viet Phi Huynh and Prof. Pierre Tandeo)
Otsuka, Miyoshi, et al., poster presentation
Fusing ML + DA + Simulation: observations and weather-simulation output as input, a new 3D precipitation nowcast as output, giving an improved forecast
Input of future data from NWP!! (NICT)
Preliminary results: using future data in Conv-LSTM is effective (better scores).
Otsuka, Miyoshi, et al., poster presentation
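To show the mechanism behind these nowcasts, here is a single-channel ConvLSTM cell written in plain NumPy directly from the Shi et al. (2015) equations; the kernel size, field size, and random weights are illustrative, not the study's configuration.

```python
import numpy as np

def conv2d_same(x, k):
    # Zero-padded "same" 3x3 cross-correlation of a 2-D field.
    xp = np.pad(x, 1)
    win = np.lib.stride_tricks.sliding_window_view(xp, (3, 3))
    return np.einsum('ijkl,kl->ij', win, k)

def convlstm_step(x, h, c, w):
    # One ConvLSTM step (Shi et al. 2015): the LSTM gates are computed
    # with convolutions of the input frame x and hidden state h instead
    # of dense matrix products, preserving spatial structure.
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    i = sig(conv2d_same(x, w['xi']) + conv2d_same(h, w['hi']))   # input gate
    f = sig(conv2d_same(x, w['xf']) + conv2d_same(h, w['hf']))   # forget gate
    o = sig(conv2d_same(x, w['xo']) + conv2d_same(h, w['ho']))   # output gate
    g = np.tanh(conv2d_same(x, w['xc']) + conv2d_same(h, w['hc']))
    c = f * c + i * g              # cell state keeps a spatial memory
    return o * np.tanh(c), c

rng = np.random.default_rng(0)
w = {k: 0.1 * rng.standard_normal((3, 3))
     for k in ('xi', 'hi', 'xf', 'hf', 'xo', 'ho', 'xc', 'hc')}
frames = rng.standard_normal((5, 16, 16))   # a short sequence of "radar" frames
h = c = np.zeros((16, 16))
for x in frames:                            # encode the observed sequence
    h, c = convlstm_step(x, h, c, w)
print(h.shape)                              # → (16, 16)
```

In the real systems the cell is stacked, trained by backpropagation, and (per the slide) extended to three-dimensional radar volumes; the fused variant additionally feeds NWP output frames into the input sequence.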
Approaches
• Process-Driven Physical Model (PDPM): numerical model
• Data-Driven Statistical Model (DDSM): surrogate modeling with convolutional neural networks (CNN)
• Hybrid Physical-Statistical Model (HPSM): super-resolution with convolutional neural networks (CNN)
Climate Model Acceleration by Machine Learning (Mdini, Otsuka, Miyoshi, poster presentation)
Use case
• Quasi-geostrophic (QG) model
• Data set: 50000 QG runs at 2 scales: low-resolution output (LR) and high-resolution output (HR)
Mdini, Otsuka, Miyoshi
• PDPM output: ground truth
• Linear interpolation (LI): baseline to evaluate CNN super-resolution capacity
• Evaluation metrics: mean absolute error (MAE), anomaly correlation coefficient (ACC), computation time
Experiment: 3rd-day outputs
Mdini, Otsuka, Miyoshi, poster presentation
Results: MAE, ACC, computation time
• Predictability range: DDSM 2 days; HPSM 9 days
• Computation time: DDSM ¼; HPSM ⅓
Mdini, Otsuka, Miyoshi, poster presentation
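A minimal sketch of the LI baseline (the toy field and factor-4 coarsening are my own choices, not the QG data): coarsen a smooth field, upscale it back by separable linear interpolation, and score with MAE; the CNN in the HPSM replaces the interpolation step with a learned mapping.

```python
import numpy as np

def upscale_linear(lr, factor):
    # Separable 1-D linear interpolation onto a grid `factor` times finer,
    # aligned so that every coarse sample is reproduced exactly.
    n = lr.shape[0]
    coarse = np.arange(n)
    fine = np.arange(n * factor) / factor
    rows = np.array([np.interp(fine, coarse, r) for r in lr])        # axis 1
    return np.array([np.interp(fine, coarse, c) for c in rows.T]).T  # axis 0

hr = np.sin(np.linspace(0, 2 * np.pi, 64))[None, :] * np.ones((64, 1))
lr = hr[::4, ::4]                    # low-resolution field (subsample by 4)
recon = upscale_linear(lr, 4)        # LI baseline reconstruction
mae = np.abs(recon - hr).mean()      # MAE metric from the slide
print(recon.shape, round(mae, 3))
```

For a smooth field the LI error is already small; the value of the CNN shows up on sharp, flow-dependent structures that linear interpolation cannot recover.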
Nonlinear bias correction with ML
Forecast model: x^f_{t+1} = M(x^a_t)
LETKF analysis with observation y_{t+1}: x^a_{t+1} = x^f_{t+1} + K(y_{t+1} − H(x^f_{t+1}))
Amemiya, Mohta, Miyoshi, poster presentation
Nonlinear bias correction with ML
Forecast model: x^f_{t+1} = M(x^a_t)
Bias correction (RNN): x̃^f_{t+1} = b(x^f_{t+1}, …)
LETKF analysis: x^a_{t+1} = x̃^f_{t+1} + K(y_{t+1} − H(x̃^f_{t+1}))
Train the network b with pairs (x^f_{t+1}, x^a_{t+1}).
Amemiya, Mohta, Miyoshi, poster presentation
LSTM/GRU implementation
• Activation: tanh / sigmoid (recurrent)
• No regularization / dropout
• Network architecture: 1 LSTM + 3 Dense layers
• Spatial localization
Input x^f_{t−Δt}, …, x^f_t → LSTM → Dense → Dense → output x^a_t
Training pairs (x^f_t, x^a_t) from Lorenz96 + LETKF
Python / TensorFlow: the TensorFlow LSTM is implemented and integrated with the LETKF codes.
Amemiya, Mohta, Miyoshi, poster presentation
Additional advection term case
LSTM and NN perform clearly better than linear regression; larger localization leads to better improvement.
"Nature run": dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F + f_k(x)
Forecast model: dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F
Missing term ( = negative model bias), biased advection factor case: f_k(x) = 0.2 × x_{k−1}(x_{k+1} − x_{k−2})
Test RMSE in bias correction
Amemiya, Mohta, Miyoshi, poster presentation
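This twin setup can be sketched in a few lines (a minimal RK4 integration written for illustration; the time step, run length, and initial perturbation are arbitrary choices, not the study's settings): the nature run carries the extra advection term f_k(x) = 0.2 x_{k−1}(x_{k+1} − x_{k−2}) that the forecast model is missing, so the two trajectories drift apart.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0, bias_factor=0.0):
    # dx_k/dt = x_{k-1}(x_{k+1} - x_{k-2}) - x_k + F, plus the biased
    # advection term f_k(x) = bias_factor * x_{k-1}(x_{k+1} - x_{k-2}).
    adv = np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2))
    return (1.0 + bias_factor) * adv - x + F

def rk4_step(x, dt, **kw):
    # Classical 4th-order Runge-Kutta step.
    k1 = lorenz96_rhs(x, **kw)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, **kw)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, **kw)
    k4 = lorenz96_rhs(x + dt * k3, **kw)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

x0 = 8.0 * np.ones(40)              # K = 40 variables
x0[0] += 0.01                       # small perturbation off the fixed point
x_nature, x_fcst = x0.copy(), x0.copy()
for _ in range(500):                # dt = 0.01: 5 model time units
    x_nature = rk4_step(x_nature, 0.01, bias_factor=0.2)  # "nature run"
    x_fcst = rk4_step(x_fcst, 0.01)                       # biased forecast
rmse = np.sqrt(np.mean((x_fcst - x_nature) ** 2))
print(rmse > 0.1)                   # the missing term acts as a model bias
```

The bias-correction network b(x^f) is trained to absorb exactly this kind of systematic forecast-minus-analysis discrepancy.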
Bias-corrected analysis and forecast RMSE
• Improvement in analysis RMSE with a smaller multiplicative inflation factor
• Improvement in forecast RMSE through a smaller error growth ratio
[Figures: analysis RMSE and extended forecast RMSE]
Amemiya, Mohta, Miyoshi, poster presentation
Coupled model with non-local dependency case
[Figures: analysis RMSE and extended forecast RMSE]
"Nature run": shear Lorenz96 model (Pulido et al., 2018)
dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F − (hc/b) f_k(y)
dy_j/dt = c b y_{j+1}(y_{j−1} − y_{j+2}) − c y_j + (hc/b) g_j(x)
f_k(y) = Σ_{j=(k−1)J/K+1}^{kJ/K} y_j,  g_j(x) = α (x_{int(j/K)+1} − x_{int(j/K)−1})
Forecast model: dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F
Amemiya, Mohta, Miyoshi, poster presentation
x^a = x^b + K(y − H(x^b))
Current observation operator: a physically based model (e.g., a radiative transfer model) mapping model variables to observed variables.
Proposed observation operator: a machine learning (ML) model mapping model variables to observed variables.
Building a general ML approach to the observation operator
Jiang, Terasaki, Miyoshi
Previous research using ML as an observation operator (region | model | observations):
• Kwon et al. (2019): snow in High Mountain Asia | support vector machine | satellite radiance
• Jing et al. (2019): sea ice | neural network | satellite radiance
Our goal (region | model | observations):
• All kinds of surface conditions
• Investigate more models: neural network, tree methods, etc.
• Satellite radiance, Venus satellite, satellites from our industry partners
Our goal, a general approach: we aim to build a general approach to applying ML to the observation operator, so that any new observations in the future can be quickly used in DA.
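A minimal sketch of the idea (everything here is synthetic: the "radiance" formula, the variable ranges, and the polynomial features are illustrative stand-ins for a real radiative transfer relation, not the team's method): fit a regression from model variables to the observed quantity, then use it as H in place of the physically based operator.

```python
import numpy as np

# Synthetic training data: model variables and a made-up nonlinear
# "radiance" standing in for a radiative transfer calculation.
rng = np.random.default_rng(0)
T = rng.uniform(250.0, 300.0, 2000)      # model temperature [K]
q = rng.uniform(0.0, 0.02, 2000)         # model specific humidity [kg/kg]
radiance = 0.001 * T**2 - 50.0 * q * T + rng.normal(0.0, 0.05, 2000)

def features(T, q):
    # Low-order polynomial features of the model variables.
    return np.column_stack([np.ones_like(T), T, q, T * q, T**2, q**2])

# Learn the observation operator by least squares.
coef, *_ = np.linalg.lstsq(features(T, q), radiance, rcond=None)

# The learned operator maps any model state into observation space.
T2 = rng.uniform(250.0, 300.0, 500)
q2 = rng.uniform(0.0, 0.02, 500)
pred = features(T2, q2) @ coef
truth = 0.001 * T2**2 - 50.0 * q2 * T2
print(np.abs(pred - truth).mean() < 0.1)  # → True
```

The appeal of the general approach is visible even in this toy: once a new observation type arrives with collocated model states, the operator can be refit without deriving a new physical forward model.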
DA-AI fusion
[Diagram: broad-sense DA workflow with ML applications: predict high-resolution from low-resolution model; predict model error; model-obs relationship; quality control; DA algorithm; surrogate model]
Need to learn big computation data on HPC (cannot move): using AI in DA.
Fusing AI and DA with HPC: new meteorology (the 5th science)