TRANSCRIPT
Takemasa Miyoshi
Data Assimilation Research Team, RIKEN
Ph.D. (Meteorology), Data Assimilation Scientist
Big Data, Big Computation, and Machine Learning
in Numerical Weather Prediction
Who am I? http://data-assimilation.riken.jp/~miyoshi/
http://tedxsannomiya.com/en/speakers/takemasa-miyoshi/
B.S. from Kyoto U
→ JMA administration (2 y)
→ JMA NWP (1.25 y)
→ UMD (2 y, M.S. and Ph.D.)
→ JMA NWP (3.5 y)
→ UMD (4 y)
→ RIKEN (7.5 y+)
http://www.data-assimilation.riken.jp/
Big Data Assimilation
Simulations: powerful supercomputers (Big Data, 100x)
Observations: new sensors, IoT (Big Data, 100x)
9/11/2014, sudden local rain
[Figure: dual-polarization phased-array radar with a 100×100-element array antenna]
Multi-parameter phased array weather radar (MP-PAWR) was developed by SIP (Cross-ministerial Strategic Innovation Promotion Program) in 2014–2018 as a research subject of “torrential rainfall and tornado prediction.”
Early forecasting by water vapor, cloud, and precipitation observation
[Figure: storm life cycle: generate, develop, mature]
[Map: ★ Saitama Univ. (MP-PAWR site); ● Olympic and Paralympic venues; radii 60 km and 80 km; Arakawa basin]
Development of MP-PAWR
[Figures: MP-PAWR features, observation area, and antenna]
MP-PAWR installed at Saitama Univ. on Nov 21, 2017, and observation began in July 2018.
Special arrangement for exclusive use of Oakforest-PACS
of the U. of Tokyo and U. of Tsukuba
Nested computational domains
30-min-lead forecast refreshed every 30 seconds
25 August 2019 00:40 JST
[Figure panels: JMA Nowcast (10-min lead) · This study (10-min lead) · MP-PAWR observation]
The process-driven model predicts rapid changes of rain:
• Rapid development (red broken circles)
• Rapid weakening (left of red circles)
Smartphone app by MTI Co. Ltd.
Real-time test in August 2020
Most of the time, the 30-min forecast is ready in ~3 min. after observation.
Data Assimilation (DA)
Simulations (1) + Observations (1) > 2 with Data Assimilation
Process-driven: deduction (cyber world)
Data-driven: induction (real world)
DA workflow
[Diagram: Initial State → Simulation → Simulated State → Sim-to-Obs conversion → Sim-minus-Obs → DA → Initial State (best estimate); Observations enter at the Sim-minus-Obs comparison]
Process-driven: Simulation · Data-driven: Observations
Human knowledge: Science
Observations/Experiments → Scientific methods → Fundamental laws, Knowledge
Scientific methods deal with noisy/missing data.
1st science (experimental): Observations & Experiments
2nd science (theoretical): Modeling → Fundamental laws, Knowledge
3rd science (computational): Simulation (with model errors)
4th science (data-centric): Big Data beyond human ability
Data Assimilation connects data and simulation and brings synergy.
The 5th paradigm? 5th science (data × computation): Statistics × Dynamical systems, joining Simulation, Observations, and Experiments
DA workflow (broad-sense DA)
[Diagram: Initial State → Simulation → Simulated State → Sim-to-Obs conversion → Sim-minus-Obs → DA → Initial State (best estimate); Observations enter at the comparison]
Process-driven: Simulation · Data-driven: Observations
DA = math of errors
[Figure: probability densities of forecast, observation, and analysis]
Merging two pieces of information (Bayesian estimation): merging forecast & observation.
The probability distribution is essential.
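The Bayesian merging of forecast and observation can be made concrete with a scalar sketch (the numbers are illustrative; the formulas are the standard Kalman update for two Gaussians): multiplying a Gaussian forecast pdf and a Gaussian observation pdf gives a Gaussian analysis whose variance is smaller than either input.

```python
# Scalar Bayesian merge of a Gaussian forecast and a Gaussian observation.
# Numbers are illustrative; the formulas are the standard Kalman update.
mu_f, var_f = 2.0, 1.0   # forecast mean and error variance
y, var_o = 4.0, 1.0      # observation and its error variance

K = var_f / (var_f + var_o)     # Kalman gain: weight given to the observation
mu_a = mu_f + K * (y - mu_f)    # analysis mean
var_a = (1.0 - K) * var_f       # analysis variance, smaller than both inputs

print(mu_a, var_a)  # → 3.0 0.5
```

With equal error variances the analysis sits halfway between forecast and observation, and its variance is halved: the "1 + 1 > 2" of the earlier slide.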
Big Ensemble DA
Miyoshi, Kondo, and Terasaki (2014, Computer), doi:10.1109/MC.2015.332
Sample size = resolution of probability
[Figure: sample PDFs at 1.856°N, 176.25°E for ensemble sizes 20 to 10240]
Kondo and Miyoshi (2019, NPG), doi:10.5194/npg-26-211-2019
Non-Gaussian metric (KLD)
[Figure: KLD maps for ensemble sizes 20 to 10240]
Non-Gaussian PDF captured with >1000 members.
Kondo and Miyoshi (2019, NPG), doi:10.5194/npg-26-211-2019
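As a rough sketch of how such a metric works (my own minimal construction, not the paper's implementation): the KLD between an ensemble's histogram and the Gaussian fitted to its mean and variance is near zero for a Gaussian ensemble and grows for a non-Gaussian (e.g., bimodal) one.

```python
import numpy as np

def kld_from_gaussian(sample, bins=40):
    # KLD between the ensemble's histogram and the Gaussian fitted to its
    # mean/variance: near zero for a Gaussian ensemble, large otherwise.
    hist, edges = np.histogram(sample, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = np.diff(edges)
    mu, sd = sample.mean(), sample.std()
    q = np.exp(-0.5 * ((centers - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    m = hist > 0
    return float(np.sum(hist[m] * np.log(hist[m] / q[m]) * width[m]))

rng = np.random.default_rng(1)
gaussian = rng.standard_normal(10240)                 # Gaussian ensemble
bimodal = np.concatenate([rng.normal(-2, 0.5, 5120),  # non-Gaussian ensemble
                          rng.normal(2, 0.5, 5120)])
print(kld_from_gaussian(gaussian) < kld_from_gaussian(bimodal))  # → True
```

The sample sizes (10240) mirror the big-ensemble experiments: resolving such histogram-based metrics is exactly what requires >1000 members.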
Pushing the limits: Big Data × Big Simulations
Big ensemble (10240 ensemble members) · Rapid update (30-second update) · High resolution (100-m mesh)
Future Numerical Weather Prediction
Fugaku: good for both ML and Big DA (e.g., global 3.5-km mesh, 1024 samples)
K (or most other HPCs): not suitable for ML; good for Big DA (e.g., global 112-km mesh, 10240 samples)
Fugaku : K = 100 : 1
Mesh size: 32x (grid points: 1024x); sample size: 0.1x
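The scaling on this slide checks out arithmetically (assuming the grid-point factor refers to the two horizontal directions):

```python
# Fugaku-vs-K scaling from the slide: refining the mesh from 112 km to
# 3.5 km is 32x finer per horizontal direction, i.e. 32**2 = 1024x more
# grid points, while the ensemble shrinks from 10240 to 1024 members.
refinement = 112 / 3.5
print(refinement)        # → 32.0
print(refinement ** 2)   # → 1024.0
print(1024 / 10240)      # → 0.1
```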
DA-AI Integration
[Diagram: broad-sense DA workflow (Initial State → Simulation → Simulated State → Sim-to-Obs conversion → Sim-minus-Obs → DA), with ML applications at each step:]
• Predict high-resolution from low-resolution model
• Predict model error
• Model-obs relationship
• Quality control
• DA algorithm
• Surrogate model
Need to learn big computation data on HPC (cannot move).
Conv-LSTM by Shi et al. (2015), extended to three-dimensional radar data
3D precipitation nowcasting: observations in, forecast out; no input of future data from the weather simulation
Conv-LSTM is effective: 2.5-min prediction with ConvLSTM3D
(Work with Mr. Viet Phi Huynh and Prof. Pierre Tandeo)
Otsuka, Miyoshi, et al., poster presentation
Fusing ML + DA + Simulation: observations and weather-simulation output as input, a new 3D precipitation nowcast as output, giving an improved forecast
Input of future data from NWP!! (NICT)
Preliminary results: using future data in Conv-LSTM is effective (better scores).
Otsuka, Miyoshi, et al., poster presentation
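To show the mechanism behind these nowcasts, here is a single-channel ConvLSTM cell written in plain NumPy directly from the Shi et al. (2015) equations; the kernel size, field size, and random weights are illustrative, not the study's configuration.

```python
import numpy as np

def conv2d_same(x, k):
    # Zero-padded "same" 3x3 cross-correlation of a 2-D field.
    xp = np.pad(x, 1)
    win = np.lib.stride_tricks.sliding_window_view(xp, (3, 3))
    return np.einsum('ijkl,kl->ij', win, k)

def convlstm_step(x, h, c, w):
    # One ConvLSTM step (Shi et al. 2015): the LSTM gates are computed
    # with convolutions of the input frame x and hidden state h instead
    # of dense matrix products, preserving spatial structure.
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    i = sig(conv2d_same(x, w['xi']) + conv2d_same(h, w['hi']))   # input gate
    f = sig(conv2d_same(x, w['xf']) + conv2d_same(h, w['hf']))   # forget gate
    o = sig(conv2d_same(x, w['xo']) + conv2d_same(h, w['ho']))   # output gate
    g = np.tanh(conv2d_same(x, w['xc']) + conv2d_same(h, w['hc']))
    c = f * c + i * g              # cell state keeps a spatial memory
    return o * np.tanh(c), c

rng = np.random.default_rng(0)
w = {k: 0.1 * rng.standard_normal((3, 3))
     for k in ('xi', 'hi', 'xf', 'hf', 'xo', 'ho', 'xc', 'hc')}
frames = rng.standard_normal((5, 16, 16))   # a short sequence of "radar" frames
h = c = np.zeros((16, 16))
for x in frames:                            # encode the observed sequence
    h, c = convlstm_step(x, h, c, w)
print(h.shape)                              # → (16, 16)
```

In the real systems the cell is stacked, trained by backpropagation, and (per the slide) extended to three-dimensional radar volumes; the fused variant additionally feeds NWP output frames into the input sequence.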
Approaches
• Process-Driven Physical Model (PDPM): numerical model
• Data-Driven Statistical Model (DDSM): surrogate modeling with convolutional neural networks (CNN)
• Hybrid Physical-Statistical Model (HPSM): super-resolution with convolutional neural networks (CNN)
Climate Model Acceleration by Machine Learning (Mdini, Otsuka, Miyoshi, poster presentation)
Use case
• Quasi-geostrophic (QG) model
• Data set: 50000 QG runs at 2 scales: low-resolution output (LR) and high-resolution output (HR)
Mdini, Otsuka, Miyoshi
• PDPM output: ground truth
• Linear interpolation (LI): baseline to evaluate CNN super-resolution capacity
• Evaluation metrics: mean absolute error (MAE), anomaly correlation coefficient (ACC), computation time
Experiment: 3rd-day outputs
Mdini, Otsuka, Miyoshi, poster presentation
Results: MAE, ACC, computation time
• Predictability range: DDSM 2 days; HPSM 9 days
• Computation time: DDSM ¼; HPSM ⅓
Mdini, Otsuka, Miyoshi, poster presentation
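A minimal sketch of the LI baseline (the toy field and factor-4 coarsening are my own choices, not the QG data): coarsen a smooth field, upscale it back by separable linear interpolation, and score with MAE; the CNN in the HPSM replaces the interpolation step with a learned mapping.

```python
import numpy as np

def upscale_linear(lr, factor):
    # Separable 1-D linear interpolation onto a grid `factor` times finer,
    # aligned so that every coarse sample is reproduced exactly.
    n = lr.shape[0]
    coarse = np.arange(n)
    fine = np.arange(n * factor) / factor
    rows = np.array([np.interp(fine, coarse, r) for r in lr])        # axis 1
    return np.array([np.interp(fine, coarse, c) for c in rows.T]).T  # axis 0

hr = np.sin(np.linspace(0, 2 * np.pi, 64))[None, :] * np.ones((64, 1))
lr = hr[::4, ::4]                    # low-resolution field (subsample by 4)
recon = upscale_linear(lr, 4)        # LI baseline reconstruction
mae = np.abs(recon - hr).mean()      # MAE metric from the slide
print(recon.shape, round(mae, 3))
```

For a smooth field the LI error is already small; the value of the CNN shows up on sharp, flow-dependent structures that linear interpolation cannot recover.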
Nonlinear bias correction with ML
Forecast model: x^f_{t+1} = M(x^a_t)
LETKF analysis with observation y_{t+1}: x^a_{t+1} = x^f_{t+1} + K(y_{t+1} − H(x^f_{t+1}))
Amemiya, Mohta, Miyoshi, poster presentation
Nonlinear bias correction with ML
Forecast model: x^f_{t+1} = M(x^a_t)
Bias correction (RNN): x̃^f_{t+1} = b(x^f_{t+1}, …)
LETKF analysis: x^a_{t+1} = x̃^f_{t+1} + K(y_{t+1} − H(x̃^f_{t+1}))
Train the network b with pairs (x^f_{t+1}, x^a_{t+1}).
Amemiya, Mohta, Miyoshi, poster presentation
LSTM/GRU implementation
• Activation: tanh / sigmoid (recurrent)
• No regularization / dropout
• Network architecture: 1 LSTM + 3 Dense layers
• Spatial localization
Input x^f_{t−Δt}, …, x^f_t → LSTM → Dense → Dense → output x^a_t
Training pairs (x^f_t, x^a_t) from Lorenz96 + LETKF
Python / TensorFlow: the TensorFlow LSTM is implemented and integrated with the LETKF codes.
Amemiya, Mohta, Miyoshi, poster presentation
Additional advection term case
LSTM and NN perform clearly better than linear regression; larger localization leads to better improvement.
"Nature run": dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F + f_k(x)
Forecast model: dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F
Missing term ( = negative model bias), biased advection factor case: f_k(x) = 0.2 × x_{k−1}(x_{k+1} − x_{k−2})
Test RMSE in bias correction
Amemiya, Mohta, Miyoshi, poster presentation
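This twin setup can be sketched in a few lines (a minimal RK4 integration written for illustration; the time step, run length, and initial perturbation are arbitrary choices, not the study's settings): the nature run carries the extra advection term f_k(x) = 0.2 x_{k−1}(x_{k+1} − x_{k−2}) that the forecast model is missing, so the two trajectories drift apart.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0, bias_factor=0.0):
    # dx_k/dt = x_{k-1}(x_{k+1} - x_{k-2}) - x_k + F, plus the biased
    # advection term f_k(x) = bias_factor * x_{k-1}(x_{k+1} - x_{k-2}).
    adv = np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2))
    return (1.0 + bias_factor) * adv - x + F

def rk4_step(x, dt, **kw):
    # Classical 4th-order Runge-Kutta step.
    k1 = lorenz96_rhs(x, **kw)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, **kw)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, **kw)
    k4 = lorenz96_rhs(x + dt * k3, **kw)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

x0 = 8.0 * np.ones(40)              # K = 40 variables
x0[0] += 0.01                       # small perturbation off the fixed point
x_nature, x_fcst = x0.copy(), x0.copy()
for _ in range(500):                # dt = 0.01: 5 model time units
    x_nature = rk4_step(x_nature, 0.01, bias_factor=0.2)  # "nature run"
    x_fcst = rk4_step(x_fcst, 0.01)                       # biased forecast
rmse = np.sqrt(np.mean((x_fcst - x_nature) ** 2))
print(rmse > 0.1)                   # the missing term acts as a model bias
```

The bias-correction network b(x^f) is trained to absorb exactly this kind of systematic forecast-minus-analysis discrepancy.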
Bias-corrected analysis and forecast RMSE
• Improvement in analysis RMSE with a smaller multiplicative inflation factor
• Improvement in forecast RMSE through a smaller error growth ratio
[Figures: analysis RMSE and extended forecast RMSE]
Amemiya, Mohta, Miyoshi, poster presentation
Coupled model with non-local dependency case
[Figures: analysis RMSE and extended forecast RMSE]
"Nature run": shear Lorenz96 model (Pulido et al., 2018)
dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F − (hc/b) f_k(y)
dy_j/dt = c b y_{j+1}(y_{j−1} − y_{j+2}) − c y_j + (hc/b) g_j(x)
f_k(y) = Σ_{j=(k−1)J/K+1}^{kJ/K} y_j,  g_j(x) = α (x_{int(j/K)+1} − x_{int(j/K)−1})
Forecast model: dx_k/dt = x_{k−1}(x_{k+1} − x_{k−2}) − x_k + F
Amemiya, Mohta, Miyoshi, poster presentation
x^a = x^b + K(y − H(x^b))
Current observation operator: a physically based model (e.g., a radiative transfer model) mapping model variables to observed variables.
Proposed observation operator: a machine learning (ML) model mapping model variables to observed variables.
Building a general ML approach to the observation operator
Jiang, Terasaki, Miyoshi
Previous research using ML as an observation operator (region | model | observations):
• Kwon et al. (2019): snow in High Mountain Asia | support vector machine | satellite radiance
• Jing et al. (2019): sea ice | neural network | satellite radiance
Our goal (region | model | observations):
• All kinds of surface conditions
• Investigate more models: neural network, tree methods, etc.
• Satellite radiance, Venus satellite, satellites from our industry partners
Our goal, a general approach: we aim to build a general approach to applying ML to the observation operator, so that any new observations in the future can be quickly used in DA.
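A minimal sketch of the idea (everything here is synthetic: the "radiance" formula, the variable ranges, and the polynomial features are illustrative stand-ins for a real radiative transfer relation, not the team's method): fit a regression from model variables to the observed quantity, then use it as H in place of the physically based operator.

```python
import numpy as np

# Synthetic training data: model variables and a made-up nonlinear
# "radiance" standing in for a radiative transfer calculation.
rng = np.random.default_rng(0)
T = rng.uniform(250.0, 300.0, 2000)      # model temperature [K]
q = rng.uniform(0.0, 0.02, 2000)         # model specific humidity [kg/kg]
radiance = 0.001 * T**2 - 50.0 * q * T + rng.normal(0.0, 0.05, 2000)

def features(T, q):
    # Low-order polynomial features of the model variables.
    return np.column_stack([np.ones_like(T), T, q, T * q, T**2, q**2])

# Learn the observation operator by least squares.
coef, *_ = np.linalg.lstsq(features(T, q), radiance, rcond=None)

# The learned operator maps any model state into observation space.
T2 = rng.uniform(250.0, 300.0, 500)
q2 = rng.uniform(0.0, 0.02, 500)
pred = features(T2, q2) @ coef
truth = 0.001 * T2**2 - 50.0 * q2 * T2
print(np.abs(pred - truth).mean() < 0.1)  # → True
```

The appeal of the general approach is visible even in this toy: once a new observation type arrives with collocated model states, the operator can be refit without deriving a new physical forward model.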
DA-AI fusion
[Diagram: broad-sense DA workflow with ML applications: predict high-resolution from low-resolution model; predict model error; model-obs relationship; quality control; DA algorithm; surrogate model]
Need to learn big computation data on HPC (cannot move): using AI in DA.
Fusing AI and DA with HPC: new meteorology (the 5th science)