Data Centric HPC for Numerical Weather Forecasting

James Faeldon, Delfin Jay Sabido III, Karen España (IBM Philippines, STG Labs)

Uploaded by james-faeldon on 18-Dec-2014.


DESCRIPTION

Presented at the HPC for Big Data Workshop of the 2014 International Conference on Parallel Processing. The paper is published by IEEE in the Proceedings of the 2014 ICCPW.

TRANSCRIPT

Page 1: Data Centric HPC for Numerical Weather Forecasting

DATA CENTRIC HPC FOR NUMERICAL WEATHER FORECASTING

James Faeldon

Delfin Jay Sabido III

Karen España

IBM Philippines, STG Labs

Page 2

Extreme Weather Events

• The Philippines is home to devastating typhoons.

• 19 typhoons a year, plus intense monsoon rains that can cause widespread flooding.

• Research collaboration between the Philippine Government, the University of the Philippines and IBM (2013).

[Figure: Typhoon tracks, Eastern Hemisphere. The strongest typhoons group near the Philippines. Image courtesy of NOAA.]

[Figure: Before and after images of Super Typhoon Haiyan (Nov 2013). Image courtesy of DigitalGlobe.]

Page 3

Coupled Models for Pre-Disaster Planning

• Numerical weather model forecasts typhoon track and intensity.

• Machine learning model predicts affected population and damages.

• Optimization model recommends pre-positioning and allocation of relief supplies.

Typhoons can be forecast a few days in advance, but we need more reports and better visualization and data exploration tools to shorten analysis cycles and support timely decisions.

[Figure: Operations Center]

Page 4

Operational Forecasting Schedule Runs

[Figure: Schedule of operational forecasting runs, showing data-intensive and compute-intensive stages.]

Data-intensive processes are increasingly becoming the bottleneck in the operational forecasting workflow.

Page 5

Drivers for Increased Data Processing

[Figure: Analytics and Big Data]

Page 6

Operational Forecasting Data Challenges

[Figure: Per-forecast processing inside a 6-hour processing and analysis window: ETL, quality control, sampling, verification, machine learning, ensemble forecasts and Model Output Statistics over the new forecast plus 7 historical days, about 663 Gb per forecast. The relief operations plan is updated based on each new forecast.]

Real-time Sensor Data

Source                            Qty   Unit Size     Total Size
AWS (automatic weather stations)  733   7 Kb/day      5 Mb/day
Satellite                         1     480 Mb/day    480 Mb/day
Radar                             7     9 Gb/day      63 Gb/day

Forecast Data

Resolution   Cells   Grid              Total Size
12 km        5.2 M   307 × 481 × 35    81 Gb/forecast
4 km         8.8 M   619 × 406 × 35    138 Gb/forecast
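The 663 Gb figure can be reproduced from these tables. This back-of-the-envelope check is not from the slides: it assumes decimal units (1 Gb = 1000 Mb, 1 Mb = 1000 Kb) and reads "+ 7 historical days" as seven days of real-time sensor data added to one forecast's output. It also derives the sustained rate that a 6-hour window implies.

```python
# Back-of-the-envelope check of the data volumes on this slide.
# Assumptions (not stated on the slide): decimal units (1 Gb = 1000 Mb,
# 1 Mb = 1000 Kb) and "+ 7 historical days" = seven days of sensor data.
KB_PER_MB = 1000
MB_PER_GB = 1000

# Real-time sensor data: source -> (quantity, Mb per unit per day)
SENSORS = {
    "AWS": (733, 7 / KB_PER_MB),      # 7 Kb/day per station
    "Satellite": (1, 480.0),          # 480 Mb/day
    "Radar": (7, 9 * MB_PER_GB),      # 9 Gb/day per radar
}
FORECAST_GB = {"12km": 81, "4km": 138}  # Gb per forecast, per domain
HISTORY_DAYS = 7
WINDOW_HOURS = 6


def grid_cells(nx: int, ny: int, nz: int) -> int:
    """Cell count of a 3-D model grid."""
    return nx * ny * nz


sensor_gb_per_day = sum(q * mb for q, mb in SENSORS.values()) / MB_PER_GB
total_gb = sum(FORECAST_GB.values()) + HISTORY_DAYS * sensor_gb_per_day
rate_mb_per_s = total_gb * MB_PER_GB / (WINDOW_HOURS * 3600)

print(f"12 km cells: {grid_cells(307, 481, 35):,}")    # 5,168,345 (~5.2 M)
print(f"4 km cells:  {grid_cells(619, 406, 35):,}")    # 8,795,990 (~8.8 M)
print(f"sensor data: {sensor_gb_per_day:.1f} Gb/day")  # 63.5
print(f"per forecast: {total_gb:.0f} Gb")              # 663
print(f"needed rate: {rate_mb_per_s:.1f} Mb/s over {WINDOW_HOURS} h")  # 30.7
```

Under these assumptions the cell counts round to the 5.2 M and 8.8 M on the slide, and 219 Gb of forecast output plus seven days at roughly 63.5 Gb/day lands on 663 Gb.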

Page 7

Project Goals

• Manage and process time-sensitive data arriving from remote sensors and weather forecasts.

• Reduce data analysis cycles to facilitate timely decisions.

Page 8

[Figure: Automated end-to-end process, from weather sensors to a decision support tool. Components: weather sensors, observations, stream pre-processing, data assimilation, numerical weather model, forecast data, post-processing, structured data, data warehouse and OLAP database, MapReduce and NoSQL database, reports.]

1 Remote sensor data in various formats.

2 Quality control, interpolation, sampling, filtering, classification.

3 High performance computing.

4 Store structured and unstructured data for analysis and post-processing.

5 Business intelligence, data mining, visualization, verification.

6 Dashboards and reports.

Page 9

Hardware Infrastructure

[Figure: Platforms: traditional HPC (BlueGene/P), commodity servers (x86) and elastic cloud computing (virtual machines). Workloads: numerical weather models, MPP jobs, in-situ Big Data MapReduce, real-time data processing, OLAP and visualization.]

Page 10

Weather Model

• WRF ARW v3.5 limited area model

• 3.4 hours using 2048 cores on BlueGene/P (850 MHz).

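For scale, the quoted runtime corresponds to the following compute budget per model run. This is simple arithmetic on the slide's own figures, not a number reported by the authors:

```latex
3.4\ \text{h} \times 2048\ \text{cores} = 6963.2\ \text{core-hours} \approx 7.0 \times 10^{3}\ \text{core-hours per run}
```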

Page 11

Pre-Processing

• Stream processing, ETL, R, Python.

• Multi-stage quality control of remote sensor data.

• Spatio-temporal interpolation and sampling.

• Star-schema data warehouse.

• NoSQL with MapReduce.
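A minimal sketch of multi-stage quality control followed by spatio-temporal interpolation, assuming temperature readings with projected x/y coordinates in km and roughly hourly spacing. The `Obs` record, the thresholds, and the choice of time-windowed inverse distance weighting (IDW) are illustrative assumptions, not the project's actual R/Python scripts.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Obs:
    station: str
    x_km: float     # hypothetical projected coordinates
    y_km: float
    t_hours: float
    temp_c: float


# Illustrative thresholds, not operational values.
VALID_RANGE_C = (-10.0, 45.0)
MAX_STEP_C = 8.0    # max change between consecutive readings (assumed hourly)


def range_check(obs: list[Obs]) -> list[Obs]:
    """Stage 1: reject physically implausible values."""
    lo, hi = VALID_RANGE_C
    return [o for o in obs if lo <= o.temp_c <= hi]


def step_check(obs: list[Obs]) -> list[Obs]:
    """Stage 2: reject jumps too large relative to the station's last accepted reading."""
    last: dict[str, Obs] = {}
    kept = []
    for o in sorted(obs, key=lambda o: (o.station, o.t_hours)):
        prev = last.get(o.station)
        if prev is not None and abs(o.temp_c - prev.temp_c) > MAX_STEP_C:
            continue
        last[o.station] = o
        kept.append(o)
    return kept


def quality_control(obs: list[Obs]) -> list[Obs]:
    """Apply QC stages in order; each stage sees only what survived the previous one."""
    for stage in (range_check, step_check):
        obs = stage(obs)
    return obs


def idw(obs: list[Obs], x_km: float, y_km: float, t_hours: float,
        window_h: float = 1.0, power: float = 2.0) -> float:
    """IDW estimate at (x, y) from observations within +/- window_h hours of t_hours."""
    num = den = 0.0
    for o in obs:
        if abs(o.t_hours - t_hours) > window_h:
            continue
        d = hypot(o.x_km - x_km, o.y_km - y_km)
        if d == 0.0:
            return o.temp_c        # exact hit: use the observation itself
        w = d ** -power
        num += w * o.temp_c
        den += w
    if den == 0.0:
        raise ValueError("no observations inside the time window")
    return num / den


raw = [
    Obs("A", 0, 0, 0, 27.0),
    Obs("A", 0, 0, 1, 27.5),
    Obs("A", 0, 0, 2, 60.0),   # fails the range check
    Obs("B", 10, 0, 1, 25.0),
    Obs("B", 10, 0, 2, 12.0),  # 13 C drop in one reading fails the step check
]
clean = quality_control(raw)
print(len(clean), round(idw(clean, 5, 0, 1), 2))   # 3 26.5
```

Keeping each QC stage as a separate list-to-list function makes it easy to add stages (for example a spatial buddy check) without touching the others.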

[Figure: Pre-processing flow. Observations and raw forecast data (NetCDF, image and CSV staging files) pass through low-latency stream processing, ETL and custom scripts for quality control, sampling and filtering, then into the data stores used for post-processing.]

• Data warehouse and BI cubes: structured point or topological data (small, <1 TB), with an emphasis on data consistency.

• NoSQL: gridded high-resolution data (big, >1 TB), with an emphasis on availability and scalability; input to coupled models down the line.
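A star schema keeps structured observations consistent by storing measures in a central fact table keyed to dimension tables. A minimal sketch using Python's built-in `sqlite3` module; every table name, column and value here is hypothetical toy data, not the project's warehouse.

```python
import sqlite3

# Hypothetical star schema: one fact table of rainfall observations keyed to
# a station dimension and a time dimension. Names and values are toy data.
SCHEMA = """
CREATE TABLE dim_station (
    station_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    province   TEXT NOT NULL
);
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,
    day     TEXT NOT NULL,
    hour    INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23)
);
CREATE TABLE fact_observation (
    station_id  INTEGER NOT NULL REFERENCES dim_station (station_id),
    time_id     INTEGER NOT NULL REFERENCES dim_time (time_id),
    rainfall_mm REAL NOT NULL CHECK (rainfall_mm >= 0),
    PRIMARY KEY (station_id, time_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # reject facts that point at missing dimensions
conn.executescript(SCHEMA)

conn.executemany("INSERT INTO dim_station VALUES (?, ?, ?)",
                 [(1, "Station A", "Province X"), (2, "Station B", "Province Y")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, "2014-07-15", 0), (2, "2014-07-15", 1)])
conn.executemany("INSERT INTO fact_observation VALUES (?, ?, ?)",
                 [(1, 1, 12.5), (1, 2, 30.0), (2, 1, 4.0), (2, 2, 6.5)])

# OLAP-style roll-up: total rainfall per province per day.
rows = conn.execute("""
    SELECT s.province, t.day, SUM(f.rainfall_mm)
    FROM fact_observation AS f
    JOIN dim_station AS s USING (station_id)
    JOIN dim_time AS t USING (time_id)
    GROUP BY s.province, t.day
    ORDER BY s.province
""").fetchall()
print(rows)  # [('Province X', '2014-07-15', 42.5), ('Province Y', '2014-07-15', 10.5)]
```

The foreign-key and CHECK constraints are what give the "emphasis on data consistency": a fact row referencing an unknown station or a negative rainfall is rejected at insert time.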

Page 12

Post Processing

• Business intelligence cubes
  ◦ Multi-dimensional analysis
  ◦ Dashboards and reports
  ◦ GIS integration

• MapReduce views (NoSQL)
  ◦ Model verification
  ◦ Ensemble forecasts/MOS
  ◦ Ad-hoc data mining

[Figure: Post-processing flow.]

• Multi-dimensional cubes → reports and dashboards: reports and visualizations are generated using BI and data visualization tools.

• MapReduce views → custom scripts, coupled models, Model Output Statistics, reports and dashboards: downstream predictive models use MapReduce views as a data source.
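A MapReduce view precomputes a keyed aggregate that downstream models can read directly. The single-process sketch below imitates the map, group-by-key and reduce pattern of NoSQL view engines; the document layout, the `map_fn`/`reduce_fn`/`build_view` names and the values are hypothetical, and a real engine may distribute the work and maintain the view incrementally.

```python
from collections import defaultdict

# Hypothetical NoSQL documents: observations and forecasts share one collection.
docs = [
    {"type": "observation", "station": "A", "time": "2014-07-15T00:00", "rain_mm": 12.5},
    {"type": "observation", "station": "A", "time": "2014-07-15T01:00", "rain_mm": 30.0},
    {"type": "observation", "station": "B", "time": "2014-07-15T00:00", "rain_mm": 4.0},
    {"type": "forecast", "station": "A", "time": "2014-07-15T00:00", "rain_mm": 20.0},
]


def map_fn(doc):
    """Emit ((station, date), rainfall) for observation documents only."""
    if doc["type"] == "observation":
        yield (doc["station"], doc["time"][:10]), doc["rain_mm"]


def reduce_fn(values):
    """Daily rainfall total for one key."""
    return sum(values)


def build_view(documents, mapper, reducer):
    """Run the map step, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(values) for key, values in sorted(groups.items())}


daily_rain = build_view(docs, map_fn, reduce_fn)
print(daily_rain)
# {('A', '2014-07-15'): 42.5, ('B', '2014-07-15'): 4.0}
```

A downstream model would then look up `daily_rain[(station, date)]` instead of rescanning the raw documents.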

Page 13

Current Challenges and Future Directions

• Improvements in geostatistics: from gridded data to topological features.
  ◦ River basins, flood-prone areas, political boundaries and other locations of interest.
  ◦ Generating these statistics is very data-intensive.
  ◦ Potential for parallelization.

• Efficient stream processing of larger tuples with longer sliding windows.
  ◦ Complex quality control and verification require longer time-series statistics spanning multiple days of historical observed and forecast data.
  ◦ Strategy: can we keep all data processing in memory, use caching, etc.?
  ◦ Efficient MapReduce views on array-based data models, and other approaches.

• Improvements to the data warehousing schema.
  ◦ Ongoing improvements for handling spatio-temporal data.
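One building block for longer sliding windows is a time-based window that updates its statistics incrementally instead of rescanning multi-day history on every tuple. A minimal sketch, assuming tuples arrive in timestamp order and carry a single numeric value; the `SlidingWindowStats` class and the 72-hour window are illustrative, not the project's stream engine. A running sum gives O(1) mean updates, and a monotonic deque gives amortized O(1) window maxima.

```python
from collections import deque


class SlidingWindowStats:
    """Mean and max over the last `window_hours` of a time-ordered stream."""

    def __init__(self, window_hours: float):
        if window_hours <= 0:
            raise ValueError("window_hours must be positive")
        self.window_hours = window_hours
        self.items = deque()   # (t_hours, value) in arrival order
        self.maxq = deque()    # max candidates; values strictly decreasing
        self.total = 0.0

    def push(self, t_hours: float, value: float) -> None:
        # Assumes t_hours is non-decreasing across calls.
        self.items.append((t_hours, value))
        self.total += value
        while self.maxq and self.maxq[-1][1] <= value:
            self.maxq.pop()    # older, smaller values can never be the max again
        self.maxq.append((t_hours, value))
        cutoff = t_hours - self.window_hours
        # The tuple just pushed is always inside the window, so these loops stop.
        while self.items[0][0] <= cutoff:
            _, old = self.items.popleft()
            self.total -= old
        while self.maxq[0][0] <= cutoff:
            self.maxq.popleft()

    def mean(self) -> float:
        return self.total / len(self.items)

    def max(self) -> float:
        return self.maxq[0][1]


stats = SlidingWindowStats(window_hours=72)   # e.g. a 3-day window
for t, rain in [(0, 10.0), (24, 20.0), (48, 30.0), (72, 5.0), (100, 1.0)]:
    stats.push(t, rain)
print(stats.mean(), stats.max())   # 12.0 30.0
```

Memory stays proportional to the tuples inside the window, which is what makes keeping several days of data in memory plausible for a single stream.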

Page 14

Summary

• Planning for extreme weather events is a time-critical workflow that involves complex analysis of large data-sets from various sources.

• Recent advances in Big Data and HPC make it possible to architect real-world disaster planning applications.

• Current integration schemes use intermediary staging files and ETL-like scripts.

• Better algorithms and techniques are needed to improve performance and integration.

Page 15

James Faeldon


IBM Philippines, STG Labs