The Rationale and Methodology of the BDE 2nd SC5 Pilot

NCSR “Demokritos”, 20-Dec.-16

Uploaded by bigdataeurope on 10-Feb-2017

Framework

• Computational modelling of the atmospheric dispersion of hazardous pollutants
• How can BigDataEurope Integrator tools help perform computational tasks related to the atmospheric dispersion of hazardous pollutants more efficiently?

11-oct.-16, www.big-data-europe.eu

Purposes and means

• Air pollution abatement / early warning / countermeasures
  o Anthropogenic emissions: routine, accidental (nuclear, chemical), malevolent (terrorist) – unannounced releases
  o Natural emissions (e.g., volcanic eruptions)
• Measurements (ground-based or from space)
• Mathematical modelling
• Combination of the above through “data assimilation” → “forward” or “inverse” modelling
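One elementary building block of data assimilation blends a model value with a measurement of the same quantity. Each is weighted by its error variance, which gives the best linear unbiased estimate used in optimal interpolation and in the Kalman filter update. The sketch below covers only the scalar case. The function name and the example numbers are hypothetical, and the presentation does not say which assimilation scheme the pilot uses.

```python
def blue_update(background: float, var_b: float, obs: float, var_o: float) -> tuple[float, float]:
    """Blend a model (background) value with an observation.

    Assumes unbiased, uncorrelated errors with the given variances.
    Returns the analysis value and its error variance.
    """
    gain = var_b / (var_b + var_o)               # weight given to the observation
    analysis = background + gain * (obs - background)
    var_a = (1.0 - gain) * var_b                 # analysis is more certain than either input
    return analysis, var_a


# Hypothetical example: model gives 10.0 (variance 4.0), station measures 14.0 (variance 4.0)
value, variance = blue_update(10.0, 4.0, 14.0, 4.0)
print(value, variance)  # 12.0 2.0
```

With equal variances the analysis falls halfway between model and measurement. A more trusted measurement (smaller `var_o`) pulls it closer to the observation.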

Input data for dispersion modelling

• Meteorology
• “Source term”: knowledge of the emitting source(s) of the pollutant(s): location, quantity and conditions of release, timing
• Terrain characteristics, geometry of buildings, etc.
• Depending on the available input and measurement data: “forward” or “inverse” modelling
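To show how these input categories enter a forward calculation, here is a minimal sketch based on the textbook steady-state Gaussian plume equation over flat terrain. It is a deliberately simple stand-in, not the pilot's dispersion model (DIPCOT).

- Meteorology enters only through the wind speed `u` and the hypothetical linear dispersion coefficients `a_y` and `a_z`. In practice these coefficients depend on atmospheric stability.
- The source term enters through `q` and `h`.
- Terrain and buildings are ignored.

```python
import math


def gaussian_plume(q: float, u: float, h: float, x: float, y: float, z: float,
                   a_y: float = 0.08, a_z: float = 0.06) -> float:
    """Steady-state Gaussian plume concentration (units of q per m^3).

    q      : emission rate (source term), e.g. Bq/s
    u      : mean wind speed along the x axis (meteorology), m/s
    h      : effective release height, m
    x, y, z: receptor position in m (x downwind, y crosswind, z height)
    a_y, a_z: hypothetical linear dispersion coefficients, sigma = a * x
    """
    if x <= 0.0:
        return 0.0  # nothing reaches receptors upwind of the source in this model
    sigma_y = a_y * x
    sigma_z = a_z * x
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Reflection at the ground is modelled with an image source at -h
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical


# Ground-level concentration 1 km downwind of a 50 m release, wind 5 m/s
c = gaussian_plume(q=1.0, u=5.0, h=50.0, x=1000.0, y=0.0, z=0.0)
```

The result is linear in `q`. The inverse sketches that follow rely on this property, which also holds for more elaborate dispersion models of non-reactive pollutants.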

Cases of “inverse” computations

• The pollutant emission sources are NOT known: location and/or quantity of the emitted substances
  o Technological accidents (e.g., chemical, nuclear), natural disasters (e.g., volcanoes): known location, unknown emission
  o Unannounced technological accidents (e.g., Chernobyl), malevolent intentional releases (terrorism), nuclear tests
• Inverse “source-term” estimation techniques

Inverse source-term estimation

• Available information:
  o Measurements indicating the presence of an air pollutant
  o Meteorological data for the present and the recent past
• Mathematical techniques that blend the above with the results of dispersion models to infer the position and strength of the emitting source
  o Special attention: multiple solutions
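Modelled concentrations scale linearly with the emission rate. A simple inverse approach therefore runs the dispersion model once per candidate source with a unit release, then fits the rate to the measurements by least squares. The sketch below assumes these unit-release responses at the monitoring stations are already available. The function and site names are hypothetical, and this is only one of many possible estimation techniques. It returns a ranked list rather than a single answer, so near-ties stay visible; that is where the multiple-solution problem shows up.

```python
def rank_candidate_sources(measured, responses):
    """Rank candidate source locations by least-squares fit to measurements.

    measured : concentrations observed at the monitoring stations
    responses: dict mapping a candidate source name to the modelled
               concentrations at the same stations for a unit emission rate
    Returns a list of (cost, name, estimated_rate), best fit first.
    """
    ranking = []
    for name, a in responses.items():
        aa = sum(ai * ai for ai in a)
        am = sum(ai * mi for ai, mi in zip(a, measured))
        # Optimal non-negative scaling of the unit-release pattern
        rate = max(am / aa, 0.0) if aa > 0 else 0.0
        cost = sum((mi - rate * ai) ** 2 for ai, mi in zip(a, measured))
        ranking.append((cost, name, rate))
    ranking.sort()
    return ranking


# Hypothetical example with three stations and two candidate sites
result = rank_candidate_sources(
    [2.0, 4.0, 0.0],
    {"site_A": [1.0, 2.0, 0.0], "site_B": [0.0, 1.0, 3.0]},
)
print(result[0])  # (0.0, 'site_A', 2.0)
```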

Introducing the 2nd BDE SC5 Pilot

• The mathematical techniques mentioned above require long computing times
• Purpose: fast estimation of the source location in emergencies
• Proposed solution: pre-calculate a large number of scenarios, store them, and, at the time of an emergency, select the “most appropriate” one
• BDE will provide the tools to perform this functionality efficiently

Structure of the 2nd BDE SC5 Pilot

• Geographic area: Europe
• Cases of interest: accidents at Nuclear Power Plants (NPPs)
• Weather calculations:
  o Re-analysis data for 20 years
  o Clustering → “typical” weather circulation patterns
  o Downscaling through WRF for the “typical” weather circulation patterns
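The clustering step can be illustrated with plain k-means (Lloyd's algorithm), one of the methods listed later in the presentation. This pure-Python sketch assumes each 6-hourly reanalysis snapshot has already been flattened into a list of numbers on a common grid. When several variables are combined, they would normally be standardised first. A production run over 20 years of data would use a library implementation (for example scikit-learn's `KMeans`) instead.

```python
import random


def kmeans(snapshots, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on flattened weather fields.

    snapshots: list of equal-length lists (e.g. a geopotential-height field
               flattened over the grid, one list per 6-hour time step)
    Returns (centroids, labels); each centroid is a "typical" pattern.
    """
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(snapshots, k)]
    labels = [0] * len(snapshots)
    for _ in range(iterations):
        # Assignment step: nearest centroid in squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(s, centroids[c])))
            for s in snapshots
        ]
        # Update step: centroid = mean of its members (kept unchanged if empty)
        for c in range(k):
            members = [s for s, lab in zip(snapshots, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```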

Structure of the 2nd BDE SC5 Pilot

• Dispersion calculations:
  o Calculation of dispersion patterns from NPPs for the above downscaled typical weather circulation patterns
  o Dispersion results: gridded and (optionally) at monitoring stations
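When station values are derived from the gridded dispersion results, one simple option is bilinear interpolation from the four surrounding grid points. The sketch makes these assumptions:

- The grid is regular and in planar coordinates, with hypothetical origin `(x0, y0)` and spacing `dx`, `dy`.
- The station lies inside the grid.
- Map projections are ignored.

```python
def bilinear_at_station(field, x0, y0, dx, dy, sx, sy):
    """Interpolate a gridded dispersion result to a monitoring station.

    field : 2-D list, field[j][i] is the value at (x0 + i*dx, y0 + j*dy)
    x0, y0: coordinates of grid point (0, 0); dx, dy: grid spacing
    sx, sy: station coordinates (same units), assumed inside the grid
    """
    fi = (sx - x0) / dx
    fj = (sy - y0) / dy
    # Lower-left cell index, clamped so that i + 1 and j + 1 stay in range
    i = min(int(fi), len(field[0]) - 2)
    j = min(int(fj), len(field) - 2)
    tx, ty = fi - i, fj - j
    return ((1 - tx) * (1 - ty) * field[j][i] + tx * (1 - ty) * field[j][i + 1]
            + (1 - tx) * ty * field[j + 1][i] + tx * ty * field[j + 1][i + 1])


# Hypothetical 2 x 2 grid with unit spacing; station at the cell centre
value = bilinear_at_station([[0.0, 1.0], [2.0, 3.0]], 0.0, 0.0, 1.0, 1.0, 0.5, 0.5)
print(value)  # 1.5
```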

Structure of the 2nd BDE SC5 Pilot

• In the event of radiation signals at some stations:
  o Match the current and recent weather to the closest typical circulation pattern
  o From the stored dispersion results for the matched weather circulation pattern, select the one that most closely matches the monitoring data
  o The matched dispersion pattern will reveal the most probable emission source
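The two-step matching above can be sketched as follows. It assumes the typical circulation patterns are stored as cluster centroids and that, for each pattern, the modelled values at the monitoring stations are stored per NPP. Squared Euclidean distance is used in both steps purely for illustration. The data layout, names and distance measure are assumptions, not the pilot's actual implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def most_probable_source(current_weather, centroids, stored, monitoring):
    """Hypothetical two-step lookup in the pre-computed scenario store.

    current_weather: flattened field describing current and recent weather
    centroids      : typical circulation patterns (cluster centroids)
    stored         : stored[pattern_index][source_name] -> modelled values
                     at the monitoring stations for that pattern and source
    monitoring     : measured values at the same stations
    Returns (pattern_index, source_name).
    """
    # Step 1: closest typical circulation pattern
    pattern = min(range(len(centroids)),
                  key=lambda c: squared_distance(current_weather, centroids[c]))
    # Step 2: among the dispersions stored for that pattern, the closest to the measurements
    source = min(stored[pattern],
                 key=lambda name: squared_distance(stored[pattern][name], monitoring))
    return pattern, source


stored = {0: {"NPP_A": [1.0, 0.0], "NPP_B": [0.0, 1.0]},
          1: {"NPP_A": [0.0, 3.0], "NPP_B": [3.0, 0.0]}}
print(most_probable_source([4.0, 6.0], [[0.0, 0.0], [5.0, 5.0]], stored, [2.8, 0.1]))  # (1, 'NPP_B')
```

Comparing raw concentrations implicitly assumes the release magnitude used in the stored runs. Fitting an emission rate per stored pattern, as in least-squares source-term estimation, would remove that assumption.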

So far …

• Preliminary clustering studies on a limited amount of re-analysis data (while waiting for the full download)
  o Based on different variables at different pressure levels
• Dispersion calculations for a selected NPP for the identified weather classes

So far …

• Selected a random date, taken as the “true” accident day
• Matching of the “true” day’s weather data with the closest weather class from the clustering procedure
• Dispersion calculations with the weather data of the “true” day
• Comparison of dispersion results based on the “true” and the matched weather data
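An overlap score could make this comparison quantitative. The sketch below computes the figure of merit in space (FMS), a measure commonly used in dispersion model evaluation. FMS is the ratio of intersection to union of the areas where each field exceeds a threshold. The threshold value and the toy fields are hypothetical, and the presentation does not say which metric, if any, was used.

```python
def figure_of_merit_in_space(field_a, field_b, threshold):
    """Overlap of two gridded dispersion results above a threshold.

    Returns (cells above threshold in both) / (cells above threshold in either):
    1.0 means identical footprints, 0.0 means disjoint footprints.
    """
    a = {(j, i) for j, row in enumerate(field_a) for i, v in enumerate(row) if v > threshold}
    b = {(j, i) for j, row in enumerate(field_b) for i, v in enumerate(row) if v > threshold}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


true_day = [[0.0, 1.0, 1.0], [0.0, 1.0, 0.0]]  # dispersion with the "true" weather
matched = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]   # dispersion with the matched class
print(figure_of_merit_in_space(true_day, matched, 0.5))  # 0.5
```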

Workflow

[Figure: workflow diagram. Batch processing: ECMWF weather reanalysis data (20+ years), WRF, pre-processed weather data, clustering, predominant weather patterns, DIPCOT, dispersions for weather patterns for a number of fixed nuclear sites. Interactive workflow: detector (detection of dangerous release), weather service (recent weather, e.g. 3 days), comparison, candidate release origins.]

Data

• ECMWF Reanalysis data
• NCAR-UCAR Archive
  o Better compatibility with WPS/WRF
• 20-30 years
  o Approx. 6 TB in total
• GRIB2 format, again for better compatibility with WRF
  o NetCDF via WPS
• Many variables at multiple geopotential heights

Architectural Overview

Possible additions as BDE pilot components: (1) PostGIS, (2) DIPCOT

Clustering

• Traditional methods
  o Agglomerative hierarchical
  o K-means
• To be implemented soon
  o NN-based feature extraction (e.g., autoencoders, convolutional nets)
  o (Possibly) followed by k-means
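Agglomerative hierarchical clustering, unlike k-means, builds the clusters bottom-up. The naive sketch below uses average linkage, one of several possible linkage criteria; the presentation does not say which one was used. It only suits small examples, because each merge scans all cluster pairs. Library routines such as SciPy's `scipy.cluster.hierarchy.linkage` handle realistic sizes.

```python
def agglomerative(points, k):
    """Naive agglomerative hierarchical clustering with average linkage.

    Starts with one cluster per point and repeatedly merges the two clusters
    with the smallest mean pairwise Euclidean distance until k remain.
    Returns a list of clusters, each a list of point indices.
    """
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge b into a (b > a, so index a stays valid)
        del clusters[b]
    return clusters
```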

Evaluation

• Incremental
  o Clustering outcome
  o Closeness of constituent weather within clusters / distance between clusters
  o Dispersion characteristics
  o Different cluster descriptors for
    - Creating cluster-based dispersions
    - Matching “real data” to clusters
• Complete
  o Compare cluster-based dispersion against “real data” dispersion
    - For a number of hypothetical scenarios
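The first incremental criterion weighs closeness within clusters against distance between clusters. The silhouette coefficient captures exactly that, and it is sketched below in pure Python as one possible automatic score; the presentation does not name a specific metric. Values near 1 indicate compact, well-separated clusters.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (range -1 to 1).

    For each point: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to the members of another cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    Assumes `labels` is a list with at least two distinct values.
    """
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```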

Preliminary results

• Clustering over a 2-year period (1986, 1987)
  o K = 6 clusters
• Multiple geopotential heights
• Other variables, notably wind speed, at different heights
• “Visual comparison” against “real data” dispersions
• Incrementally combining more variables

Cluster quality / GHT 500 hPa

• 1986, 1987
• Resolution =
• Items (6-hr snapshots) =
• K-means, for K = 6
• Geopotential height at 500 hPa
• Dispersions well differentiated for a specific hypothetical origin
• Real data:

Different Clustering Algorithms

Immediate Future Work

• Feature extraction
  o Taking into account multiple variables
  o At more heights
• Automatic evaluation
  o For a number of pre-selected scenarios
• Dockerisation and inclusion into the BDE architecture

Thank you for your attention!