SC7 Hangout 3: Architecture of the BDE Pilot for Secure Societies

ARCHITECTURE OF THE BDE PILOT FOR SECURE SOCIETIES. 3rd BDE Hangout “Big Data in Secure Societies”, 5 December 2016. George Papadakis, University of Athens, Postdoctoral Researcher


Pilot Architecture

7 Dec. 2016 | www.big-data-europe.eu

Event Detection Workflow

[Diagram components: News Crawler, Event Detector, Lookup Service]

ED Workflow: News Crawler

Runs periodically

Monitored sources:

o Reuters news feeds (RSS)

o Selected Twitter accounts

o Keyword-based search

Possible to cover more sources if needed

ED Workflow: Cassandra

Scalable, noSQL distributed database

Input Scenario I:

o Individual news items from News Crawler

o Conforms to privacy regulations

Input Scenario II:

o Events identified by Event Detector

Input Scenario III:

o Queries about the stored news items and events

ED Workflow: Event Detector

Runs periodically, parallel execution based on Spark

Input:

o News items

Output:

o Events

Every event is associated with meta-data: date & location

Algorithm based on

Event Detector Algorithm

Two steps:

1. Identify events

o Compare pairs of news items

o If similarity > threshold → related pair

o Form clusters based on related pairs

o If cluster has support > threshold → event

2. Enrich events

o Compare individual items with events

o If similarity > threshold → attached to event
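The two steps above can be sketched in a few lines of Python. This is a minimal, single-machine illustration, not the pilot's Spark implementation: the Jaccard token similarity, the use of cluster size as "support", the threshold values, and all function names are assumptions made for the example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in for the real measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def detect_events(items, sim_threshold=0.5, support_threshold=2):
    """Step 1: link related pairs, form clusters, keep clusters with enough support."""
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if jaccard(items[i], items[j]) > sim_threshold:  # related pair
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    # Support is taken to be the cluster size (an assumption).
    return [m for m in clusters.values() if len(m) > support_threshold]


def enrich(events, items, new_items, sim_threshold=0.5):
    """Step 2: attach each individual item to the first sufficiently similar event."""
    attached = {k: [] for k in range(len(events))}
    for text in new_items:
        for k, members in enumerate(events):
            if max(jaccard(text, items[m]) for m in members) > sim_threshold:
                attached[k].append(text)
                break
    return attached


news = [
    "earthquake hits central italy",
    "strong earthquake hits central italy",
    "earthquake hits italy",
    "football final tonight",
]
events = detect_events(news)
extra = enrich(events, news, ["earthquake hits central italy again",
                              "election results announced"])
```

Comparing all pairs is quadratic in the number of news items, which is one reason a parallel execution engine is a natural fit for this step.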

ED Workflow: Lookup service

Based on Apache Lucene for fuzzy queries

Based on the GADM dataset

o more than 180,000 location names

Input:

o Query including an extracted location name

Output:

o The corresponding geocoordinates
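To illustrate the input/output contract of such a lookup, here is a small Python sketch. It does not use Lucene: the standard-library `difflib.get_close_matches` is swapped in for the fuzzy matching, and the three-entry `GAZETTEER` with rounded coordinates is a hypothetical stand-in for the indexed location names.

```python
from difflib import get_close_matches

# Hypothetical mini-gazetteer: name -> (latitude, longitude), rounded values.
GAZETTEER = {
    "athens": (37.98, 23.73),
    "brussels": (50.85, 4.35),
    "madrid": (40.42, -3.70),
}


def lookup(location_name, cutoff=0.8):
    """Return coordinates of the closest known name, or None if nothing is close."""
    query = location_name.strip().lower()
    matches = get_close_matches(query, list(GAZETTEER), n=1, cutoff=cutoff)
    return GAZETTEER[matches[0]] if matches else None


coords = lookup("Brusels")   # misspelled on purpose
missing = lookup("Tokyo")    # not in the gazetteer
```

The `cutoff` parameter plays the role of the fuzziness setting: lowering it tolerates more spelling variation at the cost of more false matches.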

Change Detection Workflow

[Diagram components: Image Aggregator, Change Detector]

CD Workflow: Image Aggregator

REST service called by the GUI & Event Detection

Input (manual or automatic):

o Bounding box of the area of interest (WKT)

o The time of interest

o A past time, before an event of interest took place

Output:

o A set of satellite images downloaded from ESA’s SciHub

o Subset operator
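The three inputs listed above could be assembled as follows. This is only a sketch: the field names `area`, `event_date`, and `reference_date` are hypothetical and are not taken from the pilot's actual REST interface; only the WKT polygon syntax is standard.

```python
from datetime import datetime


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT polygon (longitude latitude order) for a bounding box."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box must have positive width and height")
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]  # first point repeated to close
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def build_request(bbox, time_of_interest, past_time):
    """Bundle the area and the two dates; the past time must precede the event."""
    if past_time >= time_of_interest:
        raise ValueError("past_time must be earlier than time_of_interest")
    return {
        "area": bbox_to_wkt(*bbox),                    # hypothetical field name
        "event_date": time_of_interest.isoformat(),    # hypothetical field name
        "reference_date": past_time.isoformat(),       # hypothetical field name
    }


request = build_request((23, 37, 24, 38),
                        datetime(2016, 12, 5), datetime(2016, 11, 5))
```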

Automatic call of the CD workflow

Best-effort service

Based on a queue

o Maximum capacity: 1,000 events

o Maximum waiting time: 1 week

Input:

o Event meta-data

Output:

o Areas with detected changes & corresponding satellite images
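A bounded queue with the two limits above might look like the sketch below. The slide does not say what happens when a limit is hit, so the policy here is an assumption: new events are rejected when the queue is full, and events that have waited longer than the maximum are silently dropped. The class and method names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class EventQueue:
    """Best-effort FIFO queue with a capacity limit and a maximum waiting time."""

    def __init__(self, capacity=1000, max_wait=timedelta(weeks=1)):
        self.capacity = capacity
        self.max_wait = max_wait
        self._items = deque()  # (enqueue_time, event), oldest first

    def _drop_expired(self, now):
        # Oldest entries sit at the left, so expiry only needs to check the head.
        while self._items and now - self._items[0][0] > self.max_wait:
            self._items.popleft()

    def offer(self, event, now):
        """Enqueue an event; return False if the queue is full (assumed policy)."""
        self._drop_expired(now)
        if len(self._items) >= self.capacity:
            return False
        self._items.append((now, event))
        return True

    def poll(self, now):
        """Return the oldest non-expired event, or None if there is none."""
        self._drop_expired(now)
        return self._items.popleft()[1] if self._items else None


t0 = datetime(2016, 12, 1)
queue = EventQueue(capacity=2)
accepted = [queue.offer(e, t0) for e in ("event-a", "event-b", "event-c")]
first = queue.poll(t0 + timedelta(days=1))
stale = queue.poll(t0 + timedelta(days=8))
```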

CD Workflow: HDFS

Input:

o Two satellite images in zip format, each occupying a few GB.

Output:

o Distribute parts of every image to the available cluster nodes to facilitate their efficient processing.

CD Workflow: Change Detector

Parallelizes the change detection algorithm using Spark.

Input:

o Two satellite images depicting the same geolocation.

Output:

o A set of the areas with differences between the two snapshots.

Change Detector Algorithm

Three steps:

1. Preprocessing to align the given images

Coregistration (4 successive operators) or

Terrain Correction (1 operator)

2. Main algorithm to perform the actual comparison

3. DBSCAN for clustering together pixels with changes

Two parallelization strategies:

1. Tile-centric approach (subset operator)

2. Image-centric approach (baseline approach)
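Steps 2 and 3 can be illustrated on toy data with plain Python. The sketch assumes the two images are already aligned (step 1 is skipped), uses simple absolute differencing as the comparison since the slide does not name the main algorithm, and includes a minimal, unoptimized DBSCAN. All names and parameter values are chosen for the example.

```python
def changed_pixels(before, after, threshold):
    """Return (row, col) of pixels whose absolute difference exceeds the threshold."""
    return [(r, c)
            for r, (row_b, row_a) in enumerate(zip(before, after))
            for c, (b, a) in enumerate(zip(row_b, row_a))
            if abs(a - b) > threshold]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns sorted clusters; noise points are left out."""
    def neighbours(p):
        # Includes p itself, as in the usual DBSCAN definition.
        return [q for q in points
                if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps ** 2]

    labels = {}    # point -> cluster id, or -1 for noise
    clusters = []
    for p in points:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = -1               # provisional noise
            continue
        cid = len(clusters)
        clusters.append([p])
        labels[p] = cid
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:      # noise reachable from a core point: border
                labels[q] = cid
                clusters[cid].append(q)
            if q in labels:
                continue
            labels[q] = cid
            clusters[cid].append(q)
            q_neigh = neighbours(q)
            if len(q_neigh) >= min_pts:  # q is a core point, keep expanding
                queue.extend(q_neigh)
    return [sorted(c) for c in clusters]


before = [[0] * 6 for _ in range(4)]
after = [row[:] for row in before]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (3, 5)]:
    after[r][c] = 50                     # four adjacent changes plus one isolated pixel

pixels = changed_pixels(before, after, threshold=10)
areas = dbscan(pixels, eps=1.5, min_pts=3)
```

The isolated pixel is discarded as noise, which is why a density-based clustering step is useful after a per-pixel comparison.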

Common Workflow

[Diagram components: GeoTriples, Strabon, User Interface]

Common Workflow: GeoTriples

Converts geospatial data into RDF.

Input Scenario I:

o Areas of change from Change Detector

Input Scenario II:

o Event summaries from Event Detector

Output:

o RDF statements
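As an illustration of the kind of output involved, the sketch below serializes one area of change as two N-Triples statements using the GeoSPARQL vocabulary. It only shows the shape of the result: GeoTriples itself works from mapping definitions rather than hand-written string templates, and the `http://example.org/bde/` namespace and the function name are hypothetical.

```python
GEO = "http://www.opengis.net/ont/geosparql#"   # GeoSPARQL vocabulary
EX = "http://example.org/bde/"                  # hypothetical namespace


def area_to_ntriples(area_id, wkt):
    """Return N-Triples lines linking an area to its WKT geometry."""
    area = f"<{EX}area/{area_id}>"
    geom = f"<{EX}area/{area_id}/geometry>"
    return [
        f"{area} <{GEO}hasGeometry> {geom} .",
        f'{geom} <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .',
    ]


triples = area_to_ntriples("42", "POLYGON((23 37, 24 37, 24 38, 23 38, 23 37))")
```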

Common Workflow: Strabon

Scalable & efficient spatiotemporal RDF store.

Input Scenario I:

o Data coming from GeoTriples

Input Scenario II:

o SPARQL queries such as:

Get N latest event summaries from location X.

Get event summaries with keyword Y.

Output:

o Answers to the received queries.
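A query of the second kind ("event summaries with keyword Y", limited to the N latest) could be generated as sketched below. The `ex:` prefix and the `ex:hasSummary` / `ex:hasDate` properties are hypothetical, since the pilot's actual vocabulary is not shown on the slides; `CONTAINS`, `LCASE`, `ORDER BY`, and `LIMIT` are standard SPARQL 1.1.

```python
def latest_events_query(keyword, limit):
    """Build a SPARQL query for the `limit` most recent summaries containing `keyword`."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Escape backslashes and double quotes so the keyword stays inside the literal.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX ex: <http://example.org/bde/>
SELECT ?event ?summary ?date
WHERE {{
  ?event ex:hasSummary ?summary ;
         ex:hasDate ?date .
  FILTER(CONTAINS(LCASE(?summary), LCASE("{escaped}")))
}}
ORDER BY DESC(?date)
LIMIT {limit}"""


query = latest_events_query("flood", 5)
```

Restricting by location X would add a spatial filter on the event geometry, which is where a spatiotemporal store such as Strabon comes in.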

Common Workflow: SemaGrow

Federates Cassandra and Strabon.

Input:

o Queries from GUI about events or locations with changes.

Output:

o Answers to the received queries.

Common Workflow: Sextant - A

Web application implementing the GUI.

Input for Change Detection:

o Area selected by user through the interactive map

o Time interval (optional)

o User info

Output:

o Calls Image Aggregator

o Progress messages

Common Workflow: Sextant - B

Input for Event Detection (at least one of the following):

o Keyword

o Location name or coordinates

o Time

Output:

o Latest relevant event summaries & corresponding news items.

Common Workflow: Sextant - C

Cybersecurity

o User registration

Pilot credentials (encrypted)

SciHub credentials (encrypted)

Type of user (classified, unclassified)

Requires administration approval

o Authorization

Common Workflow: Sextant - D

Twitter keyword search

o Retrieves tweets on the fly

o Input:

Hashtag (e.g., #bdeSC7)

Mention (e.g., @bigDataEurope)

Keyword(s)

o Output:

Latest posts from Twitter Public Stream

Thank you!

Questions?

Links

Strabon: http://strabon.di.uoa.gr

GeoTriples: https://github.com/LinkedEOData/GeoTriples

Sextant: http://sextant.di.uoa.gr
