DSD-INT 2015 - Datacubes: Understanding Big EO Data Better - Peter Baumann

Post on 06-Apr-2017

Datacubes :: Delft SW Days :: ©2015 rasdaman

Delft Software Days, Deltares, Delft, 2015-oct-26

Peter Baumann

Jacobs University | rasdaman GmbH

baumann@rasdaman.com

Datacubes: Exploiting Big Earth Data Better

[co-funded by EU through EarthServer 1/2, PublicaMundi]

[gamingfeeds.com]

Datacube Research @ Jacobs U

Large-Scale Scientific Information Systems research group

• Flexible, scalable n-D array services

• www.jacobs-university.de/lsis

Main results:

• pioneer Array DBMS, rasdaman

• standardization:

  • OGC Big Geo Data (also ISO, INSPIRE, W3C)

  • ISO "Science SQL"

Hiring PhD students, PostDocs

Data Homogenization: Making Life Simpler

[Diagram labels: sensor feeds; coverage server]

sensor, image [timeseries], simulation, statistics data

OGC Web Coverage Service (WCS)

OGC/ISO Coverages: regular & irregular grids, point clouds, meshes

- OGC Coverage Implementation Schema

Large, growing implementation basis: rasdaman, GDAL, QGIS, OpenLayers, OPeNDAP, MapServer, GeoServer, NASA WorldWind, EOxServer; Pyxis, ERDAS, ArcGIS, ...

WCS Core: access to spatio-temporal coverages & subsets

- subset = trim | slice

WCS Extensions: optional functionality facets

- Scaling, CRS transformation, …, Analytics (WCPS)
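
The trim/slice distinction in WCS Core can be sketched as a request builder. This is a minimal illustration, assuming the WCS 2.0 key-value-pair encoding; the endpoint URL, the coverage name `AvgLandTemp`, and the axis names `Lat`, `Long`, and `ansi` are hypothetical and depend on the actual service.

```python
from urllib.parse import urlencode, parse_qsl, urlsplit


def get_coverage_url(endpoint, coverage_id, trims=None, slices=None):
    """Build a WCS 2.0 GetCoverage request in key-value-pair encoding."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    # Trim: keep an interval along an axis; the dimensionality is unchanged.
    for axis, (low, high) in (trims or {}).items():
        params.append(("subset", f"{axis}({low},{high})"))
    # Slice: pick one position along an axis; that axis is dropped.
    for axis, position in (slices or {}).items():
        params.append(("subset", f"{axis}({position})"))
    return f"{endpoint}?{urlencode(params)}"


url = get_coverage_url(
    "https://example.org/ows",            # hypothetical endpoint
    "AvgLandTemp",                        # hypothetical coverage name
    trims={"Lat": (40, 50), "Long": (0, 10)},
    slices={"ansi": '"2014-07-01"'},      # hypothetical time axis name
)
print(url)

# Decode the query string again to inspect the subset parameters.
pairs = parse_qsl(urlsplit(url).query)
print([value for key, value in pairs if key == "subset"])
```

Trimming two spatial axes and slicing the time axis of an x/y/t cube in this way would yield a 2-D result.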

Array Analytics

Array Analytics := efficient analysis on multi-dimensional arrays whose size is several orders of magnitude above the evaluation engine's main memory

Essential data property: n-dimensional Euclidean neighborhood

- Secondary: #dimensions, density, ...

Operations: Linear Algebra [M. Stonebraker], statistics, image/signal processing

Datacube Access: A Simple Example

A Brief History of Array DBMSs

first appearance in literature (not first implementation)

Agile Array Analytics: rasdaman

"raster data manager": SQL + n-D arrays

Scalable parallel “tile streaming” architecture

Blueprint for ISO Array SQL standard

[rasdaman visitors]

Array SQL

select id, encode( (scene.band1 - scene.band2) / (scene.band1 + scene.band2), "image/tiff" )
from LandsatScenes
where acquired between "1990-06-01" and "1990-06-30" and
      avg( (scene.band3 - scene.band4) / (scene.band3 + scene.band4) ) > 0

create table LandsatScenes(
    id: integer not null,
    acquired: date,
    scene: row( band1: integer, ..., band7: integer ) mdarray [ 0:4999, 0:4999 ]
)
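
To make the semantics of the select statement concrete, here is a small in-memory sketch. It is not how rasdaman or an Array SQL engine evaluates the query; the `Scene` class, the helper names, and the tiny 1x2 sample grids are made up for illustration, the `encode` step is left out, and mapping cells with a zero denominator to 0.0 is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Scene:
    """Hypothetical stand-in for one LandsatScenes row (bands as nested lists)."""
    id: int
    acquired: date
    band1: list
    band2: list
    band3: list
    band4: list


def normalized_difference(a, b):
    """Cell-wise (a - b) / (a + b); cells where a + b == 0 become 0.0 (assumption)."""
    return [
        [(x - y) / (x + y) if x + y else 0.0 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def mean(grid):
    """Average over all cells, the counterpart of avg() in the query."""
    cells = [value for row in grid for value in row]
    return sum(cells) / len(cells)


def select_scenes(scenes, start, end):
    """Yield (id, band1/band2 index) for scenes passing both where-conditions."""
    for s in scenes:
        in_range = start <= s.acquired <= end
        if in_range and mean(normalized_difference(s.band3, s.band4)) > 0:
            yield s.id, normalized_difference(s.band1, s.band2)


scenes = [
    Scene(1, date(1990, 6, 10), [[4, 2]], [[2, 2]], [[3, 5]], [[1, 1]]),
    Scene(2, date(1990, 6, 20), [[1, 1]], [[1, 1]], [[1, 1]], [[3, 3]]),
    Scene(3, date(1990, 7, 5), [[4, 2]], [[2, 2]], [[3, 5]], [[1, 1]]),
]
result = list(select_scenes(scenes, date(1990, 6, 1), date(1990, 6, 30)))
print(result)
```

Only scene 1 passes: scene 2 fails the average condition and scene 3 falls outside the date range.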

Direct Data Visualization

select
    encode(
        struct {
            red:   (char) s.b7[x0:x1,x0:x1],
            green: (char) s.b5[x0:x1,x0:x1],
            blue:  (char) s.b0[x0:x1,x0:x1],
            alpha: (char) scale( d, 20 )
        },
        "image/png"
    )
from SatImage as s, DEM as d

[JacobsU, Fraunhofer; data courtesy BGS, ESA]

Partitioned Array Storage

Goal: faster loading by adapting storage units to access patterns

Approach: partition n-D array into n-D partitions ("tiles")

Tiling classification based on degree of alignment [ICDE 1999]

chunking [Sarawagi, Stonebraker, DeWitt, ... ]
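
Why tile shape should follow the access pattern can be sketched with a tile counter. This is an illustrative model only, assuming regular tiling with the array origin at 0 and an x/y/t cube; the function name and the two tile shapes are hypothetical, not rasdaman internals.

```python
from itertools import product


def tiles_hit(tile_shape, query):
    """Return the indices of all regular tiles that a query box intersects.

    tile_shape: tile edge length per axis.
    query: inclusive (low, high) cell range per axis, coordinates >= 0.
    """
    ranges = [
        range(low // edge, high // edge + 1)
        for (low, high), edge in zip(query, tile_shape)
    ]
    return list(product(*ranges))


# Two tilings of an x/y/t cube with the same number of cells per tile.
cubic = (100, 100, 100)
time_sliced = (1000, 1000, 1)

# Access pattern 1: one full image at a single time step.
image_query = [(0, 999), (0, 999), (10, 10)]
# Access pattern 2: a time series at a single pixel.
series_query = [(5, 5), (5, 5), (0, 999)]

for name, shape in [("cubic", cubic), ("time_sliced", time_sliced)]:
    print(name,
          len(tiles_hit(shape, image_query)),
          len(tiles_hit(shape, series_query)))
```

In this model the time-sliced layout reads a single tile for the image query but 1000 tiles for the time series, while the cubic layout needs 100 and 10 tiles respectively, so neither layout is best for every workload.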

Tiling: Tuning Data for Applications

tiling strategies as service tuning [Furtado]:

- regular, directional, area of interest

rasdaman storage layout language

insert into MyCollection
    values ...
    tiling area of interest [0:20,0:40], [45:80,80:85]
    tile size 1000000
    index d_index storage array compression zlib

"chunks" [Sarawagi, DeWitt, ...]

Why Irregular Tiling?

e-Science often uses irregular partitioning

[OpenStreetMap]

[Centrella et al: scidacreviews.org]

Parallel / Distributed Query Processing

One query distributed across 1,000+ cloud nodes

[SIGMOD DANAC 2014]

[Figure labels: Dataset A, Dataset B, Dataset C, Dataset D]

select

max((A.nir - A.red) / (A.nir + A.red))

- max((B.nir - B.red) / (B.nir + B.red))

- max((C.nir - C.red) / (C.nir + C.red))

- max((D.nir - D.red) / (D.nir + D.red))

from A, B, C, D
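
The query above splits naturally into one independent aggregate per dataset, followed by a cheap combination step. The sketch below mimics that split with local threads standing in for cloud nodes; this is an assumption made for illustration, not the rasdaman query distribution mechanism, and the dataset values are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def max_ndvi(dataset):
    """Largest (nir - red) / (nir + red) over all cells of one dataset."""
    return max(
        (nir - red) / (nir + red)
        for nir, red in zip(dataset["nir"], dataset["red"])
    )


# Made-up sample data; each dataset would live on a different node.
datasets = {
    "A": {"nir": [8, 6], "red": [2, 2]},
    "B": {"nir": [3, 5], "red": [1, 5]},
    "C": {"nir": [2, 3], "red": [2, 1]},
    "D": {"nir": [5, 9], "red": [3, 1]},
}

# Each subquery runs independently; only one number per dataset travels back.
with ThreadPoolExecutor() as pool:
    partial = dict(zip(datasets, pool.map(max_ndvi, datasets.values())))

# Combination step, mirroring the expression in the select clause.
result = partial["A"] - partial["B"] - partial["C"] - partial["D"]
print(partial, result)
```

The point of the split is that the large arrays never move: each worker returns a single scalar, and the combining node only does three subtractions.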

Secured Archive Integration

First-ever direct, ad-hoc mix from protected NASA & ESA services in OGC WCS/WCPS Web client (EarthServer + CobWeb)

Scalable Geo Service Architecture

[Architecture diagram labels:]

- Web clients (m2m, browser)
- Internet
- OGC WMS, WCS, WCPS, WPS
- rasdaman geo services
- distributed query processing
- No single point of failure
- rasserver
- database
- File system
- external files
- alternative storage

Inset: Hadoop not the Answer to All

no built-in knowledge about structured data types

- "Since it was not originally designed to leverage the structure […] its performance […] is therefore suboptimal" [Daniel Abadi]

- M. Stonebraker (XLDB 2012): "will hit a scalability wall"

EarthServer: Datacubes At Your Fingertips

INSPIRE WCS :: ©2015 P. Baumann www.earthserver.eu

Agile Analytics on x/y/t + x/y/z/t Earth & Planetary datacubes

- EU rasdaman + NASA WorldWind

- Rigorously standards-based: OGC WMS + WCS + WCPS

- 100s of TB sites now, next: 1+ Petabyte per cube

Intercontinental initiative, 3+3 years: EU + US + AUS

Phase 1 reviewers: "proven evidence" that rasdaman will "significantly transform [how to] access and use data" … and "with no doubt has been shaping the Big Earth Data landscape" …

www.earthserver.eu

EarthServer Phase 1 & 2 Partners

www.earthserver.eu

Conclusion

From data stewardship to service stewardship

Open-standard Coverage Cubes interoperability

- consensus across OGC, ISO, INSPIRE

EarthServer: agile analytics federation

- rasdaman Array DBMS

"A cube tells more than a million images"

[rasdaman/WW screenshots]
