CERA / WDCC Hannes Thiemann Max-Planck-Institut für Meteorologie Modelle und Daten hannes.thiemann @ zmaw.de NCAR, October 27th – 29th, 2008


Page 1:

CERA / WDCC

Hannes Thiemann

Max-Planck-Institut für Meteorologie

Modelle und Daten

hannes.thiemann @ zmaw.de

NCAR, October 27th – 29th, 2008

Page 2:

Contents

Statistics
Requirements + Features
General architecture
Implementation (current and new)
Migration
Summary

Page 3:

Basic Statistics

WDCC / CERA: General Statistics at 01-10-2008 00:00:10

Database Size (TByte): 370

Number of blobs: 6663287791 (6.6 billion)

Data access by fields and not by files.

Number of experiments: 1146

Number of datasets: 142062

Total size divided by number of BLOBs gives the average size of data access granules: 50 - 60 kB/BLOB
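
The 50 - 60 kB figure can be checked with a short calculation. This is a minimal sketch using only the two numbers above; the slide does not say whether "TByte" means 10^12 or 2^40 bytes, so both readings are computed (that choice is an assumption).

```python
# Figures taken from the statistics above.
db_size_tbyte = 370
n_blobs = 6_663_287_791

# Assumption: "TByte" may mean 10**12 bytes (decimal) or 2**40 bytes (binary).
avg_bytes_decimal = db_size_tbyte * 10**12 / n_blobs
avg_bytes_binary = db_size_tbyte * 2**40 / n_blobs

print(f"decimal reading: {avg_bytes_decimal / 1000:.1f} kB per BLOB")
print(f"binary reading:  {avg_bytes_binary / 1024:.1f} KiB per BLOB")
```

Both readings land inside the quoted range of 50 - 60 kB per BLOB.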

Page 4:

Users by continent

[Pie chart: Active Users 1-Jan-2008 until 14-Oct-2008. Segment labels: Campus, Germany, Europe, AF+OC+SA, North America, Asia. Percentage labels: 12%, 25%, 27%, 4%, 13%, 19%.]

Page 5:

Download destinations

[Pie chart: Download destinations 1-Jan-2008 until 14-Oct-2008. Segment labels: Campus, Germany, Europe, OC, AF, SA, North America, Asia. Percentage labels: 3%, 12%, 6%, 0%, 14%, 65%.]

Page 6:

Records per download

[Chart: Records per download. x-axis: Records (1, 12, 120, 240, 600, 1200, 12000). y-axis: Percent (0 to 100). Data labels: 66, 67, 72, 85, 87, 90, 98.]

Page 7:

Recordsize

[Chart: Recordsize. x-axis: Percent (1, 8, 29, 32, 35, 84, 89, 92, 96, 99). y-axis: Bytes, logarithmic scale from 1 to 10000000000.]

Page 8:

Requirements and constraints

Access over WAN
Downloads are typically quite small, but some are huge.
Small downloads imply that users are not willing to wait long …
We cannot scan through large files for each download
Granularity has to be small

Page 9:

Datatypes

Model data
  Climatological runs (global and regional) (IPCC, …)
  Weather forecasts (DPHASE, CEOP, …)
Reanalysis data
Observational data (COPS, CARIBIC, …)
Satellite data products

Page 10:

Formats

CERA can store data of any format. These are the formats currently in use:
  GRIB (60%)
  NetCDF (18%)
  Other (22%)

Page 11:

General Architecture

Midtier

Data

Page 12:

General Architecture

[Diagram labels: Metadata, Data; Proxy, Webserver, Appl. Server; Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org; Select timestep + region, Convert format.]

Page 13:

Storage within CERA

Database Table (data of single variable)

Index   Data
1       Data of timestep i
2       Data of timestep i+1
3       Data of timestep i+2
n       Data of timestep i+n
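
The layout above (one table per variable, one indexed row per timestep) can be sketched with a small in-memory database. This is only an illustration: SQLite stands in for the Oracle database, and the table name `temperature`, the column names, and the payloads are hypothetical.

```python
import sqlite3

# Assumption: SQLite replaces Oracle here purely for illustration.
conn = sqlite3.connect(":memory:")

# One table per variable; the primary key plays the role of the index.
conn.execute(
    "CREATE TABLE temperature ("
    " timestep INTEGER PRIMARY KEY,"
    " data BLOB NOT NULL)"
)

# Hypothetical payloads: 8 bytes per timestep instead of a real field.
rows = [(i, bytes([i]) * 8) for i in range(1, 6)]
conn.executemany("INSERT INTO temperature VALUES (?, ?)", rows)


def fetch_timesteps(connection, first, last):
    """Fetch only the requested timesteps via the key, without a full scan."""
    cursor = connection.execute(
        "SELECT timestep, data FROM temperature"
        " WHERE timestep BETWEEN ? AND ? ORDER BY timestep",
        (first, last),
    )
    return cursor.fetchall()


selected = fetch_timesteps(conn, 2, 3)
print([timestep for timestep, _ in selected])
```

A download of a few timesteps then touches only the matching rows, which is the point of keeping the granularity small.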

Page 14:

Handicap

Handicap: not enough disk space available
  Data stored within database: approx. 400 TB
  Disks available: approx. 24 TB
Database has been coupled transparently to the HSM system
How do we avoid frequent tape accesses?
  Big cache
  Store data as close together as possible according to the needs of users: split into single variables
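
How a cache in front of tape reduces tape accesses can be sketched as follows. The slide does not state which replacement policy is used; least-recently-used (LRU) is an assumption here, and the class `DiskCache` and the variable names `tas`, `pr`, `psl` are hypothetical.

```python
from collections import OrderedDict


class DiskCache:
    """Toy LRU cache standing in for the disk cache in front of tape."""

    def __init__(self, capacity):
        self.capacity = capacity      # number of entries that fit on disk
        self._entries = OrderedDict()
        self.tape_requests = 0        # counts cache misses

    def fetch(self, key):
        if key in self._entries:
            # Cache hit: mark as most recently used, no tape access.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: simulate a recall from tape.
        self.tape_requests += 1
        value = f"data for {key}"
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


cache = DiskCache(capacity=2)
for variable in ["tas", "pr", "tas", "tas", "psl", "pr"]:
    cache.fetch(variable)
print(cache.tape_requests, "tape requests for 6 downloads")
```

Repeated requests for the same variable are served from disk; only misses go to tape.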

Page 15:

Data migration

[Diagram labels: TBS - RW (TblPartition 1), TBS - RW (TblPartition 2), TBS - RO (TblPartition 1), dxdb, Migout, Migin.]

All tablespaces are moved “at once” to dxdb

Page 16:

Inside the datafile

Primary Key

Lob Index

Table

Blob data

Header 128k

Page 17:

Frontend versus Backend

Header 128k

Filesystem Frontend

HSM Backend

Header 128k

Part 1 = 512 MB

Part 2 = 512 MB
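
Which parts have to be recalled for a given byte range can be sketched as below. This assumes that a datafile consists of the 128k header followed by consecutive 512 MB parts, and that "k" and "MB" are binary units; neither is stated on the slide, and the function `parts_for_range` is hypothetical.

```python
# Assumptions: binary units, header first, parts stored back to back.
HEADER_BYTES = 128 * 1024
PART_BYTES = 512 * 1024 * 1024


def parts_for_range(offset, length):
    """Return the 1-based part numbers covering bytes [offset, offset + length).

    An empty list means the range lies entirely inside the header.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length  # exclusive end of the requested range
    if end <= HEADER_BYTES:
        return []
    first_byte = max(offset, HEADER_BYTES) - HEADER_BYTES
    last_byte = end - 1 - HEADER_BYTES
    first_part = first_byte // PART_BYTES + 1
    last_part = last_byte // PART_BYTES + 1
    return list(range(first_part, last_part + 1))


# A 60 kB record right after the header touches only Part 1.
print(parts_for_range(HEADER_BYTES, 60_000))
```

Under these assumptions a typical small record needs at most one or two parts from the backend instead of the whole datafile.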

Page 18:

Retrieving data

[Diagram: retrieval steps numbered 1 to 5, with labels Header 128k and Tape Request.]

Page 19:

Warehouse features

Compression – nothing special used within the server

Partitioning – allow parts of data to be moved to HSM

Backup
  Nologging - beware of crash …
  Read only - two copies on tape

Page 20:

New implementation

Metadata database will stay as is
Oracle databases holding data will be replaced by a new in-house development
Why?
  There is a certain risk that a future version of Oracle may not work with a / any HSM system
  In the long run, some license costs will be saved

Page 21:

General Architecture - new

[Diagram labels: Webserver, Appl. Server; Metadata, Data; Oracle-DB, Blobserver.]

Page 22:

CERA-Container

Instead of keeping data within blobs in Oracle databases, data records will be kept within so-called CERA Container Files.
  Ability to keep a huge number of records.
  They provide fast access independent of position within the file (granular access).
  They provide fault tolerance against tape damage by keeping checksums within the files.
  Read/write operations against container files are enclosed in transactions.
  Well-known format
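
The combination of position-independent access and checksums can be illustrated with a toy record file. This is not the CERA Container format, which the slides do not describe; the record layout (length, CRC32, payload), the in-memory offset list, and all function names are assumptions made for the sketch. Transactions are left out.

```python
import io
import struct
import zlib

# Hypothetical record header: payload length and CRC32, both 32-bit unsigned.
RECORD_HEADER = struct.Struct("<II")


def append_record(f, payload):
    """Append one record and return its byte offset for the index."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    f.write(RECORD_HEADER.pack(len(payload), zlib.crc32(payload)))
    f.write(payload)
    return offset


def read_record(f, offset):
    """Seek straight to a record and verify its checksum before returning it."""
    f.seek(offset)
    length, expected_crc = RECORD_HEADER.unpack(f.read(RECORD_HEADER.size))
    payload = f.read(length)
    if len(payload) != length or zlib.crc32(payload) != expected_crc:
        raise ValueError(f"record at offset {offset} is damaged")
    return payload


# An in-memory file stands in for a container file on disk or tape.
container = io.BytesIO()
index = [append_record(container, f"timestep {i}".encode()) for i in range(5)]

# Access cost does not depend on where the record sits in the file.
print(read_record(container, index[2]))
```

A damaged record is detected on read because its stored CRC32 no longer matches the payload.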

Page 23:

Migration

Concept / Team (namely Peter Drakenberg, DKRZ)
  Not yet really finished
Software
  First version of the software for migrating data is ready
Convert old data
  Started last week, but will take at least a year

Page 24:

Dataflow: outbound

[Diagram: steps numbered 1 to 8, with labels Webserver, Appl. Server, Metadata, Data, Processing.]

Page 25:

Dataflow: inbound

Metadata Dataserver

Postprocessing

Model run

GFS

Page 26:

Summary

CERA allows for the storage of data of different kinds
  Format independent
Metadata enables addressing of internal and external data
Users typically fetch only small amounts of data.
System allows for efficient access to small data granules
  By using warehousing functions like Partitioning
  By using small Oracle database Blobs or, in future, CERA Container files.

Page 27:

Thank you !