cera / wdcc

CERA / WDCC Hannes Thiemann Max-Planck-Institut für Meteorologie Modelle und Daten hannes.thiemann @ zmaw.de NCAR, October 27th – 29th, 2008


Page 1: CERA / WDCC

CERA / WDCC

Hannes Thiemann

Max-Planck-Institut für Meteorologie, Modelle und Daten

hannes.thiemann@zmaw.de

NCAR, October 27th – 29th, 2008

Page 2: CERA / WDCC

Contents

- Statistics
- Requirements + Features
- General architecture
- Implementation (current and new)
- Migration
- Summary

Page 3: CERA / WDCC

Basic Statistics

WDCC / CERA: General statistics as of 01-10-2008 00:00:10

- Database size: 370 TByte
- Number of BLOBs: 6663287791 (6.6 billion). Data access is by fields and not by files.
- Number of experiments: 1146
- Number of datasets: 142062
- Total size divided by the number of BLOBs gives the average size of the data access granules: 50 - 60 kB/BLOB
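The average granule size quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two figures from the slide; whether "TByte" is meant as a decimal or a binary unit is not stated, so both readings are computed as an assumption. With decimal units the result is roughly 55 kB per BLOB, inside the quoted range.

```python
# Figures taken from the statistics slide.
DATABASE_SIZE_TBYTE = 370
NUMBER_OF_BLOBS = 6_663_287_791


def average_blob_size_kb(size_tbyte: float, blobs: int, bytes_per_tbyte: int) -> float:
    """Return the mean BLOB size in kB (1 kB = 1000 bytes)."""
    return size_tbyte * bytes_per_tbyte / blobs / 1000


# Assumption: the slide does not say whether TByte is decimal or binary,
# so both interpretations are computed.
decimal_kb = average_blob_size_kb(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS, 10**12)
binary_kb = average_blob_size_kb(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS, 2**40)

print(f"decimal TByte: {decimal_kb:.1f} kB per BLOB")
print(f"binary TByte:  {binary_kb:.1f} kB per BLOB")
```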

Page 4: CERA / WDCC

Users by continent

[Pie chart: active users by region, 1-Jan-2008 until 14-Oct-2008]

Regions: Campus, Germany, Europe, AF+OC+SA, North America, Asia

Shares (order as in the transcript): 12%, 25%, 27%, 4%, 13%, 19%

Page 5: CERA / WDCC

Download destinations

[Pie chart: download destinations by region, 1-Jan-2008 until 14-Oct-2008]

Regions: Campus, Germany, Europe, OC+AF+SA, North America, Asia

Shares (order as in the transcript): 3%, 12%, 6%, 0%, 14%, 65%

Page 6: CERA / WDCC

Records per download

[Bar chart: percentage of downloads by number of records per download]

X axis (Records): 1, 12, 120, 240, 600, 1200, 12000

Y axis (Percent, 0 to 100); bar values: 66, 67, 72, 85, 87, 90, 98

Page 7: CERA / WDCC

Recordsize

[Chart: record size distribution]

X axis (Percent): 1, 8, 29, 32, 35, 84, 89, 92, 96, 99

Y axis (Bytes, logarithmic): 1 to 10000000000

Page 8: CERA / WDCC

Requirements and constraints

- Access over WAN
- Downloads are typically quite small, but to some extent there are huge downloads.
- Small downloads imply that users are not willing to wait long …
- We cannot scan through large files for each download
- Granularity has to be small

Page 9: CERA / WDCC

Datatypes

- Model data
  - Climatological runs (global and regional) (IPCC, …)
  - Weather forecasts (DPHASE, CEOP, …)
- Reanalysis data
- Observational data (COPS, CARIBIC, …)
- Satellite data products

Page 10: CERA / WDCC

Formats

CERA provides the ability to store data of any format. These are the formats in use:

- GRIB (60%)
- NetCDF (18%)
- Other (22%)

Page 11: CERA / WDCC

General Architecture

[Diagram labels: Midtier, Data]

Page 12: CERA / WDCC

General Architecture

[Diagram: Proxy, Webserver and Appl. Server in front of the Metadata and Data stores]

Metadata blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org

Data functions: Select timestep + region, Convert format

Page 13: CERA / WDCC

Storage within CERA

Database table (one table holds the data of a single variable):

Index   Data
1       Data of timestep i
2       Data of timestep i+1
3       Data of timestep i+2
…
n       Data of timestep i+n
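The table layout above (one table per variable, one indexed row per timestep) can be illustrated with a small sketch. It is not the CERA schema: SQLite stands in for the Oracle database, and the table name `temperature_2m` and its column names are hypothetical. The point is only that a range of timesteps is fetched through the index, without scanning a large file.

```python
import sqlite3

# Assumption: SQLite stands in for the Oracle database; the table and
# column names are hypothetical and not taken from CERA.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_2m ("
    " step_index INTEGER PRIMARY KEY,"  # the index column of the slide
    " field BLOB NOT NULL)"             # data of one timestep
)

# Store one small BLOB per timestep (dummy bytes instead of a GRIB record).
fields = {i: bytes([i]) * 16 for i in range(1, 6)}
conn.executemany(
    "INSERT INTO temperature_2m (step_index, field) VALUES (?, ?)",
    fields.items(),
)


def read_timesteps(first: int, last: int) -> list[bytes]:
    """Fetch the fields of a range of timesteps via the primary-key index."""
    rows = conn.execute(
        "SELECT field FROM temperature_2m"
        " WHERE step_index BETWEEN ? AND ? ORDER BY step_index",
        (first, last),
    )
    return [row[0] for row in rows]


selected = read_timesteps(2, 3)
print(len(selected), "fields fetched")
```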

Page 14: CERA / WDCC

Handicap

- Handicap: not enough disk space available
  - Data stored within database: approx. 400 TB
  - Disks available: approx. 24 TB
- Database has been coupled transparently to the HSM system
- How do we avoid frequent tape accesses?
  - Big cache
  - Store data as close together as possible according to the needs of users: split into single variables
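How a disk cache reduces tape requests can be sketched as follows. The slides do not say which replacement policy the HSM coupling uses; least-recently-used eviction is an assumption made here for illustration, and the class name, file names and sizes are hypothetical.

```python
from collections import OrderedDict


class DiskCache:
    """LRU cache of files staged from tape (sizes in arbitrary units).

    Assumption: LRU eviction; the policy actually used is not documented here.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.files = OrderedDict()  # file name -> size, oldest first
        self.tape_requests = 0

    def read(self, name: str, size: int) -> None:
        if name in self.files:
            self.files.move_to_end(name)  # cache hit: mark as recently used
            return
        self.tape_requests += 1  # cache miss: the file must come from tape
        # Evict least recently used files until the new one fits.
        while self.files and self.used + size > self.capacity:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        self.files[name] = size
        self.used += size


cache = DiskCache(capacity=24)
for name in ["a", "b", "a", "c", "a"]:
    cache.read(name, size=10)
print("tape requests:", cache.tape_requests)
```

In this run the repeated reads of file `a` are served from disk, so five reads cause three tape requests.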

Page 15: CERA / WDCC

Data migration

[Diagram labels: TBS - RW (Tbl Partition 1), TBS - RW (Tbl Partition 2), dxdb, TBS - RO (Tbl Partition 1), Migout, Migin]

All tablespaces are moved “at once” to dxdb

Page 16: CERA / WDCC

Inside the datafile

[Diagram: components of a datafile: Header (128k), Primary Key, Lob Index, Table, Blob data]

Page 17: CERA / WDCC

Frontend versus Backend

[Diagram: the datafile on the Filesystem Frontend (Header 128k) and in the HSM Backend (Header 128k, Part 1 = 512 MB, Part 2 = 512 MB)]
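The slide shows a 128k header and 512 MB parts. Under that layout, a small read only needs the parts that actually contain its bytes. The sketch below computes which parts a byte range touches; it assumes, beyond what the slide states, that the header precedes part 1, that parts are contiguous and equally sized, and that the units are binary. The function name is hypothetical.

```python
HEADER_BYTES = 128 * 1024          # "Header 128k" on the slide
PART_BYTES = 512 * 1024 * 1024     # "Part n = 512 MB" on the slide


def parts_for_range(offset: int, length: int) -> list[int]:
    """Return the 1-based part numbers covering a byte range of a datafile.

    Assumptions (not stated on the slide): the header precedes part 1,
    parts are contiguous and of equal size, and units are binary.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length - 1
    if end < HEADER_BYTES:
        return []                   # the range lies entirely in the header
    first = max(offset - HEADER_BYTES, 0) // PART_BYTES + 1
    last = (end - HEADER_BYTES) // PART_BYTES + 1
    return list(range(first, last + 1))


# A 60 kB granule right after the header touches only part 1.
print(parts_for_range(HEADER_BYTES, 60_000))
```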

Page 18: CERA / WDCC

Retrieving data

[Diagram: retrieval sequence with steps numbered 1 to 5, involving the 128k header and a tape request]

Page 19: CERA / WDCC

Warehouse features

- Compression: nothing special used within the server
- Partitioning: allows parts of the data to be moved to HSM
- Backup
  - Nologging: beware of crash …
  - Read only: two copies on tape

Page 20: CERA / WDCC

New implementation

- Metadata database will stay as is
- Oracle databases holding data will be replaced by a new in-house development
- Why?
  - There is a certain risk that a future version of Oracle may not work with a / any HSM system
  - In the long run, some license costs are to be saved

Page 21: CERA / WDCC

General Architecture - new

[Diagram: Webserver and Appl. Server in front of Metadata (Oracle-DB) and Data (Blobserver)]

Page 22: CERA / WDCC

CERA-Container

Instead of keeping data within BLOBs in Oracle databases, data records will be kept within so-called CERA Container Files.

- Ability to keep a huge number of records.
- They provide fast access independent of position within the file (granular access).
- They provide fault tolerance against tape damage by keeping checksums within the files.
- Read/write operations against container files are enclosed in transactions.
- Well-known format
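Two of these properties, position-independent access and checksums stored inside the file, can be illustrated with a toy container. This is not the CERA Container format, whose layout is not described here: the record layout (length, CRC32, payload), the use of CRC32 and all function names are assumptions, and transactions are left out.

```python
import io
import struct
import zlib

# Hypothetical record layout: 4-byte length, 4-byte CRC32, then the payload.
RECORD_HEADER = struct.Struct("<II")


def append_record(container: io.BytesIO, payload: bytes) -> int:
    """Append a record and return its byte offset for the index."""
    offset = container.seek(0, io.SEEK_END)
    container.write(RECORD_HEADER.pack(len(payload), zlib.crc32(payload)))
    container.write(payload)
    return offset


def read_record(container: io.BytesIO, offset: int) -> bytes:
    """Read one record by offset and verify its checksum."""
    container.seek(offset)
    length, checksum = RECORD_HEADER.unpack(container.read(RECORD_HEADER.size))
    payload = container.read(length)
    if zlib.crc32(payload) != checksum:
        raise IOError(f"checksum mismatch in record at offset {offset}")
    return payload


container = io.BytesIO()
index = [append_record(container, f"timestep {i}".encode()) for i in range(3)]
print(read_record(container, index[2]))
```

Because the index stores byte offsets, reading the last record costs the same as reading the first, and a damaged record is detected when it is read rather than silently returned.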

Page 23: CERA / WDCC

Migration

- Concept / Team (namely Peter Drakenberg, DKRZ)
  - Not yet really finished
- Software
  - First software ready, in order to migrate data
- Convert old data
  - Started last week, but will take at least a year

Page 24: CERA / WDCC

Dataflow: outbound

[Diagram: outbound data flow in eight numbered steps between Webserver, Appl. Server, Metadata, Data and Processing]

Page 25: CERA / WDCC

Dataflow: inbound

[Diagram labels: Model run, GFS, Postprocessing, Metadata, Dataserver]

Page 26: CERA / WDCC

Summary

- CERA allows for the storage of data of different kinds
  - Format independent
  - Metadata enables addressing of internal and external data
- Users typically fetch only small amounts of data.
- The system allows for efficient access to small data granules
  - by using warehousing functions like partitioning
  - by using small Oracle database BLOBs or, in future, CERA Container files.

Page 27: CERA / WDCC

Thank you!