
CERA / WDCC

Hannes Thiemann
Max-Planck-Institut für Meteorologie
Modelle und Daten (Models & Data)
hannes.thiemann@zmaw.de

NCAR, October 27th – 29th, 2008

Contents

- Statistics
- Requirements + features
- General architecture
- Implementation (current and new)
- Migration
- Summary

Basic Statistics

WDCC / CERA: general statistics as of 01-10-2008 00:00:10
- Database size: 370 TByte
- Number of BLOBs: 6,663,287,791 (6.6 billion); data is accessed by fields, not by files
- Number of experiments: 1,146
- Number of datasets: 142,062
- Total size divided by the number of BLOBs gives the average size of a data access granule: 50-60 kB per BLOB
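The 50-60 kB figure follows directly from the two numbers above. A quick check in Python, assuming decimal units (1 TByte = 10^12 bytes, 1 kB = 10^3 bytes); with binary units the result is just under 60 KiB, so it stays in the same range:

```python
# Figures from the statistics above (01-10-2008)
database_size_bytes = 370 * 10**12     # 370 TByte, decimal units assumed
number_of_blobs = 6_663_287_791

# Average size of a data access granule = total size / number of BLOBs
avg_blob_bytes = database_size_bytes / number_of_blobs
avg_blob_kb = avg_blob_bytes / 10**3
print(f"average BLOB size: {avg_blob_kb:.1f} kB")   # average BLOB size: 55.5 kB
```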

Users by continent

[Figure: Active users, 1-Jan-2008 until 14-Oct-2008, by region: Campus (19%), Germany, Europe, AF+OC+SA (Africa, Oceania, South America), North America]

Download destinations

[Figure: Download destinations, 1-Jan-2008 until 14-Oct-2008. Regions: Campus, Germany, Europe, OC+AF+SA, North America; shares shown: 65%, 14%, 12%, 6%, 3% (assignment of shares to regions not recoverable)]

Records per download

[Figure: Records per download. Axes: Records (ticks 1 to 12000), Recordsize (log scale, 10^5 to 10^10), Percent]

Requirements and constraints

- Access over WAN
- Downloads are typically quite small, but some are huge.
- Small downloads imply that users are not willing to wait long …
- We cannot scan through large files for each download.
- Granularity has to be small.

Datatypes

- Model data
  - Climatological runs (global and regional) (IPCC, …)
  - Weather forecasts (DPHASE, CEOP, …)
- Reanalysis data
- Observational data (COPS, CARIBIC, …)
- Satellite data products

Formats

CERA can store data in any format. Formats currently in use:
- GRIB: 60%
- NetCDF: 18%
- Other: 22%

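General Architecture

[Figure: General architecture. Proxy, web server and application server form the midtier in front of two database groups, metadata and data. Metadata blocks: Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org. Processing steps: select timestep + region, convert format.]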
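The midtier step "select timestep + region, convert format" can be sketched as follows. This is a hypothetical illustration: the field layout (a list of timesteps, each a 2-D latitude × longitude grid), the function names and the CSV output are assumptions, not the interface of the CERA application server; a real conversion would target formats such as GRIB or NetCDF, and CSV only keeps the sketch free of dependencies.

```python
import csv
import io

def select(field, timestep, lat_range, lon_range):
    """Return the sub-grid of one timestep for half-open index ranges.

    Hypothetical layout: field[t][j][i] = value at timestep t,
    latitude row j and longitude column i.
    """
    grid = field[timestep]
    lat0, lat1 = lat_range
    lon0, lon1 = lon_range
    return [row[lon0:lon1] for row in grid[lat0:lat1]]

def to_csv(subgrid):
    """Convert the selected sub-grid to CSV text, one grid row per line."""
    buf = io.StringIO()
    csv.writer(buf).writerows(subgrid)
    return buf.getvalue()

# Usage: 3 timesteps on a 4 x 5 grid; each value encodes (t, j, i) for easy checking
field = [[[100 * t + 10 * j + i for i in range(5)] for j in range(4)] for t in range(3)]
sub = select(field, timestep=1, lat_range=(1, 3), lon_range=(2, 4))
print(sub)          # [[112, 113], [122, 123]]
print(to_csv(sub))
```

Only the requested timestep and region leave the server, which matches the small downloads described in the requirements.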

Storage within CERA

Database table, one row (record) per timestep:
1  data of timestep i
2  data of timestep i+1
3  data of timestep i+2
…
n  data of timestep i+n-1
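The row-per-timestep layout can be imitated with Python's built-in `sqlite3` module standing in for Oracle. This is an illustrative sketch: the table name `temperature`, the big-endian double serialization and the tiny three-value "fields" are assumptions. Giving each variable its own table follows the "split into single variables" idea of this presentation.

```python
import sqlite3
import struct

# One table per variable, one row per timestep: the record number is the
# primary key and the data of that timestep is stored as a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (rec INTEGER PRIMARY KEY, data BLOB NOT NULL)")

def pack(values):
    """Serialize a list of floats as big-endian doubles (format chosen for illustration)."""
    return struct.pack(f">{len(values)}d", *values)

def unpack(blob):
    """Inverse of pack(): 8 bytes per double."""
    return list(struct.unpack(f">{len(blob) // 8}d", blob))

# Store 4 timesteps (i, i+1, i+2, i+3) as records 1..4
for rec in range(1, 5):
    field = [float(rec * 10 + k) for k in range(3)]   # tiny stand-in for a 2-D field
    conn.execute("INSERT INTO temperature (rec, data) VALUES (?, ?)", (rec, pack(field)))

# Fetch exactly one timestep by primary key, without touching the other records
(blob,) = conn.execute("SELECT data FROM temperature WHERE rec = ?", (3,)).fetchone()
print(unpack(blob))   # [30.0, 31.0, 32.0]
```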

Handicap

- Not enough disk space available: approx. 400 TB of data stored within the database, but only approx. 24 TB of disk.
- The database has been coupled transparently to the HSM (hierarchical storage management) system.
- How do we avoid frequent tape accesses?
  - A big cache
  - Store data as closely as possible to the way users need it: split into single variables
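The "big cache" idea can be sketched as a least-recently-used (LRU) cache in front of a slow tape backend. The LRU policy, the class name `TapeCache` and the variable keys are assumptions for illustration; the slides do not say which caching policy the HSM or the database uses.

```python
from collections import OrderedDict

class TapeCache:
    """LRU cache in front of a slow backend (hypothetical stand-in for HSM/tape)."""

    def __init__(self, capacity, fetch_from_tape):
        self.capacity = capacity              # number of granules kept on disk
        self.fetch_from_tape = fetch_from_tape
        self.cache = OrderedDict()
        self.tape_reads = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)       # mark as most recently used
            return self.cache[key]
        self.tape_reads += 1                  # cache miss: go to tape
        value = self.fetch_from_tape(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used entry
        return value

# Usage: users repeatedly ask for the same few variables
cache = TapeCache(capacity=2, fetch_from_tape=lambda key: f"data for {key}")
for key in ["tas", "pr", "tas", "tas", "pr", "psl", "tas"]:
    cache.get(key)
print(cache.tape_reads)   # 4 tape reads for 7 requests
```

Storing single variables keeps each cached granule small, so more of the frequently requested data fits into the same disk cache.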

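Data migration

[Figure: Table partitions in tablespaces (TBS): partition 1 and partition 2 in read-write tablespaces (TBS - RW); partition 1 also shown in a read-only tablespace (TBS - RO). All tablespaces are moved "at once" to dxdb; HSM migout / migin.]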

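Inside the datafile

[Figure: Datafile contents: 128k header, primary key, LOB index and BLOB data.]

Frontend versus Backend

[Figure: The 128k header is kept on the filesystem frontend; the datafile contents are stored on the HSM backend in parts of 512 MB (Part 1, Part 2, …).]

Retrieving data

[Figure: Header (128k) on the frontend; tape request to the backend.]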
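Mapping a byte offset in the datafile to the header or to one of the parts is simple arithmetic. This sketch assumes the 128k header is the first 128 KiB of the datafile and that the remaining bytes follow in consecutive 512 MiB parts; the function names and the `parts_on_disk` set are hypothetical.

```python
HEADER_SIZE = 128 * 1024          # 128k header, assumed to stay on the filesystem frontend
PART_SIZE = 512 * 1024 * 1024     # 512 MB parts, assumed to live on the HSM backend

def locate(offset):
    """Map a byte offset in the datafile to (location, offset within it)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if offset < HEADER_SIZE:
        return ("header", offset)
    part, offset_in_part = divmod(offset - HEADER_SIZE, PART_SIZE)
    return (f"part {part + 1}", offset_in_part)

def needs_tape_request(offset, parts_on_disk):
    """True if reading this offset would require staging a part back from tape."""
    where, _ = locate(offset)
    return where != "header" and where not in parts_on_disk

print(locate(1000))                                                   # ('header', 1000)
print(locate(HEADER_SIZE))                                            # ('part 1', 0)
print(locate(HEADER_SIZE + PART_SIZE + 5))                            # ('part 2', 5)
print(needs_tape_request(HEADER_SIZE + PART_SIZE + 5, {"part 1"}))    # True
```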

Warehouse features

- Compression: nothing special used within the server
- Partitioning: allows parts of the data to be moved to the HSM
- Backup
  - Nologging: beware of crashes …
  - Read only: two copies on tape

New implementation

- The metadata database will stay as is.
- The Oracle databases holding the data will be replaced by a new in-house development.

Why?
- There is a certain risk that a future version of Oracle may not work with a (or any) HSM system.
- In the long run, some license costs should be saved.

General Architecture - new

[Figure: New architecture. Web server and application server in front of the metadata (Oracle DB) and the data (Blobserver).]

CERA-Container

Instead of keeping data in BLOBs within Oracle databases, data records will be kept in so-called CERA Container Files. These:
- can hold a huge number of records,
- provide fast access independent of a record's position within the file (granular access),
- provide fault tolerance against tape damage by keeping checksums within the files,
- enclose read/write operations on container files in transactions,
- use a well-known format.
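Two of the container properties listed above, position-independent access through an index and a checksum per record, can be illustrated with a toy file format. The layout below (a CRC32 per record and an offset index at the end of the file) is a hypothetical sketch, not the actual CERA Container format, and it does not model transactions.

```python
import io
import struct
import zlib

# Hypothetical layout (NOT the real CERA Container format):
#   record  := length (uint32) | crc32 (uint32) | payload
#   index   := count (uint32) | offset (uint64) * count
#   trailer := index_offset (uint64), the last 8 bytes of the file
REC_HDR = struct.Struct(">II")

def write_container(f, records):
    """Write all records, then the offset index and the trailer."""
    offsets = []
    for payload in records:
        offsets.append(f.tell())
        f.write(REC_HDR.pack(len(payload), zlib.crc32(payload)))
        f.write(payload)
    index_offset = f.tell()
    f.write(struct.pack(">I", len(offsets)))
    f.write(struct.pack(f">{len(offsets)}Q", *offsets))
    f.write(struct.pack(">Q", index_offset))

def read_record(f, k):
    """Read record k via the index; the cost does not depend on k."""
    f.seek(-8, io.SEEK_END)
    (index_offset,) = struct.unpack(">Q", f.read(8))
    f.seek(index_offset)
    (count,) = struct.unpack(">I", f.read(4))
    if not 0 <= k < count:
        raise IndexError(k)
    f.seek(index_offset + 4 + 8 * k)
    (offset,) = struct.unpack(">Q", f.read(8))
    f.seek(offset)
    length, crc = REC_HDR.unpack(f.read(REC_HDR.size))
    payload = f.read(length)
    if zlib.crc32(payload) != crc:          # detect damaged data
        raise OSError(f"checksum mismatch in record {k}")
    return payload

# Usage: write three records, read one directly, then simulate damage
buf = io.BytesIO()
write_container(buf, [b"timestep 0", b"timestep 1", b"timestep 2"])
print(read_record(buf, 2))                  # b'timestep 2'

data = bytearray(buf.getvalue())
data[REC_HDR.size + len(b"timestep 0") + REC_HDR.size] ^= 0xFF   # first payload byte of record 1
damaged = io.BytesIO(bytes(data))
try:
    read_record(damaged, 1)
    damage_detected = False
except OSError as err:
    damage_detected = True
    print(err)                              # checksum mismatch in record 1
```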

Migration

- Concept / team (namely Peter Drakenberg, DKRZ): not yet quite finished
- Software: a first version is ready for migrating the data
- Converting old data: started last week, but will take at least a year

Dataflow: outbound

[Figure: Outbound dataflow between metadata and data, processing, application server and web server]

Dataflow: inbound

[Figure: Inbound dataflow between model run, postprocessing and the metadata and data servers]

Summary

- CERA allows the storage of data of different kinds:
  - format independent
  - metadata enables addressing of internal and external data
- Users typically fetch only small amounts of data.
- The system allows efficient access to small data granules:
  - by using warehousing functions such as partitioning
  - by using small Oracle database BLOBs or, in future, CERA Container files

Thank you!
