Data Standards Workflow

Upload: marlis

Posted on 12-Feb-2016

TRANSCRIPT

Page 1: Data Standards Workflow

Data Standards Workflow

Raw data | Scripts | Database

Extract | Transform | Load | Provide

Store raw data in subversion to keep track of history

Add meta information

Script to convert raw data into netcdf

Stored files (netcdf) accessible through the web

Charts & Maps: tools and websites

OpenEarthRawData, OpenEarth OPeNDAP, OpenEarthTools

Page 3: Data Standards Workflow

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Page 4: Data Standards Workflow

Add metadata

• Use the INSPIRE metadata form to store information about the dataset.
• http://www.inspire-geoportal.eu/inspireEditor.htm
• Click "Launch editor".

Transform

Page 5: Data Standards Workflow

Turn validation on

Transform – add metadata

validation

Page 6: Data Standards Workflow

Location in subversion

micore

File identification

Transform – add metadata

Page 7: Data Standards Workflow

History of your data.

Transform – add metadata

quality

Page 8: Data Standards Workflow

Please fill in limitations of use.

Transform – add metadata

constraints

Page 9: Data Standards Workflow

Store in course/Pcnumber/inspire_description.xml

Transform – add metadata

Save metadata file

1. Save metadata file (local)
2. Add to subversion (local)
3. Commit => metadata into subversion (remote)

Page 10: Data Standards Workflow

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Page 11: Data Standards Workflow

Store in netcdf

• What’s netcdf?
• Write a script to transform data into netcdf
• Using CF convention

Transform

Page 12: Data Standards Workflow

What is netcdf

• Data format defined by Unidata
• Data store used for coverage data and multidimensional data
• CF metadata convention

Transform – store in netcdf - netcdf

Page 13: Data Standards Workflow

What is netcdf

[Figure: multidimensional array with axes X, Y, Z and T]

• An array-based data structure for storing multidimensional data
• N-dimensional coordinate systems
  • X coordinate (e.g. longitude)
  • Y coordinate (e.g. latitude)
  • Z coordinate (e.g. altitude)
  • Time dimension
  • … other dimensions
• Variables: support for multiple variables
  • Temperature, humidity, pressure, salinity, etc.
• Geometry: implicit or explicit
  • Regular grid (implicit)
  • Irregular grid
  • Points

Transform – store in netcdf - netcdf
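The array-based model above comes down to three ingredients: named dimensions with a length, variables defined over an ordered list of those dimensions, and attributes attached to variables. A toy Python sketch of that structure shows how a variable's shape follows from its dimensions. The `Dataset` and `Variable` classes are hypothetical and not a netCDF library API; the sizes are those of the transect example used later in this course.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A named variable defined over an ordered tuple of dimension names."""
    name: str
    dims: tuple
    attributes: dict = field(default_factory=dict)


@dataclass
class Dataset:
    """Toy stand-in for a netCDF file: named dimensions plus variables."""
    dimensions: dict = field(default_factory=dict)  # dimension name -> length
    variables: dict = field(default_factory=dict)   # variable name -> Variable

    def add_dimension(self, name, length):
        self.dimensions[name] = length

    def add_variable(self, var):
        self.variables[var.name] = var

    def shape(self, name):
        # A variable's shape is implied by the lengths of its dimensions.
        return tuple(self.dimensions[d] for d in self.variables[name].dims)


# The transect file used later in this course: 3 years x 58 coastward points.
ds = Dataset()
ds.add_dimension("coastward", 58)
ds.add_dimension("time", 3)
ds.add_variable(Variable("coastward_distance", ("coastward",), {"unit": "metre"}))
ds.add_variable(Variable("year", ("time",), {"unit": "year"}))
ds.add_variable(Variable("height", ("time", "coastward"), {"unit": "metre"}))
print(ds.shape("height"))  # (3, 58)
```

Because shapes are derived rather than stored, two variables that share a dimension can never disagree about its length.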

Page 14: Data Standards Workflow

Storing Multidimensional Data

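Table: one row per point

X Y Z Q
1 1 1 0.5
1 1 2 0.3
1 2 1 0.6
1 2 2 0.1
2 1 1 0.4
2 1 2 0.2
2 2 1 0.9
2 2 2 0.3

8 rows x 4 columns = 32 numbers

Array: coordinate axes X = 1 2, Y = 1 2, Z = 1 2, plus Q as a 2 x 2 x 2 array

Z = 1       X=1  X=2
  Y = 1     0.5  0.4
  Y = 2     0.6  0.9
Z = 2       X=1  X=2
  Y = 1     0.3  0.2
  Y = 2     0.1  0.3

6 coordinate values + 8 Q values = 14 numbers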

Transform – store in netcdf - netcdf
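The counts on the slide are simple arithmetic: the table repeats every coordinate next to every value (8 rows x 4 columns = 32 numbers), while the array form stores each axis once (2 + 2 + 2 = 6 numbers) plus the 8 Q values, 14 in total. A small Python sketch of the conversion, assuming the table covers a complete regular grid (every X, Y, Z combination present exactly once):

```python
# Table form from the slide: one row per point, columns X, Y, Z, Q.
table = [
    (1, 1, 1, 0.5), (1, 1, 2, 0.3), (1, 2, 1, 0.6), (1, 2, 2, 0.1),
    (2, 1, 1, 0.4), (2, 1, 2, 0.2), (2, 2, 1, 0.9), (2, 2, 2, 0.3),
]

# Array form: each coordinate axis is stored once ...
xs = sorted({x for x, _, _, _ in table})
ys = sorted({y for _, y, _, _ in table})
zs = sorted({z for _, _, z, _ in table})

# ... and Q becomes a dense array indexed as q[ix][iy][iz].
# A missing (x, y, z) combination raises KeyError: the grid must be complete.
lookup = {(x, y, z): value for x, y, z, value in table}
q = [[[lookup[(x, y, z)] for z in zs] for y in ys] for x in xs]

table_count = len(table) * 4  # every row repeats X, Y and Z next to Q
array_count = len(xs) + len(ys) + len(zs) + len(xs) * len(ys) * len(zs)
print(table_count, array_count)  # 32 14
```

The saving grows with the grid: for n points per axis the table needs 4n³ numbers, the array n³ + 3n.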

Page 15: Data Standards Workflow

Data Model

Data model for netcdf and other formats.

Also usable for HDF, OPeNDAP, GRIB, etc. See the netCDF-Java library for details.

Transform – store in netcdf - netcdf

Page 16: Data Standards Workflow

ArcGIS

ArcGIS also reads and writes netcdf files.

Transform – store in netcdf – netcdf - applications

Page 17: Data Standards Workflow

Your favorite text editor

XML representation of a netcdf file

Transform – store in netcdf - netcdf

Page 18: Data Standards Workflow

Other Tools

NCO

    # diff
    ncdiff -v time file1.nc file2.nc
    # compression & packing
    ncpdq -4 -L 9 in.nc out.nc  # Deflated packing (~80% lossy compression)
    # selecting variables by regex
    ncks -v '^Q..' in.nc  # Q01--Q99, QAA--QZZ, etc.

IDV

Very useful

Web hyperslabs, cool!

Not so stable.

Transform – store in netcdf - netcdf
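The regex on the `ncks` line is anchored only at the start, so it selects any variable whose name begins with Q followed by at least two more characters, not just three-character names. A quick Python check of that behaviour, assuming NCO's regular expressions match like Python's `re.search` for this simple pattern (the variable names are made up):

```python
import re

# Made-up variable names; only the pattern '^Q..' comes from the slide.
names = ["Q01", "Q99", "QAA", "QZZ", "Q1", "Q012", "XQ01", "time"]

# '^' anchors at the start of the name, each '.' matches exactly one character.
selected = [n for n in names if re.search(r"^Q..", n)]
print(selected)  # ['Q01', 'Q99', 'QAA', 'QZZ', 'Q012']

# Adding '$' restricts the match to names of exactly three characters.
exact = [n for n in names if re.search(r"^Q..$", n)]
print(exact)  # ['Q01', 'Q99', 'QAA', 'QZZ']
```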

Page 19: Data Standards Workflow

Data Standards Workflow

Raw data Scripts Database

Store raw data in subversion to

keep track of history

Stored files (netcdf)

accessible through the web

Extract Transform Load

Charts & Maps

Tools and websites

Provide

Add meta information

Script to convert raw data into

netcdf

OpenEarthRawData

OpenEarth

OPeNDAP

OpenEarthTools

Page 20: Data Standards Workflow

Store in netcdf

• What’s netcdf?
• Write a script to transform data into netcdf
• Using CF convention

Transform – store in netcdf - script

Page 21: Data Standards Workflow

Write script

• Read raw data
  • Read header line
  • Read data
  • Read all data
  • Create function to read all data
  • Use function in MATLAB
• Raw data into empty netcdf file
  • Create empty netcdf file
  • Add dimensions and variables
  • Store variables
• Read values

Transform – store in netcdf - script

Page 22: Data Standards Workflow

Reading raw data into memory

• Use one of the following MATLAB functions to read the file data into an array:
  • fscanf

Transform – store in netcdf - script

Page 23: Data Standards Workflow

Example: Transect.txt file

1999 58 -135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951… 200 -2415 210 -2995 220 -3595 99999999999 99999999999 2000 58 -135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951

Header line: year, number of points

Points: X Z X Z … 9999999

Location: OpenEarthRawData\course\example\raw

Transform – store in netcdf - script

Page 24: Data Standards Workflow

Read header line

    >> fid = fopen('..\raw\transect.txt')
    fid = 15
    >> header = fscanf(fid, '%d', 2)
    header = 2000 58
    >> year = header(1)
    year = 2000
    >> npoint = header(2)
    npoint = 58

Transform – store in netcdf - script

Page 25: Data Standards Workflow

    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';

Read data

    >> % read data
    >> data = fscanf(fid, '%d', npoint*2)
    data = -150 3741 -140 3581 -135

    >> data = reshape(data, [2, npoint])
    data =
      Columns 1 through 7
      -150  -140  -135  -130
      3741  3581  3531  3541

    >> % use column vectors
    >> data = data'
    data =
      -150  3741
      -140  3581
      -135  3531

Transform – store in netcdf - script

Page 26: Data Standards Workflow

Read all data

    % preallocate all data (time, coastward): sized for 3 years x 58 points
    transectseries = NaN(3, 58);
    coastward_distance = NaN(58, 1);
    time = NaN(3, 1);
    % open file and get file id
    fid = fopen('..\raw\transect.txt');
    i = 1;
    while (~feof(fid))
        % read header
        header = fscanf(fid, '%d', 2);
        year = header(1);
        % store year in time
        time(i) = year;
        npoint = header(2);
        % read data
        data = fscanf(fid, '%d', npoint*2);
        data = reshape(data, [2, npoint]);
        % use column vectors
        data = data';
        % store data in transect series
        transectseries(i,:) = data(:,2);
        coastward_distance(:) = data(:,1);
        % skip the rest of the current line
        fgetl(fid);
        i = i + 1;
    end
    % release the file handle
    fclose(fid);

Transform – store in netcdf - script

Page 27: Data Standards Workflow

Create a function

    function transect = readtransect(filename)
    % preallocate all data (time, coastward): sized for 3 years x 58 points
    transectseries = NaN(3, 58);
    coastward_distance = NaN(58, 1);
    time = NaN(3, 1);
    % open file and get file id
    fid = fopen(filename);
    i = 1;
    while (~feof(fid))
        % read header
        header = fscanf(fid, '%d', 2);
        year = header(1);
        % store year in time
        time(i) = year;
        npoint = header(2);
        % read data
        data = fscanf(fid, '%d', npoint*2);
        data = reshape(data, [2, npoint]);
        % use column vectors
        data = data';
        % store data in transect series
        transectseries(i,:) = data(:,2);
        coastward_distance(:) = data(:,1);
        % skip the rest of the current line
        fgetl(fid);
        i = i + 1;
    end
    % release the file handle
    fclose(fid);
    transect = struct('series', transectseries, ...
        'distance', coastward_distance, 'time', time);
    end

Transform – store in netcdf - script
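The function above fixes the array sizes at 3 years x 58 points. A Python sketch of the same parsing logic that takes the sizes from each header instead. This is an illustration, not the course's script; it assumes the record layout of the example file (year, number of points, x/z pairs, then 99999999999 markers), and like the MATLAB version it keeps only the last record's coastward distances.

```python
# End-of-record marker as it appears in transect.txt.
SENTINEL = 99999999999


def read_transect(text):
    """Parse transect records: year, number of points, then (x, z) pairs.

    Unlike the MATLAB readtransect, the number of years and points is taken
    from the file instead of being fixed at 3 x 58. Like readtransect, it
    assumes every year uses the same coastward distances.
    """
    tokens = [int(t) for t in text.split()]
    pos = 0
    years, series, distance = [], [], []
    while pos < len(tokens):
        # Skip end-of-record markers between records.
        if tokens[pos] == SENTINEL:
            pos += 1
            continue
        year, npoint = tokens[pos], tokens[pos + 1]
        pairs = tokens[pos + 2:pos + 2 + 2 * npoint]
        if len(pairs) != 2 * npoint:
            raise ValueError(f"record {year}: expected {npoint} points")
        pos += 2 + 2 * npoint
        distance = pairs[0::2]       # x: coastward distance
        series.append(pairs[1::2])   # z: height for this year
        years.append(year)
    return {"series": series, "distance": distance, "time": years}


# First three points of the 1999 and 2000 records from the example file
# (point count in the header shortened from 58 to 3 to match).
sample = """1999 3 -135 3531 -130 3541 -125 3631 99999999999 99999999999
2000 3 -135 3531 -130 3541 -125 3631 99999999999 99999999999
"""
transect = read_transect(sample)
print(transect["time"], transect["distance"])  # [1999, 2000] [-135, -130, -125]
```

Checking the pair count per record also turns a truncated file into a clear error instead of silently short arrays.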

Page 28: Data Standards Workflow

Use the new function

    >> data = readtransect('..\raw\transect.txt')
    data =
          series: [3x58 double]
        distance: [58x1 double]
            time: [3x1 double]

Transform – store in netcdf - script

Page 29: Data Standards Workflow

Loading data into netcdf

• What does a netcdf file look like?
• Required meta information

Transform – store in netcdf - script

Page 30: Data Standards Workflow

Netcdf file transect.nc

    netcdf transect {
    dimensions:
        coastward = 58 ;
        time = 3 ;
    variables:
        float coastward_distance(coastward) ;
            coastward_distance:unit = "metre" ;
        float year(time) ;
            year:unit = "year" ;
        float height(time, coastward) ;
            height:unit = "metre" ;
    data:

     coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
     year = 1999, 2000, 2001 ;
     height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
    }

Transform – store in netcdf - script

Page 31: Data Standards Workflow

Create an empty netcdf file

    >> nc_create_empty(outputfile)
    >> nc_dump(outputfile)
    netcdf transect.nc {
    dimensions:
    variables:
    }

Transform – store in netcdf - script

Page 32: Data Standards Workflow

Add dimensions

    >> nc_add_dimension(outputfile, 'coastward', 58)
    >> nc_add_dimension(outputfile, 'time', 3)
    >> nc_dump(outputfile)
    netcdf transect.nc {
    dimensions:
        coastward = 58 ;
        time = 3 ;
    variables:
    }

help nc_add_dimension

Transform – store in netcdf - script

Page 33: Data Standards Workflow

Add variables

    coastwardVariable = struct( ...
        'Name', 'coastward_distance', ...
        'Nctype', 'float', ...
        'Dimension', {{'coastward'}}, ...
        'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
        );
    nc_addvar(outputfile, coastwardVariable);
    timeVariable = struct( ...
        'Name', 'year', ...
        'Nctype', 'float', ...
        'Dimension', {{'time'}}, ...
        'Attribute', struct('Name', 'unit', 'Value', 'year') ...
        );
    nc_addvar(outputfile, timeVariable);
    heightVariable = struct( ...
        'Name', 'height', ...
        'Nctype', 'float', ...
        'Dimension', {{'time', 'coastward'}}, ...
        'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
        );
    nc_addvar(outputfile, heightVariable);
    nc_dump(outputfile)

help nc_addvar

Transform – store in netcdf - script
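When a file is built up in separate calls like this, a dimension name spelled differently in two steps only surfaces as an error later on. A small Python sketch of a pre-flight check on the planned schema, run before calling nc_add_dimension and nc_addvar. The function `check_schema` is hypothetical and not part of the nc_* toolbox used here.

```python
def check_schema(dimensions, variables):
    """List variables that refer to a dimension that was never declared.

    dimensions: dict of dimension name -> length
    variables:  dict of variable name -> tuple of dimension names
    """
    problems = []
    for name, dims in variables.items():
        for dim in dims:
            if dim not in dimensions:
                problems.append(f"{name}: undeclared dimension {dim!r}")
    return problems


dimensions = {"coastward": 58, "time": 3}
variables = {
    "coastward_distance": ("coastward",),
    "year": ("time",),
    "height": ("time", "coastward"),
}
print(check_schema(dimensions, variables))  # []
print(check_schema(dimensions, {"height": ("time", "crossshore")}))
# ["height: undeclared dimension 'crossshore'"]
```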

Page 34: Data Standards Workflow

Result

    netcdf transect.nc {
    dimensions:
        coastward = 58 ;
        time = 3 ;
    variables:
        float coastward_distance(coastward), shape = [58]
            coastward_distance:unit = "metre"
        float year(time), shape = [3]
            year:unit = "year"
        float height(time,coastward), shape = [3 58]
            height:unit = "metre"
    }

Transform – store in netcdf - script

Page 35: Data Standards Workflow

Store variables

    nc_varput(outputfile, 'height', data.series)
    nc_varput(outputfile, 'year', data.time)
    nc_varput(outputfile, 'coastward_distance', data.distance)

help nc_varput

Transform – store in netcdf - script
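Each array passed to nc_varput has to fit the shape its variable was declared with: data.series is 3x58 for height(time, coastward), while data.time (3x1) and data.distance (58x1) are column vectors for one-dimensional variables. A Python sketch of that check, with hypothetical helper names, that treats MATLAB column vectors as one-dimensional and catches, for instance, a transposed series:

```python
def squeeze(shape):
    """Drop length-1 axes, so a 58x1 MATLAB column vector counts as length 58."""
    return tuple(n for n in shape if n != 1)


def check_put_shape(dimensions, var_dims, data_shape):
    """Raise if data of data_shape cannot fill a variable over var_dims.

    Squeezing also hides genuine length-1 dimensions; this is a quick
    sanity check, not a full validation.
    """
    expected = tuple(dimensions[d] for d in var_dims)
    if squeeze(expected) != squeeze(data_shape):
        raise ValueError(f"data shape {data_shape} does not fit {expected}")
    return expected


dimensions = {"coastward": 58, "time": 3}
# Shapes of the fields returned by readtransect: series, time, distance.
check_put_shape(dimensions, ("time", "coastward"), (3, 58))
check_put_shape(dimensions, ("time",), (3, 1))
check_put_shape(dimensions, ("coastward",), (58, 1))
try:
    check_put_shape(dimensions, ("time", "coastward"), (58, 3))  # transposed
except ValueError as err:
    print(err)  # data shape (58, 3) does not fit (3, 58)
```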

Page 36: Data Standards Workflow

Result: netcdf file transect.nc, with the same header and data as the dump on Page 30.

Transform – store in netcdf - script

Page 37: Data Standards Workflow

Read values

surface(nc_varget(outputfile, 'height')')

[Figure: surface plot of the stored heights; horizontal axes run from 1 to 3 and from 0 to 60, vertical axis from -5000 to 15000]

Transform – store in netcdf - script

Page 38: Data Standards Workflow

Store in netcdf

• What’s netcdf?
• Write a script to transform data into netcdf
• Using CF convention

Transform – store in netcdf - convention

Page 39: Data Standards Workflow

CF convention

Standard used by USGS, NOAA, ArcGIS, GDAL

Climate and Forecast (CF) Convention: http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html

Initially developed for
• Climate and forecast data
• Atmosphere, surface and ocean model-generated data

Also used for observational datasets.

CF is the most widely used convention for geospatial netCDF data.

Transform – store in netcdf - convention

Page 40: Data Standards Workflow

Improve output

• Store extra attributes
  • Title
  • Author
  • standard_name

Transform – store in netcdf - convention
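The transect example so far uses an attribute called `unit`; the CF convention expects `units` (a string understood by the UDUNITS library) and identifies quantities through `standard_name` values from the CF standard name table. A Python sketch of a check for those two points. The function is hypothetical, it does not validate the values themselves, and the standard_name shown is an assumed choice.

```python
def cf_attribute_issues(variables):
    """Flag two common CF slips: 'unit' instead of 'units', missing standard_name.

    variables: dict of variable name -> dict of attributes
    """
    issues = []
    for name, attrs in variables.items():
        if "unit" in attrs and "units" not in attrs:
            issues.append(f"{name}: CF expects 'units', not 'unit'")
        if "standard_name" not in attrs:
            issues.append(f"{name}: no standard_name")
    return issues


# Attributes as written in the transect example.
transect_attrs = {
    "coastward_distance": {"unit": "metre"},
    "year": {"unit": "year"},
    "height": {"unit": "metre"},
}
for issue in cf_attribute_issues(transect_attrs):
    print(issue)

# After renaming the attribute and adding a standard_name; "altitude" is an
# assumed choice here, check the CF standard name table for the right one.
fixed = {"height": {"units": "metre", "standard_name": "altitude"}}
print(cf_attribute_issues(fixed))  # []
```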

Page 41: Data Standards Workflow

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Page 42: Data Standards Workflow

Transform – save script

Save script

1. Save script (local, using MATLAB): https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/
2. Add to subversion (local)
3. Commit => script into subversion (remote)