www.medcordex.eu
Med-CORDEX database
Med-CORDEX database
= netcdf files + their info
= file system + relational database
= XFS + MySQL db
= file server + LAMP server (Linux, Apache, MySQL and PHP)
file server
NETAPP FAS3240 HA Storage System
dual controller
RAID DP technology (two simultaneous disk failures allowed)
Environment:
dual power supply (one coming from UPS)
air-conditioned room
LAMP server
HP DL575G7 Linux Server
SLES 11 SP2 operating system
no users: the machine is dedicated to acting as a web server (not only for the Med-CORDEX database)
Software:
Apache 2.4.6, PHP 5.5.10, Tomcat 7.0.52, JVM 1.7.0_55, MySQL 5.0.96, pure-ftpd 1.0.36
Environment:
dual power supply (one coming from UPS)
air-conditioned room
paths & filenames
ATMOSPHERIC DATA
• PATH /MEDCORDEX/<Domain>/<Institution>/<GCMModelName>/<CMIP5ExperimentName>/<CMIP5EnsembleMember>/<RCMModelName>/<RCMVersionID>/<Frequency>/<VariableName>
Our PATH shortcut: /MEDCORDEX/ALL (files are not listable)
• FILENAME VariableName_Domain_GCMModelName_CMIP5ExperimentName_CMIP5EnsembleMember_RCMModelName_RCMVersionID_Frequency[_StartTime-EndTime].nc
According to “CORDEX Archive Design” by O. B. Christensen, W. J. Gutowski, G. Nikulin, and S. Legutke
http://cordex.dmi.dk
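The PATH and FILENAME templates above can be sketched in Python as follows. This is only an illustration of the token order, not the code used on the server; the function names and the example token values (GCM, RCM, v1, and so on) are placeholders invented for the sketch.

```python
from pathlib import PurePosixPath

# Token order copied from the PATH and FILENAME templates on the slide.
PATH_TOKENS = [
    "Domain", "Institution", "GCMModelName", "CMIP5ExperimentName",
    "CMIP5EnsembleMember", "RCMModelName", "RCMVersionID",
    "Frequency", "VariableName",
]
FILENAME_TOKENS = [
    "VariableName", "Domain", "GCMModelName", "CMIP5ExperimentName",
    "CMIP5EnsembleMember", "RCMModelName", "RCMVersionID", "Frequency",
]


def build_path(tokens, root="/MEDCORDEX"):
    """Join the path tokens under the archive root."""
    return str(PurePosixPath(root, *(tokens[name] for name in PATH_TOKENS)))


def build_filename(tokens, start_time=None, end_time=None):
    """Join the filename tokens with '_'; the time range is optional."""
    name = "_".join(tokens[key] for key in FILENAME_TOKENS)
    if start_time and end_time:
        name += f"_{start_time}-{end_time}"
    return name + ".nc"


# Placeholder values, for illustration only.
example = {
    "Domain": "MED-xx", "Institution": "ENEA", "GCMModelName": "GCM",
    "CMIP5ExperimentName": "evaluation", "CMIP5EnsembleMember": "r1i1p1",
    "RCMModelName": "RCM", "RCMVersionID": "v1",
    "Frequency": "day", "VariableName": "tas",
}
print(build_path(example))
print(build_filename(example, "19790101", "19791231"))
```

Note that Institution appears in the path but not in the filename.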
paths & filenames
OCEAN DATA
No standard has been defined yet (AFAIK). Shall we use
http://cmip-pcmdi.llnl.gov/cmip5/output_req.html#req_list ?
paths & filenames
All tokens that form the PATH are derived from the FILENAME, except the Institution, which is the name of the directory where each data provider has placed its files
e.g. /incoming_MEDCORDEX/ENEA gives Institution ENEA
In the db we use all tokens plus one more piece of information: realm, which is atmosphere or ocean. Realm is deduced from the VariableName
THUS WE HAVE A CONSTRAINT!
Variable names must ALL be unique, regardless of the realm they belong to!
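A minimal sketch of why the uniqueness constraint matters: if realm is looked up from the variable name alone, a name listed under both realms would be ambiguous. The variable lists below are illustrative (tas, pr, tos and sos are common CMIP5-style names); the actual list of known variables is not given in the slides, and the function names are invented for this sketch.

```python
# Illustrative lists only; the real table of known variables is not in the slides.
REALM_VARIABLES = {
    "atmosphere": ["tas", "pr"],
    "ocean": ["tos", "sos"],
}


def build_realm_lookup(realm_variables):
    """Invert realm -> names into name -> realm, rejecting duplicate names."""
    lookup = {}
    for realm, names in realm_variables.items():
        for name in names:
            if name in lookup:
                raise ValueError(
                    f"variable {name!r} is listed more than once "
                    f"({lookup[name]!r} and {realm!r})"
                )
            lookup[name] = realm
    return lookup


REALM_OF = build_realm_lookup(REALM_VARIABLES)


def realm_of(variable_name):
    """Return the realm for a known variable; raise KeyError if unknown."""
    return REALM_OF[variable_name]
```

With this layout, adding an ocean variable whose name already exists in the atmosphere list fails at load time instead of silently assigning the wrong realm.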
uploading files
Data providers with data to upload can use ANY ftp client to do:
ftp ftp://user:[email protected] /incoming_MEDCORDEX/$INST
mput *.nc (all files into the same flat dir)
put PLEASEGO.txt (any size, also empty)
where $INST is the code of their institution (e.g. ENEA)
Then they wait for the automatic daily procedure to start (at 20:00)
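For providers who prefer a script to an interactive client, the same upload could be sketched with Python's standard ftplib. This is an assumption-laden illustration: the function name is invented, and the host, user and password in the usage comment are placeholders, not the real connection details.

```python
import ftplib
import io
from pathlib import Path


def upload_files(ftp, institution, local_dir):
    """Upload every *.nc file in local_dir, then the PLEASEGO.txt trigger file.

    `ftp` is a connected, logged-in ftplib.FTP object.
    """
    ftp.cwd(f"/incoming_MEDCORDEX/{institution}")
    uploaded = []
    for nc_file in sorted(Path(local_dir).glob("*.nc")):
        with nc_file.open("rb") as handle:
            ftp.storbinary(f"STOR {nc_file.name}", handle)
        uploaded.append(nc_file.name)
    # PLEASEGO.txt may be empty, so zero bytes are sent; it goes last.
    ftp.storbinary("STOR PLEASEGO.txt", io.BytesIO(b""))
    uploaded.append("PLEASEGO.txt")
    return uploaded


# Usage (host, user and password are placeholders):
# with ftplib.FTP("ftp.example.org", "user", "password") as ftp:
#     upload_files(ftp, "ENEA", "./output")
```

Sending PLEASEGO.txt last means it only appears once all netcdf files are in place.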
ingesting files
Every day at 20:00 the “ingesting procedure” runs automatically
For each dir /incoming_MEDCORDEX/$INST containing PLEASEGO.txt, and for each other file in that dir, the procedure:
1. verifies it’s a netcdf file (ncdump -h works properly)
2. splits the filename into tokens and checks their compliance with the CORDEX standard
3. checks the validity of the variable name (it must already be known)
4. creates the right $PATH in /MEDCORDEX
5. moves the file into its $PATH
6. inserts/updates the file’s record in the db (including the ncdump -h output)
(continued)
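Steps 2 and 3 of the list above can be sketched as follows. This is not the procedure's actual code: the function name is invented, the check assumes that no token contains an underscore, and the set of known variables is passed in by the caller because the real list is not given in the slides.

```python
# Token order copied from the FILENAME template of the CORDEX archive design.
FILENAME_TOKENS = [
    "VariableName", "Domain", "GCMModelName", "CMIP5ExperimentName",
    "CMIP5EnsembleMember", "RCMModelName", "RCMVersionID", "Frequency",
]


def parse_filename(filename, known_variables):
    """Split a file name into tokens and reject non-compliant names."""
    if not filename.endswith(".nc"):
        raise ValueError(f"{filename}: expected a .nc file name")
    parts = filename[:-3].split("_")
    # 8 mandatory tokens, plus an optional StartTime-EndTime token.
    if len(parts) not in (len(FILENAME_TOKENS), len(FILENAME_TOKENS) + 1):
        raise ValueError(f"{filename}: expected 8 or 9 '_'-separated tokens")
    tokens = dict(zip(FILENAME_TOKENS, parts))
    if len(parts) == len(FILENAME_TOKENS) + 1:
        start, sep, end = parts[-1].partition("-")
        if not (sep and start.isdigit() and end.isdigit()):
            raise ValueError(f"{filename}: bad time range {parts[-1]!r}")
        tokens["StartTime"], tokens["EndTime"] = start, end
    if tokens["VariableName"] not in known_variables:
        raise ValueError(f"{filename}: unknown variable {tokens['VariableName']!r}")
    return tokens
```

Step 1 (running ncdump -h) and steps 4 to 6 are left out of the sketch.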
ingesting files
When all of a data provider’s files have been processed, an e-mail is sent to him/her with the log of what happened while ingesting his/her data
After ingesting all files of all data providers, the procedure:
1. computes some statistics and publishes them on www.medcordex.eu/stats, taking figures from db & ftp logs
2. makes all links in /MEDCORDEX/ALL
3. copies the whole /MEDCORDEX directory to another host
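Step 2 could look roughly like the sketch below. The slides do not say which kind of link is used; symbolic links are an assumption here, and the function name is invented for the illustration.

```python
import os


def link_all(root, all_name="ALL"):
    """Create one symbolic link in root/ALL for every .nc file under root."""
    all_dir = os.path.join(root, all_name)
    os.makedirs(all_dir, exist_ok=True)
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune the ALL directory so existing links are not walked.
            dirnames[:] = [d for d in dirnames if d != all_name]
        for name in filenames:
            if not name.endswith(".nc"):
                continue
            link = os.path.join(all_dir, name)
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(os.path.join(dirpath, name)), link)
                created.append(name)
    return sorted(created)
```

A flat directory of links only works if file names are unique across the whole archive. CORDEX-style names nearly guarantee this, although Institution is not part of the file name.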
downloading files
• FTP Server (can be accessed by any ftp client)
• THREDDS Data Server (software by unidata.ucar.edu)

users                  credentials                         U / D   server
data providers         ready                               U, D    FTP
                                                           D       THREDDS
authorized users       web request                         D       FTP, THREDDS
HyMeX database users   their own Mistrals db credentials   D       FTP *
(U = upload, D = download)
downloading data (using any FTP client)
cmd line:
ftp $f/$p/ ; dir ; get filen.nc (“dir” does not work in /ALL)
ncftp -u $hymex www.medcordex.eu ; cd $p ; get filen.nc
wget $f/$p/file.nc
wget -r $f/$p (recursive get, not in /ALL)
browser:
$f/$p
$f/$p/filen.nc
where:
$f = ftp://user:[email protected]
$p = MEDCORDEX/MED-xx/…/…/…. or $p = MEDCORDEX/ALL
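A scripted equivalent of the get command, sketched with Python's standard ftplib. The function name is invented, and the host and credentials in the usage comment are placeholders. Because /ALL is not listable, the caller has to know the file name in advance.

```python
import ftplib


def download_file(ftp, remote_dir, filename, local_path):
    """Fetch one file by name; `ftp` is a connected, logged-in ftplib.FTP."""
    ftp.cwd(remote_dir)
    with open(local_path, "wb") as handle:
        # retrbinary passes each received block to the callback.
        ftp.retrbinary(f"RETR {filename}", handle.write)
    return local_path


# Usage (host, user and password are placeholders):
# with ftplib.FTP("ftp.example.org", "user", "password") as ftp:
#     download_file(ftp, "MEDCORDEX/ALL", "filen.nc", "filen.nc")
```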
downloading data (using THREDDS)
services: (password required only to get netcdf files)
OPeNDAP: use files remotely, download them
HTTP server: download files
netcdf subset: select & download sections of each file
WCS (Web Coverage Service): serves data to WCS clients
WMS (Web Map Service): serves data to WMS clients
NCML (NetCDF Markup Language): to define a CDM dataset
ISO: description of the file in ISO 19115(-2) metadata
UDDC (Unidata Attribute Convention for Data Discovery): provides recommendations for netCDF attributes that can be added to netCDF files
downloading data (using THREDDS)
cmd line:
ncdump -h $t/dodsC/$p/file.nc
cdo showdate $t/dodsC/$p/file.nc
cdo copy $t/dodsC/$p/file.nc local.nc
ferret: use $t/dodsC/$p/file.nc
tested with: netcdf 4.3.1.1, cdo 1.6.4rc6, ferret 6.9
browser:
www.medcordex.eu/tds (MEDCORDEX/ALL is invisible)
where:
$p = MEDCORDEX/MED-xx/…/…/…. or $p = MEDCORDEX/ALL
$t = https://user:[email protected]:8290/medcordex
db fields
For each ingested netcdf file the following are recorded:
code, path, fname, size, ncdump, realm
Institution, VariableName, Domain, GCMModelName, CMIP5ExperimentName, CMIP5EnsembleMember, RCMModelName, RCMVersionID, RCMmodel, Frequency, StartTime, EndTime
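To make the record layout concrete, here is a sketch using Python's built-in sqlite3 as a stand-in for the MySQL database. Only the field names come from the slide. The table name, the column types, the reading of code as a numeric record id, and the uniqueness of (path, fname) are all assumptions made for this sketch.

```python
import sqlite3

# Field names from the slide; everything else here is assumed.
TEXT_FIELDS = [
    "path", "fname", "ncdump", "realm", "Institution", "VariableName",
    "Domain", "GCMModelName", "CMIP5ExperimentName", "CMIP5EnsembleMember",
    "RCMModelName", "RCMVersionID", "RCMmodel", "Frequency",
    "StartTime", "EndTime",
]
ALL_FIELDS = ["code", "size"] + TEXT_FIELDS


def create_table(conn):
    """Create the (hypothetical) files table with one row per netcdf file."""
    text_columns = ", ".join(f"{name} TEXT" for name in TEXT_FIELDS)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "code INTEGER PRIMARY KEY, size INTEGER, "
        f"{text_columns}, UNIQUE (path, fname))"
    )


def upsert_file(conn, record):
    """Insert a record, replacing any existing row with the same path and fname."""
    unknown = set(record) - set(ALL_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    names = list(record)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT OR REPLACE INTO files ({', '.join(names)}) VALUES ({placeholders})",
        [record[name] for name in names],
    )


conn = sqlite3.connect(":memory:")
create_table(conn)
upsert_file(conn, {"path": "/MEDCORDEX/MED-xx", "fname": "file.nc", "size": 10})
upsert_file(conn, {"path": "/MEDCORDEX/MED-xx", "fname": "file.nc", "size": 20})
```

Ingesting the same file twice leaves a single row with the latest values, which mirrors the inserts/updates wording of the ingesting procedure.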
statistics as of May 22, 2014
Institution   netcdf files    size in GB
CMCC          5896            90.5
CNRM          7803 (3rd)      493.5 (1st)
ENEA          14023 (2nd)     97.7
GUF           62784 (1st)     303.6 (3rd)
ICPT          5404            101.1
INSTM         160             0.2
IPSL          1606            113.7
LMD           739             429.0 (2nd)
UCL           1012            101.8
Total         99427           1732.0