DIBBs Brown Dog An Extensible and Distributed Data Transformation Service

Upload: vunhi

Post on 02-May-2018


Page 1: DIBBs Brown Dog - nationaldataservice.org · • Matlab Data . Ecosystems and ... • Low level analyses • Tags and Metadata • Previews • Other derived products ... Brown Dog

DIBBs Brown Dog An Extensible and Distributed Data Transformation Service


Tabular Data Gap Filling

Climate Modeling Lidar

Flood Plain Analysis River Depth Distribution

River Maturity Stream Detection and Sinuosity

Satellite/Aerial Photos Land Cover/Usage

Water Detection (e.g. Lakes, Retaining Ponds)

Green Infrastructure

Hyperspectral

Radar

Photos

3D Reconstruction

3D Data

Human Preference Modeling

Video

People Detection/Tracking

Large Dynamic Group Behavior

Bee Detection/Tracking

Bee Colony Behavior

Underwater Photos

Color Correction

Image Stitching

Mapping

Event Detection

Species Detection/Counting Reef Changes

Food Supply

Structural Defects

Hazard Modeling

Microscopy Images

Pollen Detection/Classification

Paleoclimate

Evolution Root Tip Tracking

Phenomics

Materials Development

Cell Tracking

Tissue Classification

Renal Failure

Loss of Organ Function

Feedlot Tracking

Disease Detection

Historic Maps

River Meander

Coastline Changes

Documents

NLP

Sentiment Analysis

Regions in Conflict

Handwritten Documents Pre-Digital Datasets

Databases

Web Sites

Publications

Simulations


Ecosystems and Climate Change

M. Dietze, K. McHenry, A. Desai, “Model-data Synthesis and Forecasting Across the Upper Midwest: Partitioning Uncertainty and Environmental Heterogeneity in Ecosystem Carbon,” NSF DBI-1062547, 2011-2014

M. Dietze, K. McHenry, A. Desai, “ABI Development: The PEcAn Project - A Community Platform for Ecological Forecasting,” NSF DBI-1457890, 2015-2019

• Towards regional-scale high resolution estimates of plant life and carbon storage

• Scientific workflow and data assimilation system connecting a variety of models within the Ecology community to a variety of data sources

• Grown to 52 developers over the past 3 years

• NCSA / U. Illinois, BU, Brookhaven National Lab, University of Wisconsin, University of Notre Dame, Utah State, Columbia University, Pacific Northwest National Laboratory, DuPont Pioneer, Exeter College, UK, U. Arizona, Dartmouth College


Ecosystems and Climate Change

• Models:
  • Ecosystem Demography (ED)
  • SIPNET
  • DALEC
  • …

• Data:
  • Biofuel Ecophysiological Trait and Yield Database (BETY)
  • Forest Inventory and Analysis (FIA)
  • North American Regional Reanalysis (NARR)
  • North American Carbon Program (NACP)
  • Food and Agriculture Organization (FAO)
  • …


Ecosystems and Climate Change

• Data with Unstructured Aspects:
  • MODIS (Multi-spectral)
  • Lidar
  • PALSAR (Radar)
  • AVIRIS (Airborne Visible/Infrared Imaging Spectrometer)
  • Landsat (Images)

• Published results (e.g. tables, figures, plots)
  • Ingested into BETY manually


Ecosystems and Climate Change

• Settlement Vegetation data
  • Born Physical
    • Paper, Microfiche, Alphanumeric/Color coded on vellum sheets
• Born Digital
  • PDF, JPEG, GIF, TIFF, XLS, XLSX, CSV, SHP, netCDF, HDF5, XML, GRIB, GRIB2, geoTIFF, DBF, BIL, BIP, ARC, SDTS, SRTM, IMG, UA, LGW, SXW, ODS
• Ad hoc formats:
  • Spreadsheets
  • Databases
  • Services
  • R Data
  • Matlab Data

• Document
• Image
• Spatial
• Tabular
• Weather
• 3D
• Archive, Database, Filesystem, …


“Big Data”

• Large quantities of data
• Large varieties of data
• “Long-Tail”

[Figure; axis labels: Number of grants, Dollars]

http://www.slideshare.net/rheimann04/big-social-data-the-social-turn-in-big-data


The “Long-Tail” of “Big Data”


The Data

• Diversity of data types
• Diversity of file formats
  • Ad hoc formats
  • Obsolete formats
  • Proprietary formats
• Un-curated data
  • No metadata
  • No consistent/useful naming of files/directories
• Unstructured data
  • Non-text contents
• Potentially large and/or made up of many small files


Processes Over the Data

• Diversity of analyses
  • Many forms (e.g. scripts, libraries, whole suites, services)
  • Many languages
  • Many dependencies
• Leverage towards dealing with unstructured/un-curated data
  • Analyses churn through data and generate new, often higher level, data
  • Metadata, data about data


The Problem

• A huge diversity in the data
  • Types
  • Formats
  • Analyses
• A huge diversity of software involved
  • Scripts
  • Applications
  • Libraries
  • Services
• Dealing with these issues has become part of the scientific workflow. It is time-consuming, redundant, and difficult; it varies across labs and fields; and it makes reproducibility and reusability difficult!


A Science Driven Data Transformation Service

• Supporting Data Manipulation as a Service
  • File format conversions
  • Data set conversions
  • Database ingestion/dumping
  • Website scraping
• Supporting Data Analysis as a Service
  • Low level analyses
  • Tags and Metadata
  • Previews
  • Other derived products
• Relieve the scientific community from having to address this as a first step of their workflows.


Brown Dog

• Data transformations
  • Conversions and Extractions
• Extensibility
  • Easy to add new converters/extractors
  • Encapsulated software & dependencies
• API
  • Clients, Scalability, Provenance, Information Loss, Data Movement

https://en.wikipedia.org/wiki/Mongrel


• Data Access Proxy (DAP)
  • An extensible and distributed service for carrying out file format conversions
  • Move towards an internet/world that is agnostic to file formats
  • Aid in accessing a file’s contents independent of how it is represented on disk

• Data Tilling Service (DTS)
  • An extensible and distributed service for the extraction of new data or metadata from a file’s contents
  • Provide means to query and/or relate collections of data without metadata

• Data Conversion: A transformation on digital data that largely preserves the entirety of the data. Largely reversible.

• Data Extraction: A transformation on digital data which creates new, often higher level, data from the contents of the given data (e.g. tags, signatures). Not reversible.
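The difference between the two definitions can be sketched in a few lines of Python. This is only an illustration on a made-up table, not Brown Dog code: the function names and the sample data are assumptions. The conversion round-trips back to the original text, while the extraction produces a summary from which the original cannot be rebuilt.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Conversion: re-encode the same table as JSON (a list of rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return json.dumps(rows)


def json_to_csv(json_text: str) -> str:
    """Inverse conversion: rebuild the CSV text from the JSON rows."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(json.loads(json_text))
    return out.getvalue()


def extract_summary(csv_text: str) -> dict:
    """Extraction: derive new, higher level metadata from the contents."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return {"columns": rows[0], "row_count": len(rows) - 1}


# Made-up sample table (hypothetical data, for illustration only).
sample = "name,count\na,1\nb,2\n"

round_trip = json_to_csv(csv_to_json(sample))  # conversion is reversible here
summary = extract_summary(sample)              # extraction is not
```

Here `round_trip` equals `sample`, whereas `summary` keeps only the column names and the row count.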


Brown Dog

• The Data Access Proxy (DAP)
  • https://dap.ncsa.illinois.edu/polyglot/api/
  • File in, File out

• The Data Tilling Service (DTS)
  • https://dts.ncsa.illinois.edu/clowder/api/
  • File in, JSON out
  • JSON can contain metadata, tags, signatures, links to derived data products, etc.


Brown Dog

• Services!!!
  • Programmable interface
  • Client applications build on top of these services
  • Backed by computational and storage resources
  • A place to preserve/reuse software/tools


Clowder

• “Smart Drop Box”
• Share, collaborate on datasets
• Publishing data
• Social curation
• Extensible Auto-curation


Architecture

[Diagram: Clowder architecture]

• Client
  • Web Browser
  • Custom Clients
• Server
  • Load balancer (nginx)
  • Webapp (Scala/Play), three instances
  • Data/Metadata (MongoDB)
  • Text Search (Elasticsearch)
  • Multimedia Search (Versus)
  • Event Bus (RabbitMQ)
  • Extractor 1 (Java)
  • Extractor 2 (Python)
  • External Software


Extractions

[Diagram: path of a file through the extraction service]

Components: Load Balancer, API Frontend, Job Queue, Extractor, Database, Distributed Log, Log Analysis

1. File
2. Routing
3. File Stored
4. Job Submitted
5. Job Picked Up
6. Read
7. Extract
7.5 Status Updates
8. Write
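The numbered steps can be mimicked with a small in-memory sketch. Everything here is a stand-in chosen for illustration (plain dictionaries and a `queue.Queue` instead of the real load balancer, database, job queue and distributed log), and the function names are hypothetical; it only shows the order in which a file moves through the stages.

```python
import queue

file_store = {}            # stands in for file storage (step 3)
metadata_db = {}           # stands in for the database (step 8)
job_queue = queue.Queue()  # stands in for the job queue (steps 4-5)
status_log = []            # stands in for the distributed log (step 7.5)


def submit_file(file_id: str, content: str) -> None:
    """Steps 1-4: accept a file, store it, and submit an extraction job."""
    file_store[file_id] = content
    job_queue.put(file_id)
    status_log.append((file_id, "SUBMITTED"))


def run_extractor() -> None:
    """Steps 5-8: pick up each job, read the file, extract, write results."""
    while not job_queue.empty():
        file_id = job_queue.get()                     # 5. job picked up
        status_log.append((file_id, "PROCESSING"))    # 7.5 status update
        content = file_store[file_id]                 # 6. read
        result = {"words": len(content.split())}      # 7. extract (toy analysis)
        metadata_db[file_id] = result                 # 8. write
        status_log.append((file_id, "DONE"))          # 7.5 status update


submit_file("f1", "brown dog tills data")
run_extractor()
```

After the run, `metadata_db` holds the extracted word count for `f1` and `status_log` records the three status changes in order.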


Connect (connecting to RabbitMQ):

    extractors.connect_message_bus(extractorName=extractorName,
                                   messageType=messageType,
                                   rabbitmqURL=rabbitmqURL,
                                   rabbitmqExchange=rabbitmqExchange,
                                   processFileFunction=process_file,
                                   checkMessageFunction=check_message)

Work on File:

    def process_file(parameters):
        global extractorName
        inputfile = parameters['inputfile']
        # call actual program
        result = subprocess.check_output(['wc', inputfile], stderr=subprocess.STDOUT)
        (lines, words, characters, filename) = result.split()

Return Metadata:

    extractors.upload_file_metadata(mdata=metadata, parameters=parameters)
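The "work on file" step can be tried without a message bus. The sketch below is an assumption-laden stand-in, not the extractor itself: it replaces the call to `wc` with pure Python so it runs anywhere, and the function name `count_file` and the metadata keys are hypothetical. It returns the kind of dictionary that could then be handed to an upload call.

```python
import os
import tempfile


def count_file(path: str) -> dict:
    """Return wc-style counts for a file as a metadata dictionary."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "lines": data.count(b"\n"),   # wc counts newline characters
        "words": len(data.split()),   # whitespace-separated tokens
        "characters": len(data),      # wc's default third column is a byte count
    }


# Demonstrate on a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"brown dog\ntransforms data\n")
metadata = count_file(tmp.name)
os.remove(tmp.name)
```

Reading the file in binary mode keeps the counts identical across platforms.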


wordcount.py


face.py

    #!/usr/bin/env python
    import pika
    import sys
    import json
    import traceback
    import requests
    import tempfile
    import subprocess
    import os
    import itertools
    import numpy as np
    import cv2
    import time
    import logging
    from config import *
    import pymedici.extractors as extractors

    def main():
        global extractorName, messageType, rabbitmqExchange, rabbitmqURL

        #set logging
        logging.basicConfig(format='%(levelname)-7s : %(name)s - %(message)s', level=logging.WARN)
        …


Polyglot

• Wraps and automates I/O operations within arbitrary software

• Searches for conversion paths across software

• Estimates information loss

• Horizontally scalable
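Searching for a conversion path across software can be pictured as a graph search over formats. The sketch below uses a plain breadth-first search, which is an assumption made for illustration and not necessarily how Polyglot does it. The OpenOffice formats come from its script header; `ImageTool` is a hypothetical second converter.

```python
from collections import deque

# Converter name -> (input formats, output formats).
# "ImageTool" is hypothetical and exists only to make a two-hop path possible.
CONVERTERS = {
    "OpenOffice": ({"doc", "odt", "rtf", "txt"},
                   {"doc", "odt", "pdf", "rtf", "txt"}),
    "ImageTool": ({"pdf", "png"}, {"png", "jpg"}),
}


def find_path(source, target, converters=CONVERTERS):
    """Breadth-first search for the shortest chain of (tool, format) hops."""
    if source == target:
        return []
    pending = deque([(source, [])])
    seen = {source}
    while pending:
        fmt, path = pending.popleft()
        for tool, (inputs, outputs) in sorted(converters.items()):
            if fmt not in inputs:
                continue
            for out in sorted(outputs):
                if out in seen:
                    continue
                step = path + [(tool, out)]
                if out == target:
                    return step
                seen.add(out)
                pending.append((out, step))
    return None  # no chain of converters reaches the target format


path = find_path("doc", "jpg")
```

With these two converters, `path` chains OpenOffice (doc to pdf) and then ImageTool (pdf to jpg). Weighting each hop by an estimated information loss would turn this into a shortest-path problem over weighted edges.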


Describe:

    #Application name (Version)
    #File types supported (e.g. document, depth, image, …)
    #Comma separated list of supported input formats
    #Comma separated list of supported output formats

Convert File:

    #Call external application and/or carry out conversion
    …


OpenOffice_convert.ahk

    ;OpenOffice
    ;document
    ;doc, odt, rtf, txt
    ;doc, odt, pdf, rtf, txt

    ;Run program
    Run, "C:\Program Files\OpenOffice.org 3\program\soffice.exe" -headless -norestore "-accept=socket`,host=local…"
    RunWait, "C:\Program Files\OpenOffice.org 3\program\python.exe" "C:\Converters\DocumentConverter.py" "%1%" "%2%"
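The four leading comment lines of a conversion script are what describe it to the service. A reader for that header might look like the sketch below. The function name, the returned keys and the assumption that a convert script always starts with exactly these four comment lines are illustrative, not taken from Polyglot; the sample body after the header is a shortened placeholder.

```python
def split_formats(text: str) -> list:
    """Split a comma separated format list into clean entries."""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_header(script_text: str, comment: str = ";") -> dict:
    """Read the first four comment lines of a conversion wrapper script."""
    fields = []
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith(comment):
            continue
        fields.append(line[len(comment):].strip())
        if len(fields) == 4:
            break
    if len(fields) < 4:
        raise ValueError("expected four header comment lines")
    application, kind, inputs, outputs = fields
    return {
        "application": application,
        "type": kind,
        "inputs": split_formats(inputs),
        "outputs": split_formats(outputs),
    }


# Header copied from the OpenOffice script; the command line is a placeholder.
SAMPLE = """;OpenOffice
;document
;doc, odt, rtf, txt
;doc, odt, pdf, rtf, txt
;Run program
Run, placeholder-command
"""

header = parse_header(SAMPLE)
```

Passing `comment="#"` would read the same header from scripts that use `#` comments, such as R or shell wrappers.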


A3DReviewer_open.ahk

    ;Adobe 3D Reviewer (v9)
    ;model
    ;3ds, 3dxml, arc, asm, bdl, catdrawing, catpart, catproduct, catshape, cgr, dae, dlv, exp, hgl, hp, hpgl, hpl, iam, ifc, igs, iges, ipt, jt, kmz, mf1, model, neu, obj, _pd, par, pdf, pkg, plt, prc, prt, prw, psm, pwd, sab, sat, sda, sdac, sdp, sdpc, sds, sdsc, sdw, sdwc, ses, session, sldasm, sldlfp, sldprt, stl, step, stp, u3d, unv, wrl, vrml, x_b, x_t, xas, xpr, xmt, xmt_txt, xv0, xv3

    ;Run program if not already running
    IfWinNotExist, Adobe 3D Reviewer
    {
      Run, C:\Program Files\Adobe\Acrobat 9.0\Acrobat\plug_ins3d\prc\A3DReviewer.exe
      WinWait, Adobe 3D Reviewer
    }

    ;Activate the window
    WinActivate, Adobe 3D Reviewer
    WinWaitActive, Adobe 3D Reviewer

    ;Parse filename root
    arg1 = %1%
    …


PEcAn#ED_convert.R

    #!/usr/bin/Rscript
    #PEcAn
    #data
    #pecan.zip
    #ed.zip
    .libPaths("/home/polyglot/R/library")
    sink(stdout(),type="message")

    # global variables
    overwrite <- TRUE
    verbose <- TRUE

    # get command line arguments
    args <- commandArgs(trailingOnly = TRUE)

    usage <- function(msg) {
      print(msg)
      print(paste0("Usage: ", args[0], " cf-nc_Input_File edOutputDir "))
      print(paste0("Example1: ", args[0], " US-Dk3.pecan.nc US-Dk3.ed.zip [/tmp/watever] "))
      …


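API Gateway

[Diagram: a client Request enters the API Gateway, which is connected to Redis and Crowd; the gateway passes a “Request+” on to the backend services (DTS / Clowder, DAP / Polyglot, Versus, DataWolf), and a Response is returned at each hop.]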


API Gateway

[Diagram: request flow through the API gateway (Fence)]

• Get /keys/8d4/token (Headers: Crowd Credentials using Basic Auth)
  • CROWD: Check user credentials
  • REDIS: Add token with ttl
• Get /dap/outputs (Headers: Access token)
  • POLYGLOT (DAP): Get /outputs (Headers: Polyglot Credentials)
• Get /dts/api/extractions/extractors_names (Headers: Access token)
  • CLOWDER (DTS): Get /api/extractions/extractors_names (Headers: Clowder Credentials)
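The token-with-TTL and path-forwarding behaviour in this flow can be sketched with an in-memory stand-in. Nothing below is Fence code: `TokenStore`, `route` and `ROUTES` are hypothetical names, a dictionary replaces Redis, and credential checking against Crowd is left out. Only the two path prefixes and their backends come from the slide.

```python
import secrets
import time


class TokenStore:
    """In-memory stand-in for Redis: tokens expire after a time-to-live."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expiry = {}

    def issue(self) -> str:
        """Create a token and remember when it stops being valid."""
        token = secrets.token_hex(16)
        self._expiry[token] = self.clock() + self.ttl
        return token

    def is_valid(self, token: str) -> bool:
        """True while the token exists and its TTL has not run out."""
        expires_at = self._expiry.get(token)
        if expires_at is None:
            return False
        if self.clock() >= expires_at:
            del self._expiry[token]
            return False
        return True


# Gateway path prefix -> backend service, following the paths on the slide.
ROUTES = {"/dap": "POLYGLOT", "/dts": "CLOWDER"}


def route(path: str, token: str, store: TokenStore):
    """Return (backend, backend_path) for a valid token, else None."""
    if not store.is_valid(token):
        return None
    for prefix, backend in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return backend, path[len(prefix):] or "/"
    return None


# A controllable clock makes expiry visible without waiting.
now = [0.0]
store = TokenStore(ttl_seconds=60, clock=lambda: now[0])
token = store.issue()
fresh = route("/dap/outputs", token, store)
now[0] = 61.0
expired = route("/dap/outputs", token, store)
```

While the token is fresh, `/dap/outputs` is forwarded to Polyglot as `/outputs`; once the TTL has passed the same request is rejected.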


Support within Data Management Plans

The data analysis/manipulation software developed here will be pushed into the NSF DIBBs: Brown Dog (ACI-1261582) project as data extractors/converters within the DTS and DAP, services providing automatic data annotations/analysis and format conversions as broadly usable internet resources. Brown Dog aims to both provide services and tools to aid in the curation, accessing, and indexing of data as well as to preserve scientific software that might be leveraged for that purpose. As Brown Dog extractors/converters, the capabilities of these tools will be preserved, will take part in an ecosystem of other extraction/conversion tools, and will be leverageable by others within the scientific community, perhaps in very different fields, as well as by the general public.


Milestones

• XSEDE Tutorial
  • July 18th, Miami
  • Walk through adding and deploying new tools (i.e. converters, extractors)
  • Walk through the API and creating a toy client application
• Beta Release
  • End of this year


Polyglot

Versus Daffodil


http://browndog.ncsa.illinois.edu