Policy Based Data Management Data-Intensive Computing Distributed Collections Grid-Enabled Storage iRODS Reagan W. Moore moore@diceresearch.org

Post on 04-Jan-2016

Page 1:

Policy Based Data Management

Data-Intensive Computing

Distributed Collections

Grid-Enabled Storage

iRODS

Reagan W. Moore
moore@diceresearch.org

Page 2:

Policy-based Data Environments

• Purpose - reason a collection is assembled
• Properties - attributes needed to ensure the purpose
• Policies - controls for enforcing desired properties
  • mapped to computer actionable rules
• Procedures - functions that implement the policies
  • mapped to computer actionable workflows
• Persistent state information - results of applying the procedures
  • mapped to system metadata
• Assessment criteria - validation that state information conforms to the desired purpose
  • mapped to periodically executed policies
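The chain from purpose to assessment can be sketched in a few lines of plain Python. This is an illustration only and does not use the iRODS rule language or any iRODS API. The names `Collection`, `checksum_procedure`, `ingest_policy`, and `assess_integrity` are hypothetical, and the integrity property (every file has a recorded checksum) is an assumed example of a property a collection might need.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Collection:
    """Hypothetical collection: a purpose, its files, and persistent state."""
    purpose: str                                          # reason the collection is assembled
    files: Dict[str, bytes] = field(default_factory=dict)
    state: Dict[str, str] = field(default_factory=dict)   # stands in for system metadata


def checksum_procedure(coll: Collection, name: str) -> None:
    """Procedure: compute a checksum and record it as state information."""
    coll.state[name] = hashlib.sha256(coll.files[name]).hexdigest()


def ingest_policy(coll: Collection, name: str, data: bytes) -> None:
    """Policy (assumed): every ingested file gets a recorded checksum."""
    coll.files[name] = data
    checksum_procedure(coll, name)


def assess_integrity(coll: Collection) -> List[str]:
    """Assessment: list files whose content no longer matches the recorded state."""
    return [
        name
        for name, data in coll.files.items()
        if coll.state.get(name) != hashlib.sha256(data).hexdigest()
    ]


coll = Collection(purpose="preservation")
ingest_policy(coll, "a.txt", b"hello")
print(assess_integrity(coll))      # prints []
coll.files["a.txt"] = b"tampered"  # simulate corruption after ingest
print(assess_integrity(coll))      # prints ['a.txt']
```

In a real deployment the assessment step would run periodically rather than on demand; here it is called by hand to keep the sketch short.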

Page 3:

Data-Intensive Computing

• Support computation at the remote storage location
  – Low complexity operations (small number of operations per byte)
  – Manage workflows through distributed rule engine
• Integrate with computation at supercomputer
  – High complexity operations (large number of operations per byte)
• Virtualize the workflow
  – Manage completion of the workflow tasks independently of the choice of platform
  – Manage provenance information
  – Derived data products can include generation of advanced indices to support discovery and browsing
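The split between storage-side and supercomputer-side computation turns on operations per byte. A minimal sketch of that decision follows, assuming a single cutoff value. The slides give no number, so `threshold` is an arbitrary placeholder, and the function names are hypothetical rather than part of iRODS.

```python
def operations_per_byte(total_operations: float, bytes_processed: float) -> float:
    """Complexity measure from the slide: operations performed per byte of data."""
    if bytes_processed <= 0:
        raise ValueError("bytes_processed must be positive")
    return total_operations / bytes_processed


def choose_execution_site(
    total_operations: float,
    bytes_processed: float,
    threshold: float = 1.0,  # assumed placeholder, not a value from the slides
) -> str:
    """Return 'storage' for low-complexity work, 'supercomputer' otherwise."""
    intensity = operations_per_byte(total_operations, bytes_processed)
    return "storage" if intensity < threshold else "supercomputer"


# A scan touching 1 GB with 1e6 operations: 0.001 operations per byte.
print(choose_execution_site(1e6, 1e9))   # prints storage
# A simulation doing 1e12 operations on the same 1 GB: 1000 operations per byte.
print(choose_execution_site(1e12, 1e9))  # prints supercomputer
```

The point of the sketch is the direction of the trade-off: low operations per byte favors moving the computation to the data, high operations per byte favors moving the data to the compute platform.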

Page 4:

Overview of iRODS Architecture

[Architecture diagram; labels recovered from the figure:]

• User w/ Client: can search, access, add and manage data & metadata
• Access distributed data with Web-based browser, iRODS GUI, or command-line clients
• iRODS Data Server: disk, tape, etc.
• iRODS Metadata Catalog: track information
• iRODS Middleware
• iRODS Rule Engine: tracks policies

Page 5:

Grid-Enabled Storage

• Integrate data processing within storage controller
  – Very high-speed access to disk
  – Application of rules that control execution of procedures within the storage controller
  – Native data grid software runs within controller
• Connect disk to any data grid
  – Next generation of connectivity beyond SAN/NAS technology
  – Data grid manages the properties of the collection

Page 6:

iRODS Extensible Infrastructure

• Clients – specific to discipline and life cycle state
• Policies – specific to discipline
• Procedures – specific to discipline
• Remaining infrastructure is generic
  – Network transport
  – Authentication / Authorization
  – Distributed storage access
  – Remote execution
  – Metadata management
  – Message passing
  – Rule engine

Page 7:

iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2011 Budget Supplement in the area of Human-Computer Interaction and Information Management technology research.

Reagan W. Moore
moore@diceresearch.org

http://irods.diceresearch.org

NSF OCI-0848296 “NARA Transcontinental Persistent Archives Prototype”
NSF SDCI-0721400 “Data Grids for Community Driven Applications”