Policy Based Data Management
Data-Intensive Computing
Distributed Collections
Grid-Enabled Storage
iRODS
Reagan W. [email protected]
Policy-based Data Environments
• Purpose - reason a collection is assembled
• Properties - attributes needed to ensure the purpose
• Policies - controls for enforcing desired properties
– mapped to computer actionable rules
• Procedures - functions that implement the policies
– mapped to computer actionable workflows
• Persistent state information - results of applying the procedures
– mapped to system metadata
• Assessment criteria - validation that state information conforms to the desired purpose
– mapped to periodically executed policies
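The chain from policy to assessment can be sketched in a few lines of Python. This is an illustrative toy model only, not iRODS code: the names `DataObject`, `replicate`, `RULES`, `apply_rules`, and `assess`, the `on_ingest` event, and the two-replica requirement are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataObject:
    """Hypothetical stand-in for a file in a collection."""
    name: str
    replicas: int = 1
    # Persistent state information, kept as system metadata.
    metadata: Dict[str, str] = field(default_factory=dict)


def replicate(obj: DataObject, target: int = 2) -> None:
    """Procedure: a function that implements the replication policy."""
    while obj.replicas < target:
        obj.replicas += 1
    # Record the result of applying the procedure.
    obj.metadata["replica_count"] = str(obj.replicas)


# Policy: a computer-actionable rule mapping an event to procedures.
RULES: Dict[str, List[Callable[[DataObject], None]]] = {
    "on_ingest": [replicate],
}


def apply_rules(event: str, obj: DataObject) -> None:
    """Run every procedure registered for the given event."""
    for procedure in RULES.get(event, []):
        procedure(obj)


def assess(collection: List[DataObject], required: int = 2) -> List[str]:
    """Assessment criterion: names of objects whose recorded state
    does not show the required number of replicas."""
    return [
        obj.name
        for obj in collection
        if int(obj.metadata.get("replica_count", "0")) < required
    ]


if __name__ == "__main__":
    a, b = DataObject("a.dat"), DataObject("b.dat")
    apply_rules("on_ingest", a)   # b.dat bypasses the policy
    print(assess([a, b]))         # objects that fail the assessment
```

Running `assess` on a schedule corresponds to the "periodically executed policies" in the last bullet: it reads only the stored state information, not the data itself.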
Data-Intensive Computing
• Support computation at the remote storage location
– Low complexity operations (small number of operations per byte)
– Manage workflows through distributed rule engine
• Integrate with computation at supercomputer
– High complexity operations (large number of operations per byte)
• Virtualize the workflow
– Manage completion of the workflow tasks independently of the choice of platform
– Manage provenance information
– Derived data products can include generation of advanced indices to support discovery and browsing
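The split between low and high complexity operations can be illustrated with a small placement function. This is a sketch under stated assumptions: the function names and the threshold of 100 operations per byte are hypothetical and chosen only for illustration; the slides give no numeric cutoff.

```python
def operations_per_byte(total_operations: float, total_bytes: float) -> float:
    """Complexity of a task: operations executed per byte of data read."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_operations / total_bytes


def choose_platform(total_operations: float, total_bytes: float,
                    threshold: float = 100.0) -> str:
    """Pick where to run a task.

    Below the (assumed) threshold the task is cheap relative to the
    data volume, so it runs at the storage location. At or above it,
    the data is moved to the supercomputer.
    """
    complexity = operations_per_byte(total_operations, total_bytes)
    return "storage" if complexity < threshold else "supercomputer"


if __name__ == "__main__":
    # A checksum-like pass: roughly one operation per byte.
    print(choose_platform(total_operations=1e9, total_bytes=1e9))
    # A simulation-like task: many operations per byte.
    print(choose_platform(total_operations=1e15, total_bytes=1e9))
```

A workflow manager that virtualizes the platform choice could call a function like this per task and record the decision as provenance information.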
Overview of iRODS Architecture
• User w/ Client - can search, access, add, and manage data & metadata
– Access distributed data with Web-based browser, iRODS GUI, or command line clients
• iRODS Middleware
• iRODS Data Server - disk, tape, etc.
• iRODS Metadata Catalog - tracks information
• iRODS Rule Engine - tracks policies
Grid-Enabled Storage
• Integrate data processing within storage controller
– Very high-speed access to disk
– Application of rules that control execution of procedures within the storage controller
– Native data grid software runs within controller
• Connect disk to any data grid
– Next generation of connectivity beyond SAN/NAS technology
– Data grid manages the properties of the collection
iRODS Extensible Infrastructure
• Clients – specific to discipline and life cycle state
• Policies – specific to discipline
• Procedures – specific to discipline
• Remaining infrastructure is generic
– Network transport
– Authentication / Authorization
– Distributed storage access
– Remote execution
– Metadata management
– Message passing
– Rule engine
iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2011 Budget Supplement in the area of Human-Computer Interaction and Information Management technology research.
Reagan W. [email protected]
http://irods.diceresearch.org
NSF OCI-0848296 “NARA Transcontinental Persistent Archives Prototype”
NSF SDCI-0721400 “Data Grids for Community Driven Applications”