National Archives and Records Administration: Integrated Rules Ordered Data System (IRODS)...

Page 1: National Archives and Records Administration

Integrated Rules Ordered Data System ("IRODS") Technology Research:
Digital Preservation Technology in a SOA Technical Context

Robert Chadduck
Principal Technologist
Electronic Records Archives Program
The National Archives and Records Administration

Page 2: National Archives and Records Administration

Synopsis of 18 April 2007 Invited Presentation by Dr. Reagan Moore, Ph.D.
Distinguished Scientist
San Diego Supercomputer Center
to NITRD HCI&IM Coordinating Group

Page 3: National Archives and Records Administration

Open Source, University-based Technology Research collaboratively supported by NSF/Office of CyberInfrastructure & NARA

Page 4: Scientific Data Collections

Reagan W. Moore
Wayne Schroeder
Mike Wan
Arcot Rajasekar
Richard Marciano

{moore, schroede, mwan, sekar, marciano}@sdsc.edu

http://www.sdsc.edu/srb
http://irods.sdsc.edu/

Page 5: Data Collections

• NSF Cyberinfrastructure projects
    • Digital holdings for a scientific discipline
• Simulation applications
    • Output from supercomputers
• Real-time sensor systems
    • Observational data
• Scientific laboratories
    • Experimental data

Page 6: Scientific Data Management

• Data collections
    • Data organization
• Data grids
    • Data sharing
• Data publication
    • Digital libraries
• Data preservation
    • Persistent archives

• SDSC uses generic data grid technology to support all data management applications

Page 7: Data stored per project, by date

GBs = GBs of data stored; Files = 1000's of files; Users = users with ACLs; "-" = no value in the source table.

                            5/17/02                 6/30/04                 4/23/07
Project                  GBs  Files      GBs   Files  Users      GBs   Files  Users

Data Grid
NSF / NVO             17,800  5,139   51,380   8,690     80  119,278  17,828    100
NSF / NPACI            1,972  1,083   17,578   4,694    380   36,514   7,483    380
Hayden                 6,800     41    7,201     113    178    8,013     161    227
Pzone                    438     31      812      47     49   25,681  14,793     68
NSF / LDAS-SALK          239      1    4,562      16     66  193,959     196     67
NSF / SLAC-JCSG          514     77    4,317     563     47   20,620   2,152     55
NSF / TeraGrid             -      -   80,354     685  2,962  293,539   8,038  3,267
NIH / BIRN                 -      -    5,416   3,366    148   20,800  33,748    424
NCAR                       -      -        -       -      -    1,567       8      2
LCA                        -      -        -       -      -    1,834      39      2

Digital Library
NSF / LTER               158      3      233       6     35      260      41     36
NSF / Portal              33      5    1,745      48    384    2,620      53    460
NIH / AfCS                27      4      462      49     21      733      94     21
NSF / SIO Explorer        19      1    1,734     601     27    2,750   1,202     27
NSF / SCEC                 -      -   15,246   1,737     52  168,931   3,545     73
LLNL                       -      -        -       -      -   13,784   1,374      5
CHRON                      -      -        -       -      -    6,398   2,064      5

Persistent Archive
NARA                       7      2       63      81     58    3,793   4,983     58
NSF / NSDL                 -      -    2,785  20,054    119    5,699  50,600    136
UCSD Libraries             -      -      127     202     29      190     208     29
NHPRC / PAT                -      -        -       -      -    1,888     521     28
RoadNet                    -      -        -       -      -    2,608     975     30
UCTV                       -      -        -       -      -    7,359       2      5
LOC                        -      -        -       -      -    9,693     256      8
Earth Sci                  -      -        -       -      -    3,794     511      5

TOTAL                  28 TB  6 mil   194 TB  40 mil  4,635   961 TB 153 mil  5,516

Page 8: Data Management Challenges

• Authenticity
    • Manage descriptive metadata for each file
    • Manage access controls
    • Manage consistent updates to administrative metadata
• Integrity
    • Manage checksums
    • Replicate files
    • Synchronize replicas
    • Federate data grids
• Infrastructure independence
    • Manage collection properties
    • Manage interactions with storage systems
    • Manage distributed data
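The integrity bullets above (checksums, replication, replica synchronization) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the slides do not name a checksum algorithm (SHA-256 is used), the function names are hypothetical, and an in-memory dictionary stands in for real storage systems.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_bad_replicas(registered: str, replicas: dict[str, bytes]) -> list[str]:
    """Return the storage locations whose copy no longer matches the registered checksum."""
    return sorted(name for name, data in replicas.items() if checksum(data) != registered)


def synchronize(registered: str, replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Overwrite every replica with a copy that still verifies against the checksum."""
    good = next((d for d in replicas.values() if checksum(d) == registered), None)
    if good is None:
        raise ValueError("no intact replica left to repair from")
    return {name: good for name in replicas}


# Hypothetical data: one intact replica on disk, one damaged replica on tape.
original = b"record contents"
registered = checksum(original)
replicas = {"disk": original, "tape": b"record c0ntents"}

bad = find_bad_replicas(registered, replicas)
repaired = synchronize(registered, replicas)
```

The checksum is recorded once, when the file is registered, and each later validation compares every replica against that stored value rather than against another replica.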

Page 9: Generic Infrastructure

• Data grids manage data distributed across multiple types of storage systems
    • File systems, tape archives, object ring buffers
• Data grids manage collection attributes
    • Provenance, descriptive, system metadata
• Data grids manage technology evolution
    • At the point in time when new technology is available, both the old and new systems can be integrated

Page 10: Data Grids

• SRB - Storage Resource Broker
    • Persistent naming of distributed data
    • Management of data stored in multiple types of storage systems
    • Organization of data as a shared collection with descriptive metadata, access controls, audit trails
• iRODS - integrated Rule-Oriented Data System
    • Rules control execution of remote micro-services
    • Manage persistent state information
    • Validate assertions about collection
    • Automate execution of management policies
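Persistent naming, as listed for SRB above, amounts to keeping a logical name stable while the physical copies behind it move between storage systems. The sketch below is a toy illustration of that idea only, not the SRB catalog: the `Catalog` class, the resource names, and all paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy catalog: one persistent logical name maps to many physical replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name: str, resource: str, physical_path: str) -> None:
        """Record a physical copy of a logical file on a named storage resource."""
        self.entries.setdefault(logical_name, []).append((resource, physical_path))

    def migrate(self, logical_name: str, old_resource: str,
                new_resource: str, new_path: str) -> None:
        """Point the replica held on old_resource at its new home; the logical name is unchanged."""
        self.entries[logical_name] = [
            (new_resource, new_path) if resource == old_resource else (resource, path)
            for resource, path in self.entries[logical_name]
        ]

    def resolve(self, logical_name: str) -> list:
        """Return the current (resource, physical path) pairs for a logical name."""
        return list(self.entries[logical_name])


# Hypothetical names and paths, for illustration only.
catalog = Catalog()
name = "/zone/home/archive/report.pdf"
catalog.register(name, "old-tape", "/tape01/a93f.dat")
catalog.register(name, "disk", "/vault/report.pdf")
catalog.migrate(name, "old-tape", "new-tape", "/tape77/0042.dat")
locations = catalog.resolve(name)
```

Users and applications keep referring to the logical name throughout; only the catalog entry changes when a storage system is retired.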

Page 11: Preservation Management

Data Management Environment     Conserved Properties   Control Mechanisms    Remote Operations
Management Functions            Assessment Criteria    Management Policies   Capabilities
    Data grid - Management virtualization
Data Management Infrastructure  Persistent State       Rules                 Micro-services
    Data grid - Data and trust virtualization
Physical Infrastructure         Database               Rule Engine           Storage System

iRODS - integrated Rule-Oriented Data System

Page 12: Rule-based Data Management

• Map from management policies to rules controlling execution of remote micro-services
• Manage persistent state information for results of each micro-service execution
• Support an additional three logical name spaces
    • Rules
    • Micro-services
    • Persistent state information
• Constitutes representation information for preservation environments
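The mapping described above (policy to rule, rule to micro-services, each result saved as persistent state) can be sketched as three small name spaces. This is an illustrative toy, not the iRODS rule engine: the rule name, the micro-service names, and the state layout are all hypothetical, and local functions stand in for remote operations.

```python
import hashlib


def msi_checksum(obj: dict) -> str:
    """Hypothetical micro-service: checksum the object's data."""
    return hashlib.sha256(obj["data"]).hexdigest()


def msi_replica_count(obj: dict) -> int:
    """Hypothetical micro-service: count the object's replicas."""
    return len(obj["replicas"])


# Name space 1: micro-services, addressed by logical name.
MICRO_SERVICES = {"checksum": msi_checksum, "replica_count": msi_replica_count}

# Name space 2: rules, each naming the micro-services that implement a policy.
RULES = {"integrity_policy": ["checksum", "replica_count"]}


def apply_rule(rule_name: str, obj: dict, state: dict) -> dict:
    """Run every micro-service of a rule and record each result as persistent state."""
    for ms_name in RULES[rule_name]:
        # Name space 3: persistent state, keyed by (object name, micro-service name).
        state[(obj["name"], ms_name)] = MICRO_SERVICES[ms_name](obj)
    return state


record = {"name": "a.txt", "data": b"abc", "replicas": ["disk", "tape"]}
state = apply_rule("integrity_policy", record, {})
```

Because rules and micro-services are looked up by name, a policy can be changed by editing the `RULES` table without touching the code that executes it.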

Page 13: Example Rules

• Rule composed of four parts:
    • Name | condition | micro-service set | recovery
• Rule to automate replication of data for a specific collection

acPostProcForPut |$objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | nop

• Rule types
    • Internal, administrative, user-defined
    • Atomic, deferred, periodic
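The four-part layout of the example rule can be made concrete with a short Python sketch that splits the rule text on "|" and tests its condition against an object path. It is not an iRODS parser: the `Rule` class and both functions are hypothetical, and treating `like` as a shell-style wildcard match is an assumption rather than something the slide states.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    condition: str
    micro_services: str
    recovery: str


def parse_rule(text: str) -> Rule:
    """Split 'name | condition | micro-service set | recovery' into its four parts."""
    parts = [part.strip() for part in text.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}")
    return Rule(*parts)


def condition_matches(rule: Rule, obj_path: str) -> bool:
    """Evaluate conditions of the form '$objPath like <pattern>' only.

    Assumption: 'like' behaves as a shell-style wildcard match.
    """
    variable, operator, pattern = rule.condition.split(None, 2)
    if variable != "$objPath" or operator != "like":
        raise NotImplementedError(f"unsupported condition: {rule.condition}")
    return fnmatch.fnmatchcase(obj_path, pattern)


RULE_TEXT = (
    "acPostProcForPut |$objPath like /tempZone/home/rods/nvo/* "
    "| msiSysReplDataObj(nvoReplResc,null) | nop"
)
rule = parse_rule(RULE_TEXT)
in_collection = condition_matches(rule, "/tempZone/home/rods/nvo/image.fits")
elsewhere = condition_matches(rule, "/tempZone/home/rods/other/image.fits")
```

With this rule, a file put into the `nvo` collection satisfies the condition and would trigger the replication micro-service, while a file stored elsewhere would not.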

Page 14: Management Virtualization

• Standard policies expressed as rules
    • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
    • Authenticity
        • Chain of custody - audit trails
        • Required preservation metadata - templates
        • Generation of AIPs, DIPs

Page 15: New Capabilities

• Management capabilities
    • Rules to validate assessment criteria
    • Access controls on rules
    • Time-dependent access controls
    • Access controls on each micro-service
    • Redaction, access controls on structures in a file
    • Rule to parse audit trails, verify consistency of system
• Data grid evolution
    • Dynamic addition of new rules / micro-services / persistent state information
    • Rules to control migration from old management policies to new management policies
• Federation
    • Migration of rules and micro-services with data

Page 16: Federation Between Data Grids

Data Access Methods (Web Browser, DSpace, OAI-PMH)

Data Grid (Data Collection B)
• Logical resource name space
• Logical user name space
• Logical file name space
• Logical rule name space
• Logical micro-service name
• Logical persistent state

Data Grid (Data Collection A)
• Logical resource name space
• Logical user name space
• Logical file name space
• Logical rule name space
• Logical micro-service name
• Logical persistent state

Page 17: Digital Preservation

• Preservation is communication with the future
    • How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
    • SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
• Preservation manages communication from the past
    • What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
    • iRODS - integrated Rule-Oriented Data System

Page 18: For More Information

Reagan W. Moore
San Diego Supercomputer Center

[email protected]

http://www.sdsc.edu/srb/
http://irods.sdsc.edu/

Page 19: National Archives and Records Administration

For Additional Information and Developments
http://irods.sdsc.edu/index.php/Main_Page

Page 20: National Archives and Records Administration

Robert Chadduck
Principal Technologist
Electronic Records Archives Program
The National Archives and Records Administration

telephone: 301-827-1585
robert.chadduck at nara.gov