Building an Extensible File System via Policy-based Data Management
Hao Xu, Jewel H. Ward, Mike Conway, Arcot Rajasekar, Reagan W. Moore (iRODS Consortium, http://irods.org)


Page 1: Hao Xu, Jewel H. Ward, Mike Conway, Arcot Rajasekar, Reagan W ... carlosm/Papers/xu-pfsw14-slides.pdf

 

Building an Extensible File System via Policy-based Data Management

Hao Xu, Jewel H. Ward, Mike Conway, Arcot Rajasekar, Reagan W. Moore

(iRODS Consortium, http://irods.org)

Page 2

File System

• Essential functions:
  - Ingest, store, access

• Modern file systems are built on top of traditional file systems:
  - Google File System, Amazon S3, Hadoop Distributed File System
  - Driven by the needs of a target application
  - Customized toward the target application domain

Page 3

Data Management Needs in Archive and Scientific Communities

• Discoverability
• Complex metadata
• Workflow management
• Data sharing
• Provenance
• Long-term preservation
• Technology migration
• Interoperability between infrastructures

Page 4

Challenges

Can generic infrastructure meet the needs of a diverse set of data management domains?

Page 5

Flexibility to Define a Wide Range of Application Domain Policies

• User community → policies
• File ingest operations:
  - Authentication
  - Authorization
  - Storage quota
  - Aggregation
  - Resource selection
  - Replication
  - File retention
  - Metadata

Page 6

Infrastructure Support For Non-standard Application Domain Operations

• Standard file system operations have robust support:
  - Metadata
  - Auditing
  - Access control lists

• Non-standard operations that are implemented as a library do not have direct support from the file system. Examples:
  - Preservation: OAIS SIP, AIP, and DIP packages
  - Digital library: provenance and discovery metadata
  - Processing pipeline: format transformation

Page 7

Interoperability with Other Infrastructures

• Emergent scalability mechanisms:
  - Organization change
    - List → Tree → Graph (Internet) → Search
  - Data structure change
    - Files, tables, streams
  - Property enforcement expectations
    - Reproducible data-driven research

• Separation of how files are stored, accessed, and manipulated

Page 8

Policy-based Data Management

Page 9

Policy = Metadata + Procedure

• Purpose: reason a collection is assembled
• Properties: attributes needed to ensure the purpose
• Policies: controls for enforcing desired properties
  - Procedural policy. Example: when an object is ingested, run a workflow
  - Assertional policy. Example: a file has three or more replicas

• Metadata: persistent state
  - State information (consistency in a distributed environment)
  - Generated through application of procedures

• Procedures: operations performed within the system
  - What to run: functions that implement the policies
  - How to verify: validation that metadata conforms to the desired purpose
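As a rough illustration of "Policy = Metadata + Procedure", the Python sketch below models a policy as two callables: a procedure ("what to run") that generates persistent metadata, and a validator ("how to verify") that checks that metadata against the object. Everything here is hypothetical: `Policy`, `checksum_procedure`, `checksum_verify`, the dictionary used as the metadata store, and the choice of SHA-256. It is not the iRODS API.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict

# Persistent state: object path -> metadata attributes (hypothetical layout).
Metadata = Dict[str, Dict[str, str]]


@dataclass
class Policy:
    name: str
    procedure: Callable[[str, bytes, Metadata], None]  # what to run
    verify: Callable[[str, bytes, Metadata], bool]     # how to verify


def checksum_procedure(path: str, content: bytes, meta: Metadata) -> None:
    """Procedural side: on ingest, generate metadata (a checksum) for the object."""
    meta.setdefault(path, {})["checksum"] = hashlib.sha256(content).hexdigest()


def checksum_verify(path: str, content: bytes, meta: Metadata) -> bool:
    """Assertional side: the recorded checksum must match the current content."""
    recorded = meta.get(path, {}).get("checksum")
    return recorded == hashlib.sha256(content).hexdigest()


checksum_policy = Policy("checksum", checksum_procedure, checksum_verify)

meta: Metadata = {}
checksum_policy.procedure("/zone/home/alice/a.txt", b"hello", meta)
print(checksum_policy.verify("/zone/home/alice/a.txt", b"hello", meta))     # True
print(checksum_policy.verify("/zone/home/alice/a.txt", b"tampered", meta))  # False
```

The metadata is only ever produced by the procedure, and the validator never writes, which keeps "what to run" and "how to verify" separable.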

Page 10

Policy-based Data Management

[Concept graph: Collection Purpose, Property, Policy, Procedure, Metadata, and Periodic Assessment Criteria (a SubType of Policy), linked by Defines, Controls, and Updates]

Page 11

Policy-based Data Management - Collection

[Concept graph from the previous slide, extended with Attribute and Digital Object nodes and Has, Isa, and Updates links]

Page 12

Policy-based Data Management – Collection Properties

[Concept graph extended with Completeness, Correctness, Consensus, and Consistency (HasFeature links) and with Integrity, Authenticity, and Access control (Isa links)]

Page 13

Policy-based Data Management – Collection Policies

[Concept graph extended with Replication Policy, Checksum Policy, Quota Policy, and Data Type Policy (Isa links)]

Page 14

Policy-based Data Management – Collection Procedures

[Concept graph extended with Workflow, Function, and Operation (SubType, Chains, and Isa links) and the procedures GetUserACL, SetDataType, SetQuota, DataObjRepl, and SysChksumDataObj (Isa links)]

Page 15

Policy-based Data Management – Persistent State

[Concept graph extended with the persistent state attributes DATA_ID, DATA_REPL_NUM, and DATA_CHECKSUM (Isa links)]

Page 16

Policy-based Data Management – Policy Enforcement

[Concept graph extended with Client Action and Policy Enforcement Point (Has and Invokes links)]

Page 17

Example of Policy-based Data Management

Page 18

Policy-based Infrastructure: integrated Rule-Oriented Data System (iRODS)

• Biology
  - Cognitive science: Temporal Dynamics of Learning Center
  - Human genome: Broad Institute, Wellcome Trust Sanger Institute, NGS
  - Medicine: Sick Kids Hospital
  - Neuroscience: International Neuroinformatics Coordinating Facility
  - Plant genome: the iPlant Collaborative
  - Phylogenetics: Phylogenetics at CC IN2P3

• Computer Science
  - Network research: GENI experimental network

• Earth Sciences
  - Atmospheric science: NASA Langley Atmospheric Sciences Center
  - Climate: NOAA National Climatic Data Center; NASA Center for Climate Simulations
  - Ecology: CEED Caveat Emptor Ecological Data
  - Hydrology: Institute for the Environment, UNC-CH; Hydroshare
  - Oceanography: Ocean Observatories Initiative
  - Seismology: Southern California Earthquake Center

• Engineering
  - Education repository: CIBER-U

• Physics
  - Astrophysics: Auger supernova search
  - Cosmic ray: AMS experiment on the International Space Station
  - Dark matter physics: Edelweiss II
  - High energy physics: BaBar / Stanford Linear Accelerator
  - Neutrino physics: T2K and dChooz neutrino experiments
  - Optical astronomy: National Optical Astronomy Observatory
  - Particle physics: Indra multi-detector collaboration at IN2P3
  - Quantum chromodynamics: IN2P3
  - Radio astronomy: Cyber Square Kilometer Array, TREND, BAOradio

• Social Science
  - Odum, TerraPop

Page 19

Policy Applications

• Pre-process policy
  - Applied before an operation is done

• Operation
  - May be policy controlled

• Post-process policy
  - Applied after the operation is done

• Are these sufficient to handle the wide diversity of data management applications?

• Does this minimize the number of required operations?
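A minimal Python sketch of the pre-process / operation / post-process pattern, assuming policies are plain functions wrapped around an operation. The quota check, the in-memory `store`, `QUOTA_BYTES`, and the `audit_log` are hypothetical examples, not iRODS policy enforcement points.

```python
from typing import Any, Callable, Dict, List


def enforce(pre: Callable[..., None],
            op: Callable[..., Any],
            post: Callable[[Any], Any]) -> Callable[..., Any]:
    """Wrap an operation with a pre-process and a post-process policy."""
    def wrapped(*args: Any) -> Any:
        pre(*args)           # pre-process policy: may raise to deny the operation
        result = op(*args)   # the operation itself
        return post(result)  # post-process policy: runs on the operation's result
    return wrapped


# Hypothetical in-memory store, audit log, and quota, for illustration only.
store: Dict[str, bytes] = {}
audit_log: List[str] = []
QUOTA_BYTES = 10


def quota_check(path: str, data: bytes) -> None:
    used = sum(len(v) for v in store.values())  # counts every stored byte
    if used + len(data) > QUOTA_BYTES:
        raise PermissionError(f"quota exceeded for {path}")


def put(path: str, data: bytes) -> str:
    store[path] = data
    return path


def audit(path: str) -> str:
    audit_log.append(f"put {path}")
    return path


policy_put = enforce(quota_check, put, audit)
policy_put("/zone/a.txt", b"12345")  # allowed: 5 of 10 bytes used afterwards
```

Because the pre-process policy runs first, a put that would exceed `QUOTA_BYTES` raises `PermissionError` before the store or the audit log changes.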

Page 20

Policy (Workflow) in Hydrology

[Diagram boxes: Choose gauge or outlet (HIS); Extract drainage area (NHDPlus); Digital Elevation Model (DEM); Slope, Aspect; Streams (NHD), Roads (DOT); Strata; Basin, Hillslope, Patch, Stream network; Nested watershed structure; Land Use, Leaf Area Index, Phenology, Soil Data; data sources NLCD (EPA), Landsat TM, MODIS, USDA; Soil and vegetation parameter files; Worldfile and Flowtable feeding RHESSys]

RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework and the full initial system state.

For each box, create a micro-service to automate the task, and chain the micro-services into a workflow.
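Chaining micro-services into a workflow can be sketched in Python, assuming each micro-service is a function from a context dictionary to a new context; `chain` composes them left to right. The step names are hypothetical placeholders for boxes in the diagram and perform no hydrology.

```python
from functools import reduce
from typing import Any, Callable, Dict

Context = Dict[str, Any]
MicroService = Callable[[Context], Context]


def chain(*steps: MicroService) -> MicroService:
    """Compose micro-services left to right: the output of one feeds the next."""
    def workflow(ctx: Context) -> Context:
        return reduce(lambda acc, step: step(acc), steps, ctx)
    return workflow


def placeholder(name: str) -> MicroService:
    """Hypothetical stand-in for one box of the diagram; it only records that it ran."""
    def run(ctx: Context) -> Context:
        return {**ctx, "trace": ctx.get("trace", []) + [name]}
    return run


rhessys_workflow = chain(
    placeholder("choose_gauge_or_outlet"),
    placeholder("extract_drainage_area"),
    placeholder("derive_slope_aspect"),
    placeholder("build_worldfile"),
)

result = rhessys_workflow({"gauge": "example-gauge"})
print(result["trace"])
```

Each step returns a new dictionary instead of mutating its input, so a failed step leaves the earlier context intact for inspection or retry.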

Page 21

Policies in Software Defined Networking: control selection of network paths

[Diagram components: Rule Engine, Data Policies, Network Policies, GraphDB, OF Controller, three iRODS Servers, iCAT]

Page 22

Policy in Data Storage Aggregation / Caching / Replication

Queen Mary University of London

Source: Di Lodovico et al.

Page 23

Indexing Policies

[Diagram components: iRODS Data and Metadata; DataBook Rules; Message Passing (AMQP); Indexing Framework and Indexing Service (OSGi); Indexer; External Index (Text, Metadata, Events); VIVO; Search UI]

Page 24

Policies in Digital Libraries

• SILS LifeTime Library
  - Student collections range from 2 GB to 150 GB
  - Number of files ranges from 2,000 to 12,000

• Library management policies
  - Replication, checksums, versioning, strict access controls, quotas, metadata catalog replication, installation environment archiving

• Ingestion policies
  - Automated synchronization of the student directory with the LifeTime Library
  - Automated loading of MP3 metadata

Page 25

Policies in Archives

Page 26

Formal Aspects of Policy-based Data Management

Page 27

Domain Model

• Entities
  - Data Object, Replica, Collection, User, Resource, Rules, Metadata, Access

• Relations
  - (Collection) contains (Data Object); (Resource) stores (Replica); (Replica) replicates (Data Object); (User) owns (Data Object); (User) is granted (Access); (Access) is granted on (Data Object)

• Operations
  - Get, put, replicate, etc.
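One way to make the domain model concrete is a set of Python dataclasses whose fields carry the relations. This is a sketch under simplifying assumptions (the Rules and Metadata entities are omitted, and access is a plain user-to-permissions map); the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Resource:
    name: str


@dataclass
class Replica:
    resource: Resource  # (Resource) stores (Replica)
    content: bytes


@dataclass
class DataObject:
    path: str
    owner: str                                                 # (User) owns (Data Object)
    replicas: List[Replica] = field(default_factory=list)      # (Replica) replicates (Data Object)
    access: Dict[str, Set[str]] = field(default_factory=dict)  # (Access) granted to a (User) on it


@dataclass
class Collection:
    path: str
    objects: Dict[str, DataObject] = field(default_factory=dict)  # (Collection) contains (Data Object)

    def put(self, user: str, name: str, content: bytes, resource: Resource) -> DataObject:
        obj = DataObject(f"{self.path}/{name}", owner=user,
                         replicas=[Replica(resource, content)],
                         access={user: {"own"}})
        self.objects[name] = obj
        return obj

    def get(self, user: str, name: str) -> bytes:
        obj = self.objects[name]
        if not obj.access.get(user):
            raise PermissionError(f"{user} has no access to {obj.path}")
        return obj.replicas[0].content

    def replicate(self, name: str, resource: Resource) -> None:
        obj = self.objects[name]
        obj.replicas.append(Replica(resource, obj.replicas[0].content))
```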

Page 28

Policy

• A policy is implemented as a set of procedures defined in terms of the domain model
  - Assertion about state: "A file has three or more replicas"
    - A procedure to maintain state consistency: the replication rule acPostProcForPut
    - A procedure to check state consistency, since hardware and human errors can break it: a periodic integrity check
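The replica-count assertion can be sketched as two Python procedures over a shared catalog: one run right after a put to establish the assertion (the role acPostProcForPut plays above), and one run periodically to detect and repair violations. The function names, the resource list, and the catalog layout are hypothetical, and appending a resource name only stands in for actually copying data.

```python
from typing import Dict, List, Tuple

MIN_REPLICAS = 3                                   # the assertion: three or more replicas
RESOURCES = ["disk1", "disk2", "tape1", "cloud1"]  # hypothetical resource names

# Persistent state: object path -> resources holding a good replica.
catalog: Dict[str, List[str]] = {}


def _top_up(holders: List[str]) -> None:
    """Add resources until the object has MIN_REPLICAS replicas (or none are left)."""
    for res in RESOURCES:
        if len(holders) >= MIN_REPLICAS:
            return
        if res not in holders:
            holders.append(res)  # stands in for copying data from a surviving replica


def post_proc_for_put(path: str, first_resource: str) -> None:
    """Maintain the assertion: runs after each put, like a post-process rule."""
    catalog[path] = [first_resource]
    _top_up(catalog[path])


def periodic_integrity_check() -> Tuple[List[str], List[str]]:
    """Check the assertion later, since replicas can be lost to hardware or human
    error. Returns (repaired paths, paths with no surviving replica)."""
    repaired, lost = [], []
    for path, holders in catalog.items():
        if not holders:
            lost.append(path)  # nothing left to copy from
        elif len(holders) < MIN_REPLICAS:
            _top_up(holders)
            repaired.append(path)
    return repaired, lost
```

The check must distinguish "below threshold" from "all replicas gone": only the first can be repaired by replication.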

Page 29

Example of Formalism Using Monad

• Monad recap:
  - A monad represents computations, possibly with side effects (in our example, assume the only side effect is a state change)

• Monad constructors:
  - return: trivial computation that returns a value
  - x >> y: do x, then y
  - x >>= y: feed the return value of x into y

• Monad laws:
  - return x >>= f = f x (left identity)
  - m >>= return = m (right identity)
  - (m >>= f) >>= g = m >>= (λx. f x >>= g) (associativity)
    - (A B) C => A (B C)
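To make the monad recap concrete, here is a small state monad in Python, assuming a computation is a function from a state to a (value, new state) pair. `ret`, `bind`, and `then` play the roles of return, >>=, and >>; `tick`, `mix`, and `double` are hypothetical example computations. Checking the laws on a few initial states is a test, not a proof.

```python
from typing import Any, Callable, Tuple

State = int                               # toy state: an integer counter
M = Callable[[State], Tuple[Any, State]]  # a computation: state -> (value, new state)


def ret(x: Any) -> M:
    """return: the trivial computation that yields x and leaves the state unchanged."""
    return lambda s: (x, s)


def bind(m: M, f: Callable[[Any], M]) -> M:
    """m >>= f: run m, feed its value to f, then run the computation f returns."""
    def run(s: State) -> Tuple[Any, State]:
        x, s1 = m(s)
        return f(x)(s1)
    return run


def then(m: M, k: M) -> M:
    """m >> k: run m, discard its value, then run k."""
    return bind(m, lambda _: k)


def tick(s: State) -> Tuple[Any, State]:
    """Yield the current counter and increment it."""
    return s, s + 1


def mix(n: Any) -> M:
    """A value-dependent computation: yield n + s and double the state."""
    return lambda s: (n + s, s * 2)


def double(n: Any) -> M:
    return ret(2 * n)


def same(m1: M, m2: M) -> bool:
    """Compare two computations on a few initial states."""
    return all(m1(s) == m2(s) for s in range(5))


left_id = same(bind(ret(3), mix), mix(3))
right_id = same(bind(tick, ret), tick)
assoc = same(bind(bind(tick, mix), double),
             bind(tick, lambda x: bind(mix(x), double)))
print(left_id, right_id, assoc)  # True True True
```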

Page 30

Domain Model

• Entities:
  - DataObject, Content, Replica, Resource

• Relations:
  - replica: r = replica(o, i), where r is the replica of o at resource i
  - replicas: r ∈ replicas(o), where r is a replica of o

Page 31

Domain Model

• Basic operations:
  - read r: read the content of replica r
  - write c r: write content c to replica r
  - aread i: read the ith latest audit log entry
  - awrite s r: append (s, r) to the audit log
  - repl o: replicate o to all resources
  - newest o: the newest replica of object o
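The basic operations can be sketched as state-monad actions in Python. Assumptions: the state is a dictionary holding replica contents, a logical clock with per-replica timestamps (used to define `newest`), and the audit log; replicas are (object, resource) pairs; `aread(0)` denotes the latest entry. The block repeats the small `ret`/`bind`/`then` helpers so it runs on its own.

```python
from typing import Any, Callable, Dict, Tuple

Replica = Tuple[str, str]                 # replica(o, i) = (object o, resource i)
State = Dict[str, Any]                    # resources, data, stamp, clock, log
M = Callable[[State], Tuple[Any, State]]


def ret(x: Any) -> M:
    return lambda s: (x, s)


def bind(m: M, f: Callable[[Any], M]) -> M:
    def run(s: State) -> Tuple[Any, State]:
        x, s1 = m(s)
        return f(x)(s1)
    return run


def then(m: M, k: M) -> M:
    return bind(m, lambda _: k)


def replica(o: str, i: str) -> Replica:
    return (o, i)


def replicas(o: str) -> M:
    """All replicas of o currently present."""
    return lambda s: ({r for r in s["data"] if r[0] == o}, s)


def read(r: Replica) -> M:
    return lambda s: (s["data"][r], s)


def write(c: Any, r: Replica) -> M:
    def run(s: State) -> Tuple[Any, State]:
        t = s["clock"] + 1  # logical clock used by `newest`
        return None, {**s, "clock": t,
                      "data": {**s["data"], r: c},
                      "stamp": {**s["stamp"], r: t}}
    return run


def aread(i: int) -> M:
    """Assumption: i = 0 is the latest entry, i = 1 the one before it, and so on."""
    return lambda s: (s["log"][-1 - i], s)


def awrite(msg: str, r: Any) -> M:
    return lambda s: (None, {**s, "log": s["log"] + [(msg, r)]})


def newest(o: str) -> M:
    """Assumes o has at least one replica."""
    def run(s: State) -> Tuple[Any, State]:
        mine = [r for r in s["stamp"] if r[0] == o]
        return max(mine, key=lambda r: s["stamp"][r]), s
    return run


def repl(o: str) -> M:
    """Copy the newest replica of o to every resource."""
    def copy_from(src: Replica) -> M:
        def run(s: State) -> Tuple[Any, State]:
            targets = [replica(o, i) for i in s["resources"]]
            c, t = s["data"][src], s["stamp"][src]
            return None, {**s,
                          "data": {**s["data"], **{r: c for r in targets}},
                          "stamp": {**s["stamp"], **{r: t for r in targets}}}
        return run
    return bind(newest(o), copy_from)


def empty_state(*resources: str) -> State:
    return {"resources": list(resources), "data": {}, "stamp": {}, "clock": 0, "log": []}
```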

Page 32

Complex Operations and Policy Enforcement Points

• Complex operations:
  - oread o: read the content of object o
  - owrite c o: write content c to object o

• Defined in terms of basic operations + PEPs (policy enforcement points):
  - op args = pre args >>= op' args >>= post args

• We define oread and owrite:
  - oread o = pre o >>= read >>= post o
  - owrite c o = pre c o >>= write c >>= post c o
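A Python sketch of how oread and owrite can be assembled from PEP hooks, assuming the same state-monad encoding (a computation maps a state to a (value, new state) pair). `make_ops` and the example hook configuration are hypothetical; the upper-casing post-read hook merely stands in for a policy such as a format transformation.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]                    # {"data": {(object, resource): content}}
M = Callable[[State], Tuple[Any, State]]
Kleisli = Callable[[Any], M]              # a -> M b, the right operand of >>=


def ret(x: Any) -> M:
    return lambda s: (x, s)


def bind(m: M, f: Kleisli) -> M:
    def run(s: State) -> Tuple[Any, State]:
        x, s1 = m(s)
        return f(x)(s1)
    return run


def read(r: Any) -> M:
    return lambda s: (s["data"][r], s)


def write(c: Any) -> Kleisli:
    """Curried so that `write c` can receive the replica chosen by pre."""
    return lambda r: (lambda s: (None, {**s, "data": {**s["data"], r: c}}))


def make_ops(pre_read: Callable[[Any], M], post_read: Callable[[Any], Kleisli],
             pre_write: Callable[[Any, Any], M], post_write: Callable[[Any, Any], Kleisli]):
    """Build the complex operations from PEP hooks:
       oread o    = pre o >>= read >>= post o
       owrite c o = pre c o >>= write c >>= post c o"""
    def oread(o: Any) -> M:
        return bind(bind(pre_read(o), read), post_read(o))

    def owrite(c: Any, o: Any) -> M:
        return bind(bind(pre_write(c, o), write(c)), post_write(c, o))

    return oread, owrite


# Hypothetical PEP configuration: one resource "disk", plus a post-read policy
# that transforms content on the way out.
oread, owrite = make_ops(
    pre_read=lambda o: ret((o, "disk")),
    post_read=lambda o: (lambda content: ret(content.upper())),
    pre_write=lambda c, o: ret((o, "disk")),
    post_write=lambda c, o: ret,
)

_, s1 = owrite("hello", "o1")({"data": {}})
value, _ = oread("o1")(s1)
print(value)  # HELLO
```

Changing only the four hooks changes what oread and owrite do, while read and write stay untouched.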

Page 33

Basic Semantics

• Only one resource i
  - oread
    - pre = return (replica o i): read the replica of object o
    - post = return: return the content of the replica
  - owrite
    - pre = return (replica o i): write the replica of object o
    - post = return: simply return
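Under the basic semantics the PEPs add nothing: by the left- and right-identity laws, oread o reduces to read (replica o i), and owrite c o to write c (replica o i). A self-contained Python sketch of this case, using the same state-monad encoding and a hypothetical resource name for i:

```python
from typing import Any, Callable, Dict, Tuple

Replica = Tuple[str, str]
State = Dict[Replica, Any]                # (object, resource) -> content
M = Callable[[State], Tuple[Any, State]]

RESOURCE_I = "disk"                       # the single resource i (hypothetical name)


def ret(x: Any) -> M:
    return lambda s: (x, s)


def bind(m: M, f: Callable[[Any], M]) -> M:
    def run(s: State) -> Tuple[Any, State]:
        x, s1 = m(s)
        return f(x)(s1)
    return run


def replica(o: str, i: str) -> Replica:
    return (o, i)


def read(r: Replica) -> M:
    return lambda s: (s[r], s)


def write(c: Any) -> Callable[[Replica], M]:
    return lambda r: (lambda s: (None, {**s, r: c}))


def oread(o: str) -> M:
    # pre = return (replica o i); post = return
    return bind(bind(ret(replica(o, RESOURCE_I)), read), ret)


def owrite(c: Any, o: str) -> M:
    # pre = return (replica o i); post = return
    return bind(bind(ret(replica(o, RESOURCE_I)), write(c)), ret)


_, s1 = owrite("v1", "o")({})
print(oread("o")(s1))  # ('v1', {('o', 'disk'): 'v1'})
```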

Page 34

Auditing

• One resource i + audit log
  - oread
    - pre = awrite "read" o >> return (replica o i): audit, then read the replica of object o
    - post = return: return the content of the replica
  - owrite
    - pre = awrite "write" o >> return (replica o i): audit, then write the replica of object o
    - post = return: simply return
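The auditing semantics change only pre. A self-contained Python sketch, assuming the state holds both replica contents and an audit log, and that awrite records a (message, object) pair; the resource name is hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]                    # {"data": {replica: content}, "log": [(msg, obj)]}
M = Callable[[State], Tuple[Any, State]]

RESOURCE_I = "disk"                       # the single resource i (hypothetical name)


def ret(x: Any) -> M:
    return lambda s: (x, s)


def bind(m: M, f: Callable[[Any], M]) -> M:
    def run(s: State) -> Tuple[Any, State]:
        x, s1 = m(s)
        return f(x)(s1)
    return run


def then(m: M, k: M) -> M:
    return bind(m, lambda _: k)


def replica(o: str, i: str) -> Tuple[str, str]:
    return (o, i)


def read(r: Tuple[str, str]) -> M:
    return lambda s: (s["data"][r], s)


def write(c: Any) -> Callable[[Tuple[str, str]], M]:
    return lambda r: (lambda s: (None, {**s, "data": {**s["data"], r: c}}))


def awrite(msg: str, o: str) -> M:
    return lambda s: (None, {**s, "log": s["log"] + [(msg, o)]})


def oread(o: str) -> M:
    # pre = awrite "read" o >> return (replica o i); post = return
    pre = then(awrite("read", o), ret(replica(o, RESOURCE_I)))
    return bind(bind(pre, read), ret)


def owrite(c: Any, o: str) -> M:
    # pre = awrite "write" o >> return (replica o i); post = return
    pre = then(awrite("write", o), ret(replica(o, RESOURCE_I)))
    return bind(bind(pre, write(c)), ret)


s0: State = {"data": {}, "log": []}
_, s1 = owrite("v1", "o")(s0)
value, s2 = oread("o")(s1)
print(value, s2["log"])  # v1 [('write', 'o'), ('read', 'o')]
```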

Page 35

Replication

• Multiple resources
  - oread
    - pre = return (replica o i): read an arbitrary replica i of object o
    - post = return: return the content of the replica
  - owrite
    - pre = return (replica o i'): write an arbitrary replica i' of object o
    - post = λx.(repl o >> return x): replicate, then return
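A self-contained Python sketch of the replication semantics. Assumptions: a hypothetical fixed list of resources, a logical clock so that repl can copy from the newest replica, and an explicit resource argument to oread and owrite standing in for the "arbitrary" choice of i and i'.

```python
from typing import Any, Callable, Dict, Tuple

Replica = Tuple[str, str]
State = Dict[str, Any]                    # {"data": {...}, "stamp": {...}, "clock": int}
M = Callable[[State], Tuple[Any, State]]

RESOURCES = ["disk1", "disk2", "tape1"]   # hypothetical resource names


def ret(x: Any) -> M:
    return lambda s: (x, s)


def bind(m: M, f: Callable[[Any], M]) -> M:
    def run(s: State) -> Tuple[Any, State]:
        x, s1 = m(s)
        return f(x)(s1)
    return run


def then(m: M, k: M) -> M:
    return bind(m, lambda _: k)


def replica(o: str, i: str) -> Replica:
    return (o, i)


def read(r: Replica) -> M:
    return lambda s: (s["data"][r], s)


def write(c: Any) -> Callable[[Replica], M]:
    def at(r: Replica) -> M:
        def run(s: State) -> Tuple[Any, State]:
            t = s["clock"] + 1  # logical clock so that `newest` can find this write
            return None, {**s, "clock": t,
                          "data": {**s["data"], r: c},
                          "stamp": {**s["stamp"], r: t}}
        return run
    return at


def newest(o: str) -> M:
    def run(s: State) -> Tuple[Any, State]:
        mine = [r for r in s["stamp"] if r[0] == o]
        return max(mine, key=lambda r: s["stamp"][r]), s
    return run


def repl(o: str) -> M:
    """Copy the newest replica of o to all resources."""
    def copy_from(src: Replica) -> M:
        def run(s: State) -> Tuple[Any, State]:
            c, t = s["data"][src], s["stamp"][src]
            targets = [replica(o, i) for i in RESOURCES]
            return None, {**s,
                          "data": {**s["data"], **{r: c for r in targets}},
                          "stamp": {**s["stamp"], **{r: t for r in targets}}}
        return run
    return bind(newest(o), copy_from)


def oread(o: str, i: str) -> M:
    # pre = return (replica o i) for an arbitrary i; post = return
    return bind(bind(ret(replica(o, i)), read), ret)


def owrite(c: Any, o: str, i_prime: str) -> M:
    # pre = return (replica o i'); post = λx.(repl o >> return x)
    def post(x: Any) -> M:
        return then(repl(o), ret(x))
    return bind(bind(ret(replica(o, i_prime)), write(c)), post)


s0: State = {"data": {}, "stamp": {}, "clock": 0}
_, s1 = owrite("v1", "o", "disk2")(s0)
print([oread("o", i)(s1)[0] for i in RESOURCES])  # ['v1', 'v1', 'v1']
```

Because the post-process policy replicates after every write, any choice of i in oread returns the latest content.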

Page 36

Policy-based Data Management Concept Graph

[Concept graph annotated with counts: Collection Purpose (5 main types), with SubTypes Archive, Data grid, Collection, Digital Library, and Processing Pipeline; Property (7 default); Policy (11 default); Procedure (11 default); Clients (50); Policy Enforcement Points (72); Micro-service (350); Persistent State Information (338). Example procedures appear as the micro-services msiGetUserACL, msiSetDataType, msiSetQuota, msiDataObjRepl, and msiSysChksumDataObj; example persistent state as DATA_ID, DATA_REPL_NUM, and DATA_CHECKSUM. Other nodes and links are as on the earlier concept graph slides]

Page 37

iRODS Distributed Data Management

Page 38

iRODS data grid

• Integrated Rule-Oriented Data System
• Open source software: http://irods.org
• Supported by the iRODS Consortium