distributed storage and data management in petashare for...

Post on 14-Mar-2021


Ismail Akturk, Mehmet Balman, Xinqi Wang and Tevfik Kosar

Center for Computation and Technology at Louisiana State University, Baton Rouge, LA, 70803

[Figure: PetaShare architecture diagram. Labels: iCOMMANDS, FUSE, PARROT, LONI, asynchronous replication module, Disk Storage, metadata, iRODS SERVER, iCAT SERVER.]

PETAFS

    local $ petafs -m lsu
    local $ ls ~/petashare
    lsu
    local $ cd ~/petashare/lsu/tempZone/home/team1
    local $ cp /tmp/srcFile ./shareFile
    local $ ls
    shareFile
    local $ petafs -u lsu

PETASHELL

    local $ petashell
    pshell ~$ cd /petashare/uno/tempZone/home/team1
    pshell ~$ ls
    shareFile
    pshell ~$ vi shareFile
    "Hello PetaShare"
    pshell ~$ exit
    local $

PCOMMANDS

    local $ ppwd
    /tempZone/home/team1
    local $ pls
    C- /tempZone/home/team1 shareFile
    local $ pget shareFile ~/localFile
    local $ cat ~/localFile
    "Hello PetaShare"
    local $

Distributed Storage and Data Management in PetaShare for Collaborative Research

PetaShare supports the native iRODS metadata system for fast access to the data archive, and semantically enabled cross-domain metadata for an integrated view over archives spanning multiple disciplines.

PetaShare provides lightweight client tools based on FUSE, Parrot, and icommands technologies. They enable easy, transparent, and scalable user-level access to data stored in distributed resources. These are:

▪ Petafs (Virtual File System)
▪ Petashell (Shell Interface)
▪ Pcommands (Customized Commands)
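The sample sessions on this poster reach the same file through different path forms: Pcommands use the iRODS logical path (/tempZone/home/team1/shareFile), Petafs exposes it under a per-site mount point (~/petashare/lsu/...), and Petashell exposes it under /petashare/<site>/.... The Python sketch below illustrates that mapping. The function names and the default home directory are hypothetical, chosen only for illustration; they are not part of any PetaShare tool.

```python
from pathlib import PurePosixPath


def petafs_path(site, logical, home="/home/user"):
    """Path of a logical iRODS object under a Petafs mount (~/petashare/<site>/...).

    `home` is a placeholder for the user's home directory (assumption).
    """
    return str(PurePosixPath(home) / "petashare" / site / logical.lstrip("/"))


def petashell_path(site, logical):
    """Path of the same object as seen inside a Petashell session."""
    return str(PurePosixPath("/petashare") / site / logical.lstrip("/"))


# Logical path as printed by ppwd/pls in the Pcommands session.
logical = "/tempZone/home/team1/shareFile"

print(petafs_path("lsu", logical))
# /home/user/petashare/lsu/tempZone/home/team1/shareFile
print(petashell_path("uno", logical))
# /petashare/uno/tempZone/home/team1/shareFile
```

The site names (lsu, uno) and the zone path are taken from the sessions; everything else is an assumption.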

PetaShare leverages the 40 Gigabit-per-second Louisiana Optical Network Initiative (LONI) infrastructure for its interconnections, fully exploiting high-bandwidth, low-latency optical network technologies.
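To give a feel for what a 40 Gigabit-per-second link means for data movement, the Python sketch below computes an idealized lower bound on transfer time. It assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead, disk throughput, and link sharing, so real transfers would take longer. The function name is hypothetical.

```python
def ideal_transfer_seconds(size_bytes, link_gbps=40.0):
    """Lower bound on transfer time: payload bits divided by the raw link rate."""
    return size_bytes * 8 / (link_gbps * 1e9)


TB = 10**12  # decimal terabyte (assumption)

one_tb = ideal_transfer_seconds(1 * TB)
disk_pool = ideal_transfer_seconds(250 * TB)  # the 250 TB disk pool quoted on the poster

print(one_tb)            # 200.0 seconds
print(disk_pool / 3600)  # roughly 13.9 hours
```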

PetaShare is based on an evolved version of iRODS that provides a globally unified namespace across geographically distributed storage resources, as well as a metadata management interface.
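The idea of a unified namespace can be illustrated with a toy catalog in Python: one logical path resolves to replicas held at different sites, and the client never needs to know the physical location. This is a simplified sketch, not the iRODS or PetaShare implementation; the class, its methods, and the physical paths are hypothetical.

```python
class Catalog:
    """Toy catalog: one logical path maps to replicas on several resources."""

    def __init__(self):
        # logical path -> list of (resource name, physical path)
        self._replicas = {}

    def register(self, logical, resource, physical):
        """Record that `resource` holds a copy of `logical` at `physical`."""
        self._replicas.setdefault(logical, []).append((resource, physical))

    def resolve(self, logical, prefer=None):
        """Return the replica on the preferred resource, else the first one registered."""
        replicas = self._replicas[logical]
        for resource, physical in replicas:
            if resource == prefer:
                return resource, physical
        return replicas[0]


catalog = Catalog()
path = "/tempZone/home/team1/shareFile"
# Physical locations below are invented for illustration.
catalog.register(path, "lsu", "/vault/lsu/team1/shareFile")
catalog.register(path, "uno", "/vault/uno/team1/shareFile")

print(catalog.resolve(path, prefer="uno"))
print(catalog.resolve(path))
```

In this sketch the same logical name is usable from any site, which is the property the unified namespace provides to users.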

The initial implementation and deployment of PetaShare involve six institutions in Louisiana.

PetaShare manages 250 Terabytes of disk storage distributed across these institutions as well as 400 Terabytes of tape storage.

PetaShare treats storage resources and data access tasks as first-class entities, on par with computational resources and compute tasks, rather than as mere side effects of computation.

Along with data storage resources, key technologies being developed in the PetaShare project include:
▪ Data-aware Storage Systems
▪ Data-aware Schedulers
▪ Cross-domain Metadata Scheme
▪ Advanced Buffering and Data Aggregation
▪ Asynchronous Replication for Metadata Servers

The NSF-funded PetaShare project aims to enable transparent handling of the underlying data sharing, archival, and retrieval mechanisms, and to make data available to scientists for analysis and visualization on demand in different applications, such as:

• Coastal & Environmental Modeling
• Geospatial Analysis
• Bioinformatics
• Medical Imaging
• Fluid Dynamics
• Petroleum Engineering
• Numerical Relativity
• High Energy Physics

[Figure: tape archive diagram. Labels: Tape, Long Term Data Archival, asynchronous replication module, metadata, iRODS SERVER, iCAT SERVER, PCOMMANDS.]

ACKNOWLEDGMENTS

This project is sponsored in part by the National Science Foundation, the Department of Energy, and the Louisiana Board of Regents.

For further information, please visit:
http://www.petashare.org
http://www.loni.org
http://fuse.sf.net
http://www.cctools.org
http://www.irods.org

[Figure: semantic metadata system diagram. Labels: Semantic Metadata Store, Commandline Interface, Metadata Query, Browser, Protege, Query Parser, Web Server, Insertion, Query, Data Migration, Replication, Load Balancing.]
