Ismail Akturk, Mehmet Balman, Xinqi Wang and Tevfik Kosar
Center for Computation and Technology at Louisiana State University, Baton Rouge, LA, 70803
[Figure labels: iCOMMANDS, FUSE]
local $ petafs -m lsu
local $ ls ~/petashare
lsu
local $ cd ~/petashare/lsu/tempZone/home/team1
local $ cp /tmp/srcFile ./shareFile
local $ ls
shareFile
local $ petafs -u lsu
[Figure: architecture diagram. Labels: LONI, asynchronous replication module, Disk Storage, metadata, iRODS SERVER, iCAT SERVER, PARROT]
local $ petashell
pshell ~$ cd /petashare/uno/tempZone/home/team1
pshell ~$ ls
shareFile
pshell ~$ vi shareFile
"Hello PetaShare"
pshell ~$ exit
local $
PETASHELL
PETAFS
local $ ppwd
/tempZone/home/team1
local $ pls
C- /tempZone/home/team1
shareFile
local $ pget shareFile ~/localFile
local $ cat ~/localFile
"Hello PetaShare"
local $
Distributed Storage and Data Management in PetaShare for Collaborative Research
PetaShare supports the native iRODS metadata system for speedy access to the data archive, and semantic-enabled cross-domain metadata for an integrated view over archives spanning multiple disciplines.
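iRODS stores user metadata as attribute-value-unit triples attached to data objects. The following is a minimal sketch of that idea using an in-memory catalog; the class `ToyMetadataCatalog`, the attribute name `discipline`, and the path `otherFile` are hypothetical illustrations, not part of the iRODS or PetaShare interfaces.

```python
from collections import defaultdict


class ToyMetadataCatalog:
    """In-memory stand-in for a metadata catalog (hypothetical, not the iCAT API)."""

    def __init__(self):
        # logical path -> set of (attribute, value, unit) triples
        self._avus = defaultdict(set)

    def add(self, path, attribute, value, unit=""):
        """Attach one attribute-value-unit triple to a logical path."""
        self._avus[path].add((attribute, value, unit))

    def find(self, attribute, value):
        """Return the sorted logical paths that carry the given attribute/value pair."""
        return sorted(
            path
            for path, triples in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in triples)
        )


catalog = ToyMetadataCatalog()
catalog.add("/tempZone/home/team1/shareFile", "discipline", "bioinformatics")
catalog.add("/tempZone/home/team1/otherFile", "discipline", "fluid dynamics")
matches = catalog.find("discipline", "bioinformatics")
```

A query by attribute and value returns logical paths only, so the caller never needs to know which site holds the data.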
PetaShare provides very lightweight client tools based on FUSE, Parrot, and iCommands technologies, which enable easy, transparent, and scalable user-level access to the data stored in distributed resources. These are:
▪ Petafs (Virtual File System)
▪ Petashell (Shell Interface)
▪ Pcommands (Customized Commands)
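Because a FUSE mount exposes remote collections as ordinary directories, standard file operations should work on them unchanged. The sketch below illustrates this with plain Python file I/O. It assumes the mount layout from the petafs session (`<mount>/<site>/tempZone/home/team1`) and uses a temporary directory as a stand-in for the mount point; the helper `copy_into_share` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_share(src, mount_root, site, collection, name="shareFile"):
    """Copy a local file into a collection under a mounted site directory."""
    target_dir = Path(mount_root) / site / collection.lstrip("/")
    # Creating directories is only needed for the stand-in directory used here;
    # on a real mount the collection is assumed to exist already.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(src, target_dir / name))


# Demonstration against a temporary directory instead of a real petafs mount.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "srcFile"
    src.write_text("Hello PetaShare")
    mount_root = Path(tmp) / "petashare"
    copied = copy_into_share(src, mount_root, "lsu", "/tempZone/home/team1")
    copied_text = copied.read_text()
    relative = copied.relative_to(mount_root).as_posix()
```

The same function would apply to a real mount point by passing that path as `mount_root`, since nothing in it is specific to PetaShare.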
PetaShare leverages the 40 Gigabit-per-second Louisiana Optical Network Initiative (LONI) infrastructure to interconnect its sites, fully exploiting high-bandwidth, low-latency optical network technologies.
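To put the link rate in perspective, the sketch below computes an idealized transfer time. It assumes decimal units (1 Terabyte = 10^12 bytes) and a link running at the full 40 Gigabits per second with no protocol overhead, so the results are lower bounds rather than measured figures; the function name is hypothetical.

```python
def ideal_transfer_seconds(size_bytes, link_bits_per_second):
    """Lower-bound transfer time: payload bits divided by the raw link rate."""
    return size_bytes * 8 / link_bits_per_second


TERABYTE = 10**12          # decimal terabyte, in bytes (assumption)
LINK_RATE = 40 * 10**9     # 40 Gigabits per second

# One terabyte at full line rate.
one_tb_seconds = ideal_transfer_seconds(TERABYTE, LINK_RATE)

# The 250 Terabytes of disk storage mentioned on the poster, in hours.
disk_hours = ideal_transfer_seconds(250 * TERABYTE, LINK_RATE) / 3600
```

Under these assumptions one Terabyte takes 200 seconds, and 250 Terabytes take roughly 13.9 hours.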
PetaShare is based on an evolved version of iRODS that provides a globally unified namespace across geographically distributed storage resources, as well as a metadata management interface.
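A unified namespace means that users address data by one logical path regardless of where copies physically reside. The following is a minimal sketch of such a mapping, not the iRODS implementation; the class `ToyNamespace` and the physical paths are hypothetical, while the site names `lsu` and `uno` are taken from the terminal sessions on the poster.

```python
class ToyNamespace:
    """Maps one logical path to replicas held at different sites (illustrative only)."""

    def __init__(self):
        # logical path -> list of (site, physical path) in registration order
        self._replicas = {}

    def register(self, logical_path, site, physical_path):
        """Record that a site holds a copy of the logical path."""
        self._replicas.setdefault(logical_path, []).append((site, physical_path))

    def resolve(self, logical_path, preferred_site=None):
        """Return the replica at the preferred site if present, else the first one."""
        replicas = self._replicas[logical_path]
        for site, physical_path in replicas:
            if site == preferred_site:
                return site, physical_path
        return replicas[0]


ns = ToyNamespace()
logical = "/tempZone/home/team1/shareFile"
ns.register(logical, "lsu", "/vault/lsu/team1/shareFile")  # hypothetical physical path
ns.register(logical, "uno", "/vault/uno/team1/shareFile")  # hypothetical physical path

near_uno = ns.resolve(logical, preferred_site="uno")
fallback = ns.resolve(logical, preferred_site="elsewhere")
```

The caller always passes the same logical path; only the resolver decides which physical copy to use.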
The initial implementation and deployment of PetaShare involve six institutions in Louisiana.
PetaShare manages 250 Terabytes of disk storage distributed across these institutions as well as 400 Terabytes of tape storage.
PetaShare treats storage resources and the tasks related to data access as first-class entities, just like computational resources and compute tasks, and not simply as a side effect of computation.
Along with data storage resources, key technologies being developed in the PetaShare project include:
▪ Data-aware Storage Systems
▪ Data-aware Schedulers
▪ Cross-domain Metadata Scheme
▪ Advanced Buffering and Data Aggregation
▪ Asynchronous Replication for Metadata Servers
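The poster does not describe how asynchronous replication between metadata servers is implemented. As a generic sketch of the idea, under that caveat, a server can apply each write locally, return immediately, and ship queued updates to its peers later. The class `ToyCatalogServer` and its methods are hypothetical, and conflict resolution between concurrent writers is deliberately left out.

```python
from collections import deque


class ToyCatalogServer:
    """Toy metadata server with an outbox of updates awaiting replication."""

    def __init__(self, name):
        self.name = name
        self.records = {}       # key -> value held by this server
        self.outbox = deque()   # updates applied locally but not yet shipped
        self.peers = []         # servers that should eventually receive updates

    def write(self, key, value):
        """Apply the update locally and queue it; the caller does not wait for peers."""
        self.records[key] = value
        self.outbox.append((key, value))

    def replicate(self):
        """Ship queued updates to all peers in order; return how many were shipped."""
        shipped = 0
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.records[key] = value
            shipped += 1
        return shipped


lsu = ToyCatalogServer("lsu")
uno = ToyCatalogServer("uno")
lsu.peers.append(uno)

lsu.write("/tempZone/home/team1/shareFile", {"owner": "team1"})
visible_before = "/tempZone/home/team1/shareFile" in uno.records
shipped = lsu.replicate()
visible_after = "/tempZone/home/team1/shareFile" in uno.records
```

The window between `write` and `replicate` is the trade-off of the asynchronous approach: local writes stay fast, but a peer may briefly serve stale metadata.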
The NSF-funded PetaShare project aims to enable transparent handling of the underlying data sharing, archival, and retrieval mechanisms, and to make data available to scientists for analysis and visualization on demand in different applications, such as:
• Coastal & Environmental Modeling
• Geospatial Analysis
• Bioinformatics
• Medical Imaging
• Fluid Dynamics
• Petroleum Engineering
• Numerical Relativity
• High Energy Physics
[Figure: architecture diagram, continued. Labels: Tape, asynchronous replication module, metadata, iRODS SERVER, iCAT SERVER, Long Term Data Archival]
PCOMMANDS
ACKNOWLEDGMENTS
This project is sponsored in part by the National Science Foundation, the Department of Energy, and the Louisiana Board of Regents.
For further information, please visit the webpages at:
http://www.petashare.org
http://www.loni.org
http://fuse.sf.net
http://www.cctools.org
http://www.irods.org
[Figure: semantic metadata diagram. Labels: Semantic Metadata Store; Commandline Interface Metadata Query; Browser; Protege; Query Parser; Web Server; Insertion Query]
Data Migration, Replication, Load Balancing