Connecting arbitrary data sources to the grid
Shunde Zhang
Australian Research Collaboration Service (ARCS)
eResearch SA
School of Computer Science, University of Adelaide
Background
Australian Research Collaboration Service
A successor of APAC
Services
– HPC
– Data
– Collaboration tools: AccessGrid, EVO, Plone, Drupal, Sakai
ARCS Data Fabric
ARCS Data Fabric (cont.)
A national service
Provided to all Australian researchers
Based on iRODS
The Problem
Interoperability with “The Grid”
– “The Grid”: Globus, gLite, Condor, etc.
– Data sources
 • GridFTP-compatible: dCache
 • Not GridFTP-compatible: iRODS, SRB
Possible solutions
– “Manual” copy (or do it in a PBS script)
– Copy queue
The Problem (cont.)
Movement of massive data
– Both ends use the same software (speak the same protocol)
– Different systems are used (speak different protocols)
– Efficiency
Possible solutions
– Transfer via an intermediate point
A solution - old fashioned
AWS Import/Export for Amazon S3
– Ship hard disks via a courier company
Our Solution - GridFTP
De facto standard
– Compatible with the Grid and many grid clients
Efficiency
– Parallel transfer
– Data channel reuse
– Large file transfer in small blocks
Compatible with many file transfer services
– Monitoring
– Scheduling
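The "parallel transfer" and "large file transfer in small blocks" points can be pictured as cutting a file into (offset, count) blocks and spreading them over several TCP streams. The sketch below is a minimal illustration under assumptions made only for this example: a fixed block size and round-robin assignment of blocks to streams. A real GridFTP server may schedule blocks differently, and the class and method names (BlockPlanner, plan) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockPlanner {

    /** A byte range of the file and the parallel stream chosen to carry it. */
    static final class Block {
        final int stream;
        final long offset;
        final long count;

        Block(int stream, long offset, long count) {
            this.stream = stream;
            this.offset = offset;
            this.count = count;
        }

        @Override
        public String toString() {
            return "stream " + stream + ": offset=" + offset + ", count=" + count;
        }
    }

    /**
     * Cuts a file into fixed-size blocks (the last one may be shorter) and deals
     * them out round-robin to the parallel streams.
     */
    static List<Block> plan(long fileLength, long blockSize, int streams) {
        if (fileLength < 0 || blockSize <= 0 || streams <= 0) {
            throw new IllegalArgumentException("invalid plan parameters");
        }
        List<Block> blocks = new ArrayList<>();
        int stream = 0;
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            long count = Math.min(blockSize, fileLength - offset);
            blocks.add(new Block(stream, offset, count));
            stream = (stream + 1) % streams;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Illustrative numbers: a 10 MiB file, 1 MiB blocks, 4 parallel streams.
        for (Block b : plan(10L << 20, 1L << 20, 4)) {
            System.out.println(b);
        }
    }
}
```

Because every block carries its own offset, the receiver can write it in place no matter which stream delivered it or in what order it arrived.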
An overview of the GridFTP protocol
Based on FTP with extensions
Third-party transfer
– Intermediate point not needed
Security: GSI
Extended block mode
– Parallel transfer
– Striped transfer
– Partial transfer
Reliable and restartable
TCP and UDP
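As we read the GridFTP v1 specification (GFD.20), each extended block mode (MODE E) block is preceded by a 17-byte header: an 8-bit descriptor, a 64-bit byte count and a 64-bit offset. The offset is what makes parallel, striped and partial transfers possible, since blocks may arrive on any data connection and in any order. The sketch below encodes and decodes that header with java.nio.ByteBuffer, which is big-endian (network order) by default. The class name is hypothetical, and the flag values should be checked against the specification before being relied on.

```java
import java.nio.ByteBuffer;

/** Hypothetical model of a MODE E block header (descriptor, count, offset). */
final class ExtendedBlockHeader {
    // Descriptor bits as listed in GFD.20 (verify against the spec before use).
    static final int EOR = 128;            // end of record (legacy FTP)
    static final int EOF = 64;             // end of file
    static final int SUSPECT_ERRORS = 32;  // suspected errors in this block
    static final int RESTART_MARKER = 16;  // block is a restart marker
    static final int EOD = 8;              // end of data on this connection
    static final int CLOSE_CONN = 4;       // sender will close the data connection

    static final int HEADER_SIZE = 17;     // 1 + 8 + 8 bytes

    final int descriptor;
    final long count;
    final long offset;

    ExtendedBlockHeader(int descriptor, long count, long offset) {
        this.descriptor = descriptor;
        this.count = count;
        this.offset = offset;
    }

    /** Serialises the header in big-endian order (ByteBuffer's default). */
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        buf.put((byte) descriptor);
        buf.putLong(count);
        buf.putLong(offset);
        return buf.array();
    }

    static ExtendedBlockHeader decode(byte[] bytes) {
        if (bytes.length < HEADER_SIZE) {
            throw new IllegalArgumentException("header needs " + HEADER_SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int descriptor = buf.get() & 0xFF; // treat the descriptor byte as unsigned
        long count = buf.getLong();
        long offset = buf.getLong();
        return new ExtendedBlockHeader(descriptor, count, offset);
    }

    boolean has(int flag) {
        return (descriptor & flag) != 0;
    }
}
```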
The Architecture
Layers (top to bottom): GridFTP interface → Generic File System Framework → Data Source Plugin → Data Source
Generic File System Framework
FileSystem creates FileSystemConnection, which creates FileObject, which creates RandomAccessFileObject
FileSystem interface
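import java.io.IOException;
import org.ietf.jgss.GSSCredential;

public interface FileSystem {
    String getSeparator();  // path separator of this data source
    void init() throws IOException;
    // Opens a connection on behalf of the user holding the given GSI credential.
    FileSystemConnection createFileSystemConnection(GSSCredential credential)
            throws FtpConfigException, IOException;
    void exit();
}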
FileSystemConnection interface
import java.io.IOException;

public interface FileSystemConnection {
    FileObject getFileObject(String path);
    String getHomeDir();
    String getUser();
    void close() throws IOException;
    boolean isConnected();
    long getFreeSpace(String path);
}
FileObject interface
import java.io.IOException;

public interface FileObject {
    String getName();
    String getPath();
    boolean exists();
    boolean isFile();
    boolean isDirectory();
    int getPermission();
    String getCanonicalPath() throws IOException;
    FileObject[] listFiles();
    long length();
    long lastModified();
    RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException;
    boolean delete();
    FileObject getParent();
    boolean mkdir();
    boolean renameTo(FileObject file);
    boolean setLastModified(long t);
}
RandomAccessFileObject interface
import java.io.IOException;

public interface RandomAccessFileObject {
    void seek(long offset) throws IOException;
    int read() throws IOException;
    int read(byte[] b) throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
    void close() throws IOException;
    String readLine() throws IOException;
    void write(int b) throws IOException;
    void write(byte[] b) throws IOException;
    void write(byte[] b, int off, int len) throws IOException;
    long length() throws IOException;
}
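To make the creation chain concrete, here is a minimal in-memory adaptor. It is a hypothetical sketch, not Griffin's code: it implements a trimmed-down subset of the four framework interfaces, replaces the GSSCredential parameter with a plain user name so the example runs without GSI libraries, and assumes the type argument of getRandomAccessFileObject takes "r"/"rw" values as java.io.RandomAccessFile does.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InMemoryAdaptorSketch {
    // Hypothetical trimmed-down copies of the framework interfaces (subset of methods).
    interface FileSystem {
        void init() throws IOException;
        // Simplification: a user name stands in for the GSSCredential.
        FileSystemConnection createFileSystemConnection(String user) throws IOException;
        void exit();
    }
    interface FileSystemConnection {
        FileObject getFileObject(String path);
        String getUser();
        void close() throws IOException;
    }
    interface FileObject {
        String getPath();
        boolean exists();
        long length();
        // Assumption: type is "r" or "rw", as for java.io.RandomAccessFile.
        RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException;
    }
    interface RandomAccessFileObject {
        void seek(long offset) throws IOException;
        int read(byte[] b, int off, int len) throws IOException;
        void write(byte[] b, int off, int len) throws IOException;
        long length() throws IOException;
        void close() throws IOException;
    }

    /** Adaptor whose "data source" is a map from path to file contents. */
    static final class MemoryFileSystem implements FileSystem {
        private final Map<String, byte[]> files = new ConcurrentHashMap<>();

        @Override public void init() {}
        @Override public void exit() { files.clear(); }

        @Override public FileSystemConnection createFileSystemConnection(String user) {
            return new FileSystemConnection() {
                @Override public FileObject getFileObject(String path) { return new MemoryFile(path); }
                @Override public String getUser() { return user; }
                @Override public void close() {}
            };
        }

        private final class MemoryFile implements FileObject {
            private final String path;
            MemoryFile(String path) { this.path = path; }
            @Override public String getPath() { return path; }
            @Override public boolean exists() { return files.containsKey(path); }
            @Override public long length() {
                byte[] data = files.get(path);
                return data == null ? 0 : data.length;
            }

            @Override public RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException {
                boolean readOnly = type.equals("r");
                if (readOnly && !exists()) throw new IOException("no such file: " + path);
                files.putIfAbsent(path, new byte[0]);
                return new RandomAccessFileObject() {
                    private long pos = 0;
                    @Override public void seek(long offset) { pos = offset; }
                    @Override public int read(byte[] b, int off, int len) {
                        byte[] data = files.get(path);
                        if (pos >= data.length) return -1; // end of file
                        int n = (int) Math.min(len, data.length - pos);
                        System.arraycopy(data, (int) pos, b, off, n);
                        pos += n;
                        return n;
                    }
                    @Override public void write(byte[] b, int off, int len) throws IOException {
                        if (readOnly) throw new IOException("opened read-only: " + path);
                        byte[] data = files.get(path);
                        long end = pos + len;
                        // Grow the file if needed; any gap before pos is zero-filled.
                        if (end > data.length) data = Arrays.copyOf(data, (int) end);
                        System.arraycopy(b, off, data, (int) pos, len);
                        files.put(path, data);
                        pos = end;
                    }
                    @Override public long length() { return files.get(path).length; }
                    @Override public void close() {}
                };
            }
        }
    }
}
```

Writing a later block before an earlier one, as a receiver of parallel streams might, works here because every write is positioned with seek first.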
The Implementation - Griffin
Figure: clients (GridFTP client, Grid job submission system, data transfer service) connect to Griffin, which layers a GridFTP interface over the generic file system framework and its adaptors (iRODS, local file system, other adaptors); each adaptor connects to its data source (iRODS, local file system, other data sources).
Features
GridFTP protocol version 1
Java-based
– Spring framework
– OS-independent
Lightweight, stand-alone, self-contained
– No need to install Globus Toolkit
Two plugins included
– iRODS plugin
– Local file system plugin
Open source (Apache 2 & GPL)
Parallel transfer with Griffin
Client ↔ Griffin over the WAN; Griffin ↔ Data Source over the LAN or localhost
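On the WAN leg the client and Griffin can use several parallel data connections, each carrying blocks tagged with their file offsets, so the receiving end only has to write each block where it belongs. The sketch below is a simplified illustration, not Griffin's code: threads stand in for data connections, blocks are dealt round-robin from an in-memory array, and a real receiver would take each offset from the block header instead. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelReassembly {

    /**
     * Simulates 'streams' parallel data connections: stream s delivers blocks
     * s, s + streams, s + 2*streams, ... of 'data', and each block is written
     * at its own file offset, so arrival order does not matter.
     */
    static void receive(byte[] data, int blockSize, int streams, Path target)
            throws IOException, InterruptedException, ExecutionException {
        try (FileChannel channel = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ExecutorService pool = Executors.newFixedThreadPool(streams);
            try {
                List<Future<?>> pending = new ArrayList<>();
                for (int s = 0; s < streams; s++) {
                    final long first = (long) s * blockSize;
                    pending.add(pool.submit(() -> {
                        for (long off = first; off < data.length; off += (long) streams * blockSize) {
                            int len = (int) Math.min(blockSize, data.length - off);
                            ByteBuffer block = ByteBuffer.wrap(data, (int) off, len);
                            long pos = off;
                            // Positional writes leave the channel's own position untouched,
                            // so several threads can share one FileChannel.
                            while (block.hasRemaining()) {
                                pos += channel.write(block, pos);
                            }
                        }
                        return null; // a Callable, so IOException may propagate
                    }));
                }
                for (Future<?> f : pending) {
                    f.get(); // rethrows any stream failure as ExecutionException
                }
            } finally {
                pool.shutdown();
            }
        }
    }
}
```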
Authentication
GSI
– iRODS plugin
User mapping
– Local file system plugin
– XML file
 • Maps the GSI identity (certificate DN) to a user in the internal user management system
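The slides do not specify the layout of the XML user-mapping file. The sketch below assumes a hypothetical format with one `<user dn="..." name="..."/>` element per mapping and uses the JDK's built-in DOM parser; DnUserMapping and lookup are illustrative names. Unmapped DNs yield an empty result, so the caller can refuse the login by default.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Maps a GSI certificate DN to a local user name (hypothetical XML format). */
final class DnUserMapping {
    private final Map<String, String> dnToUser = new HashMap<>();

    DnUserMapping(String xml) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Secure processing limits entity expansion and similar XML attacks.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList users = doc.getElementsByTagName("user");
        for (int i = 0; i < users.getLength(); i++) {
            Element user = (Element) users.item(i);
            String dn = user.getAttribute("dn").trim();     // "" if the attribute is missing
            String name = user.getAttribute("name").trim();
            if (!dn.isEmpty() && !name.isEmpty()) {
                dnToUser.put(dn, name);
            }
        }
    }

    /** An empty result means the DN is not mapped and the login should be refused. */
    Optional<String> lookup(String certificateDn) {
        return Optional.ofNullable(dnToUser.get(certificateDn.trim()));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<user-mapping>"
                + "<user dn=\"/C=AU/O=Example/CN=Alice Example\" name=\"alice\"/>"
                + "</user-mapping>";
        DnUserMapping mapping = new DnUserMapping(xml);
        System.out.println(mapping.lookup("/C=AU/O=Example/CN=Alice Example").orElse("<refused>"));
    }
}
```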
Use case
Integration of the Grid and Data Fabric
– iRODS plugin for Data Fabric
– Third-party transfer to cluster (Globus GridFTP)
Tested with
– Globus.org
– globus-url-copy (5.0 and 4.x)
– Globus GridFTP GUI
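In plain FTP terms (RFC 959), a third-party transfer means the client holds control connections to both servers, asks the destination to listen (PASV), hands the returned address to the source (PORT), then issues STOR on the destination and RETR on the source; the data then flows directly between the two servers. GridFTP layers GSI authentication, MODE E and striped variants (SPAS/SPOR) on top, which this sketch omits. ThirdPartyPlan and its methods are hypothetical helpers for illustration only.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that plans an FTP-style third-party transfer. */
final class ThirdPartyPlan {

    // RFC 959 passive-mode reply, e.g. "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)".
    private static final Pattern PASV_REPLY =
            Pattern.compile("227[^(]*\\((\\d+),(\\d+),(\\d+),(\\d+),(\\d+),(\\d+)\\)");

    /** Extracts "h1,h2,h3,h4,p1,p2" from the destination's 227 reply; PORT takes the same form. */
    static String portArgumentFrom(String pasvReply) {
        Matcher m = PASV_REPLY.matcher(pasvReply);
        if (!m.find()) {
            throw new IllegalArgumentException("not a 227 reply: " + pasvReply);
        }
        return String.join(",", m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), m.group(6));
    }

    /** TCP port encoded in the last two numbers: p1 * 256 + p2. */
    static int port(String portArgument) {
        String[] parts = portArgument.split(",");
        return Integer.parseInt(parts[4]) * 256 + Integer.parseInt(parts[5]);
    }

    /**
     * Commands in the order the client issues them, tagged with the receiving server.
     * The PASV reply must already have been read before PORT can be sent.
     */
    static List<String> commands(String pasvReplyFromDest, String srcPath, String destPath) {
        return List.of(
                "dest: PASV",
                "src: PORT " + portArgumentFrom(pasvReplyFromDest),
                "dest: STOR " + destPath,
                "src: RETR " + srcPath);
    }

    public static void main(String[] args) {
        String reply = "227 Entering Passive Mode (192,168,1,10,195,80)";
        commands(reply, "/data/run42.dat", "/scratch/run42.dat").forEach(System.out::println);
    }
}
```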
Performance Evaluation
Server: two quad-core Xeon 3.16GHz CPUs, 16GB memory
Client: IBM xSeries 346 with two hyper-threaded Intel Xeon 3.20GHz CPUs, 4GB memory
Network: 1Gbps LAN
WAN: two 10Gbps links
Transfer: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB
– iCommands
– globus-url-copy
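When reading the result charts, it helps to know the ceiling a 1 Gbps LAN imposes: 1 Gbit/s is at most 125 MB/s (decimal megabytes) before protocol overhead, whichever server is used. The helper below computes that lower bound on transfer time and an observed throughput figure. It assumes "GB" in the setup means GiB (2^30 bytes); with decimal GB a 16GB file would need at least 128 s instead of about 137 s. TransferMath is a hypothetical name.

```java
final class TransferMath {

    /** Lower bound on transfer time in seconds if the link were the only bottleneck. */
    static double minSeconds(long bytes, double linkBitsPerSecond) {
        return bytes * 8.0 / linkBitsPerSecond;
    }

    /** Observed throughput in MB/s, with MB = 10^6 bytes. */
    static double throughputMBps(long bytes, double seconds) {
        return bytes / seconds / 1e6;
    }

    public static void main(String[] args) {
        long[] sizesGiB = {1, 4, 16};
        for (long g : sizesGiB) {
            long bytes = g << 30; // assumption: "GB" in the setup means GiB
            System.out.printf("%d GiB over 1 Gbit/s: at least %.1f s%n", g, minSeconds(bytes, 1e9));
        }
    }
}
```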
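Evaluation Setup - Griffin vs iCommands
Figure: on the client, globus-url-copy talks to Griffin (Jargon adaptor), which accesses iRODS, while iCommands talks to iRODS directly; iRODS stores data on the local file system.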
Evaluation Result Chart - Griffin vs iCommands
Evaluation Setup - Griffin vs Globus GridFTP
Figure: the client runs globus-url-copy against either Griffin (local FS adaptor) or the Globus GridFTP server; both serve data from the local file system.
Evaluation Result Chart - Griffin vs Globus GridFTP
Related work
Client library
– SAGA/jSAGA
– Commons-vfs
Data transfer service
– Stork
– PAFTP
Globus
– XIO
– DSI
Griffin vs. Globus GridFTP
             Griffin              Globus GridFTP
Language     Java                 C
Platform     OS-independent       *nix
Deployment   Simple, standalone   Complex
Conclusion
A generic solution to connect arbitrary data sources to the grid
– Data in/out of the grid
– Data transfer between different data sources
Java-based implementation
– Standalone, lightweight
– Pluggable
– No dependency on Globus
Future work
Currently working on a plugin for MongoDB
Java NIO
UDP
Striped transfer
MongoDB plugin
MongoDB
– NoSQL database
– Stores JSON-style documents
– GridFS component
 • Stores files
Plugin for Griffin
– Read/write files via GridFS
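GridFS stores each file as a sequence of chunk documents in an fs.chunks collection (fields files_id, n, data) plus one metadata document in fs.files that records the file's chunkSize. A RandomAccessFileObject backed by GridFS would therefore have to turn seek(offset) into a chunk number and an offset inside that chunk. The sketch below shows only that arithmetic; GridFsOffsets is a hypothetical name, and the 255 KiB default is an assumption to check against the MongoDB driver in use.

```java
final class GridFsOffsets {

    // Default chunk size in current MongoDB drivers (255 KiB); assumption to verify.
    // Each file's actual value is stored in its fs.files document (chunkSize field).
    static final int DEFAULT_CHUNK_SIZE = 255 * 1024;

    /** Returns {chunk number n, offset within that chunk} for an absolute file offset. */
    static long[] locate(long fileOffset, int chunkSize) {
        if (fileOffset < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and chunkSize > 0");
        }
        return new long[] {fileOffset / chunkSize, fileOffset % chunkSize};
    }

    /** Number of fs.chunks documents needed for a file of the given length. */
    static long chunkCount(long length, int chunkSize) {
        return (length + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long[] loc = locate(300_000, DEFAULT_CHUNK_SIZE);
        System.out.println("chunk n=" + loc[0] + ", offset in chunk=" + loc[1]);
        // prints: chunk n=1, offset in chunk=38880
    }
}
```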
Acknowledgements
Funded by ARCS
Current Status
ARCS production service
Used to transfer data in/out of ARCS Data Fabric
Website
– https://projects.arcs.org.au/trac/griffin
Thank you!
Questions/Comments?