The Data Replication Service
Ann Chervenak
Robert Schuler
USC Information Sciences Institute
The Data Replication Service
Included in the Tech Preview of GT4.0 release
Design is based on the publication component of the Lightweight Data Replicator system
Developed by Scott Koranda from U. Wisconsin at Milwaukee
Functionality: Replicate a set of files in the Grid on a local site
Users identify a set of desired files
DRS queries Replica Location Service to discover current locations of these files
Creates local replicas of desired files using the Reliable File Transfer Service
Registers new replicas in Replica Location Service for discovery
Motivation for DRS
Need for higher-level data management services that integrate lower-level Grid functionality:
Efficient data transfer (GridFTP, RFT)
Replica registration and discovery (RLS)
Eventually validation of replicas, etc.
Goal is to generalize the custom data management systems developed by several application communities
Eventually plan to provide a suite of general, configurable, higher-level data management services
DRS is the first of these services
Relationship to Other Globus Services
At requesting site, deploy:
WS-RF Services: Data Replication Service, Delegation Service, Reliable File Transfer Service
Pre WS-RF Components: Replica Location Service (Local Replica Catalog and Replica Location Index), GridFTP Server
[Figure: components deployed at the Local Site. Labels: Web Service Container; Data Replication Service with Replicator Resource; Reliable File Transfer Service with RFT Resource; Delegation Service with Delegated Credential; Local Replica Catalog; Replica Location Index; GridFTP Server.]
DRS Functionality
Initiate a DRS Request
Discover and select among replicas that act as source locations for data copies
Transfer data to local site to create new replicas
Register new replicas in catalogs
Initiating a DRS Request
Client uses GT4 Delegation Service to create a delegated credential that other services may use to act on behalf of the user
Client creates a request file containing a replication request description, including the desired logical files and the destination URLs
Client sends a message to DRS to create the Replicator resource and passes the request file’s URL
Replicator retrieves the request file
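As an illustration of the request file described above, here is a minimal sketch that writes one and reads it back. The tab-separated layout (one logical file name and one destination URL per line), the example names and URLs, and the helper functions are all assumptions made for this sketch; the slides do not specify the file format.

```python
import io

def write_request(pairs, stream):
    """Write one 'logical name <TAB> destination URL' line per file (assumed layout)."""
    for lfn, dest_url in pairs:
        stream.write(f"{lfn}\t{dest_url}\n")

def parse_request(stream):
    """Read the request back into a list of (logical name, destination URL) pairs."""
    request = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        lfn, dest_url = line.split("\t", 1)
        request.append((lfn, dest_url))
    return request

# Hypothetical logical file names and destination URLs.
pairs = [
    ("lfn-run-001", "gsiftp://local.example.org/data/run-001"),
    ("lfn-run-002", "gsiftp://local.example.org/data/run-002"),
]

buffer = io.StringIO()          # stands in for the request file
write_request(pairs, buffer)
buffer.seek(0)
parsed = parse_request(buffer)  # what a service would see after retrieving the file
```

Keeping the request in a file that the service fetches by URL means the client message itself stays small, no matter how many files are requested.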
Replica Discovery and Selection
Replicator queries the Globus Replica Location Service in a two-step process to discover locations of desired files:
Query local site’s Replica Location Index to find the catalogs at remote sites that contain mappings for the requested files
Query remote Local Replica Catalogs to get the physical file names of the replicas
Replicator selects source file for each file to be copied
Current implementation chooses randomly
A callout is provided for more sophisticated replica selection decisions based on state of Grid
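The two-step lookup and the selection callout can be sketched as follows. This is a toy model under stated assumptions: the index and catalogs are plain dictionaries, and the site names, URLs, and function names are hypothetical. It does not use the real RLS client API.

```python
import random

# Hypothetical in-memory stand-ins for the RLS components.
# Replica Location Index: logical file name -> sites whose catalogs list it.
rli = {
    "lfn-a": ["site1", "site2"],
    "lfn-b": ["site2"],
}
# Local Replica Catalogs: site -> {logical file name -> physical file names}.
lrcs = {
    "site1": {"lfn-a": ["gsiftp://site1.example.org/data/a"]},
    "site2": {
        "lfn-a": ["gsiftp://site2.example.org/data/a"],
        "lfn-b": ["gsiftp://site2.example.org/data/b"],
    },
}

def discover(lfn):
    """Two-step lookup: ask the index first, then each catalog it points to."""
    replicas = []
    for site in rli.get(lfn, []):
        replicas.extend(lrcs[site].get(lfn, []))
    return replicas

def choose_randomly(lfn, replicas):
    """Default policy: pick any known replica."""
    return random.choice(replicas)

def select_sources(lfns, selector=choose_randomly):
    """Pick one source per file; `selector` plays the role of the callout."""
    sources = {}
    for lfn in lfns:
        replicas = discover(lfn)
        if not replicas:
            raise LookupError(f"no replica found for {lfn}")
        sources[lfn] = selector(lfn, replicas)
    return sources

sources = select_sources(["lfn-a", "lfn-b"])
```

Passing a different `selector`, for example one that ranks replicas by measured bandwidth, changes the policy without touching the lookup code.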
File Transfers to Create New Replicas
The Replicator initiates a request with the Globus Reliable File Transfer Service
Creates RFT resource that holds state for each data transfer
Control passes from DRS to RFT, which also retrieves the delegated credential from the Delegation Service
RFT coordinates the file transfers
Transfers are performed by GridFTP servers at the source and destination sites
After transfers complete, the Replicator checks status of each file in the transfer request
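The final step, checking each file once the transfer request has completed, can be sketched like this. The status strings, the `run_transfers` stand-in for the transfer service, and the URLs are hypothetical; real transfers would be carried out by RFT and GridFTP rather than by this function.

```python
def make_transfer(source_url, dest_url):
    """One entry of the transfer request, with a per-file status."""
    return {"source": source_url, "dest": dest_url, "status": "Pending"}

def run_transfers(transfers, unreachable):
    """Stand-in for the transfer service: mark each entry finished or failed."""
    for t in transfers:
        t["status"] = "Failed" if t["source"] in unreachable else "Finished"

def split_by_status(transfers):
    """After the whole request completes, inspect every file individually."""
    finished = [t for t in transfers if t["status"] == "Finished"]
    failed = [t for t in transfers if t["status"] != "Finished"]
    return finished, failed

transfers = [
    make_transfer("gsiftp://remote.example.org/data/a",
                  "gsiftp://local.example.org/data/a"),
    make_transfer("gsiftp://remote.example.org/data/b",
                  "gsiftp://local.example.org/data/b"),
]

# Simulate one source being unreachable.
run_transfers(transfers, unreachable={"gsiftp://remote.example.org/data/b"})
finished, failed = split_by_status(transfers)
```

Checking per file rather than per request matters because only the files that actually arrived should go on to be registered as new replicas.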
Registration of New Replicas
Replicator adds mappings for the newly created replicas to its Globus RLS Local Replica Catalog
Local Replica Catalog updates Replica Location Indexes to make new replicas visible throughout Grid
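Registration and the index update can be modeled with two small functions. This is a sketch under assumptions: the catalog and indexes are dictionaries, and the site name, URLs, and function names are hypothetical rather than part of the RLS API.

```python
# Hypothetical in-memory stand-ins for the local catalog and two remote indexes.
LOCAL_SITE = "local-site"
local_catalog = {}      # logical file name -> set of physical file names
index_at_site_x = {}    # logical file name -> set of sites holding a mapping
index_at_site_y = {}

def register_replica(lfn, pfn):
    """Add a logical-to-physical mapping to the local catalog."""
    local_catalog.setdefault(lfn, set()).add(pfn)

def update_indexes(indexes):
    """Tell each index which logical names the local catalog now holds."""
    for index in indexes:
        for lfn in local_catalog:
            index.setdefault(lfn, set()).add(LOCAL_SITE)

# Register two newly created replicas, then publish them.
register_replica("lfn-a", "gsiftp://local.example.org/data/a")
register_replica("lfn-b", "gsiftp://local.example.org/data/b")
update_indexes([index_at_site_x, index_at_site_y])
```

Note that the indexes record only which site has a mapping, not the physical file name, which is why discovery elsewhere needs a second query to the catalog itself.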
Performance Measurements: Wide Area Testing
The destination for the pull-based transfers is located in Los Angeles
Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GBytes of memory and a 1 Gbit Ethernet connection
Runs a GT4 container and deploys services including RFT and DRS as well as GridFTP and RLS
The remote site where desired data files are stored is located at Argonne National Laboratory in Illinois
Dual-processor, 3 GHz Intel Xeon workstation with 2 GBytes of memory and 1.1 terabytes of disk
Runs a GT4 container as well as GridFTP and RLS services
DRS Operations Measured
Create the DRS Replicator resource
Discover source files for replication using local RLS Replica Location Index and remote RLS Local Replica Catalogs
Initiate a Reliable File Transfer operation by creating an RFT resource
Perform RFT data transfer(s)
Register the new replicas in the RLS Local Replica Catalog
Experiment 1: Replicate 10 Files of Size 1 Gigabyte
Component of Operation       Time (milliseconds)
Create Replicator Resource   317.0
Discover Files in RLS        449.0
Create RFT Resource          808.6
Transfer Using RFT           1186796.0
Register Replicas in RLS     3720.8
Data transfer time dominates
Wide area data transfer rate of 67.4 Mbits/sec
Experiment 2: Replicate 1000 Files of Size 10 Megabytes
Component of Operation       Time (milliseconds)
Create Replicator Resource   1561.0
Discover Files in RLS        9.8
Create RFT Resource          1286.6
Transfer Using RFT           963456.0
Register Replicas in RLS     11278.2
Time to create Replicator and RFT resources is larger
Need to store state for 1000 outstanding transfers
Data transfer time still dominates
Wide area data transfer rate of 85 Mbits/sec
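To make "data transfer time dominates" concrete, the short calculation below takes the component times from the two tables and computes what share of each total is spent in the RFT transfer. The dictionary layout and function name are choices made for this sketch; the numbers are the ones reported above.

```python
# Component times in milliseconds, copied from the two experiment tables.
experiments = {
    "Experiment 1": {
        "Create Replicator Resource": 317.0,
        "Discover Files in RLS": 449.0,
        "Create RFT Resource": 808.6,
        "Transfer Using RFT": 1186796.0,
        "Register Replicas in RLS": 3720.8,
    },
    "Experiment 2": {
        "Create Replicator Resource": 1561.0,
        "Discover Files in RLS": 9.8,
        "Create RFT Resource": 1286.6,
        "Transfer Using RFT": 963456.0,
        "Register Replicas in RLS": 11278.2,
    },
}

def transfer_share(timings):
    """Fraction of the total measured time spent in the data transfer."""
    return timings["Transfer Using RFT"] / sum(timings.values())

shares = {name: transfer_share(t) for name, t in experiments.items()}
for name, share in shares.items():
    print(f"{name}: transfer is {share:.1%} of total time")
```

The transfer accounts for roughly 99.6% of the measured time in Experiment 1 and roughly 98.6% in Experiment 2, so the DRS, RLS, and RFT setup steps together contribute under 2% in both cases.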
Future Work
We will continue performance testing of DRS:
Increasing the size of the files being transferred
Increasing the number of files per DRS request
Add and refine DRS functionality as it is used by applications
E.g., add a push-based replication capability
We plan to develop a suite of general, configurable, composable, high-level data management services