Core SRB Technology for 2005 NCOIC Workshop By Michael Wan And Wayne Schroeder SDSC SDSC/UCSD/NPACI



Core SRB Technology
for 2005 NCOIC Workshop

By Michael Wan and Wayne Schroeder
SDSC

SDSC/UCSD/NPACI


Outline

• Basic Concepts behind SRB
• SRB architecture
• SRB features
• SRB Usage Model
• Wayne:
  – SRB productization - Installation, Administration, etc.
  – Security and Authentication
  – Examples and demo


Initial Design of SRB

• Transparency and Uniformity
  – Data are increasingly distributed
  – Design Goal
    • Use a single interface and authorization mechanism to access data across:
      – Multiple hosts
      – Multiple OS platforms
      – Multiple resource types (UNIX FS, HPSS, UniTree, DBMS, ...)


Initial Design of SRB

• Global view
  – Global Logical Name Space
    • Data organization
    • UNIX-like directories (collections) and files (data)
    • Mapping of logical name to physical attributes - host address, physical path
    • UNIX-like API and utilities
  – Single Global User Name Space
    • Single sign-on
    • No need for a UNIX account on every system
    • Robust access control
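The idea of a global logical name space can be illustrated with a small sketch. It is not SRB code: the `Replica` class, the `CATALOG` dictionary, and the host names are hypothetical stand-ins for what MCAT stores, used only to show how a UNIX-like logical path can map to physical attributes (host address, physical path).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    host: str           # host address of the storage server (made-up value)
    physical_path: str  # path on that host's storage system (made-up value)


# Toy catalog: logical name -> physical attributes.
CATALOG = {
    "/home/demo/mfile": [
        Replica("srb1.example.org", "/vault/demo/mfile"),
        Replica("srb2.example.org", "/archive/0042/mfile"),
    ],
}


def list_collection(catalog, collection):
    """Return the names directly under a collection, like `ls` on a directory."""
    prefix = collection.rstrip("/") + "/"
    names = set()
    for logical in catalog:
        if logical.startswith(prefix):
            names.add(logical[len(prefix):].split("/", 1)[0])
    return sorted(names)


def resolve(catalog, logical_path):
    """Translate a logical path into the list of its physical replicas."""
    try:
        return catalog[logical_path]
    except KeyError:
        raise FileNotFoundError(logical_path) from None
```

A user only ever names `/home/demo/mfile`; which host and path actually hold the bytes is looked up on their behalf.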


SRB Architecture

• Federated middleware system
• Client/server model
  – Federation of resource servers with uniform interfaces
    • client-server
    • server-server - each request handler has 2 versions
      – Local
      – Remote – pass off to a server that can handle the request
  – All servers use the same software
    • Simplicity – easy to implement, easy to debug
  – Robust access control
    • user level, grant access to multiple users
    • group level
    • tickets
• MCAT – Metadata catalog
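The "two versions of each request handler" idea can be sketched as follows. This is an illustrative toy, not the SRB server: the `Server` class, its `resources` dictionary, and the peer list are assumptions made for the example. The point is only that every server runs the same code and either serves a request locally or passes it to a peer that can.

```python
class Server:
    """Toy server: owns some resources and knows its peers."""

    def __init__(self, name, resources):
        self.name = name
        self.resources = dict(resources)  # resource name -> {path: bytes}
        self.peers = []

    def read(self, resource, path):
        """Request handler entry point: choose the local or remote version."""
        if resource in self.resources:
            return self._read_local(resource, path)
        return self._read_remote(resource, path)

    def _read_local(self, resource, path):
        # Local version: this server owns the resource.
        return self.resources[resource][path]

    def _read_remote(self, resource, path):
        # Remote version: pass the request to a server that can handle it.
        for peer in self.peers:
            if resource in peer.resources:
                return peer.read(resource, path)
        raise LookupError(f"no server handles resource {resource!r}")
```

Because the client can contact any server, it does not need to know which one owns the resource.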


Federation of Servers

[Figure: Server1 and Server2 federated with the MCAT server, which holds the MCAT]


SRB as a Data Grid

[Figure: several SRB servers forming a grid, with an MCAT and its DB]

• Data Grid has an arbitrary number of servers
• Complexity is hidden from users


SRB server design

• Three-layer design
  – Top layer
    • Interacts with clients and other servers through TCP/IP sockets
    • User authentication
    • Handles function requests – parses requests and invokes handlers in the middle and bottom layers


SRB server design (cont1)

• Middle layer (logical layer)
  – Most requests pass through here
  – Input parameters are in their logical representations (logical path name, logical resource name)
  – Generally, two types of requests
    • Data access
      – Queries MCAT, translates from logical to physical representations
      – Calls functions in the bottom (physical) layer to access data
    • Metadata access
      – Interacts with MCAT


SRB server design (cont2)

– Bottom layer (physical layer)
  • Where all data I/O to/from resources is done
  • Handles three types of resources
    • File system
      – Drivers to interface with different FS
      – FS supported: UNIX, HPSS, ADS, UniTree, gridFTP (to be released)
    • DB large objects
    • DB tables
      – Access DB tables (query, insert, …)
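A driver-based physical layer might look roughly like the sketch below. All class and function names (`StorageDriver`, `UnixFsDriver`, `MemoryDriver`, `store`) are invented for illustration and do not come from the SRB source; the in-memory driver merely stands in for a resource type that cannot be reached from a small example, such as a DB large object.

```python
import os
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Uniform interface the logical layer calls, whatever the resource type."""

    @abstractmethod
    def write(self, physical_path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, physical_path: str) -> bytes: ...


class UnixFsDriver(StorageDriver):
    """File-system resource rooted at a local directory."""

    def __init__(self, root: str):
        self.root = root

    def _full(self, physical_path: str) -> str:
        return os.path.join(self.root, physical_path.lstrip("/"))

    def write(self, physical_path, data):
        full = self._full(physical_path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, physical_path):
        with open(self._full(physical_path), "rb") as f:
            return f.read()


class MemoryDriver(StorageDriver):
    """In-memory stand-in for a resource such as a DB large object."""

    def __init__(self):
        self._objects = {}

    def write(self, physical_path, data):
        self._objects[physical_path] = bytes(data)

    def read(self, physical_path):
        return self._objects[physical_path]


def store(drivers, resource, physical_path, data):
    """Dispatch a physical write to the driver registered for `resource`."""
    drivers[resource].write(physical_path, data)
```

Adding support for a new storage system then means writing one more driver class; the layers above are unchanged.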


SRB Features - Authentication

• Supports 2 authentication schemes
  – Encrypt1 (SDSC) – no plain-text password over the net
  – GSI (Globus)
  – Wayne will give details


Performance Enhancement

• Parallel I/O
  – For transferring large files
  – Uses multiple threads for data transfer and disk I/O
  – Interfaces with HPSS’s mover protocol for parallel I/O
  – Parallel third-party transfer for copy and replicate
  – One-hop data transfer between client and data resource
• Bulk Operation
  – Uploading and downloading large numbers of small files
  – Multi-threaded
  – Bulk registration – 500 files in one call
  – 3-10 times speedup
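As a rough illustration of multi-threaded transfer of a large file, the sketch below splits a file into byte ranges and copies each range in its own thread. It is a local file-to-file toy under stated assumptions, not SRB's transfer code: the function names, the default thread count, and the buffer size are all choices made for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, parts):
    """Split [0, size) into at most `parts` contiguous (offset, length) ranges."""
    if size == 0:
        return []
    chunk = -(-size // parts)  # ceiling division
    return [(off, min(chunk, size - off)) for off in range(0, size, chunk)]


def copy_range(src, dst, offset, length, bufsize=1 << 20):
    """Copy `length` bytes starting at `offset` from src to the same offset in dst."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            buf = fin.read(min(bufsize, remaining))
            if not buf:
                break
            fout.write(buf)
            remaining -= len(buf)


def parallel_copy(src, dst, threads=4):
    """Copy src to dst using one thread per byte range."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # create the destination at its final size
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(copy_range, src, dst, off, n)
            for off, n in split_ranges(size, threads)
        ]
        for fut in futures:
            fut.result()  # re-raise any error from a worker thread
```

Each worker opens its own file handles, so the threads never share a file position.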


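Sput – serial mode

[Figure: serial-mode Sput data flow in six numbered steps. Components: Sput client, SRBserver1 and SRBserver2 (each with an SRB agent), MCAT, and resource R. Labels: srbObjCreate, srbObjWrite; peer-to-peer request; server(s) spawning; data transfer.]

MCAT functions shown in the figure:
1. Logical-to-physical mapping
2. Identification of replicas
3. Access & audit control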


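Parallel mode Data Transfer – Client Initiated

[Figure: client-initiated parallel transfer (Sput -M) in eight numbered steps. Components: Sput client, SRBserver1 and SRBserver2 (each with an SRB agent), MCAT, and resource R. Labels: srbObjPut; return socket addr., port and cookie; connect to server; data transfer.]

MCAT functions shown in the figure:
1. Logical-to-physical mapping
2. Identification of replicas
3. Access & audit control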


Performance Enhancement (cont1)

• Container
  – Physical grouping of small files
  – For tape I/O or archival resources
  – Easy to use, transparent to users
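The container idea, many small files stored as one physical object with an index for retrieval, can be sketched like this. The layout (simple concatenation plus an offset/length index) is an assumption chosen for clarity; the slides do not describe SRB's actual container format.

```python
import io


def pack_container(files):
    """Concatenate small files into one blob.

    `files` maps member name -> bytes. Returns (blob, index), where
    index maps member name -> (offset, length) inside the blob.
    """
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


def read_member(blob, index, name):
    """Fetch one member without unpacking the others."""
    offset, length = index[name]
    return blob[offset:offset + length]
```

An archival resource then handles a single large object instead of thousands of tiny ones, while a lookup by name still returns the individual file.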


Data Replication

• An SRB file can have multiple replicas
• Replicas can be stored in different resources
• Sls -l mfile
  – fedsrbbrick8 0 demoResc 3029449 2005-07-29-15.37 % mfile
  – fedsrbbrick8 1 demoResc1 3029449 2005-07-29-21.28 % mfile
• Commands that use replicas
  – Sreplicate – replicate a file to the specified resource
  – Sbackupsrb – back up a file to the specified resource
  – SsyncD – synchronize the replicas of a file


PhyMove – move SRB files to another resource

• Move files to another resource without making another replica
• Normally used by admin to move files around
• Bulk phyMove – large number of small files
• Parallel I/O – large files
• Container – move files into container
• Heavily used by the BBSRC project for distributed archive
  – Files uploaded to local server
  – Files eventually moved to a central archival resource by admin


Performance Enhancement (cont2)

• Use of checksum

  – An MCAT metadata item associated with a file
  – Checksum routines are part of the server and client code
  – For verification and synchronization of data
  – Built into most data handling utilities
    • Sput, Sget, Srsync, Schksum
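How a stored checksum supports verification can be shown with a short sketch. The slides do not say which checksum algorithm SRB uses, so SHA-256 from Python's standard `hashlib` is an assumption here, and the function names are illustrative rather than SRB APIs.

```python
import hashlib


def file_checksum(path, algorithm="sha256", bufsize=1 << 20):
    """Compute a file's checksum in fixed-size blocks, so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, catalog_checksum, algorithm="sha256"):
    """Return True if the file still matches the checksum recorded in the catalog."""
    return file_checksum(path, algorithm) == catalog_checksum
```

The same comparison tells a synchronization tool whether a copy can be skipped: if the local and recorded checksums agree, the data need not be transferred again.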


Metadata in SRB

• SRB System Metadata
• Free-form Metadata (User-defined)
  – Attribute-Value-Unit Triplets…
• Extensible Schema Metadata
  – User Defined
  – Tables integrated into MCAT Core Schema
• External Database
• Metadata operations
  – Metadata Insertion through User Interfaces
  – Bulk Metadata Insertion
  – Template based Metadata Extraction
  – Query Metadata through well defined Interfaces
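Free-form attribute-value-unit metadata can be pictured with the minimal sketch below. The `AVU` tuple and `MetadataCatalog` class are hypothetical and keep everything in memory; MCAT itself is a database-backed catalog whose schema is not shown in these slides.

```python
from collections import defaultdict
from typing import NamedTuple


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: str = ""


class MetadataCatalog:
    """Toy store of user-defined triplets keyed by logical path."""

    def __init__(self):
        self._avus = defaultdict(list)

    def add(self, logical_path, attribute, value, unit=""):
        self._avus[logical_path].append(AVU(attribute, value, unit))

    def bulk_add(self, rows):
        """Insert many (path, attribute, value, unit) rows in one call."""
        for logical_path, attribute, value, unit in rows:
            self.add(logical_path, attribute, value, unit)

    def get(self, logical_path):
        return list(self._avus.get(logical_path, []))

    def query(self, attribute, value=None):
        """Return paths having the attribute, optionally with a given value."""
        hits = []
        for path, avus in self._avus.items():
            if any(a.attribute == attribute and (value is None or a.value == value)
                   for a in avus):
                hits.append(path)
        return sorted(hits)
```

A query such as `catalog.query("site", "SLAC")` then finds files by what they are rather than by where they live.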


SRB Proxy operation

• Perform operations on the server on behalf of the user
  – Operation runs where the data is located
  – File format conversion, md5 checksum, subsetting and filtering, etc.
• Two types of proxy operations
  – Proxy commands
    • Server forks and execs an executable/script on the server
    • Output is piped back to the client
  – Proxy functions
    • Functions built into the server
    • Well-defined framework for writing proxy functions
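A proxy command, where the server runs an executable next to the data and pipes its output back, can be approximated with Python's standard `subprocess` module. This is a sketch under assumptions: the `ALLOWED_COMMANDS` table, the command name `sha256`, and the use of a Python one-liner as the "executable" are all invented for a portable example and say nothing about how SRB registers or runs its proxy commands.

```python
import subprocess
import sys

# Hypothetical table of commands the server is willing to run.
ALLOWED_COMMANDS = {
    "sha256": [
        sys.executable, "-c",
        "import sys, hashlib; "
        "sys.stdout.write(hashlib.sha256(sys.stdin.buffer.read()).hexdigest())",
    ],
}


def run_proxy_command(name, input_bytes=b"", timeout=30):
    """Spawn the named command, feed it data, and return its piped output."""
    argv = ALLOWED_COMMANDS[name]  # KeyError for commands not in the table
    result = subprocess.run(
        argv,
        input=input_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,  # raise if the command exits with a non-zero status
    )
    return result.stdout
```

Only the command's small output crosses the network, which is the benefit of running the operation where the data is located.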


HDF5-SRB Model Data flow

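[Figure: request/response flow between the client and the SRB server, which calls the HDF5 library to access an HDF5 file]

Client API: srbObjRequest(void *obj, int objID)
Server API: srbObjProcess(void *obj, int objID)

1. packMsg()
2. unpackMsg()
3. H5Obj::op()
4. Access file
5. packMsg()
6. unpackMsg()

Components: SRB Server, HDF5 Library, HDF5 file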
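The six-step pack/unpack round trip can be mimicked in a few lines. This is only an analogy: the real interface is the C API named on the slide, whereas the sketch below uses JSON as the wire format, a dictionary in place of an HDF5 file, and made-up names (`pack_msg`, `server_process`, `client_request`, `read_slice`).

```python
import json

# In-memory stand-in for an HDF5 file: dataset name -> list of numbers.
FAKE_FILE = {"/temperature": [20.5, 21.0, 19.8, 22.1]}


def pack_msg(obj):
    """Serialize a request or reply object for the wire."""
    return json.dumps(obj).encode("utf-8")


def unpack_msg(wire):
    """Rebuild the object on the receiving side."""
    return json.loads(wire.decode("utf-8"))


def op_read_slice(args):
    """Server-side operation: read part of a dataset."""
    data = FAKE_FILE[args["dataset"]]
    return data[args["start"]:args["stop"]]


OPERATIONS = {"read_slice": op_read_slice}


def server_process(wire):
    request = unpack_msg(wire)                                        # step 2
    result = OPERATIONS[request["op"]](request["args"])               # steps 3-4
    return pack_msg({"obj_id": request["obj_id"], "result": result})  # step 5


def client_request(obj_id, op, args, send=server_process):
    wire = pack_msg({"obj_id": obj_id, "op": op, "args": args})       # step 1
    reply = unpack_msg(send(wire))                                    # step 6
    return reply["result"]
```

Here `send` is a direct function call; in a real deployment it would be the network hop between client and server.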


Zone Federation

• Federation of multiple MCATs
  – MCAT Zone
    • Defines a federation of SRB resources controlled by a single MCAT
    • Each Zone has full control of its own administrative domain
    • Each Zone can operate entirely independently from other Zones
• Data and resource sharing across Zones
  – Use storage resources in foreign Zones
  – Share data across Zones
  – Copy data across Zones


Peer to peer Federated MCAT Zone

[Figure: three peer MCAT zones. MCAT1 with Server1.1 and Server1.2; MCAT2 with Server2.1 and Server2.2; MCAT3 with Server3.1]


SRB Client Implementations

• A set of Basic APIs
  – Over 160 APIs
  – Used by all clients to make requests to servers
• Scommands
  – UNIX-like command line utilities for UNIX and Windows platforms
  – Over 60 - Sls, Scp, Sput, Sget, …


SRB Client Implementations (cont)

• inQ – Windows GUI browser

• Jargon – Java SRB client classes
  – Pure Java implementation

• mySRB – Web-based GUI
  – Runs in a web browser

• Java Admin Tool
  – GUI for User and Resource management

• Matrix – Web service for SRB work flow


inQ Windows GUI


MySRB – Web Based SRB Interface

• SRB Browser

• Advanced Metadata manipulation


SRB Usage Model

• Various Usage models

• Specific Usages
  – SLAC’s Babar experiment
  – UK eScience BBSRC
  – BIRN


SRB Configuration – Peer-to-peer Data Grid

[Figure: four resource servers connected peer-to-peer]

• Data sharing, no central resource
• Projects – NARA, BIRN


SRB Configuration - Exploding Star

[Figure: one source server feeding five satellite servers]

• Data source – physics experiment
• Projects – Babar, KEK


SRB Configuration - Imploding Star

[Figure: five satellite source servers feeding a central cache server and a central archival server]

• Archival Storage Model
• Projects – UK eScience BBSRC


Peer to peer Federation of MCAT Zone

[Figure: three peer MCAT zones. MCAT1 with Server1.1 and Server1.2; MCAT2 with Server2.1 and Server2.2; MCAT3 with Server3.1]


Summary of the Babar Project

• Preproduction evaluation – 2003
  – Highlight of Wilko Kroeger’s (SLAC) talk at IEEE 2003
  – Title - “Distributing Babar Data using SRB”

• BaBar computing resources are geographically distributed across 5 Tier-A centers: GridKA (D), IN2P3 (F), INFN-Padova (I), RAL (UK), SLAC (USA)

• Data have to be replicated to the Tier-A sites.
• Number of files is 1M. Size is 100s of TB.


Babar Preproduction – SRB Usage

• Allows transparent access to files.
  – Don’t need to know host or storage medium (disk, tape).

• Accessing files/collections by attributes.
  – Find files that were produced at a certain time or site.
  – Find collections from a particular run period.

• Preproduction test – 2 weeks of MCAT and file transfer tests


Babar Production Update

• Transferred ~70 TB and 140K files

• Peak rate ~2 TB/day. Average rate ~1 TB/day

• Downtime encountered
  – Hardware problem
  – DB updates

• Plan to federate SLAC and IN2P3 Zones
  – IN2P3 picks up some of the load

• Thanks to Wilko Kroeger (SLAC) and Jean-Yves Nief (In2p3) for the info
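For a sense of scale, the figures quoted on this slide can be turned into an average file size and a sustained rate. The short calculation below assumes the quoted volumes are decimal terabytes (10^12 bytes); the slide does not spell out the unit, so treat the results as order-of-magnitude estimates.

```python
TB = 10**12               # decimal terabyte (assumption about the slide's unit)
SECONDS_PER_DAY = 86_400


def average_file_size(total_bytes, n_files):
    return total_bytes / n_files


def rate_bytes_per_second(bytes_per_day):
    return bytes_per_day / SECONDS_PER_DAY


def days_to_transfer(total_bytes, bytes_per_day):
    return total_bytes / bytes_per_day


avg_size = average_file_size(70 * TB, 140_000)
avg_rate = rate_bytes_per_second(1 * TB)
peak_rate = rate_bytes_per_second(2 * TB)
days = days_to_transfer(70 * TB, 1 * TB)

print(f"average file size: {avg_size / 1e6:.0f} MB")
print(f"average rate:      {avg_rate / 1e6:.1f} MB/s")
print(f"peak rate:         {peak_rate / 1e6:.1f} MB/s")
print(f"days at average:   {days:.0f}")
```

Under that assumption the average file is about 500 MB, and 1 TB/day corresponds to roughly 12 MB/s sustained around the clock.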


UK eScience BBSRC

• Archival of Biological Data from 16 sites to a central resource

• Data ingested into local resources

• Admin uses bulk Sphymove to move data from local resources to a central cache

• Moves data into containers

• Replicates containers to cache resource at RAL

• Replicates containers to ADS archival at RAL

• Removes cache copies


UK eScience BBSRC

• Develop some software on their own
  – User interface using Jargon
    • GUI
    • Users not exposed to all SRB functionalities
  – Request tracker – track data movement after ingestion

• Status
  – Project started at beginning of this year
  – Just done with pilot program using SRB 3.2
  – Upgrading to 3.3 for production


Biomedical Informatics Research Network (BIRN)

• Major collaboration with SDSC; several of the project’s Co-Investigators and Co-PIs are at SDSC.

• SRB provides the ability to transparently share data across remote sites.


The BIRN SRB Data Grid


The BIRN Data Grid


SRB in BIRN

[Figure: SRB in the BIRN architecture. Labels: BIRN Toolkit; Mediator; Viewing/Visualization; Queries/Results; Applications; Data Management; Collaboration; Data Model; Data Access; Data Grid; Computational Grid; NMI; Grid Management; Globus; GridPort; Scheduler; SRB; MCAT; Distributed Resources; File System; HPSS; Database]