SHARING DATA USING THE STORAGE RESOURCE BROKER (SRB)

Ken Wong
The Applied Research Laboratory (ARL) and The Department of Computer Science
Washington University in St. Louis
[email protected], http://www.arl.wustl.edu/~kenw


OUTLINE OF TALK

SRB and HPSS Overview
SRB Concepts and Examples
Alternatives to SRB
Other SRB Projects
Our Experience


WU DATA CACHE AND THE SRB

[Figure: network diagram. Labels: ghidorah (MCAT); hpss; v1 (SUMS); brainmap (1.36 TB); petsun-23, 24 (scanners); Archives. Networks: vBNS, ATM. Link speeds: 622 Mbps, 155 Mbps, 45 Mbps.]


WU DATA CACHE

1.4 TB DEC Storage Works RAID (Level 5)
  – 2-processor Sun Enterprise 450, 1 GB main memory
  – 622 Mbps ATM interface, 10/100 Mbps Ethernet interface
  – 1.7 TB (raw) = 48 x 9 + 24 x 18 + 24 x 36 GB
Backups
  – Incremental: Tue, Wed, Thu
  – Full: Mon, Fri, Sat
Data Volume
  – Used: 560 GB
  – Burn Rate: 7.0 GB/week (This Year); 5.5 GB/week (Lifetime)
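The capacity and burn-rate figures on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the "1.4 TB" usable figure means 1400 GB and that the burn rate stays constant, neither of which the slide states.

```python
# Disk groups from the slide: (number of disks, nominal size in GB).
DISK_GROUPS = [(48, 9), (24, 18), (24, 36)]


def raw_capacity_gb(groups):
    """Sum count * size over all disk groups."""
    return sum(count * size for count, size in groups)


def weeks_until_full(usable_gb, used_gb, burn_gb_per_week):
    """Weeks of headroom left, assuming a constant burn rate."""
    return (usable_gb - used_gb) / burn_gb_per_week


raw_gb = raw_capacity_gb(DISK_GROUPS)  # 432 + 432 + 864 GB
usable_gb = 1400                       # assumption: "1.4 TB" read as 1400 GB

print(raw_gb)                                             # 1728
print(weeks_until_full(usable_gb, 560, 7.0))              # 120.0
print(round(weeks_until_full(usable_gb, 560, 5.5), 1))    # 152.7
```

Under these assumptions the cache has roughly two to three years of headroom, depending on which burn rate holds.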


INSTALLATION HISTORY

Jun/Jul 98: Sun host and then 432 GB RAID
  – 3 year extended warranty and 3 year maintenance on controllers
Sep 98: SRB
Aug 99: 24 x 18.2 GB disks
  – 3 year maintenance upgrade on controllers
Dec 99: 24 x 36.4 GB disks


BRAINMAP DATA GROWTH

[Figure: Capacity and Usage in gigabytes (0 to 1400) plotted against week (0 to 90).]


BRAINMAP DISK USAGE

[Figure: disk usage in gigabytes (0 to 40) per file system.]


STORAGE RESOURCE BROKER (SRB)

[Figure: SRB architecture diagram. Labels: Application (SRB Client); SRB Server; MCAT; Distributed Storage Resources (DB2, Oracle, Illustra, ObjectStore; HPSS, UniTree; Unix, ftp).]


HIGH-PERFORMANCE STORAGE SYSTEM

[Figure: HPSS configuration diagram. Labels: 0.5 Terabyte Disk Subsystem; 8 Servers (disk and tape movers); 1 Server (HiPPi mover); OC12-ATM, HiPPi, SP Switch; Network; 3 Silos; 330 Terabyte Tape Library; 8 KB Chunk; IBM SP Front End.]


HIGH-PERFORMANCE STORAGE SYSTEM

Current Usage
  – 150 TB (terabytes; trillion bytes)
  – 15 million files
Current Capacity: 500 TB of data (assuming a compression ratio of 1.5)
Projected Capacity: 1 PB (10^15 bytes) within a year
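A quick sanity check of these numbers, offered as a sketch rather than as the speaker's own calculation. It assumes the 500 TB capacity figure is the 330 TB tape library from the previous slide multiplied by the stated compression ratio, and that a terabyte is 10^12 bytes.

```python
TB = 10**12  # decimal terabyte (assumption, matching "trillion" on the slide)
MB = 10**6

tape_library_tb = 330      # from the HPSS configuration diagram
compression_ratio = 1.5    # stated on this slide
used_tb = 150
file_count = 15_000_000

effective_capacity_tb = tape_library_tb * compression_ratio
mean_file_size_mb = used_tb * TB / file_count / MB
fraction_used = used_tb / 500

print(effective_capacity_tb)  # 495.0, close to the quoted 500 TB
print(mean_file_size_mb)      # 10.0
print(fraction_used)          # 0.3
```

So the archive holds files averaging about 10 MB each and is roughly 30% full.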


SRB CONCEPTS

SRB Server: Responds to SRB requests from clients
MCAT (Metadata Catalogue)
  – Information about data sets and collections (Oracle DB)
SRB Client
SRB Resource: A logical storage resource
  – Example: HPSS storage and container cache
Data Set: A file registered with the SRB
Collection: Group of registered data sets/collections
Container: Data sets stored as one physical unit
  – Container cache can be remote from HPSS


SRB SYSTEM CAPABILITIES

Collection-based management of data sets
Persistent identifiers for data sets
Management of data sets (copies or replicas)
Containers for aggregating data sets before archiving
Support for grid security infrastructure authentication
  – Uses public key certificates
Support for integrating data set collections across file systems, archives, and databases


SRB INTERFACES

Scommands (Unix commands)
  – Sinit/Sexit, Sput/Sget, Smkdir/Srmdir, Sls/Srm
  – Smkcont/Ssyncont, Slscont/Srmcont
  – SgetR/SgetU/SgetD
C-Programming API
Browser


PUBLISHING A DATA SET

Define the SRB environment (.srb/.MdasEnv file)

mdasCollectionHome '/home/kenw.neurodb'
mdasDomainHome 'neurodb'
srbUser 'kenw'
srbHost 'ghidorah.sdsc.edu'
defaultResource 'cont-sdsc'

Interact with SRB server

% Sinit # Connect to SRB server
% Sls # See what is in my collection
% Sput ./mydata brain043 # Copy file to SRB space
% Schmod r public npaci brain043 # Give read access
% SgetD -a brain043 # Check access permissions
% Sexit # Disconnect from SRB server
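For scripts that need to know where a published data set lands, the environment file can be read programmatically. The following is a minimal sketch that assumes the file consists only of `key 'value'` lines as shown on the slide; the real .MdasEnv syntax may allow more than this. The helper names `parse_mdas_env` and `data_set_path` are hypothetical, not part of SRB.

```python
import shlex


def parse_mdas_env(text):
    """Parse lines of the form  key 'value'  into a dict (format assumed from the slide)."""
    settings = {}
    for line in text.splitlines():
        # shlex strips the quotes and ignores anything after a '#'.
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 2:
            key, value = tokens
            settings[key] = value
    return settings


def data_set_path(settings, name):
    """Logical path of a data set stored in the user's home collection."""
    return f"{settings['mdasCollectionHome']}/{name}"


SAMPLE = """
mdasCollectionHome '/home/kenw.neurodb'
mdasDomainHome 'neurodb'
srbUser 'kenw'
srbHost 'ghidorah.sdsc.edu'
defaultResource 'cont-sdsc'
"""

env = parse_mdas_env(SAMPLE)
print(env["srbHost"])                   # ghidorah.sdsc.edu
print(data_set_path(env, "brain043"))   # /home/kenw.neurodb/brain043
```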


GETTING A DATA SET (SCOMMANDS)

% Sinit

% Scd /home/colin.neurodb # go to Colin's collection

% Sls -l # see what is there

% Sget colin_avg20_1.0mm_at0.5mm.mnc . # copy to this directory

% Sexit


JINGHUA ZHOU'S WORK

Experiments
  – Test SRB functionality
  – Measure performance of basic SRB functions
Archiving (Perl Scripts)
  – Archive an arbitrary Unix directory to HPSS
  – Verify files were archived
  – Recover files from archival storage


RETRIEVAL EXPERIMENTS

Load 100 MB container with 1 MB files
Measure time required to retrieve N files
Divide time by N to get average time for each file
Repeat after container has been moved to tape
Repeat above steps for 10 MB container (instead of 100 MB)
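The measurement procedure above (time N retrievals, divide by N) can be sketched as a small timing harness. This is not the script used in the experiments; `fetch` is a hypothetical callable standing in for whatever performs one retrieval (for example, a wrapper that runs Sget), and `fake_fetch` only simulates it with a short sleep.

```python
import time


def average_retrieval_time(fetch, names):
    """Retrieve every name with fetch() and return the mean seconds per file."""
    if not names:
        raise ValueError("need at least one file name")
    start = time.perf_counter()
    for name in names:
        fetch(name)
    elapsed = time.perf_counter() - start
    return elapsed / len(names)


def fake_fetch(name):
    """Stand-in for a real retrieval; sleeps instead of contacting a server."""
    time.sleep(0.01)


names = [f"file{i:03d}" for i in range(5)]
avg = average_retrieval_time(fake_fetch, names)
print(f"{avg:.3f} s per file")
```

Running the same harness for several values of N, before and after the container has migrated to tape, would give the kind of curves summarized on the following slides.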


AVERAGE RETRIEVAL TIME (OLD FILES)

[Figure: average retrieval time in seconds (0 to 180) versus number of 1 MB files (0 to 30), for a 10 MB container and a 100 MB container.]


AVERAGE RETRIEVAL TIME (FRESH FILES)

[Figure: average retrieval time in seconds (0 to 70) versus number of 1 MB files (0 to 40), for a 10 MB container and a 100 MB container.]


COMMENTS

SRB Overhead Per Object (File)
  – 5-7 seconds (Early Measurements)
  – 2-4 seconds (Recent Measurements)
Tape Overhead Per Object (File): 100 seconds
TCP Connection Needs Tuning
  – Asymmetric routing, bottleneck, ...
  – snoop and tcptrace analysis
  – Max Sget effective bandwidth is 8 Mbps
  – Max Sput effective bandwidth is 4 Mbps
  – Goal is 32 Mbps
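To see how the per-object overhead interacts with the bandwidth figures, a simple cost model helps: total time is a fixed overhead plus size divided by bandwidth. This is a deliberately crude sketch, not a model from the talk. It ignores TCP slow start and tape latency, takes 1 MB as 10^6 bytes, and uses 3 seconds as an assumed midpoint of the recent 2-4 second overhead.

```python
def transfer_seconds(size_mb, bandwidth_mbps, overhead_s):
    """Fixed per-object overhead plus size / bandwidth (8 bits per byte)."""
    return overhead_s + (size_mb * 8) / bandwidth_mbps


def effective_mbps(size_mb, bandwidth_mbps, overhead_s):
    """Throughput actually seen once the overhead is included."""
    return (size_mb * 8) / transfer_seconds(size_mb, bandwidth_mbps, overhead_s)


OVERHEAD_S = 3   # assumption: midpoint of the 2-4 s recent measurements
SGET_MBPS = 8    # max Sget effective bandwidth from the slide

for size_mb in (1, 10, 100):
    seconds = transfer_seconds(size_mb, SGET_MBPS, OVERHEAD_S)
    rate = effective_mbps(size_mb, SGET_MBPS, OVERHEAD_S)
    print(f"{size_mb:>3} MB: {seconds:6.1f} s, {rate:.2f} Mbps effective")
```

Under this model a 1 MB file spends three quarters of its 4 seconds in overhead, which is one reason aggregating small files into containers pays off.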


ARCHIVING

Reflect Unix directory structure in SRB collection structure
archiver NPACI/Unix account
Look for inactive files within a directory
Multiple versions handled by appending modification date to file name
Log all archival requests
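The archiving rules above can be illustrated with a short planning routine. The actual scripts were written in Perl and are not shown in the talk, so everything here is an assumption made for illustration: "inactive" is taken to mean "not modified in the last idle_days days", the date suffix format is YYYYMMDD, and the collection path `/home/archiver.neurodb` is hypothetical. The sketch only plans the work; it does not contact an SRB server.

```python
import os
import time
from datetime import datetime


def archive_name(path):
    """Append the file's modification date (YYYYMMDD, an assumed format) to its name."""
    stamp = datetime.fromtimestamp(os.path.getmtime(path)).strftime("%Y%m%d")
    return f"{os.path.basename(path)}.{stamp}"


def plan_archive(root, collection_home, idle_days, now=None):
    """Yield (local_path, srb_path) for files not modified in the last idle_days days."""
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        # Mirror the Unix directory structure under the collection.
        rel = os.path.relpath(dirpath, root)
        collection = collection_home
        if rel != ".":
            collection = f"{collection_home}/{rel.replace(os.sep, '/')}"
        for filename in sorted(filenames):
            local = os.path.join(dirpath, filename)
            if os.path.getmtime(local) <= cutoff:
                yield local, f"{collection}/{archive_name(local)}"


if __name__ == "__main__":
    # Print the plan for the current directory; each line doubles as a log entry.
    for local, srb in plan_archive(".", "/home/archiver.neurodb", idle_days=30):
        print(f"{local} -> {srb}")
```

Because the modification date is part of the archived name, re-archiving a file after it changes produces a new entry instead of overwriting the earlier version.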


CURRENT WORK

TCP Tuning and SRB 1.1.7 Performance

Enhance Archival Scripts
  – Improve usability
  – Resilience to HPSS Blackouts
  – Parallel Archiving


RECENT SRB DEVELOPMENTS

Data Cutter
GSI authentication
  – Uses X.509 certificates
Container redesign
  – To handle multiple archival and cache resources
Remote proxy (Spcommand)
Textual annotation stored in MCAT


ALTERNATIVES TO SRB

Distributed Database
  – Does not deal with file data; requires other means of accessing files
  – A heavyweight solution; i.e., expense (money, expertise)
  – Need instances running wherever you want to have storage
  – If it is only metadata, then a case can be made but ...
    • Tied to a particular vendor at all sites
    • Have to cross link all the databases
AFS (Andrew File System)
  – Doesn't have concept of application metadata
    • SRB has some metadata facilities now and more to come
    • Comments, annotations, user-controlled metadata
  – SRB provides a uniform authentication and authorization system


TOP SRB PROJECTS (SUMMARY)

2-Micron All Sky Survey
  – 10 TB of data from Caltech
  – 5 million images sorted into 130,000 containers
Digital Embryo Project (NLM funded)
  – Digitizing existing slides for storage in HPSS
Particle Physics Data Grid (DOE funded)
  – Data mining
Information Power Grid (NASA funded)
Data Visualization Corridor (DOE funded)
  – Handles terabyte sized data sets for interactive viewing
Neuroscience Data Set Federation


TOP SRB PROJECTS

2-Micron All Sky Survey (2MASS)
  – 10 TB of data from Caltech (3 TB done)
  – 5 million images sorted into 130,000 containers
  – SRB container technology used to manage the aggregation process on a disk cache
  – Replicate Caltech data
Digital Embryo Project (NLM funded)
  – Digitizing existing slides for storage in HPSS
  – SRB used to manage data movement, aggregation into containers, and metadata catalog
  – Queries against the collection
Particle Physics Data Grid (DOE funded)
  – Replicate data sets that are pulled into local disk caches


TOP SRB PROJECTS

Information Power Grid (NASA funded)
  – SRB used to support data mining against a distributed data set collection
  – Data transmission rate: 58 Mbps from SDSC to NASA Ames
  – Put collection management in front of storage archives through use of the MCAT
Data Visualization Corridor (DOE funded)
  – SRB has been integrated with the Data Cutter system
    • For remote manipulation of data sets
  – Handles terabyte sized data sets for interactive viewing

Neuroscience Data Set Federation


CONCLUDING REMARKS

Documentation
  – http://www.sdsc.edu/DICE/SRB/index.html
  – http://www.arl.wustl.edu/kenw/npaci/index.html
Software
  – Follow SRB link
  – Get PGP key from SDSC
  – Can install subset (e.g., client only)

Applications?


WU DATA CACHE

[Figure: network diagram. Sites: sdsc.edu, wustl.edu. Labels: ghidorah (MCAT); hpss; brainmap (1.3 TB); stp, v1 (SUMS); petsun-23 (Scanners). Networks: vBNS, ATM. Link speeds: 622 Mbps, 155 Mbps, 45 Mbps. Users: UCSD, UCLA, Johns Hopkins, U. Montana, Caltech (12 Major Users).]