Tactical Storage: Simple, Secure, and Semantic Access to Remote Data



Post on 12-Jan-2016


DESCRIPTION

Tactical Storage: Simple, Secure, and Semantic Access to Remote Data. Prof. Douglas Thain University of Notre Dame http://www.cse.nd.edu/~dthain. Plentiful Computing Power. http://www.cs.wisc.edu/condor/map. As of 25 April 2006... Condor Worldwide: 56,682 CPUs / ??? TB / 1758 sites

TRANSCRIPT

Page 1: Tactical Storage: Simple, Secure, and Semantic Access to Remote Data

Tactical Storage: Simple, Secure, and Semantic Access to Remote Data

Prof. Douglas Thain

University of Notre Dame

http://www.cse.nd.edu/~dthain

Page 4

Plentiful Computing Power

As of 25 April 2006...

– Condor Worldwide: 56,682 CPUs / ??? TB / 1758 sites
– TeraGrid: 15,328 CPUs / 220 TB / 6 sites
– Open Science Grid: 21,156 CPUs / 83 TB / 61 sites
– EGEE Grid: Lots???

http://www.cs.wisc.edu/condor/map

Page 5

Complex Ecology of Storage

Diagram: a shared filesystem on shared disks; independent cluster disks (a private disk on each node); and remote data reached through HTTP, FTP, RFIO, gLite, SRB, SCP, RSYNC, ...

Page 6

Problems Accessing Data

Large Burden on the User
– User may not be able/willing to state files in advance.
– Different services/protocols available at different sites.
– Programs not modified to take advantage of services.

Different access modes for different purposes.
– File transfer: preparing system for intended use.
– File system: access to data for running jobs.

Resources go unused.
– Disks on each node of a cluster.
– Unorganized resources in a department/lab.
– Would like to combine disks into larger structures.

A global file system can’t satisfy everyone!
– (Global means different things to different people.)
– Both a technical and social problem.

Page 7

What’s the Problem?

We often assume that the site administrator is responsible for making the site comfortable for the user. (Not possible on the grid!)

Rather, the user should be able to bring along a mechanism to access multiple independent (remote?) data sources.

Of course, we have to make it easy!

Page 8

Tactical Storage Systems (TSS)

A TSS allows any node to serve as a file server or as a file system client.

All components can be deployed without special privileges, but with security.

Users can build up complex structures.
– Filesystems, databases, caches, ...
– Admins need not know/care about larger structures.

Two Independent Concepts:
– Resources: the raw storage to be used.
– Abstractions: the organization of storage.

Page 9

Diagram: two administrative domains. In the cluster, the cluster administrator controls policy on all storage in the cluster; on the workstations, each owner controls policy on their own machine. Every machine runs a fileserver on top of its ordinary UNIX filesystem. Applications running under Parrot combine these fileservers into a Distributed Database Abstraction, a Distributed Filesystem Abstraction, or a Simple Filesystem, or use them for plain file transfer.

Page 10

Key Properties

Tactical Storage is Simple:
– Appears as an ordinary filesystem.
– Applies to unmodified applications and data w/out code changes, relinking, kernel modules, etc...

Tactical Storage is Secure:
– Authentication with standard GSI or Kerberos.
– Rich distributed access control system.

Tactical Storage is Semantic:
– Name data by meaning, not by location.
– Supports external name resolution mechanisms.

Page 17

Access Control in File Servers

Unix Security is not Sufficient
– No global user database possible/desirable.
– Mapping external credentials to Unix gets messy.

Instead, Make External Names First-Class
– Perform access control on remote, not local, names.
– Types: Globus, Kerberos, Unix, Hostname, Address

Each directory has an ACL:

globus:/O=NotreDame/CN=DThain RWLA
kerberos:[email protected] RWL
hostname:*.cs.nd.edu RL
address:192.168.1.* RWLA
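The ACL format above can be illustrated with a small checker. This is a minimal Python sketch, not the file server's actual code. It assumes one `subject rights` pair per line, shell-style `*` wildcards in the subject, rights letters read as R = read, W = write, L = list, A = admin, and that a request is granted if any identity the client has authenticated as matches an entry carrying the requested letter. The names `AclEntry`, `parse_acl`, and `allowed` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class AclEntry:
    subject: str  # e.g. "hostname:*.cs.nd.edu"; may contain * wildcards
    rights: str   # e.g. "RL": one letter per right


def parse_acl(text: str) -> list[AclEntry]:
    """Parse one 'subject rights' pair per non-blank line (assumed format)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        subject, rights = line.rsplit(None, 1)
        entries.append(AclEntry(subject, rights.upper()))
    return entries


def allowed(acl: list[AclEntry], identities: list[str], right: str) -> bool:
    """True if any authenticated identity matches an entry granting `right`.

    Note: fnmatchcase also treats [...] as a character class, so a subject
    containing '[' would need escaping in a real implementation.
    """
    return any(
        right in entry.rights and fnmatchcase(identity, entry.subject)
        for entry in acl
        for identity in identities
    )


ACL = parse_acl("""
globus:/O=NotreDame/CN=DThain RWLA
hostname:*.cs.nd.edu RL
address:192.168.1.* RWLA
""")
```

For example, a client connecting from lab3.cs.nd.edu with address 10.0.0.5 presents the identities `hostname:lab3.cs.nd.edu` and `address:10.0.0.5`; `allowed(ACL, ..., "R")` is true through the hostname entry, while a request for `"W"` is refused. Matching the external names directly, with no mapping to local Unix accounts, is the point of the slide.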

Page 18

Distributed Group ACLs

Diagram: seven fileservers, each exporting an ordinary UNIX filesystem. Group lists for the Physics Group, the Chemistry Group, and the Lab 5 Group are kept on different servers, and directory ACLs refer to those groups. Two pairs of applications share data directories whose ACLs are:

– ACL: Lab 5 RW, Chemistry R
– ACL: Physics RW, Lab 5 R
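The group ACLs in the diagram can be sketched as follows. This assumes, hypothetically, that an ACL entry names a group, that the group's member list lives on some other server, and that the file server fetches and caches that list when checking a request. `GROUP_SERVERS` stands in for the remote group lists, and the user names are invented for illustration.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical stand-in for group lists stored on separate servers.
GROUP_SERVERS: dict[str, set[str]] = {
    "Physics": {"carol", "dave"},
    "Chemistry": {"bob"},
    "Lab 5": {"alice", "erin"},
}


@lru_cache(maxsize=None)
def fetch_group(name: str) -> frozenset[str]:
    """Pretend remote lookup; cached so each group list is fetched once."""
    return frozenset(GROUP_SERVERS.get(name, ()))


def group_allowed(acl: list[tuple[str, str]], user: str, right: str) -> bool:
    """True if `user` belongs to some group whose ACL entry grants `right`."""
    return any(right in rights and user in fetch_group(group) for group, rights in acl)


# The two data directories from the diagram.
LAB5_DATA = [("Lab 5", "RW"), ("Chemistry", "R")]
PHYSICS_DATA = [("Physics", "RW"), ("Lab 5", "R")]
```

Here a Chemistry member can read but not write the Lab 5 data, and a Lab 5 member can read but not write the Physics data. A real deployment would also have to expire cached lists when a group changes; `lru_cache` as used here never does.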

Page 19

Semantic Data Access

Diagram: an application (Appl) runs under Parrot with the following mount list:

/usr/local = /chirp/host5.nd.edu/software
/tmp = /chirp/host9.nd.edu/scratch
/data = /gsiftp/ftp.nd.edu/mydata
/db = resolver:find_db

/usr/local, /tmp, and /data are served by host5, host9, and an FTP server. Names under /db are handed to the find_db resolver:
– Where is /db/dir/523?
– It’s at /ftp/ftp.infn.it/db/xz
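The mount list above amounts to a prefix table consulted on every path. The Python sketch below models only the name translation, not Parrot's implementation: the longest matching prefix wins, a prefix matches only at a path-component boundary, and a callable target plays the role of an external resolver. `resolve`, `find_db`, and its lookup table are hypothetical; the table holds just the one answer shown on the slide.

```python
from __future__ import annotations

from typing import Callable, Union

Target = Union[str, Callable[[str], str]]


def resolve(mounts: dict[str, Target], path: str) -> str:
    """Translate a logical path using the longest matching mount prefix."""
    best = None
    for prefix in mounts:
        base = prefix.rstrip("/")  # "/" becomes "", so it matches every absolute path
        if path == prefix or path.startswith(base + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return path  # not mounted: leave the path on the local filesystem
    target = mounts[best]
    if callable(target):
        return target(path)  # hand the whole name to an external resolver
    return target + path[len(best.rstrip("/")):]


# Hypothetical resolver: where does a logical database object live?
_DB_LOCATIONS = {"/db/dir/523": "/ftp/ftp.infn.it/db/xz"}


def find_db(path: str) -> str:
    return _DB_LOCATIONS[path]


MOUNTS: dict[str, Target] = {
    "/usr/local": "/chirp/host5.nd.edu/software",
    "/tmp": "/chirp/host9.nd.edu/scratch",
    "/data": "/gsiftp/ftp.nd.edu/mydata",
    "/db": find_db,
}
```

With this table, `/usr/local/bin/sim` becomes `/chirp/host5.nd.edu/software/bin/sim`, while `/tmpfile` is left alone because `/tmp` matches only a whole path component. A single `/` entry, as in the remote root mount `/=/chirp/fileserver/rootdir`, redirects every absolute path.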

Page 21

Remote Database Access

Diagram: at the execution site a script runs sim.exe, which is linked with libdb.so, under Parrot. Parrot talks to a remote file server across the WAN using the Simple FS protocol with GSI authentication, and the file server keeps the DB data in its ordinary file system.

HEP Simulation Needs Direct DB Access
– App linked against Objectivity DB.
– Objectivity accesses filesystem directly.
– How to distribute application securely?

Solution: Remote Root Mount via Parrot:
parrot -M /=/chirp/fileserver/rootdir

DB code can read/write/lock files directly.

Credit: Sander Klous @ NIKHEF

Page 22

Remote Application Loading

Diagram: appl runs under Parrot, which loads liba.so, libb.so, and libc.so over HTTP, through proxies, from the filesystem of an HTTP server, selecting several MB from 60 GB of libraries.

Modular Simulation Needs Many Libraries
– Devel. on workstations, then ported to grid.
– Selection of library depends on analysis tech.
– Constraint: Must use HTTP for file access.

Solution: Dynamic Link with TSS+HTTP:
– /home/cdfsoft -> /http/dcaf.fnal.gov/cdfsoft

Credit: Igor Sfiligoi @ Fermi National Lab
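Only several MB out of 60 GB need to cross the network because a file is transferred only when the application actually opens it. A minimal Python sketch of that fetch-on-first-open behavior follows, with a local cache standing in for the proxies. The `fetch` callable is an assumption (in practice it might wrap `urllib.request.urlopen` against the HTTP server), and `LazyRemoteFiles` and `REPOSITORY` are hypothetical.

```python
from __future__ import annotations

from typing import Callable


class LazyRemoteFiles:
    """Fetch a remote file on first read, then serve it from a local cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._cache: dict[str, bytes] = {}
        self.bytes_transferred = 0

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            data = self._fetch(path)  # the network transfer happens only here
            self._cache[path] = data
            self.bytes_transferred += len(data)
        return self._cache[path]


# Hypothetical repository: 100 libraries of 1,000 bytes each.
REPOSITORY = {f"/cdfsoft/lib{i}.so": bytes(1000) for i in range(100)}

if __name__ == "__main__":
    files = LazyRemoteFiles(REPOSITORY.__getitem__)
    for lib in ["/cdfsoft/lib1.so", "/cdfsoft/lib7.so", "/cdfsoft/lib1.so"]:
        files.read(lib)
    print(files.bytes_transferred)  # 2000: two distinct libraries, each fetched once
```

The amount transferred tracks what the application touches, not the size of the repository, which is why a modular simulation can draw on a very large library collection.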

Page 23

Technical Problem

HTTP is not a filesystem! (No directories)
– Advantages: Firewalls, caches, admins.

Diagram: the HTTP server holds a tree in which root contains etc, home, and bin, and home contains alice, babar, and cms. When the application (Appl) calls opendir(/home), Parrot's HTTP module can only send GET /home HTTP/1.0, and the server answers with an HTML page (<HTML><HEAD>...<H1>...) rather than a directory listing.

Page 24

Technical Problem

Solution: Turn the directories into files.
– Can be cached in ordinary proxies!
– Hierarchical SHA1 integrity check.

Diagram: makehttpfs writes a .dir file into each directory of the HTTP server's tree; the .dir file in /home lists alice, babar, and cms. Parrot's HTTP module now turns opendir(/home) into GET /home/.dir HTTP/1.0 and hands the listing back to the application.
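The ".dir plus hierarchical SHA1" idea can be sketched in Python. This is not makehttpfs itself, and its real file format may differ. The sketch assumes each .dir line is `<name> <F|D> <sha1>`, hashes files by content, and hashes each directory as the SHA-1 of its own listing. Because a listing contains its children's hashes, the top-level hash covers the whole tree, in the manner of a Merkle tree. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def sha1_file(path: str) -> str:
    """SHA-1 of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_dir_files(directory: str) -> str:
    """Write a .dir listing into `directory` and, recursively, every subdirectory.

    Each line is "<name> <F|D> <sha1>" (assumed format; names must not contain
    spaces or newlines). Returns the SHA-1 of this directory's listing.
    """
    lines = []
    for name in sorted(os.listdir(directory)):
        if name == ".dir":
            continue  # never list (or hash) the listing itself
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            lines.append(f"{name} D {make_dir_files(path)}")
        else:
            lines.append(f"{name} F {sha1_file(path)}")
    listing = "".join(line + "\n" for line in lines).encode()
    with open(os.path.join(directory, ".dir"), "wb") as f:
        f.write(listing)
    return hashlib.sha1(listing).hexdigest()


def entry_names(listing: bytes) -> list[str]:
    """What a client does with the body of GET /<dir>/.dir: list the entries."""
    return [line.split(" ", 1)[0] for line in listing.decode().splitlines()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for sub in ("etc", "bin", "home/alice", "home/babar", "home/cms"):
            os.makedirs(os.path.join(root, sub))
        print(make_dir_files(root))
        with open(os.path.join(root, "home", ".dir"), "rb") as f:
            print(entry_names(f.read()))  # ['alice', 'babar', 'cms']
```

Since each .dir is an ordinary static file, any HTTP proxy can cache it. A client that trusts the top-level hash can verify every listing and file it later downloads. The sketch follows symbolic links to directories, so a link cycle would recurse without end.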

Page 25

Logical Access to Bio Data

Many databases of biological data in different formats around the world:
– Archives: Swiss-Prot, TrEMBL, NCBI, etc...
– Replicas: Public, Shared, Private, ???

Users and applications want to refer to data objects by logical name, not location!
– Access the nearest copy of the non-redundant protein database, don’t care where it is.

Solution: EGEE data management system maps logical names (LFNs) to physical names (SFNs).

Credit: Christophe Blanchet, Bioinformatics Center of Lyon, CNRS IBCP, France
http://gbio.ibcp.fr/cblanchet, [email protected]

Page 26

Logical Access to Bio Data

Diagram: BLAST runs under Parrot, which can speak RFIO, gLite, HTTP, and FTP. Copies of nr.data sit on a Chirp server, an FTP server, and a gLite server, and the EGEE File Location Service knows where they are.
– Run BLAST on LFN://ncbi.gov/nr.data
– BLAST calls open(LFN://ncbi.gov/nr.data).
– Parrot asks the location service: Where is LFN://ncbi.gov/nr.data?
– The service answers: Find it at: FTP://ibcp.fr/nr.data
– Parrot calls open(FTP://ibcp.fr/nr.data) and issues RETR nr.data to the FTP server.
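The lookup sequence above (ask the location service, then open whichever physical copy it names) can be sketched in Python. Both the extra catalog entries beyond the slide's FTP copy and the "lowest measured latency wins" policy are assumptions for illustration; the talk says only that the nearest copy is used, and this is not the EGEE service's interface. `CATALOG` and `resolve_lfn` are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical replica catalog; only the FTP location comes from the slide.
CATALOG: dict[str, list[str]] = {
    "LFN://ncbi.gov/nr.data": [
        "chirp://chirp.example.org/nr.data",
        "FTP://ibcp.fr/nr.data",
        "http://mirror.example.net/nr.data",
    ],
}


def resolve_lfn(lfn: str, catalog: dict[str, list[str]],
                latency_ms: dict[str, float]) -> str:
    """Return the replica whose host has the lowest measured latency."""
    replicas = catalog.get(lfn)
    if not replicas:
        raise FileNotFoundError(lfn)

    def cost(url: str) -> float:
        host = urlsplit(url).hostname or ""  # hostname is returned lowercased
        return latency_ms.get(host, float("inf"))  # unmeasured hosts sort last

    return min(replicas, key=cost)
```

The application only ever names `LFN://ncbi.gov/nr.data`; which protocol and server end up serving the bytes is decided at open time, so replicas can be added or moved without touching the BLAST command line.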

Page 27

Performance of Bio Apps on EGEE

Chart: runtime (sec, 0 to 450) against protein database size (0 to 1,200,000 sequences) for six configurations: BLAST+Parrot, FastA+Parrot, SSearch+Parrot, BLAST+copy, FastA+copy, and SSearch+copy.

Page 28

Expandable Filesystem for Experimental Data

Credit: John Poirier @ Notre Dame Astrophysics Dept.

Diagram: data arrives at 2 GB/day today (could be lots more!) on a buffer disk and is written to a daily tape; the daily tapes form a 30-year archive. The analysis code reads only the buffer disk, so it can only analyze the most recent data.

Project GRAND http://www.nd.edu/~grand

Page 29

Expandable Filesystem for Experimental Data

Credit: John Poirier @ Notre Dame Astrophysics Dept.

Diagram: the same buffer disk (2 GB/day today, could be lots more!), daily tapes, and 30-year archive, but the data is also kept on a Distributed Shared Filesystem built from several fileservers. The analysis code reaches it through an Adapter, so it can analyze all data over large time scales.

Project GRAND http://www.nd.edu/~grand
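What the adapter buys is that storage grows by adding fileservers rather than by replacing a disk. The following is a hypothetical Python sketch of the bookkeeping such a layer might keep: a directory mapping each logical file to the server that holds it, with new files placed on the server that has the most free space. The placement policy and all names are assumptions, not the design described in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileServer:
    name: str
    capacity_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


@dataclass
class SharedFilesystem:
    """Hypothetical adapter state: which fileserver holds each logical file."""

    servers: list[FileServer] = field(default_factory=list)
    directory: dict[str, str] = field(default_factory=dict)

    def add_server(self, server: FileServer) -> None:
        # Capacity grows simply by registering another fileserver.
        self.servers.append(server)

    def store(self, path: str, size_gb: int) -> str:
        """Place a new file on the server with the most free space."""
        if not self.servers:
            raise OSError("no fileservers registered")
        target = max(self.servers, key=lambda s: s.free_gb)
        if target.free_gb < size_gb:
            raise OSError(f"no fileserver has room for {path}")
        target.used_gb += size_gb
        self.directory[path] = target.name
        return target.name

    def locate(self, path: str) -> str:
        """Return the fileserver that holds `path` (KeyError if unknown)."""
        return self.directory[path]
```

When the existing servers fill up, `store` fails until another `FileServer` is added, after which new days of data land on the new machine while older files stay where they are and remain reachable through `locate`.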

Page 30

Current Work

Credit: Jesus Izaguirre and Aaron Striegel @ Notre Dame

Now that we can easily use any storage...
– Much easier to arrange data/jobs arbitrarily.
– Idea: combine cluster storage and cluster computing!
– Goal: keep jobs close to data that they need.
– PINS: Processing in Storage

Example: GEMS Distributed Databank
– Facility for creating, storing, and analyzing molecular dynamics data in a cluster.
– Goal: Be able to easily scale both CPU and storage capacity by adding commodity nodes.

Page 31

Diagram: an App uses a Distributed Filesystem Abstraction, through an Adapter, over seven fileservers, each on an ordinary UNIX filesystem. Data sets D1 to D4 are replicated across the fileservers, and a meta-data database indexes them. The app queries the database for (Mol==“CH4”) && (T>300K), which selects D1. Rather than fetching D1 to the app, the function F is sent as jobs (J1 to J4) to the nodes that already hold the data, where they compute F(D1).
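The query-then-place step in the diagram can be sketched as follows: select data sets from the metadata database, then run each job on a node that already stores a copy, so the data never has to be fetched. This is a minimal Python sketch. The record fields, the example data sets, and the least-loaded tie-breaking rule are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    molecule: str
    temperature_k: float
    replicas: tuple[str, ...]  # fileservers that hold a copy


def query(databank: list[DataSet], molecule: str, min_temp_k: float) -> list[DataSet]:
    """Equivalent of the slide's (Mol=="CH4") && (T>300K) metadata query."""
    return [d for d in databank if d.molecule == molecule and d.temperature_k > min_temp_k]


def place_jobs(datasets: list[DataSet], load: dict[str, int]) -> dict[str, str]:
    """Run each job on the least-loaded node that already stores its data."""
    load = dict(load)  # work on a copy; the caller's dict is not modified
    plan = {}
    for d in datasets:
        node = min(d.replicas, key=lambda n: load.get(n, 0))
        plan[d.name] = node
        load[node] = load.get(node, 0) + 1
    return plan


# Hypothetical databank contents.
DATABANK = [
    DataSet("D1", "CH4", 350.0, ("node1", "node2")),
    DataSet("D2", "H2O", 310.0, ("node3",)),
    DataSet("D3", "CH4", 250.0, ("node4",)),
    DataSet("D4", "CH4", 400.0, ("node2", "node5")),
]
```

For the CH4, T > 300 K query only D1 and D4 match. With idle nodes, their jobs go to node1 and node2; if node1 is already busy, D1's job moves to its other replica on node2, and D4's job then moves to node5.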

Page 32

More Open Problems

Resource Management
– How to prevent overcommitment -> badput?

Security
– How to easily express complex policies for sharing and controlling combined cpu/disk?

Reliability
– How to deal with disconnection, erasure, rejection, unexpected performance, etc...

Garbage Collection
– What’s to prevent me from filling every disk everywhere with computations that I might need?

Debugging
– How do we dig the state relevant to a complex workflow out of numerous, noisy, distributed logs?

Page 33

Conclusion

Tactical storage allows end users to build large structures out of simple building blocks without getting stuck on the ugly details.

Page 34

Acknowledgments

Science Collaborators:
– Christophe Blanchet
– Patrick Flynn
– Sander Klous
– Peter Kunszt
– Erwin Laure
– John Poirier
– Igor Sfiligoi

CS Collaborators:
– Jesus Izaguirre
– Aaron Striegel

CS Students:
– Paul Brenner
– James Fitzgerald
– Jeff Hemmes
– Paul Madrid
– Chris Moretti
– Gerhard Niederwieser
– Phil Snowberger
– Justin Wozniak

Page 35

For more information...

Cooperative Computing Lab
http://www.cse.nd.edu/~ccl

Cooperative Computing Tools
http://www.cctools.org

Douglas Thain
– [email protected]
– http://www.cse.nd.edu/~dthain

Page 37

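Problem: Shared Namespace

Diagram: a file server whose top-level ACL is globus:/O=NotreDame/* RWLAX. Every Notre Dame user writes into the same directory, so files such as a.out, test.c, test.dat, and cms.exe pile up together.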

Page 38

Solution: Reservation (V) Right

Diagram: the file server's top-level ACL is O=NotreDame/CN=* V(RWLA).
– /O=NotreDame/CN=Monk calls mkdir and gets a new directory with ACL /O=NotreDame/CN=Monk RWLA, where his a.out and test.c are kept.
– /O=NotreDame/CN=Ted calls mkdir and gets a new directory with ACL /O=NotreDame/CN=Ted RWLA, holding his own a.out and test.c.
– At the top level, the V right allows mkdir only!
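The reservation right can be modeled in a few lines. In this hypothetical Python sketch, an entry such as `V(RWLA)` grants no ordinary rights in the directory itself. It only lets a matching user create a subdirectory whose fresh ACL names that user alone, with the rights listed in the parentheses. The leading `/` in the subject pattern, the in-memory `tree`, and the function names are assumptions for illustration.

```python
from __future__ import annotations

import re
from fnmatch import fnmatchcase

_RESERVE = re.compile(r"V\(([A-Z]*)\)")  # matches e.g. "V(RWLA)"


def has_right(acl: list[tuple[str, str]], user: str, right: str) -> bool:
    """Ordinary rights only: letters inside V(...) are not granted here."""
    return any(
        fnmatchcase(user, subject) and right in _RESERVE.sub("", rights)
        for subject, rights in acl
    )


def mkdir_with_reservation(tree: dict[str, list[tuple[str, str]]],
                           parent: str, name: str, user: str) -> str:
    """Create parent/name for a holder of V(...); the new ACL names only them."""
    granted = None
    for subject, rights in tree[parent]:
        match = _RESERVE.search(rights)
        if match and fnmatchcase(user, subject):
            granted = match.group(1)
            break
    if granted is None:
        raise PermissionError(f"{user} may not reserve a directory in {parent}")
    path = parent.rstrip("/") + "/" + name
    if path in tree:
        raise FileExistsError(path)
    tree[path] = [(user, granted)]  # the creator gets exactly the reserved rights
    return path
```

With `tree = {"/": [("/O=NotreDame/CN=*", "V(RWLA)")]}`, Monk cannot write into `/` directly, but he can create `/monk` with ACL `[("/O=NotreDame/CN=Monk", "RWLA")]`; Ted gets his own directory the same way, and neither can read the other's.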