Page 1

IBM Research Laboratory in Haifa

zFS - A Reliable Distributed ObS-based File System

Julian Satran and Avi Teperman

Page 2: zFS Background

- zFS is part of continued research on storage
  - Started with Distributed Sharing Facility
  - Continued with Antara Object Store Device
- ObS standard is under development by SNIA and is a formal T10 project
- zFS will move to future ObS

- zFS is an attempt to explore a completely distributed file system based on Object Storage

Page 3: zFS Goals

- Operate well on a few or thousands of machines
- Built from off-the-shelf components with Object Storage
- Use the memory of all machines as a global cache
- Achieve almost linear scalability

Page 4: How zFS Goals are Achieved

- Scalability is achieved by:
  - Separating storage-space management from file management
    - Storage-space management is encapsulated in the ObS
  - Dynamically distributing file management
    - No central server
    - No single point of failure
- Machines can be dynamically added/removed from zFS

Page 5: zFS Architecture

Page 6: SAN File Systems Architecture

- Many computers connected to many storage devices by a high-speed network
- Clients can have non-mediated access to storage – improved storage access

Page 7: SAN File Systems Challenges - Security and Protection

- Security vs. Protection
  - Protection: buggy clients, inadvertent access, etc.
    - Useful inside and outside the glass house
  - Security: intentional attempts at unauthorized access
    - Essential outside the glass house
- SAN Security/Protection Today:
  - At unit level
  - Zoning/Fencing/Masking are hard to use and are done at LU level
  - Too many actively used blocks to provide block-level security
    - Coordination overhead too high
  - Assume only trusted clients

Page 8: SAN File Systems Challenges - Scalability

- Scalability is not an issue when volumes are partitioned among hosts
- However, shared access is one of the touted benefits of a SAN
- For shared read-write access, hosts must coordinate usage of blocks
- File systems must coordinate allocation of blocks to files
- Coordination can result in:
  - False contention between hosts allocating space from different logical units
  - Additional communication
    - It may be possible to piggyback some of the information

Page 9: Object Store Security

- All operations are secured by a credential
- Security is achieved by cooperation of:
  - Admin – authenticates/authorizes clients and generates credentials
  - ObS – validates the credential that a client presents; the credential is cryptographically hardened
  - ObS and Admin share a secret
- Goals of Object Store security are:
  - Increased protection/security, at the level of objects rather than LUs
  - Hosts do not access metadata directly
  - Allow non-trusted clients to sit on the SAN
  - Allow shared access to storage without giving clients access to all data on a volume

[Diagram: the client sends an authorization request to the Security Admin and receives a credential; it presents that credential to the Object Store, which validates it using the secret shared between the Admin and the ObS]
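
To make the credential flow concrete, here is a minimal sketch assuming an HMAC-style credential keyed with the Admin/ObS shared secret; the function names and the capability encoding are invented for illustration and do not reflect the actual SNIA/T10 credential format.

# Illustrative sketch (not the real ObS credential format): the Admin issues an
# HMAC over the granted capability, keyed with the secret it shares with the
# ObS; the ObS recomputes the MAC to validate the client's request.
import hashlib
import hmac

SHARED_SECRET = b"secret-shared-by-admin-and-obs"     # provisioned out of band

def admin_issue_credential(client_id: str, object_id: str, rights: str) -> bytes:
    """Security Admin: authorize the client and return a hardened credential."""
    capability = f"{client_id}:{object_id}:{rights}".encode()
    return hmac.new(SHARED_SECRET, capability, hashlib.sha256).digest()

def obs_validate(client_id: str, object_id: str, rights: str, credential: bytes) -> bool:
    """ObS: recompute the MAC locally (it knows the shared secret) and compare."""
    capability = f"{client_id}:{object_id}:{rights}".encode()
    expected = hmac.new(SHARED_SECRET, capability, hashlib.sha256).digest()
    return hmac.compare_digest(expected, credential)

# Client: obtains a credential once, then presents it with each ObS request.
cred = admin_issue_credential("client-7", "obj-42", "read")
assert obs_validate("client-7", "obj-42", "read", cred)        # accepted
assert not obs_validate("client-7", "obj-42", "write", cred)   # rejected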

Page 10: zFS Components

- Object Store Device (ObS)
  - Assumes ObSs have a fail-over mechanism
  - ObS is rather recent, and the Working Group is approaching a first version of the spec
- Lease Manager (LMGR)
- File Manager (FMGR)
- Transaction Manager (TMGR)
- Front-End/Cache (FE/Cache)
- No Single Point of Failure

Page 11: zFS Components - Object Store Device (ObS)

- Using ObS allows zFS to focus on File Management and Scalability
- Allocation of storage is done in the ObS
- Security is handled by the Security Admin and the ObS
- Cooperative caching poses a security challenge

Page 12: zFS Components - Lease Manager (LMGR)

- The need for a lease manager stems from the following facts:
  - A locking mechanism is required to control access to disks
  - In SAN file systems, clients can write directly to ObSs
- zFS uses a locking discipline for concurrency control

Page 13: zFS Components - Lease Manager (LMGR)

- To reduce the ObS's overhead, the following mechanism is used:
  - Each ObS is associated with one lease manager
  - The ObS maintains and grants to its LMGR one major lease
  - The LMGR grants object leases to the FMGRs requesting them
  - The FMGR grants range leases to the FEs requesting them

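A minimal sketch of this three-level hierarchy; the class names, fields, and the 60-second duration are invented for illustration and are not the zFS interfaces.

# Sketch of the lease hierarchy: the ObS grants one major lease to its LMGR,
# the LMGR grants object leases to FMGRs, and an FMGR grants range leases to
# FEs. Class names and the 60-second duration are illustrative only.
import time
from dataclasses import dataclass

@dataclass
class Lease:
    holder: str
    expires: float

    def valid(self) -> bool:
        return time.time() < self.expires

class LeaseManager:
    """LMGR: holds the major lease of one ObS and grants object leases to FMGRs."""
    def __init__(self, obs_id: str, duration: float = 60.0):
        self.duration = duration
        self.major_lease = Lease(holder="lmgr-of-" + obs_id, expires=time.time() + duration)
        self.object_leases = {}          # object id -> Lease held by some FMGR

    def grant_object_lease(self, object_id: str, fmgr_id: str) -> Lease:
        # Conflict handling (another FMGR already holding the lease) is omitted here.
        lease = Lease(holder=fmgr_id, expires=time.time() + self.duration)
        self.object_leases[object_id] = lease
        return lease

class FileManager:
    """FMGR: holds the object lease for a file and grants range leases to FEs."""
    def __init__(self, fmgr_id: str, object_lease: Lease):
        self.fmgr_id = fmgr_id
        self.object_lease = object_lease
        self.range_leases = []           # (fe_id, offset, length, mode) tuples

    def grant_range_lease(self, fe_id: str, offset: int, length: int, mode: str):
        grant = (fe_id, offset, length, mode)
        self.range_leases.append(grant)
        return grant

# An FE asking for a read lease on bytes [0, 4096) of object "obj-1":
lmgr = LeaseManager("obs-0")
fmgr = FileManager("fmgr-A", lmgr.grant_object_lease("obj-1", "fmgr-A"))
fmgr.grant_range_lease("fe-3", 0, 4096, "read")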

Page 14: zFS Components - Lease Manager (LMGR)

- The LMGR and FMGR together provide distributed and safe lease operation
- We prefer leases over locks to reduce state-related issues that make recovery complex
- Leases incur the overhead of lease renewal – negligible, as leases are renewed periodically with one operation per renewing machine

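A sketch of why the renewal overhead stays small: one periodic call per machine extends every lease that machine currently holds. The class names and the 30-second duration are invented for illustration.

# Sketch: batched lease renewal, one operation per renewing machine regardless
# of how many leases it holds. Names and durations are illustrative.
import time

LEASE_DURATION = 30.0

class LeaseTable:
    """Grantor side: records an expiry per lease; expired leases simply lapse."""
    def __init__(self):
        self.expiry = {}                       # lease id -> absolute expiry time

    def renew_batch(self, lease_ids):
        """Single renewal request covering all leases held by one machine."""
        deadline = time.time() + LEASE_DURATION
        for lease_id in lease_ids:
            self.expiry[lease_id] = deadline

class Machine:
    """Holder side: remembers its leases and renews them all in one call."""
    def __init__(self, grantor: LeaseTable):
        self.grantor = grantor
        self.held = set()                      # ids of leases held on this machine

    def renewal_tick(self):
        # Called periodically (e.g. every LEASE_DURATION / 3 seconds).
        self.grantor.renew_batch(self.held)    # one message, however many leases

table = LeaseTable()
m = Machine(table)
m.held.update({"obj-1", "obj-2:0-4096"})
m.renewal_tick()                               # both leases extended together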

Page 15: zFS Components - File Manager (FMGR)

- Each file is managed by one file manager only
- Each lease request on the file is mediated by the FMGR
- The first client opening a file creates an instance of the file manager (in the current implementation this instance is local to the client)
- The FMGR interacts with the proper LMGR to get the object lease and grants range leases to the FE


Page 16: zFS Components - File Manager (FMGR)

- The FMGR keeps track of:
  - Where (on which FE) each file's extents reside
  - Each file's leases
- If client X requests a page and lease which reside on client Y, the FMGR will direct the FE/Cache on Y to send the requested page and lease directly to X

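A sketch of that redirection, with message passing reduced to direct method calls; the class names and the single-holder bookkeeping are illustrative simplifications.

# Sketch: cooperative-cache redirection. The FMGR tracks which FE caches each
# page; on a request it tells the holder to ship the page (and lease) directly
# to the requester. Names are illustrative; RPCs become plain method calls.

class FECache:
    def __init__(self, name):
        self.name = name
        self.pages = {}                       # (object id, page no) -> bytes

    def forward_page(self, key, target_fe):
        """Send a cached page (with its lease) directly to another FE."""
        target_fe.receive_page(key, self.pages[key])

    def receive_page(self, key, data):
        self.pages[key] = data

class FMGR:
    def __init__(self, obs):
        self.obs = obs                        # fallback source of pages
        self.page_location = {}               # (object id, page no) -> FECache

    def request_page(self, requester: FECache, key):
        holder = self.page_location.get(key)
        if holder is not None and holder is not requester:
            holder.forward_page(key, requester)         # memory-to-memory transfer
        else:
            requester.receive_page(key, self.obs[key])  # read from the ObS
        self.page_location[key] = requester             # simplified: track latest holder only

obs = {("obj-1", 0): b"block data"}
fmgr = FMGR(obs)
fe_y, fe_x = FECache("Y"), FECache("X")
fmgr.request_page(fe_y, ("obj-1", 0))         # Y reads the page from the ObS
fmgr.request_page(fe_x, ("obj-1", 0))         # FMGR directs Y to send it to X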

Page 17: zFS Components - File Manager (FMGR)

- File manager assignment is dynamic
- The FE requests a lease from the local FMGR (fmgr_i)
- fmgr_i requests the object lease from the LMGR
- The LMGR checks:
  - If no other FMGR holds the object lease, it grants the object lease to fmgr_i
  - If another FMGR (fmgr_k) has the object lease, the LMGR returns its address to fmgr_i
    - fmgr_i instructs the FE to request the lease from fmgr_k

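A sketch of this assignment flow; fmgr_i and fmgr_k come from the slide, while the class layout, method names, and return values are invented for illustration.

# Sketch of dynamic FMGR assignment: the LMGR either grants the object lease
# or answers with the FMGR that already holds it, and the FE is redirected.

class LMGR:
    def __init__(self):
        self.holders = {}                     # object id -> holding FMGR

    def request_object_lease(self, object_id, fmgr):
        holder = self.holders.setdefault(object_id, fmgr)
        if holder is fmgr:                    # no other FMGR holds it (or we already do)
            return ("granted", fmgr)
        return ("held_by", holder)            # return the current manager's address

class FMGR:
    def __init__(self, name, lmgr):
        self.name, self.lmgr = name, lmgr
        self.managed = set()

    def request_range_lease(self, fe, object_id, byte_range):
        status, who = self.lmgr.request_object_lease(object_id, self)
        if status == "granted":
            self.managed.add(object_id)
            return (self.name, object_id, byte_range)   # this FMGR manages the file
        # Another FMGR already holds the object lease: tell the FE to retry there.
        return fe.retry_at(who, object_id, byte_range)

class FE:
    def __init__(self, local_fmgr):
        self.local_fmgr = local_fmgr

    def open_range(self, object_id, byte_range):
        return self.local_fmgr.request_range_lease(self, object_id, byte_range)

    def retry_at(self, remote_fmgr, object_id, byte_range):
        return remote_fmgr.request_range_lease(self, object_id, byte_range)

lmgr = LMGR()
fmgr_i, fmgr_k = FMGR("fmgr_i", lmgr), FMGR("fmgr_k", lmgr)
FE(fmgr_k).open_range("file-F", (0, 4096))    # fmgr_k becomes file-F's manager
FE(fmgr_i).open_range("file-F", (0, 4096))    # redirected: lease comes from fmgr_k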

Page 18: zFS Components - Transaction Manager (TMGR)

- Metadata operations handle several objects
- To ensure file system consistency, zFS implements them as distributed transactions
- All metadata operations are handled by the TMGR

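The TMGR itself is still future work (see Pages 26 and 28), so the following is only a generic illustration of a metadata operation as a distributed transaction: a hypothetical "create file F in directory D" touching two objects under a simple two-phase commit. None of the names reflect the actual zFS protocol.

# Illustrative only: a metadata operation spanning two objects, wrapped in a
# two-phase commit so that either both updates apply or neither does.

class Participant:
    """One object manager taking part in the transaction."""
    def __init__(self, name):
        self.name = name
        self.committed = []
        self.staged = None

    def prepare(self, update):
        self.staged = update                  # stage the update durably (simplified)
        return True                           # vote yes

    def commit(self):
        self.committed.append(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None

def run_transaction(participants_and_updates):
    """Coordinator (TMGR role): prepare everywhere, then commit or abort everywhere."""
    if all(p.prepare(u) for p, u in participants_and_updates):
        for p, _ in participants_and_updates:
            p.commit()
        return "committed"
    for p, _ in participants_and_updates:
        p.abort()
    return "aborted"

directory_d = Participant("dir-D")
file_f = Participant("file-F")
print(run_transaction([(directory_d, "add entry F"),
                       (file_f, "create object + set attributes")]))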

Page 19: zFS Components - Front-End / Cache

- FE
  - Runs on every client machine
  - Presents to the application/user the standard file system API
  - Provides access to zFS files and directories
- Cache
  - Provides access to zFS data and metadata in local memory to other machines
  - Since data is transferred from memory to memory, there is a security issue if one client is un-trusted
    - Requires further research


Page 20: zFS Architecture

Page 21: zFS read() Operation

Page 22: zFS write() Operation - Only read leases

Page 23: zFS write() Operation - One write lease

Page 24: zFS Failure Handling - LMGR_i failed

- Detected by all FMGRs that hold leases for objects of ObS_i
- Each FMGR informs all FEs holding files on ObS_i to flush their dirty data and release their files
- The FMGRs instantiate a new LMGR_i, which tries to get the ObS_i major lease
- Once the previous major lease expires, one LMGR_i gets the major lease and all the others are terminated
- Operation on ObS_i resumes
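
A sketch of the fail-over race: candidate LMGRs poll the ObS for the major lease, which is granted only after the previous lease expires and only to the first candidate to ask. Names, timing, and the polling loop are illustrative simplifications, and the flush/release step is assumed to have already happened.

# Sketch: LMGR fail-over. Several candidate LMGRs compete for the ObS major
# lease; the ObS refuses until the previous major lease has expired.
import time

class ObS:
    def __init__(self, lease_duration=2.0):
        self.lease_duration = lease_duration
        self.major_expires = 0.0
        self.major_holder = None

    def try_acquire_major_lease(self, candidate):
        if time.time() < self.major_expires:          # previous lease still valid
            return False
        self.major_holder = candidate
        self.major_expires = time.time() + self.lease_duration
        return True

def recover_lmgr(obs, candidates):
    """FMGR-side recovery: race the candidate LMGRs for the major lease."""
    while True:
        for candidate in candidates:
            if obs.try_acquire_major_lease(candidate):
                return candidate                      # the surviving LMGR; others terminate
        time.sleep(0.1)                               # wait for the old lease to expire

obs = ObS(lease_duration=0.5)
obs.try_acquire_major_lease("lmgr-old")               # the LMGR that is about to fail
winner = recover_lmgr(obs, ["lmgr-new-A", "lmgr-new-B"])
print("operation on the ObS resumes under", winner)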

Page 25: zFS Failure Handling - FMGR_i failed

- Detected by all FEs that hold leases granted by FMGR_i
- Each FE flushes all its dirty data to ObS_i and releases its files
- The FEs instantiate a new local FMGR_i
- When an application requests a range lease for file F, the local FMGR requests an object lease from the LMGR
- If another local FMGR already got the object lease, its address is passed back to the FE via the requesting FMGR
- The FE connects to the correct FMGR and operation continues
- All this is transparent to the application

Page 26: zFS Future Research

- Distributed Transactions
  - Handling metadata operations in a distributed, scalable manner
- Add Security Mechanism
  - Investigate how the cooperative cache integrates with the security model of the Object Store

Page 27: Related Documents

- zFS Web Site: http://www.haifa.il.ibm.com/projects/storage/zFS/index.html
- DSF Web Site: http://www.haifa.il.ibm.com/projects/systems/dsf.html
- Publications:
  - zFS - A Scalable Distributed File System Using Object Disks; O. Rodeh & A. Teperman; MSST 2003
  - Group Communication - Still Complex After All These Years; R. Golding & O. Rodeh; SRDS 2003
  - Object Store Based SAN File Systems; J. Satran & A. Teperman; SSCCII-2004

Page 28: Current Status

- zFS is implemented on Linux kernel 2.4.19
- Currently:
  - Except for the TMGR, all components work
  - Initial results show ~20% improvement on read with the cooperative cache

Page 29: Related Work

- Lustre
  - Clients
  - Cluster control-system (MDS)
  - Storage targets (programmable ObS)
- StorageTank
  - Client MSD (different IP network)
  - Storage: currently standard SCSI disks over SAN
  - Integration with Object Store
- xFS and XFS
  - xFS is similar to zFS with a static distribution policy
  - Implemented cooperative caching: M. D. Dahlin, C. J. Mather, R. Y. Wang, T. E. Anderson and D. A. Patterson, A Quantitative Analysis of Cache Policies for Scalable Network File Systems, Computer Science Division, University of California at Berkeley