kenny kim pspace isw5 · 2012-08-16 · 1 oasisfs (object-based storage architecture for scalable,...


1

OASISfs (Object-based storage Architecture for Scalable, Intelligent, and Secure file system)

An Implementation of an OSD-based Cluster File System (and experiences of initial deployment)

Kenny Kim, C.E.O., PSPACE, inc. [email protected]


2

▣ Contents
◈ PSPACE, inc.
◈ OASISfs Development Team
◈ OASISfs Overview
◈ OASISfs Runtime Characteristics
◈ ezCon: Web-based OASISfs Management Software
◈ OASISfs Verification Process
◈ Deployment Experience
◈ Future Work (in progress)
◈ Remarks


3

PSPACE, inc.
▣ Founded in 2004
◈ Based in GyungGi, Korea
◈ A very small company of 7 engineers

▣ High-performance computing specialists, with specialties in:
◈ High-performance computing: GPU computing
◈ High-performance networking: InfiniBand, 10G, Myrinet
◈ High-performance storage: OASIS, Lustre, PVFS
◈ Cluster management software: ezCon™
◈ Resource management software: PBSPro, Torque

▣ Storage system development team
◈ Began development in 2005
◈ Co-developed with ETRI (Electronics and Telecommunications Research Institute)
◈ Product debut in Q3 2007


4

OASISfs Development Team
▣ 7 engineers from PSPACE, inc.
◈ Headed by Mr. Kenny Kim
◈ Mainly developing InfiniStor™ (a storage solution box using OASISfs)
– End-user support
– Management software development and HA support development
– Hardware development (suited to OASISfs)

▣ 10 engineers from ETRI
◈ Headed by Dr. Jun Kim
◈ OASIS core development

▣ Other participants
◈ MS-Windows port by SoftOnNet, inc.
◈ Backup support by Prof. Yoo, JS at ChungBuk National University
◈ Security support by S.S.K. University
◈ Software testing by SureSoft, inc.


5

OASISfs Requirements
▣ OSD standard compliance
▣ Scalable file system
◈ Scalable in capacity and performance
▣ General-purpose file system
◈ POSIX compliance
◈ Support for Linux and Windows
▣ Provide a file management system
◈ Provide a backup system
◈ Provide GUI and CLI management methods
▣ Complete in 18 (+4) months


6

OASISfs: Value Proposition

◈ Stable, Scalable, Manageable
◈ Users: easy to use, reliable, resilient to failure
◈ CIO (and managers): maximum performance at a reasonable and predictable price
◈ Administrator: intuitive and complete management software

Provide a “stable, scalable in performance and capacity, and manageable” storage file system for ordinary people

By enabling:
◈ Scalability in performance and capacity
◈ Efficient resource sharing and management
◈ High availability and RAID support
◈ Intuitive, easy-to-use management software


7

OASISfs Specification
◈ Performance: 400MB/s per OSD (bonded quad Gigabit Ethernet)
◈ Scalability: hundreds of storage server devices (OSDs) and hundreds of clients
◈ Interoperability: Linux and Windows in native mode
◈ Simplicity: a single, shared, coherent file system
◈ Standards: POSIX, OSD 1.0, iSCSI
◈ RAID support: Linear, Linear+1, RAID0, RAID5 (RAID0+1 in future), giving flexible price/performance/resiliency choices
◈ Supported platforms
– Linux 32-bit and 64-bit, kernel 2.6.10 and 2.6.18
– Windows XP 32-bit, single CPU
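As a rough sanity check on the 400MB/s figure, the arithmetic below compares it with the raw line rate of four bonded Gigabit Ethernet links. It also projects aggregate bandwidth under the ideal linear scaling the design aims for. This is a back-of-the-envelope sketch under several assumptions: 1 Gb/s equals 1000 Mb/s per link, no protocol overhead is modeled beyond the quoted figure, and linear scaling is treated as an upper bound rather than a measured result. The function names are hypothetical.

```python
# Back-of-the-envelope bandwidth arithmetic for an OSD on bonded Gigabit Ethernet.
# Assumptions (not from the slides): 1 Gb/s = 1000 Mb/s per link, 8 bits per byte,
# decimal megabytes, and perfectly linear scaling across OSDs (an upper bound).

GBE_MBIT_PER_S = 1000  # nominal line rate of one Gigabit Ethernet link


def raw_link_mb_per_s(links: int) -> float:
    """Raw line rate of `links` bonded GbE ports, in MB/s."""
    return links * GBE_MBIT_PER_S / 8


def link_efficiency(achieved_mb_per_s: float, links: int) -> float:
    """Fraction of the raw line rate that an achieved throughput represents."""
    return achieved_mb_per_s / raw_link_mb_per_s(links)


def ideal_aggregate_mb_per_s(per_osd_mb_per_s: float, n_osds: int) -> float:
    """Aggregate bandwidth if throughput scaled perfectly linearly with OSD count."""
    return per_osd_mb_per_s * n_osds


if __name__ == "__main__":
    print(raw_link_mb_per_s(4))               # 500.0
    print(link_efficiency(400, 4))            # 0.8
    print(ideal_aggregate_mb_per_s(400, 10))  # 4000
```

With four links the raw line rate is 500MB/s, so 400MB/s per OSD corresponds to about 80% of the wire speed.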


8

▣ Functionalities (OASISfs 2.0; O = supported, △ = partially supported, X = not supported)
◈ Performance
– Out-of-band I/O: O
– Parallel I/O: object stripe
– I/O load balance: read balance
◈ HA
– MDS HA: active/standby
– OSD HA: Linear+1, Linear, RAID0, RAID5
– File check utility: ofs_fsck
◈ Scalability
– Max clients: 99 (tested)
– Max OSDs: 99 (tested)
– Max MDSs: 2
– OSD expansion: on-line
◈ Interoperability
– OS: Linux (2.6.10, 2.6.18), Windows (XP, 32-bit)
– Network I/F: TCP, InfiniBand
– Multi-NIC support: O
◈ Platform
– Supported CPUs: i386, x86_64, EM64T
◈ Function
– 100% POSIX: O
– lockf, flock: O
– mmap: △
– Quota support: O
– File set support: 1 target with multiple file sets: X; 1 OSD with multiple targets: X
– mount, fstab: O
– Backup: O
◈ MGMT
– CLI: O
– Web: O
– Monitoring: O
◈ Cache management
– MDS cache: write-through
– OSD cache: write-back
– Cache coherency: Unix type (updater invalidation, revoke)
– Cache granularity: file


9

▣ Block I/O & Object I/O

[Figure: a block device file system keeps heavy-weight per-file metadata (inode) plus file-system metadata such as the free-block bitmap. An object device file system keeps only light-weight per-file metadata (inode), because the object device manages its own block allocation.]


10

▣ I-Node Management & Metadata File Management
◈ SAN/NAS (block-based) file system: sync for namespace, inode, and data, plus sync for the superblock, free-block bitmap, and inode bitmap
◈ Object-based file system: sync for namespace, inode, and data only; no sync for storage metadata
◈ More scalable!
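The toy model below illustrates the difference these two slides describe. With a block device, every client that allocates space must update shared file-system metadata, here a free-block bitmap; that shared state is what must be synchronized across SAN clients. With an object device, the client only names an object and an offset, and the device allocates space internally. The class and method names are hypothetical. A real OSD receives T10 OSD commands over iSCSI rather than Python calls.

```python
import threading


class BlockDevice:
    """Shared block device: allocation state is a free-block bitmap that every
    client must update, so a cluster file system has to synchronize it."""

    def __init__(self, n_blocks: int):
        self.free = [True] * n_blocks        # file-system metadata: free-block bitmap
        self.bitmap_lock = threading.Lock()  # stands in for cluster-wide synchronization

    def allocate(self) -> int:
        # In a shared-disk (SAN) file system, every writer on every client contends here.
        with self.bitmap_lock:
            idx = self.free.index(True)      # raises ValueError when no block is free
            self.free[idx] = False
            return idx


class ObjectDevice:
    """Object device: clients address (object_id, offset); space allocation is
    private to the device, so no client ever touches a shared bitmap."""

    def __init__(self):
        self.objects = {}  # object_id -> bytearray; on-disk layout is the device's business

    def create(self, object_id: int) -> None:
        self.objects.setdefault(object_id, bytearray())

    def write(self, object_id: int, offset: int, data: bytes) -> None:
        obj = self.objects[object_id]
        end = offset + len(data)
        if len(obj) < end:
            obj.extend(bytes(end - len(obj)))  # grow the object; allocation stays internal
        obj[offset:end] = data

    def read(self, object_id: int, offset: int, length: int) -> bytes:
        return bytes(self.objects[object_id][offset:offset + length])
```

The only state a client shares with others in the object case is the namespace and inode information held by the MDS. This matches the "no sync for storage metadata" point.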


11

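▣ InfiniStor™ HW Configuration

[Figure: Linux servers and Windows application servers, each running the client module (FM), connect over 10/100/1000 GbE to a metadata server and to object storage (OSD) servers holding the disks. Together they form a single-volume storage system with HA.]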


12

▣ OASISfs Architecture

Focus on scalable performance and capacity
◈ Separation of metadata and file data
◈ Scalable metadata
◈ Scalable file data
◈ Efficient locking
◈ Object architecture
◈ Not encumbered by existing architecture

▣ Metadata Server (MDS)
◈ Serves metadata information
◈ Holds directory and file attributes
▣ Object Storage Targets (OSTs), on the Object Storage Devices (OSDs)
◈ Serve file data
▣ OASIS MGMT
◈ Contains configuration data
▣ OASIS FM (client)
◈ Reads/writes directly to OST storage devices
▣ High-speed network interconnect
◈ GigE
◈ Myrinet
◈ InfiniBand

Interactions between components:
◈ OASIS FM ↔ MDS: directory operations, metadata, and concurrency
◈ MDS ↔ OSD: file status and file creation
◈ OASIS FM ↔ OSD: file I/O and file locking
◈ OASIS MGMT: configuration information, network connection details, and security management
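The separation of metadata and data can be sketched as two paths. First, the client asks the MDS once for a file's layout: which OSDs hold its object, and the stripe unit. It then reads the stripe chunks directly from the OSDs, so no file data passes through the MDS. The layout format, the round-robin striping, and all names here are illustrative assumptions, not the OASISfs wire protocol. Per the slides, that protocol uses RPC to the MDS and OSD/iSCSI to the OSTs.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    object_id: int
    osds: list        # OSD names holding the file's object, in stripe order
    stripe_unit: int  # bytes per stripe chunk
    size: int         # file size in bytes


class MDS:
    """Metadata path only: maps path names to layouts and never handles file data."""

    def __init__(self):
        self.namespace = {}
        self.lookups = 0

    def lookup(self, path: str) -> Layout:
        self.lookups += 1
        return self.namespace[path]


class OSD:
    """Data path only: stores one object (bytes) per object id."""

    def __init__(self):
        self.objects = {}

    def read(self, object_id: int, offset: int, length: int) -> bytes:
        return self.objects.get(object_id, b"")[offset:offset + length]


def read_file(mds: MDS, osds: dict, path: str) -> bytes:
    layout = mds.lookup(path)            # 1) a single metadata request to the MDS
    out = bytearray()
    chunk_no = 0
    while len(out) < layout.size:        # 2) data requests go straight to the OSDs
        osd = osds[layout.osds[chunk_no % len(layout.osds)]]
        row = chunk_no // len(layout.osds)
        piece = osd.read(layout.object_id, row * layout.stripe_unit, layout.stripe_unit)
        if not piece:
            break                        # missing data; a real client would return an I/O error
        out += piece
        chunk_no += 1
    return bytes(out[:layout.size])
```

The MDS is consulted once per file. The chunk reads in the loop are independent of one another, which is what allows a real client to issue them to several OSDs in parallel.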


13

▣ InfiniStor™ Protocol Architecture

[Figure: protocol stacks of the File Manager (client), the Metadata Server, and the Object Storage Target, with each component marked as proprietary or open source. Components shown: VFS; OASIS File System (FM); Multiple Object Devices Driver (MOD); SO driver, SCSI mid layer, and OSD/iSCSI driver (Linux SCSI stack); Linux block layer; TCP/IP stack; Metadata Manager (MM) with the namespace on EXT3/XFS; iSCSI target; T10 Object Manager; Object I/O Manager (OM) with objects on EXT3/XFS.]

◈ File Manager ↔ Metadata Server: RPC (directory, metadata, concurrency)
◈ Metadata Server ↔ Object Storage Target: OSD/iSCSI (file status and creation)
◈ File Manager ↔ Object Storage Target: OSD/iSCSI (system and parallel file I/O, file locking)


14

▣ Characteristics
◈ OSD-based cluster file system
– Network-based data sharing
– Volume management
– Linux- and Windows-based client file systems
– Client-of-clients support via NFS/CIFS
– Near-linear performance and capacity scaling as storage servers are added
– Configurable to user needs (performance, capacity, budget, availability, …)
◈ Out-of-band architecture
◈ Standards compliance
◈ High availability for metadata servers and object storage servers


15

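InfiniStor™ Runtime Characteristics

▣ Execution Simulation

[Figure: a client, the MDS, and several OSDs connected through a Gigabit Ethernet switch. The client looks up a file (big.avi) in a fileset with directories /, home, and share through the MDS (metadata), and transfers the file's data directly to and from the OSDs (data).]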


16

▣ Storage Virtualization using Object RAID

1) Install 12 × 500GB HDDs in an OSD, giving a 6TB OSD (each OSD runs OASIS/OM)
2) From multiple OSDs, create a VFS using the configuration tool provided on OASIS/MDS
3) OASIS/MDS provides the VFS configuration information to OASIS/FM, which exposes files through the POSIX API and performs multiple I/Os to and from the OSDs

• Virtualizes multiple OSDs
• OASIS/MDS: virtualized configuration management
• OASIS/Client: maps the physical storage system to the virtual file system
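The capacity arithmetic behind this example is 12 × 500GB = 6TB per OSD. The sketch below extends it to a volume of N such OSDs under the layouts listed on the RAID slide, using standard textbook overheads: mirroring halves usable space, and single parity costs one OSD's worth of space. It ignores file-system overhead and decimal/binary unit differences. Linear+1 is omitted because the slides do not define its overhead. The function names are hypothetical.

```python
HDD_GB = 500        # per-disk capacity from the slide
HDDS_PER_OSD = 12   # disks installed in one OSD


def osd_raw_tb(hdds: int = HDDS_PER_OSD, hdd_gb: int = HDD_GB) -> float:
    """Raw capacity of one OSD in decimal TB (12 x 500GB = 6TB)."""
    return hdds * hdd_gb / 1000


def usable_tb(n_osds: int, layout: str, per_osd_tb: float = osd_raw_tb()) -> float:
    """Usable volume capacity for n_osds identical OSDs under a given layout
    (textbook overheads; an assumption, not OASISfs-measured figures)."""
    if layout in ("linear", "raid0"):
        return n_osds * per_osd_tb            # no redundancy
    if layout == "raid1":
        return n_osds * per_osd_tb / 2        # every block stored on two OSDs
    if layout == "raid5":
        if n_osds < 3:
            raise ValueError("RAID5 needs at least 3 OSDs")
        return (n_osds - 1) * per_osd_tb      # one OSD's worth of parity
    raise ValueError(f"unknown layout: {layout}")
```

For four 6TB OSDs this gives 24TB (Linear or RAID0), 12TB (RAID1), and 18TB (RAID5). This is the price/resiliency trade-off the specification slide refers to.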


17

▣ Online OSD Add: No Service Interruption

1) Check whether space is available (low space detected)
2) Add HDDs to an OSD
3) Connect the OSDs online
4) Register the OSD with the MDS using the configuration tool
5) The new configuration information is applied to all clients automatically
6) The new configuration is acknowledged and used with no interruption

[Figure: OASIS/MDS, OASIS clients, and OASIS OSDs.]
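Steps 4 to 6 amount to a versioned configuration that the MDS pushes to every mounted client. The sketch below is minimal and assumes a simple push-and-acknowledge scheme with a monotonically increasing version number. The slides do not describe the actual mechanism, and every name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    name: str
    config_version: int = 0
    osds: list = field(default_factory=list)

    def apply_config(self, version: int, osds: list) -> tuple:
        # Steps 5-6: adopt a newer OSD list in place. Nothing is unmounted, which is
        # how this toy model represents "no service interruption".
        if version > self.config_version:
            self.config_version = version
            self.osds = list(osds)
        return ("ack", self.name, self.config_version)


class MetadataServer:
    def __init__(self, osds: list):
        self.version = 1
        self.osds = list(osds)
        self.clients = []

    def mount(self, client: Client) -> None:
        self.clients.append(client)
        client.apply_config(self.version, self.osds)

    def register_osd(self, osd: str) -> list:
        # Step 4: registering a new OSD bumps the configuration version ...
        self.osds.append(osd)
        self.version += 1
        # ... step 5: and the new configuration is pushed to every mounted client,
        # whose acknowledgements are collected (step 6).
        return [c.apply_config(self.version, self.osds) for c in self.clients]
```

The version check in `apply_config` makes a repeated or reordered push harmless. A client never moves back to an older OSD list.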


18

▣ RAID Support
◈ Linear OSD: stores a file on one OSD; balanced I/O for read operations
◈ RAID0 OSD: stripes a file and stores each stripe on an OSD; fastest I/O for both read and write operations
◈ RAID1 OSD: stores a file on two OSDs (mirror); resilience to OSD failure; improves concurrent reads of a file
◈ RAID5 OSD: stripes a file, adds parity for the stripes, and stores each on an OSD; resilience to OSD failure; better OSD space utilization

[Figure: for each mode, OASIS clients, OASIS/MDS, and OASIS OSDs; P marks parity blocks.]
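The RAID slide's layouts can be made concrete with the textbook mappings below. RAID0 places file offsets round-robin across OSDs. RAID5 computes parity as the byte-wise XOR of a stripe's data blocks, which lets any single lost block be rebuilt from the survivors. This is a generic sketch, not OASISfs's layout code. The stripe unit, the OSD ordering, and the omission of parity rotation are all assumptions.

```python
from functools import reduce


def raid0_locate(offset: int, stripe_unit: int, n_osds: int) -> tuple:
    """Map a file byte offset to (OSD index, byte offset inside that OSD's object)
    for round-robin striping with a fixed stripe unit."""
    stripe_no, within = divmod(offset, stripe_unit)
    osd = stripe_no % n_osds           # which OSD holds this stripe chunk
    row = stripe_no // n_osds          # how many full rounds precede it on that OSD
    return osd, row * stripe_unit + within


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def raid5_encode(data_blocks: list) -> list:
    """One RAID5 stripe: the data blocks followed by their parity block.
    (Real RAID5 rotates the parity position across OSDs; omitted here.)"""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def raid5_rebuild(stripe: list, lost_index: int) -> bytes:
    """Recover the block on a failed OSD by XOR-ing all surviving blocks."""
    return xor_blocks([blk for i, blk in enumerate(stripe) if i != lost_index])
```

For example, with a 64KB stripe unit on four OSDs, byte 4 × 65536 + 10 lands on OSD 0 at offset 65546. If the OSD holding the second block of a RAID5 stripe fails, that block can be recomputed from the other data blocks and P.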


19

▣ Cache-Coherency Support Method
• Near-Unix semantics
• Eliminates performance degradation when clients do not access a file at the same time

[Figure: NFS, Lustre, and OASIS, each shown with a server and two clients caching file A while one client updates it to A′.
– NFS (policy: time interval, file open): a client may keep reading its stale copy of A until a revalidation interval counts down or the file is reopened
– Lustre (policy: check when access)
– OASIS (policy: updater invalidation)]
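The updater-invalidation policy can be modeled in a few lines. The server remembers which clients cache each file. When one client writes, the server invalidates every other cached copy. Readers therefore go back to the server only after a change, rather than on every access (check when access) or every timeout (time interval). This is a minimal single-process sketch. The class names, the per-file tracking, and the synchronous `invalidate` call are assumptions, not the OASISfs protocol.

```python
class Server:
    def __init__(self):
        self.data = {}
        self.cachers = {}   # file name -> set of clients holding a cached copy

    def read(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        return self.data.get(name)

    def write(self, client, name, value):
        self.data[name] = value
        # Updater invalidation: the writer's update revokes every other cached copy.
        for other in self.cachers.get(name, set()) - {client}:
            other.invalidate(name)
        self.cachers[name] = {client}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.server_checks = 0   # how often this client had to ask the server

    def read(self, name):
        if name not in self.cache:           # only a miss (or an invalidation) costs a trip
            self.server_checks += 1
            self.cache[name] = self.server.read(self, name)
        return self.cache[name]

    def write(self, name, value):
        self.cache[name] = value
        self.server.write(self, name, value)

    def invalidate(self, name):
        self.cache.pop(name, None)
```

Repeated reads of an unchanged file are served from the local cache at no cost. This is the "no degradation when clients do not access a file at the same time" property claimed on the slide.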


20

▣ InfiniStor™ Configuration & Monitoring Software

ezCon: Web-based OASIS Management Software


21

▣ Management SW: InfiniStor™ Backup Software


22

▣ Management SW: OSD Server Monitoring Software


23

▣ Management SW: Application Server Monitoring Software


24

OASIS Verification Process

▣ Open documents
◈ Windows client file management SW specification
◈ Detailed documents
– Linux client file management block
– Metadata management block
– Object storage management block

▣ Test suites
◈ User-level (file access API) test suites
– POSIX test suite
– Linux Test Plan suite
– Self-made test suite (200K cases)
◈ Black-box concurrent-use test suite
– 100 clients; each client creates 20 threads; each thread creates 1,000 files/s
– 100 clients; each client creates 50 threads; each thread reads all files
– 100 clients; each client creates empty files in an infinite loop (until the MDS metadata space runs out)
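The first concurrent-use test implies a large aggregate load: 100 clients × 20 threads × 1,000 files/s is an offered rate of 2,000,000 file creates per second against the MDS. The sketch below computes that figure and runs a scaled-down, single-machine version of the same pattern, with worker threads each creating empty files in their own directory, and reports the achieved create rate. It is a hypothetical harness, not the suite the slides describe. It does not rate-limit to 1,000 files/s, and to exercise the cluster, `root` would need to be a directory on a mounted OASISfs volume.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def offered_create_rate(clients: int, threads_per_client: int, files_per_thread_s: int) -> int:
    """Total file-create rate the test asks the MDS to absorb (creates per second)."""
    return clients * threads_per_client * files_per_thread_s


def create_empty_files(root: str, thread_id: int, count: int) -> int:
    """One worker: create `count` empty files in its own directory under root."""
    d = os.path.join(root, f"t{thread_id}")
    os.makedirs(d, exist_ok=True)
    for i in range(count):
        with open(os.path.join(d, f"f{i}"), "wb"):
            pass
    return count


def run_create_test(root: str, threads: int, files_per_thread: int) -> tuple:
    """Run `threads` workers concurrently; return (files created, achieved creates/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        created = sum(pool.map(lambda t: create_empty_files(root, t, files_per_thread),
                               range(threads)))
    elapsed = time.perf_counter() - start
    return created, (created / elapsed if elapsed > 0 else float("inf"))


if __name__ == "__main__":
    print(offered_create_rate(100, 20, 1000))   # 2000000
    with tempfile.TemporaryDirectory() as tmp:  # replace with a directory on the mount under test
        print(run_create_test(tmp, threads=20, files_per_thread=100))
```

Giving each thread its own directory keeps the harness from measuring single-directory lock contention, which is a different test from the slide's "each thread creates files" pattern.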


25

Deployment Experience 1
▣ Korea Supercomputing Center, Bio-Informatics Division
◈ Hardware configuration
– 4 OSDs, 1 MDS
– 40 application servers
– Gigabit network
◈ Application
– Parallel BLAST: a bioinformatics sequence-search program
– Software pattern: each application server runs 8 Parallel BLAST processes; each process reads a 1MB to 4GB target file at once; the workload runs indefinitely
– Each CPU creates 50 1KB files every second
◈ Result
– Very slow (as expected)
– No crash
– Found a bottleneck in OASISfs and improved parallel I/O performance fourfold


26

Deployment Experience 2
▣ PANDORA TV: Korea’s largest UCC (user-created content) service
◈ Hardware configuration
– 2-3 OSDs, 1 MDS
– 1-4 application servers
– Gigabit network
◈ Application
– Streaming service
– Each application server (Apache web server) delivers 600-900 VOD streams (FTP downloads)
– Each stream ranges in size from 00KB to 800MB
– Highly random access (mostly read operations)
– Operation type: file open -> seek by 128KB*n -> read 128KB -> close
◈ Result
– Each application server sustains 300MB/s of reads
– One service stop caused by a bug in the Linux kernel 2.6.10 EXT3 hash table, fixed by changing the OSD file system from EXT3 to EXT2
– No crash for 3 months
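The streaming workload reduces to one repeated request: open, seek to a 128KB-aligned offset, read 128KB, close. This is easy to reproduce when benchmarking a mount. The sketch below implements that request together with a uniformly random chunk choice. It also computes the average per-stream share of the reported 300MB/s: 0.5MB/s at 600 streams and about 0.33MB/s at 900 streams, assuming the bandwidth is shared evenly. The file names and the uniform random choice of chunk are assumptions.

```python
import os
import random
import tempfile

CHUNK = 128 * 1024  # 128KB read size from the slide


def read_chunk(path: str, n: int) -> bytes:
    """One request in the observed pattern: open -> seek 128KB*n -> read 128KB -> close."""
    with open(path, "rb") as f:
        f.seek(CHUNK * n)
        return f.read(CHUNK)


def random_read(path: str, rng: random.Random) -> tuple:
    """Pick a random 128KB-aligned chunk of the file, as a VOD seek would."""
    n_chunks = max(1, os.path.getsize(path) // CHUNK)
    n = rng.randrange(n_chunks)
    return n, read_chunk(path, n)


def per_stream_mb_per_s(server_mb_per_s: float, streams: int) -> float:
    """Average bandwidth per stream if the server's throughput is shared evenly."""
    return server_mb_per_s / streams


if __name__ == "__main__":
    # Usage: build a small 3-chunk file and issue a few random chunk reads.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "video.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(3 * CHUNK))
        rng = random.Random(42)
        for _ in range(3):
            n, data = random_read(path, rng)
            print(n, len(data))
    print(per_stream_mb_per_s(300, 900))
```

Because every request reopens the file, this pattern also exercises MDS lookups at roughly the same rate as data reads.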


27

Future Work (in progress)
▣ OASISfs Version 4
◈ 10K-client support (each client doing 100 random I/Os)
◈ Autonomous and dynamic data redundancy support
◈ Hot-data duplication support (for frequent read access)
◈ Deduplication support
◈ Improved MDS
– Active-active MDS support
– Probably 8-16 concurrent MDSs (all active)
– Shared-all or shared-nothing (not decided)
◈ OSD
– 1 MDS : multiple OST support
– OST-network-mapped OST support
◈ Client
– Kernel-patch-less support (probably about 10% slower than the current version?)
– RPC or socket based (no more iSCSI?)
– Probably FUSE based (?)


28

InfiniStor™ Combines the Best of SAN and NAS
◈ Shared data (as with NAS)
◈ High bandwidth, low overhead, secure access (as with SAN)
◈ High scalability (much higher than NAS)
◈ High availability
◈ Easy management regardless of client (as with NAS)
◈ Support for various communication media: can use existing network interconnects (Gb Ethernet, 10 Gb Ethernet, InfiniBand, …)
◈ Lower cost than connecting Fibre Channel to hundreds of application clients

[Figure: application servers and the sys admin reach InfiniStor™ over a system area network (network I/O, system support); InfiniStor™ has dedicated resources for metadata service and lock management.]


29

Thank you