Page 1: DDSS: A Low-Overhead Distributed Data Sharing Substrate …mvapich.cse.ohio-state.edu/static/media/publications/slide/... · DDSS: A Low-Overhead Distributed Data Sharing Substrate

DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects

K. Vaidyanathan, S. Narravula and D. K. Panda

Network Based Computing Laboratory (NBCL)

The Ohio State University


Presentation Outline

• Introduction and Motivation

• Proposed DDSS Framework

• Experimental Results

• Conclusions and Future Work


Introduction and Motivation

• Internet growth
  – Number of users, types of service, amount of data
  – E-commerce, online banking, stocks, airline reservations
• Data-centers enable such services
  – Process data and reply to queries
  – Need for services such as caching and resource adaptation for performance and scalability

[Figure: Multi-tier data-center. Clients reach the data-center over the WAN through a proxy server (caching, load balancing), a web server (Apache), an application server (PHP; CGI, PHP) and a database server (MySQL) backed by storage.]


High-Performance Networks

• InfiniBand, 10 GigE
  – High Bandwidth
  – Low Latency
• Provide rich features
  – RDMA semantics, atomic operations, protocol offload
• OpenFabrics stack
  – Single interface for InfiniBand, iWARP/10 GigE, etc.
• Targeted at multi-tier data-centers
• Can the data-center processes coordinate better?


Information-Sharing is common

• Applications typically employ their own
  – Data placement and management protocols
  – Synchronization protocols
• Data-center services
  – Active resource adaptation
    • Maintain server state information
    • Locking requirements
  – Caching
    • Coherency and consistency requirements
  – Resource monitoring (IBM WebSphere)
    • Load information shared across several servers
  – Critical decisions based on shared information

[Figure: Resource Monitoring Service. Each of the proxy modules M1, M2 and M3 keeps the load of servers S1, S2, S3 and S4.]


Problems with Existing Approaches

• Ad-hoc messaging protocols for exchanging data
• May have high overheads
• Performance may depend on the system load
• May not use the advanced features
• May not be scalable


Objective

• Can we design a load-resilient substrate (DDSS) for data-center applications and services that utilizes advanced network features such as RDMA and remote atomic operations?


Presentation Outline

• Introduction and Motivation

• Proposed DDSS Framework

• Experimental Results

• Conclusions and Future Work


Distributed Data Sharing Mechanism

[Figure: Data-center applications and services (resource adaptation, load balancing, resource monitoring) access shared data through get and put operations (for example, get load and put load) and can lock the data.]

Provide an effective mechanism to share data across the data-center


Proposed DDSS Framework

[Figure: DDSS framework. Data-center applications and data-center services sit on the distributed data-sharing substrate, whose components are IPC management, connection management, memory management, data management, basic locks, and coherency and consistency maintenance. The substrate uses advanced network features (protocol offload, RDMA, atomic operations, multicast) of high-speed interconnects (InfiniBand, 10 GigE).]


Proposed Framework Contd…

• Data Management
  – Local vs. remote, for load balancing
• Basic Locking
  – Through atomic operations (IBA)
• Coherency and Consistency Maintenance
  – Strict, Write/Read, Null, Delta, Version
  – Use of RDMA and atomic operations

[Figure: DDSS substrate components (IPC management, connection management, memory management, data management, basic locks, coherency and consistency maintenance) over protocol offload, RDMA and atomic operations.]
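The slides state that basic locking is built on InfiniBand atomic operations. As a rough sketch of that idea, the C code below models a lock word in shared memory that is taken with a compare-and-swap from 0 (free) to the caller's id and released by swapping back. It uses local C11 `<stdatomic.h>` operations as a stand-in for a remote atomic compare-and-swap, and the names (`ddss_lock_t`, `lock_try_acquire`, `lock_acquire`, `lock_release`) are hypothetical, not DDSS's actual API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word kept in the shared region.
 * 0 means free; any other value is the owner's (non-zero) node id. */
#define LOCK_FREE ((uint64_t)0)

typedef struct {
    _Atomic uint64_t word;
} ddss_lock_t;

static void lock_init(ddss_lock_t *l)
{
    atomic_init(&l->word, LOCK_FREE);
}

/* One attempt: compare-and-swap LOCK_FREE -> owner.
 * A local C11 CAS stands in for a remote atomic CAS. */
static bool lock_try_acquire(ddss_lock_t *l, uint64_t owner)
{
    assert(owner != LOCK_FREE);
    uint64_t expected = LOCK_FREE;
    return atomic_compare_exchange_strong(&l->word, &expected, owner);
}

/* Retry until the CAS succeeds (plain spinning, no back-off). */
static void lock_acquire(ddss_lock_t *l, uint64_t owner)
{
    while (!lock_try_acquire(l, owner)) {
        /* spin */
    }
}

/* Compare-and-swap owner -> LOCK_FREE; fails if the caller is not the owner. */
static bool lock_release(ddss_lock_t *l, uint64_t owner)
{
    uint64_t expected = owner;
    return atomic_compare_exchange_strong(&l->word, &expected, LOCK_FREE);
}
```

Keeping the owner's id in the lock word, rather than a single busy bit, lets `lock_release()` refuse to unlock a lock the caller does not hold. Over a network every failed attempt would be a round trip, so a real design would likely add back-off to the spin loop.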


Proposed Framework Contd…

• Connection Management
  – Takes care of connection setup and teardown for nodes participating in DDSS
• Memory Management
  – Allocates a pool of memory for DDSS on each node
  – Manages allocation and release operations
• IPC Management
  – Access for multiple threads
  – Message queues

[Figure: Data-center applications and services access the DDSS module through IPC; the figure also shows the OpenFabrics stack, the TCP/IP stack, and other applications and modules.]
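The memory-management component (a per-node pool from which DDSS allocates and releases space) could look roughly like the fixed-size-block pool below. This is a sketch under assumptions: the block size, block count and names (`pool_t`, `pool_alloc`, `pool_free`) are hypothetical, and a real substrate would also register the pool with the network adapter so that remote peers can reach it with RDMA, which plain `malloc` does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool geometry. */
#define BLOCK_SIZE 1024
#define NUM_BLOCKS 8

/* A free block stores the link to the next free block in its first bytes. */
typedef struct free_node { struct free_node *next; } free_node_t;

typedef struct {
    unsigned char *base;    /* start of the pool */
    free_node_t *free_list; /* singly linked list of free blocks */
    size_t in_use;          /* number of allocated blocks */
} pool_t;

/* Allocate the region once and thread every block onto the free list. */
static int pool_init(pool_t *p)
{
    p->base = malloc((size_t)BLOCK_SIZE * NUM_BLOCKS);
    if (p->base == NULL) return -1;
    p->free_list = NULL;
    p->in_use = 0;
    /* Push in reverse so the first allocation returns the start of the pool. */
    for (int i = NUM_BLOCKS - 1; i >= 0; i--) {
        free_node_t *n = (free_node_t *)(p->base + (size_t)i * BLOCK_SIZE);
        n->next = p->free_list;
        p->free_list = n;
    }
    return 0;
}

/* O(1) allocation: pop the head of the free list, or NULL if exhausted. */
static void *pool_alloc(pool_t *p)
{
    free_node_t *n = p->free_list;
    if (n == NULL) return NULL;
    p->free_list = n->next;
    p->in_use++;
    return n;
}

/* O(1) release: push the block back onto the free list. */
static void pool_free(pool_t *p, void *blk)
{
    assert(blk != NULL);
    size_t off = (size_t)((unsigned char *)blk - p->base);
    assert(off < (size_t)BLOCK_SIZE * NUM_BLOCKS && off % BLOCK_SIZE == 0);
    free_node_t *n = blk;
    n->next = p->free_list;
    p->free_list = n;
    p->in_use--;
}

static void pool_destroy(pool_t *p)
{
    free(p->base);
    p->base = NULL;
    p->free_list = NULL;
    p->in_use = 0;
}
```

A single block size keeps allocation and release constant-time; a substrate that must hold segments of different sizes would need size classes or a more general allocator.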


DDSS Interface

• allocate_ss(…)
• release_ss(…)
• get(…)
• put(…)
• acquire_lock_ss(…)
• release_lock_ss(…)
• …

Non-coherent segment:

    key = allocate_ss(1024, NONCOHERENT_SS, 5000);
    put(key, data, 10);
    compute();
    get(key, data, 10);
    release_ss(key);

Write-coherent segment:

    key = allocate_ss(1024, WRITE_COHERENT_SS, 5000);
    acquire_lock_ss(key);
    put(key, data, 10);
    release_lock_ss(key);
    compute();
    get(key, data, 10);
    release_ss(key);
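To make the two call sequences above executable, the sketch below is a single-process stand-in that uses the call names shown on the slide. Everything about it is an assumption: the slide elides the real signatures, so the parameter types, the return conventions (a non-negative key or 0 on success, -1 on error) and the `coherence_t` enum are guesses; the third argument of `allocate_ss()` is accepted but ignored because its meaning is not stated; and segments are local heap memory rather than RDMA-accessible memory. Rejecting a `put()` on a `WRITE_COHERENT_SS` segment whose lock is not held is our reading of why the second example wraps `put()` in a lock, not documented DDSS behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Two of the coherence models named on the slides (hypothetical enum). */
typedef enum { NONCOHERENT_SS, WRITE_COHERENT_SS } coherence_t;

/* One shared segment; ordinary heap memory in this process. */
typedef struct {
    unsigned char *buf;
    size_t size;
    coherence_t model;
    bool locked;
    bool used;
} segment_t;

#define MAX_SEGMENTS 16
static segment_t table[MAX_SEGMENTS];

/* Map a key to its segment, or NULL if the key is invalid. */
static segment_t *lookup(int key)
{
    if (key < 0 || key >= MAX_SEGMENTS || !table[key].used) return NULL;
    return &table[key];
}

/* Returns a key >= 0, or -1. The third argument is ignored (meaning unknown). */
int allocate_ss(size_t size, coherence_t model, int unused)
{
    (void)unused;
    for (int k = 0; k < MAX_SEGMENTS; k++) {
        if (table[k].used) continue;
        table[k].buf = calloc(size, 1);
        if (table[k].buf == NULL) return -1;
        table[k].size = size;
        table[k].model = model;
        table[k].locked = false;
        table[k].used = true;
        return k;
    }
    return -1;
}

int release_ss(int key)
{
    segment_t *s = lookup(key);
    if (s == NULL) return -1;
    free(s->buf);
    memset(s, 0, sizeof *s);
    return 0;
}

/* No waiting in this single-process mock: fails if already locked. */
int acquire_lock_ss(int key)
{
    segment_t *s = lookup(key);
    if (s == NULL || s->locked) return -1;
    s->locked = true;
    return 0;
}

int release_lock_ss(int key)
{
    segment_t *s = lookup(key);
    if (s == NULL || !s->locked) return -1;
    s->locked = false;
    return 0;
}

/* Copy len bytes into the segment (mock rule: write-coherent needs the lock). */
int put(int key, const void *data, size_t len)
{
    segment_t *s = lookup(key);
    if (s == NULL || len > s->size) return -1;
    if (s->model == WRITE_COHERENT_SS && !s->locked) return -1;
    memcpy(s->buf, data, len);
    return 0;
}

/* Copy len bytes out of the segment. */
int get(int key, void *data, size_t len)
{
    segment_t *s = lookup(key);
    if (s == NULL || len > s->size) return -1;
    memcpy(data, s->buf, len);
    return 0;
}
```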


Presentation Outline

• Introduction and Motivation

• Proposed Framework

• Experimental Results

• Conclusions and Future Work


Experimental Testbed

• InfiniBand
  – Cluster with dual Intel Xeon 3.4 GHz nodes, 1 GB memory
  – MT25128 Mellanox HCA
• iWARP/GigE
  – Cluster with dual Intel Xeon 3.0 GHz nodes, 512 MB memory
  – Ammasso 1100 Gigabit Ethernet NIC
• OpenFabrics stack
  – IB, Ammasso (iWARP)


Experimental Results Outline

• Microbenchmarks
  – Performance of put() and get() operations
• Distributed Applications
  – Distributed STORM
  – Checkpointing application
• Data-Center Services
  – Active resource adaptation
  – Active caching


Microbenchmarks

• For small messages, put() and get() latency is below 65 µs for all coherence models

[Figure: put() performance and get() performance. Latency (µs, 0 to 140) versus message size (1, 16, 256, 4096 and 65536 bytes) for the Null, Read, Write, Strict, Version and Delta coherence models.]


Distributed STORM

• Selects data of interest and transfers it from storage to compute nodes
• The same dataset is processed by multiple STORM applications
  → this shared dataset is placed in DDSS
• STORM using DDSS shows close to 19% improvement in query execution time

[Figure: Query execution time (µs, 0 to 8000) versus number of records (1K, 5K, 10K, 100K) for STORM and STORM-DDSS.]


CR Coordination

• Checkpoints are taken at random times
• Simulates restart from a consistent checkpoint
• Uses DDSS to maintain checkpoint information, locks, versions, etc.
• Checkpointing applications using DDSS are highly scalable

[Figure: Average sync time and average total time (µs, 0 to 350) versus number of clients (2 to 12).]


Active Resource Adaptation

• Monitors the load of different websites
• If a website is loaded, shifts under-utilized servers to the loaded website
• Software overhead of DDSS is less than 2%

[Figure: Reconfiguration time and software overhead (µs, 0 to 150) and number of reconfigurations (0 to 3000) versus load (5%, 10%, 20%, 40%, 60%, 80%).]


Active Caching

• Supports strong coherency for cached dynamic data
• Checks the back-end for the current version using RDMA
• Active caching using DDSS is load-resilient

[Figure: Version check time (µs, 0 to 700) versus number of compute/communication threads (1, 2, 4, 8, 16, 32) for Version Check - DDSS and Version Check - TCP.]
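The version check described above (the front-end reads the back-end's current version before serving a cached dynamic page) can be sketched as follows. This is a single-process illustration under stated assumptions: `backend_doc_t`, `cache_entry_t`, `read_remote_version()` and `serve()` are hypothetical names, the "remote" read is a plain memory load in place of an RDMA read, and the back-end is assumed to bump the version on every update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BODY_LEN 64

/* Back-end copy of a dynamic document. */
typedef struct {
    uint64_t version;          /* bumped by the back-end on every update */
    char body[BODY_LEN];
} backend_doc_t;

/* Front-end cache entry. */
typedef struct {
    bool valid;
    uint64_t version;          /* version the cached body corresponds to */
    char body[BODY_LEN];
} cache_entry_t;

/* Stand-in for a one-sided RDMA read of the version word. */
static uint64_t read_remote_version(const backend_doc_t *be)
{
    return be->version;
}

/* Strong coherency: serve from cache only if the cached version is current.
 * Returns true on a cache hit, false on a miss (after refreshing the entry). */
static bool serve(cache_entry_t *c, const backend_doc_t *be, char out[BODY_LEN])
{
    uint64_t current = read_remote_version(be);
    bool hit = c->valid && c->version == current;
    if (!hit) {
        /* Miss: fetch the full document and re-tag it with its version. */
        memcpy(c->body, be->body, BODY_LEN);
        c->version = current;
        c->valid = true;
    }
    memcpy(out, c->body, BODY_LEN);
    assert(c->version == current);
    return hit;
}
```

A hit costs one small version read instead of a full request to the back-end, and because an RDMA read is one-sided the back-end CPU does not take part in it, which is consistent with the load-resilience claim. The sketch ignores an update that races with the miss path; a real design would need to re-check the version after fetching.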


Conclusions & Future Work

• Proposed a distributed data sharing substrate (DDSS)
• Using DDSS, data-center applications and services can obtain significant benefits in performance and scalability with very little modification
• Implemented over OpenFabrics; applicable across InfiniBand and iWARP-capable adapters
• Future work on fault tolerance, support for large file sizes, and advanced resource management schemes


Acknowledgements

Our research is supported by the following organizations

• Current Funding support by

• Current Equipment support by


Web Pointers

Group Homepage: http://nowlab.cse.ohio-state.edu

Emails: {vaidyana, narravul, panda}@cse.ohio-state.edu

NBC-LAB


Backup Slides


High-Performance Networks in Data-Centers

• InfiniBand, iWARP-capable adapters
  – Offer several features like RDMA, atomic operations (IB), iWARP (Ammasso, 10 GigE)

[Figure: A cluster-based data-center environment (InfiniBand, iWARP-capable Ammasso, 10 GigE) and a distributed data-center environment in which several iWARP clusters are connected over a wide area network.]


Active Resource Adaptation Design

[Figure: A load balancer and the servers of website A (not loaded) and website B (loaded). Loads are shared via RDMA and read with load queries; reconfiguration then proceeds with a successful atomic lock, a successful atomic counter update, reconfiguration of a node, and a successful atomic unlock.]

P. Balaji, K. Vaidyanathan, S. Narravula and D.K. Panda, “Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand”, presented at RAIT 2004
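The reconfiguration sequence in the figure (query the shared loads, take the lock with an atomic operation, update the reconfiguration counter atomically, move a node, release the lock) is sketched below for two websites. Assumptions: the names, the 80% threshold that counts as "loaded", and the rule of always leaving at least one server per site are hypothetical, and local C11 atomics and plain loads stand in for the remote atomic operations and RDMA reads of the actual design.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOADED_PCT 80   /* hypothetical threshold for a "loaded" site */

typedef struct {
    _Atomic uint64_t lock;           /* 0 = free, otherwise the holder's id */
    _Atomic uint64_t reconfig_count; /* reconfigurations performed so far */
    int servers[2];                  /* servers assigned to site A [0] and B [1] */
    int load[2];                     /* current load of each site, in percent */
} cluster_t;

/* Returns true if one server was moved from the idle site to the loaded one. */
static bool adapt(cluster_t *c, uint64_t me)
{
    assert(me != 0);
    /* 1. Load query (RDMA reads in the design; plain loads here). */
    int hot = -1;
    if (c->load[0] >= LOADED_PCT) hot = 0;
    else if (c->load[1] >= LOADED_PCT) hot = 1;
    if (hot < 0) return false;               /* no site is loaded */
    int cold = 1 - hot;
    if (c->load[cold] >= LOADED_PCT || c->servers[cold] <= 1)
        return false;                        /* no spare server to move */

    /* 2. Atomic lock: compare-and-swap 0 -> me. */
    uint64_t expected = 0;
    if (!atomic_compare_exchange_strong(&c->lock, &expected, me))
        return false;                        /* another balancer is reconfiguring */

    /* 3. Atomic counter update. */
    atomic_fetch_add(&c->reconfig_count, 1);

    /* 4. Reconfigure: move one server from the idle site to the loaded one. */
    c->servers[cold]--;
    c->servers[hot]++;

    /* 5. Unlock (an atomic store here; the slide uses an atomic operation). */
    atomic_store(&c->lock, 0);
    return true;
}
```

Taking the lock before touching the server assignment keeps two load balancers from moving the same node at once; the counter gives other nodes a cheap way to notice that a reconfiguration has happened.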


RDMA-Based Client Polling Design

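[Figure: Front-end and back-end. On a request, the front-end performs a version read on the back-end; on a cache hit it responds from its cache, and on a cache miss the response comes from the back-end.]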

S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy, J. Wu and D.K. Panda, “Supporting Strong Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand”, presented at SAN 2004


Microbenchmarks

• put() and get() latency is below 50 µs

[Figure: Latency (µs) for the Null, Read, Write, Strict, Version and Delta coherence models, plotted against the number of clients (1, 5, 9; axis 0 to 60 µs) and against lock contention in percent (axis 0 to 600 µs).]