colony for-openstack-grizzly-summit

Copyright © 2012 NTT DATA Corporation 15/Oct/2012 NTT DATA INTELLILINK Motonobu Ichimura Inter-cloud object storage: Colony

Upload: shigetoshi-yokoyama

Post on 05-Jul-2015


DESCRIPTION

Presentation slides from the OpenStack Grizzly Summit in San Diego, October 15, 2012.

TRANSCRIPT

Page 1: Colony for-openstack-grizzly-summit

Copyright © 2012 NTT DATA Corporation

15/Oct/2012 NTT DATA INTELLILINK

Motonobu Ichimura

Inter-cloud object storage:

Colony

Page 2: Colony for-openstack-grizzly-summit

2 Copyright © 2012 NTT DATA INTELLILINK Corporation

http://etherpad.openstack.org/grizzly-colony

EtherPad

Page 3: Colony for-openstack-grizzly-summit


Agenda

•What is Colony?

–Our goal

–Use case

•How to make Swift network- (or region-) aware

–Problems with the original Swift code

–Our modification

–Investigation

–Conclusion

•Future Plan

–Problems to tackle (and being tackled)

–Collaboration

Page 4: Colony for-openstack-grizzly-summit


What is Colony?

Page 5: Colony for-openstack-grizzly-summit


Goal: academic community cloud

[Diagram: university clouds (Univ.-A Cloud, Univ.-B Cloud, …, Univ.-X Cloud), with education clouds and research clouds, connected through intercloud services over the Science Information Network to form an Academic Community Cloud.]

Page 6: Colony for-openstack-grizzly-summit


Intercloud object storage service

[Diagram: three clouds, each running Nova, Glance, and Swift; Swift instances are deployed both for local use and for intercloud use.]

Colony federates cloud object storage services, such as Swift, to achieve an intercloud object storage service.

Page 7: Colony for-openstack-grizzly-summit


Inter-cloud object storage service: Colony

[Diagram, from the users' point of view: Cloud-A (Swift-A) shows its local containers A1-A3 (objects A1-1 to A1-3) together with inter-cloud containers I1 and I4 (objects I4-1 to I4-3); Cloud-B (Swift-B) shows its local containers B1-B3 (objects B1-1 to B1-3) together with inter-cloud containers I1 and I8 (objects I1-1 to I1-3). The inter-cloud containers (e.g. I1, I2, I3, I4, I10, I13) are stored in Swift-I, which is geographically distributed.]

Page 8: Colony for-openstack-grizzly-summit


Colony achieves the federation

[Diagram: the Colony node (Ubuntu, Apache with mod_wsgi and mod_shib, Colony-Horizon, Colony-Keystone, Colony-Dispatcher, Squid, slapd) provides a Cloud-A user with seamless access to multiple Swifts (Swift-I and Swift-A, each running Colony-Keystone, slapd, and Swift). Users authenticate with a Shibboleth IdP.]

Page 9: Colony for-openstack-grizzly-summit


Use cases

We plan to use Colony as:

•Object storage for cloud-to-cloud migration

•Object storage to deliver VM images around Japan

•Object storage to store big data

Page 10: Colony for-openstack-grizzly-summit


Software components developed in Colony

•Colony-Horizon – based on diablo/stable Horizon with some enhancements

•Multi-region support – users can choose which Swift is used to store/retrieve objects

•Swift container ACL and metadata support

•Swift object metadata support

•Segmented upload support for objects larger than 5 GB …

•Colony-Keystone – based on diablo/stable Keystone with some enhancements

•Authentication with Shibboleth

•%{tenant_name} can be used in endpointTemplates, in addition to %{tenant_id}, to federate cloud services

•Colony-Dispatcher – new

•Relays requests to multiple object storage services (and merges the responses for clients)

•Relays requests to a specific object storage service indicated by the URI

•Chooses the “nearest” swift-proxy server to relay requests to

•Copies objects among different Swifts

•Utilities – new

•Tools to simplify the admin tasks of federating object storage services

Page 11: Colony for-openstack-grizzly-summit


[Screenshot: Colony-Horizon, where users can choose which Swift (Swift-A or Swift-I) to use.]

Page 12: Colony for-openstack-grizzly-summit


Colony-Keystone

[Diagram: Shibboleth IdP; Shibboleth SP with Colony-Horizon; Colony-Keystone.]

0-1. User registration by mail_addr

0-2. Associate the ePPN with the mail_addr on initial access

1. ID/password

2. Attribute: ePPN, mail_addr

3. Attribute: ePPN

4. auth_token

Modifications to Keystone

• Add an ePPN field to the Keystone schema

• Add REST API services to create a token by ePPN ('/token_by/eppn') and by email address ('/token_by/email')

• Add a REST API service to register/update the ePPN ('/users/{user_id}/eppn')
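The registration and login flow above can be sketched in memory as follows. This is a minimal illustration, not Colony-Keystone's code: ToyKeystone, login_via_shibboleth, and the method names are hypothetical stand-ins for the REST endpoints, and the assumption that step 0-2 happens when the SP first asks for a token is ours.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class User:
    user_id: str
    mail_addr: str
    eppn: Optional[str] = None  # the ePPN field added to the schema


@dataclass
class ToyKeystone:
    """In-memory stand-in for Colony-Keystone (hypothetical, not its real API)."""
    users: Dict[str, User] = field(default_factory=dict)   # user_id -> User
    tokens: Dict[str, str] = field(default_factory=dict)   # token -> user_id

    def register(self, user_id: str, mail_addr: str) -> User:
        # Step 0-1: user registration by mail_addr.
        user = User(user_id, mail_addr)
        self.users[user_id] = user
        return user

    def set_eppn(self, user_id: str, eppn: str) -> None:
        # Analogue of '/users/{user_id}/eppn' (register/update the ePPN).
        self.users[user_id].eppn = eppn

    def _issue(self, user: User) -> str:
        token = secrets.token_hex(16)
        self.tokens[token] = user.user_id
        return token

    def token_by_email(self, mail_addr: str) -> Optional[str]:
        # Analogue of '/token_by/email'.
        for user in self.users.values():
            if user.mail_addr == mail_addr:
                return self._issue(user)
        return None

    def token_by_eppn(self, eppn: str) -> Optional[str]:
        # Analogue of '/token_by/eppn'.
        for user in self.users.values():
            if user.eppn == eppn:
                return self._issue(user)
        return None


def login_via_shibboleth(ks: ToyKeystone, eppn: str, mail_addr: str) -> Optional[str]:
    """Steps 2-4: the SP has ePPN and mail_addr from the IdP and asks for a
    token; on initial access (step 0-2) the ePPN is first associated with
    the user registered under mail_addr."""
    token = ks.token_by_eppn(eppn)
    if token is not None:
        return token
    for user in ks.users.values():
        if user.mail_addr == mail_addr and user.eppn is None:
            ks.set_eppn(user.user_id, eppn)
            return ks.token_by_eppn(eppn)
    return None  # the user never registered by mail_addr (no step 0-1)
```

After a user registers by mail address, the first Shibboleth login binds the ePPN, and later logins can be resolved by ePPN alone.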

Page 13: Colony for-openstack-grizzly-summit


Colony-dispatcher

[Diagram: a Swift client sends requests through a Swift proxy / Colony Dispatcher, which relays them to the Swift proxies of Swift-A (local) and Swift-I (intercloud). The client sees the containers A:container1, A:container2, I:container1, and I:container2.]

1. The Swift client can send requests to Swift-A and Swift-I through the Colony Dispatcher.

2. The Colony Dispatcher merges the responses from each Swift and sends them to the Swift client.

Requests modified for merging responses:

•Account info

•Container list

•X-Copy-from/to

A response merged by the Colony Dispatcher has a prefix indicating which Swift is used to store each item.
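The prefix-based merging can be sketched like this. It is only an illustration under stated assumptions: the ":" separator matches the A:/I: names in the diagram, but SWIFTS, merge_container_lists, merge_account_info, route, and the choice to sum account counters are hypothetical, not Colony-Dispatcher's actual code. The X-Account-* names are Swift's standard account headers.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix table: prefix -> Swift that stores the container.
SWIFTS = {"A": "Swift-A (local)", "I": "Swift-I (intercloud)"}
SEP = ":"


def merge_container_lists(listings: Dict[str, List[str]]) -> List[str]:
    """Merge per-Swift container listings into one sorted listing,
    prefixing each name with the key of the Swift that stores it."""
    return sorted(f"{prefix}{SEP}{name}"
                  for prefix, names in listings.items()
                  for name in names)


def merge_account_info(heads: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Sum the per-Swift account counters into one account view."""
    keys = ("X-Account-Container-Count", "X-Account-Object-Count",
            "X-Account-Bytes-Used")
    return {k: sum(int(h.get(k, "0")) for h in heads.values()) for k in keys}


def route(container: str) -> Tuple[str, str]:
    """Map a prefixed container name back to (Swift, real container name).

    Splits at the first separator only, so a container name that itself
    contains ':' still round-trips."""
    prefix, sep, name = container.partition(SEP)
    if not sep or prefix not in SWIFTS:
        raise ValueError(f"no known Swift prefix in {container!r}")
    return SWIFTS[prefix], name
```

For example, merging {"A": ["container1", "container2"], "I": ["container2", "container1"]} yields the four prefixed names from the diagram, and route("I:container1") sends the request to Swift-I with the unprefixed name.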


Page 14: Colony for-openstack-grizzly-summit


Caching

The Colony Dispatcher can use a caching proxy (such as Squid) per Swift proxy to retrieve objects from remote Swifts.

[Diagram: Swift client, Colony Dispatcher, a cache (proxy), and the Swift proxies of Swift-A (local) and Swift-I (intercloud); containers A:container1, A:container2, I:container1, I:container2.]
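A read-through cache in front of a remote Swift proxy can be sketched as a small LRU map. This is a toy stand-in for Squid, assuming that objects are keyed by (account, container, object); ReadThroughCache and the fetch callback are hypothetical, and unlike a real HTTP cache this sketch ignores cache headers and does not invalidate entries on PUT or DELETE.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Key = Tuple[str, str, str]  # (account, container, object)


class ReadThroughCache:
    """Toy LRU read-through cache for object GETs (a stand-in for Squid)."""

    def __init__(self, fetch: Callable[[str, str, str], bytes], capacity: int = 128):
        self._fetch = fetch              # hypothetical: GET from a (remote) Swift proxy
        self._capacity = capacity
        self._store = OrderedDict()      # Key -> object body, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, account: str, container: str, obj: str) -> bytes:
        key: Key = (account, container, obj)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        body = self._fetch(account, container, obj)
        self._store[key] = body
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return body
```

Only the first GET of an object crosses the slow inter-site link; repeated GETs are served locally until the entry is evicted.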

Page 15: Colony for-openstack-grizzly-summit


How to make Swift network-aware

Page 16: Colony for-openstack-grizzly-summit


Current implementation

Page 17: Colony for-openstack-grizzly-summit


Problems with the original Swift code

•PUT/GET performance

–The Swift proxy waits until the object has been put to all storage servers.

–The Swift proxy chooses the node to retrieve an object from at random.
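The two problems can be captured in a toy latency model. This is a sketch under made-up assumptions: NODE_TIME holds hypothetical per-replica transfer times as seen from a proxy in Tokyo, and the functions are illustrative, not Swift code. A PUT costs the maximum over replicas, while a GET costs whatever replica happens to be picked.

```python
import random
from typing import Dict, List

# Hypothetical per-replica transfer times (seconds) for one object, as
# seen from a proxy in Tokyo; the numbers are made up for illustration.
NODE_TIME: Dict[str, float] = {"tokyo-1": 1.0, "tokyo-2": 1.1, "sapporo-1": 4.0}


def put_time(replicas: List[str], node_time: Dict[str, float]) -> float:
    # The proxy writes every replica and waits for all of them,
    # so a PUT takes as long as the slowest replica.
    return max(node_time[n] for n in replicas)


def get_time(replicas: List[str], node_time: Dict[str, float],
             rng: random.Random) -> float:
    # The proxy reads from a randomly chosen replica, so each replica
    # serves the GET with probability 1/len(replicas).
    return node_time[rng.choice(replicas)]


def expected_get_time(replicas: List[str], node_time: Dict[str, float]) -> float:
    return sum(node_time[n] for n in replicas) / len(replicas)
```

With these numbers, every PUT takes 4.0 s and the expected GET time is about 2.03 s: the single remote replica dominates every PUT and about one GET in three.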

Page 18: Colony for-openstack-grizzly-summit


Test environments

[Diagram: sites in Sapporo and Tokyo; link labels: 9900MBps and 900MBps (0.1msec).]

•Servers (x2): CPU: AMD Opteron 6128 2000 MHz (16 cores), Mem: 32 GB, NIC: 10000baseT/Full

•Servers (x2): CPU: Intel(R) Xeon(R) CPU E7-8870 (40 cores), Mem: 126 GB, NIC: 1000baseT/Full

Page 19: Colony for-openstack-grizzly-summit


PUT operation

[Diagram: a client; a proxy and three storage servers in Tokyo; three storage servers in Sapporo.]

An object PUT is always bound by the worst case, that is, the slowest storage server.

Page 20: Colony for-openstack-grizzly-summit


Object's location

Page 21: Colony for-openstack-grizzly-summit


PUT object's throughput @Tokyo (Bytes/sec)

Page 22: Colony for-openstack-grizzly-summit


GET operation

[Diagram: a client; a proxy and three storage servers in Tokyo; three storage servers in Sapporo. Labels: "1/replications"; "High-bandwidth, low-latency" (on two links).]

Page 23: Colony for-openstack-grizzly-summit


Object's location

Page 24: Colony for-openstack-grizzly-summit


GET object's throughput @Tokyo (Bytes/sec)

Performance degrades because of the network between Sapporo and Tokyo

Page 25: Colony for-openstack-grizzly-summit


Our modification

Page 26: Colony for-openstack-grizzly-summit


How to solve it: basic idea

•Limitations

–Don’t modify data structures (including the ring)

–Minimize customization

•Add some rules to the ring’s data structure

–Zone information is treated as a decimal number, so the difference between zone A and zone B represents the distance between zone A and zone B

•Add zone hints to the Swift proxy servers

•Change the order of nodes used by the proxy server

Page 27: Colony for-openstack-grizzly-summit


How to solve

[app:proxy-server]

nearby_mode = false

own_zone = 100

near_distance = 10

[Diagram: Tokyo has zones 100-102 and a proxy with zone 100, distance 10; Sapporo has zones 200-202 and a proxy with zone 200, distance 10.]

A proxy with zone info 100 and zone distance 10 considers storage servers in zones 100 to 109 (100 <= zone < 110) to be located near the proxy.

A proxy with zone info 200 and zone distance 10 considers storage servers in zones 200 to 209 (200 <= zone < 210) to be located near the proxy.
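The proxy settings and the nearness rule can be sketched with Python's standard configparser. This is an illustration, not how the patched proxy actually loads its configuration: NearbyConf and load_nearby_conf are hypothetical, and nearby_mode is set to true here because the proxy/server.py patch only reorders nodes when nearby_mode is true.

```python
import configparser
from dataclasses import dataclass

# Same keys as the [app:proxy-server] section above; nearby_mode is true
# here because the patch only reorders nodes when it is enabled.
CONF = """
[app:proxy-server]
nearby_mode = true
own_zone = 100
near_distance = 10
"""


@dataclass(frozen=True)
class NearbyConf:
    nearby_mode: bool
    own_zone: int
    near_distance: int

    def is_near(self, zone: int) -> bool:
        # Same rule as isnearby() in the ring.py patch.
        return self.own_zone <= zone < self.own_zone + self.near_distance


def load_nearby_conf(text: str, section: str = "app:proxy-server") -> NearbyConf:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sec = parser[section]
    return NearbyConf(
        nearby_mode=sec.getboolean("nearby_mode", fallback=False),
        own_zone=int(sec["own_zone"]),            # raises KeyError if missing
        near_distance=int(sec["near_distance"]),
    )
```

With own_zone = 100, zones 100, 102, and 109 count as near, while 110 and 200 do not; changing own_zone to 200 gives the Sapporo proxy's view.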

Page 28: Colony for-openstack-grizzly-summit


PUT operation

[Diagram: a client; a proxy (zone_info: 100, zone_distance: 10) and storage servers A, B, C in Tokyo; storage servers in Sapporo.]

The proxy initially puts the object on the nearest storage servers, using the zone information and zone distance. The object replicator then replicates it to its proper position asynchronously.
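The PUT side can be sketched as a planning step: pick the near nodes to write now (primaries first, then handoff nodes, as get_near_nodes() does) and list the primaries the replicator must fill later. plan_nearby_put and the node dicts are hypothetical; this only computes the plan and performs no I/O or replication.

```python
from typing import Dict, List, Tuple

Node = Dict[str, int]  # e.g. {"id": 3, "zone": 201}


def plan_nearby_put(primaries: List[Node], handoffs: List[Node],
                    own_zone: int, near_distance: int,
                    replica_count: int) -> Tuple[List[Node], List[Node]]:
    """Return (write_now, replicate_later) for one object PUT.

    write_now: up to replica_count near nodes, primaries first, then
    handoff nodes, like get_near_nodes() in the ring.py patch.
    replicate_later: primary nodes that the object replicator must fill
    asynchronously (the hinted-handoff situation).
    """
    def is_near(node: Node) -> bool:
        return own_zone <= node["zone"] < own_zone + near_distance

    write_now = [n for n in primaries if is_near(n)][:replica_count]
    for node in handoffs:
        if len(write_now) >= replica_count:
            break
        if is_near(node):
            write_now.append(node)
    replicate_later = [n for n in primaries if n not in write_now]
    return write_now, replicate_later
```

For a Tokyo proxy whose primaries sit in zones 100, 201, and 202, the PUT completes against Tokyo nodes only, and the two Sapporo primaries are left for the replicator.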

Page 29: Colony for-openstack-grizzly-summit


PUT operation

[Diagram: a client; a proxy and storage servers A, B, C in Tokyo; storage servers D, E, F in Sapporo, crossed out (××).]

Hinted handoff

This is the same situation as when all storage servers located in Sapporo are down.

Page 30: Colony for-openstack-grizzly-summit


GET operation

[Diagram: a client; a proxy and three storage servers in Tokyo; three storage servers in Sapporo.]

1. First, try to retrieve the object from a storage server near the proxy.

2. After that, try to retrieve the object from the storage servers indicated as the primary zone.
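The near-first order can be sketched as follows. nearby_order applies the same rule as the list expression in the proxy/server.py patch (near nodes first, then nodes whose zone no near node covers); get_object and the fetch callback are hypothetical stand-ins for the proxy's request loop.

```python
from typing import Callable, Dict, List, Optional

Node = Dict[str, int]  # e.g. {"id": 3, "zone": 201}


def nearby_order(nodes: List[Node], near_nodes: List[Node]) -> List[Node]:
    """Near nodes first, then the remaining nodes in their original order.

    Uses the same rule as the list expression in the proxy/server.py patch:
    a node is kept only if no near node already covers its zone.
    """
    near_zones = {n["zone"] for n in near_nodes}
    return near_nodes + [n for n in nodes if n["zone"] not in near_zones]


def get_object(nodes: List[Node], near_nodes: List[Node],
               fetch: Callable[[Node], Optional[bytes]]) -> Optional[bytes]:
    """Try each node in nearby order and return the first body found."""
    for node in nearby_order(nodes, near_nodes):
        body = fetch(node)
        if body is not None:
            return body
    return None
```

A DELETE, as described for that operation, would walk the same order, issuing the request to near nodes before the primary-zone nodes.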

Page 31: Colony for-openstack-grizzly-summit


DELETE operation

[Diagram: a client; a proxy and three storage servers in Tokyo; three storage servers in Sapporo.]

1. First, try to delete the object from a storage server near the proxy.

2. After that, try to delete the object from the storage servers indicated as the primary zone.

Page 32: Colony for-openstack-grizzly-summit


Code

First, add get_near_nodes() to ring.py:

```python
def get_near_nodes(self, account, container, obj, own_zone, near_distance):
    """
    Get the partition and nodes like get_nodes(), but keep only the
    nodes whose zone is near the proxy.

    :param account: account name
    :param container: container name
    :param obj: object name
    :param own_zone: lowest zone number of the proxy's own location
    :param near_distance: a zone z is near if
                          own_zone <= z < own_zone + near_distance
    :returns: a tuple of (partition, list of node dicts)
    """
    part, nodes = self.get_nodes(account, container, obj)

    def isnearby(one, other, distance):
        return one <= other < one + distance

    # Near nodes among the primary nodes for this partition.
    near_nodes = [node for node in nodes
                  if isnearby(own_zone, node['zone'], near_distance)]
    # Top up from the handoff nodes until there are replica_count near nodes.
    if len(near_nodes) < self.replica_count:
        for node in self.get_more_nodes(part):
            if isnearby(own_zone, node['zone'], near_distance):
                near_nodes.append(node)
                if len(near_nodes) >= self.replica_count:
                    break
    return part, near_nodes
```

Then modify proxy/server.py to use get_near_nodes() in each method (excerpt from POST):

```diff
@@ -1044,6 +1056,14 @@ def POST(self, req):
         container_partition, containers, _junk, req.acl, _junk = \
             self.container_info(self.account_name, self.container_name,
                 account_autocreate=self.app.account_autocreate)
+        if self.app.nearby_mode:
+            partition, near_nodes = self.app.object_ring.get_near_nodes(
+                self.account_name, self.container_name, self.object_name,
+                self.app.own_zone, self.app.near_distance)
+            print 'before nodes: %s' % containers
+            containers = near_nodes + \
+                [cont for cont in containers if cont['zone'] not in [c['zone'] for c in near_nodes]]
+            print 'after nodes: %s' % containers
         if 'swift.authorize' in req.environ:
             aresp = req.environ['swift.authorize'](req)
             if aresp:
```

Page 33: Colony for-openstack-grizzly-summit


Investigation

[Charts: PUT average (bytes/sec) @Sapporo and PUT average (bytes/sec) @Tokyo, Original vs. Patched, for object sizes 1K, 1M, 10M, 100M, and 1G.]

Page 34: Colony for-openstack-grizzly-summit


Using Cache

Tokyo

Proxy

Storage

Storage

Storage

Sapporo

Storage

Storage

Storage

Client

Kyusyu

Proxy

How about the case of all objects are located to remote areas ?

Page 35: Colony for-openstack-grizzly-summit


Colony-Dispatcher as a cache

Colony-Dispatcher can act as a swift-proxy-proxy (a proxy in front of the Swift proxies) with a caching mechanism.

Page 36: Colony for-openstack-grizzly-summit


Investigation – Cache effectiveness

Using Colony-Dispatcher as a cache can improve the performance of retrieving objects from remote areas.

[Charts: GET average (bytes/sec) @Tokyo and GET average (bytes/sec) @Sapporo, for object sizes 1K, 1M, 10M, 100M, and 1G.]

Page 37: Colony for-openstack-grizzly-summit


Conclusion

•Re-ordering the nodes by region in the proxy resolves the GET/PUT performance issues

–This feature can be implemented with minimal customization (<50 lines of code).

•Using a cache is a good idea for inter-cloud use

Page 38: Colony for-openstack-grizzly-summit


Our future plan

Page 39: Colony for-openstack-grizzly-summit


Problems to tackle

•Object location

–Adding a region concept to the ring structure might help

–Primary nodes isolated by region

•Replication performance

– Key factor

• We aggressively used the hinted-handoff mechanism to

– Using UDT instead of TCP for replication

– Using pyinotify for I/O-event-driven replication

– A separate network for replication

– Hop-by-hop replication

Page 40: Colony for-openstack-grizzly-summit


Are you interested in Colony?

•Please contact me if you are interested in the Colony project.

–We want to collaborate with people who want to use/develop Swift as an inter-cloud object store.

Page 41: Colony for-openstack-grizzly-summit


Are you interested in academic clouds?

•If you are interested in how to integrate clouds using dodai and Colony:

–My colleague (Guan-san) will give a presentation about dodai (cluster as a service) at 17:20 in Manchester A.

–Yokoyama-san (a member of NII) might talk about how to integrate Colony and dodai in a lightning talk (LT).

Page 42: Colony for-openstack-grizzly-summit


Thank you.

Page 43: Colony for-openstack-grizzly-summit


Q&A

•Please phrase your question using simple grammar if possible.