
Page 1: OpenStack Summit Tokyo - Know-how of Challenging Deploy/Operation NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift

Copyright © 2015 NTT DATA Corporation

2015/10/27 NTT DATA Corporation

Know-how of Challenging Deploy/Operation NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift

Page 2

Abstract

Docomo mail is a 24/7 cloud mail system used by over 20 million people. It stores users' mail archives in a petabyte-scale OpenStack Swift cluster deployed by NTT DATA. We have operated this service successfully since Sep 2014 without any downtime. In this session, we present the actual issues and challenges we faced and overcame.

Page 3

Today’s contents and presenters

○Project Overview

Changes in the Japanese mobile market and an overview of this project

– Project Manager: Sosuke Kakehi

○Migration process

Process of migrating Swift into the existing docomo mail system

– OpenStack Swift Engineer: Masaaki Nakagawa

○Technical challenges

Swift technical challenges in this project

– OpenStack Engineer: Ryosei Kasai

○Operating session

Large-scale Swift operation

– OpenStack Swift Engineer: Masaaki Nakagawa

Page 4

Project Overview

Page 5

Project Overview

1 NTT Docomo's Cloud Mail System

2 Project Background

3 Customer Requirements

Page 6

Cloud Mail System

NTT Docomo's Cloud Mail System - System Summary

• Docomo Mail - NTT Docomo’s Cloud Mail Service

• Over 20 million users

• Powered by OpenStack Swift

[Figure: smartphones and tablet PCs access recent mail held in high-performance storage; archived mail is stored to the OpenStack Swift object storage]

Page 7

NTT Docomo's Cloud Mail System - System Scale

• Geographically distributed Swift cluster

• Over 6.4 PB logical capacity

• Hundreds of servers

[Figure: four sites (Site1 to Site4) hosting the proxy nodes and the storage nodes of Region1, Region2 and Region3]
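Swift's ring spreads the replicas of each object as uniquely as possible across regions, zones, servers and devices. The toy sketch below only illustrates the outcome a three-region layout aims for, assuming three replicas and a hypothetical device list; it is not Swift's ring-builder algorithm and not the real cluster layout.

```python
from collections import defaultdict
from hashlib import md5

# Hypothetical devices: (region, zone, device name). Not the real cluster layout.
DEVICES = [
    (1, 1, "r1z1-sdb"), (1, 2, "r1z2-sdb"),
    (2, 1, "r2z1-sdb"), (2, 2, "r2z2-sdb"),
    (3, 1, "r3z1-sdb"), (3, 2, "r3z2-sdb"),
]


def place_replicas(obj_path, devices=DEVICES, replicas=3):
    """Choose a device per replica, visiting regions round-robin.

    A toy stand-in for Swift's ring: it only shows the "one replica per
    region" result that a three-region, three-replica layout aims for.
    """
    by_region = defaultdict(list)
    for dev in devices:
        by_region[dev[0]].append(dev)
    regions = sorted(by_region)
    # A stable number derived from the object path, similar in spirit to a partition.
    h = int(md5(obj_path.encode()).hexdigest(), 16)
    chosen = []
    for i in range(replicas):
        candidates = by_region[regions[(h + i) % len(regions)]]
        # Move to another device in the same region only after every region is used.
        chosen.append(candidates[(h + i // len(regions)) % len(candidates)])
    return chosen
```

With one replica per region, losing an entire storage site still leaves two copies of every object, which is the disaster-recovery property the geographic distribution is meant to provide.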

Page 8

Project Background

Shift from “Feature phone” to “Smart phone”

[Figure: smartphones and tablet PCs use many services that handle documents, text, photos, music, movies and applications]

E-mail data size has increased.

Page 9

Project Background

[Figure: high-end storage units added one after another, each adding cost]

Extending high-end storage again and again = cost, cost, cost

Page 10

Customer Requirements

• High availability

• Low cost

• High scalability

• Disaster recovery

• etc.

OSS (software storage) + IA servers: adopt OpenStack Swift

Page 11

Migration session

Page 12

Overview of migration session

NTT DOCOMO launched the docomo mail service in Oct 2013, and Swift was introduced into the docomo mail system in Jan 2015. The docomo mail user service did not stop while we migrated the system to Swift.

In this section, I would like to give an overview of the docomo mail system and the migration process.

• Oct 2013: docomo mail service in

• May 2014: test users start to use Swift

• Jan 2015: Swift service in

• Oct 2015: general users start test use of Swift

Page 13

Swift migration session: system construction overview

[Figure: users reach the docomo mail frontend server (proxy of block storage and Swift) over the Internet; user mail is held in high-speed block storage (recent mail holder) and archived user mail in Swift (archived mail holder: proxy and storage nodes)]

Page 14

Swift migration session: mail access flow

[Figure: access devices connect over the Internet to the docomo mail frontend server (proxy of block storage and Swift); user mail is served from block storage and archived user mail from Swift (proxy and storage nodes)]

User mail will be archived/stored to Swift.
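The frontend server acts as a proxy in front of two backends. The slides do not say when mail moves from block storage to Swift, so the sketch below assumes a hypothetical age threshold (`ARCHIVE_AFTER`) purely to illustrate the routing decision; the real archiving policy may be entirely different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold: the real docomo mail archiving policy is not described.
ARCHIVE_AFTER = timedelta(days=90)


@dataclass
class Mail:
    mail_id: str
    received_at: datetime


def backend_for(mail: Mail, now: datetime) -> str:
    """Return which backend the frontend server should read this mail from."""
    if now - mail.received_at >= ARCHIVE_AFTER:
        return "swift"          # archived mail: object storage
    return "block_storage"      # recent mail: high-speed block storage
```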

Page 15

Swift migration session: system construction (before Swift was installed)

[Figure: the docomo mail frontend server, reached over the Internet, stores both user mail and archived user mail in block storage]

Page 16

Swift migration session: migration 1st step, deploy Swift and test

• Deploy Swift • Failure tests • Tuning

[Figure: Swift proxy and storage nodes deployed next to the block storage, which still holds user mail and archived user mail]

Page 17

Swift migration session: migration 2nd step, copy test users’ archived mail

• Copy test users’ archived mail to Swift

• General users’ mail is not copied

[Figure: test users’ archived mail copied from block storage to the Swift storage nodes]
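A copy step like this is usually paired with a checksum check. The sketch below copies only test users' archived mail and verifies each upload, assuming a hypothetical `swift_put` callable that returns the object's ETag (for a non-segmented Swift object, the ETag is the MD5 hex digest of its body). Store layout and names are illustrative, not the project's tooling.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (user, mail_id), a hypothetical key layout


def copy_test_user_archives(
    block_storage: Dict[Key, bytes],
    swift_put: Callable[[Key, bytes], str],
    test_users: Iterable[str],
) -> List[Key]:
    """Copy archived mail of test users to Swift and verify each copy by MD5.

    swift_put is a hypothetical uploader returning the object's ETag.
    The originals in block_storage are left in place.
    """
    allowed = set(test_users)
    copied = []
    for key, body in block_storage.items():
        user, _ = key
        if user not in allowed:
            continue  # general users' mail is not copied in this step
        etag = swift_put(key, body)
        if etag != hashlib.md5(body).hexdigest():
            raise RuntimeError(f"checksum mismatch for {key}")
        copied.append(key)
    return copied
```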

Page 18

Swift migration session: migration 3rd step, copy general users’ archived mail

• Copy general users’ archived mail to Swift

• Keep all mail archives in block storage as a safeguard against Swift trouble

[Figure: archived mail of all users held in both block storage and Swift]
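Keeping the block storage copy during this step means a read can fall back to it if Swift misbehaves. A minimal sketch of that read path, assuming hypothetical accessors `swift_get` and `block_get` (not the frontend server's real interface):

```python
from typing import Callable, Optional


def read_archived_mail(
    key: str,
    swift_get: Callable[[str], Optional[bytes]],
    block_get: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Read archived mail from Swift, falling back to the block storage copy.

    swift_get / block_get are hypothetical accessors. swift_get may raise an
    OSError (e.g. TimeoutError) or return None when the object is not in Swift.
    """
    try:
        body = swift_get(key)
        if body is not None:
            return body
    except OSError:
        pass  # Swift trouble: the block storage copy is still kept in this step
    return block_get(key)
```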

Page 19

Swift migration session: migration 4th step, launch service

[Figure: Swift in service for archived user mail alongside the block storage holding user mail]

Page 20

Conclusion of migration session

• Initially, docomo mail had only block storage

• We needed to deploy and migrate to Swift with no downtime

• To achieve this, we divided the migration into 4 steps

– Deploy

– Copy test users' mail to Swift

– Copy general users' mail to Swift, keeping the block storage copy

– System durability check

• We achieved the migration without any service downtime

As I said, the migration involved some technical challenges. In the next session, Mr. Kasai introduces them.

Page 21

Technical session

Page 22

Our Technical Challenges

1 Durability assurance

2 Geographically distributed cluster

3 Quality

Page 23

Challenge 1: Durability assurance

• Quality requirements in Japan

• This system needs very high quality.

• Everything should be under control, even on a distributed system

• System design for normal situations

• System design for failure situations

• Analyze every behavior before building the system

Page 24

Recovery tests for a variety of failure patterns

• Variety of failure patterns

(1) The point of failure • Disk, NIC, Process, Node, …

(2) The number of failures • 1, 2, 3, 4, …

(3) The range of failures • 1 node, multiple nodes/zones/regions, …

100s of test cases!!

[Figure: example test cases (Case #001, #101, #201, #301, #501), each drawn as a proxy with storage nodes, some spread over Zone1 and Zone2 in Region 1]
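The three failure dimensions multiply, which is how the test list reaches hundreds of cases. A sketch that enumerates the combinations with `itertools.product`; the value lists are illustrative picks from the slide's examples, and the generated case numbers are unrelated to the slide's case IDs.

```python
from itertools import product

# Dimensions from the slide; the value lists here are illustrative, not exhaustive.
POINTS = ["disk", "nic", "process", "node"]
COUNTS = [1, 2, 3, 4]
RANGES = ["1 node", "multiple nodes", "multiple zones", "multiple regions"]


def failure_cases(points=POINTS, counts=COUNTS, ranges=RANGES):
    """Yield one test case per combination of failure point, number and range."""
    combos = product(points, counts, ranges)
    for i, (point, count, scope) in enumerate(combos, start=1):
        yield {"case": f"#{i:03d}", "point": point, "count": count, "range": scope}


cases = list(failure_cases())
print(len(cases), "cases")
```

Four values per dimension already give 64 cases; adding more failure points or ranges quickly pushes the total into the hundreds.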

Page 25

Result of recovery test

• Swift showed extreme durability and recoverability

• Swift rarely loses data. Only a precisely targeted failure (an "accurate snipe") or a great disaster can cause data loss.

Page 26

Challenge 2: Geographically distributed cluster

• Geographically distributed Swift cluster to realize disaster recovery

• Important points to evaluate for global distribution

1. Client requests

2. Durability

[Figure: a proxy at Site 1 and storage at Sites 2, 3 and 4, connected by a private network; the sites are 300 km or more apart]

Page 27

Pseudo-global cluster

• Pseudo-global cluster with simulated network latency

• Proxy and 3 storage regions placed in different locations

• 10~200 msec latency between locations, simulated by tc

• TL msec latency one way, 2*TL msec latency for a round trip

[Figure: the proxy and storage regions 1 to 3 linked with 10~200 msec latency each; between the client-facing proxy and storage region 1, each direction adds TL msec]
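The slide only says the latency was simulated by tc. A common way to do this is the netem queueing discipline; the sketch assumes netem and a hypothetical interface name, and just builds the command strings. Because netem delays outgoing packets, applying TL msec on both ends of a link gives TL msec per direction and a 2*TL round trip, matching the slide's model.

```python
import shlex


def netem_command(interface: str, one_way_ms: int) -> str:
    """tc command that delays every packet leaving `interface` by one_way_ms.

    Run it on both ends of a link so that each direction gets TL msec.
    "replace" adds the netem root qdisc, or replaces an existing root qdisc.
    """
    return (f"tc qdisc replace dev {shlex.quote(interface)} "
            f"root netem delay {int(one_way_ms)}ms")


def round_trip_ms(one_way_ms: int) -> int:
    """With TL msec added in each direction, the round trip grows by 2 * TL."""
    return 2 * one_way_ms


# Latencies within the tested 10~200 msec range; "eth1" is a hypothetical interface.
for tl in (10, 50, 100, 200):
    print(netem_command("eth1", tl), "# round trip +", round_trip_ms(tl), "ms")
```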

Page 28

Two points of pseudo-global cluster testing

1. Client requests

• Object PUT/GET/DELETE from client

• Error rate

• Turnaround time for 1 request

• Throughput

• Latency between proxy and storage

2. Durability

• Automatic recovery by object-replicator

• Error rate

• Turnaround time of 1 sync process

• Throughput

• Latency between storage nodes

[Figure: client PUT/GET requests pass through the proxy to storage regions 1 to 3; replication runs between storage regions 1 to 3]

Page 29

Test1: Client requests

Object PUT/GET/DELETE from client

• No errors caused by latency

• Turnaround time degrades as latency grows

• No throughput degradation with concurrent requests

[Figure: turnaround time vs latency for PUT/GET and DELETE; throughput vs latency and concurrency, bounded by the limitation of network bandwidth]
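A simple model (not a measurement from this project) shows why slower turnaround need not reduce throughput: by Little's law, completed requests per second are roughly the number of requests in flight divided by the time per request, so more concurrency offsets added latency until network bandwidth becomes the limit. The number of round trips per request and all inputs are assumptions.

```python
def turnaround_s(base_s: float, one_way_ms: float, round_trips: int = 2) -> float:
    """Time for one request: server-side work plus network round trips.

    round_trips is an assumption; each round trip costs 2 * TL msec
    on the pseudo-global cluster.
    """
    return base_s + round_trips * 2 * one_way_ms / 1000.0


def throughput_rps(concurrency: int, turnaround: float,
                   bandwidth_bytes_s: float, object_bytes: float) -> float:
    """Little's law (requests in flight / time per request), capped by bandwidth."""
    return min(concurrency / turnaround, bandwidth_bytes_s / object_bytes)
```

For example, with a 0.5 s turnaround, 10 concurrent clients give about 20 requests/s and 20 clients about 40 requests/s, as long as the bandwidth cap is not reached.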

Page 30

Test2: Durability

Automatic recovery by object-replicator

• No errors caused by latency

• The performance of a single process degrades

• No throughput degradation with concurrent processes

[Figure: failure followed by recovery; recovery throughput vs latency and concurrency, bounded by the limitation of network bandwidth]
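Swift's object-replicator compares per-suffix hashes with remote nodes (via REPLICATE requests) and transfers only the suffix directories that differ, so each exchange pays the inter-region round trip; running more workers (the replicator's `concurrency` setting) keeps overall throughput up. The sketch below is a simplified illustration of the comparison only: real Swift hashes the file names in each suffix directory and caches them in hashes.pkl, whereas this version hashes in-memory object bodies.

```python
import hashlib
from typing import Dict, List


def suffix_hashes(objects: Dict[str, bytes]) -> Dict[str, str]:
    """Hash objects grouped by suffix (last 3 hex chars of the object's hash).

    Simplified: hashes object name and body instead of on-disk file names.
    """
    groups: Dict[str, List[str]] = {}
    for name, body in objects.items():
        obj_hash = hashlib.md5(name.encode()).hexdigest()
        entry = obj_hash + hashlib.md5(body).hexdigest()
        groups.setdefault(obj_hash[-3:], []).append(entry)
    return {
        suffix: hashlib.md5("".join(sorted(entries)).encode()).hexdigest()
        for suffix, entries in groups.items()
    }


def suffixes_to_sync(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Suffixes that are missing on the remote node or whose hash differs."""
    return sorted(s for s, h in local.items() if remote.get(s) != h)
```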

Page 31

Challenge 3: Quality

1. Software quality

• Do all processes work well?

• Account / Container / Object

• server / replicator / updater / reaper

2. System quality

• Is our system working well?

• All nodes

• All APIs

Page 32

Software quality

Strict testing of all processes

Source code analysis and customization

• Official patches (below)

• Original patches

Our official patches

1 Add process name checking into swift-init

2 Prevent redundant commenting by drive-audit

3 Remove invalid connection checking in db_replicator

4 Add timestamp checking in AccountBroker.is_status_deleted

5 Fix error log of proxy-server when cache middleware is disabled

and more …
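The slide lists the patch titles only. As an illustration of the idea behind the first one (not the upstream swift-init code): a pid file can outlive its process, and the pid may be recycled by an unrelated program, so before trusting it one can confirm that the pid really belongs to the expected server. This sketch reads Linux's `/proc/<pid>/cmdline`; the function names are hypothetical.

```python
def cmdline_matches(raw_cmdline: bytes, expected: str) -> bool:
    """Check a /proc/<pid>/cmdline blob (NUL-separated) for the expected program.

    Swift servers often run as "python /usr/bin/swift-object-server ...",
    so compare the basename of every argument, not only argv[0].
    """
    args = [a.decode(errors="replace") for a in raw_cmdline.split(b"\0") if a]
    return any(arg.rsplit("/", 1)[-1] == expected for arg in args)


def pid_runs(pid: int, expected: str) -> bool:
    """True only if pid exists and is the expected program (Linux procfs)."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return cmdline_matches(f.read(), expected)
    except OSError:
        return False  # no such process (or no procfs): treat as not running
```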

Page 33

System quality

• Automated testing tools for

1. APIs: all Swift APIs, including error cases

2. Nodes: all Swift nodes

• Extended Tempest and checking tool

[Figure: Tempest tests all APIs through the proxy servers; the checking tool tests all nodes, proxy servers and storage servers]
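The checking tool itself is not described. One way to sweep every node is Swift's healthcheck middleware, which answers `GET /healthcheck` with `OK` when it is in a server's pipeline. The sketch assumes that middleware is enabled on each node and uses hypothetical `host:port` strings; the probe is injectable so other checks can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_healthcheck(node: str, timeout: float = 5.0) -> bool:
    """GET /healthcheck on host:port; the healthcheck middleware replies "OK"."""
    try:
        with urlopen(f"http://{node}/healthcheck", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def check_all_nodes(nodes: Iterable[str],
                    probe: Callable[[str], bool] = http_healthcheck,
                    workers: int = 32) -> Dict[str, bool]:
    """Probe every node in parallel and report node -> healthy."""
    node_list = list(nodes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(probe, node_list))
    return dict(zip(node_list, results))
```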

Page 34

Our solutions

Challenge: Solution

1 Durability assurance: recovery tests in a variety of failure patterns

2 Geographically distributed cluster: performance test of frontend/backend with a pseudo-global Swift cluster

3 Quality: source code analysis and customization; automated testing

Page 35

Operating session

Page 36

Overview of operating session

The operation scheme of docomo mail is highly confidential.

Instead, we would like to introduce the operation of the NTT DATA Swift solution.

The docomo mail system uses a customized version of the NTT DATA Swift solution.

Page 37

Operating session: a large-scale system makes operation costly

Large-scale Swift: scale-out, management, repair, tuning

Page 38

Operating session: reduce the amount of operating work

• Parallel access (pssh / pscp)

• Automatic deployment (kickstart)

• Tuning (svn / puppet) from a master repository
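pssh runs one command over ssh on many hosts at once. The sketch below shows the same fan-out pattern in plain Python; it is not pssh itself, the host names are hypothetical, and it assumes key-based ssh access (`BatchMode=yes` makes ssh fail instead of prompting for a password).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def ssh_run(host: str, command: str, timeout: float = 60.0) -> int:
    """Run one command on one host over ssh; return its exit code (-1 on timeout)."""
    try:
        proc = subprocess.run(["ssh", "-o", "BatchMode=yes", host, command],
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return -1
    return proc.returncode


def run_everywhere(hosts: Sequence[str], command: str,
                   runner: Callable[[str, str], int] = ssh_run,
                   parallel: int = 50) -> Dict[str, int]:
    """Fan the same command out to many hosts at once, in the spirit of pssh."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        codes = list(pool.map(lambda host: runner(host, command), hosts))
    return dict(zip(hosts, codes))
```

For example, `run_everywhere(storage_hosts, "swift-init object-replicator status")` would collect one exit code per node instead of logging in to each server in turn.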

Page 39

Operating session: reduce operation frequency

Failure types: disk failure, node down, server process down, backend process down (e.g. auditor process)

[Figure: these failure types compared by whether they affect the service]

Page 40

Operating session: stop monitoring low-priority items

[Figure: low-priority items moved from monitoring alerts to periodic performance checks]
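The idea of these two slides can be sketched as a priority table: events that affect the service raise an alert immediately, and the rest wait for the periodic check. Which failure types count as service-affecting is a hypothetical choice made for illustration only, not docomo's or NTT DATA's actual monitoring policy.

```python
from enum import Enum


class Handling(Enum):
    ALERT = "alert immediately"
    PERIODIC = "review in periodic check"


# Hypothetical priority table, for illustration only.
SERVICE_AFFECTING = {"node_down", "server_process_down"}


def handling_for(event: str) -> Handling:
    """Route service-affecting events to alerts and the rest to periodic checks."""
    return Handling.ALERT if event in SERVICE_AFFECTING else Handling.PERIODIC
```

With replicated data, a single disk failure or a stopped auditor does not interrupt requests, so deferring such events to a periodic check cuts operator call-outs without hiding anything urgent.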

Page 41

Conclusion of operating session

• Swift consists of many nodes

• Swift operating costs therefore tend to be high

• NTT DATA has know-how to reduce Swift operating costs

– Using parallelized operation tools

– Customizing monitoring priorities

– Changing monitoring items to periodic checks

Page 42

Conclusion of this presentation

We introduced the usage, challenges, and operation of OpenStack Swift in the docomo mail service system

• System migration with no service downtime

• Three technical achievements

• Reduced operating costs

Docomo mail has been in service with no downtime.

If you have any questions, please come to the NTT booth.

○Attention: All company names, product names, and service names mentioned are trademarks or registered trademarks of their respective companies.
