OpenStack Summit Tokyo - Know-how of Challenging Deploy/Operation of NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift

TRANSCRIPT

Copyright 2015 NTT DATA Corporation
2015/10/27 NTT DATA Corporation

Know-how of Challenging Deploy/Operation of NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift

Abstract

Docomo Mail is a 24/7 cloud mail system accessed by over 20 million people. The system stores users' mail archives in OpenStack Swift at petabyte scale, deployed by NTT DATA. We have been operating this service since September 2014 without any downtime. In this session, we present the actual issues and challenges we have faced and overcome.

Today's contents and presenters

- Project overview: changes in the Japanese mobile market and an outline of this project (Project Manager: Sosuke Kakehi)
- Migration session: the process of migrating Swift into the existing Docomo Mail system (OpenStack Swift Engineer: Masaaki Nakagawa)
- Technical challenges: the Swift technical challenges in this project (OpenStack Engineer: Ryosei Kasai)
- Operating session: operating Swift at large scale (OpenStack Swift Engineer: Masaaki Nakagawa)

Project Overview

1. NTT Docomo's Cloud Mail System
2. Project Background
3. Customer Requirements

NTT Docomo's Cloud Mail System - System Summary

- Docomo Mail is NTT Docomo's cloud mail service, with over 20 million users, powered by OpenStack Swift.
- Recent mail is kept on high-performance storage; archived mail is stored in OpenStack Swift object storage.
- Users access their mail from smartphones, tablets, and PCs.

NTT Docomo's Cloud Mail System - System Scale

- Geographically distributed Swift cluster
- Over 6.4 petabytes of logical capacity
- Hundreds of servers
- Proxy nodes are located at Site 1; storage nodes form Region 1, Region 2, and Region 3 at Sites 2, 3, and 4 (a ring sketch follows below).
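The slides do not show how such a multi-region layout is expressed in Swift; as a rough illustration, the sketch below builds a small object ring with one device per region using the swift-ring-builder CLI. The IP addresses, ports, device names, and weights are placeholder values, not the production configuration.

    # Minimal sketch: build a 3-replica object ring spread across three regions.
    # IPs, ports, device names, and weights are placeholders, not production values.
    import subprocess

    def run(cmd):
        print("$ " + " ".join(cmd))
        subprocess.run(cmd, check=True)

    # part_power=18, replicas=3, min_part_hours=1
    run(["swift-ring-builder", "object.builder", "create", "18", "3", "1"])

    # One device per region here; a real cluster adds many devices per zone.
    for dev in [
        "r1z1-192.0.2.11:6000/sdb1",   # Region 1 (e.g. Site 2)
        "r2z1-192.0.2.21:6000/sdb1",   # Region 2 (e.g. Site 3)
        "r3z1-192.0.2.31:6000/sdb1",   # Region 3 (e.g. Site 4)
    ]:
        run(["swift-ring-builder", "object.builder", "add", dev, "100"])

    run(["swift-ring-builder", "object.builder", "rebalance"])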
Project Background

- Shift from feature phones to smartphones.
- On smartphones and tablets, services exchange documents, text, photos, music, movies, and application data by e-mail, so e-mail data size has increased.

Project Background

- Extending high-end storage again and again means paying high costs again and again.

Customer Requirements

- High availability
- High scalability
- Low cost
- Disaster recovery
- OSS (software storage) on IA servers
- etc.
=> Adopt OpenStack Swift

Migration session

Overview of the migration session

NTT DOCOMO launched the Docomo Mail service in October 2013, and Swift was installed into the Docomo Mail system in January 2015. While we migrated Swift into the Docomo Mail system, the user service was never stopped. In this section, I introduce the overall Docomo Mail system and the migration process.

Timeline: Oct 2013, Docomo Mail service in; May 2014, test users start to use Swift; Jan 2015, Swift service in; Oct 2015, general users start to use Swift.

Swift migration session - System construction overview

- The Docomo Mail frontend server acts as a proxy for both the block storage and Swift, and is reached from the Internet.
- High-speed block storage holds recent mail; Swift (a proxy node and storage nodes) holds archived user mail.

Swift migration session - Mail access flow

- Devices access the Docomo Mail frontend server over the Internet.
- Recent user mail stays on block storage; user mail is archived and stored into Swift.

Swift migration session - System construction before Swift was installed

- The frontend server originally used only block storage, which held both recent and archived user mail.

Swift migration session - Migration step 1: deploy Swift and test

- Deploy Swift alongside the existing block storage, run failure tests, and tune it.

Swift migration session - Migration step 2: copy test users' archived mail

- Copy test users' archived mail into Swift; general users' mail is not copied yet.

Swift migration session - Migration step 3: copy general users' archived mail

- Copy general users' archived mail into Swift, while keeping the full mail archive on block storage as a safeguard against Swift trouble.

Swift migration session - Migration step 4: launch the service

- Archived mail is now served from Swift; recent mail remains on block storage.
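The slides do not show how the archive copy was implemented; as an illustration of what each copied message amounts to at the Swift API level, here is a minimal sketch using python-swiftclient. The auth endpoint, credentials, container name, and object naming scheme are invented for the example and are not the project's actual tooling.

    # Minimal sketch: store one archived mail message as a Swift object.
    # Auth URL, credentials, container, and object naming are placeholders.
    from swiftclient.client import Connection

    conn = Connection(
        authurl="http://swift-proxy.example.com:8080/auth/v1.0",  # placeholder
        user="mailarchive:writer",                                # placeholder
        key="secret",                                             # placeholder
    )

    container = "archive-user-000123"            # e.g. one container per user
    object_name = "2015/10/27/message-0001.eml"  # e.g. date-based object path

    conn.put_container(container)                # idempotent; creates if missing
    with open("message-0001.eml", "rb") as f:
        conn.put_object(container, object_name, contents=f,
                        content_type="message/rfc822")

    # Reading an archived message back later:
    headers, body = conn.get_object(container, object_name)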
Conclusion of the migration session

- At first, Docomo Mail had only block storage.
- We needed to deploy Swift and migrate to it with no downtime.
- To achieve this, we divided the migration into four steps:
  - Deploy Swift
  - Copy test users' mail to Swift
  - Copy general users' mail to Swift while keeping the block storage copy
  - Check system durability
- We achieved the migration with no service downtime.

As mentioned, the migration involved several technical challenges. In the next session, Mr. Kasai introduces them.

Technical session

Our Technical Challenges

1. Durability assurance
2. Geographically distributed cluster
3. Quality

Challenge 1: Durability assurance

- Quality requirements in Japan: this system needs very high quality.
- Everything should be under control: the system must be designed both for normal operation and for failure situations, even on a distributed system.
- Analyze every behavior before building the system.

Recovery tests over a variety of failure patterns

- Failure patterns vary along three axes (combined into hundreds of test cases, as sketched below):
  (1) the point of failure: disk, NIC, process, node, ...
  (2) the number of failures: 1, 2, 3, 4, ...
  (3) the range of failures: one node, multiple nodes/zones/regions, ...
- The test cases cover proxy and storage nodes across zones and regions (the slide shows diagrams for cases such as #001, #101, #201, #301, and #501).
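As a rough illustration of how those three axes multiply into hundreds of cases, the sketch below enumerates their combinations; the concrete values are illustrative, and the real test matrix used in the project is not published in the slides.

    # Minimal sketch: enumerate failure-injection cases from the three axes.
    # The values are illustrative, not the project's actual test matrix.
    from itertools import product

    failure_points = ["disk", "nic", "process", "node"]
    failure_counts = [1, 2, 3, 4]
    failure_ranges = ["single node", "multiple nodes", "zone", "region"]

    cases = [
        {"id": i + 1, "point": p, "count": n, "range": r}
        for i, (p, n, r) in enumerate(
            product(failure_points, failure_counts, failure_ranges))
    ]

    # 4 * 4 * 4 = 64 base cases; adding recovery procedures and timing
    # variations per case quickly pushes the total into the hundreds.
    print(len(cases), "base cases")
    print(cases[0])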
Results of the recovery tests

- Swift showed extreme durability and recoverability.
- Swift rarely loses the data it holds; only a precisely targeted failure or a great disaster can cause data loss.

Challenge 2: Geographically distributed cluster

- A geographically distributed Swift cluster realizes disaster recovery: the proxy at Site 1 and storage at Sites 2, 3, and 4 are connected over a private network, with 300 km or more between sites.
- Important points when evaluating global distribution:
  1. Client requests
  2. Durability

Pseudo-global cluster

- A pseudo-global cluster with simulated network latency: a proxy and three storage regions placed in different locations.
- 10-200 msec of latency between locations, simulated with tc (see the sketch below).
- TL msec of latency one way means 2*TL msec of latency for a round trip between client, proxy, and storage region.
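The slides only name tc; as a rough illustration of the technique, one-way delay can be injected on an interface with the netem qdisc, for example via a small wrapper like the one below. The interface name and delay value are placeholders, and the commands require root privileges.

    # Minimal sketch: inject artificial one-way latency with tc/netem.
    # Interface name and delay value are placeholders; requires root.
    import subprocess

    def add_delay(interface: str, delay_ms: int) -> None:
        # Adds delay_ms of egress delay; a TL ms one-way delay shows up as
        # roughly 2*TL ms on a request/response round trip.
        subprocess.run(
            ["tc", "qdisc", "add", "dev", interface, "root",
             "netem", "delay", f"{delay_ms}ms"],
            check=True,
        )

    def clear_delay(interface: str) -> None:
        subprocess.run(
            ["tc", "qdisc", "del", "dev", interface, "root"],
            check=True,
        )

    if __name__ == "__main__":
        add_delay("eth0", 100)   # e.g. ~100 ms one way to a remote region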
Two points of pseudo-global cluster testing

1. Client requests: object PUT/GET/DELETE from the client; measure the error rate, the turnaround time of one request, and throughput against the latency between proxy and storage.
2. Durability: automatic recovery by the object-replicator; measure the error rate, the turnaround time of one sync process, and throughput against the latency between storage regions.

Test 1: Client requests

- Object PUT/GET/DELETE from the client.
- No errors were caused by latency.
- Turnaround time degrades as latency grows (PUT/GET more than DELETE).
- No throughput degradation for concurrent requests, up to the limit of network bandwidth.

Test 2: Durability

- Automatic recovery by the object-replicator after a failure.
- No errors were caused by latency.
- The performance of a single recovery process degrades as latency grows.
- No throughput degradation for concurrent processes, up to the limit of network bandwidth.

Challenge 3: Quality

1. Software quality: do all processes work well? Account / Container / Object; server / replicator / updater / reaper.
2. System quality: is our system working well? All nodes, all APIs.

Software quality

- Source code analysis and customization, plus strict tests of all processes.
- Official patches we contributed include:
  1. Add process name checking to swift-init
  2. Prevent redundant commenting by drive-audit
  3. Remove invalid connection checking in db_replicator
  4. Add timestamp checking in AccountBroker.is_status_deleted
  5. Fix the proxy-server error log when the cache middleware is disabled
  ... and more, plus original in-house patches.

System quality

- Automated testing tools for:
  1. APIs: all Swift APIs, including error cases
  2. Nodes: all Swift nodes
- An extended Tempest tests all APIs against the proxy servers, and a checking tool tests all proxy and storage nodes.

Our solutions

- Durability assurance -> recovery tests over a variety of failure patterns
- Geographically distributed cluster -> performance tests of the frontend and backend on a pseudo-global Swift cluster
- Quality -> source code analysis and customization, plus automated testing

Operating session

Overview of the operating session

The operation scheme of Docomo Mail itself is highly confidential, so we introduce the operation of the NTT DATA Swift solution instead; the Docomo Mail system uses the NTT DATA Swift solution with customizations.

Operating session - A large-scale system makes operation costly

- A large-scale Swift cluster is costly to scale out, manage, repair, and tune.

Operating session - Reduce the amount of operating work

- Parallel access to nodes (pssh / pscp)
- Automatic deployment (Kickstart)
- Tuning from a master repository (svn / Puppet)

Operating session - Reduce operation frequency

- Failure events such as disk failure, node down, server process down, and backend process down (e.g. the auditor process) differ in how much they affect the service, and are handled accordingly.

Operating session - Stop monitoring items of low priority

- Low-priority items are removed from alert-based monitoring and covered by periodic performance checks instead.

Conclusion of the operating session

- Swift consists of many nodes, so Swift operating costs tend to be high.
- NTT DATA has know-how to reduce Swift operation costs:
  - Use parallelized operation tools
  - Customize monitoring priorities
  - Move some monitoring items to periodic checks

Conclusion of this presentation

We introduced the usage, challenges, and operation of OpenStack Swift in the Docomo Mail service system:
- System migration with no service downtime
- Three technical achievements
- Reduced operating costs

Docomo Mail has been in service with no downtime. If you have any questions, please come to the NTT booth.

Attention: all company names, product names, and service names mentioned are trademarks or registered trademarks of the respective companies.

Copyright 2015 NTT DATA Corporation
