NTT DOCOMO case study: OpenStack Summit 2016 Barcelona talk "Expanding and Deepening NTT...

Copyright©2016 NTT DOCOMO, INC. All rights reserved. Expanding and Deepening NTT DOCOMO’s private cloud NTT DOCOMO Inc. Jun Ishii Kojiro Amano VirtualTech Japan Hiromichi Ito

Upload: virtualtech-japan-inc

Post on 23-Jan-2018

Page 1: NTT DOCOMO case study: OpenStack Summit 2016 Barcelona talk "Expanding and Deepening NTT DOCOMO's Private Cloud" - OpenStack Latest Information Seminar (December 2016)

Copyright©2016 NTT DOCOMO, INC. All rights reserved.

Expanding and Deepening

NTT DOCOMO’s private cloud

NTT DOCOMO Inc. Jun Ishii

Kojiro Amano

VirtualTech Japan Hiromichi Ito

Page 2:

About us

Jun Ishii

o Research Engineer, NTT DOCOMO

o Developer, operator, and technical consultant for the NTT DOCOMO private cloud

Hiromichi Ito

o CTO, VirtualTech Japan

o One of the first members to propose OpenStack Bare Metal Provisioning (now called "Ironic")

Kojiro Amano

o Research Engineer, NTT DOCOMO

o Security consultant for the NTT DOCOMO private cloud

Page 3:

Expansion strategy of our private cloud

Scale-up strategy

Page 4:

o One year after we launched our private cloud, it keeps growing larger and larger!

                 Jun. 2015    Oct. 2016    Mar. 2017
Number of DCs    1            2            4
Number of HWs    50           300          900
Cores            1500         10000        Over 35000

Page 5:

o Why could we expand our cloud so fast ?

o Main Strategy : Forest and Tree

Make a forest

Fill in trees

Page 6:

o Three important details

Decided to migrate a large-scale in-house system

– Obtained the budget for making a forest

Fast deployment methods by normalization

– Enabled us to create the forest quickly

Novel challenges for various users

– Enrich the forest so that various trees can be planted

• L2GW

• GPU instance

• Reference model

• Security update

Page 8:

o Decided to migrate a large-scale in-house system

The whole system had to be updated because its hardware reached end of life (EOL).

o An OpenStack-based cloud has many strengths.

Three-year TCO is better than on-premises

– 22% lower TCO (CAPEX and OPEX)

Distributed architecture is compatible with the cloud.

REST interfaces are suitable for maintaining systems.

Migration/replication over long distances is feasible

– L2GW (details are given later)

Page 10:

o Fast deployment methods by normalization

DRY: Don't repeat yourself

o How to

deploy

set up more [compute nodes, Swift storage nodes]

deal with hardware trouble

– Ansible, KB (see our Tokyo summit presentation)

o It takes only one month to deploy a new DC

From the end of racking and cabling until the first QA test is finished

For over 300 nodes, HW configuration settings and the OpenStack install were finished in just 10 days by 5 operators.

Page 12:

o Novel challenges: satisfying the wishes of various users involves many difficulties and requires a lot of know-how.

Add functions

– L2GW

– GPU instance

Reduce the time to construct and manage users' systems

– Reference model

– Security update

Page 13:

o Three important details

Decided to migrate a large scale in-house system

Obtain budget for making a forest

Fast deployment methods by normalization

Enable to create the forest quickly

Novel challenges for various users

Enrich the forest to plant various trees

• L2GW

• GPU instance

• Reference model

• Security update

These are the reasons our private cloud could grow so FaT!!

Page 14:

How we are deepening our private cloud

Enrich for users

Page 15:

L2 Gateway for connecting existing large-scale networks and inter-cloud networking.

I'm good at connecting.

Page 16:

○ Overview

Our user has a large-scale existing network and proprietary computer systems.

– The network system provides Layer 2 connectivity nationwide.

– The proprietary computer systems do not have enough flexibility, for example:

REST API

Service mobility

They decided to migrate to OpenStack at the time of this renewal.

Changes on the network system side had to be minimal.

Our user requested two new network services.

– Connect the tenant networks between the two data centers

– Connect instances and existing equipment at Layer 2

Page 17:

○ Before

[Diagram: DC 1 and DC 2 connected over a nationwide WAN (RTT 20-40 ms). Labeled components: dedicated equipment and proprietary computer systems.]

Page 18:

○ After

[Diagram: DC 1 and DC 2 connected over a nationwide WAN (RTT 20-40 ms). Labeled components: dedicated equipment, L2 GW, and RT.]

Page 19:

○ Our user's requirements

High availability

– Do not share control services between data centers.

– Avoid single points of failure (SPOF).

Technical limitations

– Do not change the IP addressing and routing architecture.

– Do not use Network Address Translation (NAT).

– Instances and existing equipment must be connected at Layer 2.

Performance

– The total throughput requirement is several dozen Gbps.

– The average packet size is smaller than in general private clouds.

– Several hundred VLANs must be connected.

Page 20:

○ Our user's requirements: do not share control services between data centers

We chose the "Region" zoning model.

In the "Region" model, all services are properly separated.

Page 21:

○ Our user's requirements: avoid SPOF

Our base OpenStack deployment model already avoids SPOFs.

Page 22:

○ Our user's requirements: do not change the IP addressing and routing architecture

The existing system's IP addressing and routing architecture can be deployed on the overlay network.

Page 23:

○ Our user's requirements: do not use NAT

NAT is required for floating IPs and for connecting to the external network.

However, this system does not need the floating IP address function.

Page 24:

○ Our user's requirements: connect instances and existing equipment at Layer 2

Our base OpenStack deployment model uses an L3 ECMP fabric and VXLAN.

So, we chose a VXLAN Layer 2 Gateway (L2 Gateway).

Page 25:

○ Our user's requirements: performance

A software-based VXLAN L2 Gateway is not suited to a short-packet workload.

So, we chose a hardware-based VXLAN L2 Gateway.
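
Why short packets are hard on a software gateway can be seen from the packet rate alone. The sketch below is illustrative arithmetic, not a measurement from the talk: the 40 Gbps throughput, the 1500-byte and 128-byte packet sizes, and the function name are assumptions. The 50-byte overhead is the usual VXLAN encapsulation size over IPv4 without an outer VLAN tag; preamble and inter-frame gap are ignored.

```python
# Outer Ethernet (14) + IPv4 (20) + UDP (8) + VXLAN (8) headers.
VXLAN_OVERHEAD_BYTES = 50


def packets_per_second(throughput_gbps: float, inner_packet_bytes: int) -> float:
    """Packet rate needed to fill `throughput_gbps` with encapsulated packets."""
    bits_per_packet = (inner_packet_bytes + VXLAN_OVERHEAD_BYTES) * 8
    return throughput_gbps * 1e9 / bits_per_packet


if __name__ == "__main__":
    # 40 Gbps and both packet sizes are assumed example values.
    for size in (1500, 128):
        rate = packets_per_second(40, size)
        print(f"{size:>5}-byte packets: {rate / 1e6:.1f} Mpps")
```

At the same throughput, the short-packet case needs several times more lookups and encapsulations per second, which is the kind of load switch hardware handles at line rate.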

Page 26:

○ Equipment selection

Hardware VTEP

– Modern L3 switch chipsets have hardware VTEP gateway functions.

Intel FM6000, Broadcom Trident II, Trident II+, Tomahawk

– We examined L3 network switches based on both the Intel FM6000 and the Broadcom Trident II.

– Finally, we compared L3 switches from three vendors (A, D, J).

Comparison results

– Vendor A's L3 switch supports VXLAN within a Multi-Chassis LAG (MLAG) deployment. The other vendors' switches do not (as of June 2016).

– All vendors' L3 switches met the performance criteria.

– All vendors' OVSDB protocol support has some issues.

We chose vendor A's L3 switch because it supports MLAG.

Page 27:

○ Software test and proof of concept

Test target

– Neutron networking-l2gw

APIs and implementations to support L2 Gateways in Neutron.

– networking-l2gw provides the "L2GW Service Plugin" and the "L2GW Agent".

The "L2GW Service Plugin" provides L2GW API services.

The "L2GW Agent" controls the L3 switch via the OVSDB protocol.

Test results

– Several minor bugs (already fixed by the community)

– Features required for a production environment were missing.

SSL support (already implemented by the community)

Handling of the Mcast_Macs_Remote table (we created a modified patch for vendor A based on a community patch; not merged yet)

Page 28:

[Diagram: L2GW software architecture. Labeled components: Controller Node (Neutron Server with API, ML2, L3, and the L2GW Service Plugin; Nova; Keystone; Glance; Cinder; Horizon), Compute Node (ML2 OVS Agent, ML2 L2POP, Open vSwitch VTEP, virtual switch, OVSDB Server), Network Node (L2GW Agent, ML2 OVS Agent, L3 Agent, Open vSwitch VTEP, virtual router, OVSDB Server), vendor A's management virtual appliance (OVSDB Server), and vendor A's hardware VTEP (VLANs, WAN). Connections are labeled "OVSDB protocol" and "Control".]

Page 29:

○ Results of pilot tests at the same scale as the production environment

Hardware L2GW side

– OVSDB server crash issues

When a large number of records were inserted at one time, the OVSDB server crashed. (This issue has already been fixed by the vendor.)

networking-l2gw side

– We encountered several critical bugs.

But they are hard to reproduce.

When we hit these bugs, the L2GW agent stopped.

– The L2GW agent recovers poorly from a crashed state.

The L2GW agent continuously syncs state between the Neutron database and OVSDB.

Unfortunately, when the L2GW agent crashed or stalled, these two databases sometimes lost sync.

So, we had to re-register L2GW connections manually when we hit these bugs.
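
The "lost sync" problem amounts to two record sets that no longer match. As a minimal sketch, assuming each L2GW connection can be reduced to a (gateway, network, VLAN) record, the connections to re-register are a set difference. The `Connection` type, its fields, and the sample values are hypothetical; this is not how networking-l2gw stores its state.

```python
from typing import NamedTuple, Set, Tuple


class Connection(NamedTuple):
    gateway: str     # hypothetical L2 gateway name
    network_id: str  # hypothetical Neutron network ID
    vlan: int        # VLAN bound on the hardware switch port


def diff_connections(
    neutron_db: Set[Connection], ovsdb: Set[Connection]
) -> Tuple[Set[Connection], Set[Connection]]:
    """Return (missing_in_ovsdb, stale_in_ovsdb)."""
    missing = neutron_db - ovsdb  # recorded in Neutron but absent on the switch
    stale = ovsdb - neutron_db    # present on the switch but unknown to Neutron
    return missing, stale


if __name__ == "__main__":
    neutron = {Connection("gw1", "net-a", 100), Connection("gw1", "net-b", 200)}
    switch = {Connection("gw1", "net-a", 100), Connection("gw1", "net-c", 300)}
    missing, stale = diff_connections(neutron, switch)
    print("re-register:", sorted(missing))
    print("clean up:", sorted(stale))
```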

Page 30:

o Network trouble occurred every single week.

The L2GW agent is unstable.

The L2GW agent stops working correctly after a few days when users run a long test.

• The test includes continuous creation and deletion of instances.

• The test includes continuous CRUD testing of Neutron virtual network ports.

Instances could not communicate with instances in the other region or with existing equipment.

Page 31:

o We decided not to use networking-l2gw in the production environment for the time being.

We could not reproduce the connection troubles between OVSDB and the L2GW agent that occurred weekly.

We could not fix all the critical bugs that we encountered.

In our environment, a port status issue occurred.

– That issue causes the L2GW agent problem.

We would rather not use l2population.

– l2population does not have enough scalability yet.

We must keep to the delivery date. A delayed delivery would mean, in other words, the death of our project.

Page 32:

o Our solution

We created a manual procedure to manage the L2GW.

Manage the OVSDB via the CLI

• "vtep-ctl" command

Manage the OVS flow tables via the CLI

• "ovs-ofctl" command

We created an automation system based on the manual procedure.

The upper management system calls this system.

This system is working correctly now.
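
The talk does not show the actual commands, so the following is only a sketch of what automating a CLI procedure of this kind could look like. It builds (and does not run) `vtep-ctl` invocations that bind one VLAN on a physical port to a VXLAN logical switch; the subcommands are taken from the Open vSwitch `vtep-ctl` tool, while the function name and every switch, port, VLAN, VNI, and IP value are hypothetical. The `ovs-ofctl` side is left out.

```python
import shlex
from typing import List


def vtep_bind_commands(
    physical_switch: str,
    port: str,
    vlan: int,
    logical_switch: str,
    vni: int,
    remote_vtep_ip: str,
) -> List[List[str]]:
    """Build vtep-ctl command lines for one VLAN-to-VXLAN binding."""
    return [
        # Create the logical switch and give it a VXLAN network identifier.
        ["vtep-ctl", "add-ls", logical_switch],
        ["vtep-ctl", "set", "Logical_Switch", logical_switch, f"tunnel_key={vni}"],
        # Bind the VLAN on the physical port to the logical switch.
        ["vtep-ctl", "bind-ls", physical_switch, port, str(vlan), logical_switch],
        # Send unknown-destination traffic to the remote VTEP.
        ["vtep-ctl", "add-mcast-remote", logical_switch, "unknown-dst", remote_vtep_ip],
    ]


if __name__ == "__main__":
    # All values below are made-up examples.
    for argv in vtep_bind_commands("sw1", "eth1", 100, "ls100", 5000, "192.0.2.10"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping command construction separate from execution lets an upper-level system review or log the commands before anything touches the switch.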

Page 33:

o Results

We provide a stable L2GW that connects instances in the two regions and existing equipment.

We passed all the test criteria provided by the client.

The user gains excellent service flexibility from OpenStack.

The existing network configuration was kept, as our user requested.

o Next challenges of our L2GW project

Fix the known issues of networking-l2gw.

– We would like to provide an OpenStack API for managing the L2GW.

Fix the scalability issue of l2population.

Investigate EVPN for expanding L2GW services.

Page 34:

GPU instance

Machine learning on OpenStack

Page 35:

Nvidia Tesla M40 for CUDA/cuDNN

Page 36:

o Deploy

To deploy GPU nodes, not only PCI pass-through but also IOMMU must be enabled.

Enable PCI pass-through

– Whitelist, alias, flavor

– See the OpenStack wiki*

Enable IOMMU

– Add grub settings to grub.conf

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT intel_iommu=on"

– Check the kernel log

dmesg | grep -e DMAR -e IOMMU

* https://wiki.openstack.org/wiki/Pci_passthrough
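
The "whitelist, alias, flavor" trio can be sketched as three small settings that must agree on the same device and alias name. This is a sketch under assumptions: the option names follow the wiki page cited above and may differ between Nova releases, the alias name `gpu` is hypothetical, and the product ID is a placeholder to be replaced with the value reported by `lspci -nn` on the GPU node. `10de` is NVIDIA's PCI vendor ID.

```python
import json

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID registered to NVIDIA
GPU_PRODUCT_ID = "0000"    # placeholder: take the real ID from `lspci -nn`
ALIAS_NAME = "gpu"         # hypothetical alias name


def passthrough_settings(vendor_id: str, product_id: str, alias: str, count: int = 1) -> dict:
    """Return the compute-side, controller-side, and flavor-side settings."""
    device = {"vendor_id": vendor_id, "product_id": product_id}
    return {
        # nova.conf on the compute node: which devices may be passed through.
        "pci_passthrough_whitelist": json.dumps([device]),
        # nova.conf on the controller: a name that flavors can refer to.
        "pci_alias": json.dumps({**device, "name": alias}),
        # Flavor extra spec: request `count` devices of that alias.
        "flavor_extra_spec": f"pci_passthrough:alias={alias}:{count}",
    }


if __name__ == "__main__":
    settings = passthrough_settings(NVIDIA_VENDOR_ID, GPU_PRODUCT_ID, ALIAS_NAME)
    for key, value in settings.items():
        print(f"{key} = {value}")
```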

Page 37:

o Operate: take care of the flavor memory size

Our verification showed that, with IOMMU enabled, all of an instance's memory is allocated when the instance is launched.

If you set the flavor memory size large and launch the maximum number of instances, the OOM killer might kill the qemu process.

Swapping doesn't work well because IOMMU allocates memory too fast.

[Diagram: memory space of a normal compute node and of an IOMMU-enabled compute node, each holding the host OS, Instance A, and Instance B; one label reads "Allocated after ballooning".]

Page 38:

o A workaround for this problem is to leave enough margin for the host OS.

Reduce the flavor memory size

– Sometimes too uncomfortable for GPU users

Set reserved_host_memory_mb in nova.conf to a large value

– Also affects other flavors

Decrease the maximum number of instances per host

→ Any other solutions? Help us!
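
The trade-off between these workarounds is simple arithmetic once every instance's memory is assumed to be fully allocated at launch. The sketch below uses made-up numbers (a 256 GiB host, a 60 GiB GPU flavor, two reserve sizes) and a hypothetical helper name; it is not Nova's scheduler logic.

```python
def max_safe_instances(
    host_memory_mb: int, reserved_host_memory_mb: int, flavor_memory_mb: int
) -> int:
    """Instances that fit when each instance's memory is allocated up front."""
    usable = host_memory_mb - reserved_host_memory_mb
    if usable <= 0 or flavor_memory_mb <= 0:
        return 0
    return usable // flavor_memory_mb


if __name__ == "__main__":
    host_mb = 256 * 1024    # assumed host memory
    flavor_mb = 60 * 1024   # assumed GPU flavor memory
    for reserved_mb in (512, 32 * 1024):  # small vs. generous host margin
        n = max_safe_instances(host_mb, reserved_mb, flavor_mb)
        left_mb = host_mb - n * flavor_mb
        print(f"reserved={reserved_mb} MB -> {n} instances, {left_mb} MB left for the host")
```

With the small reserve, four instances fit but leave the host only 16 GiB in total; the larger reserve drops the count to three, which is the "decrease the maximum number of instances" effect in numbers.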

Page 39:

o How should we offer GPU flavors to in-house users?

o GPU with OpenStack, pros/cons

Aspect: Virtualization

– Pros: More stable than containers

– Cons: Some GPU card trouble requires a host reboot

Aspect: Immutability

– Pros: Fast deployment, fast PDCA cycle

– Cons: Difficult to share GPU resources fairly

Aspect: Preparation before running machine learning

– Pros: Can provide an image file with the device driver and CUDA pre-installed

– Cons: Need to keep up with new versions and new combinations of driver, guest OS, CUDA version, library version…

Cooperation with GPU instance users is important for private cloud providers.

Page 40:

Reference Model

Page 41:

Aim: migrate some in-house apps to our cloud

Problem: strict security policies, with over 100 guidelines for system architecture, apply when users rebuild their apps on our cloud

A lot of effort is required to meet the policies.

It would be nice if predefined models were provided :)

[Table: security guideline topics under the columns Server, Network, Storage, and Operation. Listed topics: Monitoring, Security vulnerability, Certificate, Remote Access, Identification and authentication, IDS/IPS, Log, Encryption, Firewall, Redundancy, Backup, Configuration management, …]

Page 42:

"Reference Model" on our cloud

System architecture based on many of the security policies

Sets of OSS stacks that have been heavily tested in our project

Mechanism

Heat templates

• Web basic model

• Web three-tier model

Input a template file into Heat

[Diagram: Heat takes a template and images (Jump, WEB/LB, DB server, …) and builds the Web basic model or the Web three-tier model on our cloud. Labeled components: virtual router, virtual network, jump server, proxy server, LB/Web server, AP server, DB server.]
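
The actual Reference Model templates are not shown in the talk. As a rough sketch of the mechanism, the following builds a very small Heat (HOT) template as a Python dictionary, with one network, one subnet, and one server. The resource names, parameters, and the choice of the 192.168.10.0/24 range are assumptions for illustration; a real model would add the router, jump server, proxy server, and security settings. The template is printed as JSON only to keep the sketch stdlib-only.

```python
import json


def web_basic_template(public_cidr: str = "192.168.10.0/24") -> dict:
    """Return a minimal, illustrative HOT template as a dictionary."""
    return {
        "heat_template_version": "2015-10-15",
        "description": "Illustrative fragment of a web basic model",
        "parameters": {
            "image": {"type": "string"},   # e.g. a pre-hardened Web/LB image
            "flavor": {"type": "string"},
        },
        "resources": {
            "public_net": {"type": "OS::Neutron::Net"},
            "public_subnet": {
                "type": "OS::Neutron::Subnet",
                "properties": {
                    "network": {"get_resource": "public_net"},
                    "cidr": public_cidr,
                },
            },
            "web_server": {
                "type": "OS::Nova::Server",
                # Create the server only after the subnet exists.
                "depends_on": "public_subnet",
                "properties": {
                    "image": {"get_param": "image"},
                    "flavor": {"get_param": "flavor"},
                    "networks": [{"network": {"get_resource": "public_net"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(web_basic_template(), indent=2))
```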

Page 43:

System architecture of Reference Model

[Diagram: end users connect from the Internet over HTTPS; operators connect over VPN. Labeled components: WEB/LB servers (SSL termination, LB VIP), AP servers, DB servers (DB VIP), Storage, Backup, SSL-VPN, Proxy/NTP, Monitoring. Networks: Public-NW 192.168.10.0/24, Private-NW 192.168.20.0/24, Management-NW 192.168.30.0/24, LB-HA-NW, DB-HA-NW, DB-repl-NW.]

Page 44:

WEB (Apache) / LB (LVS w/ Ultramonkey) server

– HTTPS, with dummy certificates installed by default

– WAF for IDS/IPS

The key point is not only to install the software but also to complete the default security settings.

Why didn't we use LBaaS_v1?

LBaaS_v1 (Juno) doesn't satisfy our users' use cases.

Required to

• set a security group on the LB (LBaaS_v2: not yet)

• terminate SSL at the LB (LBaaS_v2: done)

• provide a sorry page (LBaaS_v2: not yet)

Page 45:

VPN (OpenVPN) server

– SSL-VPN for secure remote access

– Tools for operating the SSL-VPN, such as creating and revoking certificates

Why didn't we use VPNaaS_v1?

The authentication algorithm in IKE phase 1 accepted sha1, which is losing its safety assurance.

VPNaaS in the recent "Newton" release accepts sha256.

Page 46:

Areas covered by the "Reference Model"

– 60% of the security policies

Future

– Update this model by adding the missing parts of the security policies

– Aim to cover 100%

[Table repeated from Page 41: security guideline topics under the columns Server, Network, Storage, and Operation.]

Page 47:

Security Update

Page 48:

Current daily operation for vulnerabilities

Most of the operation is manual.

– Check vulnerabilities

– Risk assessment of vulnerabilities

– Management of the TODO list

– Update our cloud

This causes

– Human error

• Forgetting to check vulnerabilities

– Time: 1 hour/day

Security operations become more important as our cloud expands.

It would be nice if these operations could be automated :)

Page 49:

We proposed a way to automate the whole process.

We are testing it so that we can reduce human error.

[Diagram: the four steps (check vulnerabilities, risk assessment, management of the TODO list, update our cloud), with labels for the current operation ("by checking vendor site", "by Excel") and for the proposed semi-automatic operation ("by the script checking package versions related to vulnerabilities", "by making an Ansible playbook").]

Page 50:

Check vulnerabilities

CVE & CVSS

– CVE: an ID attached to each vulnerability

– CVSS: a score assigned to each vulnerability

API

– "CVE-search" is used for the check

– GitHub: https://github.com/cve-search/cve-search
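
The script itself is not shown in the talk. A minimal sketch of the idea, matching installed package versions against vulnerability records, could look like the following. It does not call CVE-search; the advisory records, their field names, the package names, and the CVE IDs are hypothetical placeholders, and real distribution version strings (epochs, suffixes) need a proper comparison routine.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.0.2' into (1, 0, 2); only dotted numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(installed: Dict[str, str], advisories: List[dict]) -> List[str]:
    """Return the CVE IDs whose package is installed below the fixed version."""
    hits = []
    for advisory in advisories:
        current = installed.get(advisory["package"])
        if current is None:
            continue  # package not installed on this host
        if parse_version(current) < parse_version(advisory["fixed_in"]):
            hits.append(advisory["cve"])
    return hits


if __name__ == "__main__":
    installed = {"examplelib": "1.0.1", "exampled": "2.4.0"}  # made-up inventory
    advisories = [  # made-up records, e.g. prepared from a CVE database export
        {"cve": "CVE-0000-0001", "package": "examplelib", "fixed_in": "1.0.2"},
        {"cve": "CVE-0000-0002", "package": "exampled", "fixed_in": "2.3.9"},
        {"cve": "CVE-0000-0003", "package": "absentpkg", "fixed_in": "9.9.9"},
    ]
    print(find_vulnerable(installed, advisories))
```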

Page 51:

Risk assessment

Key point

– The CVSS risk assessment does not always match our environment.

Important

– The usage and version of the package (→ script)

– Whether the host is on the internal NW or not

– Vulnerabilities that let a guest OS break into the host OS

Need to re-evaluate the CVSS score for each host according to its environment
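
How such a per-host re-evaluation could be encoded is sketched below. The rules and weights are invented for illustration (they are neither the authors' criteria nor the CVSS environmental metrics): an unused package scores zero, an internal-only host gets a reduction, and a guest-to-host escape is raised to at least 9.0.

```python
def environment_priority(
    cvss_score: float,
    package_in_use: bool,
    internal_only: bool,
    guest_to_host: bool,
) -> float:
    """Adjust a base CVSS score with hypothetical environment rules."""
    if not package_in_use:
        return 0.0               # vulnerable code is not used on this host
    score = cvss_score
    if internal_only:
        score -= 2.0             # assumed reduction for hosts on the internal NW
    if guest_to_host:
        score = max(score, 9.0)  # guest-to-host escapes are critical for a cloud
    return min(10.0, max(0.0, score))


if __name__ == "__main__":
    print(environment_priority(7.5, True, True, False))    # lowered on an internal host
    print(environment_priority(4.0, True, True, True))     # raised despite a low base score
    print(environment_priority(9.8, False, False, False))  # package not in use
```

The second case shows how a low base score can become a high priority in a cloud environment.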

Page 52:

Management of the TODO list

Do not forget high-risk vulnerabilities until the patch for the vulnerability is applied.

Important

– Even if the CVSS score is low, the score sometimes becomes high in our environment.

Need to check the same vulnerabilities continuously

Page 53:

Update our cloud

Semi-automatic procedure

– Manual intervention is required only at check points

– Consider the influence on users' instances

Future

– Apply our proposed method in the test environment first

– Extend our tools for users' self-checks

[Flow: live-migrate users' instances → security update → return users' instances. Check point ①: the user's app works normally. Check point ②: OpenStack functions work normally. Check point ③: the user's app works normally.]
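
A semi-automatic procedure of this shape can be sketched as a list of steps, each followed by a check, that stops for an operator as soon as a check fails. The pairing of each check point with the step before it is an assumption, and the step actions and checks here are stand-in callables rather than real OpenStack or Ansible calls.

```python
from typing import Callable, List, Tuple

# (step name, action to run, check to evaluate after the action)
Step = Tuple[str, Callable[[], None], Callable[[], bool]]


def run_update(steps: List[Step]) -> List[str]:
    """Run each step, then its check point; stop at the first failed check."""
    log = []
    for name, action, check in steps:
        action()
        if not check():
            log.append(f"{name}: check failed, stopping for manual intervention")
            break
        log.append(f"{name}: ok")
    return log


if __name__ == "__main__":
    # Stand-in actions and checks; a real system would call its own tooling.
    steps = [
        ("live-migrate users' instances", lambda: None, lambda: True),
        ("security update", lambda: None, lambda: True),
        ("return users' instances", lambda: None, lambda: True),
    ]
    for line in run_update(steps):
        print(line)
```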

Page 54:

Thank you for listening!