1
OPNFV Summit 2015
Doctor - Fault Management
Gerald Kunzmann, DOCOMO
Carlos Goncalves, NEC
Ryota Mibu, NEC
2
Doctor Overview
• Goal
– Build fault management and maintenance framework
• Approach
– Identify requirements
– Gap analysis
– Implementation work in upstream (OpenStack)
– Integration and testing
• Status
– Initial requirement study, architecture design, gap analysis: Done
– Collaborative development: On-going (3 merged blueprints in OpenStack Liberty)
– Standardization sync: On-going (by NFV member efforts, joint meeting)
3
Doctor Members
• At project creation (Dec 2014)
– NTT DOCOMO, Sprint
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco
• Now (Oct 2015)
– NTT DOCOMO, Sprint, AT&T, Telecom Italia, KDDI
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco, Cloudbase Solutions, Spirent, Intel, ZTE
2x
4
Assumption of VNF (NFV Application)
• Telco applications are typically deployed in active-standby or active-active fashion
– App (Active) and App (Standby) each run in their own VM on separate machines
• App and App Manager (VNFM) cannot detect HW failures directly
• App state is switched when a failure occurs
5
Use Case 1: Fault management
• Actors
– Consumer C1, Consumer C2, Consumer C3
– Virtualized Infrastructure Manager (VIM), e.g. OpenStack, with OpenStack Northbound Interface
• Resource Map
– Server – VM mapping: Server S1: VM-1, VM-2; Server S2: VM-7; Server S3: VM-4
– Ownership information: VM-1, VM-7: Consumer C1; VM-2: Consumer C2; VM-4: Consumer C3
• Resource Pool
– Hardware Server S1 (Hypervisor): VM-1, VM-2
– Hardware Server S2 (Hypervisor): VM-7
– Hardware Server S3 (Hypervisor): VM-4
• Sequence
– 1. Fault Monitoring: hardware fault, hypervisor fault, host OS fault
– 2. Inform the Consumer? If YES, find owner of affected VMs from database
– 3. FaultNotification(VM ID, Fault ID)
– 4. Switch to SBY configuration
– 5. Instruction(VM ID)
– 6. Execute Instruction, e.g. migrate VM
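Steps 2 and 3 of this use case amount to two lookups in the Resource Map. The following is a minimal sketch only: the dictionaries copy the example values from the slide, while `FaultNotification`, `notifications_for_fault` and the fault ID string are hypothetical names chosen for illustration, not part of OpenStack or Doctor.

```python
from dataclasses import dataclass
from typing import Dict, List

# Resource Map values copied from the slide, held in memory for illustration.
SERVER_TO_VMS: Dict[str, List[str]] = {
    "S1": ["VM-1", "VM-2"],
    "S2": ["VM-7"],
    "S3": ["VM-4"],
}
VM_OWNER: Dict[str, str] = {
    "VM-1": "C1",
    "VM-7": "C1",
    "VM-2": "C2",
    "VM-4": "C3",
}


@dataclass(frozen=True)
class FaultNotification:
    """Hypothetical message for step 3: FaultNotification(VM ID, Fault ID)."""
    consumer: str
    vm_id: str
    fault_id: str


def notifications_for_fault(server: str, fault_id: str) -> List[FaultNotification]:
    """Step 2: find the owner of each VM on the faulty server.
    Step 3: build one notification per affected VM."""
    return [
        FaultNotification(consumer=VM_OWNER[vm], vm_id=vm, fault_id=fault_id)
        for vm in SERVER_TO_VMS.get(server, [])
    ]


if __name__ == "__main__":
    for note in notifications_for_fault("S1", "hw-fault"):
        print(note)
```

Consumers of unaffected VMs (here C3) receive nothing, which matches the "Inform the Consumer?" decision on the slide.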
6
Use Case 2: Maintenance
• Actors
– Administrator
– Consumer C1, Consumer C2, Consumer C3
– Virtualized Infrastructure Manager (VIM), e.g. OpenStack, with OpenStack Northbound Interface
• Resource Map
– Server – VM mapping: Server S1: VM-1, VM-2; Server S2: VM-7; Server S3: VM-4
– Ownership information: VM-1, VM-7: Consumer C1; VM-2: Consumer C2; VM-4: Consumer C3
• Resource Pool
– Hardware Server S1 (Hypervisor): VM-1, VM-2
– Hardware Server S2 (Hypervisor): VM-7
– Hardware Server S3 (Hypervisor): VM-4
• Sequence
– 1. Maintenance Request (Server S3)
– 2. Which VMs are affected? Find Consumer owning the VM(s) from the database.
– 3. Maintenance Notification (VM ID)
– 4. Switch to SBY configuration
– 5. Instruction(VM ID)
– 6. Execute Instruction, e.g. migrate VM
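The maintenance sequence can be sketched as a loop over the VMs on the server to be maintained. This is an illustration under stated assumptions: `ask_consumer` stands in for steps 3 to 5 (notify, switch to standby, reply with an instruction), and `pick_target` is a made-up placement rule (fewest VMs first); the slides do not say how the VIM chooses a migration target.

```python
from typing import Callable, Dict, List, Tuple


def pick_target(server_to_vms: Dict[str, List[str]], exclude: str) -> str:
    """Hypothetical placement rule: the other server hosting the fewest VMs."""
    candidates = [s for s in server_to_vms if s != exclude]
    if not candidates:
        raise RuntimeError("no other server available")
    return min(candidates, key=lambda s: (len(server_to_vms[s]), s))


def handle_maintenance(
    server_to_vms: Dict[str, List[str]],
    vm_owner: Dict[str, str],
    server: str,
    ask_consumer: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Steps 2-6 for one Maintenance Request.

    Returns (vm_id, consumer, target_server) for every migrated VM and
    updates server_to_vms in place.
    """
    moves = []
    # Step 2: which VMs are affected, and who owns them?
    for vm_id in list(server_to_vms.get(server, [])):
        consumer = vm_owner[vm_id]
        # Steps 3-5: notify the consumer and wait for its instruction.
        instruction = ask_consumer(consumer, vm_id)
        if instruction != "migrate":
            continue  # other instructions are out of scope for this sketch
        # Step 6: execute the instruction by moving the VM in the mapping.
        target = pick_target(server_to_vms, exclude=server)
        server_to_vms[server].remove(vm_id)
        server_to_vms[target].append(vm_id)
        moves.append((vm_id, consumer, target))
    return moves


if __name__ == "__main__":
    mapping = {"S1": ["VM-1", "VM-2"], "S2": ["VM-7"], "S3": ["VM-4"]}
    owners = {"VM-1": "C1", "VM-7": "C1", "VM-2": "C2", "VM-4": "C3"}
    print(handle_maintenance(mapping, owners, "S3", lambda c, vm: "migrate"))
    print(mapping)
```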
7
Fault Management Sequence
• Applications: App, App, App
• VIM User and Administrator
• Virtualized Infrastructure Manager (VIM) = OpenStack
• Virtualized Infrastructure
– Virtual Compute, Virtual Storage, Virtual Network
– Virtualization Layer
– Hardware Resources
• Doctor Scope
– Detection
– Reaction
8
Key Requirements as VIM
• Immediate Notification
• Consistent Resource State Awareness
• Extensible Monitoring
• Fault Correlation
9
Doctor Architecture and Typical Scenario
• Components
– Monitor (multiple)
– Inspector
– Controller (multiple)
– Notifier
– Manager, Application
– Virtualized Infrastructure (Resource Pool)
• Data
– Alarm Conf., Resource Map, Failure Policy
• Typical scenario
– 0. Set Alarm
– 1. Raw Failure
– 2. Find Affected
– 3. Update State
– 4. Notify all
– 5. Notify Error
– 6-. Action
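The numbered scenario can be traced in a few lines of code. The sketch below is not Doctor or OpenStack code: the class and method names are hypothetical, and the assignment of steps to components (Manager sets the alarm at the Notifier, Monitor reports to the Inspector, Inspector updates the Controller, Controller notifies via the Notifier) is an assumption drawn from the component names and step order on the slide.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class Notifier:
    def __init__(self) -> None:
        self.alarms: Dict[str, List[Callback]] = {}  # "Alarm Conf."

    def set_alarm(self, resource_id: str, callback: Callback) -> None:
        # Step 0. Set Alarm
        self.alarms.setdefault(resource_id, []).append(callback)

    def notify(self, resource_id: str, state: str) -> None:
        # Step 5. Notify Error (only to managers that set an alarm)
        for callback in self.alarms.get(resource_id, []):
            callback(resource_id, state)


class Controller:
    def __init__(self, resource_map: Dict[str, List[str]], notifier: Notifier) -> None:
        self.resource_map = resource_map  # host -> VM ids ("Resource Map")
        self.state = {vm: "active" for vms in resource_map.values() for vm in vms}
        self.notifier = notifier

    def update_state(self, vm_id: str, state: str) -> None:
        # Step 3. Update State, then step 4. Notify all
        self.state[vm_id] = state
        self.notifier.notify(vm_id, state)


class Inspector:
    def __init__(self, controller: Controller) -> None:
        self.controller = controller

    def on_raw_failure(self, host: str) -> List[str]:
        # Step 1. Raw Failure arrives from a Monitor
        affected = list(self.controller.resource_map.get(host, []))  # Step 2. Find Affected
        for vm_id in affected:
            self.controller.update_state(vm_id, "error")
        return affected


if __name__ == "__main__":
    notifier = Notifier()
    controller = Controller({"S1": ["VM-1", "VM-2"], "S2": ["VM-7"]}, notifier)
    inspector = Inspector(controller)
    notifier.set_alarm("VM-1", lambda vm, state: print("manager sees", vm, state))
    inspector.on_raw_failure("S1")  # step 6-, Action, would follow in the Manager
```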
10
Doctor OSS Map
Same components and scenario (steps 0 to 6-) as in "Doctor Architecture and Typical Scenario", mapped to open source software:
• Notifier: Ceilometer
• Inspector: e.g. Monasca
• Monitor: e.g. Zabbix
• Controller: Cinder, Neutron, Nova
11
Doctor OSS Development
Same components and scenario (steps 0 to 6-) as in "Doctor Architecture and Typical Scenario", with development items:
• Notifier: Ceilometer (development item: Event Alarm)
• Controller: Cinder, Neutron, Nova (development item: State Correction)
• Monitor: e.g. Zabbix
• Inspector: e.g. Monasca
12
Doctor Blueprints in Liberty Cycle
Project | Blueprint | Spec Drafter | Developer | Status
Ceilometer | Event Alarm Evaluator | Ryota Mibu (NEC) | Ryota Mibu (NEC) | ✓ Completed (Liberty)
Nova | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) | Roman Dobosz (Intel) | ✓ Completed (Liberty)
Nova | Support forcing service down | Tomi Juvonen (Nokia) | Carlos Goncalves (NEC) | ✓ Completed (Liberty)
Nova | Get valid server state | Tomi Juvonen (Nokia) | - | Spec approved (Mitaka)
Nova | Add notification for service status change | Balazs Gibizer (Ericsson) | Balazs Gibizer (Ericsson) | Waiting for spec approval (Mitaka)
13
Doctor BP Detail: Nova – Mark Nova-Compute Down
• Host / Machine: Hypervisor, VM, vSwitch, BMC, nova-compute
• Nova services: nova-api, nova-conductor, nova-scheduler, nova DB, queue
• External Monitoring Service, Monitoring Client
• EXISTING (periodic update): service state
• NEW: Force-down API to update nova-compute service state
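As an illustration of how an external monitoring service might call the force-down API, the sketch below only builds the HTTP request and does not send it. The slide does not give the request format; the path `/os-services/force-down`, the body fields and the microversion `2.11` reflect the Nova compute API as released in Liberty to the best of my knowledge and should be checked against the Nova API reference. The endpoint URL, token and host name are placeholders.

```python
import json
from typing import Dict, Tuple


def build_force_down_request(
    compute_endpoint: str,
    token: str,
    host: str,
    forced_down: bool = True,
) -> Tuple[str, str, Dict[str, str], str]:
    """Return (method, url, headers, body) for marking nova-compute on
    `host` as forced down (or clearing the flag with forced_down=False)."""
    url = compute_endpoint.rstrip("/") + "/os-services/force-down"
    headers = {
        "X-Auth-Token": token,
        "Content-Type": "application/json",
        # Assumption: force-down requires compute API microversion 2.11.
        "X-OpenStack-Nova-API-Version": "2.11",
    }
    body = json.dumps(
        {"host": host, "binary": "nova-compute", "forced_down": forced_down}
    )
    return "PUT", url, headers, body


if __name__ == "__main__":
    # Placeholder endpoint, token and host; replace with real deployment values.
    method, url, headers, body = build_force_down_request(
        "http://controller.example:8774/v2.1", "TOKEN", "compute-host-1"
    )
    print(method, url)
    print(body)
```

Keeping request construction separate from sending makes the payload easy to inspect before wiring it to an HTTP client of choice.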
14
Doctor BP Detail: Ceilometer - Event Alarm
• Notification sources: Cinder, Neutron, Nova
• Data: notification, sample, event, stats
• EXISTING (polling-based)
• NEW Shortcut (notification-based): Notification-driven alarm evaluator
• Manager, Audit Service
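The difference between the polling-based path and the notification-based shortcut is that an alarm is evaluated the moment an event arrives rather than on the next polling cycle. A minimal sketch of that idea follows; `EventAlarm`, `EventAlarmEvaluator` and the simplified rule format (event type plus required trait values) are hypothetical and do not reproduce Ceilometer's actual classes or alarm schema. The event type and trait values are example data.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class EventAlarm:
    """Hypothetical alarm: fires when an event of `event_type` carries
    all of the `required_traits` values."""

    def __init__(
        self,
        name: str,
        event_type: str,
        required_traits: Dict[str, Any],
        action: Callable[[str, Event], None],
    ) -> None:
        self.name = name
        self.event_type = event_type
        self.required_traits = required_traits
        self.action = action

    def matches(self, event: Event) -> bool:
        if event.get("event_type") != self.event_type:
            return False
        traits = event.get("traits", {})
        return all(traits.get(k) == v for k, v in self.required_traits.items())


class EventAlarmEvaluator:
    """Evaluates alarms as each event arrives (no polling interval)."""

    def __init__(self) -> None:
        self.alarms: List[EventAlarm] = []

    def add_alarm(self, alarm: EventAlarm) -> None:
        self.alarms.append(alarm)

    def on_event(self, event: Event) -> List[str]:
        fired = []
        for alarm in self.alarms:
            if alarm.matches(event):
                alarm.action(alarm.name, event)  # e.g. notify the Manager
                fired.append(alarm.name)
        return fired


if __name__ == "__main__":
    evaluator = EventAlarmEvaluator()
    evaluator.add_alarm(
        EventAlarm(
            "vm-1-error",
            "compute.instance.update",
            {"instance_id": "VM-1", "state": "error"},
            lambda name, ev: print("alarm fired:", name),
        )
    )
    evaluator.on_event(
        {"event_type": "compute.instance.update",
         "traits": {"instance_id": "VM-1", "state": "error"}}
    )
```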
15
Doctor Southbound API
• User, Admin
• NFVI
• Conf., Policy, Controller, Inspector, Notifier
• Monitor (multiple), Conf.
• Configuration: Threshold, Enable
• Fault Messaging
• Unified Event API
16
Doctor Status
• Components
– Notifier: Ceilometer
– Monitor: Zabbix, DPDK
– Controller: Nova, Neutron, Cinder
– Inspector: Monasca?
• Legend: Done, Next Step
• Phases: To-Be Arch. Design, Gap Analysis, Blueprint, Coding, Integration, OPNFV Release
• Timeline marks: Dec 2014, Mar 2015, Sep 2015, Feb 2016
17
Don’t miss out...
• “Doctor – Fault Management”: Project Theater, Wednesday, 3:55 pm – 4:15 pm
• “Doctor: Failure Detection and Notification for NFV”: DOCOMO booth, PoC Demo Zone