Fault Localization (Pinpoint) Project Proposal for OPNFV, September 2015, Version 0.8

Upload: liliana-powers

Post on 17-Jan-2016



Fault Localization (Pinpoint)
Project Proposal for OPNFV

September 2015
Version 0.8


Fault Localization – Overview

• The process of deducing the exact source of a failure from a set of observed indications
– A set of algorithms
– A set of APIs
– Focus on cloud NFV networking
– Extendable to compute and storage

• Fault localization is also known as fault isolation, alarm/event correlation, and root cause analysis (RCA)
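
As one illustration of the alarm/event correlation mentioned above, the sketch below groups alarms that arrive close together in time, on the assumption that a burst of alarms often shares a single root cause. The function name `correlate`, the `(timestamp, resource)` tuple layout, and the 5-second window are illustrative assumptions, not part of the proposal.

```python
def correlate(alarms, window=5.0):
    """Group (timestamp, resource) alarms into bursts.

    Two consecutive alarms (after sorting by time) land in the same group
    when they are at most `window` seconds apart. This is a deliberately
    simple time-based heuristic, not a full correlation engine.
    """
    groups = []
    for ts, resource in sorted(alarms):
        if groups and ts - groups[-1][-1][0] <= window:
            groups[-1].append((ts, resource))   # continue the current burst
        else:
            groups.append([(ts, resource)])     # start a new burst
    return groups


# Hypothetical alarm stream: three alarms in a burst, one unrelated later.
alarms = [(2.0, "VNF2"), (0.0, "NIC2"), (1.5, "vSwitch2"), (60.0, "ToR1")]
groups = correlate(alarms)
print(groups)
```

A real system would combine such time-based grouping with topology and dependency information, since alarms that coincide in time are not necessarily related.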


Fault Localization (FL) – Example

[Figure: VNF1 on VM1 and VNF2 on VM2, each connected through a hypervisor, vSwitch, NIC, and ToR switch. Failure: network function doesn't work. Probable causes marked on the path: iptables not configured; MTU size misconfiguration; NIC failure.]

• VNF #2 indicates that it is not working (no sessions, no network connectivity, etc.)
• Several causes may result in this: iptables, MTU, and NIC failure problems
• The FL process should find the exact source of the problem!
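
To make the example concrete, here is a minimal sketch of one possible analysis method: each probable cause is given a set of symptoms it would be expected to produce, and the causes are ranked by how closely the observed symptoms match. The symptom labels, the `Cause` class, and the Jaccard-similarity scoring are all assumptions made for illustration; the proposal does not prescribe a specific algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    name: str
    expected: frozenset  # symptoms this cause would produce (hypothetical labels)


# Hypothetical symptom signatures for the three probable causes in the example.
CAUSES = [
    Cause("iptables not configured",
          frozenset({"no_sessions", "ping_ok", "tcp_blocked"})),
    Cause("MTU size misconfiguration",
          frozenset({"no_sessions", "ping_ok", "large_packets_dropped"})),
    Cause("NIC failure",
          frozenset({"no_sessions", "ping_fails", "link_down"})),
]


def rank_causes(observed, causes=CAUSES):
    """Return (cause name, score) pairs, best match first.

    The score is the Jaccard similarity between the observed symptoms
    and the symptoms expected for each cause.
    """
    scored = []
    for cause in causes:
        union = observed | cause.expected
        score = len(observed & cause.expected) / len(union) if union else 0.0
        scored.append((cause.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# VNF2 has no sessions, small pings succeed, but large packets are dropped.
observed = {"no_sessions", "ping_ok", "large_packets_dropped"}
ranking = rank_causes(observed)
print(ranking[0][0])
```

With these assumed signatures, the MTU misconfiguration matches all observed symptoms and is ranked first, while the NIC failure shares only the generic "no sessions" symptom and is ranked last.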


Fault Localization APIs

Fault Localization System (set of analysis methods)

• Fault/performance information sources: events, alarms, statistics, logs
• System configuration: expected/desired configuration as known by the CMS
• System models: layering, dependencies, topology, connectivity, policy
• System OAM tools: active tools such as ping, trace, etc.
• API calls toward these inputs (diagram labels): Get info, Set config, Set test, Get test-info
• Requests from the user/system: Find root cause(s), Find correlated failures
• Responses to the user/system: Root cause(s), Correlated failures
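
The API boundaries on this slide can be sketched as a small set of interfaces. Everything below is hypothetical: the class names (`InfoSource`, `OamTool`, `FaultLocalizer`), the method signatures, and the naive analysis rule are assumptions chosen to mirror the diagram labels (Get info, Set test, Get test-info, Find root cause(s)), not an API defined by the proposal.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class InfoSource(ABC):
    """Hypothetical 'Get info' interface (events, alarms, statistics, logs)."""

    @abstractmethod
    def get_info(self) -> List[Dict]: ...


class OamTool(ABC):
    """Hypothetical 'Set test' / 'Get test-info' interface for active tools."""

    @abstractmethod
    def set_test(self, target: str) -> str: ...

    @abstractmethod
    def get_test_info(self, test_id: str) -> Dict: ...


class StaticAlarms(InfoSource):
    """In-memory alarm list standing in for a real collection engine."""

    def __init__(self, alarms):
        self._alarms = list(alarms)

    def get_info(self):
        return list(self._alarms)


class FakePing(OamTool):
    """Stand-in for an active tool such as ping; reachability is preconfigured."""

    def __init__(self, reachable):
        self._reachable = set(reachable)
        self._results = {}

    def set_test(self, target):
        test_id = f"ping-{len(self._results)}"
        self._results[test_id] = {"target": target, "ok": target in self._reachable}
        return test_id

    def get_test_info(self, test_id):
        return self._results[test_id]


class FaultLocalizer:
    """Naive analysis: an alarmed resource that fails an active test is a root cause."""

    def __init__(self, sources, oam):
        self.sources = list(sources)
        self.oam = oam

    def find_root_causes(self):
        causes = []
        for source in self.sources:
            for alarm in source.get_info():
                test_id = self.oam.set_test(alarm["resource"])
                if not self.oam.get_test_info(test_id)["ok"]:
                    causes.append(alarm["resource"])
        return causes


alarms = StaticAlarms([{"resource": "nic-1"}, {"resource": "vswitch-2"}])
localizer = FaultLocalizer([alarms], FakePing(reachable={"vswitch-2"}))
root_causes = localizer.find_root_causes()
print(root_causes)
```

Separating the information sources and OAM tools behind abstract interfaces is one way to let the same analysis logic run against different back ends; the concrete classes here are test doubles only.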


Fault Localization in OpenStack

[Figure: the Fault Localization System diagram (set of analysis methods), annotated with OpenStack components: Neutron/Nova; Ceilometer/Monasca/External; Neutron/Nova/External; Neutron/Nova; SDN Controller.]

• Fault/performance information sources: events/alarms, statistics, logs, prediction
• System configuration: expected/desired configuration as known by the CMS
• System models: layering, dependencies, topology, connectivity, policy
• System OAM tools: active tools such as ping, trace, etc.
• API calls toward these inputs (diagram labels): Get info, Set config, Set test, Get test-info
• Requests from the user/system: Find root cause(s), Find correlated failures
• Responses to the user/system: Root cause(s), Correlated failures


Relationships with other projects (1)

[Figure: the Fault Localization System diagram (set of analysis methods), annotated with OpenStack components (Neutron/Nova; Ceilometer/Monasca; Neutron/Nova/External; Neutron/Nova/Cinder etc.) and with the related OPNFV projects Yardstick, Doctor, and Bottleneck.]

• Fault/performance information sources: events, alarms, statistics, logs
• System configuration: expected/desired configuration as known by the CMS
• System models: layering, dependencies, topology, connectivity, policy
• System OAM tools: active tools such as ping, trace, etc.
• API calls toward these inputs (diagram labels): Get info, Set config, Set test, Get test-info
• Requests from the user/system: Find root cause(s), Find correlated failures
• Responses to the user/system: Root cause(s), Correlated failures


Relationships with other projects (2)

• Projects underway or being proposed in OPNFV:
– Doctor:
• The Doctor project focuses on fault notification but also has some notion of event aggregation. In this context, it can be one of the inputs for the Pinpoint project.
– Yardstick:
• A configuration verification testing project. It provides a testing framework and several basic testing methods. These could be used as a possible OAM tools framework for the Pinpoint project.
– Bottleneck:
• This project aims to provide an automated testing environment as part of deployment, to identify system bottlenecks and measure performance in the staging phase before deployment. It is oriented toward performance and focuses on the staging phase.


Reference in ONUG RFI Requirements

• Requirement for fault correlation in "Network State Collection, Correlation and Analytics Product/RFI Requirements", May 2015


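Proposed Project Scope

[Figure: architecture diagram with the blocks Fault Localization, OpenStack services (Neutron, Ceilometer, others), SDN Controller, VIM, NFVI, and VNF/VNFM. Interfaces are numbered 1 to 7, with the labels "Config, OAM, Topology" and "Statistics". A "Project Scope" boundary is marked on the diagram.]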


Proposed Project Scope (cont.)

• Focus on networking fault-localization APIs for network connectivity faults
• Use cases: service continuity, network-load-based placement and migration
• In scope:
– Network fault localization requirements in a virtual environment
– Gap analysis for the APIs for the above use cases, e.g.:
• API for the root cause of a connectivity problem between VNFs/VMs
• API for OAM tools for Ethernet/IP technologies
• API to retrieve network topology information
• API for fault and performance collection engines
– Active tests and statistics retrieval required for the above use cases
• Future extensions:
– Extend the APIs for:
• Fault localization requirements for compute and storage
• Other OAM tools
– PoC that will include simple fault localization analysis logic as a reference implementation
– Extend to upper layers of NFV (alongside OPNFV evaluation)
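
The in-scope APIs for topology retrieval and connectivity root cause could fit together roughly as follows. This is a sketch under stated assumptions: the `TOPOLOGY` adjacency map, the node names, and the functions `find_path` and `localize` are hypothetical, and the health check is a stub standing in for whatever OAM test or statistics query a real implementation would use.

```python
from collections import deque

# Hypothetical topology, as a topology-retrieval API might return it.
TOPOLOGY = {
    "VM1": ["vSwitch1"],
    "vSwitch1": ["VM1", "NIC1"],
    "NIC1": ["vSwitch1", "ToR1"],
    "ToR1": ["NIC1", "ToR2"],
    "ToR2": ["ToR1", "NIC2"],
    "NIC2": ["ToR2", "vSwitch2"],
    "vSwitch2": ["NIC2", "VM2"],
    "VM2": ["vSwitch2"],
}


def find_path(topology, src, dst):
    """Breadth-first search; returns the shortest node path or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in topology.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def localize(topology, src, dst, is_healthy):
    """Return the elements on the src-dst path that fail the health check."""
    path = find_path(topology, src, dst)
    if path is None:
        return []  # no path in the model; nothing to test
    return [node for node in path if not is_healthy(node)]


# Stub health check: pretend an active test reports NIC2 as down.
down = {"NIC2"}
suspects = localize(TOPOLOGY, "VM1", "VM2", lambda node: node not in down)
print(suspects)
```

Restricting the health checks to elements on the path between the two endpoints keeps the number of active tests small, which is one reason a topology API is listed alongside the OAM and root-cause APIs.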


Thank You!