motivation: finding the root cause of a symptom

Differential Provenance: Better Network Diagnostics with Reference Events Ang Chen Yang Wu Andreas Haeberlen Wenchao Zhou + Boon Thau Loo University of Pennsylvania Georgetown University +

Upload: teresa-riley

Post on 08-Jan-2018


DESCRIPTION

Debugging networks with provenance. Typical debuggers tell us what happened: NetSight (packet histories) and Y! (network provenance). Key benefit: Rich explanation of what, when, and why.

TRANSCRIPT

Page 1: Motivation: Finding the root cause of a symptom

Differential Provenance: Better Network Diagnostics with Reference Events

Ang Chen, Yang Wu, Andreas Haeberlen, Wenchao Zhou+, Boon Thau Loo

University of Pennsylvania, Georgetown University+

Page 2: Motivation: Finding the root cause of a symptom

• Networks can (and frequently do!) have bugs
• Example: Software-defined networks
• We need a good debugger!

[Figure: network diagram with labels Internet, Bob, Web server 1, DPI, Web server 2, and subnets 4.3.2.0/24 and 4.3.3.0/24. Callouts: "Overly specific flow entry" and "Traffic arriving at the wrong server!?!"]

Page 3: Debugging networks with provenance

• Typical debuggers tell us what happened:
  • NetSight: Packet histories
  • Y!: Network provenance
• Key benefit: Rich explanation of what, when, and why.

[Figure: packet P passing through switches A, B, and C, with its provenance tree. Tree vertices: C received packet; B sent packet; B received packet; Rule match on B; A sent packet; A received packet; Rule match on A; Rule installed by controller; Incoming packet at controller.]
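
A provenance tree like the one on this slide can be pictured as a small recursive data structure. The following is only a sketch: the `Vertex` class, the helper names, and the exact parent/child shape of the tree are assumptions made for illustration, since the transcript preserves the vertex labels but not the edges of the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One event in a provenance tree; `causes` holds its direct causes."""
    event: str
    causes: list["Vertex"] = field(default_factory=list)


def explain(v, depth=0):
    """Return the tree as indented lines: the effect first, its causes below."""
    lines = ["  " * depth + v.event]
    for cause in v.causes:
        lines.extend(explain(cause, depth + 1))
    return lines


def size(v):
    """Number of vertices in the tree rooted at v."""
    return 1 + sum(size(cause) for cause in v.causes)


# Assumed shape: each "sent" event is caused by a "received" event plus a
# rule match; the rule on A was installed by the controller.
rule_a = Vertex("Rule match on A", [
    Vertex("Rule installed by controller", [
        Vertex("Incoming packet at controller"),
    ]),
])
a_sent = Vertex("A sent packet", [Vertex("A received packet"), rule_a])
b_sent = Vertex("B sent packet", [
    Vertex("B received packet", [a_sent]),
    Vertex("Rule match on B"),
])
root = Vertex("C received packet", [b_sent])

print("\n".join(explain(root)))
print(size(root), "vertices")
```

Even this three-switch path needs nine vertices, which hints at how quickly the explanation grows in a real network.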

Page 4: Problem: Explanation can be too big!

• The problem: Finding the root cause in a large provenance tree.

[Figure: large provenance tree whose root is the symptom "Packet arrives at wrong server". Root cause: a single faulty rule, "Rule 7: Next-hop=port2".]

Page 5: Key insight: Use reference events!

• Remember that some packets were routed correctly.
• The same things should have happened to all packets!
• Key insight: If we have both a (bad) symptom and a (good) reference, we only need to reason about the differences between them!

[Figure: network diagram with labels Bob, switches S1 to S6, Web server 1, DPI, and Web server 2.]

Page 6: A new debugger

• Bob collects both a bad symptom and a good reference
• Bob sends both events to the debugger
• Debugger generates provenance, outputs difference
• Ideally, there is only one diff: the root cause!

[Figure: Bob sends the fault and the reference to the debugger, which answers "Field 3 of config entry 4 is wrong!"]

Page 7: Outline

- Motivation: Network diagnostics
  - Background
  - Key insight
  - A new debugger
- Differential provenance
  - Are references typically available?
  - Strawman approach
  - Our approach
  - Initial results
- Conclusion

Page 8: Are references typically available?

• Survey:
  • Posts on the ‘Outages’ mailing list in Sept-Dec 2014.
  • 64 posts related to diagnostics.
  • 42/64 (66%) posts involve both a fault and some reference.
• Examples:
  • Some DNS servers have stale records, but others are good
  • Probes sometimes fail, sometimes succeed
  • More examples in the paper

Page 9: Strawman solution

• A strawman solution: Pick out different nodes in trees.
  • Bad provenance: 201 nodes
  • Reference provenance: 156 nodes
  • Naïve diff: 278 nodes!

[Figure: Bad provenance - Reference provenance = ?]

Page 10: Why does the strawman not work?

• Observation: The diff can be larger than the individual trees.
• Reason #1: Differences that “do not matter”
  • E.g., timestamps, packet payloads, etc.
• Reason #2: “Butterfly effect”
  • A small difference can change later events drastically!

[Figure label: Faulty rule]
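
Both reasons can be reproduced on a toy example. The sketch below is not the tooling from the talk: it assumes each provenance tree is flattened into a set of `(event, node, packet id, timestamp)` tuples, and the event names, rule 8, and the helper names are made up for illustration.

```python
def naive_diff(bad, ref):
    """Vertices that appear in exactly one of the two trees."""
    return bad ^ ref


def strip_irrelevant(tree):
    """Drop fields that 'do not matter' (packet id, timestamp)."""
    return {(event, node) for event, node, _pkt, _ts in tree}


# Hypothetical flattened trees: (event, node, packet id, timestamp).
bad = {
    ("recv", "S1", "p1", 10),
    ("match rule 7", "S1", "p1", 11),
    ("send port 2", "S1", "p1", 12),
    ("recv", "server2", "p1", 13),
}
ref = {
    ("recv", "S1", "p2", 20),
    ("match rule 8", "S1", "p2", 21),
    ("send port 1", "S1", "p2", 22),
    ("recv", "server1", "p2", 23),
}

# Reason #1: timestamps and packet ids make every vertex look different.
print(len(naive_diff(bad, ref)))  # 8, twice the size of either tree

# Reason #2: even without those fields, one different rule match changes
# every later event, so the diff is still larger than either tree.
print(len(naive_diff(strip_irrelevant(bad), strip_irrelevant(ref))))  # 6
```

Each tree has only four vertices, yet the diff has eight, and still six after the irrelevant fields are removed.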

Page 11: Differential provenance

• Approach: Change past events, and think about what could have happened.
  • (1) Find some early ‘differences’ in the trees.
  • (2) Change the faulty node to a correct equivalent.
  • (3) Use replay to determine what would have happened.
  • (4) Output the set of changes that align the trees.

[Figure: bad provenance and reference provenance, side by side.]

Output:
- Rule 7: change port
- Rule 9: change range
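
The four steps can be illustrated with a deliberately small sketch. This is not the authors' algorithm: it works on forwarding traces rather than provenance trees, it only repairs output ports, and the topology, rule 8, rule 20, and all function names are assumptions chosen for the example. Only "rule 7 forwards to port 2 instead of port 1" comes from the slides.

```python
import copy
import ipaddress

# Hypothetical wiring: (switch, output port) -> next node.
TOPOLOGY = {("S1", 1): "S2", ("S1", 2): "server2",
            ("S2", 1): "server1", ("S2", 2): "server2"}


def replay(rules, src, start="S1"):
    """Re-run forwarding for one packet; return [(switch, rule id, port)]."""
    trace, node = [], start
    addr = ipaddress.ip_address(src)
    while node in rules:  # stop once the packet leaves the switches
        rule = next(r for r in rules[node]
                    if addr in ipaddress.ip_network(r["match"]))
        trace.append((node, rule["id"], rule["port"]))
        node = TOPOLOGY[(node, rule["port"])]
    return trace


def hops(trace):
    """Drop rule ids so that only forwarding decisions are compared."""
    return [(switch, port) for switch, _rule_id, port in trace]


def diff_by_replay(rules, bad_src, ref_src, max_rounds=10):
    rules = copy.deepcopy(rules)  # never modify the caller's state
    reference = hops(replay(rules, ref_src))
    changes = []
    for _ in range(max_rounds):
        bad = replay(rules, bad_src)
        if hops(bad) == reference:  # (4) aligned: output the changes
            return changes
        # (1) earliest hop where the bad run departs from the reference
        i = next(k for k, (b, r) in enumerate(zip(hops(bad), reference))
                 if b != r)
        switch, rule_id, old_port = bad[i]
        new_port = reference[i][1]
        # (2) change the responsible rule to match the reference decision
        for rule in rules[switch]:
            if rule["id"] == rule_id:
                rule["port"] = new_port
        changes.append((rule_id, old_port, new_port))
        # (3) the next iteration replays with the changed rule
    raise RuntimeError("runs did not align")


rules = {
    "S1": [{"id": 7, "match": "4.3.2.0/24", "port": 2},   # faulty
           {"id": 8, "match": "4.3.3.0/24", "port": 1}],
    "S2": [{"id": 20, "match": "4.3.2.0/23", "port": 1}],
}
for rule_id, old, new in diff_by_replay(rules, "4.3.2.7", "4.3.3.7"):
    print(f"Rule {rule_id}: next hop should be port {new}, not {old}")
```

The loop matters because of the butterfly effect: after one rule is changed, the replayed run may expose a further difference downstream, which a single comparison would not reveal.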

Page 12: Technical challenges

• Challenge #1: Where do we start?
  • Heuristics: Change early events, minimum changes…
  • E.g., prefer changing 1 event to changing 1000 events.
• Challenge #2: How should we make the change?
  • Approach: Think about what should have happened.
  • E.g., packet should go to switch 2, not 1.
• Challenge #3: Irrelevant differences?
  • Approach: Equivalence relations between events.
  • E.g., IPs 4.3.2.1 and 4.3.3.1
• See paper for more details.
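
One way to picture the equivalence relation from Challenge #3 is the sketch below. It assumes two source addresses count as equivalent when both fall inside the same covering prefix; the prefix 4.3.2.0/23 and the function names are illustrative assumptions, and the paper may define equivalence differently.

```python
import ipaddress


def equivalent(ip_a, ip_b, prefix):
    """True if both addresses fall inside the same covering prefix."""
    net = ipaddress.ip_network(prefix)
    return (ipaddress.ip_address(ip_a) in net
            and ipaddress.ip_address(ip_b) in net)


def canonical(event, prefix):
    """Replace the source IP of an (action, src_ip) event by its class."""
    action, src = event
    net = ipaddress.ip_network(prefix)
    return (action, str(net) if ipaddress.ip_address(src) in net else src)


# Assumed policy prefix that covers both the symptom and the reference.
POLICY = "4.3.2.0/23"

print(equivalent("4.3.2.1", "4.3.3.1", POLICY))  # True
print(equivalent("4.3.2.1", "8.8.8.8", POLICY))  # False

# After canonicalization the two events compare as equal, so the debugger
# would not report the differing source address as a difference.
print(canonical(("recv", "4.3.2.1"), POLICY)
      == canonical(("recv", "4.3.3.1"), POLICY))  # True
```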

Page 13: Setup

• Setup
  • Platform: RapidNet
  • SDN: 6 switches, 2 servers
  • The symptom: misrouted packets from 4.3.2.0/24
  • The reference: packets from 4.3.3.0/24

[Figure: network diagram with labels Internet, Web server 1, DPI, and subnets 4.3.2.0/24 and 4.3.3.0/24. Callout: "Overly specific flow entry"]

Page 14: Initial results

• Differential provenance finds a single node (the faulty rule) to be the root cause!

[Figure: inputs are the fault (201 nodes) and the reference (156 nodes). Naïve diff = tree image (not recoverable). Differential provenance = "Rule 7: next hop should be port 1, not 2!"]

Page 15: Conclusion

• Debugging networks is hard
  • Need good debuggers!
• Provenance can find the causes of an event
  • Problem: Explanation can be too detailed.
• Idea: Use reference events
  • Sufficient to find the (few) differences to the observed symptom
  • New debugger based on differential provenance
• Result: Very precise diagnostics
  • Ideally, can identify a single root cause!

Thanks!