Ang's slides

Differential Provenance:
Better Network Diagnostics with Reference Events
Ang Chen, Yang Wu, Andreas Haeberlen, Wenchao Zhou+, Boon Thau Loo
University of Pennsylvania
+Georgetown University
Motivation: Finding the root cause of a symptom
[Figure: network connecting the Internet (4.3.2.0/24, 4.3.3.0/24) to Web server 1, Web server 2, and a DPI device, with operator Bob. Callouts: "Overly specific flow entry"; "Traffic arriving at the wrong server !?!"]
• Networks can (and frequently do!) have bugs
• Example: Software-defined networks
• We need a good debugger!
Debugging networks with provenance
[Figure: Packet P travels through switches A, B, and C. Provenance nodes shown: C received packet; B sent packet; B received packet; Rule match on B; A sent packet; A received packet; Rule match on A; Rule installed by controller; Incoming packet at controller.]
• Typical debuggers tell us what happened:
• NetSight: Packet histories
• Y!: Network provenance
• Key benefit: Rich explanation of what, when, and why.
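The provenance idea on this slide can be sketched as a small tree in which each event points to the events that caused it. This is a minimal illustration, not the data model of Y! or NetSight: the `ProvNode` class, the helper names, and the exact parent/child shape of the example tree are assumptions; only the event labels come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvNode:
    """One event in a provenance tree; `causes` are its direct causes."""
    event: str
    causes: List["ProvNode"] = field(default_factory=list)


def explain(node: ProvNode, depth: int = 0) -> List[str]:
    """Return the tree as indented lines: the symptom first, its causes below."""
    lines = ["  " * depth + node.event]
    for cause in node.causes:
        lines.extend(explain(cause, depth + 1))
    return lines


def leaves(node: ProvNode) -> List[str]:
    """Events with no recorded cause (the base events of the explanation)."""
    if not node.causes:
        return [node.event]
    return [leaf for cause in node.causes for leaf in leaves(cause)]


# Hypothetical tree shape for packet P travelling A -> B -> C.
rule_a = ProvNode("Rule match on A", [
    ProvNode("Rule installed by controller", [
        ProvNode("Incoming packet at controller"),
    ]),
])
tree = ProvNode("C received packet", [
    ProvNode("B sent packet", [
        ProvNode("B received packet", [
            ProvNode("A sent packet", [ProvNode("A received packet"), rule_a]),
        ]),
        ProvNode("Rule match on B"),
    ]),
])

print("\n".join(explain(tree)))
```

Walking the tree from the root answers "why did C receive the packet?" one causal step at a time.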
Problem: Explanation can be too big!
[Figure: a large provenance tree. Root: "Packet arrives at wrong server". Root cause: a faulty rule, "Rule 7: Next-hop=port2".]
• The problem: Finding the root cause in a large provenance tree.
Key insight: Use reference events!
[Figure: network with switches S1 to S6, Web server 1, Web server 2, a DPI device, and operator Bob.]
• Remember that some packets were routed correctly.
• The same things should have happened to all packets!
• Key insight: If we have both a (bad) symptom and a (good) reference, we only need to reason about the differences between them!
A new debugger
[Figure: Bob sends a fault and a reference to the Debugger, which answers: "Field 3 of config entry 4 is wrong!"]
• Bob collects both a bad symptom and a good reference
• Bob sends both events to the debugger
• Debugger generates provenance, outputs difference
• Ideally, there is only one diff: the root cause!
Outline
- Motivation: Network diagnostics
  • Background
  • Key insight
  • A new debugger
- Differential provenance
  • Are references typically available?
  • Strawman approach
  • Our approach
  • Initial results
- Conclusion
Are references typically available?
• Survey:
  • Posts on the ‘Outages’ mailing list in Sept-Dec 2014.
  • 64 posts related to diagnostics.
  • 42/64 (66%) posts involve both a fault and some reference.
• Examples:
  • Some DNS servers have stale records, but others are good
  • Probes sometimes fail, sometimes succeed
  • More examples in the paper
Strawman solution
[Figure: Bad provenance (root; contains the faulty rule) - Reference provenance (root) = ? (new root)]
• A strawman solution: Pick out different nodes in trees.
  • Bad provenance: 201 nodes
  • Reference provenance: 156 nodes
  • Naïve diff: 278 nodes!
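The strawman can be sketched as a plain set difference over tree nodes. This is a simplified illustration under assumptions: the trees are flattened into sets of event tuples, and the events, field layout, and `naive_diff` name are invented for the example. It does not reproduce the 201/156/278 node counts above.

```python
def naive_diff(bad: set, ref: set) -> set:
    """Strawman: every node that appears in exactly one of the two trees."""
    return bad ^ ref


# Hypothetical events: (kind, switch, output port, timestamp).
bad = {
    ("recv", "S1", None, 10),
    ("match", "S1", 2, 11),   # faulty rule sends the packet out of port 2
    ("send", "S1", 2, 12),
    ("recv", "S3", None, 13),
}
ref = {
    ("recv", "S1", None, 20),
    ("match", "S1", 1, 21),   # reference packet leaves through port 1
    ("send", "S1", 1, 22),
    ("recv", "S2", None, 23),
}

diff = naive_diff(bad, ref)
print(len(bad), len(ref), len(diff))
```

In this toy case no node matches exactly, because even the shared first hop carries a different timestamp, so the diff contains every node of both trees and is larger than either tree alone.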
Why does the strawman not work?
[Figure: provenance tree; label: "Faulty rule".]
• Observation: The diff can be larger than the individual trees.
• Reason #1: Differences that “do not matter”
  • E.g., timestamps, packet payloads, etc.
• Reason #2: “Butterfly effect”
  • A small difference can change later events drastically!
Differential provenance
[Figure: Bad provenance and Reference provenance side by side. Output: "Rule 7: change port"; "Rule 9: change range".]
• Approach: Change past events, and think about what could have happened.
  • (1) Find some early ‘differences’ in the trees.
  • (2) Change the faulty node to a correct equivalent.
  • (3) Use replay to determine what would have happened.
  • (4) Output the set of changes that align the trees.
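Steps (1) to (4) can be illustrated with a toy loop. This is a sketch under strong assumptions, not the authors' algorithm: the network is reduced to one next-hop rule per switch, "provenance" is reduced to the path a packet takes, and `replay`, `first_divergence`, `diff_prov`, and the example rules are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rules = Dict[str, str]                   # hypothetical model: switch -> next hop
Change = Tuple[str, Optional[str], str]  # (switch, old next hop, new next hop)


def replay(rules: Rules, start: str, max_hops: int = 16) -> List[str]:
    """Re-run forwarding from `start`; the hop bound guards against loops."""
    path = [start]
    while path[-1] in rules and len(path) < max_hops:
        path.append(rules[path[-1]])
    return path


def first_divergence(path: List[str], ref_path: List[str]) -> int:
    """Index of the earliest hop at which the two paths differ."""
    for i, (got, want) in enumerate(zip(path, ref_path)):
        if got != want:
            return i
    return min(len(path), len(ref_path))


def diff_prov(rules: Rules, start: str, ref_path: List[str]) -> List[Change]:
    """Return the rule changes that make the replayed path equal ref_path."""
    rules = dict(rules)  # work on a copy; the caller's rules stay untouched
    changes: List[Change] = []
    for _ in range(len(ref_path) + 1):
        path = replay(rules, start)                # (3) replay
        if path == ref_path:
            return changes                         # (4) output the changes
        i = first_divergence(path, ref_path)       # (1) earliest difference
        if i == 0 or i >= len(ref_path):
            raise ValueError("toy model cannot align these paths")
        switch = ref_path[i - 1]
        changes.append((switch, rules.get(switch), ref_path[i]))
        rules[switch] = ref_path[i]                # (2) change the faulty rule
    raise ValueError("no alignment found within the iteration bound")


# Hypothetical state: S2 forwards to S6 instead of S3.
rules = {"S1": "S2", "S2": "S6", "S6": "Web server 2", "S3": "Web server 1"}
reference_path = ["S1", "S2", "S3", "Web server 1"]
print(diff_prov(rules, "S1", reference_path))
```

Although the bad and reference paths differ in two hops, fixing the earliest difference and replaying aligns everything downstream, so the output is a single change.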
Technical challenges
• Challenge #1: Where do we start?
  • Heuristics: Change early events, minimum changes…
  • E.g., prefer changing 1 event to changing 1000 events.
• Challenge #2: How should we make the change?
  • Approach: Think about what should have happened.
  • E.g., packet should go to switch 2, not 1.
• Challenge #3: Irrelevant differences?
  • Approach: Equivalence relations between events.
  • E.g., IPs 4.3.2.1 and 4.3.3.1 are treated as equivalent.
• See paper for more details.
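One way to picture the equivalence relation from Challenge #3 is to map each event to a canonical form before comparing. This is a minimal sketch under assumptions: the event fields, the `canonical` and `equivalent` helpers, and the choice to ignore timestamps and payloads and to merge the two client prefixes are illustrative, not taken from the paper.

```python
import ipaddress

# Assumption: sources in either client prefix should be handled the same way.
CLIENT_PREFIXES = [
    ipaddress.ip_network("4.3.2.0/24"),
    ipaddress.ip_network("4.3.3.0/24"),
]
# Assumption: these fields never matter for the diagnosis.
IGNORED_FIELDS = {"timestamp", "payload"}


def canonical(event: dict) -> tuple:
    """Drop irrelevant fields and replace client IPs with one class label."""
    out = {}
    for key, value in event.items():
        if key in IGNORED_FIELDS:
            continue
        if key == "src":
            addr = ipaddress.ip_address(value)
            if any(addr in net for net in CLIENT_PREFIXES):
                value = "CLIENT"
        out[key] = value
    return tuple(sorted(out.items()))


def equivalent(a: dict, b: dict) -> bool:
    """Two events are equivalent if their canonical forms are equal."""
    return canonical(a) == canonical(b)


bad_evt = {"kind": "recv", "switch": "S1", "src": "4.3.2.1",
           "timestamp": 10, "payload": b"GET /a"}
ref_evt = {"kind": "recv", "switch": "S1", "src": "4.3.3.1",
           "timestamp": 20, "payload": b"GET /b"}

print(equivalent(bad_evt, ref_evt))
```

With such a relation, events that differ only in timestamp, payload, or client address no longer show up as differences, which removes the "do not matter" noise from the comparison.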
Setup
[Figure: network connecting the Internet (4.3.2.0/24, 4.3.3.0/24) to Web server 1 and a DPI device. Callout: "Overly specific flow entry".]
• Platform: RapidNet
• SDN: 6 switches, 2 servers
• The symptom: misrouted packets from 4.3.2.0/24
• The reference: packets from 4.3.3.0/24
Initial results
[Figure: Fault (201 nodes) vs. Reference (156 nodes). Naïve diff = new tree with a new root. Differential provenance = "Rule 7: next hop should be port 1, not 2!"]
• Differential provenance finds a single node (the faulty rule) to be the root cause!
Conclusion
• Debugging networks is hard
  • Need good debuggers!
• Provenance can find the causes of an event
  • Problem: Explanation can be too detailed.
• Idea: Use reference events
  • Sufficient to find the (few) differences to the observed symptom
  • New debugger based on differential provenance
• Result: Very precise diagnostics
  • Ideally, can identify a single root cause!
Thanks!