Network Simulation and Testing
Polly Huang
EE NTU
http://cc.ee.ntu.edu.tw/~phuang
[email protected]
Dynamics Papers
• Hongsuda Tangmunarunkit, Ramesh Govindan, and Scott Shenker.
Internet path inflation due to policy routing. In Proceedings of
SPIE ITCom, pages 188-195, Denver, CO, USA, August 2001.
• Lixin Gao. On inferring autonomous system relationships in the
Internet. IEEE/ACM Transactions on Networking, 9(6):733-745,
December 2001.
• Vern Paxson. End-to-end Internet packet dynamics. IEEE/ACM
Transactions on Networking, 7(3):277-292, June 1999.
• Craig Labovitz, G. Robert Malan, and Farnam Jahanian. Internet
routing instability. IEEE/ACM Transactions on Networking,
6(5):515-528, October 1998.
Polly Huang, NTU EE
Doing Your Own Analysis
• Having a problem
• Need to simulate or to test
• Define experiments
– Base scenarios
– Scaling factors
– Metrics of investigation
Base Scenarios
• The source models
– To generate traffic
• The topology models
– To generate the network
• Then?
Internet Dynamics
• How traffic flows across the network
– Routing
– Shortest path? Policy routing
• How failures occur
– Packets dropped
– Routes failed
– i.i.d?
Packet/Route dynamics
Identifying Internet Dynamics
Routing Policy
Packet Dynamics
Routing Dynamics
To the best of our knowledge, we can now generate:
AS-level topology
Hierarchical router-level topology
The Problem
• Does it matter what routing computation we
use?
• Equivalent of
– Can I just do shortest path computation?
Topology with Policy
• Internet Path Inflation Due to Policy Routing
• Hongsuda Tangmunarunkit, Ramesh Govindan, Scott
Shenker
• In Proceedings of the SPIE ITCom, pages 188-195, Denver,
CO, USA, August 2001. SPIE
Paper of Choice
• Methodological value
– A simple ‘re-examine’ type of study
– To strengthen technical value of prior work
• Technical value
– Actual paths are not the shortest due to routing policy.
– The routing policy is business-driven and can be quite
hard to obtain.
– As shown in this paper, for simulation studies concerning
large-scale route/path characteristics, a simple shortest-AS routing policy may be sufficient.
Inter-AS Routing
[Diagram: a source and a destination connected through AS 1-AS 5, with the shortest inter-AS route highlighted]
Hierarchical Routing
[Diagram: the route from source to destination combining intra-AS shortest paths with inter-AS shortest routing]
Flat Routing
[Diagram: the flat shortest path from source to destination, ignoring AS boundaries]
5:3
Hierarchical routing is not optimal, or: routes are inflated.
How sub-optimal?
Prior Work
• Based on
– An actual router-level graph
– An actual AS-level graph at the same time
– Overlay the AS-level graph on the router-level graph
• Compute
– For each source-destination pair
– Shortest path using hierarchical routing
– Shortest path using flat routing
• Compare route length
– In number of router hops
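The prior-work procedure above can be sketched on a toy example. The topology, router names, and AS assignment below are entirely made up for illustration, and hierarchical routing is approximated by restricting the router-level path to the ASs on the inter-AS shortest path:

```python
from collections import deque

def bfs_path(adj, src, dst, allowed=None):
    """Shortest path by BFS, optionally restricted to an allowed node set."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:                      # reconstruct the path
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and (allowed is None or v in allowed):
                prev[v] = u
                q.append(v)
    return None

# Toy router-level graph; as_of is the AS-level overlay (both hypothetical).
adj = {
    's':  ['a1', 'b'],
    'a1': ['s', 'a2'], 'a2': ['a1', 'a3'], 'a3': ['a2', 'd'],
    'b':  ['s', 'e'],  'e':  ['b', 'd'],
    'd':  ['a3', 'e'],
}
as_of = {'s': 1, 'a1': 2, 'a2': 2, 'a3': 2, 'b': 3, 'e': 5, 'd': 4}

# AS-level adjacency induced by the router-level links.
as_adj = {a: set() for a in set(as_of.values())}
for u, vs in adj.items():
    for v in vs:
        if as_of[u] != as_of[v]:
            as_adj[as_of[u]].add(as_of[v])

flat = bfs_path(adj, 's', 'd')                       # flat shortest route
as_path = bfs_path(as_adj, as_of['s'], as_of['d'])   # inter-AS shortest
allowed = {r for r in adj if as_of[r] in set(as_path)}
hier = bfs_path(adj, 's', 'd', allowed)              # hierarchical route

# Inflation difference, in router hops: here the hierarchical route takes
# the 2-AS-hop path through the large AS 2 and ends up one router hop longer.
inflation = (len(hier) - 1) - (len(flat) - 1)
```

Comparing `hier` against `flat` for every source-destination pair, and reporting the hop-count difference and ratio, is the shape of the prior-work computation.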
Prior Conclusions
• 80% of the paths are inflated
• 20% of the paths are inflated > 50%
• There exists a better detour for 50% of the
source-destination pairs
– There exists an intermediate node i such that
Length(s-i-d) < Length(s-d)
This Work
• To address 2 shortcomings
– There’s now a newer router-level graph
– There’s now a more sophisticated policy model
• Paper #4
• Inter-AS routing is not quite ‘shortest-AS routing’
Newer vs. Older Graph
• Inflation difference not the same
– Difference is larger in the newer graph
– Due to the newer graph being larger
• Inflation ratio remains the same
Shortest-AS vs. Policy-AS Routing
• Shortest-AS
– Simplified model
– Every AS is equal
• Policy-AS
– Realistic model
– Not all ASs are the same
• Some are provider ASs
• Some are customer ASs
• Customer ASs do not transit traffic
Consider TANET and CHT
[Diagram: UUNET is a provider of TANET and CHT; NTU is a customer of both. Traffic from TANET to CHT: through UUNET (up, then down) or through NTU (down, then up)?]
Routing with Constraints
• Routes could be
– Going up
– Going down
– Going up and then down
• Routes can never be
– Going down and then up
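The "never down and then up" rule can be checked mechanically. A minimal sketch, labeling each inter-AS hop by the relationship it traverses (sibling links, which relax the rule, are left out):

```python
# Each inter-AS hop is labeled 'up' (customer -> provider),
# 'down' (provider -> customer), or 'peer'.
# Valley-free: once a route has gone down or crossed a peer link,
# only 'down' hops may follow. (Sibling links are ignored here.)
def valley_free(hops):
    descending = False
    for h in hops:
        if descending and h != 'down':
            return False          # a non-'down' hop after descending: a valley
        if h in ('down', 'peer'):
            descending = True
    return True

print(valley_free(['up', 'up', 'peer', 'down']))  # True: up, across, then down
print(valley_free(['down', 'up']))                # False: down and then up
```

Note that this also rejects a peer link followed by another peer link, since a peer-peer link may be followed only by provider-customer links.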
Inferring the Constraints
• On Inferring Autonomous System Relationships in the
Internet
• Lixin Gao
• IEEE/ACM Transactions on Networking, 9(6):733-745,
December 2001
Not All ASs the Same
• 2 types of ASs
– Customer
– Provider
• 3 types of Relationships
– Customer-provider
– Provider-provider
• Peer-peer
• Sibling-sibling
Customer-Provider
• Formal definition
– A provider transits for its customer
– A customer does not transit for its provider
• Informal
– Provider: I’ll take any traffic
– Customer: I’ll take only the traffic to me (or my
customers)
Peer-Peer
• Formal Definition
– A provider does not transit for another provider
• Informal
– I’ll take only the traffic to me (or my customers)
– You’ll take only the traffic to you (or your customers)
Sibling-Sibling
• Formal Definition
– A provider transits for another provider
• Informal
– I’ll take any traffic
– You’ll take any traffic
Never “Going Down and then Up”
• A provider-customer link can be followed by only
– Provider-customer link
– (Or sibling-sibling link)
• A peer-peer link can be followed by only
– Provider-customer link
– (Or sibling-sibling link)
Heuristics
• Compute out-degrees
• For each AS path in routing tables
– The first AS with the maximum degree is the root of the hierarchy
– From the root, draw provider-customer relationships down toward the
two ends of the AS path
Determining Siblings
• After going through all AS paths
• Any AS pair that are both provider and
customer to each other are siblings
Determining Peers
• Do another pass on the AS paths in routing
tables
• For each AS path
– The top AS that has no sibling relationship
with its neighboring ASs
– Could have a peering relationship with the higher
out-degree neighbor
– Provided the top AS and that neighbor are
comparable in out-degree
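The inference steps above can be sketched end to end. The AS paths and the degree measure below are hypothetical; the paper uses out-degrees from routing tables, and a simple distinct-neighbor count stands in for them here:

```python
from collections import defaultdict

# Hypothetical AS paths, as they might appear in BGP routing tables.
paths = [
    [3, 1, 2],
    [4, 1, 5],
    [2, 1, 4],
]

# Degree proxy: number of distinct neighbors seen across all paths.
neigh = defaultdict(set)
for p in paths:
    for a, b in zip(p, p[1:]):
        neigh[a].add(b)
        neigh[b].add(a)
degree = {asn: len(ns) for asn, ns in neigh.items()}

provider_of = set()                       # (provider, customer) pairs
for p in paths:
    # The first AS with the maximum degree is the top of this path.
    top = max(range(len(p)), key=lambda i: degree[p[i]])
    for i in range(top, 0, -1):           # walk toward the left end
        provider_of.add((p[i], p[i - 1]))
    for i in range(top, len(p) - 1):      # walk toward the right end
        provider_of.add((p[i], p[i + 1]))

# Siblings: AS pairs inferred as provider of each other.
siblings = {(a, b) for (a, b) in provider_of if (b, a) in provider_of}
```

With these paths, AS 1 has the highest degree everywhere, so it is inferred as the provider of ASs 2-5, and no sibling pairs emerge.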
Back to Path Inflation
• Draw the customer-provider, peer-peer, and
sibling-sibling relationships on the overlay AS
graph
• Compute the best routes under the ‘never going
down and then up’ constraint
• Compare the inflation difference and ratio again,
with each of these running at the inter-AS level
– Shortest
– Policy
Shortest vs. Policy Routing
• Pretty much the same both in terms of
– Inflation difference
– Inflation ratio
Therefore
• The observations from the prior work hold
– With a newer graph
– With the more realistic inter-AS policy routing
Now forget path inflation.
How far is shortest from policy
inter-AS routing?
Shortest vs. Policy
• In AS hops
– 95% paths have the same length
– Policy routes always longer
• In router hops
– 84% paths have the same length
– Some policy routes longer, some shorter
95% and 84% are pretty good numbers.
Therefore shortest path at the inter-AS level might be OK…
To Answer the Question
• Can we simply do shortest path
computation?
– A likely yes for AS-level graph
– A firm no for hierarchical graph
• Must separate inter-AS shortest and intra-AS
shortest
Questions?
Identifying Internet Dynamics
Routing Policy
Packet Dynamics
Routing Dynamics
It’s never a perfect world…
The Problem
• But how perfect is the Internet?
• The Internet
– A network of computers with stored information
– Some valuable, some relevant
– You participate by putting information up or getting
information down
– From time to time, you can’t quite do some of these
things you want to do
Why is that?
At the philosophical level…
Humans are so prone to failures.
And the Internet is human-made.
But, Seriously…
Consider loading a Web page
Web Surfing Failures
• The ‘window’ waving forever?
• An error message saying network not
reachable
• An error message saying the server too busy
• An error message saying the server is down
• Anything else?
Network Specific Failures
• The ‘window’ waving forever?
• An error message saying network not
reachable
• An error message saying the server too busy
• An error message saying the server is down
• Anything else?
The Causes
• The ‘window’ waving forever
– Congestion in the network
– Buffer overflow
– Packet drops
• An error message saying network not reachable
– Network outage
– Broken cables, frozen routers
– Route re-computation
– Route instability
Back to the Problem
• But how perfect is the Internet?
• Equivalent of
– Packets can be dropped
• How frequent
• How much
– Routes may be unstable
• How frequent
• For how long
Significance
• Knowing the characteristics of packet drops
and route instability helps
– Design for fault-tolerance
– Test for fault-tolerance
There are tons of formal/informal
studies on the dynamics…
Let’s take a look at a couple that are classical
Packet Dynamics
• End-to-End Internet Packet Dynamics
• Vern Paxson
• IEEE/ACM Transactions on Networking, 7(3):277-292,
June 1999
Emphasis in Reverse Order
• Real subject of study
– Packet loss
– Packet delay
• Necessary assessment
– The unexpected
– Bandwidth estimation
Measurement
• Instrumentation
– 35 sites, 9 countries
– Education, research, provider, company
• 2 runs
– N1: Dec 1994
– N2: Nov-Dec 1995
– 21 sites in common
Measurement Methodology
• Each site running NPD
– A daemon program
– Sender side sends 100KB TCP transfer
• Sender and receiver sides both
– tcpdump the packets
• Noteworthy
– Measurements occurred as a Poisson process
• Unbiased with respect to the time of measurement
– N2 used a big max window size
• Prevents the window size from limiting TCP connection throughput
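The Poisson-sampling point above can be sketched as follows: scheduling probes with exponentially distributed gaps yields a Poisson process, so the observations are unbiased with respect to the time they sample (the PASTA property). The duration and rate below are arbitrary:

```python
import random

def poisson_schedule(start, duration, mean_gap):
    """Measurement times in [start, start + duration) with exponential gaps."""
    t, times = start, []
    while True:
        t += random.expovariate(1.0 / mean_gap)  # exponential inter-arrival
        if t >= start + duration:
            return times
        times.append(t)

random.seed(1)
times = poisson_schedule(0.0, 3600.0, mean_gap=120.0)  # ~30 probes per hour
```

A fixed-interval schedule, by contrast, can systematically hit (or miss) periodic network behavior.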
Packet Loss
• Overall loss rate:
– N1 2.7%, N2 5.2%
– N2 higher, because of the big max window?
• I.e., pumping more data into the network, therefore more loss?
• The big max window in N2 is not a factor
– Shown by separating data and ack loss
– Assumption: ack traffic flows at roughly half the data rate
• So it won't stress the network
– Ack loss: N1 2.88%, N2 5.14%
– Data loss: N1 2.65%, N2 5.28%
Quiescent vs. Busy
• Definition
– Quiescent: connections without ack drops
– Busy: otherwise
• About 50% of the connections are quiescent
• For connections that are busy
– Loss rate: N1 5.7%, N2 9.2%
More Numbers
• Geographical effect
• Time of the day effect
Towards a Markov Chain Model
• For hours long
– No-loss connection now indicates further no-loss
connection in the future
– Lossy connection now indicates further lossy
connections in the future
• For minutes long
– The rate remains similar
[Diagram: a two-state Markov chain; 'No loss' self-loops with probability pn and transitions to 'Loss' with 1-pn; 'Loss' self-loops with pl and transitions back with 1-pl]
Another Classification
• Data
– Loaded data: packets experiencing queueing delay due
to own connection
– Unloaded data: packets not experiencing queueing
delay due to own connection
– Bottleneck bandwidth measurement is needed here to
determine whether a packet is loaded or not
• Ack
– Simply acks
3 Major Observations
• Although loss rate very high (47%, 65%, 68%), all
connections complete in 10 minutes
• Loss of data and ack not correlated
• Cumulative distribution of per connection loss rate
– Exponential for data
– Not so exponential for ack
– Adaptive sampling contributing to the exponential
observation?
More on the Markov Chain Model
• The loss rate Pu
– The rate of loss
• The conditional loss rate Pc
– The rate of loss when the previous packet is lost
• Contrary to the earlier work
– Losses are bursty
– Loss durations show a Pareto upper tail
– (Polly: maybe more log-normal)
You might ask… what are pl and pn?
[Diagram: the two-state Markov chain again, with transition probabilities pn, 1-pn, pl, 1-pl]
Values for the pl's

                 N1    N2
Loaded data      49%   50%
Unloaded data    20%   25%
Ack              25%   31%
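A minimal simulation of the two-state model with these numbers. pl = 0.49 is the N1 loaded-data value from the table; pn is not reported in the slides, so the value below is an assumption, chosen to put the stationary loss rate near N1's 2.65% data-loss figure:

```python
import random

def simulate_loss(n, pn, pl, seed=0):
    """Fraction of n packets lost under the two-state Markov chain."""
    rng = random.Random(seed)
    lost, in_loss = 0, False
    for _ in range(n):
        if in_loss:
            in_loss = rng.random() < pl   # stay in 'Loss' with probability pl
        else:
            in_loss = rng.random() > pn   # leave 'No loss' with probability 1-pn
        if in_loss:
            lost += 1
    return lost / n

# Stationary loss rate is (1-pn) / ((1-pn) + (1-pl)); with the assumed
# pn = 0.986 and pl = 0.49 from the table, that is about 0.027.
rate = simulate_loss(200_000, pn=0.986, pl=0.49)
```

Because pl is large, the losses come in bursts (mean burst length 1/(1-pl) ≈ 2 packets) rather than independently, matching the paper's observation.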
Possible Invariant
• Conditional loss rate
– The value remains relatively stable over
the one-year period
• More up-to-date data to verify this?
• Is the loss burst size log-normal?
• Both are interesting research questions
Packet Delay
• Looking at one-way transit times (OTT)
• There's a model for the OTT distribution
– Shifted gamma
– Parameters change with time and
path…
• Internet paths are asymmetric
– OTT in one direction often does not equal OTT
in the other
Timing Compression
• Ack compressions are small events
• So they do not really pose threats to
– Ack clocking
– Rate-estimation-based control
• Data packet compression is very rare
– Handled by outlier filtering
Queueing Delay
• Variance of OTT over different time scales
– For each time scale τ
– Divide the packet arrivals into intervals of τ
– For each pair of neighboring intervals l, r
• ml: the median OTT in interval l
• mr: the median OTT in interval r
• Calculate |ml - mr|
• The variance of OTT over τ is the median of all |ml - mr|
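The interval-median procedure above, as a direct sketch (empty intervals are simply skipped, a simplification):

```python
from statistics import median

def ott_variation(samples, tau):
    """samples: (arrival_time, ott) pairs. Median |ml - mr| over neighbors."""
    buckets = {}
    for t, ott in samples:
        buckets.setdefault(int(t // tau), []).append(ott)
    # Median OTT per interval, in time order (empty intervals are skipped).
    meds = [median(v) for _, v in sorted(buckets.items())]
    diffs = [abs(l, r.__rsub__(l)) if False else abs(l - r)
             for l, r in zip(meds, meds[1:])]
    return median(diffs) if diffs else 0.0

# A flat 10 ms OTT with a 5 ms bump in the middle third of the trace:
samples = [(t, 10.0 + (5.0 if 4 <= t < 8 else 0.0)) for t in range(12)]
```

Here `ott_variation(samples, tau=4.0)` is 5.0: both neighboring-interval median differences equal the 5 ms bump, so τ = 4 captures the fluctuation.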
Finding the Dominant Scale
• Looking for τ's whose queueing variance
is large
– Where control is most needed
• For example, if those τ's are smaller than the
RTT
– Then TCP doesn't need to bother adapting to
queueing fluctuations
Oh Well
• Queueing delay variations occur
– Dominantly on 0.1-1 sec scales
– But non-negligibly on larger scales
Share of Bandwidth
• Pretty much uniformly distributed
Conclusions on Analysis
• Common assumptions violated
– In-order packet delivery
– FIFO queueing
– Independent loss
– Single congestion time scale
– Path symmetry
• Behavior
– Very wide range, not one typical
Conclusions on Design
• Measurement methodology
– TCP-based measurement shown viable
– Sender-side only inferior
• TCP implementation
– Sufficiently conservative
The Pathologies
The strange stuff
Packet Re-Ordering
• Reordering varies widely, and there are too few samples
• Therefore, deriving only a rule of thumb
– The Internet paths sometimes experience bad
reordering
– Mainly due to route flapping
– Occasionally this funny case of router implementation
• Buffering packets while processing a route update
• Sending these packets interleaving with the post-update
arrivals
Orthogonal to TCP SACK
• Receiver-end modification
– 20 msec wait before sending a duplicate
acknowledgement
– Waiting for re-ordered packets, therefore fewer false
duplicate acknowledgements
– Dup acks should then be an indication of losses
• Sender-end modification
– Fast retransmission after 2 duplicate acknowledgements
– More reactive fast retransmission, higher throughput
Packet Replication
• Very strange, can’t quite explain
– A pair of acks duped 9 times, arriving 32 msec apart
– A data packet duped 23 times, arriving in burst
• Misconfigured bridge?
• Observation
– Most of these site specific
– But small number of dups spread between other sites
– Senders dup packets too
Packet Corruption
• Checksum good?
• Problem
– The traces contain only the header data
– Pure ack OK: the header = the packet
– Data not OK: the header ≠ the packet
• Use a corruption-inferring algorithm in
tcpanaly
Corruption Rate
• 1 corruption out of 5000 data packets
• 1 corruption out of 300,000 pure acks
• Possible reasons for the difference
– Header compression
– Packet size
– Inferring-tool discrepancy
– Other router/link-level implementation artifacts
Implication
• 16-bit checksum no longer sufficient
– A corrupted packet has a 1-in-2^16 chance of having the
same checksum as the uncorrupted packet
– I.e., 1 out of every 2^16 corrupted packets can't be
detected by the checksum
• Since 1 out of 5000 data packets is corrupted
– 1 out of every 5000 × 2^16 (≈ 300 M) packets is accepted
despite being corrupted
– Consider a 1 Gbps link and a packet size of 1 Kb → 1 M packets per second
– ≈ 300 seconds (roughly 5 minutes) per falsely accepted corrupted packet
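The back-of-the-envelope above, spelled out. All figures are the slide's assumptions: a 1-in-5000 corruption rate, a 16-bit checksum, a 1 Gbps link, and 1 Kb packets:

```python
corruption_rate = 1 / 5000            # one corrupted data packet in 5000
undetected = corruption_rate / 2**16  # corrupted AND checksum still matches
pkts_per_miss = 1 / undetected        # packets per undetected corruption
print(f"{pkts_per_miss:,.0f}")        # 327,680,000 -- the slide's "~300 M"

pps = 1e9 / 1e3                       # 1 Gbps / 1 Kb packets = 1e6 packets/sec
seconds = pkts_per_miss / pps         # ~328 s, about 5 minutes per miss
```

At higher link speeds or with larger packets the interval shrinks further, which is the argument for stronger end-to-end integrity checks.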
Estimating Bottleneck Bandwidth
• The packet pair technique
– Send 2 packets back to back (or close enough)
• Inter-packet time, T2-T1, very small
– When they cross the bottleneck
• Packet 1 is served while packet 2 is queued
• Packet 2 immediately follows packet 1
– The packets will be stretched
• Inter-packet time, T2-T1, is now the transmission time of
packet 1
– Estimated bandwidth = (Size of packet 1)/(T2-T1)
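The estimate above, in an idealized form: with equal-size packets and packet 2 queued directly behind packet 1 at the bottleneck, the receiver-side gap is one transmission time, so size/gap recovers the bottleneck rate. The figures below are arbitrary:

```python
def packet_pair_estimate(size_bits, bottleneck_bps):
    """Idealized packet pair: both packets cross one bottleneck link."""
    t1 = size_bits / bottleneck_bps      # packet 1 leaves the bottleneck
    t2 = 2 * size_bits / bottleneck_bps  # packet 2 queued, then served
    gap = t2 - t1                        # the stretched inter-packet time
    return size_bits / gap               # estimated bottleneck bandwidth

# Recovers the 1.5 Mbps bottleneck, independent of the original send gap.
est = packet_pair_estimate(size_bits=12_000, bottleneck_bps=1_500_000)
```

The slide that follows lists exactly the conditions under which this ideal picture breaks down.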
This Won’t Work
• Bottleneck bandwidth higher than sending
rate
• Out-of-order delivery
• Clock resolution
• Changes in the bottleneck bandwidth
• Multiple bottlenecks
PBM
• Instead of sending a pair
• Send a bunch
• More robust against the multi-bottleneck
problem
Questions?
Identifying Internet Dynamics
Routing Policy
Packet Dynamics
Routing Dynamics
Route Instability
• Internet Routing Instability
• Craig Labovitz, G. Robert Malan, Farnam Jahanian
• IEEE/ACM Transactions on Networking, 6(5):515-528,
October 1998
BGP Specific
• BGP is an important part of the Internet
– Connecting the domains
– Widespread
– Known in prior work that route failure could result in
• Packet loss
• Longer network delay
• Network outage (Time to globally converge to local change)
• A closer look at BGP dynamics
– How many route updates are sent
– How frequently they are sent
– How useful these updates are
BGP (In a Slide)
• The routing protocol running among the border
routers
– Path Vector
– Think DV
– Exchange not just next hop, but entire path
• Dynamics
– In case of link/router recovery
• Route announcements propagate from the point of recovery
– In case of link/router failure
• Route withdrawals propagate from the point of failure
– Route updates
• Include both route announcements and withdrawals
Data Collection
• Monitoring exchange of route updates
– Over 9 month period
– 5 public exchange points in the core
• Exchange point
– Connecting points of ASs
– Public exchange: of the US government
– Private exchange: of the commercial providers
Terminology
• AS
– You all know
– In the path of the path vector exchanged by BGP
• AS-PATH
• Prefix
– Basically network address
– The source/destination of the route entries in BGP
• 140.119.154/24
• 140.119/16
Classification of Problems
• Forward instability
– Legitimate topological changes affecting paths
• Routing policy fluctuation
– Changes in routing policy but not affecting
forwarding paths
• Pathological updates
– Redundant information not affecting routing
nor forwarding
Forwarding Instability
• WADiff
– A route is explicitly withdrawn
– As it becomes unreachable
– Replaced with an alternative route
– The alternative route differs in AS-PATH or next-hop
• AADiff
– A route is implicitly withdrawn
– Replaced with an alternative route
– As it becomes unreachable or a preferred alternative
route becomes available
In the Middle
• WADup
– A route is explicitly withdrawn
– Then re-announced as reachable
– Could be
• Pathological
• Forwarding instability: transient topological change
• AADup
– A route is implicitly withdrawn
– Replaced with a duplicate of the original route
• Same AS-PATH and next-hop
– Could be
• Pathological
• Policy fluctuation: differ in other policy attributes
Pathological
• WWDup
– Repeated withdraws for a prefix no longer
reachable
– Pathological
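The taxonomy above can be sketched as a toy classifier over a stream of BGP events for one prefix. The event encoding and the simplified matching rule (AS-PATH plus next-hop only, ignoring other policy attributes) are assumptions for illustration:

```python
def classify(events):
    """events: ('A', as_path, next_hop) or ('W',) for one prefix, in order."""
    labels, last_route, withdrawn = [], None, False
    for ev in events:
        if ev[0] == 'W':
            # A withdrawal after a withdrawal is pathological.
            labels.append('WWDup' if withdrawn else 'W')
            withdrawn = True
        else:
            route = ev[1:]
            if last_route is None:
                labels.append('A')                 # first announcement
            elif withdrawn:                        # explicit withdrawal first
                labels.append('WADup' if route == last_route else 'WADiff')
            else:                                  # implicit withdrawal
                labels.append('AADup' if route == last_route else 'AADiff')
            last_route, withdrawn = route, False
    return labels

evts = [('A', (1, 2), 'h1'), ('W',), ('W',),
        ('A', (1, 2), 'h1'), ('A', (1, 3), 'h2')]
print(classify(evts))  # ['A', 'W', 'WWDup', 'WADup', 'AADiff']
```

Note that distinguishing a policy-only AADup from a pathological one would require the policy attributes the real study examined; this sketch groups them together.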
Observations – The Majority
• Pathological updates (redundant)
– Minimal effect on
• Route quality
• Router processing load
– Some disagree
– They add a significant amount of traffic
• 300 updates/second could crash a high-end router
Observation - Instability
• Forwarding instability
– 3-10% WADiff
– 5-20% AADiff
– 10-50% WADup
• Policy fluctuation
– AADup quite high
– But most probably pathological
• Needed nonetheless
– Internet routing works because of these necessary
and frequent updates
Observation – Distribution
• No spatial correlation
– Correlates with router implementation instead
• Temporal
– Time-of-day effect, day-of-the-week effect
– Therefore correlates with network congestion
• Periodicity
– 30- and 60-second periods
– Due to self-synchronization, misconfiguration, BGP being
soft-state based, etc.
Basically, not saying much…
But useful for background
and ease of reading
Questions?
What Should You Do?
• Routing policy
– Intra-AS: shortest path
– Inter-AS: shortest path (95%, 84% OK)
– Better model in progress…
• Packet losses
– 2-state Markov chain model
• pl: some info
• pn: no info…
• Routing instability: outage time
– The paper #2 of the original paper set (OSPF vs. DV)