Transcript Document
Part III: Overlays, peer-to-peer
Jinyang Li
Overlays are everywhere
• Internet is an overlay on top of telephone networks
• Overlays: a network on top of the Internet
  – Endpoints (instead of routers) are nodes
  – Multi-hop IP paths (through routers) are links
• Instant deployment!
What can overlays do?
• Routing
  – Improve routing robustness (e.g. convergence speed)
  – Multicast
  – Anonymous communication
• New applications
  – Peer-to-peer file sharing and lookup
  – Content distribution networks
  – Peer-to-peer live streaming
• Your imagination is the limit
Why overlays?
• Internet is ossified
  – IPv6 proposed in 1992, still not widely deployed
  – Multicast (1988), QoS (early 90s), etc.
• Avoid burdening routers with new features
• End hosts are cheap and capable
  – Copy and store files
  – Perform expensive cryptographic operations
  – Perform expensive coding/decoding operations
  – …
Today’s class
• Overlays that take over routers' jobs
  – Resilient Overlay Networks (RON)
  – Application-level multicast (NICE)
RON’s motivation
• Internet routing is not reliable
  – 3.3% of all routes had serious problems [Paxson 95-97]
  – 10% of routes available < 95% of the time [Labovitz 97-00]
  – 65% of routes available < 99.9% of the time [Labovitz 97-00]
  – 3-min minimum detection+recovery time; often 15 mins [Labovitz 97-00]
  – 40% of outages took 30+ mins to repair [Labovitz 97-00]
  – 5% of faults last more than 2.75 hours [Chandra 01]
Internet routing is unsatisfactory
• Slow to detect outages and recover
• Unable to use multiple redundant paths
• Unable to detect badly performing paths
• Applications have no control over paths

Q: Why can't we fix BGP?
Q2: Hasn't multi-homing already solved the fault-tolerance problem?
BGP converges slowly
Given a failure, BGP can take up to 15 minutes to converge.
Sometimes it never does. [Feamster]
RON in a nutshell
• A small set of nodes (< 100) forms an overlay on top of the scalable BGP-based IP routing substrate
• What failures?
– Outages: configuration/software errors, broken links
– Performance failures: severe congestion, DoS attacks
RON’s goals
• Fast failure detection and recovery
  – Detect & fail over within seconds
• Applications influence path selection
  – Applications define failures
  – Applications define path metrics
• Expressive and fine-grained policies
  – Who and what applications are allowed to use what paths
Why would RON work?
• RON testbed study (2003): about 60% of failures were within two hops of the edge
• RON routes around many link "failures"
  – If there exists a node whose paths to S and D do not contain the failed link
• RON cannot route around access link failures
RON Design

[Figure: RON nodes sit in different ASes. Each node runs a conduit, forwarder, prober, and router, supported by application-specific routing tables, a policy routing module, a performance database, and the RON library. A link-state routing protocol disseminates info, using RON itself!]
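To make the prober's job concrete, here is a minimal sketch of probe-driven failure detection; the class name, the EWMA smoothing, and the three-consecutive-losses threshold are illustrative assumptions, not the actual RON implementation:

```python
# Hypothetical sketch of a RON-style prober (names and parameters are
# assumed, not from the RON codebase). Each virtual link is probed
# periodically; a smoothed loss estimate feeds the routing metric, and a
# few back-to-back probe losses trigger fail-over within seconds,
# instead of BGP's minutes.

class LinkProbe:
    def __init__(self, alpha=0.1, outage_threshold=3):
        self.alpha = alpha                  # EWMA smoothing factor (assumed)
        self.loss_est = 0.0                 # smoothed loss rate in [0, 1]
        self.consecutive_losses = 0
        self.outage_threshold = outage_threshold

    def record_probe(self, replied: bool) -> None:
        sample = 0.0 if replied else 1.0
        self.loss_est = (1 - self.alpha) * self.loss_est + self.alpha * sample
        self.consecutive_losses = 0 if replied else self.consecutive_losses + 1

    def is_down(self) -> bool:
        # Declare an outage after several back-to-back probe losses.
        return self.consecutive_losses >= self.outage_threshold

p = LinkProbe()
for ok in [True, True, False, False, False]:
    p.record_probe(ok)
print(p.is_down())   # three consecutive losses -> link declared down
```

The EWMA keeps one transient loss from flapping the route, while the consecutive-loss counter still reacts in a handful of probe intervals.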
[Figure: CDF (0 to 1) of 30-min average loss rate with RON vs. direct paths. RON's loss rate is never more than 30%.]
RON routes around failures
30-minute average loss rates:

  Loss rate   10%  20%  30%  50%  80%  100%
  RON better  479  127   32   20   14   10
  No change    57    4    0    0    0    0
  RON worse    47   15    0    0    0    0

6,825 "path hours" represented here.
5 "path hours" of 100% loss (complete outage); 38 "path hours" of TCP outage (>= 30% loss). RON routed around all of these!
One indirection hop provides almost all the benefit!
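The one-hop idea can be sketched in a few lines; this is an illustration under the assumption that losses on the two legs are independent, not the actual RON path-selection code:

```python
# Illustrative sketch: given probed loss rates between overlay nodes,
# compare the direct path s->d against every one-hop indirect path
# s->i->d and pick the best. Two legs with loss l1, l2 compose as
# 1 - (1-l1)(1-l2), assuming independent losses.

def best_path(loss, s, d, nodes):
    """loss[(a, b)] is the probed loss rate from a to b."""
    best_route, best_loss = (s, d), loss[(s, d)]
    for i in nodes:
        if i in (s, d):
            continue
        indirect = 1 - (1 - loss[(s, i)]) * (1 - loss[(i, d)])
        if indirect < best_loss:
            best_route, best_loss = (s, i, d), indirect
    return best_route, best_loss

# A lossy direct path is beaten by routing via intermediate node i.
loss = {("s", "d"): 0.30, ("s", "i"): 0.01, ("i", "d"): 0.02}
route, l = best_path(loss, "s", "d", ["s", "i", "d"])
print(route, round(l, 4))   # ('s', 'i', 'd') 0.0298
```

With only N-2 candidate intermediates per pair, this search is cheap, which is one reason a single indirection hop is such a good deal.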
Resilience Against DoS Attacks
[Figure: Throughput improvement. CDF of the ratio of RON throughput to direct-path throughput (log scale): a 2x increase from RON in 109 samples (about 5% of bw samples); a 1/2x decrease from RON in 20 samples.]
Lessons of RON
• End hosts know more about performance and outages than routers do
• Internet routing trades off performance and fast failover for scalability
• A small amount of redundancy goes a long way
RON's tradeoff

[Figure: design space spanning scalability, performance (fast convergence etc.), and flexibility (application-specific metric & policy). BGP occupies the scalability corner; routing overlays (e.g., RON) occupy performance and flexibility; "???" marks the remaining corner.]
Open Questions
• Efficiency
  – Generates redundant traffic on access links
• Scaling
  – Probing traffic is O(N^2)
  – Can a RON be made to scale to > 50 nodes?
  – Is a 1000-node RON much better than a 50-node one?
• Interaction of overlays and the IP network
• Interaction of multiple overlays
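A back-of-the-envelope calculation shows why O(N^2) probing caps RON's size; the probe size and rate below are assumed round numbers, not figures from the paper:

```python
# Rough aggregate probing overhead for all-pairs probing: each of N
# nodes probes the other N-1, so traffic grows quadratically.
# Probe size and rate are illustrative assumptions.

def probe_traffic_bps(n, probe_bytes=64, probes_per_sec_per_link=0.1):
    links = n * (n - 1)          # directed virtual links in a full mesh
    return links * probe_bytes * 8 * probes_per_sec_per_link

for n in (50, 1000):
    print(n, probe_traffic_bps(n))   # ~125 kbps at N=50, ~51 Mbps at N=1000
```

At 50 nodes the aggregate overhead is trivial; at 1000 nodes it is three orders of magnitude larger, which is why scaling RON beyond tens of nodes needs something smarter than all-pairs probing.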
Application level multicast
A.k.a. overlay multicast or end-host multicast
Why multicast?
• Send the same stream of data to many hosts
  – Internet radio/TV/conferencing
  – Stock quote dissemination
  – Multiplayer network games
• An efficient way to send data to many hosts
Naïve approach is wasteful
• Sender's outgoing link carries n copies of the data
  – 128 Kbps mp3 stream, 10,000 listeners = 1.28 Gbps
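The arithmetic behind that example is worth making explicit: naive unicast scales the sender's link linearly with the audience, while a tree caps any node's fan-out.

```python
# With naive unicast the sender's access link carries one copy per
# listener; in a degree-k distribution tree each node forwards at
# most k copies, regardless of total audience size.

def unicast_sender_bps(stream_bps, listeners):
    return stream_bps * listeners

def tree_node_bps(stream_bps, fanout):
    return stream_bps * fanout

print(unicast_sender_bps(128_000, 10_000))  # 1.28 Gbps from one sender
print(tree_node_bps(128_000, 4))            # 512 kbps per node in a degree-4 tree
```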
IP multicast service model
• Mimic LAN broadcast
  – Anyone can send, everyone hears
• Use multicast addresses
  – 224.0.0.0 -- 239.255.255.255 (2^28 addresses)
  – Each address is called a "group"
• End hosts register with routers to receive packets
Basic multicast techniques
• Construct trees
• Why trees? (why not meshes?)
• How many trees?
  – Shared vs. source-specific trees
• Criteria of a "good" tree?
• Who builds trees?
  – Routers vs. end hosts
IP multicast
• Routers construct multicast trees for packet replication and forwarding
• Efficient (low latency, no duplicate packets on links)
IP multicast: Augmenting DV
• How to broadcast using DV routing tables without loops?
  – Idea: shortest paths from S to all nodes form a tree
  – RPF protocol: a router duplicates and forwards a packet only if it arrived via the router's shortest path to S
Reverse path flooding (RPF)

[Figure: four routers a, b, c, d. Link costs: a–b = 1, b–c = 1, a–c = 10, c–d = 1. Distance-vector tables (destination: next hop, cost):
  a: a: a, 0; b: b, 1; c: c, 10; d: c, 11
  b: a: a, 1; b: b, 0; c: c, 1; d: c, 2
  c: a: a, 10; b: b, 1; c: c, 0; d: d, 1
  d: c: c, 1; d: d, 0]
• C does not forward packets from A and vice versa
• However, link a <--> c sees two packets
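The RPF check itself is tiny; the following sketch uses illustrative link names (not from any router implementation) to show the one rule a router applies per packet:

```python
# Minimal RPF sketch: a router floods a packet from source S out all
# other links only if the packet arrived on the link the router would
# itself use to reach S (i.e., on the reverse shortest path).
# Link names here are made up for illustration.

def rpf_forward(route_to, src, in_link, links):
    """route_to[src] = next-hop link on this router's shortest path to src."""
    if in_link != route_to[src]:
        return []                        # not on reverse shortest path: drop
    return [l for l in links if l != in_link]

# Suppose router c's shortest path to source a is its direct link.
route_to = {"a": "link-ca"}
print(rpf_forward(route_to, "a", "link-ca", ["link-ca", "link-cb", "link-cd"]))
# packets from a arriving any other way are dropped:
print(rpf_forward(route_to, "a", "link-cb", ["link-ca", "link-cb", "link-cd"]))  # []
```

The drop rule is what breaks loops: a packet can only make forward progress along the (loop-free) shortest-path tree rooted at S.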
Reverse path broadcast (RPB)
• RPF causes every 'upstream' router on a LAN (link) to send a copy
• RPB: only one router sends a copy
  – Routers listen to each other's DV advertisements
  – Only the one with the lowest hopcount sends
IP multicast: augmenting DV
• Requires symmetric paths
• Needs to prune unnecessary broadcast packets to achieve multicast [Deering et al., SIGCOMM 1988, TOCS 1990]
IP multicast: augmenting LS
• Basic LS: each router floods changes in link state
• LS w/ multicast: routers monitor local multicast group membership; membership changes also trigger flooding
• Routers use Dijkstra to compute shortest-path trees
  – How expensive is it to compute trees for N nodes, E edges, G groups?
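One way to size up that question is a rough operation count; the parameter values below are assumptions picked only to illustrate the scaling, and the model (one Dijkstra run per active source per group, binary-heap Dijkstra at O((N+E) log N)) is a textbook simplification:

```python
# Rough cost model for link-state multicast: with source-specific
# trees, a router runs Dijkstra once per active (source, group) pair.
# All parameter values are illustrative assumptions.
import math

def dijkstra_ops(n, e):
    # Binary-heap Dijkstra: O((N + E) log N) heap operations.
    return (n + e) * math.log2(n)

def total_ops(n, e, groups, sources_per_group):
    return groups * sources_per_group * dijkstra_ops(n, e)

# 1000 routers, 5000 links, 100 groups with 10 sources each:
print(f"{total_ops(1000, 5000, 100, 10):.2e}")   # ~6e7 ops per recomputation
```

The per-group, per-source factor is the scaling problem: the work multiplies with group count, on top of the per-group state routers must already keep.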
IP multicast has not taken off
• Requires support from routers
  – Do ISPs have incentives to support multicast?
• Not scalable
  – Routers keep state for every active group!
  – Multicast group addresses cannot be aggregated
  – Group membership changes much more frequently than links going up and down
• Difficult to provide congestion/flow control, reliability and security
Overlay multicast
• Multicast code runs on end hosts
• End hosts can copy & store data
• No change to IP infrastructure needed
• Easy to implement complex functionality: flow control, security, layered multicast, etc.
• Less efficient: higher delay, duplicate packets per link
Overlay multicast challenge
• How can hosts form an efficient tree?
  – Hosts do not know all that routers know
• What's wrong with a random tree?
  – Stretch: packets travel farther than they have to
  – Stress: packets traverse links multiple times
  – A particular concern for access links and cross-country links
Bad tree vs good tree
Cluster-based trees (NICE)

[Figure: cluster hierarchy; depending on its layer, a node resides in 1, 2, or 3 clusters.]
• A hierarchy of clusters
  – A cluster consists of [k, 3k-1] members
  – Log N depth
• Each node knows all members of its cluster(s)
Cluster-based trees
• Cluster nodes according to latency
  – Packets do not travel too far out of the way
• Not perfect
  – Packets are sent to cluster heads (who are in the middle), so they might overshoot
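The log N depth claim follows from the cluster-size invariant; here is a hedged sketch of the worst-case layer count, assuming cluster heads form the next layer up (the function and its loop structure are illustrative, not NICE's actual maintenance protocol):

```python
# Sketch of why the NICE hierarchy is O(log N) deep: every cluster has
# between k and 3k-1 members, and each cluster contributes one head to
# the layer above, so membership shrinks by at least a factor of k per
# layer. Worst case: all clusters at the minimum size k.
import math

def nice_depth(n, k=3):
    depth = 0
    while n > 3 * k - 1:        # until everyone fits in one top cluster
        n = math.ceil(n / k)    # heads of minimum-size clusters move up
        depth += 1
    return depth + 1            # count the final, single top cluster

for n in (100, 10_000, 1_000_000):
    print(n, nice_depth(n))
```

Depth grows only logarithmically, so even a million-member group needs around a dozen layers, and a join or repair touches O(log N) clusters.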
NICE in action
• How to join a hierarchy?
– Which is the right cluster?
– How long does join take?
• How to split/merge clusters?
• What if a cluster head fails?
When does clustering not work well?

[Figure: Cogent and MCI as upstream providers; MIT, Harvard, and Boston U as customers; MIT & Harvard peer with each other.]
• Key assumption: low latency is transitive
  – As a node descends the tree to join, it assumes the children of a close-by cluster head are also close-by
What did you learn today?
Lessons
• Where should a functionality reside? Routers vs. end hosts
  – End hosts
    • Scalability vs. performance
    • Flexibility
    • Instant deployment!
  – Routers
    • Efficiency
Project draft report
• You should be able to reuse your draft for the final report
• You should have completed the related work by now
• You should have a complete plan
  – Most of the system design
  – Most of the experiment designs
• If you have preliminary graphs, use them and try to explain them
The sandwich method for explanation
• An easy example illustrating the basic idea
• Detailed explanations of challenges and how your system addresses them
• Does it work in general environments?