Transcript Document

Part III: Overlays, peer-to-peer

Jinyang Li

Overlays are everywhere

• The Internet is an overlay on top of telephone networks
• Overlays: a network on top of the Internet
  – Endpoints (instead of routers) are nodes
  – Multi-hop Internet paths (through routers) are links
• Instant deployment!

What can overlays do?

• Routing
  – Improve routing robustness (e.g. convergence speed)
  – Multicast
  – Anonymous communication
• New applications
  – Peer-to-peer file sharing and lookup
  – Content distribution networks
  – Peer-to-peer live streaming
• Your imagination is the limit

Why overlays?

• Internet is ossified
  – IPv6 proposed in 1992, still not widely deployed
  – Multicast (1988), QoS (early 90s), etc.

• Avoid burdening routers with new features
• End hosts are cheap and capable
  – Copy and store files
  – Perform expensive cryptographic operations
  – Perform expensive coding/decoding operations
  – …

Today’s class

• Overlays that take over routers’ jobs
  – Resilient Overlay Networks (RON)
  – Application-level multicast (NICE)

RON’s motivation

• Internet routing is not reliable
  – [Paxson 95-97] 3.3% of all routes had serious problems
  – [Labovitz 97-00] 10% of routes available < 95% of the time
  – [Labovitz 97-00] 65% of routes available < 99.9% of the time
  – [Labovitz 97-00] 3-min minimum detection+recovery time; often 15 mins
  – [Labovitz 97-00] 40% of outages took 30+ mins to repair
  – [Chandra 01] 5% of faults last more than 2.75 hours

Internet routing is unsatisfactory

• Slow in detecting outages and recovering from them
• Unable to use multiple redundant paths
• Unable to detect badly performing paths
• Applications have no control over paths

Q: Why can’t we fix BGP?
Q2: Hasn’t multi-homing already solved the fault tolerance problem?

BGP converges slowly

Given a failure, BGP can take up to 15 minutes to converge.

Sometimes, not at all. [Feamster]

RON in a nutshell

[Figure: a small set of (<100) RON nodes forms an overlay on top of the scalable BGP-based IP routing substrate.]

• What failures?

– Outages: configuration/software errors, broken links
– Performance failures: severe congestion, DoS attacks

RON’s goals

• Fast failure detection and recovery
  – Detect & fail-over within seconds
• Applications influence path selection
  – Applications define failures
  – Applications define path metrics
• Expressive and fine-grained policies
  – Who and what applications are allowed to use what paths

Why would RON work?

• RON testbed study (2003): about 60% of failures were within two hops of the edge
• RON routes around many link “failures”
  – If there exists a node whose paths to S and D do not contain the failed link
• RON cannot route around access link failures

RON Design

[Figure: RON design. Nodes sit in different ASes. Each node runs the RON library: a conduit, forwarder, prober, and router, with application-specific routing tables, a policy routing module, and a performance database.]

• Link-state routing protocol disseminates info — using RON itself!
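The prober, performance database, and router are what make fast failover possible: each node keeps probing every other node, summarizes each virtual link with an application-chosen metric, and then either uses the direct Internet path or relays through one intermediate RON node. The sketch below is a minimal illustration of that decision, not RON's actual code; the node names, probe summaries, and the 30% loss threshold for "failure" are made-up assumptions.

```python
# Minimal sketch of RON-style routing with one level of indirection.
# Illustration only, NOT RON's implementation: node names, probe summaries,
# and the 30% loss threshold are assumptions for this example.

NODES = ["A", "B", "C", "D"]

# Hypothetical probe summaries per directed overlay link: (latency_ms, loss_rate).
links = {
    ("A", "B"): (40.0, 0.01), ("B", "A"): (42.0, 0.01),
    ("A", "C"): (90.0, 0.50), ("C", "A"): (95.0, 0.45),   # direct A<->C is failing
    ("A", "D"): (30.0, 0.00), ("D", "A"): (31.0, 0.00),
    ("B", "C"): (25.0, 0.02), ("C", "B"): (24.0, 0.02),
    ("B", "D"): (60.0, 0.05), ("D", "B"): (61.0, 0.05),
    ("C", "D"): (35.0, 0.01), ("D", "C"): (33.0, 0.01),
}

LOSS_FAILURE = 0.30  # application-defined failure threshold (assumption)

def usable(path):
    """The application declares a path failed if any hop exceeds the loss threshold."""
    return all(links[hop][1] < LOSS_FAILURE for hop in path)

def latency(path):
    """Application-defined path metric; here, end-to-end latency."""
    return sum(links[hop][0] for hop in path)

def best_path(src, dst):
    """Consider the direct path plus every single-intermediate-hop path,
    drop the failed ones, and pick the best survivor by the chosen metric."""
    candidates = [[(src, dst)]]
    candidates += [[(src, via), (via, dst)] for via in NODES if via not in (src, dst)]
    alive = [p for p in candidates if usable(p)]
    return min(alive, key=latency, default=None)

# The failing direct A->C link is routed around via a one-hop detour through D.
print(best_path("A", "C"))   # [('A', 'D'), ('D', 'C')]
```

In a running system the links table would be refreshed continuously from the prober's measurements, which is what allows detection and fail-over within seconds rather than BGP's minutes.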

[Figure: scatter plot of 30-min average loss rate with RON (y-axis, 0 to 1) vs. without RON (x-axis, 0 to 1).]

• RON loss rate is never more than 30%

RON routes around failures

30-minute average loss rates (6,825 “path hours” represented):

  Loss Rate     10%   20%   30%   50%   80%   100%
  RON Better    479   127    32    20    14     10
  No Change      57     4     0     0     0      0
  RON Worse      47    15     0     0     0      0

• 5 “path hours” of 100% loss (complete outage)
• 38 “path hours” of TCP outage (>= 30% loss)

RON routed around all of these!

One indirection hop provides almost all the benefit!

Resilience Against DoS Attacks

Throughput Improvement

[Figure: CDF of the ratio of RON throughput to direct-path throughput (log scale), over bw samples. 2x increase from RON in 109 samples; 1/2x decrease from RON in 20 samples.]

Lessons of RON

• End hosts know better about performance and outages than routers
• Internet routing trades performance and fast failover for scalability
• A small amount of redundancy goes a long way

RON’s tradeoff

[Figure: trade-off spectrum. Scalability at one end, where BGP sits; performance (fast convergence etc.) and flexibility (application-specific metric & policy) at the other, where routing overlays (e.g., RON) sit; the middle ground is marked “???”.]

Open Questions

• Efficiency
  – Generates redundant traffic on access links
• Scaling
  – Probing traffic is O(N^2) (see the rough estimate below)
  – Can a RON be made to scale to > 50 nodes?
  – Is a 1000-node RON much better than a 50-node one?
• Interaction of overlays and IP network
• Interaction of multiple overlays
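To make the O(N^2) concern concrete, here is a back-of-the-envelope count; the 10-second probe interval is a made-up assumption, not RON's actual parameter:

$$\text{probe rate} \;=\; \frac{N(N-1)}{\tau}
\qquad\Rightarrow\qquad
N=50,\ \tau=10\,\mathrm{s}:\ \approx 245\ \text{probes/s network-wide};\qquad
N=1000:\ \approx 10^{5}\ \text{probes/s}.$$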

Application level multicast

A.k.a. overlay multicast or end-host multicast

Why multicast?

• Send the same stream of data to many hosts
  – Internet radio/TV/conference
  – Stock quote dissemination
  – Multiplayer network games
• An efficient way to send data to many hosts

Naïve approach is wasteful

• Sender’s outgoing link carries n copies of the data
  – 128 Kbps mp3 stream, 10,000 listeners = 1.28 Gbps

IP multicast service model

• Mimic LAN broadcast
  – Anyone can send, everyone hears
• Use multicast addresses
  – 224.0.0.0 – 239.255.255.255 (2^28 addresses)
  – Each address is called a “group”
• End hosts register with routers to receive packets
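A quick sanity check of that address count (a throwaway Python snippet, not part of any protocol): the IPv4 class D block 224.0.0.0/4 spans exactly 2^28 addresses.

```python
import ipaddress

# IPv4 multicast ("class D") block: 224.0.0.0 through 239.255.255.255
mcast = ipaddress.ip_network("224.0.0.0/4")

print(mcast[0], mcast[-1])              # 224.0.0.0 239.255.255.255
print(mcast.num_addresses == 2**28)     # True: 268,435,456 group addresses
```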

Basic multicast techniques

• Construct trees
• Why trees? (Why not meshes?)
• How many trees?
  – Shared vs. source-specific trees
• Criteria of a “good” tree?
• Who builds trees?
  – Routers vs. end hosts

IP multicast

• Routers construct multicast trees for packet replication and forwarding
• Efficient (low latency, no duplicate packets on links)

IP multicast: Augmenting DV

• How to broadcast using DV routing tables without loops?

– Idea: shortest paths from S to all nodes form a tree
– RPF protocol: a router duplicates and forwards a packet only if it arrives via the router’s shortest path back to S

Reverse path flooding (RPF)

[Figure: four routers a, b, c, d. Links: a–b (cost 1), b–c (cost 1), a–c (cost 10), c–d (cost 1). Distance-vector tables (destination: next hop, cost):
  a:  a: a, 0   b: b, 1   c: c, 10   d: c, 11
  b:  a: a, 1   b: b, 0   c: c, 1    d: c, 2
  c:  a: a, 10  b: b, 1   c: c, 0    d: d, 1
  d:  c: c, 1   d: d, 0]

• C does not forward packets from A and vice versa
• However, link a <--> c sees two packets
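To pin down the RPF rule, here is a tiny illustrative Python sketch (made-up, not a real router's forwarding plane); the neighbors and next hops mirror the distance-vector tables in the figure above.

```python
# Minimal sketch of the reverse-path-flooding (RPF) check on the figure's
# four-router topology. Illustration only, not a real implementation.

# links[router] = set of neighbors; next_hop[router][dest] = DV next hop.
links = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
# Next hops taken from the distance-vector tables in the figure.
next_hop = {
    "a": {"a": "a", "b": "b", "c": "c", "d": "c"},
    "b": {"a": "a", "b": "b", "c": "c", "d": "c"},
    "c": {"a": "a", "b": "b", "c": "c", "d": "d"},
    "d": {"c": "c", "d": "d"},
}

def rpf_forward(router, source, arrived_from):
    """RPF rule: duplicate to all other neighbors only if the packet arrived
    on the neighbor this router would use to reach the source."""
    if arrived_from != next_hop[router][source]:
        return []                                   # drop: not on reverse shortest path
    return sorted(links[router] - {arrived_from})   # copy to every other link

# Broadcast from a: c accepts only the copy arriving on its shortest path to a.
print(rpf_forward("c", source="a", arrived_from="a"))  # ['b', 'd']
print(rpf_forward("c", source="a", arrived_from="b"))  # []  (dropped)
```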

Reverse path broadcast (RPB)

• RPF causes every ‘upstream’ router on a LAN (link) to send a copy
• RPB: only one router sends a copy
  – Routers listen to each others’ DV advertisements
  – Only the one with the lowest hop count sends

IP multicast: augmenting DV

• Requires symmetric paths
• Needs to prune unnecessary broadcast packets to achieve multicast

[Deering et al., SIGCOMM 1988; TOCS 1990]

IP multicast: augmenting LS

• Basic LS: each router floods changes in link state
• LS w/ multicast: routers monitor local multicast group membership; membership changes also result in flooding
• Routers use Dijkstra to compute shortest-path trees
  – How expensive to compute trees for N nodes, E edges, G groups?
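For a rough sense of that cost (a back-of-the-envelope estimate, not a figure from the lecture): one Dijkstra run with a binary heap costs about O((E+N) log N), and in the worst case a router recomputes one shortest-path tree per source per group:

$$\underbrace{O\big((E+N)\log N\big)}_{\text{one shortest-path tree}} \;\times\; S \cdot G
\;=\; O\big(S\,G\,(E+N)\log N\big),
\qquad S = \text{sources per group}.$$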

IP multicast has not taken off

• Requires support from routers
  – Do ISPs have incentives to support multicast?
• Not scalable
  – Routers keep state for every active group!
• Multicast group addresses cannot be aggregated
• Group membership changes much more frequently than links going up and down
• Difficult to provide congestion/flow control, reliability and security

Overlay multicast

• Multicast code runs on end hosts
• End hosts can copy & store data
• No change to IP infrastructure needed
• Easy to implement complex functionalities: flow control, security, layered multicast, etc.
• Less efficient: higher delay, duplicate packets per link

Overlay multicast challenge

• How can hosts form an efficient tree?

– Hosts do not know all that routers know
• What’s wrong with a random tree?
  – Stretch: packets travel farther than they have to
  – Stress: packets traverse links multiple times
• A particular concern for access links and cross-country links
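For concreteness, these two penalties are usually quantified as follows (the standard definitions, stated here rather than taken from the slide):

$$\text{stretch}(h)=\frac{\text{latency of the overlay path to receiver } h}{\text{latency of the direct unicast path to } h},
\qquad
\text{stress}(\ell)=\#\ \text{identical copies of a packet crossing physical link } \ell.$$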

Bad tree vs good tree

Cluster-based trees (NICE)

[Figure: the NICE hierarchy. Ordinary members reside in 1 cluster; cluster heads reside in 2 clusters; higher-layer heads reside in 3 clusters.]

• A hierarchy of clusters
  – Each cluster consists of [k, 3k-1] members
  – Log N depth
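The Log N depth follows from the cluster-size bound: only one member of each cluster (its head) appears in the next layer up, so each layer is at least a factor of k smaller than the one below (a one-line derivation, assuming the [k, 3k-1] invariant above):

$$|\text{layer } i| \;\le\; \frac{N}{k^{\,i}}
\quad\Longrightarrow\quad
\text{depth} \;\le\; \lceil \log_k N \rceil \;=\; O(\log N).$$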

Cluster-based trees (NICE)

• Each node knows all members of its cluster(s)

Cluster-based trees

• Cluster nodes according to latency
  – Packets do not travel too far out of the way
• Not perfect
  – Packets are sent to cluster heads (who are in the middle), so they might overshoot

NICE in action

• How to join a hierarchy?

– Which is the right cluster?

– How long does join take?

• How to split/merge clusters?

• What if a cluster head fails?

When does clustering not work well?

[Figure: Cogent and MCI backbones; MIT, Harvard, and Boston U as end sites; MIT & Harvard peer with each other.]

• Key assumption: low latency is transitive
  – As a node descends the tree to join, it assumes children of a close-by cluster head are also close-by
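Below is a minimal sketch of that join descent (illustrative only; the cluster names and latencies are made up, and this is not the NICE implementation): at each layer the joining node probes the heads of the candidate clusters, picks the closest head, and descends into that head's child cluster — implicitly trusting that "close to the head" means "close to its cluster".

```python
# Illustrative sketch of a NICE-style join descent (not the actual protocol).
# The hierarchy, names, and round-trip times below are made-up assumptions.

# Each cluster has a head and either child clusters or is a bottom-layer cluster.
hierarchy = {
    "top":  {"head": "h0", "children": ["east", "west"]},
    "east": {"head": "e0", "children": ["bos", "nyc"]},
    "west": {"head": "w0", "children": ["sea", "sfo"]},
    "bos":  {"head": "b0", "children": []},
    "nyc":  {"head": "n0", "children": []},
    "sea":  {"head": "s0", "children": []},
    "sfo":  {"head": "f0", "children": []},
}

# Made-up round-trip times (ms) from the joining node to each cluster head.
rtt_ms = {"h0": 80, "e0": 20, "w0": 90, "b0": 8, "n0": 25, "s0": 95, "f0": 100}

def join(cluster="top"):
    """Descend the hierarchy, one probe round per layer (O(log N) rounds)."""
    children = hierarchy[cluster]["children"]
    if not children:
        return cluster                       # bottom layer: join this cluster
    closest = min(children, key=lambda c: rtt_ms[hierarchy[c]["head"]])
    return join(closest)

print(join())   # -> 'bos' for these made-up latencies
```

The descent takes O(log N) probe rounds, but it goes wrong exactly when latency is not transitive, as in the MIT/Harvard peering example above.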

What did you learn today?

Lessons

• Where should a functionality reside? Routers vs. end hosts
  – End hosts
    • Scalability vs. performance
    • Flexibility
    • Instant deployment!
  – Routers
    • Efficiency

Project draft report

• You should be able to reuse your draft for the final report
• You should have complete related work by now
• You should have a complete plan
  – Most of the system design
  – Most of the experiment designs
• If you have preliminary graphs, use them and try to explain them

The sandwich method for explanation

• An easy example illustrating the basic idea
• Detailed explanations of the challenges and how your system addresses them
• Does it work in general environments?