Building Peer-to-Peer Systems With Chord, a Distributed Lookup Service



Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Dr. Yingwu Zhu

A peer-to-peer storage problem • 10000 scattered music enthusiasts • Willing to store and serve replicas • Contributing resources, e.g., storage, bandwidth, etc.

• How do you find the data? • Efficient Lookup mechanism needed!

The lookup problem: [figure] a Publisher holds Key=“title”, Value=MP3 data…; nodes N1 through N6 are scattered across the Internet; a Client issues Lookup(“title”) and must somehow find the node with the data.

Centralized lookup (Napster): [figure] the Publisher at N4 holds Key=“title”, Value=MP3 data… and registers SetLoc(“title”, N4) with a central DB; the Client asks the DB Lookup(“title”). Simple, but O(N) state and a single point of failure. Legal issues!

Napster: Publish: [figure] a peer (e.g., 123.2.21.23 or 123.2.0.18) announces “I have X, Y, and Z!” and the central server records entries such as insert(X, 123.2.21.23).

Napster: Search: [figure] the client queries the central server “Where is file A?”; the reply is search(A) --> 123.2.0.18, and the client fetches the file directly from 123.2.0.18.

Napster • Central Napster server • Can it ensure correct results? • Bottleneck for scalability • Single point of failure • Susceptible to denial of service • Malicious users • Lawsuits, legislation • Search is centralized • File transfer is direct (peer-to-peer)
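
To make the publish/search flow concrete, here is a minimal sketch of a Napster-style central index (illustrative only, not Napster's actual protocol or code; the CentralIndex class and its insert/search helpers are hypothetical names):

# Sketch of a Napster-style central index (hypothetical names, Python stand-in).
from collections import defaultdict

class CentralIndex:
    def __init__(self):
        self.index = defaultdict(set)      # file name -> peers claiming to have it

    def insert(self, filename, peer_addr):
        # A peer publishes "I have <filename>", e.g. insert("X", "123.2.21.23").
        self.index[filename].add(peer_addr)

    def search(self, filename):
        # A client asks "Where is file <filename>?" and gets candidate peers back.
        return sorted(self.index[filename])

index = CentralIndex()
index.insert("X", "123.2.21.23")
index.insert("X", "123.2.0.18")
print(index.search("X"))                   # the file itself is then fetched peer-to-peer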

Flooded queries (Gnutella): [figure] the Publisher at N4 holds Key=“title”, Value=MP3 data…; the Client floods Lookup(“title”) to its neighbors, which forward it on. Robust, but worst case O(N) messages per lookup.

Gnutella: Query Flooding (Breadth-First Search, BFS): [figure legend] source; forward query; processed query; found result; forward response.

Gnutella: Query Flooding • A node/peer connects to a set of Gnutella neighbors • Queries are forwarded to neighbors • A peer that has the information responds • Flood the network, with a TTL for termination + Results are complete – Bandwidth wastage
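
As a rough sketch of flooding with a TTL (a toy in-memory neighbor graph, not the Gnutella wire protocol; the node and file names are made up for illustration):

# Sketch: BFS-style query flooding with a TTL over a toy neighbor graph.
from collections import deque

def flood_query(neighbors, files, start, wanted, ttl=3):
    # neighbors: node -> list of nodes; files: node -> set of file names.
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        node, ttl_left = queue.popleft()
        if wanted in files.get(node, set()):
            hits.append(node)              # this peer responds with a hit
        if ttl_left == 0:
            continue                       # TTL expired: stop forwarding
        for nbr in neighbors.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                messages += 1              # one query message per edge traversed
                queue.append((nbr, ttl_left - 1))
    return hits, messages

neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
files = {"D": {"title.mp3"}}
print(flood_query(neighbors, files, "A", "title.mp3"))   # (['D'], 3)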

Gnutella: Random Walk • An improvement over query flooding • Same overlay structure as Gnutella • Forward the query to a random subset of its neighbors + Reduced bandwidth requirements – Incomplete results – High latency


KaZaA (FastTrack Networks) • Hybrid of centralized Napster and decentralized Gnutella • Super-peers act as local search hubs • Each super-peer is similar to a Napster server for a small portion of the network • Super-peers are automatically chosen by the system based on their capacity (storage, bandwidth, etc.) and availability (connection time) • Users upload their list of files to a super-peer • Super-peers periodically exchange file lists • Queries are sent to a super-peer for files of interest • The local super-peer may flood the queries to other super-peers if it cannot satisfy them itself • Exploits the heterogeneity of peer nodes

KaZaA • Uses supernodes to improve scalability and establish hierarchy • Supernodes are chosen by uptime and bandwidth • Closed-source • Uses HTTP to carry out downloads • Encrypted protocol; queuing, QoS

KaZaA: Network Design: [figure] ordinary peer nodes attach to “Super Nodes”.

KaZaA: File Insert: [figure] the peer at 123.2.21.23 publishes “I have X!” to its super node, which records insert(X, 123.2.21.23).

KaZaA: File Search: [figure] the client asks its super node “Where is file A?”; replies come back as search(A) --> 123.2.22.50 and search(A) --> 123.2.0.18, pointing to the peers that hold the file.

Routed queries (Freenet, Chord, etc.): [figure] the Publisher at N4 holds Key=“title”, Value=MP3 data…; the Client's Lookup(“title”) is routed hop by hop through intermediate nodes toward the node responsible for the key.

Routing challenges • Define a useful key nearness metric • Keep the hop count small • Keep the tables small • Stay robust despite rapid change (node addition/removal) • Freenet: emphasizes anonymity • Chord: emphasizes efficiency and simplicity

Chord properties • Efficient: O(log N) messages per lookup, where N is the total number of servers • Scalable: O(log N) state per node • Robust: survives massive failures • Proofs are in the paper / tech report, assuming no malicious participants

Chord overview • Provides peer-to-peer hash lookup: Lookup(key) → IP address • Mapping: key → IP address • How does Chord route lookups?

• How does Chord maintain routing tables?

Chord IDs • Key identifier = SHA-1(key) • Node identifier = SHA-1(IP address) • Both are uniformly distributed • Both exist in the same ID space • How to map key IDs to node IDs?
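
A small sketch of deriving both kinds of IDs with SHA-1 from Python's standard hashlib (the exact byte encoding the real implementation hashes may differ, and the IP address shown is just an example):

# Sketch: keys and node IP addresses hash into the same m-bit ID space.
import hashlib

M = 160                                    # SHA-1 gives 160-bit Chord IDs

def chord_id(value: str, m: int = M) -> int:
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

print(chord_id("title"))                   # key identifier  = SHA-1(key)
print(chord_id("18.26.4.9"))               # node identifier = SHA-1(IP address)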

Consistent hashing [Karger 97]: [figure] a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80. A key is stored at its successor: the node with the next-higher ID (so K80 is stored at N90).
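
A minimal sketch of successor placement on the ring, reusing the 7-bit IDs from the figure (bisect is Python's standard binary-search module):

# Sketch: a key is stored at its successor, the first node ID at or after the key.
import bisect

def successor(node_ids, key_id, m=7):
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id % (2 ** m))
    return ring[i % len(ring)]             # wrap around past the highest node ID

nodes = [32, 90, 105]                      # N32, N90, N105 from the figure
for k in (5, 20, 80):                      # K5, K20, K80
    print(f"K{k} is stored at N{successor(nodes, k)}")
# K5 -> N32, K20 -> N32, K80 -> N90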

Basic lookup: [figure] the query “Where is key 80?” is passed from node to node around the ring (N10, N32, N60, N90, N105, N120 shown) until the answer “N90 has K80” is returned.

Simple lookup algorithm

Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done

• Correctness depends only on successors
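
The same algorithm as a runnable sketch over an in-memory ring (a simplification in which every node knows only its immediate successor, so a lookup may take O(N) hops; the node IDs are the ones from the basic-lookup figure):

# Sketch: successor-only lookup, walking the ring one node at a time.
M = 7                                      # 7-bit ID space, as in the figures

def between(x, a, b, m=M):
    # True if x lies in the half-open ring interval (a, b].
    a, b, x = a % 2**m, b % 2**m, x % 2**m
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(successor_of, my_id, key_id):
    hops = 0
    while not between(key_id, my_id, successor_of[my_id]):
        my_id = successor_of[my_id]        # next hop: forward to my successor
        hops += 1
    return successor_of[my_id], hops       # done: this node is the key's successor

nodes = [10, 32, 60, 90, 105, 120]
succ = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}
print(lookup(succ, 10, 80))                # (90, 2): "N90 has K80"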

“Finger table” allows log(N)-time lookups: [figure] from N80, finger pointers form a fast track / express lane, reaching nodes ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.

Finger i points to the successor of n + 2^i: [figure] for n = N80, the finger for 80 + 2^5 = 112 points to N120; the fingers again cover ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring.
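
A sketch of building a finger table from that rule; N80 and N120 come from the figure, while the other node IDs are made up to fill out a small ring:

# Sketch: finger i of node n points to the successor of (n + 2^i) mod 2^m.
import bisect

M = 7                                      # 7-bit ID space

def successor(node_ids, key_id, m=M):
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id % (2 ** m))
    return ring[i % len(ring)]

def finger_table(node_ids, n, m=M):
    # m fingers per node: each covers twice the ring distance of the previous one.
    return [successor(node_ids, (n + 2 ** i) % (2 ** m)) for i in range(m)]

nodes = [10, 32, 60, 80, 99, 120]
print(finger_table(nodes, 80))             # [99, 99, 99, 99, 99, 120, 32]
# finger 5 is successor(80 + 2^5 = 112) = N120, matching the figure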

Lookup with fingers

Lookup(my-id, key-id)
  look in local finger table for
    highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done

Lookups take O(log N) hops: [figure] Lookup(K19) issued at N32 follows finger pointers (nodes N5, N10, N20, N32, N60, N80, N99, N110 shown), roughly halving the remaining ID-space distance each hop, until it reaches N20, the successor of K19.
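
The finger-based lookup as a runnable sketch over the node IDs in the figure: at each hop the query jumps to the closest preceding finger, so the remaining ID-space distance roughly halves per hop.

# Sketch: lookup via finger tables; O(log N) hops instead of O(N).
import bisect

M = 7                                      # 7-bit ID space

def in_interval(x, a, b, inclusive_right=False, m=M):
    # True if x lies in the ring interval (a, b), or (a, b] if inclusive_right.
    a, b, x = a % 2**m, b % 2**m, x % 2**m
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)

def successor(nodes, key_id, m=M):
    ring = sorted(nodes)
    i = bisect.bisect_left(ring, key_id % (2 ** m))
    return ring[i % len(ring)]

def fingers(nodes, n, m=M):
    return [successor(nodes, (n + 2 ** i) % (2 ** m)) for i in range(m)]

def finger_lookup(nodes, my_id, key_id, m=M):
    hops = 0
    while not in_interval(key_id, my_id, successor(nodes, my_id + 1), inclusive_right=True):
        next_hop = my_id
        for f in reversed(fingers(nodes, my_id)):      # highest finger first
            if in_interval(f, my_id, key_id):          # closest preceding node
                next_hop = f
                break
        my_id = next_hop
        hops += 1
    return successor(nodes, my_id + 1), hops

nodes = [5, 10, 20, 32, 60, 80, 99, 110]
print(finger_lookup(nodes, 32, 19))        # (20, 3): K19's successor is N20, 3 hops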

Joining: linked-list insert: [figure] step 1: the new node N36 runs Lookup(36) to find its place on the ring between N25 and N40 (which holds K30 and K38).

Join (2): [figure] step 2: N36 sets its own successor pointer to N40.

Join (3): [figure] step 3: copy keys 26..36 (here, K30) from N40 to N36.

Join (4): [figure] step 4: set N25’s successor pointer to N36. Update finger pointers in the background. Correct successors produce correct lookups.
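
The four steps as a toy, single-process sketch (plain dictionaries stand in for the real RPCs and stabilization, and ring wraparound is ignored; the key copy mirrors the figure, where N36 takes over K30):

# Toy sketch of the join steps for N36 entering between N25 and N40.
def lookup_successor(succ, start, key):
    # Walk successor pointers until the node responsible for `key` is found.
    n = start
    while not (n < key <= succ[n]):        # no wraparound handling in this toy
        n = succ[n]
    return succ[n]

succ = {25: 40, 40: 25}                    # a two-node ring: N25 <-> N40
keys = {25: {}, 40: {30: "v30", 38: "v38"}}

new_id = 36
s = lookup_successor(succ, 25, new_id)                        # 1. Lookup(36) -> N40
succ[new_id] = s                                              # 2. N36 sets its successor
keys[new_id] = {k: v for k, v in keys[s].items() if k <= 36}  # 3. copy keys 26..36 (K30)
succ[25] = new_id                                             # 4. N25's successor -> N36
print(succ, keys)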

Failures might cause incorrect lookups: [figure] a Lookup(90) reaches N80, but the nodes just after it (N85, N102, N113) have failed; N80 doesn’t know its correct successor, so the lookup is incorrect.

Solution: successor lists • Each node knows its r immediate successors • After a failure, it will know the first live successor • Correct successors guarantee correct lookups • The guarantee is with some probability
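
A sketch of keeping r immediate successors and falling back to the first live one after failures (liveness here is just membership in a set of live node IDs):

# Sketch: maintain r immediate successors; after failures, use the first live one.
def successor_list(sorted_ids, node_id, r):
    # The r nodes that immediately follow node_id on the ring.
    i = sorted_ids.index(node_id)
    return [sorted_ids[(i + k) % len(sorted_ids)] for k in range(1, r + 1)]

def first_live_successor(succ_list, live_nodes):
    for s in succ_list:
        if s in live_nodes:
            return s
    return None          # all r successors dead: this node "breaks the ring"

ring = [10, 32, 60, 80, 99, 110, 120]
slist = successor_list(ring, 80, r=3)                                   # [99, 110, 120]
print(first_live_successor(slist, live_nodes={10, 32, 60, 80, 120}))    # -> 120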

Choosing the successor list length • Assume 1/2 of the nodes fail • P(a given node's successor list is all dead) = (1/2)^r, i.e., P(this node breaks the Chord ring); depends on failures being independent • P(no broken nodes) = (1 – (1/2)^r)^N • r = 2 log(N) makes this probability ≈ 1 – 1/N
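
Spelling that last bullet out as a short derivation (a sketch assuming independent failures, each with probability 1/2, and logarithms base 2):

r = 2\log_2 N \;\Longrightarrow\; \left(\tfrac{1}{2}\right)^{r} = 2^{-2\log_2 N} = \frac{1}{N^{2}},
\qquad
P(\text{no broken nodes}) = \left(1 - \tfrac{1}{N^{2}}\right)^{N} \approx 1 - \frac{N}{N^{2}} = 1 - \frac{1}{N}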

Lookup with fault tolerance

Lookup(my-id, key-id)
  look in local finger table and successor-list
    for highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
    if call failed,
      remove n from finger table
      return Lookup(my-id, key-id)
  else
    return my successor             // done
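
A small sketch of the retry behavior (the next-hop RPC is a stand-in function and ring wraparound is ignored): if the chosen next hop fails, it is dropped from the known candidates and the lookup retries from the current node.

# Sketch: retry a lookup after a failed hop, forgetting the dead node first.
def fault_tolerant_lookup(my_id, key_id, candidates, call_next_hop, my_successor):
    # candidates: node IDs from the finger table and successor list.
    while True:
        nxt = max((n for n in candidates if my_id < n < key_id), default=None)
        if nxt is None:
            return my_successor                      # done: my successor owns the key
        try:
            return call_next_hop(nxt, key_id)        # next hop (an RPC in the real system)
        except ConnectionError:
            candidates = [n for n in candidates if n != nxt]   # n is dead: remove it

def call_next_hop(node, key_id):                     # stand-in RPC: node 60 is down
    if node == 60:
        raise ConnectionError("node 60 unreachable")
    return f"node {node} continues Lookup({key_id})"

print(fault_tolerant_lookup(32, 70, [40, 60], call_next_hop, my_successor=40))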

Chord status • Working implementation as part of CFS • Chord library: 3,000 lines of C++ • Deployed in small Internet testbed • Includes: • Correct concurrent join/fail • Proximity-based routing for low delay • Load control for heterogeneous nodes • Resistance to spoofed node IDs

Experimental overview • Quick lookup in large systems • Low variation in lookup costs • Robust despite massive failure • See paper for more results Experiments confirm theoretical results

Chord lookup cost is O(log N): [figure: lookup cost vs. number of nodes] The constant is 1/2.

Failure experimental setup • Start 1,000 CFS/Chord servers • Successor list has 20 entries • Wait until they stabilize • Insert 1,000 key/value pairs • Five replicas of each • Stop X% of the servers • Immediately perform 1,000 lookups

Massive failures have little impact: [figure] y-axis 0 to 1.4 vs. Failed Nodes (Percent), 5 to 50; annotation: (1/2)^6 is 1.6%.

Related Work • CAN (Ratnasamy, Francis, Handley, Karp, Shenker) • Pastry (Rowstron, Druschel) • Tapestry (Zhao, Kubiatowicz, Joseph) • Chord emphasizes simplicity

Chord Summary • Chord provides peer-to-peer hash lookup • Efficient: O(log N) messages per lookup • Robust as nodes fail and join • Good primitive for peer-to-peer systems • http://www.pdos.lcs.mit.edu/chord

Reflection on Chord • Strict overlay structure • Strict data placement • If data keys are uniformly distributed and # of keys >> # of nodes, the load is balanced • Each node holds an O(1/N) fraction of the keys • Node addition/deletion moves only an O(1/N) fraction of the load, so load movement is minimized!

Reflection on Chord • Routing table (successor list + finger table) • Deterministic • Network topology unaware • Routing latency could be a problem • Proximity Neighbor Selection (PNS) • m neighbor candidates, choose min latency • Still O(logN) hops

Reflection on Chord • Predecessor + Successor must be correct, aggressively maintained • Finger tables are lazily maintained • Tradeoff: bandwidth, routing correctness

Reflection on Chord • Assumes a uniform node distribution • In the wild, nodes are heterogeneous • Load imbalance! • Virtual servers • A node hosts multiple virtual servers, O(log N) of them
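
A sketch of the virtual-server idea (the hashing scheme and the choice of exactly log2(N) virtual IDs per host are illustrative assumptions, not Chord's prescribed values):

# Sketch: a physical node joins the ring under O(log N) virtual server IDs.
import hashlib
import math

def ring_id(name, m=16):
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** m)

def virtual_ids(host, n_physical_nodes, m=16):
    v = max(1, int(math.log2(n_physical_nodes)))     # O(log N) virtual servers
    return sorted(ring_id(f"{host}#{i}", m) for i in range(v))

print(virtual_ids("123.2.21.23", n_physical_nodes=1000))   # about log2(1000) ring positions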

Join: lazy finger update is OK: [figure] after N36 joins between N25 and N40, N2's finger should now point to N36, not N40; but Lookup(K30) visits only nodes < 30, so the stale finger merely undershoots, and correct successor pointers still complete the lookup.

CFS: a peer-to-peer storage system • Inspired by Napster, Gnutella, Freenet • Separates publishing from serving • Uses spare disk space, net capacity • Avoids centralized mechanisms


CFS architecture

[figure] The DHash distributed block store sits on top of Chord (lookup) and handles block storage, availability/replication, authentication, caching, consistency, server selection, and keyword search • Powerful lookup simplifies other mechanisms
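
To illustrate how the lookup layer simplifies the layer above it, here is a toy sketch of a DHash-style block store that delegates placement entirely to a Chord-style lookup function (block IDs are content hashes; replication, caching, and the other listed mechanisms are omitted, and the class and function names are hypothetical):

# Toy sketch: a DHash-like block store layered on a Chord-style lookup.
import bisect
import hashlib

class ToyDHash:
    def __init__(self, lookup):
        self.lookup = lookup                          # Chord: block ID -> responsible node
        self.stores = {}                              # node ID -> {block ID: block}

    def put(self, block: bytes) -> int:
        block_id = int.from_bytes(hashlib.sha1(block).digest(), "big") % 2**7
        node = self.lookup(block_id)
        self.stores.setdefault(node, {})[block_id] = block
        return block_id

    def get(self, block_id: int) -> bytes:
        return self.stores[self.lookup(block_id)][block_id]

nodes = [10, 32, 60, 90, 105, 120]
def chord_lookup(key_id):                             # stand-in for Chord's Lookup(key)
    i = bisect.bisect_left(nodes, key_id % 128)
    return nodes[i % len(nodes)]

dhash = ToyDHash(chord_lookup)
bid = dhash.put(b"MP3 data...")
print(dhash.get(bid) == b"MP3 data...")               # True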