File Sharing: Hash/Lookup
Yossi Shasho (HW in last slide)
• Based on Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
• Partially based on The Impact of DHT Routing Geometry on Resilience and Proximity
• Partially based on Building a Low-latency, Proximity-aware DHT-Based P2P Network, http://www.computer.org/portal/web/csdl/doi/10.1109/KSE.2009.49
• Some slides liberally borrowed from:
  • Carnegie Mellon Peer-2-Peer 15-411
  • Petar Maymounkov and David Mazières' Kademlia Talk, New York University

Peer-2-Peer
– Distributed systems without any centralized control or hierarchical organization
– Long list of applications:
  • Redundant storage
  • Permanence
  • Selection of nearby servers
  • Anonymity, search, authentication, hierarchical naming and more
– The core operation in most p2p systems is efficient location of data items

Outline
1. The Problem
2. Distributed hash tables (DHT)
3. Chord: a DHT scheme
  • Geometry
  • Lookup
  • Node Joins
  • Performance
4. Extras

(Outline recap)

Think Big
• /home/google/
• One namespace, thousands of servers
  – Map each key (= filename) to a value (= server)
  – Hash table? Think again
• What if a new server joins? A server fails?
• How do we keep track of all servers?
• What about redundancy? And proximity?
• Not scalable, centralized, fault intolerant
• Lots of new problems to come up…

(Outline recap)

DHT: Overview
• Abstraction: a distributed "hash-table" (DHT) data structure:
  – put(id, item);
  – item = get(id);
• Scalable, decentralized, fault tolerant
• Implementation: the nodes in the system form a distributed data structure
  – Can be a Ring, Tree, Hypercube, Skip List, Butterfly Network, ...

DHT: Overview (2)
• Many DHTs: [figure: examples of DHT designs]

DHT: Overview (3)
• Good properties:
  – Distributed construction/maintenance
  – Load-balanced with uniform identifiers
  – O(log n) hops / neighbors per node
  – Provides underlying network proximity

Consistent Hashing
• When adding rows (servers) to the hash table, we don't want all keys to change their mappings
• When adding the Nth row, we want ~1/N of the keys to change their mappings
• Is this achievable? Yes.

(Outline recap)

Chord: Overview
• Just one operation: item = get(id)
• Each node needs routing info about only a few other nodes
• O(log N) for lookup, O(log² N) for join/leave
• Simple, provable correctness, provable performance
• Apps built on top of it do the rest

Chord: Geometry
• Identifier space [1, N], example: binary strings
• Keys (filenames) and values (server IPs) live in the same identifier space
• Keys & values are evenly distributed
• Now, put this identifier space on a circle
• Consistent Hashing: a key is stored at its successor

Chord: Geometry (2)
• A key is stored at its successor: the node with the next-higher ID
• Get(5) = 32, Get(20) = 32, Get(80) = 90
• Who maps to 105? Nobody.
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80]

Chord: Back to Consistent Hashing
• "When adding the Nth row, we want ~1/N of the keys to change their mappings." (the problem, a few slides back)
• After nodes N15 and N50 join: Get(5) = 15 (was 32), Get(20) = 32, Get(80) = 90
• Who maps to 105? Still nobody.
[Figure: the same circular ID space after N15 and N50 join]
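The "a key is stored at its successor" rule and the ~1/N remapping property can be illustrated with a short Python sketch. This is a toy, not Chord itself: the Ring class, the ring_id helper and the 2^16 identifier space are assumptions made for the example.

    import hashlib
    from bisect import bisect_left

    RING = 2 ** 16                                # assumed small identifier space

    def ring_id(name: str) -> int:
        """Hash a node name or a key onto the circular identifier space."""
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

    class Ring:
        def __init__(self, node_names):
            self.ids = sorted(ring_id(n) for n in node_names)
            self.by_id = {ring_id(n): n for n in node_names}

        def successor(self, key: str) -> str:
            """A key is stored at its successor: the first node id >= hash(key)."""
            i = bisect_left(self.ids, ring_id(key))
            return self.by_id[self.ids[i % len(self.ids)]]   # wrap around the circle

    # Adding a fifth node remaps, in expectation, only ~1/5 of the keys.
    keys = [f"file-{i}" for i in range(1000)]
    before = Ring(["n1", "n2", "n3", "n4"])
    after = Ring(["n1", "n2", "n3", "n4", "n5"])
    moved = sum(before.successor(k) != after.successor(k) for k in keys)
    print(f"{moved} of {len(keys)} keys changed server")

Only the keys that fall on the arc claimed by the new node change owners, which is why roughly 1/N of them move.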
(Outline recap)

Chord: Basic Lookup
    get(k):
      if I have k: return "ME"
      else: P ← next node
            return P.get(k)
• Each node remembers only the next node
• O(N) lookup time – no good!
[Figure: ring with N10, N32, N60, N90, N105, N120; the query "Where is key 80?" is passed around the ring until the answer "N90 has K80" comes back]

Chord: "Finger Table"
• The previous lookup was O(N). We want O(log N)
• Entry i in the finger table of node n is the first node n' such that n' ≥ n + 2^i
• In other words, the ith finger of n points 1/2^(m-i) of the way around the ring (m = identifier bits)
    Finger table of N80:
      i | 80 + 2^i | succ
      0 |    81    |  __
      1 |    82    |  __
      2 |    84    |  __
[Figure: the fingers of N80 span 1/2, 1/4, 1/8, …, 1/128 of the ring]

Chord: "Finger Table" Lookups
    get(k):
      if I have k: return "ME"
      else: P ← closest finger ≤ k
            return P.get(k)
(finger table and figure as on the previous slide)

Chord: "Finger Table" Lookups (2)
[Figure: ring with N2, N9, N19, N31, N49, N65, N74, N81, N90; a lookup for key 40 is forwarded along fingers N65 → N19 → N49, and N49, which stores K40, answers "40!"]

Chord: Example
• Assume an identifier space [0..8]
• Node n1 joins and is responsible for all keys
• (Succ == successor)
    Succ. table of n1 (entries are n + 2^i → successor):  2 → 1,  3 → 1,  5 → 1
[Figure: circular ID space with positions 0–7]

Chord: Example (2)
• Node n2 joins
    Succ. table of n1:  2 → 2,  3 → 1,  5 → 1
    Succ. table of n2:  3 → 1,  4 → 1,  6 → 1

Chord: Example (3)
• Nodes n0 and n6 join
    Succ. table of n0:  1 → 1,  2 → 2,  4 → 6
    Succ. table of n1:  2 → 2,  3 → 6,  5 → 6
    Succ. table of n2:  3 → 6,  4 → 6,  6 → 6
    Succ. table of n6:  7 → 0,  0 → 0,  2 → 2

Chord: Example (4)
• Nodes: n1, n2, n0, n6
• Items: 1, 7 – item 1 is stored at its successor n1, item 7 at n0
    (succ. tables unchanged from the previous slide)

Chord: Routing
Upon receiving a query for item id, a node:
1. Checks if it stores the item locally
2. If not, forwards the query to the largest node i in its finger table such that i ≤ id
[Figure: query(7) is forwarded along the succ. tables until it reaches n0, which stores item 7]

(Outline recap)

Chord: Node Join
Node n joins; we need one existing node n' in hand:
1. Initialize the fingers of n
  – Ask n' to look them up (log N fingers to init)
2. Update the fingers of the rest
  – Only a few nodes need to be updated
  – Look them up and tell them n is new in town
3. Transfer keys

Chord: Improvements
• Every 30s, ask your successor for its predecessor
  – Fix your own successor based on the answer
• Also, pick and verify a random finger
  – Rebuild finger table entries this way
• Keep a successor list of r successors
  – Deals with unexpected node failures
  – Can use these to replicate data
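The performance slides that follow claim O(log N) lookup hops. The toy, single-process simulation below applies the finger-table routing rule from the lookup slides so the hop count can be checked on a small ring. The Node and build_ring names, the global-view construction, and the 6-bit identifier space are assumptions made for the sketch; real Chord nodes learn their fingers through lookups rather than from a global view.

    M = 6                              # identifier bits: a tiny 2^6 = 64-position ring
    RING = 2 ** M

    def in_interval(x, a, b):
        """True if x lies in the circular interval (a, b]."""
        if a < b:
            return a < x <= b
        return x > a or x <= b         # the interval wraps past 0

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = None
            self.finger = []           # finger[i] = first node >= id + 2^i

        def find_successor(self, key, hops=0):
            # If the key falls between us and our successor, the successor owns it.
            if in_interval(key, self.id, self.successor.id):
                return self.successor, hops + 1
            # Otherwise forward to the closest preceding finger ("closest finger <= k").
            return self.closest_preceding_node(key).find_successor(key, hops + 1)

        def closest_preceding_node(self, key):
            for f in reversed(self.finger):            # farthest finger first
                if in_interval(f.id, self.id, key) and f.id != key:
                    return f
            return self.successor

    def build_ring(ids):
        """Wire up successors and fingers from a global view (simulation shortcut)."""
        ids = sorted(ids)
        nodes = {i: Node(i) for i in ids}
        for pos, i in enumerate(ids):
            n = nodes[i]
            n.successor = nodes[ids[(pos + 1) % len(ids)]]
            for k in range(M):
                start = (i + 2 ** k) % RING
                n.finger.append(nodes[min((j for j in ids if j >= start), default=ids[0])])
        return nodes

    ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
    owner, hops = ring[8].find_successor(54)
    print(f"key 54 is stored at N{owner.id} after {hops} hops")   # N56, after a handful of hops

On larger random rings the measured hop count grows roughly logarithmically with the number of nodes, matching the claim on the next slide.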
(Outline recap)

Chord: Performance
• Routing table size?
  – log N fingers
• Routing time?
  – Each hop is expected to halve the distance to the desired id => expect O(log N) hops
• Node joins
  – Query for the fingers => O(log N)
  – Update other nodes' fingers => O(log² N)

Chord: Performance (2)
• Real time: [figure: lookup time vs. number of nodes]

Chord: Performance (3)
• [Figure: comparison with other DHTs]

Chord: Performance (4)
• Promises few (O(log N)) hops on the overlay
  – But on the physical network, each hop can be quite far
[Figure: a Chord network with N = 8 nodes and an m = 8-bit key space]

(Outline recap)

Applications employing DHTs
• eMule (KAD implements Kademlia, a DHT)
• An anonymous network (≥ 2 million downloads to date)
• BitTorrent (≥ 4.1.2 beta)
  – Trackerless BitTorrent, allows anonymity (thank god)
  1. Clients A & B handshake
  2. A: "I have a DHT, it's on port X"
  3. B pings port X of A
  4. B gets a reply => starts adjusting its nodes, rows…

Kademlia (KAD)
• The distance between A and B is A XOR B
• Nodes are treated as leaves in a binary tree
• A node's position in A's tree is determined by the longest prefix it shares with A
  – A's ID: 010010101
  – B's ID: 101000101

Kademlia: Prefix Tree
• A node's position in A's tree is determined by the longest prefix it shares with A (=> log N subtrees)
[Figure: the space of 160-bit numbers as a binary tree; relative to our node, the subtrees share no common prefix, the prefix 0, the prefix 00, the prefix 001, …]

Kademlia: Lookup
• Consider a query for ID 111010… initiated by node 0011100…
[Figure: the binary ID tree, showing the nodes contacted during the lookup]

Kademlia: K-Buckets
• Consider the routing table of a node with prefix 0011
• Its binary tree is divided into a series of subtrees
• The routing table is composed of one k-bucket per subtree
• In a 2-bucket example (k = 2), each bucket holds at least 2 contacts for its key range
• A contact consists of <IP:Port, NodeID>
[Figure: the binary ID tree divided into the subtrees covered by each k-bucket]

Summary
(Outline recap)

Homework
• Load balance is achieved when all servers in the Chord network are responsible for (roughly) the same number of keys
• Still, with some probability, one server can end up responsible for significantly more keys than the others
• How can we lower the upper bound on the number of keys assigned to a single server?
• Hint: Simulation
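As a possible starting point for the suggested simulation, the sketch below hashes servers and keys onto a ring, assigns each key to its successor, and measures how skewed the resulting assignment is. The helper names and the SHA-1 / 2^32 identifier space are assumptions made for illustration, not part of the assignment.

    import hashlib
    from bisect import bisect_left

    RING = 2 ** 32                      # assumed identifier space for the experiment

    def ring_id(name: str) -> int:
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

    def max_and_avg_load(num_servers: int, num_keys: int):
        """Assign each key to its successor server and report max vs. average load."""
        servers = sorted(ring_id(f"server-{i}") for i in range(num_servers))
        counts = [0] * num_servers
        for k in range(num_keys):
            pos = ring_id(f"key-{k}")
            idx = bisect_left(servers, pos) % num_servers   # successor, wrapping past 0
            counts[idx] += 1
        return max(counts), num_keys / num_servers

    worst, avg = max_and_avg_load(num_servers=100, num_keys=100_000)
    print(f"max load {worst} vs. average {avg:.0f}")   # the max is typically several times the average

Re-running this harness while varying how servers are placed on the ring is one way to explore how the maximum load can be brought closer to the average.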