File Sharing : Hash/Lookup
Yossi Shasho
(HW in last slide)
• Based on Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
• Partially based on The Impact of DHT Routing Geometry on Resilience and Proximity
• Partially based on Building a Low-latency, Proximity-aware DHT-Based P2P Network
http://www.computer.org/portal/web/csdl/doi/10.1109/KSE.2009.49
• Some slides liberally borrowed from:
• Carnegie Mellon Peer-2-Peer 15-441
• Petar Maymounkov and David Mazières’ Kademlia Talk, New York University
Peer-2-Peer
– Distributed systems without any centralized control or hierarchical organization.
– Long list of applications:
  • Redundant storage
  • Permanence
  • Selection of nearby servers
  • Anonymity, search, authentication, hierarchical naming, and more
– Core operation in most p2p systems is efficient location of data items
Outline
1. The Problem
2. Distributed hash tables (DHT)
3. Chord: a DHT scheme
  • Geometry
  • Lookup
  • Node Joins
  • Performance
4. Extras
Think Big
• /home/google/
• One namespace, thousands of servers
  – Map each key (=filename) to a value (=server)
  – Hash table? Think again
• What if a new server joins? A server fails?
• How to keep track of all servers?
• What about redundancy? And proximity?
• Not scalable, centralized, fault intolerant
• Lots of new problems to come up…
DHT: Overview
• Abstraction: a distributed “hash-table” (DHT) data structure:
  – put(id, item);
  – item = get(id);
• Scalable, Decentralized, Fault Tolerant
• Implementation: nodes in the system form a distributed data structure
  – Can be a Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
DHT: Overview (2)
• Many DHTs:
DHT: Overview (3)
• Good properties:
– Distributed construction/maintenance
– Load-balanced with uniform identifiers
– O(log n) hops / neighbors per node
– Provides underlying network proximity
Consistent Hashing
• When adding rows (servers) to the hash table, we don’t want all keys to change their mappings
• When adding the Nth row, we want ~1/N of the keys to change their mappings.
• Is this achievable? Yes.
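To see why the naive approach fails this test, here is a small Python sketch (mine, not from the slides; the server count and key names are arbitrary) that places keys with hash(key) mod N and counts how many keys move when one server is added:

import hashlib

def bucket(key: str, num_servers: int) -> int:
    # Naive placement: hash the key, then take it modulo the number of servers.
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % num_servers

keys = [f"/home/google/file{i}" for i in range(10_000)]
before = {k: bucket(k, 10) for k in keys}   # 10 servers
after = {k: bucket(k, 11) for k in keys}    # an 11th server joins
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys changed servers")   # ~10/11 of them, not ~1/11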
Chord: Overview
• Just one operation: item = get(id)
• Each node needs routing info about only a few other nodes
• O(log N) for lookup, O(log² N) for join/leave
• Simple, provable correctness, provable performance
• Apps built on top of it do the rest
Chord: Geometry
• Identifier space [1, N], example: binary strings
• Keys (filenames) and values (server IPs) live on the same identifier space
• Keys & values evenly distributed
• Now, put this identifier space on a circle
• Consistent Hashing: a key is stored at its successor.
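A minimal sketch of the successor rule (my own illustration; the 2^32 identifier space and SHA-1 hash are assumptions, not taken from the slides):

import bisect
import hashlib

M = 2 ** 32   # assumed size of the circular identifier space

def ident(x: str) -> int:
    # Hash keys (filenames) and values (server addresses) onto the same ID space.
    return int(hashlib.sha1(x.encode()).hexdigest(), 16) % M

def successor(key_id: int, node_ids: list) -> int:
    # A key is stored at its successor: the first node ID >= the key, wrapping around.
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

nodes = sorted(ident(f"server-{i}") for i in range(50))
print(successor(ident("/home/google/somefile"), nodes))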
Chord: Geometry (2)
• A key is stored at its successor: the node with the next higher ID
• Get(5) = 32
• Get(20) = 32
• Get(80) = 90
• Who maps to 105? Nobody.
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80]
Chord: Back to Consistent Hashing
• “When adding the Nth row, we want ~1/N of the keys to change their mappings.” (The problem, a few slides back)
• Nodes N15 and N50 join the ring:
• Get(5) = 32 → 15 (the only mapping shown that changes)
• Get(20) = 32
• Get(80) = 90
• Who maps to 105? Nobody.
[Figure: circular ID space with nodes N15, N32, N50, N90, N105 and keys K5, K20, K80]
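The ~1/N claim is easy to check; a small sketch (again mine, with arbitrary sizes) that adds one node to a consistent-hashing ring and counts the remapped keys:

import bisect, hashlib

M = 2 ** 32

def ident(x: str) -> int:
    return int(hashlib.sha1(x.encode()).hexdigest(), 16) % M

def successor(key_id, node_ids):
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

keys = [ident(f"key-{i}") for i in range(100_000)]
nodes = sorted(ident(f"server-{i}") for i in range(50))   # N = 50 nodes
bigger = sorted(nodes + [ident("server-new")])            # the 51st node joins

moved = sum(successor(k, nodes) != successor(k, bigger) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")           # close to 1/51, about 2%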
Chord: Basic Lookup
get(k):
  If (I have k)
    Return “ME”
  Else
    P ← next node
    Return P.get(k)
• Each node remembers only the next node
• O(N) lookup time – no good!
[Figure: ring with N10, N32, N60, N90, N105, N120; “Where is key 80?” travels node by node until “N90 has K80”]
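A runnable version of this pseudocode might look like the sketch below (mine; the Node class and ring-interval helper are assumptions used only for illustration):

def in_interval(x, a, b):
    # True if x lies in the half-open ring interval (a, b], wrapping past zero.
    return (a < x <= b) if a < b else (x > a or x <= b)

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self   # set properly once the ring is built

    def find_successor(self, key_id):
        # If the key falls between me and my successor, my successor owns it.
        if in_interval(key_id, self.id, self.successor.id):
            return self.successor
        # Otherwise hand the query to the next node: O(N) hops in the worst case.
        return self.successor.find_successor(key_id)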
Chord: “Finger Table”
• Previous lookup was O(N). We want O(log N)
Finger Table of N80:
  i   id+2^i        succ
  0   80+2^0 = 81   __
  1   80+2^1 = 82   __
  2   80+2^2 = 84   __
• Entry i in the finger table of node n is the first node n′ such that n′ ≥ n + 2^i (mod 2^m)
• In other words, the ith finger of n points 1/2^(m−i) of the way around the ring (m = number of bits in the ID space)
[Figure: N80’s fingers spanning 1/2, 1/4, 1/8, …, 1/128 of the ring]
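A sketch of how the table above is computed (mine; the node IDs and the 7-bit space are arbitrary example values):

import bisect

def build_fingers(n, node_ids, m):
    # finger[i] = first node whose ID is >= n + 2^i (mod 2^m)
    ids = sorted(node_ids)
    fingers = []
    for i in range(m):
        start = (n + 2 ** i) % (2 ** m)
        j = bisect.bisect_left(ids, start)
        fingers.append(ids[j % len(ids)])   # wrap around the ring
    return fingers

print(build_fingers(80, [10, 32, 60, 80, 90, 105, 120], m=7))
# -> [90, 90, 90, 90, 105, 120, 32]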
Chord: “Finger Table” Lookups
get(k):
  If (I have k)
    Return “ME”
  Else
    P ← closest finger i ≤ k
    Return P.get(k)
[Finger table of N80, ring figure and finger definition as on the previous slide]
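A sketch of the finger-based lookup (mine, extending the Node sketch from the basic lookup above): each hop jumps to the closest finger that does not pass the key, which is what gives the O(log N) bound.

def in_interval(x, a, b):
    return (a < x <= b) if a < b else (x > a or x <= b)

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.fingers = []   # finger[i] = successor(id + 2^i), filled in at join time

    def find_successor(self, key_id):
        if in_interval(key_id, self.id, self.successor.id):
            return self.successor
        return self.closest_preceding_finger(key_id).find_successor(key_id)

    def closest_preceding_finger(self, key_id):
        # Highest finger strictly between me and the key; fall back to the successor.
        for f in reversed(self.fingers):
            if f.id != key_id and in_interval(f.id, self.id, key_id):
                return f
        return self.successor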
Chord: “Finger Table” Lookups (example)
get(k):
  If (I have k)
    Return “ME”
  Else
    P ← closest finger i ≤ k
    Return P.get(k)
Finger Table of N65 (partial):  65+2^0 = 66 → N74,  65+2^1 = 67 → N74,  …
Finger Table of N19 (partial):  19+2^0 = 20 → N31,  19+2^1 = 21 → N31,  19+2^4 = 35 → N49
[Figure: ring with nodes N2, N9, N19, N31, N49, N65, N74, N81, N90; “Where is key 40?” is forwarded along fingers until it reaches K40’s successor N49: “40!”]
Chord: Example
• Assume an identifier space [0..8) (3-bit IDs 0–7)
• Node n1 joins
• Responsible for all keys
• (Succ == successor)
Succ. Table of n1:
  i   id+2^i      succ
  0   1+2^0 = 2   1
  1   1+2^1 = 3   1
  2   1+2^2 = 5   1
[Figure: ring with positions 0–7 and node n1]
Chord: Example
• Node n2 joins
Succ. Table of n1:
  i   id+2^i      succ
  0   1+2^0 = 2   2  (was 1)
  1   1+2^1 = 3   1
  2   1+2^2 = 5   1
Succ. Table of n2:
  i   id+2^i      succ
  0   2+2^0 = 3   1
  1   2+2^1 = 4   1
  2   2+2^2 = 6   1
[Figure: ring with positions 0–7, nodes n1 and n2]
Chord: Example
• Nodes n0, n6 join
Succ. Table of n0:
  i   id+2^i              succ
  0   0+2^0 = 1           1
  1   0+2^1 = 2           2
  2   0+2^2 = 4           6
Succ. Table of n1:
  i   id+2^i              succ
  0   1+2^0 = 2           2
  1   1+2^1 = 3           6  (was 1)
  2   1+2^2 = 5           6  (was 1)
Succ. Table of n2:
  i   id+2^i              succ
  0   2+2^0 = 3           6  (was 1)
  1   2+2^1 = 4           6  (was 1)
  2   2+2^2 = 6           6  (was 1)
Succ. Table of n6:
  i   id+2^i              succ
  0   6+2^0 = 7           0
  1   6+2^1 = 0 (mod 8)   0
  2   6+2^2 = 2 (mod 8)   2
[Figure: ring with positions 0–7, nodes n0, n1, n2, n6]
Chord: Example
• Nodes: n1, n2, n0, n6
• Items: 1, 7
• Each item is stored at its successor: item 1 at n1, item 7 at n0 (wrapping around)
Succ. Table of n0 (Items: 7):  1 → 1,  2 → 2,  4 → 6
Succ. Table of n1 (Items: 1):  2 → 2,  3 → 6,  5 → 6
Succ. Table of n2:             3 → 6,  4 → 6,  6 → 6
Succ. Table of n6:             7 → 0,  0 → 0,  2 → 2
[Figure: ring with positions 0–7, nodes and items as above]
Chord: Routing
Upon receiving a query for item id, a node:
1. Checks if it stores the item locally
2. If not, forwards the query to the largest node i in its finger table such that i ≤ id
Example: query(7)
• Succ. tables and items as on the previous slide
[Figure: ring with positions 0–7; query(7) is forwarded along fingers until it reaches n0, which stores item 7]
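The succ. tables in this example can be reproduced mechanically; a short sketch (mine) for the 3-bit space with nodes {0, 1, 2, 6} and items {1, 7}:

import bisect

M = 3
NODES = [0, 1, 2, 6]
ITEMS = [1, 7]

def succ(x):
    # First node clockwise from x, wrapping around the 2^M ring.
    i = bisect.bisect_left(NODES, x % 2 ** M)
    return NODES[i % len(NODES)]

for n in NODES:
    targets = [(n + 2 ** i) % 2 ** M for i in range(M)]
    print(f"n{n}:", ", ".join(f"{t}->{succ(t)}" for t in targets))
for k in ITEMS:
    print(f"item {k} is stored at n{succ(k)}")
# n0: 1->1, 2->2, 4->6 ... item 7 is stored at n0 (the successor of 7, wrapping around)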
Chord: Node Join
Node n joins:
• Needs one existing node, n', in hand
1. Initialize fingers of n
  – Ask n' to look them up (log N fingers to init)
2. Update fingers of the rest
  – Only a few nodes need to be updated
  – Look them up and tell them n is new in town
3. Transfer keys
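A simplified, self-contained sketch of these three steps (mine; it recomputes tables from a global view to stay short, whereas real Chord does each step with O(log N) messages through n'):

import bisect

M_BITS = 7
RING = 2 ** M_BITS

def successor_id(x, ids):
    i = bisect.bisect_left(ids, x % RING)
    return ids[i % len(ids)]

def fingers(n, ids):
    return [successor_id(n + 2 ** i, ids) for i in range(M_BITS)]

def join(new_id, node_ids, keys_of_successor):
    ids = sorted(node_ids + [new_id])
    # 1. Initialize the fingers of the new node (real Chord asks n' to look them up).
    my_fingers = fingers(new_id, ids)
    # 2. Update fingers of the rest (here recomputed for everyone; only O(log N) nodes really change).
    others = {n: fingers(n, ids) for n in node_ids}
    # 3. Transfer keys: take from the successor the keys that now map to new_id.
    moved = [k for k in keys_of_successor if successor_id(k, ids) == new_id]
    return my_fingers, others, moved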
Chord: Improvements
• Every 30s, ask your successor for its predecessor
  – Fix your own successor based on the answer
• Also, pick and verify a random finger
  – Rebuild finger table entries this way
• Keep a successor list of r successors
  – Deals with unexpected node failures
  – Can use these to replicate data
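A sketch of that periodic maintenance (mine; it assumes Node objects with id, successor, predecessor, fingers and the find_successor() from the lookup sketch above):

import random

def in_open(x, a, b):
    # True if x lies strictly between a and b on the ring.
    return (a < x < b) if a < b else (x > a or x < b)

def stabilize(node):
    # Every ~30s: ask the successor for its predecessor; if someone joined in
    # between, adopt it as the new successor.
    x = node.successor.predecessor
    if x is not None and in_open(x.id, node.id, node.successor.id):
        node.successor = x
    # ... followed in full Chord by notifying the successor that we precede it

def fix_random_finger(node, m):
    # Pick and re-verify one random finger per round.
    i = random.randrange(m)
    node.fingers[i] = node.find_successor((node.id + 2 ** i) % (2 ** m))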
Chord: Performance
• Routing table size?
  – log N fingers
• Routing time?
  – Each hop is expected to halve the distance to the desired id => expect O(log N) hops
    (e.g. about 20 hops for N ≈ 10^6 nodes)
• Node joins
  – Query for the fingers => O(log N)
  – Update other nodes’ fingers => O(log² N)
Chord: Performance (2)
• Measured lookup time vs. number of nodes [graph]
Chord: Performance (3)
• Comparison to other DHTs [table]
Chord: Performance (4)
• Promises a small number of hops, O(log N), on the overlay
  – But on the physical network, each hop can be quite far
[Figure: a Chord network with N=8 nodes and an m=8-bit key space]
Applications employing DHTs
• eMule (KAD implements Kademlia, a DHT)
• An anonymous network (≥ 2 million downloads to date)
• BitTorrent (≥ 4.1.2 beta)
  – Trackerless BitTorrent, allows anonymity (thank god)
  1. Clients A & B handshake
  2. A: “I have DHT, it’s on port X”
  3. B: pings port X of A
  4. B gets a reply => starts adjusting nodes, rows…
Kademlia (KAD)
• Distance between A and B is A XOR B
• Nodes are treated as leaves in a binary tree
• A node’s position in A’s tree is determined by the longest prefix it shares with A
  – A’s ID: 010010101
  – B’s ID: 101000101
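A tiny sketch of the XOR metric using the two IDs from the slide (the 9-bit width is just what those example IDs have):

A = 0b010010101   # A's ID from the slide
B = 0b101000101   # B's ID from the slide

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def common_prefix_len(a: int, b: int, bits: int = 9) -> int:
    # Leading bits shared by a and b; the first differing bit fixes B's subtree in A's tree.
    return bits - xor_distance(a, b).bit_length()

print(bin(xor_distance(A, B)))    # 0b111010000
print(common_prefix_len(A, B))    # 0 -> B sits in the subtree sharing no prefix with A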
Kademlia: Prefix Tree
• A node’s position in A’s tree is determined by the longest prefix it shares with A (=> log N subtrees)
[Figure: the space of 160-bit numbers (00…00 to 11…11) drawn as a binary tree; relative to our node, the other peers fall into subtrees sharing no common prefix, common prefix 0, common prefix 00, and common prefix 001]
Kademlia: Lookup
• Consider a query for ID 111010… initiated by node 0011100…
[Figure: the lookup hops between subtrees, each step reaching a node whose ID shares a longer prefix with the target]
Kademlia: K-Buckets
• Consider the routing table for a node with prefix 0011
• Its binary tree is divided into a series of subtrees
• The routing table is composed of k-buckets, one corresponding to each of these subtrees
• Consider a 2-bucket example: each bucket will have at least 2 contacts for its key range
• A contact consists of <IP:Port, NodeID>
[Figure: the 160-bit ID space as a binary tree around our node, with one k-bucket covering each subtree]
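A sketch of choosing the bucket for a contact (mine; 160-bit IDs as in the figure and k = 2 as in the slide's example):

K = 2          # bucket capacity from the 2-bucket example
ID_BITS = 160

def bucket_index(my_id: int, other_id: int) -> int:
    # Bucket i holds contacts at XOR distance in [2^i, 2^(i+1)),
    # i.e. nodes sharing exactly ID_BITS - 1 - i leading bits with us.
    distance = my_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not bucket itself")
    return distance.bit_length() - 1

buckets = [[] for _ in range(ID_BITS)]   # each entry: a list of (ip_port, node_id) contacts

def add_contact(my_id: int, ip_port: str, node_id: int) -> None:
    b = buckets[bucket_index(my_id, node_id)]
    if len(b) < K:
        b.append((ip_port, node_id))
    # otherwise Kademlia pings the least-recently-seen contact before deciding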
Summary
1. The Problem
2. Distributed hash tables (DHT)
3. Chord: a DHT scheme
  • Geometry
  • Lookup
  • Node Joins
  • Performance
4. Extras
Homework
• Load balance is achieved when all servers in the Chord network are responsible for (roughly) the same number of keys
• Still, with some probability, one server can be responsible for significantly more keys
• How can we lower the upper bound on the number of keys assigned to a server?
• Hint: simulation
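A possible starting point for the hinted simulation (my own sketch: random SHA-1 IDs, 100 servers, 100,000 keys; it only measures the imbalance, the improvement is the exercise):

import bisect, hashlib
from collections import Counter

M = 2 ** 32

def ident(x: str) -> int:
    return int(hashlib.sha1(x.encode()).hexdigest(), 16) % M

def successor(key_id, ids):
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]

servers = sorted(ident(f"server-{i}") for i in range(100))
load = Counter(successor(ident(f"key-{i}"), servers) for i in range(100_000))

average = 100_000 / len(servers)
print("max keys on one server:", max(load.values()))
print("max / average:", max(load.values()) / average)   # typically a few times above 1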