Distributed Hash Tables

Abdo Achkar
11-22-05, Villanova University
Overview
• Intro to Hash tables
• Distributed Hash tables
– IDA encoding
– Chord protocol
– DHash API
Hash tables
• Definition:
– An array of pointers to linked lists
– A hash function that maps each key to an array index
Hash Tables: The data structure
• Array of pointers to linked lists of a type T where
T is the type of the data structure that contains
both the key and data.
[Figure: an array of pointers (*), each pointing to a linked list of <Key, Data> nodes]
T=typeof(<Key,Data>)
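The structure described above can be sketched as follows. This is a minimal illustration, not the presenter's code: the class name `ChainedTable`, the choice of strings for both key and data, and the use of `std::vector<std::list<T>>` in place of a raw array of pointers are all assumptions made for brevity. The table size must be greater than zero.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration: T is the <Key, Data> pair from the slide.
using T = std::pair<std::string, std::string>;

class ChainedTable {
public:
    // tableSize is the number of buckets and must be greater than zero.
    explicit ChainedTable(std::size_t tableSize) : buckets_(tableSize) {}

    // Insert a new pair, or overwrite the data if the key is already present.
    void put(const std::string& key, const std::string& data) {
        std::list<T>& chain = buckets_[index(key)];
        for (T& entry : chain) {
            if (entry.first == key) { entry.second = data; return; }
        }
        chain.emplace_back(key, data);
    }

    // Return a pointer to the stored data, or nullptr if the key is absent.
    const std::string* get(const std::string& key) const {
        const std::list<T>& chain = buckets_[index(key)];
        for (const T& entry : chain) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    // Additive hash, reduced modulo the number of buckets.
    std::size_t index(const std::string& key) const {
        std::size_t sum = 0;
        for (unsigned char c : key) sum = (sum + c) % buckets_.size();
        return sum;
    }

    std::vector<std::list<T>> buckets_;
};
```

Keys that hash to the same index, such as "ab" and "ba" under an additive hash, simply share one linked list.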
Hash Tables: The hash function
• Takes some data as input, and returns an integer
based on the data.
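• Ex:
– int hash(const char* data)
{
    // _tableSize (the number of buckets) is assumed to be defined elsewhere
    int sum = 0;
    // add up the character codes, keeping the sum inside the table
    for (int i = 0; data[i] != '\0'; i++)
        sum = (sum + (unsigned char)data[i]) % _tableSize;
    return sum;
}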
Benefits of Hash tables
• Average lookup time of O(1)
• Easy to implement (C++ source)
• Improves performance drastically
when working with files.
Distributed Hash Tables
• Definition: A hash table whose keys and
data are stored across many nodes in a network.
[Figure: Node 0 and Node 1, with keys and fragments of data]
Why is DHash important?
• Load Balance
• Decentralization
• Scalability
• Availability
IDA (Information Dispersal Algorithm)
• Splits a block of data of size s into f
fragments, each of size s/k.
• Any k distinct fragments are sufficient
to reconstruct the original block.
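The "any k of f" property can be illustrated with a toy dispersal code. This is a sketch under stated assumptions, not the encoder DHash uses: it works over the small prime field of integers modulo 257 (so a symbol may take the value 256 and does not fit in one byte), it encodes a single group of k bytes rather than a whole block, and the names `encode`, `decode`, `modpow` and `inverse` are hypothetical. Fragment i is the polynomial with the data bytes as coefficients, evaluated at x = i + 1, so f must stay below 257.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

const int P = 257;  // small prime field; an assumption that keeps the arithmetic simple

// base^exp modulo P by repeated squaring.
int modpow(int base, int exp) {
    int result = 1;
    base %= P;
    while (exp > 0) {
        if (exp & 1) result = result * base % P;
        base = base * base % P;
        exp >>= 1;
    }
    return result;
}

// Multiplicative inverse modulo P (Fermat's little theorem); a must be nonzero.
int inverse(int a) { return modpow(a, P - 2); }

// Encode k data bytes into f symbols: fragment i is the row
// (1, x, x^2, ..., x^(k-1)) with x = i + 1, multiplied by the data vector.
std::vector<int> encode(const std::vector<int>& data, int f) {
    std::vector<int> frags(f);
    for (int i = 0; i < f; ++i) {
        int sum = 0;
        for (std::size_t j = 0; j < data.size(); ++j)
            sum = (sum + data[j] * modpow(i + 1, static_cast<int>(j))) % P;
        frags[i] = sum;
    }
    return frags;
}

// Decode from exactly k (fragment index, symbol) pairs with distinct indices,
// using Gauss-Jordan elimination on the k-by-k Vandermonde system modulo P.
std::vector<int> decode(const std::vector<std::pair<int, int>>& got) {
    const std::size_t k = got.size();
    std::vector<std::vector<int>> a(k, std::vector<int>(k + 1));
    for (std::size_t r = 0; r < k; ++r) {
        for (std::size_t c = 0; c < k; ++c)
            a[r][c] = modpow(got[r].first + 1, static_cast<int>(c));
        a[r][k] = got[r].second;
    }
    for (std::size_t col = 0; col < k; ++col) {
        std::size_t pivot = col;
        while (a[pivot][col] == 0) ++pivot;  // exists because the rows are independent
        std::swap(a[col], a[pivot]);
        const int inv = inverse(a[col][col]);
        for (std::size_t c = col; c <= k; ++c) a[col][c] = a[col][c] * inv % P;
        for (std::size_t r = 0; r < k; ++r) {
            if (r == col || a[r][col] == 0) continue;
            const int factor = a[r][col];
            for (std::size_t c = col; c <= k; ++c)
                a[r][c] = ((a[r][c] - factor * a[col][c]) % P + P) % P;
        }
    }
    std::vector<int> data(k);
    for (std::size_t r = 0; r < k; ++r) data[r] = a[r][k];
    return data;
}
```

With k = 3 and f = 6, for example, handing `decode` any three of the six fragments together with their indices returns the original three bytes.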
Choosing values for k and f
• k and f are selected to optimize for
8192-byte blocks.
• k=7 creates 1170-byte fragments
that can fit inside a single IP packet
when combined with RPC overhead.
• With k=7 and f=14, any 7 of the 14
fragments are enough to reconstruct a block.
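The arithmetic behind these numbers can be checked with a few helper functions. The function names are hypothetical and the sketch rounds fragment sizes up to whole bytes, which is an assumption: 8192/7 is about 1170.3, matching the 1170-byte figure above, while rounding up gives 1171.

```cpp
#include <cstddef>

// Size of one fragment when a block of blockSize bytes is split so that
// k fragments reconstruct it (rounded up to whole bytes).
constexpr std::size_t fragmentSize(std::size_t blockSize, std::size_t k) {
    return (blockSize + k - 1) / k;
}

// Total bytes stored across all f fragments of one block.
constexpr std::size_t storedBytes(std::size_t blockSize, std::size_t k, std::size_t f) {
    return f * fragmentSize(blockSize, k);
}

// How many of the f fragments may be lost while the block stays recoverable.
constexpr std::size_t tolerableLosses(std::size_t k, std::size_t f) {
    return f - k;
}
```

For an 8192-byte block with k=7 and f=14, the fragments together take roughly twice the space of the block (a factor of f/k = 2), and up to 7 of them can be unavailable without losing the block.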
Chord protocol
• Implements a hash-like lookup
operation that maps 160-bit data
keys to hosts.
• Assigns hosts identifiers from the
same 160-bit space as the keys.
• The space can be viewed as a circular
linked list sorted by identifier.
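The mapping from a key to a host on the circular identifier space can be sketched as below. Two simplifications are assumed: 64-bit integers stand in for the 160-bit identifiers, and the function sees every node identifier at once in a sorted `std::set`, whereas a real Chord node only knows part of the ring and routes the query. The name `successor` is hypothetical, and the ring must contain at least one node.

```cpp
#include <cstdint>
#include <set>

// Assumption: 64-bit identifiers stand in for Chord's 160-bit ones.
using Id = std::uint64_t;

// Hypothetical global view of the ring: all node identifiers, sorted.
// Returns the node responsible for key; the ring must not be empty.
Id successor(const std::set<Id>& ring, Id key) {
    // First node whose identifier is equal to or follows the key ...
    auto it = ring.lower_bound(key);
    // ... wrapping around to the smallest identifier past the end of the circle.
    if (it == ring.end()) it = ring.begin();
    return *it;
}
```

On a ring with nodes 10, 40 and 90, key 41 maps to node 90 and key 95 wraps around to node 10.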
Chord (cont'd)
• Each node knows the identity of its
successor (IP, Chord identifier and
synthetic coordinates)
• Updates successor list when a node
– Joins
– Exits
Chord API
Function                 Description
Get_successor_list(n)    Returns n's successor list
Lookup(k,m)              Returns at least m successors of key k
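The two calls can be mimicked on a single machine to show what they return. This is a sketch under assumptions, not the Chord implementation: identifiers are 64-bit integers, the whole ring is visible in one sorted `std::set`, the helper `walk` and the `length` parameter are hypothetical, and `lookup` returns exactly m nodes (or the whole ring if it is smaller) rather than "at least m".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using NodeId = std::uint64_t;  // assumption: stands in for a 160-bit identifier

// Walk clockwise from start, collecting up to count distinct nodes.
static std::vector<NodeId> walk(const std::set<NodeId>& ring,
                                std::set<NodeId>::const_iterator start,
                                std::size_t count) {
    std::vector<NodeId> out;
    auto it = start;
    for (std::size_t i = 0; i < std::min(count, ring.size()); ++i) {
        if (it == ring.end()) it = ring.begin();  // wrap around the circle
        out.push_back(*it++);
    }
    return out;
}

// lookup(k, m): the first m nodes at or after key k, in ring order.
std::vector<NodeId> lookup(const std::set<NodeId>& ring, NodeId k, std::size_t m) {
    return walk(ring, ring.lower_bound(k), m);
}

// get_successor_list(n): the nodes that follow node n, excluding n itself.
std::vector<NodeId> get_successor_list(const std::set<NodeId>& ring, NodeId n,
                                       std::size_t length) {
    return walk(ring, ring.upper_bound(n), std::min(length, ring.size() - 1));
}
```

On a ring with nodes 10, 40, 90 and 120, `lookup(ring, 50, 3)` yields 90, 120, 10, and `get_successor_list(ring, 90, 2)` yields 120, 10.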
HTab API
Function    Description
Put(k,b)    Stores block b under the key k, where k = SHA-1(b) (SHA-1 is a hash function)
Get(k)      Fetches and returns the block associated with the key k
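Because the key is the hash of the block, a store can check every block against its key. The sketch below shows that idea on one machine with no network. It rests on a loud assumption: `std::hash` is used as a placeholder for SHA-1 only to avoid external libraries, and it is not a cryptographic hash. The names `LocalStore` and `contentKey` are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using Block = std::string;
using Key = std::size_t;  // assumption: stands in for a 160-bit SHA-1 digest

// Placeholder for SHA-1: std::hash is NOT cryptographic and is used here
// only to keep the sketch free of external libraries.
Key contentKey(const Block& b) { return std::hash<Block>{}(b); }

class LocalStore {
public:
    // put(k, b): store b under k; reject the pair unless k is b's content hash.
    bool put(Key k, const Block& b) {
        if (k != contentKey(b)) return false;
        blocks_[k] = b;
        return true;
    }

    // get(k): return the block stored under k after re-checking its hash,
    // or nullptr if there is no such block.
    const Block* get(Key k) const {
        auto it = blocks_.find(k);
        if (it == blocks_.end() || contentKey(it->second) != k) return nullptr;
        return &it->second;
    }

private:
    std::unordered_map<Key, Block> blocks_;
};
```

A caller computes `contentKey(b)` first and passes it to `put`; anyone who later fetches the block can recompute the hash to confirm the data was not altered.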
Block Insert: put(Key k, Block b)
• void put(k, b)
{
    // place one fragment on each of the 14 successors of k
    frags[] = IDAencode(b);  // 14 fragments (f = 14)
    succs = lookup(k, 14);
    for i from 0 to 13
        send(succs[i].ipaddr, k, frags[i]);
}
Block get(k)
• Block get(k)
{
    // collect fragments from the successors
    frags = [];
    succs = lookup(k, 7); // look up at least 7 successors
    sort_by_latency(succs);
    for (i = 0; i < #succs && i < 14; i++) {
        // download fragment
        <ret, data> = download(k, succs[i]);
        if (ret == OK) frags.push(data);
        // decode fragments to recover block
        <ret, block> = IDAdecode(frags);
        if (ret == OK) return (SHA-1(block) != k) ? FAILURE : block;
        // last known successor reached: extend the list with its successors
        if (i == #succs - 1) {
            newsuccs = get_successor_list(succs[i]);
            sort_by_latency(newsuccs);
            succs.append(newsuccs);
        }
    }
    return FAILURE;
}
Questions?
References
• C++ In Action (Bartosz Milewski)
• Robust and Efficient Data Management for a Distributed
Hash Table (Josh Cates, MS thesis, MIT)
• Chord: A Scalable Peer-to-peer Lookup Service for Internet
Applications (Ion Stoica, Robert Morris, David Karger, M.
Frans Kaashoek, Hari Balakrishnan, MIT)
• Building Peer-to-Peer Systems With Chord, a Distributed
Lookup Service (Frank Dabek, Emma Brunskill, M. Frans
Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari
Balakrishnan)
• Distributed Hash Tables: Architecture and Implementation
http://www.usenix.org/events/osdi2000/full_papers/gribble/gribble_html/node4.html