
Kaleidoscope – Adding Colors to Kademlia
Gil Einziger, Roy Friedman, Eyal Kibbar
Computer Science, Technion
Kademlia Overview
Kademlia is nowadays implemented in many popular file sharing applications like BitTorrent, Gnutella, and eMule.
Applications over Kademlia have hundreds of millions of users worldwide.
Invented in 2002 by Petar Maymounkov and David Mazieres.
Kademlia is good
Kademlia has a number of desirable features not simultaneously offered by any previous DHT:
– It minimizes the number of configuration messages nodes must send to learn about each other. Easy to maintain.
– Configuration information spreads automatically as a side-effect of key lookup. Easy to maintain.
– Nodes have enough knowledge and flexibility to route queries through low-latency paths. Fast O(log N) lookups.
– Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes. Fault tolerant.
The "Problem" – one of the key advantages of Kademlia.
Many ways to reach the same value…
k possible peers to make the first step.
The first peer returns k other peers that are closer to the value.
Each one of these peers returns other, closer peers.
And so on…
Until finally we reach the k-closest nodes. These nodes store the actual value!
All roads lead to Rome…
Many users that love Fry… "I can't help you all, I am just a laptop!"
Popular content means many possible routing paths, but all of them lead to the same k closest peers.
Caching to the rescue!
Motivation: if a value is popular, we should be able to hit a cached copy before reaching the k-closest nodes.
Local Cache (LC) – After searching for an item, cache it locally (Guangmin, 2009).
KadCache (KC) – After searching for an item, send it to the last peer along the path. A suggestion of the Kademlia authors that was not evaluated until now.
The three rules of Kaleidoscope
"Everything has a color" – We assign each key an additional secondary key called its color. This color is generated by hashing the Kademlia key and has a small domain (e.g., 17 colors).
"Peers only perform lookups for keys of their own color" – If a node wants to find a (key, value) pair of a different color, it has to first forward the request to a correctly colored node.
"Only the peer that performed the lookup caches the result" – More efficient use of distributed cache content.
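To make the first rule concrete, here is a minimal sketch of how a color can be derived from a Kademlia key, assuming the color is simply the key's hash reduced modulo the number of colors; NUM_COLORS = 17 and the use of SHA-1 are illustrative assumptions, not necessarily what OpenKad does.

import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of the "everything has a color" rule: the color of a key is
// derived by hashing the Kademlia key and reducing the digest to a small
// domain of NUM_COLORS values. NUM_COLORS = 17 and SHA-1 are illustrative
// choices for this sketch only.
public class ColorSketch {
    static final int NUM_COLORS = 17;

    static int colorOf(byte[] kademliaKey) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(kademliaKey);
        // Interpret the digest as a non-negative integer and map it into [0, NUM_COLORS).
        return new BigInteger(1, digest).mod(BigInteger.valueOf(NUM_COLORS)).intValue();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] key = "futurama/fry".getBytes();
        System.out.println("color = " + colorOf(key)); // a value in 0..16
    }
}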
Kaleidoscope
Step 1: Forward the request to a correctly colored peer.
Step 2: Run an iterative lookup that favors contacting correctly colored peers.
Step 3: Cache the result and forward it back to the initiator.
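The following is a rough sketch of how these three steps fit together. Peer and the network operations (forwardTo, iterativeFindValue) are hypothetical placeholders rather than the OpenKad API; only the control flow follows the slides.

import java.util.*;

// High-level sketch of the three Kaleidoscope steps. Peer, forwardTo and
// iterativeFindValue are placeholders standing in for real node state and
// network calls; they are not the OpenKad types.
class KaleidoscopeFlowSketch {
    record Peer(int color) {}

    private final Map<Integer, String> cache = new HashMap<>();

    Optional<String> lookup(byte[] key, int keyColor, Peer self, List<Peer> kBucket) {
        // Step 1: if this node is not correctly colored, forward the request to a
        // correctly colored peer from the appropriate k-bucket when one exists.
        if (self.color() != keyColor) {
            Peer next = kBucket.stream()
                    .filter(p -> p.color() == keyColor)
                    .findFirst()
                    .orElseGet(() -> kBucket.get(0)); // otherwise keep advancing toward the key
            return forwardTo(next, key, keyColor);
        }
        // Step 2: run the iterative lookup, favoring correctly colored peers.
        Optional<String> value = iterativeFindValue(key, keyColor);
        // Step 3: cache the result here (this node matches the key's color); in the
        // real protocol it is then forwarded back to the initiator.
        value.ifPresent(v -> cache.put(Arrays.hashCode(key), v));
        return value;
    }

    // Placeholders for the actual network operations.
    private Optional<String> forwardTo(Peer next, byte[] key, int keyColor) { return Optional.empty(); }
    private Optional<String> iterativeFindValue(byte[] key, int keyColor) { return Optional.empty(); }
}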
Forward the request
(along the Kademlia lookup path)
"Looking for a value…"
Forward the request to one of the peers in your appropriate k-bucket. If there is a correctly colored peer, favor contacting that peer.
If the receiving node is not correctly colored, it will continue forwarding.
The forward phase ends when we reach a correctly colored peer (or if we cannot advance in the XOR metric).
Break symmetry using the color:
"Looking for a value…"
k possibilities, but we favor correctly colored peers.
We keep picking correctly colored peers as the next iterative step.
If there is no correctly colored peer, we can still pick any of the k peers.
We continue the lookup until we find the value. The value can be found either at the k-closest peers or in the caches of correctly colored peers.
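A small sketch of this color-biased choice of the next hop: among the candidate peers known so far, prefer the XOR-closest correctly colored one, and fall back to the XOR-closest candidate otherwise. Peer and pickNextHop are hypothetical names, not the OpenKad API.

import java.math.BigInteger;
import java.util.*;

// Sketch of the color-biased next-hop selection: favor correctly colored
// candidates, otherwise fall back to any candidate, closest first in the
// XOR metric. Assumes the candidate list is non-empty.
class NextHopSketch {
    record Peer(BigInteger id, int color) {}

    static Peer pickNextHop(List<Peer> candidates, BigInteger targetId, int keyColor) {
        Comparator<Peer> byXorDistance = Comparator.comparing(p -> p.id().xor(targetId));
        return candidates.stream()
                .filter(p -> p.color() == keyColor)   // favor correctly colored peers
                .min(byXorDistance)
                .orElseGet(() -> candidates.stream()  // no colored candidate: any peer will do
                        .min(byXorDistance)
                        .orElseThrow());
    }
}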
Forward the result
"Thanks!!!!"
The result is forwarded backwards to the initiator.
The value is cached at the node that performed the iterative lookup.
Items are only cached on nodes of matching color, making each node an 'expert' for its own color.
Also, cached content does not violate users' privacy.
More colors = better cache hit rate:
1. Higher cache capacity – items are only stored on correctly colored peers.
2. Higher cache hit rate (mathematically analyzed in the paper).
However:
1. It takes longer to reach a correctly colored peer.
2. We encounter fewer correctly colored peers during the lookup.
Numerical example
Let's assume that our value was requested before by 10% of the nodes, and that our caches are infinite.
How likely are we to hit a cached value?
With Local Cache, all nodes are symmetric: every peer we contact holds a cached copy with probability 10%.
With Kaleidoscope, we can only hit when contacting correctly colored peers. However, when we do contact them the likelihood is increased: a correctly colored peer caches on behalf of C nodes on average, so with C = 8 colors (the setting that reproduces the figure) it holds the value with probability 1 − 0.9^8 ≈ 57%, while the other peers contribute 0%.
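As a quick sanity check of these numbers, the snippet below evaluates the infinite-cache hit probabilities; p = 0.1 comes from the example above, and C = 8 colors is the assumption that yields the 57% figure.

// Sanity check of the numerical example, using the infinite-cache model from
// the backup slides: P_miss = (1 - p_v)^C, P_hit = 1 - P_miss.
public class HitRateExample {
    public static void main(String[] args) {
        double p = 0.10;  // fraction of nodes that already requested the value
        int c = 8;        // number of colors assumed in this example

        double localCacheHit = p;                        // any contacted peer: 10%
        double kaleidoscopeHit = 1 - Math.pow(1 - p, c); // correctly colored peer only

        System.out.printf("Local cache hit rate:  %.0f%%%n", 100 * localCacheHit);
        System.out.printf("Kaleidoscope hit rate: %.0f%%%n", 100 * kaleidoscopeHit); // ~57%
    }
}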
Comparative results
• Emulation – We run the actual implementation, sending and receiving actual UDP packets (only the user is simulated).
• Scale – Different network sizes, up to 2,500 Kademlia peers.
• Experimental settings – Each peer does:
• 200 requests as warm-up.
• 500 requests in the measurement interval.
(Up to 300K find-value requests in warm-up and 1.25 million requests in measurement.)
• Experiment generation – Each peer receives a file with 700 requests from the appropriate workload. All users continuously play the requests.
Wikipedia trace (Baaren & Pierre 2009)
"10% of all user requests issued to Wikipedia during the period from September 19th, 2007 to October 31st."
YouTube trace (Cheng et al., QoS 2008)
Weekly measurement of ~160k newly created videos during a period of 21 weeks.
• We directly created a synthetic distribution for each week.
Comparative results
Load is distributed more evenly than with Local Cache and KadCache.
Comparative results
Average lookup cost is reduced.
With 100 cached items, Kaleidoscope can do better than KadCache or Local Cache with 800 items!
Conclusions
Our algorithm contributes in the following ways:
Fewer messages per lookup – up to 60% reduction compared with cache-less Kademlia.
Better load distribution – using an overload protection algorithm that is not surveyed in this talk.
Reproducibility – Kaleidoscope, KadCache and Local Cache are released as part of the open source project OpenKad: https://code.google.com/p/openkad/. Feel free to use them!
In the paper…
Kaleidoscope's performance is mathematically analyzed for infinite caches.
An overload protection mechanism that helps with efficient load distribution.
In the near future
Further evaluate Kaleidoscope according to more metrics, such as latency, success rate, and privacy.
The end:
Any questions?
Thanks for listening!
Kaleidoscope
Kaleidoscope caches enjoy a higher hit rate than local cache… why?
Denote by C the number of colors and by N the number of nodes in the system. For each color there are on average N/C nodes.
These nodes perform requests for all nodes in the system, and therefore each one performs requests on behalf of C nodes on average (including itself).
Kaleidoscope
Some calculations…
Since any node performs requests of a certain color on behalf of C nodes on average, the probability of a cache miss is the same as the probability that C different nodes did not request the value in the past:
P_miss = (1 − p_v)^C
P_hit = 1 − P_miss
(*) We assume unbounded caches in this calculation.
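Plugging in the numbers from the numerical example earlier (p_v = 0.1; C = 8 colors is the assumption that reproduces the figure on that slide):
P_miss = (1 − 0.1)^8 ≈ 0.43
P_hit = 1 − P_miss ≈ 0.57
which matches the 57% hit probability shown in that example.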