Peer-to-Peer Social Networking
Peer-to-Peer Systems
Presented By: Nazanin Dehghani
Supervisor: Dr. Naser Yazdani
Peer-to-Peer Architecture
Client-server: every client is bound statically
to a specific server
Peer-to-peer: a more dynamic structure while
still being a distributed system
Peer-to-Peer Definition
“a computer network in which each computer
in the network can act as a client or server for
the other computers in the network, allowing
shared access to files and peripherals without
the need for a central server.”
Peer-to-Peer Applications
Content Distribution
Real-time
Distributed Computing
Peer-to-Peer Systems
Properties
◦ Nodes share their resources, such as
memory, bandwidth and processing power,
directly with each other
◦ P2P networks should be robust to node churn
Primitives
Common Primitives
◦ Join: how do I begin participating?
◦ Publish: how do I advertise my file?
◦ Search: how do I find a file?
◦ Fetch: how do I retrieve a file?
Architecture of P2P Systems
Overlay Network
Graph Structure
◦ Structured: aware of the topology of the overlay network
◦ Unstructured
How Did it Start?
A killer application: Napster
◦ Free music over the Internet
Key idea: share the content, storage and
bandwidth of individual (home) users
Model
Each user stores a subset of files
Each user has access to (can download) files
from all users in the system
Main Challenge
Find where a particular file is stored
[Figure: nodes A-F in an overlay; node A asks the network where file E is stored]
Other Challenges
Scale: up to hundreds of thousands or
millions of machines
Dynamicity: machines can come and go
any time
Napster
Assume a centralized index system that
maps files (songs) to machines that are
alive
How to find a file (song)
◦ Query the index system, which returns a machine
that stores the required file
Ideally this is the closest/least-loaded machine
◦ ftp the file
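A minimal sketch of the centralized-index idea (illustrative only, not Napster's actual protocol): the server maps song names to the machines that currently hold them, and peers query it before fetching directly from each other.

```python
# Illustrative sketch of a Napster-style centralized index (not the real protocol).

class CentralIndex:
    def __init__(self):
        self.index = {}                          # song name -> set of machines storing it

    def publish(self, machine, songs):
        for song in songs:
            self.index.setdefault(song, set()).add(machine)

    def unpublish(self, machine):
        # Called when a machine logs off, so dead peers are not returned.
        for holders in self.index.values():
            holders.discard(machine)

    def query(self, song):
        # Returns any machine holding the song (ideally the closest/least loaded one).
        return next(iter(self.index.get(song, set())), None)

index = CentralIndex()
index.publish("m5", ["E", "F"])
print(index.query("E"))                          # -> "m5"; the peer then fetches E from m5
```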
Advantages:
◦ Simplicity; easy to implement sophisticated
search engines on top of the index system
Disadvantages:
◦ Robustness, scalability (?)
Napster: Example
[Figure: machines m1-m6 each store some of the files A-F; the central index maps each file to the machine that stores it (e.g. E -> m5, F -> m6); a peer asks the index "E?", is told m5, and fetches E directly from m5]
Gnutella
Distributed file location
Idea: flood the request
How to find a file:
◦ Send request to all neighbors
◦ Neighbors recursively multicast the request
◦ Eventually a machine that has the file receives the
request, and it sends back the answer
Advantages:
◦ Totally decentralized, highly robust
Disadvantages:
◦ Not scalable; the entire network can be swamped
with requests (to alleviate this problem, each
request has a TTL)
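A toy sketch of TTL-limited flooding over an unstructured overlay (much simplified and synchronous; real Gnutella forwards query messages between neighbors asynchronously):

```python
# Toy sketch of Gnutella-style flooding with a TTL (not the real wire protocol).

class Node:
    def __init__(self, name, files=()):
        self.name, self.files, self.neighbors = name, set(files), []

def flood_search(start, filename, ttl):
    """Breadth-first flood from `start`; the query dies once the TTL is exhausted."""
    hits, visited, frontier = [], {start}, [start]
    while frontier and ttl >= 0:
        next_frontier = []
        for node in frontier:
            if filename in node.files:            # this node can answer the query
                hits.append(node.name)
            for nb in node.neighbors:             # forward the request to all neighbors
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier, ttl = next_frontier, ttl - 1
    return hits

a, b, c = Node("A"), Node("B"), Node("C", files={"xyz"})
a.neighbors, b.neighbors = [b], [c]
print(flood_search(a, "xyz", ttl=2))              # ['C']; with ttl=1 the query dies too early
```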
Gnutella
Queries are flooded for a bounded number of
hops
No guarantees on recall
[Figure: a query for "xyz" is flooded hop by hop until it reaches the nodes that hold "xyz"]
Distributed Hash Tables (DHTs)
Abstraction: a distributed hash-table data
structure
◦ insert(id, item);
◦ item = query(id); (or lookup(id);)
◦ Note: item can be anything: a data object,
document, file, pointer to a file…
Proposals
◦ CAN, Chord, Kademlia, Pastry, Tapestry, etc
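In other words, the interface is just a hash table whose buckets live on different nodes. A single-process sketch of that abstraction (node names here are made up; a real DHT such as Chord or Kademlia routes each request over the network to the responsible node):

```python
import hashlib

class ToyDHT:
    """Single-process stand-in for the distributed insert/lookup abstraction."""
    def __init__(self, nodes):
        self.nodes = sorted(nodes)               # e.g. ["n1", "n2", "n3"]
        self.store = {n: {} for n in self.nodes}

    def _owner(self, key):
        # Hash the key and map it to one node; a real DHT routes to this node.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def insert(self, key, item):
        self.store[self._owner(key)][key] = item

    def lookup(self, key):
        return self.store[self._owner(key)].get(key)

dht = ToyDHT(["n1", "n2", "n3"])
dht.insert("song.mp3", "pointer-to-file-on-peer-A")
print(dht.lookup("song.mp3"))
```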
DHT Design Goals
Make sure that an item (file) that exists in the
system is always found
Scales to hundreds of thousands of nodes
Handles rapid arrival and failure of nodes
Structured Networks
Distributed Hash Tables (DHTs)
Hash table interface: put(key,item), get(key)
O(log n) hops
Guarantees on recall
[Figure: each node in the overlay holds (key, item) pairs; put(K1, I1) routes item I1 to the node responsible for K1, and get(K1) routes to the same node to retrieve I1]
CHORD
Chord
In short: a peer-to-peer lookup service
Solves the problem of locating a data item in a
collection of distributed nodes, considering
frequent node arrivals and departures
Core operation in most p2p systems is
efficient location of data items
Supports just one operation: given a key, it
maps the key onto a node
Chord Characteristics
Simplicity, provable correctness, and provable
performance
Each Chord node needs routing information
about only a few other nodes
Resolves lookups via messages to other
nodes (iteratively or recursively)
Maintains routing information as nodes join
and leave the system
Napster, Gnutella etc. vs. Chord
Compared to Napster and its centralized servers,
Chord avoids single points of control or failure
through decentralization
Compared to Gnutella and its widespread use of
broadcasts, Chord avoids the scalability problem by
keeping only a small amount of routing information
at each node
Addressed Difficult Problems (1)
Load balance: distributed hash function,
spreading keys evenly over nodes
Decentralization: Chord is fully distributed, no
node is more important than any other, which
improves robustness
Scalability: lookup cost grows logarithmically with
the number of nodes in the network, so even very
large systems are feasible
Addressed Difficult Problems (2)
Availability: chord automatically adjusts its
internal tables to ensure that the node responsible
for a key can always be found
Consistent Hashing
Hash function assigns each node and key an m-bit
identifier using a base hash function such as SHA-1
ID(node) = hash(IP, Port)
ID(key) = hash(key)
Properties of consistent hashing:
Function balances load: all nodes receive roughly the same
number of keys
When the Nth node joins (or leaves) the network, only an
O(1/N) fraction of the keys are moved to a different location
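A sketch of the identifier assignment and key placement rule (the value of m, the node addresses and the file name below are illustrative):

```python
import hashlib
from bisect import bisect_left

m = 8                                            # identifier space of size 2**m (illustrative)

def chord_id(text):
    """m-bit identifier from a base hash function (SHA-1), as described above."""
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) % (2 ** m)

def successor(key_id, node_ids):
    """First node at or after key_id, clockwise on the identifier circle."""
    ring = sorted(node_ids)
    return ring[bisect_left(ring, key_id) % len(ring)]

nodes = {chord_id(f"10.0.0.{i}:5000") for i in range(1, 5)}   # ID(node) = hash(IP, Port)
key = chord_id("popeye.mp4")                                  # ID(key) = hash(key)
print(f"key {key} is stored at node {successor(key, nodes)}")
```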
Successor Nodes
[Figure: identifier circle with m = 3; nodes 0, 1 and 3; keys 1, 2 and 6.
successor(1) = 1, successor(2) = 3, successor(6) = 0]
Node Joins and Departures
[Figure: node 7 joins the circle and node 1 departs; key 6 now has successor(6) = 7 and key 1 has successor(1) = 3]
Scalable Key Location
A very small amount of routing information suffices to implement
consistent hashing in a distributed environment
Each node need only be aware of its successor node on the circle
Queries for a given identifier can be passed around the circle via
these successor pointers
Resolution scheme correct, BUT inefficient: it may require
traversing all N nodes!
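A sketch of this baseline scheme: with only successor pointers, a lookup walks around the circle one node at a time (the node ids reuse the earlier m = 3 example):

```python
# Naive Chord-style lookup using only successor pointers (O(N) hops in the worst case).

class RingNode:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = None                    # set once the ring is built

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(start, key_id):
    node = start
    while not in_interval(key_id, node.id, node.successor.id):
        node = node.successor                    # one hop per step around the circle
    return node.successor

# Tiny ring with nodes 0, 1, 3 (as in the earlier example, m = 3).
nodes = [RingNode(i) for i in (0, 1, 3)]
for a, b in zip(nodes, nodes[1:] + nodes[:1]):
    a.successor = b
print(find_successor(nodes[0], 6).id)            # key 6 -> node 0
```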
Acceleration of Lookups
Lookups are accelerated by maintaining additional routing
information
Each node maintains a routing table with (at most) m entries
(where N = 2^m) called the finger table
The ith entry in the table at node n contains the identity of the first
node, s, that succeeds n by at least 2^(i-1) on the identifier circle
(clarification on next slide)
s = successor(n + 2^(i-1)) (all arithmetic is modulo 2^m)
s is called the ith finger of node n, denoted by n.finger(i).node
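A sketch of how a node's finger table is filled in from that definition (the successor helper assumes a global view of the ring, which a real Chord node does not have):

```python
from bisect import bisect_left

def successor(key_id, ring):
    """First node id clockwise from key_id (ring is a sorted list of node ids)."""
    return ring[bisect_left(ring, key_id) % len(ring)]

def finger_table(n, node_ids, m):
    ring = sorted(node_ids)
    fingers = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % (2 ** m)    # start of the ith finger interval
        fingers.append((start, successor(start, ring)))
    return fingers

# Reproduces the example circle: m = 3, nodes 0, 1, 3.
print(finger_table(0, [0, 1, 3], 3))             # [(1, 1), (2, 3), (4, 0)]
```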
Finger Tables (1)
[Figure: identifier circle with m = 3; nodes 0, 1 and 3; keys 1, 2 and 6]

finger table of node 0 (keys: 6)
start  int.   succ.
1      [1,2)  1
2      [2,4)  3
4      [4,0)  0

finger table of node 1 (keys: 1)
start  int.   succ.
2      [2,3)  3
3      [3,5)  3
5      [5,1)  0

finger table of node 3 (keys: 2)
start  int.   succ.
4      [4,5)  0
5      [5,7)  0
7      [7,3)  0
Finger Tables (2) - characteristics
Each node stores information about only a small number of other
nodes, and knows more about nodes closely following it than
about nodes farther away
A node’s finger table generally does not contain enough
information to determine the successor of an arbitrary key k
Repeatedly querying nodes that immediately precede the given
key will eventually lead to the key's successor
Node Joins – with Finger Tables
[Figure: node 6 joins the identifier circle of nodes 0, 1 and 3; key 6 moves from node 0 to node 6]

finger table of node 6 (keys: 6)
start  int.   succ.
7      [7,0)  0
0      [0,2)  0
2      [2,6)  3

[The finger tables of nodes 0, 1 and 3 are updated wherever node 6 is now the successor, e.g. the entries with starts 4 and 5 now point to node 6]
Node Departures – with Finger Tables
[Figure: a node departs from the circle; its keys move to its successor, and every finger-table entry that pointed to the departed node is updated to the next live node on the circle]
Chord “Finger Table”
[Figure: node N80's fingers reach 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring]
• Entry i in the finger table of node n is the first
node that succeeds or equals n + 2^i
• In other words, the ith finger points 1/2^(m-i) of the
way around the ring
Chord Routing
Upon receiving a query for
item id, a node:
Checks whether it stores the
item locally
If not, forwards the query to
the largest node in its
successor table that does
not exceed id
[Figure: identifier circle with nodes 0, 1, 2 and 6; node 0 stores item 7 and node 1 stores item 1; query(7) issued at node 1 is resolved via the successor tables below]

Succ. table of node 0 (items: 7)
i  id+2^i  succ
0  1       1
1  2       2
2  4       6

Succ. table of node 1 (items: 1)
i  id+2^i  succ
0  2       2
1  3       6
2  5       6

Succ. table of node 2
i  id+2^i  succ
0  3       6
1  4       6
2  6       6

Succ. table of node 6
i  id+2^i  succ
0  7       0
1  0       0
2  2       2
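A sketch of this routing rule, wired up with the successor tables from the example above (the class and function names are illustrative, not Chord's actual pseudocode):

```python
# Sketch of Chord-style routing with finger/successor tables.

class RingNode:
    def __init__(self, node_id):
        self.id, self.successor, self.fingers = node_id, None, []

def in_between(x, a, b):
    """True if x lies strictly inside the circular interval (a, b)."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(node, key_id):
    # Stop when the key falls between this node and its successor.
    while not (in_between(key_id, node.id, node.successor.id) or key_id == node.successor.id):
        # Forward to the farthest finger that still precedes the key.
        nxt = node.successor
        for finger in node.fingers:
            if in_between(finger.id, node.id, key_id):
                nxt = finger
        node = nxt
    return node.successor

# Example ring with nodes 0, 1, 2, 6 (as in the figure); item 7 lives on node 0.
nodes = {i: RingNode(i) for i in (0, 1, 2, 6)}
succ_tables = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
for nid, table in succ_tables.items():
    nodes[nid].successor = nodes[table[0]]
    nodes[nid].fingers = [nodes[s] for s in table]
print(lookup(nodes[1], 7).id)                    # query(7) from node 1 -> node 0
```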
Node Join
Compute ID
Use an existing node to route to that ID in the
ring.
◦ Finds s = successor(id)
ask s for its predecessor, p
Splice self into ring just like a linked list
◦ p->successor = me
◦ me->successor = s
◦ me->predecessor = p
◦ s->predecessor = me
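A linked-list-style sketch of this splice (stabilization, finger updates and key transfer are omitted; RingNode here is a minimal stand-in for a real Chord node):

```python
# Sketch of splicing a new node into the ring.

class RingNode:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.predecessor = self

def join(new_node, s):
    """Splice new_node into the ring, given s = successor(new_node.id)."""
    p = s.predecessor            # ask s for its predecessor
    p.successor = new_node       # p -> me
    new_node.successor = s       # me -> s
    new_node.predecessor = p
    s.predecessor = new_node

# Two-node ring {0, 4}; node 2 joins via its successor, node 4.
n0, n4 = RingNode(0), RingNode(4)
n0.successor, n0.predecessor = n4, n4
n4.successor, n4.predecessor = n0, n0
n2 = RingNode(2)
join(n2, n4)                     # in a real system, s is found by routing to successor(2)
print(n0.successor.id, n2.successor.id)          # 2 4
```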
Chord Summary
Routing table size?
◦ Log N fingers
Routing time?
◦ Each hop is expected to halve the distance to the
desired id => expect O(log N) hops.
BITTORRENT
Fairness
What if somebody only downloads and never
uploads?
What is the policy?
◦ Incentive mechanism
Fetching Data
Once we know which node(s) have the
data we want...
Option 1: Fetch from a single peer
◦ Problem: have to fetch from a peer who has the
whole file.
Peers are not useful sources until they have
downloaded the whole file
At which point they probably log off. :)
◦ How can we fix this?
Chunk Fetching
More than one node may have the file.
How to tell?
◦ Must be able to distinguish identical files
◦ Not necessarily same filename
◦ Same filename not necessarily same file...
Use hash of file
◦ Common: MD5, SHA-1, etc.
How to fetch?
◦ Get bytes [0..8000] from A, [8001...16000] from B
◦ Alternative: Erasure Codes
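A sketch of fetching byte ranges from several peers and checking the result against the file's hash (the peer object and its get_range method are hypothetical):

```python
import hashlib

CHUNK = 8000                                     # bytes per chunk, as in the example above

def fetch_file(peers, file_sha1, total_size):
    """Fetch consecutive byte ranges from different peers, then verify the whole file."""
    data = bytearray()
    for i, start in enumerate(range(0, total_size, CHUNK)):
        end = min(start + CHUNK, total_size)
        peer = peers[i % len(peers)]             # round-robin over available peers
        data += peer.get_range(start, end)       # hypothetical peer API
    if hashlib.sha1(bytes(data)).hexdigest() != file_sha1:
        raise ValueError("hash mismatch: some peer sent corrupt data")
    return bytes(data)
```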
BitTorrent
Written by Bram Cohen (in Python) in 2001
“Pull-based” “swarming” approach
◦ Each file split into smaller pieces
◦ Nodes request desired pieces from neighbors
As opposed to parents pushing data that they receive
◦ Pieces not downloaded in sequential order
Encourages contribution by all nodes
BitTorrent
Piece Selection
◦ Rarest first
◦ Random first selection
Peer Selection
◦ Tit-for-tat
◦ Optimistic un-choking
BitTorrent Swarm
Swarm
◦ Set of peers all downloading the same file
◦ Organized as a random mesh
Each node knows list of pieces downloaded by
neighbors
Node requests pieces it does not own from
neighbors
How a node enters a swarm
for file “popeye.mp4”
File popeye.mp4.torrent is hosted at a (well-known)
webserver
The .torrent has the address of the tracker for the file
The tracker, which runs on a webserver as well, keeps
track of all peers downloading the file
[Figure, in three steps: (1) the peer fetches popeye.mp4.torrent from the webserver, e.g. www.bittorrent.com; (2) it contacts the tracker named in the .torrent; (3) the tracker introduces it to the swarm of peers downloading the file]
Contents of .torrent file
URL of tracker
Piece length – Usually 256 KB
SHA-1 hashes of each piece in file
◦ For reliability
“files” – allows download of multiple files
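A sketch of what these fields look like once the bencoded .torrent is decoded into a dictionary (the values below are made up; the "pieces" field concatenates one 20-byte SHA-1 digest per piece):

```python
# Rough shape of a decoded single-file .torrent metainfo dictionary (illustrative values).
metainfo = {
    "announce": "http://tracker.example.com:6969/announce",   # URL of the tracker
    "info": {
        "name": "popeye.mp4",
        "piece length": 256 * 1024,              # usually 256 KB
        "pieces": b"<20-byte SHA-1 of piece 0><20-byte SHA-1 of piece 1>...",
        "length": 123_456_789,                   # single-file mode; multi-file mode uses "files"
    },
}

def piece_hash(pieces: bytes, index: int) -> bytes:
    """Return the SHA-1 digest of piece `index` from the concatenated 'pieces' field."""
    return pieces[20 * index : 20 * (index + 1)]
```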
Terminology
Seed: peer with the entire file
◦ Original Seed: The first seed
Leech: peer that’s downloading the file
◦ Fairer term might have been “downloader”
Peer-to-peer transactions:
Choosing pieces to request
Rarest-first: Look at all pieces at all peers,
and request piece that’s owned by fewest
peers
◦ Increases diversity in the pieces downloaded
avoids case where a node and each of its peers have
exactly the same pieces; increases throughput
◦ Increases likelihood all pieces still available
even if original seed leaves before any one
node has downloaded entire file
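A sketch of rarest-first selection over the piece sets advertised by neighbors (the data structures are illustrative):

```python
from collections import Counter

def pick_rarest_piece(my_pieces, neighbor_piece_sets):
    """Choose the piece we still need that is owned by the fewest neighbors."""
    counts = Counter()
    for pieces in neighbor_piece_sets:           # each neighbor advertises a set of piece indices
        counts.update(pieces)
    candidates = [(count, piece) for piece, count in counts.items() if piece not in my_pieces]
    return min(candidates)[1] if candidates else None

# Example: piece 3 is owned by only one neighbor, so it is requested first.
print(pick_rarest_piece({0}, [{0, 1, 2}, {1, 2, 3}, {1, 2}]))   # -> 3
```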
Choosing pieces to request
Random First Piece:
◦ When peer starts to download, request
random piece.
So as to assemble first complete piece quickly
Then participate in uploads
◦ When first complete piece assembled, switch
to rarest-first
Tit-for-tat as incentive to upload
Want to encourage all peers to contribute
Peer A is said to choke peer B if it (A) decides not
to upload to B
Each peer (say A) unchokes at most 4 interested
peers at any time
◦ The three with the largest upload rates to A
Where the tit-for-tat comes in
◦ Another randomly chosen (Optimistic Unchoke)
To periodically look for better choices
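A sketch of this unchoke decision (real clients also re-evaluate the choices periodically, which is omitted here):

```python
import random

def choose_unchoked(interested_peers, upload_rate_to_me):
    """Unchoke the 3 peers uploading fastest to us, plus one random optimistic unchoke."""
    by_rate = sorted(interested_peers, key=lambda p: upload_rate_to_me[p], reverse=True)
    unchoked = by_rate[:3]                       # tit-for-tat: reciprocate the best uploaders
    others = [p for p in interested_peers if p not in unchoked]
    if others:
        unchoked.append(random.choice(others))   # optimistic unchoke: probe for better partners
    return unchoked

rates = {"p1": 50, "p2": 10, "p3": 80, "p4": 5, "p5": 30}
print(choose_unchoked(list(rates), rates))       # top 3 by rate + 1 random other peer
```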
Why BitTorrent took off
Better performance through “pull-based”
transfer
◦ Slow nodes don’t bog down other nodes
Allows uploading from hosts that have
downloaded parts of a file
◦ In common with other end-host based
multicast schemes
Pros and cons of BitTorrent
Pros
◦ Proficient in utilizing partially downloaded files
◦ Discourages “freeloading”
By rewarding fastest uploaders
◦ Encourages diversity through “rarest-first”
Extends lifetime of swarm
Pros and cons of BitTorrent
Cons
◦ Assumes all interested peers active at same
time; performance deteriorates if swarm
“cools off”
◦ Even worse: no trackers for obscure content
Pros and cons of BitTorrent
Dependence on centralized tracker:
pro/con?
◦ Single point of failure: New nodes can’t
enter swarm if tracker goes down
◦ Lack of a search feature
Prevents pollution attacks
Users need to resort to out-of-band search: well
known torrent-hosting sites / plain old web-search
“Trackerless” BitTorrent
To be more precise, “BitTorrent without a
centralized-tracker”
E.g.: Azureus
Uses a Distributed Hash Table (Kademlia DHT)
Tracker run by a normal end-host (not a webserver anymore)
◦ The original seeder could itself be the tracker
◦ Or have a node in the DHT randomly picked to act
as the tracker
CoolStreaming/DONet: A Data-driven Overlay Network for Peer-to-Peer Live Media Streaming
P2P Live Video Streaming
Autonomous and selfish peers
Churn
Time-sensitive and deadline-prone data
Success of P2P-Based File
Distribution
Distribute content quickly
Utilizing the capacity of all peers
Incentive mechanism
◦ Preventing peers from free-riding
Incentive mechanism == formation of
clusters of similar bandwidth peers
Inefficiency of Local Incentive
Live video streaming: quality as the incentive
File distribution: fast download as the incentive
newCoolstreaming
Provide peer-to-peer live streaming
Data-driven design
◦ Does not rely on a tree, mesh, or any other fixed
structure
◦ Data flows are guided by the availability of
data
Core operations of DONet /
CoolStreaming
DONet: Data-driven Overlay Network
CoolStreaming: Cooperative Overlay Streaming
◦ A practical DONet implementation
Every node periodically exchanges data availability
information with a set of partners
Retrieve unavailable data from one or more
partners, or supply available data to partners
The more people watching the streaming data, the
better the watching quality will be
◦ The idea is similar to BitTorrent (BT)
A generic system diagram for a DONet
node
Partnership manager
◦ Randomly selects partners
Transmission scheduler
◦ Schedules transmission of video data
Buffer map
◦ Records the availability of video blocks
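A sketch of the buffer-map exchange and pull scheduling this diagram describes (message formats and field names below are made up for illustration):

```python
# Sketch of DONet-style buffer-map exchange (illustrative, not the real protocol).

def buffer_map(buffered_segments, window_start, window_size):
    """Bitmap of which segments in the sliding window this node already has."""
    return [int(window_start + i in buffered_segments) for i in range(window_size)]

def schedule_requests(my_map, partner_maps, window_start):
    """For each missing segment, pick one partner that advertises it."""
    requests = {}
    for i, have in enumerate(my_map):
        if not have:
            for partner, pmap in partner_maps.items():
                if pmap[i]:
                    requests.setdefault(partner, []).append(window_start + i)
                    break
    return requests

print(schedule_requests([1, 0, 0], {"p1": [1, 1, 0], "p2": [0, 1, 1]}, 100))
# -> {'p1': [101], 'p2': [102]}
```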
Coolstreaming
Two types of connections between peers:
◦ Partnership relationship
◦ Parent-child relationship
Multiple sub-streams
Buffer partitioning
Push-Pull content delivering
Parent re-selection
An Example of Stream
Decomposition
A single stream of blocks with sequence numbers {1, 2, 3, ..., 13}
is decomposed into four sub-streams {S1, S2, S3, S4}:
S1: 1, 5, 9, 13, ...
S2: 2, 6, 10, ...
S3: 3, 7, 11, ...
S4: 4, 8, 12, ...
[Figure: blocks are combined and decomposed round-robin between the single stream and the sub-streams]
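A sketch of the round-robin decomposition shown above: block i goes to sub-stream ((i - 1) mod K) + 1.

```python
def decompose(blocks, k):
    """Split a single stream of numbered blocks into k round-robin sub-streams."""
    substreams = {s: [] for s in range(1, k + 1)}
    for b in blocks:
        substreams[(b - 1) % k + 1].append(b)
    return substreams

print(decompose(range(1, 14), 4))
# -> {1: [1, 5, 9, 13], 2: [2, 6, 10], 3: [3, 7, 11], 4: [4, 8, 12]}
```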
Structure of Buffer in a Node
[Figure: with K sub-streams, a node keeps a synchronization buffer per sub-stream S1 ... Sk recording the blocks received so far (e.g. d, d+1, ... for S1) and a cache buffer holding blocks d+k, d+2k, d+3k, ...; blocks not yet received are marked unavailable]
Parent Re-selection
P2P: Summary
Many different styles; remember pros and cons of
each
◦ centralized, flooding, swarming, unstructured and
structured routing
Lessons learned:
◦ Single points of failure are bad
◦ Flooding messages to everyone is bad
◦ Underlying network topology is important
◦ Need incentives to discourage freeloading
◦ Privacy and security are important
◦ Structure can provide theoretical bounds and
guarantees