Peer-to-Peer Social Networking


Peer-to-Peer Systems
Presented By: Nazanin Dehghani
Supervisor: Dr. Naser Yazdani
Peer-to-Peer Architecture
In a client–server architecture, every client is bound statically to a specific server.
Peer-to-peer offers a more dynamic structure while still being a distributed system.
Peer-to-Peer Definition

“a computer network in which each computer
in the network can act as a client or server for
the other computers in the network, allowing
shared access to files and peripherals without
the need for a central server.”
Peer-to-Peer Applications
Content Distribution
Real-time
Distributed Computing
Peer-to-Peer Systems

Properties
◦ Nodes have to share their resources, such as memory, bandwidth and processing power, directly
◦ P2P networks should be robust to node churn
Primitives

Common Primitives
◦ Join: how do I begin participating?
◦ Publish: how do I advertise my file?
◦ Search: how do I find a file?
◦ Fetch: how do I retrieve a file?
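The four primitives can be sketched as a single interface; this is a hypothetical illustration in Python (the class and method names are assumptions, not from any real library):

```python
# Hypothetical sketch of the four common P2P primitives as one interface.
# Class and method names are illustrative only.
from abc import ABC, abstractmethod

class PeerNode(ABC):
    @abstractmethod
    def join(self, bootstrap_addr: str) -> None:
        """Begin participating by contacting a known peer (or a central index)."""

    @abstractmethod
    def publish(self, filename: str, data: bytes) -> None:
        """Advertise a file so other peers can find it."""

    @abstractmethod
    def search(self, filename: str) -> list[str]:
        """Return addresses of peers that claim to have the file."""

    @abstractmethod
    def fetch(self, peer_addr: str, filename: str) -> bytes:
        """Retrieve the file from one of the peers returned by search()."""
```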
Architecture of P2P Systems

Overlay Network

Graph Structure
◦ Structured
 Aware of topology of overlay network
◦ Unstructured
How Did it Start?

A killer application: Napster
◦ Free music over the Internet

Key idea: share the content, storage and bandwidth of individual (home) users
Model
Each user stores a subset of files
Each user has access to (can download) files from all users in the system
Main Challenge

Find where a particular file is stored
[Figure: peers A–F connected over the network; a peer issues the query “E?” to locate the peer that stores file E.]
Other Challenges

Scale: up to hundreds of thousands or millions of machines

Dynamicity: machines can come and go at any time
Napster
Assume a centralized index system that maps files (songs) to machines that are alive

How to find a file (song):
◦ Query the index system → it returns a machine that stores the required file
 Ideally this is the closest/least-loaded machine
◦ ftp the file

Advantages:
◦ Simplicity; easy to implement sophisticated search engines on top of the index system

Disadvantages:
◦ Robustness, scalability (?)
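A toy in-memory sketch of this centralized-index model (illustrative only, not Napster's actual protocol; machine names follow the example on the next slide):

```python
# Toy sketch of a Napster-style centralized index: the server only maps
# song names to the machines storing them; the transfer itself is peer-to-peer.
class CentralIndex:
    def __init__(self):
        self.index = {}                      # song name -> set of machine addresses

    def register(self, machine: str, songs: list[str]) -> None:
        for song in songs:
            self.index.setdefault(song, set()).add(machine)

    def unregister(self, machine: str) -> None:
        for holders in self.index.values():  # forget machines that left
            holders.discard(machine)

    def query(self, song: str) -> str | None:
        holders = self.index.get(song)
        # Ideally return the closest / least-loaded machine; here: any holder.
        return next(iter(holders)) if holders else None

index = CentralIndex()
index.register("m5", ["E", "F"])
print(index.query("E"))                      # -> "m5"; the client then ftps E from m5
```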
Napster: Example
[Figure: machines m1–m6 each store some of the files A–F, and the central index maps every file to the machines holding it. A client asks the index “E?”, the index answers “m5”, and the client then downloads E directly from m5.]
Gnutella



Distribute the file location
Idea: flood the request

How to find a file:
◦ Send request to all neighbors
◦ Neighbors recursively multicast the request
◦ Eventually a machine that has the file receives the request, and it sends back the answer

Advantages:
◦ Totally decentralized, highly robust

Disadvantages:
◦ Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
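A toy sketch of flooding with a TTL over an in-memory neighbor graph (illustrative only; real Gnutella works over sockets and de-duplicates by message ID, approximated here with a visited set):

```python
# Toy sketch of Gnutella-style flooded search with a TTL.
def flood_search(nodes, start, filename, ttl=4):
    """nodes: {name: {"files": set of filenames, "neighbors": list of names}}"""
    hits, visited = [], set()
    frontier = [(start, ttl)]
    while frontier:
        name, ttl_left = frontier.pop()
        if name in visited or ttl_left < 0:
            continue                              # drop duplicates and expired requests
        visited.add(name)
        if filename in nodes[name]["files"]:
            hits.append(name)                     # in reality the answer travels back along the query path
        for nb in nodes[name]["neighbors"]:
            frontier.append((nb, ttl_left - 1))   # recursively forward with decremented TTL
    return hits

net = {
    "A": {"files": set(),   "neighbors": ["B", "C"]},
    "B": {"files": {"xyz"}, "neighbors": ["A", "C"]},
    "C": {"files": set(),   "neighbors": ["A", "B"]},
}
print(flood_search(net, "A", "xyz"))              # -> ["B"]
```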
Gnutella
Queries are flooded for a bounded number of hops
No guarantees on recall

[Figure: a query “xyz” is flooded hop by hop until it reaches nodes that hold “xyz”.]
Distributed Hash Tables (DHTs)

Abstraction: a distributed hash-table data
structure
◦ insert(id, item);
◦ item = query(id); (or lookup(id);)
◦ Note: item can be anything: a data object,
document, file, pointer to a file…

Proposals
◦ CAN, Chord, Kademlia, Pastry, Tapestry, etc
DHT Design Goals
Make sure that an identified item (file) is always found
Scales to hundreds of thousands of nodes
Handles rapid arrival and failure of nodes

Structured Networks




Distributed Hash Tables (DHTs)
Hash table interface: put(key, item), get(key)
O(log n) hops
Guarantees on recall

[Figure: put(K1, I1) routes the item I1 to the node responsible for key K1; a later get(K1) routes to the same node and returns I1.]
CHORD
Chord

In short: a peer-to-peer lookup service

Solves problem of locating a data item in a
collection of distributed nodes, considering
frequent node arrivals and departures

Core operation in most p2p systems is
efficient location of data items

Supports just one operation: given a key, it
maps the key onto a node
Chord Characteristics

Simplicity, provable correctness, and provable
performance

Each Chord node needs routing information
about only a few other nodes

Resolves lookups via messages to other
nodes (iteratively or recursively)

Maintains routing information as nodes join
and leave the system
Napster, Gnutella etc. vs. Chord

Compared to Napster and its centralized servers, Chord avoids single points of control or failure through a decentralized design

Compared to Gnutella and its widespread use of broadcasts, Chord avoids the scalability problems by keeping only a small amount of routing information per node
Addressed Difficult Problems (1)

Load balance: a distributed hash function spreads keys evenly over the nodes

Decentralization: Chord is fully distributed; no node is more important than any other, which improves robustness

Scalability: lookup cost grows logarithmically with the number of nodes, so even very large systems are feasible
Addressed Difficult Problems (2)

Availability: Chord automatically adjusts its internal tables to ensure that the node responsible for a key can always be found
Consistent Hashing


The hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1:
◦ ID(node) = hash(IP, Port)
◦ ID(key) = hash(key)

Properties of consistent hashing:
◦ The function balances load: all nodes receive roughly the same number of keys
◦ When an N-th node joins (or leaves) the network, only an O(1/N) fraction of the keys are moved to a different location
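A minimal sketch of consistent hashing on the identifier circle (illustrative Python; the IP/port strings and the toy value of m are assumptions):

```python
# Minimal consistent-hashing sketch: m-bit IDs from SHA-1, each key assigned
# to its successor node on the identifier circle. Illustrative only.
import hashlib
from bisect import bisect_left

M = 16                                            # toy identifier space of size 2**M

def chord_id(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** M)

def successor(node_ids: list[int], key_id: int) -> int:
    """First node clockwise from key_id, wrapping around the circle."""
    node_ids = sorted(node_ids)
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

nodes = [chord_id(f"10.0.0.{i}:5000") for i in range(1, 6)]   # ID(node) = hash(IP, Port)
key = chord_id("popeye.mp4")                                  # ID(key)  = hash(key)
print(f"key {key} is stored at node {successor(nodes, key)}")
```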
Successor Nodes
[Figure: identifier circle for m = 3 (identifiers 0–7) with nodes 0, 1 and 3 and keys 1, 2 and 6. Each key is stored at its successor node: successor(1) = 1, successor(2) = 3, successor(6) = 0.]
Node Joins and Departures
[Figure: on the same identifier circle, node 7 joins and node 1 departs; keys move to their new successors, so successor(6) = 7 and successor(1) = 3.]
Scalable Key Location

A very small amount of routing information suffices to implement
consistent hashing in a distributed environment

Each node need only be aware of its successor node on the circle

Queries for a given identifier can be passed around the circle via
these successor pointers

Resolution scheme correct, BUT inefficient: it may require
traversing all N nodes!
Acceleration of Lookups

Lookups are accelerated by maintaining additional routing information

Each node maintains a routing table with (at most) m entries (where N = 2^m), called the finger table

The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on next slide)

s = successor(n + 2^(i-1)) (all arithmetic mod 2^m)

s is called the i-th finger of node n, denoted by n.finger(i).node
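A short sketch of how a finger table is filled in for the m = 3 example used on the next slides (nodes 0, 1 and 3); illustrative only:

```python
# Build a node's finger table: finger[i].start = (n + 2**(i-1)) mod 2**m,
# and each entry stores successor(start) on the ring. Toy example with m = 3.
M = 3
NODES = sorted([0, 1, 3])

def successor(ident: int) -> int:
    for n in NODES:
        if n >= ident:
            return n
    return NODES[0]                 # wrap around the identifier circle

def finger_table(n: int):
    table = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % (2 ** M)
        table.append((start, successor(start)))
    return table

print(finger_table(0))              # [(1, 1), (2, 3), (4, 0)], as on the next slide
```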
Finger Tables (1)
[Figure: the m = 3 identifier circle with nodes 0, 1 and 3 and keys 1, 2 and 6; each node's finger table:]

finger table of node 0 (keys: 6): start 1, 2, 4; int. [1,2), [2,4), [4,0); succ. 1, 3, 0
finger table of node 1 (keys: 1): start 2, 3, 5; int. [2,3), [3,5), [5,1); succ. 3, 3, 0
finger table of node 3 (keys: 2): start 4, 5, 7; int. [4,5), [5,7), [7,3); succ. 0, 0, 0
Finger Tables (2) - characteristics

Each node stores information about only a small number of other
nodes, and knows more about nodes closely following it than
about nodes farther away

A node’s finger table generally does not contain enough
information to determine the successor of an arbitrary key k

Repetitive queries to nodes that immediately precede the given
key will lead to the key’s successor eventually
Node Joins – with Finger Tables
[Figure: the same m = 3 circle after node 6 joins. Key 6 moves from node 0 to node 6, and finger entries that should now point to 6 are updated:]

finger table of node 6 (keys: 6): start 7, 0, 2; int. [7,0), [0,2), [2,6); succ. 0, 0, 3
finger table of node 0 (keys: none): start 1, 2, 4; int. [1,2), [2,4), [4,0); succ. 1, 3, 6
finger table of node 1 (keys: 1): start 2, 3, 5; int. [2,3), [3,5), [5,1); succ. 3, 3, 6
finger table of node 3 (keys: 2): start 4, 5, 7; int. [4,5), [5,7), [7,3); succ. 6, 6, 0
Node Departures – with Finger Tables
[Figure: the same circle after a node departs. The departing node's keys are transferred to its successor, and any finger-table entries that pointed to the departed node are updated to point to its successor; the other entries are unchanged.]
Chord “Finger Table”
[Figure: node N80's fingers point to the nodes 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring.]

• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the i-th finger points 1/2^(n-i) of the way around the ring
Chord Routing



Upon receiving a query for item id, a node:
◦ Checks whether it stores the item locally
◦ If not, forwards the query to the largest node in its successor table that does not exceed id

[Figure: example ring where each node keeps a small successor table (entries i, id + 2^i, succ) and the items it stores; a query(7) is forwarded around the ring until it reaches the node responsible for item 7.]
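A toy, in-memory sketch of the routing rule above, using the finger tables of the m = 3 example with nodes 0, 1, 3 and 6 (hard-coded here; a real implementation would query remote nodes):

```python
# Iterative Chord-style lookup: hop to the closest preceding finger of the key
# until the key falls between a node and its immediate successor.
FINGERS = {                 # node -> finger successors for the m = 3 example
    0: [1, 3, 6],
    1: [3, 3, 6],
    3: [6, 6, 0],
    6: [0, 0, 3],
}

def in_interval(x, a, b):
    """True if x lies in (a, b] on the circular 3-bit identifier space."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(start_node: int, key: int) -> int:
    n = start_node
    while True:
        succ = FINGERS[n][0]                      # first finger = immediate successor
        if in_interval(key, n, succ):
            return succ                           # key is stored at n's successor
        nxt = succ                                # fall back to the successor...
        for f in reversed(FINGERS[n]):            # ...or jump to the closest preceding finger
            if in_interval(f, n, key) and f != key:
                nxt = f
                break
        n = nxt

print(find_successor(1, 7))                       # routes 1 -> 6, then returns node 0
```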
Node Join
Compute ID
Use an existing node to route to that ID in the ring
◦ Finds s = successor(id)
◦ Ask s for its predecessor, p
Splice self into ring, just like a linked list:
◦ p->successor = me
◦ me->successor = s
◦ me->predecessor = p
◦ s->predecessor = me
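The splice itself can be sketched as linked-list pointer updates (illustrative; a real Chord node would also transfer keys, rebuild fingers and run stabilization, and the find_successor routine is assumed to exist):

```python
# Sketch of the join splice from this slide, as successor/predecessor updates.
class Node:
    def __init__(self, ident: int):
        self.id = ident
        self.successor = self
        self.predecessor = self

def join(me: Node, existing: Node, find_successor) -> None:
    s = find_successor(existing, me.id)   # route through an existing node to my ID
    p = s.predecessor                     # ask s for its predecessor, p
    p.successor = me                      # p -> me
    me.successor = s                      # me -> s
    me.predecessor = p
    s.predecessor = me                    # splice complete, just like a linked list
```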
Chord Summary

Routing table size?
◦ Log N fingers

Routing time?
◦ Each hop is expected to halve the distance to the desired id => expect O(log N) hops
BITTORRENT
Fairness

What if somebody only downloads and never uploads?

What policy discourages this?
◦ An incentive mechanism
Fetching Data
Once we know which node(s) have the
data we want...
 Option 1: Fetch from a single peer

◦ Problem: Have to fetch from peer who has
whole file.
 Peers not useful sources until d/l whole file
 At which point they probably log off. :)
◦ How can we fix this?
Chunk Fetching
More than one node may have the file.
 How to tell?

◦ Must be able to distinguish identical files
◦ Not necessarily same filename
◦ Same filename not necessarily same file...


Use hash of file
◦ Common: MD5, SHA-1, etc.
How to fetch?
◦ Get bytes [0..8000] from A, [8001...16000] from B
◦ Alternative: Erasure Codes
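A small sketch of both ideas: identifying a file by a content hash (so identical files match regardless of filename) and planning byte-range fetches from several peers. The chunk size and peer names are taken from the slide's example:

```python
# Identify files by content hash and split the download into byte ranges.
import hashlib

CHUNK = 8000                                   # bytes per range, as in the slide

def file_id(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()      # same content => same ID, any filename

def chunk_plan(file_size: int, peers: list[str]):
    """Assign byte ranges round-robin to peers, e.g. [0..7999] from A, [8000..15999] from B."""
    plan = []
    for i, start in enumerate(range(0, file_size, CHUNK)):
        end = min(start + CHUNK, file_size) - 1
        plan.append((peers[i % len(peers)], start, end))
    return plan

print(chunk_plan(20000, ["A", "B"]))
# [('A', 0, 7999), ('B', 8000, 15999), ('A', 16000, 19999)]
```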
BitTorrent
Written by Bram Cohen (in Python) in 2001
 “Pull-based” “swarming” approach

◦ Each file split into smaller pieces
◦ Nodes request desired pieces from neighbors
 As opposed to parents pushing data that they receive
◦ Pieces not downloaded in sequential order

Encourages contribution by all nodes
BitTorrent

Piece Selection
◦ Rarest first
◦ Random first selection

Peer Selection
◦ Tit-for-tat
◦ Optimistic un-choking
BitTorrent Swarm

Swarm
◦ Set of peers all downloading the same file
◦ Organized as a random mesh
Each node knows list of pieces downloaded by
neighbors
 Node requests pieces it does not own from
neighbors

How a node enters a swarm for file “popeye.mp4”

File popeye.mp4.torrent hosted at a (well-known) webserver
The .torrent has the address of the tracker for the file
The tracker, which runs on a webserver as well, keeps track of all peers downloading the file

[Figure, built up over several slides: (1) the peer downloads popeye.mp4.torrent from www.bittorrent.com, (2) it contacts the tracker listed in the .torrent, (3) the tracker returns a list of peers and the node joins the swarm.]
Contents of .torrent file
URL of tracker
 Piece length – Usually 256 KB
 SHA-1 hashes of each piece in file

◦ For reliability

“files” – allows download of multiple files
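For reference, the metainfo layout can be sketched as the dictionary below. The field names ("announce", "info", "piece length", "pieces", "files") follow the standard .torrent format, but the concrete values are made up, and a real .torrent is bencoded rather than a Python literal:

```python
# Sketch of the fields inside a .torrent metainfo file (values are made up).
torrent = {
    "announce": "http://tracker.example.com:6969/announce",   # URL of the tracker
    "info": {
        "name": "popeye.mp4",
        "piece length": 256 * 1024,            # usually 256 KB
        # "pieces" is the concatenation of one 20-byte SHA-1 hash per piece,
        # used to verify each piece for reliability:
        "pieces": b"<20 bytes per piece>",
        # multi-file torrents list their files here (single-file torrents
        # use a top-level "length" instead):
        "files": [
            {"path": ["popeye.mp4"], "length": 700_000_000},
        ],
    },
}
```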
Terminology

Seed: peer with the entire file
◦ Original Seed: The first seed

Leech: peer that’s downloading the file
◦ Fairer term might have been “downloader”
Peer-peer transactions:
Choosing pieces to request

Rarest-first: Look at all pieces at all peers,
and request piece that’s owned by fewest
peers
◦ Increases diversity in the pieces downloaded
 avoids case where a node and each of its peers have
exactly the same pieces; increases throughput
◦ Increases likelihood all pieces still available
even if original seed leaves before any one
node has downloaded entire file
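A toy sketch of rarest-first selection over in-memory bitfields (illustrative only):

```python
# Rarest-first: among pieces we still need, request the one owned by the fewest peers.
from collections import Counter

def rarest_first(my_pieces: set[int], peer_bitfields: dict[str, set[int]]) -> int | None:
    counts = Counter()
    for pieces in peer_bitfields.values():
        for p in pieces - my_pieces:              # only count pieces we still need
            counts[p] += 1
    if not counts:
        return None                               # nothing useful to request
    return min(counts, key=counts.get)            # piece owned by the fewest peers

peers = {"A": {0, 1, 2}, "B": {1, 2, 3}, "C": {2, 3}}
print(rarest_first({3}, peers))                   # -> 0 (only peer A has piece 0)
```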
Choosing pieces to request

Random First Piece:
◦ When peer starts to download, request
random piece.
 So as to assemble first complete piece quickly
 Then participate in uploads
◦ When first complete piece assembled, switch
to rarest-first
Tit-for-tat as incentive to upload

Want to encourage all peers to contribute
Peer A is said to choke peer B if it (A) decides not to upload to B
Each peer (say A) unchokes at most 4 interested peers at any time:
◦ The three with the largest upload rates to A (this is where the tit-for-tat comes in)
◦ Another randomly chosen peer (Optimistic Unchoke), to periodically look for better choices
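A toy sketch of that unchoke rule (the "3 fastest + 1 random" numbers come from the slide; everything else is illustrative):

```python
# Tit-for-tat unchoking: keep the 3 interested peers uploading fastest to us,
# plus 1 randomly chosen peer as an optimistic unchoke.
import random

def choose_unchoked(upload_rate_to_me: dict[str, float], interested: set[str]) -> set[str]:
    by_rate = sorted(interested, key=lambda p: upload_rate_to_me.get(p, 0.0), reverse=True)
    unchoked = set(by_rate[:3])                   # reward the three fastest uploaders
    rest = [p for p in interested if p not in unchoked]
    if rest:
        unchoked.add(random.choice(rest))         # optimistic unchoke: look for better choices
    return unchoked                               # every other interested peer stays choked

rates = {"A": 50.0, "B": 10.0, "C": 80.0, "D": 5.0, "E": 0.0}
print(choose_unchoked(rates, {"A", "B", "C", "D", "E"}))   # e.g. {'C', 'A', 'B', 'E'}
```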
Why BitTorrent took off

Better performance through “pull-based”
transfer
◦ Slow nodes don’t bog down other nodes

Allows uploading from hosts that have
downloaded parts of a file
◦ In common with other end-host based
multicast schemes
Pros and cons of BitTorrent

Pros
◦ Proficient in utilizing partially downloaded files
◦ Discourages “freeloading”
 By rewarding fastest uploaders
◦ Encourages diversity through “rarest-first”
 Extends lifetime of swarm
Pros and cons of BitTorrent

Cons
◦ Assumes all interested peers active at same
time; performance deteriorates if swarm
“cools off”
◦ Even worse: no trackers for obscure content
Pros and cons of BitTorrent

Dependence on centralized tracker:
pro/con?
◦ (−) Single point of failure: new nodes can’t enter the swarm if the tracker goes down
◦ Lack of a search feature
 (+) Prevents pollution attacks
 (−) Users need to resort to out-of-band search: well-known torrent-hosting sites / plain old web-search
“Trackerless” BitTorrent
To be more precise, “BitTorrent without a centralized tracker”
 E.g.: Azureus
 Uses a Distributed Hash Table (Kademlia DHT)
 Tracker run by a normal end-host (not a webserver anymore)

◦ The original seeder could itself be the tracker
◦ Or have a node in the DHT randomly picked to act
as the tracker
CoolStreaming/DONet: A Data-driven Overlay Network for Peer-to-Peer Live Media Streaming
P2P Live Video Streaming
Autonomous and selfish peers
 Churn


Time-sensitive and deadline-prone data
Success of P2P-Based File
Distribution
Distribute content quickly
 Utilizing the capacity of all peers
 Incentive mechanism

◦ Preventing peers from free-riding

Incentive mechanism == formation of clusters of similar-bandwidth peers
Inefficiency of Local Incentive

Live Video Streaming: quality as incentive
File Distribution: fast download as incentive
newCoolstreaming

Provide peer-to-peer live streaming

Data-driven design
◦ Does not use a tree, mesh, or any other fixed structure
◦ Data flows are guided by the availability of data
Core operations of DONet /
CoolStreaming


DONet: Data-driven Overlay Network
CoolStreaming: Cooperative Overlay Streaming
◦ A practical DONet implementation



Every node periodically exchanges data availability
information with a set of partners
Retrieve unavailable data from one or more
partners, or supply available data to partners
The more people watching the streaming data, the
better the watching quality will be
◦ The idea is similar to BitTorrent (BT)
A generic system diagram for a DONet
node

Partnership manager
◦ Randomly selects partners

Transmission scheduler
◦ Schedules transmission of video data

Buffer Map
◦ Records block availability
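A buffer map can be sketched as a bit vector over a sliding window of block sequence numbers, exchanged periodically with partners; the window size and encoding below are assumptions for illustration, not the paper's wire format:

```python
# Sketch of a DONet-style buffer map and of choosing which blocks to pull.
WINDOW = 120                                        # number of blocks advertised

def buffer_map(received: set[int], window_start: int) -> list[int]:
    """1 if the block is available locally, 0 otherwise."""
    return [1 if (window_start + i) in received else 0 for i in range(WINDOW)]

def blocks_to_request(my_map: list[int], partner_map: list[int], window_start: int) -> list[int]:
    """Blocks the partner has and we are still missing."""
    return [window_start + i
            for i, (mine, theirs) in enumerate(zip(my_map, partner_map))
            if mine == 0 and theirs == 1]
```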
Coolstreaming

Two types of connections between peers:
◦ Partnership relationship
◦ Parent-child relationship
Multiple sub-streams
Buffer partitioning
Push-pull content delivery
Parent re-selection
An Example of Stream
Decomposition
A single stream of blocks with sequence numbers {1, 2, 3, …, 13} is decomposed into four sub-streams {S1, S2, S3, S4}:
S1: 1, 5, 9, 13, …
S2: 2, 6, 10, …
S3: 3, 7, 11, …
S4: 4, 8, 12, …
The source decomposes the stream and the receiving node combines the sub-streams back into the original block sequence.
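The decomposition rule can be written in a few lines: block n belongs to sub-stream ((n − 1) mod K) + 1 (illustrative code matching the example above):

```python
# Decompose one stream of blocks into K sub-streams and combine them back.
def decompose(blocks: list[int], k: int) -> dict[int, list[int]]:
    subs = {i: [] for i in range(1, k + 1)}
    for n in blocks:
        subs[((n - 1) % k) + 1].append(n)           # block n -> sub-stream ((n-1) mod k) + 1
    return subs

def combine(subs: dict[int, list[int]]) -> list[int]:
    return sorted(b for sub in subs.values() for b in sub)

subs = decompose(list(range(1, 14)), 4)
print(subs[1], subs[2])                              # [1, 5, 9, 13] [2, 6, 10]
print(combine(subs) == list(range(1, 14)))           # True
```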
Structure of Buffer in a Node
[Figure: buffer structure in a node with K sub-streams. The synchronization buffer has one row per sub-stream (S1 holds blocks d+1, d+1+K, …; SK holds blocks d+K, d+2K, d+3K, …), and a cache buffer holds the combined sequence …, 3, 2, 1; received blocks and still-unavailable blocks are marked differently.]
Parent Re-selection
P2P: Summary

Many different styles; remember pros and cons of
each
◦ centralized, flooding, swarming, unstructured and
structured routing

Lessons learned:
◦ Single points of failure are bad
◦ Flooding messages to everyone is bad
◦ Underlying network topology is important
◦ Need incentives to discourage freeloading
◦ Privacy and security are important
◦ Structure can provide theoretical bounds and guarantees