Peer-to-Peer Systems: Theory & Practice
Ashwin R. Bharambe 15744 Lecture
Overview
Internet Indirection Infrastructure (i3)
Freenet
BitTorrent
Content distribution
Effect of P2P networks on the Internet
What does the new traffic matrix look like?
4/30/2020 Ashwin R. Bharambe 2
i3: Motivation
Today’s Internet is based on a point-to-point abstraction
Applications need more:
  Multicast
  Mobility
  Anycast
Existing solutions:
  Change the IP layer
  Overlays
So, what’s the problem?
A different solution for each service
The i3 solution
Indirection
  Every problem in CS …
  Only primitive needed
Solution:
  Add an indirection layer on top of IP
  Implement using overlay networks
Solution components:
  Naming using “identifiers”
  Subscriptions using “triggers”
  DHT as the gluing substrate
i3: Rendezvous Communication
Packets are addressed to identifiers (“names”)
Trigger = (Identifier, IP address): inserted by the receiver
Senders are decoupled from receivers
[Figure: the sender calls send(ID, data); the trigger (ID, R) turns it into send(R, data), which reaches the receiver R]
i3: Service Model
API
sendPacket(id, p);
insertTrigger(id, addr);
removeTrigger(id, addr); // optional
Best-effort service model (like IP)
Triggers periodically refreshed by end-hosts
Reliability, congestion control, and flow control implemented at end-hosts
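The service model above can be sketched as a toy, single-node trigger table. This is a minimal illustration, not the i3 implementation: the class name I3Node, the ttl value, and the injectable clock are assumptions made for the example. Only the three operations come from the slide.

```python
import time
from collections import defaultdict


class I3Node:
    """Toy single-node model of i3 rendezvous (hypothetical, for illustration)."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl                      # assumed trigger lifetime in seconds
        self.clock = clock                  # injectable clock so expiry can be exercised
        self.triggers = defaultdict(dict)   # id -> {addr: expiry time}

    def insert_trigger(self, ident, addr):
        # Inserting again refreshes the expiry: triggers are soft state.
        self.triggers[ident][addr] = self.clock() + self.ttl

    def remove_trigger(self, ident, addr):
        # Optional: a trigger that is never refreshed times out anyway.
        self.triggers[ident].pop(addr, None)

    def send_packet(self, ident, payload):
        """Return the (addr, payload) deliveries; best effort, so possibly none."""
        now = self.clock()
        live = {a: t for a, t in self.triggers[ident].items() if t > now}
        self.triggers[ident] = live         # drop expired triggers
        return [(addr, payload) for addr in sorted(live)]


if __name__ == "__main__":
    now = [0.0]
    node = I3Node(ttl=30.0, clock=lambda: now[0])
    node.insert_trigger("id42", "10.0.0.1")
    node.insert_trigger("id42", "10.0.0.2")   # second receiver under the same id
    print(node.send_packet("id42", b"hello"))
    now[0] = 31.0                             # nobody refreshed the triggers
    print(node.send_packet("id42", b"hello"))
```

In this sketch, two receivers under one identifier already behave like multicast, and a host that moves only has to insert a trigger with its new address.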
i3: Implementation
Use a Distributed Hash Table
  Scalable, self-organizing, robust
  Suitable as a substrate for the Internet
[Figure: the receiver's trigger (ID, R) and the sender's send(ID, data) are both routed with DHT.put(id) to the i3 node responsible for ID; that node forwards send(R, data) to the receiver R with IP.route(R)]
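One way to picture why the trigger and the packet meet at the same node is a hash ring on which each identifier belongs to the first server at or after its position. The sketch below assumes global knowledge of all servers and uses SHA-1 as the hash. Both are simplifications chosen for the example, and the names Ring and responsible_server are hypothetical. A real DHT would find the same server through multi-hop routing.

```python
import hashlib
from bisect import bisect_left


def ring_position(name, bits=32):
    """Hash a string onto a ring of size 2**bits (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    """Maps each identifier to the first server at or after it on the ring."""

    def __init__(self, servers):
        self.points = sorted((ring_position(s), s) for s in servers)
        self.keys = [p for p, _ in self.points]

    def responsible_server(self, ident):
        i = bisect_left(self.keys, ring_position(ident))
        return self.points[i % len(self.points)][1]   # wrap around past the end


if __name__ == "__main__":
    ring = Ring(["s1", "s2", "s3"])
    # The receiver's insertTrigger and the sender's sendPacket hash the same id,
    # so both arrive at the same server.
    print(ring.responsible_server("id42"))
```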
Mobility and Multicast
Mobility supported naturally
  End-host inserts trigger with new IP address, and everything is transparent to sender
  Robust, and supports location privacy
Multicast (send to many)
  All receivers insert triggers under same ID
  Sender uses that ID for sending
  Can optimize tree construction to balance load
Anycast
Send to any one
Generalized matching
  First k bits have to match, longest prefix match among the rest
Related triggers must be on the same server
Server selection (randomize last bits)
[Figure: the sender addresses a packet to identifier (a | b); triggers (a | b1, R1), (a | b2, R2), (a | b3, R3) share the prefix a]
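The matching rule can be sketched with identifiers written as short bit strings. The function names, the string representation, and the small value of k are assumptions for illustration only.

```python
def common_prefix_len(a, b):
    """Number of leading characters on which two bit strings agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def match_trigger(packet_id, triggers, k):
    """Pick the receiver address for packet_id, or None if nothing matches.

    triggers: list of (trigger_id, addr), ids as equal-length bit strings.
    k: number of leading bits that must match exactly (assumed parameter).
    """
    candidates = [(tid, addr) for tid, addr in triggers
                  if tid[:k] == packet_id[:k]]
    if not candidates:
        return None
    # Among exact-prefix candidates, the longest match on the remaining bits wins.
    best = max(candidates,
               key=lambda t: common_prefix_len(t[0][k:], packet_id[k:]))
    return best[1]


if __name__ == "__main__":
    triggers = [("10100001", "R1"), ("10100110", "R2"), ("10101100", "R3")]
    print(match_trigger("10100111", triggers, k=4))
```

A sender that randomizes the last bits of the identifier spreads its packets across the receivers in the group, which is one reading of the server selection bullet.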
Generalization: Identifier Stack
Stack of identifiers
  i3 routes packet through these identifiers
Receiver's trigger maps id to
Service Composition
Receiver-mediated: R sets up the chain and passes id_gif/jpg to the sender; the sender is oblivious
Sender-mediated: S can include (id_gif/jpg, ID) in its packet; the receiver is oblivious
[Figure: the sender (GIF) calls send((ID_GIF/JPG, ID), data); the trigger (ID_GIF/JPG, S_GIF/JPG) delivers it to the transcoder S_GIF/JPG, which calls send(ID, data); the trigger (ID, R) forwards it as send(R, data) to the receiver R (JPG)]
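The sender-mediated case can be sketched as a loop that consumes an identifier stack. Everything in the example is a simplified model under stated assumptions: the route function, the triggers and hosts dictionaries, and the string-replacing "transcoder" are hypothetical stand-ins, and a real deployment would forward packets between machines rather than call functions.

```python
def route(stack, data, triggers, hosts):
    """Follow an identifier stack until the data reaches its final host.

    stack:    list of identifiers, head first
    triggers: dict id -> list of ids/addresses that replaces the matched id
    hosts:    dict address -> function the host applies before re-sending
    Returns (final_address or None, data, hosts visited in order).
    """
    stack = list(stack)
    path = []
    while stack:
        head = stack.pop(0)
        if head in triggers:
            stack = list(triggers[head]) + stack   # expand the matched trigger
        elif head in hosts:
            path.append(head)
            if not stack:
                return head, data, path            # nothing left: final receiver
            data = hosts[head](data)               # intermediate service processes data
        else:
            return None, data, path                # no trigger matches: packet dropped
    return None, data, path


if __name__ == "__main__":
    triggers = {"ID_GIF/JPG": ["S_GIF/JPG"], "ID": ["R"]}
    hosts = {
        "S_GIF/JPG": lambda d: d.replace(".gif", ".jpg"),   # stand-in transcoder
        "R": lambda d: d,
    }
    print(route(["ID_GIF/JPG", "ID"], "photo.gif", triggers, hosts))
```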
Public, Private Triggers
Servers publish their public ids, e.g., via DNS
Clients contact the server using public ids, and negotiate private ids used thereafter
Useful:
  Efficiency -- private ids chosen on “close-by” i3 servers
  Security -- private ids are shared secrets
Scalable Multicast
Replication possible at any i3 server in the infrastructure
Tree construction can be done internally
[Figure: a packet (g, data) is replicated through triggers (g, R1), (g, R2), (g, x), (x, R3), (x, R4) to reach receivers R1 to R4]
Evaluation
Efficiency
  Metric: latency stretch
  Sender → i3 takes many hops
  Sender → i3 → Receiver: triangle routing
Decoupling of senders and receivers
One framework for various new abstractions
Scalable, incrementally deployable
Performance overheads
  What speeds can this support?
Switch tracks…
I don’t understand any DHT stuff; it’s all unreal
All I understand is… FILE SHARING
P2P Applications
Centralized model
  e.g., Napster
  global index held by central authority
  direct contact between requestors and providers
[Figure: Napster, with a central index server]
P2P Applications
Decentralized model
  e.g., Freenet, Gnutella
  no global index – local knowledge only (approximate answers)
  contact mediated by chain of intermediaries
[Figure: KaZaA, with index servers; Freenet or Gnutella]
What is Freenet and Why?
Distributed, peer-to-peer file sharing system
Completely anonymous, for producers or consumers of information
Resistance to attempts by third parties to deny access to information
Freenet: How it works
Data structure
Key management
Problems
  How can one node know about others?
  How can it get data from remote nodes?
  How to add new nodes to Freenet?
  How does Freenet manage its data?
Data structure
Each document is associated with a “key”
Routing table: pairs
Data structure should be able to:
  rapidly find the document given a certain key
  rapidly find the closest key to a given key
  keep track of the popularity of documents and know which document to delete when under pressure
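The three requirements above can be sketched with a small per-node store. This is only an illustration: integer keys, the fixed capacity, and least-recently-used eviction as the popularity measure are all assumptions, and the class name DataStore is hypothetical.

```python
from collections import OrderedDict


class DataStore:
    """Toy per-node store with integer keys (capacity and LRU policy are assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.docs = OrderedDict()   # key -> document, least recently used first

    def put(self, key, doc):
        self.docs[key] = doc
        self.docs.move_to_end(key)
        while len(self.docs) > self.capacity:
            self.docs.popitem(last=False)   # under pressure: evict the coldest entry

    def get(self, key):
        if key not in self.docs:
            return None
        self.docs.move_to_end(key)          # a hit counts as recent use
        return self.docs[key]

    def closest_key(self, key):
        """Stored key numerically nearest to `key` (linear scan for clarity)."""
        if not self.docs:
            return None
        return min(self.docs, key=lambda k: (abs(k - key), k))


if __name__ == "__main__":
    store = DataStore(capacity=2)
    store.put(10, "a")
    store.put(20, "b")
    store.get(10)          # key 10 becomes the most recently used
    store.put(30, "c")     # key 20 is evicted
    print(list(store.docs), store.closest_key(19))
```

The linear scan in closest_key keeps the sketch short. A sorted index would be needed to make that lookup rapid on a large store.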
Key Management (1)
A way to locate a document anywhere
Keys are used to form a URI
Keyword-signed Key (KSK)
  Based on a short descriptive string, usually a set of keywords that can describe the document
  Example: University/cmu/cs/ashu
  Uniquely identify a document
  Potential problem – global namespace
Key Management (2)
Signed-subspace Key (SSK)
  Add sender information to avoid namespace conflict
  Private key to sign / public key to verify
Content-hash Key (CHK)
  Hash of the document
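The difference between the three key types can be sketched with plain hashing. This is a deliberately simplified model, not Freenet's actual key derivation: signing is left out entirely, SHA-1 is an assumed hash, and the function names are hypothetical. It only shows which inputs each key depends on.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def keyword_key(description: str) -> str:
    """Simplified KSK: depends only on the descriptive string."""
    return sha1_hex(description.encode())


def subspace_key(publisher_public_key: bytes, description: str) -> str:
    """Simplified SSK: mixes the publisher's public key into the key."""
    return sha1_hex((sha1_hex(publisher_public_key)
                     + sha1_hex(description.encode())).encode())


def content_hash_key(document: bytes) -> str:
    """CHK: depends only on the document bytes."""
    return sha1_hex(document)


if __name__ == "__main__":
    name = "University/cmu/cs/ashu"
    # Two publishers choosing the same description collide in the global namespace...
    print(keyword_key(name) == keyword_key(name))
    # ...but not once each key is tied to its publisher.
    print(subspace_key(b"alice-pub", name) == subspace_key(b"bob-pub", name))
```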
[Figure: a request routed between nodes, with callouts “A, Help me!” and “Sorry, No”]
Forward to nearest “untried” key
Perform a depth-first search
On success, return data to upstream requestor
Cache the data source
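The search steps above can be sketched as a recursive depth-first search with backtracking. The Node class, integer keys, the hops_to_live parameter, and the shared visited set are assumptions made for the example. In particular, the shared set stands in for whatever loop detection a real network would use.

```python
class Node:
    """Toy Freenet-style node (hypothetical model, integer keys)."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> data held or cached locally
        self.routes = {}   # key -> neighbouring Node believed to be near that key

    def request(self, key, hops_to_live, visited=None):
        """Return the data for `key`, or None if the search fails."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.store:
            return self.store[key]
        if hops_to_live <= 0:
            return None
        # Try neighbours in order of how close their known key is to the request.
        for known in sorted(self.routes, key=lambda k: abs(k - key)):
            nxt = self.routes[known]
            if nxt.name in visited:
                continue                   # already tried: move to the next nearest
            data = nxt.request(key, hops_to_live - 1, visited)
            if data is not None:
                self.store[key] = data     # cache on the way back upstream
                self.routes[key] = nxt     # remember where the answer came from
                return data
        return None                        # "Sorry, no": the caller backtracks


if __name__ == "__main__":
    a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
    a.routes = {47: b, 10: c}    # B looks nearest to key 48 but is a dead end
    c.routes = {48: d}
    d.store = {48: "doc"}
    print(a.request(48, hops_to_live=5), sorted(a.store), sorted(c.store))
```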
Routing algorithm characteristics
Key clustering
Data partitioning
Nodes know about keys “similar” to theirs
Store clusters of files with same keys
Popular data gets cached more
Seamless replication to avoid hot-spots
As time progresses, connectivity increases
File insertion
Query the file key
  A response means a key collision
  Re-send with a different key
On success, nodes cache the file with a pointer to the data source
Node join
Need to assign a “key” to the node
Two options:
  Existing node chooses the key
  Joining node chooses its key
What’s the problem?
Uses a bit commitment protocol
  hash(a)
  hash(b ^ hash(a))
  hash(c ^ hash(b ^ hash(a)))
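The commitment chain on the slide can be sketched as follows. Each participant contributes a random seed, and the chain commits to all seeds before any is revealed, so no single node controls the outcome. The use of SHA-256, the 32-byte seeds, and the final step of XOR-ing the revealed seeds into the node key are assumptions for the sketch. The slide itself shows only the three hash expressions.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()   # assumed hash function


def xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def build_commitments(seeds):
    """Chain from the slide: hash(a), hash(b ^ hash(a)), hash(c ^ hash(b ^ hash(a)))."""
    commitments = []
    prev = None
    for seed in seeds:
        prev = h(seed) if prev is None else h(xor(seed, prev))
        commitments.append(prev)
    return commitments


def verify_and_derive_key(seeds, commitments):
    """Check the revealed seeds against the chain, then XOR them into a key."""
    if build_commitments(seeds) != commitments:
        raise ValueError("a revealed seed does not match its commitment")
    key = bytes(len(seeds[0]))             # all-zero starting value
    for seed in seeds:
        key = xor(key, seed)
    return key


if __name__ == "__main__":
    seeds = [secrets.token_bytes(32) for _ in range(3)]   # a, b, c
    commitments = build_commitments(seeds)
    print(verify_and_derive_key(seeds, commitments).hex())
```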
Anonymity
Sender remains anonymous
  Data sources are randomly modified as packet traverses
  Use “pre-routing” with “mix-nets” to enhance
Receiver (or key) anonymity
  “mix-nets”
Scalability
[Figure: average path length versus network size. X-axis: number of nodes; Y-axis: path length]
Initially, 20 nodes. Add nodes regularly.
Small world Model
[Figure: fraction of nodes (log scale) versus number of links. X-axis: number of links; Y-axis: fraction of nodes]
Most nodes have only a few connections, while a small number of nodes have a large set of connections.
WHY?
Power law
What’s good?
Distributed storage and retrieval
Anonymity
Adaptive replication based on usage patterns
Anything else?
Is it perfect?
Query path-length
  Not bounded
Difficult to know the cause of search failures
  Document did not exist?
  Could not find it?
Anything else?
Switch tracks…
How does file sharing change the Internet?
Users are patient
batch mode delivery!
Audio-Video
Small objects: audio
Large objects: video
Object Dynamics
Fetch-at-most-once
Short-lived popularity
Recently born objects most popular
Most requests are for “old” objects
File sharing not Zipf!
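One reading of why fetch-at-most-once behaviour pulls request counts away from Zipf: even if every user's interest follows a Zipf-like law, a user who already has an object does not request it again, so the most popular object can collect at most one request per user. The simulation below is a sketch under assumptions. The Zipf exponent of 1, the population sizes, and the rule that a repeated pick is simply skipped are all choices made for the example.

```python
import random
from collections import Counter


def simulate(num_users, num_objects, picks_per_user, fetch_once, seed=0):
    """Count requests per object when every user draws from the same Zipf-like law."""
    rng = random.Random(seed)
    objects = list(range(num_objects))
    weights = [1.0 / (rank + 1) for rank in objects]   # Zipf, exponent 1 (assumed)
    counts = Counter()
    for _ in range(num_users):
        fetched = set()
        for _ in range(picks_per_user):
            obj = rng.choices(objects, weights)[0]
            if fetch_once and obj in fetched:
                continue            # the user already has it: no new request
            fetched.add(obj)
            counts[obj] += 1
    return counts


if __name__ == "__main__":
    repeat = simulate(50, 100, 40, fetch_once=False)
    once = simulate(50, 100, 40, fetch_once=True)
    # With repeats allowed the head of the distribution dominates;
    # with fetch-at-most-once it is capped by the number of users.
    print(repeat.most_common(3))
    print(once.most_common(3))
```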
Conclusions
Many other interesting aspects
  Some obvious, some not
Contribution
  Fetch-at-most-once
  significant locality
  substantial opportunity for caching