Gnutella2: A Better Gnutella?

Download Report

Transcript Gnutella2: A Better Gnutella?

Gnutella2: A Better Gnutella?

COMP 5204: Data Networks Julie Thorpe School of Computer Science Carleton University

Introduction

 Gnutella and Gnutella2 are P2P application protocols.

 Gnutella has an interesting history – essentially a reverse engineered Beta version.  Changes governed by Gnutella Developers Forum (GDF).

 Gnutella2 is a completely different protocol than Gnutella, claiming it's what Gnutella should have been.

 The GDF reject this claim, and refuse to call Gnutella2 by its name – they instead call it “Mike's Protocol”.

 Unclear whether Gnutella2 better than Gnutella.

 My project goal is to compare these protocols to determine which is theoretically better.

Presentation Outline

 Gnutella review  Gnutella's problems  Gnutella2 review  Comparison:  Network architecture  Searching algorithms  Cooperation incentives  Security  Concluding remarks

Gnutella Review (1)

 Purely decentralized, simple protocol for file sharing.

 Runs over TCP/IP connections.

 New node enters Gnutella network by connecting to a known server (sends Ping), and when server responds (sends Pong) they are now connected peers.

 Learns of other nodes if server forwards Ping to its peers (and gets a Pong back in response).

Gnutella Review (2)

 Has no way to advertise files.

 Peers find files by “flooding” with query requests (stop after TTL hops).

 Query responses are routed back along the same path as the request arrived upon.

 Addressed by GUIDs.

Gnutella's Problems

 Scalability.

 Performance.

 Lack of cooperation incentives.

 Abuse by servents (Gnutella client/server program).

Gnutella's Problems - Scalability

• • • • For a node to be reached by a query, it (and all other nodes on the path to it) must be forwarded the request!

The reach is determined by

n

(# connections to other hosts) and

TTL

(# hops each request is permitted to take):

t TTL

  1 (

n

 1 )

t

 1 .

n

Assumption: nodes all have the same n and TTL.

Ritter [1] estimated that for reasonable parameters, to achieve a reach of 10 6 nodes, Gnutella nodes must have a bandwidth between 19.2 and 64 Gbps!

Gnutella's Problems - Performance

 Messages are forwarded through many other peers on the network.

 Connection speeds are effectively restricted to the bandwidth of the slowest peer along the route.  If A and B have high-speed connections, and C has a modem connection, the download rate between A and B is limited to C's speed.

Gnutella's Problems – Lack of Cooperation Incentives

 One study found that over 70% of users shared no files, and 50% of all responses are returned by the top 1% [2] .

 Implications of free-riders on Gnutella:  Increased

search horizon

(farthest set of hosts reachable by a search request, directly related to its TTL).  The top 1% that is providing most files reaches connection saturation.

 This generally a difficult problem to solve for purely decentralized systems.

Gnutella's Problems – Abuse by Servents

 Since it's a protocol, implementations can implement the Gnutella protocol as they please (in theory).

 Servents can act selfishly to improve their performance  Increasing TTL to increase search horizon (generating geometrically higher # messages).

 Frequent re-querying (generating more messages, degrading network).

Gnutella2

 GDF

recommended

(in version 0.6) “ultrapeers” to improve Gnutella's performance and scalability.  Gnutella2 enforces a variation of this to improve performance and scalability.  Decentralized, 2-tier hierarchy of peers (“leaf nodes” and “hubs”).

 Other important differences will come out in comparison.

Comparison: Gnutella vs. Gnutella2

 Network architecture  Searching algorithms  Cooperation incentives  Security

Gnutella's Network Architecture

 Recall it is completely decentralized.

Gnutella2's Network Architecture

 Decentralized, 2-tier.

 This architecture is recommended for Gnutella in v0.6.

 New node enters by connecting to a known hub (almost identical to Gnutella's handshake).

 Hubs typically accept 300-500 leaves, and connect to 5-30 other hubs.  Leaves typically connect to 3 hubs.

Comparison – Network Architecture

 Gnutella's purely decentralized is much simpler, but not as scalable.  No real difference between Gnutella's v0.6 “ultrapeer” structure and Gnutella2's “hub” structure.

 Either of these strategies should reduce searching traffic as explained in searching algorithms.

Gnutella's Searching Algorithm

 Recall that using the purely decentralized version, packets are flooded throughout the network.  If the v0.6 ultrapeer recommendation is implemented, searching is optimized using Query hash tables(QHTs).

 A QHT is maintained by each node, and describes the content it is sharing.

 An ultrapeer maintains an aggregate of its leaf's QHTs and its own QHT.  Searches are performed by forwarding a query to an ultrapeer, who checks its aggregate QHT for a match.  If there is a match, the query is forwarded to the appropriate leaf, otherwise the query is forwarded to neighbouring ultrapeers by “flooding”.

Gnutella2's Searching Algorithm

 Ultrapeers are called “hubs”.

 Uses a QHT like Gnutella, but if a hub cannot match a query to its aggregate QHT, it checks a set of caches:  Each hub maintains a cached copy of each neighbouring hub's aggregate QHT.  Upon a search miss, a hub will try to match the query against its cached copies of its neighbours QHTs.

 If the query matches, it will forward the query once, and the node that receives the query processes it and directly sends the result back to the client.

 If no match is made, the searching client will continue at another untried hub.

Comparison – Searching Algorithms

 Both Gnutella with Ultrapeers and Gnutella2 significantly reduce the number of messages.

 Should increase performance and scalability.

 Gnutella2's method further reduces the number of messages sent for a query:  Less query request messages due to caching neighbour's QHTs.

 Less response messages since Gnutella2's method allows the responding node to directly contact the requesting node, rather than sending the message back through the path to get there.

Comparison: Cooperation Incentives

 Neither Gnutella nor Gnutella2 specify cooperation incentives.

 Implementations often will not allow connections to network unless you share something, and by default make downloads shared.

 The problem of cooperation incentives in a decentralized environment is interesting, since nodes can avoid connecting to those that have a profile of their behaviour.

Gnutella's Security

 Gnutella's query messages are routed through peers, and the query does not contain the querying node's IP address, but a Globally Unique Identifier (GUID).

 Provides anonymity by masking requester's identity.

 Denial-of-service (DOS) attacks are possible by flooding the network with many requests with a fake GUID.  Another node could be similarly DOS'ed if a GUID for one of their request GUIDs is known.

 Response IP addresses could be spoofed and malicious content provided.

Gnutella2's Security

 Gnutella2 does not use GUIDs for queries  Sends the response directly back to the requesting node.

 The QHTs do not contain information about the content stored on a neighbouring node, providing privacy.  Queries make use of query keys (to verify the query return address is that of the original sender).

 Prevents malicious users from sending out queries for the purpose of flooding the network with spoofed requests.

 Search clients only permitted to query a hub after obtaining a “query key”, which are unique to each (hub, search client return address) from it to include in the transmission.

Comparison - Security

 Both Gnutella with ultrapeers and Gnutella2 provide privacy through their caching.

 Both Gnutella and Gnutella2 are suceptible to spoofed response IP addresses.

 For Gnutella2:  Gnutella does not provide authentication of nodes for querying, thus it is susceptible to request flooding attacks.  Gnutella ultrapeers cannot block certain hosts (do not have query keys or unique request addresses).

 For Gnutella:  Gnutella2 does not provide anonymous queries.

Concluding Remarks

 Gnutella has some serious flaws (scalability, performance, lack of cooperation incentives and servent abuse).  Gnutella2 solves all but cooperation incentives.

 Gnutella with ultrapeers solves scalability and performance, but the searching algorithm and caching is less sophisticated.

 Gnutella2 has many more features outside of this comparison, primarily being more extensible (yet specific) to support applications other than file sharing.

 Although they are different protocols, Gnutella2 is in essence, an improved version of Gnutella with ultrapeers.

Open Problems

 Is it possible to create useful cooperation incentives in a large, truly distributed environment like Gnutella where peers may reconnect to different hubs upon connection?

References

1.

Jordan Ritter, “Why Gnutella Can't Scale. No, Really.”, http://www.tch.org/gnutella.htm

.

2.

Eytan Adar and Bernardo A. Huberman, “Free-Riding on Gnutella”, http://www.firstmonday.org/issues/issue5_10/adar/index.html

.

3.

Farhad Manjoo, “Gnutella Bandwidth Bandits”, Aug. 8, 2002.

4.

RFC-Gnutella 0.6, http://rfc-gnutella.sourceforge.net/developer/testing/index.html

.

5.

Anurag Singla and Christopher Rohrs, “Ultrapeers: Another Step Towards Gnutella Scalability”, Version 1.0, http://rfc-gnutella.sourceforge.net/src/Ultrapeers_1.0.html

6.

Gnutella vs. Gnutella2, Part 1, http://www.mp3newswire.net/stories/2003/gnutella.html

.

7.

The Gnutella2 Developers Network. http://www.gnutella2.com

.

8.

“ LimeWire: Network Improvements ”, http://www.limewire.com/developer/net_improvements.html

9.

P2P Networking Technologies. URL: http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/Intro.html

.