Transcript Gnutella2: A Better Gnutella?
Gnutella2: A Better Gnutella?
COMP 5204: Data Networks Julie Thorpe School of Computer Science Carleton University
Introduction
Gnutella and Gnutella2 are P2P application protocols.
Gnutella has an interesting history – essentially a reverse engineered Beta version. Changes governed by Gnutella Developers Forum (GDF).
Gnutella2 is a completely different protocol than Gnutella, claiming it's what Gnutella should have been.
The GDF reject this claim, and refuse to call Gnutella2 by its name – they instead call it “Mike's Protocol”.
Unclear whether Gnutella2 better than Gnutella.
My project goal is to compare these protocols to determine which is theoretically better.
Presentation Outline
Gnutella review Gnutella's problems Gnutella2 review Comparison: Network architecture Searching algorithms Cooperation incentives Security Concluding remarks
Gnutella Review (1)
Purely decentralized, simple protocol for file sharing.
Runs over TCP/IP connections.
New node enters Gnutella network by connecting to a known server (sends Ping), and when server responds (sends Pong) they are now connected peers.
Learns of other nodes if server forwards Ping to its peers (and gets a Pong back in response).
Gnutella Review (2)
Has no way to advertise files.
Peers find files by “flooding” with query requests (stop after TTL hops).
Query responses are routed back along the same path as the request arrived upon.
Addressed by GUIDs.
Gnutella's Problems
Scalability.
Performance.
Lack of cooperation incentives.
Abuse by servents (Gnutella client/server program).
Gnutella's Problems - Scalability
• • • • For a node to be reached by a query, it (and all other nodes on the path to it) must be forwarded the request!
The reach is determined by
n
(# connections to other hosts) and
TTL
(# hops each request is permitted to take):
t TTL
1 (
n
1 )
t
1 .
n
Assumption: nodes all have the same n and TTL.
Ritter [1] estimated that for reasonable parameters, to achieve a reach of 10 6 nodes, Gnutella nodes must have a bandwidth between 19.2 and 64 Gbps!
Gnutella's Problems - Performance
Messages are forwarded through many other peers on the network.
Connection speeds are effectively restricted to the bandwidth of the slowest peer along the route. If A and B have high-speed connections, and C has a modem connection, the download rate between A and B is limited to C's speed.
Gnutella's Problems – Lack of Cooperation Incentives
One study found that over 70% of users shared no files, and 50% of all responses are returned by the top 1% [2] .
Implications of free-riders on Gnutella: Increased
search horizon
(farthest set of hosts reachable by a search request, directly related to its TTL). The top 1% that is providing most files reaches connection saturation.
This generally a difficult problem to solve for purely decentralized systems.
Gnutella's Problems – Abuse by Servents
Since it's a protocol, implementations can implement the Gnutella protocol as they please (in theory).
Servents can act selfishly to improve their performance Increasing TTL to increase search horizon (generating geometrically higher # messages).
Frequent re-querying (generating more messages, degrading network).
Gnutella2
GDF
recommended
(in version 0.6) “ultrapeers” to improve Gnutella's performance and scalability. Gnutella2 enforces a variation of this to improve performance and scalability. Decentralized, 2-tier hierarchy of peers (“leaf nodes” and “hubs”).
Other important differences will come out in comparison.
Comparison: Gnutella vs. Gnutella2
Network architecture Searching algorithms Cooperation incentives Security
Gnutella's Network Architecture
Recall it is completely decentralized.
Gnutella2's Network Architecture
Decentralized, 2-tier.
This architecture is recommended for Gnutella in v0.6.
New node enters by connecting to a known hub (almost identical to Gnutella's handshake).
Hubs typically accept 300-500 leaves, and connect to 5-30 other hubs. Leaves typically connect to 3 hubs.
Comparison – Network Architecture
Gnutella's purely decentralized is much simpler, but not as scalable. No real difference between Gnutella's v0.6 “ultrapeer” structure and Gnutella2's “hub” structure.
Either of these strategies should reduce searching traffic as explained in searching algorithms.
Gnutella's Searching Algorithm
Recall that using the purely decentralized version, packets are flooded throughout the network. If the v0.6 ultrapeer recommendation is implemented, searching is optimized using Query hash tables(QHTs).
A QHT is maintained by each node, and describes the content it is sharing.
An ultrapeer maintains an aggregate of its leaf's QHTs and its own QHT. Searches are performed by forwarding a query to an ultrapeer, who checks its aggregate QHT for a match. If there is a match, the query is forwarded to the appropriate leaf, otherwise the query is forwarded to neighbouring ultrapeers by “flooding”.
Gnutella2's Searching Algorithm
Ultrapeers are called “hubs”.
Uses a QHT like Gnutella, but if a hub cannot match a query to its aggregate QHT, it checks a set of caches: Each hub maintains a cached copy of each neighbouring hub's aggregate QHT. Upon a search miss, a hub will try to match the query against its cached copies of its neighbours QHTs.
If the query matches, it will forward the query once, and the node that receives the query processes it and directly sends the result back to the client.
If no match is made, the searching client will continue at another untried hub.
Comparison – Searching Algorithms
Both Gnutella with Ultrapeers and Gnutella2 significantly reduce the number of messages.
Should increase performance and scalability.
Gnutella2's method further reduces the number of messages sent for a query: Less query request messages due to caching neighbour's QHTs.
Less response messages since Gnutella2's method allows the responding node to directly contact the requesting node, rather than sending the message back through the path to get there.
Comparison: Cooperation Incentives
Neither Gnutella nor Gnutella2 specify cooperation incentives.
Implementations often will not allow connections to network unless you share something, and by default make downloads shared.
The problem of cooperation incentives in a decentralized environment is interesting, since nodes can avoid connecting to those that have a profile of their behaviour.
Gnutella's Security
Gnutella's query messages are routed through peers, and the query does not contain the querying node's IP address, but a Globally Unique Identifier (GUID).
Provides anonymity by masking requester's identity.
Denial-of-service (DOS) attacks are possible by flooding the network with many requests with a fake GUID. Another node could be similarly DOS'ed if a GUID for one of their request GUIDs is known.
Response IP addresses could be spoofed and malicious content provided.
Gnutella2's Security
Gnutella2 does not use GUIDs for queries Sends the response directly back to the requesting node.
The QHTs do not contain information about the content stored on a neighbouring node, providing privacy. Queries make use of query keys (to verify the query return address is that of the original sender).
Prevents malicious users from sending out queries for the purpose of flooding the network with spoofed requests.
Search clients only permitted to query a hub after obtaining a “query key”, which are unique to each (hub, search client return address) from it to include in the transmission.
Comparison - Security
Both Gnutella with ultrapeers and Gnutella2 provide privacy through their caching.
Both Gnutella and Gnutella2 are suceptible to spoofed response IP addresses.
For Gnutella2: Gnutella does not provide authentication of nodes for querying, thus it is susceptible to request flooding attacks. Gnutella ultrapeers cannot block certain hosts (do not have query keys or unique request addresses).
For Gnutella: Gnutella2 does not provide anonymous queries.
Concluding Remarks
Gnutella has some serious flaws (scalability, performance, lack of cooperation incentives and servent abuse). Gnutella2 solves all but cooperation incentives.
Gnutella with ultrapeers solves scalability and performance, but the searching algorithm and caching is less sophisticated.
Gnutella2 has many more features outside of this comparison, primarily being more extensible (yet specific) to support applications other than file sharing.
Although they are different protocols, Gnutella2 is in essence, an improved version of Gnutella with ultrapeers.
Open Problems
Is it possible to create useful cooperation incentives in a large, truly distributed environment like Gnutella where peers may reconnect to different hubs upon connection?
References
1.
Jordan Ritter, “Why Gnutella Can't Scale. No, Really.”, http://www.tch.org/gnutella.htm
.
2.
Eytan Adar and Bernardo A. Huberman, “Free-Riding on Gnutella”, http://www.firstmonday.org/issues/issue5_10/adar/index.html
.
3.
Farhad Manjoo, “Gnutella Bandwidth Bandits”, Aug. 8, 2002.
4.
RFC-Gnutella 0.6, http://rfc-gnutella.sourceforge.net/developer/testing/index.html
.
5.
Anurag Singla and Christopher Rohrs, “Ultrapeers: Another Step Towards Gnutella Scalability”, Version 1.0, http://rfc-gnutella.sourceforge.net/src/Ultrapeers_1.0.html
6.
Gnutella vs. Gnutella2, Part 1, http://www.mp3newswire.net/stories/2003/gnutella.html
.
7.
The Gnutella2 Developers Network. http://www.gnutella2.com
.
8.
“ LimeWire: Network Improvements ”, http://www.limewire.com/developer/net_improvements.html
9.
P2P Networking Technologies. URL: http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/Intro.html
.