Transcript Peer Organization Lecture
Scalability
1. Optimizing P2P Networks: Lessons Learned from Social Networking
  a) Social Networks
  b) Lessons Learned
  c) Are P2P Networks Social?
  d) Organizing P2P Networks
2. Peer Topologies
  a) Centralized, Ring, Hierarchical & Decentralized
  b) Hybrid:
    o Centralized-Ring
    o Centralized-Centralized
    o Centralized-Decentralized
  c) Reflector Nodes
3. Gnutella Case Studies
  a) 3 case studies
Scalability
“You can’t scale better than by utilising someone else’s computer.”
– Paul James
LimeWire Gnutella Coding
Social Networks
• Stanley Milgram (Harvard professor) – 1967 social networking experiment
• How many ‘social hops’ would it take for messages to traverse the US population (200 million)?
• Posted 160 letters to randomly chosen people in Omaha, Nebraska
[Map: Omaha, Nebraska → Boston, Massachusetts]
• Asked them to try to pass these letters to a stockbroker working in Boston, Massachusetts
• Rules:
  • use intermediaries whom they know on a first-name basis
  • choose them intelligently
  • make a note at each hop
• 42 letters made it!
• Average of 5.5 hops
• Demonstrated the ‘small world effect’: suggested that the social network of the United States is indeed connected, with a path length (number of hops) of around 6 – the 6 degrees of separation!
Does this mean that it takes 6 hops to traverse 200 million people?
Lessons Learned from Milgram’s Experiment
• Social circles are highly clustered
• A few members have wide-ranging connections
  • these form a bridge between far-flung social clusters
  • this bridging plays a critical role in bringing the network closer together
• For example:
  • a quarter of all letters passed through a local storekeeper
  • half were mediated by just 3 people
Lessons Learned
• These people acted as gateways or hubs between the source and the wider world
• A small number of bridges dramatically reduces the number of hops
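The bridging lesson can be illustrated with a small simulation in the spirit of the Watts–Strogatz small-world model. This is a sketch under stated assumptions, not Milgram’s data: the 200-person ring, the "knows 2 people on each side" clustering, and the four long-range bridge links are all hypothetical choices made for illustration.

```python
from collections import deque

def ring_lattice(n, k):
    """Clustered 'social circles' (hypothetical): each person knows the
    k nearest people on either side of them around a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def add_bridges(adj, bridges):
    """Add long-range 'storekeeper' links between far-flung clusters."""
    for a, b in bridges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def average_hops(adj):
    """Mean shortest-path length over all reachable ordered pairs,
    computed with a breadth-first search from every node."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

people = ring_lattice(200, 2)
before = average_hops(people)
# Hypothetical bridges: four people with contacts on the far side of the ring.
add_bridges(people, [(0, 100), (25, 125), (50, 150), (75, 175)])
after = average_hops(people)
print(f"average hops: {before:.1f} without bridges, {after:.1f} with 4 bridges")
```

The clustered lattice alone averages about 25 hops; adding just four bridges to 200 people shortens many paths at once, which is the effect the storekeeper had on Milgram’s letters.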
From Social Networks to Computer Networks…
• There are a number of similarities to social networks:
  • People = peers
  • Intermediaries = hubs, gateways or rendezvous nodes (in JXTA terminology)
  • Number of intermediaries passed through = number of hops
Are P2P Networks Special, then?
• P2P networks are more like social networks than other types of computer network because they are often:
  • Self-organizing
  • Ad hoc
  • Clustered according to prior interactions (like the way we form relationships)
  • Decentralized in their discovery and communication (like the way we form neighbourhoods, villages, cities, etc.)
Peer-to-Peer: What’s the Problem?
• Problem: how do we organize peers within ad hoc, multi-hop, pervasive P2P networks?
• A network of self-organizing peers organized in a decentralized fashion
• Such networks can rapidly expand from a few hundred peers to several thousand or even millions
• P2P environment recap:
  • Unreliable environments
  • Peers connect and disconnect for reasons ranging from network failures to their own participation choices
  • Random failures, e.g. power outages, cable or DSL failures, hackers
  • Personal machines are much more vulnerable than servers
  • Algorithms have to cope with this continuous restructuring of the network core
• P2P systems need to treat failures as normal occurrences, not freak exceptions
• They must be designed to promote redundancy, at the cost of some degradation in performance
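The redundancy trade-off can be made concrete with a short calculation. This sketch assumes that each peer hosting a copy of a file is online independently with the same probability, which is a simplification; the function names are hypothetical.

```python
def availability(p_online, replicas):
    """P(at least one replica is reachable), assuming each hosting peer
    is online independently with probability p_online."""
    return 1 - (1 - p_online) ** replicas

def min_replicas(p_online, target):
    """Smallest number of replicas whose availability reaches `target`."""
    if not 0 < p_online <= 1:
        raise ValueError("p_online must be in (0, 1]")
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    k = 1
    while availability(p_online, k) < target:
        k += 1
    return k

for k in (1, 3, 7):
    print(k, availability(0.5, k))   # 0.5, 0.875, 0.9921875
print(min_replicas(0.5, 0.99))       # 7
```

If peers are online only half the time, seven copies are needed for 99% availability; each extra copy costs storage and bandwidth, which is the performance price of redundancy.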
So, how do we Organize Networks in Order to Get Optimum Performance?
For P2P:
• This does not mean abstract numerical benchmarks, e.g. how many milliseconds will it take to compute this many millions of FFTs?
• Rather, it means asking questions like:
  • How long will it take to retrieve this particular file?
  • How much bandwidth will this query consume?
  • How many hops will it take for my packet to get to a peer on the far side of the network?
  • If I add/remove a peer, will the network still be fault tolerant?
  • Does the network scale as we add more peers?
Performance Issues in P2P Networks
Three main factors make P2P networks more sensitive to performance issues:
1. Communication
  • A fundamental necessity
  • Users are connected via different connection speeds
  • Multi-hop
2. Searching
  • No central control, so more effort is needed
  • Each hop adds to the total bandwidth – problems: time-outs
3. Equal Peers
  • Free riders upset the balance of the network
  • This degrades performance for others
  • Need to get this balance right and adjust accordingly
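Why each hop adds to the total bandwidth can be seen in a simplified model of Gnutella-style flooding: a peer forwards a query to every neighbour except the sender, decrements a hop limit (TTL), and drops copies of a query it has already seen. This is a sketch, not the Gnutella protocol itself; the `seen` set stands in for per-query message IDs, and the function name and the four-peer full mesh are hypothetical.

```python
from collections import deque

def flood(adj, source, ttl):
    """Return (peers reached, query messages sent) for one flooded query."""
    seen = {source}             # peers that have already seen this query
    messages = 0
    queue = deque([(source, None, ttl)])
    while queue:
        peer, came_from, hops_left = queue.popleft()
        if hops_left == 0:
            continue            # TTL exhausted: do not forward further
        for nb in adj[peer]:
            if nb == came_from:
                continue        # never send the query back to its sender
            messages += 1       # every copy on the wire costs bandwidth
            if nb not in seen:  # duplicates are dropped, not re-forwarded
                seen.add(nb)
                queue.append((nb, peer, hops_left - 1))
    return len(seen) - 1, messages

full_mesh = {p: [q for q in range(4) if q != p] for p in range(4)}
print(flood(full_mesh, 0, ttl=1))   # (3, 3)
print(flood(full_mesh, 0, ttl=2))   # (3, 9)
```

In the well-connected mesh, the second hop reaches no new peers yet triples the number of messages, because duplicates still cross the wire before being dropped.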
Peer Topologies
• Core
  • Centralized
  • Ring
  • Hierarchical
  • Decentralized
• Hybrid
  • Centralized-Ring
  • Centralized-Centralized
  • Centralized-Decentralized
Centralized
• Client/server
• Web servers
• Databases
• Napster search
• Instant messaging
• Popular Power
Ring
• Fail-over clusters
• Simple load balancing
• Assumption – single owner
Hierarchical
• Tree structure
• DNS
• Usenet (sort of)
Decentralized
• Gnutella
• Freenet
• Internet routing
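The four core topologies can be compared on two of the questions raised earlier: how many hops a message may need (diameter), and how much load falls on the busiest node (maximum degree), whose loss hurts most. This is a sketch with seven peers; the constructors are hypothetical, and the "decentralized" graph is a regular stand-in where each peer links to the peers 1 and 3 places away, not a real Gnutella topology.

```python
from collections import deque

def star(n):
    """Centralized: peer 0 is the server, everyone else is a client."""
    return {i: ({0} if i else set(range(1, n))) for i in range(n)}

def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def tree(n):
    """Hierarchical: binary tree where node i's parent is (i - 1) // 2."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        parent = (i - 1) // 2
        adj[i].add(parent)
        adj[parent].add(i)
    return adj

def mesh(n, offsets=(1, 3)):
    """Decentralized stand-in: each peer links to peers at fixed offsets."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for off in offsets:
            j = (i + off) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def diameter(adj):
    """Worst-case hop count between any two peers (BFS from each peer)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

def max_degree(adj):
    """Number of links on the busiest peer."""
    return max(len(nbrs) for nbrs in adj.values())

for name, g in [("centralized", star(7)), ("ring", ring(7)),
                ("hierarchical", tree(7)), ("decentralized", mesh(7))]:
    print(f"{name:13} max degree {max_degree(g)}, diameter {diameter(g)}")
```

The centralized star has short paths but concentrates all load on one node; the ring spreads load evenly but paths grow with size; the decentralized mesh keeps both modest at the cost of more links per peer.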
Centralized + Ring
• Robust web applications
• High availability of servers
Centralized + Centralized
• N-tier apps
• Database-heavy systems
• Web services gateways
• Google.com uses this topology to deliver its service
Centralized + Decentralized
• New wave of P2P
• Clip2 Gnutella Reflector (next)
• FastTrack
  – KaZaA
  – Morpheus
• Email
• Like social networks, perhaps?
Reflector Nodes
• Known as ‘super peers’ – in JXTA these are Rendezvous peers
• They cache the file lists of connected users, i.e. they maintain an index
• When a query is issued, the Reflector does not retransmit it; it answers the query from its own memory
[Figure: a Reflector (C) holding an index of the files (F1.mp3, F2.mp3, F3.mp3, …) shared by its connected peers]
• Do they remind you of anything?
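The Reflector behaviour described above, keeping an index of connected peers’ files and answering queries from it rather than re-broadcasting, can be sketched as a small class. The class and method names (`Reflector`, `connect`, `disconnect`, `query`) are hypothetical and are not Clip2’s or JXTA’s API; case-insensitive substring matching on file names is also an assumption.

```python
from collections import defaultdict

class Reflector:
    """Hypothetical super peer: indexes the files of its connected peers
    and answers queries locally instead of forwarding them."""

    def __init__(self):
        self.index = defaultdict(set)   # file name -> IDs of peers sharing it
        self.files_by_peer = {}         # peer ID -> that peer's file list

    def connect(self, peer_id, files):
        self.files_by_peer[peer_id] = set(files)
        for f in files:
            self.index[f].add(peer_id)

    def disconnect(self, peer_id):
        # Remove the departing peer's files so stale answers are not served.
        for f in self.files_by_peer.pop(peer_id, set()):
            self.index[f].discard(peer_id)
            if not self.index[f]:
                del self.index[f]

    def query(self, keyword):
        """Answer from memory: sorted (file, peer) pairs matching keyword."""
        keyword = keyword.lower()
        return sorted((f, p) for f, peers in self.index.items()
                      if keyword in f.lower() for p in peers)

reflector = Reflector()
reflector.connect(0, ["F1.mp3"])
reflector.connect(1, ["F2.mp3"])
reflector.connect(2, ["F3.mp3", "F1.mp3"])
print(reflector.query("f1"))   # [('F1.mp3', 0), ('F1.mp3', 2)]
```

Because the answer comes from one lookup at the Reflector, a query costs a single request/response instead of a flood across the peers behind it; the price is that the Reflector becomes a central point that must track every connect and disconnect.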
Napster = Gnutella?
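[Figure: Napster users connected to Napster.com’s duplicated servers (N2, N3, …) compared with Gnutella users connected through super peers]
• Napster: duplicated servers
• Gnutella: super peers, which may be
  1. Natural?
  2. Reflectors (clip2.com)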
The Gnutella Network Today
The figure below is a view of the topology of a Gnutella network as shown on the web site of LimeWire, a popular Gnutella file-sharing client. Notice how it exhibits the power-law, or centralized-decentralized, structure.
Another View of the Gnutella Network
Gnutella Studies 1: Free Riding
E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella,” First Monday 5(10), http://firstmonday.org/issues/issue5_10/adar/index.html
Two types of free riding:
1. Users that download files but never provide any files for others to download
2. Users that share only undesirable content
• They found that 22,084 of the 33,335 peers in the network (66%) share no files
• 24,347 (73%) share ten or fewer files
• The top 1 percent (333 hosts) account for 37 percent of the total files shared
• 20 percent (6,667 hosts) share 98% of the files
• This shows that even without Gnutella Reflector nodes, the Gnutella network naturally converges into a centralized + decentralized topology, with the top 20% of nodes acting as super peers or reflectors
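Statistics like these come from ranking peers by the number of files they share. The sketch below shows one way to compute the free-rider ratio and the share of files held by the top fraction of peers; the function names are hypothetical and the 10-peer crawl is made-up data, not the study’s.

```python
def free_rider_ratio(file_counts):
    """Fraction of peers that share no files at all."""
    return sum(1 for c in file_counts if c == 0) / len(file_counts)

def share_held_by_top(file_counts, fraction):
    """Fraction of all shared files held by the top `fraction` of peers."""
    ranked = sorted(file_counts, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

# Hypothetical crawl of 10 peers (illustrative numbers only).
crawl = [0, 0, 0, 0, 0, 0, 1, 2, 7, 90]
print(free_rider_ratio(crawl))          # 0.6
print(share_held_by_top(crawl, 0.1))    # 0.9
print(share_held_by_top(crawl, 0.2))    # 0.97

# Sanity check of the study's headline figure: 22,084 of 33,335 peers.
print(round(22_084 / 33_335 * 100))     # 66
```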
Gnutella Studies 2: Equal Peers
Study by Clip2 (www.clip2.com), developers of the Gnutella Reflector
• Studied Gnutella for one month
• Noted an apparent scalability barrier when query rates went above 10 per second
Why?
• A Gnutella query is 560 bits long, and queries make up approximately one quarter of the traffic
• Each peer is connected to three peers, so: 560 bits × 10 queries/s × 3 connections = 16,800 bits per second
• This is only a quarter of the traffic, so total traffic is 67,200 bits per second (67.2 kbit/s)
• A 56 kbit/s modem link cannot keep up with this amount of traffic
• One slow node connected in the wrong place can grind the whole network to a halt
• This is why P2P networks place slower nodes at the edges
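The bandwidth arithmetic above can be written as a small function so other query rates or connection counts can be tried. It uses only the slide’s figures (560-bit queries, queries as one quarter of traffic, a nominal 56 kbit/s modem) and ignores protocol overhead; the constant and function names are hypothetical.

```python
QUERY_BITS = 560        # size of one Gnutella query, per the slide
QUERY_SHARE = 0.25      # queries are roughly a quarter of all traffic
MODEM_BPS = 56_000      # nominal 56 kbit/s dial-up link

def peer_load_bps(queries_per_sec, connections):
    """Total traffic (bits/s) a peer must carry at a given query rate."""
    query_bps = QUERY_BITS * queries_per_sec * connections
    return query_bps / QUERY_SHARE

load = peer_load_bps(10, 3)
print(load, load > MODEM_BPS)   # 67200.0 True
```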
Gnutella Studies 3: Communication
Matei Ripeanu, “Peer-to-Peer Architecture Case Study: Gnutella Network,” online at: http://people.cs.uchicago.edu/~matei/PAPERS/P2P2001.pdf
Studied the topology of Gnutella over several months and reported two findings:
1. The Gnutella network shares the benefits and drawbacks of a power-law structure:
  – networks that organize themselves so that most nodes have a few links and a small number of nodes have many
  – such networks show an unexpected degree of robustness when facing random node failures
  – but they are vulnerable to attacks: removing a few of the super nodes can have a massive effect on the function of the network as a whole
2. The Gnutella network topology does not match the underlying Internet topology well, leading to inefficient use of network bandwidth. He gave two suggestions:
  1. Use an agent to monitor the network and intervene by asking servents to drop/add links to keep the topology optimal
  2. Replace the Gnutella flooding mechanism with smarter routing and group communication mechanisms
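The contrast in finding 1, robust to random failures but fragile under targeted attack, can be reproduced on a toy network. This sketch uses a deliberately simplified hub-and-leaf topology (5 fully meshed hubs, each serving 40 low-degree leaves) as a crude stand-in for a power-law network; it is not Ripeanu’s measured Gnutella graph, and the function names and random seed are hypothetical.

```python
import random
from collections import deque

def hub_and_leaf(hubs, leaves_per_hub):
    """Few well-connected super nodes (fully meshed), many one-link leaves."""
    adj = {h: set() for h in range(hubs)}
    for a in range(hubs):
        for b in range(a + 1, hubs):
            adj[a].add(b)
            adj[b].add(a)
    node = hubs
    for h in range(hubs):
        for _ in range(leaves_per_hub):
            adj[node] = {h}
            adj[h].add(node)
            node += 1
    return adj

def largest_component(adj, removed):
    """Size of the biggest connected group of peers left after removal."""
    seen = set(removed)          # removed peers are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

net = hub_and_leaf(hubs=5, leaves_per_hub=40)       # 205 peers
k = 5
random_failures = random.Random(1).sample(sorted(net), k)
targeted_attack = sorted(net, key=lambda n: len(net[n]), reverse=True)[:k]
print("random failures:", largest_component(net, random_failures))
print("targeted attack:", largest_component(net, targeted_attack))
```

Losing 5 random peers almost always removes only leaves, so nearly everyone stays connected; removing the 5 highest-degree peers leaves every remaining peer isolated.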
What about other topologies: The Future?
• Centralized + Hierarchical?
  – Back-end tree of information
  – Caching architectures
• Decentralized + Ring?
– P2P network of fail-over clusters
• More?
Closing Remarks
1. Summary
  a) Centralized + Decentralized: understand the progression from the original Gnutella to the newer models
  b) The role of Reflector nodes
2. Further Information: Distributed Hash Table Models
  a) Pastry: http://research.microsoft.com/~antr/pastry
  b) Chord: http://www.pdos.lcs.mit.edu/chord/