CMPT 880: P2P Systems - Simon Fraser University


School of Computing Science Simon Fraser University

CMPT 880: Internet Architectures and Protocols

Introduction to Peer-to-Peer Systems

Instructor: Dr. Mohamed Hefeeda

P2P Computing: Definitions



Peers cooperate to achieve desired functions

Peers:

• End-systems (typically, user machines)
• Interconnected through an overlay network
• Peer ≡ Like the others (similar or behave in a similar manner)

Cooperate:

• Share resources, e.g., data, CPU cycles, storage, bandwidth
• Participate in protocols, e.g., routing, replication, …

Functions:

• File-sharing, distributed computing, communications, content distribution, …

Note: the P2P concept is much wider than file sharing

Overlay Network

When Did P2P Start?



Napster (late 1990s)

Court shut Napster down in 2001



Gnutella (2000)



Then the killer FastTrack (Kazaa, ...)



BitTorrent, and many others



Accompanied by significant research interest



Claim

P2P is much older than Napster!



Proof

The original Internet!

Remember UUCP (Unix-to-Unix Copy)?

What IS and IS NOT New in P2P?



What is not new

Concepts!



What is new

The term P2P (maybe!)

New characteristics of

• Nodes, which constitute the
• System that we build

What IS NOT New in P2P?



Distributed architectures



Distributed resource sharing



Node management (join/leave/fail)



Group communications

 

Distributed state management, …

What IS New in P2P?



Nodes (Peers)

Quite heterogeneous

• Several orders of magnitude difference in resources
• Compare the bandwidth of a dial-up peer versus a high-speed LAN peer

Unreliable

• Failure is the norm!

Offer limited capacity

• Load sharing and balancing are critical

Autonomous

• Rational, i.e., maximize their own benefits!
• Incentives should be provided so that peers cooperate in a way that optimizes system performance
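To make the dial-up versus LAN comparison concrete, here is a rough back-of-the-envelope sketch. The link speeds (56 kbit/s dial-up, 100 Mbit/s LAN) and the 5 MB file size are illustrative assumptions, not figures from the lecture, and protocol overhead is ignored.

```python
# Illustrative link speeds in bits per second (assumed, not from the slides).
DIALUP_BPS = 56_000          # 56 kbit/s modem
LAN_BPS = 100_000_000        # 100 Mbit/s Ethernet


def transfer_seconds(size_bytes: int, link_bps: int) -> float:
    """Ideal time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bps


def bandwidth_ratio(fast_bps: int, slow_bps: int) -> float:
    """How many times faster the fast link is than the slow one."""
    return fast_bps / slow_bps


if __name__ == "__main__":
    file_size = 5 * 1000 * 1000  # a hypothetical 5 MB file
    print(f"ratio:   {bandwidth_ratio(LAN_BPS, DIALUP_BPS):.0f}x")
    print(f"dial-up: {transfer_seconds(file_size, DIALUP_BPS):.0f} s")
    print(f"LAN:     {transfer_seconds(file_size, LAN_BPS):.1f} s")
```

Under these assumed numbers the two peers differ by roughly three orders of magnitude, which is why a protocol that treats all peers as equal tends to overload the slow ones.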

What IS New in P2P? (cont’d)



System

Scale

• Very large number of peers (millions)

Structure and topology

• Ad hoc: no control over peers joining/leaving
• Highly dynamic

Membership/participation

• Typically open

More security concerns

• Trust, privacy, data integrity, …

Cost of building and running

• Small fraction of the cost of same-scale centralized systems
• How much would it cost to build and run a supercomputer with the processing power of the 3 million SETI@Home PCs?

What IS New in P2P? (cont’d)



So what?



We need to design new lighter-weight algorithms and protocols to scale to millions (or billions!) of nodes given the new characteristics



Question: why now, not two decades ago?

We did not have such abundant (and underutilized) computing resources back then!

And, network connectivity was very limited

Why is it Important to Study P2P?



P2P traffic is a major portion of Internet traffic (50+%): the current killer app



P2P traffic has exceeded web traffic (former killer app)!



Direct implications on the design, administration, and use of computer networks and network resources

Think of ISP designers or campus network administrators



Many potential distributed applications

Sample P2P Applications



File sharing

Gnutella, Kazaa, Napster, …



Distributed cycle sharing

SETI@home, Genome@home, …



File and storage systems

OceanStore, CFS, Freenet, Farsite, …



Media streaming and content distribution

PROMISE

SplitStream, CoopNet, PeerCast, Bullet, Zigzag, NICE, …

P2P vs its Cousin (Grid Computing)



Common Goal:

Aggregate resources (e.g., storage, CPU cycles, and data) into a common pool and provide efficient access to them



Differences along five axes [Foster & Iamnitchi 03]:

Target communities and applications

Type of shared resources

Scalability of the system

Services provided

Software required

P2P vs Grid Computing (cont’d)

Issue: Communities and Applications

Grid:

• Established communities, e.g., scientific institutions
• Computationally intensive problems

P2P:

• Grass-roots communities (anonymous)
• Mostly file swapping

Issue: Resources Shared

Grid:

• Powerful and reliable machines, clusters
• High-speed connectivity
• Specialized instruments

P2P:

• PCs with limited capacity and connectivity
• Unreliable
• Very diverse

P2P vs Grid Computing (cont’d)

Issue: System Scalability

Grid:

• Hundreds to thousands of nodes

P2P:

• Hundreds of thousands to millions of nodes

Issue: Services Provided

Grid:

• Sophisticated services: authentication, resource discovery, scheduling, access control, and membership control
• Members usually trust others

P2P:

• Limited services: resource discovery
• Limited trust among peers

Issue: Software Required

Grid:

• Sophisticated suite, e.g., Globus, Condor

P2P:

• Simple (screen saver), e.g., Kazaa, SETI@Home

P2P vs Grid Computing: Discussion

    

The differences mentioned are based on the traditional view of each paradigm

In the future, both paradigms are expected to converge and complement each other [e.g., Butt et al. 03]

Target communities and applications

• Grid: is going open

Type of shared resources

• P2P: is to include various and more powerful resources

Scalability of the system

• Grid: is to increase number of nodes

Services provided

• P2P: is to provide authentication, data integrity, trust management, …

P2P Systems: Simple Model

System architecture: Peers form an overlay according to the P2P Substrate

[Figure: Software architecture model on a peer, top to bottom: P2P Application, Middleware, P2P Substrate, Operating System, Hardware]

Overlay Network



An abstract layer built on top of the physical network



Neighbors in the overlay can be several hops away in the physical network



Why do we need overlays?

Flexibility in

• Choosing neighbors
• Forming and customizing topology to fit application needs (e.g., short delay, reliability, high BW, …)
• Designing communication protocols among nodes

Get around limitations in legacy networks

Enable new (and old!) network services
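The point that overlay neighbors can be several physical hops apart can be illustrated with a small sketch. The topology below (hosts `A` and `D`, routers `r1` to `r3`) and the function names are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical physical topology: hosts A and D attach to routers r1..r3.
PHYSICAL_LINKS = {
    "A": ["r1"],
    "r1": ["A", "r2"],
    "r2": ["r1", "r3"],
    "r3": ["r2", "D"],
    "D": ["r3"],
}

# Overlay chosen by the application: A and D are direct neighbors.
OVERLAY_LINKS = {"A": ["D"], "D": ["A"]}


def physical_hops(src: str, dst: str, links: dict[str, list[str]]) -> int:
    """Breadth-first search for the shortest hop count in the physical graph."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{dst} is unreachable from {src}")


def overlay_link_costs(overlay, physical):
    """Map each overlay edge (one overlay hop) to its physical hop count."""
    return {
        (u, v): physical_hops(u, v, physical)
        for u, neighbors in overlay.items()
        for v in neighbors
    }


if __name__ == "__main__":
    print(overlay_link_costs(OVERLAY_LINKS, PHYSICAL_LINKS))
```

In this toy topology a single overlay hop between `A` and `D` costs four physical hops. An overlay that cares about short delay would use a measurement of this kind when choosing neighbors.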

Overlay Network (cont’d)



Some applications that use overlays

Application level multicast, e.g., ESM, Zigzag, NICE, …

Reliable inter-domain routing, e.g., RON

Content Distribution Networks (CDN)

Peer-to-peer file sharing



Overlay design issues

Select neighbors

Handle node arrivals, departures

Detect and handle failures (nodes, links)

Monitor and adapt to network dynamics
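One common way to handle arrivals, departures, and failures is a heartbeat timeout. The sketch below is a minimal illustration under that assumption and is not taken from any particular system. The class name `HeartbeatMonitor`, its methods, and the 30-second default are all hypothetical.

```python
import time


class HeartbeatMonitor:
    """Tracks overlay neighbors and flags those that have gone silent."""

    def __init__(self, timeout_s: float = 30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so tests can fake time
        self.last_seen: dict[str, float] = {}

    def join(self, peer_id: str) -> None:
        """Handle a node arrival: start tracking the new neighbor."""
        self.last_seen[peer_id] = self.clock()

    def leave(self, peer_id: str) -> None:
        """Handle a graceful departure: stop tracking the neighbor."""
        self.last_seen.pop(peer_id, None)

    def heartbeat(self, peer_id: str) -> None:
        """Record that a heartbeat was just received from a tracked neighbor."""
        if peer_id in self.last_seen:
            self.last_seen[peer_id] = self.clock()

    def suspected_failures(self) -> list[str]:
        """Return neighbors whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )

    def evict_failed(self) -> list[str]:
        """Drop suspected neighbors so the overlay can pick replacements."""
        failed = self.suspected_failures()
        for peer in failed:
            del self.last_seen[peer]
        return failed
```

A timeout cannot distinguish a crashed peer from a slow or congested link, so the timeout value trades detection speed against false suspicions.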

Overlay Network (cont’d)

[Figure: IP Multicast]

Overlay Network (cont’d)

[Figure: Application Level Multicast (ALM)]

Peer Software Architecture Model



A software client installed on each peer



Three components:

P2P Substrate

Middleware

P2P Application

[Figure: Software architecture model on a peer, top to bottom: P2P Application, Middleware, P2P Substrate, Operating System, Hardware]

Peer Software Architecture Model (cont’d)



P2P Substrate (key component)

Overlay management

• Construction
• Maintenance (peer join/leave/fail and network dynamics)

Resource management

• Allocation (storage)
• Discovery (routing and lookup)

Can be classified according to the flexibility of placing objects at peers

P2P Substrates: Classification



Structured (or tightly controlled, DHT)

− Objects are rigidly assigned to specific peers
− Looks like a Distributed Hash Table (DHT)
− Efficient search & guarantee of finding
− Lack of partial name and keyword queries
− Maintenance overhead
− Ex: Chord, CAN, Pastry, Tapestry, Kademlia (Overnet)
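The idea of rigidly assigning objects to peers can be sketched with a toy identifier ring. This is only an illustration of the key-to-peer mapping and not the actual protocol of Chord, CAN, or any other listed system: it assumes every node knows all peers, whereas real DHTs route through partial routing tables. The names `ToyDHT`, `ring_id`, and `ID_BITS` are hypothetical.

```python
import bisect
import hashlib

ID_BITS = 32  # size of the identifier space (an arbitrary choice here)


def ring_id(name: str) -> int:
    """Hash a peer address or object name onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class ToyDHT:
    """Maps every key to exactly one peer; assumes at least one peer."""

    def __init__(self, peers: list[str]):
        self.ring = sorted((ring_id(p), p) for p in peers)
        self.storage = {p: {} for p in peers}

    def responsible_peer(self, key: str) -> str:
        """The first peer clockwise from the key's ID, wrapping around."""
        ids = [pid for pid, _ in self.ring]
        idx = bisect.bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key: str, value: str) -> None:
        self.storage[self.responsible_peer(key)][key] = value

    def get(self, key: str):
        """Exact-match lookup: goes straight to the one responsible peer."""
        return self.storage[self.responsible_peer(key)].get(key)


if __name__ == "__main__":
    dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
    dht.put("song.mp3", "contents")
    print(dht.get("song.mp3"))  # found: the full name hashes to the owner
    print(dht.get("song"))      # None: a partial name hashes elsewhere
```

The second lookup shows why keyword and partial-name queries are hard in this class: hashing destroys any similarity between `song` and `song.mp3`.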



Unstructured (or loosely controlled)

− Objects can be anywhere
− Support partial name and keyword queries
− Inefficient search & no guarantee of finding
− Some heuristics exist to enhance performance
− Ex: Gnutella, Kazaa (super node), GIA [Chawathe et al. 03]
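A simplified flooding search with a time-to-live (TTL) shows both the keyword support and the cost of unstructured search. This is a sketch in the spirit of Gnutella-style flooding, not its wire protocol. The function name, the four-peer overlay, and the message-counting rule (every forwarded copy counts, including copies sent to peers that already saw the query) are assumptions made for illustration.

```python
from collections import deque


def flood_search(overlay, shared_files, start, keyword, ttl):
    """Flood a keyword query up to ttl overlay hops.

    Returns (hits, messages_sent), where hits is a list of (peer, filename).
    """
    hits = []
    messages = 0
    visited = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        # Substring match against whatever this peer happens to hold.
        for name in shared_files.get(peer, []):
            if keyword.lower() in name.lower():
                hits.append((peer, name))
        if remaining == 0:
            continue  # TTL expired: do not forward any further
        for neighbor in overlay.get(peer, []):
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


if __name__ == "__main__":
    overlay = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    files = {"d": ["free_jazz.mp3"]}
    print(flood_search(overlay, files, "a", "jazz", ttl=2))
    print(flood_search(overlay, files, "a", "jazz", ttl=3))
```

With `ttl=2` the query never reaches peer `d`, so the file is not found even though it exists. Raising the TTL finds it at the price of more messages, which is the inefficiency and lack of guarantee noted in the list above.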

Peer Software Architecture Model (cont’d)



Middleware

Provides auxiliary services to the P2P application, e.g.,

• Peer selection
• Trust management
• Data integrity validation
• Authentication and authorization
• Membership management
• Accounting (economics and rationality)
• …

Ex: CollectCast, EigenTrust, micropayment
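As one small illustration of a middleware service, data integrity validation can be sketched as a digest check on data received from an untrusted peer. This is not how CollectCast or EigenTrust work. It assumes the expected SHA-256 digest was obtained from a source the receiver already trusts, and the function names are hypothetical.

```python
import hashlib
import hmac


def digest_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a data segment."""
    return hashlib.sha256(data).hexdigest()


def validate_segment(data: bytes, expected_digest: str) -> bool:
    """True if the bytes received from a peer match the trusted digest."""
    # compare_digest compares in constant time; a plain == would also
    # give the correct answer here.
    return hmac.compare_digest(digest_of(data), expected_digest)


if __name__ == "__main__":
    original = b"segment payload"
    trusted = digest_of(original)  # assumed to come from a trusted source
    print(validate_segment(original, trusted))
    print(validate_segment(b"tampered payload", trusted))
```

A check like this detects corrupted or polluted data but says nothing about which peers to trust in the first place. That is the job of the trust management service in the same list.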

Peer Software Architecture Model (cont’d)



P2P Application

Potentially, there could be multiple applications running on top of a single P2P substrate

Applications include

• File sharing
• File and storage systems
• Distributed cycle sharing
• Content distribution

This layer provides some functions and bookkeeping relevant to the target application

• File assembly (file sharing)
• Buffering and rate smoothing (streaming)

Ex: Promise, Bullet, CFS, Gnutella, Kazaa
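The file-assembly bookkeeping mentioned above can be sketched as follows. It assumes the application already knows how many segments the file has and that segments may arrive out of order, possibly duplicated, from different peers. The class `FileAssembler` is hypothetical and not taken from any of the listed systems.

```python
class FileAssembler:
    """Collects numbered segments that may arrive out of order."""

    def __init__(self, total_segments: int):
        self.total_segments = total_segments
        self.segments: dict[int, bytes] = {}

    def add_segment(self, index: int, data: bytes) -> None:
        """Store a segment; duplicates from other peers are ignored."""
        if not 0 <= index < self.total_segments:
            raise IndexError(f"segment {index} out of range")
        self.segments.setdefault(index, data)

    def missing(self) -> list[int]:
        """Indices still to be requested from some peer."""
        return [i for i in range(self.total_segments) if i not in self.segments]

    def is_complete(self) -> bool:
        return len(self.segments) == self.total_segments

    def assemble(self) -> bytes:
        """Concatenate segments in index order once all have arrived."""
        if not self.is_complete():
            raise ValueError(f"missing segments: {self.missing()}")
        return b"".join(self.segments[i] for i in range(self.total_segments))


if __name__ == "__main__":
    assembler = FileAssembler(total_segments=3)
    assembler.add_segment(2, b"!")        # arrives first, from one peer
    assembler.add_segment(0, b"hello ")   # arrives later, from another
    print(assembler.missing())
    assembler.add_segment(1, b"world")
    print(assembler.assemble())
```

The `missing()` list is what the application would use to decide which segments to request next and from which peers.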

Outline of the Rest of the Introduction



P2P Substrates

Structured (DHT)

• Example: CAN

Unstructured

• Example 1: Gnutella
• Example 2: Kazaa

Middleware and P2P Application

Example: CollectCast and Promise

Course Roadmap:

Papers flash overview (1-2 min each!)

Project discussion

Summary

 

In the P2P computing paradigm:

Peers cooperate to achieve desired functions

Started (or rediscovered) with Napster ’98

Old, well-researched distributed concepts

BUT, with new characteristics (e.g., heterogeneity, unreliability, rationality, scale, ad hoc), new and lighter-weight algorithms are needed

Simple model for P2P systems:

Peers form an abstract layer called overlay

A peer software client may have three components

• P2P substrate, middleware, and P2P application
• Borders between components may be blurred
