Application II
Network Applications
and Network Programming:
Web and P2P
Slides are from Richard Yang at Yale; minor modifications have been made.
Recap: FTP, HTTP
FTP: file transfer
ASCII (human-readable format) requests and responses
stateful server
one data channel and one control channel
HTTP
Extensibility: ASCII requests, header lines, entity body, and response status line
Scalability/robustness
• stateless server (each request should contain the full information); DNS load balancing
• Client caching, Web caches
one data channel
Recap: WebServer Flow
Create ServerSocket(6789)
connSocket = accept()
read request from connSocket
read local file
write file to connSocket
close connSocket
[Figure: the server's TCP socket space (host 128.36.232.5 / 128.36.230.2). One socket is in state "listening" with local address {*.6789, *.*} and a completed-connection queue; the accepted socket is in state "established" with address pair {128.36.232.5:6789, 198.69.10.10:1500}; both have send and receive buffers. A second listening socket, {*.25, *.*}, belongs to another service on the same host.]
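A minimal sketch of this single-threaded flow in Java. Port 6789 comes from the slide; the class name, the HTTP parsing, and the file-path handling are illustrative simplifications, not code from the course.

import java.io.*;
import java.net.*;
import java.nio.file.*;

public class IterativeWebServer {
    public static void main(String[] args) throws IOException {
        ServerSocket welcomeSocket = new ServerSocket(6789);        // create ServerSocket(6789)
        while (true) {
            Socket connSocket = welcomeSocket.accept();             // blocks until a connection completes
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(connSocket.getInputStream()));
            String requestLine = in.readLine();                     // read request from connSocket
            String fileName = requestLine.split(" ")[1];            // e.g. "GET /index.html HTTP/1.0"
            byte[] body = Files.readAllBytes(Paths.get("." + fileName)); // read local file
            OutputStream out = connSocket.getOutputStream();
            out.write("HTTP/1.0 200 OK\r\n\r\n".getBytes());        // write header, then file, to connSocket
            out.write(body);
            connSocket.close();                                     // close connSocket
        }
    }
}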
Recap: Writing High Performance
Servers:
Major issue: many socket/I/O operations can cause processing to block, e.g.,
accept: waiting for a new connection;
reading a socket: waiting for data or close;
writing to a socket: waiting for buffer space;
disk I/O: waiting for a read/write to finish
Thus a crucial aspect of network server design is the concurrency design (non-blocking)
for high performance
to avoid denial of service
A technique to avoid
blocking: Thread
Multi-Threaded Web Server
Create ServerSocket(6789)
connSocket = accept(); create a thread for connSocket; repeat
per-connection thread: read request from connSocket, read local file, write file to connSocket, close connSocket
[Figure: two per-connection threads running these steps in parallel for two different connections.]
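A sketch of the thread-per-connection variant, again in Java; handleRequest is a placeholder for the per-thread steps in the figure, not an API from the slides.

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class MultiThreadedWebServer {
    public static void main(String[] args) throws IOException {
        ServerSocket welcomeSocket = new ServerSocket(6789);
        while (true) {
            final Socket connSocket = welcomeSocket.accept();
            // create a thread for connSocket; the main loop returns to accept() immediately
            new Thread(() -> handleRequest(connSocket)).start();
        }
    }

    static void handleRequest(Socket connSocket) {
        try {
            // read request from connSocket, read the local file,
            // and write the file back: the same steps as the single-threaded flow
        } finally {
            try { connSocket.close(); } catch (IOException ignored) { }
        }
    }
}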
Recap: Writing High Performance
Servers
Problems of multiple threads:
too many threads: throughput meltdown, response time explosion
Event-Driven Programming
Event-driven programming, also called asynchronous I/O
Tell the OS not to block when accepting/reading/writing on sockets
Java: asynchronous I/O
for an example see: http://www.cafeaulait.org/books/jnp3/examples/12/
Yields efficient and scalable concurrency
Many examples: Click router, Flash web server, TP Monitors, etc.
Web Server
If the OS does not block on sockets, what might the program structure look like?
[Figure: the multi-threaded flow from the previous slide (accept, create a thread per connSocket; each thread reads the request, reads the local file, writes the file, and closes), shown again as the starting point for the event-driven redesign.]
Typical Structure of Async I/O
Typically, async I/O programs use Finite State Machines (FSMs) to monitor the progress of requests
The state info keeps track of the execution stage of processing each request, e.g., reading request, writing reply, …
The program has a loop that checks for potential events at each state
Async I/O in Java
An important class is Selector, which supports the event loop
A Selector is a multiplexer of selectable channel objects
example channels: DatagramChannel, ServerSocketChannel, SocketChannel
use configureBlocking(false) to make a
channel non-blocking
A selector may be created by invoking the
open method of this class
Async I/O in Java
A selectable channel registers events (as a SelectionKey) with a selector via the register method
A SelectionKey object contains two operation sets: the interest set and the ready set
A SelectionKey object has an attachment which can store data; often the attachment is a buffer
[Figure: a SelectableChannel registers with a Selector, producing a SelectionKey that carries the interest set, the ready set, and the attachment.]
Async I/O in Java
Call select() (or selectNow(), or select(int timeout)) to check for ready events; the keys whose channels are ready form the selected key set
Iterate over the set to process all ready events
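Putting these pieces together, a minimal event-loop sketch might look as follows. The request/response handling is elided; a real server would also track each connection's FSM state (e.g., in the key attachment) and register for OP_WRITE when a reply is ready.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class SelectorServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(6789));
        server.configureBlocking(false);                    // make the channel non-blocking
        server.register(selector, SelectionKey.OP_ACCEPT);  // interest set: accept events

        while (true) {                                      // the event loop
            selector.select();                              // wait until some registered event is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();                                // the selected key set must be cleared by hand
                if (key.isAcceptable()) {
                    SocketChannel conn = server.accept();
                    conn.configureBlocking(false);
                    // attach a buffer that holds this connection's state
                    conn.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
                } else if (key.isReadable()) {
                    SocketChannel conn = (SocketChannel) key.channel();
                    ByteBuffer buf = (ByteBuffer) key.attachment();
                    int n = conn.read(buf);                 // never blocks on a non-blocking channel
                    if (n < 0) {
                        conn.close();                       // peer closed the connection
                    } else {
                        // ... parse the request read so far; switch to OP_WRITE when the reply is ready ...
                    }
                }
            }
        }
    }
}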
Problems of Event-Driven Server
Difficult to engineer, modularize, and tune
No performance/failure isolation between
Finite-State-Machines (FSMs)
FSM code can never block (but page faults, I/O, and garbage collection may still force a block)
thus multiple threads are still needed
Summary of Traditional
C-S Web Servers
Is the application extensible, scalable, robust, secure?
[Figure: clients 1..n resolve the server via DNS and all contact the single app server C0.]
Content Distribution History...
“With 25 years of Internet experience, we’ve
learned exactly one way to deal with the
exponential growth: Caching”.
(1997, Van Jacobson)
Web Caches (Proxy)
Web caches/proxy
placed at entrance of
an ISP
Client sends all http
requests to web cache
if object at web
cache, web cache
immediately returns
object in http
response
else requests object
from origin server,
then returns http
response to client
[Figure: clients send their requests to the proxy server at the ISP; the proxy answers from its cache or forwards the request to the origin servers.]
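A toy sketch of this cache-or-forward decision; the in-memory map stands in for the cache, and a real proxy would also honor HTTP cache-control and expiration headers rather than caching every object forever.

import java.io.InputStream;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class ToyWebCache {
    // URL -> cached response body
    private final Map<String, byte[]> cache = new HashMap<>();

    byte[] get(String url) throws Exception {
        byte[] cached = cache.get(url);
        if (cached != null) {
            return cached;               // hit: return the object immediately
        }
        // miss: request the object from the origin server
        try (InputStream in = new URL(url).openStream()) {
            byte[] body = in.readAllBytes();
            cache.put(url, body);        // keep it for later clients
            return body;                 // then return the response to the client
        }
    }
}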
Web Proxy/Cache
Web caches give good performance because very often
a single client repeatedly accesses the same document
a nearby client also accesses the same document
Cache hit ratio increases logarithmically with number of users
[Figure: clients 1..6 behind two ISP caches, which in turn fetch from the app server C0.]
Benefits of Web Caching
Assume: cache is “close”
to client (e.g., in same
network)
smaller response time:
cache “closer” to
client
decrease traffic to
distant servers
link out of
institutional/local ISP
network often
bottleneck
[Figure: an institutional network with a 10 Mbps LAN and an institutional cache, connected to the public Internet and the origin servers over a 1.5 Mbps access link.]
What went wrong with Web Caches?
Web protocols evolved extensively to accommodate caching, e.g., HTTP/1.1
However, Web caching was developed with a strong ISP perspective, leaving content providers out of the picture
It is the ISP who places a cache and controls it
ISPs' only interest in Web caches is to reduce bandwidth
In the USA: bandwidth is relatively cheap
In Europe, there were many more Web caches
However, ISPs can arbitrarily tune Web caches, so they may deliver stale content
Content Provider Perspective
Content providers care about
User experience (latency)
Content freshness
Accurate access statistics
Avoiding flash crowds
Minimizing bandwidth usage on their access link
Content Distribution Networks
Content Distribution Networks (CDNs) build overlay networks of caches to provide fast, cost-effective, and reliable content delivery, while working tightly with content providers.
Example:
Akamai – the original and largest commercial CDN
operates over 25,000 servers in over 1,000 networks
Akamai (AH kuh my) is Hawaiian for intelligent, clever, and informally "cool". Founded April 1999 in Boston, MA by MIT students
Basics of Akamai Operation
Content provider server
provides the base HTML document
Akamai caches embedded objects at a set of
its cache servers (called edge servers)
Akamaization of embedded content: e.g.,
<IMG SRC=http://www.provider.com/image.gif>
changed to
<IMG SRC=http://a661.g.akamai.net/hash/image.gif>
Akamai customizes DNS to select serving
edge servers based on
closeness to client browser
server load
More Akamai information
URL akamaization is becoming obsolete and
only supported for legacy reasons
Currently most content providers prefer to use
DNS CNAME techniques to get all their content
served from the Akamai servers
content providers still need to run their origin servers
Akamai Evolution:
Files/streaming
Secure pages and whole pages
Dynamic page assembly at the edge (ESI)
Distributed applications
Lab: Problems of Traditional
Content Distribution
[Figure: clients 1..n, DNS, and a single app server C0, as in the earlier client-server picture.]
Objectives of P2P
Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness
Bypass DNS to find clients with resources!
examples: instant messaging, Skype
[Figure: peers exchanging content directly with one another across the Internet.]
P2P
But P2P is not new
Original Internet was a p2p system:
The original ARPANET connected UCLA,
Stanford Research Institute, UCSB, and Univ.
of Utah
No DNS or routing infrastructure, just
connected by phone lines
Computers also served as routers
P2P is simply an iteration of scalable
distributed systems
P2P Systems
File Sharing: BitTorrent, LimeWire
Streaming: PPLive, PPStream, Zattoo, …
Research systems
Collaborative computing:
SETI@Home project
• Human genome mapping
• Intel NetBatch: 10,000 computers in 25 worldwide sites for simulations, saved about $500 million
Peer-to-Peer Computing
- 40-70% of total traffic in many networks
- upset the music industry; drawn college students, web developers, recording artists, and universities into court
Source: ipoque Internet study 2008/2009
Recap: P2P Objectives
Bypass DNS to locate clients with resources!
examples: instant messaging, Skype
Share the storage and bandwidth of individual clients to improve scalability/robustness
[Figure: peers exchanging content directly with one another across the Internet.]
The Lookup Problem
find where a particular file is stored
pay particular attention to its equivalence to DNS
[Figure: a publisher stores (key = "title", value = MP3 data…) at one of the nodes N1..N6; a client elsewhere in the Internet issues Lookup("title") and must discover which node holds the file.]
Outline
Recap
P2P
the lookup problem
Napster
Centralized Database: Napster
Program for sharing music over the Internet
History:
5/99: Shawn Fanning (freshman, Northeastern U.) founded Napster
Online music service, wrote the program in 60 hours
12/99: first lawsuit
3/00: 25% of UWisc traffic was Napster
2000: est. 60M users
2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
7/01: # simultaneous online users:
Napster 160K
9/02: bankruptcy
We are referring to the Napster before closure.
Napster: How Does it Work?
Application-level, client-server protocol over TCP
A centralized index system that maps files (songs) to machines that are alive and have the files
Steps:
Connect to Napster server
Upload your list of files (push) to server
Give server keywords to search the full list
Select “best” of hosts with answers
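A toy sketch of the centralized index these steps imply; the names (IndexServer, publish, search) are illustrative, not Napster's actual protocol.

import java.util.*;

public class IndexServer {
    // file name -> set of hosts ("IP:port") currently sharing it
    private final Map<String, Set<String>> index = new HashMap<>();

    // step 2: a client pushes its file list when it connects
    synchronized void publish(String host, List<String> files) {
        for (String f : files) {
            index.computeIfAbsent(f, k -> new HashSet<>()).add(host);
        }
    }

    // step 3: keyword search over the full list; the client then picks the
    // "best" host from the answers and fetches the file peer to peer
    synchronized List<String> search(String keyword) {
        List<String> hosts = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : index.entrySet()) {
            if (e.getKey().contains(keyword)) {
                hosts.addAll(e.getValue());
            }
        }
        return hosts;
    }
}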
Napster Architecture
Napster: Publish
[Figure: the peer at 123.2.21.23 tells the central server "I have X, Y, and Z!"; the server records insert(X, 123.2.21.23), and so on for each file.]
Napster: Search
[Figure: a client sends the query "Where is file A?" to the central server; the server's index lookup (search(A)) returns the address of a peer holding A, which is sent back in the reply (the figure uses the addresses 123.2.0.18 and 124.1.0.1).]
Napster: Ping
[Figure: the requesting peer pings the candidate hosts (123.2.0.18, 124.1.0.1) to check liveness and closeness before fetching.]
Napster: Fetch
[Figure: the file is then fetched directly from the chosen peer (123.2.0.18 or 124.1.0.1); the data transfer is peer to peer, not through the central server.]
Napster Messages
General Packet Format
[chunksize]
[chunkinfo]
[data...]
CHUNKSIZE:
Intel-endian (little-endian) 16-bit integer
size of [data...] in bytes
CHUNKINFO: (hex)
Intel-endian (little-endian) 16-bit integer, e.g.:
00 - login rejected
02 - login requested
03 - login accepted
0D - challenge? (nuprin1715)
2D - added to hotlist
2E - browse error (user isn't online!)
2F - user offline
5B - whois query
5C - whois result
5D - whois: user is offline!
69 - list all channels
6A - channel info
90 - join channel
91 - leave channel
…..
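A small sketch of reading this general packet format, taking "Intel-endian" to mean little-endian byte order; framing and error handling are simplified, and the class is only illustrative.

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NapsterMessage {
    final int size;      // CHUNKSIZE: length of [data...] in bytes
    final int type;      // CHUNKINFO: message type, e.g. 0x5B = whois query
    final byte[] data;   // [data...]

    NapsterMessage(int size, int type, byte[] data) {
        this.size = size; this.type = type; this.data = data;
    }

    static NapsterMessage read(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        byte[] header = new byte[4];
        din.readFully(header);
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int size = buf.getShort() & 0xFFFF;   // 16-bit little-endian CHUNKSIZE
        int type = buf.getShort() & 0xFFFF;   // 16-bit little-endian CHUNKINFO
        byte[] data = new byte[size];
        din.readFully(data);                  // read exactly CHUNKSIZE payload bytes
        return new NapsterMessage(size, type, data);
    }
}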
Centralized Database: Napster
Summary of features: a hybrid design
control: client-server (aka special DNS) for files
data: peer to peer
Advantages
simplicity, easy to implement sophisticated search
engines on top of the index system
Disadvantages
application specific (compared with DNS)
lack of robustness, scalability: the central search server is a single point of bottleneck/failure
easy to sue!
Variation: BitTorrent
A global central index server is replaced by one tracker per file (the set of peers sharing a file is called a swarm)
reduces centralization; but needs other means
to locate trackers
The bandwidth scalability management
technique is more interesting
more later
Outline
Recap
P2P
the lookup problem
Napster (central query server; distributed
data servers)
Gnutella
Gnutella
On March 14th 2000, J. Frankel and T.
Pepper from AOL’s Nullsoft division (also
the developers of the popular Winamp mp3
player) released Gnutella
Within hours, AOL pulled the plug on it
Quickly reverse-engineered and soon many
other clients became available: Bearshare,
Morpheus, LimeWire, etc.
Decentralized Flooding: Gnutella
On startup, a client contacts other servents (server + client) in the network to form interconnection/peering relationships
servent interconnections are used to forward control messages (queries, hits, etc.)
How to find a resource record: decentralized flooding
send requests to neighbors
neighbors recursively forward the requests
Decentralized Flooding
[Figure: an overlay of peers A..N plus the source S, connected by peering links.]
Decentralized Flooding
send query to neighbors
[Figure: the source S sends its query to its neighbors, and the query floods outward through the overlay A..N.]
Each node forwards the query to its neighbors other than the one from which it received the query
Background: Decentralized Flooding
[Figure: the same overlay A..N, illustrating how a flooded query can arrive back at nodes that have already seen it.]
Each node should keep track of forwarded queries to avoid loops!
nodes keep state (which will time out, i.e., soft state)
or carry the state in the query, i.e., carry a list of visited nodes
Decentralized Flooding: Gnutella
Basic message header
Unique ID, TTL, Hops
Message types
Ping – probes network for other servents
Pong – response to ping, contains IP addr, # of files, etc.
Query – search criteria + speed requirement of servent
QueryHit – successful response to Query, contains addr +
port to transfer from, speed of servent, etc.
Ping and Query messages are flooded
QueryHit and Pong travel back along the reverse path of the message they answer
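A sketch of the forwarding rule these fields support: drop duplicates by Unique ID, decrement the TTL, and never send a query back to the neighbor it came from. The Neighbor interface and method names are placeholders, not Gnutella's wire format.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FloodingNode {
    interface Neighbor { void send(String msgId, int ttl, String query); }

    private final List<Neighbor> neighbors;
    private final Set<String> seen = new HashSet<>();   // unique IDs already handled

    FloodingNode(List<Neighbor> neighbors) { this.neighbors = neighbors; }

    // called when a Query with the given unique ID arrives from neighbor 'from'
    void onQuery(String msgId, int ttl, String query, Neighbor from) {
        if (!seen.add(msgId)) return;      // duplicate: drop it to avoid forwarding loops
        // ... check the local file index; a hit is answered with a QueryHit
        //     sent back along the reverse path (toward 'from') ...
        if (ttl <= 1) return;              // decrementing would reach 0: stop the flood here
        for (Neighbor n : neighbors) {
            if (n != from) {               // forward to every neighbor except the sender
                n.send(msgId, ttl - 1, query);
            }
        }
    }
}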
Advantages and Disadvantages
of Gnutella
Advantages:
totally decentralized, highly robust
Disadvantages:
not scalable; the entire network can be swamped with
flood requests
• especially hard on slow clients; at some point broadcast
traffic on Gnutella exceeded 56 kbps
to alleviate this problem, each request has a TTL to limit
the scope
• each query has an initial TTL, and each node forwarding it
reduces it by one; if TTL reaches 0, the query is dropped
(consequence?)
Flooding: FastTrack (aka Kazaa)
Modifies the Gnutella protocol into a two-level hierarchy
Supernodes
Nodes that have better connection to Internet
Act as temporary indexing servers for other nodes
Help improve the stability of the network
Standard nodes
Connect to supernodes and report list of files
Search
Broadcast (Gnutella-style) search across supernodes
Disadvantages
Kept centralized registration, which made it prone to lawsuits
Optional Slides
Aside: Search Time?
Aside: All Peers Equal?
[Figure: a Gnutella overlay whose peers have very different access speeds: several 1.5 Mbps DSL links, a 10 Mbps LAN, and multiple 56 kbps modems.]
Aside: Network Resilience
[Figure: partial Gnutella topology, comparing the effect of a random 30% of nodes dying versus a targeted 4% dying; from Saroiu et al., MMCN 2002.]
Asynchronous Network
Programming
(C/C++)
A Relay TCP Client: telnet-like Program
[Figure: the client relays between the terminal (fgets/fputs) and the TCP connection (writen/readn) to the TCP server.]
http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpclient
Method 1: Process and Thread
process: fork(), waitpid()
thread (a lightweight process): pthread_create(), pthread_exit()
pthread
void *copy_to(void *arg);   // forward declaration

int main() {
    char recvline[MAXLINE + 1];
    pthread_t tid;

    ss = new socketstream(sockfd);
    // spawn the thread that copies stdin -> socket
    if (pthread_create(&tid, NULL, copy_to, NULL)) {
        err_quit("pthread_create()");
    }
    // main thread copies socket -> stdout
    while (ss->read_line(recvline, MAXLINE) > 0) {
        fprintf(stdout, "%s\n", recvline);
    }
}

void *copy_to(void *arg) {
    char sendline[MAXLINE];
    if (debug) cout << "Thread create()!" << endl;
    while (fgets(sendline, sizeof(sendline), stdin))
        ss->writen_socket(sendline, strlen(sendline));
    shutdown(sockfd, SHUT_WR);
    if (debug) cout << "Thread done!" << endl;
    pthread_exit(0);
}
Method 2: Asynchronous I/O
(Select)
select: deals with blocking system calls
int select(int n, fd_set *readfds, fd_set
*writefds, fd_set *exceptfds, struct
timeval *timeout);
FD_CLR(int fd, fd_set *set);
FD_ZERO(fd_set *set);
FD_ISSET(int fd, fd_set *set);
FD_SET(int fd, fd_set *set);
Method 3: Signal and Select
signal: events such as timeout
Examples of Network Programming
Library to make life easier
Four design examples
TCP Client
TCP server using select
TCP server using process and thread
Reliable UDP
Warning: It will be hard to listen to me
reading through the code. Read the code.
Example 2: A Concurrent TCP Server Using Select
Get a line, and echo it back
Use select()
For how to use process or thread, see later
Check the code at:
http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpserver
Are there potential denial of service problems with the
code?
Example 3: A Concurrent HTTP
TCP Server Using Process/Thread
Use process-per-request or thread-per-
request
Check the code at:
http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/simple_httpd