
COM 360
Chapter 5
End-to-End Protocols
Getting Processes to Communicate
• Previously we were concerned with using
various technologies to connect a collection of
computers:
– using direct links (LANs – Ethernet and Token Ring)
– packet-switched networks (ATM), and
– internetworks.
• Next problem is to turn the host-to-host packet
delivery service into a process-to-process
communication channel.
End-To-End Protocol
• The transport layer supports communication
between the end application protocols and is
called the end-to-end protocol.
• It interacts with two layers:
– From above, the application-level processes
– From below, the underlying network
End-To-End Protocol
• Transport layer protocol provides these services to the
application layer:
– Guarantees message delivery
– Delivers messages in the same order that they are sent
– Delivers at most one copy of each message
– Supports arbitrarily large messages
– Supports synchronization between sender and receiver
– Allows the receiver to apply flow control to the sender
– Supports multiple application processes on each host.
End-To-End Protocol
• The underlying network has some limitations.
It may:
– Drop messages
– Reorder messages
– Deliver duplicate copies of a given message
– Limit messages to some finite size
– Deliver messages after some arbitrarily long delay
• This kind of network, exemplified by the Internet, provides best-effort service.
The Challenge!
• Develop algorithms that turn the less-than-desirable properties of the underlying network into the high level of service required by application programs.
• Representative services:
– Simple asynchronous demultiplexing service (UDP)
– Reliable byte-stream service (TCP)
– Request/Reply service (for example, RPC – remote
procedure call)
Simple Demultiplexer (UDP)
• The simplest transport protocol extends the host-to-host delivery service of the underlying network into a process-to-process communication service.
• There are multiple processes running on a host, so there
must be a level of demultiplexing to allow multiple
applications to share the network.
• The User Datagram Protocol (UDP) uses this best-effort network service.
Simple Demultiplexer (UDP)
• Form of address used to identify the target process:
– Processes could directly identify each other with an OS-assigned process id (pid)
– More commonly, processes indirectly identify each other using a port or mailbox
• The source sends a message to a port and the destination receives the message from the port
– A UDP port is 16 bits, so there are 64K possible ports – not enough to identify all the processes on the Internet
– A process is therefore identified by a port on a particular host – a (port, host) pair
Simple Demultiplexer (UDP)
• How does a process learn the port to which it wants to
send a message?
• A client initiates a message exchange with a server
process.
• The server knows the client’s port (contained in the message header) and can reply to it.
• The server accepts messages at a well-known port.
• Examples: DNS at port 53, mail at port 25.
• Well-known port mappings are published in /etc/services.
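As an illustration (not from the slides), a minimal sketch in Python of this addressing model: the server binds a fixed port standing in for a well-known one, and it learns the client’s (host, port) pair from the arriving datagram. The loopback address and port 9999 are arbitrary choices.

import socket

# Server side: bind to a fixed port (9999 is an arbitrary stand-in for a
# well-known port such as 53 for DNS).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 9999))

# Client side: the OS picks an ephemeral source port automatically.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello", ("127.0.0.1", 9999))

# The server learns the client's (host, port) pair from the datagram itself,
# so it can reply without any prior arrangement.
data, client_addr = server.recvfrom(1024)
server.sendto(data, client_addr)
print(client.recvfrom(1024)[0])   # b'hello'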
Simple Demultiplexer (UDP)
• UDP does not implement flow control or reliable
delivery.
• It simply sends messages to the application and ensures correctness with a checksum (optional in IPv4, but mandatory in IPv6).
• UDP computes the checksum over the UDP header, the message body and a pseudo-header, which contains three fields: the protocol number, the source IP address and the destination IP address.
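A sketch of the checksum computation, assuming IPv4 (note that the pseudo-header as actually defined also carries the UDP length alongside the three fields named above):

import struct

def internet_checksum(data: bytes) -> int:
    # 16-bit one's-complement sum of one's-complement, as used by UDP/TCP/IP.
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    # The pseudo-header (zero byte, protocol 17 = UDP, UDP length) is
    # prepended for checksumming only -- it is never transmitted.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    return internet_checksum(pseudo + udp_segment)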
Format for UDP Header
Fields, 32 bits per row (bit positions 0, 16 and 31 marked in the original figure):
SrcPort (bits 0–15) | DstPort (bits 16–31)
Length (bits 0–15) | Checksum (bits 16–31)
Data
UDP Message Queue
(Figure: packets arriving at UDP are demultiplexed by port number into per-port message queues, one per application process.)
UDP Message Queue
• A port abstraction is typically implemented as a
message queue. When a message arrives, the
protocol appends it to the end of the queue.
• If the queue is full, the message is discarded.
There is no flow control that tells the sender to
slow down.
• When an application process wants to receive a message, one is removed from the front of the queue; if the queue is empty, the process blocks until a message arrives.
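A toy sketch of this port abstraction in Python (the 32-message limit is an arbitrary assumption):

import collections

class UdpPort:
    # A port implemented as a bounded FIFO queue of whole messages.
    def __init__(self, max_messages=32):
        self.queue = collections.deque()
        self.max_messages = max_messages

    def deliver(self, message):
        # Called when a packet is demultiplexed to this port.
        if len(self.queue) >= self.max_messages:
            return                # queue full: drop silently, no flow control
        self.queue.append(message)

    def receive(self):
        # Called by the application; a real implementation would block
        # here until a message arrives instead of returning None.
        return self.queue.popleft() if self.queue else None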
Reliable Byte Stream (TCP)
• The Internet Transmission Control Protocol (TCP)
offers a reliable, connection-oriented, stream service.
• TCP guarantees reliable, in-order delivery of a stream
of bytes.
• It is full duplex and contains a flow control mechanism
that allows the receiver to limit how much data the
sender can transmit at one time.
• TCP also implements a congestion control mechanism, which keeps the sender from overloading the network.
Congestion Control vs. Flow Control
• Congestion control involves preventing too much
data from being sent into the network, which
would cause switches or links to become
overloaded.
• Flow control involves preventing senders from
over-running the capacity of the receivers.
• Flow control is thus an end-to-end issue while
congestion control is concerned with how hosts
and networks interact.
End-To-End Issues
• The sliding window algorithm is at the heart of TCP,
which runs over the Internet.
• TCP supports logical connections between processes
which are running on any two computers on the Internet.
• TCP uses an explicit connection establishment phase,
during which the two sides agree to exchange data with
each other.
• TCP connections can have very different round-trip times depending on the distance between the hosts and the paths their packets take.
End-To-End Issues
• Packets may become reordered as they cross the Internet, which does not happen on a direct point-to-point link.
• The sliding window algorithm can put packets back in order using the sequence number.
• TCP assumes that each packet has a maximum lifetime, or maximum segment lifetime (MSL) – usually set at 120 seconds.
End-To-End Issues
• Computers connected directly to a link are usually engineered with enough resources (e.g. buffer space) to support that link; the hosts at the ends of a TCP connection can be almost any kind of machine.
• TCP must therefore include a mechanism that each side uses to “learn” what resources the other side can apply to the connection – a flow control issue.
• On a direct link, the load is visible as a queue of packets at the sender; the sending side of a TCP connection cannot see the links that will be used to reach the destination and can therefore cause congestion inside the network – a congestion control issue.
Segment Format
• TCP is a byte-oriented protocol, which means that the
sender writes bytes into a TCP connection and the
receiver reads bytes out of the TCP connection – called
a “byte-stream”.
• TCP at the source buffers bytes to fill a packet and then
sends the packet to the destination. TCP at the
destination empties the packet into a receiver buffer and
the receiving process reads from this buffer at its
leisure.
• The packets exchanged are called segments, since each
carries a segment of the byte stream.
How TCP Manages a Byte Stream

(Figure: the sending application writes bytes into TCP’s send buffer; TCP transmits them across the network as segments; the receiving TCP appends them to its receive buffer, from which the receiving application reads bytes.)
A TCP connection supports byte streams flowing in both directions.
TCP Header Format
Fields, 32 bits per row (bit positions 0, 4, 10, 16 and 31 marked in the original figure):
SrcPort (bits 0–15) | DstPort (bits 16–31)
SequenceNum
Acknowledgment
HdrLen (bits 0–3) | 0 (reserved, bits 4–9) | Flags (bits 10–15) | AdvertisedWindow (bits 16–31)
Checksum (bits 0–15) | UrgPtr (bits 16–31)
Options (variable)
Data

The UrgPtr field indicates where the non-urgent data begins.
TCP Process
(Figure: the sender transmits data segments carrying a SequenceNum; the receiver returns an Acknowledgment plus an AdvertisedWindow. Data flows in one direction and ACKs in the other.)
Connection Establishment and Termination
• A TCP connection begins with a client (caller) doing an active open to a server (callee).
• The two sides exchange messages to establish the connection.
• After the connection establishment phase is over, the two sides begin sending data.
• When a participant finishes sending data, it closes one direction of the connection, which causes TCP to initiate a round of connection termination messages.
• Note: connection setup is asymmetric (one side does a passive open and the other an active open); teardown is symmetric (each side closes independently). One side can keep sending data even after it has closed its half of the connection.
Three-way Handshake
• The algorithm used by TCP to establish a
connection is called a three-way handshake.
• Idea: the two parties want to agree on a set of parameters – the starting sequence numbers for their byte streams.
• TCP requires that each side select its initial sequence number at random (to protect against reusing the same sequence numbers too soon).
Timeline for Three-Way Handshake Algorithm

(Figure: timeline between the active participant (client) and the passive participant (server). Each acknowledgement identifies the next sequence number expected.)
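The exchange in the timeline is the standard one; with x and y denoting the initial sequence numbers chosen by client and server:

Client → Server: SYN, SequenceNum = x
Server → Client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1
Client → Server: ACK, Acknowledgment = y + 1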
State Transition Diagram
• Each circle in TCP’s state transition diagram denotes a state that one end of the connection can find itself in.
• All connections start in the CLOSED state.
• A connection moves from state to state along the arcs.
• Each arc is labeled with a tag of the form event/action.
State Transition Diagram
Two kinds of events trigger a state transition:
1. A segment arrives from the peer, or
2. The local application process invokes an operation on TCP.
• TCP’s state transition diagram defines the semantics of both its peer-to-peer interface and its service interface. The syntax is given by the segment format.
TCP State Transition Diagram
(Figure: TCP state transition diagram. The transitions it shows are:)

CLOSED      – active open/SYN → SYN_SENT
CLOSED      – passive open → LISTEN
LISTEN      – close → CLOSED
LISTEN      – send/SYN → SYN_SENT
LISTEN      – SYN/SYN+ACK → SYN_RCVD
SYN_SENT    – close → CLOSED
SYN_SENT    – SYN/SYN+ACK → SYN_RCVD (simultaneous open)
SYN_SENT    – SYN+ACK/ACK → ESTABLISHED
SYN_RCVD    – ACK → ESTABLISHED
ESTABLISHED – close/FIN → FIN_WAIT_1
ESTABLISHED – FIN/ACK → CLOSE_WAIT
FIN_WAIT_1  – ACK → FIN_WAIT_2
FIN_WAIT_1  – FIN/ACK → CLOSING
FIN_WAIT_1  – FIN+ACK/ACK → TIME_WAIT
FIN_WAIT_2  – FIN/ACK → TIME_WAIT
CLOSING     – ACK → TIME_WAIT
CLOSE_WAIT  – close/FIN → LAST_ACK
LAST_ACK    – ACK → CLOSED
TIME_WAIT   – timeout after two segment lifetimes → CLOSED
Sliding Window Revisited
TCP’s version of the sliding window algorithm serves several purposes:
1. It guarantees the reliable delivery of data
2. It ensures that data is delivered in order
3. It enforces flow control between the sender and receiver
Earlier versions of the algorithm provided the first two functions; TCP adds flow control as well.
Sliding Window Revisited
• TCP differs from the earlier algorithms because it includes the flow control function.
• Rather than having a fixed-size sliding window, the receiver advertises a window size to the sender.
• The sender is limited to at most AdvertisedWindow bytes of unacknowledged data at any given time.
• The receiver selects a suitable window value based on the amount of available buffer memory, to keep the sender from over-running its buffer.
Reliable and Ordered Delivery
(Figure: relationship between the TCP send buffer (a), with pointers LastByteAcked, LastByteSent and LastByteWritten, and the receive buffer (b), with pointers LastByteRead, NextByteExpected and LastByteRcvd.)
Reliable and Ordered Delivery
• TCP on the sending side maintains a send buffer.
• This buffer is used to store data that has been sent but not
yet acknowledged as well as data that has been written by
the sending application, but not yet transmitted.
• On the receiving side, TCP maintains a receive buffer.
• This buffer holds data that arrived out of order, as well as in-order data that the application has not yet read.
• (Both buffers are finite and will eventually wrap around.)
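In the standard textbook treatment these pointers obey the invariants:

Sender:   LastByteAcked <= LastByteSent <= LastByteWritten
Receiver: LastByteRead < NextByteExpected <= LastByteRcvd + 1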
Flow Control
• Data written by the sending application fills the send buffer, and data read by the receiving application drains the receive buffer.
• The size of the window sets the amount of data that can
be sent without waiting for an acknowledgement from
the receiver.
• TCP on the receiver side must keep the buffer from
overflowing. It advertises the amount of remaining free
space.
• The sender computes an effective window that limits how much data it can send; see the sketch below.
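A sketch of the window arithmetic, using the buffer pointers from the previous slide (the example numbers are arbitrary):

def advertised_window(max_rcv_buffer, last_byte_rcvd, last_byte_read):
    # Receiver: advertise only the free space remaining in the buffer.
    return max_rcv_buffer - (last_byte_rcvd - last_byte_read)

def effective_window(adv_window, last_byte_sent, last_byte_acked):
    # Sender: the usable window shrinks by the data already in flight.
    return adv_window - (last_byte_sent - last_byte_acked)

# A 64 KB receive buffer with 16 KB buffered but unread -> advertise 48 KB;
# with 8 KB already in flight, the sender may transmit 40 KB more.
adv = advertised_window(64 * 1024, 16 * 1024, 0)
print(adv, effective_window(adv, 8 * 1024, 0))   # 49152 40960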
Protecting Against Wraparound
• We need to ensure that the sequence number does not wrap around within a 120-second segment lifetime. (On a T1 line wrapping takes about 6.4 hours, but on an OC-48 line only about 14 seconds.)
• Gigabit Ethernet is getting close to the point where 32 bits are too small for the sequence number. Extensions are being developed to solve this.
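A quick check of those numbers (a sketch; the link rates are nominal):

SEQ_SPACE_BYTES = 2 ** 32      # the 32-bit SequenceNum counts bytes

def wrap_time_seconds(bits_per_second):
    # Time to send 2^32 bytes, i.e. to wrap the sequence number once.
    return SEQ_SPACE_BYTES * 8 / bits_per_second

for name, bw in [("T1", 1.5e6), ("Ethernet", 10e6), ("T3", 45e6),
                 ("OC-3", 155e6), ("OC-48", 2.5e9)]:
    print(f"{name:8s} {wrap_time_seconds(bw):10.1f} s")
# T1 gives about 22,900 s (roughly 6.4 hours) and OC-48 about 13.7 s,
# matching the figures above; compare both against the 120 s MSL.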
Keeping the Pipe Full
• The delay × bandwidth product dictates the size of the AdvertisedWindow field. The window needs to be large enough to allow a full delay × bandwidth product’s worth of data to be transmitted.
• TCP’s 16-bit AdvertisedWindow field is not big enough to keep a T3 connection full; see the sketch below.
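A sketch of why, assuming a 100 ms cross-country RTT (an assumption, as is the nominal T3 rate of 45 Mbps):

def pipe_size_bytes(bandwidth_bps, rtt_seconds):
    # Delay x bandwidth product: bytes that must be in flight
    # to keep the pipe full.
    return bandwidth_bps * rtt_seconds / 8

MAX_WINDOW = 64 * 1024        # a 16-bit AdvertisedWindow tops out at 64 KB

for name, bw in [("T1", 1.5e6), ("T3", 45e6), ("OC-3", 155e6)]:
    need = pipe_size_bytes(bw, 0.100)
    print(f"{name}: need {need / 1024:.0f} KB vs. {MAX_WINDOW // 1024} KB available")
# T3 needs roughly 550 KB of data in flight -- far more than the
# 64 KB the AdvertisedWindow field can express.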
Triggering Transmission
• How does TCP decide to transmit a segment?
• Applications write bytes into the stream, and TCP decides that it has enough bytes to send a segment when one of three things happens:
1. TCP has collected maximum segment size (MSS) bytes and sends a full-sized segment, without causing fragmentation.
2. The sending process invokes a push operation, which flushes the buffer.
3. A timer fires, and the current contents of the buffer are sent.
Silly Window Syndrome
• Aggressively taking advantage of any available window leads to the silly window syndrome.
• Think of a conveyor belt with “full containers” (data segments) going in one direction and empty containers (ACKs) going in the reverse direction.
• MSS-sized segments are like large containers and 1-byte segments are like small containers.
• If the sender aggressively fills each container as soon as it arrives, then any small container introduced into the system remains in the system: it is immediately filled and emptied and never merged into a larger container.
Silly Window Syndrome
• Silly window syndrome is only a problem when either the sender sends a small segment or the receiver opens the window a small amount. If neither happens, a small container is never introduced into the stream.
• Since we can’t prevent all small containers, there must be a means of coalescing them.
• The receiver can do this by delaying ACKs – sending one combined ACK rather than multiple small ones.
Nagle’s Algorithm
• The ultimate solution comes back to the sender. If there is data to send but the window is open less than MSS, we may want to wait before transmitting. But how long?
• We could introduce a timer and transmit when it expires.
• Nagle instead introduced a self-clocking solution.
Nagle’s Algorithm
• Idea: As long as there is data in “flight”, the sender will
eventually receive an ACK.
• This ACK can be treated as a timer firing, triggering the
transmission of more data:
when the application produces data to send
    if both the available data and the window >= MSS
        send a full segment
    else
        if there is unACKed data in flight
            buffer the new data until the ACK arrives
        else
            send all the new data now
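The same decision logic as a runnable sketch in Python (the function name and example numbers are illustrative only):

def nagle_decision(ready_bytes, window_bytes, mss, unacked_in_flight):
    # Mirrors the pseudocode above: full segments go out immediately;
    # small ones wait whenever data is already in flight.
    if ready_bytes >= mss and window_bytes >= mss:
        return "send a full segment"
    if unacked_in_flight:
        return "buffer until an ACK arrives"
    return "send all the new data now"

print(nagle_decision(2000, 8000, mss=1460, unacked_in_flight=False))
print(nagle_decision(1, 8000, mss=1460, unacked_in_flight=True))
print(nagle_decision(1, 8000, mss=1460, unacked_in_flight=False))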
Nagle’s Algorithm
• In other words – it’s always OK to send a full segment if the window allows.
• It’s also all right to send a small amount of data if there are currently no segments in transit, but if there is anything in flight, the sender must wait for an ACK before transmitting the next segment.
• An interactive application like telnet that continually writes one byte at a time will send data at a rate of one segment per RTT.
• Some applications cannot afford such a delay; the socket interface lets an application turn off Nagle’s algorithm (the TCP_NODELAY option).
Adaptive Retransmission
• Because TCP guarantees reliable delivery of data, it retransmits each segment if an ACK is not received within a certain period of time.
• Choosing an appropriate timeout value is not easy. To do so, TCP uses an adaptive retransmission mechanism.
• The idea is to keep a running average of the RTT and then compute the timeout as a function of this RTT. (Originally, Timeout = 2 × EstimatedRTT.)
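The original estimator is an exponentially weighted moving average; a sketch (α = 0.875 is one value from the usual 0.8–0.9 range, and the sample values are made up):

def update_rtt(estimated_rtt, sample_rtt, alpha=0.875):
    # EstimatedRTT = alpha * EstimatedRTT + (1 - alpha) * SampleRTT
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

estimated = 0.100                       # start from a 100 ms guess
for sample in (0.110, 0.120, 0.300):    # the last sample is a delay spike
    estimated = update_rtt(estimated, sample)
    print(f"EstimatedRTT={estimated:.3f}s  Timeout={2 * estimated:.3f}s")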
Computing an Accurate RTT
(Figure: two cases of associating an ACK with a retransmitted segment — (a) the ACK belongs to the original transmission, (b) the ACK belongs to the retransmission. The sender cannot tell which, so either choice can distort the measured RTT.)
Karn/Partridge Algorithm
• There is a flaw in the previous estimate. An ACK does not really acknowledge a transmission; it acknowledges the receipt of data. So when a segment is retransmitted, it cannot be determined whether an ACK should be associated with the first or the second transmission, and the RTT sample may be wrong either way.
• The Karn/Partridge algorithm (1987) is a simple solution: do not take RTT samples for retransmitted segments, and each time TCP retransmits, set the next timeout to be twice the last one. (It uses exponential backoff, like the Ethernet does.)
Jacobson/Karels Algorithm
• The Karn/Partridge algorithm was designed to eliminate some of the Internet’s congestion.
• In 1988, Jacobson and Karels proposed a more drastic change to TCP to battle congestion:
– TCP computes the timeout value as a function of both EstimatedRTT and a measured Deviation:
Timeout = μ × EstimatedRTT + φ × Deviation, where typically μ = 1 and φ = 4
• Timeout is related to congestion because if you time out too soon, you may unnecessarily retransmit a segment, adding load to the network.
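One update step of the Jacobson/Karels estimator as a sketch (δ = 0.125, μ = 1 and φ = 4 are the usual textbook constants; the sample values are made up):

def jacobson_karels(estimated_rtt, deviation, sample_rtt,
                    delta=0.125, mu=1.0, phi=4.0):
    # Track both the mean RTT and its mean deviation.
    difference = sample_rtt - estimated_rtt
    estimated_rtt += delta * difference
    deviation += delta * (abs(difference) - deviation)
    timeout = mu * estimated_rtt + phi * deviation
    return estimated_rtt, deviation, timeout

est, dev = 0.100, 0.010
for sample in (0.105, 0.250, 0.102):
    est, dev, timeout = jacobson_karels(est, dev, sample)
    print(f"EstimatedRTT={est:.3f}s  Deviation={dev:.3f}s  Timeout={timeout:.3f}s")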
Implementation
• All of these retransmission algorithms are based on acknowledgement timeouts, which indicate that a segment has probably been lost. Note that a timeout does not tell the sender whether any segments it sent after the lost segment were received.
• There are TCP extensions to assist with this (see the SACK option below).
Record Boundaries
• TCP is a byte-stream protocol. The write boundaries of the sender are not necessarily preserved as read boundaries at the receiver. (For example, an application might write 8 bytes, then 2 bytes, then 20 bytes to a TCP connection, while on the receiving end the application reads 5 bytes at a time in a loop that iterates 6 times.)
• This is in contrast to UDP, where the message sent is exactly the same size as the message received.
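A sketch of what this means for a receiver: because TCP preserves byte order but not write boundaries, applications impose their own framing, e.g. by reading fixed-size records (the 5-byte records match the example above):

import socket

def read_exactly(sock: socket.socket, n: int) -> bytes:
    # Keep calling recv() until exactly n bytes have arrived; a single
    # recv() may return any prefix of the bytes the sender wrote.
    chunks, remaining = [], n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before n bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# The sender may have called send() with 8, then 2, then 20 bytes;
# the receiver can still consume the same 30 bytes as six 5-byte records:
#   records = [read_exactly(conn, 5) for _ in range(6)]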
Record Boundaries
• There are two features that can be used to insert record boundaries into a byte stream:
– The urgent data feature uses the URG flag and the UrgPtr field in the TCP header to mark “special” data;
– The push operation is used to “flush” bytes that TCP has buffered.
TCP Extensions
• These extensions are options that can be added to the TCP header:
– The first helps to improve TCP’s timeout mechanism by timestamping segments
– The second addresses the problem of TCP’s 32-bit sequence number field wrapping around too soon on a high-speed network, using the same timestamp
– The third allows TCP to advertise a larger window
– The fourth is the selective acknowledgement, or SACK, option, which allows the sender to retransmit just the segments that are missing.
Alternative Design Choices
• TCP is robust and satisfies the needs of a wide range of applications, but it is not the only design choice possible.
– There could have been request/reply protocols like RPC;
– TCP could have provided a reliable message stream service instead of a reliable byte stream service;
– TCP implements explicit setup/teardown phases, but could instead have sent connection parameters with the first data message;
– TCP is a window-based protocol, but rate-based designs are possible.
Remote Procedure Call (RPC)
• The request/reply paradigm is also called a message transaction:
– A client sends a request message to a server and the server responds with a reply message, with the client blocking (suspending execution) to wait for the reply.
• A transport protocol that supports request/reply is more than a UDP message going in one direction followed by a UDP message going in the other direction.
• The third type of transport protocol, called RPC, matches the needs of an application involved in a request/reply message exchange.
Timeline for RPC
(Figure: the client sends a request and blocks; the server is blocked until the request arrives, computes, and sends the reply; the client remains blocked until the reply arrives.)
RPC Fundamentals
• RPC is more than just a protocol – it is a popular mechanism for structuring distributed systems.
• The application program makes a call into a procedure, regardless of whether it is local or remote, and blocks until the call returns.
• RPC is also called remote method invocation (RMI).
RPC Fundamentals
• A complete RPC mechanism involves two major
components:
– A protocol that manages the messages sent between the
client and the server as well as the properties of the
underlying network;
– Programming language and compiler support to package arguments into a request message on the client and to translate this message back into arguments on the server machine (a stub compiler); see the sketch below.
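A toy sketch of the stub idea (JSON stands in for a real stub compiler’s wire format such as XDR; the procedure name and arguments are illustrative):

import json

def marshal_request(procedure: str, args: list) -> bytes:
    # Client-stub half: package the procedure name and arguments
    # into a request message.
    return json.dumps({"proc": procedure, "args": args}).encode()

def unmarshal_request(message: bytes):
    # Server-stub half: recover the procedure name and arguments,
    # so the server can dispatch to the named local procedure.
    request = json.loads(message.decode())
    return request["proc"], request["args"]

message = marshal_request("add", [2, 3])
proc, args = unmarshal_request(message)
print(proc, args)        # add [2, 3]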
Complete RPC Mechanism
(Figure: the caller (client) passes arguments to the client stub, which marshals them into a request message handed to the RPC protocol; on the other machine, the RPC protocol delivers the request to the server stub, which unmarshals the arguments and invokes the callee (server). The return value travels back along the same path as the reply.)
RPC Fundamentals
• RPC refers to a type of protocol rather than a specific standard like TCP.
• Unlike TCP, which is the dominant reliable byte-stream protocol, there is no one dominant RPC protocol.
RPC Fundamentals
• RPC performs a complex set of functions.
• Think of it as a “stack” of protocols:
– BLAST: fragments and reassembles large messages
– CHAN: synchronizes request and reply messages
– SELECT: dispatches request messages to the correct
process
• These are not standard protocols, but demonstrate
the algorithms needed to implement RPC.
A Simple RPC Stack
(Protocol stack, top to bottom: SELECT / CHAN / BLAST / IP / ETH)
Synchronous versus Asynchronous Protocols
• One way to characterize a protocol is by
whether it is synchronous or asynchronous.
• At the asynchronous end of the spectrum,
the application knows nothing when send
returns.
• At the synchronous end of the spectrum, the
send operation returns a reply message.
SunRPC
• SunRPC is a widely used protocol and while it has
not been accepted by any standards body, it has
become a de facto standard.
• It plays a central role in Sun’s Network File System (NFS).
• SunRPC addresses the three functions of
fragmentation, synchronization and dispatching in
a slightly different order.
Protocol Graph for SunRPC
(Protocol stack, top to bottom: SunRPC / UDP / IP / ETH)
SunRPC Header Formats
(a) Call (request) header, 32 bits wide:
XID
MsgType = CALL
RPCVersion = 2
Program
Version
Procedure
Credentials (variable)
Verifier (variable)
Data

(b) Reply header:
XID
MsgType = REPLY
Status = ACCEPTED
Data
DCE
• DCE is the Distributed Computing
Environment, which is a set of standards
and software for building distributed
systems.
• DCE-RPC is designed to run on top of UDP and is similar to SunRPC in using a two-level addressing scheme.
Typical DCE-RPC Message Exchange

(Figure: the client sends a Request; while the server is working, the client periodically sends Ping messages and the server answers Working; when the work completes, the server sends the Response and the client returns an Ack.)
Transport for Real-Time
Applications (RTP)
• Real-time traffic, such as digitized voice samples, is carried over packet networks.
• A “real-time” application is one with strong requirements for the timely delivery of information (for example, VoIP – “voice over IP” – and multimedia applications).
• Real-time applications make demands on the transport protocol that are not met by earlier protocols like TCP and UDP.
• The Real-time Transport Protocol (RTP) is designed to meet some of these challenges.
Performance
• Network performance is evaluated by two
metrics: delay (or latency) and throughput.
• These represent performance as seen by the
application programs.
Performance
• Each experiment involved running 3 identical instances
of the same test.
• Delay or latency was measured for message sizes of 1
byte, 100 bytes, 200 bytes…1000 bytes
• Throughput results were computed for message sizes of
1KB, 2KB, 4KB…32KB
• Latency for the 1 byte case represents the overhead
involved in processing each message and is the lower
bound on latency.
• Delay increases with message size for both UDP and
TCP.
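A sketch of the kind of ping-pong test described here, assuming a UDP echo server is already listening at the given (hypothetical) address and port:

import socket, time

def udp_round_trip(host="10.0.0.2", port=9999, size=100, trials=1000):
    # Send `trials` messages of `size` bytes to an echo server and
    # time the average round trip.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(trials):
        sock.sendto(payload, (host, port))
        sock.recvfrom(65535)                # wait for the echo
    return (time.perf_counter() - start) / trials   # mean RTT in seconds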
Measured System
(Figure: a user process on each machine, communicating through the Linux kernels over a 100-Mbps link.)
Two Pentium workstations running Linux connected by a 100-Mbps Ethernet. A test program pings messages between them.
Round Trip Latencies
Message Size (bytes)    UDP    TCP
1                        58     66
100                      76     84
200                      93    104
300                     111    124
400                     132    136
…                         …      …
900                     223    228
1000                    239    249
Measured Throughput
(Figure: measured throughput in Mbps (y-axis, roughly 70–100) versus message size in KB (x-axis: 1, 2, 4, 8, 16, 32, 64).)
Performance
• Throughput improves as the messages get larger – larger messages mean less per-message overhead.
• The throughput curve flattens above 16 KB and tops out before reaching 100 Mbps.
• The factor preventing these systems from running at full Ethernet speed is a limitation of the network adaptor rather than the software.
Summary
• Four end-to-end protocols:
• UDP – a simple demultiplexer that dispatches messages to the appropriate application process based on a port number, offering an unreliable, connectionless datagram service.
• TCP – a reliable byte-stream protocol that recovers from lost messages and delivers bytes in the order they were sent. It provides flow control by using the sliding window protocol and includes an adaptive timeout mechanism.
• RPC – a request/reply protocol.
• RTP – a transport protocol for real-time applications.