CS244a: An Introduction to Computer Networks


Transport Layer

• Transport Layer Services
– connection-oriented vs. connectionless
– multiplexing and demultiplexing
• UDP: Connectionless Unreliable Service
• TCP: Connection-Oriented Reliable Service
– connection management: set-up and tear down
– reliable data transfer protocols
– flow and congestion control

Readings:

Chapter 5 Fall 2007 CSci232: Transport Layer & TCP 1

Transport Protocols

• Lowest-level end-to-end protocol.
– Header generated by sender is interpreted only by the destination
– Routers view transport header as part of the payload

[Figure: two end hosts, each with a full stack (layers 7, 6, 5, Transport, IP, Datalink, Physical), connected through an IP router that implements only IP, Datalink, and Physical.]

Transport Services and Protocols

• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
– send side: breaks app messages into segments, passes to network layer
– rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
– Internet: TCP and UDP

[Figure: the two end hosts run the full stack (application, transport, network, data link, physical); the intermediate nodes run only network, data link, and physical.]

Transport Layer Services

• Underlying best-effort network
– drops messages
– re-orders messages
– delivers duplicate copies of a given message
– delivers messages after an arbitrarily long delay
• Common end-to-end services
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– allow the receiver to flow control the sender
– support multiple application processes on each host

Transport vs. Application and Network Layer

• application layer: application processes and message exchange
• network layer: logical communication between hosts
• transport layer: logical communication support for app processes
– relies on, enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service

End to End Issues

• Transport services built on top of (potentially) unreliable network service
– packets can be corrupted or lost
– packets can be delayed or arrive "out of order"
• Do we detect and/or recover from errors for apps?
– Error Control & Reliable Data Transfer
• Do we provide "in-order" delivery of packets?
– Connection Management & Reliable Data Transfer
• Potentially different capacity at destination, and potentially different network capacity
– Flow and Congestion Control

Internet Transport Protocols

TCP service:
• connection-oriented: setup required between client, server
• reliable transport between sender and receiver
• flow control: sender won't overwhelm receiver
• congestion control: throttle sender when network overloaded

UDP service:
• unreliable data transfer between sender and receiver
• does not provide: connection setup, reliability, flow control, congestion control

Both provide logical communication between app processes running on different hosts!

Multiplexing/Demultiplexing

• Demultiplexing at rcv host: delivering received segments to correct application process
• Multiplexing at send host: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)

[Figure: three hosts, each with application, transport, network, link, and physical layers. Processes P2, P3, and P4 attach to the transport layer through sockets (the API).]

How Demultiplexing Works

• host receives IP datagrams
– each datagram has source IP address, destination IP address
– each datagram carries 1 transport-layer segment
– each segment has source, destination port number (recall: well-known port numbers for specific applications)
• host uses IP addresses & port numbers to direct segment to appropriate app process (identified by "socket")

TCP/UDP segment format (32 bits wide): source port #, dest port #, other header fields, application data (message)
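The lookup step described above can be sketched as a table keyed by destination address and port. This is only an illustration of UDP-style demultiplexing: the `sockets` table, the addresses, and the process names are made up for the example, and a TCP implementation would key on the full 4-tuple instead.

```python
from typing import Dict, Optional, Tuple

# Hypothetical socket table: (destination IP, destination port) -> process name.
SocketKey = Tuple[str, int]


def demultiplex(sockets: Dict[SocketKey, str], dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the process bound to (dst_ip, dst_port), or None if no socket matches."""
    return sockets.get((dst_ip, dst_port))


# Example table with two well-known UDP ports (DNS = 53, SNMP = 161).
sockets = {
    ("10.0.0.5", 53): "dns-process",
    ("10.0.0.5", 161): "snmp-process",
}
```

A segment addressed to a port with no bound socket yields `None`; a real stack would typically report the error back to the sender rather than silently drop it.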

UDP: User Datagram Protocol

[RFC 768]
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
– lost
– delivered out of order to app
• connectionless:
– no handshaking between UDP sender, receiver
– each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired

UDP (cont’d)

• often used for streaming multimedia apps
– loss tolerant
– rate sensitive
• other UDP uses
– DNS
– SNMP
• reliable transfer over UDP: add reliability at application layer
– application-specific error recovery!

UDP segment format (32 bits wide): source port #, dest port #, length, checksum, application data (message)
– length: length, in bytes, of UDP segment, including header

UDP Checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1's complement sum) of segment contents
• sender puts checksum value (1's complement of the 1's complement sum of the 16-bit words) into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
– NO: error detected
– YES: no error detected. But errors may still be present.

Checksum: Example

arrange data segment in sequences of 16-bit words

  0110011001100110
  1101010101010101
+ 0000111100001111

sum (with end-around carry): 0100101011001011
checksum (1's complement):   1011010100110100
verify by adding sum and checksum: 1111111111111111
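The example can be checked with a short sketch of the 1's complement sum. The function names are chosen for illustration; the three words are the ones from the slide.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(words):
    """Checksum = 1's complement (bitwise inverse) of the 1's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


words = [0b0110011001100110, 0b1101010101010101, 0b0000111100001111]
total = ones_complement_sum16(words)
checksum = internet_checksum(words)

# The receiver adds all words plus the checksum; all ones means no error detected.
verify = ones_complement_sum16(words + [checksum])
print(f"{total:016b} {checksum:016b} {verify:016b}")
```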

TCP Overview

• Connection-oriented
• Byte-stream
– app writes bytes
– TCP sends segments
– app reads bytes
• Full duplex
• Flow control: keep sender from overrunning receiver
• Congestion control: keep sender from overrunning network

[Figure: an application process writes bytes into the TCP send buffer; TCP transmits segments; the receiving TCP places them in its receive buffer, from which the application process reads bytes.]

Functionality Split

• Network provides best-effort delivery
• End-systems implement many functions
– Reliability
– In-order delivery
– Demultiplexing
– Message boundaries
– Connection abstraction
– Flow control
– Congestion control
– …

High-Level TCP Characteristics

• Protocol implemented entirely at the ends
– Fate sharing
• Protocol has evolved over time and will continue to do so
– Nearly impossible to change the header
– Use options to add information to the header
– Change processing at endpoints
– Backward compatibility is what makes it TCP

Evolution of TCP

• 1974: TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm
• 1975: Three-way handshake, Raymond Tomlinson, in SIGCOMM 75
• 1982: TCP & IP, RFC 793 & 791
• 1983: BSD Unix 4.2 supports TCP/IP
• 1984: Nagle's algorithm to reduce overhead of small packets; predicts congestion collapse
• 1986: Congestion collapse observed
• 1987: Karn's algorithm to better estimate round-trip time
• 1988: Van Jacobson's algorithms: congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
• 1990: 4.3BSD Reno: fast retransmit, delayed ACKs

TCP Through the 1990s

• 1993: TCP Vegas (Brakmo et al): real congestion avoidance
• 1994: ECN (Floyd): Explicit Congestion Notification
• 1994: T/TCP (Braden): Transaction TCP
• 1996: Hoe: improving TCP startup
• 1996: SACK TCP (Floyd et al): Selective Acknowledgement
• 1996: FACK TCP (Mathis et al): extension to SACK

TCP Segment Header Structure

TCP segment layout (32 bits wide), in order:
• source port #, dest port #
• sequence number (counting by bytes of data, not segments!)
• acknowledgement number
• head len, not used, flag bits U A P R S F, rcvr window size (# bytes rcvr willing to accept)
• checksum (Internet checksum, as in UDP), ptr urgent data
• Options (variable length)
• application data (variable length)

Flag bits:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)

TCP Segment Format (cont)

• Each connection identified with 4-tuple:
– (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
– Acknowledgment, SequenceNum, AdvertisedWindow
– sender to receiver: Data (SequenceNum); receiver to sender: Acknowledgment + AdvertisedWindow
• Flags
– SYN, FIN, ACK, RESET, PUSH, URG
• Checksum
– pseudo header (src & dst IP addresses) + TCP header + data

TCP Connection Set Up

Three-way handshake: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info
• client: end host that initiates connection
• server: end host contacted by client

Step 1: client sends TCP SYN control segment to server
– specifies initial seq #
Step 2: server receives SYN, replies with SYN+ACK control segment
– ACKs received SYN
– specifies server → receiver initial seq. #
Step 3: client receives SYN+ACK, replies with ACK segment (which may contain 1st data segment)

TCP 3-Way Hand-Shake

[Figure: timeline between client and server. Client: initiate connection, then connection established. Server: SYN received, then connection established.]

Questions:
a. What kind of "state" do client and server need to maintain?
b. What initial sequence # should client (and server) use?

TCP Connection Setup Example

No.  Time       Source > Destination              Proto  SrcPort>DstPort  [Flags]     Details
1    13.734375  70.13.155.114 > 128.101.35.150    TCP    1414 > 22        [SYN]       Seq=758244755 Len=0 MSS=1260
2    13.968750  128.101.35.150 > 70.13.155.114    TCP    22 > 1414        [SYN, ACK]  Seq=3778406755 Ack=758244756 Win=25200 Len=0 MSS=1460
3    13.968750  70.13.155.114 > 128.101.35.150    TCP    1414 > 22        [ACK]       Seq=758244756 Ack=3778406756 Win=16384 Len=0

TCP Connection Setup Example

No.  Time        Source > Destination              Proto  SrcPort>DstPort  [Flags]     Details
1    13.6611233  70.13.155.114 > 128.101.35.204    TCP    1567 > 80        [SYN]       Seq=3724852786 Len=0 MSS=1260
2    13.890625   128.101.35.204 > 70.13.155.114    TCP    80 > 1567        [SYN, ACK]  Seq=484733971 Ack=3724852787 Win=25200 Len=0 MSS=1460
3    13.890625   70.13.155.114 > 128.101.35.204    TCP    1567 > 80        [ACK]       Seq=3724852787 Ack=484733972 Win=17640 Len=0
4    13.890625   70.13.155.114 > 128.101.35.204    TCP    1567 > 80        [PSH, ACK]  Seq=3724852787 Ack=484733972 Win=17640 Len=564
5    14.630860   128.101.35.204 > 70.13.155.114    TCP    80 > 1567        [ACK]       Seq=484733972 Ack=3724853351 Win=25200 Len=0 MSS=1460

3-Way Handshake: Finite State Machine

[Figure: state diagrams left partly blank as an exercise. Client FSM: closed → (upper layer: initiate connection; sent SYN w/ initial seq = x) → SYN sent → (SYN+ACK received; sent ACK) → conn estab'ed. Server FSM: states and transitions marked "?".]

Question: what info ("state") is maintained at the client?

Connection Setup Error Scenarios

• Lost (control) packets
– What happens if SYN is lost? client vs. server actions
– What happens if SYN+ACK is lost? client vs. server actions
– What happens if ACK is lost? client vs. server actions
• Duplicate (control) packets
– What does server do if duplicate SYN received?
– What does client do if duplicate SYN+ACK received?
– What does server do if duplicate ACK received?

Connection Setup Error Scenarios (cont'd)

• Importance of (unique) initial seq. no.?
– When receiving SYN, how does server know it's a new connection request?
– When receiving SYN+ACK, how does client know it's legitimate, i.e., a response to its SYN request?
• Dealing with old duplicate packets from old connections (or from malicious users)
– If not careful: "TCP Hijacking"
• How to choose unique initial seq. no.?
– randomly choose a number (and add to last syn# used)
• Other security concern:
– "SYN Flood": denial-of-service attack

TCP State Diagram: Connection Setup

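[Figure: TCP state diagram for connection setup, client and server sides.] Transitions shown (event; action):
• CLOSED → LISTEN (server): passive OPEN; create TCB
• CLOSED → SYN SENT (client): active OPEN; create TCB, snd SYN
• LISTEN → SYN RCVD: rcv SYN; snd SYN, ACK
• LISTEN → SYN SENT: SEND; snd SYN
• SYN SENT → SYN RCVD: rcv SYN; snd ACK
• SYN SENT → ESTAB: rcv SYN, ACK; snd ACK
• SYN RCVD → ESTAB: rcv ACK of SYN
• LISTEN or SYN SENT → CLOSED: CLOSE; delete TCB
• leaving ESTAB: CLOSE; send FIN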

TCP: Closing Connection

Remember: a TCP connection is duplex!

Client wants to close connection:
Step 1: client end system sends TCP FIN control segment to server.
Step 2: server receives FIN, replies with ACK. Connection is half closed.
Step 3: client receives ACK. Half closed; wait for server to close.

Server finishes sending data, also ready to close:
Step 4: server sends FIN.

[Figure: client/server timeline. Client: closing, half closed. Server: half closed, closing.]

TCP: Closing Connection (cont’d)

Step 5: client receives FIN, replies with ACK. Connection fully closed.
Step 6: server receives ACK. Connection fully closed.

Well done! Problem solved?

[Figure: client/server timeline. Client: closing, half closed, fully closed. Server: half closed, closing, fully closed.]

TCP: Closing Connection (revised)

Two Army Problem!

Step 5: client receives FIN, replies with ACK.
– Enters "timed wait": will respond with ACK to received FINs
Step 6: server receives ACK. Connection fully closed.
Step 7: client timer expires; connection fully closed.

[Figure: client/server timeline with one segment marked lost (X). Client: closing, half closed, timeout, fully closed. Server: half closed, closing, fully closed.]

TCP Connection Tear-Down Example

No.  Time       Source > Destination              Proto  SrcPort>DstPort  [Flags]     Details
80   35.156250  70.13.155.114 > 128.101.35.150    TCP    1414 > 22        [PSH, ACK]  Seq=758246388 Ack=3778411633 Win=15920 Len=32
81   35.156250  70.13.155.114 > 128.101.35.150    TCP    1414 > 22        [FIN, ACK]  Seq=758246420 Ack=3778411633 Win=15920 Len=0
82   35.437500  128.101.35.150 > 70.13.155.114    TCP    22 > 1414        [ACK]       Seq=3778411633 Ack=758246420 Win=25200 Len=0
83   35.453125  128.101.35.150 > 70.13.155.114    TCP    22 > 1414        [ACK]       Seq=3778411633 Ack=758246421 Win=25200 Len=0
84   35.453125  128.101.35.150 > 70.13.155.114    TCP    22 > 1414        [FIN, ACK]  Seq=3778411633 Ack=758246421 Win=25200 Len=0
85   35.453125  70.13.155.114 > 128.101.35.150    TCP    1414 > 22        [ACK]       Seq=758246421 Ack=3778411634 Win=15920 Len=0

State Diagram: Connection Tear-down

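[Figure: TCP state diagram for connection tear-down.] Transitions shown (event; action):

Active Close
• ESTAB → FIN WAIT-1: CLOSE; send FIN
• FIN WAIT-1 → FIN WAIT-2: rcv ACK of FIN
• FIN WAIT-1 → CLOSING: rcv FIN; snd ACK
• FIN WAIT-1 → TIME WAIT: rcv FIN+ACK; snd ACK
• FIN WAIT-2 → TIME WAIT: rcv FIN; snd ACK
• CLOSING → TIME WAIT: rcv ACK of FIN
• TIME WAIT → CLOSED: Timeout=2min; delete TCB

Passive Close
• ESTAB → CLOSE WAIT: rcv FIN; snd ACK
• CLOSE WAIT → LAST-ACK: CLOSE; snd FIN
• LAST-ACK → CLOSED: rcv ACK of FIN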

TCP Connection Management FSM

[Figure: TCP client lifecycle]

TCP Connection Management FSM

[Figure: TCP server lifecycle]

Reliability and Error Recovery

• ARQ vs. FEC
– automatic retransmission request
– forward error correction
• General ARQ algorithms
– Stop & Wait
• Performance issue: low utilization when delay-bw product is large
– Sliding Window Protocols
• Go-Back-N
• Selective Repeat
• Key design issue: window size vs. size of seq. no. space

Error Recovery: Stop and Wait

• ARQ
– Receiver sends acknowledgement (ACK) when it receives packet
– Sender waits for ACK and times out if it does not arrive within some time period
• Simplest ARQ protocol
• Send a packet, stop and wait until ACK arrives

[Figure: sender/receiver timeline for stop and wait.]

Recovering from Error

[Figure: sender/receiver timelines for three cases: packet lost, ACK lost, and early timeout (resulting in a DUPLICATE).]

Problems with Stop and Wait

• How to recognize a duplicate
• Performance
– Can only send one packet per round trip

How to Recognize Resends?

• Use sequence numbers
– both packets and acks
• Sequence # in packet is finite → How big should it be?
– For stop and wait?
• One bit: won't send seq #1 until received ACK for seq #0

Problem with Stop & Wait Protocol

[Figure: stop-and-wait timeline between sender and receiver. First packet bit transmitted at t = 0; first packet bit arrives at the receiver; ACK arrives and sender sends next packet at t = RTT + L / R.]

• Can't keep the pipe full
– Utilization is low when bandwidth-delay product (R x RTT) is large!

Stop & Wait: Performance Analysis

Example: 1 Gbps connection, 15 ms end-end prop. delay, data segment size: 1 KB = 8 Kb

$$T_{\text{transmit}} = \frac{L \text{ (packet length in bits)}}{R \text{ (transmission rate, bps)}} = \frac{8\ \text{kb}}{10^{9}\ \text{b/s}} = 8 \times 10^{-6}\ \text{s} = 0.008\ \text{ms}$$

$$U_{\text{sender}} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

– $U_{\text{sender}}$: utilization, i.e., fraction of time sender busy sending
– 1 KB data segment every 30 msec (round trip time) --> 0.027% x 1 Gbps = 33 kB/sec throughput over 1 Gbps link

Moral of story: network protocol limits use of physical resources!
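The arithmetic in this example can be reproduced with a few lines. This is a sketch with illustrative names; the `window` parameter is an added generalization (window = 1 is stop and wait), assuming the sender may have that many packets in flight per round trip.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy: window * (L/R) / (RTT + L/R), capped at 1."""
    t_transmit = packet_bits / rate_bps
    return min(1.0, window * t_transmit / (rtt_s + t_transmit))


L_BITS = 8_000   # 1 KB segment = 8 Kb
R_BPS = 1e9      # 1 Gbps link
RTT_S = 0.030    # 2 x 15 ms propagation delay

u = sender_utilization(L_BITS, R_BPS, RTT_S)
throughput_bytes_per_s = u * R_BPS / 8

print(f"utilization = {u:.5f}, throughput = {throughput_bytes_per_s / 1000:.1f} kB/s")
```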

How to Keep the Pipe Full?

• Send multiple packets without waiting for first to be acked
– Number of pkts in flight = window
• Reliable, unordered delivery
– Several parallel stop & waits
– Send new packet after each ack
– Sender keeps list of unack'ed packets; resends after timeout
– Receiver same as stop & wait
• How large a window is needed?
– Suppose 10 Mbps link, 4 ms delay, 500-byte pkts
• 1? 10? 20?
– Round trip delay * bandwidth = capacity of pipe
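A small sketch of the "capacity of pipe" calculation follows. The slide does not say whether the 4 ms is a one-way or a round-trip delay, so both readings are computed; the function name and the integer-microsecond units are choices made for this example.

```python
def window_in_packets(bandwidth_bps, rtt_us, packet_bytes):
    """Packets needed to fill the pipe: ceil(bandwidth * RTT / packet size), in integer math."""
    pipe_bits_times_1e6 = bandwidth_bps * rtt_us           # bits x 10^6 (RTT is in microseconds)
    packet_bits_times_1e6 = packet_bytes * 8 * 1_000_000
    return -(-pipe_bits_times_1e6 // packet_bits_times_1e6)  # ceiling division


BANDWIDTH_BPS = 10_000_000  # 10 Mbps
PACKET_BYTES = 500

# Reading 1: 4 ms is the one-way delay, so RTT = 8 ms.
window_one_way = window_in_packets(BANDWIDTH_BPS, 8_000, PACKET_BYTES)
# Reading 2: 4 ms is already the round-trip delay.
window_round_trip = window_in_packets(BANDWIDTH_BPS, 4_000, PACKET_BYTES)

print(window_one_way, window_round_trip)
```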

Pipelined (Sliding Window) Protocols

Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged data segments
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Two generic forms of pipelined protocols: Go-Back-N and Selective Repeat

Pipelining: Increased Utilization

[Figure: pipelined timeline between sender and receiver. First packet bit transmitted at t = 0; last bit transmitted at t = L / R; the receiver sends an ACK as the last bit of each of the 1st, 2nd, and 3rd packets arrives; ACK arrives and sender sends next packet at t = RTT + L / R.]

Increase utilization by a factor of 3!

$$U_{\text{sender}} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

Sliding Window

• Reliable, ordered delivery
• Receiver has to hold onto a packet until all prior packets have arrived
– Why might this be difficult for just parallel stop & wait?
– Sender must prevent buffer overflow at receiver
• Circular buffer at sender and receiver
– Packets in transit ≤ buffer size
– Advance when sender and receiver agree packets at beginning have been received

Sender/Receiver State

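[Figure: sender and receiver window state.]
• Sender: tracks max ACK received and next seqnum; these bound the sender window. Packets are classified as sent & acked, sent not acked, OK to send, or not usable.
• Receiver: tracks next expected and max acceptable; these bound the receiver window. Packets are classified as received & acked, acceptable packet, or not usable.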

Window Sliding – Common Case

• On reception of new ACK (i.e. ACK for something that was not acked earlier)
– Increase sequence of max ACK received
– Send next packet
• On reception of new in-order data packet (next expected)
– Hand packet to application
– Send cumulative ACK: acknowledges reception of all packets up to sequence number
– Increase sequence of max acceptable packet

Loss Recovery

• On reception of out-of-order packet, either:
– Send nothing (wait for source to timeout), or
– Send cumulative ACK (helps source identify loss)
• Timeout (Go-Back-N recovery)
– Set timer upon transmission of packet
– Retransmit all unacknowledged packets
• Performance during loss recovery
– No longer have an entire window in transit
– Can have much more clever loss recovery

Go-Back-N in Action

[Figure: Go-Back-N in action]

Selective Repeat

• Receiver individually acknowledges all correctly received pkts
– Buffers packets, as needed, for eventual in-order delivery to upper layer
• Sender only resends packets for which ACK not received
– Sender timer for each unACKed packet
• Sender window
– N consecutive seq #'s
– Again limits seq #s of sent, unACKed packets

Selective Repeat: Sender, Receiver Windows

[Figure: Selective Repeat sender and receiver windows]

Sequence Numbers

• How large does size of sequence number space need to be?
– Must be able to detect wrap-around
– Depends on sender/receiver window size
• E.g.
– size of seq. no. space = 8, send win = recv win = 7
– If pkts 0..6 are sent successfully and all acks lost
• Receiver expects 7, 0..5; sender retransmits old 0..6!!!
• size of sequence no. space must be ≥ send window + recv window
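The failure in this example can be checked mechanically. The sketch below models only the scenario described (a full send window delivered, every ACK lost); the function names are illustrative.

```python
def receiver_window(next_expected, window, space):
    """Sequence numbers the receiver will accept as new data, modulo the number space."""
    return {(next_expected + i) % space for i in range(window)}


def retransmission_is_ambiguous(space, send_win, recv_win):
    """True if retransmitted old packets fall inside the receiver's advanced window.

    Scenario: packets 0..send_win-1 all arrive, all ACKs are lost, and the
    sender retransmits the same packets after its timeout.
    """
    retransmitted = {seq % space for seq in range(send_win)}
    accepted_as_new = receiver_window(send_win % space, recv_win, space)
    return bool(retransmitted & accepted_as_new)


# Slide example: space of 8 with both windows 7 misreads old packets as new ones.
print(retransmission_is_ambiguous(8, 7, 7))
# With space = send window + recv window = 14 the two sets cannot overlap.
print(retransmission_is_ambiguous(14, 7, 7))
```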

Sequence Numbers in TCP

• TCP regards data as a "byte-stream"
– each byte in byte stream is numbered
• 32-bit value, wraps around
• initial values selected at start-up time
• TCP breaks up byte stream into packets
– Packet size is limited to the Maximum Segment Size (MSS)
• Each packet has a sequence number
– seq. no. of 1st byte indicates where it fits in the byte stream
• TCP connection is duplex
– data in each direction has its own sequence numbers

[Figure: byte stream split into packets 8, 9, and 10, with sequence numbers 13450, 14950, 16050, and 17550 marking the packet boundaries.]

TCP Seq. #’s and ACKs

• Seq. #'s: byte stream "number" of first byte in segment's data
• ACKs: seq # of next byte expected from other side

[Figure: simple telnet scenario between Host A and Host B over time (red: A-to-B, green: B-to-A). User types 'C'; host B ACKs receipt of 'C' and echoes back 'C'; host A ACKs receipt of echoed 'C'.]

TCP Reliable Data Transfer

• TCP creates reliable data transfer service on top of IP's unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses single retransmission timer
• Retransmissions are triggered by:
– timeout events
– duplicate acks
• Initially consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control

TCP = Go-Back-N Variant

• Sliding window with cumulative acks
– Receiver can only return a single "ack" sequence number to the sender
– Acknowledges all bytes with a lower sequence number
– Starting point for retransmission
– Duplicate acks sent when out-of-order packet received
• But: sender only retransmits a single packet
– Reason???
• Only one that it knows is lost
• Network is congested → shouldn't overload it
• Error control is based on byte sequences, not packets
– Retransmitted packet can be different from the original lost packet
– Why?

TCP Sender Events:

data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ACK received:
• if it acknowledges previously unACKed segments
– update what is known to be ACKed
– start timer if there are outstanding segments

TCP ACK generation

[RFC 1122, RFC 2581]

Event at receiver → TCP receiver action
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed → Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.
• Arrival of in-order segment with expected seq #; one other segment has ACK pending → Immediately send single cumulative ACK, ACKing both in-order segments.
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected → Immediately send duplicate ACK, indicating seq # of next expected byte.
• Arrival of segment that partially or completely fills gap → Immediately send ACK, provided that segment starts at lower end of gap.

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control: How It Works

(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow (dynamically changes)
– RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
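The bookkeeping above can be sketched as two small functions. The receiver-side formula is the one on the slide; the sender-side names `last_byte_sent` and `last_byte_acked` are assumptions introduced here to express "sender limits unACKed data to RcvWindow".

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_allowance(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still transmit while keeping unACKed data <= RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)


# Receiver holds 2000 unread bytes in a 4096-byte buffer.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
# Sender already has 1000 bytes in flight.
allowance = sender_allowance(last_byte_sent=5000, last_byte_acked=4000, advertised_window=window)
print(window, allowance)
```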

TCP Segment Structure

TCP segment layout (32 bits wide), in order:
• source port #, dest port #
• sequence number (counting by bytes of data, not segments!)
• acknowledgement number
• head len, not used, flag bits U A P R S F, rcvr window size (# bytes rcvr willing to accept)
• checksum (Internet checksum, as in UDP), ptr urgent data
• Options (variable length)
• application data (variable length)

Flag bits: URG (urgent data, generally not used), ACK (ACK # valid), PSH (push data now, generally not used), RST, SYN, FIN (connection estab: setup, teardown commands)

Triggering Transmission

• How does TCP decide to transmit a segment?
– An MSS (Maximum Segment Size) worth of data is available
• MSS is set to the size of the largest segment TCP can send without local IP fragmentation (based on the MTU of the directly connected network)
– Sending process explicitly asks it to (Push to flush)
– A timer fires
• Silly Window Syndrome
– Flow control needs to be maintained
– Sender can transmit a full segment (MSS) when ACKed by receiver

Silly Window Syndrome (cont’d)

– Window currently closed by receiver
– ACK opens MSS/2 bytes
– Should sender transmit MSS/2?
• Original TCP implementation was silent on this
• Early implementations of TCP decided to go ahead
• Sender cannot know when the window will open for a full MSS
– If sender is aggressive, sending whatever window is available
• results in silly window syndrome
• small segment size remains indefinitely
– Hence a problem when either the sender transmits a small segment or the receiver opens the window a small amount

Triggering Transmission (cont’d)

– Receiver may delay ACKs, but how long?
– Ultimate solution lies with sender:
• When does the TCP sender decide to transmit a segment?
• Nagle's Algorithm:
– Waiting too long hurts interactive applications (Telnet)
– Without waiting, risk of sending a bunch of tiny packets (silly window syndrome)
– Wait till a timer expires? Use self clocking instead:
• Self clocking: as long as TCP has any data in flight, the sender will receive an ACK, which can be used to trigger transmission
• If no data in flight, immediately send the segment
• Applications can turn the algorithm off by setting the TCP_NODELAY option

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions, why?
• SampleRTT will vary, want estimated RTT "smoother"
– average several recent measurements, not just current SampleRTT

Round-trip Time Estimation

• Wait at least one RTT before retransmitting
• Importance of accurate RTT estimators:
– Low RTT estimate
• unneeded retransmissions
– High RTT estimate
• poor throughput
• RTT estimator must adapt to change in RTT
– But not too fast, or too slow!
• Spurious timeouts
– "Conservation of packets" principle: never more than a window worth of packets in flight

Adaptive Retransmission (Original Algorithm)

• Measure SampleRTT for each segment/ACK pair
• Compute weighted running average of RTT
– $EstRTT = a \times EstRTT + (1 - a) \times SampleRTT$
– a between 0.8 and 0.9 (to smooth EstRTT)
– A small a tracks temporary fluctuations; a large a is more stable but may not be quick to adapt to real changes
• Set timeout based on EstRTT
– $TimeOut = 2 \times EstRTT$
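A minimal sketch of the original algorithm, assuming a = 0.875 (any value in the 0.8 to 0.9 range from the slide would do) and millisecond units chosen for the example:

```python
def update_est_rtt(est_rtt, sample_rtt, a=0.875):
    """Weighted running average: EstRTT = a * EstRTT + (1 - a) * SampleRTT."""
    return a * est_rtt + (1 - a) * sample_rtt


def original_timeout(est_rtt):
    """Original rule: TimeOut = 2 * EstRTT."""
    return 2 * est_rtt


est = update_est_rtt(est_rtt=100.0, sample_rtt=200.0)  # one slow sample arrives
rto = original_timeout(est)
print(est, rto)
```

With a close to 1, a single outlier moves the estimate only slightly, which is the smoothing effect the slide describes.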

Retransmission Ambiguity

[Figure: two sender/receiver timelines, each showing an original transmission, a retransmission, and an ACK.]
• Sender assumes the ACK is for the original transmission, but it was for the retransmission => SampleRTT is too large
• Sender assumes the ACK is for the retransmission, but it was for the original => SampleRTT is too small

Karn/Partridge Algorithm

Solution:
• Do not sample RTT when retransmitting
– only measure SampleRTT for segments sent once
• Double timeout for each retransmission
– Next timeout is twice the last timeout, rather than being based on the last EstimatedRTT
• Karn and Partridge proposal is exponential backoff
– Congestion is the most likely cause of lost segments
– TCP sources should not react too aggressively to a timeout
– The more timeouts, the more cautious the source should become (congestion problem)
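The two rules can be sketched as a small timer object. The class name, the `max_rto` cap, and the second-based units are assumptions for this example; the slide itself specifies only "sample once-sent segments" and "double on retransmission".

```python
class KarnTimer:
    """Retransmission timer following the two Karn/Partridge rules."""

    def __init__(self, rto, max_rto=64.0):
        self.rto = rto          # current timeout, in seconds
        self.max_rto = max_rto  # assumed upper bound so backoff cannot grow forever

    def on_timeout(self):
        """Exponential backoff: each retransmission doubles the timeout."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    @staticmethod
    def may_sample_rtt(transmit_count):
        """Only segments sent exactly once give an unambiguous RTT sample."""
        return transmit_count == 1


timer = KarnTimer(rto=1.0)
backoffs = [timer.on_timeout() for _ in range(3)]
print(backoffs)
```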

Jacobson/ Karels Algorithm

• Original computation for RTT did not take the variance of sample RTTs into account
– If variation among samples is small, EstRTT can be better trusted and used without multiplying the estimate by 2
– A large variance in the samples means timeout values should not be too tightly coupled to EstRTT
• New calculations for average RTT
– $Diff = SampleRTT - EstRTT$
– $EstRTT = EstRTT + (\delta \times Diff)$
– $Dev = Dev + \delta \, (|Diff| - Dev)$
– where $\delta$ is a fraction between 0 and 1
• Consider variance when setting timeout value
– $TimeOut = \mu \times EstRTT + \phi \times Dev$
– where $\mu = 1$ and $\phi = 4$

TCP Round Trip Time Estimation

$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Setting the timeout interval
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT -> larger safety margin
• "safety margin": accommodate variations in EstimatedRTT

$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$ (typically, $\beta = 0.25$)

$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
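The three formulas combine into a small estimator, sketched below. Two details are not on the slide and are assumptions borrowed from RFC 6298: the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated using the previous EstimatedRTT before EstimatedRTT itself is updated.

```python
class RttEstimator:
    """EWMA-based RTT estimator: TimeoutInterval = EstimatedRTT + 4 * DevRTT."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimate and return the new timeout interval."""
        if self.estimated_rtt is None:
            # Assumed initialization (RFC 6298 style) for the first measurement.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Deviation first, using the EstimatedRTT from before this sample.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator()
first = estimator.update(100.0)   # milliseconds, chosen for the example
second = estimator.update(200.0)
print(first, second)
```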

Example RTT Estimation:

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT plotted against time (seconds, 1 to 106); vertical axis runs from 100 to 350.]

Timestamp Extension

• Used to improve timeout mechanism by more accurate measurement of RTT
• When sending a packet, insert current time into option
– 4 bytes for time, 4 bytes to echo a received timestamp
• Receiver echoes timestamp in ACK
– Actually will echo whatever is in timestamp
• Removes retransmission ambiguity
– Can get RTT sample on any packet

Timer Granularity

• Many TCP implementations set RTO (Retransmission Timeout) in multiples of 200, 500, or 1000 ms
• Why?
– Avoid spurious timeouts: RTTs can vary quickly due to cross traffic
– Make timer interrupts efficient
• What happens for the first couple of packets?
– Pick a very conservative value (seconds)

Important Lessons

• TCP state diagram → setup/teardown
• TCP timeout calculation → how is RTT estimated
• Modern TCP loss recovery
– Why are timeouts bad?
– How to avoid them? → e.g. fast retransmit

Fast Retransmit

• What are duplicate acks (dupacks)?
– Repeated acks for the same sequence
• When can duplicate acks occur?
– Loss
– Packet re-ordering
– Window update: advertisement of new flow control window
• Assume re-ordering is infrequent and not of large magnitude
– Use receipt of 3 or more duplicate acks as indication of loss
– Don't wait for timeout to retransmit packet
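The duplicate-ack rule can be sketched as a counter on the sender side. This ignores window updates and everything else a real sender tracks; the class name and return convention are chosen for the example.

```python
class DupAckDetector:
    """Signals fast retransmit when the same ack number repeats `threshold` times."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_number):
        """Return True exactly when the threshold-th duplicate ack arrives."""
        if ack_number == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        # A new ack advances the window and resets the duplicate count.
        self.last_ack = ack_number
        self.dup_count = 0
        return False


detector = DupAckDetector()
# Ack 2000 arrives once as new data is acked, then three more times as duplicates.
signals = [detector.on_ack(a) for a in (1000, 2000, 2000, 2000, 2000)]
print(signals)
```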

Fast Retransmit

[Figure: sequence number vs. time, showing packets and acks. After a lost packet (X), duplicate acks arrive and trigger a retransmission.]

TCP (Reno variant)

[Figure: sequence number vs. time, showing packets and acks, with four lost packets (X X X X). Now what? Timeout.]

SACK

• Basic problem is that cumulative acks provide little information
• Selective acknowledgement (SACK) essentially adds a bitmask of packets received
– Implemented as a TCP option
– Encoded as a set of received byte ranges (max of 4 ranges, often max of 3)
• When to retransmit?
– Still need to deal with reordering → wait until a packet is out of order by 3 pkts

SACK

[Figure: sequence number vs. time, showing packets and acks, with four lost packets (X X X X). Now what? Send retransmissions as soon as losses are detected.]