Communications on the Internet are not efficient.


TCP Round Trip Time Analysis in
a University Network.
Justifying the pursuit of Active Queue
Management (AQM) research.
Author: Jonathan Thyer
September 2004 - March 2005
Disclaimers
• This work used to be a thesis…
– And then reality sank in…
– Just one individual in a talented research community 
hoping to make a small contribution.
– Wonderful family and busy work life.
– This really is a mixture between a thesis and project
presentation.
• Now – have I lowered your expectations enough?
8-)
How do you communicate across
the Internet?
• Access your local network
• Use a well defined communications protocol
– E.g.: HTTP (the web)
– Email
• Type in some Internet destination address and
away you go!
– [email protected]
– http://www.thyer.org/
• But communications are not as efficient as they
could be.
• Assertion: The Internet is under-performing!
What is happening under the
computer covers?
• Your Internet destination name address gets
translated into a 32 bit number by a network
service called the Domain Name System (DNS)
• Your computer initiates communication with the
destination Internet address.
• Numerous Internet protocol routers, switches,
hubs and physical media carry your
communications from source to destination and
back again.
Communications Protocols
• Open Systems Interconnection (OSI) model.
• Communication protocols are defined in a
layered application programming interface.
• Why? Because it is easy to understand and
to programmatically implement!
– Layer 7: Applications (Web browser, HTTP etc)
– Layer 6: Presentation layer (data conversions)
– Layer 5: Session establishment (not communication orientated)
– Layer 4: Transport protocol (often UDP/TCP)
– Layer 3: Internet Protocol (logical addresses)
– Layer 2: Data link layer - framing characteristics (often Ethernet)
– Layer 1: Physical (radio frequency) characteristics
Data link layer (layer 2)
• Data can be sent between local area network
devices at layer 2.
• Data is broken down into smaller chunks of data
called packets.
• Different data link transmission protocols can be
used.
• Ethernet has become the common standard and
uses 48-bit (6 bytes) source and destination
addresses.
• Data link layer communications are confined to
local area networks through either point to point or
shared media links.
• Typically less than 1000 devices in a local area
network. (often less than 255)
Internet Protocol (layer 3)
• Known as the logical layer (32 bit
source/destination addresses)
• Numeric addresses are mapped to and from names by a
service called the “Domain Name System” (DNS).
– Eg: 152.13.2.96 = www.uncg.edu
• Data is also transported in packet form but can be
routed between multiple local area networks.
• A protocol called “Address Resolution Protocol”
(ARP) translates IP (layer 3) addresses into layer 2
Ethernet addresses.
• ARP is the glue between layer 3 and layer 2.
[Diagram: Local Area Network #1 (192.168.1.0/24, hosts 192.168.1.1 to
192.168.1.3) and Local Area Network #2 (192.168.99.0/24, hosts
192.168.99.1 to 192.168.99.3), each built from hubs/switches and joined
by an Internet Protocol router with interfaces 192.168.1.254 and
192.168.99.254.]
• Computer 1 → Computer 2
– C#1 sends ARP request – who has 192.168.1.2?
– C#2 replies – that's me, and supplies its 48-bit address.
– C#1 addresses data to C#2 using the supplied 48-bit address and
sends it.
• Computer 1 → Computer 4
– C#1 knows that C#4 is not in the local network.
– How? C#1 performs a logical AND of the destination IP address with
its subnet mask and compares the result with its own network address.
– C#1 sends ARP – who has 192.168.1.254? – router replies with its 48-bit address
– C#1 sends data to router, router then looks in route tables for
destination logical address.
– Router sends ARP into the destination network – who has
192.168.99.1? C#4 replies – that's me!
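The masking step above can be sketched in a few lines of Python. This is only an illustration: the function names `same_network` and `next_hop` are made up for this example, and the /24 prefix and gateway address are taken from the diagram.

```python
import ipaddress

def same_network(src: str, dst: str, prefix_len: int) -> bool:
    """Return True when dst is on the same local network as src."""
    # Build the subnet mask, e.g. /24 -> 255.255.255.0
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    src_net = int(ipaddress.IPv4Address(src)) & mask
    dst_net = int(ipaddress.IPv4Address(dst)) & mask
    return src_net == dst_net

def next_hop(src: str, dst: str, prefix_len: int, gateway: str) -> str:
    """Address to ARP for: the destination itself, or the router."""
    return dst if same_network(src, dst, prefix_len) else gateway

# Local destination: ARP for the host itself
print(next_hop("192.168.1.1", "192.168.1.2", 24, "192.168.1.254"))
# Remote destination: ARP for the router instead
print(next_hop("192.168.1.1", "192.168.99.1", 24, "192.168.1.254"))
```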
What is a router?
• A router is a device that operates at the OSI logical layer 3.
• It knows what to do with data arriving that has logical IP
addresses for source and destination.
• A router builds routing tables to represent “networks” that
are either directly connected or available through a
neighboring router.
• A router is designed to find the shortest network path
between a source network and destination network.
• A router often has multiple different physical links
connected to it. There are often multiple possible routes to
any specific network.
Transport (layer 4)
• User Datagram Protocol (UDP) – a stateless and connectionless protocol.
– UDP packets get sent directly from source to destination, and UDP itself
gives the source no way to know that the data arrived intact.
• Transmission Control Protocol (TCP) – a stateful and connection-oriented
protocol.
– TCP data is sent in segments.
– A positive acknowledgement must be received for each segment sent.
– TCP is the majority carrier of traffic on the Internet.
– Why?
• It is reliable – guaranteed delivery of all data content.
• Validated over time and widely implemented.
• Specified in 1981 in RFC 793, edited by Jon Postel.
Layer 2: Data Link Header (often Ethernet)
Layer 3: Logical Header (often Internet Protocol)
Layer 4: Transport Header (TCP / UDP)
Data Payload
Layer 2: Trailer (Ethernet checksum)
TCP – Transmission and
congestion control
[Diagram: TCP exchange between two Internet-attached computers, each
showing its host buffer usage: TCP SYNchronize, TCP SYN ACKnowledge,
TCP ACK the ACK, then Data Segments 1 to 3 and the ACK for Segment 1.]
[Diagram: sliding window of a given window size (bytes) over segments
1 to 13, divided into regions A to D.]
A - data that has been transmitted and acknowledged
B - data sent but NOT acknowledged
C - data ready to be sent immediately
D - data that may not be sent yet
*Note: All ACK packets contain the receiver's window size.
Internet – Old Days
[Diagram: Internet-attached computers at each edge, connected through a
mesh of Internet Protocol routers.]
The Old Days! Early 1990's (and prior years).
Routers were interconnected with telephone company leased
circuits.
- Frame Relay: 56 kilobits per second
- DS1/T1: 1.5 million bits per second
- DS3/T3: 45 million bits per second
Physical media resource use was a source of congestion.
Internet – More recent times..
[Diagram: Internet-attached computers at each edge, connected through a
mesh of Internet Protocol routers.]
The New (Boom) Days! Mid 1990's (and beyond).
Routers interconnected with high speed fiber optic links.
- OC3: 155 million bits per second
- OC12: 622 million bits per second
- OC48: 2,488 million bits per second
- OC192: 9,953 million bits per second
Physical media plays a smaller role in congestion. Router
performance becomes a significant factor. Providers motivated to
use bandwidth efficiently.
Buffers / Queues everywhere
[Diagram: the same mesh of Internet Protocol routers between
Internet-attached computers, with a router queue shown on each router.]
Round Trip Time (RTT)
• RTT is the time elapsed between when a TCP data
segment is sent and when that segment's corresponding
acknowledgement (ACK) is received.
• RTT is an important measure of Internet
performance.
• RTT directly impacts TCP performance
characteristics on end systems.
• RTT is impacted by routers along the
communication path.
My Goals
• Develop a tool to measure TCP RTT data between
the UNCG campus and the Internet
– Produce frequency plots of the RTT data collected
– Why? Because I had to do something to prove what
Shan Suthaharan was telling me!
• Try to explain the results.
• Build a small network to perform further research
within.
• The tool developed is called tcpflowstat
Data Collection
Setup
[Diagram: UNCG switching and routing equipment connected to the NCREN
Internet Protocol router. All data is copied to an attached data
collection host machine.]
• NCREN: North Carolina Research and Education
Network
• UNCG to NCREN link averages about 60 – 80 Mbit/s
over time.
• Common “port spanning” method used to “copy” all
Internet data to collection host
• Collection host uses a program called “tcpdpriv” to
collect the data.
• Collected 100,000,000 packet samples over several
days.
Ethical Concerns
• “tcpdpriv” does a number of things to change the
data while preserving traffic characteristics
– Source and Destination addresses are replaced with
incrementing 32-bit numbers starting from 0.
– TCP port information is replaced with random numbers.
– Data content section of packet is discarded.
– Packet header is stored to a file in “PCAP” format.
• PCAP is a public domain packet header capture format for
UNIX systems.
IP Header
Ver | IHL | ToS | Total Length
Identification | Flags | Fragment Offset
Time to live | Protocol | Header Checksum
Source address: "tcpdpriv" replaces it with an incrementing number
Destination address: "tcpdpriv" replaces it with an incrementing number
TCP Header
Source port: "tcpdpriv" replaces it with a random number
Destination port: "tcpdpriv" replaces it with a random number
Sequence Number
Acknowledgement Number
Data Offset | Reserved | Flags | Window Size
Checksum | Urgent Pointer
Options | Padding
Data: "tcpdpriv" discards all data bytes
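The address replacement described above can be pictured with a small Python sketch. It is not tcpdpriv's actual code: the class name `AddressAnonymizer` is hypothetical, and only the "incrementing number starting from 0" behavior stated on the slide is modeled.

```python
class AddressAnonymizer:
    """Map each distinct address to the next unused integer, starting at 0."""

    def __init__(self):
        self._mapping = {}

    def anonymize(self, address: str) -> int:
        # The first time an address is seen it gets the next counter value;
        # later sightings reuse it, so traffic patterns are preserved.
        if address not in self._mapping:
            self._mapping[address] = len(self._mapping)
        return self._mapping[address]

anonymizer = AddressAnonymizer()
for addr in ["152.13.2.96", "192.168.1.1", "152.13.2.96"]:
    print(addr, "->", anonymizer.anonymize(addr))
```

Because the same real address always maps to the same number, packets can still be grouped into flows after anonymization.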
Definition of a TCP Flow
• A unique, reliable communication between a source and destination
computer using the TCP protocol.
• Think of dialing an office phone number, then using an extension
number after that.
• The phone number would be the destination IP address, then the
extension becomes the TCP socket or port number.
• A TCP flow is defined as the five-tuple of TCP protocol, source
IP address, destination IP address, source TCP port, and
destination TCP port.
• There can be multiple TCP flows between a source and
destination computer.
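As a rough illustration of the five-tuple, the following Python sketch models a flow key. The `FlowKey` type and its `reverse_direction` helper are assumptions made for this example; the port numbers are arbitrary.

```python
from typing import NamedTuple

class FlowKey(NamedTuple):
    """The five-tuple that identifies one TCP flow."""
    protocol: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int

    def reverse_direction(self) -> "FlowKey":
        """Key as seen on packets travelling the other way (e.g. ACKs)."""
        return FlowKey(self.protocol, self.dst_ip, self.src_ip,
                       self.dst_port, self.src_port)

# Two flows between the same pair of computers differ only by source port.
first = FlowKey("TCP", "192.168.1.1", "192.168.99.1", 1024, 80)
second = FlowKey("TCP", "192.168.1.1", "192.168.99.1", 1025, 80)
print(first == second)                      # different flows
print(first.reverse_direction())
```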
RTT – How to calculate it!
• From research literature, there are three basic
calculation methods
– Take the time difference between the TCP SYN
packet and the resulting SYN/ACK of that SYN.
– Use the change in window size during slow start:
calculate the difference between data segment
inter-arrival times, using a time threshold to determine a
flight (burst) of packets.
– Use a fluid dynamic view, treating traffic at a bit level
per unit time. The basis is that when TCP is in congestion
avoidance mode, the window size increases by one
MSS every RTT.
RTT – with limited resources…
• Related research shows that the SYN –
SYN/ACK method is a reasonably good
estimator of RTT.
• Other methods depend on averaging several
hundred values per TCP flow of
communication.
• I had only limited computing power
available!
Basic operation of the tcpflowstat
program
• Opens a packet capture file
• For each packet header in the file
– Find a TCP packet
• If the packet is a SYN packet, allocate a tcpflow data structure
node and use the IP and TCP port addressing as the key item.
• If not, search the tcpflow data structure to see if this packet
matches an existing flow
• If the packet is a SYN-ACK, then calculate time difference and
update RTT data value in flow data structure.
• If the packet is a FIN or RST packet, then the flow is removed
from the data structure and placed in a “completed flow”
linked list.
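The steps above can be sketched in Python. This is not the tcpflowstat source: the function name `measure_flows`, the packet record layout (timestamp in ms, addresses, ports, set of flag names) and the use of a plain dict and list are all assumptions made for illustration. The SYN/ACK travels in the opposite direction to the SYN, so the lookup tries both directions.

```python
def measure_flows(packets):
    """Track flows from SYN to FIN/RST and record the SYN to SYN/ACK time."""
    active = {}       # flow key -> per-flow record
    completed = []    # finished flows, in order of completion

    for ts_ms, src, dst, sport, dport, flags in packets:
        key = (src, dst, sport, dport)
        reverse = (dst, src, dport, sport)

        if flags == {"SYN"}:
            # New flow: allocate a record keyed on addresses and ports
            active[key] = {"syn_ms": ts_ms, "rtt_ms": None}
            continue

        # Otherwise look for an existing flow in either direction
        if key in active:
            flow_key = key
        elif reverse in active:
            flow_key = reverse
        else:
            continue
        flow = active[flow_key]

        if flags == {"SYN", "ACK"} and flow["rtt_ms"] is None:
            flow["rtt_ms"] = ts_ms - flow["syn_ms"]
        elif "FIN" in flags or "RST" in flags:
            flow["duration_ms"] = ts_ms - flow["syn_ms"]
            completed.append(active.pop(flow_key))

    return completed

sample = [
    (0,   "10.2.1.2",  "10.12.1.2", 1024, 80,   {"SYN"}),
    (11,  "10.12.1.2", "10.2.1.2",  80,   1024, {"SYN", "ACK"}),
    (12,  "10.2.1.2",  "10.12.1.2", 1024, 80,   {"ACK"}),
    (758, "10.2.1.2",  "10.12.1.2", 1024, 80,   {"FIN", "ACK"}),
]
print(measure_flows(sample))
```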
Challenges to overcome
• Each TCP flow detected (by seeing a SYN)
forces the code to allocate memory for a
tcpflow node.
• Each TCP packet potentially results in a
search of the tcpflow data structure
• Data structures must be efficient.
The tcpflow data structure
• I chose to use a hash table to implement the
data structure.
• Hash table size was set to a large prime
number not close to a power of 2.
• In this case, the number was set to 47,189
– about halfway between 2^15 and 2^16
• This ensures fairly even distribution of
hashing keys in the table.
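A minimal sketch of such a table, in Python, assuming collision resolution by chaining. The class name `FlowTable` and its methods are hypothetical, and Python's built-in `hash` stands in for whatever hash function the original code used; only the table size of 47,189 comes from the slide.

```python
TABLE_SIZE = 47189  # prime, roughly halfway between 2**15 and 2**16

class FlowTable:
    """Fixed-size hash table; each bucket is a list acting as a chain."""

    def __init__(self, size: int = TABLE_SIZE):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        # Reduce the hash modulo the prime table size to pick a bucket
        return hash(key) % self.size

    def insert(self, key, value) -> None:
        self.buckets[self._index(key)].append((key, value))

    def find(self, key):
        for stored_key, value in self.buckets[self._index(key)]:
            if stored_key == key:
                return value
        return None

    def remove(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (stored_key, value) in enumerate(bucket):
            if stored_key == key:
                del bucket[i]
                return value
        return None

table = FlowTable()
table.insert((0, 1, 1024, 80), {"syn_ms": 0})
print(table.find((0, 1, 1024, 80)))
```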
RTT – A word about time!
• The UNIX PCAP library code stores each packet's timestamp
as a seconds field and a microseconds field.
• Time delay may be introduced due to processing
the packet header on the data collection host.
• To bring the measurements into a millisecond time
unit, the seconds portion of the time stamp is
multiplied by 1000 and the microseconds portion is
divided by 1000 before the two are added.
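A short sketch of the unit conversion, assuming the classic libpcap record header, which carries a seconds field and a microseconds field. The function names are made up for this example.

```python
def timestamp_ms(tv_sec: int, tv_usec: int) -> float:
    """Convert a (seconds, microseconds) capture timestamp to milliseconds."""
    return tv_sec * 1000 + tv_usec / 1000

def elapsed_ms(start, end) -> float:
    """Milliseconds between two (seconds, microseconds) timestamps."""
    return timestamp_ms(*end) - timestamp_ms(*start)

# A SYN seen at 10.250 s and its SYN/ACK at 10.261 s
print(elapsed_ms((10, 250000), (10, 261000)))
```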
Final processing
• When all packets have been read and processed,
the final steps are as follows
– Sort flow duration data, then calculate the min, max,
mean, and median
– Sort flow RTT data, then calculate the min, max, mean,
and median
– Calculate the flow RTT frequency and output an RTT
frequency file for gnuplot to process.
– Output RTT, duration, and overall statistics to the
screen.
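The final steps might look like the following Python sketch. The helper names and the 1 ms bin width are assumptions; the output is simply "RTT count" pairs, one per line, a layout gnuplot can plot directly.

```python
from collections import Counter
from statistics import mean, median

def summarize(values_ms):
    """Min, max, mean and median of a list of millisecond values."""
    ordered = sorted(values_ms)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": mean(ordered),
        "median": median(ordered),
    }

def rtt_frequency(rtts_ms, bin_ms=1):
    """Count how many flows fall into each RTT bin, sorted by RTT."""
    counts = Counter((rtt // bin_ms) * bin_ms for rtt in rtts_ms)
    return sorted(counts.items())

def gnuplot_lines(frequency):
    """One 'rtt count' line per bin."""
    return [f"{rtt} {count}" for rtt, count in frequency]

rtts = [2, 11, 11, 20, 221]   # made-up sample values
print(summarize(rtts))
print("\n".join(gnuplot_lines(rtt_frequency(rtts))))
```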
Tcpflowstat - code performance
• Run on a Pentium III with a 1 GHz CPU and 512 MB of
RAM.
• Processed 100 million packets of data in 30
minutes of elapsed time.
• Approximately 7 – 10 Gbytes of data on disk to
process.
• Most time spent in waiting for disk activity, and
memory management routines.
– UNIX malloc code is notoriously inefficient (linear)
especially when using the “free” routines.
Performance cont…
• Hash table exhibited collisions linearly from the
upper bound.
– Collision resolution was implemented through simple
linked lists.
• “tcpdpriv” sequential numbering of addresses
created sequential hash keys (not too bad actually)
• UNIX modulus function could be optimized
• Large amount of RAM usage due to thousands of
parallel TCP flows being processed within any
time span.
• A multiple indirect hashing approach would be
better, i.e. break the src/dest IP address down by
octet. This is commonly implemented in routers.
Initial Results
TCP Flow Duration
-------------------------
MIN    =       2 ms
MAX    = 4794030 ms
MEAN   =    3593 ms
MEDIAN =     758 ms
-------------------------
TCP Round Trip Time (RTT)
-------------------------
MIN    =     0 ms
MAX    = 63108 ms
MEAN   =   221 ms
MEDIAN =    11 ms
-------------------------
802373 TCP flows counted.
-------------------------
Overall Stats
-------------------------
TCP   =  93455966 packets
UDP   =   6290127 packets
ICMP  =    192462 packets
-------------------------
TOTAL = 100000000 packets
-------------------------
Observations on statistical
breakdown
• Over 90% of traffic is TCP.
• Most flow durations are short (the median is 758 ms, less
than 1 second)
• Likely due to web transactions, which tend to be
many and short.
• Mean flow duration is higher than the median.
– A fair number of measured flows have longer durations.
– Related research confirms that longer duration flows
dominate the Internet traffic.
Initial RTT Frequency Plot
Observations
• High frequency of TCP flows exhibiting
RTT of less than 5 ms.
• Significant percentage of TCP flows with
RTT of approx. 20ms.
• Peaks and valleys across the distribution
• Skepticism thus:
– Four more samples were taken over the period
of about 1 week.
Further combined sample results
Combined RTT Frequencies – Loch
Ness Monster Plot!
Congested Router Diagram
[Diagram: a congested router. A burst of traffic overflows the router
queue, while traffic already deep in the queue is delayed.]
Possible explanations
• Assuming that a router becomes a congestion
point, a burst of traffic will cause queue overflow
(droptail)
• Global TCP congestion control synchronization
will occur during queue overflow.
• All affected TCP flows will synchronously halve
their window size (multiplicative decrease).
• Flows deeper in queue will not experience packet
drop but will experience delay.
• Flow treatment is not equal.
What do we want to see…
How to achieve a desired result
• Only a router along the path knows its own
congestion conditions at a point in time.
• At high congestion times, we must ensure that
there is no congestion control synchronization.
• Random packet drop or marking (ECN) is
appropriate to force a selection of flows to reduce
their window sizes.
• ECN is defined in RFC-3168 (borrows two bits
from a reserved part of the header)
• Queue size must be optimized so that flow delay is
minimized.
Random Early Detection (RED)
[Floyd/Jacobson]
• Two queue thresholds used, min_th and
max_th.
• When ave size < min_th, no packets marked
• When ave size >= min_th, <= max_th, mark
packets with probability p where p is a
function of ave queue size.
• When ave size > max_th, mark all packets
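A simplified Python sketch of the marking rule above. The linear ramp between the thresholds and the exponentially weighted moving average follow the usual description of RED, but the values chosen for `weight` and `max_p` are illustrative assumptions, and the refinement that spaces marks out by counting packets since the last mark is omitted.

```python
import random

def update_average(avg: float, queue_len: float, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue size."""
    return (1 - weight) * avg + weight * queue_len

def mark_probability(avg: float, min_th: float, max_th: float,
                     max_p: float = 0.1) -> float:
    """Probability of marking an arriving packet for a given average size."""
    if avg < min_th:
        return 0.0                      # below min_th: never mark
    if avg > max_th:
        return 1.0                      # above max_th: mark everything
    # Between the thresholds: ramp linearly from 0 up to max_p
    return max_p * (avg - min_th) / (max_th - min_th)

def should_mark(avg, min_th, max_th, max_p=0.1, rng=random.random) -> bool:
    """Randomly decide whether to mark (or drop) this packet."""
    return rng() < mark_probability(avg, min_th, max_th, max_p)

for avg in (4, 10, 15, 20):
    print(avg, mark_probability(avg, min_th=5, max_th=15))
```

Because the decision is random per packet, different flows are asked to slow down at different times, which is what avoids the synchronized window reductions seen with drop-tail.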
RED is not always sufficient
• Sudden congestion can keep queue depth
above the maximum threshold.
• RED can degenerate into the same behavior
as a drop-tail configuration.
• Weighted moving average algorithm reacts
too slowly to sudden changes.
Queue depth and router buffer
sizes
• 1994 paper (Villamizar and Song) set the standard
router buffer size at B = RTT × C
– This is a commonly used formula today!
• Subsequent paper at SIGCOMM 2004 from
researchers at Stanford suggests the more
appropriate formula is B = (RTT × C) / √n
– Where C is the link capacity and n = no. of flows.
• The denominator of this equation represents a
variable that must be dynamic. It is the “predictor
of congestion” variable. Shan Suthaharan is
currently seeking a patent for predictive
algorithms that determine this variable.
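To give a feel for the difference between the two sizing rules, here is a small worked example in Python. The SIGCOMM 2004 result divides by the square root of the number of flows. The RTT, link capacity and flow count below are made-up inputs, and the function names are hypothetical.

```python
import math

def buffer_rule_of_thumb(rtt_s: float, capacity_bps: float) -> float:
    """B = RTT x C, in bits."""
    return rtt_s * capacity_bps

def buffer_small(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """B = (RTT x C) / sqrt(n), in bits."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)

rtt = 0.25          # seconds (assumed)
capacity = 2.5e9    # bits per second (assumed)
flows = 10_000      # concurrent flows (assumed)

print(buffer_rule_of_thumb(rtt, capacity) / 8e6, "MB")
print(buffer_small(rtt, capacity, flows) / 8e6, "MB")
```

With these inputs the second rule asks for a buffer 100 times smaller than the first, since the square root of 10,000 is 100.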
Why not just reduce buffer size?
• Reducing buffer size would likely reduce the
incidence of delayed traffic flows.
• Buffer overflow would still result in a TCP
congestion control synchronization problem.
• Still also have the problem of unfair treatment of
flows – first come, first serve is not necessarily
best.
• Poor performance would still result.
Conclusions
• Maximum buffer size should be reduced as shown
in SIGCOMM’04 paper → saves $ and eases
hardware design concerns.
• Active queue management (AQM) should be used.
• A combination of reduced, preferably dynamic,
maximum buffer size and AQM should reduce
congestion control synchronization and increase
fair treatment of different TCP flows.
• Implementations should be simple to use; perhaps
even be the default configuration.
• New methods of active queue management must
continue to be researched and developed.
Furthering research efforts
• Collect more data, 8 – 12 hour samples would be
nice.
• Build a test network rather than using a simulator.
• Use sources of real traffic as testing environment.
• Write a program to completely replay all traffic
captured on a specific Internet link. (not easy)
Our small research network build!
[Diagram: two FreeBSD end hosts, each attached through an Ethernet hub
to a FreeBSD router (networks 10.2.1.0/16 and 10.12.1.0/16). The two
routers are joined through a third Ethernet hub (network 192.168.1.0/24).
Addresses shown: 10.2.1.1, 10.2.1.2, 10.12.1.1, 10.12.1.2, 192.168.1.1,
192.168.1.2.]
FreeBSD is useful!
• O/S has an in-built firewall for matching specific
packet flows.
• A kernel module called “DummyNet” exists for
research use.
– DummyNet can be configured to buffer traffic for extra
time
– Uses mbufs – BSD ring buffer to delay traffic
– Danger of overflowing delay buffer
• Full source code is freely available and well
documented.
• Using the end hosts, generate multiple thousands
of TCP data streams.
– Simple server listener code that generates character data
should be sufficient
• Lower the link speed at the center of the network
to force the routers to buffer traffic
• Modify the ALTQ code to implement different
active queue management algorithms.
• Connect analyzer host and use tcpflowstat to
analyze traffic characteristics.
Thank you.
• Thanks to my management for dealing with my
brain split between daily work and research work!
• Thanks to my family for all those times I could
not attend various events but really wanted to.
• Thanks to Dr. Suthaharan for taking time after
5pm on weeknights and on weekends to speak
with me about various research topics and papers.