IP Network Performance Measurements

Bruce Morgan AARNet Pty Ltd

Just checking…

Why metrics?

• Metrics are important for identifying network-related issues, especially performance
• Metrics can be diverse
• No one metric is suitable for all needs

Types of Measurement

• Active Measurement
  Injecting measurement data into the network
  E.g. UDP, TCP, ICMP packets
• Passive Measurement
  Measuring what is there already

The Problem

• Measurement of the network cloud is difficult, but it is essential if we are to gauge user perception of the Internet

The World Wide Wait

Some problems are host based, while others are network based:
• Physical latency
• Network queuing and delays
• Server processing delay
• Timeouts and packet loss
• TCP protocol delays

The Dark Cloud

• Diverse network paths
• Asymmetric paths
• Policy routing
• Committed Access Rates
• Firewalls and filters

IP Performance Metrics

• Framework spelt out in RFC 2330 from the IPPM Working Group
• Goal: “to achieve a situation in which users and providers of Internet transport service have an accurate common understanding of the performance and reliability of the Internet component 'clouds' that they use/provide.”

On the Standards track…

• RFC 2678 IPPM Metrics for Measuring Connectivity
• RFC 2679 A One-way Delay Metric for IPPM
• RFC 2680 A One-way Packet Loss Metric for IPPM
• RFC 2681 A Round-trip Delay Metric for IPPM

A One-way Delay Metric

• Type-P-One-way-Delay
• The P stands for the type of packet sent (e.g. the protocol used)
• Probe packets are injected at times drawn from a Poisson process
• Both source and destination require time synchronisation
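As a minimal sketch of the Poisson sampling mentioned above (not the IPPM reference method or any particular tool's implementation), the Python below draws exponentially distributed gaps between probes, which yields a Poisson process. The function names, the rate of 2 probes/s and the seed are illustrative assumptions.

```python
import random


def poisson_send_times(rate_per_s, duration_s, seed=None):
    """Return probe send times (seconds from start) for a Poisson process.

    Gaps between consecutive probes are exponentially distributed with
    mean 1 / rate_per_s, which is what makes the arrivals Poisson.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


def one_way_delay(send_ts, recv_ts):
    """One-way delay in seconds.

    send_ts is read from the source clock and recv_ts from the destination
    clock, so the result is only meaningful if both clocks are synchronised.
    """
    return recv_ts - send_ts


# Hypothetical example: schedule probes at ~2/s for 10 seconds.
schedule = poisson_send_times(rate_per_s=2.0, duration_s=10.0, seed=1)
print(len(schedule), "probes scheduled")
```

Any clock offset between the two hosts adds directly to every value returned by `one_way_delay`, which is why the one-way metric needs time synchronisation and the round-trip metric does not.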

A Round-trip Delay Metric

• Many applications do not perform well with large end-to-end delays
• Ease of deployment compared to one-way metrics
• Ease of interpretation

Ping

• Two-way path measurement based on RTTs (round-trip times)
• Choice of monitored address
  Host
  Router interface
  Router loopback address

Packet Loss on ICMP

• Loss asymmetry
  Loss = 1 - ((1 - Loss_fwd) × (1 - Loss_rcv))
  where Loss_fwd and Loss_rcv are the loss rates in the forward and return directions
• Path asymmetry
• Internet Service Providers (ISPs), sites or even hosts may rate-limit (or completely block) ICMP echo, giving rise to invalid packet loss measurements
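The loss asymmetry formula above can be sketched in a few lines of Python. The function name and the example loss rates are assumptions chosen for illustration only.

```python
def round_trip_loss(loss_fwd, loss_rcv):
    """Combine per-direction loss rates (fractions in [0, 1]).

    An echo probe succeeds only if both the request and the reply
    survive, so the success probabilities multiply.
    """
    return 1.0 - (1.0 - loss_fwd) * (1.0 - loss_rcv)


# Hypothetical example: 1% loss outbound and 2% loss on the return path.
combined = round_trip_loss(0.01, 0.02)
print(f"round-trip loss: {combined:.4f}")

# The same round-trip figure is produced if all of the loss is in one
# direction, so ping alone cannot say which direction is losing packets.
one_sided = round_trip_loss(combined, 0.0)
print(f"one-sided case:  {one_sided:.4f}")
```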

PingER

• PingER (Ping End-to-end Reporting) is the name given to the Internet End-to-end Performance Measurement (IEPM) project to monitor the end-to-end performance of Internet links
• Uses ICMP RTT for measurement

Surveyor

Dedicated PC running Unix at key sites

GPS for clock synchronization

One way delay & loss measurements

Community is Internet 2 clients,

HEP sites collaborating with Surveyor

PingER/Surveyor Comparison

• PingER uses the ICMP echo facility (ping) and thus only makes round-trip measurements.
• Surveyor uses a GPS system to synchronise time between sites and makes one-way measurements.

PingER/Surveyor Comparison

• Surveyor requires a dedicated platform (PC) to be installed at each site that is monitored, whereas PingER uses an existing host with no special software installed at the monitored site.
• PingER is cheaper!

PingER/Surveyor Comparison

Surveyor is more accurate and better for short-term measurement, especially for sites which have good connectivity. PingER is a more lightweight solution: it requires less management, uses less bandwidth, requires less storage, and needs nothing installed at the remotely monitored sites. It is good for remote sites with poor connectivity.

PingER/Surveyor Comparison

            Surveyor                  PingER
Method      1 way delay               2 way ping
Hosts       dedicated                 selected
Frequency   ~2*2/s                    ~0.01/s
Timing      Poisson <2/s>             bursty (30 min intervals)
Monitors    ~30                       18
Remotes     ~30 (~full mesh)          ~300 (hierarchical)
Pairs       ~900                      ~1200
Storage     ~38 Mbytes / pair / mo    ~0.6 Mbytes / pair / mo

PingER - Surveyor Complementarity

• The two agree well
• Surveyor has one-way measurements, PingER only round-trip
• Surveyor has dedicated platforms & strong central management
  experience with PingER shows this has benefits
• PingER is more parsimonious/lightweight (bandwidth, disk space, CPU)
  but necessarily less accurate, especially at small (hourly) time resolution on low-loss links
• PingER is good for looking at long-term trends & grouping, where statistics are less of a problem

TCP SYN / ACK tools

• In order to truly measure Web traffic, which is almost entirely TCP/IP traffic, it is best to probe using TCP/IP rather than ICMP
• The SYN/ACK mechanism proves useful for this purpose

TCP SYN/ACK tools

3-way handshake

Client                        Server
Send SYN seq=x         --->   Receive SYN
Receive SYN+ACK        <---   Send SYN seq=y, ACK x+1
Send ACK y+1           --->   Receive ACK

TCP SYN/ACK

• The tool issues a connection request (SYN) and measures the time taken by the target to respond with a SYN+ACK
• The connection is then promptly cleared by another exchange of packets, this time containing the FIN control flag
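A rough way to approximate this measurement without crafting raw packets is to time an ordinary TCP connect, since the call returns once the SYN+ACK has arrived. The sketch below is an assumption-laden illustration in Python, not the tool described in these slides; the function name and timeout are hypothetical.

```python
import socket
import time


def tcp_handshake_time(host, port, timeout=3.0):
    """Return the seconds taken to establish a TCP connection, or None.

    connect() completes when the SYN+ACK has been received, so the elapsed
    time approximates one round trip plus the target's processing time.
    If host is a name rather than an IP address, DNS lookup time is
    included as well, so pass an address for cleaner numbers.
    """
    start = time.perf_counter()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, filtered or timed out: treat as a lost probe.
        return None
    elapsed = time.perf_counter() - start
    sock.close()  # a normal close clears the connection with FIN segments
    return elapsed
```

Calling `tcp_handshake_time("192.0.2.10", 80)` (a documentation address used here as a placeholder) would return a time in seconds, or `None` if the probe failed.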

TCP SYN/ACK tools

Metric                Ping           SYN/ACK
Samples               30000          30000
Average               161.6 ms       158.0 ms
Standard Deviation    33.0 ms        11.6 ms
Median                154.4 ms       153.0 ms
Minimum               151 ms         150 ms
Maximum               1222 ms        610 ms
Lost packets          528 (1.76%)    469 (1.56%)
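Summary figures of this kind can be derived from raw probe results along the following lines. This is a sketch only: the function name, the use of `None` for a lost probe and the choice of the sample standard deviation are assumptions, since the slides do not say how the figures were computed.

```python
import statistics


def summarise(rtts_ms):
    """Summarise a list of RTTs in milliseconds; None marks a lost probe."""
    received = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(received)
    return {
        "samples": len(rtts_ms),
        "average": statistics.mean(received),
        "stdev": statistics.stdev(received),  # sample standard deviation
        "median": statistics.median(received),
        "minimum": min(received),
        "maximum": max(received),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(rtts_ms),
    }


# Hypothetical data: five probes, one of which was lost.
print(summarise([150, 152, None, 154, 160]))
```

The loss percentages in the table follow the same arithmetic: 528 lost out of 30000 is 1.76%, and 469 out of 30000 is about 1.56%.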

TCP SYN/ACK tools

Sting

• Sting is a TCP-based network measurement tool that measures end-to-end network path characteristics. Sting is unique because it can estimate one-way properties, such as loss rate, through careful manipulation and observation of TCP behaviour. It avoids the increasing problems with ICMP-based network measurement (blocking, spoofing, rate limiting, etc.).
• http://www.cs.washington.edu/homes/savage/sting/

Current AARNet Measurements

MRTG

Perf

ICMP RTT measurements

ICMP Packet Loss measurements

WA

Host/endpoint reachability

TCP HTTP file transfer measurements

Netflow data

MRTG

• Uses SNMP interface statistics
• Multi-functional: monitors anything from router temperature to throughput
• Visualisation package
• Lacks granularity in time
• Deployed at each RNO
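To illustrate why polling interface counters lacks time granularity, the sketch below turns two successive octet-counter readings into an average bit rate. It is not MRTG's code; the function name, the 32-bit counter width and the 300-second polling interval are assumptions made for the example.

```python
COUNTER32_MODULUS = 2 ** 32


def average_bps(octets_prev, octets_now, interval_s, modulus=COUNTER32_MODULUS):
    """Average bits per second between two readings of an octet counter.

    The modulo handles a single counter wrap between polls; more than one
    wrap within the interval cannot be detected from two readings alone.
    """
    delta = (octets_now - octets_prev) % modulus
    return delta * 8 / interval_s


# Hypothetical example: a 10-second burst at 100 Mbit/s moves 125,000,000
# octets. Averaged over an assumed 300-second poll it looks much smaller.
print(average_bps(0, 125_000_000, 300))
```

In this example the burst shows up as roughly 3.3 Mbit/s, because the counter difference is spread evenly over the whole polling interval.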

MRTG graphs

WARNO / International traffic on June 18

WARNO / VRNO traffic on June 18

Perf Tool

• Perfd: uses a BSD-based ping for RTT and packet loss calculation
• Perf: web display tool for the data
• Deployed at each RNO to measure all points of the mesh
• Used to check the SLA with Cable and Wireless Optus

Perf – LA Cable 21 June 2000 ICMP Loss

Perf – LA Cable 21 June 2000 ICMP RTT

Perf – Optus IA3 21 June 2000 Packet Loss

Perf – Optus IA3 21 June 2000 ICMP RTT

Perf 6 June Optus international ICMP Loss

Perf 6 June Optus international ICMP RTT

Perf 6 June ACTRNO ICMP Loss

Perf 6 June ACTRNO ICMP RTT

WA

• “What’s alive” is based on nocol
• Checks reachability of hosts/endpoints
• Uses ICMP echo, but could easily be extended to check service-level availability
• Frequent checks of all hosts

TCP based Measurements

Uses an active http file transfer

Measure at host

Measure from Netflow records

Can detect retransmissions

These may occur from packet loss/out of sequence packets in either direction

Load balancing impacts

• Can use contiguous IP addresses on the monitoring machine to monitor per-destination load balancing
• The monitoring machine can determine performance on a link but cannot determine which link is used
• If a link fails then traffic will divert to other links

Load Balancing – round robin

Load Balancing – per packet

Load Balancing – 14 May

Flows…

• A flow is taken to be either a bidirectional or unidirectional communication between a source and destination host. The packets in a flow share the same address/port pairing.
• Flow records are generally the biggest indicator of scan/DoS attacks!
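The flow definition above can be sketched as a grouping of packets by their address/port pairing. This is an illustrative Python sketch, not Netflow's implementation; the tuple layout, function names and sample addresses are assumptions.

```python
from collections import defaultdict


def flow_key(src, sport, dst, dport, proto, bidirectional=False):
    """Build the key that identifies a flow.

    For bidirectional flows the two endpoints are sorted so that both
    directions of a conversation map to the same key.
    """
    if bidirectional:
        a, b = sorted([(src, sport), (dst, dport)])
        return (a, b, proto)
    return ((src, sport), (dst, dport), proto)


def aggregate(packets, bidirectional=False):
    """Group (src, sport, dst, dport, proto, size) tuples into flows."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, sport, dst, dport, proto, size in packets:
        record = flows[flow_key(src, sport, dst, dport, proto, bidirectional)]
        record["packets"] += 1
        record["bytes"] += size
    return dict(flows)


# Hypothetical packets: a request, a response and an acknowledgement.
packets = [
    ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 100),
    ("10.0.0.2", 80, "10.0.0.1", 1234, "tcp", 1500),
    ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 40),
]
print(len(aggregate(packets)), "unidirectional flows")
print(len(aggregate(packets, bidirectional=True)), "bidirectional flow")
```

A scan tends to show up in such records as one source with a very large number of tiny flows to many different destination addresses or ports.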

Netflow Records

• We keep detailed flow records
  Timestamps and durations
  Source/destination addresses
  Protocol types
  Cumulative IP flags
  ICMP control types

Netflow Records

• Useful for determining metric targets, e.g. the top 100 WWW hosts
• Can derive useful measurements from the Netflow data itself
• Be wary of derived throughput: flows can last a long time
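The caution about derived throughput can be made concrete with a small Python sketch. The function name, the record fields (octets and duration) and the minimum-duration cut-off are assumptions for illustration, not part of any Netflow export format.

```python
def flow_throughput_bps(octets, duration_s, min_duration_s=1.0):
    """Average bits per second for one flow record, or None if too short.

    The figure is an average over the whole lifetime of the flow, so idle
    periods inside a long-lived flow drag it well below the rate the
    network actually delivered while data was moving.
    """
    if duration_s < min_duration_s:
        return None  # too short for the division to be meaningful
    return octets * 8 / duration_s


# Hypothetical records: the same 1,000,000 octets in 2 s and in 600 s.
print(flow_throughput_bps(1_000_000, 2.0))
print(flow_throughput_bps(1_000_000, 600.0))
```

The two records carry the same amount of data, yet the second reports a rate 300 times lower purely because the flow stayed open longer.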

What are the choices?

• Various tools and methods are available
• No one tool is good for everything
• Combinations of tools, both passive and active, lead to interesting and more detailed analysis

AARNet futures…

• Deployment of measurement machines
• Monitoring and measuring ICMP, TCP and UDP
• Monitoring QoS
• Deploying one-way and round-trip metrics
• To ensure the network does what it's supposed to do…