Transcript Document

Performance Enhancement and Response Team

TNC 2004, Rhodes (GR), 09/06/04

Nicolas Simar, Network Engineer

Toby Rodwell, Network Engineer

DANTE

Performance and Enhancement Response Team

– TNC 2004 – Rhodes (GR) -- Nicolas Simar ([email protected])

PERT Origins

Where is it coming from?

– Historically it is long distance circuits (the “wide-area”) that have been the bottleneck in a network.

– Over the last few years, the capacity of long distance circuits has significantly increased.

– Now, end-to-end performance bottlenecks may occur at any point in a system: at the application, hardware or LAN level, in addition to the wide-area network.

– As such, it is becoming more and more difficult for a non-expert end-user to diagnose their network performance issues

PERT Origins (2)

A group of NRENs met in December 2002 at the TERENA offices in Amsterdam…

– Mauro Campanella (GARR), Valentino Cavalli (TERENA), Larry Dunn (Cisco), Marian Garcia (DANTE), Simon Leinen (SWITCH), Victor Reijs (HEAnet), Nicolas Simar (DANTE), Sven Ubik (CESnet) and Steve R. Williams (Uni. of Swansea/UKERNA)

PERT Origins (3)

…and they came up with the concept of PERT

– To provide a support structure to investigate and resolve problems in the performance of applications over computer networks.

– Comparable to the CERT structure.

http://www.dante.net/pert

PERT? What is it?

Performance Enhancement and Response Team

A virtual team consisting of

– Cross-discipline experts who are capable of identifying the locations of performance bottle-necks.

– Subject specialists who can precisely diagnose the cause of a given problem and help the end-users resolve it.

Monitoring tools

– Deployment of a monitoring infrastructure to ease the troubleshooting.

PERT? What is it? (2)

Information storage, tracking and retrieval

– Tracking system.

Knowledge base to document

– Known performance issues, with possible ways to address them.

– Successful diagnostic strategies.

PERT current status

PERT is in a Pilot phase

– Informal, unregulated access to PERT; anybody can request help from PERT.

– Primary purpose of investigation is to improve PERT’s knowledge and experience.

– As this is a pilot phase, problems are addressed on a best-effort basis.

– No dedicated monitoring tools.

– RoundUp tracking system (off-the-shelf) used.

PERT current status

Please DO e-mail … with any performance issues you or your customers are experiencing and would like investigated.

Please DON’T assume the issue will be quickly resolved.

– However, very few issues to date have been passed to the PERT.

Case 1 - Strasbourg Astronomy Laboratory - Fermilab

From Fermilab SDSS.

– SDSS: Sloan Digital Sky Survey

– Transfer rate is 5Mbps (rsync)

PERT contacted

Potential causes

– Ethernet interfaces not in full-duplex mode

– TCP buffer size, occasional losses on large-RTT path

– Application

– Hardware
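
Why occasional losses matter on a large-RTT path can be illustrated with the well-known Mathis et al. approximation for steady-state TCP throughput: rate ≈ (MSS / RTT) × (C / √p), with C ≈ √(3/2) and p the packet loss rate. The sketch below is illustrative only: the MSS, RTT and loss-rate values are assumptions and not measurements from this case.

```python
import math

def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on standard TCP throughput, in bits per second."""
    c = math.sqrt(3.0 / 2.0)  # constant from the Mathis et al. model
    return (mss_bytes * 8.0 / rtt_s) * (c / math.sqrt(loss_rate))

# Assumed values for illustration only (not taken from the slides).
short_path = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.010, loss_rate=1e-4)
long_path = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.100, loss_rate=1e-4)

print(f"10 ms RTT:  {short_path / 1e6:.1f} Mbps")
print(f"100 ms RTT: {long_path / 1e6:.1f} Mbps")
```

For the same loss rate, the bound scales inversely with RTT, so a loss rate that is harmless on a short path can dominate on a transatlantic one.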

Case 1 Strasbourg-Fermilab (2)

Use of other machines and same software

– US to FR, memory to memory with large buffer using NTTCP/iperf: 90Mbps

– But when using rsync with large buffer: lower than 20Mbps.

– Now investigating the hardware and the application, while keeping an eye on network losses.

Case 2 – JIVE - GEANT

Joint Institute for VLBI in Europe (JIVE)

– VLBI: Very Long Baseline Interferometry

– JIVE is located in Dwingeloo (The Netherlands) and collects and correlates data from the European VLBI network (radio telescopes).

Test the download of a 430MB file from the JIVE website in Dwingeloo to the University of Oxford.

– Problems with the systems in Oxford, therefore test done between JIVE and a GÉANT workstation.

Case 2 – JIVE (2)

Initial transfer test:

– Via http, using wget

– Took 5 minutes to complete the 430MB transfer (approximately 10Mbps throughput)

PERT case opened

Potential causes

– Ethernet interfaces not in full-duplex mode

– Insufficiently large TCP buffers
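
The throughput figure for the initial test follows directly from the file size and the transfer time. A minimal sketch of that arithmetic, assuming 1 MB = 10^6 bytes and exactly 5 minutes (the slide gives only rounded figures):

```python
def throughput_mbps(size_bytes: float, duration_s: float) -> float:
    """Average throughput in megabits per second."""
    return size_bytes * 8 / duration_s / 1e6

# 430 MB transferred in 5 minutes, as reported for the initial wget test.
rate = throughput_mbps(size_bytes=430e6, duration_s=5 * 60)
print(f"{rate:.1f} Mbps")  # about 11.5 Mbps, i.e. roughly 10 Mbps
```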

Case 2 – JIVE (3)

– The maximum TCP receive buffer size on the GÉANT workstations is of reasonable size, but wget uses the default TCP buffer size.

– Default TCP buffer size increased on two receivers (ws4.uk: Linux -> 8MB, ws1.de: Unix -> 196kB)

Improvement: 40Mbps

– Could not access the JIVE webserver to increase the Tx buffer (critical production machine)

– Access was granted to the JIVE FTP server, where the Tx buffer was increased to 2MB

Improvement: 90Mbps
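
The effect of the buffer changes above can be reasoned about with the bandwidth-delay product: a single TCP connection cannot move more than one buffer (window) of data per round trip. The sketch below is a rough illustration under assumptions: the slides do not state the RTT, so the 30 ms used here is hypothetical, and the function names are invented for this example.

```python
def max_throughput_mbps(buffer_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when limited by the TCP window: window / RTT."""
    return buffer_bytes * 8 / rtt_s / 1e6

def required_buffer_bytes(target_mbps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: the buffer needed to sustain target_mbps."""
    return target_mbps * 1e6 * rtt_s / 8

ASSUMED_RTT_S = 0.030  # hypothetical round-trip time, not given in the slides

# Buffer sizes mentioned in the case (decimal units assumed).
for label, size in [("196 kB", 196e3), ("2 MB", 2e6), ("8 MB", 8e6)]:
    bound = max_throughput_mbps(size, ASSUMED_RTT_S)
    print(f"{label}: window-limited bound of {bound:.0f} Mbps")

needed = required_buffer_bytes(90, ASSUMED_RTT_S)
print(f"Buffer needed for 90 Mbps: {needed / 1e3:.1f} kB")
```

Both the sender (Tx) and receiver buffers need to be at least this large, which is consistent with the further improvement seen once the sending side was also tuned.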

Lessons Learned

Identify technical contact at each end

Determine the scope of testing possible

– Where production machines are involved, some configuration changes may not be acceptable for testing purposes.

– Strong test machines needed (close to the user, easy to access)

Wherever possible, use methods to minimise the number of variables

– e.g. sink data to /dev/null, memory to memory transfer not to disk
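
As an illustration of a memory-to-memory test, the sketch below sends a buffer held in memory over a TCP connection and discards it on receipt, so no disk is involved at either end. It is a minimal example that runs over the loopback interface only; a real test would run between the two end hosts (for instance with a tool such as iperf, as in Case 1). The function name and parameters are hypothetical.

```python
import socket
import threading
import time

def memory_to_memory_test(total_bytes: int = 8 * 1024 * 1024,
                          chunk_size: int = 64 * 1024):
    """Send total_bytes from memory over loopback TCP; return (bytes received, Mbps)."""
    received = 0

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def sink():
        nonlocal received
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk_size)
                if not data:
                    break
                received += len(data)  # count and discard, like writing to /dev/null

    receiver = threading.Thread(target=sink)
    receiver.start()

    payload = bytes(chunk_size)  # data generated in memory, never read from disk
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has seen end-of-stream
    elapsed = time.perf_counter() - start
    server.close()
    return received, received * 8 / elapsed / 1e6

if __name__ == "__main__":
    nbytes, mbps = memory_to_memory_test()
    print(f"received {nbytes} bytes at {mbps:.0f} Mbps")
```

If a test like this performs well between two hosts while a file transfer does not, the disk or the application becomes the main suspect rather than the network.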

GN2

SA3: enable end-to-end QoS across European academic networks on a routine basis.

– Development of a distributed provisioning system (SA3) and a performance monitoring system (JRA1).

– Need to deploy both systems to be “SA3-compliant”.

– GN2 will only ensure QoS edge-to-edge; the PERT will be crucial in helping users get the best performance from their end-systems, so that they can make full use of any Premium IP service they might reserve.

PERT structure and operation will be designed by SA3.

GN2 SA3 milestones

First 12 months

– 31 Oct 04 - Establish PERT

– 31 Jan 05 - PERT Ticket system deployed

– 28 Feb 05 - PERT troubleshooting procedures published

– 28 Feb 05 - PERT knowledgebase operational

– 31 May 05 - Consultancy service established

– 31 May 05 - First issue of 'User Guide' and 'Best Practice Guide'

GN2 SA3 – Work Items

WI-2 Define and establish the PERT (Sep 04 - Dec 04)

WI-3 Deliver PERT Documentation

– The heart of the PERT documentation will be a knowledgebase. The knowledgebase will be the source for a Best Practice Guide (information for network administrators) and an End User's Guide

GN2 SA3 – Work Items

WI-4 Deliver PERT Ticket System (Sep 04 - May 05)

– Enable the PERT (and its customers) to track issues from when they are raised to when they are resolved.

– An end user will never raise an issue directly with the PERT, only with their NREN. Each PERT ticket will be created from an NREN ticket that has been escalated to the PERT.

– It is expected there will be an interface between the Ticket System and the Knowledgebase, so that all PERT cases can be accessed through the knowledgebase.
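
The ticket flow described above (an NREN ticket is escalated, becomes a PERT ticket, and is tracked until resolved) could be modelled along the following lines. This is purely a hypothetical sketch to make the flow concrete: the class, field and function names are invented here and do not describe the actual PERT Ticket System.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Status(Enum):
    OPEN = "open"
    RESOLVED = "resolved"

@dataclass
class PertTicket:
    nren_ticket_id: str  # hypothetical: ID of the NREN ticket that was escalated
    summary: str
    status: Status = Status.OPEN
    history: List[str] = field(default_factory=list)

    def log(self, note: str) -> None:
        """Record a step in the investigation."""
        self.history.append(note)

    def resolve(self, note: str) -> None:
        """Record the resolution and close the ticket."""
        self.log(note)
        self.status = Status.RESOLVED

def escalate(nren_ticket_id: str, summary: str) -> PertTicket:
    """Create a PERT ticket from an escalated NREN ticket (never directly from an end user)."""
    ticket = PertTicket(nren_ticket_id, summary)
    ticket.log(f"escalated from NREN ticket {nren_ticket_id}")
    return ticket

# Example with made-up identifiers.
ticket = escalate("NREN-0001", "slow file transfer between two sites")
ticket.resolve("TCP buffers increased on both end hosts")
print(ticket.status.value, len(ticket.history))
```

Keeping the full history on the ticket is what would allow resolved cases to be surfaced through the knowledgebase later.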

GN2 SA3 – Work Items

WI-5 Deliver PERT Troubleshooting Procedures (Sep 04 - Dec 04)

WI-6 PERT Day to Day Operations (Feb 05 - end of the project)

– Case Managers (CMs) receive the PERT requests. Every issue is opened by them. Note: CMs are funded by GN2.

– When case managers cannot solve a problem themselves, they will localise the problem area and then contact a Subject Matter Expert (SME). Note: the SMEs will probably work on a voluntary (unfunded) basis.

Any Questions?

Thank you

http://www.dante.net/pert [email protected]

[email protected]
