Transcript - TNC 2005

Connect. Communicate. Collaborate
The Performance Enhancement
Response Team: Origins and
Evolution
Ann Harding, HEAnet [email protected]
Toby Rodwell, DANTE [email protected]
Michal Przybylski, PSNC, [email protected]
•Overview
• Origins
• Evolution – from trial to pilot
• GÉANT – the PERT in action
– Organisation
– Tools
• PERT case
– FermiLab-Renater throughput
•Origins
• Development of high capacity WAN, NREN and
campus backbones
– Still seeing end-to-end performance problems
– Need to look beyond the network
• Internet2 End to End Performance Initiative (E2EPI)
– Initial proposal of a PERT in 2001
– A group of specialists who would be to network
performance what CERT is to network security
– Link user and expert to solve performance
problems
– Internet2 concept remained theoretical
•Evolution – European Trial PERT
• 2002 TF-NGN meeting discussion
– Trial PERT for GÉANT Y4
• Principal participants
– GARR (IT), TERENA, DANTE, SWITCH (CH),
CESNET (CZ), HEAnet (IE) and UKERNA (UK)
– Entirely dependent on volunteer effort
• Format
– Mailing list hosted by SWITCH (Dec 2002)
– Free, open source issue tracker (Roundup) hosted by
SWITCH (Mar 2004)
•What’s in a name?
• First mail to ‘pert-discuss’ stated PERT stood for
‘Performance Emergency Response Team’
• Second mail said the ‘E’ should be Enhancement!
• It was agreed that PERT should properly be ‘Performance
Enhancement & Response Team’
– Connotations of the CERT
– Removed the misleading ‘Emergency’ element from
the title
•Evolution – GÉANT2 Pilot PERT
• November 2004 GÉANT2 Pilot PERT
– Service Activity 3 - Performance and Allocated
Capacity for End-users (PACE)
• New e-mail address for reporting PERT cases
• Roster of duty Case Managers
– Duty Case Manager spending up to 2/3 hours per
day on open issues
• PERT Wiki
– Diary, to track successes and failures of the pilot
– Preliminary knowledgebase
•GÉANT2 Production PERT
• Who?
– PERT customers
– PERT staff
• What?
– Any academic networked system performance
problem
– Guaranteed investigation of problems
– Consultancy service
• When?
– March 2005 GÉANT production PERT
•Production PERT - Organisation
• PERT Participants
– PERT managers
– Full Time Case Managers
– Subject Matter Experts
– PERT customer
– PERT forum moderators
•Production PERT - Tools
• PERT Ticketing System (PTS)
– Ticket management & notification
– E-mail and Jabber integration
– PERT Diary
• PERT Knowledgebase (KB)
– Wiki-based public knowledgebase
– Organised, categorised PERT knowledge
– Updatable by any PERT member
• PERT Public Forum
• Production PERT PTS
•PERT Case - FermiLab to
Renater: The Problem
• Case example from Pilot PERT
• Problem observed transferring files from fnal.gov
(FermiLab) to a machine in Strasbourg
• The data consisted of many 15MByte files, totalling a few
hundred gigabytes
• The transferring application was "rsync"
• The bottleneck links were 100Mbps, but the achieved
transfer rate was typically 5Mbps
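A back-of-envelope check of the figures above, as a minimal sketch: a single TCP connection can carry at most one window of data per round trip, so its throughput is bounded by window / RTT. The 64 KiB window and 100 ms round-trip time below are illustrative assumptions (neither value is stated on the slide), and `window_limited_mbps` is a hypothetical helper name.

```python
def window_limited_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


# Assumed values, not taken from the case: 64 KiB window, 100 ms RTT.
ASSUMED_WINDOW = 64 * 1024
ASSUMED_RTT = 0.100

estimate = window_limited_mbps(ASSUMED_WINDOW, ASSUMED_RTT)
print(f"{estimate:.2f} Mbps")
```

Under those assumptions the bound comes out at roughly 5 Mbps, the same order as the observed rate, which would be consistent with a window limit rather than a limit of the 100Mbps links.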
•PERT Case - FermiLab to
Renater: The Tests
• Test machines in similar locations
• Use web100 tools
• Memory-memory routinely achieved 90+Mbps, using nttcp
– pointed to limited system & disk i/o capability on the
receiving machine
• Alternative receiving test machine
– long path, fast machines on both ends;
– data sent via ssh over TCP was slower than crypto
overhead alone could account for
– highlighted the ssh/ssl buffer limitations
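The idea behind a memory-to-memory test can be sketched as follows: the sender transmits data generated in memory and the receiver discards it, so neither disk is involved and any remaining slowness points at the network or the TCP stack. This is a stand-in written for illustration, not nttcp itself; it runs over the loopback interface, and the names `sink`, `run_memory_test` and the 16 MiB test size are assumptions.

```python
import socket
import threading
import time

PAYLOAD = b"\0" * 65536          # generated in memory, never read from disk
TOTAL_BYTES = 16 * 1024 * 1024   # assumed test size


def sink(server: socket.socket, result: dict) -> None:
    """Accept one connection and discard everything received."""
    conn, _ = server.accept()
    received = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:        # sender closed the connection
                break
            received += len(chunk)
    result["received"] = received


def run_memory_test() -> tuple:
    """Send TOTAL_BYTES over loopback; return (bytes received, seconds)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    server.listen(1)
    result: dict = {}
    receiver = threading.Thread(target=sink, args=(server, result))
    receiver.start()

    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as client:
        sent = 0
        while sent < TOTAL_BYTES:
            client.sendall(PAYLOAD)
            sent += len(PAYLOAD)
    receiver.join()                  # wait until the receiver has seen EOF
    elapsed = time.perf_counter() - start
    server.close()
    return result["received"], elapsed


received, elapsed = run_memory_test()
print(f"{received * 8 / elapsed / 1e6:.0f} Mbps memory-to-memory")
```

Comparing such a figure with the rate of a real disk-to-disk transfer between the same hosts separates network limits from system and disk limits, which is the distinction the tests above relied on.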
•PERT Case - FermiLab to
Renater: Conclusions
• The FermiLab sender-side rsync server had small TCP
buffers
• The Renater receive-side TCP buffers were too small
• On Linux, use auto-buffer-tuning on send and receive
• You've got to have at least 8MBytes of buffer space
available for 1xGigE across an ocean
• Final throughput
– memory-memory: 429Mbits/sec
– disk-disk: ~20Mbytes/sec (~160Mbits/sec)
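The 8MByte figure above matches a bandwidth-delay product: to keep a path full, the buffers must hold bandwidth × round-trip time worth of data. A minimal sketch, assuming a 64 ms round-trip time (chosen here because it yields exactly 8 MB at 1 Gbit/s; the slide does not state an RTT) and a hypothetical helper `bdp_bytes`:

```python
import socket


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


GIGE_BPS = 1_000_000_000
ASSUMED_RTT_MS = 64   # assumption, not a measured value from the case

needed = bdp_bytes(GIGE_BPS, ASSUMED_RTT_MS)
print(needed)

# Where auto-tuning is not available, an application can request buffers
# explicitly. The OS may clamp the request to a system-wide maximum, so
# read back what was actually granted.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, needed)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, needed)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
print(granted)
```

On Linux, setting `SO_RCVBUF` explicitly turns off receive-buffer auto-tuning for that socket, so the explicit request is better kept for systems without auto-tuning; with auto-tuning, the system-wide maximum is what has to be large enough.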
• Conclusions
• Wide problem domain
– SMEs crucial to success
– Areas of networking, applications, protocols, systems
• Only service of this kind in the world!
– Each new case enhances the service
– Here to help
•Acknowledgements
The authors would like to acknowledge the
pioneering work done in the PERT by Simon
Leinen (SWITCH), Victor Reijs (HEAnet) and
Sven Ubik (CESNET), and Larry Dunn (Cisco) for
his analysis of the FermiLab-Renater case.