GridNM Network Monitoring Architecture


GridNM
Network Monitoring Architecture
(and a bit about my phd)
Yee-Ting Li
1st Year Report @ UCL, 17th June 2002
What the GRID is
• Distributed system
• Interconnected with networks
• Balancing processors, storage and network utilisation
• Like the SETI project on steroids
• Networking is important to make the GRID work

GridNM - Yee-Ting Li
17th June 2002
Networking Important!
• The only way two grid nodes can communicate with each other
• Need ways of determining how ‘efficiently’ they talk
• Focus on:
  • Characterising how they talk
  • The language they use to talk

Part 1
• Network Metrics and Measurement
• GridNM
• Case studies

Network Metrics / Characteristics
• Metric: ‘several quantities related to the performance and reliability of the Internet that we'd like to know the value of. When such a quantity is carefully specified, we term the quantity a metric.’
• Can be empirical or derived
• Singleton, sample and statistical metrics

Example Metrics
• Connectivity
• One-way delay
• Two-way delay
• Throughput / goodput
• Network path
• Loss
• Jitter

Metrics Example
• Video Conferencing
  • Needs predictable bit rate
  • Doesn’t usually matter if bit rate changes too much
  • Needs constant jitter
  • Low one-way delay preferable
• FTP
  • Needs reliable transport
  • Throughput depends on urgency of data
  • Jitter and delay don’t matter

Measurement Methodology
• How to get the metrics
• Must be repeatable: need to define methodology carefully
• Direct measurement of a performance metric using injected test traffic
• Projection of a metric from lower-level measurements
• Estimation of a constituent metric from a set of aggregated measurements
• Estimation of a given metric at one time from a set of related metrics at other times

Measurement Example
• ‘ping’ measures RTT: a direct measurement
• Sending a single ‘ping’ gives a singleton (empirical)
• Sending 10 pings (a sample) and taking the average gives a statistical metric (derived)
• Using a set of measurements over time, we can derive an estimate of the RTT
• Projection would be if we had the one-way delay (OWD) from each router to the next: add them all up to get the path OWD

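
The singleton / sample / statistical distinction and the projection of a path delay can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and the RTT and per-hop values are made up, not measurements from GridNM.

```python
from statistics import mean


def statistical_rtt(sample_ms):
    """Derive a statistical metric (the mean) from a sample of singleton RTTs."""
    return mean(sample_ms)


def projected_path_owd(hop_owds_ms):
    """Project the path one-way delay by summing the per-hop one-way delays."""
    return sum(hop_owds_ms)


# Illustrative values only (assumed, not measured).
singleton_ms = 20.0                      # one ping: a singleton, empirical
sample_ms = [20.0, 22.0, 19.0, 21.0, 23.0,
             20.0, 22.0, 21.0, 19.0, 23.0]  # ten pings: a sample
hop_owds_ms = [1.5, 4.0, 6.5]            # per-hop one-way delays

print("singleton RTT (ms):", singleton_ms)
print("mean RTT over sample (ms):", statistical_rtt(sample_ms))
print("projected path OWD (ms):", projected_path_owd(hop_owds_ms))
```

The mean is just one possible statistic; a median or a percentile would be derived from the same sample in the same way.
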
Network Monitoring Uses
• Monitoring is measuring over long periods of time
• Gives an indication of network performance over time: a baseline
• Allows comparison of different tools for analysis
• Allows analysis of how different protocols behave in different conditions, in real life
• Allows ‘tuning’ of existing protocols to make the most of the network

GridNM
• Architecture for monitoring the network
• Backend: collects data for presentation
• Logs metrics in ASCII log files on a single host
• Allows mesh measurements: every node performs measurements to all other nodes
• Uses standard UNIX infrastructure: ssh
  • Should be easily adaptable to using Globus certificates once interactive processing is introduced in EDG

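
As a rough illustration of what a mesh measurement implies, the sketch below enumerates every ordered (source, destination) pair for a list of hosts. The host names are hypothetical and this is not GridNM's actual scheduling code, only the combinatorics behind it.

```python
from itertools import permutations


def mesh_pairs(hosts):
    """Return every ordered (source, destination) pair with source != destination."""
    return list(permutations(hosts, 2))


# Hypothetical host names, for illustration only.
hosts = ["site-a", "site-b", "site-c"]

for src, dst in mesh_pairs(hosts):
    print(f"{src} -> {dst}")
```

A full mesh over n hosts needs n × (n − 1) directed measurements, so 6 sites give 30 host pairs per tool.
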
GridNM (cont…)
• Uses existing (and future) tools to collect metrics
• Modular: uses XML to describe available resources
  • Hosts
  • Tools
• Locks hosts if under measurement: prevents other tests affecting metrics
• Currently monitoring 6 sites around Europe using 5 tools

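
A minimal sketch of the two ideas on this slide, an XML description of hosts and tools and a lock that keeps tests from overlapping on a host. The XML layout, element names and the `HostLocks` class are all assumptions made for illustration; the slides do not show GridNM's real schema or locking mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource description; GridNM's real XML schema is not shown.
RESOURCES_XML = """
<resources>
  <hosts>
    <host name="site-a.example.org"/>
    <host name="site-b.example.org"/>
    <host name="site-c.example.org"/>
  </hosts>
  <tools>
    <tool name="ping"/>
    <tool name="iperf"/>
  </tools>
</resources>
"""


def load_resources(xml_text):
    """Parse the (assumed) XML layout into lists of host and tool names."""
    root = ET.fromstring(xml_text)
    hosts = [h.get("name") for h in root.findall("./hosts/host")]
    tools = [t.get("name") for t in root.findall("./tools/tool")]
    return hosts, tools


class HostLocks:
    """Track hosts that are under measurement so tests do not overlap."""

    def __init__(self):
        self._busy = set()

    def acquire(self, *hosts):
        """Lock all given hosts, or none of them if any is already busy."""
        if any(h in self._busy for h in hosts):
            return False
        self._busy.update(hosts)
        return True

    def release(self, *hosts):
        """Unlock the given hosts once the test has finished."""
        self._busy.difference_update(hosts)


hosts, tools = load_resources(RESOURCES_XML)
locks = HostLocks()
if locks.acquire(hosts[0], hosts[1]):
    print(f"running {tools[0]} from {hosts[0]} to {hosts[1]}")
    locks.release(hosts[0], hosts[1])
```

Locking both endpoints before a test is one way to read "locks hosts if under measurement"; a real deployment would also need the lock to be visible to every process that can start a test.
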
GridNM ‘plot’

Security
• As secure as SSH
  • But requires automatic logon
• Denial of Service attacks
• Certain tools (e.g. iperf) require servers to be run
  • GridNM runs the server (unless told not to) on the remote host before each test

Tool Examples
Name        Protocol  Metrics             Notes
Iperf       TCP/UDP   Goodput             Ideal application-level performance
UDPMon      UDP       Loss, goodput       Indication of network performance
Ping        ICMP/IP   RTT, jitter         Response of network
Traceroute  ICMP/IP   Path, RTT
Pipechar    UDP?      Router utilisation  Approximate
BBCP        TCP       Goodput             SCP copy
GridFTP     TCP…      Goodput             Application

UDP versus TCP

Rtt – good network

Rtt – periodicity

Rtt – bad network

Rtt – bad network, loss

TCP / Iperf Throughput

TCP Performance

TCP Performance

What does TCP do?
(Water tank analogy; diagram labels: Socket buffer size, TCP Protocol, Network)
• Tap is independent of tank size
• Tank filled by application
• Valve opening (data rate) determined by feedback from network
• Small tanks mean small data rate
• Large tanks mean larger data rate
• Even larger tanks mean smaller data rate?!?!

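
The "small tank, small data rate" behaviour is consistent with the standard window limit on TCP: at most one buffer's worth of data can be in flight per round trip, so throughput is bounded by buffer size divided by RTT. The sketch below works through that arithmetic. The function names and the example numbers are hypothetical, and it does not attempt to model the "even larger tank" anomaly.

```python
def window_limited_throughput_mbps(buffer_bytes, rtt_ms):
    """Upper bound on throughput (Mbit/s) with one buffer in flight per RTT."""
    bits_per_rtt = buffer_bytes * 8
    return bits_per_rtt / (rtt_ms / 1000.0) / 1e6


def bandwidth_delay_product_bytes(link_mbps, rtt_ms):
    """Buffer size (bytes) needed to keep a link of the given rate full."""
    return link_mbps * 1e6 * (rtt_ms / 1000.0) / 8


# Illustrative numbers only (assumed, not taken from the measurements).
print(window_limited_throughput_mbps(64 * 1024, 20))   # 64 KiB buffer, 20 ms RTT
print(bandwidth_delay_product_bytes(1000, 20))         # 1 Gbit/s link, 20 ms RTT
```

With a 64 KiB buffer and a 20 ms RTT the bound is about 26 Mbit/s, while filling a 1 Gbit/s path at the same RTT needs roughly 2.5 MB of buffer.
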
Investigation
• Possible explanation:
  • Rate of tank filling < rate of water flow out
  • i.e. application not fast enough to fill socket buffer past threshold
• BUT: needs further investigation
  • Back-to-back lab tests with PCs and routers
  • Comparison to other TCP-based tools

Part 2
• Network Communication Languages
• Known as transport protocols: they determine how applications put traffic into the network
• Sit on top of IP, the common language of the internet

Transport Level Protocols
• TCP (HTTP, FTP, GridFTP): used for file transfer
  • Gives guarantee on delivery
  • All data is copied precisely
  • Performance can be poor
  • Respects other internet users
• UDP (Real, H323): used for video conferencing
  • Gives no guarantees on delivery
  • Data may be incomplete
  • Performance good
  • Doesn’t respect other internet users

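
At the socket level, the difference in delivery guarantees shows up as stream sockets versus datagram sockets. The following is a small loopback sketch using Python's standard `socket` module. The function names are hypothetical, the payloads are kept small so that a single thread suffices, and on loopback a UDP datagram will normally arrive even though nothing guarantees it.

```python
import socket


def tcp_send_once(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection; return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)   # signal end of stream
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:                 # empty read: sender finished
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


def udp_send_once(payload: bytes, timeout: float = 1.0):
    """Send one UDP datagram over loopback; return it, or None if it was lost."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(timeout)
        sender.sendto(payload, receiver.getsockname())
        try:
            data, _ = receiver.recvfrom(65535)
            return data
        except socket.timeout:
            return None    # no retransmission: UDP gives no delivery guarantee


print(tcp_send_once(b"grid data"))
print(udp_send_once(b"grid data"))
```

The TCP path needs no loss handling in the application because the protocol retransmits; the UDP path has to decide for itself what a missing datagram means.
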
UDP versus TCP performance at high speeds

Measuring Performance of Transport Level Protocols
• Need to identify what we want to measure: the metrics
• Dependent on the use of the transport protocol; need to analyse application-level usage
• For Grid:
  • Movement of ‘transient’ data
    • File transfer and replication
    • Process jobs or ‘sandboxes’
  • Movement of real-time data
    • Video conferencing: Access Grid
    • Real-time applications

Transport Protocols ‘NG’
Name            Transport     Notes
UDP Blast       UDP
Tsunami         UDP/TCP       Uses TCP as ‘control’ channel
High Speed TCP  TCP           For 10Gb/sec links
PGM / CC        Modified UDP  Multicast UDP: new transport protocol
IBP             Application   ‘Logistical networking’

Tools to Measure Grid Traffic
• E.g. TCP
  • Can use web100: allows analysis of TCP traffic via fundamental variables important to TCP/IP
  • GridFTP allows logging of transfer information
• UDP (UDP Blast, Tsunami)
  • Need either transport-level recording (like web100) or application monitoring
• PGM / CC
  • Need application to be built to use transport protocol
• General solution
  • Gather SNMP data from nodes along the network

Future Directions
(the phd bit)
• Provisional title in field of:
  • Providing Advanced Transport Protocols for Grid Applications
• Aim: use GridNM infrastructure to analyse performance of different transport protocols
• Implement findings into Grid infrastructure, e.g. GridFTP, to improve grid processes (processing jobs, file transfer, file replication, Access Grid…)

Conclusion
• Created a flexible infrastructure to monitor and analyse internet traffic
• Shown metrics for different scenarios
• Given performance overview of current transport protocols
• Identified future areas of research into transport protocols for the grid
