RECONSTRUCTION OF APPLICATION LAYER MESSAGE SEQUENCES BY NETWORK MONITORING Jaspal Subhlok University of Houston Houston, TX Amitoj Singh Fermi National Accelerator Laboratory Batavia, IL.


RECONSTRUCTION OF
APPLICATION LAYER MESSAGE SEQUENCES
BY NETWORK MONITORING
Jaspal Subhlok
University of Houston
Houston, TX
Amitoj Singh
Fermi National
Accelerator Laboratory
Batavia, IL
1
Introduction
• Reconstruct Application layer message sequences
by analyzing Transport layer traffic.
[Figure: TCP segments exchanged among nodes N1, N2 and N3, with the application messages sent and received]
2
Purpose (why bother?)
• Application message exchange pattern is a
fundamental program property
– e.g., determines application performance in
different conditions
• Network traffic due to an application can be
monitored non-intrusively, but discovering the
application message sequence is hard
– it normally needs access to source code or a profiling library
• Hence this method to reconstruct application messages
from TCP monitoring
3
Particular Motivation
[Figure: an application made of the components Data, Pre, Model, Sim 1, Sim 2, Stream and Vis, with a question mark over its placement on the network]
• The size and pattern of message exchanges are a key
component of an application profile, which is used to select
good network nodes to execute on
4
Key Principle
• An application message is typically fragmented
into a consecutive sequence of TCP segments
in which every segment except the last is of size MSS
(Maximum Segment Size).
[Figure: application messages at the application layer and the TCP segments they map to at the TCP layer (1 unit = MSS); the last TCP segment of each message is marked]
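The key principle can be illustrated with a minimal sketch. The function name `fragment` is hypothetical, and the sketch assumes the sender always fills segments up to the MSS, which real TCP stacks do not guarantee.

```python
MSS = 1448  # bytes; the value used in the examples on these slides

def fragment(message_size, mss=MSS):
    """Return the TCP segment sizes for one application message.

    Idealized model: every segment is full (mss bytes) except possibly the last.
    """
    full, rest = divmod(message_size, mss)
    sizes = [mss] * full
    if rest:
        sizes.append(rest)  # the short last segment marks the end of the message
    return sizes

segments = fragment(12574)  # eight full segments followed by one short segment
```

A message whose size is an exact multiple of the MSS produces no short last segment in this model, which is one reason the principle is only "typically" true.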
5
Message Reconstruction Procedure Phases
1. Separate TCP streams.
2. Sanitize a TCP stream.
3. Reconstruct application layer messages.
4. Error minimization by “best-of-three” technique.
6
Separating TCP streams
• A communication link transports multiple TCP streams
• A TCP stream spans a unique series of sequence numbers
MSS = 1448 bytes
[Figure: TCP segments (first byte : last byte) captured on one link, in order:
1:1448, 1449:2896, 2897:4344, 431376:432823, 4345:5792, 5793:7240, 432824:433610, 7241:8688, 8689:10136
After separation:
one stream: 431376:432823, 432824:433610
other stream: 1:1448, 1449:2896, 2897:4344, 4345:5792, 5793:7240, 7241:8688, 8689:10136]
Separate the red and black streams of TCP segments (not foolproof but adequate)
7
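One possible way to separate streams by sequence-number range is sketched below. The function name `separate_streams` and the `max_gap` threshold are assumptions made for illustration; the slides do not say how close two sequence numbers must be to count as the same stream.

```python
def separate_streams(segments, max_gap=65536):
    """Group (first_byte, last_byte) segments into streams by sequence range.

    A segment joins the first stream whose next expected byte is within
    max_gap of the segment's first byte; otherwise it starts a new stream.
    max_gap is a hypothetical tuning parameter, not a value from the slides.
    """
    streams = []  # each stream is a list of segments in capture order
    for first, last in segments:
        for stream in streams:
            expected = stream[-1][1] + 1
            if abs(first - expected) <= max_gap:
                stream.append((first, last))
                break
        else:
            streams.append([(first, last)])
    return streams

captured = [(1, 1448), (1449, 2896), (2897, 4344), (431376, 432823),
            (4345, 5792), (5793, 7240), (432824, 433610),
            (7241, 8688), (8689, 10136)]
streams = separate_streams(captured)
```

As the slide notes, this is not foolproof: two streams whose sequence numbers happen to lie close together would be merged.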
Sanitizing TCP streams
• Insert TCP segments that were not recorded (assumed to be rare)
• Filter out retransmissions
[Figure: captured stream (first byte : last byte), 1448 bytes per segment:
1:1448, 1449:2896, 2897:4344, 4345:5792, 7241:8688, 8689:10136, 10137:11584, 10137:11584, 11585:13032
The missing TCP segment 5793:7240 is inserted between 4345:5792 and 7241:8688.
The duplicate TCP segment 10137:11584 is removed.]
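The two sanitizing steps could be sketched as follows. This is an illustration only: the function name `sanitize`, sorting the capture by sequence number, and filling gaps with MSS-sized pieces are all assumptions, not details given on the slides.

```python
MSS = 1448  # bytes, as in the slide examples

def sanitize(segments, mss=MSS):
    """Drop retransmissions and insert segments missing from the capture.

    segments: (first_byte, last_byte) pairs belonging to one TCP stream.
    """
    clean = []
    next_byte = None  # first byte not yet covered by the cleaned stream
    for first, last in sorted(segments):
        if next_byte is not None:
            if last < next_byte:
                continue  # fully covered already: a retransmission
            first = max(first, next_byte)  # trim a partial overlap
            while first > next_byte:  # gap: insert unrecorded segment(s)
                fill_last = min(next_byte + mss - 1, first - 1)
                clean.append((next_byte, fill_last))
                next_byte = fill_last + 1
        clean.append((first, last))
        next_byte = last + 1
    return clean

captured = [(1, 1448), (1449, 2896), (2897, 4344), (4345, 5792),
            (7241, 8688), (8689, 10136), (10137, 11584),
            (10137, 11584), (11585, 13032)]
cleaned = sanitize(captured)
```

Applied to the stream in the figure, the sketch inserts 5793:7240 and drops the second copy of 10137:11584.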
8
Reconstruct application messages
• A TCP segment of size smaller than MSS (=1448)
indicates the end of an application message.
[Figure: TCP segments combined into application messages.
Message 1: 1:1448 + 1449:2896 + 2897:4344 + 4345:5792 + 5793:7240 + 7241:8688 + 8689:10136 + 10137:11584 (1448 bytes each) + 11585:12574 (short segment, end of message) = 12,574 bytes
Message 2: starts at 12575:14022 (1448 bytes) + one 800-byte segment (end of message) = 2,248 bytes]
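The reconstruction heuristic on this slide can be sketched in a few lines. The function name `reconstruct_messages` is hypothetical, and treating leftover bytes at the end of the capture as a final message is an assumption; the byte range used for the 800-byte segment is chosen only to match that size.

```python
MSS = 1448  # bytes, as in the slide examples

def reconstruct_messages(segments, mss=MSS):
    """Return application message sizes inferred from one sanitized stream.

    A segment smaller than mss is taken to end the current message.
    """
    messages = []
    current = 0  # bytes accumulated for the message being rebuilt
    for first, last in segments:
        size = last - first + 1
        current += size
        if size < mss:  # short segment: end of message
            messages.append(current)
            current = 0
    if current:
        messages.append(current)  # assumption: trailing full segments form a message
    return messages

stream = [(1 + i * MSS, (i + 1) * MSS) for i in range(8)]   # eight full segments
stream += [(11585, 12574), (12575, 14022), (14023, 14822)]  # short, full, 800-byte
sizes = reconstruct_messages(stream)
```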
9
Best-of-three
• The reconstruction heuristic is not perfect:
1. A TCP segment smaller than MSS may be sent
before the entire application message is finished.
2. Two short application messages may be packed
into the same TCP segment.
[Figure: cases 1 and 2, showing application messages against TCP segments]
10
Best-of-three
Basic idea: the reconstruction heuristic is unlikely to fail
in exactly the same way in multiple identical runs
Solution: make 3 runs and select the majority view
at every stage
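[Figure: Runs 1, 2 and 3 compared message by message. One run merges A and B into A+B, another merges C and D into C+D, and the remaining run yields A, B, C, D. The majority view gives the correct message sequence A, B, C, D.]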
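One way to take a majority view is to vote on message boundaries rather than on whole messages. The sketch below is an interpretation, not the authors' stated procedure: the function name `best_of_three` is hypothetical, and it assumes the runs are identical enough that the same byte offset in each run refers to the same point in the stream.

```python
from collections import Counter
from itertools import accumulate

def best_of_three(runs):
    """Combine reconstructed message sizes from several runs by majority vote.

    Each run is a list of message sizes. A message boundary (cumulative byte
    offset) is kept only if a majority of the runs agree on it.
    """
    votes = Counter()
    for sizes in runs:
        votes.update(set(accumulate(sizes)))  # boundaries seen in this run
    majority = len(runs) // 2 + 1
    boundaries = sorted(b for b, n in votes.items() if n >= majority)
    return [b - a for a, b in zip([0] + boundaries, boundaries)]

# Hypothetical sizes: A=100, B=200, C=300, D=400 bytes.
run1 = [300, 300, 400]       # A and B merged
run2 = [100, 200, 300, 400]  # all four correct
run3 = [100, 200, 700]       # C and D merged
combined = best_of_three([run1, run2, run3])
```

A merged pair shows up as a missing boundary and a premature split as an extra one, so either kind of error is outvoted as long as the other two runs agree.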
11
Experimental setup
• NAS parallel benchmark suite programs run on a
cluster of 4 workstations
• tcpdump utility used to capture TCP segments
• The reconstructed application layer message
sequence is compared with the true sequence
obtained with profiling
12
Message Reconstruction Results
[Figure: stacked bar chart, 0% to 100%, of EXACT MATCH, APPROX MATCH and NO MATCH messages for each NAS benchmark: BT, CG, IS, LU, MG, SP]
• APPROX MATCH: includes reconstructed messages that are
off by up to 100 bytes and/or combined with one other
application message.
• Perfect for large messages (IS), approximate for small ones (LU)
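The three categories could be assigned along the following lines. This is a rough sketch under stated assumptions: the function name `classify` is hypothetical, the caller is assumed to know which true message a reconstructed message lines up with (`index`), and "combined with one other application message" is read as merged with a neighbouring message.

```python
def classify(reconstructed, true_sizes, index, tolerance=100):
    """Classify one reconstructed message size against the profiled sequence.

    reconstructed: size in bytes of the reconstructed message
    true_sizes:    true message sizes obtained by profiling
    index:         position of the true message it is aligned with (assumed known)
    """
    target = true_sizes[index]
    if reconstructed == target:
        return "EXACT MATCH"
    candidates = [target]
    if index + 1 < len(true_sizes):
        candidates.append(target + true_sizes[index + 1])  # merged with next
    if index > 0:
        candidates.append(true_sizes[index - 1] + target)  # merged with previous
    if any(abs(reconstructed - c) <= tolerance for c in candidates):
        return "APPROX MATCH"
    return "NO MATCH"

truth = [12574, 2248]
labels = [classify(12574, truth, 0), classify(12500, truth, 0),
          classify(14822, truth, 0), classify(5000, truth, 0)]
```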
13
Conclusions
• Majority of messages reconstructed accurately,
almost all detected approximately
• Accuracy is low when there are many small messages
• Procedure based entirely on network
measurements, hence can be applied to any code
• Accuracy sufficient for resource selection in
Network/Grid environments.
14
Dominant communication pattern of the NAS benchmarks
15
Experimental Setup
• 100 Mbps Ethernet switch
• 500 MHz dual-processor Pentium Linux workstations
• tcpdump capturing outgoing TCP packets
16