Queuing Cache - Washington University in St. Louis
New Directions in Traffic Measurement and Accounting
Cristian Estan - UCSD
George Varghese - UCSD
Reviewed by
Michela Becchi
Discussion Leaders
Andrew Levine
Jeff Mitchell
Outline
Introduction
Cisco NetFlow
Sample and Hold & Multistage Filters
Analytical Evaluation
Comparison
Measurements
Conclusions
Michela Becchi - 7/8/2015
Introduction
Measurement and monitoring of network traffic for
Internet backbones
» Long-term traffic engineering (traffic rerouting and link upgrades)
» Short-term monitoring (detection of hot spots and DoS attacks)
» Accounting (usage-based pricing)
Scalability problem
» FixWest, MCI traces: ~1 million flows/hour between end-host pairs
Cisco NetFlow
Flow: unidirectional stream of data identified by
» Source IP address and port
» Destination IP address and port
» Protocol
» TOS byte
» Rx router interface
An entry in DRAM for each flow
Heuristics for end-of-flow detection
Flow data exported via UDP packets from routers
to collection server for processing
Cisco NetFlow - problems
Processing overhead
» Interfaces faster than OC-3 (155 Mbps) are slowed down by
per-packet flow cache updates
Collection overhead
» Collection server
» Network connection
NetFlow Aggregation (based on IP prefixes, ASes, ports)
» Extra "aggregation" cache
» Only aggregated data exported to the collection server
» Problem: still a large number of aggregates
Sampled NetFlow
Sampling packets
Per-flow information estimated from samples
Problems:
» Inaccurate (sampling error and packet losses)
» Memory intensive
» Slow (DRAM needed)
Idea
“A small percentage of flows accounts for a large
percentage of the traffic”
» Algorithms for identifying large flows
Use of SRAM instead of DRAM
Categorize algorithms by:
1. Memory size and memory references
2. False negatives
3. False positives
4. Expected error in traffic estimates
Algorithms
Sample and Hold
» Sample to determine flows to consider
» Update flow entry for every subsequent packet belonging
to the flow
Multistage Filters
» Use multiple tables of counters (stages) indexed by a
hash function computed on flow ID
» Different stages have independent hash functions
» For each packet and for each stage, compute the hash of the
flow ID and add the packet size to the corresponding counter
» A flow is added to flow memory only when its counters in all
stages reach the threshold
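As a concrete illustration, here is a minimal Python sketch of sample and hold (function and variable names are my own, not the paper's):

```python
import random

def sample_and_hold(packets, p):
    """packets: iterable of (flow_id, size_bytes) pairs.
    p: per-packet sampling probability.
    Once a flow is sampled into flow memory, every subsequent
    packet of that flow updates its entry."""
    flow_memory = {}
    for flow_id, size in packets:
        if flow_id in flow_memory:
            flow_memory[flow_id] += size   # hold: count every later packet
        elif random.random() < p:
            flow_memory[flow_id] = size    # sample: create a new entry
    return flow_memory
```

With p = O/T, a flow of T bytes escapes sampling with probability (1-p)^T, which is the false-negative behavior analyzed in the evaluation slides.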
Sample and Hold
[Figure: transmitted packets F1, F1, F2, F3, F2, F4, F1, F3, F1 arrive;
each is sampled with probability 1/3. A sampled packet creates an entry
in flow memory, and every subsequent packet of the same flow updates
that entry (here F1 → 312 and F3 → 21).]
Multistage Filters
[Figure sequence: each packet's flow ID is hashed into an array of
counters, and the packet size is added to the counter it hits.
Collisions are OK, since several flows may share a counter. When a
flow's counter reaches the threshold (e.g., stream1), the flow is
inserted into flow memory; a small flow colliding with it (e.g.,
stream2) can then be inserted too, a false positive. With multiple
independent stages (Stage 1, Stage 2), a flow is inserted only if it
reaches the threshold in every stage, so false positives become much
rarer.]
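The behavior animated above can be sketched in Python (a simplified parallel filter with names of my own; a deterministic CRC, salted with the stage number, stands in for the independent per-stage hash functions):

```python
import zlib

def multistage_filter(packets, num_stages=2, counters_per_stage=64,
                      threshold=100):
    """Parallel multistage filter sketch.
    Each stage hashes the flow ID into its own counter array and adds
    the packet size; a flow enters flow memory only when ALL of its
    counters reach the threshold. Collisions can only cause false
    positives, never false negatives."""
    stages = [[0] * counters_per_stage for _ in range(num_stages)]
    flow_memory = {}
    for flow_id, size in packets:
        if flow_id in flow_memory:        # already tracked: just count
            flow_memory[flow_id] += size
            continue
        # one independent hash per stage, salted with the stage number
        idx = [zlib.crc32(f"{s}:{flow_id}".encode()) % counters_per_stage
               for s in range(num_stages)]
        for s, i in enumerate(idx):
            stages[s][i] += size
        if all(stages[s][i] >= threshold for s, i in enumerate(idx)):
            flow_memory[flow_id] = size   # start counting from this packet
    return flow_memory
```

A large flow passes every stage once it has sent roughly `threshold` bytes, while a small flow is stopped by any stage whose counter stays low.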
Parallel vs. Serial Multistage Filters
Threshold for serial filters: T/d (d = number of
stages)
Parallel filters perform better on traces of actual
traffic
Optimizations
Preserving entries
» Nearly exact measurement of long-lived large flows
» Larger flow memory required
Early removal
» A threshold R < T determines which entries added in the
current interval are kept
Shielding
» Avoid updating filter counters for flows already in flow memory
» Reduces false positives
Conservative update of the counters
» Normally update only the smallest counter
» Introduces no false negatives
» Reduces false positives
Conservative update of counters
[Figure sequence: gray marks bytes already counted. When a packet
arrives, only the smallest of the flow's counters needs to grow by the
packet size; raising the larger counters beyond that bound would be
redundant, so they are left alone.]
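A sketch of the rule illustrated above, assuming (as in the figure) that no counter ever needs to exceed the smallest counter plus the packet size:

```python
def conservative_update(counters, size):
    """counters: the flow's counter values, one per stage.
    Only the smallest counter must grow by the full packet size; any
    counter already above min + size is left alone, since raising it
    further would be redundant. Returns the updated values."""
    target = min(counters) + size
    return [max(c, target) for c in counters]
```

For example, counters [10, 40, 25] after a 5-byte packet become [15, 40, 25]: only the smallest counter grows, yet no flow's size is ever under-counted.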
Analytical Evaluation
Sample and Hold
» Prob(false negative) for a flow of size T: (1-p)^T ≈ e^(-O)
» Best estimate of a flow's size, given counted bytes c: c + 1/p
» Upper bound on flow memory size: O*C/T
– Preserving entries: 2*O*C/T
– Early removal: O*C/T + C/R
Parallel Multistage Filters
» No false negatives
» Prob(false positive): shrinks as (1/k)^d
» Upper bound on the flow size estimation error: a function of T and 1/k
» Bound on memory requirement
Where
T: threshold, p: sampling probability (= O/T), c: bytes counted for the flow,
C: link capacity, O: oversampling factor, d: filter depth,
k: stage strength (= T*b/C, with b counters per stage)
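A quick numeric check of the sample-and-hold false-negative bound above, with illustrative values of T and O:

```python
import math

T = 100_000        # threshold in bytes (illustrative value)
O = 4              # oversampling factor
p = O / T          # per-packet sampling probability, p = O/T

exact = (1 - p) ** T     # prob. a flow of exactly T bytes is never sampled
approx = math.exp(-O)    # the e^(-O) approximation

# for O = 4 both come out near 1.8%, so modest oversampling
# already makes false negatives rare
```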
Comparison w/ Memory Constraint
Assumptions:
» Memory Constraint M
» The considered flow produces traffic zC (e.g. z=0.01)
Observations and Conclusions:
» M*z ~ oversampling factor
» S&H and MF: better accuracy, but more memory accesses
» S&H and MF can use SRAM, Sampled NetFlow DRAM, as
long as x is larger than the ratio of a DRAM memory access
to an SRAM memory access
Comparison w/o Mem Constraint
Observations and Conclusions:
» By preserving entries, S&H and MF provide exact
estimates for long-lived large flows
» S&H and MF gain accuracy at the cost of a weaker memory
bound (u = zC/T)
» Memory accesses as in the memory-constrained case
» S&H provides better accuracy for small measurement
intervals => faster detection of new large flows
» Larger memory => greater resource consumption
Dynamic threshold adaptation
How to dimension the algorithms
» Conservative bounds vs. accuracy
» No a priori knowledge of the flow distribution
Dynamic adaptation
» Keep decreasing the threshold below the conservative
estimate until the flow memory is nearly full
» "Target usage" of memory
» "Adjustment ratio" of the threshold
» For stability, adjustments are spread across 3 intervals
NetFlow: fixed sampling rate
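One adaptation step could look like the following sketch (the constants and the exact adjustment rule are my own illustration, not the paper's):

```python
def adapt_threshold(threshold, entries_used, memory_size,
                    target_usage=0.9, max_step=3.0):
    """Shrink the threshold while flow memory sits below its target
    usage (so more flows are tracked) and grow it when usage
    overshoots; the adjustment ratio is bounded for stability."""
    ratio = (entries_used / memory_size) / target_usage
    ratio = min(max(ratio, 1.0 / max_step), max_step)  # bound the step
    return threshold * ratio
```

With memory only 45% full against a 90% target, the threshold is halved; at 99% usage it is raised slightly, nudging the flow memory toward its target.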
Measurement setup
3 unidirectional traces of Internet traffic
3 flow definitions
Traffic in the traces is between 13% and 17% of link capacity
Measurements
S&H (threshold 0.025%, oversampling 4)
MF (strength = 3)
» Differences between analytical bounds and actual
behavior (lightly loaded links)
» Effect of preserving entries and early removal
Measurements
Flow IDs: 5-tuple, destination IP, or ASes
» MF always better than S&H
» Sampled NetFlow better for medium flows, worse for
very large ones
» With AS flow IDs: reduced number of flows (~entries in flow
memory)
Conclusions
Focus on identifying the large flows that create the
majority of network traffic
Two techniques proposed
» Providing higher accuracy than Sampled NetFlow
» Using limited memory resources (SRAM)
Mechanism to make the algorithms adaptive
Analytical evaluation providing theoretical bounds
Experimental measurements showing the validity
of the proposed algorithms
Future work
Generalize the algorithms to automatically extract flow
definitions for large flows
Deepen the analysis, especially to cover the discrepancy
between theory and experimental measurements
Explore commonalities with other research areas
(e.g., data mining, architecture, compilers) where issues
of data volume and high speed also arise
The End
Questions?
Zipf distribution
Characteristics:
» A few elements "score" very high
» A medium number of elements have a "medium score"
» A huge number of elements "score" very low
Examples
» Word frequencies in a natural language
» Web use (e.g., accesses to the www.sun.com website)
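A small sketch of this heavy-tailed behavior, using illustrative data with frequency proportional to 1/rank:

```python
def top_share(frequencies, top_fraction=0.1):
    """Fraction of the total mass captured by the top `top_fraction`
    of elements, once they are ranked by frequency."""
    ranked = sorted(frequencies, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Zipf-like frequencies for 100 elements: f(rank) ~ 1/rank
freqs = [1000 // rank for rank in range(1, 101)]
share = top_share(freqs)   # the top 10% of elements carry most of the mass
```

This is the property the paper exploits: tracking only the few highest-volume flows still accounts for most of the traffic.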