Transcript simone.ppt
INSR - Institute for Networking and Security Research, CSE Department, Penn State University
Inference, monitoring and recovery of large scale networks
Faculty: Thomas La Porta
Post-Doc: Simone Silvestri
Ph.D. Students: Srikar Tati, Brett Holbert, Michael Lin
INSR Industry Day 2014
Problems and challenges in large scale networks
Research problems
  Inferencing
  Monitoring
  Recovery
Challenges
  Large scale
  Partial information
  Interdependent networks
  Constraints (time, cost, ...)
This research is sponsored by:
  Defense Threat Reduction Agency (DTRA)
  Army Research Lab and UK Ministry of Defence - ITA Program
[Figure: Internet router-level topology (Merlin Tool)]
Inferencing: motivation
The lack of global knowledge of the Internet topology
  Hinders network diagnostics (losses, failures, bottlenecks)
  Inflates IP path lengths
  Reduces accuracy of models
  Encourages overlay networks to ignore the underlay
Network operators rarely publish their topologies
Current inference approaches rely on tools such as Traceroute
Traceroute provides only partial information
  The network is only partially observable
  Previous approaches fail or perform poorly
Our problem: infer the routing topology in the presence of partial information
Inferencing: our approach - iTop
iTop algorithm:
  Fills unobservable parts of the network with virtual links/routers
  Analyzes the traces to determine properties of the real topology
  Iteratively merges links to infer the real network
[Figure: iTop pipeline: ground truth topology, trace analysis, virtual topology, merging algorithm, inferred topology]
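The first iTop step, filling unobservable parts of the network with virtual routers and links, can be illustrated with a minimal Python sketch. It assumes each trace is a list of router names with None standing for a hop that did not answer. The function name build_virtual_topology and the labels v0, v1, ... are hypothetical. The trace analysis and merging rules are not described in these slides, so they are not attempted here.

```python
def build_virtual_topology(traces):
    """Turn traces into an undirected edge set, one fresh virtual router per silent hop.

    Assumption: a silent (anonymous) hop is given as None. Every silent hop
    gets its own virtual router; deciding which of them are really the same
    router is the job of a later merging phase, not of this sketch.
    """
    edges = set()
    virtual_count = 0
    for trace in traces:
        hops = []
        for hop in trace:
            if hop is None:
                hop = f"v{virtual_count}"  # hypothetical label for a virtual router
                virtual_count += 1
            hops.append(hop)
        # Consecutive hops are connected by a (possibly virtual) link.
        for a, b in zip(hops, hops[1:]):
            edges.add(tuple(sorted((a, b))))
    return edges, virtual_count


# Two traces that cross an unresponsive router after B.
traces = [["A", "B", None, "D"], ["A", "B", None, "E"]]
edges, n_virtual = build_virtual_topology(traces)
```

In this example the two silent hops become two distinct virtual routers, so the virtual topology over-estimates the real one; a merging phase would then decide whether they can be collapsed.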
Inferencing: our approach - Results
We compare our approach to state-of-the-art inferencing approaches:
  X. Jin, W.-P. Yiu, S. H. Chan, and Y. Wang, “Network topology inference based on end-to-end measurements,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 12, pp. 2182–2195, 2006.
  B. Yao, R. Viswanathan, F. Chang, and D. Waddington, “Topology inference in the presence of anonymous routers,” IEEE Infocom, 2003.
We consider realistic networks
We also show how iTop improves the performance of failure diagnosis algorithms in the presence of partial information
Monitoring: motivation (1)
Accurate knowledge of the internal network state enables
  Performance diagnosis
  Resource allocation
  Efficient routing
  Congestion control
Monitoring large scale networks may incur high overhead
Network tomography
  Infer internal network state from end-to-end measurements
  Solve a linear system [equation not preserved in the transcript]
  Enables efficient monitoring by probing only a basis of the system
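The linear system itself is not shown in this transcript. A minimal Python sketch, assuming the usual additive model in which each path measurement is the sum of the metrics of the links it crosses, shows why probing a basis is enough: paths whose rows are linear combinations of already selected rows add no new equations. The function names select_basis and solve, the 0/1 path matrix and the example delays are all hypothetical.

```python
from fractions import Fraction


def select_basis(paths):
    """Return indices of paths whose 0/1 link rows are linearly independent."""
    basis = []   # (pivot column, reduced row with a 1 in that column)
    chosen = []
    for idx, row in enumerate(paths):
        r = [Fraction(v) for v in row]
        for pivot, b in basis:
            if r[pivot] != 0:
                f = r[pivot]
                r = [a - f * c for a, c in zip(r, b)]
        pivot = next((j for j, v in enumerate(r) if v != 0), None)
        if pivot is not None:          # the row adds a new independent equation
            basis.append((pivot, [v / r[pivot] for v in r]))
            chosen.append(idx)
    return chosen


def solve(A, y):
    """Solve the square, full-rank system A x = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, y)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[-1] for row in M]


# Four candidate paths over three links; the last one is redundant.
paths = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
probe = select_basis(paths)                      # indices of the paths to probe
true_delay = [1, 2, 3]                           # hidden per-link delays
y = [sum(a * d for a, d in zip(paths[i], true_delay)) for i in probe]
link_delay = solve([paths[i] for i in probe], y)  # recovers the per-link delays
```

Exact fractions are used instead of floating point so that the independence test is not affected by rounding; a real deployment would more likely use a numerical linear algebra library.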
Monitoring: motivation (2)
Failures are common events in modern networks
Failures can significantly affect the performance of network tomography
Probing incurs a cost; often only a maximum budget is available
Our problem: select a set of probing paths to maximize the performance of network tomography under failures with a limited budget
Monitoring: our approach
We translate the problem into the maximization of a submodular function under a budget constraint
We propose the algorithm RoMe
  Makes use of recent advances in submodular maximization theory
  Has an approximation factor of (1-1/e)/2
  It is optimal with the additional constraint of linear independence
  Assumes knowledge of the failure distribution
We consider the case of unknown failure distribution
We propose the algorithm LSR (Learning with Submodular Rewards)
  Reinforcement learning approach
  Learns path availabilities
  Performance guarantees
[Figure: LSR loop: init, select paths, collect measurements, update path availabilities]
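The slides do not spell out RoMe, so the following Python sketch is only a generic budgeted greedy of the kind commonly used for submodular maximization, not the authors' algorithm. It assumes the objective is the expected rank of the surviving probing paths, that the failure distribution is given as a short list of (probability, failed links) scenarios, and that each path has a known probing cost. All names (rank, expected_rank, greedy, select_paths) and the example data are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of 0/1 path rows, by exact Gaussian elimination."""
    basis = []
    for row in rows:
        r = [Fraction(v) for v in row]
        for pivot, b in basis:
            if r[pivot] != 0:
                f = r[pivot]
                r = [a - f * c for a, c in zip(r, b)]
        pivot = next((j for j, v in enumerate(r) if v != 0), None)
        if pivot is not None:
            basis.append((pivot, [v / r[pivot] for v in r]))
    return len(basis)


def expected_rank(selected, paths, scenarios):
    """Expected rank of the selected paths that avoid all failed links."""
    total = Fraction(0)
    for prob, failed_links in scenarios:
        alive = [paths[i] for i in selected
                 if not any(paths[i][l] for l in failed_links)]
        total += prob * rank(alive)
    return total


def greedy(paths, costs, scenarios, budget, use_ratio):
    """Add the affordable path with the best gain (or gain per unit cost)."""
    selected, spent = [], 0
    while True:
        base = expected_rank(selected, paths, scenarios)
        best, best_score = None, 0
        for i in range(len(paths)):
            if i in selected or spent + costs[i] > budget:
                continue
            gain = expected_rank(selected + [i], paths, scenarios) - base
            score = gain / costs[i] if use_ratio else gain
            if score > best_score:
                best, best_score = i, score
        if best is None:            # nothing affordable improves the objective
            return selected
        selected.append(best)
        spent += costs[best]


def select_paths(paths, costs, scenarios, budget):
    """Run both greedy variants and keep the better selection."""
    candidates = [greedy(paths, costs, scenarios, budget, r) for r in (False, True)]
    return max(candidates, key=lambda s: expected_rank(s, paths, scenarios))


paths = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]    # rows: paths, columns: links
costs = [2, 2, 2, 3]
scenarios = [(Fraction(1, 2), []), (Fraction(1, 2), [0])]  # link 0 fails half the time
chosen = select_paths(paths, costs, scenarios, budget=4)
```

With these inputs the path that avoids the unreliable link 0 is picked first, because it contributes to the rank in both scenarios. Enumerating scenarios only works for toy cases; a realistic failure model would need sampling or a closed-form expression for the objective.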
Monitoring: results
We compare our approach to state-of-the-art path selection algorithms:
  Y. Chen, D. Bindel, H. Song, and R. H. Katz, “An algebraic approach to practical and scalable overlay network monitoring,” ACM SIGCOMM Comp. Com. Rev., 2004.
We consider realistic topologies and failure models
Recovery: motivation
Modern networks are highly interdependent
  The Internet and the smart grid
  Water supply, transportation, fuel and power stations are coupled together
Interdependent networks are extremely sensitive to failures
  Electrical blackout that occurred in Italy in September 2003
Failures may create performance degradation
  Degradation can also propagate in the surviving network
Recovery: research problems (1)
Recovery algorithms for overlay networks
Two networks sharing the same infrastructure
Failures occur in the underlay network and affect the overlay
Models an emergency urban communication network after a weapon of mass destruction attack
We aim at restoring the functionality of the overlay network by repairing the underlay
Objectives & constraints
  Bandwidth
  Time
  Cost
  Utility
Recovery: research problems (2)
Models for temporal propagation of failures
Two general interdependent networks
Failures propagate over time
  Backup batteries/generators
  Local solar plant supply
Given the initial failure, our model will:
  Estimate the probability that an element fails at a given time
  Estimate the expected time at which an element fails
  Estimate the expected number of failed elements at a given time
This information will be used to design recovery strategies
These models will be mapped to and validated against real interdependent networks
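The propagation model is not specified in these slides. As a rough illustration of the three estimates listed above, the Python sketch below uses Monte Carlo simulation under strong assumptions of its own: an element loses supply once all of its suppliers have failed, it then survives for an exponentially distributed backup time with a given positive mean, and elements without suppliers never fail unless they are part of the initial failure. The names sample_failure_times and estimate, and the three-element example, are hypothetical.

```python
import random

INF = float("inf")


def sample_failure_times(depends_on, initial_failures, mean_backup, rng):
    """One random run: failure time of every element (INF if it never fails)."""
    fail_time = {n: (0.0 if n in initial_failures else INF) for n in depends_on}
    # Assumed backup model: exponential duration with the given positive mean.
    backup = {n: rng.expovariate(1.0 / m) for n, m in mean_backup.items()}
    changed = True
    while changed:                      # relax until no failure time moves earlier
        changed = False
        for n, suppliers in depends_on.items():
            if n in initial_failures or not suppliers:
                continue
            lost_supply = max(fail_time[s] for s in suppliers)
            t = lost_supply + backup[n]
            if t < fail_time[n]:
                fail_time[n] = t
                changed = True
    return fail_time


def estimate(depends_on, initial_failures, mean_backup, t, runs=2000, seed=0):
    """Estimate failure probabilities by time t, mean failure times, and
    the expected number of failed elements at time t."""
    rng = random.Random(seed)
    failed = {n: 0 for n in depends_on}
    time_sum = {n: 0.0 for n in depends_on}
    finite = {n: 0 for n in depends_on}
    for _ in range(runs):
        times = sample_failure_times(depends_on, initial_failures, mean_backup, rng)
        for n, ft in times.items():
            if ft <= t:
                failed[n] += 1
            if ft != INF:
                time_sum[n] += ft
                finite[n] += 1
    prob = {n: failed[n] / runs for n in depends_on}
    # Mean failure time over the runs in which the element fails at all.
    mean_time = {n: (time_sum[n] / finite[n] if finite[n] else None)
                 for n in depends_on}
    expected_failed = sum(failed.values()) / runs
    return prob, mean_time, expected_failed


# Power station P fails at time 0; router R is fed by P, controller C by R.
depends_on = {"P": [], "R": ["P"], "C": ["R"]}
mean_backup = {"R": 4.0, "C": 2.0}      # mean backup duration in hours
prob, mean_time, expected_failed = estimate(depends_on, {"P"}, mean_backup, t=3.0)
```

In this chain the controller can only fail after the router, so its estimated failure probability at any time never exceeds the router's. Replacing the exponential backup time or the "all suppliers failed" rule changes the numbers but not the structure of the estimator.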
Recovery: research problems (3)
Improve network robustness:
  Re-design existing networks
  Design new networks less prone to cascading effects
Models and recovery strategies for performance degradation over time
Partial knowledge
Partial control
Multiple interdependent networks
Thank you! Any questions?