INSR: Institute for Networking and Security Research, CSE Department, Penn State University

Inference, monitoring and recovery of large scale networks

Faculty: Thomas La Porta
Post-Doc: Simone Silvestri
Ph.D. Students: Srikar Tati, Brett Holbert, Michael Lin

INSR Industry Day 2014

Problems and challenges in large scale networks

Research problems:
- Inferencing
- Monitoring
- Recovery

Challenges:
- Large scale
- Partial information
- Interdependent networks
- Constraints (time, cost, ...)

This research is sponsored by:
- Defense Threat Reduction Agency (DTRA)
- Army Research Lab and UK Ministry of Defence, ITA Program

[Figures: "Internet router level topology"; "Merlin Tool"]

Inferencing: motivation

- The lack of global knowledge of the Internet topology:
  - hinders network diagnostics (losses, failures, bottlenecks)
  - inflates IP path lengths
  - reduces the accuracy of models
  - encourages overlay networks to ignore the underlay
- Network operators rarely publish their topologies
- Current inference approaches rely on tools such as traceroute
- Traceroute provides only partial information
- The network is only partially observable
- Previous approaches fail or perform poorly

Our problem: infer the routing topology in the presence of partial information.

Inferencing: our approach - iTop

- iTop algorithm:
  - Fills unobservable parts of the network with virtual links/routers
  - Analyzes the traces to determine properties of the real topology
  - Iteratively merges links to infer the real network

[Figure: iTop pipeline, with panels "Ground truth topology", "Trace analysis", "Virtual topology", "Merging algorithm", "Inferred topology"]
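
The three iTop steps can be illustrated with a toy sketch. It assumes each trace is a list of hop labels with "*" marking a router that did not reply. Phase 1 gives every anonymous hop its own virtual router; phase 2 applies a deliberately simple, hypothetical merging rule (runs of anonymous hops with the same known neighbours and the same length are treated as the same routers). This is not iTop's actual trace analysis or merging criterion, which the slide does not detail; `infer_topology` and the `v`/`m` node-name prefixes are illustrative only.

```python
def infer_topology(traces):
    """Toy two-phase inference from traceroute-like traces (hypothetical rules).

    traces: list of hop lists; "*" marks a hop that did not reply.
    Returns (virtual_edges, merged_edges) as sets of directed (u, v) edges.
    """
    virtual_edges, merged_edges = set(), set()
    shared = {}  # (known hop before, known hop after, run length) -> merged routers
    counter = 0
    for trace in traces:
        fresh, merged = [], []
        i = 0
        while i < len(trace):
            if trace[i] != "*":
                # Responsive router: identical in both topologies.
                fresh.append(trace[i])
                merged.append(trace[i])
                i += 1
                continue
            # Find the maximal run of anonymous hops trace[i:j].
            j = i
            while j < len(trace) and trace[j] == "*":
                j += 1
            # Phase 1: every anonymous hop becomes a fresh virtual router.
            for _ in range(j - i):
                counter += 1
                fresh.append(f"v{counter}")
            # Phase 2 (hypothetical rule): runs with the same known neighbours
            # and the same length are assumed to cross the same routers.
            key = (trace[i - 1] if i > 0 else None,
                   trace[j] if j < len(trace) else None,
                   j - i)
            if key not in shared:
                shared[key] = [f"m{len(shared)}_{k}" for k in range(j - i)]
            merged.extend(shared[key])
            i = j
        virtual_edges.update(zip(fresh, fresh[1:]))
        merged_edges.update(zip(merged, merged[1:]))
    return virtual_edges, merged_edges


traces = [["A", "*", "B"], ["A", "*", "B"], ["C", "*", "*", "D"]]
virtual, merged = infer_topology(traces)
print(len(virtual), len(merged))  # 7 virtual links collapse to 5
```

Merging maps the two separate anonymous hops between A and B onto one router, so the merged topology has fewer links than the virtual one.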

Inferencing: our approach - Results

- We compare our approach to state-of-the-art inference approaches:
  - X. Jin, W.-P. Yiu, S. H. Chan, and Y. Wang, "Network topology inference based on end-to-end measurements," IEEE Journal on Selected Areas in Communications, vol. 24, no. 12, pp. 2182–2195, 2006.
  - B. Yao, R. Viswanathan, F. Chang, and D. Waddington, "Topology inference in the presence of anonymous routers," IEEE INFOCOM, 2003.
- We consider realistic networks
- We also show how iTop improves the performance of failure diagnosis algorithms in the presence of partial information

Monitoring: motivation (1)

- Accurate knowledge of the internal network state enables:
  - performance diagnosis
  - resource allocation
  - efficient routing
  - congestion control
- Monitoring large-scale networks may incur high overhead
- Network tomography:
  - infers the internal network state from end-to-end measurements
  - solves a linear system y = Rx, where y are the end-to-end path measurements, R is the routing matrix, and x are the unknown link metrics
  - enables efficient monitoring by probing only a basis of the system
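
A minimal sketch of the tomography idea, assuming additive link metrics (for example, delays) so that path measurements satisfy y = Rx. The routing matrix, the link values, and the helpers `select_basis` and `solve` are hypothetical; exact `Fraction` arithmetic avoids floating-point rank errors in this toy case. Only a row basis of R is probed: the fourth path is a linear combination of the first three, so measuring it would add no information.

```python
from fractions import Fraction


def select_basis(R):
    """Return indices of rows of R that form a row basis (greedy, exact arithmetic)."""
    kept, basis = [], []  # kept: reduced rows stored as (pivot column, row)
    for i, row in enumerate(R):
        v = [Fraction(a) for a in row]
        for pivot, b in kept:
            if v[pivot] != 0:
                f = v[pivot] / b[pivot]
                v = [x - f * y for x, y in zip(v, b)]
        pivot = next((k for k, x in enumerate(v) if x != 0), None)
        if pivot is not None:  # row adds a new independent direction
            kept.append((pivot, v))
            basis.append(i)
    return basis


def solve(A, y):
    """Gauss-Jordan elimination for a square, nonsingular system A x = y."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(b)] for row, b in zip(A, y)]
    for col in range(n):
        p = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[p] = M[p], M[col]
        d = M[col][col]
        M[col] = [a / d for a in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]


# Rows = paths, columns = links (1 if the path traverses the link).
R = [[1, 1, 0],   # path 0: links 0, 1
     [0, 1, 1],   # path 1: links 1, 2
     [1, 1, 1],   # path 2: links 0, 1, 2
     [1, 0, 1]]   # path 3: links 0, 2 (= 2*path2 - path0 - path1)
x_true = [1, 2, 3]  # hypothetical link delays
basis = select_basis(R)  # [0, 1, 2]
y = [sum(r * x for r, x in zip(R[i], x_true)) for i in basis]  # probe basis only
print(basis, solve([R[i] for i in basis], y))
```

The example assumes R has full column rank; many real routing matrices do not, in which case only some link metrics (or linear combinations of them) are identifiable from the basis paths.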

Monitoring: motivation (2)

- Failures are common events in modern networks
- Failures can significantly affect the performance of network tomography
- Probing incurs a cost; often only a limited budget is available

Our problem: select a set of probing paths that maximizes the performance of network tomography under failures, subject to a limited budget.

Monitoring: our approach

- We translate the problem into the maximization of a submodular function under a budget constraint
- We propose the algorithm RoMe, which:
  - makes use of recent advances in submodular maximization theory
  - has an approximation factor of (1-1/e)/2
  - is optimal with the additional constraint of linear independence
  - assumes knowledge of the failure distribution
- We consider the case of an unknown failure distribution
- We propose the algorithm LSR (Learning with Submodular Rewards), which:
  - follows a reinforcement learning approach
  - learns path availabilities
  - provides performance guarantees

[Figure: LSR loop, with steps "Init", "Select paths", "Collect measurements", "Update path availabilities"]
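
RoMe's objective and algorithm are not reproduced here. The sketch below only shows the generic shape of budgeted submodular maximization: a cost-benefit greedy that repeatedly adds the affordable path with the best marginal gain per unit cost, compared at the end against the best single affordable path. The objective `expected_coverage` (expected number of links traversed by at least one working path, with independent path availabilities `avail`) is a hypothetical stand-in that happens to be monotone submodular; it is not the tomography performance metric used by RoMe, and `budgeted_greedy` is an illustrative name.

```python
def expected_coverage(selected, paths, avail):
    """Hypothetical stand-in objective: expected number of links traversed by
    at least one working path, assuming independent path availabilities."""
    miss = {}  # link -> probability that no selected path covering it works
    for i in selected:
        for link in paths[i]:
            miss[link] = miss.get(link, 1.0) * (1.0 - avail[i])
    return sum(1.0 - m for m in miss.values())


def budgeted_greedy(paths, avail, cost, budget):
    """Cost-benefit greedy plus best affordable single path (generic heuristic)."""
    def f(S):
        return expected_coverage(S, paths, avail)

    chosen, spent = [], 0.0
    remaining = set(range(len(paths)))
    while True:
        base, best, best_ratio = f(chosen), None, 0.0
        for i in sorted(remaining):
            if spent + cost[i] > budget:
                continue  # path no longer fits in the budget
            ratio = (f(chosen + [i]) - base) / cost[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing affordable improves the objective
            break
        chosen.append(best)
        spent += cost[best]
        remaining.remove(best)
    # The ratio rule can miss one expensive but valuable path, so compare.
    affordable = [i for i in range(len(paths)) if cost[i] <= budget]
    if affordable:
        single = max(affordable, key=lambda i: f([i]))
        if f([single]) > f(chosen):
            return [single]
    return chosen


paths = [{"a", "b"}, {"b", "c"}, {"a", "b", "c", "d"}]
avail = [0.9, 0.9, 0.5]
cost = [1, 1, 3]
print(budgeted_greedy(paths, avail, cost, budget=2))  # [0, 1]
```

In an LSR-style setting where the failure distribution is unknown, `avail` could be replaced by availability estimates updated from the collected measurements; that learning loop is not shown.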

Monitoring: results

- We compare our approach to state-of-the-art path selection algorithms:
  - Y. Chen, D. Bindel, H. Song, and R. H. Katz, "An algebraic approach to practical and scalable overlay network monitoring," ACM SIGCOMM Comp. Com. Rev., 2004.
- We consider realistic topologies and failure models

Recovery: motivation

- Modern networks are highly interdependent:
  - the Internet and the smart grid
  - water supply, transportation, fuel and power stations are coupled together
- Interdependent networks are extremely sensitive to failures, e.g., the electrical blackout that occurred in Italy in September 2003
- Failures may cause performance degradation
- Degradation can also propagate in the surviving network

Recovery: research problems (1)

Recovery algorithms for overlay networks

- Two networks sharing the same infrastructure
- Failures occur in the underlay network and affect the overlay
- The setting models an emergency urban communication network after a weapon-of-mass-destruction attack
- We aim to restore the functionality of the overlay network by repairing the underlay
- Objectives and constraints:
  - bandwidth
  - time
  - cost
  - utility
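
The recovery problem can be made concrete with a hypothetical greedy heuristic: each overlay link has a utility and depends on a set of failed underlay elements, each repair has a cost, and an overlay link counts as restored only when all of its underlay elements are repaired. The function `greedy_repair` and its data layout are illustrative assumptions; only the cost and utility objectives are modeled (bandwidth and time are not), and this is not the authors' recovery algorithm.

```python
def greedy_repair(overlay, repair_cost, budget):
    """Hypothetical greedy: restore overlay links by repairing underlay elements.

    overlay: {overlay_link: (utility, set of failed underlay elements it uses)}
    repair_cost: {underlay_element: cost}
    Returns (restored overlay links in order, repaired elements, total cost).
    """
    repaired, restored, spent = set(), [], 0.0
    pending = dict(overlay)
    while pending:
        best, best_ratio = None, -1.0
        for link, (utility, needs) in sorted(pending.items()):
            extra = sum(repair_cost[e] for e in needs - repaired)
            if spent + extra > budget:
                continue  # restoring this link would exceed the budget
            # Links whose elements are already repaired come back for free.
            ratio = float("inf") if extra == 0 else utility / extra
            if ratio > best_ratio:
                best, best_ratio = link, ratio
        if best is None:
            break
        _, needs = pending.pop(best)
        spent += sum(repair_cost[e] for e in needs - repaired)
        repaired |= needs
        restored.append(best)
    return restored, repaired, spent


overlay = {"L1": (10, {"u1", "u2"}), "L2": (4, {"u2"}), "L3": (8, {"u3"})}
repair_cost = {"u1": 3, "u2": 2, "u3": 6}
print(greedy_repair(overlay, repair_cost, budget=6))
```

Repairing u2 for L1 also restores L2 at no extra cost, which is why shared underlay elements make repair decisions interdependent.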

Recovery: research problems (2)

Models for temporal propagation of failures

- Two general interdependent networks
- Failures propagate over time, delayed by local resources such as:
  - backup batteries/generators
  - local solar plant supply
- Given the initial failure, our model will:
  - estimate the probability that an element fails at a given time
  - estimate the expected time at which an element fails
  - estimate the expected number of failed elements at a given time
- This information will be used to design recovery strategies
- These models will be mapped to and validated with real interdependent networks
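
The three quantities listed above can be estimated by Monte Carlo simulation once a propagation model is fixed. The sketch assumes a hypothetical model: an element fails once any element it depends on has failed and its own backup (battery, generator, or local supply) is exhausted, with backup durations drawn from an exponential distribution. Under this model, failure times follow a shortest-path recursion, so each run uses a Dijkstra-style priority queue. The model, `simulate_once`, and `estimate` are illustrative assumptions, not the authors' model.

```python
import heapq
import random


def simulate_once(supporters, initial, mean_backup, rng):
    """One cascade under a hypothetical model: a node fails once any supporter
    has failed and its own backup (exponential, mean mean_backup) runs out.
    supporters: {node: list of nodes it depends on}. Returns {node: fail time}."""
    dependents = {}
    for v, sups in supporters.items():
        for s in sups:
            dependents.setdefault(s, []).append(v)
    backup = {v: rng.expovariate(1.0 / mean_backup) for v in supporters}
    fail_time = {}
    heap = [(0.0, v) for v in initial]
    while heap:  # Dijkstra-style: the first time a node is popped is final
        t, v = heapq.heappop(heap)
        if v in fail_time:
            continue
        fail_time[v] = t
        for w in dependents.get(v, []):
            if w not in fail_time:
                heapq.heappush(heap, (t + backup[w], w))
    return fail_time


def estimate(supporters, initial, mean_backup, t, runs=2000, seed=0):
    """Monte Carlo estimates of P(node failed by t), mean failure time of nodes
    that fail, and the expected number of failed nodes at time t."""
    rng = random.Random(seed)
    nodes = set(supporters) | set(initial)
    nodes |= {s for sups in supporters.values() for s in sups}
    failed_by_t = dict.fromkeys(nodes, 0)
    time_sum, time_cnt = dict.fromkeys(nodes, 0.0), dict.fromkeys(nodes, 0)
    total_failed_by_t = 0
    for _ in range(runs):
        for v, ft in simulate_once(supporters, initial, mean_backup, rng).items():
            time_sum[v] += ft
            time_cnt[v] += 1
            if ft <= t:
                failed_by_t[v] += 1
                total_failed_by_t += 1
    p_fail = {v: failed_by_t[v] / runs for v in nodes}
    mean_time = {v: time_sum[v] / time_cnt[v] for v in nodes if time_cnt[v]}
    return p_fail, mean_time, total_failed_by_t / runs


# Hypothetical chain across two networks: A supports B, B supports C.
supporters = {"B": ["A"], "C": ["B"]}
p_fail, mean_time, n_failed = estimate(supporters, ["A"], mean_backup=1.0, t=1.0)
print(p_fail, mean_time, n_failed)
```

In this simple model every element reachable from the initial failure eventually fails, so `mean_time` is the expected failure time of those elements.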

Recovery: research problems (3)

- Improve network robustness:
  - re-design existing networks
  - design new networks less prone to cascading effects
- Models and recovery strategies for performance degradation over time
- Partial knowledge
- Partial control
- Multiple interdependent networks

Thank you! Any questions?
