How to Increase the ROI of Your Management System Through Simulation
Dennis Morton
Practice Director, Network Operations and
Infrastructure Management
Greenwich Technology Partners
www.micromuse.com
Copyright © Micromuse, 2002. All rights reserved.
Session Overview
 Motivation for using simulation
 The MIMIC SNMP Simulator
– Technology and Limitations
 Simulation Examples
– OpenView NNM
– MtTrapd Probe
– Visionary
– OpenNMS
 What’s my tool doing?
 Wrap-up
Motivation for Simulation
The Problem…
 Management Systems are complex
– Many moving parts
– Many vendors
– Many interfaces
 How can you test all the interfaces in the system?
 Little tolerance for “failure”
– … especially for critical problems
 How do you make sure you have every condition
covered?
– How do you test conditions like excessive CRC errors or
excessive spanning tree recalculations?
– There are far more complex problems to test
Solution – “Wait and See”
 Wait until something happens and then modify the
system to catch it next time
 Terrible approach!
– The problem must happen twice to verify that the system can
catch it!
– … if you made the modification properly the first time!
 100% Reactive
– What happened to proactive network management?
– Not good to have proactive tools but a reactive engineer…
Solution – “Raw Capture”
 Raw Probe capture and log file duplication
 Not much better
– Still must wait for a problem to occur
– However, can ensure problem is solved
– Can be risky for production systems
– Raw capture files can/will get huge
– Only works at the Omnibus interface
– What about everything before that?
 Still 100% Reactive!
Solution – “Build a Lab”
 Build a lab composed of real devices and a mirror of
the production system
 Ideal solution
 But…
– Difficult to use
– You must configure the devices and create conditions
– If not you, then someone else who you must schedule time with
– May run into conflicts
– Expensive!
– Real boxes cost real $$
– Could share, but this rarely works in practice
Solution – “Simulate Your Network”
 Use a simulation tool to create a near-perfect copy of
your production network
 Many advantages
– Simulate your actual network topology and devices
– Simple to cause faults, even complex ones
– Extremely rapid edit-test cycle
– Allows you to test many alternatives quickly
– Rapid feedback is good
– Much, much easier to test/verify complex workflow
– Think Impact policy testing, Reporting, Gateways, etc…
– This is how the vendors do it!
– Pump them for their simulations
 Some caveats
– Covered in detail later…
Simulation Options and Limitations
MIMIC SNMP Simulator
 Market leader in SNMP simulation
– http://www.gambitcomm.com/
– Used by the majority of hardware companies for testing SNMP
agents and developing/testing management software
– Used by software companies like Micromuse as well
 Products
– Core MIMIC SNMP Simulator
– MIMIC Recorder
– Discovery Wizard, Simulation Wizard, Topology Wizard, etc.
– Much more…
– Cisco IOS Simulator
– Full IOS Simulation via Telnet
– Cable Modem Simulator
MIMIC SNMP Simulator – Technology
 Seamlessly supported on Linux, Windows, and Solaris
– 2,000 agents per box on Windows; 10,000 (!) per Unix/Linux box
 Core Simulation Engine
– Supports any combination of SNMP V1, V2, V2c, and V3
– Extremely fast, native code core engine
– 1GHz PIII w/512MB can easily simulate 250 devices with boatloads of
headroom
 TCL/Tk GUI Components
– MIMIC Shell for CLI access to the engine and MIMICView for GUI
access to features
– Orthogonal feature sets
– Wizards for ease of performing complex functions
– E.g., Discovery Wizard for “recording” an entire network of devices
– Topology Wizard for manipulating a topology
– TCL, Perl, Java, and C++ APIs
– TCL supported at the engine level
– IOW, you have to learn TCL
MIMIC SNMP Simulator – Cost
 Priced by agent plus yearly maintenance fee
– 25 Agent license is $5K + $1,250 (support)
– 250 Agent license is $10K + $2500 (support)
– Or, about $250/agent for a 25 agent license and $50/agent for a
250 agent license.
 Food for thought…
– Used 2600 routers cost $750 and up
– Much, much more for anything but basic interfaces
– Feature-rich switches even more
– Trunking/VoIP == $$$!
– How much would electricity cost for 25 boxes?
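The per-agent figures above can be checked with a few lines of arithmetic. The sketch below uses only the prices quoted on this slide; the function names are made up for illustration, and the hardware figure is a floor (25 used routers at the $750 minimum, with no interfaces, power, or rack space counted).

```python
# Prices as quoted on the slide (2002 figures); nothing else is assumed.
LICENSES = {
    25: (5_000, 1_250),    # agents: (license price, yearly support)
    250: (10_000, 2_500),
}
USED_ROUTER_PRICE = 750  # "Used 2600 routers cost $750 and up"


def first_year_cost(agents: int) -> int:
    """License price plus one year of support for the given license tier."""
    price, support = LICENSES[agents]
    return price + support


def cost_per_agent(agents: int) -> float:
    """First-year cost divided by the number of licensed agents."""
    return first_year_cost(agents) / agents


def lab_hardware_floor(devices: int) -> int:
    """Lower bound on buying that many used routers; ignores power and extras."""
    return devices * USED_ROUTER_PRICE


if __name__ == "__main__":
    for agents in LICENSES:
        print(
            f"{agents} agents: ${first_year_cost(agents):,} first year, "
            f"${cost_per_agent(agents):.0f}/agent, "
            f"real hardware at least ${lab_hardware_floor(agents):,}"
        )
```

For the 25-agent tier this gives $6,250 for the first year against at least $18,750 for the same number of used routers.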
Limitations of Simulation
 Most limitations stem from one fact: a single physical
node is acting like many virtual nodes
– Can’t just change ifOperStatus to force a link “down”
– Node is still ping-able, after all
– For root-cause tools, YMMV
– Causing good faults takes more thought
 MIMIC-specific notes
– May not have enough node licenses available for your network
– Some faults are just not possible
– EIGRP, for example
– Some require fairly complex TCL scripting
– But, when have we ever shied away from that!
– SNMP, IOS, and Telnet only.
– No syslog simulator
MIMIC Caveats
 MIMIC has some caveats as an NMS simulation tool
 Recording
– Only records a single IP address for an agent
– Gambit has a script to add the additional IP aliases
– Still works quite well with only one, though
– All counters set to one of three functions
 Simulating
– Tricky to have many problems happen simultaneously
– Mainly affects root-cause simulation
– Simulations consist of thousands of small files
– Very very difficult to create one by hand!
– Simulation Wizard greatly simplifies this
– Can become difficult to remember where you put modified files
Simulation Examples
Example Simulation
 Recording of the GTP internal network
– Two real devices
– Can you spot them?
 Mix of Cisco routers, switches, and Sun workstations
 Note:
– Community Strings in the simulation do not have to be the
same as those on the real devices
– Same goes for any aspect of the simulated device. IOW, you
could test the effect of moving to SNMPv3 on your systems!
Example Simulation (cont.)
Example – OpenView NNM
 NNM works quite well with MIMIC
 When recording, MIMIC stores the ARP and routing tables
 NNM will use these to draw a nice topology!
Example – OpenView NNM (cont.)
 Most built-in NNM applications work fine
 Traceroute obviously won’t, though
– Any tool that uses SNMP to locate a route, however, will work
Example – Visionary
 MIMIC works especially well with tools like Visionary that have no
notion of topology
 Just enter the IP address of the agent and start causing faults!
(Screenshot callout: change local.busyPer here)
Example – Visionary (cont.)
See the event immediately!
Example – MtTrapd Probe
 First, generate traps
– Once/periodically via GUI or arbitrarily via scripts
 Specify trap type and rate
– MIMIC can quite easily cause a trap storm!
Example – MtTrapd Probe (cont.)
 Specify Trap Variable bindings
– Fun test – try nonsensical values!
Example – MtTrapd Probe (cont.)
 Choose security options (optional)
– MIMIC supports the full range of traps/informs
Example – MtTrapd Probe (cont.)
 Voila!
Note the Counts!
Example – OpenNMS
 Great example of the limitations of simulation
 Discovered the same services on all simulated nodes!
 Will still work, but should disable service monitoring for the MIMIC server
Another Use for Simulation
What’s My NMS Tool Doing?
 Another great use for simulation: tracing the SNMP
polling behavior of NMS tools
 Lets you explore in detail precisely what your NMS
tools are doing and how they react to changes
– What happens when you turn off SNMPv1 and turn on v2/v3?
– What happens when you radically change a node?
– What MIB objects are being polled and how often?
– How efficient is my polling engine?
– Many, many more examples…
 Tremendously useful for customization
– Trace output easier to use than Sniffer output, for example
 Let’s see what a trace looks like…
Trace Example – Visionary
 PDU packing in action!
– This single GET had 71 varbinds
 Very useful for testing/tracing custom rules
INFO 09/08.06:49:42 - agent 16, PDU type GET, req ID 1000a3e3
1.3.6.1.4.1.9.9.43.1.1.1.0 = ccmHistoryRunningLastChanged.0
1.3.6.1.4.1.9.9.43.1.1.2.0 = ccmHistoryRunningLastSaved.0
1.3.6.1.4.1.9.9.43.1.1.3.0 = ccmHistoryStartupLastChanged.0
1.3.6.1.4.1.9.2.1.46.0 = bufferFail.0
1.3.6.1.4.1.9.2.1.47.0 = bufferNoMem.0
1.3.6.1.4.1.9.2.1.19.0 = bufferSmMiss.0
1.3.6.1.4.1.9.2.1.27.0 = bufferMdMiss.0
1.3.6.1.4.1.9.2.1.35.0 = bufferBgMiss.0
1.3.6.1.4.1.9.2.1.43.0 = bufferLgMiss.0
1.3.6.1.4.1.9.2.1.67.0 = bufferHgMiss.0
1.3.6.1.4.1.9.9.48.1.1.1.5.1 = ciscoMemoryPoolUsed.1
1.3.6.1.4.1.9.9.48.1.1.1.6.1 = ciscoMemoryPoolFree.1
1.3.6.1.4.1.9.9.48.1.1.1.7.1 = ciscoMemoryPoolLargestFree.1
1.3.6.1.4.1.9.9.48.1.1.1.5.2 = ciscoMemoryPoolUsed.2
1.3.6.1.4.1.9.9.48.1.1.1.6.2 = ciscoMemoryPoolFree.2
1.3.6.1.4.1.9.2.1.56.0 = busyPer.0
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.48 = cpmProcExtUtil1Min.1.48
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.90 = cpmProcExtUtil1Min.1.90
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.96 = cpmProcExtUtil1Min.1.96
1.3.6.1.4.1.9.9.109.1.2.2.1.6.1.97 = cpmProcExtUtil1Min.1.97
1.3.6.1.4.1.9.9.109.1.1.1.1.4.1 = cpmCPUTotal1min.1
1.3.6.1.2.1.10.7.2.1.2.2 = dot3StatsAlignmentErrors.2
… and so on!
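Counting varbinds by hand gets old quickly, so a small script helps. The sketch below parses text in the trace layout shown on this slide (an INFO header line followed by "OID = name" lines) and groups the varbinds under each request. The layout is inferred from this one excerpt only, and the names parse_trace and Request are hypothetical, not part of MIMIC.

```python
import re
from dataclasses import dataclass, field

# Header and varbind patterns inferred from the excerpt on the slide.
HEADER = re.compile(
    r"^INFO (?P<stamp>\d\d/\d\d\.\d\d:\d\d:\d\d) - agent (?P<agent>\d+), "
    r"PDU type (?P<pdu>\w+), req ID (?P<req>[0-9a-f]+)$"
)
VARBIND = re.compile(r"^(?P<oid>\d+(?:\.\d+)+) = (?P<name>\S+)$")


@dataclass
class Request:
    stamp: str
    agent: int
    pdu: str
    req_id: str
    varbinds: list = field(default_factory=list)  # (oid, name) tuples


def parse_trace(text: str) -> list:
    """Group each varbind line under the most recent request header."""
    requests = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            requests.append(
                Request(header["stamp"], int(header["agent"]),
                        header["pdu"], header["req"])
            )
            continue
        varbind = VARBIND.match(line)
        if varbind and requests:
            requests[-1].varbinds.append((varbind["oid"], varbind["name"]))
    return requests


# A shortened copy of the trace above, used only as sample input.
SAMPLE = """\
INFO 09/08.06:49:42 - agent 16, PDU type GET, req ID 1000a3e3
1.3.6.1.4.1.9.9.43.1.1.1.0 = ccmHistoryRunningLastChanged.0
1.3.6.1.4.1.9.2.1.46.0 = bufferFail.0
1.3.6.1.4.1.9.2.1.56.0 = busyPer.0
"""

if __name__ == "__main__":
    for req in parse_trace(SAMPLE):
        print(f"agent {req.agent}: {req.pdu} with {len(req.varbinds)} varbinds")
```

Run against a full trace, the varbind count per request shows at a glance how aggressively a tool packs its PDUs.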
Trace Example – OpenNMS
 Trace from OpenNMS 1.0.1
 Notice:
– Use of Get BULK
– Correctly noticed that node supported SNMPv2
INFO 08/25.20:01:34 - agent 25, PDU type BULK, req ID 3191fd84
1.3.6.1.4.1.9.2.1.8 = freeMem.
1.3.6.1.4.1.9.2.1.58 = avgBusy5.
1.3.6.1.4.1.9.2.1.46 = bufferFail.
1.3.6.1.4.1.9.2.1.47 = bufferNoMem.
1.3.6.1.4.1.9.2.1.15 = bufferSmTotal.
1.3.6.1.4.1.9.2.1.16 = bufferSmFree.
1.3.6.1.4.1.9.2.1.18 = bufferSmHit.
1.3.6.1.4.1.9.2.1.19 = bufferSmMiss.
1.3.6.1.4.1.9.2.1.23 = bufferMdTotal.
1.3.6.1.4.1.9.2.1.24 = bufferMdFree.
1.3.6.1.4.1.9.2.1.26 = bufferMdHit.
1.3.6.1.4.1.9.2.1.27 = bufferMdMiss.
1.3.6.1.4.1.9.2.1.31 = bufferBgTotal.
1.3.6.1.4.1.9.2.1.32 = bufferBgFree.
1.3.6.1.4.1.9.2.1.34 = bufferBgHit.
1.3.6.1.4.1.9.2.1.35 = bufferBgMiss.
1.3.6.1.4.1.9.2.1.39 = bufferLgTotal.
1.3.6.1.4.1.9.2.1.40 = bufferLgFree.
1.3.6.1.4.1.9.2.1.42 = bufferLgHit.
1.3.6.1.4.1.9.2.1.43 = bufferLgMiss.
1.3.6.1.4.1.9.2.1.63 = bufferHgTotal.
1.3.6.1.4.1.9.2.1.64 = bufferHgFree.
1.3.6.1.4.1.9.2.1.66 = bufferHgHit.
1.3.6.1.4.1.9.2.1.67 = bufferHgMiss.
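The same trace headers can answer the "how often?" question. The sketch below reads only the INFO header lines and reports the gap in seconds between successive requests to each agent. Only the first sample header comes from the slide; the two later ones are invented for illustration. The trace timestamps carry no year, so the script assumes one (ASSUMED_YEAR), and poll_gaps is a hypothetical helper name.

```python
import re
from collections import defaultdict
from datetime import datetime

HEADER = re.compile(
    r"^INFO (\d\d/\d\d\.\d\d:\d\d:\d\d) - agent (\d+), "
    r"PDU type (\w+), req ID ([0-9a-f]+)$"
)
ASSUMED_YEAR = 2002  # assumption: the trace format shown has no year field


def poll_gaps(text: str) -> dict:
    """Map agent number -> seconds between successive requests to that agent."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        match = HEADER.match(raw.strip())
        if match:
            stamp = datetime.strptime(
                f"{ASSUMED_YEAR}/{match.group(1)}", "%Y/%m/%d.%H:%M:%S"
            )
            seen[int(match.group(2))].append(stamp)
    return {
        agent: [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
        for agent, stamps in seen.items()
    }


# First header is from the slide; the later two are invented for illustration.
SAMPLE = """\
INFO 08/25.20:01:34 - agent 25, PDU type BULK, req ID 3191fd84
INFO 08/25.20:06:34 - agent 25, PDU type BULK, req ID 3191fd85
INFO 08/25.20:11:35 - agent 25, PDU type BULK, req ID 3191fd86
"""

if __name__ == "__main__":
    for agent, gaps in poll_gaps(SAMPLE).items():
        print(f"agent {agent}: gaps between polls (s) = {gaps}")
```

Comparing the measured gaps with the configured polling interval is a quick way to see whether a polling engine is keeping up.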
Conclusion
 Simulation can be a very cost-effective method to:
– Verify enhancements to your NMS before placing them into
production
– Proactively ensure that key conditions are handled properly
– Test and debug complex workflow
Our tools are proactive – why shouldn’t we be as well?
Questions?
Contact information:
Dennis Morton
[email protected]
Cell: (214) 289-3675