Fault Management - University of Wollongong

Fault Management
IACT 418/918 Autumn 2005
Gene Awyzio
SITACS University of Wollongong
Overview
• Fault Management is the process of
locating and correcting network
problems or faults
• Comprehensive fault management is
probably the most important task in
Network Management
Benefits of Fault Management
Process
• Increased network reliability
– Provides tools allowing the engineer to quickly
• Detect problems
• Initiate recovery procedures
• Need to maintain the illusion of complete
and continuous connectivity
• Also provides tools to extract information
about the network's current state
Accomplishing Fault
Management
• Can be considered as a three (3) step
process
– Identify the fault
– Isolate the cause of the fault
– Correct the fault if possible
Identifying the fault
• Gathering Information to identify a
problem
– To learn that a problem exists we need to
gather data about the current state of the
network
• Two approaches
– Log critical network events
– Poll network devices
Identifying the fault
• Critical network events
– Examples
• Failure of a link
• Lack of response from host
– Transmitted by network device when fault
conditions occur
– Reactive method
– If the device itself fails, it cannot send an event
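The event-logging approach and its blind spot can be sketched as follows. This is a minimal illustration only: the `NetworkEvent` and `EventLog` names, the event kinds, and the device names are all hypothetical, not part of any particular management product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NetworkEvent:
    device: str            # hypothetical device name, e.g. "router-1"
    kind: str              # hypothetical event kind, e.g. "link_down"
    received_at: datetime  # when the management station received it


class EventLog:
    """Keeps every critical event that devices have sent us."""

    def __init__(self):
        self._events = []

    def record(self, device, kind, received_at=None):
        # Default the timestamp to "now" if the caller does not supply one.
        event = NetworkEvent(device, kind, received_at or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def events_for(self, device):
        return [e for e in self._events if e.device == device]

    def devices_heard_from(self):
        return {e.device for e in self._events}


log = EventLog()
log.record("router-1", "link_down")
log.record("switch-3", "host_unresponsive")

# Devices we manage but have received no event from.
known_devices = {"router-1", "switch-3", "router-2"}
silent = known_devices - log.devices_heard_from()
print(silent)
```

A silent device such as `router-2` is either healthy or has failed so completely that it could not send anything. The log alone cannot tell the two cases apart, which is why the reactive method is usually paired with polling.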
Identifying the fault
• Occasional Polling
– Can help find faults in a timely manner
– Tradeoff
• Degree of timeliness vs bandwidth consumption
– Other factors
• Number of devices to poll
• Bandwidth of links
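One polling pass can be sketched as below. The `poll_devices` function and the `fake_probe` stand-in are assumptions made for illustration; in practice the probe might be an ICMP echo or an SNMP request, and each probe costs bandwidth on the links it crosses.

```python
from typing import Callable, Iterable, List


def poll_devices(devices: Iterable[str], probe: Callable[[str], bool]) -> List[str]:
    """Query each device once and return the ones that did not answer."""
    unresponsive = []
    for device in devices:
        if not probe(device):
            unresponsive.append(device)
    return unresponsive


def fake_probe(device: str) -> bool:
    # Stand-in for a real network query: pretend only "host-b" is down.
    return device != "host-b"


down = poll_devices(["host-a", "host-b", "router-1"], fake_probe)
print(down)
```

Running this pass more often finds a failed device sooner, but every extra pass repeats the query and response traffic for every device on the list.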
Identifying the fault
• Example of Occasional Polling
– Assume each query and response is 100 bytes long
(including data and header information)
– For a network of 30 devices
• (100 + 100) * 30 = 6,000 bytes/polling interval = 48,000
bits/polling interval
– Polling every minute
• 48,000 bits / 60 secs = 800 bits/second
• 48,000 bits/polling interval * 60 polls/hour = 2,880,000 bits
= 2.88 Megabits/hour
– Polling every 10 minutes
• 48,000 bits/polling interval * 6 polls/hour = 288,000 bits
= 0.288 Megabits/hour
• May not know about event for 10 minutes
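The polling cost can be computed directly from the example's assumptions (30 devices, a 100-byte query and a 100-byte response). In the sketch below, the hourly total is the traffic for one polling cycle multiplied by the number of cycles in an hour. The function name `polling_load` is illustrative only.

```python
def polling_load(devices, query_bytes, response_bytes, interval_seconds):
    """Return (bits per polling cycle, average bits/second, bits/hour)."""
    bits_per_cycle = (query_bytes + response_bytes) * devices * 8
    bits_per_second = bits_per_cycle / interval_seconds
    bits_per_hour = bits_per_second * 3600
    return bits_per_cycle, bits_per_second, bits_per_hour


# Compare polling every minute with polling every 10 minutes.
for interval in (60, 600):
    cycle, per_sec, per_hour = polling_load(30, 100, 100, interval)
    print(f"every {interval:>3} s: {cycle} bits/cycle, "
          f"{per_sec:.0f} bits/s, {per_hour / 1e6:.3f} Mbit/hour")
```

With these inputs, polling every minute averages 800 bits/second (2.88 Megabits/hour), and polling every 10 minutes averages 80 bits/second (0.288 Megabits/hour), at the cost of a detection delay of up to 10 minutes.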
Deciding Which Faults to
Manage
• Need to decide which faults to manage
– Need to prioritise faults
– If the number of fault reports is high, the network may not
handle the volume
– Limiting event traffic can reduce redundant
transmissions and storage
• Factors to consider
– Scope of control over network
– Size of network
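Prioritising faults and limiting redundant reports might look like the sketch below. The severity names, their ranking, and the report tuples are hypothetical; a real deployment would choose its own classes according to its scope of control and network size.

```python
# Hypothetical severity ranking: a lower number is handled first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


def prioritise(reports):
    """Drop duplicate (device, fault) reports; return the rest, most severe first.

    Each report is a (severity, device, fault) tuple. Reports of equal
    severity keep their arrival order.
    """
    seen = set()
    ranked = []
    for arrival, (severity, device, fault) in enumerate(reports):
        key = (device, fault)
        if key in seen:
            continue  # redundant report: do not store or forward it again
        seen.add(key)
        ranked.append((SEVERITY_RANK[severity], arrival, device, fault))
    ranked.sort()
    return [(device, fault) for _, _, device, fault in ranked]


reports = [
    ("minor", "switch-1", "high_error_rate"),
    ("critical", "router-1", "link_down"),
    ("critical", "router-1", "link_down"),  # duplicate of the previous report
    ("major", "host-2", "no_response"),
]
queue = prioritise(reports)
print(queue)
```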
Fault Management of a Network
Management System
• Simplest system
– Reports existence of fault but NOT location
• More complex tool
– Uses capability of hosts and network devices to
• Send critical network events
• Facilitate isolation of fault cause
• Advanced tool
– Correction of fault
Impact of a Fault on the
Network
• A fault management tool MUST be capable of
analysing how a fault can affect other areas of
the network
• Need to know
– What services the fault
• STOPS
• IMPACTS
– Not only that a fault has occurred but also how that
fault affects other network communication
• Data can come from performance
management tools
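One way to separate services a fault STOPS from services it only IMPACTS is sketched below, assuming each service is described by the alternative device paths it can use. The `classify_impact` function, the service names and the paths are hypothetical examples, not a description of any specific tool.

```python
def classify_impact(service_paths, failed_device):
    """Split services into (stopped, impacted) for a single failed device.

    service_paths maps a service name to a list of alternative paths,
    each path being a list of device names.
    - stopped:  every path uses the failed device, so no route remains
    - impacted: some but not all paths use it, so capacity or redundancy is lost
    """
    stopped, impacted = [], []
    for service, paths in service_paths.items():
        broken = [failed_device in path for path in paths]
        if all(broken):
            stopped.append(service)
        elif any(broken):
            impacted.append(service)
    return stopped, impacted


service_paths = {
    "email": [["switch-1", "router-1"]],
    "web": [["switch-1", "router-1"], ["switch-2", "router-2"]],
    "printing": [["switch-2", "router-2"]],
}
stopped, impacted = classify_impact(service_paths, "router-1")
print("STOPS:", stopped)
print("IMPACTS:", impacted)
```

Here the loss of `router-1` stops email, which has no other path, and impacts web, which can still use its second path. Printing is unaffected.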
Form of Reporting Faults
• Common forms of fault reporting
– Text
– Graphical
– Auditory signals
• Text
– Will work on any type of terminal
Form of Reporting Faults
• Graphical
– Considered to be very effective
– Can use flashing images to gain attention
– Colour can be used to indicate device
status
• Auditory signals
– Will quickly call attention to the occurrence
of a fault