CHAPTER
Network Reliability: Fault
Tolerance and Other Issues
Chapter Objectives
• Discuss network reliability issues
– Fault tolerance, tape backup, UPS, etc.
• Describe different levels of fault
tolerance
– Levels 1, 2 and 3
– Examine the relevance of file allocation
tables to fault tolerance
• Explain RAID technology
Chapter Modules
• An overview of network reliability
• Level 1 fault tolerance
• Level 2 and level 3 of fault tolerance
• Practical implementation examples
• RAID
END OF CHAPTER INTRODUCTION
MODULE
An Overview of Network
Reliability
Importance of Fault Tolerance
• Many organizations today run
mission critical applications on networks
• Important to provide built-in fault
tolerance in networks to support
mission critical applications
Fault Tolerance
• The ability to continue to function when
a fault occurs
• Example
– A server with built-in fault tolerance can
continue to operate even when one of its
hard disks fails
Focus of Fault Tolerant Features
• Most fault tolerance features are centered on
a server
– Disk storage in the server is the focal point of a
number of fault tolerant features
– Mechanical components are more susceptible to
failure than electronic components
– The hard disk is most vulnerable to failure in a
server
• A number of fault tolerant features address
the possible failure of hard disks
Fault Tolerance Implementation
• Software based
• Hardware based
• A combination of both
Server-Based Implementation of
Fault Tolerance
• Level 1
• Level 2
• Level 3
Preview of Fault Tolerance
• Based on the premise of maintaining
multiple copies of critical components
• Level 1
– Duplicate FATs
• Level 2
– Duplicate server hard disks
• Level 3
– Duplicate servers
RAID Storage: The Practical
Implementation
• Redundant Array of Independent Disks
• Data is stored in a RAID subsystem
• A largely hardware-based
implementation
Other Features
• Uninterruptible Power Supply (UPS)
• Tape backup
Uninterruptible Power Supply
(UPS)
• Ensures the uninterruptible supply of
power to the server
• Batteries in the UPS will continue to
provide power in the event of a power
outage
UPS Implementations
• AS-400 example
– When the power goes down the UPS takes over
and systematically shuts down the system
preserving the data files
• Other implementations
– On power loss, the UPS takes over
– A standby generator is then activated by sensors
– The process is reversed when the power comes
back
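The second sequence above can be sketched as a small decision function. This is only an illustration: the names `PowerSource`, `next_source` and `simulate`, and the two sensor readings, are assumptions made for the example and do not describe any particular UPS product.

```python
from enum import Enum


class PowerSource(Enum):
    MAINS = "mains"
    UPS_BATTERY = "ups_battery"
    GENERATOR = "generator"


def next_source(mains_ok: bool, generator_ready: bool) -> PowerSource:
    """Pick the power source from two simple (hypothetical) sensor readings."""
    if mains_ok:
        # Power is back: the process is reversed and the load returns to mains.
        return PowerSource.MAINS
    if generator_ready:
        # Sensors have started the standby generator, which relieves the batteries.
        return PowerSource.GENERATOR
    # Mains is down and the generator is not up yet: the UPS batteries carry the load.
    return PowerSource.UPS_BATTERY


def simulate(readings):
    """Return the power source chosen for each (mains_ok, generator_ready) reading."""
    return [next_source(mains_ok, generator_ready)
            for mains_ok, generator_ready in readings]
```

Feeding it an outage followed by recovery, for example `simulate([(True, False), (False, False), (False, True), (True, True)])`, walks through mains, battery, generator and back to mains.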
Tape Backup
• Used more as a precautionary measure than a
fault tolerant measure
• Data on the server is periodically backed up
on a tape
• If the disk storage fails on the server:
– A previously stored version of the data is loaded
on to a newly installed disk storage on the server
• Offers some degree of protection against the
total loss of data
Network Operating System
Support for Reliability
• Support for Levels 1 and 2 of fault
tolerance is readily available in network
operating systems
• Support for Level 3 fault tolerance is
now available as well
• RAID 0, 1 and 5 are commonly
supported
END OF MODULE
MODULE
Level 1 Fault Tolerance
Level 1 Fault Tolerance
• A Software Based Solution
Support for Fault Tolerance
• Provided by the network OS
• Support for Levels 1 and 2 has been
available in network operating systems for some time
• Newer operating systems have support
for Level 3 fault tolerance
– Support for RAID is also incorporated
• RAID may be considered as an
extension of Level 2 fault tolerance
Level 1 Fault Tolerance
• A backup copy of the File Allocation
Table (FAT) is kept on the server disk
• NOS uses the backup FAT should the
original FAT become corrupted
– This would ensure the continued operation
of the server
• The problem should be rectified as soon
as possible
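A minimal sketch of the fallback idea follows. It assumes the NOS can detect corruption by comparing a stored checksum; the CRC32 check, the dictionary layout and the function names are all hypothetical and are not how any particular NOS or the real FAT format works.

```python
import zlib


def _checksum(table: dict) -> int:
    # Deterministic byte form of the table, used only for the integrity check.
    return zlib.crc32(repr(sorted(table.items())).encode())


def make_fat_copy(table: dict) -> dict:
    """Store a copy of the table together with its checksum (hypothetical format)."""
    return {"table": dict(table), "crc": _checksum(table)}


def is_intact(fat_copy: dict) -> bool:
    """A copy is intact when its contents still match the stored checksum."""
    return _checksum(fat_copy["table"]) == fat_copy["crc"]


def load_fat(primary: dict, backup: dict):
    """Return (table, source), using the backup only when the primary is corrupted."""
    if is_intact(primary):
        return primary["table"], "primary"
    if is_intact(backup):
        return backup["table"], "backup"
    raise IOError("both FAT copies are corrupted")
```

When `load_fat` reports `"backup"`, the server keeps running, but the primary copy should be repaired promptly because no spare copy remains.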
File Allocation Table (FAT)
[Figure: a FAT entry for File A (size 34K; start: sector 1, track 2), with the FAT and a backup FAT on the disk]
A Summary of File Allocation
Table (FAT) Features
• Keeps track of files on the disk
• Uses pointers to point to the location of
the files
– Tracks, sectors
• Stores file related information
– Size, date last modified, security
information etc.
• If a FAT is corrupted, none of the files
on the disk can be retrieved
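The role of the table can be sketched in the simplified model used on these slides, where each entry records a file's size and the track and sector at which it starts. The `FatEntry` class and `locate` function are illustrative names, and the model deliberately ignores how real FAT file systems split this information between directory entries and the allocation table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatEntry:
    name: str
    size_kb: int
    start_track: int
    start_sector: int


def locate(fat, name):
    """Return (track, sector) where the named file starts, or None if there is no entry."""
    for entry in fat:
        if entry.name == name:
            return entry.start_track, entry.start_sector
    return None


# The example file from the slide: File A, 34K, starting at track 2, sector 1.
fat = [FatEntry("File A", 34, start_track=2, start_sector=1)]
```

With the table present, `locate(fat, "File A")` gives the starting position. With an empty (lost) table, the same call returns `None` even though the data is still on the disk, which is why a corrupted FAT makes the files unretrievable.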
A Note on File Systems
• Newer file systems have been introduced
following FAT16
• FAT32
– Windows 95/98/ME systems
– Windows 2000 OS
• NTFS
– Windows NT related filing technology
– Windows 2000
• HPFS
– OS/2 related filing technology
• Linux
– ext2
Newer File System Characteristics
• Support longer file names
• Support larger hard disks
• Abide by Uniform Naming Convention
(UNC)
• Provide better security
– Allows greater control to be exercised over
access to directories, files, etc.
Format of Uniform Naming
Convention
• \\computer_name\directory_name\file_name
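As a small illustration, Python's standard `pathlib` module can take a UNC name apart. The host and path in the usage example are made up. Note that `pathlib` treats the component right after the computer name as the share name, which corresponds to the first directory_name in the format above.

```python
from pathlib import PureWindowsPath


def split_unc(path: str):
    """Split a UNC name into (computer, share, remaining components)."""
    p = PureWindowsPath(path)
    # For a UNC name, pathlib reports "\\computer\share" as the drive.
    if not p.drive.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    computer, share = p.drive[2:].split("\\", 1)
    # parts[0] is the anchor (drive plus root); the rest are directories and the file.
    return computer, share, p.parts[1:]
```

For a hypothetical name such as `\\fileserver\reports\2004\q1.txt`, the function returns the computer `fileserver`, the share `reports`, and the components `2004` and `q1.txt`.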
END OF MODULE
MODULE
Levels 2 and 3 of Fault Tolerance
Levels 2 and 3
• A predominantly hardware-based solution
• Software support in the OS is
also required
Level 2 Fault Tolerance (FT)
• Implemented by installing a duplicate disk in
the server
• The server data is duplicated on the second
disk in real-time to provide fault tolerance
• The duplication process itself is automatic
when a NOS that supports Level 2 FT is used
• In the event of a failure of the primary hard
disk, the network will continue to operate
using the secondary hard disk
– However, immediate action must be taken to
replace the failed hard disk
Level 2 FT Implementation
• Types of Implementation
– Disk Mirroring
– Disk Duplexing
• Disk Mirroring
– One controller supporting two drives
• Disk Duplexing
– Two controllers and two drives
– Each drive would have its own controller
– Better protection compared to disk mirroring
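The difference between the two arrangements can be sketched with a toy model. `Disk` and `MirroredPair` are hypothetical classes written for this example. Every write goes to both disks, reads fall back to the surviving disk, and a controller failure takes out every disk attached to that controller.

```python
class Disk:
    """A toy disk: a dict of blocks plus the name of the controller it is attached to."""

    def __init__(self, name, controller):
        self.name = name
        self.controller = controller
        self.blocks = {}
        self.failed = False

    def write(self, block_no, data):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        self.blocks[block_no] = data

    def read(self, block_no):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        return self.blocks[block_no]


class MirroredPair:
    """Two disks holding the same data; duplexing just means two different controllers."""

    def __init__(self, primary, secondary):
        self.disks = [primary, secondary]

    def write(self, block_no, data):
        written = 0
        for disk in self.disks:
            try:
                disk.write(block_no, data)  # the same data goes to every working disk
                written += 1
            except IOError:
                continue
        if written == 0:
            raise IOError("no working disk left")

    def read(self, block_no):
        for disk in self.disks:
            try:
                return disk.read(block_no)  # first working disk answers
            except IOError:
                continue
        raise IOError("no working disk left")

    def fail_controller(self, controller):
        # A failed controller makes every disk attached to it unreachable.
        for disk in self.disks:
            if disk.controller == controller:
                disk.failed = True
```

A pair built on one controller survives a single disk failure but not a controller failure, while a pair built on two controllers survives either, which is the extra protection duplexing provides.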
Level 2 Fault Tolerance
Implementation
[Figure: Mirroring: one controller attached to two hard disks (HD). Duplexing: two controllers, each attached to its own hard disk (HD).]
Level 3 Fault Tolerance
• Dual interconnected servers are used to
support the network
– Second server is simply a mirror of the first
server
• Data mirroring is done automatically by
the NOS that supports Level 3 fault
tolerance
Level 3 Fault Tolerance
Implementation
[Figure: Main Server and Mirrored Server joined by a High-Speed Link, with Work Stations attached]
Actual Implementation of Fault
Tolerance
• Level 1 is universally deployed
• Level 2 requires additional hardware
– Best deployed by using the RAID storage
subsystem
• Level 3 requires considerably more
hardware and software resources
– Largely used in networks that support
mission critical applications
END OF MODULE
MODULE
RAID Storage Subsystem
RAID Storage
• Redundant Array of Independent Disks
• Data is striped across the different
disks in a RAID storage subsystem
Purpose of RAID
• Provide fault tolerance
• Offer better performance
RAID Basics
• Data is striped across multiple disks
– Data striping is the fundamental concept behind
RAID
• Data can be recreated from the redundant
disks
• MTBF (Mean Time Between Failures) of the
subsystem is reduced: roughly the MTBF of one disk
divided by the number of disks in the subsystem
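The MTBF estimate can be worked through numerically. The sketch assumes identical disks that fail independently at a constant rate, and it gives the expected time until the first disk in the array fails, not the time until data is lost. The 500,000-hour figure in the usage note is a made-up example value.

```python
def array_mtbf_hours(disk_mtbf_hours: float, disk_count: int) -> float:
    """Approximate MTBF of an array: single-disk MTBF divided by the number of disks."""
    if disk_count < 1:
        raise ValueError("an array needs at least one disk")
    return disk_mtbf_hours / disk_count
```

For instance, five disks rated at a hypothetical 500,000 hours each give `array_mtbf_hours(500_000, 5)`, that is 100,000 hours. This is why redundancy is needed once data is spread over many disks.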
RAID Storage Standards
• RAID 0 through RAID 5
• Popular RAID formats
– RAID 0, RAID 1, RAID 5
• Other formats
– RAID 10 and RAID 50
RAID 0
• Data is simply striped across
multiple disks
• Does not offer fault tolerance
• Offers better performance
– Multiple heads access the data stored on
the different drives for faster data access
RAID 0 Striping
Source: Adaptec
More on Striping
• Striping logically divides each hard disk into
stripes
• The stripes are arranged interleaved in a
rotating sequence among the various disks
• Together the stripes form one logical
sequence of storage space composed
alternately of stripes from each disk (drive)
• A stripe can be as small as a sector (512 bytes)
or as large as several megabytes
– In general, a record falls entirely within one stripe
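The rotating arrangement can be sketched as a mapping from a logical block number to a disk and a block offset on that disk. The function name and the block-based units are assumptions made for illustration. Real controllers differ in detail.

```python
def locate_block(logical_block: int, disk_count: int, blocks_per_stripe: int):
    """Map a logical block to (disk index, block offset on that disk)."""
    # Which stripe the block falls in, and where inside that stripe.
    stripe_no, offset_in_stripe = divmod(logical_block, blocks_per_stripe)
    # Stripes rotate across the disks: stripe 0 on disk 0, stripe 1 on disk 1, ...
    disk = stripe_no % disk_count
    # Each full rotation adds one stripe to every disk.
    stripe_on_disk = stripe_no // disk_count
    return disk, stripe_on_disk * blocks_per_stripe + offset_in_stripe
```

With three disks and four blocks per stripe, logical blocks 0 to 3 land on disk 0, blocks 4 to 7 on disk 1, blocks 8 to 11 on disk 2, and block 12 wraps back to disk 0.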
RAID 0 Data Access
Performance
Source: Adaptec
Multiple I/O Access
• Most operating systems support
concurrent disk I/O
• I/O load must be balanced on the disks
for optimum performance
• Striping promotes load balancing and
hence improves disk I/O performance
RAID 0 Configuration
• Large stripes for multiple users
• Small stripes for single users
Advantages and Disadvantages
• Fast access
• If one disk fails, the data on all
the disks becomes unusable
Windows Support for RAID 0
• Windows Server 2003 supports RAID 0
• 2 to 32 disks can be used in a set known
as a striped volume
RAID 1
• Provides fault tolerance
• Basically implements disk mirroring
Implementation
• A single pair of mirrored disks is not
striped
• Multiple pairs of mirrored disks can be
striped to create striped volumes
RAID 1 in Operation
Source: Adaptec
RAID 1 Performance
• Read performance is improved because
both disks can be simultaneously read
for different records
• Write performance remains unchanged
as the same data needs to be written to
both disks
Windows Support for RAID 1
• Supported in Windows 2000
– Ftdisk.sys is the driver used for supporting
fault tolerance
RAID 5
• Provides fault tolerance using Parity
• Data and parity information are
distributed over all the disks
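Parity in RAID 5 is commonly computed as the bitwise XOR of the data blocks in a stripe, which is what lets a lost block be recreated from the survivors. The sketch below shows that idea on byte strings. The function names and sample blocks are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recreate the one missing block of a stripe from all surviving blocks, parity included."""
    return xor_blocks(surviving_blocks)


# One stripe: three data blocks plus the parity block computed from them.
data = [b"NET1", b"WORK", b"RAID"]
parity = xor_blocks(data)
```

If the disk holding the second block fails, `rebuild_missing([data[0], data[2], parity])` returns the lost block, because XOR-ing the parity with the remaining data cancels everything except the missing piece.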
Read and Write Operation with
RAID 5
Source: Adaptec
Read and Write Performance
• Read access can be overlapped
– Because data is spread over different
drives
• Write operations could also be
overlapped
– Because different data records store the
parity information on different disks
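Why writes can overlap is easier to see from a layout that rotates the parity block from disk to disk. The rotation rule below is one possible scheme chosen for illustration, not the layout of any particular controller or operating system.

```python
def stripe_layout(stripe_no: int, disk_count: int):
    """Return (parity disk, data disks) for one stripe under a simple rotating scheme."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    # Parity starts on the last disk and moves one disk to the left per stripe.
    parity_disk = (disk_count - 1 - stripe_no) % disk_count
    data_disks = [d for d in range(disk_count) if d != parity_disk]
    return parity_disk, data_disks
```

With four disks, stripe 0 keeps its parity on disk 3 and stripe 1 on disk 2, so writes to records in those two stripes update different parity disks and need not queue behind a single dedicated parity drive.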
Windows Support
• Supported in Windows 2000
• Known as “stripe set with parity on
basic disks”
• Requires at least 3 disks
• An additional 16 Mbytes of memory
must be provided to support RAID 5
RAID 10
• Offers the advantage of both RAID 0
and RAID 1
– Faster performance through multiple read
access
– Fault tolerance through disk mirroring
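• Also written RAID 1+0 (a stripe across mirrored pairs); often loosely called RAID 0+1, which strictly is a mirror of two stripe sets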
RAID 50
• Combines the advantages of RAID 0
and RAID 5
Summary
(Source: Adaptec)
• RAID 0 offers good read and write
performance, but it does not provide
fault tolerance
• RAID 1 offers fault tolerance, but it
does not in general offer performance
advantage
– Multiple pairs may be created for
performance advantage in addition to
providing fault tolerance
Summary (Continued)
• RAID 5 combines efficient, fault-tolerant data
storage with good performance
characteristics.
• However, write performance and
performance during drive failure are slower
than with RAID 1.
• Rebuild operations also require more time
than with RAID 1 because parity information
is also reconstructed.
• At least three drives are required for RAID 5
arrays.
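One way to compare the levels summarized above is by how much of the raw disk space remains usable. The sketch assumes identical disks and the usual textbook rules (RAID 1 as a single mirrored pair, RAID 10 as striped mirrored pairs). The function name and the disk counts it accepts are assumptions for this example.

```python
def usable_capacity_gb(level: int, disk_count: int, disk_size_gb: float) -> float:
    """Usable space for a few RAID levels, assuming identical disks."""
    if level == 0:
        if disk_count < 2:
            raise ValueError("RAID 0 needs at least two disks")
        return disk_count * disk_size_gb          # no redundancy, all space usable
    if level == 1:
        if disk_count != 2:
            raise ValueError("this sketch treats RAID 1 as one mirrored pair")
        return disk_size_gb                       # second disk is a full copy
    if level == 5:
        if disk_count < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disk_count - 1) * disk_size_gb    # one disk's worth holds parity
    if level == 10:
        if disk_count < 4 or disk_count % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least four")
        return (disk_count // 2) * disk_size_gb   # half the disks are mirror copies
    raise ValueError(f"unsupported RAID level: {level}")
```

With four 100 GB disks, RAID 0 offers 400 GB, RAID 5 offers 300 GB and RAID 10 offers 200 GB, which illustrates why RAID 5 is described as an efficient form of fault-tolerant storage.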
END OF MODULE
MODULE
An Assembly of Fault Tolerance
and Backup Features
Fault Tolerant Components
• RAID storage subsystem
• Redundant power supplies
• Uninterruptible Power Supply or UPS
• Tape backup device
Hardware Systems for Reliability
[Figure: Server with Redundant Power Supply, RAID, Tape, and UPS With Surge Protector; Clients; UPS]
Tape Backup Technology
• QIC
• Travan
• DAT
• 8mm
• Mammoth
• AIT technology
• Digital Linear Tape
• Super DLT
• ADR technology
• Linear Tape Open
• VXA technology
• Robotic applications
http://www.pctechguide.com/15tape2.htm
Web Research
• Obtain information on RAID 0, 1, 5 and 10
– Adaptec
– http://www.acnc.com/04_01_00.html#top
• Obtain information on different file systems,
including the Linux and Unix file systems
• Visit the website of a UPS vendor to get
additional information on UPS
– APC
– PC Power and Cooling
• Tape backup
– http://www.pctechguide.com/15tape.htm#QIC
Firewall and Protocols
Software Firewall Settings
• ICMP etc.
• Check Zone Alarm Pro
END OF MODULE
END OF CHAPTER