FT server + cluster

Fault-Tolerant General Purpose Servers
Express5800/ft series servers
Product Information
Express5800/ft Series Servers
High Availability Technologies
Approaches to Reliability and Availability
▐ Select and combine hardware and software technologies for availability
[Diagram: two directions of improvement, "Higher availability of a single server" and "Higher availability of the system"]
Single server (typical servers)
Partially redundant hardware (e.g. HDD, PSU)
Fault tolerant server: enhances fault tolerance of the hardware
Redundant hardware (dual modular architecture)
•Continuous operation despite hardware failures
•Simplified installation and operation
Cluster: cluster software enhances availability of the system
Failover across multiple servers
•Enhanced HW/SW failure resilience
•For large-scale systems with scalable nodes, etc.
FT server + cluster: FT servers combined into a cluster
Select the best availability solution according to system requirements
© NEC Corporation 2013
FT Server and Cluster Solution Comparison
Columns: Fault tolerant server / Cluster system
Aim
  Fault tolerant server: High availability of a single server
  Cluster system: Availability, scalability, and load balancing
Technology
  Fault tolerant server: Lockstep (CPU and memory) and failover (I/O); modules are synchronized in normal conditions
  Cluster system: Failover and load balancing
Failover process
  Fault tolerant server: Isolate the faulty component (CPU, memory, or HDD)
  Cluster system: Fail over to other servers
Service during failure
  Fault tolerant server: Continuous operation (no interruption)
  Cluster system: Operation is interrupted for the failover process (from a few minutes to 10 minutes)
Resilience
  Fault tolerant server: Hardware failures
  Cluster system: Hardware and software failures
Performance enhancement
  Fault tolerant server: Add CPU
  Cluster system: Add CPU or node. Supports servers with 4 or more sockets
Supported apps
  Fault tolerant server: General applications; no modifications needed
  Cluster system: Failover settings are required for each app (creation of script batch files)
Summary
  Fault tolerant server:
  •System configuration requires no app modifications
  •Continuous operation without interruption
  •Ideal for 24-7 systems, email and Web servers
  Cluster system:
  •Features load balancing as well as availability
  •Software failure-resilient
  •Suitable for large-scale systems (scalable nodes)
ft servers provide hardware availability and can be installed quickly and easily
The ft server + EXPRESSCLUSTER solution combines the advantages of both approaches
Recovery Process from HW Failures
Express5800/ft series server: continuous operation (non-stop service)
[Timeline: Module #0 and Module #1 process in lockstep; the service stays "In service" throughout]
Failure
1. Instantaneous isolation of the faulty module (the remaining module keeps processing)
Replacement of the faulty module
2. Resynchronization after replacement
Recovery complete (lockstep processing resumes)
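The recovery sequence above (lockstep, isolation, replacement, resynchronization) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not NEC's implementation: the `Module` and `LockstepPair` classes, the integer `state`, and the `replace_and_resync` method are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """One half of a hypothetical dual-modular server."""
    name: str
    healthy: bool = True
    state: int = 0  # stands in for CPU/memory state

    def step(self, value: int) -> int:
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        self.state += value
        return self.state


class LockstepPair:
    """Feeds the same input to both modules and isolates a module that fails."""

    def __init__(self) -> None:
        self.modules = [Module("Module #0"), Module("Module #1")]
        self.isolated = []

    def process(self, value: int) -> int:
        results = []
        for module in list(self.modules):
            try:
                results.append(module.step(value))
            except RuntimeError:
                # Isolate the faulty module; the other one keeps serving.
                self.modules.remove(module)
                self.isolated.append(module)
        if not results:
            raise RuntimeError("both modules failed; service is down")
        return results[0]

    def replace_and_resync(self) -> None:
        """Swap in fresh modules and copy state from the surviving module."""
        survivor = self.modules[0]
        for old in self.isolated:
            self.modules.append(Module(old.name, state=survivor.state))
        self.isolated.clear()
        self.modules.sort(key=lambda m: m.name)


pair = LockstepPair()
pair.process(5)                   # both modules in lockstep
pair.modules[0].healthy = False   # simulate a hardware failure
print(pair.process(3))            # still served by Module #1
pair.replace_and_resync()         # replacement module copies the state
print(len(pair.modules))          # two modules in lockstep again
```

The point of the sketch is that the caller of `process` never sees the failure: the faulty module is dropped inside the call and the result still comes back.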
Cluster system: service intermittence (system down for a few mins to 10 mins)
Failure
Start failover process
1. Interruption (a few secs)
2. Determine failover host (a few secs to 1-2 mins)
3. Takeover of cluster resources (e.g. NW settings and disks) (a few secs to 1 min)
Failover complete
4. Restart apps (a few secs to a few mins)
Restart service (in service again)
Repair / replace the failed server
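As a rough check on the cluster timeline, the per-phase durations can be summed into a total downtime range. The numbers below are assumptions chosen for illustration: "a few secs" is read as 2 to 5 seconds and "a few mins" as at most 5 minutes. The slide itself gives only the qualitative ranges.

```python
# Assumed (min_seconds, max_seconds) for each failover phase.
PHASES = {
    "interruption": (2, 5),
    "determine failover host": (2, 120),
    "takeover of cluster resources": (2, 60),
    "restart apps": (2, 300),
}


def total_downtime(phases):
    """Sum the lower and upper bounds of all phases, in seconds."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


low, high = total_downtime(PHASES)
print(f"{low} s to {high / 60:.1f} min")  # prints "8 s to 8.1 min"
```

Under these assumed values the worst case stays below the 10-minute figure quoted on the slide, and application restart time accounts for most of it.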
Express5800/ft Series Servers
Optional Features to Increase
Fault Tolerance
Express Report Service
Support
• Isolate the failed components to continue operation.
• Monitor hardware status at the service center.
• Support the system proactively to ensure continuous availability.
[Diagram: Express Report Service flow]
• Hardware monitoring & detection: when a CPU, memory, or HDD component fails, it is isolated and the client sees continuous operation.
• Alert notification: only the alert information is sent out, using dedicated software (secure environment), via the internet (mail server) or a public line (modem connection), to NEC (monitoring center).
• Notification: the NEC monitoring center notifies the NEC Service Center.
• Recovery: the failed component is replaced.
Support for Redundant Peripheral Devices
Peripheral Devices
▐ Selection of LTO or DAT and support for redundant backup*
◆ Double backup configuration is supported to provide for failures during backup
◆ LTO or DAT drives are offered for selection
[Diagram: ft series server; Module #1 and Module #2 each connect through their own SAS controller to their own backup device]
Data is output from each module to achieve backup redundancy
Both backups are created almost simultaneously
* Configuration of standalone backup is also supported
▐ A two-UPS configuration provides tolerance against UPS defects*
[Diagram: ft series server; the PSU of Module #1 and the PSU of Module #2 are each connected to their own uninterruptible power supply (UPS)]
Connecting each UPS to a separate power source helps avoid being affected by failures of the power sources
UPS is controlled through the network
* Single UPS configuration is also supported.
ft series + EXPRESSCLUSTER for Higher Availability
Enhancement SW
▐ Clusters with ft servers enhance both HW and SW availability
[Diagram: ft server (primary) and ft server (secondary), each an ft series server with Module #0 and Module #1 running an OS and apps]
Hardware failure: handled within the ft series server
Software failure: EXPRESSCLUSTER monitors SW and fails over to the secondary server
Highest level of availability suitable for critical systems
Benefits of ft Series + EXPRESSCLUSTER
Enhancement SW
▐ Clusters using ft servers deliver the benefits of both solutions
Columns: Express5800/ft server / Cluster system (configured by normal servers) / Cluster system (configured by ft servers)
Function
  Express5800/ft server: Lockstep and Failover (within a server)
  Cluster (normal servers): Failover (between multiple servers)
  Cluster (ft servers): Failover (between multiple servers)
HW failure tolerance
  Express5800/ft server: ★★★
    Treatment: Isolate faulty module (within the server)
    Treatment time: Instantaneous
  Cluster (normal servers): ★★☆
    Treatment: Failover from the primary server to the secondary server
    Treatment time: Few minutes (depends on the time necessary to start up apps)
  Cluster (ft servers): ★★★
    Treatment: Isolate faulty module within the primary server (no failover between nodes)
    Treatment time: Instantaneous
SW failure tolerance
  Express5800/ft server: -
    Treatment: (Apps level failures can be resolved by SingleServerSafe software)
    Treatment time: -
  Cluster (normal servers): ★★☆
    Treatment: Failover from the primary server to the secondary server
    Treatment time: Several minutes (depends on the time necessary to start up apps)
  Cluster (ft servers): ★★☆
    Treatment: Failover from the primary server to the secondary server
    Treatment time: Several minutes (depends on the time necessary to start up apps)
Periodical maintenance (SW update)
  Express5800/ft server: ★★☆ Active Upgrade enables OS patches to be applied with only short interruption
  Cluster (normal servers): ★★★ Each node can be separated for upgrade
  Cluster (ft servers): ★★★ Each node can be separated for upgrade
Performance enhancement
  Express5800/ft server: ★★☆ Add CPU
  Cluster (normal servers): ★★★ Add CPU or Nodes
  Cluster (ft servers): ★★☆ Add CPU
Apps settings
  Express5800/ft server: ★★★ General apps can be used without special modifications
  Cluster (normal servers): ★☆☆ Takeover process is required for each app
  Cluster (ft servers): ★☆☆ Takeover process is required for each app
Legend: ★★★: Excellent, ★★☆: Good, ★☆☆: Fair
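The star ratings in this comparison can be encoded as a small lookup to shortlist configurations against minimum requirements. This is an illustrative sketch only: the key names, the `candidates` function, and the encoding of "-" as 0 are assumptions, while the ratings themselves are taken from the comparison (3 = Excellent, 2 = Good, 1 = Fair).

```python
# Star ratings from the comparison; "-" (no rating) is encoded as 0 by assumption.
RATINGS = {
    "ft server": {
        "hw": 3, "sw": 0, "maintenance": 2, "performance": 2, "apps": 3,
    },
    "cluster (normal servers)": {
        "hw": 2, "sw": 2, "maintenance": 3, "performance": 3, "apps": 1,
    },
    "cluster (ft servers)": {
        "hw": 3, "sw": 2, "maintenance": 3, "performance": 2, "apps": 1,
    },
}


def candidates(requirements):
    """Return configurations whose rating meets every minimum in `requirements`."""
    return [
        name
        for name, rating in RATINGS.items()
        if all(rating[key] >= minimum for key, minimum in requirements.items())
    ]


# Excellent HW tolerance plus at least Good SW tolerance:
print(candidates({"hw": 3, "sw": 2}))  # prints ['cluster (ft servers)']
# Apps must run without modification:
print(candidates({"apps": 3}))         # prints ['ft server']
```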
ft server + Hyper-V + EXPRESSCLUSTER
Enhancement SW
▐ Clusters configured on Hyper-V on an ft server
[Diagram: one ft series server (Module #0 and Module #1) running Hyper-V™ 2.0 with two guest OSes, each running apps]
Hardware failure: handled by the ft server
Software failure: EXPRESSCLUSTER monitors SW. In the event of a SW failure, the operation fails over to another guest OS
High HW and SW availability for virtualized environments
EXPRESSCLUSTER X SingleServerSafe
Enhancement SW
▐ SW is monitored on the ft server to automatically restart the SW in the event of
a failure.
◆ SingleServerSafe (SSS) monitors the server and SW status at all times.
◆ In the event of a failure, SSS restarts the service, process, OS, etc. to resume operation.
◆ The ft server and SSS in tandem can handle both HW and SW failures.
[Diagram: SingleServerSafe monitors the apps and the OS; a failed service or process is restarted, and the OS is rebooted]
By enabling failure detection and restart/reboot, SSS helps handle a wide range of failures with a single server
By using the optional monitoring function of EXPRESSCLUSTER, SSS is capable of more detailed monitoring, including the detection of stalled databases.
SW availability can be improved even for a single ft server
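The monitor-and-restart behaviour described for SSS can be sketched as an escalating recovery loop. This is not the SingleServerSafe API or its actual logic: the `recover` function, the `FakeApp` stand-in, and the escalation order (service, then process, then OS reboot) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# (label, action) pairs; the labels and their ordering are illustrative assumptions.
RecoveryStep = Tuple[str, Callable[[], None]]


def recover(is_healthy: Callable[[], bool], steps: List[RecoveryStep]) -> str:
    """Run recovery steps in order until the health check passes.

    Returns "healthy" if nothing was needed, otherwise the label of the
    step that restored health. Raises if every step fails.
    """
    if is_healthy():
        return "healthy"
    for label, action in steps:
        action()
        if is_healthy():
            return label
    raise RuntimeError("all recovery steps failed")


class FakeApp:
    """Stand-in for a monitored application; recovers only after `fixed_by`."""

    def __init__(self, fixed_by: str) -> None:
        self.fixed_by = fixed_by
        self.ok = False

    def is_healthy(self) -> bool:
        return self.ok

    def step(self, label: str) -> RecoveryStep:
        def action() -> None:
            if label == self.fixed_by:
                self.ok = True
        return (label, action)


app = FakeApp(fixed_by="restart process")
steps = [
    app.step("restart service"),
    app.step("restart process"),
    app.step("reboot OS"),
]
print(recover(app.is_healthy, steps))  # prints "restart process"
```

Trying the least disruptive action first keeps the interruption short when a service restart is enough, and reserves the OS reboot for failures that nothing else clears.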