Volume Platform for Availability Inside a “Cluster in a Box” RDMA Configurations and Building Blocks.
Volume Platform for Availability
Inside a “Cluster in a Box”
RDMA
Configurations and Building Blocks
2+ years of research
6,000+ customers
Customer Focused Design (CFD) & Areas of Investigation
• 6,000+ Voice of the Customer Statements
• 900+ Customer Prioritized Buckets
• 200 Customer Focused Design Sessions
• 22 Areas of Investigation
Continuous availability of the OS, applications, and data was ranked by customers worldwide (US, Germany and Japan) as a must-have feature
Continuously available, transparent failover
NEW SYSTEMS EXPAND RANGE OF COST AND FEATURE/PERFORMANCE SLAs
[Figure: system classes plotted by SLA against IT expertise and cost ($$). System classes shown: Single node servers, Cluster in a Box, Scale out Server Cluster, Storage Arrays.]
Cluster in a Box
• HA/CA
• Simple OOBE
• Spaces configurations
• HW RAID capability
• SSD perf capability
Other feature labels in the figure
• Additional scale out features
• Extended scale out
• Advanced replication
• Advanced power management
• Advanced performance options
• …
Storage and protocol labels in the figure
• SAS/SATA
• JBOD storage
• JBOD storage expansion (optional)
• iSCSI/FC
• SMB/NFS
Target market for CA HW solutions
• SMB
• Branch
• Private Cloud
IT expertise, $$
• IT Generalist or remote only: PB 1 - 3
• IT Specialists: PB 4+
“Business in a Box” Hyper-V Appliance
IT admin: Part-time, generalist, IT may not be primary job function, may get assistance from VAR
Infrastructure: Office environment equipment room, ISP network connection
Workload: Hyper-V appliance supporting variable applications, e.g., point of sale, inventory, documents/records
Examples: Doctor’s office, individual retail store, lawyer’s office
“Branch in a Box” Hyper-V Appliance
IT admin: Centralized at HQ, primarily remote support, local assistance as requested
Infrastructure: Office environment equipment room, domain-connected to main office
Workload: Hyper-V appliance supporting replicated workloads per branch, with LOB applications, file server with cached data
Examples: Retail chain stores, bank branches
Cloud/Datacenter High Performance Storage Server
IT admin: Specialists, onsite and remote, but highly leveraged
Infrastructure: Enterprise equipment room, switched network
Workload: High-performance NAS server supporting variable workloads, hosted LOB apps and data
Examples: Cloud solution builders, Enterprise datacenters
[Figure: Cluster in a Box hardware architecture. A Server Enclosure holds Server A and Server B, joined by a 1/10G Ethernet cluster connect. Each server has a CPU with an x8 PCIe link to a 10GbE or InfiniBand network and an x8 PCIe link to a Storage Controller. The storage controllers connect by x4 SAS through the midplane to SAS Expanders serving the A ports and B ports of drives 0, 1, … 23. x4 SAS links lead to an External JBOD with its own SAS Expanders, A ports and B ports for drives 0, 1, … 23. Additional JBODs … can follow.]
| | HP X5000 | WiWynn (ODM) | Quanta |
|---|---|---|---|
| Windows Server Support | Released on Windows Storage Server 2008 R2; Prototype on Windows Server 2012 | Designed for Windows Server 2012 | Designed for Windows Server 2012 |
| Market | Midrange NAS Server | NAS server, Hyper-V appliance, SMB or Branch; Private cloud performance NAS server | NAS server, Hyper-V appliance, SMB or Branch; Private cloud performance NAS server |
| Size | 3U rack | 3U rack | 2U rack |
| Disks | 36 x 2.5”; 16 x 3.5” | 24 x 2.5” | 12 x 3.5” |
| Blades | 2 | 2 | 2 |
| CPU | 2x Intel | 2x SB EP | 2x SB EP |
| Memory (per blade) | Varies by SKU | 12 DDR3 DIMMs | 16 DDR3 DIMMs |
| PCIe expansion slots (per blade) | 1 x4, Gen 2 | 2 x16, Gen 3; 1 FH/HL, 1 HH/HL | 1 x8, LP, Gen 3 |
| Storage Controller | HP Cascade | SAS Controller (Spaces); LSI High Availability MegaRAID | SAS Controller (Spaces); LSI High Availability MegaRAID |
| External SAS (per blade) | 4 x4 SAS for system | 1 x4 SAS | 1 x4 SAS |
| External network (per blade) | 2x HP Flex-10 LOM; 4x 1GbE mezz card | 4x 1GbE LOM | 2x RDMA-capable 10GbE or IB |
| Management Controller | HP iLO (integrated Lights-Out) | Emulex Pilot 3 iBMC | BMC |
11/7/2015
Cluster in a Box prototypes
Quanta
WiWynn
LSI HA-DAS MegaRAID® and SAS controllers
Quanta application servers, JBOD expansion, and 10GbE switch
Mellanox IB FDR NICs and switch
OCZ SAS SSDs
Infrastructure
Domain Controller server
Power distribution unit
1GbE switch
Keyboard & monitor
MegaRAID® is a registered trademark of LSI Corporation
[Figure: prototype rack with labeled components.]
Mellanox FDR IB switches and controllers
Quanta Application Servers
Quanta CiB
LSI HA-DAS MegaRAID controllers
OCZ SAS SSDs
Switches
Quanta 10GbE switch
1GbE switch
Quanta Application Servers
WiWynn CiB
LSI SAS controllers
SAS HDDs
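[Figure: SMB over RDMA. On the File Client, an Application in user mode calls the SMB Client in kernel mode, which uses a network with RDMA support through an R-NIC. On the File Server, the SMB Server uses a network with RDMA support through an R-NIC and reaches the Disk through NTFS and SCSI.]
[Figure: SMB Direct stack. Client memory and File Server memory are connected by RDMA. The SMB Client and the SMB Server each sit on SMB Direct, NDKPI and an RDMA NIC, linked by Ethernet or InfiniBand.]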
| Type (Cards*) | Pros | Cons |
|---|---|---|
| Non-RDMA Ethernet (wide variety of NICs) | TCP/IP-based protocol; Works with any Ethernet switch; Wide variety of vendors and models; Support for in-box NIC teaming (LBFO) | Currently limited to 10Gbps per NIC port; High CPU Utilization under load; High latency |
| iWARP (Intel NE020*, Chelsio T4) | TCP/IP-based protocol; Works with any 10GbE switch; RDMA traffic routable | Currently limited to 10Gbps per NIC port* |
| RoCE (Mellanox ConnectX-2, Mellanox ConnectX-3*) | Ethernet-based protocol; Works with high-end 10GbE/40GbE switches; Offers up to 40Gbps per NIC port today* | RDMA traffic not routable via existing IP infrastructure; Requires DCB switch with Priority Flow Control (PFC) |
| InfiniBand (Mellanox ConnectX-2, Mellanox ConnectX-3*) | Offers up to 54Gbps per NIC port today*; Switches typically less expensive per port than 10GbE switches*; Switches offer 10GbE or 40GbE uplinks; Commonly used in HPC environments | Not an Ethernet-based protocol; RDMA traffic not routable via existing IP infrastructure; Requires InfiniBand switches; Requires a subnet manager (on the switch or the host) |

Pros shared by the RDMA types (iWARP, RoCE, InfiniBand): Low CPU Utilization under load; Low latency
* This is current as of the release of Windows Server 2012 RC. Information on this slide is subject to change as technologies evolve and new cards become available.
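The trade-offs in the comparison above can be encoded as a small lookup for shortlisting a transport. This is a minimal sketch, not a sizing tool: the `Transport` class, the `candidates` function and the field names are hypothetical, the per-port speeds are the figures quoted on the slide, and treating non-RDMA Ethernet as IP-routable is an inference from its being TCP/IP-based.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transport:
    name: str
    rdma: bool
    max_gbps_per_port: int  # per-port figure quoted on the slide
    ip_routable: bool       # can traffic cross existing IP infrastructure?
    switch_requirement: str


# Values transcribed from the comparison; ip_routable for non-RDMA
# Ethernet is an assumption based on it being plain TCP/IP.
TRANSPORTS = [
    Transport("Non-RDMA Ethernet", False, 10, True, "any Ethernet switch"),
    Transport("iWARP", True, 10, True, "any 10GbE switch"),
    Transport("RoCE", True, 40, False, "DCB switch with Priority Flow Control"),
    Transport("InfiniBand", True, 54, False, "InfiniBand switch and subnet manager"),
]


def candidates(need_rdma: bool, need_routable: bool, min_gbps: int) -> list:
    """Return the names of transports that satisfy all three constraints."""
    return [
        t.name
        for t in TRANSPORTS
        if (t.rdma or not need_rdma)
        and (t.ip_routable or not need_routable)
        and t.max_gbps_per_port >= min_gbps
    ]


if __name__ == "__main__":
    print(candidates(need_rdma=True, need_routable=True, min_gbps=10))
    print(candidates(need_rdma=True, need_routable=False, min_gbps=40))
```

Asking for routable RDMA leaves only iWARP, while asking for 40Gbps or more per port leaves RoCE and InfiniBand, which mirrors the pros and cons listed on the slide.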
[Figure: SMB Direct data path between Client and File Server. On each side the stack runs from user mode (Application) to kernel mode (SMB Client or SMB Server), then either through TCP/IP and a NIC, or through SMB Direct, NDKPI and an RDMA NIC, over Ethernet and/or InfiniBand. Client memory and File Server memory are connected by RDMA. The application uses an unchanged API. Callouts 1 to 4 mark the points below.]
1. Application (Hyper-V, SQL Server) does not need to change.
2. SMB client makes the decision to use SMB Direct at run time.
3. NDKPI provides a much thinner layer than TCP/IP.
4. Remote Direct Memory Access performed by the network interfaces.
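Points 1 and 2 above can be pictured with a toy model: the application always makes the same call, and the transport is picked underneath it at run time. This is only an illustrative sketch. `Nic`, `choose_transport` and `read_file` are hypothetical names, and the single capability check stands in for the real SMB negotiation, which the slides do not describe.

```python
from dataclasses import dataclass


@dataclass
class Nic:
    name: str
    rdma_capable: bool


def choose_transport(client_nic: Nic, server_nic: Nic) -> str:
    """Pick a transport at run time (toy rule: RDMA needs both ends capable)."""
    if client_nic.rdma_capable and server_nic.rdma_capable:
        return "SMB Direct"
    return "TCP/IP"


def read_file(path: str, client_nic: Nic, server_nic: Nic) -> str:
    """The application-facing call is identical whichever transport is chosen."""
    transport = choose_transport(client_nic, server_nic)
    return f"read {path} via {transport}"


if __name__ == "__main__":
    rnic = Nic("rdma-nic", rdma_capable=True)
    plain = Nic("plain-nic", rdma_capable=False)
    print(read_file(r"\\server\share\vm.vhd", rnic, rnic))
    print(read_file(r"\\server\share\vm.vhd", rnic, plain))
```

The caller of `read_file` never names a transport, which is the sense in which the application (Hyper-V, SQL Server) does not need to change.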
Mellanox provides end-to-end InfiniBand and Ethernet connectivity solutions (adapters, switches, cables)
Connecting data center servers and storage
Up to 56Gb/s InfiniBand and 40Gb/s Ethernet per port
Low latency, Low CPU overhead, RDMA
InfiniBand to Ethernet Gateways for seamless operation
Windows Server 2012 exposes the great value of InfiniBand for storage traffic, virtualization and low latency
InfiniBand and Ethernet (with RoCE) integration
Highest Efficiency, Performance and return on investment
For more information, contact:
Gilad Shainer, [email protected], [email protected]
Intel 10GbE iWARP Adapter For Server Clusters
NE020
In production today
Supports Microsoft’s MPI via ND in Windows Server 2008 R2 and beyond
See Intel’s Download site (http://downloadcenter.intel.com) for drivers (search “NE020”)
Drivers inbox since Beta for Windows Server 2012
Supports Microsoft’s SMB Direct via NDK
Uses the IETF’s iWARP RDMA technology that is built on top of IP
The only WAN-routable, “cloud-ready” RDMA technology
Uses standard Ethernet switches
Beta drivers available from Intel’s Download site (http://downloadcenter.intel.com) (search “NE020”)
For more information:
[email protected]
Contact: [email protected]
Download site: http://service.chelsio.com
http://www.chelsio.com/wp-content/uploads/2011/07/ProductSelector-0312.pdf
[Figure: four test configurations, each running an IO Micro Benchmark against FusionIO storage. Single Server: the benchmark runs locally on the server. 10GbE: an SMB Client runs the benchmark against an SMB Server over 10GbE. IB QDR: SMB Client to SMB Server over InfiniBand QDR. IB FDR: SMB Client to SMB Server over InfiniBand FDR.]
| Configuration | BW (MB/sec) | IOPS (8KB IOs/sec) | %CPU Privileged |
|---|---|---|---|
| Non-RDMA (Ethernet, 10Gbps) | 571 | 73,160 | ~21.0 |
| RDMA (InfiniBand QDR, 32Gbps) | 2,620 | 335,446 | ~85.9 |
| RDMA (InfiniBand FDR, 54Gbps) | 2,683 | 343,388 | ~84.7 |
| Local | 4,103 | 525,225 | ~90.4 |

| Configuration | BW (MB/sec) | IOPS (512KB IOs/sec) | %CPU Privileged |
|---|---|---|---|
| Non-RDMA (Ethernet, 10Gbps) | 1,129 | 2,259 | ~9.8 |
| RDMA (InfiniBand QDR, 32Gbps) | 3,754 | 7,508 | ~3.5 |
| RDMA (InfiniBand FDR, 54Gbps) | 5,792 | 11,565 | ~4.8 |
| Local | 5,808 | 11,616 | ~6.6 |
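The BW and IOPS columns in these results are tied together by the IO size: bandwidth is approximately IOPS multiplied by the size of each IO. The sketch below recomputes bandwidth from the reported IOPS as a sanity check. The function names are hypothetical, the numbers are the ones reported on the slide, and the assumption that 1 MB means 1024 KB is mine.

```python
KB_PER_MB = 1024  # assumption: the slide's MB is 1024 KB

# (configuration, IO size in KB, reported IOPS, reported MB/sec)
REPORTED = [
    ("Non-RDMA 10GbE", 8, 73_160, 571),
    ("RDMA IB QDR", 8, 335_446, 2_620),
    ("RDMA IB FDR", 8, 343_388, 2_683),
    ("Local", 8, 525_225, 4_103),
    ("Non-RDMA 10GbE", 512, 2_259, 1_129),
    ("RDMA IB QDR", 512, 7_508, 3_754),
    ("RDMA IB FDR", 512, 11_565, 5_792),
    ("Local", 512, 11_616, 5_808),
]


def implied_bandwidth(iops: int, io_size_kb: int) -> float:
    """Bandwidth in MB/sec implied by an IOPS figure at a given IO size."""
    return iops * io_size_kb / KB_PER_MB


def relative_gap(iops: int, io_size_kb: int, reported_mb: float) -> float:
    """Fractional difference between implied and reported bandwidth."""
    return abs(implied_bandwidth(iops, io_size_kb) - reported_mb) / reported_mb


if __name__ == "__main__":
    for name, size, iops, bw in REPORTED:
        implied = implied_bandwidth(iops, size)
        print(f"{name:>15} {size:>3}KB: implied {implied:8.1f} MB/sec, reported {bw}")
```

Under that assumption every reported row agrees with its implied bandwidth to within one percent, which is a useful check when reading the two columns side by side.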
[Figure: three SQLIO test configurations. Single Server: SQLIO runs locally on the server. File Client (SMB 3.0): SQLIO runs on a client connected through RDMA NICs to a File Server (SMB 3.0). Hyper-V (SMB 3.0): SQLIO runs in a VM on a Hyper-V host connected through RDMA NICs to a File Server (SMB 3.0). In each configuration the storage server's RAID Controllers attach by SAS to JBODs filled with SSDs.]
| Configuration | BW (MB/sec) | IOPS (512KB IOs/sec) | %CPU Privileged | Latency (milliseconds) |
|---|---|---|---|---|
| Local | 10,090 | 38,492 | ~2.5% | ~3 ms |
| Remote | 9,852 | 37,584 | ~5.1% | ~3 ms |
| Remote VM | 10,367 | 39,548 | ~4.6% | ~3 ms |
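One way to read these results is to express each remote configuration as a percentage of the local one. The short sketch below does that for bandwidth, using the figures reported on the slide; the `RESULTS` table layout and the `percent_of_local` helper are hypothetical names introduced for illustration.

```python
# (BW in MB/sec, IOPS) as reported on the slide.
RESULTS = {
    "Local": (10_090, 38_492),
    "Remote": (9_852, 37_584),
    "Remote VM": (10_367, 39_548),
}


def percent_of_local(config: str, results: dict = RESULTS) -> float:
    """Bandwidth of `config` as a percentage of the Local bandwidth."""
    local_bw, _ = results["Local"]
    bw, _ = results[config]
    return 100.0 * bw / local_bw


if __name__ == "__main__":
    for config in ("Remote", "Remote VM"):
        print(f"{config}: {percent_of_local(config):.1f}% of local bandwidth")
```

On these figures the remote file server delivers roughly 97.6% of local bandwidth and the remote VM roughly 102.7%.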
SCALING STORAGE CAPACITY AND CONNECTIVITY
Scale vertically
• Add internal storage
• Add JBOD expansion
• Add RDMA NICs
Scale horizontally
• Expand capacity with additional clusters
• Live migration of Hyper-V storage
SCALING GUEST VM CAPACITY
Scale vertically
• Add CPU/memory
• Add internal storage
• Add JBOD expansion
Scale horizontally
• Expand capacity with additional clusters
• Live migration of Hyper-V guests and Hyper-V storage
Disaster recovery
• Replicate Hyper-V guests
WSV334: Windows Server 2012 File and Storage Services Management
WSV322: Update Management in Windows Server 2012: Revealing Cluster-Aware Updating
WSV303: Windows Server 2012 High-Performance, Highly-Available Storage Using SMB
WSV330: How to increase SQL availability and performance using Windows Server 2012 SMB 2.2 solutions
WSV410: Continuously Available File Server – Under the Hood