Transcript Slide 1

Virtualization Trends,
Challenges and Solutions
Naresh Sehgal, Ph.D., MBA
Lead SW Architect
Enterprise Platforms and Services Division
Intel Corp, Bangalore
Email: [email protected]
Convergence 08
Robert X. Cringely on Computers…
"If the automobile had followed the same development cycle as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside."
Hardware Virtual Machines (VMs)
[Diagram: without VMs, applications run on a single operating system that owns the physical host hardware (processors, memory, graphics, network, storage, keyboard/mouse). With VMs, a new layer of software, the VM Monitor (VMM), runs on the hardware and hosts multiple VMs (VM0, VM1, ...), each with its own guest OS and applications.]
• Without VMs: a single OS owns all hardware resources
• With VMs: multiple OSes share the hardware resources
Virtualization enables multiple operating systems to run on the same platform
How long has virtualization been around?
1. Recent development: ~5 years
2. A while: 10 years
3. Older than Microsoft: 30 years
4. A lot longer…..>40 years
Would you believe ~45 - 50 years?
Virtualization: A Timeline
[Timeline, 1950 to today:]
• Strachey: "Time Sharing in Large Fast Computers" (late 1950s)
• IBM & MIT: Compatible Time Sharing System (early 1960s)
• MIT: Project MAC (1960s)
• IBM: M44/44X Project (1960s)
• Goldberg: "Survey of Virtual Machines Research" (1970s)
• Connectix is founded (late 1980s)
• VMware is founded (late 1990s)
• Open-source Xen is released; Microsoft acquires Connectix (2000s)
• Today: Intel introduces Intel Virtualization Technology
Virtualization Challenges
• Complexity
  – CPU virtualization requires binary translation or paravirtualization
  – Must emulate I/O devices in software
• Functionality
  – Paravirtualization may limit supported guest OSes
  – Guest OSes "see" only a simulated platform and I/O devices
• Reliability and Protection
  – I/O device drivers run as part of the host OS or hypervisor
  – No protection from errant DMA that corrupts memory
• Performance
  – Overheads of address translation in software
  – Extra memory required (e.g., translated code, shadow tables)
Processor Virtualization
Without VT, software-only VMMs combine three techniques:
1. Ring deprivileging: guest OSes such as WinXP and Linux are moved out of ring 0 (apps stay in ring 3; a legacy OS runs in ring 1) so the VMM can own ring 0
2. Binary translation: the VMM rewrites privileged guest code on the fly through a binary translator and a binary-translation cache
3. Para-virtualization: a modified guest OS cooperates with the VMM directly
With VT, guest OSes run at their intended rings: the OS in ring 0 and apps in ring 3.
[Diagram: on processors with VT-x (or VT-i), the VMM runs in VMX root mode and provides memory and I/O virtualization, while guests run standard IA-32 or IPF software. A hardware VM Control Structure (VMCS) holds per-VM configuration, and control transfers between guest and VMM via VM entry and VM exit.]
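The VM entry / VM exit control flow described above can be sketched as a toy simulation. This is purely illustrative: the class and function names are invented, and real VT-x uses hardware instructions and a hardware-defined VMCS layout rather than Python objects.

```python
# Toy model of the VT-x control flow: the VMM runs in root mode, the guest
# runs until a privileged event forces a VM exit, and per-guest state lives
# in a VMCS-like structure. Names are illustrative, not the real VMX API.

class VMCS:
    """Per-guest control structure: saved guest state plus exit info."""
    def __init__(self, name):
        self.name = name
        self.guest_rip = 0          # saved guest instruction pointer
        self.exit_reason = None     # filled in by the 'hardware' on VM exit

def run_guest(vmcs, instructions):
    """VM entry: execute guest instructions until one causes a VM exit."""
    for i, insn in enumerate(instructions[vmcs.guest_rip:], start=vmcs.guest_rip):
        if insn in ("CPUID", "IO", "HLT"):   # exit-causing events (simplified)
            vmcs.guest_rip = i + 1           # resume after this instruction
            vmcs.exit_reason = insn
            return insn                      # VM exit: back to VMM (root mode)
    vmcs.exit_reason = None
    return None                              # guest ran to completion

# VMM main loop: resume the guest, emulate whatever caused the exit, repeat.
vmcs = VMCS("VM0")
guest_code = ["ADD", "CPUID", "MOV", "IO", "RET"]
exits = []
while True:
    reason = run_guest(vmcs, guest_code)
    if reason is None:
        break
    exits.append(reason)       # the VMM would emulate the instruction here

print(exits)  # ['CPUID', 'IO']
```

Each loop iteration mirrors the diagram: VM entry hands control to the guest in non-root mode, and a privileged event transfers control back to the VMM in root mode with the exit reason recorded in the VMCS.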
Intel® Virtualization Technology (VT)
[Diagram: the first VT-based software solutions run multiple OSes and their apps on a Virtual Machine Monitor atop Intel® processors with Virtualization Technology, and others ...]
Core™ 2 microarchitecture based systems:
• First to market with native virtualization support
• Broadest HW and SW ecosystem support
• Significant increase in performance and improved VT performance over all segments
  – Mobile: Intel® Core™2 Duo Mobile Processor for Intel® Centrino® Duo Mobile Technology
  – Desktop: Intel® Core™2 Duo Desktop Processor E6000 sequence
  – Server: Dual-Core Intel® Xeon® Processor 5100 series
Get more done on every server; get more capabilities on every client
Today’s Uses – Servers
Virtualization addresses today’s IT concerns:
• Server Consolidation: VMs (VM1 ... VMn), each with its own OS and apps, are consolidated from separate machines (HW0 ... HWn) onto a single VMM-managed server; 10:1 in many cases
• Test and Development: multiple VMs on one machine enable rapid deployment
Virtualization increases server utilization and simplifies legacy software migration
Emerging Server Usage Models
True "Lights Out" Datacenter
• Dynamic Load Balancing: VMs migrate between hosts to balance utilization with head room (e.g., two hosts at 90% and 30% CPU usage rebalance to roughly 62% and 63%)
• Disaster Recovery: VMs restart on surviving hardware, upholding high levels of business continuity
Intel® Virtualization Technology will play an integral role in the next generation of VMMs
Emerging Business Usage Models
The Professional Business Platform combines:
• Built-in Management
• Proactive Security
• Energy Efficient Performance
built on Intel Platform Software
vPro™ Key Features
• Remote manageability: repair down systems, securely update systems, audit powered-down PCs
• Prevents malicious packets from entering the OS
• Works with management consoles such as HP OpenView
• Supported by over 45 OEMs, ISVs, and IT outsourcers
More details in the IDF vPro™ tracks
Intel® Virtualization and Intel® vPro™ technology
[Diagram: an Intel® architecture platform with VT and AMT runs a lightweight VMM (LWVMM) hosting two partitions. VM0, the User Partition, holds the user OS (Win2K, XP) and applications App0 ... Appn. VM1, the Service Partition, holds a "management" application and a service OS (WinCE or Linux) behind a "firewall"; this stack is owned and managed by the IT department and protected from users.]
• Uses Intel VT to create a separate, independent, hardware-based environment inside the PC
• Service Partition: allows IT administrators to create a dedicated, tamper-resistant service environment, or partition, where tasks can run independently, isolated from the main operating system as well as from the end user
• User Partition: the user's OS and applications
• Help desk or console access is available even when the user partition is "down"
Intel, the Intel logo, and Intel architecture are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Intel® Virtualization Technology Evolution
VMM software evolves over time with hardware support:
• Past: no hardware support; software-only VMMs relied on
  – Binary translation
  – Paravirtualization
• Vector 1, Processor Focus (VT-x, VT-i): establish the foundation for virtualization in the IA-32 and Itanium architectures; simpler and more secure VMMs through a foundation of virtualizable ISAs, followed by on-going evolution of support:
  – Micro-architectural (e.g., lower VM switch times)
  – Architectural (e.g., extended page tables, EPT)
• Vector 2, Platform Focus (VT-d): hardware support for I/O-device virtualization:
  – Device DMA remapping
  – Direct assignment of I/O devices to VMs
  – Device-independent control over DMA
• Vector 3, I/O Focus (PCI-SIG): standards for I/O-device sharing:
  – Multi-context I/O devices
  – Endpoint device translation caching
  – Under definition in the PCI-SIG* IOV workgroup
Increasingly better CPU and I/O virtualization performance and functionality as I/O devices and VMMs exploit the infrastructure provided by VT-x, VT-i, and VT-d
*Other names and brands may be claimed as the property of others
Options for I/O Virtualization
• Hypervisor Model: I/O services and device drivers live in the hypervisor; devices are shared among the guest VMs (VM0 ... VMn)
  – Pro: high performance
  – Pro: I/O device sharing
  – Pro: VM migration
  – Con: large hypervisor
• Service VM Model: I/O services and device drivers run in dedicated Service VMs above the hypervisor; devices are shared
  – Pro: higher security
  – Pro: I/O device sharing
  – Pro: VM migration
  – Con: lower performance
• Pass-through Model: each guest VM runs its own device drivers against devices assigned directly to it
  – Pro: higher performance
  – Pro: rich device features
  – Con: limited sharing
  – Con: VM migration limits
VT Goal: Support all 3 models
VT-d Overview
• VT-d provides infrastructure for I/O virtualization
• Defines an architecture for DMA and interrupt remapping
• Common architecture across IA platforms
• Will be supported broadly across Intel® chipsets
[Diagram: CPUs on the system bus connect to a North Bridge containing the VT-d remapping hardware, DRAM, integrated devices, and PCIe* root ports; PCI Express links lead to endpoint devices, and a South Bridge attaches PCI, LPC, and legacy devices.]
How VT-d Works
• Each VM thinks its memory is zero-based: it sees a Guest Physical Address (GPA) space starting at 0
• Each VM's GPA range is mapped to a different region of system memory, its Host Physical Addresses (HPA)
• VT-d does the address mapping between GPA and HPA
• VT-d catches any DMA attempt to cross a VM memory boundary
[Diagram: zero-based GPA ranges for VM0, VM1, and VM2 mapped into disjoint HPA regions of system memory.]
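The GPA-to-HPA remapping and boundary check described above can be sketched as follows. The region bases and sizes are illustrative, not taken from the slide's diagram.

```python
# Sketch of VT-d-style GPA -> HPA remapping: each VM sees a zero-based
# guest-physical address space, hardware adds the VM's base in host memory,
# and any DMA that would cross the VM's boundary faults instead of landing
# in another VM's memory. Region values are illustrative.

class DmaRemapFault(Exception):
    pass

# host-physical (base, size) per VM -- illustrative values
REGIONS = {
    "VM0": (600, 400),   # GPA 0..399 -> HPA 600..999
    "VM1": (250, 100),   # GPA 0..99  -> HPA 250..349
    "VM2": (100, 100),   # GPA 0..99  -> HPA 100..199
}

def remap_dma(vm, gpa, length):
    """Translate a DMA request (GPA, length) to an HPA, enforcing bounds."""
    base, size = REGIONS[vm]
    if gpa < 0 or gpa + length > size:
        # the DMA would cross this VM's memory boundary -> blocked
        raise DmaRemapFault(f"{vm}: DMA [{gpa}, {gpa + length}) out of bounds")
    return base + gpa

print(remap_dma("VM1", 10, 4))   # 260: inside VM1's region
try:
    remap_dma("VM1", 90, 20)     # would spill past VM1's 100-unit region
except DmaRemapFault as e:
    print("blocked:", e)
```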
DMA Remapping: Hardware Overview
• A DMA request carries a device ID (bus 0–255, device 0–31, function 0–7, e.g., Bus N / Dev P, Func 1), a virtual address, and a length
• The DMA remapping engine, with its context cache and translation cache, consults memory-resident device-assignment structures to locate the per-device address-translation structures (e.g., for devices D1 and D2)
• 4KB page tables then translate the request so the memory access proceeds with a host physical address; invalid requests trigger fault generation
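A minimal sketch of this lookup pipeline, with invented Python structures standing in for the real context tables and page tables:

```python
# Sketch of the slide's lookup pipeline: a DMA request carries a device ID
# (bus/device/function) and a virtual address; the remapping engine uses
# device-assignment ("context") structures to find that device's page
# tables, then walks them 4KB page by 4KB page. All values are illustrative.

PAGE = 4096

# context structures: device ID -> that device's address-translation tables
CONTEXT = {
    (0, 3, 0): {0x0000: 0x8000, 0x1000: 0xA000},   # device D1's page tables
    (5, 7, 1): {0x0000: 0x2000},                   # device D2's page tables
}

def dma_translate(bus, dev, func, virt_addr):
    """Return ('ok', host_physical_address) for one DMA access, or a fault."""
    page_tables = CONTEXT.get((bus, dev, func))
    if page_tables is None:
        return ("fault", "device not assigned")       # fault generation
    page = virt_addr & ~(PAGE - 1)
    offset = virt_addr & (PAGE - 1)
    if page not in page_tables:
        return ("fault", "no mapping for page")
    return ("ok", page_tables[page] | offset)         # HPA for memory access

res = dma_translate(0, 3, 0, 0x1234)
print(res[0], hex(res[1]))              # ok 0xa234
print(dma_translate(9, 9, 9, 0x0))      # ('fault', 'device not assigned')
```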
VT-d Applied to Hypervisor Model
Improved Reliability and Protection
• The hypervisor programs the remap tables
• Errant DMA is detected by hardware and reported to the hypervisor / device driver
Bounce Buffer Support
• Limited DMA addressability in I/O devices limits access to high memory
• A "bounce buffer" is a software technique that copies I/O data through an intermediate, DMA-reachable buffer to get it into high memory
• VT-d eliminates the need for bounce buffers
• The above is equally useful for standard OSes: VT-d does not require a VMM to function
[Diagram: hypervisor model with device drivers and I/O services in the hypervisor and shared devices below. Pro: higher performance; Pro: I/O device sharing; Pro: VM migration; Con: larger hypervisor.]
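The bounce-buffer technique that VT-d makes unnecessary can be sketched like this, modeling physical memory as a byte array. Addresses and limits are illustrative.

```python
# Sketch of a "bounce buffer": a device that can only DMA into low memory
# writes into a low bounce buffer, and software then copies the data up to
# the real buffer in high memory. Memory is modeled as a plain bytearray.

DMA_LIMIT = 1 << 16                 # device can address only the first 64 KiB
memory = bytearray(1 << 20)         # 1 MiB of 'physical' memory
BOUNCE = 0x4000                     # low-memory bounce buffer address

def dma_write(addr, data):
    """The device's DMA: it cannot reach beyond its addressability limit."""
    assert addr + len(data) <= DMA_LIMIT, "device cannot reach this address"
    memory[addr:addr + len(data)] = data

def receive_to_high_memory(dest, data):
    """Driver path without remapping: DMA into the bounce buffer, then copy."""
    dma_write(BOUNCE, data)                                   # device -> low
    memory[dest:dest + len(data)] = memory[BOUNCE:BOUNCE + len(data)]  # low -> high

HIGH_BUF = 0x80000                  # real buffer, above the DMA limit
receive_to_high_memory(HIGH_BUF, b"packet")
print(bytes(memory[HIGH_BUF:HIGH_BUF + 6]))   # b'packet'
```

With VT-d, the remap hardware can instead present the device with a DMA address that maps to the high-memory buffer directly, eliminating the extra copy.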
VT-d Applied to Service VM Model
Device Driver Deprivileging
• Device drivers run above the hypervisor as part of a "Service OS"
• Guest device drivers program devices in a DMA-virtual address space
• The Service VM forwards DMA API calls to the hypervisor
• The hypervisor sets up the DMA-virtual to host-physical translation
Further Improvements in Protection
• A guest device driver cannot compromise hypervisor code or data
[Diagram: Service VM model with I/O services and device drivers in Service VMs above the hypervisor, guest VMs (VM0 ... VMn) alongside, and shared devices below. Pro: high security; Pro: I/O device sharing; Pro: VM migration; Con: lower performance.]
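A sketch of the call forwarding described above; the class and method names are invented for illustration and do not correspond to any real hypervisor API.

```python
# Sketch of driver deprivileging: a guest driver in the Service VM maps a
# buffer through a DMA API, the call is forwarded to the hypervisor, and the
# hypervisor installs the DMA-virtual -> host-physical translation. Only the
# hypervisor ever touches the IOMMU state.

class Hypervisor:
    def __init__(self):
        self.iommu_table = {}       # DMA-virtual addr -> host-physical addr
        self.next_dva = 0x10000

    def map_dma(self, host_phys):
        """Privileged operation: install one translation and hand back a DVA."""
        dva = self.next_dva
        self.next_dva += 0x1000
        self.iommu_table[dva] = host_phys
        return dva

class ServiceVM:
    """Holds the deprivileged device driver; forwards DMA calls down."""
    def __init__(self, hypervisor):
        self.hv = hypervisor

    def driver_map_buffer(self, host_phys):
        return self.hv.map_dma(host_phys)    # conceptually, a hypercall

hv = Hypervisor()
svm = ServiceVM(hv)
dva = svm.driver_map_buffer(0xABCD000)
# the device is programmed with the DMA-virtual address, never the real one
print(hex(dva), hex(hv.iommu_table[dva]))   # 0x10000 0xabcd000
```

Because the driver only ever sees DMA-virtual addresses, a buggy or compromised driver cannot point the device at hypervisor code or data, which is the protection improvement the slide claims.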
VT-d Applied to Pass-through Model
Direct Device Assignment to Guest OS
• The guest OS directly programs the physical device
• For legacy guests, the hypervisor sets up the guest-physical to host-physical DMA mapping
• For remapping-aware guests, the hypervisor is involved in map/unmap of DMA buffers
PCI-SIG I/O Virtualization Working Group
• Activity toward standardizing natively sharable I/O devices
• IOV devices provide virtual interfaces, each independently assignable to VMs
[Diagram: pass-through model with device drivers inside each guest VM (VM0 ... VMn) and devices assigned directly to VMs beneath the hypervisor. Pro: highest performance; Pro: smaller hypervisor; Pro: device-assisted sharing; Con: VM migration limits.]
DMA Remapping: IOTLB Scaling
• Address Translation Services (ATS) extensions to PCIe* enable IOTLB scaling
• An ATS endpoint implements "Device IOTLBs"
• Device IOTLBs can be used to improve performance, e.g.:
  – Cache only static translations (e.g., command buffers)
  – Pre-fetch translations to reduce latency
  – Minimize dependency on root-complex caching
  – Support device-specific demand I/O paging
Address Translation Services (ATS)
ATS translation flow:
1. The endpoint device issues a Translation Request to the root complex
2. The root complex (remap hardware, with its own IOTLB) returns a Translation Response
3. The device caches the translation locally in its Device IOTLB
4. The device can then issue DMA using the translated address
5. Translated DMA from ATS-enabled devices bypasses address translation in the root complex
VT-d supports per-device control of ATS
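The translation flow above can be sketched as a device-side cache that falls back to the root complex on a miss. The table contents and sizes are illustrative.

```python
# Sketch of the ATS flow: the endpoint keeps a Device IOTLB; on a miss it
# sends a Translation Request to the root complex, caches the Translation
# Response, and later DMAs with the pre-translated address. The remap
# hardware's translation table here is illustrative.

PAGE = 4096
ROOT_COMPLEX_TABLE = {0x0000: 0x7000, 0x1000: 0x9000}   # remap hardware state

class Endpoint:
    def __init__(self):
        self.device_iotlb = {}      # locally cached translations
        self.requests_sent = 0      # Translation Requests issued upstream

    def translate(self, virt):
        page = virt & ~(PAGE - 1)
        if page not in self.device_iotlb:                       # IOTLB miss
            self.requests_sent += 1                             # Translation Request
            self.device_iotlb[page] = ROOT_COMPLEX_TABLE[page]  # Response cached
        return self.device_iotlb[page] | (virt & (PAGE - 1))

dev = Endpoint()
a = dev.translate(0x1010)     # miss: asks the root complex
b = dev.translate(0x1020)     # hit: served from the Device IOTLB
print(hex(a), hex(b), dev.requests_sent)   # 0x9010 0x9020 1
```

The second access to the same page generates no upstream traffic, which is exactly the scaling benefit the slide attributes to Device IOTLBs.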
Invalidation Architecture
• Invalidation enforces consistency of caches
  – Required when software updates translation structures
• Invalidation primitives
  – Global, domain-selective, and page-range invalidations
  – Support for Device-IOTLB invalidation (through ATS)
• Invalidation software interfaces
  – Synchronous interface through MMIO registers
  – Queued interface through an invalidation queue
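The two software interfaces can be sketched as follows. The descriptor shapes and domain names are illustrative, not the real VT-d descriptor formats.

```python
# Sketch of the two invalidation interfaces: a synchronous call that takes
# effect immediately (MMIO-register style), and a queued interface where
# software posts invalidation descriptors and the hardware drains them.

from collections import deque

class RemapHardware:
    def __init__(self):
        # IOTLB entries keyed by (domain, page) -- illustrative contents
        self.iotlb = {("dom0", 0x1000): 0x8000, ("dom1", 0x1000): 0x9000}
        self.queue = deque()            # the invalidation queue

    # -- synchronous interface (register style) --
    def invalidate_now(self, kind, domain=None):
        self._apply(kind, domain)

    # -- queued interface: software posts, hardware drains --
    def post(self, kind, domain=None):
        self.queue.append((kind, domain))

    def drain(self):
        while self.queue:
            self._apply(*self.queue.popleft())

    def _apply(self, kind, domain):
        if kind == "global":
            self.iotlb.clear()
        elif kind == "domain":          # domain-selective invalidation
            for key in [k for k in self.iotlb if k[0] == domain]:
                del self.iotlb[key]

hw = RemapHardware()
hw.post("domain", "dom0")       # queued: takes effect only on drain
hw.drain()
print(sorted(hw.iotlb))         # [('dom1', 4096)]
```

The queued form lets software batch many invalidations and continue working while the hardware drains them, instead of stalling on each register write.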
ATS Invalidations
ATS invalidation flow:
1. The root complex issues an invalidation request to the device
2. The device invalidates the specified mappings from its Device IOTLB
3. The device issues an invalidation response
Invalidation details:
• Each invalidation request contains a unique Invalidation Tag
• Invalidation responses may be coalesced
Mapping to VMM Software Challenges
[Diagram: virtual machines VM0 ... VMn (each with apps and an OS) run on the VMM (a.k.a. hypervisor), which runs on the physical platform resources: processors CPU0 ... CPUn, memory, storage, network, and I/O devices.]
The VMM's software layers map to hardware support as follows:
• Higher-level VMM functions: resource discovery / provisioning / scheduling / user interface
• Processor virtualization (ring deprivileging, binary translation, virtual CPU configuration): supported by VT-x and VT-x2
• Memory virtualization (page-table shadowing, EPT configuration)
• I/O device virtualization (I/O device emulation, sharing, DMA and interrupt remapping configuration): supported by VT-d, VT-d2, the PCI-SIG IOV work, and VMDq
®
Example 6: Virtualization overhead on Intel experimental client
VMM* (vs. Native OS)
PCMark Performance Indicator
97.88%
100.00%
99.67%
93.90%
85.69%
83.44%
90.00%
80.00%
70.00%
60.00%
50.00%
40.00%
30.00%
20.00%
10.00%
0.00%
System
CPU
Memory
Graphic
HDD
• Relatively low Virtualization overheads for client benchmark
•Targeting <10% overhead with improved SW techniques
• Further VMM SW optimization and Next generation VT
features to reduce virtualization overheads
* Pre beta version
Source: Intel Corporation Projections and technical specifications are based on internal analysis and
subject to change
Summary: A better IA platform
First to market & massive ecosystem support:
• Choice: broadest virtualization software support in the industry
• Robust: first x86 hardware-assisted virtualization technology (Intel VT)
• Innovation: a common specification means enhanced virtualization on x86 and will set the standard
• Flexibility: leverage the widely deployed infrastructure of Intel® Xeon® processor-based servers for advanced failover and dynamic load balancing
Better platform reliability:
• Critical for running more applications on the same server
• More reliability features
• Proven platform architecture: almost 40X more IA-based servers shipped than AMD-based since 1996¹
Performance headroom ("choose the right basket"):
• Intel® Xeon® processors have key performance features for virtualization: dual-core, Hyper-Threading, I/O, memory, and larger caches
1 – Source: Q4'05 IDC Server Tracker, 1996–2005 total systems shipped
Whitepaper on virtualization benefits: http://www.intel.com/business/bss/products/server/virtualization_wp.pdf
Backup
Q&A
Example 1: SysBench Running with VMware*'s ESX Server*
[Chart: SysBench normalized results comparing a Dual-Core AMD Opteron 285-based server against a Dual-Core Intel Xeon processor 5160-based server with 1, 2, and 4 virtual machines.]
Figure 1: Normalized SysBench results for the two test servers in the one-, two-, and four-virtual-machine environments. Higher numbers are better.
*Source: Principled Technologies (PT) performance report
http://www.principledtechnologies.com/clients/reports/Intel/VMSysBench0706.pdf
System configuration in backup foils
Example 2: SPECjbb Running with VMware*'s ESX Server**
[Chart: SPECjbb2005 normalized results comparing a Dual-Core AMD Opteron 285-based server against a Dual-Core Intel® Xeon® Processor 5160-based server with 1, 2, and 4 virtual machines.]
Figure 2: Normalized SPECjbb2005 results for the two test servers in the one-, two-, and four-virtual-machine environments. Higher numbers are better.
**Source: Principled Technologies (PT) performance report comparing the Dual-Core AMD Opteron 285 with the Dual-Core Intel® Xeon® Processor 5160
Example 3: Microsoft* Virtual Server*
Java performance with 4 VMs: BEA WebLogic JRockit® JVM on Microsoft* Virtual Server
[Chart, normalized scores: 1 for the HP DL385 with 2× AMD Opteron 2.6GHz versus 1.53 for the SuperMicro SDP with 2× Dual-Core Intel® Xeon® Processor 3.0GHz: up to a 53% gain.]
• VMM: Microsoft* Virtual Server* 2005 R2 SP1, Java JFT workload
• Guest OS: Windows 2003* Enterprise Edition R2 (32-bit)
• Benchmark: BEA WebLogic JRockit® JVM (build R26.0.0-188-528751.5.0_04-2005110-0920-linuxx86_64)
• Systems: HP DL385, 2× AMD Opteron 2.6GHz, 2x1MB; SuperMicro SDP, 2× Dual-Core Intel® Xeon® Processor 3.0GHz, 16x1GB
Source: Intel Corporation. Projections and technical specifications are based on internal analysis and subject to change
*Other names and brands may be claimed as property of others. System configuration details in backup. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/limits.htm or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
Example 4: Energy Efficient Performance
Performance/Watt: BEA WebLogic JRockit® JVM on Microsoft* Virtual Server
[Chart, normalized performance/watt: 1 for the HP DL385 with 2× AMD Opteron 2.6GHz versus 1.6 for the SuperMicro SDP with 2× Dual-Core Intel® Xeon® Processor 3.0GHz.]
Intel® Core™2 Duo based systems provide Energy Efficient Performance (EEP) leadership in a virtualized environment
Source: Intel Corporation. Projections and technical specifications are based on internal analysis and subject to change
Example 5: SpecJBB 2005 on Microsoft* Virtual Server*
SpecJBB 2005, Microsoft Virtual Server R2 SP1 (RH32 guests)
[Chart, Bops: Opteron 1P RH32 guest (SW): 8,408; Intel® Xeon® 5100 series (SW): 13,938; Intel® Xeon® 5100 series (VT): 16,404.]
Configuration:
• Host OS: Microsoft* Server 2003 x64 Enterprise Edition SP1 RTM
• VMM: Microsoft* Virtual Server* R2 Beta SP1 ver. 1.1.512.0 EE, drop B1036 vmm.sys, with Microsoft* Virtual Machine Windows* Guest Editions ver. 13.705
• Guest OS: RedHat V9 2.4.20-8 kernel (32-bit)
• Workload: SpecJBB 2005
Results:
• Intel® Xeon® software-virtualized guest performance is 1.66x that of Opteron
• Intel® Xeon® VT performance is 1.18x that of software-only (no VT) Intel® Xeon®
• Intel® Xeon® VT performance is 1.95x that of Opteron SW (no Pacifica)
Source: Intel Corporation. Projections and technical specifications are based on internal analysis and subject to change