High-Performance Computing With Windows
Ryan Waite
General Program Manager
Windows Server HPC Group
Microsoft Corporation
Outline
Part 1: Overview
Why Microsoft has gotten into HPC
What our V1 product offers
Some future directions
Part 2: Drill-down
A few representative V1 features
(for those who are interested)
Part 1
Overview
Evolving Tools Of The Scientific Process
The scientific method as a cycle: 1. Observation, 2. Hypothesis, 3. Prediction, 4. Validation
Instruments
Experiments done with a telescope by Galilei 400 years ago inaugurated the scientific method
Microscope, laser, x-ray, collider, accelerator allowed peering further and deeper into matter
HPC
Automation and acceleration of the scientific and engineering process itself
Digital instruments, data mining, simulation, experiment steering
The Next Challenge
Taking HPC Mainstream
Volume economics of industry standard
hardware and commercial software
applications are rapidly bringing HPC
capabilities to a broader number of users
But HPC is still only accessible to the few
computational scientists who can master a
domain science, program parallel,
distributed algorithms, and use/manage
a supercomputer
Microsoft HPC Strategy – taking HPC to
the mainstream
Enabling broad HPC adoption and making
HPC into a high volume market in which
everyone can have their own personal
supercomputer
Enabling domain scientists who are not
computer scientists to partake in the HPC
revolution
Evidence Of Standardization And Commoditization
Clusters over 70%
Industry usage rising
GigE is gaining (50% of systems)
x86 is leading (Pentium 41%, EM64T 16%, Opteron 11%)
HPC Market Trends
<$250K – 97% of systems, 55% of revenue
2005 Systems | 2005 Growth
981          | -3%
4,988        | 30%
21,733       | 36%
163,441      | 33%
Source: IDC, 2005
Even The Low End Is Powerful
             | 1991                                  | 1998                                     | 2005
System       | Cray Y-MP C916                        | Sun HPC10000                             | Small Form Factor PCs
Architecture | 16 x Vector, 4GB, Bus                 | 24 x 333MHz UltraSPARC II, 24GB, SBus    | 4 x 2.2GHz Athlon64, 4GB, GigE
OS           | UNICOS                                | Solaris 2.5.1                            | Windows Server 2003 SP1
GFlops       | ~10                                   | ~10                                      | ~10
Top500 #     | 1                                     | 500                                      | N/A
Price        | $40,000,000                           | $1,000,000 (40x drop)                    | < $4,000 (250x drop)
Customers    | Government Labs                       | Large Enterprises                        | Every Engineer and Scientist
Applications | Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom  | Bioinformatics, Materials Sciences, Digital Media
Top Challenges
Setup is painful: takes a long time to get clusters up and running
Clusters are separate islands: lack of integration into IT infrastructure
Job management: lack of integration into end-user apps
Application availability: limited eco-system of applications that can exploit parallel processing capabilities

"Make high-end computing easier and more productive to use. Emphasis should be placed on time to solution, the major metric of value to high-end computing users…
A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems."
High-End Computing Revitalization Task Force, 2004 (Office of Science and Technology Policy, Executive Office of the President)
Windows Compute Cluster Server 2003
Simplified cluster deployment, job submission
and status monitoring
Better integration with existing Windows
infrastructure allowing customers to leverage
existing technology and skill-sets
Familiar development environment allows
developers to write parallel applications from
within the powerful Visual Studio IDE
Windows Compute Cluster Server 2003
Leveraging Existing Windows Infrastructure
Integration with IT infrastructure
Existing Windows infrastructure: Kerberos authentication, Group policies, Operations Manager, Windows Update services, Systems Management Server, Performance monitor, Remote Installation services
Cluster services built on it: Resource management, Secure job execution, Secure MPI, Job scheduler, Admin console, Command line interface
CCS Key Features
Node deployment and administration
Task-based configuration for head and compute nodes
UI and command line-based node management
Monitoring with Performance Monitor (Perfmon), Microsoft Operations Manager (MOM), Server
Performance Advisor (SPA), and 3rd-party tools
Integration with existing Windows and management infrastructure
Integrates with Active Directory, Windows security technologies, management, and
deployment tools
Extensible job scheduler
3rd-party extensibility at job submission and/or job assignment
Submit jobs from command line, UI, or directly from applications
Simple job management, similar to print queue management
Secure and performant MPI
User credentials secured in job scheduler and compute nodes
MPI stack based on MPICH2 reference implementation
Support for high performance interconnects through Winsock Direct
Integrated development environment
OpenMP support in Visual Studio, Standard Edition
Parallel debugger in Visual Studio, Professional Edition
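To make the OpenMP bullet above concrete, here is a minimal sketch of the loop-level parallelism that Visual C++ 2005's /openmp switch enables; the dot-product computation, array size, and program structure are illustrative assumptions, not part of Compute Cluster Server.

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N];
    double sum = 0.0;
    int i;

    /* Fill the input arrays serially. */
    for (i = 0; i < N; i++) {
        a[i] = i * 0.5;
        b[i] = i * 2.0;
    }

    /* Spread the loop iterations across threads; each thread keeps a
       private partial sum that OpenMP combines at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product = %f (%d threads available)\n", sum, omp_get_max_threads());
    return 0;
}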
HPC Institutes
National Center for Supercomputing Applications, IL, U.S.A.
University of Utah, Salt Lake City, UT, U.S.A.
TACC – University of Texas, Austin, TX, U.S.A.
Cornell Theory Center, Ithaca, NY, U.S.A.
Southampton University, Southampton, UK
University of Virginia, Charlottesville, VA, U.S.A.
University of Tennessee, Knoxville, TN, U.S.A.
Nizhni Novgorod University, Nizhni Novgorod, Russia
Tokyo Institute of Technology, Tokyo, Japan
HLRS – University of Stuttgart, Stuttgart, Germany
Shanghai Jiao Tong University, Shanghai, PRC
An Example Of Porting To Windows
Weather Research and Forecasting (WRF) model
Large collaborative effort, led by NCAR, to develop a next-generation community model
with a direct path to operations
Applications
Atmospheric research
Numerical weather prediction
Coupled modeling systems
Current release WRFV2.1.2
~1/3 million lines, Fortran 90
and some C using MPI, OpenMP
Traditionally developed for Unix
HPC systems
Two dynamical cores
Full range of physics options
Rapid community growth –
more than 3,000 registered users
Operational capabilities
U.S. Air Force Weather Agency
National Centers for Environmental Prediction (NOAA)
KMA (Korea), IMD (India), CWB (Taiwan), IAF (Israel), WSI (U.S.)
WRF On Windows
Motivation
Extend the range of systems available to WRF users
Stability and consistency with respect to Linux
Take advantage of Microsoft and 3rd party (e.g., Portland Group)
development tools, environments
WRF ported under SUA and running on development AMD64 clusters
using Compute Cluster Pack
Of 360k lines, fewer than 750 changed to compile and link under SUA
Largest number of changes involved the WRF build mechanism
(Makefiles, scripts)
Level of effort and nature of tasks were not unlike porting to any new
version of UNIX
Details of porting experience described in a white paper available from
Microsoft and at http://www.mmm.ucar.edu/wrf/WG2/wrf_port_notes.htm
An Example Of Application Integration
With HPC
Scaling Excel
Excel Services on Windows Compute
Cluster Server 2003
Diagram: Excel "12" runs on the desktop; Excel Services runs on servers and clusters.
Authors publish spreadsheets to Excel Services from Excel "12"; users view and interact through a 100% thin browser, open a spreadsheet or snapshot in the Excel "12" client, and custom applications reach Excel "12" through Web Services access.
Excel And Windows CCS
Customer requirements
Faster spreadsheet calculation
Free-up client machines from long-running calculations
Time/mission critical calculations that must run
Parallel iterations on models
Example scenarios
Schedule overnight risk calculations
Farm out analytical library calculations
Scale-out Monte Carlo iterations, parametric sweeps
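A rough sketch of the scale-out Monte Carlo / parametric sweep idea: each task in the sweep runs the same worker with a different seed and iteration count on its command line, and the per-task estimates are combined afterwards. The worker name mc_worker and its argument layout are hypothetical illustrations, not a CCS convention.

#include <stdio.h>
#include <stdlib.h>

/* mc_worker.c - one task of a parametric sweep.
   Usage: mc_worker <seed> <iterations>
   Estimates pi by sampling random points in the unit square; each sweep
   task gets a different seed so the runs are independent. */
int main(int argc, char *argv[])
{
    unsigned int seed;
    long iterations, i, inside = 0;

    if (argc != 3) {
        fprintf(stderr, "usage: mc_worker <seed> <iterations>\n");
        return 1;
    }
    seed = (unsigned int)atol(argv[1]);
    iterations = atol(argv[2]);
    srand(seed);

    for (i = 0; i < iterations; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)
            inside++;
    }

    /* Each task prints its own estimate; the submitting application
       (or a final task) averages the results across the sweep. */
    printf("%f\n", 4.0 * inside / iterations);
    return 0;
}

A sweep of this worker maps directly onto the Parameter Sweep Job shape described in Part 2.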
Evolution Of HPC
Evolving Scenarios and Key Factors
Batch computing on supercomputers
Manual, batch execution
Compute cycles are scarce and require careful partitioning and allocation
Cluster systems administration major challenge
Applications split into UI and compute parts
Interactive computing on departmental clusters
Interactive computation and visualization
Compute cycles are cheap
Interactive applications integrate UI/compute parts
Emergence of turnkey personal clusters
Complex workflow spanning applications
Compute and data resources are diffused throughout the enterprise
Distributed application, systems and data management is the key source of complexity
Multiple applications are organized into complex workflows and data pipelines
Focus on service orientation and web services
Cheap Cycles And Personal Supercomputing
IBM Cell processor
256 Gflops today
4 node personal cluster: 1 Tflops
32 node personal cluster: Top100
Microsoft Xbox
3 custom PowerPCs + ATI graphics processor
1 Tflops today, $300
8 node personal cluster "Top100" for $2500 (ignoring all that you don't get for $300)
Intel many-core chips
"100's of cores on a chip in 2015" (Justin Rattner, Intel)
At "4 cores" per Tflop, that is 25 Tflops/chip
The key challenge: how to program these things
Concurrent programming will be an important area of investment for all of Microsoft (not just HPC)
“Grid Computing”
A catch-all marketing term
Desktop cycle-stealing
Managed HPC clusters
Internet access to giant,
distributed repositories
Virtualization of data center IT resources
Out-sourcing to “utility data centers”
“Software as a service”
Parallel databases
HPC Grids And Web Services
Compute grid
Forest of clusters
Coordinated scheduling
of resources
Data grid
Distributed storage facilities
Coordinated management
of data
Web Services
Glue for heterogeneous
platforms/applications/systems
Cross- and intra-organization integration
Standards-based
distributed computing
Interoperability
and composability
Cluster-Based HPC
Intra-Organization HPC
Virtual Organizations
Part 2
Drill-Down
Technologies
Platform
Windows Server 2003 SP1 64-bit Edition
x64 processors (Intel EM64T and AMD Opteron)
Ethernet, RDMA over Ethernet, and InfiniBand support
Administration
Prescriptive, simplified cluster setup and administration
Scripted, image-based compute node management
Active Directory based security
Scalable job scheduling and resource management
Development
MPICH2 from Argonne National Laboratory, with performance and
security enhancements (a minimal MPI sketch follows this list)
Cluster scheduler programmable via Web Services and DCOM
Visual Studio 2005 – OpenMP, Parallel Debugger
Partner delivered Fortran compilers and numerical libraries
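A minimal MPI program of the kind the MPICH2-derived stack runs, sketched here to ground the development bullets above; the partial-sum computation is illustrative only and nothing in it is specific to Compute Cluster Server.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    long long local = 0, total = 0;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime           */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id               */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes in the job  */

    /* Each rank sums a strided slice of 1..1000, so the work is split
       evenly across however many processes were allocated. */
    for (i = rank + 1; i <= 1000; i += size)
        local += i;

    /* Combine the partial sums on rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of 1..1000 = %lld across %d ranks\n", total, size);

    MPI_Finalize();
    return 0;
}

On a cluster such a program is typically launched with mpiexec on the processors the job scheduler allocates.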
Head Node Installation
Head Node installs only on x64
Windows 2003 Compute Cluster Edition
Windows 2003 SP1 Standard And Enterprise
Windows 2003 R2
Installation
Leverages appliance-like functionality
Scripted installation
Warnings if system is misconfigured
To Do list to assist with final configuration
Walkthrough
Windows Server 2003 is installed on the head node
System may have been pre-installed using OPK
User launches Compute Cluster Kit setup
To Do list starts up, guiding User through next steps
User joins Active Directory domain
User installs IP over IB drivers for InfiniBand cards if not pre-installed
Wizard assists with multi-NIC routing and configuration
Remote Installation Service is configured for imaging compute nodes
Compute Node Installation
Automated installation
Remote Installation Service provides simple
imaging solution
May use third-party system imaging tools to deploy
compute nodes
Requires private network
Walkthrough
User racks up compute nodes
Starts Add Node wizard
Powers up a group of compute nodes
Compute nodes PXE boot
RIS and installation scripts will
Install operating system: W2K3 SP1
Install drivers
Join appropriate domain
Install compute cluster software (CD2)
Join cluster
Exiting wizard turns off RIS
Topology diagram: the head node connects to the corpnet via Ethernet; compute nodes sit behind the head node on a private network (Ethernet and/or InfiniBand).
Node Management
Not building a new systems management paradigm
Leveraging Windows infrastructure for simple management
MMC, Perfmon, Event Viewer, Remote Desktop
Can integrate with enterprise management infrastructure, such as Microsoft
Operations Manager
Compute Cluster MMC snap-in
Compute Cluster Admin Console
Supports specific actions
Pause Node
Resume Node
Open CD Drive
Reboot Node
Execute Command
Remote Desktop Connection
Start PerfMon
Delete
Properties
Can operate on multiple nodes at once
Compute Cluster Admin Console
Screenshot: the Node Management view for the "Bio Lab 1 (Compute Cluster)", listing compute nodes Node1–Node20 with columns for node status (Active, Paused, Installing, Patching), job status (Executing, Idle), job name, job time, and owner.
Job/Task Conceptual Model
Serial Job: a single task running one process
Parallel MPI Job: a task whose multiple processes communicate via IPC
Parameter Sweep Job: multiple independent tasks, each running its own process
Task Flow Job: multiple tasks with dependencies between them
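A toy illustration of the Task Flow Job shape above: tasks declare which tasks they depend on, and a task runs only after all of its dependencies have completed. The structs and task names are my own sketch, not the scheduler's actual data model.

#include <stdio.h>

#define NTASKS 4

/* Each task lists the indices of the tasks it depends on. */
typedef struct {
    const char *name;
    int ndeps;
    int deps[NTASKS];
    int done;
} Task;

int main(void)
{
    /* "simulate" and "analyze" depend on "prepare"; "report" depends on both. */
    Task tasks[NTASKS] = {
        { "prepare",  0, {0},    0 },
        { "simulate", 1, {0},    0 },
        { "analyze",  1, {0},    0 },
        { "report",   2, {1, 2}, 0 },
    };
    int remaining = NTASKS;

    /* Repeatedly run every task whose dependencies have all completed. */
    while (remaining > 0) {
        int i, j;
        for (i = 0; i < NTASKS; i++) {
            int ready = !tasks[i].done;
            for (j = 0; j < tasks[i].ndeps; j++)
                if (!tasks[tasks[i].deps[j]].done)
                    ready = 0;
            if (ready) {
                printf("running task: %s\n", tasks[i].name);
                tasks[i].done = 1;
                remaining--;
            }
        }
    }
    return 0;
}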
Job Scheduler Stack
Client node: users, admins, and third-party applications submit jobs/tasks through the user console, admin console, command line interface, COM API, or Web Services (WSE 3.0), all built on a common object model
Head node: interface layer (user interface handlers), scheduling layer (admission, queueing, allocation), job management, and resource management
Compute nodes: node managers form the execution layer, handling activation of tasks
Job Scheduler
Job scheduler provides two features: Ordering and allocation
Job ordering
Priority-based first-come, first-serve (FCFS)
Backfill supported for jobs with time limits (a simplified sketch follows at the end of this slide)
Resource allocation
License-aware scheduling through plug-ins
Parallel application node allocation policies
Extensible
Core engine based on embedded SQL engine
Resource and job descriptions are based on XML
3rd parties can extend by plugging into submission and execution phases to implement queuing
and licensing policies
Job submission
Jobs submitted via UI, API, command line, or web service
Security
Jobs on compute nodes execute in the security account of the submitting user, allowing secure
access to networked resources
Cleanup
Jobs executed in Job Objects on compute nodes, facilitating cleanup
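A highly simplified sketch of the backfill rule noted above, under the assumption that a queued job may start ahead of a blocked higher-priority job only if it fits within the CPUs that are idle right now and its run-time limit expires before the blocked job's promised start time; the data layout is illustrative, not the scheduler's implementation.

#include <stdio.h>

typedef struct {
    const char *name;
    int cpus;    /* CPUs requested          */
    int limit;   /* run-time limit, minutes */
} Job;

/* A candidate may be backfilled if it fits in the idle CPUs and is
   guaranteed to finish before the blocked head-of-queue job is due
   to receive its reserved CPUs. */
int can_backfill(const Job *candidate, int idle_cpus, int head_start_in)
{
    return candidate->cpus <= idle_cpus && candidate->limit <= head_start_in;
}

int main(void)
{
    /* 8 CPUs idle; the head job needs more and is promised CPUs in 60 minutes. */
    Job small = { "sweep-task", 4, 30 };
    Job large = { "mpi-run",   12, 240 };

    printf("%s backfills: %s\n", small.name, can_backfill(&small, 8, 60) ? "yes" : "no");
    printf("%s backfills: %s\n", large.name, can_backfill(&large, 8, 60) ? "yes" : "no");
    return 0;
}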
Queue Management
Job Management model similar to print
queue management
Leverage familiar user paradigm
Queue management operations
Delete
Change properties
Priority
Run time
# of CPUs
Preferred nodes
CPUs per node
All in one
License parameters
Uniform attributes
Notification
Compute Cluster Admin Console
Screenshot: the Queue Management view for the "Bio Lab 1 (Compute Cluster)", listing jobs with columns for order/priority, name, owner, and status (Running on a given node, Completed, or Queued).
Networking
Focusing on industry standard interconnect technologies
MPI implementation tuned to Winsock
Automatic RDMA support through Winsock Direct
(SAN provider required from IHV)
Gigabit Ethernet
Expected to be the mainstream choice
RDMA + GigE offers compelling latency
InfiniBand
Emerging as a leading high-end solution
Engaged with all IB vendors
OpenIB group developing a Windows IB stack
Planning to support IB in WHQL
Resources
Microsoft HPC web site
(evaluation copies available)
http://www.microsoft.com/hpc/
Microsoft Windows Compute Cluster Server
2003 community site
http://www.windowshpc.net/
Windows Server x64 information
http://www.microsoft.com/64bit/
http://www.microsoft.com/x64/
Windows Server System information
http://www.microsoft.com/wss/
© 2006 Microsoft Corporation. All rights reserved.
Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation.
Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft,
and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.