Intel® Cluster Tools
Cluster Software & Technologies, Intel Software & Solutions Group
Agenda
– Introduction
– Intel® Software Development Products overview
– Cluster Tools
– Call to action and next steps
– Intel® Cluster Ready
Intel® Cluster Tools Intel Confidential
Intel® Software Development Products
Intel® Compilers
– The best way to get application performance on Intel processors
Intel® VTune™ Performance Analyzers
– Identify bottlenecks in source code to increase performance or solve problems
Intel® Performance Libraries
– Highly optimized, thread-safe multimedia and HPC math functions
Intel® Threading Tools
– Find threading errors and optimize threaded applications for maximum performance
Intel® Cluster Tools
– Create, analyze, optimize and deploy cluster-based applications
Boost hardware performance with software development tools
Cluster market growing rapidly
[Chart: share of the HPC market using clusters vs. non-clustered systems, 0% to 100%, quarterly from 03Q1 to 05Q4]
Clusters are the majority of the HPC market.
Why? Less expensive hardware. Easier implementation.
Source: IDC, 2006
What Are the Biggest Bottlenecks Today in Creating Parallel Applications?
Source: Developing Custom Parallel Computing Applications, Simon Management Group, September 2006
Intel® Cluster Tools
Intel® Cluster Tools help you use the power of your software to unleash the full potential of the cluster.
Performance
– Extract the maximum application performance and scalability from Intel processor-based clusters
– Fully enables multi-core processors and future Intel architectures
Compatibility
– Enables new or upgraded interconnects by invoking a new driver
– MPICH2 based
– 32-bit and 64-bit processor support within one package
– Intel compilers and MKL
Support
– Unlimited technical support and upgrades included for one year
– Get answers from the engineers who know how to develop software on Intel Architecture
Intel Tools for Clusters
Intel software tools make clusters easier to program and optimize
– Intel® Cluster Toolkit: bundle with a single installer and license
– Intel® MPI Library
  • A high-performance universal MPI solution enabling applications to run across multiple network fabrics
– Intel® Trace Analyzer and Collector
  • Low-overhead, event-based tracing tool that enables complete graphical analysis of parallel applications
– Intel® Math Kernel Library
  • Highly optimized and extensively threaded math routines for engineering and scientific applications
– Intel® MPI Benchmarks
  • Compares the performance of clusters and MPI implementations
What’s new
– Just launched new versions of the tools
– Support for Microsoft* Windows* CCS
– Improvements in performance and usability
– Extended interoperability
Intel® Cluster Toolkit Compiler Edition
Platforms: Linux, Windows; IA-32, Intel® 64, Itanium
Intel® Cluster Toolkit Compiler Edition 3.1
New product suite containing:
– Cluster OpenMP (for Linux only, on Intel® 64 and IA-64 architectures)
– Intel® C++ Compiler 10.1
– Intel® Debugger 10.1
– Intel® Fortran Compiler 10.1
– Intel® MKL Cluster Edition 10.0
– Intel® MPI Benchmarks 3.1
– Intel® MPI Library 3.1
– Intel® Trace Analyzer and Collector 7.1
Unified cluster installation for all components
Available for Linux* and Microsoft Windows* CCS
Enhanced compiler support for clusters:
– Intel Debugger 10.1 for Linux supports Intel MPI Library 3.1
– New -tcollect (Linux) and /Qtcollect (Windows) options to instrument function calls for the Intel Trace Collector
Intel® Cluster Toolkit
Platforms: Linux, Windows; IA-32, Intel® 64, Itanium
Intel® Cluster Toolkit 3.1
Boost development and performance of cluster applications
– Universal MPI Library runs cluster applications on all networks
– Leading cluster development environment to efficiently create, analyze, optimize and deploy parallel applications
– Ready to support dual-core and multi-core clusters
Full-featured MPI tools environment for Linux* and Windows*:
– Intel® MPI Library 3.1
– Intel® Trace Analyzer and Collector 7.1
– Intel® Math Kernel Library 10.0
– Intel® MPI Benchmarks 3.1
Packaged for faster installation at a lower price!
Intel® MPI Library
Platforms: Linux, Windows; IA-32, Intel® 64, Itanium
Intel® MPI Library 3.1
A high performance universal MPI solution enabling applications to run across multiple network fabrics
Features
– High-performance MPI-2 implementation
– Linux and Windows CCS support
– Interconnect independence
– Smart fabric selection
– Easy installation
– Free runtime environment
– Close integration with Intel and third-party development tools
What’s New
– Now available for Microsoft* Windows* Compute Cluster Server 2003
– Improved application performance
– Multiple usability improvements
– Extended interoperability
RIKEN
“Intel’s MPI and Cluster Tools provide us the best cluster development environment.”
Dr. Takahiro Koichi, Computational Astro Physics Laboratory, RIKEN, Japan
Intel® MPI Library 3.0 for Linux* with sample ISV benchmarks on Intel® Xeon® 64 processors
[Chart, two panels (higher is better): "Shared memory, Gigabit Ethernet" and "Shared memory, InfiniBand*". Each panel compares the relative performance of Intel MPI and HP MPI* for Fluent* (8 nodes) and LS-DYNA* (16 nodes) on a scale of 0 to 1.2.]
Test configurations:
Fluent 6.3, FL5L; Dual-Core Intel® Xeon® Processor 5100 series, 8 DP nodes, 4 processes per node; Red Hat* Enterprise Linux 4.0U2; Intel compiler 9.1; HP MPI 2.2; Intel MPI 3.0 settings:
export I_MPI_DEVICE=rdssm
export I_MPI_PIN_MODE=lib
export I_MPI_PIN_PROCS=all
export I_MPI_USE_DYNAMIC_CONNECTIONS=0
export I_MPI_USE_DAPL_INTRANODE=yes
export I_MPI_EAGER_THRESHOLD=128000
export I_MPI_ADJUST_COLLECTIVES=bcast:+1-8192,+2-80000,3-5000000
mpiexec -perhost 4 -n 32
LS-DYNA mpp970s, car2car; 64-bit Intel® Xeon® processor (800 MHz FSB) with 1 MB L2 cache, 16 UP nodes, 2 processes per node; Red Hat* Enterprise Linux 4.0U2; Intel compiler 9.1; HP MPI 2.2; Intel MPI 3.0 settings:
export I_MPI_DEVICE=rdssm
export I_MPI_PIN_MODE=lib
export I_MPI_PIN_PROCS=all
export I_MPI_USE_DYNAMIC_CONNECTIONS=0
export I_MPI_USE_DAPL_INTRANODE=yes
export I_MPI_EAGER_THRESHOLD=128000
export I_MPI_ALLREDUCE_MSG=9200
export I_MPI_ALLTOALL_MSG=1024,17000
export I_MPI_ADJUST_COLLECTIVES=bcast:1-8192,2-5000000
mpiexec -perhost 2 -n 32
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, refer to http://www.intel.com/performance/resources/benchmark_limitations.htm.
Intel® MPI Library 3.1
New Operating System support
– Microsoft* Windows* Compute Cluster Server 2003
– Red Hat* Enterprise Linux* 5.0
Increased Application Performance
– Automatic performance tuning through the mpitune utility
– Improved default performance settings
– Intelligent process layout and pinning
– Scalable application startup/termination
Usability Improvements
– Simplified variable names, diagnostics and options
– Lightweight statistics gathering
– Unified Intel® memory management support (i_malloc)
– PVFS and PANFS file system support
– Enhanced documentation (including a new Installation Guide)
Extended Interoperability
– Simplified Intel® Trace Collector build and runtime linkage
– Intel® C++ and Fortran Compiler 10.1 support
– GNU* 4.x C++ and Fortran 95 compiler support
– Compatibility with Microsoft* Visual Studio* .NET 2005
Intel® Trace Analyzer and Collector
Platforms: Linux, Windows; IA-32, Intel® 64, Itanium
Intel® Trace Analyzer and Collector 7.1
The world’s best analysis tool for MPI applications
Features
– Event-based approach
– Low impact on application performance
– Hierarchical approach to address data scalability
– Function tracing
– Fail-safe MPI tracing
– Provides API to instrument user code
– Trace optimized program runs
– Analyzes communication layer
– Calls are intercepted at link time
What’s New
– Lightweight statistics gathering
– Improved performance
– Extended interoperability with the Intel® Compilers for application instrumentation
– Better and faster GUI
– MPI Checking: correctness checking library
SIMULIA – Dassault Systèmes
“Bringing in new software development tools can be a complex process at SIMULIA. Trace Collector was up and running within a day. Its intuitive interface is easy to use for the parallel programmer while providing a lot of useful data.”
Matt Dunbar, Platform Specialist Group Manager, SIMULIA
[Diagram: Abaqus → Intel Trace Collector → tracefile → Intel Trace Analyzer]
Intel® Trace Analyzer and Collector 7.1
Enables the user to quickly focus at the appropriate level of detail to find performance hotspots and bottlenecks. Uses hierarchical displays to address scalability in time and processor space. High-performance graphics with excellent zooming and filtering.
What’s new?
– Now on Microsoft Windows* CCS
– Supports Intel MPI Library 3.1
– Usability improvements
  • Command-line integration to the Intel Trace Analyzer
  • Automatic instrumentation of user binaries for all supported platforms and OS
– Improved performance
  • Accelerated data caching to reduce runtime overhead and memory consumption
  • Lightweight statistics gathering
– Extended interoperability
  • Thread-safe tracing with extended tracing support for Global Arrays (GA) and non-MPI applications
  • Compatible with Microsoft and Intel compilers and Visual Studio*
  • Automatic compiler-driven instrumentation with Intel and GNU compilers
Intel® Trace Analyzer and Collector
Works on systems from 2 processes to more than a thousand processes
[Screenshots: timeline of the initial application run; timeline of the optimized application run; comparison of function and process profile data; network usage profile data for MPI messages]
Shorter red bars mean less MPI traffic and increased performance.
MPI Correctness Checking with Intel Trace Analyzer and Collector
A novel MPI correctness technology
– Detects errors with data types, buffers, communicators, point-to-point messages and collective operations, deadlocks, and data corruption
In-place analysis
– Collects, analyzes and reports errors as the application runs
– Can trigger debugger breakpoints for in…
– Complements standard debuggers
Current status
– Support for Microsoft* Windows* CCS
– Works with Intel MPI 2.x and 3.x
– Distributed memory checking
EMSS
“The Intel Message Checker is an indispensable tool for parallel programming when stability and robustness are taken seriously.”
Ulrich Jakobus, Technical Director, EMSS, Germany
[Diagram: Application → MPI calls → Intel® MPI Correctness Checking Library → PMPI calls → MPI; the checking library produces an MPI correctness report]
Intel® Math Kernel Library
Platforms: Linux, Windows; IA-32, Intel® 64, Itanium
Intel® Math Kernel Library
Description: A highly optimized math library for scientific, engineering, financial and energy applications
Value: Outstanding performance on Intel® processors; automatic threading (speeds multi-core performance)
Function domains:
– Linear Algebra: BLAS & LAPACK
– Linear Algebra: Sparse Solvers
– Fast Fourier Transforms
– Vector Math Library
– Vector Random Number Generators
Added cluster functionality:
– ScaLAPACK and distributed FFTs
Intel® MKL – The Flagship for HPC Math Software
Intel® Math Kernel Library 10.0
What’s new?
– Optimizations for the new Penryn architecture
– Inclusion of cluster functionality into base package
  • Intel MKL now contains full “Cluster Edition” functionality
  • ScaLAPACK and Distributed Memory Fast Fourier Transforms
– Re-architecture for integrated support for
  • Multiple OpenMP implementations
  • ILP64, LP64, and serial versions in base package
– Debian* and Ubuntu* Linux distributions support
– Threading model change
– New vector math arithmetic functions (Mul, Conj, Abs, …)
– New vector math enhanced performance mode
– Sparse BLAS 0-based indexing and single precision support
– [Z/C]GEMM3M: faster algorithm for [Z/C]GEMM
– PARDISO* Direct Sparse Solver Out-Of-Core support
– Greatly enhanced User’s Guide
Intel® MPI Benchmarks 3.1
Enables testing of interconnects, systems, and MPI implementations
Comprehensive set of MPI kernels that provide performance measurements for:
– Point-to-point message passing
– Global data movement and computation routines
– One-sided communications
– File I/O
What’s new?
– Now available for Microsoft Windows*
– New benchmarks
  • Gather(v), Scatter(v)
– Improved performance
  • Greater user control over cache re-use and memory usage
  • Improved run-time control for collectives like Alltoall(v) on large clusters
Call to Action
Intel® Software Development tools help make software faster and developers more productive:
– Gain competitive advantage
– Reduce development and deployment investment
– Increase productivity with profiling tools and libraries
Next Steps – Try the tools. Learn more and download evaluation versions at: www.intel.com/software/products/cluster
More links
Intel® MPI product site: http://www.intel.com/go/mpi
Intel® Cluster Tools product site: www.intel.com/software/products/cluster/index.htm
Intel® MPI self-help pages: support.intel.com/support/performancetools/cluster/mpi/index.htm
Intel® MPI Library support page: http://support.intel.com/support/performancetools/cluster/mpi/index.htm
Intel® Software Network Forums: http://softwarecommunity.intel.com/isn/Community/en-us/Forums/
What is the problem, and why do I care?
What if…
– You had to select and purchase a cluster with, e.g., 32 nodes
– You need to ensure that 12+ ISV applications will run, such as Fluent*, LS-DYNA*, etc.
– You want to add a fast interconnect now or later
– You have no local wizard resources for cluster/Linux/Beowulf issues
– The cluster gets delivered and runs your parallel applications out of the box
What do you think? Rate each question from [1] to [5]:
– Possible? Will it work?
– Is it easy? Simple?
– Are you confident?
Intel® Cluster Ready is…
A partner program to make it easier for customers to buy, deploy and use clusters
– Reduce TCO
  • Backed by reference implementations on Intel server platforms
  • Including tools to confirm compliance
A three-way collaboration between software vendors, OEMs/channels, and Intel to assure that ‘a cluster will just work’
– HW systems players’ solutions certified as compliant with the specification
  • Specification encourages HW systems players to add their “secret sauce”
  • Process that helps HW systems players easily define compliant recipes
– ISV applications registered as compliant with the specification
  • Registered applications will run out of the box…
  • …on any compliant Intel® Cluster Ready system
– Customers just need to request an ‘Intel® Cluster Ready’ solution
… getting you to a cluster solution right away
Intel® Cluster Ready – ‘it just works’
[Diagram elements: Intel® Cluster Ready platform specification; ISV applications and Intel® Cluster Ready application registration; IA-based clusters and Intel® Cluster Ready platform certification with the Intel® Cluster Checker; Intel® Cluster Ready solution deployment to end-users]
www.intel.com/go/cluster