Intel® Cluster Tools Cluster Software & Technologies Intel Software


Intel® Cluster Tools

Cluster Software & Technologies Intel Software & Solutions Group

Agenda

– Introduction
– Intel® Software Development Products overview
– Cluster Tools
– Call to action and next steps
– Intel® Cluster Ready

Intel® Cluster Tools Intel Confidential

Intel® Software Development Products

Intel® Compilers
– The best way to get application performance on Intel processors

Intel® VTune™ Performance Analyzers
– Identify bottlenecks in source code to increase performance or solve problems

Intel® Performance Libraries
– Highly optimized, thread-safe, multimedia and HPC math functions

Intel® Threading Tools
– Find threading errors and optimize threaded applications for maximum performance

Intel® Cluster Tools
– Create, analyze, optimize and deploy cluster-based applications

Boost HW Performance by SW Development Tools

Cluster Market rapidly growing

Clusters are the majority of HPC market

[Chart: share of the HPC market using clusters, clusters vs. non-clustered, 0% to 100%, quarterly from 03Q1 to 05Q4]

Why? Less expensive hardware. Easier implementation.

Source: IDC, 2006

What Are the Biggest Bottlenecks Today in Creating Parallel Applications?

Source: Developing Custom Parallel Computing Applications, Simon Management Group, September 2006

Intel® Cluster Tools

Intel® Cluster Tools help you use the power of your software to unleash the full potential of the cluster

Performance
– Extract the maximum application performance and scalability from Intel processor based clusters
– Fully enables multi-core processors and future Intel architectures

Compatibility
– Enables new or upgraded interconnects by invoking a new driver
– MPICH2 based
– 32-bit and 64-bit processor support within one package
– Intel compilers and MKL

Support
– Unlimited technical support and upgrades included for one year
– Get answers from the engineers who know how to develop software on Intel Architecture

Intel Tools for Clusters

Intel software tools make clusters easier to program and optimize

– Intel® Cluster Toolkit
  • Bundle with single installer and license
– Intel® MPI Library
  • A high performance universal MPI solution enabling applications to run across multiple network fabrics
– Intel® Trace Analyzer and Collector
  • Low-overhead, event-based tracing tool that allows complete graphical analysis of parallel applications
– Intel® Math Kernel Library
  • Highly optimized and extensively threaded math routines for engineering and scientific applications
– Intel® MPI Benchmarks
  • Compares cluster and MPI implementation performance

What’s new

– Just launched new versions of the tools
– Support for Microsoft* Windows CCS*
– Improvements in performance and usability
– Extended interoperability

Intel® Cluster Toolkit Compiler Edition

Platforms: Linux, Windows, IA-32, Intel® 64, Itanium

Intel® Cluster Toolkit Compiler Edition 3.1

New product suite containing:
– Cluster OpenMP (for Linux only, on Intel® 64 and IA-64 architectures)
– Intel® C++ Compiler 10.1
– Intel® Debugger 10.1
– Intel® Fortran Compiler 10.1
– Intel® MKL Cluster Edition 10.0
– Intel® MPI Benchmarks 3.1
– Intel® MPI Library 3.1
– Intel® Trace Analyzer and Collector 7.1

Unified cluster installation for all components

Available for Linux* and Microsoft Windows* CCS

Enhanced compiler support for clusters:
– Intel Debugger 10.1 for Linux supports Intel MPI Library 3.1
– New -tcollect and /Qtcollect options to instrument function calls for Intel Trace Collector

Intel® Cluster Toolkit

Platforms: Linux, Windows, IA-32, Intel® 64, Itanium

Intel® Cluster Toolkit 3.1

Boost development and performance of cluster applications
– Universal MPI Library runs cluster applications on all networks
– Leading cluster development environment to efficiently create, analyze, optimize and deploy parallel applications
– Ready to support dual-core and multi-core clusters

Intel® Cluster Toolkit 3.1: full-featured MPI tools environment for Linux* and Windows*
– Intel® MPI Library 3.1
– Intel® Trace Analyzer and Collector 7.1
– Intel® Math Kernel Library 10.0
– Intel® MPI Benchmarks 3.1

Packaged for faster installation at lower prices!

Intel® MPI Library

Platforms: Linux, Windows, IA-32, Intel® 64, Itanium

Intel® MPI Library 3.1

A high performance universal MPI solution enabling applications to run across multiple network fabrics

 

Features
– High performance MPI-2 implementation
– Linux and Windows CCS support
– Interconnect independence
– Smart fabric selection
– Easy installation
– Free runtime environment
– Close integration with Intel and 3rd party development tools

What’s New
– Now available for Microsoft* Windows* Compute Cluster Server 2003
– Improved application performance
– Multiple usability improvements
– Extended interoperability

RIKEN
“Intel’s MPI and Cluster Tools provide us the best cluster development environment.”
Dr. Takahiro Koichi, Computational Astro Physics Laboratory, RIKEN, Japan

Intel® MPI Library 3.0 for Linux* with sample ISV benchmarks on Intel® Xeon® 64 processors

[Charts: Intel MPI vs. HP MPI* for Fluent* (8 nodes) and LS-DYNA* (16 nodes), vertical axis 0 to 1.2, higher is better. One panel: shared memory, Gigabit Ethernet. Other panel: shared memory, InfiniBand*.]

Fluent 6.3, FL5L; Dual Core Intel® Xeon® Processor 5100 series, 8 DP nodes, 4 processes per node; Red Hat* Enterprise Linux 4.0U2; Intel compiler 9.1; HP MPI 2.2; Intel MPI 3.0:
export I_MPI_DEVICE=rdssm
export I_MPI_PIN_MODE=lib
export I_MPI_PIN_PROCS=all
export I_MPI_USE_DYNAMIC_CONNECTIONS=0
export I_MPI_USE_DAPL_INTRANODE=yes
export I_MPI_EAGER_THRESHOLD=128000
export I_MPI_ADJUST_COLLECTIVES=bcast:+1-8192,+2-80000,3-5000000
mpiexec -perhost 4 -n 32

LS-DYNA mpp970s; car2car; 64 bit Intel® Xeon® processor (800 MHz FSB) with 1 MB L2 Cache, 16 UP nodes, 2 processes per node; Red Hat* Enterprise Linux 4.0U2; Intel compiler 9.1; HP MPI 2.2; Intel MPI 3.0:
export I_MPI_DEVICE=rdssm
export I_MPI_PIN_MODE=lib
export I_MPI_PIN_PROCS=all
export I_MPI_USE_DYNAMIC_CONNECTIONS=0
export I_MPI_USE_DAPL_INTRANODE=yes
export I_MPI_EAGER_THRESHOLD=128000
export I_MPI_ALLREDUCE_MSG=9200
export I_MPI_ALLTOALL_MSG=1024,17000
export I_MPI_ADJUST_COLLECTIVES=bcast:1-8192,2-5000000
mpiexec -perhost 2 -n 32

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, refer to http://www.intel.com/performance/resources/benchmark_limitations.htm.
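The mpiexec launch lines in the benchmark configurations above pair a per-host process count with a total process count. As a rough check of that arithmetic, the sketch below assumes consecutive ranks are placed on each host in blocks of the per-host count, cycling through the hosts. The function names are illustrative and are not part of Intel MPI.

```python
def hosts_needed(n_procs: int, per_host: int) -> int:
    """Number of hosts required to hold n_procs ranks at per_host ranks each."""
    if n_procs <= 0 or per_host <= 0:
        raise ValueError("n_procs and per_host must be positive")
    return -(-n_procs // per_host)  # ceiling division


def host_of_rank(rank: int, per_host: int, n_hosts: int) -> int:
    """Host index for a rank under block placement (an assumed layout)."""
    return (rank // per_host) % n_hosts


# Fluent run: 4 processes per host, 32 processes in total.
fluent_hosts = hosts_needed(32, 4)
# LS-DYNA run: 2 processes per host, 32 processes in total.
lsdyna_hosts = hosts_needed(32, 2)

print(fluent_hosts, lsdyna_hosts)
```

Under these assumptions the host counts agree with the node counts quoted for the two runs (8 nodes for Fluent, 16 nodes for LS-DYNA).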


Intel® MPI Library 3.1

New Operating System support
– Microsoft* Windows* Compute Cluster Server 2003
– Red Hat* Enterprise Linux* 5.0

Increased Application Performance
– Automatic performance tuning through the mpitune utility
– Improved default performance settings
– Intelligent process layout pinning
– Scalable application startup/termination

Usability Improvements
– Simplified variable names, diagnostics and options
– Lightweight statistics gathering
– Unified Intel® memory management support (i_malloc)
– PVFS and PANFS file system support
– Enhanced documentation (including a new Installation Guide)

Extended Interoperability
– Simplified Intel® Trace Collector build and runtime linkage
– Intel® C++ and Fortran Compiler 10.1 support
– GNU* 4.x C++ and Fortran 95 compiler support
– Compatibility with Microsoft* Visual Studio .Net 2005

Intel® Trace Analyzer and Collector

Platforms: Linux, Windows, IA-32, Intel® 64, Itanium

Intel® Trace Analyzer and Collector 7.1

The world’s best analysis tool for MPI applications

Features
– Event based approach
– Low impact on application performance
– Hierarchical approach to address data scalability
– Function Tracing
– Fail-Safe MPI Tracing
– Provides API to instrument user code
– Trace optimized program runs
– Analyzes communication layer
– Calls are intercepted at link time

What’s New
– Lightweight statistics gathering
– Improved performance
– Extended interoperability with the Intel® Compilers for application instrumentation
– Better and faster GUI
– MPI Checking: correctness checking library
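To make the event-based approach and user-code instrumentation in the feature list concrete, here is a minimal Python sketch of an enter/leave event tracer. It is a conceptual illustration only: the `traced` decorator, the `EVENTS` buffer, and the sample functions are hypothetical names and do not reflect the Intel Trace Collector API or its trace file format.

```python
import functools
import time

# Hypothetical in-memory trace buffer: (timestamp, kind, function name).
EVENTS = []


def traced(func):
    """Record an 'enter' and a 'leave' event around every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append((time.perf_counter(), "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            # The leave event is recorded even if func raises.
            EVENTS.append((time.perf_counter(), "leave", func.__name__))
    return wrapper


def time_per_function(events):
    """Sum inclusive time per function from matched enter/leave events."""
    stack, totals = [], {}
    for stamp, kind, name in events:
        if kind == "enter":
            stack.append((name, stamp))
        else:
            started_name, started = stack.pop()
            assert started_name == name, "unbalanced trace"
            totals[name] = totals.get(name, 0.0) + (stamp - started)
    return totals


@traced
def exchange(n):
    return sum(range(n))


@traced
def solve():
    return exchange(1000) + exchange(2000)


solve()
print([(kind, name) for _, kind, name in EVENTS])
print(time_per_function(EVENTS))
```

A real tracing tool would write such events to a trace file for later analysis; the sketch keeps them in memory to stay short.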

SIMULIA – Dassault Systèmes
“Bringing in new software development tools can be a complex process at SIMULIA. Trace Collector was up and running within a day. Its intuitive interface is easy to use for the parallel programmer while providing a lot of useful data.”
Matt Dunbar, Platform Specialist Group Manager, Simulia

[Diagram: Abaqus → Intel Trace Collector → Tracefile → Intel Trace Analyzer]

Intel® Trace Analyzer and Collector 7.1

– Enables the user to quickly focus at the appropriate level of detail to find performance hotspots and bottlenecks
– Use of hierarchical displays to address scalability in time and processor-space
– High-performance graphics, excellent zooming and filtering

What’s new?

Now on Microsoft Windows* CCS
– Supports Intel MPI Library 3.1

Usability Improvements
– Command line integration to the Intel Trace Analyzer
– Automatic instrumentation of user binaries for all supported platforms and OS

Improved performance
– Accelerated data caching to reduce runtime overhead and memory consumption
– Lightweight statistics gathering

Extended interoperability
– Thread safe tracing with extended tracing support for Global Arrays (GA), and non-MPI applications
– Compatible with Microsoft and Intel compilers and Visual Studio*
– Automatic compiler-driven instrumentation with Intel and GNU compilers

Intel® Trace Analyzer and Collector

Works on systems from 2 processes to more than a thousand processes

[Screenshots: timeline of initial application run; timeline of optimized application run; comparison of function and process profile data; network usage profile data for MPI messages]

Shorter red bars mean less MPI traffic and increased performance

MPI Correctness Checking with Intel Trace Analyzer and Collector

A novel MPI correctness technology
– Detects errors with data types, buffers, communicators, point-to-point messages and collective operations, deadlocks, and data corruption

In-place analysis
– Collects, analyzes and reports errors as the application runs
– Can trigger debugger breakpoints for in
– Complements standard debuggers

Current status
– Support for Microsoft* Windows* CCS
– Works with Intel MPI 2.x and 3.x
– Distributed memory checking
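One of the error classes named on this slide is deadlock. As an illustration of the general idea, and not a description of how the Intel checking library works internally, the sketch below looks for a cycle in a wait-for graph in which each blocked rank waits on exactly one peer. The function name and the input format are assumptions made for this example.

```python
def find_deadlock(waits_for):
    """Return a list of ranks forming a wait cycle, or None.

    waits_for maps a blocked rank to the single rank it is waiting on.
    Ranks missing from the mapping are not blocked.
    """
    for start in waits_for:
        seen = []
        rank = start
        # Follow the chain of waits until it ends or repeats.
        while rank in waits_for and rank not in seen:
            seen.append(rank)
            rank = waits_for[rank]
        if rank in seen:
            # The cycle starts where the repeated rank was first met.
            return seen[seen.index(rank):]
    return None


# Ranks 0 and 1 each block on a receive from the other.
print(find_deadlock({0: 1, 1: 0}))
# Rank 2 waits on rank 3, which is still running: no cycle.
print(find_deadlock({2: 3}))
```

A production checker also has to handle collectives, wildcard receives and timeouts, which this sketch leaves out.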

EMSS
“The Intel Message Checker is an indispensable tool for parallel programming when stability and robustness is taken seriously”
Ulrich Jakobus, Technical Director, EMSS, Germany

[Diagram: Application → MPI Calls → Intel® MPI Correctness Checking Library → PMPI Calls → MPI; output: MPI Correctness Report]

Intel® Math Kernel Library

Platforms: Linux, Windows, IA-32, Intel® 64, Itanium

Intel® Math Kernel Library

Description
– A highly optimized math library for scientific, engineering, financial and energy applications

Value
– Outstanding performance on Intel® processors
– Automatic threading (speeds multi-core performance)

Function Domains
– Linear Algebra: BLAS & LAPACK
– Linear Algebra: Sparse Solvers
– Fast Fourier Transforms
– Vector Math Library
– Vector Random Number Generators

Added Cluster functionality
– ScaLAPACK and distributed FFTs

Intel® MKL – The Flagship for HPC Math Software

Intel® Math Kernel Library 10.0

What’s new?

– Optimizations for new Penryn architecture
– Inclusion of Cluster functionality into base package
  • Intel MKL now contains full “Cluster Edition” functionality
  • ScaLAPACK and Distributed Memory Fast Fourier Transforms
– Re-architecture for integrated support for
  • Multiple OpenMP implementations
  • ILP64, LP64, and Serial version in base package
– Debian* and Ubuntu* Linux distributions support
– Threading model change
– New vector math arithmetic functions (Mul, Conj, Abs, …)
– New vector math enhanced performance mode
– Sparse BLAS 0-base indexing and single precision support
– [Z/C]GEMM3M: faster algorithm for [Z/C]GEMM
– PARDISO* Direct Sparse Solver Out-Of-Core support
– Greatly Enhanced User’s Guide
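The [Z/C]GEMM3M item refers to a faster algorithm for complex matrix multiplication. A well-known way to get such a speedup is the "3M" scheme, which forms a complex product from three real matrix products instead of four. The pure-Python sketch below illustrates that scheme on small matrices. It is an assumption that this matches what MKL does internally, and the helper names are made up for the example.

```python
def matmul(x, y):
    """Plain real matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def add(x, y):
    """Element-wise sum of two matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def sub(x, y):
    """Element-wise difference of two matrices."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def gemm_3m(a_re, a_im, b_re, b_im):
    """(a_re + i*a_im) @ (b_re + i*b_im) using 3 real matrix products."""
    t1 = matmul(a_re, b_re)
    t2 = matmul(a_im, b_im)
    t3 = matmul(add(a_re, a_im), add(b_re, b_im))
    c_re = sub(t1, t2)            # real part: A_re*B_re - A_im*B_im
    c_im = sub(sub(t3, t1), t2)   # imaginary part: T3 - T1 - T2
    return c_re, c_im


def gemm_4m(a_re, a_im, b_re, b_im):
    """Reference: the conventional form with 4 real matrix products."""
    c_re = sub(matmul(a_re, b_re), matmul(a_im, b_im))
    c_im = add(matmul(a_re, b_im), matmul(a_im, b_re))
    return c_re, c_im


a_re, a_im = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
b_re, b_im = [[2, 0], [1, 3]], [[1, 1], [0, 2]]
print(gemm_3m(a_re, a_im, b_re, b_im))
print(gemm_4m(a_re, a_im, b_re, b_im))
```

With integer inputs the two forms agree exactly. In floating point, the three-product form trades one multiplication for extra additions and can round slightly differently from the four-product form.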


Intel® MPI Benchmarks 3.1

Enables testing of interconnects, systems, and MPI implementations

Comprehensive set of MPI kernels that provide performance measurements for:
– Point-to-point message-passing
– Global data movement and computation routines
– One-sided communications
– File I/O

What’s new?

Now available for Microsoft Windows*

New Benchmarks
– Gather(v), Scatter(v)

Improved Performance
– Greater user control over cache re-use and memory usage
– Improved run time control for collectives like Alltoall(v) on large clusters
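For point-to-point measurements of the kind this slide mentions, a common ping-pong test reduces to simple arithmetic: half the round-trip time is taken as the one-way time, and throughput is the message size divided by that time. The sketch below shows the calculation under those assumptions. The function name, the choice of 1 MB = 10^6 bytes, and the sample numbers are illustrative and are not taken from the benchmark's source or output.

```python
def pingpong_metrics(msg_bytes: int, total_time_s: float, repetitions: int = 1):
    """Return (one-way time in microseconds, throughput in MB/s).

    total_time_s is the wall time for `repetitions` ping-pong round trips.
    """
    if total_time_s <= 0 or repetitions <= 0:
        raise ValueError("time and repetitions must be positive")
    one_way_s = total_time_s / repetitions / 2.0
    one_way_us = one_way_s * 1e6
    throughput_mb_s = msg_bytes / one_way_s / 1e6
    return one_way_us, throughput_mb_s


# Hypothetical sample: 1000 round trips of a 1 MiB message in 4 seconds.
latency_us, mb_per_s = pingpong_metrics(1 << 20, 4.0, repetitions=1000)
print(f"{latency_us:.1f} us one-way, {mb_per_s:.1f} MB/s")
```

Small messages are dominated by the one-way time (latency), large messages by throughput, which is why such tests sweep over a range of message sizes.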

Call to Action

Intel® Software Development tools help make software faster and developers more productive
– Gain competitive advantage
– Reduce development and deployment investment
– Increase productivity with profiling tools and libraries

Next Steps
– Try the tools. Learn more and download evals at: www.intel.com/software/products/cluster

More links

– Intel® MPI product site: http://www.intel.com/go/mpi
– Intel® Cluster Tools product site: www.intel.com/software/products/cluster/index.htm
– Intel® MPI self-help pages: support.intel.com/support/performancetools/cluster/mpi/index.htm
– Intel® MPI Library support page: http://support.intel.com/support/performancetools/cluster/mpi/index.htm
– Intel® Software Network Forums: http://softwarecommunity.intel.com/isn/Community/en-us/Forums/

What is the problem, and why do I care?

     

What if…

– You had to select and purchase a cluster with e.g. 32 nodes
– You need to ensure that 12+ ISV applications will run, such as Fluent*, LS-DYNA*, etc.
– You want to add a fast interconnect now or later
– You have no local wizard resources for cluster-Linux-Beowulf issues
– The cluster gets delivered, and runs your parallel applications out of the box

   

What do you think? (rated on a scale of [1] [2] [3] [4] [5])
– Possible? (from “No” to “Will work”)
– Is it easy? (up to “Simple”)
– Are you confident?

Intel® Cluster Ready is…

– A partner program to make it easier for customers to buy, deploy and use clusters, reducing TCO
  • Backed by reference implementations on Intel server platforms
  • Including tools to confirm compliance
– A three-way collaboration between Software Vendors, OEMs/Channels, and Intel to assure ‘a cluster will just work’
  • HW System Players’ solutions certified as compliant with the specification; the specification encourages HW Systems Players to add “Secret Sauce”
  • Process that helps HW Systems Players easily define compliant recipes
  • ISVs’ applications registered as compliant with the specification; registered applications will run out of the box on any compliant Intel® Cluster Ready system
  • Customers just need to request an ‘Intel® Cluster Ready’ solution

… getting you to a cluster solution right away

Intel® Cluster Ready – ‘it just works’

[Diagram labels: Intel® Cluster Ready Solution Deployment; End-Users; Intel® Cluster Ready Platform Specification; ISV-Applications; Intel® Cluster Ready Application Registration; IA-based Clusters; Intel® Cluster Ready Platform Certification; Intel® Cluster Checker]

www.intel.com/go/cluster