ARC Cluster - North Carolina State University


Frank Mueller

North Carolina State University

PIs & Funding

• NSF funding level: $550k
• NCSU: $60k (ETF) + $20+k (CSC)
• NVIDIA: donations ~$30k
• PIs/co-PIs:
— Frank Mueller
— Vincent Freeh
— Helen Gu
— Xuxian Jiang
— Xiaosong Ma
• Contributors:
— Nagiza Samatova
— George Rouskas

ARC Cluster: In the News

• “NC State is Home to the Most Powerful Academic HPC in North Carolina” (CSC News, Feb 2011)
• “Crash-Test Dummy For High-Performance Computing” (NCSU, The Abstract, Apr 2011)
• “Supercomputer Stunt Double” (insideHPC, Apr 2011)

Purpose

• Create a mid-size computational infrastructure to support research in areas such as computer science (HPC and beyond), the sciences, and engineering.

Researchers Already Active

In the first week of public access:
• Groups from within NCSU:
— CSC, ECE, Chem/Bio Engineering, Materials, Operations Research
• ORNL
• Tsinghua University, Beijing, China

System Overview

[System diagram] Front tier: head/login nodes and compute/spare nodes. Mid tier interconnect: Gigabit Ethernet, InfiniBand, and PFS switch stacks. Back tier storage: I/O nodes with an SSD + SATA storage array.

Hardware

• 108 Compute Nodes
— 2-way SMPs with AMD Opteron 6128 processors, 8 cores per socket
— 16 cores per node!
— 32 GB DRAM per node
• 1728 compute cores available


Interconnects

• Gigabit Ethernet
— Interactive jobs, ssh, service
— Home directories
• 40 Gbit/s InfiniBand (OFED stack)
— MPI communication: Open MPI, MVAPICH
— IP over IB

GPUs

• NVIDIA Tesla C2050 (1 login + 36 nodes)
— 448 compute cores per GPU
— Peak GigaFLOPS: 515 SP / 1030 DP
— Memory amount: 3 GB
— Memory interface: 384-bit
— Memory bandwidth: 144 GB/sec
• NVIDIA GTX480 (10 nodes)
— 480 compute cores per GPU
— Peak GigaFLOPS: 1344.96 SP / 168 DP
— Memory amount: 3 GB
— Memory interface: 384-bit
— Memory bandwidth: 177.4 GB/sec
• NVIDIA Tesla C2070 (2 nodes)
— 448 compute cores per GPU
— Peak GigaFLOPS: 515 SP / 1030 DP
— Memory amount: 6 GB
— Memory interface: 384-bit
— Memory bandwidth: 144 GB/sec
• NVIDIA 1060 GTX (1 node)
• NVIDIA 8800 GTX (1 node)

Solid State Drives

• All 108 compute nodes equipped with OCZ RevoDrive 120GB SSD
— Read: up to 540 MB/s
— Write: up to 480 MB/s
— Sustained write: up to 400 MB/s
— Random write, 4KB (aligned): 75,000 IOPS

File Systems

• Available today:
— NFS home directories over Gigabit Ethernet
— Local per-node scratch on spinning disks (ext3)
— Local per-node 120GB SSD (ext2)
• In the future:
— Parallel file systems: Lustre
– Separate dedicated nodes are available for parallel file systems: 1 MDS + 4 clients
– Are you interested in helping us set this up for your research projects?
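A usage sketch for the node-local storage, assuming hypothetical mount points (/scratch for the spinning-disk scratch and /ssd for the SSD are assumptions; check the ARC website for the actual paths, and my_app is a placeholder program):

# inside a PBS job: stage input onto the node-local SSD, compute, copy results back home
mkdir -p /ssd/$USER /scratch/$USER             # per-user directories on the assumed mount points
cp ~/data/input.dat /ssd/$USER/                # fast reads from the local SSD
./my_app /ssd/$USER/input.dat -o /scratch/$USER/out.dat   # program and flags are placeholders
cp /scratch/$USER/out.dat ~/results/           # results back to the NFS home directory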


Power Monitoring

• Watts Up Pro
— Serial and USB available
• Connected in groups of:
— Mostly 4 nodes (sometimes just 3)
— 2x 1 node
– 1 with GPU
– 1 without GPU

Software Stack

• Additional packages and libraries are available upon request, but…
— Not free? You need to pay.
— License required? You need to sign it.
— Installation required? You need to:
– Test it
– Provide an install script
• Check the ARC website, as it is constantly changing.

Base System

• 64-bit Rocks 5.3 (based on CentOS)
• Batch system: Torque/Maui (PBS)
• All compilers and tools are available on the login nodes.
— gcc, gfortran, …
• Compute nodes share the same base OS and libraries as the login nodes.


MPI

• Open MPI
— Operates over InfiniBand
— Integrated with BLCR
— Already in your default PATH
– mpicc
• MVAPICH
— InfiniBand support
— Requires changes to your PATH; see the ARC site.
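A compile-and-run sketch for Open MPI on ARC (mpi_hello.c is a placeholder source file; mpirun picks up the PBS allocation as described on the "Listing your nodes" slide):

# mpicc is Open MPI's wrapper compiler and is already in your default PATH
mpicc -O2 -o mpi_hello mpi_hello.c
# inside a PBS job, mpirun launches on all processors granted by PBS
mpirun ./mpi_hello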


OpenMP

 The "#pragma omp" directive in C programs works.

gcc -fopenmp -o fn fn.c 16
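A minimal end-to-end sketch (fn.c as above; the interactive node request and thread count are illustrative, matching the 16 cores per node listed under Hardware):

qsub -I -l nodes=1:ppn=16        # grab one full node interactively
gcc -fopenmp -O2 -o fn fn.c      # build with OpenMP enabled
export OMP_NUM_THREADS=16        # one thread per core
./fn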

CUDA SDK

• Ensure you are using a node with a GPU.
— Several types are available to fine-tune for your application's needs:
– Well-performing single- or double-precision devices
• Requires environment changes:
export PATH=".:~/bin:/usr/local/bin:/usr/bin:$PATH"
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH"
export MANPATH="/usr/share/man:$MANPATH"
• Or see the ARC site to make sure you have the latest paths…
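A minimal build-and-run sketch, assuming the PATH changes above are in place (vec_add.cu is a placeholder source file; the cuda queue is covered under Job Submission):

qsub -I -q cuda                  # interactive shell on a GPU node
nvcc -O2 -o vec_add vec_add.cu   # nvcc comes from /usr/local/cuda/bin
./vec_add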

PGI Compiler (Experimental)

• Awaiting site license update.
export PATH=".:~/bin:/usr/local/bin:/usr/bin:$PATH"
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/cuda/lib"
export MANPATH="/usr/share/man"

Virtualization

• Goal: allow a user to request VMs from the batch system just like any other resource.
— The user gets full root access to each requested VM, with complete control over that VM.
— VMs will share the same network or may be grouped together into private networks across single or multiple nodes.
• Elegant VM creation scripts already in place allow entire machine creation in a single line.


Job Submission

• You cannot SSH directly to a compute node; you must use PBS to submit jobs
— either as batch jobs
— or interactively
• Presently there are "hard" limits in the system for job times and sizes.
• There are special queues for nodes with a GPU.
— As we add additional specialized resources, there will be even more queues.

PBS Basics

• On the login node:
— to submit a job: qsub …
— to list your jobs: qstat
— to list everyone's jobs: qstat -a
— to delete/cancel/stop your job: qdel …
— to check node status: pbsnodes
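A typical login-node workflow might look like this sketch (myjob.sh and the job ID 1234 are placeholders; the server name follows the example on the "Listing your nodes" slide):

qsub myjob.sh      # submit; PBS returns an ID such as 1234.arcs.csc.ncsu.edu
qstat              # watch your own jobs
qstat -a           # list everyone's jobs
qdel 1234          # cancel your job by its numeric ID
pbsnodes           # check node status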

qsub Basics

• qsub -q cuda ...                   # job submitted to GPU/CUDA queue
• qsub -l ncpus=4 ...                # ask for four tasks (processors), packed as up to 16 tasks per node
• qsub -l nodes=4:ppn=16 ...         # job for four nodes with 16 processors on each node (64 tasks)
• qsub -l nodes=2:ppn=1 -q cuda ...  # job for two tasks on two nodes with GPU/CUDA support
• qsub -l nodes=2,cput=00:05:00 ...  # job for two tasks + 5 minutes CPU time
• to submit interactive: qsub -I                      # one node, a shell will open up
• to submit interactive: qsub -I -l nodes=2:ppn=10    # two nodes w/ 20 tasks
• to submit interactive: qsub -I -l host=compute-0-54.local                        # specifically on node 54
• to submit interactive: qsub -I -l host=compute-0-54.local+compute-0-55.local     # on 54+55
• to submit interactive with X11: qsub -I -X ...
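For batch jobs, a minimal PBS script sketch (the job name, walltime, and program are placeholders; the resource request matches the four-nodes-by-16-processors example above):

#!/bin/bash
#PBS -N myjob                  # job name (placeholder)
#PBS -l nodes=4:ppn=16         # four full nodes, 64 tasks total
#PBS -l walltime=00:30:00      # 30 minutes of wall-clock time (placeholder)
cd $PBS_O_WORKDIR              # run from the directory the job was submitted in
mpirun ./mpi_hello             # Open MPI uses the PBS allocation automatically

Submit it from the login node with: qsub myjob.sh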

Listing your nodes

• Once your job begins, $PBS_NODEFILE points to a file that contains a list of your requested nodes.
• Open MPI is already integrated with PBS. Simply using mpirun … will automatically use all requested processes directly from PBS.
• For example, a CUDA programmer that wants to use 4 GPU nodes:

[dfiala@login-0-0 ~]$ qsub -I -lnodes=4:ppn=1 -qcuda
qsub: waiting for job 1774.arcs.csc.ncsu.edu to start
qsub: job 1774.arcs.csc.ncsu.edu ready
[dfiala@compute-0-2 ~]$ cat $PBS_NODEFILE
compute-0-2.local
compute-0-32.local
compute-0-35.local
compute-0-38.local

• SSHing between these nodes FROM the PBS session is allowed.
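Building on that session, a sketch of launching one copy of a (hypothetical) CUDA program on each allocated node by hand, which works because SSHing between your own PBS nodes is allowed:

# run ./my_cuda_app once on every unique node in the allocation
for host in $(sort -u $PBS_NODEFILE); do
    ssh $host "cd $PBS_O_WORKDIR && ./my_cuda_app" &
done
wait    # block until every remote run has finished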

Handling problems

• If you find a node that is giving you trouble, please report it to the mailing list.
• As a workaround, you can keep that node busy by queuing an empty job:
echo sleep 600 | qsub -l host=compute-0-100,walltime=1000

Hardware in Action

• 4 racks in the server room

Running Large Jobs (and keeping cool)

• While our new cluster is surely state of the art… the cooling system isn't.
[Photos: "State of the art cluster" and "Our 'dual action' cooling solution for the state of the art cluster".]


Temperature Monitoring

• It is the user's responsibility to maintain room temperatures below 80 degrees Fahrenheit while utilizing the cluster.
— The ARC website has links to online, browser-based temperature monitors.
— Building staff have pagers that will alarm 24/7 when temperatures exceed the limit.


Connecting to ARC

• ARC access is restricted to on-campus IPs only.
— If you are ever unable to log in (the connection gets dropped immediately, before authentication), this is likely the cause.
• Non-NCSU users may request remote access by providing a remote machine from which their connections must originate.


Summary

• Your ARC Cluster@Home: What can I do with it?
• Primary purpose: advance Computer Science research (HPC and beyond)
— Want to run a job over the entire machine?
— Want to replace parts of the software stack?
• Secondary purpose: service to the sciences, engineering & beyond
— Vision: have domain scientists work with computer scientists on code
• http://moss.csc.ncsu.edu/~mueller/cluster/arc/
• Equipment donations welcome
• Ideas on how to improve ARC, or questions? Let us know on the mailing list (once you have an account).
• To request an account: email dfiala@ncsu.edu with
– your research topic, abstract, and compute requirements/time
– your unity ID (must be included)
– NCSU students: your advisor sends the email as a means of approval
– Non-NCSU: same as above, plus a preferred username and the hostname of your remote login location


Slides provided by David Fiala. Edited by Frank Mueller. Current as of May 11, 2011.
