Slide 1: Hackystat and the DARPA High Productivity Computing Systems Program
Philip Johnson, University of Hawaii
Slide 2: Overview of HPCS

Slide 3: High Productivity Computing Systems
• Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010).
• Impact:
  – Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X.
  – Programmability (time-for-idea-to-first-solution): reduce the cost and time of developing application solutions.
  – Portability (transparency): insulate research and operational application software from the system.
  – Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors.
• HPCS Program Focus Areas. Applications: intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling, and biotechnology.
• Fill the critical technology and capability gap: today (late 80's HPC technology) ... to ... future (quantum/bio computing).

Slide 4: Vision: Focus on the Lost Dimension of HPC, "User & System Efficiency and Productivity"
• [Diagram: evolution from 1980's parallel vector systems and tightly coupled parallel systems, through commodity HPCs, to 2010 high-end computing solutions.]
• Moore's Law: double raw performance every 18 months. New goal: double value every 18 months.
• Fill the high-end computing technology and capability gap for critical national security missions.

Slide 5: HPCS Technical Considerations
• Communication programming models: shared-memory multi-processing; distributed-memory multi-computing ("MPI").
• Architecture types: custom vector, parallel vector, and scalable vector supercomputers; microprocessors; symmetric multiprocessors; distributed shared memory; massively parallel processors; commodity clusters and grids.
• HPCS focus: tailorable, balanced solutions spanning performance characterization and prediction, programming models, system architecture, software technology, and hardware technology.
• Commodity HPC single-point design solutions are no longer acceptable.

Slide 6: HPCS Program Phases I - III
• [Diagram: program timeline across fiscal years 2002 through 2010, showing application analysis and performance assessment, requirements and metrics, metrics and benchmarks, early software tools and early pilot platforms from academia, concept reviews, system design review, PDR, DDR, research prototypes and pilot systems, technology assessments, the industry evolutionary development cycle, Phase II and Phase III readiness reviews, industry procurements, and critical program milestones.]
• Phase I: industry concept study. Phase II: R&D. Phase III: full-scale development.

Slide 7: Application Analysis / Performance Assessment Activity Flow
• Inputs: DDR&E and IHEC mission analysis. Mission partners: DOD, DOE/NNSA, NSA, NRO.
• Application analysis produces benchmarks and metrics; its impacts include HPCS applications, common critical kernels, and compact applications, which drive HPCS technology.
• Define system requirements and characteristics for mission areas such as: 1. Cryptanalysis, 2. Signal and image processing, 3. Operational weather, 4. Nuclear stockpile stewardship, 5. Etc.
• Outputs: mission-specific roadmap, mission work flows, productivity (ratio of utility/cost) metrics (development time (cost) and execution time (cost)), and improved mission capability.
• Participants: Cray, IBM, Sun.
Slide 8: DARPA HPCS Program Implicit Factors
• Motivation: workflows define the scope of customer priorities; Activity and Purpose benchmarks will be used to measure productivity.
• The HPCS goal is to add value to each workflow, i.e., productivity.
• Mission workflows: Researcher, Enterprise, and Production, running on systems ranging from workstations and clusters up to HPCS platforms.
• Implicit productivity factors (performance, programmability, portability, robustness) are rated High for each workflow.
• Increase productivity while increasing problem size.

Slide 9: Productivity Framework Overview
• Phase I (Define): define the framework and scope petascale requirements.
• Phase II (Implement): implement the framework and perform design assessments.
• Phase III (Transition): transition to an HPC procurement quality framework.
• Value metrics: execution and development. Activities include preliminary and multilevel system models and benchmarks, multilevel workflows (Production, Enterprise, Researcher), Activity and Purpose benchmarks, evaluation experiments, acceptance-level tests, and final prototypes (SN001).
• Participants: HPCS vendors, FFRDC and government R&D partners, mission agencies, and a commercial or nonprofit productivity sponsor.
• HPCS needs to develop a procurement quality assessment methodology that will be the basis of 2010+ HPC procurements.

Slide 10: HPCS Phase II Teams
• Industry PIs: Elnozahy (IBM), Rulifson (Sun), Smith (Cray).
  Goal: provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010).
• Productivity Team (Lincoln lead): PIs Kepner and Koester (MIT Lincoln Laboratory), Lucas, Basili, Benson & Snavely, Vetter, Lusk, Post, Bailey, Gilbert, Edelman, Ahalt, and Mitchell, with partners including Ohio State and LCS.
  Goal: develop a procurement quality assessment methodology that will be the basis of 2010+ HPC procurements.

Slide 11: Motivation: Metrics Drive Designs
• "You get what you measure."
• [Charts: an execution-time example placing benchmarks on spatial vs. temporal locality axes (Top500 Linpack with high Rmax, STREAM Add, Table Toy (GUPS) for intelligence, large FFTs for reconnaissance, and adaptive multi-physics for weapons design, vehicle design, and weather); and a development-time example trading language expressiveness against language performance, from Matlab/Python and high performance high level languages down through UPC/CAF, MPI/OpenMP, C/Fortran, and SIMD/DMA to Assembly/VHDL.]
• Current metrics favor caches and pipelines; systems are ill-suited to applications with low spatial locality or low temporal locality.
• On the development-time side, no metrics are widely used; least-common-denominator standards are difficult to use and difficult to optimize.

Slide 12: Phase 1: Productivity Framework
• Productivity (the ratio of utility to cost) is derived from work flows and Activity and Purpose benchmarks, yielding the productivity metrics development time (cost) and execution time (cost), evaluated against an actual system or model through a common modeling interface.
• System parameters (examples):
  – BW bytes/flop (balance), memory latency, memory size, processor flop/cycle, processor integer op/cycle, bisection BW, ...
  – Size (ft³), power/rack, facility operation, ...
  – Code size, restart time (reliability), code optimization time, ...
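As a minimal sketch of the productivity ratio this framework is built around, the LaTeX fragment below writes productivity as utility over cost, with cost split into the two metrics the slide names (development time and execution time). The symbols U, C_dev, and C_exec are illustrative labels chosen here, not notation from the slides.

```latex
% Sketch of the HPCS productivity ratio described on the framework slides:
% utility delivered per unit cost, with cost dominated by development time
% and execution time. Symbol names are illustrative, not from the slides.
\begin{equation*}
  \Psi \;=\; \frac{\text{Utility}}{\text{Cost}}
       \;\approx\; \frac{U}{C_{\mathrm{dev}} + C_{\mathrm{exec}}}
\end{equation*}
% C_dev: development time (cost); C_exec: execution time (cost) on the
% target system or model.
```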
Slide 13: Phase 2: Implementation
• The Phase 1 framework (Activity and Purpose benchmarks, work flows, productivity metrics, an actual system or model, the common modeling interface, and the system parameters listed above) is being implemented by distributed teams.
• Team assignments include: Mitre, ISI, LBL, Lincoln, HPCMO, LANL and mission partners; Lincoln, OSU, and CodeSourcery; performance analysis (ISI, LLNL, and UCSD); analysis of current and new codes (Lincoln, UMD, and mission partners); university experiments (MIT, UCSB, UCSD, UMD, USC); ISI, LLNL, and UCSD; and ANL and the Pmodels group.

Slide 14: HPCS Mission Work Flows
• [Diagram: mission work flows for Researcher, Enterprise, and Production users. Overall cycles range from days to hours (Researcher: theory, experiment, simulation), through months to days (Enterprise: porting legacy software), to years to months (Production: initial product development), with response times of hours to minutes; development cycles span design, code, prototyping, test, execution, visualization, evaluation, port, scale, optimize, operation, and maintenance.]
• HPCS productivity factors (performance, programmability, portability, and robustness) are very closely coupled with each work flow.

Slide 15: HPC Workflow SW Technologies
• Many technologies target specific pieces of the workflow.
• Need to quantify workflows (stages and % of time spent).
• Need to measure technology impact on stages.
• [Diagram: a production workflow running from algorithm development (spec, design, code, test on a workstation) through port, scale, and optimize to runs on a supercomputer, mapped against mainstream and HPC software technologies: operating systems (Linux, RT Linux), compilers and languages (Matlab, Java, C++, F90, UPC, Co-Array Fortran, OpenMP), libraries (ATLAS, BLAS, VSIPL, ||VSIPL++, MPI, DRI, FFTW, PETE, PAPI, CORBA, UML, Globus, CCA, TotalView, ESMF, POOMA, PVL), tools, and problem solving environments.]

Slide 16: Prototype Productivity Models
• Candidate models:
  – Efficiency and Power (Kennedy, Koelbel, Schreiber)
  – Special Model with Work Estimator (Sterling)
  – Utility (Snir): productivity P(S, A, U(·)) relating the utility U(T(S, A, Cost)) to minimum cost
  – Productivity Factor Based (Kepner): useful operations per second (GUPS ... Linpack) per hardware cost, scaled by mission, availability, language level, parallel model, portability, and maintenance factors
  – Least Action (Numrich): S = ∫ (w_dev + w_comp) dt, with δS = 0
  – Time-To-Solution (Kogge)
  – CoCoMo II (software engineering community): effort driven by A x Size and effort multipliers, with scale factors
• [Chart (Kogge): programming time vs. execution time, each from hours to years, placing surveillance, intelligence, weather (research), weapons design, and cryptanalysis among programming-bounded missions and weather (operational) among execution-bounded missions, with the HPCS goal region at low programming and execution time.]
• HPCS has triggered groundbreaking activity in understanding HPC productivity: a community focused on quantifiable productivity (potential for broad impact).
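Of the models listed on Slide 16, the Kennedy-Koelbel-Schreiber time-to-solution form is the most recoverable. The LaTeX sketch below follows their published formulation; the symbols I (implementation time), E (execution time per run), r (number of runs), and the relative power and efficiency ratios are supplied here as assumptions and may differ in detail from the original slide.

```latex
% Hedged sketch of the Kennedy-Koelbel-Schreiber "efficiency and power" model.
% I(P,L): implementation time for problem P in language L
% E(P,L): execution time per run; r: number of production runs
% P,0 denotes the reference (baseline) language.
\begin{align*}
  T(P,L) &= I(P,L) + r\,E(P,L), \\
  \rho_L &= \frac{I(P,0)}{I(P,L)}, \qquad
  \varepsilon_L = \frac{E(P,0)}{E(P,L)}, \\
  T(P,L) &= \frac{I(P,0)}{\rho_L} + r\,\frac{E(P,0)}{\varepsilon_L}.
\end{align*}
```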
Slide 17: Example Existing Code Analysis
• Analysis of existing codes is used to test metrics and identify important trends in productivity and performance.
• [Chart: NAS MG line counts (roughly 0 to 1200 lines), broken down into communication/synchronization/directives, declarations, and computation, for MPI, Java, HPF, OpenMP, serial, and A-ZPL versions.]

Slide 18: Example Experiment Results (N=1)
• Same application (image filtering), same programmer, different languages and libraries: Matlab, BLAS, BLAS/OpenMP, BLAS/MPI*, PVL/BLAS/MPI*, MatlabMPI, pMatlab* (* = estimate).
• [Chart: performance (speedup x efficiency) vs. development time in lines of code, spanning single-processor, shared-memory, and distributed-memory implementations and distinguishing current practice from research approaches.]
• Controlled experiments can potentially measure the impact of different technologies and quantify development time and execution time tradeoffs.

Slide 19: Summary
• The goal is to develop an acquisition quality framework for HPC systems that includes development time and execution time.
• We have assembled a team that will develop models, analyze existing HPC codes, develop tools, and conduct HPC development time and execution time experiments.
• Measures of success:
  – Acceptance by users, vendors, and the acquisition community.
  – Quantitatively explaining HPC rules of thumb:
    • "OpenMP is easier than MPI, but doesn't scale as high."
    • "UPC/CAF is easier than OpenMP."
    • "Matlab is easier than Fortran, but isn't as fast."
  – Predicting the impact of new technologies.

Slide 20: Example Development Time Experiment
• Goal: quantify development time vs. execution time tradeoffs of different parallel programming models:
  – Message passing (MPI)
  – Threaded (OpenMP)
  – Array (UPC, Co-Array Fortran)
• Setting: senior/first-year graduate class in parallel computing (MIT/BU, Berkeley/NERSC, CMU/PSC, UMD/?, …).
• Timeline:
  – Month 1: introduction to parallel programming
  – Month 2: implement a serial version of a compact app
  – Month 3: implement the parallel version
• Metrics:
  – Development time (from logs), SLOCs, function points, …
  – Execution time, scalability, comp/comm ratio, speedup, …
• Analysis:
  – Development time vs. execution time of the different models
  – Performance relative to an expert implementation
  – Size relative to an expert implementation
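The experiment metrics above pair development-time measures (logs, SLOCs, function points) with execution-time measures (speedup, efficiency). As a minimal illustration, not part of the original slides, here is a small Java sketch of how those execution-time metrics are conventionally derived from measured serial and parallel run times; the class and method names are hypothetical.

```java
/**
 * Minimal sketch (hypothetical class, not from the slides) of the standard
 * execution-time metrics used in the development-time experiments:
 * speedup = T_serial / T_parallel, efficiency = speedup / processors,
 * and "performance" plotted as speedup x efficiency.
 */
public final class ExecutionMetrics {

  /** Speedup of a parallel run relative to the serial baseline. */
  public static double speedup(double serialSeconds, double parallelSeconds) {
    return serialSeconds / parallelSeconds;
  }

  /** Parallel efficiency: speedup normalized by processor count. */
  public static double efficiency(double serialSeconds, double parallelSeconds, int processors) {
    return speedup(serialSeconds, parallelSeconds) / processors;
  }

  /** The y-axis of the example experiment chart: speedup x efficiency. */
  public static double performance(double serialSeconds, double parallelSeconds, int processors) {
    double s = speedup(serialSeconds, parallelSeconds);
    return s * (s / processors);
  }

  public static void main(String[] args) {
    // Example: a 1000 s serial run reduced to 70 s on 16 processors.
    double serial = 1000.0, parallel = 70.0;
    int p = 16;
    System.out.printf("speedup=%.2f efficiency=%.2f performance=%.2f%n",
        speedup(serial, parallel),
        efficiency(serial, parallel, p),
        performance(serial, parallel, p));
  }
}
```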
Slide 21: Hackystat in HPCS

Slide 22: About Hackystat
• Five years old:
  – I wrote the first LOC during the first week of May, 2001.
  – Current size: 320,562 LOC (not all mine).
  – ~5 active developers.
  – Open source, GPL.
• General application areas:
  – Education: teaching measurement in SE.
  – Research: Test Driven Design, Software Project Telemetry, HPCS.
  – Industry: project management.
• Has inspired a startup: 6th Sense Analytics.

Slide 23: Goals for Hackystat-HPCS
• Support automated collection of useful low-level data for a wide variety of platforms, organizations, and application areas.
• Make Hackystat low-level data accessible in a standard XML format for analysis by other tools.
• Provide workflow and other analyses over low-level data collected by Hackystat and other tools to support:
  – discovery of development bottlenecks;
  – insight into the impact of tool/language/library choice for specific applications and organizations.

Slide 24: Pilot Study, Spring 2006
• Goal: explore issues involved in workflow analysis using Hackystat and students.
• Experimental conditions (which were challenging):
  – Undergraduate HPC seminar.
  – 6 students total; 3 did the assignment, 1 collected data.
  – 1 week duration.
  – Gauss-Seidel iteration problem, written in C, using the PThreads library, on a cluster.
• As a pilot study, it was successful.

Slide 25: Data Collection: Sensors
• Sensors for Emacs and Vim captured editing activities.
• A sensor for CUTest captured testing activities.
• A sensor for the shell captured command line activities.
• A custom makefile with compilation, testing, and execution targets, each instrumented with sensors.

Slide 26: Example data: Editor activities
Slide 27: Example data: Testing
Slide 28: Example data: File Metrics
Slide 29: Example data: Shell Logger

Slide 30: Data Analysis: Workflow States
• Our goal was to see if we could automatically infer the following developer workflow states:
  – Serial coding
  – Parallel coding
  – Validation/verification
  – Debugging
  – Optimization

Slide 31: Workflow State Detection: Serial Coding
• We defined the "serial coding" state as the editing of a file not containing any parallel constructs, such as MPI, OpenMP, or PThread calls.
• We determine this through the makefile, which runs SCLC over the program at compile time and collects Hackystat FileMetric data providing counts of parallel constructs.
• We were able to identify the serial coding state if the makefile was used consistently.

Slide 32: Workflow State Detection: Parallel Coding
• We defined the "parallel coding" state as the editing of a file containing a parallel construct (an MPI, OpenMP, or PThread call).
• As with serial coding, we get the data required to infer this state from the makefile, which runs SCLC and collects FileMetric data.
• We were able to identify the parallel coding state if the makefile was used consistently.
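As a concrete illustration of the heuristic described on Slides 31 and 32, here is a minimal Java sketch, hypothetical code rather than the actual SCLC or Hackystat implementation, that counts parallel constructs in a C source file and classifies an edit of that file as serial or parallel coding. The construct patterns (MPI_, #pragma omp, pthread_) are assumptions about what such a counter might look for.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the serial-vs-parallel coding heuristic described on
 * the workflow-state slides: a file edit counts as "parallel coding" if the
 * file contains any parallel construct (MPI, OpenMP, or PThreads), and as
 * "serial coding" otherwise. Illustrative only; not SCLC itself.
 */
public final class ParallelConstructCounter {

  // Assumed patterns for parallel constructs in C source code.
  private static final List<Pattern> PARALLEL_PATTERNS = List.of(
      Pattern.compile("\\bMPI_[A-Za-z_]+\\s*\\("),   // MPI calls, e.g. MPI_Send(
      Pattern.compile("#\\s*pragma\\s+omp\\b"),       // OpenMP directives
      Pattern.compile("\\bpthread_[a-z_]+\\s*\\("));  // PThreads calls

  /** Returns the number of lines containing at least one parallel construct. */
  public static int countParallelConstructLines(Path sourceFile) throws IOException {
    int count = 0;
    for (String line : Files.readAllLines(sourceFile)) {
      for (Pattern p : PARALLEL_PATTERNS) {
        if (p.matcher(line).find()) {
          count++;
          break; // count each line at most once
        }
      }
    }
    return count;
  }

  /** Classifies an edit of the given file as a workflow state. */
  public static String classifyEdit(Path sourceFile) throws IOException {
    return countParallelConstructLines(sourceFile) > 0 ? "ParallelCoding" : "SerialCoding";
  }

  public static void main(String[] args) throws IOException {
    // Usage: java ParallelConstructCounter gauss_seidel.c
    Path file = Path.of(args[0]);
    System.out.println(file + " -> " + classifyEdit(file));
  }
}
```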
Slide 33: Workflow State Detection: Testing
• We defined the "testing" state as the invocation of unit tests to determine the functional correctness of the program.
• Students were provided with test cases and the CUTest framework to test their programs.
• We were able to infer the testing state if CUTest was used consistently.

Slide 34: Workflow State Detection: Debugging
• We have not yet been able to generate satisfactory heuristics to infer the "debugging" state from our data:
  – Students did not use a debugging tool that would have allowed instrumentation with a sensor.
  – UMD heuristics, such as the presence of "printf" statements, were not collected by SCLC.
  – Debugging is entwined with testing.

Slide 35: Workflow State Detection: Optimization
• We have not yet been able to generate satisfactory heuristics to infer the "optimization" state from our data:
  – Students did not use a performance analysis tool that would have allowed instrumentation with a sensor.
  – Repeated command line invocation of the program could potentially identify the activity as "optimization".

Slide 36: Insights from the pilot study, 1
• Automatic inference of these workflow states in a student setting requires:
  – Consistent use of the makefile (or some other mechanism that invokes SCLC consistently) to infer the serial coding and parallel coding workflow states.
  – Consistent use of an instrumented debugging tool to infer the debugging workflow state.
  – Consistent use of an "execute" makefile target (and/or an instrumented performance analysis tool) to infer the optimization workflow state.

Slide 37: Insights from the pilot study, 2
• Ironically, it may be easier to infer workflow states in industrial settings than in classroom settings!
  – Industrial settings are more likely to use a wider variety of tools, which could be instrumented to provide better insight into development activities.
  – Large scale programming leads inexorably to consistent use of makefiles (or similar scripts), which should simplify state inference.

Slide 38: Insights from the pilot study, 3
• Are we defining the right set of workflow states?
• For example, the "debugging" phase seems difficult to distinguish as a distinct state. Do we really need to infer "debugging" as a distinct activity?
• Workflow inference heuristics appear to be highly contextual, depending upon the language, toolset, organization, and application. (This is not a bug; it is just reality. We will probably need to enable each mission partner to develop heuristics that work for them.)

Slide 39: Next steps
• Graduate HPC classes at UH:
  – The instructor (Henri Casanova) has agreed to participate with UMD and UH/Hackystat in data collection and analysis.
  – Bigger assignments, more sophisticated students, and hopefully a larger class!
• Workflow Inference System for Hackystat (WISH); see the sketch following this slide:
  – Support export of raw data to other tools.
  – Support import of raw data from other tools.
  – Provide a high-level, rule-based inference mechanism to support organization-specific heuristics for workflow state identification.
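To make the WISH idea concrete, here is a minimal Java sketch of an organization-specific, rule-based workflow-state classifier over low-level sensor events. Everything in it (the SensorEvent record, the Rule interface, and the example rules) is hypothetical; it is not the actual Hackystat or WISH API, only an illustration of the kind of heuristics described above.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical sketch of a WISH-style rule-based inference mechanism:
 * organization-specific rules examine a low-level sensor event and, if they
 * recognize it, assign a workflow state. Not the real Hackystat/WISH API.
 */
public final class WorkflowInferenceSketch {

  /** A simplified low-level sensor event (tool, activity, target file, metrics). */
  record SensorEvent(String tool, String activity, String file, int parallelConstructCount) {}

  /** A heuristic rule: maps an event to a workflow state, or empty if it does not apply. */
  interface Rule extends Function<SensorEvent, Optional<String>> {}

  /** Example organization-specific rules, mirroring the pilot-study heuristics. */
  static final List<Rule> RULES = List.of(
      e -> "CUTest".equals(e.tool()) ? Optional.of("Testing") : Optional.empty(),
      e -> "Edit".equals(e.activity()) && e.parallelConstructCount() > 0
          ? Optional.of("ParallelCoding") : Optional.empty(),
      e -> "Edit".equals(e.activity())
          ? Optional.of("SerialCoding") : Optional.empty(),
      // Crude stand-in for the "repeated command line invocation" heuristic.
      e -> "Shell".equals(e.tool()) && "Execute".equals(e.activity())
          ? Optional.of("Optimization") : Optional.empty());

  /** Applies the rules in order; unmatched events are labeled "Unknown". */
  static String infer(SensorEvent event) {
    for (Rule rule : RULES) {
      Optional<String> state = rule.apply(event);
      if (state.isPresent()) {
        return state.get();
      }
    }
    return "Unknown";
  }

  public static void main(String[] args) {
    SensorEvent edit = new SensorEvent("Vim", "Edit", "gauss_seidel.c", 4);
    SensorEvent test = new SensorEvent("CUTest", "Test", "gauss_seidel.c", 4);
    System.out.println(infer(edit)); // ParallelCoding
    System.out.println(infer(test)); // Testing
  }
}
```

The point of the sketch is the design choice the slide calls out: the rule list, not the inference engine, is where each mission partner's organization-specific heuristics would live.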