Hackystat and the DARPA High Productivity Computing Systems Program

Philip Johnson
University of Hawaii

Overview of HPCS

High Productivity Computing Systems

Goal:
• Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010)

Impact:
• Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
• Programmability (time-for-idea-to-first-solution): reduce the cost and time of developing application solutions
• Portability (transparency): insulate research and operational application software from the system
• Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors

HPCS Program Focus Areas

Applications:
• Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling, and biotechnology

Fill the critical technology and capability gap: from today (late 80's HPC technology) to the future (Quantum/Bio Computing).

Vision: Focus on the Lost Dimension of HPC – "User & System Efficiency and Productivity"

[Roadmap figure: from 1980's-technology parallel vector systems and tightly coupled parallel systems, through today's commodity HPCs, to 2010 high-end computing solutions.]

Old goal (Moore's Law): double raw performance every 18 months.
New goal: double value every 18 months.

Fill the high-end computing technology and capability gap for critical national security missions.

HPCS Technical Considerations

[Figure: the HPCS design space.]
• Communication programming models: shared-memory multi-processing and distributed-memory multi-computing ("MPI").
• Architecture types range from custom vector supercomputers (parallel vector, scalable vector) to microprocessor-based systems (symmetric multiprocessors, distributed shared memory, massively parallel processors, commodity clusters, grids).
• HPCS focus: tailorable balanced solutions between custom and commodity HPC, spanning performance characterization & precision, programming models, system architecture, software technology, and hardware technology.

Single point design solutions are no longer acceptable.

HPCS Program Phases I - III

[Program timeline, fiscal years 02 through 10: Phase I (Industry Concept Study), Phase II (R&D), Phase III (Full Scale Development).]
• Application analysis and performance assessment feed requirements and metrics, then metrics and benchmarks.
• Academia research and early software tools lead to early pilot platforms and, ultimately, HPCS capability or products.
• Industry follows an evolutionary development cycle producing technology assessments, research prototypes & pilot systems, and products.
• Critical program milestones and reviews: concept reviews, system design review, Phase II readiness reviews, PDR, DDR, the Phase III readiness review, and industry procurements.

Application Analysis / Performance Assessment Activity Flow

[Activity flow diagram: inputs, application analysis, benchmarks & metrics, impacts.]
• Inputs: DDR&E & IHEC mission analysis; mission partners (DOD, DOE, NNSA, NSA, NRO); the DARPA HPCS program and implicit factors.
• Application analysis: HPCS applications (1. cryptanalysis, 2. signal and image processing, 3. operational weather, 4. nuclear stockpile stewardship, 5. etc.), distilled into common critical kernels and compact applications; define system requirements and characteristics; mission-specific roadmaps and mission work flows.
• Benchmarks & metrics: applications productivity as the ratio of utility to cost, with metrics for development time (cost) and execution time (cost); HPCS technology drivers.
• Impacts: improved mission capability for mission partners.
• Participants: Cray, IBM, Sun.

Motivation: Workflow Priorities & Goals

• Workflows define the scope of customer priorities (mission needs, system requirements).
• Activity and Purpose benchmarks will be used to measure productivity.
• HPCS goal is to add value to each workflow: increase productivity while increasing problem size.

[Figure: workflows of increasing problem size (researcher on a workstation, enterprise on a cluster, production on HPCS), annotated with the implicit productivity factors: performance, programmability, portability, and robustness, rated high where most critical to each workflow.]

Productivity Framework Overview

[Framework roadmap figure.]
• Phase I (Define): define the framework and scope petascale requirements.
• Phase II (Implement): implement the framework and perform design assessments.
• Phase III (Transition): transition the procurement quality framework to HPC procurements.
• Framework elements: value metrics (execution, development); multilevel workflows (production, enterprise, researcher); system models and Activity & Purpose benchmarks (preliminary, then final); evaluation experiments and acceptance level tests; prototypes and early systems (SN001).
• Participants: HPCS vendors, HPCS FFRDC & government R&D partners, mission agencies, and a commercial or nonprofit productivity sponsor.

HPCS needs to develop a procurement quality assessment methodology that will be the basis of 2010+ HPC procurements.

HPCS Phase II Teams

• Industry (PIs: Elnozahy, Rulifson, Smith)
  Goal: provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010).
• Productivity Team (Lincoln lead): MIT Lincoln Laboratory (PI: Kepner), with PIs Koester, Lucas, Basili, Benson & Snavely, Vetter, Lusk, Post, Bailey, Gilbert, Edelman, Ahalt, and Mitchell, plus Ohio State and LCS.
  Goal: develop a procurement quality assessment methodology that will be the basis of 2010+ HPC procurements.

Motivation: Metrics Drive Designs

"You get what you measure."

Execution time (example): [chart of spatial locality vs. temporal locality] Top500 Linpack (Rmax) enjoys high spatial and temporal locality, STREAM Add high spatial but low temporal locality, large FFTs (reconnaissance) high temporal but low spatial locality, and Table Toy/GUPS (intelligence) low locality on both axes; adaptive multi-physics codes (weapons design, vehicle design, weather) fall in between. Current metrics favor caches and pipelines, so systems are ill-suited to applications with low spatial locality or low temporal locality.

Development time (example): [chart of language expressiveness vs. language performance] the tradeoff runs from expressive but slower languages (Matlab/Python) through UPC/CAF, C/Fortran, and MPI/OpenMP down to SIMD/DMA and Assembly/VHDL; the goal region is high performance, high level languages. No development-time metrics are widely used, and least-common-denominator standards are difficult to use and difficult to optimize.

Phase 1: Productivity Framework

[Framework diagram.] Activity & Purpose benchmarks and work flows feed productivity metrics, which decompose into development time (cost) and execution time (cost). These are evaluated against an actual system or a model through a common modeling interface, yielding productivity as the ratio of utility to cost.

System parameters (examples): BW bytes/flop (balance), memory latency, memory size, processor flop/cycle, processor integer op/cycle, bisection BW, size (ft³), power/rack, facility operation, code size, restart time (reliability), code optimization time, ...
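
As a minimal formalization of the productivity ratio above (notation mine, not an official HPCS definition):

\[
  \text{Productivity} \;=\; \frac{\text{Utility}}{\text{Cost}}
  \;\approx\; \frac{U(T_{\text{dev}} + T_{\text{exec}})}{C_{\text{dev}} + C_{\text{exec}}}
\]

where T_dev and C_dev are development time and its cost, and T_exec and C_exec are execution time and its cost, matching the two cost paths in the framework diagram.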

Phase 2: Implementation

[Same framework diagram as Phase 1, annotated with the implementing teams: MITRE, ISI, LBL, Lincoln, HPCMO, LANL & mission partners; Lincoln, OSU, CodeSourcery; performance analysis (ISI, LLNL & UCSD); metrics analysis of current and new codes (Lincoln, UMD & mission partners); university experiments (MIT, UCSB, UCSD, UMD, USC); ANL & the Pmodels group.]

HPCS Mission Work Flows

[Figure: representative work flows and their cycle times.]
• Researcher: overall cycle of theory, experiment, simulation, and visualization (days to hours); development cycle of design, code, prototyping, and test (hours to minutes).
• Enterprise: port legacy software (months to days); development (design, code, test, port, scale, optimize) and execution; production use in an observe, orient, decide, act loop with response times of hours to minutes.
• Production: initial product development (years to months); design, code, test, and evaluation; maintenance (port, scale, optimize) and operation.

HPCS productivity factors (performance, programmability, portability, and robustness) are very closely coupled with each work flow.

HPC Workflow SW Technologies

• Many technologies target specific pieces of the workflow.
• Need to quantify workflows (stages and % time spent).
• Need to measure technology impact on stages.

[Figure: the production workflow, from algorithm development, spec, and design/code/test on a workstation to port/scale/optimize and run on a supercomputer, overlaid with supporting operating systems, compilers, libraries, tools, and problem solving environments: Linux, RT Linux, Matlab, Java, C++, F90, OpenMP, UPC, Co-array Fortran, ATLAS, BLAS, VSIPL, ||VSIPL++, MPI, DRI, FFTW, PETE, PAPI, CORBA, UML, Globus, CCA, TotalView, ESMF, POOMA, and PVL, spanning mainstream software to HPC software.]

Prototype Productivity Models

[Figure: a sampling of the candidate productivity models.]
• Efficiency and Power (Kennedy, Koelbel, Schreiber): the cost of a program P_L in language L is modeled as T(P_L) = I(P_L) + r E(P_L), where I is implementation (development) time, E is execution time per run, and r is the number of runs; relative power and efficiency compare this cost against a reference program P_0.
• Special Model with Work Estimator (Sterling): productivity S_P expressed through a work estimator w, efficiency E, availability A, and cost terms c_f, n c_m, and c_o T.
• Utility (Snir): productivity P(S, A, U(.)) defined from the utility U(T(S, A, Cost)) of the time-to-solution, relative to the minimum cost of achieving it.
• Productivity Factor Based (Kepner): useful operations per second (benchmarked by GUPS, ..., Linpack) per unit hardware cost, scaled by productivity and mission factors that combine language level, parallel model, portability, availability, and maintenance with CoCoMo II style effort multipliers, scale factors, and size.
• Least Action (Numrich): an action S = ∫ [w_dev + w_comp] dt integrated over development and computation work rates, with productivity following from a least-action principle (δS = 0).
• Time-To-Solution (Kogge): a chart of programming time versus execution time (hours to years on each axis) placing programming-bounded missions (surveillance, intelligence, weather research, weapons design, cryptanalysis) and execution-bounded missions (operational weather), with the HPCS goal of shrinking both.
• CoCoMo II (software engineering community): scale factors and effort multipliers for programming time.

HPCS has triggered ground-breaking activity in understanding HPC productivity: a community focused on quantifiable productivity (potential for broad impact).
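
As a hedged illustration of how these models trade development time against execution time, plugging made-up numbers into the Kennedy/Koelbel/Schreiber style cost T(P_L) = I(P_L) + r E(P_L):

\[
\begin{aligned}
T(P_{\text{high}}) &= 40\ \text{h} + 100 \times 2.0\ \text{h} = 240\ \text{h} \\
T(P_{\text{low}})  &= 200\ \text{h} + 100 \times 0.5\ \text{h} = 250\ \text{h}
\end{aligned}
\]

Here P_high stands for a quickly written but slower high-level implementation and P_low for a laboriously tuned low-level one. With r = 100 runs the two are roughly even; a code run only a handful of times favors the high-level version, while a heavily reused production code favors the low-level one. The numbers are hypothetical and only illustrate the shape of the tradeoff.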

Example Existing Code Analysis

Analysis of existing codes is used to test metrics and identify important trends in productivity and performance.

[Figures: MG performance, and NAS MG line counts (roughly 0 to 1200 source lines) broken into communication/synchronization/directives, declarations, and computation for MPI, Java, HPF, OpenMP, Serial, and A-ZPL implementations.]

Example Experiment Results (N=1)

[Figure: performance (speedup x efficiency) vs. development time (lines of code, roughly 0 to 1000) for the same application (image filtering) written by the same programmer with different languages and libraries: Matlab, C, and C++ on a single processor; BLAS and BLAS/OpenMP with shared memory; BLAS/MPI* and PVL/BLAS/MPI* with distributed memory; MatlabMPI and pMatlab* (* = estimate). The points span current practice to research approaches.]

Controlled experiments can potentially measure the impact of different technologies and quantify development time and execution time tradeoffs.

Summary

• Goal is to develop an acquisition quality framework for HPC systems that includes
  – Development time
  – Execution time
• Have assembled a team that will develop models, analyze existing HPC codes, develop tools, and conduct HPC development time and execution time experiments
• Measures of success
  – Acceptance by users, vendors, and the acquisition community
  – Quantitatively explain HPC rules of thumb:
    • "OpenMP is easier than MPI, but doesn't scale as high"
    • "UPC/CAF is easier than OpenMP"
    • "Matlab is easier than Fortran, but isn't as fast"
  – Predict impact of new technologies

Example Development Time Experiment

• Goal: Quantify development time vs. execution time tradeoffs of different parallel programming models
  – Message passing (MPI)
  – Threaded (OpenMP)
  – Array (UPC, Co-Array Fortran)
• Setting: Senior/1st year grad class in parallel computing (MIT/BU, Berkeley/NERSC, CMU/PSC, UMD/?, …)
• Timeline:
  – Month 1: Intro to parallel programming
  – Month 2: Implement serial version of compact app
  – Month 3: Implement parallel version
• Metrics:
  – Development time (from logs), SLOCs, function points, …
  – Execution time, scalability, comp/comm, speedup, …
• Analysis:
  – Development time vs. execution time of different models
  – Performance relative to expert implementation
  – Size relative to expert implementation

Hackystat in HPCS

About Hackystat

• Five years old:
  – I wrote the first LOC during the first week of May, 2001.
  – Current size: 320,562 LOC (not all mine)
  – ~5 active developers
  – Open source, GPL
• General application areas:
  – Education: teaching measurement in SE
  – Research: Test-Driven Design, Software Project Telemetry, HPCS
  – Industry: project management
• Has inspired a startup: 6th Sense Analytics

Goals for Hackystat-HPCS

• Support automated collection of useful low-level data for a wide variety of platforms, organizations, and application areas.
• Make Hackystat low-level data accessible in a standard XML format for analysis by other tools.
• Provide workflow and other analyses over low-level data collected by Hackystat and other tools to support:
  – discovery of development bottlenecks
  – insight into the impact of tool/language/library choice for specific applications/organizations.

Pilot Study, Spring 2006

• Goal: Explore issues involved in workflow analysis using Hackystat and students.
• Experimental conditions (were challenging):
  – Undergraduate HPC seminar
  – 6 students total, 3 did the assignment, 1 collected data
  – 1 week duration
  – Gauss-Seidel iteration problem, written in C, using the PThreads library, on a cluster
• As a pilot study, it was successful.

Data Collection: Sensors

• Sensors for Emacs and Vim captured editing activities.
• Sensor for CUTest captured testing activities.
• Sensor for Shell captured command line activities.
• Custom makefile with compilation, testing, and execution targets, each instrumented with sensors (a generic logging sketch follows below).
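
To make the instrumentation concrete, here is a generic sketch of the kind of logging such a makefile target might invoke. It is a hypothetical stand-in, not the actual Hackystat sensor API; the script name, fields, and log location are all assumptions.

# log_event.py: hypothetical instrumentation helper (not the Hackystat sensor API).
# A makefile target such as "test:" or "run:" could call
#   python log_event.py Test gauss_seidel.c
# to append one timestamped tool event for later workflow analysis.
import json
import sys
import time
from pathlib import Path

LOG_FILE = Path("sensor_events.jsonl")   # assumed local event log

def log_event(event_type: str, resource: str) -> None:
    """Append a timestamped event record (type, resource) to the local log."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "type": event_type,        # e.g. Build, Test, Execute, Edit
        "resource": resource,      # e.g. the file or program involved
    }
    with LOG_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    log_event(sys.argv[1], sys.argv[2])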

Example data: Editor activities

Example data: Testing

Example data: File Metrics

Example data: Shell Logger

Data Analysis: Workflow States

• Our goal was to see if we could automatically infer the following developer workflow states:
  – Serial coding
  – Parallel coding
  – Validation/Verification
  – Debugging
  – Optimization

Workflow State Detection: Serial Coding

• We defined the "serial coding" state as the editing of a file not containing any parallel constructs, such as MPI, OpenMP, or PThread calls.
• We determine this through the MakeFile, which runs SCLC over the program at compile time and collects Hackystat FileMetric data that provides counts of parallel constructs.
• We were able to identify the serial coding state if the MakeFile was used consistently.

Workflow State Detection: Parallel Coding

• We defined the "parallel coding" state as the editing of a file containing a parallel construct (MPI, OpenMP, PThread call).
• Similarly to serial coding, we get the data required to infer this phase using a MakeFile that runs SCLC and collects FileMetric data.
• We were able to identify the parallel coding state if the MakeFile was used consistently (a sketch of this inference follows below).
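
A minimal sketch of the serial-versus-parallel inference step described in the last two slides, assuming per-file construct counts of the sort SCLC reports (the field names are illustrative, not Hackystat's actual FileMetric schema):

# Classify an editing event as serial or parallel coding from per-file
# counts of parallel constructs (MPI, OpenMP, PThreads).
# Field names are illustrative, not the actual FileMetric schema.

PARALLEL_FIELDS = ("mpi_calls", "openmp_pragmas", "pthread_calls")

def coding_state(file_metrics: dict) -> str:
    """Return 'parallel coding' if the file contains any parallel construct,
    otherwise 'serial coding'."""
    if any(file_metrics.get(field, 0) > 0 for field in PARALLEL_FIELDS):
        return "parallel coding"
    return "serial coding"

# Example: metrics gathered by the makefile at compile time for one file.
metrics = {"file": "gauss_seidel.c", "pthread_calls": 4, "mpi_calls": 0}
print(coding_state(metrics))   # -> parallel coding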

Workflow State Detection: Testing

• We defined the "testing" state as the invocation of unit tests to determine the functional correctness of the program.
• Students were provided with test cases and the CUTest framework to test their program.
• We were able to infer the testing state if CUTest was used consistently.

Workflow State Detection: Debugging

• We have not yet been able to generate satisfactory heuristics to infer the "debugging" state from our data.
  – Students did not use a debugging tool that would have allowed instrumentation with a sensor.
  – Data for UMD heuristics, such as the presence of "printf" statements, was not collected by SCLC.
  – Debugging is entwined with testing.

Workflow State Detection: Optimization

• We have not yet been able to generate satisfactory heuristics to infer the "optimization" state from our data.
  – Students did not use a performance analysis tool that would have allowed instrumentation with a sensor.
  – Repeated command line invocation of the program could potentially identify the activity as "optimization" (sketched below).
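
The repeated-invocation idea in the last bullet can be operationalized roughly as follows; the event format and the threshold of three runs are assumptions for illustration, not measured values:

# Sketch of the "repeated invocation" heuristic: consecutive Execute events
# for the same program with no intervening edit suggest tuning/optimization.
# The event format and the threshold of 3 runs are assumptions.

def candidate_optimization_spans(events, min_runs=3):
    """Yield (start_index, end_index) of >= min_runs back-to-back executions."""
    run_start = None
    count = 0
    for i, ev in enumerate(events):
        if ev["type"] == "Execute":
            if run_start is None:
                run_start = i
            count += 1
        else:  # an edit, build, or test breaks the streak
            if count >= min_runs:
                yield (run_start, i - 1)
            run_start, count = None, 0
    if count >= min_runs:
        yield (run_start, len(events) - 1)

events = [{"type": t} for t in
          ["Edit", "Build", "Execute", "Execute", "Execute", "Edit"]]
print(list(candidate_optimization_spans(events)))   # -> [(2, 4)]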

Insights from the pilot study, 1

• Automatic inference of these workflow states in a student setting requires:
  – Consistent use of the MakeFile (or some other mechanism to invoke SCLC consistently) to infer the serial coding and parallel coding workflow states.
  – Consistent use of an instrumented debugging tool to infer the debugging workflow state.
  – Consistent use of an "execute" MakeFile target (and/or an instrumented performance analysis tool) to infer the optimization workflow state.

Insights from the pilot study, 2

• Ironically, it may be easier to infer workflow states from industrial settings than from classroom settings!
  – Industrial settings are more likely to use a wider variety of tools which could be instrumented and provide better insight into development activities.
  – Large scale programming leads inexorably to consistent use of MakeFiles (or similar scripts) that should simplify state inference.

Insights from the pilot study, 3

• Are we defining the right set of workflow states?
• For example, the "debugging" phase seems difficult to distinguish as a distinct state. Do we really need to infer "debugging" as a distinct activity?
• Workflow inference heuristics appear to be highly contextual, depending upon the language, toolset, organization, and application. (This is not a bug, this is just reality. We will probably need to enable each mission partner (MP) to develop heuristics that work for them.)

Next steps

• Graduate HPC classes at UH
  – The instructor (Henri Casanova) has agreed to participate with UMD and UH/Hackystat in data collection and analysis.
  – Bigger assignments, more sophisticated students, hopefully a larger class!
• Workflow Inference System for Hackystat (WISH)
  – Support export of raw data to other tools.
  – Support import of raw data from other tools.
  – Provide a high-level rule-based inference mechanism to support organization-specific heuristics for workflow state identification (a rough sketch follows below).
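
A rough sketch of what such an organization-specific rule mechanism could look like; the names and structure here are illustrative, not the actual WISH design:

# Sketch of a pluggable rule mechanism for workflow state inference.
# Names and structure are illustrative, not the actual WISH design.

RULES = []  # ordered list of (state_name, predicate) pairs

def rule(state):
    """Decorator registering a predicate that recognizes one workflow state."""
    def register(predicate):
        RULES.append((state, predicate))
        return predicate
    return register

@rule("parallel coding")
def is_parallel_coding(event):
    return event.get("tool") == "editor" and event.get("parallel_constructs", 0) > 0

@rule("testing")
def is_testing(event):
    return event.get("tool") == "CUTest"

def infer_state(event, default="unknown"):
    """Return the first matching state for an event, else the default."""
    for state, predicate in RULES:
        if predicate(event):
            return state
    return default

print(infer_state({"tool": "CUTest", "resource": "test_solver.c"}))  # -> testing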