Transcript Document
INSPIRE: The Insieme Parallel Intermediate Representation
Herbert Jordan, Peter Thoman, Simone Pellegrini, Klaus Kofler, and Thomas Fahringer
University of Innsbruck
PACT’13, 9 September
Programming Models
User: C / C++
  void main(…) {
    int sum = 0;
    for(i = 1..10)
      sum += i;
    print(sum);
  }
HW: [diagram: a single core (C) attached to Memory]
Programming Models
User: C / C++
  void main(…) {
    int sum = 0;
    for(i = 1..10)
      sum += i;
    print(sum);
  }
Compiler: PL → Assembly
• instruction selection
• register allocation
• optimization
• loops & latency
IR:
  .START ST
  ST:  MOV R1,#2
       MOV R2,#1
  M1:  CMP R2,#20
       BGT M2
       MUL R1,R2
       INC R2
       JMP M1
HW: [diagram: a single core (C) with cache ($) and Memory; a 10 year old architecture]
Parallel Architectures
Multicore (OpenMP/Cilk): [diagram: several cores (C) sharing caches ($) and Memory]
Accelerators (OpenCL/CUDA): [diagram: host core and memory plus a GPU (G) with its own memory]
Clusters (MPI/PGAS): [diagram: networked nodes, each with cores (C), caches ($), and memory (M)]
Compiler Support
C / C++ (+ lib):
  void main(…) {
    int sum = 0;
    #omp pfor
    for(i = 1..10)
      sum += i;
  }
Frontend → IR → Backend: the IR stays sequential; the parallel loop is reduced to an opaque library call (call "pfor", _GOMP_PFOR) whose body the compiler never analyzes.
[diagram: generated assembly with the outlined "pfor" routine and a call into the OpenMP runtime library]
Situation
Compilers:
• unaware of thread-level parallelism
• the magic happens in libraries
Libraries:
• limited perspective / scope
• no static analysis, no transformations
User:
• has to manage and coordinate parallelism
• no performance portability
Compiler Support?
HW: [diagram: a modern heterogeneous machine with many cores (C), caches ($), GPUs (G), and memories (M)]
Compiler: PL → Assembly
• instruction selection
• register allocation
• optimization
• loops & latency
• vectorization
IR: (sequential toy assembly, as before)
User: C / C++
  void main(…) {
    int sum = 0;
    for(i = 1..10)
      sum += i;
    print(sum);
  }
Our approach: Insieme
HW: [diagram: a modern heterogeneous machine with many cores (C), caches ($), GPUs (G), and memories (M)]
Compiler: PL → Assembly
• instruction selection
• register allocation
• optimization
• loops & latency
• vectorization
IR: (sequential toy assembly, as before)
Insieme:
User: PL → PL + extras
• coordinate parallelism
• high-level optimization
• auto tuning
• instrumentation
INSPIRE:
  unit main(...) {
    ref<int> v1 = 0;
    pfor(..., (){
      ...
    });
  }
C / C++:
  void main(…) {
    int sum = 0;
    #omp pfor
    for(i = 1..10)
      sum += i;
  }
The Insieme Project
Goal: to establish a research platform for hybrid, thread-level parallelism
Input: C/C++ with OpenMP, Cilk, OpenCL, MPI, and extensions
Compiler: Frontend → INSPIRE → Static Optimizer / IR Toolbox → Backend
Runtime: Exec. Engine, Scheduler, Monitoring, Dyn. Optimizer (coupled to the compiler via IRSM)
Parallel Programming
  OpenMP   Pragmas (+ API)
  Cilk     Keywords
  MPI      library
  OpenCL   library + JIT
Objective: combine those using a unified formalism and to provide an infrastructure for analysis and manipulations
INSPIRE Requirements
from: OpenMP / Cilk / OpenCL / MPI / others
  ↓ INSPIRE ↓
to: OpenCL / MPI / Insieme Runtime / others
INSPIRE has to be:
• complete
• unified
• explicit
• analyzable
• transformable
• compact
• high level
• whole program
• open system
• extensible
INSPIRE
Functional Basis:
• first-class functions and closures
• generic (function) types
• program = 1 expression
Imperative Constructs:
• loops, conditions, mutable state
Explicit Parallel Constructs:
• to model parallel control flow
Parallel Model
Parallel control flow is defined by jobs: job(e_l, e_u, …, f), where e_l and e_u bound the size of the thread group and f is the job function; jobs are processed cooperatively by thread groups.
Parallel Model (2)
• one work-sharing construct
• one data-sharing construct
• point-to-point communication via abstract channels, type: channel<α, s>
Evaluation
What inherent impact does the INSPIRE detour impose?
  Binary A (GCC):     C input code → GCC 4.6.3 (-O3)
  Binary B (Insieme): C input code → Insieme Compiler (FE → INSPIRE → BE, no optimization!) → C target code (IRT) → GCC 4.6.3 (-O3)
Performance Impact
Relative execution time (t_insieme / t_original)
Derived Work (subset)
Adaptive Task Granularity Control
P. Thoman, H. Jordan, T. Fahringer, Adaptive Granularity Control in Task
Parallel Programs using Multiversioning, EuroPar 2013
Multiobjective Auto-Tuning
H. Jordan, P. Thoman, J. J. Durillo et al., A Multi-Objective Auto-Tuning
Framework for Parallel Codes, SC 2012
Compiler aided Loop Scheduling
P. Thoman, H. Jordan, S. Pellegrini et al., Automatic OpenMP Loop
Scheduling: A Combined Compiler and Runtime Approach, IWOMP 2012
OpenCL Kernel Partitioning
K. Kofler, I. Grasso, B. Cosenza, T. Fahringer, An Automatic Input-Sensitive
Approach for Heterogeneous Task Partitioning, ICS 2013
Improved usage of MPI Primitives
S. Pellegrini, T. Hoefler, T. Fahringer, On the Effects of CPU Caches on MPI
Point-to-Point Communications, Cluster 2012
Conclusion
INSPIRE is designed to:
• represent and unify parallel applications, based on a comprehensive parallel model
• analyze and manipulate parallel codes
• provide the foundation for researching parallel language extensions
It is sufficient to cover the leading standards for parallel programming.
Practicality has been demonstrated by a variety of derived work.
Thank You!
Visit: http://insieme-compiler.org
Contact: [email protected]
Types: 7 type constructors
Expressions: 8 kinds of expressions
Statements: 9 types of statements