Type-Based Flow Analysis: From Polymorphic Subtyping to CFL-Reachability Jakob Rehof and Manuel Fähndrich
Type-Based Flow Analysis:
From Polymorphic Subtyping
to CFL-Reachability
Jakob Rehof and Manuel Fähndrich
Microsoft Research
Type-Based Program Analysis
Common vocabulary and its type-based counterpart:
• Data access paths: type structure
• Function summary: function type (->)
• Context-sensitivity: type instantiation, polymorphism (∀)
• Directional flow: subtyping (≤)
GOAL: Scaleable Flow Analysis of H.O. Programs w. Polymorphic Subtyping
• Type-based
• Higher-order
• Context-sensitive (CS)
• Directional (DI)
Precision and Cost
• +CS +DI (∀, ≤)
• -CS +DI (=, ≤)
• +CS -DI (∀, =)
• -CS -DI (=, =)
Outline
• Goals
• Problems and Results
• Current Flow Analysis w. ∀ + ≤
• Our Solution
• Summary
Current Method
• (∀) Polymorphism by copying types
• (≤) Subtyping by constrained types
• (∀ + ≤) constraint copying
Problems w. Current Method
• Constraint copying is expensive (memory)
• Constraint simplification is hard
• Previous algorithm (Mossin) O(n^8) (n = size of type-annotated program)
• No on-demand algorithms
Results
• No constraint copying
• On-demand queries
• All flow in O(n^3)
Current Flow Analysis w. ∀ + ≤ (Mossin)
max(s,t) =
if s<=t then t else s
Standard type: real * real -> real
Current Flow Analysis w. ∀ + ≤
max(s:a,t:b) =
(if s<=t then t else s) :c
Analysis type: {a ≤ c, b ≤ c} => real:a * real:b -> real:c
(a, b, c are flow labels; {a ≤ c, b ≤ c} are subtyping constraints)
Current Flow Analysis w. ∀ + ≤
max(s:a,t:b) :
{a ≤ c, b ≤ c} =>
real:a * real:b -> real:c
max(x0:a0,y0:b0):c0
{a0 ≤ c0, b0 ≤ c0} => c0
max(x1:a1,y1:b1):c1
{a1 ≤ c1, b1 ≤ c1} => c1
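The constraint copying shown for the two calls to max can be sketched in a few lines. This is only an illustration under stated assumptions, not Mossin's algorithm: the `Scheme` class and the `instantiate` helper are hypothetical names, flow labels are plain strings, and fresh labels are formed by appending the call-site number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A constrained type scheme: quantified labels plus subtyping constraints."""
    labels: tuple       # quantified flow labels, e.g. ("a", "b", "c")
    constraints: tuple  # pairs (lo, hi), each meaning lo <= hi


def instantiate(scheme, suffix):
    """Copy the constraint set, renaming every quantified label (hypothetical helper)."""
    rename = {label: label + suffix for label in scheme.labels}
    copied = tuple((rename.get(lo, lo), rename.get(hi, hi))
                   for lo, hi in scheme.constraints)
    return rename, copied


# max : {a <= c, b <= c} => real:a * real:b -> real:c
max_scheme = Scheme(("a", "b", "c"), (("a", "c"), ("b", "c")))

# Each call site receives its own copy of the constraints.
_, at_site_0 = instantiate(max_scheme, "0")
_, at_site_1 = instantiate(max_scheme, "1")
print(at_site_0)  # (('a0', 'c0'), ('b0', 'c0'))
print(at_site_1)  # (('a1', 'c1'), ('b1', 'c1'))
```

Every call site adds a full copy of the constraint set, which is the memory cost listed under "Problems w. Current Method".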
Current Flow Analysis w. ∀ + ≤
$\mathrm{norm}(v) = v / \|v\|$, with $v_i \ge 0$ and $\|v\| = \max_i v_i$
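A small worked instance of this normalization, transcribed into Python for two components. The function names follow the slides (with `max_` renamed to avoid shadowing the Python builtin); the input values are arbitrary, and the sketch assumes non-negative inputs with a non-zero maximum.

```python
def max_(s, t):
    # Same definition as on the slides: if s <= t then t else s.
    return t if s <= t else s


def scale(z, w, n):
    # Divide both components by n.
    return (z / n, w / n)


def norm(x, y):
    # Normalize by the largest component; assumes x, y >= 0 and max_(x, y) != 0.
    m = max_(x, y)
    return scale(x, y, m)


print(norm(3.0, 4.0))  # (0.75, 1.0)
```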
Without Subtyping:
norm(x:a’,y:a’) =
let m = max(x,y) in
scale(x,y,m)
end;
scale(z,w,n) = (z/n,w/n)
max(s:a,t:a) = if s<=t then t else s
real:a * real:a -> real:a
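To see why the labels of max collapse to a single label a once subtyping is dropped, here is a minimal sketch using a union-find structure for equality constraints. The `UnionFind` class and the label names "s", "t" and "result" are illustrative assumptions, not part of the original analysis.

```python
class UnionFind:
    """Minimal union-find over string labels (illustrative helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


# Without subtyping, "the result is s or t" becomes two equalities.
uf = UnionFind()
uf.union("s", "result")
uf.union("t", "result")
merged = uf.find("s") == uf.find("t")
print(merged)  # True: s and t now share one label

# With subtyping, the same body yields directed constraints instead.
directed = {("s", "result"), ("t", "result")}
print(("s", "t") in directed)  # False: no flow from s to t is implied
```

The equality version loses direction: both arguments and the result end up in one class, which is the loss of precision the -DI variants accept.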
Flow Analysis Overview
[Diagram: Source Code -> Type Inference (Polymorphic, Subtyping) -> Type Instantiation Graph and Flow Graph -> CFL-Reachability; node labels A and B]
Eliminating constraint copies
max(s:a,t:b) : {a ≤ c, b ≤ c} => real:a * real:b -> real:c
{a0 ≤ c0, b0 ≤ c0} =>
real:a0 * real:b0 -> real:c0
max(x0:a0,y0:b0):c0
{a1 ≤ c1, b1 ≤ c1} =>
real:a1 * real:b1 -> real:c1
max(x1:a1,y1:b1):c1
1. Get a graph
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
max(x1:a1,y1:b1):c1
2. Label instantiation sites
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
3. Represent substitutions
max(s:a,t:b) : real:a * real:b -> real:c
Si : a ↦ a0, b ↦ b0, c ↦ c0
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
Sj : a ↦ a1, b ↦ b1, c ↦ c1
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
3.a. … as a graph
max(s:a,t:b) : real:a * real:b -> real:c
Edges labeled i: a --i--> a0, b --i--> b0, c --i--> c0
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
Edges labeled j: a --j--> a1, b --j--> b1, c --j--> c1
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
4. Eliminate constraint copies!
max(s:a,t:b) : real:a * real:b -> real:c
Edges labeled i: a --i--> a0, b --i--> b0, c --i--> c0
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
Edges labeled j: a --j--> a1, b --j--> b1, c --j--> c1
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
[Figure: the same graph, annotated with three question marks]
Type Theory to the Rescue!
• Polarity (+,-)
[Figure: function types (->) with each position marked + or -]
5. Polarities (+,-)
max(s:a,t:b) : real:a * real:b -> real:c
Negative (-) edges: a --i--> a0, b --i--> b0, a --j--> a1, b --j--> b1
Positive (+) edges: c --i--> c0, c --j--> c1
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
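6. Reverse negative edges
max(s:a,t:b) : real:a * real:b -> real:c
Reversed negative (-) edges: a0 --i--> a, b0 --i--> b, a1 --j--> a, b1 --j--> b
Positive (+) edges: c --i--> c0, c --j--> c1
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1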
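Steps 3 through 6 can be sketched as a small graph construction. This is a hedged illustration: the tuple layouts and the `build_edges` helper are assumptions made for the example, and the flow edges a -> c and b -> c come from the constraints {a ≤ c, b ≤ c} of max.

```python
# Instantiation constraints as (generic label, site, polarity, instance label).
INSTANTIATIONS = [
    ("a", "i", "-", "a0"), ("b", "i", "-", "b0"), ("c", "i", "+", "c0"),
    ("a", "j", "-", "a1"), ("b", "j", "-", "b1"), ("c", "j", "+", "c1"),
]

# Flow edges inside max, from the subtyping constraints a <= c and b <= c.
FLOW = [("a", "c"), ("b", "c")]


def build_edges(instantiations, flow):
    """Return (source, site, target) edges; flow edges carry site None."""
    edges = [(src, None, dst) for src, dst in flow]
    for generic, site, polarity, instance in instantiations:
        if polarity == "-":
            # Negative (argument) position: reverse the edge.
            edges.append((instance, site, generic))
        else:
            # Positive (result) position: keep the direction.
            edges.append((generic, site, instance))
    return edges


edges = build_edges(INSTANTIATIONS, FLOW)
print(("a0", "i", "a") in edges)  # True: reversed negative edge
print(("c", "j", "c1") in edges)  # True: positive edge kept as is
```

After the reversal, arguments at a call site point into max and the result of max points out to the call site, so paths follow the direction of data flow.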
7. Recover flow
max(s:a,t:b) : real:a * real:b -> real:c
Reversed negative (-) edges: a0 --i--> a, b0 --i--> b, a1 --j--> a, b1 --j--> b
Positive (+) edges: c --i--> c0, c --j--> c1
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Flow is recovered along paths in the graph, e.g. a0 -> a -> c -> c0.
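8. Be careful!
max(s:a,t:b) : real:a * real:b -> real:c
Reversed negative (-) edges: a0 --i--> a, b0 --i--> b, a1 --j--> a, b1 --j--> b
Positive (+) edges: c --i--> c0, c --j--> c1
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Spurious! A path such as a0 -> a -> c -> c1 enters max at site i but leaves at site j.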
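The spurious flow can be reproduced with ordinary graph reachability that ignores the site labels. The edge list below is a hand-written copy of the example graph and the `reachable` helper is a hypothetical name; this is a sketch of the problem, not of the authors' implementation.

```python
from collections import deque

# (source, site, target); site None marks a flow edge inside max.
EDGES = [
    ("a0", "i", "a"), ("b0", "i", "b"),  # reversed negative edges, site i
    ("a1", "j", "a"), ("b1", "j", "b"),  # reversed negative edges, site j
    ("a", None, "c"), ("b", None, "c"),  # a <= c, b <= c
    ("c", "i", "c0"), ("c", "j", "c1"),  # positive edges
]


def reachable(edges, start):
    """Breadth-first search that ignores site labels entirely."""
    successors = {}
    for src, _site, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable(EDGES, "a0")))  # ['a', 'a0', 'c', 'c0', 'c1']
```

The label c1 shows up in the result even though x0:a0 is only ever passed at site i, so plain reachability is context-insensitive.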
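9. Do CFL-reachability
CFG: M → [k M ]k
max(s:a,t:b) : real:a * real:b -> real:c
Edges labeled d: a -> c, b -> c
Edges labeled [i: a0 -> a, b0 -> b
Edges labeled [j: a1 -> a, b1 -> b
Edge labeled ]i: c -> c0
Edge labeled ]j: c -> c1
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1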
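A minimal sketch of matched-bracket reachability on this graph follows. It is a naive fixpoint computation, not the O(n^3) algorithm of the talk, and it covers only matched paths (the nonterminal M, with the alternatives d, M M and the empty path that the CFL Formulation slide lists); the `matched_pairs` helper and the edge encoding are assumptions made for the example.

```python
# (source, label, target); "[k" opens site k, "]k" closes it, "d" is a flow edge.
EDGES = [
    ("a0", "[i", "a"), ("b0", "[i", "b"),
    ("a1", "[j", "a"), ("b1", "[j", "b"),
    ("a", "d", "c"), ("b", "d", "c"),
    ("c", "]i", "c0"), ("c", "]j", "c1"),
]


def matched_pairs(edges):
    """All (u, v) joined by a path whose label string is derivable from M."""
    nodes = {n for s, _, t in edges for n in (s, t)}
    m = {(n, n) for n in nodes}                           # M -> empty path
    m |= {(s, t) for s, lab, t in edges if lab == "d"}    # M -> d
    opens = [(s, lab[1:], t) for s, lab, t in edges if lab.startswith("[")]
    closes = [(s, lab[1:], t) for s, lab, t in edges if lab.startswith("]")]
    changed = True
    while changed:
        new = set()
        for u, v in m:                                    # M -> M M
            for v2, w in m:
                if v == v2:
                    new.add((u, w))
        for x, k, u in opens:                             # M -> [k M ]k
            for v, k2, y in closes:
                if k == k2 and (u, v) in m:
                    new.add((x, y))
        changed = not new <= m
        m |= new
    return m


pairs = matched_pairs(EDGES)
print(("a0", "c0") in pairs)  # True:  [i d ]i is matched
print(("a0", "c1") in pairs)  # False: [i d ]j is not matched
```

Requiring the brackets to match is what rules out the spurious path from step 8 while keeping the flow from a0 to c0.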
Further Issues
• Polymorphic type structure
• Recursive type structure
– context-sensitive data-dependence analysis is uncomputable [Reps 00]
– our techniques require finite types
– regular unbounded data types handled via finite approximations: recursive type expressions
One-level implementation
• GOLF analysis system for C by Manuvir Das (MSR) and Ben Liblit (Berkeley)
• Exhaustive points-to sets for MS Word 97, 1.4 Mloc, in 2 minutes
Summary
• Elimination of constraint copying
• Reformulation of polymorphic subtyping with instantiation constraints
• Transfer of CFL-reachability techniques to type-based flow analysis
Scaleable Program Analysis Project (MSR, spt)
• +CS +DI (∀, ≤): [ RF, POPL 01 ]
• -CS +DI (=, ≤): [ Das, PLDI 00 ]
• +CS -DI (∀, =): [ FRD, PLDI 00 ]
• -CS -DI (=, =)
research.microsoft.com/spa
Summary
• Type-based flow analysis
– all flow in O(n^3), n = typed pgm size
– context-sensitive (polymorphism)
– directional (subtyping)
– demand-driven algorithm
– incorporates label-polymorphic recursion
– works directly on H.O. programs
– structured data of finite type
– unbounded data structures via approx.
CFL Formulation
S → P N
P → M P | [k P | ε
N → M N | ]k N | ε
M → [k M ]k | M M | d | ε
Type System
e = let max(s:a,t:b) = …
in
(max(x0:a0,y0:b0),
max(x1:a1,y1:b1))
end
Instantiation constraints: a ≼i a0, a ≼j a1, b ≼i b0, b ≼j b1, c ≼i c0, c ≼j c1
Subtyping constraints: a ≤ c, b ≤ c
Judgment: (instantiation constraints); (subtyping constraints); (type environment) |- e: c0*c1