Type-Based Flow Analysis: From Polymorphic Subtyping to CFL-Reachability Jakob Rehof and Manuel Fähndrich

Microsoft Research
Type-Based Program Analysis
Common vocabulary: Type-based
• Data access paths: Type structure
• Function summary: Function type (->)
• Context-sensitivity: Type instantiation, polymorphism (∀)
• Directional flow: Subtyping (≤)
GOAL: Scaleable Flow Analysis of H.O. Programs w. Polymorphic Subtyping
• Type-based
• Higher-order
• Context-sensitive (CS)
• Directional (DI)
Precision and Cost
• +CS +DI (∀, ≤)
• -CS +DI (=, ≤)
• +CS -DI (∀, =)
• -CS -DI (=, =)
Outline
• Goals
• Problems and Results
• Current Flow Analysis w. ∀ + ≤
• Our Solution
• Summary
Current Method
• (∀) Polymorphism by copying types
• (≤) Subtyping by constrained types
• (∀ + ≤) ⇒ constraint copying
Problems w. Current Method
• Constraint copying is expensive (memory)
• Constraint simplification is hard
• Previous algorithm (Mossin) O(n^8) (n = size of type-annotated program)
• No on-demand algorithms
Results
• No constraint copying
• On-demand queries
• All flow in O(n^3)
Current Flow Analysis w. ∀ + ≤ (Mossin)
max(s,t) =
  if s<=t then t else s
Standard type: real * real -> real
Current Flow Analysis w. ∀ + ≤
max(s:a,t:b) =
  (if s<=t then t else s) :c
Analysis type: {a ≤ c, b ≤ c} => real:a * real:b -> real:c
• a, b, c: flow labels
• {a ≤ c, b ≤ c}: subtyping constraints
Current Flow Analysis w. ∀ + ≤
max(s:a,t:b) :
  {a ≤ c, b ≤ c} =>
  real:a * real:b -> real:c
max(x0:a0,y0:b0):c0
  {a0 ≤ c0, b0 ≤ c0} => c0
max(x1:a1,y1:b1):c1
  {a1 ≤ c1, b1 ≤ c1} => c1
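The copying shown on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the algorithm of any particular system: the function name `instantiate` and the dictionary encoding of substitutions are assumptions made for the example.

```python
# Sketch only: constraint copying at instantiation sites.
# The names below (instantiate, SITE_0, SITE_1) are illustrative assumptions.

def instantiate(constraints, substitution):
    """Copy a constraint set, renaming every label through the substitution."""
    return {(substitution.get(lo, lo), substitution.get(hi, hi))
            for (lo, hi) in constraints}

# Subtyping constraints of max: a <= c and b <= c.
MAX_CONSTRAINTS = {("a", "c"), ("b", "c")}

# One substitution per use of max.
SITE_0 = {"a": "a0", "b": "b0", "c": "c0"}
SITE_1 = {"a": "a1", "b": "b1", "c": "c1"}

copy_0 = instantiate(MAX_CONSTRAINTS, SITE_0)
copy_1 = instantiate(MAX_CONSTRAINTS, SITE_1)

# Every use of max adds a full copy of its constraint set.
total_constraints = len(MAX_CONSTRAINTS) + len(copy_0) + len(copy_1)
```

Each additional use of max adds another copy of the whole set, which is the memory cost listed under "Problems w. Current Method".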
Current Flow Analysis w. ∀ + ≤
$\mathrm{norm}(v) = \dfrac{v}{\|v\|_\infty}$, with $v_i \ge 0$ and $\|v\|_\infty = \max_i v_i$
Without Subtyping:
norm(x:a',y:a') =
  let m = max(x,y) in
    scale(x,y,m)
  end;
scale(z,w,n) = (z/n,w/n)
max(s:a,t:a) = if s<=t then t else s
max : real:a * real:a -> real:a
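Why x and y end up sharing one label can be sketched with a small union-find over labels. The sketch assumes that, without subtyping, every flow becomes an equality between labels, and it simplifies the call to max by equating argument labels with parameter labels directly, ignoring instantiation. The label names "x", "y", "m", "s", "t", "r" are hypothetical stand-ins for the labels of the corresponding variables and of the result of max.

```python
# Sketch only: equality-based (unification) labels, as a stand-in for
# "without subtyping". All label names here are illustrative assumptions.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x, registering x if it is new."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        """Merge the classes of x and y."""
        self.parent[self.find(x)] = self.find(y)

EQUALITIES = [
    ("s", "r"), ("t", "r"),               # body of max: both branches flow to the result
    ("x", "s"), ("y", "t"), ("m", "r"),   # call max(x,y) in norm, result bound to m
]

uf = UnionFind()
for left, right in EQUALITIES:
    uf.union(left, right)

# With equalities only, the labels of x and y can no longer be told apart.
x_and_y_merged = uf.find("x") == uf.find("y")
```

With the subtyping constraints a ≤ c and b ≤ c in place of equalities, the argument labels would stay distinct, which is the directional precision the talk is after.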

Flow Analysis Overview
[Diagram: Source Code → Polymorphic Subtyping Type Inference → Type Instantiation Graph + Flow Graph → CFL-Reachability; nodes labeled A and B]
Eliminating constraint copies
max(s:a,t:b) : {a ≤ c, b ≤ c} => real:a * real:b -> real:c
{a0 ≤ c0, b0 ≤ c0} =>
real:a0 * real:b0 -> real:c0
max(x0:a0,y0:b0):c0
{a1 ≤ c1, b1 ≤ c1} =>
real:a1 * real:b1 -> real:c1
max(x1:a1,y1:b1):c1
1. Get a graph
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
max(x1:a1,y1:b1):c1
2. Label instantiation sites
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
3. Represent substitutions
max(s:a,t:b) : real:a * real:b -> real:c
Si : a ↦ a0, b ↦ b0, c ↦ c0
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
Sj : a ↦ a1, b ↦ b1, c ↦ c1
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
3.a. … as a graph
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Instantiation edges labeled i: a -> a0, b -> b0, c -> c0
Instantiation edges labeled j: a -> a1, b -> b1, c -> c1
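Turning the substitutions into labeled graph edges can be sketched as follows. The encoding (a dictionary from site name to substitution, and edges as `(source, site, target)` triples) is an assumption made for the example, not a representation taken from the talk.

```python
# Sketch only: substitutions at instantiation sites i and j, stored as edges.
# The dictionary layout and the triple format are illustrative assumptions.

SUBSTITUTIONS = {
    "i": {"a": "a0", "b": "b0", "c": "c0"},
    "j": {"a": "a1", "b": "b1", "c": "c1"},
}

def instantiation_edges(substitutions):
    """One edge (generic label, site, instance label) per substituted label."""
    return sorted((src, site, dst)
                  for site, subst in substitutions.items()
                  for src, dst in subst.items())

edges = instantiation_edges(SUBSTITUTIONS)
```

The graph grows by one edge per label per site, while the constraints of max itself are stored once.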
4. Eliminate constraint copies!
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Instantiation edges labeled i: a -> a0, b -> b0, c -> c0
Instantiation edges labeled j: a -> a1, b -> b1, c -> c1
[Diagram: three "?" marks on the graph]
Type Theory to the Rescue!
• Polarity (+,-)
[Diagram: a nested function type whose arrows and components are annotated with polarities + and -]
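5. Polarities (+,-)
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Negative (-) instantiation edges: a -> a0 (i), b -> b0 (i), a -> a1 (j), b -> b1 (j)
Positive (+) instantiation edges: c -> c0 (i), c -> c1 (j)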
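6. Reverse negative edges
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Negative (-) edges, reversed: a0 -> a (i), b0 -> b (i), a1 -> a (j), b1 -> b (j)
Positive (+) edges: c -> c0 (i), c -> c1 (j)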
7. Recover flow
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Negative (-) edges, reversed: a0 -> a (i), b0 -> b (i), a1 -> a (j), b1 -> b (j)
Positive (+) edges: c -> c0 (i), c -> c1 (j)
Recovered flow: a0 -> a -> c -> c0 (likewise b0 to c0 at site i, and a1, b1 to c1 at site j)
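8. Be careful!
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Negative (-) edges, reversed: a0 -> a (i), b0 -> b (i), a1 -> a (j), b1 -> b (j)
Positive (+) edges: c -> c0 (i), c -> c1 (j)
Spurious!: a path that enters max along an i edge and leaves along a j edge (e.g. a0 -> a -> c -> c1) mixes two instantiation sites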
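Telling a genuine path from a spurious one amounts to checking that the instantiation sites along the path pair up like brackets. The sketch below assumes a path is given as a list of edge labels, where `("-", site)` stands for a reversed negative edge, `("+", site)` for a positive edge, and `"sub"` for a subtyping edge such as a -> c. This encoding and the name `is_matched` are assumptions made for the example.

```python
# Sketch only: does a path's sequence of edge labels pair up its sites?
# Label encoding (illustrative assumption):
#   ("-", site): reversed negative instantiation edge (enters at site)
#   ("+", site): positive instantiation edge (leaves at site)
#   "sub":       subtyping edge

def is_matched(labels):
    """True if every entered site is left again, in properly nested order."""
    stack = []
    for label in labels:
        if label == "sub":
            continue
        polarity, site = label
        if polarity == "-":
            stack.append(site)
        elif polarity == "+":
            if not stack or stack.pop() != site:
                return False
        else:
            raise ValueError(f"unknown polarity: {polarity}")
    return not stack

genuine = is_matched([("-", "i"), "sub", ("+", "i")])    # a0 -> a -> c -> c0
spurious = is_matched([("-", "i"), "sub", ("+", "j")])   # a0 -> a -> c -> c1
```

The check only covers fully matched paths. Paths that legitimately end inside a function, or start there, would need the prefix and suffix cases as well.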
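9. Do CFL-reachability
CFG: M → [k M ]k
max(s:a,t:b) : real:a * real:b -> real:c
real:a0 * real:b0 -> real:c0
i: max(x0:a0,y0:b0):c0
real:a1 * real:b1 -> real:c1
j: max(x1:a1,y1:b1):c1
Edges labeled d: a -> c, b -> c
Edges labeled [i: a0 -> a, b0 -> b; edges labeled [j: a1 -> a, b1 -> b
Edges labeled ]i: c -> c0; edges labeled ]j: c -> c1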
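Matched reachability on this small graph can be sketched with a naive fixpoint computation. It is an illustration only: it saturates the relation for the nonterminal M (using the productions M → [k M ]k, M M, d and the empty path), it is not demand-driven, and it makes no attempt at the O(n^3) bound. The function name and the edge encoding are assumptions made for the example.

```python
# Sketch only: naive saturation for matched (M) reachability.
# Edge labels are "d", ("[", site) or ("]", site); this encoding is an assumption.

def matched_reachability(nodes, edges):
    """Return all pairs (u, v) joined by a path whose labels are derivable from M."""
    matched = {(n, n) for n in nodes}                                 # empty path
    matched |= {(u, v) for (u, lab, v) in edges if lab == "d"}        # M -> d
    opens = [(u, lab[1], v) for (u, lab, v) in edges if lab[0] == "["]
    closes = [(u, lab[1], v) for (u, lab, v) in edges if lab[0] == "]"]
    changed = True
    while changed:
        new = set()
        for (u, v) in matched:                                        # M -> M M
            for (v2, w) in matched:
                if v == v2:
                    new.add((u, w))
        for (u, k, x) in opens:                                       # M -> [k M ]k
            for (y, k2, w) in closes:
                if k == k2 and (x, y) in matched:
                    new.add((u, w))
        changed = not new <= matched
        matched |= new
    return matched

NODES = ["a", "b", "c", "a0", "b0", "c0", "a1", "b1", "c1"]
EDGES = [
    ("a", "d", "c"), ("b", "d", "c"),
    ("a0", ("[", "i"), "a"), ("b0", ("[", "i"), "b"),
    ("a1", ("[", "j"), "a"), ("b1", ("[", "j"), "b"),
    ("c", ("]", "i"), "c0"), ("c", ("]", "j"), "c1"),
]

flows = matched_reachability(NODES, EDGES)
```

On this graph the result contains (a0, c0) and (b1, c1) but not (a0, c1), so the flows at each site are recovered without copying the constraints of max.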
Further Issues
• Polymorphic type structure
• Recursive type structure
– context-sensitive data-dependence analysis is uncomputable [Reps 00]
– our techniques require finite types
– regular unbounded data types handled via finite approximations: recursive type expressions
One-level implementation
• GOLF analysis system for C by Manuvir Das (MSR) and Ben Liblit (Berkeley)
• Exhaustive points-to sets for MS Word 97, 1.4 Mloc, in 2 minutes
Summary
• Elimination of constraint copying
• Reformulation of polymorphic subtyping with instantiation constraints
• Transfer of CFL-reachability techniques to type-based flow analysis
Scaleable Program Analysis Project (MSR, spt)
• [ RF, POPL 01 ]: +CS +DI (∀, ≤)
• [ Das, PLDI 00 ]: -CS +DI (=, ≤)
• [ FRD, PLDI 00 ]: +CS -DI (∀, =)
• -CS -DI (=, =)
research.microsoft.com/spa
Summary
• Type-based flow analysis
– all flow in O(n^3), n = typed pgm size
– context-sensitive (polymorphism)
– directional (subtyping)
– demand-driven algorithm
– incorporates label-polymorphic recursion
– works directly on H.O. programs
– structured data of finite type
– unbounded data structures via approx.
CFL Formulation
S → P N
P → M P | [ P | ε
N → M N | ] N | ε
M → [k M ]k | M M | d | ε
Type System
e = let max(s:a,t:b) = …
    in
      (max(x0:a0,y0:b0),
       max(x1:a1,y1:b1))
    end
{a ⪯i a0, a ⪯j a1, b ⪯i b0, b ⪯j b1, c ⪯i c0, c ⪯j c1}; {a ≤ c; b ≤ c}; Γ |- e: c0*c1
• {a ⪯i a0, …, c ⪯j c1}: instantiation constraints
• {a ≤ c; b ≤ c}: subtyping constraints
• Γ: type environment