bddbddb: Using Datalog and BDDs for Program Analysis

bddbddb:
Using Datalog and BDDs for
Program Analysis
John Whaley
Stanford University and moka5 Inc.
June 11, 2006
Implementing Program Analysis
vs.
…56 pages!
June 11, 2006
Using Datalog and BDDs
for Program Analysis
• 2x faster
• Fewer bugs
• Extensible
1
Is it really that easy?
• Requires:
– A different way of thinking
– Knowledge, experience, and intuition
– Perseverance to try different techniques
– A lot of tuning and tweaking
– Luck
• Despite all this, people who use it swear
by it and could “never go back”
Tutorial Structure
Part I: Essential Background
– Datalog for Program Analysis
– Binary Decision Diagrams
Part II: Using the Tools
– bddbddb
– Compiler interface (Joeq compiler)
– Datalog editor in Eclipse
– Interactive mode
Part III: Developing Advanced Analyses
– Context sensitivity
– Combining multiple analyses
– Race detection examples
– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas
– Variable ordering
– Iteration order
– Machine learning
– What it’s good for, what it isn’t good for
Short break every 30 minutes
Try it yourself…
• Available as moka5 LivePC
– Non-intrusive installation in a VM
– Automatically kept up to date
– Easy to try, easy to share
– Complete environment on a USB stick
Part I: Essential Background
Program Analysis in Datalog
Datalog
• Declarative language for deductive
databases [Ullman 1989]
– Like Prolog, but no function symbols,
no predefined evaluation strategy
Datalog Basics
Atom = Reach(d,x,i)
– Reach is the predicate.
– d, x, i are the arguments: variables or constants.
Literal = Atom or NOT Atom
Rule = Atom :- Literal & … & Literal
– The head (left of :-): make this atom true.
– The body (right of :-): for each assignment of values
to variables that makes all these literals true …
Datalog Example
parent(x,y) :- child(y,x).
grandparent(x,z) :- parent(x,y), parent(y,z).
ancestor(x,y) :- parent(x,y).
ancestor(x,z) :- parent(x,y), ancestor(y,z).
Datalog
• Intuition: subgoals in the body are
combined by “and” (strictly speaking:
“join”).
• Intuition: Multiple rules for a predicate
(head) are combined by “or.”
Another Datalog Example
hasChild(x) :- child(_,x).
hasNoChild(x) :- !child(_,x).
“!” inverts the relation, not the atom!
hasSibling(x) :- child(x,y), child(z,y), z!=x.
onlyChild(x) :- child(x,_), !hasSibling(x).
_ means “don’t care” (at least one)
! means “Not”
Reaching Defs in Datalog
Reach(d,x,j) :- Reach(d,x,i),
StatementAt(i,s),
!Assign(s,x),
Follows(i,j).
Reach(s,x,j) :- StatementAt(i,s),
Assign(s,x),
Follows(i,j).
Definition: EDB Vs. IDB Predicates
• Some predicates come from the program,
and their tuples are computed by
inspection.
– Called EDB, or extensional database
predicates.
• Others are defined by the rules only.
– Called IDB, or intensional database
predicates.
Negation
• Negation makes things tricky.
• Semantics of negation
– No negation allowed [Ullman 1988]
– Stratified Datalog [Chandra 1985]
– Well-founded semantics [Van Gelder 1991]
Stratification
• A risk occurs if there are negated literals
involved in a recursive predicate.
– Leads to oscillation in the result.
• Requirement for stratification :
– Must be able to order the IDB predicates so that if
a rule with P in the head has NOT Q in the body,
then Q is either EDB or earlier in the order than P.
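The ordering requirement can be checked mechanically. The Python sketch below is not part of bddbddb; the rule encoding (a head name plus a list of (predicate, negated) pairs) and the name `is_stratifiable` are assumptions made for illustration. It treats a program as non-stratifiable when a negated predicate depends, directly or transitively, on the head of the rule that negates it.

```python
def is_stratifiable(rules):
    """rules: list of (head, [(body_predicate, negated), ...]) pairs."""
    deps = {}        # head -> set of predicates appearing in its bodies
    negative = []    # (head, negated body predicate) pairs
    for head, body in rules:
        for pred, negated in body:
            deps.setdefault(head, set()).add(pred)
            if negated:
                negative.append((head, pred))

    def depends_on(start, target):
        # Depth-first search along "is defined in terms of" edges.
        seen, stack = set(), [start]
        while stack:
            pred = stack.pop()
            if pred == target:
                return True
            if pred in seen:
                continue
            seen.add(pred)
            stack.extend(deps.get(pred, ()))
        return False

    # A negation inside a recursive cycle rules out any valid ordering.
    return not any(depends_on(q, p) for p, q in negative)


# onlyChild(x) :- child(x,_), !hasSibling(x).   (negation, but no cycle)
family = [
    ("hasSibling", [("child", False), ("child", False)]),
    ("onlyChild", [("child", False), ("hasSibling", True)]),
]
# P(x) :- E(x), !P(x).                          (negation inside a cycle)
oscillating = [("P", [("E", False), ("P", True)])]

family_ok = is_stratifiable(family)
oscillating_ok = is_stratifiable(oscillating)
```

EDB predicates never appear as heads, so they have no outgoing dependency edges and can never close a cycle.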
Example: Nonstratification
P(x) :- E(x), !P(x).
• If E(1) is true, is P(1) true?
• It is after the first round.
• But not after the second.
• True after the third, not after the fourth,…
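The oscillation can be reproduced in a few lines. This is a minimal Python sketch, assuming each round re-evaluates the rule body against the value of P from the previous round; the function name `step` is illustrative only.

```python
def step(E, P):
    """One round of P(x) :- E(x), !P(x), using P from the previous round."""
    return {x for x in E if x not in P}


E = {1}          # EDB fact E(1)
P = set()        # IDB predicate starts out empty
history = []     # truth value of P(1) after each round
for _ in range(4):
    P = step(E, P)
    history.append(1 in P)
```

The list `history` alternates between true and false, so no fixed point is ever reached.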
Iterative Algorithm for Datalog
• Start with the EDB predicates = “whatever
the code dictates,” and with all IDB
predicates empty.
• Repeatedly examine the bodies of the
rules, and see what new IDB facts can be
discovered from the EDB and existing IDB
facts.
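As a rough illustration of this loop, the Python sketch below evaluates the parent/ancestor rules from the earlier Datalog example with plain sets. The names in the `child` facts are made up for the example, and the code is a naive evaluator, not how bddbddb itself is implemented.

```python
def naive_ancestor(child):
    """Naive bottom-up evaluation: re-run every rule until nothing changes."""
    parent, ancestor = set(), set()      # IDB predicates start empty
    while True:
        # parent(x,y) :- child(y,x).
        new_parent = {(x, y) for (y, x) in child}
        # ancestor(x,y) :- parent(x,y).
        new_ancestor = set(parent)
        # ancestor(x,z) :- parent(x,y), ancestor(y,z).
        new_ancestor |= {(x, z) for (x, y) in parent
                         for (y2, z) in ancestor if y == y2}
        if new_parent <= parent and new_ancestor <= ancestor:
            return ancestor              # no new facts: fixed point reached
        parent |= new_parent
        ancestor |= new_ancestor


# EDB: child(c, p) means c is a child of p (hypothetical family).
child = {("bob", "alice"), ("carol", "bob"), ("dave", "carol")}
ancestor = naive_ancestor(child)
```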
Datalog evaluation strategy
• “Semi-naïve” evaluation
– Remember that a new fact can be inferred by
a rule in a given round only if it uses in the
body some fact discovered on the previous
round.
• Evaluation strategy
– Top-down (goal-directed) [Ullman 1985]
– Bottom-up (infer from base facts) [Ullman
1989]
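A minimal Python sketch of semi-naïve evaluation for the recursive ancestor rule follows. It assumes the `parent` relation is already known, and joins only the facts discovered in the previous round (`delta`) instead of the whole `ancestor` relation; the sample facts are hypothetical.

```python
def semi_naive_ancestor(parent):
    """Semi-naive evaluation of:
         ancestor(x,y) :- parent(x,y).
         ancestor(x,z) :- parent(x,y), ancestor(y,z).
    """
    ancestor = set(parent)    # facts from the non-recursive rule
    delta = set(parent)       # facts that were new in the previous round
    while delta:
        # Only facts from the previous round can produce something new.
        derived = {(x, z) for (x, y) in parent
                   for (y2, z) in delta if y == y2}
        delta = derived - ancestor
        ancestor |= delta
    return ancestor


parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
ancestor = semi_naive_ancestor(parent)
```

The set difference `derived - ancestor` is the step that keeps each round proportional to the new facts rather than to everything derived so far.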
Our Dialect of Datalog
• Totally-ordered finite domains
– Domains are of a given, finite size
– Makes all Datalog programs “safe”
– Cannot mix variables of different domains
• Constants (named/integers)
• Comparison operators:
= != < <= > >=
• Don’t-care: _ Universe: *
Why Datalog?
• Developed a tool to translate inference rules to
BDD implementation
• Later, discovered Datalog (Ullman, Reps)
• Semantics of BDDs match Datalog exactly
– Obvious implementation of relations
– Operations occur a set-at-a-time
– Fast set compare, set difference
– Wealth of literature about semantics, optimization,
etc.
Inference Rules
Assign(v1, v2), vPointsTo(v2, o)
--------------------------------
vPointsTo(v1, o)

vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
• Datalog rules directly correspond to
inference rules.
Flow-Insensitive
Pointer Analysis
o1: p = new Object();
o2: q = new Object();
p.f = q;
r = p.f;
[Figure: points-to graph over the variables p, q, r and the objects o1, o2, with a field edge labeled f]
Input Tuples
vPointsTo(p, o1)
vPointsTo(q, o2)
Store(p, f, q)
Load(p, f, r)
Output Relations
hPointsTo(o1, f, o2)
vPointsTo(r, o2)
Inference Rule in Datalog
Assignments: v1 = v2;
vPointsTo(v1, o)
:- Assign(v1, v2),
vPointsTo(v2, o).
[Figure: points-to graph with nodes v1, v2, and o]
Inference Rule in Datalog
Stores: v1.f = v2;
hPointsTo(o1, f, o2)
:- Store(v1, f, v2),
vPointsTo(v1, o1),
vPointsTo(v2, o2).
[Figure: points-to graph with nodes v1, o1, v2, o2 and a field edge labeled f]
Inference Rule in Datalog
Loads: v2 = v1.f;
vPointsTo(v2, o2)
:- Load(v1, f, v2),
vPointsTo(v1, o1),
hPointsTo(o1, f, o2).
[Figure: points-to graph with nodes v1, o1, v2, o2 and a field edge labeled f]
The Whole Algorithm
vPointsTo(v, o)
:- vPointsTo0(v, o).
vPointsTo(v1, o)
:- Assign(v1, v2),
vPointsTo(v2, o).
hPointsTo(o1, f, o2)
:- Store(v1, f, v2),
vPointsTo(v1, o1),
vPointsTo(v2, o2).
vPointsTo(v2, o2)
:- Load(v1, f, v2),
vPointsTo(v1, o1),
hPointsTo(o1, f, o2).
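To see the four rules at work, the Python sketch below evaluates them with plain sets on the small program from the flow-insensitive pointer analysis slide (p and q allocated, then p.f = q and r = p.f). It is an explicit-set illustration under the assumption that relations fit in memory; bddbddb would represent the same relations as BDDs. The function name `points_to` is illustrative only.

```python
def points_to(vP0, assign, store, load):
    """Iterate the four pointer-analysis rules to a fixed point."""
    vP = set(vP0)     # vPointsTo(v, o) :- vPointsTo0(v, o).
    hP = set()
    while True:
        new_vP, new_hP = set(vP), set(hP)
        # vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
        new_vP |= {(v1, o) for (v1, v2) in assign
                   for (v, o) in vP if v == v2}
        # hPointsTo(o1, f, o2) :- Store(v1, f, v2),
        #                         vPointsTo(v1, o1), vPointsTo(v2, o2).
        new_hP |= {(o1, f, o2) for (v1, f, v2) in store
                   for (a, o1) in vP if a == v1
                   for (b, o2) in vP if b == v2}
        # vPointsTo(v2, o2) :- Load(v1, f, v2),
        #                      vPointsTo(v1, o1), hPointsTo(o1, f, o2).
        new_vP |= {(v2, o2) for (v1, f, v2) in load
                   for (a, o1) in vP if a == v1
                   for (h, g, o2) in hP if h == o1 and g == f}
        if new_vP == vP and new_hP == hP:
            return vP, hP
        vP, hP = new_vP, new_hP


vP, hP = points_to(
    vP0={("p", "o1"), ("q", "o2")},
    assign=set(),
    store={("p", "f", "q")},    # p.f = q
    load={("p", "f", "r")},     # r = p.f
)
```

The result reproduces the output relations listed on that slide: one heap edge from o1 to o2 through f, and r pointing to o2.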
Format of a Datalog file
• Domains
Name   Size    ( map file )
V      65536   var.map
H      32768
• Relations
Name ( <attribute list> )          flags
Store (v1 : V, f : F, v2 : V)      input
PointsTo (v : V, h : H)            input, output
• Rules
Head :- Body .
PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).
Key Point
• Program information is stored in a
relational database.
– Everything in the program is numbered.
• Write declarative inference rules to infer
new facts about the program.
• Negations OK if they are not in a recursive
cycle.
Take a break…
(Next up: Binary Decision Diagrams)
Part I: Essential Background
Binary Decision Diagrams
Call graph relation
• Call graph expressed as a relation.
– Five edges:
• Calls(A,B)
• Calls(A,C)
• Calls(A,D)
• Calls(B,D)
• Calls(C,D)
[Figure: call graph with nodes A, B, C, D]
Call graph relation
• Relation expressed as a binary
function.
– A=00, B=01, C=10, D=11
Calls(A,B) → 00 01
Calls(A,C) → 00 10
Calls(A,D) → 00 11
Calls(B,D) → 01 11
Calls(C,D) → 10 11
[Figure: call graph with nodes labeled A 00, B 01, C 10, D 11]
Call graph relation
• Relation expressed as a binary
function.
– A=00, B=01, C=10, D=11
from     to
x1 x2    x3 x4    f
0  0     0  0     0
0  0     0  1     1
0  0     1  0     1
0  0     1  1     1
0  1     0  0     0
0  1     0  1     0
0  1     1  0     0
0  1     1  1     1
1  0     0  0     0
1  0     0  1     0
1  0     1  0     0
1  0     1  1     1
1  1     0  0     0
1  1     0  1     0
1  1     1  0     0
1  1     1  1     0
[Figure: call graph with nodes labeled A 00, B 01, C 10, D 11]
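The f column can be derived directly from the five Calls tuples. The Python sketch below uses the encoding from the slide (A=00, B=01, C=10, D=11); the names `CODE`, `CALLS`, and `calls_fn` are chosen for illustration.

```python
from itertools import product

CODE = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}
CALLS = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]

# Each tuple becomes one satisfying assignment (x1, x2, x3, x4).
ENCODED = {CODE[src] + CODE[dst] for src, dst in CALLS}


def calls_fn(x1, x2, x3, x4):
    """Boolean function that is 1 exactly on the encoded Calls tuples."""
    return int((x1, x2, x3, x4) in ENCODED)


# Rows in the same order as the truth table: x1 is the slowest-changing bit.
f_column = [calls_fn(*bits) for bits in product((0, 1), repeat=4)]
```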
Binary Decision Diagrams (Bryant
1986)
• Graphical encoding of a truth table.
[Figure: complete binary decision tree over x1, x2, x3, x4, drawn with 0 edges and 1 edges, ending in the leaves 0 1 1 1 0 0 0 1 0 0 0 1 0 0 0 0]
Binary Decision Diagrams
• Collapse redundant nodes.
[Figure sequence: the decision tree over x1, x2, x3, x4 is reduced step by step by merging identical nodes, starting with the leaves, which collapse into the two terminals 0 and 1]
Binary Decision Diagrams
• Eliminate unnecessary nodes.
[Figure sequence: nodes that are not needed are removed; the final diagram has one x1 node, two x2 nodes, two x3 nodes, one x4 node, and the terminals 0 and 1]
Binary Decision Diagrams
• Size depends on amount of redundancy,
NOT size of relation.
– Identical subtrees share the same representation.
– As set gets very large, more nodes have identical zero
and one successors, so the size decreases.
BDD Variable Order is Important!
x1x2 + x3x4
[Figure: two BDDs for this function, one built with the variable order x1<x2<x3<x4 and one with x1<x3<x2<x4]
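The effect of the order can be measured with a small experiment. The Python sketch below builds a reduced ordered BDD by exhaustive Shannon expansion, which is only practical for a handful of variables and is not how a BDD library constructs diagrams. The function `build_robdd` and its representation (a unique table keyed by (variable, low, high)) are assumptions made for illustration.

```python
def build_robdd(fn, order):
    """Build a reduced ordered BDD for fn under the given variable order.

    Returns (root, unique), where unique maps (var, low, high) to a node id.
    Ids 0 and 1 are the terminals.
    """
    unique = {}

    def mk(var, low, high):
        if low == high:              # the test is unnecessary: skip the node
            return low
        key = (var, low, high)
        if key not in unique:        # share identical nodes
            unique[key] = len(unique) + 2
        return unique[key]

    def expand(i, env):
        if i == len(order):
            return int(bool(fn(env)))
        var = order[i]
        low = expand(i + 1, {**env, var: 0})
        high = expand(i + 1, {**env, var: 1})
        return mk(var, low, high)

    return expand(0, {}), unique


def f(env):
    # x1 x2 + x3 x4
    return (env["x1"] and env["x2"]) or (env["x3"] and env["x4"])


_, good = build_robdd(f, ["x1", "x2", "x3", "x4"])
_, bad = build_robdd(f, ["x1", "x3", "x2", "x4"])
good_nodes, bad_nodes = len(good), len(bad)
```

Under these assumptions the first order needs 4 internal nodes and the second needs 6; the gap grows quickly as more product terms are added.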
Variable ordering is NP-hard
• No good general heuristic solutions
• Dynamic reordering heuristics don’t work
well on these problems
• We use:
– Trial and error
– Active learning
Apply Operation
• Concept
– Basic technique for building OBDD from Boolean formula.
• Arguments A, B, op
– A and B: Boolean functions, represented as OBDDs
– op: Boolean operation (e.g., ^, &, |)
• Result
– OBDD representing composite function A op B
[Figure: OBDDs for A and B over the variables a, b, c, d, combined with | into the OBDD for A op B]
Apply Execution Example
[Figure: argument A (nodes A1 to A6), argument B (nodes B1 to B5), the operation |, and the resulting recursive calls: (A1,B1), (A2,B2), (A6,B2), (A6,B5), (A3,B2), (A5,B2), (A3,B4), (A4,B3), (A5,B4)]
• Optimizations
– Dynamic programming
– Early termination rules
Apply Result Generation
[Figure: the tree of recursive calls, the result graph without reduction, and the result graph with reduction]
– Recursive calling structure implicitly defines unreduced BDD
– Apply reduction rules bottom-up as return from recursive calls
BDD implementation
• ‘Unique’ table
– Huge hash table
– Each entry: level, left, right, hash, next
• Operation cache
– Memoization cache for operations
• Garbage collection
– Mark and sweep, free list.
Code for BDD ‘and’.
Base case:
Memo cache lookup:
Recursive step:
Memo cache insert:
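The code shown on this slide did not survive in the transcript. The Python sketch below is a stand-in that follows the four phases named above (base case, memo cache lookup, recursive step, memo cache insert). It is not taken from BuDDy or JavaBDD; the class `BDD` and its fields are assumptions made for illustration, with a unique table keyed by (level, low, high) as described on the implementation slide.

```python
class BDD:
    """Tiny BDD manager: node ids 0 and 1 are the terminals."""

    def __init__(self):
        self.nodes = {}       # node id -> (level, low, high)
        self.unique = {}      # (level, low, high) -> node id
        self.and_cache = {}   # (a, b) -> result of a AND b

    def mk(self, level, low, high):
        if low == high:                   # unnecessary test: no node needed
            return low
        key = (level, low, high)
        node = self.unique.get(key)
        if node is None:                  # not in the unique table yet
            node = len(self.nodes) + 2
            self.nodes[node] = key
            self.unique[key] = node
        return node

    def var(self, level):
        return self.mk(level, 0, 1)

    def and_(self, a, b):
        # Base case: terminals and identical arguments.
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1 or a == b:
            return a
        # Memo cache lookup (AND is commutative, so normalize the key).
        key = (a, b) if a < b else (b, a)
        hit = self.and_cache.get(key)
        if hit is not None:
            return hit
        # Recursive step: split on the topmost variable of either argument.
        la, a_low, a_high = self.nodes[a]
        lb, b_low, b_high = self.nodes[b]
        level = min(la, lb)
        if la != level:
            a_low = a_high = a
        if lb != level:
            b_low = b_high = b
        result = self.mk(level,
                         self.and_(a_low, b_low),
                         self.and_(a_high, b_high))
        # Memo cache insert.
        self.and_cache[key] = result
        return result

    def evaluate(self, node, bits):
        """Follow the 0/1 edges selected by bits[level] down to a terminal."""
        while node > 1:
            level, low, high = self.nodes[node]
            node = high if bits[level] else low
        return node


bdd = BDD()
x0, x1, x2 = bdd.var(0), bdd.var(1), bdd.var(2)
conj = bdd.and_(x0, bdd.and_(x1, x2))
```

Because nodes are created only through `mk`, equal functions always get the same node id, so comparing two BDDs for equality is a single integer comparison.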
BDD Libraries
• BuDDy
– Simple, fast, memory-friendly
– Identifies BDD by index in unique table
• JavaBDD
– 100% Java, based on BuDDy
– Also native interface to BuDDy, CUDD, CAL, JDD
• CUDD
– Most popular, most feature-complete
– Not as fast as BuDDy
– Other types: ZDD, ADD
• JDD
– 100% Java, fresh implementation
– Still under development
Depth-first vs. breadth-first
• BDD algorithms have natural depth-first
recursive formulations.
• Some work on using breadth-first
evaluation for better parallelism and
locality
– CAL: breadth-first BDD package
• General idea: Assume independent, fixup
if not.
• Doesn’t perform well in practice.
Take a break…
(Next up: Using the Tools)
Part II: Using the Tools
bddbddb
(BDD-based deductive database)
bddbddb System Overview
[Figure: pipeline from Java bytecode through the Joeq frontend to input relations, which together with the Datalog program produce output relations]
Compiler Frontend
• Convert IR into tuples
• Tuples format:
# V0:16 F0:11 V1:16     (header line)
0 0 1                   (one tuple per line)
0 1 2
1470 0 1464
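A reader for this format takes only a few lines. The Python sketch below rests on two assumptions that the slide does not spell out: that each header field is a domain name followed by a bit width, and that tuple fields are separated by whitespace. The function `parse_tuples` is hypothetical and is not part of bddbddb or Joeq.

```python
def parse_tuples(text):
    """Parse a tuples file: a '#' header line, then one tuple per line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    if not header.startswith("#"):
        raise ValueError("expected a header line starting with '#'")

    # Assumed header layout: "<domain>:<bits>" per column.
    columns = []
    for field in header[1:].split():
        name, bits = field.split(":")
        columns.append((name, int(bits)))

    tuples = []
    for row in rows:
        values = tuple(int(token) for token in row.split())
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {row!r}")
        for value, (name, bits) in zip(values, columns):
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{value} does not fit in {name}")
        tuples.append(values)
    return columns, tuples


SAMPLE = """\
# V0:16 F0:11 V1:16
0 0 1
0 1 2
1470 0 1464
"""
columns, tuples = parse_tuples(SAMPLE)
```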
Compiler Frontend
• Robust frontends:
– Joeq compiler
– Soot compiler
– SUIF compiler (for C code)
• Still experimental:
– Eclipse frontend
– gcc frontend
–…
Extracting Relations
• Idea: Iterate through compiler IR, numbering
and dumping relations of interest.
– Types
– Methods
– Fields
– Variables
– …
joeq.Main.GenRelations
• Generate initial relations for points-to analysis.
– Does initial pass to discover call graph.
• Options:
-fly : dump on-the-fly call graph info
-cs : dump context-sensitive info
-ssa : dump SSA representation
-partial : no call graph discovery
-Dpa.dumppath= : where to save files
-Dpa.icallgraph= : location of initial call graph
-Dpa.dumpdotgraph : dump call graph in dot
Demo of joeq GenRelations
Part II: Using the Tools
bddbddb:
From Datalog to BDDs
An Adventure in BDDs
• Context-sensitive numbering scheme
– Modify BDD library to add special operations.
– Can’t even analyze small programs.
Time: ∞
• Improved variable ordering
– Group similar BDD variables together.
– Interleave equivalence relations.
– Move common subsets to edges of variable order.
Time: 40h
• Incrementalize outermost loop
– Very tricky, many bugs.
Time: 36h
• Factor away control flow, assignments
– Reduces number of variables.
Time: 32h
An Adventure in BDDs
• Exhaustive search for best BDD order
– Limit search space by not considering intradomain
orderings.
Time: 10h
• Eliminate expensive rename operations
– When rename changes relative order, result is not
isomorphic.
Time: 7h
• Improved BDD memory layout
– Preallocate to guarantee contiguous.
Time: 6h
• BDD operation cache tuning
– Too small: redo work, too big: bad locality
– Parameter sweep to find best values.
Time: 2h
An Adventure in BDDs
• Simplified treatment of exceptions
– Reduce number of variables, iterations necessary for
convergence.
Time: 1h
• Change iteration order
– Required redoing much of the code.
Time: 48m
• Eliminate redundant operations
– Introduced subtle bugs.
Time: 45m
• Specialized caches for different operations
– Different caches for and, or, etc.
Time: 41m
An Adventure in BDDs
• Compacted BDD nodes
– 20 bytes → 16 bytes
Time: 38m
• Improved BDD hashing function
– Simpler hash function.
Time: 37m
• Total development time: 1 year
– 1 year per analysis?!?
• Optimizations obscured the algorithm.
• Many bugs discovered, maybe still more.
bddbddb:
BDD-Based Deductive DataBase
• Automatically generate from Datalog
– Optimizations based on my experience with
handcoded version.
– Plus traditional compiler algorithms.
• bddbddb even better than handcoded
– handcoded: 37m bddbddb: 19m
Datalog ↔ BDDs
Datalog                                  BDDs
Relations                                Boolean functions
Relation ops: ⋈, ∪, select, project      Boolean function ops: ∧, ∨, −, ∼
Relation at a time                       Function at a time
Semi-naïve evaluation                    Incrementalization
Fixed-point                              Iterate until stable
Compiling Datalog to BDDs
1. Apply Datalog source level transforms.
2. Stratify and determine iteration order.
3. Translate into relational algebra IR.
4. Optimize IR and replace relational algebra ops
with equivalent BDD ops.
5. Assign relation attributes to physical BDD
domains.
6. Perform more optimizations after domain
assignment.
7. Interpret the resulting program.
High-Level Transform:
Magic Set Transformation
• Add “magic” predicates to control
generated tuples [Bancilhon 1986, Beeri
1987]
– Combines ideas from top-down and bottom-up evaluation
• Doesn’t always help
– Leads to more iterations
– BDDs are good at large operations
Predicate Dependency Graph
[Figure: predicate dependency graph over vPointsTo0, Assign, Load, Store, vPointsTo, hPointsTo]
For each rule, add an edge from each RHS predicate to the LHS predicate:
vPointsTo(v, o) :- vPointsTo0(v, o).
vPointsTo(v1, o) :- Assign(v1, v2),
vPointsTo(v2, o).
hPointsTo(o1, f, o2) :- Store(v1, f, v2),
vPointsTo(v1, o1),
vPointsTo(v2, o2).
vPointsTo(v2, o2) :- Load(v1, f, v2),
vPointsTo(v1, o1),
hPointsTo(o1, f, o2).
Determining Iteration Order
• Tradeoff between faster convergence and
BDD cache locality
• Static heuristic
– Visit rules in reverse post-order
– Iterate shorter loops before longer loops
• Profile-directed feedback
• User can control iteration order
– pri=# keywords on rules/relations
Predicate Dependency Graph
[Figure: predicate dependency graph over vPointsTo0, Assign, Load, Store, vPointsTo, hPointsTo]
Datalog to Relational Algebra
vPointsTo(v1, o)
:- Assign(v1, v2),
vPointsTo(v2, o).
t1 = ρvariable→source(vPointsTo);
t2 = assign ⋈ t1;
t3 = πsource(t2);
t4 = ρdest→variable(t3);
vPointsTo = vPointsTo ∪ t4;
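The five relational algebra steps can be traced on a tiny example. In the Python sketch below a relation is a pair (attribute names, set of rows); the helper functions and the attribute name `heap` for the second column of vPointsTo are assumptions made for illustration, and π is implemented as "project away" the named attribute, as on the slide.

```python
def rename(rel, old, new):
    attrs, rows = rel
    return (tuple(new if a == old else a for a in attrs), set(rows))


def join(r, s):
    """Natural join on the attributes the two relations share."""
    (ra, r_rows), (sa, s_rows) = r, s
    common = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for x in r_rows:
        xd = dict(zip(ra, x))
        for y in s_rows:
            yd = dict(zip(sa, y))
            if all(xd[a] == yd[a] for a in common):
                out.add(x + tuple(yd[a] for a in extra))
    return (ra + tuple(extra), out)


def project_away(rel, attr):
    attrs, rows = rel
    i = attrs.index(attr)
    return (attrs[:i] + attrs[i + 1:],
            {row[:i] + row[i + 1:] for row in rows})


def union(r, s):
    (ra, r_rows), (sa, s_rows) = r, s
    order = [sa.index(a) for a in ra]       # align s's columns with r's
    return (ra, r_rows | {tuple(row[i] for i in order) for row in s_rows})


# Hypothetical input: the statement a = b, with b pointing to o1.
assign = (("dest", "source"), {("a", "b")})
vPointsTo = (("variable", "heap"), {("b", "o1")})

t1 = rename(vPointsTo, "variable", "source")
t2 = join(assign, t1)
t3 = project_away(t2, "source")
t4 = rename(t3, "dest", "variable")
vPointsTo = union(vPointsTo, t4)
```

One pass applies the rule once; a full evaluation repeats the five steps until vPointsTo stops growing.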
Incrementalization
Before:
t1 = ρvariable→source(vP);
t2 = assign ⋈ t1;
t3 = πsource(t2);
t4 = ρdest→variable(t3);
vP = vP ∪ t4;
After:
vP’’ = vP – vP’;
vP’ = vP;
assign’’ = assign – assign’;
assign’ = assign;
t1 = ρvariable→source(vP’’);
t2 = assign ⋈ t1;
t5 = ρvariable→source(vP);
t6 = assign’’ ⋈ t5;
t7 = t2 ∪ t6;
t3 = πsource(t7);
t4 = ρdest→variable(t3);
vP = vP ∪ t4;
Optimize into BDD operations
Before:
vP’’ = vP – vP’;
vP’ = vP;
assign’’ = assign – assign’;
assign’ = assign;
t1 = ρvariable→source(vP’’);
t2 = assign ⋈ t1;
t5 = ρvariable→source(vP);
t6 = assign’’ ⋈ t5;
t7 = t2 ∪ t6;
t3 = πsource(t7);
t4 = ρdest→variable(t3);
vP = vP ∪ t4;
After:
vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t1 = replace(vP’’,variable→source);
t3 = relprod(t1,assign,source);
t4 = replace(t3,dest→variable);
vP = or(vP, t4);
Physical domain assignment
Before:
vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t1 = replace(vP’’,variable→source);
t3 = relprod(t1,assign,source);
t4 = replace(t3,dest→variable);
vP = or(vP, t4);
After:
vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t3 = relprod(vP’’,assign,V0);
t4 = replace(t3, V1→V0);
vP = or(vP, t4);
• Minimizing renames is NP-complete
• Renames have vastly different costs
• Priority-based assignment algorithm
Other optimizations
• Dead code elimination
• Constant propagation
• Definition-use chaining
• Redundancy elimination
• Global value numbering
• Copy propagation
• Liveness analysis
Splitting rules
R(a,e) :- A(a,b), B(b,c), C(c,d), R(d,e).
Can be split into:
T1(a,c) :- A(a,b), B(b,c).
T2(a,d) :- T1(a,c), C(c,d).
R(a,e) :- T2(a,d), R(d,e).
Affects incrementalization, iteration.
Use “split” keyword to auto-split rules.
Other Tools
• Banshee (John Kodumal)
– Results are harder to use (not relational)
• Paddle/Jedd (Ondrej Lhotak)
– Imperative style: more expressive
– Not as efficient, doesn’t scale as well
Jedd code
• Jedd code is like bddbddb internal IR
before domain assignment:
vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t1 = replace(vP’’,variable→source);
t3 = relprod(t1,assign,source);
t4 = replace(t3,dest→variable);
vP = or(vP, t4);
Demo of using bddbddb
Part III: Developing Advanced Analyses
Context Sensitivity
Old Technique:
Summary-Based Analysis
• Idea: Summarize the effect of a method on
its callers.
– Sharir, Pnueli [Muchnick 1981]
– Landi, Ryder [PLDI 1992]
– Wilson, Lam [PLDI 1995]
– Whaley, Rinard [OOPSLA 1999]
Old Technique:
Summary-Based Analysis
• Problems:
– Difficult to summarize pointer analysis.
– Composed summaries can get large.
– Recursion is difficult: Must find fixpoint.
– Queries (e.g. which context points to x)
require expanding an exponential number of
contexts.
My Technique:
Cloning-Based Analysis
• Simple brute force technique.
– Clone every path through the call graph.
– Run context-insensitive algorithm on
expanded call graph.
• The catch: exponential blowup
Cloning is exponential!
Recursion
• Actually, cloning is unbounded in the
presence of recursive cycles.
• Technique: We treat all methods within a
strongly-connected component as a single
node.
Recursion
[Figure: call graph over the methods A to G, and its expansion into clones with the strongly-connected component treated as a single node]
Top 20 Sourceforge Java Apps
[Figure: plot of number of clones (log scale, 10^0 to 10^16) against size of program in variable nodes (1,000 to 1,000,000)]
Cloning is infeasible (?)
• Typical large program has ~10^14 paths.
• If you need 1 byte to represent a clone:
– Would require 256 terabytes of storage
– >12 times size of Library of Congress
– Registered ECC 1GB DIMMs: $41.7 million
• Power: 96.4 kilowatts = Power for 128 homes
– 500 GB hard disks: 564 x $195 = $109,980
• Time to read sequential: 70.8 days
Key Insight
• There are many similarities across
contexts.
– Many copies of nearly-identical results.
• BDDs can represent large sets of
redundant data efficiently.
– Need a BDD encoding that exploits the
similarities.
Expanded Call Graph
[Figure: call graph over the methods A to H, next to the expanded call graph with clones A0; B0, C0, D0; E0 to E2; F0 to F2; G0 to G2; H0 to H5]
Numbering Clones
[Figure: the call graph A to H with each edge labeled by the range of context numbers it carries (0, 1, 2, 0-2, 3-5), next to the expanded call graph with clones A0; B0, C0, D0; E0 to E2; F0 to F2; G0 to G2; H0 to H5]
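The numbering can be computed in one pass over an acyclic call graph in topological order. The Python sketch below is an illustration, not the bddbddb implementation. The edge list is an assumption: it is one graph that is consistent with the clone counts and edge labels in the figure (one clone each for A to D, three for E, F, and G, six for H), and it presumes recursive cycles have already been collapsed.

```python
from collections import deque


def number_clones(edges, root):
    """Return (contexts per method, context range carried by each edge)."""
    succs, indegree = {}, {}
    for caller, callee in edges:
        succs.setdefault(caller, []).append(callee)
        succs.setdefault(callee, [])
        indegree[callee] = indegree.get(callee, 0) + 1
        indegree.setdefault(caller, 0)

    contexts = {node: 0 for node in succs}
    contexts[root] = 1
    edge_ranges = {}
    ready = deque([root])            # methods whose context count is final
    while ready:
        caller = ready.popleft()
        for callee in succs[caller]:
            # This edge contributes one callee clone per caller clone,
            # numbered after the clones from edges processed earlier.
            first = contexts[callee]
            contexts[callee] += contexts[caller]
            edge_ranges[(caller, callee)] = (first, contexts[callee] - 1)
            indegree[callee] -= 1
            if indegree[callee] == 0:
                ready.append(callee)
    return contexts, edge_ranges


# Assumed edges, chosen to match the figure's clone counts.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "E"), ("C", "E"), ("D", "E"),
         ("E", "F"), ("E", "G"),
         ("F", "H"), ("G", "H")]
contexts, edge_ranges = number_clones(EDGES, "A")
```

Giving each edge a contiguous range of context numbers is what lets the expanded graph be described compactly, since a whole block of clones is covered by one range.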
Context-sensitive Pointer Analysis
Algorithm
1. First, do context-insensitive pointer
analysis to get call graph.
2. Number clones.
3. Do context-insensitive algorithm on the
cloned graph.
• Results explicitly generated for every clone.
• Individual results retrievable with Datalog query.
Counting rule
• IEnum(i,m,vc2,vc1) :- roots(m), mI0(m,i), IE0(i,m). number
• Special rule to define numbering.
• Head: result of numbering
– First two variables: edge you want to number
– Second two variables: context numbering
• Subgoals: graph edges
– Single variable: roots of graph
Demo of context-sensitive
Part III: Developing Advanced Analyses
Example: Race Detection
Object Sensitivity
• k-object-sensitivity (Milanova, Ryder, Rountev 2003)
• k=3 suffices in our experiments
• CHA/context-insensitive/k-CFA too imprecise
static main() {
h1: C a = new A();
h2: C b = new B();
p1: foo(a);
p2: foo(b);
p3: foo(a); }
static foo(C c) {
p4: c.bar(); }
Contexts of method bar():
1-CFA: { p4 }
2-CFA: { p1:p4, p2:p4, p3:p4 }
1-objsens: { h1, h2 }
2-objsens: { h1, h2 }
Open Programs
• Analyzing open programs is important
– Many “programs” are libraries
– Developers need to understand behavior w/o a client
• Standard approach
– Write a “harness” manually
– A client exercising the interface of the open program
• Our approach
– Generate the harness automatically
Race Detection
• A multi-threaded program contains a race if:
– Two threads can access a memory location
– At least one access is a write
– No ordering between the accesses
• As a rule, races are bad
– And common …
– And hard to find …
Running Example
public A() {
f = 0; }
public int get() {
return rd(); }
public sync int inc() {
int t = rd() + (new A()).wr(1);
return wr(t); }
private int rd() {
return f; }
private int wr(int x) {
f = x;
return x; }
Harness (Note: Single-threaded):
static public void main() {
A a;
a = new A();
a.get();
a.inc(); }
Example: Two Object-Sensitive
Contexts
[Code listing: the running example and its harness, with the two object-sensitive contexts marked on the slide]
Example: 1st Context
[Code listing: the running example and its harness, with the 1st context marked on the slide]
Example: 2nd Context
[Code listing: the running example and its harness, with the 2nd context marked on the slide]
Computing Original Pairs
All pairs of accesses such that
– Both references of one of the following forms:
• e1.f and e2.f (the same instance field)
• C.g and C.g (the same static field)
• e1[e3] and e2[e4] (any array elements)
– At least one is a write
Example: Original Pairs
[Code listing: two side-by-side copies of class A from the running example, with the original pairs marked on the slide]
Computing Reachable Pairs
• Step 1
– Access pairs with at least one write to same field
• Step 2
– Consider any access pair (e1, e2)
– To be a race e1 must be:
– Reachable from a thread-spawning call site s1
• Without “switching” threads
– Where s1 is reachable from main
– (and similarly for e2)
Example: Reachable Pairs
[Code listing: the harness and two side-by-side copies of class A from the running example, with the reachable pairs marked on the slide]
Computing Aliasing Pairs
• Steps 1-2
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
• Step 3
– To have a race, both must access the same memory
location
– Use alias analysis
Example: Aliasing Pairs
[Code listing: the harness and two side-by-side copies of class A from the running example, with the aliasing pairs marked on the slide]
Computing Escaping Pairs
• Steps 1-3
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
– And both can access the same memory location
• Step 4
– To have a race, the memory location must also
be thread-shared
– Use thread-escape analysis
Example: Escaping Pairs
static public void main() {
    A a;
    a = new A();
    a.get();
    a.inc(); }
public A() {
    f = 0; }
public int get() {
    return rd(); }
public sync int inc() {
    int t = rd() + (new A()).wr(1);
    return wr(t); }
private int rd() {
    return f; }
private int wr(int x) {
    f = x;
    return x; }
Computing Unlocked Pairs
• Steps 1-4
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
– And both can access the same memory location
– And the memory location is thread-shared
• Step 5
– Discard pairs where the memory location is guarded
by a common lock in both accesses
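Step 5 can be pictured as a lockset comparison. A minimal sketch, assuming each access comes with the set of lock objects known to be held along its call path; the lock name `h1` and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class UnlockedPairs {
    /** Convenience constructor for a set of held locks. */
    static Set<String> setOf(String... locks) {
        return new HashSet<>(Arrays.asList(locks));
    }

    /** A pair stays a race candidate when no lock is held at both accesses. */
    static boolean unlocked(Set<String> heldAtFirst, Set<String> heldAtSecond) {
        return Collections.disjoint(heldAtFirst, heldAtSecond);
    }

    public static void main(String[] args) {
        Set<String> viaGet = setOf();      // get() is not synchronized: no lock held
        Set<String> viaInc = setOf("h1");  // inc() is synchronized on its receiver
        System.out.println(unlocked(viaGet, viaInc));  // kept as a candidate
        System.out.println(unlocked(viaInc, viaInc));  // discarded: common lock
    }
}
```

In the running example, the read reached through get() holds no lock, so its pair with the write reached through inc() is not discarded.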
Example: Unlocked Pairs
static public void main() {
    A a;
    a = new A();
    a.get();
    a.inc(); }
public A() {
    f = 0; }
public int get() {
    return rd(); }
public sync int inc() {
    int t = rd() + (new A()).wr(1);
    return wr(t); }
private int rd() {
    return f; }
private int wr(int x) {
    f = x;
    return x; }
Counterexamples
• Each pair of paths in the context-sensitive call graph
from a pair of roots to a pair of accesses along
which a common lock may not be held
• Different from most other systems
– Pairs of paths (instead of single interleaved path)
– At call-graph level
Example: Counterexample
// file Harness.java
    static public void main() {
        A a;
        a = new A();
4:      a.get();
5:      a.inc(); }

// file A.java
    public A() {
        f = 0; }
    public int get() {
4:      return rd(); }
    public sync int inc() {
        int t = rd() + (new A()).wr(1);
7:      return wr(t); }
    private int rd() {
10:     return f; }
    private int wr(int x) {
12:     f = x;
        return x; }

field reference A.f (A.java:10) [Rd]
    A.get(A.java:4)
    Harness.main(Harness.java:4)
field reference A.f (A.java:12) [Wr]
    A.inc(A.java:7)
    Harness.main(Harness.java:5)
Race Checker Datalog
Map Sensitivity
...
String username = request.getParameter("user");
map.put("USER_NAME", username);
...
"USER_NAME" ≠ "SEARCH_QUERY"
String query = (String) map.get("SEARCH_QUERY");
stmt.executeQuery(query);
...
• Maps with constant string keys are common
• Augment pointer analysis:
– Model Map.put/get operations specially
Resolving Reflection
• Reflection is a dynamic language feature
• Used to query object and class information
– static Class Class.forName(String className)
• Obtain a java.lang.Class object
• E.g., Class.forName("java.lang.String") gets an object corresponding to class String
– Object Class.newInstance()
• Object constructor in disguise
• Create a new object of a given class
Class c = Class.forName("java.lang.String");
Object o = c.newInstance();
• This makes a new empty string o
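The two calls can be tried directly. A small runnable sketch: the class name `ReflectionDemo` and the helper `instantiate` are made up for this example, and since `Class.newInstance()` has been deprecated since Java 9, the sketch goes through `getDeclaredConstructor().newInstance()`, which has the same effect for a class with an accessible no-argument constructor.

```java
class ReflectionDemo {
    /** Loads the named class and calls its no-argument constructor. */
    static Object instantiate(String className) {
        try {
            Class<?> c = Class.forName(className);
            // Replacement for the deprecated c.newInstance().
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object o = instantiate("java.lang.String");
        System.out.println(o instanceof String);      // the object is a String
        System.out.println(((String) o).isEmpty());   // and it is empty
    }
}
```

Nothing in the source text of `instantiate` names String, which is exactly why a static call graph builder cannot see the constructor call without extra help.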
What to Do About Reflection?
1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;

Options:
1. Anything goes
   + Obviously conservative
   - Call graph extremely big and imprecise
2. Ask the user
   + Good results
   - A lot of work for user, difficult to find answers
3. Subtypes of T
   + More precise
   - T may have many subtypes
4. Analyze className
   + Better still
   - Need to know where className comes from
Analyzing Class Names
• Looking at className seems promising
String stringClass = "java.lang.String";
foo(stringClass);
...
void foo(String clazz){
bar(clazz);
}
void bar(String className){
Class c = Class.forName(className);
}
• This is interprocedural constant and copy propagation on strings
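The propagation in this example can be sketched as a walk along copy edges. This is a simplification under stated assumptions: each variable has at most one source (a real analysis tracks sets of sources), parameter passing is treated as a plain copy, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ClassNameResolver {
    /** Follows copy edges (destination -> source) until a string constant is found; null otherwise. */
    static String resolve(String var, Map<String, String> copies, Map<String, String> constants) {
        Set<String> seen = new HashSet<>();  // guards against cyclic copies
        String cur = var;
        while (cur != null && seen.add(cur)) {
            if (constants.containsKey(cur)) {
                return constants.get(cur);
            }
            cur = copies.get(cur);
        }
        return null;
    }

    /** The slide's example: stringClass -> clazz (call to foo) -> className (call to bar). */
    static String resolveExample(String var) {
        Map<String, String> constants = new HashMap<>();
        constants.put("stringClass", "java.lang.String");
        Map<String, String> copies = new HashMap<>();
        copies.put("clazz", "stringClass");
        copies.put("className", "clazz");
        return resolve(var, copies, constants);
    }

    public static void main(String[] args) {
        System.out.println(resolveExample("className"));
    }
}
```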
Pointer Analysis Can Help
[Figure: stack variables stringClass, clazz, and className shown alongside the heap object java.lang.String]
Reflection Resolution Using Points-to
1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;
• Need to know what className is
– Could be a local string constant like java.lang.String
– But could be a variable passed through many layers of calls
• Points-to analysis says what className refers to
– className --> concrete heap object
Reflection Resolution
Inputs: constants, specification points

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();      Q: what object does this create?
4. T t = (T) o;

1. String className = ...;
2. Class c = Class.forName(className);
   Object o = new T1();
   Object o = new T2();
   Object o = new T3();
4. T t = (T) o;
Resolution May Fail!
1. String className = r.readLine();
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;
• Need help figuring out what className is
• Two options
  1. Can ask user for help
     • Call to r.readLine on line 1 is a specification point
     • User needs to specify what can be read from a file
     • Analysis helps the user by listing all specification points
  2. Can use cast information
     • Constrain possible types instantiated on line 3 to subclasses of T
     • Need additional assumptions
1. Specification Files
Format: invocation site => class
loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet4AddressImpl
loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet6AddressImpl
lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.ISO_8859_15
lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.MS1251
tryToLoadClass() @ 29 DataFlavor.java:64 => java.io.InputStream
2. Using Cast Information
1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;
• Providing specification files is tedious, time-consuming, error-prone
• Leverage cast data instead
– o instanceof T
– Can constrain type of o if
1. Cast succeeds
2. We know all subclasses of T
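The cast-based constraint can be sketched with ordinary reflection calls. This is a hedged illustration: `world` plays the role of the set of all known classes, the class and method names are hypothetical, and the sketch only filters out interfaces and abstract classes, without checking for an accessible no-argument constructor.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CastConstraint {
    /** Concrete classes in the known world that a successful cast to castType allows. */
    static List<Class<?>> candidates(Class<?> castType, List<Class<?>> world) {
        List<Class<?>> out = new ArrayList<>();
        for (Class<?> c : world) {
            boolean concrete = !c.isInterface() && !Modifier.isAbstract(c.getModifiers());
            if (concrete && castType.isAssignableFrom(c)) {
                out.add(c);
            }
        }
        return out;
    }

    /** A tiny stand-in for "all classes known to the analysis". */
    static List<Class<?>> exampleWorld() {
        return Arrays.<Class<?>>asList(
            String.class, StringBuilder.class, Integer.class, Number.class, ArrayList.class);
    }

    public static void main(String[] args) {
        // With T = CharSequence, only the CharSequence implementations remain.
        System.out.println(candidates(CharSequence.class, exampleWorld()));
    }
}
```

The result is only as complete as `world`: a class missing from that list is silently excluded from the candidates.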
Analysis Assumptions
1. Assumption: Correct casts.
Type cast operations that always operate on
the result of a call to Class.newInstance are
correct; they will always succeed without
throwing a ClassCastException.
2. Assumption: Closed world.
We assume that only classes reachable from
the class path at analysis time can be used by
the application at runtime.
Casts Aren’t Always Present
• Cannot do anything if no cast post-dominates a Class.newInstance call
Object factory(String className) {
    Class c = Class.forName(className);
    return c.newInstance();
}
...
SunEncoder t = (SunEncoder) factory("sun.io.encoder." + enc);
SomethingElse e = (SomethingElse) factory("SomethingElse");
Call Graph Discovery Process
[Flow diagram with boxes: Program IR; Call graph construction; Reflection resolution using points-to; User-provided spec; Cast-based approximation; Specification points; Resolved calls; Final call graph]
Implementation Details
• Call graph construction algorithm in the presence of
reflection is integrated with pointer analysis
– Pointer analysis already has to deal with virtual calls: new
methods are discovered, points-to relations for them are
created
– Reflection analysis is another level of complexity
• See Datalog specification
Reflection Resolution Results
• Applied to 6 large Java apps, 190,000 LOC combined
[Chart: Call graph sizes compared. y-axis: Methods (0 to 18,000); benchmarks: jgap, freetts, gruntspud, jedit, columba, jfreechart]
Map relations
• Need to map from values in one domain to
another?
• Use special operator “=>”
• mapAtoB(a,b) :- a => b.
• Elements in A are appended to domain of
B
– A must have map file.
– B must have enough space.
Using Code Fragments
• Execute a code fragment before/after every rule
invocation.
A(x,y) :- B(y,a), C(a,z). { code goes here }
• Can access:
– Relations by name.
– Rule information.
– Solver information.
• Can also add code fragment to relations
(triggered on change).
• Special keywords: “modifies”, “pre”, “post”
Take a break…
(Next up: Profiling, Debugging)
Tutorial Structure
Part I: Essential Background
– Datalog for Program Analysis
– Binary Decision Diagrams
Part II: Using the Tools
– bddbddb
– Compiler interface (Joeq compiler)
– Datalog editor in Eclipse
– Interactive mode
Part III: Developing Advanced Analyses
– Context sensitivity
– Combining multiple analyses
– Race detection examples
– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas
– Variable ordering
– Iteration order
– Machine learning
– What it’s good for, what it isn’t good for
Part IV: Profiling, Debugging, Avoiding Gotchas
Variable Ordering
TryDomainOrders
• Try all possible domain orders for a given operation
and inputs.
– Bounded: if an order takes longer than current best, abort it.
• To profile slow-running operations:
Run with -Ddumpslow, -Ddumpcutoff=5000
java net.sf.bddbddb.TryDomainOrders
• If you know ordering constraints, you can add them
to rules/relations
– Constraints automatically propagated to other
rules/relations
Variable Numbering: Active Machine Learning
• Must be determined dynamically
• Limit trials with properties of relations
• Each trial may take a long time
• Active learning: select trials based on uncertainty
– Can build up trial database to improve accuracy
• Several hours
• Comparable to exhaustive for small apps
Using Machine Learning
• -Dfindbestorder
– Enable machine learning
• -Dfbocutoff=#
– Minimum runtime (in ms) for an operation to
be considered
• -Dfbotrials=#
– Maximum number of trials to run
• -Dtrialfile=
– Filename to load/store trial information.
Changing Iteration Order
• bddbddb uses simple iteration order
heuristic – not always optimal
• If a rule is iterating too many times:
– Lower its priority with pri=5
– Increase other rules with pri=-5
– Can also adjust priority of relations
• Solver prints iteration order on startup
• Also try reformulating the problem or
changing input relations
Reformulate the Problem
• Change rule form:
A(a,c) :- A(a,b), A(b,c).
vs
A(a,c) :- A(a,b), A(b,c).
• Change input relations
– Short-circuit paths
• Filter relations as you go
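The two rule forms printed above are identical in this transcript, so the contrast the slide intended cannot be recovered from it. One common reformulation, assumed here purely for illustration, is a linear rule `A(a,c) :- A(a,b), E(b,c).` versus the nonlinear rule `A(a,c) :- A(a,b), A(b,c).`, where `E` is a hypothetical input edge relation. The sketch below counts naive fixpoint rounds for each form on a chain of edges; it uses plain Java sets rather than BDDs, so it says nothing about the cost of each round.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ClosureIterations {
    /** Rounds of naive evaluation until no new tuple appears, on the chain 0 -> 1 -> ... -> n. */
    static int iterations(int n, boolean nonlinear) {
        Set<List<Integer>> edges = new HashSet<>();
        for (int i = 0; i < n; i++) {
            edges.add(Arrays.asList(i, i + 1));
        }
        Set<List<Integer>> a = new HashSet<>(edges);  // base rule: A(a,b) :- E(a,b).
        int rounds = 0;
        boolean changed = true;
        while (changed) {
            rounds++;
            // Right-hand operand of the join: A itself (nonlinear) or E (linear).
            Set<List<Integer>> right = nonlinear ? a : edges;
            Set<List<Integer>> derived = new HashSet<>();
            for (List<Integer> p : a) {
                for (List<Integer> q : right) {
                    if (p.get(1).equals(q.get(0))) {
                        derived.add(Arrays.asList(p.get(0), q.get(1)));
                    }
                }
            }
            changed = a.addAll(derived);  // true only if some tuple was new
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println("linear:    " + iterations(16, false));
        System.out.println("nonlinear: " + iterations(16, true));
    }
}
```

On a chain, the nonlinear form doubles the covered path length each round and so converges in fewer rounds; whether that is faster overall depends on how expensive each join is.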
Debugging
• Debugging can be tricky
– Relations are huge
– Declarative: not so straightforward
• Adding code fragments can help.
• Try it on a small example with full trace
information.
• Best: Interactive solver
“Comes from” query
• Special kind of query:
A(3,5) :- ?
“What contributed to (3,5) being added to A?”
• Add ‘single’ keyword to get only one path.
• Doesn’t solve the negated problem (missing
tuples)
Solver options
-Dnoisy
-Dtracesolve
-Dfulltracesolve
-Dbddvarorder=
-Dbddnodes=
-Dbddcache=
-Dbddminfree=
-Dfindbestorder
-Dbasedir=
-Dincludedirs=
-Ddumprulegraph
-Duseir
-Dprintir
-Dsplit_all_rules
-Dsplit_no_rules
Datalog directives
• .include
• .split_all_rules
• .report_stats
• .noisy
• .strict
• .singleignore
• .trace
• .bddvarorder
• .bddnodes
• .bddcache
• .bddminfree
• .findbestorder
• .incremental
• .dot
• .basedir
Relation options
input / inputtuples
output / outputtuples
printtuples
printsize
pri=#
{ code fragment }
x<y
Rule options
split
number
single
cacheafterrename
findbestorder
trace / tracefull
pre / post { code }
modifies R
Experimental Features
• Distributed computation: (dbddbddb?)
• Profile-directed feedback of iteration order
• Eclipse integration
• Touchgraph integration
• Debugging interface
• Tracing information
• Include rules in come-from query
What works well
• Big sets of mostly redundant data
– Pointer analysis
– Context-sensitive analysis
• Short propagation paths
– Each iteration takes quite a bit of time, so >1000
iterations will hurt
– Try to preprocess/reformulate problem to shorten
paths
• Natural ‘flow’ problems
• Pure analysis problems (no transformations)
What doesn’t work well
• Long propagation paths
– Traditional dataflow analysis (use sparse form
instead)
• Huge problems with little redundancy
– Too much context sensitivity
• Domains that are not easily countable
– Need to manufacture names on the fly
• Problems that have inherently complicated
formulations
• Problems optimized for particular data structures
(union-find, etc.)
Using bddbddb in a class
• bddbddb has been very useful in Stanford
advanced compiler course
– Comparing/contrasting analyses becomes easier
– Students can implement and evaluate multiple
techniques without much overhead
• Projects:
– Implement an algorithm from a paper in Datalog,
make a small change and evaluate its effectiveness
– Experiment with different kinds of context sensitivity
on a given problem
– Improve on BDD solver efficiency
– Build a tool based on analysis results
Questions?
That’s all, folks!
Thanks for sticking around for all 156 slides!
Experimental Results
Experimental Results
[Chart: Java Context-Insensitive Pointer Analysis, Speed Comparison (Normalized to Handcoded). y-axis: speed relative to handcoded (0 to 2.5); series: Handcoded, No Opts, Incr, +DU, +Dom, +All, +Order; benchmarks: joeq, jgraph, jbidwatch, jedit, umldot, megamek]
Experimental Results
[Chart: Java Context-Sensitive Pointer Analysis, Speed Comparison (Normalized to Handcoded). y-axis: speed relative to handcoded (0 to 1.6); series: Handcoded, No Opts, Incr, +DU, +Dom, +All, +Order; benchmarks: joeq, jgraph, jbidwatch, jedit, umldot, megamek]
Experimental Results
[Chart: C Pointer Analysis, Speed Comparison (Normalized to Handcoded). y-axis: speed relative to handcoded (0 to 4); series: Handcoded, No Opts, Incr, +DU, +Dom, +All, +Order; benchmarks: crafty, enscript, hypermail, monkey]
Experimental Results
[Chart: External Lock Analysis, Speed Comparison (Normalized to No Opts). y-axis: speed relative to No Opts (0 to 6); series: No Opts, Incr, +DU, +Dom, +All, +Order; benchmarks: joeq, jgraph, jbidwatch, jedit, umldot, megamek]
Experimental Results
[Chart: SQL Injection Analysis, Speed Comparison (Normalized to Incr). y-axis: speed relative to Incr (0 to 5); series: No Opts, Incr, +DU, +Dom, +All, +Order; benchmarks: personalblog, road2hibernate, snipsnap, roller]
Related Work
• Datalog in Program Analysis
– Specify as Datalog query [Ullman 1989]
– Toupie system [Corsini 1993]
– Demand-driven using magic sets [Reps 1994]
– Program analysis with logic programming [Dawson 1996]
– Crocopat system [Beyer 2003]
– Modular class analysis [Besson 2003]
• BDDs in Program Analysis
– Predicate abstraction [Ball 2000]
– Shape analysis [Manevich 2002, Yavuz-Kahveci 2002]
– Pointer Analysis [Zhu 2002, Berndl 2003, Zhu 2004]
– Jedd system [Lhotak 2004]
Related Work
• BDD Variable Ordering
– Variable ordering is NP-complete [Bollig 1996]
– Interleaving [Fujii 1993]
– Sifting [Rudell 1993]
– Genetic algorithms [Drechsler 1995]
– Machine learning for BDD orders [Grumberg 2003]
• Efficient Evaluation of Datalog
– Semi-naïve evaluation [Balbin 1987]
– Bottom-up evaluation [Ullman 1989, Ceri 1990, Naughton 1991]
– Top-down evaluation with tabling [Tamaki 1986, Chen 1996]
– Rule ordering [Ramakrishnan 1990]
– Magic sets transformation [Bancilhon 1986]
– Computing with BDDs [Iwaihara 1995]
– Time and space guarantees [Liu 2003]
Program Analysis with bddbddb
• Context-sensitive Java pointer analysis
• C pointer analysis
• Escape analysis
• Type analysis
• External lock analysis
• Finding memory leaks
• Interprocedural def-use
• Interprocedural mod-ref
• Object-sensitive analysis
• Cartesian product algorithm
• Resolving Java reflection
• Bounds check elimination
• Finding race conditions
• Finding Java security vulnerabilities
• And many more…
Performance better than handcoded!
Conclusion
• bddbddb: new paradigm in program analysis
– Datalog compiled into optimized BDD operations
– Efficiently and easily implement context-sensitive
analyses
– Easier to develop correct analyses
– Easily experiment with new ideas
– Growing library of program analyses
– Easily use and build upon work of others
• Available as open-source LGPL:
http://bddbddb.sourceforge.net
My Contribution (2)
bddbddb
(BDD-based deductive database)
– Pointer analysis in 6 lines of Datalog
(a database language)
• Hard to create & debug efficient BDD-based
algorithms (3451 lines, 1 man-year)
• Automatic optimizations in bddbddb
– Easy to create context-sensitive analyses
using pointer analysis results (a few lines)
– Created many analyses using bddbddb
Outline
• Pointer Analysis
– Problem Overview
– Brief History
– Pointer Analysis in Datalog
• Context Sensitivity
• Improving Performance
• bddbddb: BDD-based deductive database
• Experimental Results
– Analysis Time
– Analysis Memory
– Analysis Accuracy
• Conclusion
Performance is Tricky!
• Context-sensitive numbering scheme
– Modify BDD library to add special operations.
– Can’t even analyze small programs.
Time: ∞
• Improved variable ordering
– Group similar BDD variables together.
– Interleave equivalence relations.
– Move common subsets to edges of variable order.
Time: 40h
• Incrementalize outermost loop
– Very tricky, many bugs.
Time: 36h
• Factor away control flow, assignments
– Reduces number of variables.
Time: 32h
Java Security Vulnerabilities
                            Reported errors        Reported errors       Actual
Application     Classes     (context-insensitive)  (context-sensitive)   errors
blueblog            306          1                      1                   1
webgoat             349         81                      6                   6
blojsom             428         48                      2                   2
personalblog        611        350                      2                   2
snipsnap            653       >321                     27                  15
road2hiberna        867         15                      1                   1
pebble              889        427                      1                   1
roller              989       >267                      1                   1
Total              5356      >1508                     41                  29
due to V. Benjamin Livshits
Vulnerabilities Found
             SQL         HTTP        Cross-site   Path
             injection   splitting   scripting    traversal   Total
Header           0           6           4            0         10
Parameter        6           5           0            2         13
Cookie           1           0           0            0          1
Non-Web          2           0           0            3          5
Total            9          11           4            5         29
Summary of Contributions
• The first scalable context-sensitive subset-based pointer
analysis.
– Cloning-based technique using BDDs
– Clever context numbering
– Experimental results on the effects of context sensitivity
• bddbddb: new paradigm in program analysis
– Efficiently and easily implement context-sensitive analyses
– Datalog compiled into optimized BDD operations
– Library of program analyses (with many others)
– Active learning for BDD variable orders (with M. Carbin)
• Artifacts:
– Joeq compiler and virtual machine
– JavaBDD library and BuDDy library
– bddbddb tool
Conclusion
• The first scalable context-sensitive subset-based
pointer analysis.
– Accurate: Results for up to 10^14 contexts.
– Scales to large programs.
• bddbddb: a new paradigm in program analysis
– High-level spec → Efficient implementation
• System is publicly available at:
http://bddbddb.sourceforge.net