Transcript Slide 1

Chord: A Versatile Platform for Program Analysis
Mayur Naik
Intel Labs, Berkeley
PLDI 2011 Tutorial
What is Chord?
• Static and dynamic program analysis framework for Java
• Started in 2006 as a static checker of races and deadlocks
• Publicly available under New BSD License
• Key goals:
– versatile: applies to various analyses, domains, platforms
– extensible: users can build their own analyses atop given ones
– productive: facilitates rapid prototyping of analyses
– robust: deterministic, handles partial programs, etc.
Key Features of Chord
• Many standard static and dynamic analyses
• Writing/solving analyses using Datalog/BDDs
• Analyses as “building blocks”
• Context-sensitive static analysis framework
• Dynamic analysis framework
Outline of Tutorial
• Part 1:
• Getting Started With Chord
• Program Representation
• Part 2:
• Analysis Using Datalog/BDDs
• Chaining Analyses Together
• Part 3:
• Context-Sensitive Analysis
• Dynamic Analysis
Downloading Chord
• Stable Binary Release
– http://jchord.googlecode.com/files/chord-bin-2.0.tar.gz
• Stable Source Release
1. http://jchord.googlecode.com/files/chord-src-2.0.tar.gz (mandatory)
– Chord’s source code + JARs of libraries used by Chord
2. http://jchord.googlecode.com/files/chord-libsrc-2.0.tar.gz (optional)
– (adapted) Java source code of libraries used by Chord
• Latest Development Snapshot
svn checkout http://jchord.googlecode.com/svn/trunk/ chord
Or check out only relevant directories under trunk/:
– main/ (released as 1 above)
– libsrc/ (released as 2 above)
– test/ (Chord’s regression test suite)
– … (many more)
Compiling Chord
• Requirements:
– JVM for Java 5 or higher
– Apache Ant
– C++ compiler (not needed by default)
• Optional: edit chord.properties
– to enable C BuDDy library: set chord.use.buddy=true
– to enable C++ JVMTI agent: set chord.use.jvmti=true
• Run in main directory:
ant compile
Contents of main/ shown on the slide:
main/
  build.xml
  chord.properties
  agent/
  bdd/
  doc/
  examples/
  lib/
  src/
  web/
  chord.jar
  libbuddy.so | buddy.dll | libbuddy.dylib
  libchord_instr_agent.so
Running Chord
• Requirements: JVM for Java 5 or higher
  • no other dependencies (e.g., Eclipse)
• Run either command in any directory:
  • ant -f <…>/build.xml [-Dkey_i=val_i]* run
    • requires Apache Ant
    • not available in Binary Release
  • java -cp <…>/chord.jar [-Dkey_i=val_i]* chord.project.Boot
where <…> denotes the path of Chord’s main/ directory and
-Dkey_i=val_i sets the value of system property key_i to val_i
Chord Properties
• All inputs to Chord are specified via System Properties
• conventionally named chord.* (e.g., chord.work.dir)
• Three choices with decreasing precedence:
1. On command line via -Dkey=val format
  • use to specify properties specific to the current Chord run
2. Via user-specified file denoted by chord.props.file
  • use to specify properties specific to the program being analyzed (e.g., its main class, classpath, etc.)
  • default value = "[chord.work.dir]/chord.properties"
3. Via pre-defined file main/chord.properties
  • use to specify properties that must hold in every Chord run (e.g., maximum memory to be used by JVM)
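The precedence order above can be illustrated with a minimal sketch. It is not Chord's own loading code: the class and method names are hypothetical, and the three sources are modeled as plain java.util.Properties objects.

```java
import java.util.Properties;

// Hypothetical illustration of "decreasing precedence" among three property sources.
class PropertyPrecedenceSketch {
    // Returns the value from the highest-precedence source that defines the key, or null.
    static String resolve(String key, Properties commandLine,
                          Properties userFile, Properties mainFile) {
        for (Properties source : new Properties[] { commandLine, userFile, mainFile }) {
            String value = source.getProperty(key);
            if (value != null) return value;
        }
        return null;
    }

    // Builds a Properties object from alternating keys and values.
    static Properties of(String... keyVals) {
        Properties p = new Properties();
        for (int i = 0; i + 1 < keyVals.length; i += 2) {
            p.setProperty(keyVals[i], keyVals[i + 1]);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties commandLine = of("chord.verbose", "0");
        Properties userFile = of("chord.main.class", "foo.Main", "chord.verbose", "1");
        Properties mainFile = of("chord.verbose", "2");
        // The command-line value wins for chord.verbose.
        System.out.println(resolve("chord.verbose", commandLine, userFile, mainFile));
        // chord.main.class is only defined in the user-specified file.
        System.out.println(resolve("chord.main.class", commandLine, userFile, mainFile));
    }
}
```

The property values used here are made up for the example; only the lookup order reflects the slide.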
Architecture of Chord
[Figure: architecture diagram. Recoverable labels: Java program (program source, program bytecode, program inputs); bytecode-to-quadcode translator (joeq) producing program quadcode; bytecode instrumentor (javassist); Java2HTML; static analysis; dynamic analysis; example program analysis; domain D1 analysis, domain D2 analysis, relation R12 analysis, relation R1 and R2 analyses; Datalog analysis (bddbddb, BuDDy); Classic or Modern Runtime; analysis result in XML, saxon XSLT, analysis result in HTML. Analyses carry lifecycle notes such as "starts, runs to finish" and "starts, blocks on D1, resumes, runs to finish".]
Setting Up a Java Program for Analysis
example/
  src/
    foo/
      Main.java
      ...
  classes/
    foo/
      Main.class
      ...
  lib/
    src/
      taz/
        ...
    jar/
      taz.jar
  chord.properties
  chord_output/
    bddbddb/
File example/chord.properties:
chord.main.class=foo.Main
chord.class.path=classes:lib/jar/taz.jar
chord.src.path=src:lib/src
chord.run.ids=0,1
chord.args.0="-thread 1 -n 10"
chord.args.1="-thread 2 -n 50"
Command to run in Chord’s main directory:
ant -Dchord.work.dir=<…>/example run
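A small sketch of how the run IDs and per-run arguments in such a file relate to each other. This is not Chord's parser: the class name is hypothetical, and stripping the surrounding double quotes is an assumption made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical reader that pairs each ID in chord.run.ids with its chord.args.<id> value.
class RunArgsSketch {
    static Map<String, String> argsPerRun(Properties props) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String id : props.getProperty("chord.run.ids", "").split(",")) {
            id = id.trim();
            if (id.isEmpty()) continue;
            String args = props.getProperty("chord.args." + id, "");
            // Assumption: surrounding quotes are not part of the argument string.
            if (args.length() >= 2 && args.startsWith("\"") && args.endsWith("\"")) {
                args = args.substring(1, args.length() - 1);
            }
            result.put(id, args);
        }
        return result;
    }

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String file = "chord.run.ids=0,1\n"
                + "chord.args.0=\"-thread 1 -n 10\"\n"
                + "chord.args.1=\"-thread 2 -n 50\"\n";
        for (Map.Entry<String, String> e : argsPerRun(parse(file)).entrySet()) {
            System.out.println("run " + e.getKey() + ": " + e.getValue());
        }
    }
}
```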
Java Program Representations
Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode
Example: Java Source Code
File test/HelloWorld.java:
1: package test;
2:
3: public class HelloWorld {
4:     public static void main(String[] args) {
5:         System.out.println("Hello World!");
6:     }
7: }
Pretty-Printing Java Bytecode
javap -private -verbose -classpath <CLASS_PATH>
[-bootclasspath <BOOT_CLASS_PATH>] <CLASS_NAME>
public class test.HelloWorld extends java.lang.Object
SourceFile: "HelloWorld.java"
Constant pool:
const #1 = Method #6.#20; // java/lang/Object."<init>":()V
...
public static void main(java.lang.String[]);
Code:
Stack=2, Locals=1, Args_size=1
0: getstatic #2; // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3; // String Hello World!
5: invokevirtual #4; // Method java/io/PrintStream.println:...
8: return
LineNumberTable:
line 5: 0
line 6: 8
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 args [Ljava/lang/String;
Note: run "javac -g" on .java files to keep debug info (lines, vars, source) in .class files
Java Program Representations
Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode
Java bytecode (.class) → Joeq → Quadcode
Pretty-Printing Quadcode
ant -Dchord.work.dir=<WORK_DIR> -Dchord.out.file=<OUTPUT_FILE>
-Dchord.print.classes=<CLASS_NAMES> -Dchord.verbose=0 run
Class: test.HelloWorld
Method: main:([Ljava/lang/String;)V@test.HelloWorld
0#1 5#3 5#2 8#4
Control flow graph:
BB0 (ENTRY) (in: <none>, out: BB2)
BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
1: GETSTATIC_A T1, .out
3: MOVE_A T2, AConst: "Hello World!"
2: INVOKEVIRTUAL_V println:(Ljava/lang/String;)V@java.io.PrintStream, (T1,T2)
4: RETURN_V
BB1 (EXIT) (in: BB2, out: <none>)
Exception handlers: []
Register factory: Registers: 3
Alternative options:
-Dchord.print.methods=<METHOD_SIGNS>
(replace any `$` by `#` to prevent shell interpretation)
-Dchord.print.all.classes=true
Type Hierarchy
jq_Type
  jq_Primitive
  jq_Reference
    jq_Class
    jq_Array
(all defined in package joeq.Class)
chord.program.Program API
• static Program g()
• the program being analyzed (a single global instance)
• IndexSet<jq_Type> getTypes()
• all types in classes that may be loaded
• IndexSet<jq_Reference> getClasses()
• all classes that may be loaded
• IndexSet<jq_Method> getMethods()
• all methods that may be called
joeq.Class.jq_Class API
• String getName()
• fully-qualified name of the class, e.g., "java.lang.String[]"
• jq_InstanceField[] getDeclaredInstanceFields()
• all instance fields declared in the class
• jq_StaticField[] getDeclaredStaticFields()
• all static fields declared in the class
• jq_InstanceMethod[] getDeclaredInstanceMethods()
• all instance methods declared in the class
• jq_StaticMethod[] getDeclaredStaticMethods()
• all static methods declared in the class
joeq.Class.jq_Method API
• String getName().toString()
• name of the method
• String getDesc().toString()
• descriptor of the method, e.g., "(Ljava/lang/String;)V"
• jq_Class getDeclaringClass()
• declaring class of the method
• ControlFlowGraph getCFG()
• control-flow graph of the method
• Quad getQuad(int bci)
• first quad at the given bytecode offset (null if missing)
• int getLineNumber(int bci)
• line number of the given bytecode offset (-1 if missing)
• String toString()
• ID of the method in format mName:mDesc@cName
Control Flow Graphs (CFGs)
• Each CFG contains:
• a set of registers (register factory)
• a directed graph whose nodes are basic blocks and
edges denote control flow
• Register Factory:
• one register per argument (local variables)
• named R0, R1, …, Rn
• one register per temporary (stack variables)
• named Tn+1, Tn+2, …, Tm
• Basic Block (BB):
• sequence of primitive statements (quads)
• unique entry BB: no quads and no incoming edges
• unique exit BB: no quads and no outgoing edges
joeq.Compiler.Quad.ControlFlowGraph API
• RegisterFactory getRegisterFactory()
• set of all local variables
• EntryOrExitBasicBlock entry()
• unique entry basic block
• EntryOrExitBasicBlock exit()
• unique exit basic block
• List<BasicBlock> reversePostOrder()
• list of all basic blocks in reverse post-order
• jq_Method getMethod()
• containing method of the CFG
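To make "reverse post-order" concrete, here is a self-contained sketch over a toy graph. It does not use joeq: basic blocks are plain integers, the class name is hypothetical, and whether joeq computes the order this way is not stated in the tutorial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reverse post-order: depth-first search, record each block after its successors, then reverse.
class ReversePostOrderSketch {
    static List<Integer> reversePostOrder(Map<Integer, List<Integer>> successors, int entry) {
        List<Integer> postOrder = new ArrayList<>();
        visit(entry, successors, new HashSet<>(), postOrder);
        Collections.reverse(postOrder);
        return postOrder;
    }

    private static void visit(int block, Map<Integer, List<Integer>> successors,
                              Set<Integer> seen, List<Integer> postOrder) {
        if (!seen.add(block)) return; // already visited
        for (int succ : successors.getOrDefault(block, Collections.<Integer>emptyList())) {
            visit(succ, successors, seen, postOrder);
        }
        postOrder.add(block);
    }

    // Shape of the HelloWorld CFG printed earlier: BB0 (entry) -> BB2 -> BB1 (exit).
    static List<Integer> helloWorldOrder() {
        Map<Integer, List<Integer>> successors = new HashMap<>();
        successors.put(0, Arrays.asList(2));
        successors.put(2, Arrays.asList(1));
        return reversePostOrder(successors, 0);
    }

    public static void main(String[] args) {
        System.out.println(helloWorldOrder());
    }
}
```

In this order every block appears before its successors unless a back edge is involved, which is why forward analyses commonly iterate in it.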
joeq.Compiler.Quad.BasicBlock API
• int size()
• number of quads in the basic block
• Quad getQuad(int index)
• quad at the given 0-based index
• List<BasicBlock> getPredecessors()
• list of immediate predecessor basic blocks
• List<BasicBlock> getSuccessors()
• list of immediate successor basic blocks
• jq_Method getMethod()
• containing method of the basic block
Quad Instructions
• Each quad contains an operator and up to 4 operands
• Example: getfield l = b.f:
Operand lo = Getfield.getDest(q);
Operand bo = Getfield.getBase(q);
if (lo instanceof RegisterOperand &&
        bo instanceof RegisterOperand) {
    Register l = ((RegisterOperand) lo).getRegister();
    Register b = ((RegisterOperand) bo).getRegister();
    jq_Field f = Getfield.getField(q).getField();
    ...
}
Kinds of Quads
joeq.Compiler.Quad.Operator
Move
Phi
Unary
Binary
New
NewArray
MultiNewArray
Alength
Monitor
Getstatic
Putstatic
Getfield
Putfield
ALoad
AStore
Checkcast
Instanceof
Return
Branch
  IntIfCmp
  Goto
  Jsr
  Ret
  LookupSwitch
  TableSwitch
Invoke
  InvokeVirtual
  InvokeStatic
  InvokeInterface
joeq.Compiler.Quad.Quad API
• Operator getOperator()
• kind of the quad
• int getBCI()
• bytecode offset of the quad in its containing method
• String toByteLocStr()
• unique identifier of the quad in format offset!mName:mDesc@cName
• String toJavaLocStr()
• location of the quad in format fileName:lineNum in Java source code
• String toLocStr()
• location of the quad in both Java bytecode and source code
• String toVerboseStr()
• verbose description of the quad (its location plus contents)
• BasicBlock getBasicBlock()
• containing basic block of the quad
Traversing Quadcode
import chord.program.Program;
import joeq.Class.jq_Method;
import joeq.Compiler.Quad.*;

// Visitor with one callback per kind of quad; override only those of interest.
QuadVisitor qv = new QuadVisitor.EmptyVisitor() {
    public void visitNew(Quad q) { ... }
    public void visitPhi(Quad q) { ... }
    ...
};
Program program = Program.g();
for (jq_Method m : program.getMethods()) {
    if (!m.isAbstract()) {
        ControlFlowGraph cfg = m.getCFG();
        for (BasicBlock bb : cfg.reversePostOrder())
            for (Quad q : bb.getQuads())
                q.accept(qv);
    }
}
Java Program Representations
Java source code (.java) → j2h / Java2HTML → HTMLized Java source code (.html)
Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode
Java bytecode (.class) → Joeq → Quadcode
HTMLizing Java Source Code
• Programmatically:
import chord.program.Program;
Program program = Program.g();
program.HTMLizeJavaSrcFiles();
• From command line:
1. Use j2h:
ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_xref
2. Use Java2HTML:
ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_fast
Java Program Representations
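Java source code (.java) → j2h / Java2HTML → HTMLized Java source code (.html)
Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode
Java bytecode (.class) → Joeq → Quadcode
Jasmin code (.j): connected in the figure through Chord and Jasmin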
Analysis Scope Construction
• Determines which parts of the program to analyze
• Computed in either of these cases:
• chord.build.scope=true
• chord.program.Program.g() is called
• Algorithm specified by chord.scope.kind=[rta|cha|dynamic]
• Rapid Type Analysis (RTA)
• Class Hierarchy Analysis (CHA)
• Dynamic Analysis
• All three algorithms require specifying:
• chord.main.class=<MAIN CLASS>
• chord.class.path=<CLASSPATH>
Analysis Scope Representation
• Reachable Methods
  • stored in file specified by chord.methods.file
    (default = "[chord.out.dir]/methods.txt")
  • entries of the form:
    mname:mdesc@cname
    ...
• Resolved Reflection
  • stored in file specified by chord.reflect.file
    (default = "[chord.out.dir]/reflect.txt")
  • one section per kind of reflective call:
    # resolvedClsForNameSites (Class Class.forName(String))
    ...
    # resolvedObjNewInstSites (Object Class.newInstance())
    ...
    # resolvedConNewInstSites (Object Constructor.newInstance(Object[]))
    ...
    # resolvedAryNewInstSites (Object Array.newInstance(Class, int))
    ...
  • each entry has the form:
    bci!mname:mdesc@cname->cname1,cname2,...,cnameN
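The entry format above can be taken apart as sketched below. This is an illustrative parser, not the one in Chord: the class name and the sample line (call site at offset 12 in foo.Main, resolved to classes foo.A and foo.B) are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for one entry of the form bci!mname:mdesc@cname->cname1,...,cnameN.
class ReflectEntrySketch {
    final int bci;                // bytecode offset of the reflective call site
    final String mname;           // name of the containing method
    final String mdesc;           // descriptor of the containing method
    final String cname;           // declaring class of the containing method
    final List<String> resolved;  // classes the site was resolved to

    ReflectEntrySketch(int bci, String mname, String mdesc, String cname, List<String> resolved) {
        this.bci = bci;
        this.mname = mname;
        this.mdesc = mdesc;
        this.cname = cname;
        this.resolved = resolved;
    }

    static ReflectEntrySketch parse(String line) {
        int bang = line.indexOf('!');
        int arrow = line.indexOf("->");
        if (bang < 0 || arrow < bang) {
            throw new IllegalArgumentException("malformed entry: " + line);
        }
        String method = line.substring(bang + 1, arrow); // mname:mdesc@cname
        int colon = method.indexOf(':');
        int at = method.lastIndexOf('@');
        if (colon < 0 || at < colon) {
            throw new IllegalArgumentException("malformed method ID: " + method);
        }
        return new ReflectEntrySketch(
                Integer.parseInt(line.substring(0, bang)),
                method.substring(0, colon),
                method.substring(colon + 1, at),
                method.substring(at + 1),
                Arrays.asList(line.substring(arrow + 2).split(",")));
    }

    public static void main(String[] args) {
        ReflectEntrySketch e = parse("12!main:([Ljava/lang/String;)V@foo.Main->foo.A,foo.B");
        System.out.println(e.bci + " " + e.mname + " " + e.mdesc + " " + e.cname + " " + e.resolved);
    }
}
```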
Rapid Type Analysis (RTA)
• Preferred (and default) scope construction algorithm
• Allows specifying reflection resolution via
chord.reflect.kind=[none|static|dynamic]
• Preferred way to resolve reflection is ‘dynamic’ and
requires specifying how to run program:
• chord.run.ids=id1,…,idN
• chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN>
Dynamic Analysis Based Scope Construction
• Runs program and observes which classes are loaded
• Requires JVMTI (set chord.use.jvmti=true in file
main/chord.properties)
• Requires specifying how to run program:
• chord.run.ids=id1,…,idN
• chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN>
• All methods of each loaded class are deemed reachable
• Currently no support for reflection resolution
Additional Analysis Scope Features
• Scope Reuse
• Enables using scope constructed by a previous run of Chord
• Constructs scope from files specified by chord.methods.file
and chord.reflect.file
• Specified via chord.reuse.scope=true
• Scope Exclusion
• Enables excluding certain classes from scope
• Treats all methods in such classes as no-ops
• Specified via three properties:
1. chord.std.scope.exclude (default = "")
2. chord.ext.scope.exclude (default = "")
3. chord.scope.exclude (default =
"[chord.std.scope.exclude],[chord.ext.scope.exclude]")
Native Method Stubs
• Specified in file main/src/chord/program/stubs/stubs.txt
in format:
mname:mdesc@cname stub_cname
where stub_cname denotes a class implementing:
public interface joeq.Compiler.Quad.ICFGBuilder {
public ControlFlowGraph run(jq_Method m);
}
• Example:
start:()V@java.lang.Thread chord.program.stubs.ThreadStartCFGBuilder
Example Native Method Stub
Behavior modeled for the native method:
void start() {
    this.run();
    return;
}
Stub that builds the corresponding CFG:
public ControlFlowGraph run(jq_Method m) {
    jq_Class c = m.getDeclaringClass();
    jq_Method n = c.getDeclaredInstanceMethod(
        new jq_NameAndDesc("run", "()V"));
    RegisterFactory f = new RegisterFactory(0, 1);
    Register r = f.getOrCreateLocal(0, c);
    ControlFlowGraph cfg = new ControlFlowGraph(m, 1, 0, f);
    // quad for the call this.run()
    Quad q1 = Invoke.create(0, m, Invoke.INVOKEVIRTUAL_V.INSTANCE,
        null, new MethodOperand(n), 1);
    Invoke.setParam(q1, 0, new RegisterOperand(r, c));
    // quad for the return
    Quad q2 = Return.create(1, m, RETURN_V.INSTANCE);
    BasicBlock bb = cfg.createBasicBlock(1, 1, 2, null);
    bb.appendQuad(q1); bb.appendQuad(q2);
    // link entry -> bb -> exit
    BasicBlock eb = cfg.entry(), xb = cfg.exit();
    eb.addSuccessor(bb); bb.addPredecessor(eb);
    bb.addSuccessor(xb); xb.addPredecessor(bb);
    return cfg;
}
Program Domain
• Building block for analyses based on Datalog/BDDs
• Represents an indexed set of values of a fixed kind
• typically artifacts from program being analyzed (e.g., set of
all methods in the program)
• Assigns unique 0-based index to each value
  • everything in Datalog/BDDs must be numbered
  • indices given in order in which values are added
  • order affects efficiency of running analysis on large sets
  • initial indices (0, 1, ...) typically given to frequently-used values (e.g., the main method)
• O(1) access to value given index, and vice versa
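A toy version of such an indexed set, to show how O(1) lookup in both directions can be obtained. The method names mirror those listed for ProgramDom later in the tutorial, but this class is a hypothetical stand-in, not Chord's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy indexed set: a list gives index -> value, a hash map gives value -> index.
class IndexedDomainSketch<T> {
    private final List<T> values = new ArrayList<>();
    private final Map<T, Integer> indices = new HashMap<>();

    // Adds the value if absent; returns true if it was added.
    boolean add(T val) {
        if (indices.containsKey(val)) return false;
        indices.put(val, values.size()); // next free 0-based index
        values.add(val);
        return true;
    }

    // Adds the value if absent; returns its index in either case.
    int getOrAdd(T val) {
        add(val);
        return indices.get(val);
    }

    T get(int index) { return values.get(index); }

    int indexOf(T val) {
        Integer i = indices.get(val);
        return i == null ? -1 : i;
    }

    int size() { return values.size(); }

    public static void main(String[] args) {
        IndexedDomainSketch<String> dom = new IndexedDomainSketch<>();
        dom.add("main");   // added first, so it gets index 0
        dom.add("start");
        dom.add("main");   // duplicate: ignored
        System.out.println(dom.indexOf("start") + " " + dom.get(0) + " " + dom.size());
    }
}
```

Because indices follow insertion order, adding frequently used values first gives them the small indices mentioned above.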
Example Predefined Program Domains
Name  Description              Defining Class
T     types                    chord.analyses.type.DomT
M     methods                  chord.analyses.method.DomM
F     fields                   chord.analyses.field.DomF
V     variables of ref type    chord.analyses.var.DomV
P     quads (program points)   chord.analyses.point.DomP
H     object allocation quads  chord.analyses.alloc.DomH
I     method call quads        chord.analyses.invk.DomI
E     heap-accessing quads     chord.analyses.heapacc.DomE
A     abstract threads         chord.analyses.alias.DomA
C     abstract method contexts chord.analyses.alias.DomC
O     abstract objects         chord.analyses.alias.DomO
Writing a Program Domain Analysis
package chord.analyses.method;
@Chord(name = "M")
public class DomM extends ProgramDom<jq_Method> {
    @Override public void fill() {
        Program program = Program.g();
        add(program.getMainMethod());
        jq_Method start = program.getThreadStartMethod();
        if (start != null) add(start);
        for (jq_Method m : program.getMethods()) add(m);
    }
}
Domain M: all methods in the program
– main method has index 0
– java.lang.Thread.start() method has index 1
Running a Program Domain Analysis
ant -Dchord.work.dir=<…> -Dchord.run.analyses=M run
Output files in chord_output/bddbddb/:
File M.map:
main:([Ljava/lang/String;)V@Bldg
start:()V@java.lang.Thread
<init>:()V@Bldg
…
File M.dom:
M <N> M.map
chord.project.analyses.ProgramDom<T> API
• void setName(String name)
• set name of domain
• boolean add(T val)
• add value to domain if not present; return true if added
• int getOrAdd(T val)
• add value to domain if not present; return its index in either case
• void save()
• save domain to disk (.dom and .map files)
• String toUniqueString(T val)
• unique string representation of value
• int size()
• number of values in domain
• T get(int index)
• value having the given index; IndexOutOfBoundsException if not found
• int indexOf(T val)
• index of given value; -1 if not found
Note: values once added cannot be removed!
Program Relation
• Building block for analyses based on Datalog/BDDs
• Represents a set of tuples over one or more fixed
program domains
• Represented symbolically as a BDD
• enables storing and manipulating large relations efficiently
• Provides various relational operations
• projection, selection, join, etc.
• BDD size and efficiency of operations depend heavily
on the encoding of the relation's content, as opposed to its size
• ordering of values within program domains
• relative ordering between program domains
Writing a Program Relation Analysis
package chord.analyses.invk;
@Chord(name = "MI", sign = "M0,I0:M0_I0")
public class RelMI extends ProgramRel {
    @Override public void fill() {
        DomI domI = (DomI) doms[1];
        for (Quad q : domI) {
            jq_Method m = q.getMethod();
            add(m, q);
        }
    }
}
Relation MI: tuples (m, i) such that method m contains call i
• M0,I0: Domain names
  • Order mnemonically (hard to change over time)
  • Suffix 0, 1, etc. distinguishes repeating domains
• M0_I0: Domain order
  • Only dictates performance
  • Can also be I0_M0 or I0xM0
  • Easy to change over time
Writing a Program Relation Analysis
package chord.analyses.var;
@Chord(name = "VT", sign = "V0,T0:T0_V0")
public class RelVT extends ProgramRel {
@Override public void fill() {
for (each RegisterOperand o of each quad) {
Register v = o.getRegister();
jq_Type t = o.getType();
add(v, t);
}
}
}
Relation VT: tuples (v, t) such that local variable v has type t
Running a Program Relation Analysis
ant -Dchord.work.dir=<…> -Dchord.run.analyses=VT run
Output files in chord_output/bddbddb/: V.dom, T.dom, V.map, T.map, VT.bdd
Program Relation as Binary Function
Variable v0 has types t1, t2, t3
Variable v1 has type t3
Variable v2 has type t3
Relation VT = {
(0, 1), (0, 2), (0, 3),
(1, 3),
(2, 3)
}
Encoded as a binary function f over bits b1 b2 (domain V) and b3 b4 (domain T):
V       T
b1 b2   b3 b4   f
0  0    0  0    0
0  0    0  1    1
0  0    1  0    1
0  0    1  1    1
0  1    0  0    0
0  1    0  1    0
0  1    1  0    0
0  1    1  1    1
1  0    0  0    0
1  0    0  1    0
1  0    1  0    0
1  0    1  1    1
1  1    0  0    0
1  1    0  1    0
1  1    1  0    0
1  1    1  1    0
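The encoding of relation VT as a binary function can be reproduced with a few lines of code. This is a sketch with a hypothetical class name; it assumes b1 b2 are the high and low bits of the variable index and b3 b4 those of the type index, which matches the tuples listed above.

```java
// Recomputes the truth-table column f for relation VT = {(0,1),(0,2),(0,3),(1,3),(2,3)}.
class RelationAsFunctionSketch {
    static final int[][] VT = { {0, 1}, {0, 2}, {0, 3}, {1, 3}, {2, 3} };

    // f(b1,b2,b3,b4) is true iff the tuple (v, t) encoded by the bits is in VT.
    static boolean f(int b1, int b2, int b3, int b4) {
        int v = (b1 << 1) | b2; // index in domain V
        int t = (b3 << 1) | b4; // index in domain T
        for (int[] tuple : VT) {
            if (tuple[0] == v && tuple[1] == t) return true;
        }
        return false;
    }

    // The 16 values of f, for b1 b2 b3 b4 counting from 0000 to 1111.
    static String column() {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < 16; row++) {
            sb.append(f((row >> 3) & 1, (row >> 2) & 1, (row >> 1) & 1, row & 1) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(column());
    }
}
```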
BDD: Binary Decision Diagrams (Bryant 1986)
Graphical Encoding of a Binary Function
[Figure: decision tree over variables b1, b2, b3, b4 with 0 edges and 1 edges; each leaf gives the value of f.]
BDD: Collapsing Redundant Nodes
[Figure sequence: identical terminal nodes and identical subgraphs are merged step by step.]
BDD: Eliminating Unnecessary Nodes
[Figure sequence: nodes whose 0 edge and 1 edge lead to the same node are removed.]
BDD Representation on Disk
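Files in chord_output/bddbddb/: V.dom, T.dom, V.map, T.map, VT.bdd
File VT.bdd:
# V0:2 T0:2
# b1 b2
# b3 b4
6 4
b2 b1 b4 b3
7 b4 0 1
6 b3 7 1
5 b3 0 7
4 b2 5 0
3 b2 6 5
2 b1 3 4
Line "6 4": number of internal nodes, number of BDD variables
Line "b2 b1 b4 b3": BDD variable order
One entry per internal node of form:
<nodeId, varId, loNodeId, hiNodeId>
[Figure: the corresponding BDD, with internal nodes 2 to 7 labeled b1, b2, b2, b3, b3, b4 and terminal nodes 0 and 1.]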
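The node entries listed for VT.bdd can be checked by walking the node table by hand. The sketch below does this; it is not BuDDy's or bddbddb's reader. The class name is hypothetical, node IDs 0 and 1 are taken to be the terminals, and node 2 (the only b1 node) is assumed to be the root.

```java
import java.util.HashMap;
import java.util.Map;

// Evaluates the six internal nodes <nodeId, varId, loNodeId, hiNodeId> shown for VT.bdd.
class BddNodeTableSketch {
    // nodeId -> { variable number (1..4 for b1..b4), loNodeId, hiNodeId }
    static final Map<Integer, int[]> NODES = new HashMap<>();
    static {
        NODES.put(7, new int[] { 4, 0, 1 });
        NODES.put(6, new int[] { 3, 7, 1 });
        NODES.put(5, new int[] { 3, 0, 7 });
        NODES.put(4, new int[] { 2, 5, 0 });
        NODES.put(3, new int[] { 2, 6, 5 });
        NODES.put(2, new int[] { 1, 3, 4 });
    }

    // Follows lo/hi edges from the root until terminal 0 or 1 is reached.
    static boolean eval(int root, boolean[] bits) {
        int node = root;
        while (node > 1) {
            int[] n = NODES.get(node);
            node = bits[n[0] - 1] ? n[2] : n[1];
        }
        return node == 1;
    }

    // Does the BDD contain tuple (v, t)? Assumes b1 b2 encode v and b3 b4 encode t.
    static boolean contains(int v, int t) {
        boolean[] bits = { (v & 2) != 0, (v & 1) != 0, (t & 2) != 0, (t & 1) != 0 };
        return eval(2, bits);
    }

    static int tupleCount() {
        int count = 0;
        for (int v = 0; v < 4; v++)
            for (int t = 0; t < 4; t++)
                if (contains(v, t)) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(tupleCount() + " tuples; (0,1) present: " + contains(0, 1));
    }
}
```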
BDD Variable Order is Important
Function: b1b2 + b3b4
[Figure: two BDDs for this function, one under variable order b1 < b2 < b3 < b4 and one under variable order b1 < b3 < b2 < b4.]
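The effect of variable order on BDD size can be measured for the function b1b2 + b3b4 without a BDD library. The sketch below counts internal nodes of the reduced BDD by brute force: at each level it counts the distinct subfunctions that actually depend on that level's variable. The class name is hypothetical and the approach is only practical for tiny functions.

```java
import java.util.HashSet;
import java.util.Set;

// Counts internal nodes of the reduced ordered BDD of f = b1*b2 + b3*b4 under a given order.
class BddOrderSketch {
    // b[0]..b[3] stand for b1..b4.
    static boolean f(boolean[] b) {
        return (b[0] && b[1]) || (b[2] && b[3]);
    }

    // order[k] is the index of the variable tested at level k.
    static int countInternalNodes(int[] order) {
        int n = order.length;
        int total = 0;
        for (int level = 0; level < n; level++) {
            Set<String> distinct = new HashSet<>();
            int below = n - level - 1; // variables after this level
            for (int prefix = 0; prefix < (1 << level); prefix++) {
                StringBuilder lo = new StringBuilder();
                StringBuilder hi = new StringBuilder();
                for (int suffix = 0; suffix < (1 << below); suffix++) {
                    for (int top = 0; top <= 1; top++) {
                        boolean[] b = new boolean[n];
                        for (int k = 0; k < level; k++) {
                            b[order[k]] = ((prefix >> k) & 1) == 1;
                        }
                        b[order[level]] = top == 1;
                        for (int k = 0; k < below; k++) {
                            b[order[level + 1 + k]] = ((suffix >> k) & 1) == 1;
                        }
                        (top == 0 ? lo : hi).append(f(b) ? '1' : '0');
                    }
                }
                // A node exists only if the two cofactors differ; equal pairs share one node.
                if (!lo.toString().equals(hi.toString())) {
                    distinct.add(lo + "|" + hi);
                }
            }
            total += distinct.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countInternalNodes(new int[] { 0, 1, 2, 3 })); // b1 < b2 < b3 < b4
        System.out.println(countInternalNodes(new int[] { 0, 2, 1, 3 })); // b1 < b3 < b2 < b4
    }
}
```

For this function the first order needs 4 internal nodes and the second 6; the gap grows quickly as more product terms are added.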
chord.project.analyses.ProgramRel<T> API
• void setName(String name)
• set name of relation
• void setSign(RelSign sign)
• set signature (domain names and order) of relation
• void setDoms(Dom[] doms)
• set domains of relation
• void zero() or one()
• initialize contents of relation to zero (no tuples) or one (all tuples)
• void add(T1 e1, …, TN eN)
• add tuple (e1, …, eN) to relation
• void remove(T1 e1, …, TN eN)
• remove tuple (e1, …, eN) from relation
• void save()
• save contents of relation to disk
chord.project.analyses.ProgramRel<T> API
• void load()
• load contents of relation from disk
• Iterable<T1,…,TN> getAryNValTuples()
• iterate over all tuples in the relation
• int size()
• number of tuples in the relation
• boolean contains(T1 e1, …, TN eN)
• does relation contain tuple (e1, …, eN)?
• RelView getView()
• obtain a copy of the relation upon which to do projection, selection, etc.
without affecting original relation
• void close()
• free memory used to hold relation
Pointer Analysis
• Answers which pointers can point to which objects at run-time
• Central to many program optimization & verification problems
• Problem is undecidable
• No exact (i.e. both sound and complete) solution
• But many conservative (i.e. sound) approximate solutions exist
• Determine which pointers may point to which objects
• All are incomplete but differ in precision (i.e. false-positive rate)
• Continues to be active area of research
Example
class Bldg {
    List events, floors;
    static void main(String[] a) {
        Bldg b = new Bldg();
    }
    Bldg() {
        List el = new List();
        this.events = el;
        List fl = new List();
        this.floors = fl;
        for (int i = 0; i < K; i++) {
            Event e = new Event();
            el.elems[i] = e;
        }
        for (int i = 0; i < M; i++) {
            Floor f = new Floor();
            fl.elems[i] = f;
        }
    }
}
class List {
    Obj[] elems;
    List() {
        Obj[] a = new Obj[…];
        this.elems = a;
    }
}
Query: disjoint-reach(el, fl)?
[Figure: concrete heap. b points to a Bldg object whose events and floors fields point to two separate List objects (el, fl); each List's elems field points to its own Obj[] array (a); one array holds Event objects (e) at indices 0, 1 and the other holds Floor objects (f) at indices 0, 1.]
0-CFA Pointer Analysis for Java
• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow
• Call graph construction
• Heap abstraction
• Aggregate modeling
• Context sensitivity
Example: Flow Insensitivity
class Bldg {
    List events, floors;
    static void main(String[] a) {
        Bldg b = new Bldg();
    }
    Bldg() {
        List el = new List();
        this.events = el;
        List fl = new List();
        this.floors = fl;
        for (int i = 0; i < K; i++) {
            Event e = new Event();
            el.elems[*] = e;
        }
        for (int i = 0; i < M; i++) {
            Floor f = new Floor();
            fl.elems[*] = f;
        }
    }
}
class List {
    Obj[] elems;
    List() {
        Obj[] a = new Obj[…];
        this.elems = a;
    }
}
(on the slide, array index i is shown replaced by *)
0-CFA Pointer Analysis for Java
• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow
• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis
• Heap abstraction
• Aggregate modeling
• Context sensitivity
Example: Call Graph (Base Case)
class Bldg {
List events, floors;
static void main(String[] a) {
Bldg b = new Bldg();
}
Bldg() {
List el = new List();
this.events = el;
List fl = new List();
this.floors = fl;
for (int i = 0; i < K; i++)
Event e = new Event();
el.elems[*] = e;
for (int i = 0; i < M; i++)
Floor f = new Floor();
fl.elems[*] = f;
}
}
class List {
Obj[] elems;
List() {
Obj[] a = new Obj[…];
this.elems = a;
}
}
Code deemed reachable
so far …
reachableM(0).
0-CFA Pointer Analysis for Java
• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow
• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis
• Heap abstraction
  • allocation sites: objects at same site indistinguishable
• Aggregate modeling
• Context sensitivity
Example: Heap Abstraction
class Bldg {
List events, floors;
static void main(String[] a) {
Bldg b = new1 Bldg();
}
Bldg() {
List el = new2 List();
this.events = el;
List fl = new3 List();
this.floors = fl;
for (int i = 0; i < K; i++)
Event e = new4 Event();
el.elems[*] = e;
for (int i = 0; i < M; i++)
Floor f = new5 Floor();
fl.elems[*] = f;
}
}
class List {
Obj[] elems;
List() {
Obj[] a = new6 Obj[…];
this.elems = a;
}
}
Rule for Object Allocation Sites
Statement: v = newh
[Figure: points-to graph before and after the statement. Before: v may point to other sites such as newh’. After: v also points to newh.]
VH(v, h) :- reachableM(m), MobjValAsgnInst(m, v, h).
Rule for Copy Assignments
Statement: v1 = v2
[Figure: points-to graph before and after the statement. Before: v1 points to newh’ and v2 points to newh. After: v1 also points to newh.]
VH(v1, h) :- reachableM(m), MobjVarAsgnInst(m, v1, v2), VH(v2, h).
0-CFA Pointer Analysis for Java
• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow
• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis
• Heap abstraction
  • allocation sites: objects at same site indistinguishable
• Aggregate modeling
  • instance field sensitive but array element insensitive
• Context sensitivity
Rule for Heap Writes
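Statement: b.f = v
f is instance field or [*] (array element)
[Figure: points-to graph before and after the statement. Before: b points to newh1 and v points to newh2; field f of newh1 may already point to other sites such as newh3. After: field f of newh1 also points to newh2.]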
HFH(h1, f, h2) :- reachableM(m), MputInstFldInst(m, b, f, v),
VH(b, h1), VH(v, h2).
Rule for Heap Reads
Statement: v = b.f
f is instance field or [*] (array element)
[Figure: points-to graph before and after the statement. Before: b points to newh1, field f of newh1 points to newh2, and v may point to other sites such as newh. After: v also points to newh2.]
VH(v, h2) :- reachableM(m), MgetInstFldInst(m, v, b, f),
VH(b, h1), HFH(h1, f, h2).
0-CFA Pointer Analysis for Java
• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow
• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis
• Heap abstraction
  • allocation sites: objects at same site indistinguishable
• Aggregate modeling
  • instance field sensitive but array element insensitive
• Context sensitivity
  • context-insensitive: ignores inter-procedural control flow
    (analyzes each method in single context)
Rule for Dynamically Dispatching Calls
Call site i: v.foo() inside method Tn.bar()
[Figure: call graph before and after. Before: v points to newh of type T, and CHA(T, foo) = Tm.foo(). After: call site i in Tn.bar() has a call edge to Tm.foo().]
IM(i, m) :- reachableM(n), MI(n, i), virtIM(i, m’),
IinvkArg0(i, v), VH(v, h), HT(h, t), CHA(t, m’, m).
reachableM(m) :- IM(_, m).
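To see how such rules reach a fixpoint, here is a naive solver for four of them (object allocation, copy assignment, heap write, heap read) written directly in Java. It is a simplified sketch, not what bddbddb does: relations are explicit sets rather than BDDs, the reachableM and call-dispatch rules are left out, and passing el and fl as the receiver of List() is modeled by two copy facts into a hypothetical variable this_List. Allocation sites are named h2 to h6 after the numbered new sites of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Naive fixpoint over explicit sets for the VH and HFH rules.
class ZeroCfaRulesSketch {
    final Set<List<String>> vh = new HashSet<>();   // VH(v, h)
    final Set<List<String>> hfh = new HashSet<>();  // HFH(h1, f, h2)
    final List<String[]> alloc = new ArrayList<>(); // {v, h}     for v = new_h
    final List<String[]> copy = new ArrayList<>();  // {v1, v2}   for v1 = v2
    final List<String[]> put = new ArrayList<>();   // {b, f, v}  for b.f = v
    final List<String[]> get = new ArrayList<>();   // {v, b, f}  for v = b.f

    // Allocation sites that variable v may point to (returned as a fresh sorted set).
    Set<String> pointsTo(String v) {
        Set<String> result = new TreeSet<>();
        for (List<String> t : vh) {
            if (t.get(0).equals(v)) result.add(t.get(1));
        }
        return result;
    }

    // Re-applies every rule until no new tuple is derived.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] a : alloc) changed |= vh.add(Arrays.asList(a[0], a[1]));
            for (String[] c : copy)
                for (String h : pointsTo(c[1])) changed |= vh.add(Arrays.asList(c[0], h));
            for (String[] p : put)
                for (String h1 : pointsTo(p[0]))
                    for (String h2 : pointsTo(p[2]))
                        changed |= hfh.add(Arrays.asList(h1, p[1], h2));
            for (String[] g : get)
                for (String h1 : pointsTo(g[1]))
                    for (List<String> e : new ArrayList<>(hfh))
                        if (e.get(0).equals(h1) && e.get(1).equals(g[2]))
                            changed |= vh.add(Arrays.asList(g[0], e.get(2)));
        }
    }

    // Facts loosely following the Bldg/List example; t1 and t2 are temporaries for el.elems and fl.elems.
    static ZeroCfaRulesSketch bldgExample() {
        ZeroCfaRulesSketch s = new ZeroCfaRulesSketch();
        s.alloc.add(new String[] { "el", "h2" });
        s.alloc.add(new String[] { "fl", "h3" });
        s.alloc.add(new String[] { "e", "h4" });
        s.alloc.add(new String[] { "f", "h5" });
        s.alloc.add(new String[] { "a", "h6" });
        s.copy.add(new String[] { "this_List", "el" });
        s.copy.add(new String[] { "this_List", "fl" });
        s.put.add(new String[] { "this_List", "elems", "a" });
        s.get.add(new String[] { "t1", "el", "elems" });
        s.put.add(new String[] { "t1", "[*]", "e" });
        s.get.add(new String[] { "t2", "fl", "elems" });
        s.put.add(new String[] { "t2", "[*]", "f" });
        s.solve();
        return s;
    }

    public static void main(String[] args) {
        ZeroCfaRulesSketch s = bldgExample();
        System.out.println("this_List -> " + s.pointsTo("this_List"));
        System.out.println("el.elems  -> " + s.pointsTo("t1"));
        System.out.println("fl.elems  -> " + s.pointsTo("t2"));
    }
}
```

Both el.elems and fl.elems end up pointing to the single array site h6, because the one List() constructor is analyzed in a single context.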
Writing a Datalog Analysis
#name=cipa-0cfa-dlog
.include "V.dom"
.include "T.dom"
...
.bddvarorder M0xI0_F0_V0xV1_T0_H0xH1
VT(v:V0, t:T0) input
reachableM(m:M0)
FH(f:F0, h:H0) output
VH(v:V0, h:H0) output
HFH(h1:H0, f:F0, h2:H1) output
IM(i:I0, m:M0) output
...
reachableM(m) :- IM(_, m).
...
Parts of the file:
• .include lines: program domains
• .bddvarorder line: BDD variable order
• relation declarations: input, intermediate, output program relations, represented as BDDs
• rules: analysis constraints (Horn clauses), solved via BDD operations
Running a Datalog Analysis
ant -Dchord.work.dir=<…> -Dchord.run.analyses=cipa-0cfa-dlog run
Files in chord_output/bddbddb/:
• domain files: V.dom, T.dom, V.map, T.map, ...
• relation files shown: VT.bdd (input); IM.bdd, reachableM.bdd, FH.bdd, VH.bdd, HFH.bdd
Example
[Figure: the Bldg/List example program (with allocation sites new1 to new6) next to the points-to graph computed by the analysis: b points to new1 Bldg; fields events and floors of new1 Bldg point to new2 List and new3 List; el points to new2 List and fl points to new3 List; field elems of both lists points to new6 Obj[], which a also points to; element [*] of new6 Obj[] points to new4 Event (e) and new5 Floor (f).]
Printing Program Relations (Command Line)
ant -Dwork.dir=<…>/chord_output/bddbddb -Ddlog.file=a.dlog solve
File a.dlog:
.include "V.dom"
.include "H.dom"
.include "F.dom"
.bddvarorder ...
VH(v:V0, h:H0) input
HFH(h1:H0, f:F0, h2:H1) input
rVH(v:V0, h:H0)
rVV(v1:V0, v2:V1) printtuples
rVH(v, h) :- VH(v, h).
rVH(v, h) :- rVH(v, h’), HFH(h’, _, h).
rVV(v1, v2) :- v1<v2, rVH(v1, h), rVH(v2, h).
Output:
Relation rVV:
el!<init>:()V@Bldg, fl!<init>:()V@Bldg
...
Query: disjoint-reach(el, fl)?
[Figure: points-to graph of the example: b points to new1 Bldg; its events and floors fields point to new2 List (el) and new3 List (fl); both lists' elems fields point to new6 Obj[] (a), whose [*] element points to new4 Event (e) and new5 Floor (f).]
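The rVH and rVV rules amount to a reachability closure followed by an intersection test. The sketch below does the same over explicit sets. It is only an illustration: the class name is hypothetical, field names are dropped because the rule matches any field, and the facts are copied from the points-to graph of the example (h2 to h6 stand for new2 to new6).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Worklist version of: rVH(v,h) :- VH(v,h).  rVH(v,h) :- rVH(v,h'), HFH(h',_,h).
class DisjointReachSketch {
    // All allocation sites reachable from the given roots along heap edges.
    static Set<String> reach(Set<String> roots, Map<String, Set<String>> heapEdges) {
        Set<String> seen = new TreeSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String h = work.pop();
            for (String next : heapEdges.getOrDefault(h, Collections.<String>emptySet())) {
                if (seen.add(next)) work.push(next);
            }
        }
        return seen;
    }

    // HFH facts of the example with the field component dropped.
    static Map<String, Set<String>> exampleHeap() {
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("h2", new HashSet<>(Arrays.asList("h6")));       // new2 List .elems
        edges.put("h3", new HashSet<>(Arrays.asList("h6")));       // new3 List .elems
        edges.put("h6", new HashSet<>(Arrays.asList("h4", "h5"))); // new6 Obj[] [*]
        return edges;
    }

    static Set<String> reachFromEl() {
        return reach(new HashSet<>(Arrays.asList("h2")), exampleHeap()); // VH(el, h2)
    }

    static Set<String> reachFromFl() {
        return reach(new HashSet<>(Arrays.asList("h3")), exampleHeap()); // VH(fl, h3)
    }

    // rVV(el, fl) holds iff some site is reachable from both variables.
    static boolean elAndFlShare() {
        return !Collections.disjoint(reachFromEl(), reachFromFl());
    }

    public static void main(String[] args) {
        System.out.println(reachFromEl() + " " + reachFromFl() + " share=" + elAndFlShare());
    }
}
```

The shared sites explain why the pair (el, fl) appears in relation rVV: under this abstraction the analysis cannot prove disjoint-reach(el, fl).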
Querying Program Relations (Command Line)
ant -Dwork.dir=<…>/chord_output/bddbddb -Ddlog.file=q.dlog debug
File q.dlog:
.include "V.dom"
.include "H.dom"
.include "F.dom"
.bddvarorder ...
VH(v:V0, h:H0) input
HFH(h1:H0, f:F0, h2:H1) input
File V.map:
b!main:(…)@Bldg
...
File H.map:
null
1!main:(…)@Bldg
2!<init>:()V@Bldg
3!<init>:()V@Bldg
...
Interactive session:
prompt> VH(0,h)?
1!main:(…)@Bldg
prompt> HFH(1,_,h)?
2!<init>:()V@Bldg
3!<init>:()V@Bldg
[Figure: points-to graph of the example: b points to new1 Bldg; its events and floors fields point to new2 List (el) and new3 List (fl); both lists' elems fields point to new6 Obj[] (a), whose [*] element points to new4 Event (e) and new5 Floor (f).]
Pros and Cons of Datalog/BDDs
1. Good for rapidly crafting initial versions of analysis with focus on false positive/negative rate instead of scalability
2. Good for analyses …
  1. whose constraint solving strategy is not obvious (e.g. best known alternative is chaotic iteration)
  2. on data with lots of redundancy and too large to compute/store/read using Java if represented explicitly (e.g. cloning-based analyses)
  3. involving few simple rules (e.g. transitive closure)
3. Bad for analyses …
  1. with more complicated formulations (e.g. summary-based analyses)
  2. over domains not known exactly in advance (i.e. on-the-fly analyses)
  3. involving many interdependent rules (e.g. points-to analyses)
4. Unintuitive effects of BDDs on performance (e.g. k-CFA: small non-uniform k across sites worse than large uniform k)
Writing an Analysis in Chord
• Declaratively in Datalog or imperatively in Java
• Datalog analysis is any file that:
• has extension .dlog or .datalog
• occurs in path specified by property chord.dlog.analysis.path
• Java analysis is any class that:
• is annotated with @Chord
• occurs in path specified by property chord.java.analysis.path
Writing a Java Analysis
• Create subclass of chord.project.analyses.JavaAnalysis:
mandatory
@Chord(name = "my-java",
field
consumes = { "C1", ..., "Cm" },
produces = { "P1", ..., "Pn" },
namesOfTypes = { “T1", ..., “Tk" },
target types
types = { T1.class, ..., Tk.class },
not inferable
namesOfSigns = { "S1", ..., "Sr" },
otherwise
signs = { "...", ..., "..." })
public class MyAnalysis extends JavaAnalysis { relation signs
@Override public void run() { ... }
not inferable
}
otherwise
• Compile above class to a location in path specified by any of:
Property name                   Default value
chord.std.java.analysis.path    "chord.jar"
chord.ext.java.analysis.path    ""
chord.java.analysis.path        concat. of above two property values
Chord Project
• Global entity for organizing all analyses and their inputs
and outputs (collectively called analysis results)
• Computed if chord.project.Project.g() is called
• Consists of set of each of:
• analyses called tasks
• analysis results called targets
• data/control dependencies between tasks and targets
• Either of two kinds chosen by chord.classic=[true|false]:
• chord.project.ClassicProject (this tutorial)
• only data dependencies, can only run tasks sequentially
• chord.project.ModernProject (ongoing)
• data and control dependencies, can run tasks in parallel
Computing a Chord Project
• Compute all tasks:
• Each file with extension .dlog/.datalog in chord.dlog.analysis.path
• Each class having annotation @Chord in chord.java.analysis.path
• Compute all targets:
• Each target consumed or produced by some task
• Compute dependency graph:
• Nodes are all tasks and targets
• Edge from target C to task T if T consumes C
• Edge from task T to target P if T produces P
• Perform consistency checks
• Error if target has no type or has multiple types, error if relation
has no sign, warn if target produced by multiple tasks, etc.
Example: Chord Project
Each task has form { C1, …, Cm } T { P1, …, Pn } where:
– T is name of task
– C1, …, Cm are names of targets consumed by the task
– P1, …, Pn are names of targets produced by the task
{} T1 { R1 }
{} T2 { R1 }
{ R4 } T3 { R2 }
{ R1, R2 } T4 { R3, R4 }
[Figure: dependency graph over tasks T1, T2, T3, T4 and targets R1, R2, R3, R4]
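As a rough illustration of how the target set and the producer side of the dependency graph could be derived from task declarations, the sketch below encodes the four tasks of this example and reports targets with more than one producer. It is a toy model: the Task class and the method names are hypothetical and do not mirror Chord's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ProjectGraphSketch {
    /** Hypothetical task declaration: { consumes } name { produces }. */
    static final class Task {
        final String name;
        final List<String> consumes;
        final List<String> produces;
        Task(String name, List<String> consumes, List<String> produces) {
            this.name = name;
            this.consumes = consumes;
            this.produces = produces;
        }
    }

    /** Maps every target consumed or produced by some task to the names of its producers. */
    static Map<String, List<String>> producers(List<Task> tasks) {
        Map<String, List<String>> byTarget = new TreeMap<>();
        for (Task t : tasks) {
            for (String c : t.consumes) byTarget.computeIfAbsent(c, k -> new ArrayList<>());
            for (String p : t.produces) byTarget.computeIfAbsent(p, k -> new ArrayList<>()).add(t.name);
        }
        return byTarget;
    }

    /** One warning per target produced by multiple tasks. */
    static List<String> check(List<Task> tasks) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : producers(tasks).entrySet()) {
            if (e.getValue().size() > 1) warnings.add(e.getKey() + " produced by " + e.getValue());
        }
        return warnings;
    }

    /** The four tasks from the example above. */
    static List<Task> slideExample() {
        return Arrays.asList(
                new Task("T1", Collections.<String>emptyList(), Arrays.asList("R1")),
                new Task("T2", Collections.<String>emptyList(), Arrays.asList("R1")),
                new Task("T3", Arrays.asList("R4"), Arrays.asList("R2")),
                new Task("T4", Arrays.asList("R1", "R2"), Arrays.asList("R3", "R4")));
    }

    public static void main(String[] args) {
        for (String w : check(slideExample())) System.out.println(w); // prints "R1 produced by [T1, T2]"
    }
}
```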
Running a Java Analysis
ant -Dchord.work.dir=<…> -Dchord.run.analyses=my-java run
@Chord(name = "my-java",
consumes = { "C1", ..., "Cm" },
produces = { "P1", ..., "Pn" }
)
public class MyAnalysis extends JavaAnalysis {
@Override public void run() { ... }
}
• If done bit of this analysis is 1: do nothing
• Else do the following in order:
    • For each of C1, …, Cm whose done bit is 0:
        • Recursively run unique analysis producing it
        • Report runtime error if none or multiple such analyses exist
    • Execute run() method of this analysis
    • Set done bits of this analysis and P1, …, Pn to 1
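The steps above can be sketched as a small recursive scheduler. This is a toy model under stated assumptions: the Task class, the addTask and runTask methods, and the tasks A, B, C with targets X, Y, Z are all hypothetical, and "executing run()" is simulated by recording the task name. It does not guard against cyclic dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DoneBitSketch {
    private static final class Task {
        final String name;
        final List<String> consumes;
        final List<String> produces;
        Task(String name, List<String> consumes, List<String> produces) {
            this.name = name;
            this.consumes = consumes;
            this.produces = produces;
        }
    }

    private final Map<String, Task> tasks = new LinkedHashMap<>();
    private final Set<String> doneTasks = new HashSet<>();    // done bits of tasks
    private final Set<String> doneTargets = new HashSet<>();  // done bits of targets
    final List<String> executed = new ArrayList<>();          // order in which run() bodies fired

    void addTask(String name, List<String> consumes, List<String> produces) {
        tasks.put(name, new Task(name, consumes, produces));
    }

    void runTask(String name) {
        Task task = tasks.get(name);
        if (task == null) throw new IllegalArgumentException("unknown task " + name);
        if (doneTasks.contains(name)) return;                 // done bit is 1: do nothing
        for (String target : task.consumes) {
            if (doneTargets.contains(target)) continue;
            List<Task> producers = new ArrayList<>();
            for (Task t : tasks.values()) {
                if (t.produces.contains(target)) producers.add(t);
            }
            if (producers.size() != 1) {                      // none or multiple producers
                throw new IllegalStateException(producers.size() + " tasks produce " + target);
            }
            runTask(producers.get(0).name);                   // recursively run the unique producer
        }
        executed.add(name);                                   // stands in for executing run()
        doneTasks.add(name);
        doneTargets.addAll(task.produces);
    }

    static DoneBitSketch example() {
        DoneBitSketch p = new DoneBitSketch();
        p.addTask("A", Collections.<String>emptyList(), Arrays.asList("X"));
        p.addTask("B", Arrays.asList("X"), Arrays.asList("Y"));
        p.addTask("C", Arrays.asList("X", "Y"), Arrays.asList("Z"));
        return p;
    }

    public static void main(String[] args) {
        DoneBitSketch p = example();
        p.runTask("C");
        System.out.println(p.executed); // prints "[A, B, C]"
    }
}
```

Running C first pulls in A (for X) and then B (for Y); a later request for B is a no-op because its done bit is already set.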
Running a Java Analysis
{} T1 { R1 }
{} T2 { R1 }
{ R4 } T3 { R2 }
{ R1, R2 } T4 { R3, R4 }
[Figure: dependency graph over tasks T1, T2, T3, T4 and targets R1, R2, R3, R4]
ant -Dchord.work.dir=<…> -Dchord.run.analyses=T1,T4 run
Predefined Analysis Templates
Organized in a hierarchy in package chord.project.analyses:
ProgramDom
ProgramRel
DlogAnalysis
JavaAnalysis
ForwardRHSAnalysis
RHSAnalysis
BackwardRHSAnalysis
BasicDynamicAnalysis
DynamicAnalysis
chord.project.ClassicProject API
• ITask getTask(String name)
• representation of named task
• Object getTrgt(String name)
• representation of named target
• ITask runTask(String name)
• run named task (and any needed tasks prior to it)
• boolean is[Task|Trgt]Done(String name)
• is named task/target already executed/computed?
• void set[Task|Trgt]Done(String name)
• set ‘done’ bit of named task/target to 1
• void reset[Task|Trgt]Done(String name)
• set ‘done’ bit of named task/target to 0
Example Java Analysis
package chord.analyses.alias;
@Chord(name = "cicg-java", consumes = { "IM" })
public class CICGAnalysis extends JavaAnalysis {
    private ProgramRel cg;
    @Override public void run() {
        // look up the consumed target IM
        cg = (ProgramRel) ClassicProject.g().getTrgt("IM");
    }
    public Set<jq_Method> getCallees(Quad q) {
        if (!cg.isOpen()) cg.load();
        RelView view = cg.getView();
        view.selectAndDelete(0, q);
        Iterable<jq_Method> res = view.getAry1ValTuples();
        Set<jq_Method> callees = new HashSet<jq_Method>();
        for (jq_Method m : res) callees.add(m);
        view.free();
        return callees;
    }
    public void free() {
        if (cg.isOpen()) cg.close();
    }
}
Example Java Analysis
@Chord(name = "my-java")
public class MyAnalysis extends JavaAnalysis {
    @Override public void run() {
        ClassicProject p = ClassicProject.g();
        CICGAnalysis a = (CICGAnalysis) p.getTask("cicg-java");
        p.runTask(a);
        for (Quad q : ...) {
            Set<jq_Method> tgts = a.getCallees(q);
            ...
        }
        a.free();
    }
}
Specialized Java Analyses
• ProgramDom:
• Consumes targets specified in @Chord annotation
• Produces only a single target (the defined program domain itself)
• run() method computes and saves domain to disk
• ProgramRel:
• Consumes targets specified in @Chord annotation, plus target of
each of its program domains
• Produces only a single target (the defined program relation itself)
• run() method computes and saves relation to disk
• DlogAnalysis:
• Consumes only its declared domains and declared input relations
• Produces only its declared output relations
• run() method runs bddbddb
Analyses as Building Blocks
1. Modularity
   • each analysis is written independently
2. Flexibility
   • analyses can interact in powerful ways with other analyses
     (by user-specified data/control dependencies)
3. Efficiency
   • analyses executed in demand-driven fashion
   • results computed by each analysis automatically cached
     for reuse by other analyses without re-computation
   • independent analyses automatically executed in parallel
4. Reliability
   • result is independent of order in which analyses are run
Outline of Tutorial
• Part 1:
• Getting Started With Chord
• Program Representation
• Part 2:
• Analysis Using Datalog/BDDs
• Chaining Analyses Together
• Part 3:
• Context-Sensitive Analysis
• Dynamic Analysis
Context-Sensitive Analysis
• Respects inter-procedural control-flow to varying degrees
• Broadly two kinds:
• Bottom-Up: analyze method without any knowledge of its callers
• Top-Down: analyze method only in called contexts
• Two kinds of top-down approaches:
• Cloning-based (k-limited)
• Summary-based
• Fully context-sensitive approaches:
• Bottom-up
• Top-down summary-based
Context-Sensitive Analysis in Chord
• Top-down: both cloning-based and summary-based
• Cloning-based analysis
• k-CFA, k-object-sensitivity, hybrid
• Summary-based analysis
• Tabulation algorithm from Reps, Horwitz, Sagiv (POPL’95)
Example: Context-Insensitive Analysis
class Bldg {
    List events, floors;
    static void main(String[] a) {
        Bldg b = new1 Bldg();
    }
    Bldg() {                            // context 1
        List el = new2 List();
        this.events = el;
        List fl = new3 List();
        this.floors = fl;
        for (int i = 0; i < K; i++)
            Event e = new4 Event();
            el.elems[*] = e;
        for (int i = 0; i < M; i++)
            Floor f = new5 Floor();
            fl.elems[*] = f;
    }
}
class List {
    Obj[] elems;
    List() {                            // contexts 2, 3
        Obj[] a = new6 Obj[…];
        this.elems = a;
    }
}
disjoint-reach(el, fl)?
[Figure: points-to graph. Objects: new1 Bldg, new2 List, new3 List, new6 Obj[], new4 Event, new5 Floor. Variables: b, el, fl, a, e, f. Fields: events, floors, elems, [*].]
Example: Cloning-Based Analysis
class Bldg {
    List events, floors;
    static void main(String[] a) {
        Bldg b = new1 Bldg();
    }
    Bldg() {                            // context 1
        List el = new2 List();
        this.events = el;
        List fl = new3 List();
        this.floors = fl;
        for (int i = 0; i < K; i++)
            Event e = new4 Event();
            el.elems[*] = e;
        for (int i = 0; i < M; i++)
            Floor f = new5 Floor();
            fl.elems[*] = f;
    }
}
class List {
    Obj[] elems;
    List() {                            // clone for context 2
        Obj[] a = new6 Obj[…];
        this.elems = a;
    }
    List() {                            // clone for context 3
        Obj[] a = new6 Obj[…];
        this.elems = a;
    }
}
disjoint-reach(el, fl)?
[Figure: points-to graph. Objects: new1 Bldg, new2 List, new3 List, new6 Obj[], new4 Event, new5 Floor. Variables: b, el, fl, a, e, f. Fields: events, floors, elems, [*].]
Example: Cloning with Object Sensitivity
class Bldg {
    List events, floors;
    static void main(String[] a) {
        Bldg b = new1 Bldg();
    }
    Bldg() {                            // context 1
        List el = new2 List();
        this.events = el;
        List fl = new3 List();
        this.floors = fl;
        for (int i = 0; i < K; i++)
            Event e = new4 Event();
            el.elems[*] = e;
        for (int i = 0; i < M; i++)
            Floor f = new5 Floor();
            fl.elems[*] = f;
    }
}
class List {
    Obj[] elems;
    List() {                            // clone for context 2
        Obj[] a = new6 Obj[…];
        this.elems = a;
    }
    List() {                            // clone for context 3
        Obj[] a = new6 Obj[…];
        this.elems = a;
    }
}
disjoint-reach(el, fl)?
[Figure: points-to graph in which new6 Obj[] and variable a appear twice, once per List() context (2 and 3). Other objects: new1 Bldg, new2 List, new3 List, new4 Event, new5 Floor. Variables: b, el, fl, e, f. Fields: events, floors, elems, [*].]
Running Cloning-based Analyses in Chord
cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog
ant -Dchord.work.dir=<…> -Dchord.run.analyses=<ONE OF ABOVE> run
• chord.ctxt.kind=[ci|cs|co]
• kind of context sensitivity for each method and its locals
• chord.inst.ctxt.kind=[ci|cs|co]
• kind of context sensitivity for each instance method and its locals
• chord.stat.ctxt.kind=[ci|cs|co]
• kind of context sensitivity for each static method and its locals
• chord.kobj.k=[1|2|…]
• k value to use for each object allocation site
• chord.kcfa.k=[1|2|…]
• k value to use for each method call site
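To make the role of the k value for call sites concrete, here is a minimal sketch of the textbook k-limited call-string construction: a callee's context is the invoking call site followed by the caller's context, truncated to the k most recent sites. The slides do not show how Chord represents contexts, so the method below and the call-site labels i1, i2, i3 (standing for the calls at new1, new2, new3 in the Bldg example) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class KCfaContextSketch {
    /** Context of a callee invoked at callSite from a method analyzed in callerCtxt. */
    static List<String> calleeContext(List<String> callerCtxt, String callSite, int k) {
        List<String> ctxt = new ArrayList<>();
        if (k > 0) {
            ctxt.add(callSite);                   // most recent call site first
            for (String site : callerCtxt) {
                if (ctxt.size() == k) break;      // keep at most k call sites
                ctxt.add(site);
            }
        }
        return ctxt;                              // k = 0 yields the single empty context
    }

    public static void main(String[] args) {
        List<String> mainCtxt = Collections.emptyList();
        for (int k = 0; k <= 2; k++) {
            List<String> bldg = calleeContext(mainCtxt, "i1", k);
            List<String> list2 = calleeContext(bldg, "i2", k);
            List<String> list3 = calleeContext(bldg, "i3", k);
            System.out.println("k=" + k + " Bldg()" + bldg + " List()" + list2 + " List()" + list3);
        }
    }
}
```

With k = 0 both calls to List() share one context, which corresponds to the context-insensitive example; with k = 1 they get the distinct contexts [i2] and [i3].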
Output of Pointer/Call-Graph Analyses in Chord
cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog
Relation in cspa_*.dlog   Relation in cipa_0cfa.dlog   Meaning (context-sensitive form)
• rootCM                  • rootM                      (c,m): m is entry method in ctxt c
• CICM                    • IM                         (c1,i,c2,m): call site i in ctxt c1 may call method m in ctxt c2
• CVC                     • VH                         (c,v,o): local v may point to object o in ctxt c of its declaring method
• FC                      • FH                         (f,o): static field f may point to object o
• CFC                     • HFH                        (o1,f,o2): instance field f of object o1 may point to object o2
Cloning-Based vs. Summary-Based Analysis
• Cloning-based Analysis:
• Flow-insensitive
• Notion of method contexts is somewhat arbitrary
• Summary-based Analysis:
• Flow-sensitive
• Notion of method contexts is defined by the user
Example: Thread-Escape Analysis
class Bldg {
List events, floors;
static void main(String[] a) {
Bldg b = new Bldg();
}
Bldg() {
List el = new List();
this.events = el;
List fl = new List();
this.floors = fl;
for (int i = 0; i < K; i++)
Event e = new Event();
el.elems[i] = e;
for (int i = 0; i < M; i++)
Floor f = new Floor();
fl.elems[i] = f;
}
}
class List {
Obj[] elems;
List() {
Obj[] a = new Obj[…];
this.elems = a;
}
}
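[Figure: heap graph of Bldg b. Fields events and floors point to two List objects (also referenced by el and fl); each List has an elems field pointing to an Obj[] whose indices 0, 1 hold Event and Floor objects respectively.]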
Example: Thread-Escape Analysis
class Bldg {
List events, floors;
static void main(String[] a) {
Bldg b = new Bldg();
for (i = 0; i < K; i++)
List el = b.events;
p:
Event v = el.elems[i];
}
Bldg() {
List el = new List();
this.events = el;
List fl = new List();
this.floors = fl;
for (int i = 0; i < K; i++)
Event e = new Event();
el.elems[i] = e;
for (int i = 0; i < M; i++)
Floor f = new Floor();
fl.elems[i] = f;
for (i = 0; i < N; i++)
Elev t = new Elev(fl);
t.start();
}
}
class List {
Obj[] elems;
List() {
Obj[] a = new Obj[…];
this.elems = a;
}
}
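[Figure: heap graph of Bldg b, the events and floors List objects with their elems arrays, Event and Floor objects at indices 0, 1, and Elev objects whose floors field points to the floors List. Variables el, fl and v are shown; a legend distinguishes local from shared objects.]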
local(p,v): Is v reachable
from single thread at p?
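One way to read the query on this heap is as graph reachability: an object is shared once it is reachable from a started thread object, and local otherwise. The sketch below encodes the example heap by hand and answers the query for the Event objects that v can refer to. It is a simplified model under stated assumptions: node names are made up, static fields (which a full analysis would also treat as roots) are ignored, and nothing here reflects Chord's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ThreadEscapeSketch {
    private final Map<String, List<String>> fields = new HashMap<>();

    /** Records that some field or array element of object `from` points to object `to`. */
    void pointsTo(String from, String to) {
        fields.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Objects reachable from started thread objects (the thread objects included). */
    Set<String> sharedObjects(Collection<String> startedThreads) {
        Set<String> shared = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(startedThreads);
        while (!worklist.isEmpty()) {
            String obj = worklist.pop();
            if (!shared.add(obj)) continue;       // already visited
            for (String succ : fields.getOrDefault(obj, Collections.<String>emptyList())) {
                worklist.push(succ);
            }
        }
        return shared;
    }

    boolean isLocal(String obj, Collection<String> startedThreads) {
        return !sharedObjects(startedThreads).contains(obj);
    }

    /** Hand-built heap of the Bldg example with two events, two floors and two Elev threads. */
    static ThreadEscapeSketch bldgHeap() {
        ThreadEscapeSketch h = new ThreadEscapeSketch();
        h.pointsTo("bldg", "eventsList");
        h.pointsTo("bldg", "floorsList");
        h.pointsTo("eventsList", "eventsArray");
        h.pointsTo("eventsArray", "event0");
        h.pointsTo("eventsArray", "event1");
        h.pointsTo("floorsList", "floorsArray");
        h.pointsTo("floorsArray", "floor0");
        h.pointsTo("floorsArray", "floor1");
        h.pointsTo("elev0", "floorsList");        // Elev.floors, set by new Elev(fl)
        h.pointsTo("elev1", "floorsList");
        return h;
    }

    public static void main(String[] args) {
        List<String> threads = Arrays.asList("elev0", "elev1");
        ThreadEscapeSketch h = bldgHeap();
        System.out.println("event0 local? " + h.isLocal("event0", threads)); // prints "event0 local? true"
        System.out.println("floor0 local? " + h.isLocal("floor0", threads)); // prints "floor0 local? false"
    }
}
```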
Example: Trivial Pointer Abstraction
class Bldg {
List events, floors;
static void main(String[] a) {
Bldg b = new Bldg();
for (i = 0; i < K; i++)
List el = b.events;
p:
Event v = el.elems[i];
}
Bldg() {
List el = new List();
this.events = el;
List fl = new List();
this.floors = fl;
for (int i = 0; i < K; i++)
Event e = new Event();
el.elems[i] = e;
for (int i = 0; i < M; i++)
Floor f = new Floor();
fl.elems[i] = f;
for (i = 0; i < N; i++)
Elev t = new Elev(fl);
t.start();
}
}
class List {
Obj[] elems;
List() {
Obj[] a = new Obj[…];
this.elems = a;
}
}
[Figure: heap graph of the example (Bldg, List, Obj[], Event, Floor and Elev objects; fields events, floors, elems; array indices 0, 1; variable v) as seen under the trivial pointer abstraction]
local(p, v)?
Example: Allocation Sites Pointer Abstraction
class Bldg {
List events, floors;
static void main(String[] a) {
Bldg b = new Bldg();
for (i = 0; i < K; i++)
List el = b.events;
p:
Event v = el.elems[i];
}
Bldg() {
List el = new List();
this.events = el;
List fl = new List();
this.floors = fl;
for (int i = 0; i < K; i++)
Event e = new Event();
el.elems[i] = e;
for (int i = 0; i < M; i++)
Floor f = new Floor();
fl.elems[i] = f;
for (i = 0; i < N; i++)
Elev t = new Elev(fl);
t.start();
}
}
class List {
Obj[] elems;
List() {
Obj[] a = new Obj[…];
this.elems = a;
}
}
[Figure: heap graph of the example (Bldg, List, Obj[], Event, Floor and Elev objects; fields events, floors, elems; array indices 0, 1; variable v) as seen under the allocation sites pointer abstraction]
local(p, v)?
Example: k-CFA Pointer Abstraction
class Bldg {
List events, floors;
static void main(String[] a) {
Bldg b = new Bldg();
for (i = 0; i < K; i++)
List el = b.events;
p:
Event v = el.elems[i];
}
Bldg() {
List el = new List();
this.events = el;
List fl = new List();
this.floors = fl;
for (int i = 0; i < K; i++)
Event e = new Event();
el.elems[i] = e;
for (int i = 0; i < M; i++)
Floor f = new Floor();
fl.elems[i] = f;
for (i = 0; i < N; i++)
Elev t = new Elev(fl);
t.start();
}
}
class List {
Obj[] elems;
List() {
Obj[] a = new Obj[…];
this.elems = a;
}
}
[Figure: heap graph of the example (Bldg, List, Obj[], Event, Floor and Elev objects; fields events, floors, elems; array indices 0, 1; variable v) as seen under the k-CFA pointer abstraction]
local(p, v)?
Complexity of Static Analyses
Pointer abstraction (scalable to precise)        Max abstract values (N)
trivial                                          1
allocation sites                                 H
k-CFA                                            H · I^k
H = allocation sites, I = call sites

Control-flow abstraction (scalable to precise)   Max abstract states
flow and context insensitive                     1
flow sensitive, context insensitive              L
flow and context sensitive                       L · 2^(N^2 · F)
L = program points, F = fields

Challenge: an abstraction that is both precise and scalable

Our Static Analysis:
Pointer abstraction    Max abstract values (N)    Control-flow abstraction      Max abstract states
2-partition            2                          flow and context sensitive    Q · L · 4^F
Q = queries
Drawback of Existing Static Analyses
• Different queries require different parts of the program
to be abstracted precisely
• But existing analyses use the same abstraction to prove
all queries simultaneously
⇒ existing analyses sacrifice precision and/or scalability
[Figure: one static analysis with a single abstraction A takes program P and queries Q1, Q2 and decides both P ⊢ Q1? and P ⊢ Q2?]
Insight 1: Client-Driven Static Analysis
• Query-driven: allows using separate abstractions for
proving different queries
• Parametrized: parameter dictates how much precision
to use for each program part for a given query
[Figure: two static analysis runs on program P, one with abstraction A1 deciding P ⊢ Q1? and one with abstraction A2 deciding P ⊢ Q2?]
Example: Client-Driven Static Analysis (RHS)
static void main(…) {
    h1: Bldg b = new Bldg();
    for (*)
        List el = b.events;
        p: Event v = el.elems[*];
}
Bldg() {
    h2: List el = new List();
    this.events = el;
    h3: List fl = new List();
    this.floors = fl;
    for (*)
        h4: Event e = new Event();
        el.elems[*] = e;
    for (*)
        h5: Floor f = new Floor();
        fl.elems[*] = f;
    for (*)
        h6: Elev t = new Elev(fl);
        t.start();
}
List() {
    h7: Obj[] a = new Obj[…];
    this.elems = a;
}
local(p, v)?
[Figure: abstract heap states computed by the analysis, shown over allocation sites h1 h2 h3 h4 h5 h6 h7, with variables v, b, el, fl, e, f, t, this and fields events, floors, elems, [*]]
Writing a Summary-Based Analysis in Chord
• Implement representations of path/summary edges:
class PE, SE implements chord.project.analyses.rhs.IEdge {
    @Override public boolean matchesSrcNodeOf(IEdge edge) { … }
    @Override public boolean mergeWith(IEdge edge) { … }
}
• Create a subclass of chord.project.analyses.rhs.[Forward|Backward]RHSAnalysis
@Chord(name = "…")
public class MyAnalysis extends ForwardRHSAnalysis<PE, SE> {
    @Override ICICG getCallGraph() { … }
    @Override Set<Pair<Location, PE>> getInitPathEdges() { … }
    @Override PE getInitPathEdge(Quad q, jq_Method m, PE pe) { … }
    @Override PE getMiscPathEdge(Quad q, PE pe) { … }
    @Override PE getInvkPathEdge(Quad q, PE clr,
                                 jq_Method m, SE tgt) { … }
    @Override SE getSummaryEdge(jq_Method m, PE pe) { … }
    @Override public boolean doMerge() { … }
    @Override PE getCopy(PE pe) { … }
}
Insight 2: Leveraging Dynamic Analysis
• Challenge: Efficiently find cheap parameter to prove query
• 2^H choices, most choices imprecise or unscalable
• Our solution: Use dynamic analysis
• parameter is inferred efficiently (linear in H)
• it can fail to prove query, but it is precise in practice and no
cheaper parameter can prove query
[Figure: pipeline. Dynamic analysis runs program P on inputs I1 ... In and, for query Q, yields abstraction A over the H sites; static analysis then uses A to decide P ⊢ Q?]
Example: Leveraging Dynamic Analysis
static void main(String[] a) {
    h1: Bldg b = new Bldg();
    for (i = 0; i < K; i++)
        List el = b.events;
        p: Event v = el.elems[i];
}
Bldg() {
    h2: List el = new List();
    this.events = el;
    h3: List fl = new List();
    this.floors = fl;
    for (int i = 0; i < K; i++)
        h4: Event e = new Event();
        el.elems[i] = e;
    for (int i = 0; i < M; i++)
        h5: Floor f = new Floor();
        fl.elems[i] = f;
    for (i = 0; i < N; i++)
        h6: Elev t = new Elev(fl);
        t.start();
}
List() {
    h7: Obj[] a = new Obj[…];
    this.elems = a;
}
local(p, v)?
[Figure: concrete heap observed at p (Bldg, List, Obj[], Event, Floor and Elev objects; fields events, floors, elems; array indices 0, 1; variable v), annotated per allocation site h1 h2 h3 h4 h5 h6 h7]
Dynamic Analysis Implementation Space for Java
Chord supports instrumenting bytecode at load-time and offline
Approaches are compared on portability, efficiency, flexibility and other issues:
• Implement inside a JVM
    • Portability: dependency on specific version of specific JVM
    • Other issues: not trivial to modify production JVM
• Use JVMTI
    • Portability: not supported by some JVMs (e.g. Android)
    • Flexibility: no support for what is doable by bytecode instru.
    • Other issues: event handling code must be written in C/C++
• Instrument bytecode at load-time
    • Portability: not supported by some JVMs (e.g. Android)
    • Flexibility: can change only method bytecode after class loaded
    • Other issues: bytecode verifier may fail at runtime
• Instrument bytecode offline
    • Other issues: must run program twice to find which classes to instru.; bytecode verifier may fail at runtime
Writing A Dynamic Analysis in Chord
import chord.project.analyses.DynamicAnalysis;
@Chord(name = "…")
public class MyDynamicAnalysis extends DynamicAnalysis {
    @Override public InstrScheme getInstrScheme() {
        InstrScheme s = new InstrScheme();
        s.set<event1>(<args1>);
        ...
        s.set<eventN>(<argsN>);
        return s;
    }
    @Override public void initAllPasses() { … }
    @Override public void doneAllPasses() { … }
    @Override public void initPass() { … }
    @Override public void donePass() { … }
    @Override public void process<event1>(<args1>) { … }
    ...
    @Override public void process<eventN>(<argsN>) { … }
}
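The callback names in the skeleton above suggest a lifecycle: one initAllPasses, then per run of the instrumented program an initPass, a stream of process calls and a donePass, then one doneAllPasses. The sketch below replays recorded events through a handler in that order. The ordering is an assumption inferred from the names, and AllocCounter, replay and the int[] event encoding are hypothetical stand-ins that do not use Chord's classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DynamicPassSketch {
    /** Hypothetical handler counting allocation events (h, t, o) per allocation site h. */
    static class AllocCounter {
        final Map<Integer, Integer> countPerSite = new TreeMap<>();
        int passes;

        void initAllPasses() { countPerSite.clear(); passes = 0; }
        void initPass() { }
        void processAftNew(int h, int t, int o) { countPerSite.merge(h, 1, Integer::sum); }
        void donePass() { passes++; }
        void doneAllPasses() { }
    }

    /** Replays one recorded trace per pass; each event is encoded as {h, t, o}. */
    static AllocCounter replay(List<List<int[]>> traces) {
        AllocCounter analysis = new AllocCounter();
        analysis.initAllPasses();
        for (List<int[]> trace : traces) {
            analysis.initPass();
            for (int[] e : trace) analysis.processAftNew(e[0], e[1], e[2]);
            analysis.donePass();
        }
        analysis.doneAllPasses();
        return analysis;
    }

    public static void main(String[] args) {
        List<List<int[]>> traces = Arrays.asList(
                Arrays.asList(new int[] {7, 1, 100}, new int[] {7, 1, 101}),
                Arrays.asList(new int[] {7, 1, 100}, new int[] {9, 2, 101}));
        AllocCounter result = replay(traces);
        System.out.println(result.passes + " passes, counts " + result.countPerSite);
        // prints "2 passes, counts {7=3, 9=1}"
    }
}
```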
Predefined Instrumentation Events
Dynamic IDs: t=thread ID, o=object ID (0 denotes null)
Static IDs: m:M, b:B, p:P, i:I, h:H, e:E, f:F, l:L, r:R
• EnterMainMethod(t)
• EnterMethod(m, t)
• LeaveMethod(m, t)
• EnterLoop(b, t)
• LoopIteration(b, t)
• LeaveLoop(b, t)
• BasicBlock(b, t)
• Quad(p, t)
• [Bef|Aft]MethodCall(i, t, o)
• [Bef|Aft]New(h, t, o)
• NewArray(h, t, o)
• [Get|Put]staticPrimitive(e, t, b, f)
• [Get|Put]staticReference(e, t, b, f, o)
• [Get|Put]fieldPrimitive(e, t, b, f)
• [Get|Put]fieldReference(e, t, b, f, o)
• [Get|Put]aloadPrimitive(e, t, b, i)
• [Get|Put]aloadReference(e, t, b, i, o)
• [Get|Put]astorePrimitive(e, t, b, i)
• [Get|Put]astoreReference(e, t, b, i, o)
• Thread[Start|Join](i, t, o)
• [Acquire|Release]Lock([l|r], t, o)
• [Wait|NotifyAny|NotifyAll](i, t, o)
Configuring Dynamic Analysis
• Bytecode instrumentation kind: chord.instr.kind=[online|offline]
• How to communicate events: chord.trace.kind=[none|pipe|full]
    • none: events handled in same JVM as that running instrumented program
        Pro: can inspect state
        Con: either exclude JDK from instrumentation or don’t use it in event
        handling code, to avoid correctness or performance problems
    • full: events handled in separate JVM after JVM running instrumented
      program finishes
        Con: infeasible for long-running programs which generate lots of
        events, since all events are stored in a (binary) file on disk
    • pipe: events handled in separate JVM in parallel with JVM running
      instrumented program
        Best option: uses buffered POSIX pipe to communicate events between
        event-generating JVM and event-handling JVM
• JVMTI to start/end generating events: chord.use.jvmti=[true|false]
• Reuse traces from older Chord run: chord.reuse.traces=[true|false]
Architecture of Dynamic Analysis in Chord
• chord.project.analyses.BasicDynamicAnalysis
• workhorse run() method: configures and runs dynamic analysis
• chord.project.analyses.DynamicAnalysis
• provides interface to handle predefined instrumentation events
• chord.instr.BasicInstrumentor
• provides interface to instrument various parts of a Java program
• chord.instr.Instrumentor
• instruments predefined events
• chord.runtime.BasicEventHandler
• starts/stops one-JVM dynamic analysis and maintains object IDs
• chord.runtime.TraceEventHandler
• starts/stops two-JVM dynamic analysis
• chord.runtime.EventHandler
• writes predefined events to buffer encapsulating trace file
Combining Static and Dynamic Analysis
• Static followed by Dynamic
    • reduce instrumentation overhead of dynamic
• Dynamic followed by Static
    • Counterexamples: query is false on some input
    • Likely invariants: a query true on some inputs is likely true
      on all inputs [Ernst 2001]
    • Proofs: a query true on some inputs is likely true on all
      inputs and for likely the same reason [this talk]
• Static and Dynamic interleaved
    • Yogi, concolic testing (EXE, DART, CUTE, SAGE)
Benchmark Characteristics
            classes   methods     bytecodes   allocation sites   queries
                      (x 1000)    (x 1000)    (x 1000)           (x 1000)
hedc        309       1.9         151         1.9                0.6
weblech     532       3.1         230         3.0                0.7
lusearch    611       3.8         267         3.5                7.2
hsqldb      771       6.4         472         5.1                14.4
avrora      1498      5.9         312         5.9                14.4
sunflow     992       6.6         478         6.1                10.0
Precision Comparison
[Figure: two charts, Previous Approach and Our Approach; vertical axis 0% to 100% of queries; series: thread-local, thread-shared, unknown]
Previous Approach
• Pointer abstraction:
    • Allocation sites
• Control abstraction:
    • Flow insensitive
    • Context insensitive
Our Approach
• Pointer abstraction:
    • 2-partition
• Control abstraction:
    • Flow sensitive
    • Context sensitive
Precision Comparison
[Figure: two charts, Previous Approach and Our Approach; vertical axis 0% to 100% of queries; series: thread-local, thread-shared, unknown]
• Previous scalable approach resolves 27% of queries
• Our approach resolves 82% of queries
• 55% of queries are proven thread-local
• 27% of queries are observed thread-shared
Running Time Breakdown
            baseline           our approach
            static analysis    dynamic     static analysis
                               analysis    total    per query group
                                                    mean     max
hedc        24s                6s          38s      1s       2s
weblech     39s                8s          1m       2s       4s
lusearch    43s                31s         8m       3s       6s
hsqldb      1m08s              35s         86m      11s      21s
avrora      1m00s              32s         41m      5s       8s
sunflow     1m18s              3m          74m      9s       19s
Sparsity of Our Abstraction
            total      # sites set to
            # sites    all queries        proven queries
                       mean     max       mean     max
hedc        1,914      3.2      12        1.4      5
weblech     2,958      2.2      8         1.5      5
lusearch    3,549      2.2      18        1.5      18
hsqldb      5,056      2.7      56        1.3      5
avrora      5,923      12.1     195       2.3      31
sunflow     6,053      2.2      18        1.3      15
Related Open-Source Projects
• JikesRVM: Java Research Virtual Machine
• Soot + Paddle: Static analysis and transformation
framework for Java bytecode
• IBM WALA: Static analysis framework for Java
bytecode and related languages
• RoadRunner (Flanagan & Freund): Dynamic analysis
framework for Java concurrency
Acknowledgments
• Joeq: Static analysis and transformation framework
for Java bytecode
• Javassist: Java bytecode manipulation framework
• bddbddb: BDD-based Datalog solver
Further Information
• Chord homepage:
http://jchord.googlecode.com/
• Chord user guide:
http://chord.stanford.edu/user_guide/
• Chord questions:
[email protected]
Thank You!