End-User Shape Analysis
Bor-Yuh Evan Chang 張博聿 (U of Colorado, Boulder)
Xavier Rival (INRIA/ENS Paris)
George C. Necula (U of California, Berkeley)
National Taiwan University – August 11, 2009
Programming Languages Research
at the University of Colorado,
Boulder
Software errors cost a lot
~$60 billion annually (~0.5% of US GDP)
– 2002 National Institute of Standards and Technology report
> total annual revenue of [logo not recoverable]
> 10x annual budget of [logo not recoverable]
Bor-Yuh Evan Chang 張博聿, University of Colorado at Boulder - End-User Shape Analysis
But there’s hope in program analysis
Microsoft uses and distributes
the Static Driver Verifier
Airbus applies
the Astrée Static Analyzer
Companies, such as Coverity and Fortify,
market static source code analysis tools
Because program analysis can
eliminate entire classes of bugs
For example,
– Reading from a closed file: read([closed file]);
– Reacquiring a locked lock: acquire([locked lock]);
How?
– Systematically examine the program
– Simulate running program on “all inputs”
– “Automated code review”
Program analysis by example:
Checking for double acquires
Simulate running program on “all inputs”
… code …
// x now points to an unlocked lock
acquire(x);
… code …
acquire(x);
… code …
[Figure: analysis state for x]
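The check on this slide can be sketched as follows. This is a minimal illustration, assuming a hypothetical three-valued abstract lock state and a straight-line program given as a list of operations; the names `DoubleAcquireCheck`, `LockState`, `Op`, and `firstDoubleAcquire` are invented for the example and do not come from any tool named in the talk.

```java
// Hypothetical sketch: track an abstract state for the lock x points to
// and flag an acquire that may happen while the lock is already held.
final class DoubleAcquireCheck {
    enum LockState { UNLOCKED, LOCKED, UNKNOWN }
    enum Op { ACQUIRE, RELEASE, OTHER }

    // Returns the index of the first operation that may reacquire a held
    // lock, or -1 if this simple analysis finds none.
    static int firstDoubleAcquire(LockState initial, java.util.List<Op> program) {
        LockState state = initial;
        for (int i = 0; i < program.size(); i++) {
            switch (program.get(i)) {
                case ACQUIRE:
                    if (state != LockState.UNLOCKED) return i; // possible double acquire
                    state = LockState.LOCKED;
                    break;
                case RELEASE:
                    state = LockState.UNLOCKED;
                    break;
                default:
                    break; // "… code …" that does not touch the lock
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // … code …; acquire(x); … code …; acquire(x); … code …
        java.util.List<Op> slide = java.util.List.of(
            Op.OTHER, Op.ACQUIRE, Op.OTHER, Op.ACQUIRE, Op.OTHER);
        System.out.println(firstDoubleAcquire(LockState.UNLOCKED, slide)); // prints 3
    }
}
```

The sketch handles only one lock and no branching; a real analysis would also have to join states at control-flow merges.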

Program analysis by example:
Checking for double acquires
Simulate running program on “all inputs”
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
[Figure: ideal analysis state, a disjunction of linked lists (… or … or …), each with x pointing to a node]
Must abstract
For decidability, must abstract: “model all inputs” (e.g., merge objects)
Abstraction too coarse or not precise enough (e.g., lost x is always unlocked) mislabels good code as buggy
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
[Figure: ideal analysis state (a disjunction of lists with x) next to the abstracted analysis state with merged objects]
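One way to see the precision loss described here, assuming a hypothetical three-valued lock abstraction (the names `MergeDemo`, `LockState`, `join`, and `reportsDoubleAcquire` are invented for this sketch): merging the node x points to with another node forces the analysis to join their lock states, and the fact that x is unlocked is lost.

```java
// Hypothetical sketch of how merging objects loses "x is always unlocked".
final class MergeDemo {
    enum LockState { UNLOCKED, LOCKED, UNKNOWN }

    // Least upper bound: equal states are kept, different states become UNKNOWN.
    static LockState join(LockState a, LockState b) {
        return a == b ? a : LockState.UNKNOWN;
    }

    // An acquire is reported unless the lock is definitely unlocked.
    static boolean reportsDoubleAcquire(LockState stateOfX) {
        return stateOfX != LockState.UNLOCKED;
    }

    public static void main(String[] args) {
        LockState x = LockState.UNLOCKED;   // the node x points to
        LockState other = LockState.LOCKED; // some other node in the list
        // Precise abstraction: x is tracked separately from the other node.
        System.out.println(reportsDoubleAcquire(x));              // prints false
        // Coarse abstraction: x is merged with the other node.
        System.out.println(reportsDoubleAcquire(join(x, other))); // prints true, a false alarm
    }
}
```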
To address the precision challenge
Traditional program analysis mentality:
“Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
“Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that work hopefully most of the time.”
End-user approach:
“Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use those as specifications?”
Summary of overview
Challenge in analysis: Finding a good abstraction
precise enough but not more than necessary
Powerful, generic abstractions
expensive, hard to use and understand
Built-in, default abstractions
often not precise enough (e.g., data structures)
End-user approach:
Must involve the user in abstraction
without expecting the user to be a program analysis
expert
Overview of contributions
Extensible Inductive Shape Analysis (Xisa)
Precise inference of data structure properties
Able to check, for instance, the locking example
Targeted to software developers
Uses data structure checking code for guidance
→ Turns testing code into a specification for static analysis
Efficient
~10-100x speed-up over generic approaches
→ Builds abstraction out of developer-supplied checking code
End-user approach
Extensible Inductive
Shape Analysis
Precise inference of
data structure properties
…
Shape analysis is a fundamental analysis
Data structures are at the core of
– Traditional languages (C, C++, Java)
– Emerging web scripting languages
Improves verifiers that try to
– Eliminate resource usage bugs (locks, file handles)
– Eliminate memory errors (leaks, dangling pointers)
– Eliminate concurrency errors (data races)
– Validate developer assertions
Enables program transformations
– Compile-time garbage collection
– Data structure refactorings
Shape analysis by example:
Removing duplicates
// l is a sorted doubly-linked list
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
Example/Testing:
– at entry: l = 2, 2, 4, 4
– intermediate state: l = 2, 4, 4 with cur inside the list
– at exit: l = 2, 4
Code Review/Static Analysis:
– at entry: l is a “sorted dl list”
– intermediate state (program-specific, more complicated): a “segment with no duplicates” from l to cur, then a “sorted dl list” from cur
– at exit: l is a sorted dl list with “no duplicates”
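The loop and its two assertions can be written out as a small runnable sketch. The class and method names (`RemoveDuplicates`, `Node`, `fromValues`, `sortedDll`, `removeDuplicates`) are assumptions made for illustration; only the list contents 2, 2, 4, 4 come from the slide.

```java
// Hypothetical sketch: the slide's loop with its assertions as run-time checkers.
final class RemoveDuplicates {
    static final class Node {
        int val;
        Node prev, next;
        Node(int val) { this.val = val; }
    }

    // Builds a doubly-linked list from the given values.
    static Node fromValues(int... vals) {
        Node head = null, tail = null;
        for (int v : vals) {
            Node n = new Node(v);
            if (head == null) { head = n; } else { tail.next = n; n.prev = tail; }
            tail = n;
        }
        return head;
    }

    // Checker: h is a sorted doubly-linked list whose first prev pointer is p.
    // With strict = true, equal neighbors (duplicates) are rejected as well.
    static boolean sortedDll(Node h, Node p, boolean strict) {
        while (h != null) {
            if (h.prev != p) return false;
            if (p != null && (strict ? p.val >= h.val : p.val > h.val)) return false;
            p = h;
            h = h.next;
        }
        return true;
    }

    // for each node cur in list l { remove cur if duplicate; }
    static void removeDuplicates(Node l) {
        for (Node cur = l; cur != null; cur = cur.next) {
            if (cur.prev != null && cur.prev.val == cur.val) {
                cur.prev.next = cur.next; // unlink cur from its predecessor
                if (cur.next != null) cur.next.prev = cur.prev;
            }
        }
    }

    public static void main(String[] args) {
        Node l = fromValues(2, 2, 4, 4);
        System.out.println(sortedDll(l, null, false)); // precondition holds: true
        removeDuplicates(l);
        System.out.println(sortedDll(l, null, true));  // postcondition holds: true
    }
}
```

At run time the checkers validate only the list at hand; the point of the slide is that a static analysis must establish the same property for every intermediate state of the loop.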
Shape analysis is not yet practical
Choosing the heap abstraction difficult for precision
Some representative approaches:
TVLA [Sagiv et al.]: parametric in low-level, analyzer-oriented predicates
+ Very general and expressive
- Harder for non-expert
Space Invader [Distefano et al.]: built-in high-level predicates
- Harder to extend
+ No additional user effort (if precise enough)
End-user approach:
Xisa: parametric in high-level, developer-oriented predicates
+ Extensible
+ Targeted at developers
Our approach: Executable specifications
Utilize “run-time checking code” as specification for static analysis.
checker:
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and
    h→next.dll(h)
• p specifies where prev should point
assert(sorted_dll(l,…));
for each node cur in list l {
  remove cur if duplicate;
}
assert(sorted_dll_nodup(l,…));
Contribution: Build the abstraction for analysis out of developer-specified checking code
Contribution: Automatically generalize checkers for complicated intermediate states
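As run-time checking code, the dll checker could look like the following Java method. This is a direct transliteration under assumed names (`DllChecker`, `Node`); the talk itself gives only the abstract definition.

```java
// Hypothetical Java rendering of the slide's checker h.dll(p).
final class DllChecker {
    static final class Node {
        Node prev, next;
    }

    // h.dll(p) = if (h = null) then true else h→prev = p and h→next.dll(h)
    // The parameter p says where h.prev should point.
    static boolean dll(Node h, Node p) {
        if (h == null) return true;
        return h.prev == p && dll(h.next, h);
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        a.next = b; b.prev = a;
        b.next = c; c.prev = b;
        System.out.println(dll(a, null)); // well-formed list: true
        c.prev = a;                       // corrupt one back pointer
        System.out.println(dll(a, null)); // prints false
    }
}
```

The recursion mirrors the inductive definition one-to-one, which is what lets an analysis read the checker as a specification; for very long lists an iterative version would avoid deep call stacks.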
Xisa is …
An automated shape analysis with a precise memory abstraction based around invariant checkers.
checkers:
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and
    h→next.dll(h)
Xisa
• Extensible and targeted for developers
– Parametric in developer-supplied checkers, viewed as inductive definitions in separation logic
• Precise yet compact abstraction for efficiency
– Data structure-specific based on properties of interest to the developer
Shape analysis is an abstract interpretation on abstract memory descriptions with …
Splitting of summaries, to reflect updates precisely
And summarizing, for termination
[Figures: list summaries from l with cursor cur, before and after splitting and summarizing]
Roadmap: Components of Xisa
checkers:
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and
    h→next.dll(h)
Xisa shape analyzer:
– level-type inference on checker definitions: learn information about the checker to use it as an abstraction
– abstract interpretation: splitting and interpreting update, summarizing
Compare and contrast manual code review and our automated shape analysis
Overview: Split summaries to interpret updates precisely
Want abstract update to be “exact”, that is, to update one “concrete memory cell”.
The example at a high-level: iterate using cur changing the doubly-linked list from purple to red.
Challenge: How does the analysis “split” summaries and know where to “split”?
[Figure: list from l with cursor cur; split at cur, then update cur purple to red]
“Split forward” by unfolding inductive definition
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and
    h→next.dll(h)
get: cur→next
[Figure: the summary dll(cur, p) at cur unfolds into two cases: cur is null, or a cell at cur with prev p and next n, followed by dll(n, cur)]
Analysis doesn’t forget the empty case
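The unfolding step can be sketched symbolically. The representation below (classes `Summary`, `Cell`, `Disjunct` and the method `unfold`) is an assumption made for illustration and is not Xisa's actual data structure; it only shows that one unfolding of dll(cur, p) yields the two cases of the checker definition, including the empty one.

```java
// Hypothetical sketch: unfolding the summary dll(h, p) one step, as a case split.
final class UnfoldForward {
    // A summarized region: "h.dll(p) holds" for symbolic names h and p.
    static final class Summary {
        final String h, p;
        Summary(String h, String p) { this.h = h; this.p = p; }
    }

    // One materialized cell with its two pointer fields.
    static final class Cell {
        final String addr, prev, next;
        Cell(String addr, String prev, String next) {
            this.addr = addr; this.prev = prev; this.next = next;
        }
    }

    // One case after unfolding: a pure fact, plus a cell and a remaining
    // summary (both null in the empty case).
    static final class Disjunct {
        final String pureFact;
        final Cell cell;
        final Summary rest;
        Disjunct(String pureFact, Cell cell, Summary rest) {
            this.pureFact = pureFact; this.cell = cell; this.rest = rest;
        }
    }

    // Mirrors the two branches of the checker: the "then" branch gives the
    // empty case, the "else" branch gives one cell plus the recursive call.
    static java.util.List<Disjunct> unfold(Summary s, String freshName) {
        Disjunct empty = new Disjunct(s.h + " = null", null, null);
        Disjunct nonEmpty = new Disjunct(
            s.h + " != null",
            new Cell(s.h, s.p, freshName),
            new Summary(freshName, s.h));
        return java.util.List.of(empty, nonEmpty);
    }

    public static void main(String[] args) {
        for (Disjunct d : unfold(new Summary("cur", "p"), "n")) {
            String heap = d.cell == null ? "" :
                ", cell " + d.cell.addr + " {prev=" + d.cell.prev + ", next=" + d.cell.next + "}"
                + ", dll(" + d.rest.h + ", " + d.rest.p + ")";
            System.out.println(d.pureFact + heap);
        }
    }
}
```

After the split, reading cur→next in the non-empty case touches a materialized cell rather than a summary, which is what makes the update exact.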
“Split backward” also possible and necessary
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
The removal step: cur→prev→next = cur→next;
get: cur→prev→next
[Figure: a “dll segment” from l up to cur, followed by dll(n, cur), shown before and after unfolding the segment backward at cur]
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and
    h→next.dll(h)
How does the analysis do this unfolding?
Why is this unfolding allowed? (Key: Segments are also inductively defined)
How does the analysis know to do this unfolding?
Technical Details: [POPL’08]
Roadmap: Components of Xisa
checkers:
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and
    h→next.dll(h)
Contribution: Turns testing code into specification for static analysis
Xisa shape analyzer:
– level-type inference on checker definitions: derives additional information to guide unfolding
– abstract interpretation: splitting and interpreting update, summarizing
How do we decide where to unfold?
… to be discussed this afternoon
Summary of interpreting updates
Splitting of summaries needed for precision
Unfolding checkers is a natural way to do
splitting
When checker traversal matches code traversal
Checker parameter type analysis
Useful for guiding unfolding in difficult cases, for
example, “back pointer” traversals
Results: Performance
Expressiveness: Different data structures
Times negligible for data structure operations (often in sec or 1/10 sec)

Benchmark                                  Max. Num. Graphs    Analysis
                                           at a Program Pt     Time (ms)
singly-linked list reverse                 1                   1.0
doubly-linked list reverse                 1                   1.5
doubly-linked list copy                    2                   5.4
doubly-linked list remove                  5                   17.9
doubly-linked list remove and back         5                   18.1
search tree with parent insert             3                   16.6
search tree with parent insert and back    5                   64.7
two-level skip list rebalance              1                   11.7
Linux scull driver (894 loc)               4                   3969.6
  (char arrays ignored, functions inlined)

Callouts on the slide:
– TVLA: 290 ms (shown next to the list benchmarks)
– TVLA: 850 ms (shown next to search tree with parent insert)
– Space Invader only analyzes lists (built-in)
Verified shape invariant as given by the checker is preserved across the operation.
Demo: Doubly-linked list reversal
Body of loop over the elements:
Swaps the next and prev fields
of curr.
Already reversed segment
Node whose next and
prev fields were swapped
Not yet reversed list
http://www.cs.colorado.edu/~bec/
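For readers without the demo, the analyzed operation can be sketched in Java. The names `DllReverse`, `Node`, `dll`, and `reverse` are assumptions for illustration; the loop body is the field swap described above, and the checker is the dll definition used throughout the talk.

```java
// Hypothetical sketch of the demo's operation: reverse a doubly-linked list
// by swapping the next and prev fields of each node.
final class DllReverse {
    static final class Node {
        int val;
        Node prev, next;
        Node(int val) { this.val = val; }
    }

    // Checker: h.dll(p), written iteratively.
    static boolean dll(Node h, Node p) {
        while (h != null) {
            if (h.prev != p) return false;
            p = h;
            h = h.next;
        }
        return true;
    }

    // Returns the head of the reversed list.
    static Node reverse(Node l) {
        Node curr = l, newHead = null;
        while (curr != null) {
            Node oldNext = curr.next;
            curr.next = curr.prev; // swap the next and prev fields of curr
            curr.prev = oldNext;
            newHead = curr;        // everything up to curr is already reversed
            curr = oldNext;        // the rest is the not yet reversed list
        }
        return newHead;
    }

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2), c = new Node(3);
        a.next = b; b.prev = a; b.next = c; c.prev = b;
        Node r = reverse(a);
        System.out.println(r.val + " " + r.next.val + " " + r.next.next.val); // prints 3 2 1
        System.out.println(dll(r, null)); // invariant preserved: true
    }
}
```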
Experience with the tool
Checkers are easy to write and try out
– Enlightening (e.g., red-black tree checker in 6 lines)
– Harder to “reverse engineer” for someone else’s code
– Default checkers based on types useful
Future expressiveness and usability improvements
– Pointer arithmetic and arrays (in progress)
– More generic checkers: polymorphic (“element kind unspecified”), higher-order (parameterized by other predicates)
Future evaluation: user study
Near-term future work:
Exploiting common specification framework
Scenario: Code instrumented with lots of checker calls
(perhaps automatically with object invariants)
assert( mychecker(x) );
// … operation on x …
assert( mychecker(x) );
• Very slow to execute
• Hard to prove statically (in general)
Can we prove parts statically?
Static Analysis View: Hybrid checking
Testing View: Incrementalize invariant checking
Example: Insert in a sorted list
[Figure: sorted list l containing elements u, v, w]
Preservation of sortedness shown statically
Emit run-time check for new element: u ≤ v ≤ w
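A minimal sketch of the idea, assuming a singly-linked sorted list and invented names (`SortedInsert`, `sorted`, `localCheck`, `insertAfter`): the full checker walks the whole list, while the residual check that remains after static reasoning only compares the new element with its two neighbors.

```java
// Hypothetical sketch of hybrid checking for insertion into a sorted list.
final class SortedInsert {
    static final class Node {
        int val;
        Node next;
        Node(int val, Node next) { this.val = val; this.next = next; }
    }

    // Full run-time checker: walks the whole list.
    static boolean sorted(Node l) {
        for (Node n = l; n != null && n.next != null; n = n.next) {
            if (n.val > n.next.val) return false;
        }
        return true;
    }

    // Residual check for the new element v between u and w: u ≤ v ≤ w.
    static boolean localCheck(Node u, Node v) {
        Node w = v.next;
        return u.val <= v.val && (w == null || v.val <= w.val);
    }

    // Inserts value v directly after u and returns the residual check's result.
    // Assumes the list was sorted before the insertion.
    static boolean insertAfter(Node u, int v) {
        Node fresh = new Node(v, u.next);
        u.next = fresh;
        return localCheck(u, fresh);
    }

    public static void main(String[] args) {
        Node l = new Node(1, new Node(5, null));
        System.out.println(insertAfter(l, 3)); // 1 ≤ 3 ≤ 5: true
        System.out.println(sorted(l));         // agrees with the full checker: true
    }
}
```

The local check takes constant time, whereas rerunning `sorted` after every operation is linear in the list length, which is the "very slow to execute" problem named on the slide.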
Conclusion
Extensible Inductive Shape Analysis
precision demanding program analysis
improved by novel user interaction
Developer: Gets results corresponding to
intuition
Analysis:
Focused on what’s important to
the developer
Practical precise tools for better software
with an end-user approach!
Programming Languages Research
at the University of Colorado,
Boulder
Who we are
Faculty
Amer Diwan
Jeremy Siek
Bor-Yuh Evan Chang
Sriram Sankaranarayanan
Ph.D. Students
Outline
• Gradual Programming
– A new collaborative project involving
Amer Diwan, Jeremy Siek, and myself
• Brief Sketches of Other Activities
Gradual Programming:
Bridging the Semantic Gap
Have you noticed a time where your program
is not optimized where you expect?
Observation: A disconnect between programmer
intent and program meaning
“I need a
map data
structure”
Load class file
Run class initialization
Create hashtable
semantic gap
Problem: Tools (IDEs, checkers, optimizers) have no
knowledge of what the programmer cares about
… hampering programmer productivity,
software reliability, and execution efficiency
Example: Iteration Order
Must specify an iteration order even when it should not matter
class OpenArray extends Object {
  private Double data[];
  public boolean contains(Object lookFor) {
    for (int i = 0; i < data.length; i++) {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
}
Compiler cannot choose a different iteration order (e.g., parallel)
Wild and Crazy Idea: Use Non-Determinism
• Programmer starts with a potentially
non-deterministic program
• Analysis identifies instances of “underdeterminedness”
• Programmer eliminates “underdeterminedness”
Question: What does this mean? Is it “under-determined”?
Response: Depends, is the iteration order important?
class OpenArray extends Object {
  private Double data[];
  public boolean contains(Object lookFor) {
    for (int i = 0; i < data.length; i++) {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
}
[Slide overlay: the loop header is shown in two versions. The starting point, labeled “under-determined”, ranges over i in 0 .. data.length-1 without fixing an order; the explicit for loop above is labeled “over-determined”; the goal in between is “just right”.]
Let’s try a few program variants
public boolean contains(Object lookFor) {
  for (int i = 0; i < data.length; i++) {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
public boolean contains(Object lookFor) {
  for (int i = data.length-1; i >= 0; i--) {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
public boolean contains(Object lookFor) {
  parallel_for (0, data.length-1) i => {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
Do they compute the same result?
Approach: Try to verify equivalence of program variants up to a specification
Yes → Pick any one
No → Ask user
What about here?
Surprisingly, analysis says no. Why?
Exceptions!
[Figure: a.data is an array that contains null; the argument of a.contains(…) is shown as a picture]
left-to-right iteration returns true
right-to-left iteration throws NullPointerException
Need user interaction to refine specification that captures programmer intent
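The difference is easy to reproduce. In the sketch below, the array contents `{ 1.0, null }` and the names `IterationOrder`, `containsForward`, and `containsBackward` are assumptions chosen for illustration; the two loops are the left-to-right and right-to-left variants of contains.

```java
// Hypothetical reproduction: the two iteration orders disagree once the
// array holds a null element after the element being looked for.
final class IterationOrder {
    static boolean containsForward(Double[] data, Object lookFor) {
        for (int i = 0; i < data.length; i++) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    static boolean containsBackward(Double[] data, Object lookFor) {
        for (int i = data.length - 1; i >= 0; i--) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Double[] data = { 1.0, null }; // assumed contents
        System.out.println(containsForward(data, 1.0)); // prints true
        try {
            containsBackward(data, 1.0);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException"); // data[1].equals(...) on null
        }
    }
}
```

Whether the two variants count as equivalent therefore depends on whether the specification cares about exceptional behavior, which is the refinement the user is asked to provide.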
Proposal Summary
• “Fix semantics per program”:
Abstract constructs with many possible
concrete implementations
• Apply program analysis to find inconsistent
implementations
• Interact with the user to refine the
specification
• Language designer role can enumerate the
possible implementations
Bridging the Semantic Gap
“I need a map data structure”
“Looks like iterator order matters for your program”
“Yes, I need iteration in sorted order”
“Let’s use a balanced binary tree (TreeMap)”
Other Activities
Formal Methods
Prof. Sriram Sankaranarayanan (CS)
Cyber-physical systems verification
– hybrid automata theory, control systems verification,
analysis of Simulink and Stateflow diagrams
– advanced mathematical techniques:
• convex optimization: linear and semi-definite
• differential equations: set-valued analysis
• SMT solvers over non-linear theories
– applications to automotive software (with NEC labs
and GM labs)
Prof. Aaron Bradley (ECEE)
Decision procedures, Model checking
Prof. Fabio Somenzi (ECEE)
Programming Languages and Analysis
Prof. Amer Diwan (CS)
Performance analysis of computer systems
How do we know that we have not perturbed our data?
Using machine learning and statistical techniques to reason about data
Tool-assisted program transformations
Algorithmic optimizations for performance
Program metamorphosis for improving code quality
Prof. Jeremy Siek (ECEE/CS)
Gradual type checking: static (Java), dynamic (Python)
Meta-programming: programs that write programs
Compilers for optimizing scientific codes
Prof. Bor-Yuh Evan Chang (CS)
End-user program analysis
Precise analysis (shape, collections)
Interactive analysis refinement (type checking + symbolic evaluation)
Applying to Colorado
• Computer Science Department information
http://www.cs.colorado.edu/grad/admission/
• Deadlines
Dec 1 for Fall (Sep 1 for Spring)
• Graduate Advisor: Nicholas Vocatura
[email protected]
• Talk to me about application fee waiver
http://www.cs.colorado.edu/~bec/