Verifying Dereference Safety via Expanding-Scope Analysis Alexey Loginov (GrammaTech, Inc.) Joint work with: E.

Verifying Dereference Safety via
Expanding-Scope Analysis
Alexey Loginov (GrammaTech, Inc.)
Joint work with: E. Yahav, S. Chandra, S. Fink (IBM TJ Watson)
N. Rinetzky (Tel-Aviv University)
M.G. Nanda (IBM IRL)
Why Null-Dereference Analysis?
■ Common problem
■ …or a symptom of other problems
› A null-dereference warning may help identify the root cause
■ Relevant to all software
■ The specification is obvious (absence of NullPointerExceptions)
› Requires no user interaction
Why Sound Null-Dereference Analysis?
■ Safety guarantees are important in some domains
■ Results can become an in-code specification, e.g., via JSR 305 annotations
› Annotations can help with code understanding
› Annotations can simplify future analyses (e.g., after modifications)
■ Precise and efficient sound analysis is challenging
› Lessons carry over to other static analyses
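To illustrate how verified results could become an in-code specification, the sketch below declares a stand-in `@NonNull` annotation. This is hypothetical: JSR 305's own annotation is `javax.annotation.Nonnull`, which is not part of the JDK. The sketch applies it to a field, a parameter, and a method, where the method annotation is read as "the return value is non-null". These are the three annotation sites the talk lists later. The annotation has runtime retention so that the demo can read it back through reflection. The class names are illustrative.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for a JSR 305-style non-null annotation.
// RUNTIME retention lets the demo below inspect it via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD})
@interface NonNull {}

class Config {
    // Field annotation: "field name is non-null"
    @NonNull final String name = "default";

    // Method annotation (return value is non-null) and parameter annotation
    @NonNull
    String greet(@NonNull String who) {
        return "hello " + who;
    }
}

class AnnotationDemo {
    // True if greet() carries the annotation on the method and on its first parameter.
    static boolean greetIsAnnotated() {
        try {
            Method m = Config.class.getDeclaredMethod("greet", String.class);
            boolean onReturn = m.isAnnotationPresent(NonNull.class);
            boolean onParam = m.getParameterAnnotations()[0].length == 1
                    && m.getParameterAnnotations()[0][0] instanceof NonNull;
            return onReturn && onParam;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // True if the field name carries the annotation.
    static boolean nameIsAnnotated() {
        try {
            return Config.class.getDeclaredField("name").isAnnotationPresent(NonNull.class);
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```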
Example: expected answers
1.  class A {
2.    static final A a = new A();
3.    static void main() {
4.      B b = new B();
5.      initB(b);
6.      a.foo(b);          // okay
7.    }
8.    void foo(B b) {
9.      b.f.fun();         // okay
10.     b.f.f.gun();       // null-deref.
11.   }
12.   static void initB(B b) {
13.     b.f = new F();     // okay
14.     b.f.f = null;      // okay
15.   }
16. }
■ Interprocedural information is often needed
– Allocations in callers (e.g., new B()) are common
– Allocations in callees (e.g., new F()) are common
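To make the expected answers concrete, here is a runnable Java rendering of the example. The slide does not define the classes `B` and `F` or the methods `fun()` and `gun()`, so their bodies below are assumptions. `run()` and `throwsNpe()` are hypothetical helpers that execute the slide's `main` body and report whether a NullPointerException escapes.

```java
// Runnable rendering of the slide example; B, F, fun(), gun() are assumed.
class F {
    F f;            // instance field that initB sets to null
    void fun() {}
    void gun() {}
}

class B {
    F f;
}

class A {
    static final A a = new A();   // non-null final field

    // Mirrors the slide's main().
    static void run() {
        B b = new B();
        initB(b);
        a.foo(b);       // safe: a is a non-null final field
    }

    void foo(B b) {
        b.f.fun();      // safe: initB set b.f to a fresh F
        b.f.f.gun();    // slide line 10: b.f.f is null, so this throws
    }

    static void initB(B b) {
        b.f = new F();  // safe: b was allocated by the caller
        b.f.f = null;   // safe: b.f was just allocated
    }

    // Hypothetical helper: true if running the example throws an NPE.
    static boolean throwsNpe() {
        try {
            run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```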
Common approaches
■ Most existing tools perform intraprocedural analysis
■ They must make assumptions about callers/callees
■ Option 1: pessimistic assumptions about callers/callees
› Result: a sea of false alarms
Results of pessimistic intraproc. analysis
1.  class A {
2.    static final A a = new A();
3.    static void main() {
4.      B b = new B();
5.      initB(b);
6.      a.foo(b);          // null deref.
7.    }
8.    void foo(B b) {
9.      b.f.fun();         // two null derefs.
10.     b.f.f.gun();       // null deref.
11.   }
12.   static void initB(B b) {
13.     b.f = new F();     // null deref.
14.     b.f.f = null;      // okay
15.   }
16. }
■ Reports four false alarms
– The only real error is on line 10
Common approaches
■ Most existing tools perform intraprocedural analysis
■ They must make assumptions about callers/callees
■ Option 2: optimistic assumptions about callers/callees
› Result: real errors are missed (only the most glaring ones are caught)
Results of optimistic intraproc. analysis
1.  class A {
2.    static final A a = new A();
3.    static void main() {
4.      B b = new B();
5.      initB(b);
6.      a.foo(b);          // okay
7.    }
8.    void foo(B b) {
9.      b.f.fun();         // okay
10.     b.f.f.gun();       // okay
11.   }
12.   static void initB(B b) {
13.     b.f = new F();     // okay
14.     b.f.f = null;      // okay
15.   }
16. }
■ Misses the real error on line 10
Common approaches
■ Most existing tools perform intraprocedural analysis
■ They must make assumptions about callers/callees
■ Option 3: mostly optimistic assumptions
› Detects inconsistencies in the programmer’s beliefs
• A test x == null expresses the belief that x could be null before the test
• A dereference of x without a test expresses the belief that x cannot be null
› Allows the analysis to dismiss assumptions contradicted by these beliefs
› Result: real errors are missed, and some safe dereferences are reported as unsafe
• Generally, few false alarms but many missed errors
• Same result as option 2 (optimistic assumptions) in our example
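The belief-inconsistency idea of option 3 can be sketched over a deliberately simplified, hypothetical model. The model uses straight-line code with one event per statement (a dereference or a null test of a named variable) and ignores control flow and reassignment. A dereference implies the belief that the variable cannot be null. A later null test of the same variable implies the opposite belief, so the sketch flags the earlier dereference. The enum and record names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified event model: straight-line code, no reassignment.
enum Kind { DEREF, NULL_TEST }

record Event(Kind kind, String variable, int line) {}

class BeliefChecker {
    // Returns the lines of dereferences whose implied belief ("variable cannot be
    // null") is contradicted by a later null test of the same variable.
    static List<Integer> contradictions(List<Event> events) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            Event e = events.get(i);
            if (e.kind() != Kind.DEREF) continue;
            for (int j = i + 1; j < events.size(); j++) {
                Event later = events.get(j);
                // A later test x == null implies the belief that x could be null.
                if (later.kind() == Kind.NULL_TEST && later.variable().equals(e.variable())) {
                    flagged.add(e.line());
                    break;
                }
            }
        }
        return flagged;
    }
}
```

As the slide notes, the running example contains no null tests, so a checker of this kind has nothing to contradict there and behaves like the optimistic option.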
Prospects for interprocedural analysis
■ Whole-program analysis cannot scale to large software
› The majority of instructions are relevant to null-dereference analysis
• The program cannot be pruned down to a small relevant subset
■ A mechanism is needed to break down a program’s complexity
Expanding-Scope Analysis
■ Holy Grail
› Cost: INTRAprocedural analysis
› Precision: INTERprocedural (whole-program) analysis
■ Staged approach
› Analyze dereferences with limited interprocedural context
› Verify each dereference with the least amount of context
› Increase interprocedural context for harder cases
› In simplest form:
• Start with a local analysis (with pessimistic assumptions)
– Verify some dereferences without considering context
• Consider the remaining dereferences with an extra level of context
– Verify some dereferences within a call subtree of immediate callers
• …
› We refer to the individual analyses as Limited-Scope Analyses
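The staged scheme can be pictured as a small driver loop. It runs a limited-scope analysis at scope 0, keeps only the dereferences that analysis could not verify, and retries those at the next scope until a scope limit is reached. In the sketch, `LimitedScopeAnalysis` is a hypothetical interface that stands in for the real per-scope analysis. Anything still unverified at the end is reported as a warning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: can this dereference be verified with `scope`
// levels of interprocedural context (0 = local analysis)?
interface LimitedScopeAnalysis {
    boolean verifies(String deref, int scope);
}

class ExpandingScope {
    // Each verified dereference, mapped to the first (cheapest) scope that sufficed.
    final Map<String, Integer> verifiedAt = new LinkedHashMap<>();
    // Dereferences still unverified after the final scope: reported as warnings.
    final List<String> warnings = new ArrayList<>();

    ExpandingScope(List<String> derefs, LimitedScopeAnalysis analysis, int scopeLimit) {
        List<String> pending = new ArrayList<>(derefs);
        for (int scope = 0; scope <= scopeLimit && !pending.isEmpty(); scope++) {
            List<String> stillPending = new ArrayList<>();
            for (String d : pending) {
                if (analysis.verifies(d, scope)) {
                    verifiedAt.put(d, scope);   // least context that suffices
                } else {
                    stillPending.add(d);        // harder case: retry with more context
                }
            }
            pending = stillPending;
        }
        warnings.addAll(pending);
    }
}
```

A toy analysis for experimenting can verify a dereference once the scope reaches an assumed per-dereference "required depth", for example `(d, s) -> s >= need.get(d)`. The point of the design is that most dereferences never pay for more than the cheapest scope.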
Expanding-Scope Analysis
[Figure: call-graph nodes surrounding a method that contains the dereference … f.foo() …]
Expanding-Scope Analysis
[Figure: call graph of the running example: main (B b = new B(); initB(b); a.foo(b);) calls initB (b.f = new F(); b.f.f = null) and foo (b.f.fun(); b.f.f.gun();)]
Abstract Domain
■ Product of three abstract domains
1. Abstract domain for may-alias analysis
• Implementation: flow- and context-insensitive, Andersen-style
2. Abstract domain for must-alias analysis
• Implementation: demand-driven (based on def-use chains)
3. Set APnn of non-null access paths
• Access paths denote l-value expressions:
– (VarId | StaticFieldId).InstanceFieldId*
• Finiteness of the domain is guaranteed by (parameterized) bounds on
– the size of APnn
– the maximal length of access paths in APnn
› Only the final component (the set APnn of non-null access paths) changes
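The third component can be represented directly. The sketch below encodes an access path as a root (a local variable or static field name) followed by a list of instance-field names. It enforces the two parameterized bounds: a maximal path length and a maximal size of APnn. Some details are assumptions, not taken from the talk: the class names, counting length as the number of instance fields, and dropping (rather than widening) a fact that would exceed a bound. Dropping is sound because APnn only records paths known to be non-null.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Access path: (VarId | StaticFieldId).InstanceFieldId*
record AccessPath(String root, List<String> fields) {
    // Assumption: length = number of instance-field components.
    int length() { return fields.size(); }

    @Override public String toString() {
        return fields.isEmpty() ? root : root + "." + String.join(".", fields);
    }
}

// Bounded set APnn of non-null access paths.
class NonNullPaths {
    private final int maxLength;
    private final int maxSize;
    private final Set<AccessPath> paths = new LinkedHashSet<>();

    NonNullPaths(int maxLength, int maxSize) {
        this.maxLength = maxLength;
        this.maxSize = maxSize;
    }

    // Adds a fact only if both bounds are respected; otherwise the fact is dropped,
    // which loses precision but stays sound for a must-be-non-null set.
    boolean add(AccessPath p) {
        if (p.length() > maxLength || paths.size() >= maxSize) return false;
        return paths.add(p);
    }

    boolean contains(AccessPath p) { return paths.contains(p); }

    int size() { return paths.size(); }
}
```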
Transfer Functions (statements)
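Let $\Delta = \mathit{InstanceFieldId}^*$ (sequences of instance fields)
Statement: transfer function
v = null: $AP_{nn} \setminus \{ v.\delta \mid \delta \in \Delta \}$
v = new T(): $AP_{nn} \cup \{ v \}$
v = w: $AP_{nn} \cup \{ v.\delta \mid w.\delta \in AP_{nn} \}$
v = w.f: $AP_{nn} \cup \{ v.\delta \mid w.f.\delta \in AP_{nn} \} \cup \mathit{mustAlias}(w)$
v.f = null: $(AP_{nn} \setminus \{ e'.f.\delta \mid e' \in \mathit{mayAlias}(v),\ \delta \in \Delta \}) \cup \mathit{mustAlias}(v)$
v.f = w: $AP_{nn} \cup \{ e'.f.\delta \mid w.\delta \in AP_{nn},\ e' \in \mathit{mustAlias}(v) \} \cup \mathit{mustAlias}(v)$
…v.foo()…, …v[i]…, …v.length…: $AP_{nn} \cup \mathit{mustAlias}(v)$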
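Several of the statement rules can be transcribed almost literally, as this sketch shows. Access paths are dotted strings, and APnn is a `Set<String>`. `mustAlias` and `mayAlias` are passed in as hypothetical precomputed maps; the talk computes them with separate analyses. A variable with no map entry is treated as its own only alias, which is an assumption of the sketch. The rules follow the slide as written; for instance, `load` does not first remove old facts about `v`, because the slide's rule for `v = w.f` does not. Only four rules are shown.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StmtTransfer {
    // True if p is prefix.δ for some δ ∈ Δ (including the empty sequence).
    static boolean extendsPath(String p, String prefix) {
        return p.equals(prefix) || p.startsWith(prefix + ".");
    }

    // Assumption: a variable without an entry is its only alias.
    static Set<String> alias(Map<String, Set<String>> m, String v) {
        return m.getOrDefault(v, Set.of(v));
    }

    // v = null: APnn \ { v.δ | δ ∈ Δ }
    static Set<String> assignNull(Set<String> ap, String v) {
        Set<String> out = new HashSet<>(ap);
        out.removeIf(p -> extendsPath(p, v));
        return out;
    }

    // v = new T(): APnn ∪ { v }
    static Set<String> assignNew(Set<String> ap, String v) {
        Set<String> out = new HashSet<>(ap);
        out.add(v);
        return out;
    }

    // v = w.f: APnn ∪ { v.δ | w.f.δ ∈ APnn } ∪ mustAlias(w)
    static Set<String> load(Set<String> ap, String v, String w, String f,
                            Map<String, Set<String>> mustAlias) {
        Set<String> out = new HashSet<>(ap);
        String src = w + "." + f;
        for (String p : ap) {
            if (extendsPath(p, src)) {
                out.add(v + p.substring(src.length()));  // rewrite prefix w.f to v
            }
        }
        out.addAll(alias(mustAlias, w));  // w was dereferenced, so it is non-null afterwards
        return out;
    }

    // v.f = null: (APnn \ { e'.f.δ | e' ∈ mayAlias(v), δ ∈ Δ }) ∪ mustAlias(v)
    static Set<String> storeNull(Set<String> ap, String v, String f,
                                 Map<String, Set<String>> mayAlias,
                                 Map<String, Set<String>> mustAlias) {
        Set<String> out = new HashSet<>(ap);
        for (String e : alias(mayAlias, v)) {
            String killed = e + "." + f;
            out.removeIf(p -> extendsPath(p, killed));
        }
        out.addAll(alias(mustAlias, v));
        return out;
    }
}
```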
Transfer Functions (conditions)
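v == null
– true branch: $v \in AP_{nn}\ ?\ \bot : AP_{nn}$ (⊥: the branch is infeasible)
– false branch: $AP_{nn} \cup \mathit{mustAlias}(v)$
v instanceof T
– true branch: $AP_{nn} \cup \mathit{mustAlias}(v)$
– false branch: $AP_{nn}$
v == w
– true branch: $AP_{nn} \cup (\mathit{mustAlias}(w) \text{ if } v \in AP_{nn}) \cup (\mathit{mustAlias}(v) \text{ if } w \in AP_{nn})$
– false branch: $AP_{nn}$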
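The condition rules can be sketched in the same style, returning one abstract state per branch. Encoding ⊥ (an infeasible branch) as a `null` set is a choice made only for this sketch. As before, access paths are dotted strings and `mustAlias` is a hypothetical precomputed map, with a missing entry treated as "the variable is its own only alias".

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Abstract states for the true and false successors of a condition.
// A null state encodes ⊥ (branch infeasible); this is a choice of the sketch.
record Branches(Set<String> onTrue, Set<String> onFalse) {}

class CondTransfer {
    private static Set<String> aliases(Map<String, Set<String>> mustAlias, String v) {
        return mustAlias.getOrDefault(v, Set.of(v));  // assumption for missing entries
    }

    private static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }

    // v == null: true -> (v ∈ APnn ? ⊥ : APnn); false -> APnn ∪ mustAlias(v)
    static Branches isNull(Set<String> ap, String v, Map<String, Set<String>> mustAlias) {
        Set<String> t = ap.contains(v) ? null : ap;
        return new Branches(t, union(ap, aliases(mustAlias, v)));
    }

    // v instanceof T: true -> APnn ∪ mustAlias(v); false -> APnn
    static Branches isInstance(Set<String> ap, String v, Map<String, Set<String>> mustAlias) {
        return new Branches(union(ap, aliases(mustAlias, v)), ap);
    }

    // v == w: true -> APnn ∪ (mustAlias(w) if v ∈ APnn) ∪ (mustAlias(v) if w ∈ APnn); false -> APnn
    static Branches isEqual(Set<String> ap, String v, String w, Map<String, Set<String>> mustAlias) {
        Set<String> t = new HashSet<>(ap);
        if (ap.contains(v)) t.addAll(aliases(mustAlias, w));
        if (ap.contains(w)) t.addAll(aliases(mustAlias, v));
        return new Branches(t, ap);
    }
}
```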
Staged Analysis in SALSA
(Scalable Analysis via Lazy Scope expAnsion)
■ Real OO applications (e.g., web applications) have wide call graphs
› High scope limits are too expensive to analyze
■ New stages help stave off the need for high scope limits
1. Pruning
• Verifies dereferences of (non-null) final and stationary fields
2. Special local (scope-0) analyses
a. Caller-guarantee analysis (top-down in call graph)
– Propagates callers’ guarantees to callees
– E.g., for references passed as arguments down deep call chains
b. Callee-guarantee analysis (bottom-up in call graph)
– Propagates callees’ guarantees up to callers
– E.g., for field initializations in deep initialization call chains
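Caller-guarantee propagation can be pictured as a top-down pass over the call graph. In this picture, a parameter is treated as non-null in a callee only if every call site passes a non-null argument for it. The sketch rests on several assumptions. It needs an acyclic call graph listed in topological order, so callers come before callees. It uses hypothetical `Arg` and `CallSite` records, where each argument is either non-null by local reasoning (e.g., a fresh allocation) or forwards one of the caller's own parameters. It omits recursion and the interplay with the other stages. The callee-guarantee stage would be the mirror image, run bottom-up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of an argument: non-null by local reasoning, or a
// forwarded parameter (index) of the calling method.
record Arg(boolean locallyNonNull, Integer forwardedParam) {}

record CallSite(String caller, String callee, List<Arg> args) {}

class CallerGuarantee {
    // Returns, per method, the parameter indices guaranteed non-null by all callers.
    static Map<String, Set<Integer>> run(List<String> methodsTopDown, List<CallSite> sites,
                                         Map<String, Integer> arity) {
        Map<String, Set<Integer>> nonNull = new HashMap<>();
        for (String m : methodsTopDown) {
            List<CallSite> incoming =
                    sites.stream().filter(s -> s.callee().equals(m)).toList();
            Set<Integer> guaranteed = new HashSet<>();
            if (!incoming.isEmpty()) {              // entry points get no guarantees
                for (int i = 0; i < arity.get(m); i++) {
                    final int idx = i;
                    boolean all = incoming.stream().allMatch(s -> {
                        Arg a = s.args().get(idx);
                        // Callers were processed earlier, so their facts are available.
                        Set<Integer> callerFacts = nonNull.getOrDefault(s.caller(), Set.of());
                        return a.locallyNonNull()
                                || (a.forwardedParam() != null
                                    && callerFacts.contains(a.forwardedParam()));
                    });
                    if (all) guaranteed.add(i);
                }
            }
            nonNull.put(m, guaranteed);
        }
        return nonNull;
    }
}
```

On the running example, main passes a freshly allocated b to both initB and foo. The pass therefore marks the parameter b of each as non-null, which matches the "b ∈ APnn" facts shown on the slide about the steps of the staged analysis.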
Staged Analysis in SALSA
(Scalable Analysis via Lazy Scope expAnsion)
[Figure: SALSA stages in order: pruning; limited-scope data-flow analyses (caller-guarantee, callee-guarantee, scope-1 over subtrees of depth 1 from parents, scope-2 over subtrees of depth 2 from grandparents, …); symbolic stage, which separates the remaining warnings into high priority and low priority]
Steps of staged interproc. analysis
1.  class A {
2.    static final A a = new A();
3.    static void main() {
4.      B b = new B();
5.      initB(b);                  // b.f ∈ APnn
6.      a.foo(b);
7.    }
8.    void foo(B b) {              // b ∈ APnn
9.      b.f.fun();
10.     b.f.f.gun();
11.   }
12.   static void initB(B b) {     // b ∈ APnn
13.     b.f = new F();
14.     b.f.f = null;
15.   }
16. }
■ Pruning (final & stationary fields)
■ Limited-scope analysis
1. Scope-0 (local analysis)
2a. Caller-guarantee (local) analysis
2b. Callee-guarantee (local) analysis
3. Scope-1 analysis
Experimental results
■ 21 (mostly open-source) applications
› ~3K-465K bytecodes; ~300-37K dereferences
■ On average, ~90% of dereferences were verified soundly and automatically
› ~8% dismissed by pruning
› ~77% dismissed by caller-guarantee analysis
› ~5% dismissed by the remaining stages
■ Final scope limit: between 2 and 5 (chosen heuristically)
› Diminishing returns after the local analyses (caller-/callee-guarantee)
› Higher scope limits are useful in the absence of caller/callee guarantees
■ Maximal access-path length: 2 for all but four applications
› Higher access-path lengths had no effect for most applications
› They helped C-like applications (direct field dereferences without getters)
Experimental results
■ We expected many false alarms due to the simple abstract domain
■ We implemented heuristic symbolic path-validity checking
› This phase selected ~20% of the warnings as high-priority
› Surprisingly low incidence of false alarms due to path correlation
■ Biggest domain shortcoming: access-path types are not tracked
› Causes an unnecessarily high cost of verifying certain dereferences
• Too many irrelevant code portions are included when verifying a dereference
› Produces false alarms due to examining type-infeasible paths
■ Results are encouraging given the simplicity of the domain
Tool-User Interaction
■ The output includes suggested annotations
› Ordered by the number of warnings guaranteed to be dismissed
• Computing the actual number would require an alternative abstract domain
› Current annotation options
• Field f is non-null
• Parameter p or the return value of method foo() is non-null
■ The user may choose to accept some annotations
› We studied annotations for 8 benchmarks with high warning counts
› A few hours of effort for unfamiliar code
• Result: 30% decrease in warning counts
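Ordering suggested annotations by the number of warnings each is guaranteed to dismiss amounts to a simple ranking. In the sketch, the `Suggestion` record and its `dismissed` sets are hypothetical inputs that the analysis would produce. Ties are broken alphabetically by annotation text so the order is deterministic, which is a choice of the sketch rather than the tool.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical: a suggested annotation (e.g., "field f is non-null") and the
// warnings it is guaranteed to dismiss if the user accepts it.
record Suggestion(String annotation, Set<String> dismissed) {}

class AnnotationRanking {
    // Most warnings dismissed first; ties broken alphabetically by annotation text.
    static List<Suggestion> rank(List<Suggestion> suggestions) {
        return suggestions.stream()
                .sorted(Comparator.comparingInt((Suggestion s) -> s.dismissed().size())
                        .reversed()
                        .thenComparing(Suggestion::annotation))
                .toList();
    }
}
```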
Summary
■ Novel expanding-scope analysis
› Applicable to multiple abstract domains
■ Scalable and precise null-dereference analysis
› Staged analysis makes a simple abstract domain effective
■ Vision: improve programs’ specifications and robustness
› Cleanse programs by examining warnings and suggested annotations
› Check accepted annotations with assertions or symbolic techniques
› Extend the program’s specification and analyzability via annotations