An Overview on Static Program Analysis

Mooly Sagiv
http://www.math.tau.ac.il/~sagiv/courses/pa01.html
Tel Aviv University
640-6706
Wednesday 10-12
Textbook: Principles of Program Analysis
F. Nielson, H. Nielson, C.L. Hankin
Other sources: Semantics with Applications, Nielson & Nielson

http://listserv.tau.ac.il/archives/cs0368-4051-01.html
Course Requirements
 Prerequisites
– Compiler Course
 A theoretical course
– Semantics of programming languages
– Topology theory
– Algorithms
 Grade
– Course Notes 10%
– Assignments 30%
» Mostly theoretical, but using software tools
– Home exam 60%
» One week
Outline
 What is static analysis
 Usage in compilers
 Other clients
 Why is it called ``abstract interpretation''?
 Undecidability
 Handling Undecidability
 Soundness of abstract interpretation
 Relation to program verification
 Origins
 Complementary approaches
 Tentative schedule
Static Analysis
 Automatic derivation of static properties which hold on every execution leading to a program location
Example Static Analysis Problem
 Find variables with constant value at a given program location
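/* Program 1: z is not constant at the printf (44 or 32) */
#include <stdio.h>

int p(int x) {
    return x * x;
}

int main(void) {
    int z;
    if (getchar())
        z = p(6) + 8;
    else
        z = p(5) + 7;
    printf("%d\n", z);
    return 0;
}

/* Program 2: z is the constant 10 at the printf on both paths */
#include <stdio.h>

int p(int x) {
    return x * x;
}

int main(void) {
    int z;
    if (getchar())
        z = p(3) + 1;
    else
        z = p(-2) + 6;
    printf("%d\n", z);
    return 0;
}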
More Programs
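#include <stdio.h>

int x;

void p(int a) {
    int c = 0;
    scanf("%d", &c);            /* read(c) */
    if (c > 0) {
        a = a - 2;
        p(a);
        a = a + 2;              /* restores a to its value on entry */
    }
    x = -2 * a + 5;
    printf("%d\n", x);
}

int main(void) {
    p(7);
    printf("%d\n", x);          /* the outermost call assigns x last, with a == 7, so x is -9 */
    return 0;
}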
Compiler Scheme
source program (String)
→ Scanner → tokens
→ Parser → AST
→ Semantic Analysis → AST
→ Code Generator → IR
→ Static analysis → IR + information
→ Transformations → LIR
Example Static Analysis Problems
 Live variables
 Reaching definitions
 Expressions that are “available”
 Dead code
 Pointer variables that never point to the same location
 Points in the program at which it is safe to free an object
 An invocation of a virtual method whose target address is unique
 Statements that can be executed in parallel
 An access to a variable which must be in cache
 Integer intervals
The Need for Static Analysis
 Compilers
– Advanced computer architectures
(Superscalar pipelined, VLIW, prefetching)
– High level programming languages
(functional, OO, garbage collected, concurrent)
 Software Productivity Tools
– Compile time debugging
» Stronger type checking for C
» Array bound violations
» Identify dangling pointers
» Generate test cases
» Generate certification proofs
Challenges in Static Analysis
 Non-trivial
 Correctness
 Precision
 Efficiency
 Scaling of the analysis
C Compilers
 The language was designed to reduce the need for optimizations and static analysis
 The programmer has control over performance
– order of evaluation
– storage
– registers
 C compilers nowadays spend most of the compilation time in static analysis
 Sometimes C compilers have to work harder!
Software Quality Tools
 Detecting hazards (lint)
– Uninitialized variables
    a = malloc();
    b = a;
    free(a);
    c = malloc();
    if (b == c)
        printf("unexpected equality");
 References outside array bounds
 Memory leaks
Foundation of Static Analysis
 Static analysis can be viewed as interpreting the program over an “abstract domain”
 Execute the program over a larger set of execution paths
 Guarantee sound results
– Every identified constant is indeed a constant
– But not every constant is identified as such
Example Abstract Interpretation
Casting Out Nines
 Check soundness of arithmetic using 9 values: 0, 1, 2, 3, 4, 5, 6, 7, 8
 Whenever an intermediate result exceeds 8, replace it by the sum of its digits (recursively)
 Report an error if the values do not match
 Example “123 * 457 + 76543 = 132654?”
– 123 * 457 + 76543 → 6 * 7 + 7 → 6 + 7 → 4
– 132654 → 21 → 3
– 4 ≠ 3: report an error
 Soundness
(10a + b) mod 9 = (a + b) mod 9
(a + b) mod 9 = ((a mod 9) + (b mod 9)) mod 9
(a * b) mod 9 = ((a mod 9) * (b mod 9)) mod 9
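The check above can be sketched in C as follows. This is a minimal illustration, not part of the original slides: the names `abs9`, `add9`, `mul9` and `check_mul_add` are hypothetical, and the residue modulo 9 is used in place of repeated digit summing (the two agree, as the soundness identities state).

```c
#include <stdbool.h>

/* Abstraction: map a non-negative integer to one of the 9 values 0..8. */
static unsigned abs9(unsigned long n) { return (unsigned)(n % 9); }

/* Abstract addition and multiplication on the values 0..8. */
static unsigned add9(unsigned a, unsigned b) { return (a + b) % 9; }
static unsigned mul9(unsigned a, unsigned b) { return (a * b) % 9; }

/* Check a claimed result of a * b + c using only abstract values.
   false: the claim is certainly wrong. true: the claim was not refuted. */
static bool check_mul_add(unsigned long a, unsigned long b,
                          unsigned long c, unsigned long claimed)
{
    unsigned lhs = add9(mul9(abs9(a), abs9(b)), abs9(c));
    return lhs == abs9(claimed);
}
```

Calling `check_mul_add(123, 457, 76543, 132654)` returns false, matching the reported error. A true result only means the claim was not refuted: a wrong answer with the same residue, such as 132745, also passes, which is the price of working over 9 abstract values.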
Abstract (Conservative) interpretation
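[Diagram: two parallel transitions linked by concretization and abstraction]
Concrete level: set of states → statement s (operational semantics) → set of states
Abstract level: abstract representation → statement s (abstract semantics) → abstract representation
Concretization maps the abstract representation before s to a set of states; abstraction maps the set of states after s to an abstract representation.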
Example rule of signs
 Safely identify the sign of variables at every program location
 Abstract representation {P, N, ?}
 Abstract (conservative) semantics of *
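Abstract (conservative) interpretation
Concrete level: {…, <-88, -2>, …} → x := x*y (operational semantics) → {…, <176, -2>, …}
Abstract level: <N, N> → x := x *# y (abstract semantics) → <P, N>
Concretization of <N, N> contains <-88, -2>; abstraction of the resulting set of states is <P, N>.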
Example rule of signs (cont)
 α(C) = if all elements in C are non-negative
then return P
else if all elements in C are negative
then return N
else return ?
 γ(a) = if (a == P) then
return {0, 1, 2, …}
else if (a == N) then
return {-1, -2, -3, …}
else return Z
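A minimal C sketch of a conservative multiplication over {P, N, ?} follows. The slides' own table for * is not reproduced in this transcript, so the table below is an assumption derived from the concretization above, where P stands for {0, 1, 2, …} and therefore includes 0. The names `Sign`, `sign_of` and `sign_mul` are hypothetical, and integer overflow is ignored.

```c
/* Abstract values of the rule of signs: P (>= 0), N (< 0), and "?" (unknown). */
typedef enum { SIGN_P, SIGN_N, SIGN_UNKNOWN } Sign;

/* Abstraction of a single concrete value. */
static Sign sign_of(int v) { return v >= 0 ? SIGN_P : SIGN_N; }

/* Conservative abstract multiplication (*#). */
static Sign sign_mul(Sign a, Sign b)
{
    if (a == SIGN_UNKNOWN || b == SIGN_UNKNOWN)
        return SIGN_UNKNOWN;
    if (a == b)
        return SIGN_P;          /* P*P >= 0 and N*N > 0 */
    return SIGN_UNKNOWN;        /* P*N may be 0 (in P) or negative (in N) */
}
```

With this table, `sign_mul(SIGN_N, SIGN_N)` yields `SIGN_P`, which agrees with the example <-88, -2> ↦ <176, -2>. The mixed case returns "?" only because 0 belongs to P under the stated assumption; a domain with a separate abstract value for 0 could be more precise.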
Example Constant Propagation
 Abstract representation: the set of integer values and an extra value “?” denoting variables not known to be constants
 Conservative interpretation of +
Example Constant Propagation(Cont)
 Conservative interpretation of *
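The tables for + and * are not reproduced in this transcript. One plausible reading, sketched in C below, is that an operation yields a constant only when both operands are constants. The names `Const`, `cp_const`, `cp_unknown`, `cp_add` and `cp_mul` are hypothetical, and integer overflow is ignored.

```c
#include <stdbool.h>

/* Abstract value: a known integer constant, or "?" (not known to be constant). */
typedef struct { bool known; int value; } Const;

static Const cp_const(int v) { return (Const){ true, v }; }
static Const cp_unknown(void) { return (Const){ false, 0 }; }

/* Conservative +: a constant only when both operands are constants. */
static Const cp_add(Const a, Const b)
{
    return (a.known && b.known) ? cp_const(a.value + b.value) : cp_unknown();
}

/* Conservative *: same rule. Treating 0 * ? as 0 would be a sound
   refinement, but it is not assumed here. */
static Const cp_mul(Const a, Const b)
{
    return (a.known && b.known) ? cp_const(a.value * b.value) : cp_unknown();
}
```

For instance, `cp_add(cp_const(5), cp_const(7))` is the constant 12, while any operation with a "?" operand yields "?".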
Example Program
x = 5;
y = 7;
if (getc())
y = x + 2;
z = x +y;
Example Program (2)
if (getc()) {
    x = 3; y = 2;
} else {
    x = 2; y = 3;
}
z = x + y;
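The two example programs can be traced by hand with a small C sketch. It assumes that, where control-flow paths merge, a variable stays constant only if both paths agree on its value. That merge rule and the names `cp_join`, `analyze_first` and `analyze_second` are assumptions made for illustration; the slides do not spell them out.

```c
#include <stdbool.h>

/* Abstract value: a known integer constant, or "?" (not known to be constant). */
typedef struct { bool known; int value; } Const;

static Const cp_const(int v) { return (Const){ true, v }; }
static Const cp_unknown(void) { return (Const){ false, 0 }; }

static Const cp_add(Const a, Const b)
{
    return (a.known && b.known) ? cp_const(a.value + b.value) : cp_unknown();
}

/* Merge of two paths: keep the constant only if both paths agree. */
static Const cp_join(Const a, Const b)
{
    return (a.known && b.known && a.value == b.value) ? a : cp_unknown();
}

/* x = 5; y = 7; if (getc()) y = x + 2; z = x + y; */
static Const analyze_first(void)
{
    Const x = cp_const(5);
    Const y_skipped = cp_const(7);              /* branch not taken */
    Const y_taken = cp_add(x, cp_const(2));     /* y = x + 2 */
    Const y = cp_join(y_taken, y_skipped);
    return cp_add(x, y);                        /* abstract value of z */
}

/* if (getc()) { x = 3; y = 2; } else { x = 2; y = 3; } z = x + y; */
static Const analyze_second(void)
{
    Const x = cp_join(cp_const(3), cp_const(2));
    Const y = cp_join(cp_const(2), cp_const(3));
    return cp_add(x, y);                        /* abstract value of z */
}
```

In the first program y is 7 on both paths, so z is found to be the constant 12. In the second program z is 5 on every execution, but x and y are each "?" after the merge, so the analysis reports "?" for z: a constant that is not identified as such.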
Undecidability Issues
 It is undecidable whether a program point is reachable in some execution
 Some static analysis problems are undecidable even if the program conditions are ignored
The Constant Propagation Example
while (getc()) {
    if (getc()) x_1 = x_1 + 1;
    if (getc()) x_2 = x_2 + 1;
    ...
    if (getc()) x_n = x_n + 1;
}
y = truncate(1 / (1 + p2(x_1, x_2, ..., x_n)));
/* Is y=0 here? */
Coping with undecidability
 Loop-free programs
 Simple static properties
 Interactive solutions
 Conservative estimations
– No enabled transformation can change the meaning of the code, but some transformations are not enabled
– Non optimal code
– Every potential error is caught but some “false alarms” may be issued
Analogies with Numerical Analysis
 Approximate the exact semantics
 More precision can be obtained at greater computational costs
Violation of soundness
 Loop invariant code motion
 Dead code elimination
 Overflow
((x+y)+z) != (x + (y+z))
 Quality checking tools may decide to ignore certain kinds of errors
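One reading of the overflow bullet is that reassociating a sum can change whether an intermediate result overflows, so a transformation that treats + as associative is not sound for machine integers. The C sketch below illustrates this with a checked addition; the names `checked_add`, `sum_left` and `sum_right` are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Checked addition: returns false instead of overflowing
   (signed overflow is undefined behavior in C). */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* (x + y) + z */
static bool sum_left(int x, int y, int z, int *out)
{
    int t;
    return checked_add(x, y, &t) && checked_add(t, z, out);
}

/* x + (y + z) */
static bool sum_right(int x, int y, int z, int *out)
{
    int t;
    return checked_add(y, z, &t) && checked_add(x, t, out);
}
```

With x = INT_MAX, y = 1, z = -1, `sum_left` reports an overflow in x + y, while `sum_right` succeeds and yields INT_MAX.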
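Abstract interpretation cannot always be homomorphic (rule of signs)
Concrete level: <-8, 7> → x := x+y (operational semantics) → <-1, 7>
Abstract level: <N, P> → x := x +# y (abstract semantics) → <?, P>
Abstraction of <-8, 7> is <N, P>; abstraction of <-1, 7> is <N, P>, which is more precise than the computed <?, P>.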
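The loss of precision in this example can be reproduced with a small C sketch of abstract addition over {P, N, ?}. The table is an assumption consistent with the example, with 0 counted in P; the names `Sign`, `sign_of` and `sign_add` are hypothetical, and integer overflow is ignored.

```c
/* Abstract values of the rule of signs: P (>= 0), N (< 0), and "?" (unknown). */
typedef enum { SIGN_P, SIGN_N, SIGN_UNKNOWN } Sign;

/* Abstraction of a single concrete value. */
static Sign sign_of(int v) { return v >= 0 ? SIGN_P : SIGN_N; }

/* Conservative abstract addition (+#). */
static Sign sign_add(Sign a, Sign b)
{
    if (a == SIGN_UNKNOWN || b == SIGN_UNKNOWN)
        return SIGN_UNKNOWN;
    if (a == b)
        return a;               /* P+P >= 0 and N+N < 0 */
    return SIGN_UNKNOWN;        /* N+P can have either sign */
}
```

Abstracting after the concrete step gives `sign_of(-8 + 7)`, which is N. Abstracting first gives `sign_add(sign_of(-8), sign_of(7))`, which is "?". The abstract result is still sound, since "?" covers N, but it is strictly less precise.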
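Local Soundness of Abstract Interpretation
[Diagram: one path applies the statement (operational semantics) and then abstraction; the other applies abstraction and then statement# (abstract semantics). The result of the first path is ⊑ the result of the second.]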
Optimality Criteria
 Precise (with respect to a subset of the programs)
 Precise under the assumption that all paths are executable (statically exact)
 Relatively optimal with respect to the chosen abstract domain
 Good enough
Program Verification
 Mathematically prove the correctness of the program
 Requires formal specification
 Example: Hoare Logic {P} S {Q}
– {x = 1} x++ ; {x = 2}
– {x =1}
{true} if (y >0) x = 1 else x = 2 {?}
– {y=n} z = 1 while (y>0) {z = z * y-- ; } {?}
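The slides leave the postcondition of the loop example open. One candidate, assuming n ≥ 0, is z = n!, with loop invariant z * y! = n!. The C sketch below checks that candidate at run time with assertions. It is a test harness for the triple, not a proof; `fact` and `loop_program` are hypothetical helpers, and the bound n ≤ 12 is added only to keep the products within range.

```c
#include <assert.h>

/* Hypothetical helper: n! for small n >= 0, used only to state the assertions. */
static long fact(int n)
{
    long r = 1;
    for (int i = 2; i <= n; i++)
        r *= i;
    return r;
}

/* {y = n and n >= 0}  z = 1; while (y > 0) { z = z * y--; }  {z = n!} */
static long loop_program(int n)
{
    int y = n;
    long z = 1;
    assert(n >= 0 && n <= 12);              /* precondition, bounded to avoid overflow */
    while (y > 0) {
        assert(z * fact(y) == fact(n));     /* candidate loop invariant */
        z = z * y--;
    }
    assert(z == fact(n));                   /* candidate postcondition */
    return z;
}
```

A verifier would have to establish the invariant for all n, whereas the assertions only check it on the inputs actually run, for example `loop_program(5)`, which returns 120.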
Relation to Program Verification
Program Analysis
 Fully automatic
 But can benefit from specification
 Applicable to a programming language
 Can be very imprecise
 May yield false alarms
 Identify interesting bugs
 Establish non-trivial properties using effective algorithms
Program Verification
 Requires specification and loop invariants
 Program specific
 Relatively complete
 Must provide counterexamples
 Provide useful documentation
Origins of Abstract Interpretation
 [Naur 1965] The Gier Algol compiler
“A process which combines the operators and operands of the source text in the manner in which an actual evaluation would have to do it, but which operates on descriptions of the operands, not their value”
 [Reynolds 1969] Interesting analysis which includes infinite domains (context free grammars)
 [Sintzoff 1972] Well-foundedness of programs and termination
 [Cousot and Cousot 1976,77,79] The general theory
 [Kam and Ullman, Kildall 1977] Algorithmic foundations
 [Tarjan 1981] Reductions to semi-ring problems
 [Sharir and Pnueli 1981] Foundation of the interprocedural case
 [Allen, Kennedy, Cocke, Jones, Muchnick and Schwartz]
Complementary Approaches
 Unsound Approaches
– Compute underapproximation
 Better programming language design
 Type checking
 Just in time and dynamic compilation
 Profiling
 Runtime tests
Tentative schedule
 Operational Semantics (Semantics Book)
 Introduction (Chapter 1)
 The abstract interpretation technique (CC79)
 The TVLA system (Material will be given)
 Interprocedural and Object Oriented Languages
 Advanced Applications
– Detecting buffer overflow
– Compile-time Garbage Collection
– Multithreaded programs