Transcript slides

Quantified Invariant Generation
using an
Interpolating Saturation Prover
Ken McMillan
Cadence Research Labs
Quantified invariants
• Many systems that we would like to verify formally are effectively infinite
state
– Parameterized protocols
– Programs manipulating unbounded data structures (arrays, heaps, stacks)
– Programs with unbounded thread creation
• To verify such systems, we must construct a quantified invariant
– For all processes, array elements, threads, etc.
• Existing fully automated techniques for generating invariants are not
strongly relevance driven
– Invisible invariants
– Indexed predicate abstraction
– Shape analysis
Interpolants and abstraction
• Interpolants derived from proofs can provide an effective relevance
heuristic for constructing inductive invariants
– Provides a way of generalizing proofs about bounded behaviors to the
unbounded case
• Exploits a prover’s ability to focus on relevant facts
– Used in various applications, including
• Hardware verification (propositional case)
• Predicate abstraction (quantifier-free)
• Program verification (quantifier-free)
• This talk
– Moving to the first-order case, including FO(TC)
– Modifying SPASS to create an interpolating FO prover
– Applying this to program verification with arrays and linked lists
Invariants from unwindings
• Consider this very simple approach:
– Partially unwind a program into a loop-free, in-line program
– Construct a Floyd/Hoare proof for the in-line program
– See if this proof contains an inductive invariant proving the property
• Example program:
x = y = 0;
while(*)
x++; y++;
while(x != 0)
x--; y--;
assert (y == 0);
invariant:
{x == y}
Unwind the loops
{True}
x = y = 0;
{x = 0 ∧ y = 0}
x++; y++;
{x = 1 ∧ y = 1}
x++; y++;
{x = 2 ∧ y = 2}
[x != 0];
x--; y--;
{x = 1 ∧ y = 1}
[x != 0];
x--; y--;
{x = 0 ∧ y = 0}
[x == 0];
[y != 0];
{False}
Proof of the inline program contains invariants for both loops.
• Assertions may diverge as we unwind
• A practical method must somehow
prevent this kind of divergence!
Interpolation Lemma
[Craig,57]
• If A ∧ B = false, there exists an interpolant A' for (A, B) such that:
– A implies A'
– A' is inconsistent with B
– A' is expressed over the common vocabulary of A and B
A variety of techniques exist for deriving an interpolant from a
refutation of A ∧ B, generated by a theorem prover.
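A tiny illustration (added in this transcript, not on the slide): take A and B below, where x is local to A and only y is shared.

```latex
\begin{aligned}
A &= (x = 0) \wedge (y = x + 1), \qquad B = (y > 1)\\
A' &= (y \le 1) \quad\text{is an interpolant for } (A,B):\\
&A \Rightarrow A', \qquad A' \wedge B \Rightarrow \mathit{false}, \qquad
A' \text{ uses only the shared symbol } y.
\end{aligned}
```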
Interpolants for sequences
• Let A1...An be a sequence of formulas
• A sequence A'0...A'n is an interpolant for A1...An when
– A'0 = True
– A'i-1 ∧ Ai ⇒ A'i, for i = 1..n
– A'n = False
– and finally, A'i ∈ L(A1...Ai) ∩ L(Ai+1...An)

A1   A2   A3   ...   An
True ⇒ A'1 ⇒ A'2 ⇒ A'3 ... A'n-1 ⇒ False

In other words, the interpolant is a structured
refutation of A1...An
Interpolants as Floyd-Hoare proofs
{True}
x = y;          (SSA: x1 = y0)
{x = y}         (interpolant: x1 = y0)
y++;            (SSA: y1 = y0 + 1)
{y > x}         (interpolant: y1 > x1)
[x == y]        (SSA: x1 = y1)
{False}

1. Each formula implies the next
2. Each is over the common symbols of prefix and suffix
3. Begins with True, ends with False
Proving in-line programs
program → SSA sequence → Prover → proof → Interpolation → Hoare proof
FOCI: An Interpolating Prover
• Proof-generating decision procedure for quantifier-free FOL
– Equality with uninterpreted function symbols
– Theory of arrays
– Linear rational arithmetic, integer difference bounds
• SAT Modulo Theories approach
– Boolean reasoning performed by SAT solver
– Exploits SAT relevance heuristics
• Quantifier-free interpolants from proofs
– Linear-time construction [TACAS 04]
– From Q-F interpolants, we can derive atomic predicates for Predicate
Abstraction [Henzinger, et al, POPL 04]
• Allows counterexample-based refinement
– Integrated with software verification tools
• Berkeley BLAST, Cadence IMPACT
Avoiding divergence
• Programs are infinite state, so convergence to a fixed point is not
guaranteed.
• What would prevent us from computing an infinite sequence of
interpolants, say, x=0, x=1, x=2,... as we unwind the loops further?
• Limited completeness result [TACAS06]
– Stratify the logical language L into a hierarchy of finite languages
– Compute minimal interpolants in this hierarchy
– If an inductive invariant proving the property exists in L, you must eventually
converge to one
Interpolation provides a means of static analysis in abstract domains
of infinite height. Though we cannot compute a least fixed point, we
can compute a fixed point implying a given property if one exists.
Expressiveness hierarchy
(each parameterized abstract domain with its interpolant language)
• Canonical Heap Abstractions: ∀FO(TC)
• Indexed Predicate Abstraction: ∀FO
• Predicate Abstraction: QF
Need for quantified interpolants
for(i = 0; i < N; i++)
  a[i] = i;

invariant:
{∀x. 0 ≤ x ∧ x < i ⇒ a[x] = x}

for(j = 0; j < N; j++)
  assert a[j] = j;

• Existing interpolating provers cannot produce quantified interpolants
• Problem: how to prevent the number of quantifiers from diverging in the
same way that constants diverge when we unwind the loops?
Need for Reachability
...
node *a = create_list();
while(a){
assert(alloc(a));
a = a->next;
}
invariant:
∀x (rea(next,a,x) ∧ x ≠ nil → alloc(x))
...
• This condition needed to prove memory safety (no use after free).
• Cannot be expressed in FO
– We need some predicate identifying a closed set of nodes that is allocated
• We require a theory of reachability (in effect, transitive closure)
Can we build an interpolating prover for full FOL
that handles reachability and avoids divergence?
Clausal provers
• A clausal refutation prover takes a set of clauses and returns a proof of
unsatisfiability (i.e., a refutation) if possible.
• A prover is based on inference rules of this form:
P1 ... Pn
C
• where P1 ... Pn are the premises and C the conclusion.
• A typical inference rule is resolution, of which this is an instance:
p(a)
p(U) → q(U)
q(a)
• The conclusion is obtained by unifying p(a) with p(U), then dropping the
complementary literals.
Superposition calculus
Modern FOL provers are based on the superposition calculus
– example superposition inference:
Q(a)
P → (a = c)
P → Q(c)
– this is just substitution of equals for equals
– in practice this approach generates a lot of substitutions!
– use a reduction order to reduce the number of inferences
Reduction orders
• A reduction order ≻ is:
– a total, well-founded order on ground terms
– subterm property: f(a) ≻ a
– monotonicity: a ≻ b implies f(a) ≻ f(b)
• Example: Recursive Path Ordering (with Status) (RPOS)
– start with a precedence on symbols: a ≻ b ≻ c ≻ f
– induces a reduction ordering on ground terms:
f(f(a)) ≻ f(a) ≻ a ≻ f(b) ≻ b ≻ c ≻ f
Ordering Constraint
• Constrains rewrites to be “downward” in the reduction order:
Q(a)
P → (a = c)
P → Q(c)
These terms must be maximal in their clauses.
Example: this inference is only possible if a ≻ c.
Thm: Superposition with OC is complete for refutation in FOL
with equality.
So how do we get interpolants from these proofs?
Local Proofs
• A proof is local for a pair of clause sets (A,B) when every inference step
uses only symbols from A or only symbols from B.
• From a local refutation of (A,B), we can derive an interpolant for (A,B) in
linear time.
• This interpolant is a Boolean combination of formulas in the proof
Reduction orders and locality
• A reduction order is oriented for (A,B) when:
– s ≻ t for every s ∉ L(B) and t ∈ L(B)
• Intuition: rewriting eliminates first the symbols local to A, then the symbols of B.
A: x = y, f(x) = c
B: f(y) = d, c ≠ d
oriented: x ≻ y ≻ c ≻ d ≻ f

x = y, f(x) = c ⊢ f(y) = c     Local!! (uses only A’s symbols)
f(y) = c, f(y) = d ⊢ c = d     (uses only B’s symbols)
c = d, c ≠ d ⊢ ⊥
Orientation is not enough
A: Q(a), a = c
B: ¬Q(b), b = c
order: Q ≻ a ≻ b ≻ c
• Local superposition gives only c = c.
• Solution: replace the non-local superposition with two inferences:

Q(a), a = c ⊢ Q(c)    becomes:
Q(a) ⊢ (a = U → Q(U))
(a = U → Q(U)), a = c ⊢ Q(c)

The second inference can be postponed until after resolving with ¬Q(b).
This “procrastination” step is an example of a reduction rule,
and preserves completeness.
Completeness of local inference
• Thm: Local superposition with procrastination is complete for refutation
of pairs (A,B) such that:
– (A,B) has a universally quantified interpolant
– The reduction order is oriented for (A,B)
• This gives us a complete method for generation of universally quantified
interpolants for arbitrary first-order formulas!
• This is easily extensible to interpolants for sequences of formulas, hence
we can use the method to generate Floyd/Hoare proofs for inline
programs.
Avoiding Divergence
• As argued earlier, we still need to prevent interpolants from diverging as
we unwind the program further.
• Idea: stratify the clause language
Example: Let Lk be the set of clauses with at most k
variables and nesting depth at most k.
Note that each Lk is a finite language.
• Stratified saturation prover:
– Initially let k = 1
– Restrict prover to generate only clauses in Lk
– When prover saturates, increase k by one and continue
The stratified prover is complete, since every proof is contained
in some Lk.
Completeness for universal invariants
• Lemma: For every safety program M with a universally quantified safety
invariant, and every stratified saturation prover P, there exists an integer k
such that P refutes every unwinding of M in Lk, provided:
– The reduction ordering is oriented properly
• This means that as we unwind further, eventually all the interpolants are
contained in Lk, for some k.
• Theorem: Under the above conditions, there is some unwinding of M for
which the interpolants generated by P contain a safety invariant for M.
This means we have a complete procedure for finding universally
quantified safety invariants whenever these exist!
In practice
• We have proved theoretical convergence. But does the procedure
converge in practice in a reasonable time?
• Modify SPASS, an efficient superposition-based saturation prover:
– Generate oriented precedence orders
– Add procrastination rule to SPASS’s reduction rules
– Drop all non-local inferences
– Add stratification (SPASS already has something similar)
• Add axiomatizations of the necessary theories
– An advantage of a full FOL prover is we can add axioms!
– As argued earlier, we need a theory of arrays and reachability (TC)
• Since this theory is not finitely axiomatizable, we use an incomplete
axiomatization that is intended to handle typical operations in list-manipulating programs
Partially Axiomatizing FO(TC)
• Axioms of the theory of arrays (with select and update)
∀A,I,V. select(update(A,I,V), I) = V
∀A,I,J,V. I ≠ J → select(update(A,I,V), J) = select(A,J)
• Axioms for reachability (rea)
∀L,E. rea(L,E,E)
∀L,E,X. rea(L,select(L,E),X) → rea(L,E,X)
[if e->link reaches x, then e reaches x]
∀L,E,X. rea(L,E,X) → E = X ∨ rea(L,select(L,E),X)
[if e reaches x, then e = x or e->link reaches x]
etc...
Since FO(TC) is incomplete, these axioms must be incomplete
Simple example
for(i = 0; i < N; i++)
a[i] = i;
for(j = 0; j < N; j++)
assert a[j] = j;
invariant:
{∀x. 0 ≤ x ∧ x < i ⇒ a[x] = x}
Unwinding simple example
• Unwind the loops twice
Program:              SSA constraints:
i = 0;                i0 = 0
[i < N];              i0 < N
a[i] = i; i++;        a1 = update(a0, i0, i0) ∧ i1 = i0 + 1
[i < N];              i1 < N
a[i] = i; i++;        a2 = update(a1, i1, i1) ∧ i2 = i1 + 1
[i >= N]; j = 0;      i2 ≥ N ∧ j0 = 0
[j < N]; j++;         j0 < N ∧ j1 = j0 + 1
[j < N];              j1 < N
a[j] != j;            select(a2, j1) ≠ j1

Interpolants:
{i0 = 0}
{0 ≤ U ∧ U < i1 ⇒ select(a1,U) = U}   (invariant)
{0 ≤ U ∧ U < i2 ⇒ select(a2,U) = U}   (invariant)
{j0 ≤ U ∧ U < N ⇒ select(a2,U) = U}
{j1 ≤ U ∧ U < N ⇒ select(a2,U) = U}

note: stratification prevents constants diverging
as 0, succ(0), succ(succ(0)), ...
List deletion example
a = create_list();
while(a){
tmp = a->next;
free(a);
a = tmp;
}
• Invariant synthesized with 3 unwindings (after some simplification):
{rea(next,a,nil) ∧
∀x (rea(next,a,x) → x = nil ∨ alloc(x))}
• That is, a is acyclic, and every cell is allocated
• Note that interpolation can synthesize Boolean structure.
More small examples
name       description                                  assertion          unwindings  bound  time (s)
array set  set all array elements to 0                  all elements zero  3           L1     0.01
array test set all array elements to 0, then test all   all tests OK       3           L1     0.01
ll safe    create a linked list, then traverse it       memory safety      3           L1     0.04
ll acyc    create a linked list                         list acyclic       3           L1     0.02
ll delete  delete an acyclic list                       memory safety      2           L1     0.01
ll delmid  delete any element of acyclic list           result acyclic     2           L1     0.02
ll rev     reverse an acyclic list                      result acyclic     3           L1     0.02
This shows that divergence can be controlled.
But can we scale to large programs?...
Canonical abstraction
• Abstraction replaces concrete heaps with abstract symbolic heaps
• Abstraction is parameterized by “instrumentation predicates”

[Diagram: a points to a node labeled Pta, Reaa; a dotted next arc leads to a
summary node labeled Reaa with a dotted next self-arc; a further dotted next
arc leads to a node labeled Reaa, is_null representing null]

• Abstract heap represents an infinite class of concrete heaps
– “Summary” node represents an equivalence class of concrete nodes
– Dotted arcs mean “may point to”
Example program
node *create_list(){
  node *l = NULL;
  while(*){
    node *n = malloc(...);
    n->next = l;
    l = n;
  }
  return l;
}

main(){
  node *a = create_list();
  while(a){
    assert(alloced(a));
    a = a->next;
  }
}
• Want to prove this program does not access a freed cell.
Canonical Abstraction
• Predicates: Pta, Reaa, is_null, alloc
• Relations: next
[Diagrams: three abstract heaps: (1) a points directly to the null node;
(2) a points to a single alloc’d cell whose next is null; (3) a points to an
alloc’d cell, through a summary node of alloc’d cells, to null]

All three abstract heaps verify the property!
A slightly larger program
main(){
node *a = create_list();
node *b = create_list();
node *c = create_list();
node *p = * ? a : * ? b : c;
while(p){
assert(alloced(p));
p = p->next;
}
}
• We have to track a, b and c to prove this property
– Let’s look at what happens with canonical heap abstractions...
After creating “a”
• Predicates: Pta, Reaa, is_null, alloced
• Relations: next
[Diagrams: the three abstract heaps for a, as on the previous slide: a null,
a single alloced cell, and a longer summarized list of alloced cells]
After creating “b”
[Diagrams: nine abstract heaps, one for every combination of the three shapes
for a with the three shapes for b]
After creating “c”
[ Picture 27 abstract heaps here ]
Problem: abstraction scales exponentially with the number of independent
data structures.
Independent analyses
• Suppose we do a Cartesian product of 3 independent analyses for a,b,c.
[Diagrams: the three abstract heaps for a] ∧ [the three abstract heaps for b] ∧ [the three abstract heaps for c]
• How do we know we can decompose the analysis in this way and prove
the property?
– What if some correlations are needed between the analyses?
• For non-heap properties, one good answer is to compute interpolants.
Abstraction from interpolants
main(){
node *a = create_list();
node *b = create_list();
node *c = create_list();
node *p = * ? a : * ? b : c;
while(p){
assert(alloced(p));
p = p->next;
}
}
• Interpolants contain inductive invariants after unrolling loops 3 times.
• Interpolant after creating c:
( a  0 ) alloced(a) ) ^
( b  0 ) alloced(b) ) ^
( c  0 ) alloced(c) )
^
8x. (x  0 ^ alloced(x)
) alloced(next(x))
Shape of the interpolant
( a  0 ) alloced(a) ) ^
( b  0 ) alloced(b) ) ^
( c  0 ) alloced(c) )
^
8x. (x  0 ^ alloced(x)
) alloced(next(x))
next
a
b
c
alloced
next
null
• Invariant says that the allocated cells are closed under the ‘next’ relation
• Notice also the size of this formula is linear in the number of lists, not
exponential as is the set of shape graphs.
Suggests decomposition
( a  0 ) alloced(a) ) ^
( b  0 ) alloced(a) ) ^
( c  0 ) alloced(a) )
^
8x. (x  0 ^ alloced(x)
) alloced(next(x))
Canonical abstract domains
One small domain per conjunct, each with relation next:
– Predicates: a = 0, alloced(n)
– Predicates: b = 0, alloced(n)
– Predicates: c = 0, alloced(n)
– Predicates: n = 0, alloced(n)
• Each of these analyses proves one conjunct of the invariant.
Conclusion
• Interpolants and invariant generation
– Computing interpolants from proofs allows us to generalize from special
cases such as loop-free unwindings
– Interpolation can extract relevant facts from proofs of these special cases
– Must avoid divergence
• Quantified invariants
– Needed for programs that manipulate arrays or heaps
– FO equality prover modified to produce local proofs (hence interpolants)
• Complete for universal invariants
– Can be used to construct invariants of simple array- and list-manipulating
programs, using partial axiomatization of FO(TC)
• Language stratification prevents divergence
– Might be used as a relevance heuristic for shape analysis, indexed predicate abstraction
For this approach to work in practice, we need FO provers
with strong relevance heuristics as in DPLL...