272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 3: Modular Verification with Magic, Predicate Abstraction.


Modular verification with Magic
• MAGIC: Modular Analysis of proGrams In C
• Goal: Automated verification of C programs against finite state
machine specifications (given as labeled transition systems)
– Checks that the behavior of the C program conforms to the
behavior of the state machine
• It is a modular verification approach, the decomposition of the
verification task follows the modularity in the code
– The procedure that is being analyzed can invoke other procedures
which are themselves specified as state machines
• It uses predicate abstraction for automatically generating procedure
abstractions and then checks conformance of the extracted procedure
abstraction to the specification
• It uses the abstract-verify-refine approach
– If the conformance check fails, the procedure abstraction can be
refined
Labeled transition systems as specifications
• A labeled transition system (LTS) M is a 4-tuple (S, S0, Act, T) where
– S is a finite, non-empty set of states
– S0 ⊆ S is the set of initial states
– Act is the set of actions
– T ⊆ S × Act × S is the transition relation
• Assume that there is a special type of state called STOP state.
– A STOP state has no outgoing transitions
• (s, a, s’) ∈ T is also written as s →a s’
• s ⇒a s’ means that s’ is reachable from s by following a single
a-transition and an arbitrary number of ε-transitions
– ε is a specific type of action in Act. It corresponds to a silent action
(like skip)
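The LTS definition and the weak transition s ⇒a s’ can be sketched in Python as follows. This is only an illustrative encoding, not how MAGIC represents LTSs: the names `LTS`, `EPS`, `eps_closure`, `weak_successors` and the small `demo` system are all assumptions made for this example.

```python
from dataclasses import dataclass

EPS = "eps"  # stands for the silent action ε (assumed encoding)

@dataclass(frozen=True)
class LTS:
    states: frozenset
    init: frozenset      # S0, a subset of states
    actions: frozenset   # Act, includes EPS
    trans: frozenset     # T, a set of (s, a, s') triples

def eps_closure(lts, start):
    """All states reachable from `start` using only ε-transitions."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for (p, a, q) in lts.trans:
            if p == s and a == EPS and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def weak_successors(lts, s, a):
    """States s' with s ⇒a s': ε-steps, one a-step, then ε-steps."""
    result = set()
    for p in eps_closure(lts, s):
        for (p2, b, q) in lts.trans:
            if p2 == p and b == a:
                result |= eps_closure(lts, q)
    return result

# Hypothetical four-state system: 0 -ε-> 1 -lock-> 2 -ε-> 3
demo = LTS(
    states=frozenset({0, 1, 2, 3}),
    init=frozenset({0}),
    actions=frozenset({EPS, "lock"}),
    trans=frozenset({(0, EPS, 1), (1, "lock", 2), (2, EPS, 3)}),
)
```

In `demo`, both states 2 and 3 are weak lock-successors of state 0, because the silent steps before and after the lock transition are ignored.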
Example LTS
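[Figure: the LTS MyLock. From the initial state, a lock transition followed by return[0] leads to STOP; alternatively, a return[1] transition leads directly to STOP.]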
• There is a textual language (called Finite State Processes, FSP) for
specifying labeled transition systems
• For the above LTS, the FSP specification would be:
MyLock = {
lock -> return {$0 == 0} -> STOP
| return {$0 == 1} -> STOP } .
An example LTS and an example procedure
[Figure: the MyLock LTS from the previous slide, shown next to the procedure below]
int proc()
{
  if (do_lock())
    return 0;
  else
    return 1;
}
• The goal is to check the conformance between the C procedures and
the specification LTSs
Procedure Abstractions
• MAGIC defines a procedure abstraction (PA) as a set of LTSs.
• A PA is a tuple <d, l> where
– d is the declaration for the procedure (as it appears in a C header
file)
– l is a finite list <g1, M1> , …, <gn, Mn> where each gi is a guard
formula ranging over the parameters of the procedure and each Mi
is an LTS with a single initial state
• The guards are mutually exclusive
• A PA is an abstraction of a procedure, if, for all i between 1 and n,
when the guard gi evaluates to true over the actual parameters
passed to the procedure, the procedure conforms to the LTS Mi
Procedure Abstractions
• Procedure abstractions serve two purposes
1. They are used to specify desired behavior of the procedures
• MAGIC provides techniques to
automatically extract a PA from a given procedure
2. They are used to achieve modular verification
• During verification of a procedure, the behaviors of
procedures that are called by that procedure are abstracted
as PAs
Conformance as Weak Simulation
• Once a PA is extracted from a given procedure, then we want to
check if the extracted PA conforms to the given LTS specification
• In order to do this we need to formalize what it means to “conform” to
a given LTS specification
• MAGIC does this by using weak simulation
• Weak simulation preserves LTLX properties
– LTLX is the temporal logic LTL without the next state operator X
– So,
1. if we verify an LTLX property on the specification LTS, and
2. show that the procedure conforms to the specification LTS,
then
3. we can conclude that the procedure also satisfies the LTLX
property
Conformance as Weak Simulation
• Given two LTSs M = (S, S0, Act, T) and M’ = (S’, S0’, Act, T’)
• M’ weakly simulates M if and only if there exists a weak simulation
relation E ⊆ S × S’ such that
1. For all s ∈ S0 there exists an s’ ∈ S0’ such that (s, s’) ∈ E
2. (s, s’) ∈ E implies that for all actions a ∈ Act \ {ε}
if s ⇒a s1 then there exists an s1’ ∈ S’ such that
s’ ⇒a s1’ and (s1, s1’) ∈ E
Weak Simulation
• The existence of a simulation relation between two labeled transition
systems can be checked by reducing the problem to an instance of
Boolean satisfiability
• Due to the specific structure of the SAT instances produced in this
reduction, their satisfiability can be decided in
linear time.
• Weak simulation is the conformance criterion that is used in Magic:
– A procedure conforms to an LTS if the LTS can weakly simulate
the procedure
– This means that the implementation (the C procedure) is safely
abstracted by its specification (the LTS)
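The conformance check can be illustrated with a small Python sketch. Note that it does not use the SAT reduction mentioned above: it computes the largest weak simulation with a naive fixpoint that starts from all state pairs and removes pairs that violate condition 2. The tuple encoding of an LTS, the function names, and the model `proc_model` for proc() (which assumes that a successful do_lock() performs the lock action) are assumptions made for this example.

```python
EPS = "eps"  # the silent action ε (assumed encoding)

def eps_closure(trans, start):
    """States reachable from `start` via ε-transitions only."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for (p, a, q) in trans:
            if p == s and a == EPS and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def weak_succ(trans, s, a):
    """States s' such that s ⇒a s'."""
    out = set()
    for p in eps_closure(trans, s):
        for (p2, b, q) in trans:
            if p2 == p and b == a:
                out |= eps_closure(trans, q)
    return out

def weakly_simulates(spec, impl):
    """True if `spec` weakly simulates `impl`; an LTS is (states, init, trans)."""
    spec_states, spec_init, spec_trans = spec
    impl_states, impl_init, impl_trans = impl
    visible = {a for (_, a, _) in spec_trans | impl_trans} - {EPS}
    # Start from all pairs, then drop pairs that violate condition 2.
    rel = {(s, t) for s in impl_states for t in spec_states}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            ok = all(
                any((s1, t1) in rel for t1 in weak_succ(spec_trans, t, a))
                for a in visible
                for s1 in weak_succ(impl_trans, s, a)
            )
            if not ok:
                rel.discard((s, t))
                changed = True
    # Condition 1: every initial impl state is related to an initial spec state.
    return all(any((s, t) in rel for t in spec_init) for s in impl_init)

# MyLock specification from the earlier slide.
my_lock = (
    {"m0", "m1", "STOP"},
    {"m0"},
    {("m0", "lock", "m1"), ("m1", "return[0]", "STOP"),
     ("m0", "return[1]", "STOP")},
)
# Hypothetical model of proc(): the call either locks and returns 0,
# or silently fails and returns 1.
proc_model = (
    {"p0", "p1", "p2", "p3", "pSTOP"},
    {"p0"},
    {("p0", EPS, "p1"), ("p1", "lock", "p2"), ("p1", EPS, "p3"),
     ("p2", "return[0]", "pSTOP"), ("p3", "return[1]", "pSTOP")},
)
# Same model with the two return values swapped (a seeded bug).
buggy_model = (
    proc_model[0],
    proc_model[1],
    {("p0", EPS, "p1"), ("p1", "lock", "p2"), ("p1", EPS, "p3"),
     ("p2", "return[1]", "pSTOP"), ("p3", "return[0]", "pSTOP")},
)
```

Under these assumptions, `weakly_simulates(my_lock, proc_model)` holds, while the swapped variant fails because it can return 0 without a preceding lock action, which MyLock cannot match.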
Overall Approach
Given a specification Mspec for a procedure
• First, extract Mimp which abstracts the behavior of the procedure
– During the abstraction process, the procedures that are called by
the procedure that is being analyzed are modeled using a set of
given procedure abstractions (which are called assumption PAs)
– The procedure abstraction is automatically generated using the
given assumption PAs and predicate abstraction
• Then, check if Mimp conforms to Mspec (via weak simulation)
– If Mimp conforms to Mspec then verification is successful and we are
done
– If Mimp does not conform to Mspec then we check the cause of the non-conformance
• If it is a bug in the implementation, then we found an error and
we are done
• If it is not a bug, but non-conformance is due to imprecision in
the abstraction Mimp, then refine Mimp and repeat the process
Model Extraction
Extraction of Mimp relies on the following principles:
• Every state of Mimp models a state during execution of the procedure,
so every state is composed of a control component and a data
component
• The control components intuitively represent the values of the
program counter and are formally obtained from the CFG
• The data components are abstract representations of the memory
state of the procedure and are obtained using predicate abstraction
• The transitions between states of the Mimp are derived from the
transitions in the control flow graph taking into account the
assumption PAs and the predicate abstraction
Inlining assumption PAs
• During the model extraction, assumption PAs are used to handle
procedure calls
• If the procedure that is being abstracted calls another procedure p,
then the PA for p is inlined by
– creating a copy of the LTS for p
– inserting an ε-transition from the call location to the initial state of
the LTS for p
– inserting ε-transitions from the STOP states of the LTS for p to the
statement right after the call statement
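The three inlining steps can be sketched in Python as follows. This is a simplified illustration that ignores guards, return values and the data component of states; the function `inline_pa`, the triple encoding of transitions, and the one-transition PA for do_lock are assumptions made for this example.

```python
EPS = "eps"  # the silent action ε (assumed encoding)

def inline_pa(caller_trans, call_loc, return_loc, callee, tag):
    """Splice a copy of the callee LTS into the caller at one call site.

    callee = (initial_state, stop_states, transitions)
    tag    = a label that makes the copied states fresh for this call site
    """
    init, stops, trans = callee

    def rename(state):
        return (tag, state)

    new_trans = set(caller_trans)
    # 1. Create a copy of the LTS for p.
    new_trans |= {(rename(p), a, rename(q)) for (p, a, q) in trans}
    # 2. ε-transition from the call location to the initial state of the copy.
    new_trans.add((call_loc, EPS, rename(init)))
    # 3. ε-transitions from the STOP states to the statement after the call.
    new_trans |= {(rename(s), EPS, return_loc) for s in stops}
    return new_trans

# Hypothetical assumption PA for do_lock: a single lock action, then STOP.
do_lock_pa = ("d0", {"dS"}, {("d0", "lock", "dS")})
caller = {("entry", EPS, "call_site")}
result = inline_pa(caller, "call_site", "after_call", do_lock_pa, tag="c1")
```

Tagging the copied states keeps two call sites of the same procedure from sharing states, which is why a copy is created rather than linking to the original LTS.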
Experiments with MAGIC
• OpenSSL is an open source implementation of the publicly available
SSL specification
– SSL protocol is used by a client (typically a web browser) and a
server to establish a secure socket connection over a malicious
network using public and symmetric key cryptography
• A critical component of the protocol is the handshake
• Check if the openssl-0.9.6c implementation of the server side
handshake conforms to its specification
– Implementation is encapsulated in a single procedure with 347
lines of C code
– They wrote the Mspec manually (an LTS with 28 states and 67
transitions)
• Check if the client-side implementation conforms to the specification
– Implementation is encapsulated in a single procedure with 345
lines of C code
– Mspec is an LTS with 28 states and 60 transitions
Experiments with MAGIC
• They provided 18 predicates for abstraction and provided the PAs for
12 library routines
• Server-side verification took 255 seconds and 130MB of memory
• Client-side verification took 226 seconds and 107MB of memory
• They then changed the specification model to see if their approach
can catch errors
– Server-side error was found in 247 seconds using 130MB of
memory
– Client-side error was found in 227 seconds using 11MB of
memory
Predicate Abstraction
• In the following slides I will give an overview of the predicate
abstraction technique
Abstraction (A simplified view)
• How do we generate an abstract transition system?
• Merge states in the concrete transition system (based on some
criteria)
– This reduces the number of states, so it should be easier to do
verification
• Do not eliminate transitions
– This will make sure that the paths in the abstract transition system
subsume the paths in the concrete transition system
Abstraction (A simplified view)
• For every path in the concrete transition system, there is a
corresponding path in the abstract transition system
– If no path in the abstract transition system violates a property, then
no path in the concrete system can violate the property
• Using this reasoning we can verify properties in the abstract transition
system
– If the property holds on the abstract transition system, we are sure
that the property holds in the concrete transition system
– If the property does not hold in the abstract transition system, then
we are not sure if the property holds or not in the concrete
transition system
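The state-merging idea can be sketched in Python as follows. The helper names `abstract` and `reachable`, the parity mapping, and the six-state concrete system are assumptions chosen for this example; the property checked is that no odd state is reachable.

```python
def abstract(trans, init, h):
    """Merge states with the mapping h, keeping every transition."""
    abs_trans = {(h(p), h(q)) for (p, q) in trans}
    abs_init = {h(s) for s in init}
    return abs_trans, abs_init

def reachable(trans, init):
    """All states reachable from `init` by following transitions."""
    seen, stack = set(init), list(init)
    while stack:
        s = stack.pop()
        for (p, q) in trans:
            if p == s and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def parity(n):
    """Abstraction mapping: merge all even states and all odd states."""
    return "even" if n % 2 == 0 else "odd"

# Hypothetical concrete system over the states 0..5, started in state 0.
concrete_trans = {(0, 2), (2, 4), (4, 0), (1, 3), (3, 5)}
concrete_init = {0}

abs_trans, abs_init = abstract(concrete_trans, concrete_init, parity)
abs_reach = reachable(abs_trans, abs_init)
concrete_reach = reachable(concrete_trans, concrete_init)
```

Here the abstract system has only two states, and "odd" is not reachable in it, so no odd state can be reachable in the concrete system either. If "odd" had been abstractly reachable, that alone would not have told us whether an odd concrete state is reachable.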
Abstraction (A simplified view)
• If the property does not hold in the abstract transition system, what
can we do?
• We can refine the abstract transition system (split some states that
we merged)
• We have to make sure that the refined transition system is still an
abstraction of the concrete transition system
• Then, we can recheck the property again on the refined transition
system
– If the property does not hold again, we can refine again
Predicate Abstraction
• An automated abstraction technique which can be used to reduce the
state space of a program
• The basic idea in predicate abstraction is to remove some variables
from the program by just keeping information about a set of
predicates about them
• For example, a predicate such as x = y may be the only information
necessary about variables x and y to determine the behavior of the
program
– In that case we can just store a boolean variable which
corresponds to the predicate x = y and remove variables x and y
from the program
– Predicate abstraction is a technique for doing such abstractions
automatically
Predicate Abstraction
• Given a program and a set of predicates, predicate abstraction
abstracts the program so that only the information about the given
predicates is preserved
• The abstracted program adds nondeterminism since in some cases it
may not be possible to figure out what the next value of a predicate
will be based on the predicates in the given set
• One needs an automated theorem prover to compute the abstraction
Predicate Abstraction, A Very Simple Example
• Assume that we have two integer variables x,y
• We want to abstract the program using a single predicate “x=y”
• We will divide the states of the program into two sets:
1. The states where “x=y” is true
2. The states where “x=y” is false, i.e., “x≠y”
• We will then merge all the states in the same set
– This is an abstraction
– Basically, we forget everything except the value of the predicate
“x=y”
Predicate Abstraction, A Very Simple Example
• We will represent the predicate “x=y” as the boolean variable B in the
abstract program
– “B=true” will mean “x=y” and
– “B=false” will mean “x≠y”
• Assume that we want to abstract the following program which
contains only one statement:
y := y+1
Predicate Abstraction, Step 1
• Calculate preconditions based on the predicate
{x = y + 1} y := y + 1 {x = y}
(precondition for B being true after executing the statement y := y + 1)
{x ≠ y + 1} y := y + 1 {x ≠ y}
(precondition for B being false after executing the statement y := y + 1)
• Using our temporal logic notation we can say something like:
{x = y + 1} ≡ AX{x = y}
{x ≠ y + 1} ≡ AX{x ≠ y}
Predicate Abstraction, Step 2
• Use decision procedures to determine if the predicates used for
abstraction imply any of the preconditions
x = y ⇒ x = y + 1 ? No
x ≠ y ⇒ x = y + 1 ? No
x = y ⇒ x ≠ y + 1 ? Yes
x ≠ y ⇒ x ≠ y + 1 ? No
Predicate Abstraction, Step 3
• Generate abstract code
Statement: y := y + 1
1) Compute preconditions
{x = y + 1} y := y + 1 {x = y}
{x ≠ y + 1} y := y + 1 {x ≠ y}
2) Check implications
x = y ⇒ x = y + 1 ? No
x ≠ y ⇒ x = y + 1 ? No
x = y ⇒ x ≠ y + 1 ? Yes
x ≠ y ⇒ x ≠ y + 1 ? No
3) Generate abstract code (predicate abstraction wrt the predicate “x=y”)
IF B THEN B := false
ELSE B := true | false
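The result of the three steps can be cross-checked with a small Python sketch. It does not use a theorem prover or weakest preconditions: it simply enumerates a small range of integer values, which is an assumption made for illustration and is not a sound method in general. The function name `abstract_statement` and the range of values are hypothetical.

```python
from itertools import product

def abstract_statement(stmt, pred, values):
    """For each current value of B, collect the possible next values of B."""
    nxt = {True: set(), False: set()}
    for x, y in product(values, repeat=2):
        before = pred(x, y)          # value of B before the statement
        x2, y2 = stmt(x, y)          # execute the concrete statement
        nxt[before].add(pred(x2, y2))
    return nxt

result = abstract_statement(
    stmt=lambda x, y: (x, y + 1),    # y := y + 1
    pred=lambda x, y: x == y,        # B represents x = y
    values=range(-5, 6),             # bounded sample, an assumption
)
```

The outcome matches the abstract code: when B is true the only possible next value is false, and when B is false both values are possible, hence the non-deterministic assignment B := true | false.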
Checking conformance to a state machine
• We want to check if this procedure conforms to this LTS
void example() {
  do {
    A: KeAcquireSpinLock();
    nPacketsOld = nPackets;
    req = devExt->WLHV;
    if(req && req->status){
      devExt->WLHV = req->Next;
      B: KeReleaseSpinLock();
      irp = req->irp;
      if(req->status > 0){
        irp->IoS.Status = SUCCESS;
        irp->IoS.Info = req->Status;
      } else {
        irp->IoS.Status = FAIL;
        irp->IoS.Info = req->Status;
      }
      SmartDevFreeBlock(req);
      IoCompleteRequest(irp);
      nPackets++;
    }
  } while(nPackets!=nPacketsOld);
  C: KeReleaseSpinLock();
}
[Figure: the specification LTS SpinLock, with transitions labeled KeAcquireSpinLock(), KeReleaseSpinLock() and return, ending in STOP]
Converting a C program to a state machine
• We can convert a C program to a state machine
– The control component of the state machine will be states of the
control flow graph
– The data component of the state machine will be the values of the
predicates used for predicate abstraction
C Code:
void example() {
  do {
    A: KeAcquireSpinLock();
    nPacketsOld = nPackets;
    req = devExt->WLHV;
    if(req && req->status){
      devExt->WLHV = req->Next;
      B: KeReleaseSpinLock();
      irp = req->irp;
      if(req->status > 0){
        irp->IoS.Status = SUCCESS;
        irp->IoS.Info = req->Status;
      } else {
        irp->IoS.Status = FAIL;
        irp->IoS.Info = req->Status;
      }
      SmartDevFreeBlock(req);
      IoCompleteRequest(irp);
      nPackets++;
    }
  } while(nPackets!=nPacketsOld);
  C: KeReleaseSpinLock();
}
State Machine (as a program):
void example()
begin
  do
    A: KeAcquireSpinLock();
    skip;
    if (*) then
      skip;
      B: KeReleaseSpinLock();
      skip;
      if (*) then
        skip;
      else
        skip;
      fi
      skip;
    fi
  while (*);
  C: KeReleaseSpinLock();
end
Other than the statements labeled A,
B and C, all the rest are ε-transitions
Abstraction Preserves Correctness
• The state machine that is generated with predicate abstraction is non-deterministic (the branches labeled “*” are non-deterministic choices)
– Non-determinism is used to handle the cases where the
predicates used during predicate abstraction are not sufficient
to determine which branch will be taken
• If we find no error in the generated state machine then we are sure
that there are no errors in the original program
– The abstract state machine allows more behaviors than the
original program due to non-determinism.
– Hence, if the abstract state machine is correct then the original
program is also correct.
Counter-Example Guided Abstraction
Refinement (CEGAR)
• However, if we find an error in the abstract state machine this does
not mean that the original program is incorrect.
– The erroneous behavior in the abstract state machine could be an
infeasible execution path that is caused by the non-determinism
introduced during abstraction.
• Counter-example guided abstraction refinement is a technique used
to iteratively refine the abstract state machine in order to remove the
spurious counter-example traces
CEGAR
The basic idea in counter-example guided abstraction refinement is the
following:
• First look for an error in the abstract program (if there are no errors,
we can terminate since we know that the original program is correct)
• If there is an error in the abstract program, generate a counter-example path on the abstract program
• Check if the generated counter-example path is feasible using a
theorem prover (if it is feasible, it is a real error in the original program)
• If the generated path is infeasible, add the predicate from the branch
condition where an infeasible choice is made to the predicate set and
generate a new abstract program using predicate abstraction
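The loop described above can be sketched as a Python skeleton. The four callbacks stand in for the real components (predicate abstraction, model checking, feasibility checking with a theorem prover, and predicate discovery); they, the function name `cegar`, the iteration bound, and the toy stubs used to exercise the loop are all assumptions made for this example, not the interface of SLAM or MAGIC.

```python
def cegar(initial_preds, build_abstraction, find_counterexample,
          is_feasible, new_predicate, max_iters=10):
    """Abstract, check, and refine until a verdict or the bound is reached."""
    preds = set(initial_preds)
    for _ in range(max_iters):
        model = build_abstraction(preds)
        cex = find_counterexample(model)
        if cex is None:
            return "verified", preds      # no error in the abstraction
        if is_feasible(cex):
            return "bug", cex             # real error in the original program
        preds.add(new_predicate(cex))     # spurious path: refine and retry
    return "unknown", preds               # the loop need not converge

# Toy stubs: the abstraction is just the predicate set, and the checker
# reports a spurious path until the loop predicate has been added.
LOOP_PRED = "nPackets == nPacketsOld"
verdict, preds = cegar(
    initial_preds=[],
    build_abstraction=lambda p: frozenset(p),
    find_counterexample=lambda m: None if LOOP_PRED in m else ["A", "B", "C"],
    is_feasible=lambda cex: False,
    new_predicate=lambda cex: LOOP_PRED,
)
```

The iteration bound reflects that refinement may not terminate, so a practical driver has to be able to give up with an inconclusive answer.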
CEGAR
Abstraction:
void example()
begin
  do
    A: KeAcquireSpinLock();
    skip;
    if (*) then
      skip;
      B: KeReleaseSpinLock();
      skip;
      if (*) then
        skip;
      else
        skip;
      fi
      skip;
    fi
  while (*);
  C: KeReleaseSpinLock();
end
Refined Abstraction (using the predicate (nPackets = nPacketsOld)):
the boolean variable b represents the predicate (nPackets = nPacketsOld)
void example()
begin
  do
    A: KeAcquireSpinLock();
    b := T;
    if (*) then
      skip;
      B: KeReleaseSpinLock();
      skip;
      if (*) then
        skip;
      else
        skip;
      fi
      b := b ? F : *;
    fi
  while (!b);
  C: KeReleaseSpinLock();
end
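The effect of the refinement can be illustrated by exploring both boolean programs in Python. The sketch assumes the interface rule is that acquire and release must strictly alternate, starting with the lock not held; the location names, the `successors` encoding of the two programs, and the function `violates_lock_rule` are assumptions made for this example. The inner if with two skip branches is omitted because it does not affect the lock.

```python
def successors(state, refined):
    """Yield (action, next_state) pairs; a state is (location, b)."""
    pc, b = state
    if pc == "A":                          # A: KeAcquireSpinLock(); b := T
        yield "acquire", ("if", True if refined else b)
    elif pc == "if":                       # if (*): either branch may be taken
        yield None, ("B", b)
        yield None, ("loop", b)
    elif pc == "B":                        # B: KeReleaseSpinLock(); b := b ? F : *
        next_b = ([False] if b else [True, False]) if refined else [b]
        for nb in next_b:
            yield "release", ("loop", nb)
    elif pc == "loop":                     # while (*) versus while (!b)
        if not refined or not b:
            yield None, ("A", b)
        if not refined or b:
            yield None, ("C", b)
    elif pc == "C":                        # C: KeReleaseSpinLock()
        yield "release", ("end", b)

def violates_lock_rule(refined):
    """Search all abstract paths for a double acquire or double release."""
    start = (("A", False), False)          # (program state, lock held?)
    seen, stack = {start}, [start]
    while stack:
        state, held = stack.pop()
        for action, nxt in successors(state, refined):
            if action == "acquire":
                if held:
                    return True
                new_held = True
            elif action == "release":
                if not held:
                    return True
                new_held = False
            else:
                new_held = held
            node = (nxt, new_held)
            if node not in seen:
                seen.add(node)
                stack.append(node)
    return False
```

Under these assumptions the first abstraction admits a violating path (for example, acquiring at A, skipping the if, looping back, and acquiring again), while the refined abstraction with b admits none, so those paths were spurious.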
CEGAR
• Using counter-example guided abstraction refinement we are
iteratively creating more and more refined abstractions
• This iterative abstraction refinement loop is not guaranteed to
converge for infinite domains
– This is not surprising since automated verification for infinite
domains is undecidable in general
• The challenge in this approach is automatically choosing the right set
of predicates for abstraction refinement
– This is similar to finding a loop invariant that is strong enough to
prove the property of interest
SLAM Project
• SLAM project at Microsoft Research
– Verification of C programs
– Can handle unbounded recursion but does not handle
concurrency
– Uses predicate abstraction and CEGAR
• SLAM toolkit was developed to find errors in Windows device drivers
– Predicate abstraction example in my slides is from:
• “The SLAM Toolkit”, Thomas Ball and Sriram K. Rajamani,
CAV 2001
• Windows device drivers are required to interact with the Windows
kernel according to certain interface rules
• SLAM toolkit has an interface specification language called SLIC
(Specification Language for Interface Checking) which is used for
writing these interface rules (which are state machines)
• The SLAM toolkit checks if the driver code conforms to these
interface specifications