Abstractions and Decision Procedures for Effective Software Model Checking Prof. Natasha Sharygina The University of Lugano, Carnegie Mellon University Microsoft Summer School, Moscow, July 2011 Lecture1 1


Abstractions and Decision
Procedures for Effective
Software Model Checking
Prof. Natasha Sharygina
The University of Lugano,
Carnegie Mellon University
Microsoft Summer School, Moscow, July 2011
Lecture 1
Outline
Day 1 (Lectures 1 and 2)
• Model Checking in a Nutshell
• Abstractions in Model Checking
– Predicate Abstraction
– SAT-based approach
Guess what this is!
Bug Catching: Automated
Program Analysis
Informatics Department
The University of Lugano
Professor Natasha Sharygina
Two trains, one bridge – model transformed
with a simulation tool, Hugo
What is Formal Verification?
• Build a mathematical model of the system:
– what are possible behaviors?
• Write correctness requirement in a specification
language:
– what are desirable behaviors?
• Analysis: (Automatically) check that model satisfies
specification
What is Formal Verification (2)?
• Formal - Correctness claim is a precise mathematical
statement
• Verification - Analysis either proves or disproves the
correctness claim
Algorithmic Analysis by Model
Checking
• Analysis is performed by an algorithm (tool)
• Analysis gives counterexamples for debugging
• Typically requires exhaustive search of state-space
• Limited by high computational complexity
Temporal Logic Model Checking
[Clarke, Emerson 81] [Queille, Sifakis 82]
M |= P
• M: “implementation” (system model)
• P: “specification” (system property)
• |=: “satisfies”, “implements”, “refines” (satisfaction relation)
Temporal Logic Model Checking
M |= P
• M: “implementation” (system model), more detailed
• P: “specification” (system property), more abstract
• |=: “satisfies”, “implements”, “refines”, “confirms” (satisfaction relation)
Temporal Logic Model Checking
M |= P
system model
system specification
satisfaction relation
Decisions when choosing a system model:
• variable-based vs. event-based
• interleaving vs. true concurrency
• synchronous vs. asynchronous interaction
• clocked vs. speed-independent progress
• etc.
Characteristics of system models which favor model checking over other verification techniques:
• ongoing input/output behavior (not: single input, single result)
• concurrency (not: single control flow)
• control intensive (not: lots of data manipulation)
Decisions when choosing a system
model:
While the choice of system model is important for
ease of modeling in a given situation,
the only thing that is important for model checking is
that the system model can be translated into some
form of state-transition graph.
Finite State Machine (FSM)
• Specify state-transition behavior
• Transitions depict observable behavior
[Figure: FSM with transitions labeled lock and unlock and an ERROR state]
Acceptable sequences of acquiring and releasing a lock
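A minimal sketch of such a lock specification as an FSM. The state names and the exact transition table are assumptions made for illustration (the figure itself is not recoverable from the transcript): a second lock, or an unlock without a lock, leads to ERROR.

```python
# Hypothetical encoding of the lock specification; names are illustrative.
UNLOCKED, LOCKED, ERROR = "unlocked", "locked", "ERROR"

TRANSITIONS = {
    (UNLOCKED, "lock"): LOCKED,
    (LOCKED, "unlock"): UNLOCKED,
    (UNLOCKED, "unlock"): ERROR,  # releasing a lock that is not held
    (LOCKED, "lock"): ERROR,      # acquiring the lock twice
}


def accepts(events, start=UNLOCKED):
    """Return True if the event sequence never drives the FSM into ERROR."""
    state = start
    for event in events:
        state = TRANSITIONS[(state, event)]
        if state == ERROR:
            return False
    return True


print(accepts(["lock", "unlock", "lock", "unlock"]))  # True
print(accepts(["lock", "lock"]))                      # False
```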
High-level View
[Diagram: Linux Kernel (C) and Spec (FSM) are the two inputs of a Conformance Check]
High-level View
[Diagram: Linux Kernel (C) is related to a Finite State Model (FSM) "By Construction"; Model Checking relates the Finite State Model (FSM) to the Spec (FSM)]
Low-level View
State-transition graph
• S: set of states
• I: set of initial states
• AP: set of atomic observations
• R ⊆ S × S: transition relation
• L: S → 2^AP: observation (labeling) function
[Figure: state-transition graph with states s1, s2, s3 and observations a, {a,b}, b]
Run (state sequence): s1 → s3 → s1 → s3 → s1 → ...
Trace (observation sequence): a → b → a → b → a → ...
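The relation between a run and its trace can be sketched in a few lines. The labeling below (s1 observed as a, s2 as a,b, s3 as b) is inferred from the run and trace on the slide, and only the two edges used by that run are listed; both are assumptions, since the figure is not recoverable.

```python
# Labeling function L: S -> 2^AP (assumed from the run/trace on the slide).
L = {
    "s1": frozenset({"a"}),
    "s2": frozenset({"a", "b"}),
    "s3": frozenset({"b"}),
}
# Transition relation R, restricted to the edges the run uses (assumption).
R = {("s1", "s3"), ("s3", "s1")}


def is_run_prefix(states):
    """A finite state sequence is a run prefix if consecutive states are related by R."""
    return all((s, t) in R for s, t in zip(states, states[1:]))


def trace_of(states):
    """The trace is the sequence of observations along the run."""
    return [L[s] for s in states]


run = ["s1", "s3", "s1", "s3", "s1"]
print(is_run_prefix(run))                       # True
print([sorted(obs) for obs in trace_of(run)])   # [['a'], ['b'], ['a'], ['b'], ['a']]
```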
Model of Computation
[Figure: a State Transition Graph over the observations a, b, c, and the Infinite Computation Tree obtained from it]
Unwind State Graph to obtain Infinite Tree.
Semantics
A trace is an infinite sequence of state observations.
[Figure: State Transition Graph and its Infinite Computation Tree]
The semantics of an FSM is a set of traces.
Where is the model?
• Need to extract automatically
• Easier to construct from hardware
• Fundamental challenge for software
Linux Kernel: ~1,000,000 LOC
• Recursion and data structures
• Pointers and dynamic memory
• Processes and threads
[Diagram: Linux Kernel → Finite State Model]
Mutual-exclusion protocol
P1:
loop
  out: x1 := 1; last := 1
  req: await x2 = 0 or last = 2
  in:  x1 := 0
end loop.
||
P2:
loop
  out: x2 := 1; last := 2
  req: await x1 = 0 or last = 1
  in:  x2 := 0
end loop.
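[Figure: part of the state-transition graph, with states oo001, or012, ro101, io101, rr112, ir112; each state lists pc1, pc2, x1, x2, last]
pc1: {o,r,i}
pc2: {o,r,i}
x1: {0,1}
x2: {0,1}
last: {1,2}
3 · 3 · 2 · 2 · 2 = 72 states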
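A sketch that enumerates this state space explicitly. It assumes that each labeled statement (out, req, in) executes atomically, that the two processes interleave, and that oo001 is the initial state; none of this is stated on the slide beyond what the figure suggests.

```python
from collections import deque
from itertools import product

# State = (pc1, pc2, x1, x2, last), written on the slide as e.g. "oo001".
ALL_STATES = list(product("ori", "ori", (0, 1), (0, 1), (1, 2)))


def successors(state):
    """Interleaving semantics: one process takes one atomic step."""
    pc1, pc2, x1, x2, last = state
    # Process P1
    if pc1 == "o":
        yield ("r", pc2, 1, x2, 1)
    elif pc1 == "r" and (x2 == 0 or last == 2):
        yield ("i", pc2, x1, x2, last)
    elif pc1 == "i":
        yield ("o", pc2, 0, x2, last)
    # Process P2 (symmetric)
    if pc2 == "o":
        yield (pc1, "r", x1, 1, 2)
    elif pc2 == "r" and (x1 == 0 or last == 1):
        yield (pc1, "i", x1, x2, last)
    elif pc2 == "i":
        yield (pc1, "o", x1, 0, last)


def reachable(initial):
    """Breadth-first exploration of the state-transition graph."""
    seen, queue = {initial}, deque([initial])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


INITIAL = ("o", "o", 0, 0, 1)  # assumption: "oo001" is the initial state
REACHABLE = reachable(INITIAL)
MUTEX = all(not (s[0] == "i" and s[1] == "i") for s in REACHABLE)

print(len(ALL_STATES))  # 72
print(MUTEX)            # True: both processes are never "in" at once
```

Only a fraction of the 72 syntactically possible states is reachable, which is why explicit exploration starts from the initial states rather than from the full product.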
State space blow up
The translation from a system description
to a state-transition graph usually involves
an exponential blow-up!
e.g., n boolean variables → 2^n states
This is called the “state-explosion problem.”
Temporal Logic Model Checking
M |= P
system model
system specification
satisfaction relation
Decisions when choosing system properties:
• operational vs. declarative: automata vs. logic
• may vs. must: branching vs. linear time
• prohibiting bad vs. desiring good behavior: safety vs. liveness
System Properties/Specifications
- Atomic propositions: properties of states
- (Linear) Temporal Logic Specifications: properties of traces.
Examples of the Robot Control Properties
• Configuration Validity Check:
If an instance of EndEffector is in the “FollowingDesiredTrajectory” state, then the instance of the corresponding Arm class is in the “Valid” state
Always((ee_reference=1) -> (arm_status=1))
• Control Termination: Eventually the robot control terminates
Eventually(abort_var=1)
What is “satisfy”?
M satisfies P if all the reachable states satisfy P
Different Algorithms to check if M |= P.
- Explicit State Space Exploration
For example: Invariant checking Algorithm.
1. Start at the initial states and explore the states of M using DFS or BFS.
2. In any state, if P is violated then print an “error trace”.
3. If all reachable states have been visited then say “yes”.
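The three steps above can be sketched as a breadth-first search that records parent pointers, so that a violation can be reported as an error trace. The function name and the toy counter model are illustrative assumptions.

```python
from collections import deque


def check_invariant(initial_states, successors, invariant):
    """Return (True, None) if every reachable state satisfies the invariant,
    otherwise (False, trace) with a path from an initial state to a violation."""
    parent = {s: None for s in initial_states}
    queue = deque(initial_states)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            # Step 2: rebuild the error trace by following parent pointers.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    # Step 3: all reachable states visited without a violation.
    return True, None


# Toy model (hypothetical): a counter modulo 8 that steps by 2, starting at 0.
def step(n):
    return [(n + 2) % 8]


print(check_invariant([0], step, lambda n: n % 2 == 0))  # (True, None)
print(check_invariant([0], step, lambda n: n != 4))      # (False, [0, 2, 4])
```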
Abstractions
• They are one of the most useful ways to fight the state
explosion problem
• They should preserve properties of interest:
properties that hold for the abstract model should hold
for the concrete model
• Abstractions should be constructed directly from
the program
Abstractions
• Why do we need to abstract?
– To reduce the number of states
– To represent (in a sound manner) infinite state
systems as finite state systems
• How do we abstract?
– By removing details irrelevant to verification
Data Abstraction
Given a program P with variables x1,...,xn, each over
domain D, the concrete model of P is defined over states
(d1,...,dn) ∈ D × ... × D
Choosing
• Abstract domain A
• Abstraction mapping (surjection) h: D → A
we get an abstract model over abstract states (a1,...,an) ∈ A × ... × A
Example
Given a program P with variable x over the integers
Abstraction 1:
A1 = { a–, a0, a+ }
h1(d) = a+ if d>0
        a0 if d=0
        a– if d<0
Abstraction 2:
A2 = { aeven, aodd }
h2(d) = if even( |d| ) then aeven else aodd
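The two abstraction mappings can be written down directly. In this sketch the abstract values are plain strings, and abstract_state is a hypothetical helper that applies h component-wise to a concrete state (d1,...,dn).

```python
def h1(d: int) -> str:
    """Sign abstraction: A1 = {a-, a0, a+}."""
    if d > 0:
        return "a+"
    if d == 0:
        return "a0"
    return "a-"


def h2(d: int) -> str:
    """Parity abstraction: A2 = {a_even, a_odd}."""
    return "a_even" if abs(d) % 2 == 0 else "a_odd"


def abstract_state(concrete, h):
    """Map a concrete state (d1,...,dn) to the abstract state (h(d1),...,h(dn))."""
    return tuple(h(d) for d in concrete)


print(abstract_state((-3, 0, 8), h1))  # ('a-', 'a0', 'a+')
print(abstract_state((-3, 0, 8), h2))  # ('a_odd', 'a_even', 'a_even')
```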
Existential Abstraction
[Figure: the states of the concrete model M are mapped by h onto the states of the abstract model A; M < A]
Existential Abstraction
[Figure: concrete model M with states 1 to 7 and abstract model A with clusters [1], [2,3], [4,5], [6,7]; labels a to f]
Existential Abstraction
• Every trace of M is a trace of A
– A over-approximates what M can do
(Preserves safety properties!): A satisfies ϕ ⇒ M satisfies ϕ
• Some traces of A may not be traces of M
– May yield spurious counterexamples, e.g. < a, e >
• Eliminated via abstraction refinement
– Splitting some clusters into smaller ones
– Refinement can be automated
Original Abstraction
[Figure: concrete model M with states 1 to 7 and abstract model A with clusters [1], [2,3], [4,5], [6,7]; labels a to f]
Refined Abstraction
[Figure: concrete model M with states 1 to 7 and refined abstract model A with clusters [1], [2], [3], [4,5], [6,7]; the cluster [2,3] has been split; labels a to f]
How to define an abstract model
Given M (model) and ϕ (spec), choose
• Sh - a set of abstract states
• AP – a set of atomic propositions that label
concrete and abstract states
• h : S → Sh - a mapping from S onto Sh that
satisfies:
h(s) = h(t) only if L(s)=L(t)
Abstraction
Depending on h and the size of M, Mh (i.e., Ih,
Rh ) can be built using:
• BDDs or
• SAT solver or
• Theorem prover
Predicate Abstraction
[Graf/Saïdi 97]
• Idea: Only keep track of predicates on data
• Abstraction function:
Predicate Abstraction
Labeling of concrete states:
L(s) = { Pi | s |= Pi }
Abstract Model
• Abstract states are defined over Boolean
variables { B1,...,Bk }:
Example
Program over natural variables x, y
AP = { P1, P2, P3 }, where
P1 = x≤1 , P2 = x>y , P3 = y=2
AP = { x≤1 , x>y , y=2 }
For state s, where s(x)=s(y)=0: L(s) = { P1 }
For state t, where t(x)=1, t(y)=2: L(t) = { P1,P3 }
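The labeling in this example can be reproduced with a short sketch. The abstract function below assumes that each Boolean variable Bi simply records the truth value of Pi in the concrete state, which is the usual reading of predicate abstraction; the slide's own formula for the abstraction function is not present in the transcript.

```python
# Predicates from the example: P1 = x<=1, P2 = x>y, P3 = y=2.
PREDICATES = {
    "P1": lambda s: s["x"] <= 1,
    "P2": lambda s: s["x"] > s["y"],
    "P3": lambda s: s["y"] == 2,
}


def label(state):
    """L(s) = { Pi | s |= Pi }"""
    return {name for name, holds in PREDICATES.items() if holds(state)}


def abstract(state):
    """Boolean vector (B1, B2, B3), assuming Bi is the truth value of Pi."""
    return tuple(int(holds(state)) for holds in PREDICATES.values())


s = {"x": 0, "y": 0}
t = {"x": 1, "y": 2}
print(sorted(label(s)), abstract(s))  # ['P1'] (1, 0, 0)
print(sorted(label(t)), abstract(t))  # ['P1', 'P3'] (1, 0, 1)
```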
Example
Computing abstract transition
relation
(the same example)
Abstract transition relation
Example
Concrete States:
Predicates:
Abstract transitions?
Predicate Abstraction
Abstract Transitions:
Property:
Property holds. Ok.
Predicate Abstraction
Property:
Abstract Transitions:
This trace is spurious!
Predicate Abstraction
New Predicates:
CEGAR
Counter Example Guided Abstraction Refinement
CEGAR approach
[Figure: the Original Model M is given an Initial Abstraction; checking the abstraction ends with "Correct!" (Validation) or a Counterexample; a Spurious counterexample triggers Refinement]
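Abstraction Refinement Loop
1. Initial Abstraction: Actual Program → Concurrent Boolean Program
2. Verification: the Model Checker checks the Boolean Program
– Property holds: No error
– Counterexample: passed on to the Simulator
3. Simulation: the Simulator replays the Counterexample on the Actual Program
– Simulation successful: Bug found
– Spurious counterexample: passed on to Refinement
4. Refinement: Abstraction refinement, then back to Verification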
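The loop can be sketched on a small explicit-state model. This is a toy illustration under stated assumptions, not the SATABS or SLAM implementation: abstract states are clusters of concrete states, the initial abstraction separates bad from non-bad states, and refinement splits a cluster into the states actually reached by the simulation and the rest. All function names are hypothetical.

```python
from collections import deque


def abstract_counterexample(abs_init, abs_edges, bad):
    """Model checker: BFS over clusters; return a path to a bad cluster or None."""
    parent = {b: None for b in abs_init}
    queue = deque(abs_init)
    while queue:
        block = queue.popleft()
        if block & bad:
            path = []
            while block is not None:
                path.append(block)
                block = parent[block]
            return path[::-1]
        for a, b in abs_edges:
            if a == block and b not in parent:
                parent[b] = block
                queue.append(b)
    return None


def cegar(states, init, trans, bad):
    """Return ("safe", partition) or ("bug", abstract_path)."""
    # Initial abstraction: bad states and non-bad states are never merged.
    partition = [b for b in (frozenset(bad), frozenset(states - bad)) if b]
    while True:
        block_of = {s: b for b in partition for s in b}
        abs_edges = {(block_of[s], block_of[t]) for s, t in trans}
        path = abstract_counterexample({block_of[s] for s in init}, abs_edges, bad)
        if path is None:
            return "safe", partition          # property holds
        # Simulation: replay the abstract path on the concrete model.
        current = init & path[0]
        for i in range(len(path) - 1):
            nxt = {t for s, t in trans if s in current and t in path[i + 1]}
            if not nxt:
                # Spurious counterexample: split the cluster where the replay got stuck.
                partition.remove(path[i])
                partition += [frozenset(current), path[i] - current]
                break
            current = nxt
        else:
            return "bug", path                # simulation successful


# Hypothetical models: state 4 is bad; it is unreachable in the first model only.
print(cegar({1, 2, 3, 4}, {1}, {(1, 2), (3, 4)}, {4})[0])          # safe
print(cegar({1, 2, 3, 4}, {1}, {(1, 2), (2, 3), (3, 4)}, {4})[0])  # bug
```

In the first model the loop needs two refinements before the abstraction is precise enough to prove safety; in the second the final counterexample can be replayed concretely.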
Predicate Abstraction for Software
• Let’s take existential abstraction seriously
• Basic idea: with n predicates, there are 2^n x 2^n
possible abstract transitions
• Let’s just check them!
Existential Abstraction
Basic Block: i++;
[Figure: truth tables over the Predicates. A Query built from the Formula of the Basic Block asks whether a given Current Abstract State (p1, p2, p3) can step to a given Next Abstract State (p’1, p’2, p’3)]
Existential Abstraction
Basic Block: i++;
[Figure: the same Query, posed for the next Current Abstract State]
… and so on …
2^n x 2^n possible abstract transitions for n predicates
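A sketch of "just checking them" for the basic block i++. The three predicates and the 4-bit width are hypothetical (the slides do not list the actual predicates), and exhaustive enumeration of concrete values stands in for the theorem prover query.

```python
from itertools import product

WIDTH = 4                 # assumption: 4-bit unsigned i keeps enumeration small
MASK = (1 << WIDTH) - 1

# Hypothetical predicates p1, p2, p3 over i.
PREDICATES = [lambda i: i == 0, lambda i: i < 8, lambda i: i % 2 == 0]


def block(i):
    """The basic block: i++ on a WIDTH-bit unsigned variable."""
    return (i + 1) & MASK


def alpha(i):
    """Abstract state (p1, p2, p3) of the concrete value i."""
    return tuple(int(p(i)) for p in PREDICATES)


def query(current, nxt):
    """Is there a concrete i in `current` whose successor lies in `nxt`?"""
    return any(alpha(i) == current and alpha(block(i)) == nxt
               for i in range(MASK + 1))


n = len(PREDICATES)
abstract_states = list(product((0, 1), repeat=n))
queries = [(c, x) for c in abstract_states for x in abstract_states]
transitions = [(c, x) for c, x in queries if query(c, x)]

print(len(queries))  # 64 = 2^n x 2^n queries for n = 3
print(len(transitions))
```

Most of the queries come back negative, which is the cost the SAT-based approach later in the lecture aims to avoid.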
What is the problem?
Problem of existing tools:
• Large number of expensive theorem prover calls – slow
• Over-approximation yields additional, unnecessary spurious counterexamples
• Theorem prover works on natural numbers, but ANSI-C uses bit-vectors → false positives
• Most theorem provers support only few operators (+, -, <, ≤, …), no bitwise operators
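The mismatch between natural numbers and bit-vectors can be seen on a single predicate. The sketch assumes an 8-bit unsigned variable and the predicate i > 0, both chosen only for illustration: over the naturals, i > 0 is preserved by i++, while on bit-vectors it is not.

```python
WIDTH = 8                 # assumption: 8-bit unsigned variable
MASK = (1 << WIDTH) - 1


def increment_bitvector(i):
    """i++ on an unsigned WIDTH-bit variable (wraps around on overflow)."""
    return (i + 1) & MASK


# Over the naturals (sampled here, not proved): i > 0 implies i + 1 > 0.
holds_over_naturals = all(i + 1 > 0 for i in range(1, 1000))

# Over 8-bit values: collect every i > 0 for which i++ makes the predicate false.
violations = [i for i in range(1, MASK + 1) if not increment_bitvector(i) > 0]

print(holds_over_naturals)  # True
print(violations)           # [255]
```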
SAT-based approach
• Successfully used for abstraction of various designs
(Clarke, Kroening, Sharygina, Yorav – SAT-based
predicate abstraction)
• There is now a version of MSR tool (SLAM) that has it
– Found previously unknown Windows bug
• Create a SAT instance which relates initial value of
predicates, basic block, and the values of predicates
after the execution of basic block
• SAT also used for simulation and refinement
Use SAT solver!
SAT Approach: Given a propositional formula in CNF, find an assignment to Boolean variables that makes the formula true:
φ = ω1 ∧ ω2 ∧ ω3, where
ω1 = (x2 ∨ x3)
ω2 = (x1 ∨ x4)
ω3 = (x2 ∨ x4)
A = {x1=0, x2=1, x3=0, x4=1}
SATisfying assignment to φ!
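A brute-force sketch that checks the assignment A against the three clauses and enumerates all satisfying assignments. The clause polarities are taken as they appear in the transcript (any negation signs lost in extraction are not restored), and exhaustive enumeration is used here in place of a real SAT solver.

```python
from itertools import product

# CNF as a list of clauses; literal k stands for x_k, -k would stand for "not x_k".
# Assumption: polarities as they appear in the extracted slide text.
CNF = [[2, 3], [1, 4], [2, 4]]
NUM_VARS = 4


def satisfies(assignment, cnf):
    """True if every clause contains at least one literal made true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )


def all_models(cnf, num_vars):
    """Enumerate every satisfying assignment (exponential; illustration only)."""
    for bits in product((False, True), repeat=num_vars):
        assignment = dict(enumerate(bits, start=1))
        if satisfies(assignment, cnf):
            yield assignment


A = {1: False, 2: True, 3: False, 4: True}
MODELS = list(all_models(CNF, NUM_VARS))

print(satisfies(A, CNF))  # True
print(A in MODELS)        # True
```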
SAT-based Solution
Single query for Theorem Prover
Query for SAT
Our Solution
Query for SAT
Use SAT solver!
1. Generate query equation with predicates as free variables
2. Transform equation into CNF using Bit Vector Logic
One satisfying assignment matches one abstract transition
3. Obtain all satisfying assignments = most precise abstract transition relation
Our Solution
This solves two problems:
1. Now can do all ANSI-C integer operators, including *, /, %, <<, etc.
2. Sound with respect to overflow
No more unnecessary spurious counterexamples!
Performance
How does the performance compare with existing
approaches?
1. Runtime potentially exponential
2. Exponential part is inside of SAT solver,
instead of exponential number of
Theorem Prover calls
3. SAT solver is not re-started; all the learning and
pruning done by modern SAT solvers is retained
between iterations.
Performance
• Worst case:
all possible
assignments are
satisfying
• Runtime uncritical up
to 2^14 assignments
Performance
• A realistic experiment: two 32-bit variables, plus n predicates
• Various operators: +, <, shifting, xor, or, and, combinations
thereof, …
• All predicates are affected by basic block
No. of Predicates    Runtime (with 32-bit *)
4                    0.35 s
8                    7.20 s
16                   71.16 s
32                   512.72 s
Compare to 2^n x 2^n potential theorem prover calls!
Experimental Results
• Comparison of SLAM with Integer-based theorem prover against SAT-based SLAM
• 308 device drivers
• Timeout: 1200s
SATABS
• SATABS toolset – SAT-based predicate abstraction
• Download and use with Cadence SMV model checker
• Take a tcas program and verify it using SATABS
• Make a SHORT report following the steps of the assignment