Decision Procedures Customized for Formal Verification Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Contributions by former graduate students: Sanjit Seshia, Shuvendu Lahiri.




Outline

Context

Infinite state models of hardware systems

Verification techniques

Needs

Requirements for decision procedures

Dealing with quantifiers

Our Solution

SAT-based procedure

“Eager” Boolean encoding

CADE ’05

Verification Example

Task

Verify that a microprocessor correctly implements its instruction set definition

Even though it is heavily pipelined

[Figure: Alpha 21264 Microprocessor, from Microprocessor Report, Oct. 28, 1996]

Existing Hardware Verification Methods

Simulators, equivalence checkers, model checkers, …

All Operate at Bit Level

View each register or memory bit as a state variable

Behavior of each state variable defined by a Boolean function

Strengths

Finite-state systems are conceptually simple

BDDs & SAT procedures allow high degrees of automation

Limitations

State space can be very large

Can only verify a fixed instantiation of the system

Specific memory sizes, number of processes, buffer lengths, …

Verification Challenges

Sources of Complexity

Lots of internal state

Complex control logic

Opportunities

Most of the logic serves to store, select, and communicate data

[Figure: Alpha 21264 Microprocessor, from Microprocessor Report, Oct. 28, 1996]

Applying Data Abstraction to Hardware Verification

Idea

Abstract details of data encodings and operations

Keep control logic precise

Applications

Verify overall correctness of the system

Assuming individual functional units are correct

Advantages of Abstraction

An abstract infinite-state system is easier to verify than a detailed finite-state one

Parametric representation allows verification of many different system variants

Arbitrary number of processes, buffer lengths, etc.

Word Abstraction

[Figure: design partitioned into Control Logic (Com. Log. 1) and Data Path (Com. Log. 2)]

Data: Abstract details of form & functions

Control: Keep at bit level

Timing: Keep at cycle level

Data Abstraction #1: Bits → Terms

[Figure: bit vector $x_0, x_1, x_2, \ldots, x_{n-1}$ abstracted to a single symbolic word $x$]

View Data as Symbolic Words

Arbitrary integers

No assumptions about size or encoding

Classic model for reasoning about software

Can store in memories & registers

Abstracting Data Bits

[Figure: Control Logic and data path, with the combinational blocks Com. 1 and Com. 2 marked “?”]

What do we do about logic functions?

Abstraction #2: Uninterpreted Functions

[Figure: ALU block replaced by uninterpreted function $f$]

For any block that transforms or evaluates data:

Replace with generic, unspecified function

Only assumed property is functional consistency:

$a = x \wedge b = y \Rightarrow f(a, b) = f(x, y)$
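Since functional consistency is the only property kept, one standard way to eliminate an uninterpreted function (Ackermann's reduction, named here as a general technique rather than as UCLID's exact procedure) is to replace each distinct application with a fresh variable and add one consistency constraint per pair of applications. A minimal Python sketch; the argument-tuple representation and the fresh names `vf1`, `vf2`, … are hypothetical.

```python
from itertools import combinations

def ackermannize(fname, applications):
    """Replace each distinct application fname(args) with a fresh variable and
    return (fresh-variable map, functional-consistency constraints).

    applications: list of argument tuples, e.g. [("a", "b"), ("x", "y")].
    Each constraint says: if all arguments agree, the results agree.
    """
    distinct = dict.fromkeys(applications)          # de-duplicate, keep order
    fresh = {args: f"v{fname}{i + 1}" for i, args in enumerate(distinct)}
    constraints = []
    for args1, args2 in combinations(fresh, 2):
        premise = " and ".join(f"{p} = {q}" for p, q in zip(args1, args2))
        constraints.append(f"({premise}) => {fresh[args1]} = {fresh[args2]}")
    return fresh, constraints

fresh, cons = ackermannize("f", [("a", "b"), ("x", "y")])
print(fresh)  # {('a', 'b'): 'vf1', ('x', 'y'): 'vf2'}
print(cons)   # ['(a = x and b = y) => vf1 = vf2']
```

With k distinct applications this produces k(k-1)/2 constraints, which is the price of dropping the function symbol entirely.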

Abstracting Functions

[Figure: Control Logic and Data Path with their combinational logic blocks]

For any block that transforms data:

Replace by uninterpreted function

Ignore detailed functionality

Conservative approximation of the actual system

Abstraction #3: Modeling Memories as Mutable Functions

Memory M Modeled as Function

[Figure: address $a$ applied to memory $M$]

$M(a)$: value at location $a$

Initially, $M = m_0$

Arbitrary state

Modeled by uninterpreted function $m_0$

Effect of Memory Write Operation

Writing Transforms Memory

$M' = \mathrm{Write}(M, wa, wd)$

[Figure: multiplexer that outputs $wd$ when address $a$ equals $wa$ (input 1) and $M(a)$ otherwise (input 0)]

Express with Lambda Notation

$M' = \lambda a.\, \mathrm{ITE}(a = wa,\ wd,\ M(a))$

Reading from updated memory:

Address $wa$ will get $wd$

Otherwise get what’s already in $M$
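Python closures make the lambda expression executable: each write returns a new function that tests the written address and otherwise defers to the previous memory, so a sequence of writes becomes a chain of nested ITEs. A minimal sketch, assuming integer addresses and data; `m0` is a hypothetical concrete stand-in for the uninterpreted initial state.

```python
from typing import Callable

Memory = Callable[[int], int]

def write(mem: Memory, wa: int, wd: int) -> Memory:
    """M' = lambda a. ITE(a = wa, wd, M(a))."""
    return lambda a: wd if a == wa else mem(a)

# Arbitrary initial state m0; any function would do for the model.
def m0(a: int) -> int:
    return 100 + a

m1 = write(m0, 5, 42)
m2 = write(m1, 7, 13)
print(m2(5), m2(7), m2(3))  # 42 13 103
```

A later write to the same address shadows the earlier one, because the outermost ITE is checked first.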

Systems with Buffers

[Figure: Unbounded Buffer and Circular Queue, each with an “In Use” region]

Modeling Method

Mutable function to describe buffer contents

Integers to represent head & tail pointers

Parameterize buffer capacity with symbolic value Max
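A sketch of this modeling method for a circular queue, assuming integer data: the contents are a function updated with the same write-as-ITE pattern used for memories, head and tail are plain integers, and the capacity (the symbolic Max) is a parameter. The wrap-around uses only an equality test and an increment, in the spirit of counter arithmetic. The class and method names are hypothetical, and full/empty checks are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Queue:
    contents: Callable[[int], int]  # buffer contents, modeled as a function
    head: int                       # index of the oldest entry
    tail: int                       # index of the next free slot
    max_size: int                   # capacity (the symbolic Max in the model)

    def _succ(self, p: int) -> int:
        # ITE(p = Max - 1, 0, succ(p)): wrap around using equality and increment only
        return 0 if p == self.max_size - 1 else p + 1

    def push(self, d: int) -> "Queue":
        old, t = self.contents, self.tail
        # Same write pattern as memories: lambda a. ITE(a = tail, d, contents(a))
        updated = lambda a: d if a == t else old(a)
        return Queue(updated, self.head, self._succ(self.tail), self.max_size)

    def pop(self) -> Tuple[int, "Queue"]:
        # No empty/full checks: this is only a sketch of the data model.
        return self.contents(self.head), Queue(self.contents, self._succ(self.head),
                                               self.tail, self.max_size)

q = Queue(lambda a: 0, head=0, tail=0, max_size=3)
q = q.push(10).push(20)
first, q = q.pop()
q = q.push(30).push(40)  # the tail wraps around past the last slot
print(first, q.head, q.tail)  # 10 1 1
```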

Some History of Term-Level Modeling

Historically

Standard model used for program verification

Unbounded integer data types

Widely used with theorem-proving approaches to hardware verification

E.g., Hunt ’85

Automated Approaches to Hardware Verification

Burch & Dill, ’95

Tool for verifying pipelined microprocessors

Implemented by a form of symbolic simulation

Continued application to pipelined processor verification

UCLID

Seshia, Lahiri, Bryant, CAV ’02

Term-Level Verification System

Language for describing systems

Inspired by CMU SMV

Symbolic simulator

Generates integer expressions describing system state after a sequence of steps

Decision procedure

Determines validity of formulas

Support for multiple verification techniques

Available by Download

http://www.cs.cmu.edu/~uclid

Required Logic

Scalar Data Types

Formulas (F): Boolean expressions

Control signals

Terms (T): Integer expressions

Data values

Functional Data Types

Functions (Fun): Integer → Integer

Immutable: Functional units

Mutable: Memories

Predicates (P): Integer → Boolean

Immutable: Data-dependent control

Mutable: Bit-level memories

CLU Logic

Counter Arithmetic, Lambda Expressions and Uninterpreted Functions

Terms (T): Integer Expressions

$\mathrm{ITE}(F, T_1, T_2)$: If-then-else

$\mathit{Fun}(T_1, \ldots, T_k)$: Function application

$\mathit{succ}(T)$: Increment (to support pointer operations)

$\mathit{pred}(T)$: Decrement (to support pointer operations)

Formulas (F): Boolean Expressions

$\neg F$, $F_1 \wedge F_2$, $F_1 \vee F_2$: Boolean connectives

$T_1 = T_2$: Equation

$T_1 < T_2$: Inequality

$P(T_1, \ldots, T_k)$: Predicate application

CLU Logic (Cont.)

Functions (Fun): Integer → Integer

$f$: Uninterpreted function symbol

$\lambda x_1, \ldots, x_k .\, T$: Function definition

Predicates (P): Integer → Boolean

$p$: Uninterpreted predicate symbol

$\lambda x_1, \ldots, x_k .\, F$: Predicate definition
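To make the grammar concrete, here is a small Python evaluator for CLU expressions under one interpretation of the symbols. It is only a sketch: the tuple-based syntax and tag names are hypothetical, uninterpreted symbols are given arbitrary concrete Python functions, and a decision procedure must reason about all interpretations rather than evaluating one.

```python
from typing import Any, Dict

def evaluate(e: tuple, env: Dict[str, Any]) -> Any:
    """Evaluate a CLU term (-> int) or formula (-> bool) under one interpretation."""
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "ite":            # ITE(F, T1, T2)
        return evaluate(e[2], env) if evaluate(e[1], env) else evaluate(e[3], env)
    if tag == "succ":           # increment
        return evaluate(e[1], env) + 1
    if tag == "pred":           # decrement
        return evaluate(e[1], env) - 1
    if tag == "not":
        return not evaluate(e[1], env)
    if tag == "and":
        return evaluate(e[1], env) and evaluate(e[2], env)
    if tag == "or":
        return evaluate(e[1], env) or evaluate(e[2], env)
    if tag == "eq":
        return evaluate(e[1], env) == evaluate(e[2], env)
    if tag == "lt":
        return evaluate(e[1], env) < evaluate(e[2], env)
    if tag == "app":            # function or predicate application
        fn = env[e[1]]
        args = [evaluate(t, env) for t in e[2]]
        if isinstance(fn, tuple) and fn[0] == "lambda":   # lambda x1..xk . body
            _, params, body = fn
            return evaluate(body, {**env, **dict(zip(params, args))})
        return fn(*args)        # uninterpreted symbol: one concrete choice
    raise ValueError(f"unknown tag {tag!r}")

a = ("var", "a")
env = {
    "M0": lambda addr: 10 * addr,   # arbitrary stand-in for an uninterpreted m0
    "wa": 3, "wd": 7, "two": 2,
    # M1 = lambda a. ITE(a = wa, wd, M0(a))
    "M1": ("lambda", ["a"],
           ("ite", ("eq", a, ("var", "wa")), ("var", "wd"), ("app", "M0", [a]))),
}
print(evaluate(("app", "M1", [("succ", ("var", "two"))]), env))  # 7
print(evaluate(("app", "M1", [("var", "two")]), env))            # 20
```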


Verifying Safety Properties

[Figure: state machine with Present State, arbitrary Inputs, and Next State; state-space regions Reset States, Reachable States, and Bad States]

State Machine Model

State encoded as Booleans, integers, and functions

Next-state function expresses how the state is updated on each step

Prove: System will never reach a bad state

Bounded Model Checking

[Figure: Reset States, $R_1$, $R_2$, …, $R_n$, Reachable States, and Bad States]

Repeatedly Perform Image Computations

Set of all states reachable by one more state transition

Underapproximation of Reachable State Set

But, typically catch most bugs with 8–10 steps
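An explicit-state sketch of these image computations on a small finite system, with a Python set standing in for each $R_i$. Real BMC works symbolically; the toy transition relation (a counter that adds 1 or 5 modulo 16, modeling an arbitrary input choice) and the function names are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Optional, Set

def bmc_explicit(reset: Set[Hashable],
                 successors: Callable[[Hashable], Iterable[Hashable]],
                 bad: Callable[[Hashable], bool],
                 max_steps: int) -> Optional[int]:
    """Grow the reached set by one image computation per step.

    Returns the number of transitions after which a bad state is first
    reached, or None if no bad state appears within max_steps (an
    under-approximation: None proves nothing about longer runs).
    """
    reached = set(reset)
    for step in range(max_steps + 1):
        if any(bad(s) for s in reached):
            return step
        reached |= {t for s in reached for t in successors(s)}
    return None

def toy_successors(s: int) -> list:
    # Hypothetical system: an arbitrary input chooses to add 1 or 5 (mod 16).
    return [(s + 1) % 16, (s + 5) % 16]

print(bmc_explicit({0}, toy_successors, lambda s: s == 11, 10))  # 3
print(bmc_explicit({0}, toy_successors, lambda s: s == 11, 2))   # None
```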

Implementing BMC

[Figure: system unrolled for $n$ cycles from Reset with inputs $X_1, X_2, \ldots, X_n$; the Bad condition on the final state is checked: Satisfiable?]

Construct verification condition formula for step n by symbolically simulating system for n cycles

Check with decision procedure

Do as many cycles as tractable
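The following sketch mirrors this flow for a tiny system with one Boolean input per cycle. Unlike UCLID, it does not build a symbolic formula: it evaluates the n-cycle unrolling for each concrete input vector, with exhaustive enumeration standing in for the decision procedure. The function names and the toy next-state function are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

def simulate(reset_state: int, next_state: Callable[[int, bool], int],
             inputs: Sequence[bool]) -> int:
    """State after applying the next-state function once per input X1..Xn."""
    s = reset_state
    for x in inputs:
        s = next_state(s, x)
    return s

def check_step_n(reset_state: int, next_state: Callable[[int, bool], int],
                 bad: Callable[[int], bool], n: int) -> Optional[Tuple[bool, ...]]:
    """Is Bad(state after n cycles) satisfiable over the inputs X1..Xn?
    Enumerating all 2**n input vectors stands in for the decision procedure;
    a satisfying vector is a counterexample trace."""
    for xs in product([False, True], repeat=n):
        if bad(simulate(reset_state, next_state, xs)):
            return xs
    return None

# Hypothetical system: input X_i selects adding 5 (True) or 1 (False), mod 16.
toy_next = lambda s, x: (s + 5) % 16 if x else (s + 1) % 16
print(check_step_n(0, toy_next, lambda s: s == 11, 2))  # None
print(check_step_n(0, toy_next, lambda s: s == 11, 3))  # (False, True, True)
```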

True Model Checking

[Figure: Reset States, $R_1$, $R_2$, …, $R_n$, and Bad States]

Reach Fixed-Point

$R_n = R_{n+1} = \text{Reachable}$

Impractical for Term-Level Models

Many systems never reach a fixed point

Can keep adding elements to a buffer

Convergence test undecidable (Bryant, Lahiri, Seshia, CHARME ’03)
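A sketch of the fixed-point iteration on explicit state sets, assuming hypothetical toy systems. The finite counter converges; the unbounded counter, standing in for a buffer that can keep growing, never does, so the loop needs an external iteration bound.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Optional, Tuple

def reachable_fixed_point(reset: Iterable[Hashable],
                          successors: Callable[[Hashable], Iterable[Hashable]],
                          max_iters: Optional[int] = None
                          ) -> Tuple[Optional[FrozenSet[Hashable]], int]:
    """Iterate R_{i+1} = R_i ∪ Image(R_i) until R_{i+1} = R_i.

    Returns (reachable set, iterations). For finite-state systems the loop
    always terminates; max_iters guards systems that never converge, in
    which case (None, max_iters) is returned.
    """
    reached = frozenset(reset)
    i = 0
    while max_iters is None or i < max_iters:
        new = reached | {t for s in reached for t in successors(s)}
        if new == reached:          # fixed point: R_n = R_{n+1} = Reachable
            return reached, i
        reached, i = new, i + 1
    return None, i

# Finite toy system: a counter that adds 5 modulo 16 reaches all 16 values.
states, iters = reachable_fixed_point({0}, lambda s: [(s + 5) % 16])
print(len(states), iters)  # 16 15

# Unbounded toy system (like a buffer that keeps growing): never converges.
print(reachable_fixed_point({0}, lambda s: [s + 1], max_iters=100))  # (None, 100)
```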

Inductive Invariant Checking

[Figure: invariant region I, with Reset States, Reachable States, and Bad States]

Key Properties of System that Make It Operate Correctly

Formulate as formula $I$

Prove Inductive

Holds initially: $I(s_0)$

Preserved by all state changes: $I(s) \Rightarrow I(\delta(i, s))$, where $\delta$ is the next-state function and $i$ the inputs
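On a tiny finite system both proof obligations can be checked by brute force. The sketch below does exactly that, as a stand-in for a decision procedure; the system (a 4-bit counter that adds 2 or 4 depending on the input) and the candidate invariants are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable

def is_inductive(states: Iterable[int], inputs: Iterable[int],
                 initial: Iterable[int],
                 delta: Callable[[int, int], int],
                 inv: Callable[[int], bool]) -> bool:
    """Check I(s0) for all initial states and I(s) => I(delta(i, s)) for all
    states s and inputs i, by exhaustive enumeration."""
    if not all(inv(s0) for s0 in initial):
        return False
    return all(inv(delta(i, s)) for s, i in product(states, inputs) if inv(s))

# Hypothetical system: 4-bit counter that adds 2 or 4 depending on the input.
delta = lambda i, s: (s + 2 + 2 * i) % 16
even = lambda s: s % 2 == 0
print(is_inductive(range(16), [0, 1], [0], delta, even))              # True
print(is_inductive(range(16), [0, 1], [0], delta, lambda s: s != 7))  # False
```

The second candidate holds on every reachable state, yet it is not inductive: the unreachable state 5 satisfies it and steps to 7.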

Inductive Invariants

Formulas $I_1, \ldots, I_n$ are inductive invariants if:

$I_j(s_0)$ holds for any initial state $s_0$, for $1 \le j \le n$

$I_1(s) \wedge I_2(s) \wedge \cdots \wedge I_n(s) \Rightarrow I_j(s')$ for any current state $s$ and successor state $s'$, for $1 \le j \le n$

Overall Correctness

Follows by induction on time

Restricted form of invariants

$\forall x_1 \forall x_2 \ldots \forall x_k\, \Phi(x_1 \ldots x_k)$

$\Phi(x_1 \ldots x_k)$ is a CLU formula without quantifiers

$x_1 \ldots x_k$ are integer variables free in $\Phi(x_1 \ldots x_k)$

Express properties that hold for all buffer indices, register IDs, etc.

Proving Invariants

Proving invariants inductive requires quantifiers

$\models [\forall x_1 \forall x_2 \ldots \forall x_k\, \Phi(x_1 \ldots x_k)] \Rightarrow [\forall y_1 \forall y_2 \ldots \forall y_m\, \Psi(y_1 \ldots y_m)]$

Prove unsatisfiability of formula

$\forall x_1 \forall x_2 \ldots \forall x_k\, \Phi(x_1 \ldots x_k) \wedge \neg\Psi(y_1 \ldots y_m)$

Undecidable Problem

In logic with uninterpreted functions and equality

Invariant Checking: Out-of-Order Processor Designs

Design                 Total Invariants   UCLID time   Person time
base                   13                 54 s         2 days
exc                    34                 236 s        7 days
exc / br               39                 403 s        9 days
exc / br / mem-simp    67                 1594 s       24 days
exc / br / mem         71                 2200 s       34 days

Generating invariants requires considerable human effort

Impractical for realistic designs

Constructing Invariants from Predicates

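Predicates

$\mathit{rob.head} \le \mathit{reg.tag}(r)$

$\mathit{reg.valid}(r)$

$\mathit{reg.tag}(r) = t$

$\mathit{rob.dest}(t) = r$

Invariant

$\forall r, t.\ \mathit{reg.valid}(r) \wedge \mathit{reg.tag}(r) = t \Rightarrow (\mathit{rob.head} \le \mathit{reg.tag}(r) < \mathit{rob.tail} \wedge \mathit{rob.dest}(t) = r)$

Result: Correctness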

Automatic Predicate Abstraction

Graf & Saïdi, CAV ’97

Idea

Given set of predicates $P_1(s), \ldots, P_k(s)$

Boolean formulas describing properties of system state

View as abstraction mapping: States $\rightarrow \{0,1\}^k$

Defines abstract FSM over state set $\{0,1\}^k$

Form of abstract interpretation

Do reachability analysis similar to symbolic model checking

Early Implementations Inefficient

Guess at possible next abstract states

Test with call to decision procedure
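A brute-force sketch of predicate abstraction on a tiny concrete system: the abstraction map sends each state to its vector of predicate values, and abstract successors are computed by enumerating concrete states rather than by decision-procedure calls. The system and the two predicates are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, List, Set, Tuple

def abstract_reach(states: Iterable[int], inputs: Iterable[int],
                   initial: Iterable[int],
                   delta: Callable[[int, int], int],
                   preds: List[Callable[[int], bool]]) -> Set[Tuple[int, ...]]:
    """Reachable abstract states of the FSM induced by predicates P1..Pk.

    alpha(s) is the vector (P1(s), ..., Pk(s)) in {0,1}^k. An abstract state b'
    is a successor of b if some concrete s with alpha(s) = b steps to a state
    s' with alpha(s') = b'. Terminates because {0,1}^k is finite.
    """
    states, inputs = list(states), list(inputs)
    alpha = lambda s: tuple(int(p(s)) for p in preds)
    reached = {alpha(s) for s in initial}
    frontier = set(reached)
    while frontier:
        image = {alpha(delta(i, s))
                 for s, i in product(states, inputs) if alpha(s) in frontier}
        frontier = image - reached
        reached |= frontier
    return reached

delta = lambda i, s: (s + 2 + 2 * i) % 16        # hypothetical toy system
preds = [lambda s: s % 2 == 0, lambda s: s < 8]  # P1: even, P2: below 8
print(sorted(abstract_reach(range(16), [0, 1], [0], delta, preds)))
# [(1, 0), (1, 1)]
```

Every reachable abstract state has $P_1 = 1$, so concretizing the result yields the invariant "the counter is even".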

P.A. as Invariant Generator

[Figure: Abstract System (A) with Reset States, $R_1$, $R_2$, …, $R_n$; Concrete System (C) with Reset States and invariant $I$]

Reach Fixed-Point on Abstract System

Termination guaranteed, since finite state

Concretize

Equivalent to Computing Invariant for Concrete System

Strongest possible invariant that can be expressed by formula over these predicates

Symbolic Formulation of Predicate Abstraction

Lahiri, Bryant, Cook, CAV ’03

Basic Operation

Compute set of legal abstract next states $\psi(B')$ given current abstract states $\phi(B)$

$B, B'$: Abstract current and next-state state variables

$\phi, \psi$: Boolean formulas

Create formula of form $\Gamma(S, B')$

Possible combinations of current concrete state $S$ and next abstract state $B'$

Formulate as Quantifier Elimination Problem

Generate formula of form $\psi(B') \equiv \exists S\, \Gamma(S, B')$

$S$: Integer variables

For an interpretation of $B'$, formula $\psi$ is true iff $\Gamma(S, B')$ is satisfiable


Decision Procedure Needs

Bounded Model Checking

Satisfiability of quantifier-free CLU formula

Handled by decision procedure

Invariant Checking

Satisfiability of quantified CLU formula

Undecidable

Predicate Abstraction

Eliminate quantifiers from CLU formula

Role of Decision Procedure

Apply in sound, but incomplete way

UCLID Decision Procedure Operation

CLU Formula → Lambda Expansion → λ-free Formula → Function & Predicate Elimination → Term Formula → Finite Instantiation → Boolean Formula → Boolean Satisfiability

Series of transformations leading to propositional formula

Except for lambda expansion, each has polynomial complexity

SAT-based Decision Procedures

EAGER ENCODING

Input Formula → Satisfiability-preserving Boolean Encoder → Boolean Formula → SAT Solver → satisfiable / unsatisfiable

LAZY ENCODING

Input Formula → Approximate Boolean Encoder → Boolean Formula → SAT Solver

If the SAT solver reports unsatisfiable: unsatisfiable

If satisfiable: satisfying assignment → First-order Conjunctions SAT Checker

If the checker reports satisfiable: satisfiable

If the checker reports unsatisfiable: an additional clause is added to the Boolean Formula and the SAT solver runs again

Eager Encoding Characteristics

[Figure: Input Formula → Satisfiability-preserving Boolean Encoder → Boolean Formula → SAT Solver → satisfiable / unsatisfiable]

Disadvantage: Must encode all information about domain properties into Boolean formula

Some properties can give exponential blowup

Advantage: Lets SAT solver do all of the work

Good Approach for Some Domains

Modern SAT solvers have remarkable capacity

Good at extracting relevant portions out of very large formulas

Learn about formula properties as search proceeds

Encoding Methods

Difference Logic Formula → Small Domain Encoding (SD) or Per-Constraint Encoding (PC) → Boolean Formula → SAT Solver → satisfiable/unsatisfiable

Small Domain Encoding (SD)

[Bryant, Lahiri, Seshia, CAV’02]

[Figure: example difference logic formula over $x$, $y$, $z$, and $x+1$; each variable encoded as a bit vector padded with a leading 0 ($0x_1x_0$, $0y_1y_0$, $0z_1z_0$), with $x+1$ encoded as $0x_1x_0 + 1$]

Observation: To check satisfiability, need to consider all possible relative orderings of finitely many expressions

[Figure: sample relative orderings of $x$, $y$, $z$, and $x+1$, with values increasing left to right]

Can use Boolean encoding of finite range of values

4 values in this case, so 2-bit encoding
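Because only the relative ordering of the expressions matters, a finite range of values suffices, and each variable can then be encoded in ⌈log2 D⌉ bits for a range of D values. The sketch below assumes a hypothetical example formula $x \ge y \wedge y \ge z \wedge z \ge x + 1$ and a caller-supplied range size; it checks a conjunction by exhaustive search, standing in for the SAT solver that would run on the bit-level encoding.

```python
from itertools import product
from math import ceil, log2
from typing import Dict, List, Optional, Tuple

Constraint = Tuple[str, str, int]   # (u, v, c) means u - v >= c

def sat_small_domain(variables: List[str], constraints: List[Constraint],
                     domain_size: int) -> Optional[Dict[str, int]]:
    """Search for values in 0 .. domain_size-1 satisfying every constraint.

    Assumes domain_size is large enough that a satisfiable formula has a
    solution in this range; deriving such a bound soundly is the job of the
    actual SD method.
    """
    for values in product(range(domain_size), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(env[u] - env[v] >= c for u, v, c in constraints):
            return env
    return None

def bits_needed(domain_size: int) -> int:
    """Bits per variable for a binary encoding of domain_size values."""
    return max(1, ceil(log2(domain_size)))

# Hypothetical example: x >= y, y >= z, z >= x + 1.
cons = [("x", "y", 0), ("y", "z", 0), ("z", "x", 1)]
print(sat_small_domain(["x", "y", "z"], cons, 4))      # None
print(sat_small_domain(["x", "y", "z"], cons[:2], 4))  # {'x': 0, 'y': 0, 'z': 0}
print(bits_needed(4))                                   # 2
```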

Per-Constraint Encoding (PC)

[Strichman, Seshia, Bryant, CAV’02]

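[Figure: the same example constraints over $x$, $y$, $z$, and $x+1$, drawn as a constraint graph]

Overall Boolean Encoding: each original constraint becomes a Boolean variable ($e_1$, $e_2$, $e_3$), and the formula's Boolean structure over them is conjoined with transitivity constraints

New Difference Predicate: $e_4$, relating $x$ and $z$

Transitivity Constraints, e.g., $e_1 \wedge e_2 \Rightarrow e_4$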
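A sketch of how per-constraint encoding can derive transitivity constraints, assuming every predicate has the form $u - v \ge c$ and is named `e1`, `e2`, … . The example predicates $x \ge y$, $y \ge z$, $z \ge x + 1$ are a hypothetical stand-in. Each call performs one round; repeating rounds until no new predicates appear closes the set under transitivity, and that closure is the source of the worst-case blowup.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Pred = Tuple[str, str, int]   # (u, v, c) means the difference predicate u - v >= c

def transitivity_round(preds: Dict[str, Pred]) -> Tuple[Dict[str, Pred], List[str]]:
    """One round of transitivity reasoning for per-constraint encoding.

    For each chained pair e_i: u - v >= c1 and e_j: v - w >= c2:
      * if u != w, derive u - w >= c1 + c2, naming a new predicate if needed,
        and emit  e_i and e_j => e_new;
      * if u == w and c1 + c2 > 0, the pair is contradictory: emit not (e_i and e_j).
    Assumes predicates are named e1, e2, ... so fresh names do not collide.
    """
    preds = dict(preds)
    name_of = {t: n for n, t in preds.items()}
    constraints: List[str] = []
    for (n1, (u, v, c1)), (n2, (v2, w, c2)) in permutations(list(preds.items()), 2):
        if v != v2:
            continue
        if u == w:
            if c1 + c2 > 0:
                constraints.append(f"not ({n1} and {n2})")
            continue
        target = (u, w, c1 + c2)
        if target not in name_of:
            name = f"e{len(preds) + 1}"
            preds[name] = target
            name_of[target] = name
        constraints.append(f"{n1} and {n2} => {name_of[target]}")
    return preds, constraints

# Hypothetical example predicates: x >= y, y >= z, z >= x + 1.
preds = {"e1": ("x", "y", 0), "e2": ("y", "z", 0), "e3": ("z", "x", 1)}
preds1, round1 = transitivity_round(preds)
print(round1)          # ['e1 and e2 => e4', 'e2 and e3 => e5', 'e3 and e1 => e6']
print(preds1["e4"])    # ('x', 'z', 0)
preds2, round2 = transitivity_round(preds1)
print("not (e4 and e3)" in round2)  # True
```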

Size of Boolean Encoding: SD better than PC

Let $N$ be size of original difference logic formula

Size of a directed acyclic graph representation

SD encoding size is worst-case $O(N^2)$

PC encoding size is worst-case $O(2^N)$

Can generate $O(2^N)$ transitivity constraints

Example: $N = 6813$

Method   Boolean Encoding Size
PC       > 1000000
SD       54465

Impact on SAT problem: SD vs PC

Experimentally compared zChaff performance on SD and PC encodings of several unsatisfiable formulas

Sample result:

Method   # Boolean variables   # CNF Clauses   # Conflict Clauses   zChaff Time (sec)
PC       57211                 169387          150                  0.56
SD       23112                 67699           15811                21.63

PC better than SD for zChaff

How to Choose Encoding

Hybrid Strategy

Partition variables into classes

Based on which ones are compared to each other

For each class, choose encoding method

Use PC, except SD when PC blows up

How to Determine Whether PC Will Work

Try to predict based on formula characteristics

Number of constraints, density, …

Selection procedure trained by machine learning
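The partitioning step can be sketched with union-find: two variables land in the same class when they are compared, directly or through a chain of comparisons. The selection predicate below is a hypothetical placeholder passed in by the caller, not the trained procedure the slide mentions.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def partition_classes(variables: Iterable[str],
                      comparisons: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Put two variables in the same class when they are compared to each
    other, directly or through a chain of comparisons (union-find)."""
    parent = {v: v for v in variables}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in comparisons:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    classes: Dict[str, List[str]] = {}
    for v in parent:
        classes.setdefault(find(v), []).append(v)
    return list(classes.values())

def choose_encodings(classes: List[List[str]],
                     pc_would_blow_up: Callable[[List[str]], bool]
                     ) -> Dict[Tuple[str, ...], str]:
    """PC for each class, except SD where the predictor expects PC to blow up."""
    return {tuple(c): ("SD" if pc_would_blow_up(c) else "PC") for c in classes}

classes = partition_classes(["x", "y", "z", "p", "q"], [("x", "y"), ("y", "z"), ("p", "q")])
print(classes)  # [['x', 'y', 'z'], ['p', 'q']]
# Hypothetical stand-in for the trained selection procedure.
print(choose_encodings(classes, lambda c: len(c) > 2))
# {('x', 'y', 'z'): 'SD', ('p', 'q'): 'PC'}
```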

Some Lessons We’ve Learned About Decision Procedures

Preserve Boolean Structure

Other approaches require collapsing to conjunctions of predicates (or extracting them dynamically)

Exploit Problem Characteristics

Sparseness

Polarity structure

Let SAT Solver Do the Work

Eager encoding: provide sufficient set of constraints to prove / disprove formula

SAT solvers are good at digesting large volumes of information

Invariant Checking Revisited

Prove Unsatisfiability of Formula

$\forall x_1 \forall x_2 \ldots \forall x_k\, \Phi(x_1 \ldots x_k) \wedge \neg\Psi(y_1 \ldots y_m)$

General Form: $\forall X\, \Phi(X) \wedge \neg\Psi(Y)$

Quantifier Instantiation

Generate expressions $E_1(Y), \ldots, E_n(Y)$

Using terms that appear in the formula

Expand as $\Phi(E_1(Y)) \wedge \cdots \wedge \Phi(E_n(Y)) \wedge \neg\Psi(Y)$

If unsatisfiable, then so is the quantified formula

Sound, but incomplete

Trade-off

Be clever about instantiation, or

Instantiate many terms and rely on decision procedure capacity
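A sketch of quantifier instantiation on a hypothetical example where $\Phi(x)$ is $f(x) \ge x$ and $\neg\Psi(y)$ is $f(f(y)) < y$. Instantiating $x$ with the terms $y$ and $f(y)$ yields a quantifier-free conjunction. The sketch then treats each ground term as an integer variable, which drops functional consistency and so can only make the check more permissive (an "unsatisfiable" answer is still trustworthy), and decides the difference constraints with Bellman-Ford negative-cycle detection, a standard algorithm rather than UCLID's SAT-based procedure.

```python
from typing import Callable, List, Tuple

Constraint = Tuple[str, str, int]   # (a, b, c) means a - b >= c

def instantiate(phi: Callable[[str], List[Constraint]], terms: List[str]) -> List[Constraint]:
    """Replace forall x. Phi(x) by Phi(E1) and ... and Phi(En) for candidate terms Ei."""
    return [c for e in terms for c in phi(e)]

def satisfiable(constraints: List[Constraint]) -> bool:
    """Decide a conjunction of difference constraints with Bellman-Ford.

    a - b >= c is rewritten as b - a <= -c, an edge a -> b of weight -c;
    the conjunction is unsatisfiable iff the graph has a negative cycle.
    Starting every distance at 0 plays the role of a virtual source.
    """
    nodes = {t for a, b, _ in constraints for t in (a, b)}
    dist = {n: 0 for n in nodes}
    edges = [(a, b, -c) for a, b, c in constraints]
    for _ in range(len(nodes) + 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True
    return False

# Hypothetical example: Phi(x) is f(x) >= x, and not-Psi(y) is f(f(y)) < y.
phi = lambda x: [(f"f({x})", x, 0)]
neg_psi = [("y", "f(f(y))", 1)]       # f(f(y)) < y  <=>  y - f(f(y)) >= 1
terms = ["y", "f(y)"]                  # terms taken from the formula

print(satisfiable(neg_psi))                            # True
print(satisfiable(instantiate(phi, terms) + neg_psi))  # False
```

Instantiating with $y$ alone would leave the conjunction satisfiable, which illustrates the incompleteness: the outcome depends on choosing enough of the right terms.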

Predicate Abstraction Revisited

Formulate as Quantifier Elimination Problem

Generate formula of form $\psi(B') \equiv \exists S\, \Gamma(S, B')$

$S$: Integer variables

Use Eager SAT Encoding of $\Gamma$

Get formula $\exists A\, P(A, B')$

$A$: Boolean variables

Satisfying solutions for $P$ w.r.t. $B'$ same as those for $\Gamma$

Core problem of symbolic model checking

Quantifier Elimination for P.A.

Formula $\exists A\, P(A, B')$

$A$: Boolean variables

Typically: 200+ variables for $A$, ~20 for $B'$

BDD-Based

Use partitioning techniques developed for symbolic model checking

Typically too many total Boolean variables

SAT Enumeration

Find satisfying solution $\alpha(A) \wedge \beta(B')$ to $P$

Enumerate solution $\beta(B')$

Reformulate $P$ as $P \wedge \neg\beta(B')$

Performance: about 1000 solutions / second
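The SAT-enumeration loop can be sketched as follows. A brute-force search stands in for the SAT solver so the example stays self-contained; a real implementation would add the blocking clause $\neg\beta(B')$ to the solver's clause database instead. The function names and the toy formula `p` are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Set, Tuple

Bits = Tuple[bool, ...]

def find_solution(p: Callable[[Bits, Bits], bool], n_a: int, n_b: int,
                  blocked: Set[Bits]) -> Optional[Tuple[Bits, Bits]]:
    """Stand-in for a SAT solver on P(A, B') with every blocked beta excluded."""
    for alpha in product([False, True], repeat=n_a):
        for beta in product([False, True], repeat=n_b):
            if beta not in blocked and p(alpha, beta):
                return alpha, beta
    return None

def exists_project(p: Callable[[Bits, Bits], bool], n_a: int, n_b: int) -> Set[Bits]:
    """Compute {beta | exists A. P(A, beta)}: find a solution alpha(A), beta(B'),
    record beta(B'), reformulate P as P and not beta(B'), repeat until UNSAT."""
    blocked: Set[Bits] = set()
    while (sol := find_solution(p, n_a, n_b, blocked)) is not None:
        blocked.add(sol[1])
    return blocked

# Hypothetical P: b0 <-> (a0 and a1), b1 <-> (a0 or a1).
p = lambda a, b: b[0] == (a[0] and a[1]) and b[1] == (a[0] or a[1])
print(sorted(exists_project(p, 2, 2)))
# [(False, False), (False, True), (True, True)]
```

Each solver call contributes exactly one new $\beta(B')$, so the cost grows with the number of satisfying abstract states rather than with the number of variables in $A$.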

Why Verification Tasks Feasible

CLU Logic Fairly Simple

Equality, uninterpreted functions, difference constraints

Small model property

“Deep” Reasoning Not Required

Formulas large and messy, but straightforward

Verifying systems that are designed to have constrained behaviors

Only checking effect of a few cycles of system operation

Decision Procedures Revisited

SAT-Based Approaches Effective

Good performance as decision procedures

Key to implementing predicate abstraction

Quantifier elimination

Eager Encoding Gives Good Performance

Avoids many iterations of theory-specific checkers

Extends to linear integer arithmetic

Seshia & Bryant, LICS ’04

Quantifier-free Presburger

Small domain encoding exploiting sparseness

Areas of Research

Bit-Vector Decision Procedures

True model for hardware & low-level software

Bit-field extraction

Bit-wise Boolean operations

Overflow effects

Automatically apply abstractions

Abstract to symbolic terms whenever possible

Boolean Quantifier Elimination

SAT enumeration still not good enough

Limits predicate abstraction to ~25 predicates

Core problem for symbolic model checking

More Research

Proof Generation

Hard to see how to generate unsatisfiability proof for CLU formula

Debugging Support

Bounded model checking: provide counterexample trace

Invariant checking: hard to determine why invariant fails

And failure may be due to weakness in quantifier instantiation

Predicate abstraction: Gets nowhere without the right set of predicates

Proving Liveness

Current abstractions do not preserve liveness properties

Can help in proving progress invariant

Questions?