Automatic Safety Verification of Transaction-Based Database Applications


A Path Sensitive Type System
for Resource Usage Verification
of C-like Languages
Korea Advanced Institute of Science and Technology
Hyun-Goo Kang, Youil Kim,
Taisook Han, Hwansoo Han
APLAS05
Outline

Problem & Goal

Type System

Conclusion
Resource Usage Protocol

A program should use resources in a valid way.

Such a protocol is usually specified as the set of valid
sequences of actions on the resource, a set that is
recognizable by a finite state machine.

Example
– A file should be opened before it is written.
– A memory cell should not be accessed after deallocation.
– An acquired lock should eventually be released.
– …
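Example
[ Program 1 ]
#include <stdio.h>
int main(void) {
    FILE* fp = fopen("f", "w");
    fprintf(fp, "x");    /* bug: fp is NULL if fopen fails */
    fclose(fp);
    return 0;
}
[ Program 2 ]
#include <stdio.h>
int main(void) {
    FILE* fp = fopen("f", "w");
    if (fp) {            /* only on the path where fopen succeeded */
        fprintf(fp, "x");
        fclose(fp);
    }
    return 0;
}
If a program analyzer assumes that fopen always opens the specified file, it misses the bug in Program 1, and it raises a false alarm on Program 2, where the false branch of if (fp) would appear to leave an opened file unclosed.
Path Sensitivity is Essential!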
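A Path-Sensitive Specification in FA (Finite Automaton)
– Closed --fopen {ret<=0}--> Closed
– Closed --fopen {ret>0}--> Opened
– Opened --read/write--> Opened
– Opened --close--> Closed
– Opened --fopen--> Error
– Closed --read/write/close--> Error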
Related Works

Path-insensitive verification: actions in the finite
automaton specification are limited to syntactically
identifiable sets
– Resource Usage Analysis (Igarashi & Kobayashi)
– Vault (DeLine & Fahndrich)

Path-sensitive but whole-program analysis
– SLAM (Ball & Rajamani @ MSR), BLAST (Henzinger et al. @
UCB)
– ESP (Das et al. @ MSR)

Path-sensitive and modular, but unsound
– Saturn (Yichen Xie, Alex Aiken @ Stanford)
Our Goal is

To design a path-sensitive resource usage
analysis

To design it as a modular analysis, for
modular specification/verification and
scalability

To design it as an automatic and sound
analysis
Observations

Path sensitivity is essential.
Values that identify paths are mainly constants,
limited to a few simple integer values.
A pointer to a file-like resource is normally used only
as a reference.
Intraprocedural aliasing of resources is common, but
interprocedural aliasing of resources is rare.
Resource allocation rarely appears within loops.
Even when it does, every resource allocated in the
loop should be deallocated or should satisfy the same
specification.
Selected Abstraction

Domain abstraction
– Resource states are traced at the concrete level (no abstraction;
the set of states is finite).
– Values that identify paths are traced with a constant propagation
lattice.

Join at merge point
– If the resource contexts from different paths differ, we
collect (union) them as a set.
– Otherwise we take the normal join ($\sqcup$) over our lattice types.

Resource identification
– Resources are identified by their allocation points: all resources
allocated at the same program point should satisfy the same
resource usage specification.

Tracing resources
– Alias information is traced path-sensitively within a
function body, under the assumption of no interprocedural aliasing.
Outline

Problem & Goal

Type System

Future work / Conclusion
Our Type System

Type ≈ lattice element instrumented with
type variables

Basically a subtype system (bounded
polymorphism)

We add flow and path sensitivity.
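Domain Design (Basic Types)
Sign: $\top$ above $MZ$, $MP$, $ZP$; below these, $M$, $Z$, $P$
Resource id: $\top$ above $NR$ (not a resource) and $RC$ (resource); $r_1, \dots, r_n$ below $RC$
Allocation state: $\top$ above $AL$ (allocated) and $NA$ (not allocated)
Resource state: $\top$ above $C$ (closed) and $O$ (opened)
value $=$ resource id $\times$ sign
state of a resource $=$ resource $\times$ allocation state $\times$ resource state
$A \vdash X_1 \sqsubseteq X_2$ if $X_1 \sqsubseteq X_2 \in Bas$ or $X_1 \sqsubseteq X_2 \in A$ (closed under transitivity), e.g.
$\emptyset \vdash P \sqsubseteq MP$
$\{\beta \sqsubseteq P\} \vdash \beta \sqsubseteq MP$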
Domain Design (Resource Heap)

The natural definition of a resource heap would be
– resource id $\to$ (allocation state, resource state)
But we are interested only in the resources
related to the function being inferred.
– constrained heap: $\{\} \sqsupseteq \{(r_1,AL,O)\} \sqsupseteq \{(r_1,AL,O),(r_2,NA,C)\}$, where $concretize(\{(r_1,AL,O)\}) = \{h \mid h(r_1) = open\}$
– heap update history, e.g. $H\cdot[r_1 \mapsto (AL,O)]$ and $H\cdot[r_1 \mapsto (NA,C)]\cdot[r_1 \mapsto (AL,O)]$
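Domain Design (Set of Paths)

Input Path ($A$)
– a set of constraints over all type variables (an input partition), e.g. $\{\beta_1 \sqsubseteq P,\ \rho_1 \sqsubseteq RC,\ \eta \sqsubseteq \{(\rho_1,AL,O)\}\}$
– order is defined as $\dfrac{A_1 \vdash A_2}{\vdash A_1 \sqsubseteq A_2}$

Output Paths ($\Delta$)
– set of outputs: $\{(v_1,\Gamma_1,H_1), \dots, (v_n,\Gamma_n,H_n)\}$
– order is defined as $\dfrac{\forall (v,\Gamma,H) \in \Delta_1.\ \exists (v',\Gamma',H') \in \Delta_2.\ A \vdash v \sqsubseteq v' \wedge A \vdash \Gamma \sqsubseteq \Gamma' \wedge A \vdash H \sqsubseteq H'}{A \vdash \Delta_1 \sqsubseteq \Delta_2}$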
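Input Path Partitioning / Merging
[Figure: at a branch on $x > 0$ with $\Gamma(x) = (\rho,\beta)$, the input path $\{\beta \sqsubseteq \top, \rho \sqsubseteq \top, \eta \sqsubseteq \{\}\}$ is partitioned into $\{\beta \sqsubseteq MZ, \rho \sqsubseteq \top, \eta \sqsubseteq \{\}\}$ and $\{\beta \sqsubseteq P, \rho \sqsubseteq \top, \eta \sqsubseteq \{\}\}$; the two partitions, with outputs $\Delta_1$ and $\Delta_2$, are merged back with output $\Delta_1 \sqcup \Delta_2$.]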
v>, vNR, v{}
v>, vRC, v{(,NA,>)}
{ v>, v>, v{} }
v>, vRC, v{(,AL,C)}
, 
v>, vRC, v{(,AL,O)}
(x) = (,)
(x)=(R,D)
close x
A ` R v RC
A ` H v {(R,AL,O)}
A,,H ` close x : {(Z,,H¢[R  (AL,C)])}
error “not resource”
error “not allocated”
error “not opened”
{ (Z, , ¢[  (AL,C)]) }
APLAS05
16
v>, vNR, v{}
v>, vRC, v{(,NA,>)}
{ v>, v>, v{} }
v>, vRC, v{(,AL,O)}
, 
v>, vRC,  v{(,AL,C)}
(x) = (,)
open x
(x)=(R,D)
A ` RvRC
A ` H v {(R,AL,C)}
A,,H ` open x : {(P,,H¢[R (AL,O)]),
(Z,,H) }
error “not resource”
error “not allocated”
error “not closed”
P,,H¢[  (AL,O)]
Z,,H
APLAS05
17
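Domain Design (Function Type)

A set of input path ($A$) / output paths ($\Delta$) pairs:
– $\forall \rho,\beta,\eta.\ \{(A_1,\Delta_1), \dots, (A_n,\Delta_n)\}$
– order is defined as
$\dfrac{\forall (A_2,\Delta_2) \in ts_2.\ \exists (A_1,\Delta_1) \in ts_1.\ \vdash A_2 \sqsubseteq A_1 \wedge A_2 \vdash \Delta_1 \sqsubseteq \Delta_2 \qquad \forall (A_1,\Delta_1) \in ts_1.\ \exists (A_2,\Delta_2) \in ts_2.\ \vdash A_1 \sqsubseteq A_2 \wedge A_1 \vdash \Delta_1 \sqsubseteq \Delta_2}{A \vdash ts_1 \sqsubseteq ts_2}$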
Typing Example
open x : $\forall \rho,\beta,\eta.\ \{\rho \sqsubseteq RC,\ \eta \sqsubseteq \{(\rho,AL,C)\}\} \to \{((\rho,P),\ \eta\cdot[\rho \mapsto (AL,O)]),\ ((\rho,Z),\ \eta)\}$
close x : $\forall \rho,\beta,\eta.\ \{\rho \sqsubseteq RC,\ \eta \sqsubseteq \{(\rho,AL,O)\}\} \to \{((NR,Z),\ \eta\cdot[\rho \mapsto (AL,C)])\}$
use x : $\forall \rho,\beta,\eta.\ \{\rho \sqsubseteq RC,\ \eta \sqsubseteq \{(\rho,AL,O)\}\} \to \{((NR,Z),\ \eta)\}$
[Figure: step-by-step typing of a function f(x) whose body branches on x > 0 and involves use x, x = open x, a recursive call f(x) and close x; the input paths are partitioned by the sign and resource state of x, and iteration reaches a fixpoint for the type of f.]
Soundness

Theorem 1 [Correctness of Type System]
If a configuration C is typed,
then C is finished or it proceeds without a type error.
– Two main lemmas: subject reduction and progress

Theorem 2 [Correctness of Algorithm]
If $I(A,\Gamma,H,e) = \{(A_1,\Delta_1), \dots, (A_n,\Delta_n)\}$,
then $A_i,\Gamma,H \vdash e : \Delta_i$.
Implementation
We have implemented a prototype and
experimented with it on some C programs.

The prototype extends the algorithm in the
paper:
– It partitions input constraints more lazily.
– It handles global variables and heap storage.
– It detects resource leaks.
Ongoing and Future Work

Type-based dynamic allocation
Multiple error messages
Resource-type-based slicing
Modular pointer analysis specialized for this
problem
Specification language
Conclusion

We formalized a sound path-sensitive analysis for
resource usage protocols.

Our analysis is modular: it summarizes each
function as a type scheme, without requiring any
user annotations.

In the paper, we also showed how to handle dynamic
resource allocation and aliases.
Thank You
Demo
Related Works

Path-insensitive verification: actions in the finite automaton specification
are limited to syntactically identifiable sets
– Resource Usage Analysis (Igarashi & Kobayashi)
– Vault (DeLine & Fahndrich)

Path-sensitive but whole-program analysis
– SLAM (Ball & Rajamani @ MSR), BLAST (Henzinger et al. @ UCB)
– C2BP, then model checking
– ESP (Das et al. @ MSR)
– Ideas of selective join
– Lighter weight than SLAM/BLAST, but still a whole-program analysis

Path-sensitive and modular, but unsound
– Saturn (Yichen Xie, Alex Aiken @ Stanford)
– Program constructs → bit-level boolean constraints (equations)
– Inference → SAT solving
– Unsound: assumes no aliasing between arguments; unrolls loops a finite number of times
– Blind summary: not symbolic (their optimization: slicing the query-dependent part
after the whole equation is generated)
Ongoing and Future Work

Type-based dynamic allocation
– $\eta \sqsubseteq \{(r_i,NA,X)\} \to \eta\cdot[r_i \mapsto (AL,Y)]$, where $r_i$ is the program point of $alloc_i$
– $\eta \sqsubseteq \{(\rho,NA,X)\} \xrightarrow{alloc(\{\})} \eta\cdot[\rho \mapsto (AL,Y)]$, where $\rho$ is now the program point of the (instantiated) allocator function
Pointer / structure / array

Multiple error messages
– A better error recovery algorithm to remove the multiple false alarms caused by one bug

Resource-type-based slicing
– In the GCC package of the SPEC95 benchmark, there is a function that opens 15 files concurrently ($2^{15}$ paths); if we slice it based on the FILE* type, we can safely reduce the complexity of inference to $2 \times 15$.

Modular pointer analysis specialized for this problem
Specification language
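The arithmetic behind the slicing remark is shown below. It assumes that each of the 15 FILE* resources contributes an independent two-way split (open succeeded or failed) and that the sliced analysis handles each resource separately.

```latex
\underbrace{2 \cdot 2 \cdots 2}_{15\ \text{files}} = 2^{15} = 32768\ \text{paths (unsliced)}
\qquad\text{vs.}\qquad
\underbrace{2 + 2 + \cdots + 2}_{15\ \text{slices}} = 2 \times 15 = 30\ \text{paths (sliced)}
```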
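Alias
[Figure: at a branch on $x > 0$ with $\Gamma(x) = (\rho,\beta)$, the input path $\{\beta \sqsubseteq \top, \rho \sqsubseteq \top, \eta \sqsubseteq \{\}\}$ is partitioned into $\{\beta \sqsubseteq MZ, \rho \sqsubseteq \top, \eta \sqsubseteq \{\}\}$ with $[fp:(\rho_1,\beta_1)]$ and $\{\beta \sqsubseteq P, \rho \sqsubseteq \top, \eta \sqsubseteq \{\}\}$ with $[fp:(\rho_2,\beta_2)]$. The two results cannot be combined: they are kept apart by the no-interprocedural-alias assumption.]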
The Resource Language
Dynamic Semantics
Types
Typing Rule (Resource API)
output path generator
Resource Path Sensitive Join
Typing Rule (Branch)
input path generator
Typing Rule (Func Abstr / App)
input/output path generator
Typing Rule (others)
Retrospection (what's hard)

Being modular
– Managing/inferring the $\rho$, $\beta$, $\eta$ parts in a sound, symbolic
way is complex.

Designing a lazy input path (constraint)
partitioning algorithm
– The assumption sets do not form a complete Boolean lattice
(we do not have an exact complement $A^c$).