Binary Concolic Execution for Automatic Exploit Generation

Todd Frederick
Paradyn Project
Paradyn / Dyninst Week
Madison, Wisconsin
April 12-14, 2010
Vulnerabilities are everywhere…
An exploit
[Diagram: in 1988, Robert Morris (rtm) sends the bytes below to a Finger Server and obtains shell#]
DD8F2F736800DD8F2F6
2696ED05E5ADD00DD00
DD5ADD03D05E5CBC3B
The problem: exploiting vulnerable code
o Find an exploit state in a program
  o Use a known existing vulnerability
  o Previous work automatically finds vulnerable states [Giffin, Jha, Miller 2006]
o Find input that drives the program down a path to the exploit state
  o Analyze program control flow
  o Walk through the program, finding inputs to reach the current point
  o Explore paths in the program to reach the vulnerability
The problem
[Diagram: a Program that receives normal input and an exploit]
Assume we know of a vulnerability
Running example
[Diagram: the Program prompts "login:" and responds to two inputs]
login: good    Program responds: password:
login: bad     Program responds: Using backdoor!
Working with binary code
Program
8048282:  lea     0x4(%esp),%ecx
8048286:  and     $0xfffffff0,%esp
8048289:  pushl   0xfffffffc(%ecx)
804828c:  push    %ebp
804828d:  mov     %esp,%ebp
804828f:  push    %ebx
8048290:  push    %ecx
8048291:  sub     $0x10,%esp
8048294:  call    8048210 <prompt>
8048299:  mov     $0x3,%eax
804829e:  mov     $0x0,%ebx
80482a3:  mov     $0x80bd884,%ecx
80482a8:  mov     $0x10,%edx
80482ad:  int     $0x80
80482af:  mov     %eax,0xfffffff0(%ebp)
80482b2:  movzbl  0x80bd886,%eax
80482b9:  movsbl  %al,%edx
80482bc:  movzbl  0x80bd884,%eax
80482c3:  movsbl  %al,%eax
80482c6:  mov     %edx,%ecx
80482c8:  sub     %eax,%ecx
80482ca:  mov     %ecx,%eax
80482cc:  cmp     $0x2,%eax
80482cf:  jne     8048302 <main+0x80>
80482d1:  movzbl  0x80bd886,%eax
80482d8:  movsbl  %al,%edx
80482db:  movzbl  0x80bd885,%eax
80482e2:  movsbl  %al,%eax
80482e5:  mov     %edx,%ecx
80482e7:  sub     %eax,%ecx
80482e9:  mov     %ecx,%eax
80482eb:  cmp     $0x3,%eax
80482ee:  jne     8048302 <main+0x80>
80482f0:  movzbl  0x80bd886,%eax
80482f7:  cmp     $0x64,%al
80482f9:  jne     8048302 <main+0x80>
80482fb:  call    804825c <backdoor>        (exploit)
8048300:  jmp     8048307 <main+0x85>
8048302:  call    8048236 <login>
8048307:  mov     $0x1,%eax
804830c:  mov     $0x0,%ebx
8048311:  int     $0x80
8048313:  mov     %eax,0xfffffff4(%ebp)
8048316:  mov     $0x0,%eax
804831b:  add     $0x10,%esp
804831e:  pop     %ecx
804831f:  pop     %ebx
8048320:  pop     %ebp
8048321:  lea     0xfffffffc(%ecx),%esp
8048324:  ret
Conceptual approach
[Diagram: Program, Symbolic Execution, Generated Input]
o Run program, tracking variables as expressions instead of actual (concrete) values
o Collect expressions along the current path
o Find concrete input to satisfy these expressions
Conceptual approach
[Diagram: Program, Symbolic Executor, Path Conditions, Solver, Generated Input]
o Run program, tracking variables as expressions instead of actual (concrete) values
o Collect expressions along the current path
o Find concrete input to satisfy these expressions
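The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path condition is written as plain predicates over a 3-byte input (using the branch tests of the talk's running example), inputs are restricted to lowercase letters, and a brute-force search stands in for a real solver. The names `ALPHABET`, `path_condition` and `solve` are hypothetical.

```python
from itertools import product
import string

ALPHABET = string.ascii_lowercase  # assumption: inputs are lowercase letters

# Path condition collected along one path, written over the symbolic input
# bytes rather than over concrete values.
path_condition = [
    lambda inp: inp[2] - inp[0] == 2,
    lambda inp: inp[2] - inp[1] == 3,
    lambda inp: inp[2] == ord("d"),
]

def solve(conditions, length=3):
    """Return the first input string satisfying every condition, or None."""
    for candidate in product(ALPHABET, repeat=length):
        values = [ord(c) for c in candidate]
        if all(cond(values) for cond in conditions):
            return "".join(candidate)
    return None

generated_input = solve(path_condition)
print(generated_input)
```

Under these assumptions the only satisfying input is `bad`, the string the running example uses to reach the backdoor.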
Conceptual approach
[Diagram: Program, Symbolic Executor, Path Conditions, Solver, Generated Input, Path Selector]
o Exponential number of paths
o Limit and prioritize the paths we will explore
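To make "limit and prioritize" concrete: a program with n independent two-way branches has 2^n paths, so an explorer needs both an ordering and a cut-off. The sketch below is one hypothetical way to do that (best-first search with a budget on completed paths); `explore`, `score` and `budget` are illustrative names and not part of the system described in the talk.

```python
import heapq
from itertools import count

def explore(n_branches, score, budget):
    """Best-first exploration of branch-decision sequences.

    A path is a tuple of booleans (branch taken or not). Lower scores are
    explored first; at most `budget` complete paths are returned.
    """
    tie = count()  # tie-breaker so the heap never compares paths directly
    heap = [(score(()), next(tie), ())]
    explored = []
    while heap and len(explored) < budget:
        _, _, path = heapq.heappop(heap)
        if len(path) == n_branches:
            explored.append(path)
            continue
        for taken in (True, False):
            child = path + (taken,)
            heapq.heappush(heap, (score(child), next(tie), child))
    return explored

# Hypothetical priority: prefer paths on which more branches are taken.
prefer_taken = lambda path: sum(1 for taken in path if not taken)

paths = explore(3, prefer_taken, budget=3)
print(len(paths), "of", 2 ** 3, "paths explored:", paths)
```

The budget here counts completed paths only; a real path selector would more likely bound solver calls or wall-clock time.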
Traditional symbolic execution
[Flowchart: each test leads to the next test when true and to login() when false; the last test leads to backdoor() when true]
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] != 2
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] != 3
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3 &&
input[2] != 'd'
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3 &&
input[2] == 'd'
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
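The forking shown in the frames above can be reproduced with a short sketch. It assumes the control flow implied by the disassembly (every failed test jumps to login(), and only the path where all three tests hold reaches backdoor()) and keeps path conditions as plain strings; `TESTS` and `fork_all` are hypothetical names.

```python
# Branch tests of the running example, in program order.
TESTS = [
    ("input[2]-input[0]", "2"),
    ("input[2]-input[1]", "3"),
    ("input[2]", "'d'"),
]

def fork_all(tests):
    """Fork at every test; return (path_condition, target) for each path."""
    states = []
    prefix = []  # terms for the tests that held so far
    for expr, value in tests:
        # False side: the program leaves the chain and calls login().
        states.append((prefix + [f"{expr} != {value}"], "login"))
        # True side: continue to the next test with one more term.
        prefix = prefix + [f"{expr} == {value}"]
    states.append((prefix, "backdoor"))
    return states

for condition, target in fork_all(TESTS):
    print(" && ".join(condition), "->", target)
```

This chain of tests yields only four states, but every state must be kept alive at once, and programs with independent branches double the number of states at each branch.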
Problems with symbolic execution
• Must maintain exponentially many symbolic states
• Expressions may be difficult or infeasible to solve
Solution: Run program concretely and symbolically
[Diagram: Concrete Execution and Symbolic Execution combined into Concolic Execution]
Concolic execution overview
[Diagram: Program, Input, Instructions, Concrete Executor, Symbolic Executor, Path Conditions, Solver, Generated Input, Path Selector]
o Symbolic execution follows concrete path
o Some expressions use concrete values
Concolic execution
• Advantages
  • Track less state in parallel by following a single path at a time
  • Simplify expressions by substituting concrete values for difficult subexpressions
• Disadvantage
  • Concrete values only hold for a specific set of concrete inputs, so mixing concrete values and expressions may produce inaccurate expressions
Concolic execution example
Input: good
Symbolic Memory
buffer: (empty)
Concrete Memory
buffer: (empty)
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: good
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: g,o,o,d
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: good
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: g,o,o,d
Path Condition
input[2]-input[0] != 2
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: good
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: g,o,o,d
Path Condition
input[2]-input[0] == 2
Generated Input: egg
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
backdoor()
login()
Concolic execution example
Input: egg
Symbolic Memory
buffer: (empty)
Concrete Memory
buffer: (empty)
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: e,g,g
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: e,g,g
Path Condition
input[2]-input[0] == 2
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: e,g,g
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] != 3
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: e,g,g
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3
Generated Input: port
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
backdoor()
login()
Concolic execution example
Input: port
Symbolic Memory
buffer: (empty)
Concrete Memory
buffer: (empty)
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: p,o,r,t
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: p,o,r,t
Path Condition
input[2]-input[0] == 2
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: p,o,r,t
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: p,o,r,t
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3 &&
input[2] != 'd'
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: p,o,r,t
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3 &&
input[2] == 'd'
Generated Input: bad
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
backdoor()
login()
Concolic execution example
Input: bad
Symbolic Memory
buffer: (empty)
Concrete Memory
buffer: (empty)
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: b,a,d
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: b,a,d
Path Condition
input[2]-input[0] == 2
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: b,a,d
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: b,a,d
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3 &&
input[2] == 'd'
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
login()
backdoor()
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: b,a,d
Path Condition
input[2]-input[0] == 2 &&
input[2]-input[1] == 3 &&
input[2] == 'd'
read_input()
if( input[2]-input[0] == 2 )
if( input[2]-input[1] == 3 )
if( input[2] == 'd' )
backdoor()
login()
Success
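The walkthrough above (run concretely, record the path condition, negate the last branch, solve, re-run) can be condensed into a small loop. This is a sketch under assumptions, not the system from the talk: the program is modeled directly in Python, inputs are three lowercase letters, and a brute-force search replaces STP. `run_concretely`, `solve` and `concolic_search` are hypothetical names.

```python
from itertools import product
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase inputs only

# Each branch test as (description, predicate over the input byte values).
TESTS = [
    ("input[2]-input[0] == 2", lambda b: b[2] - b[0] == 2),
    ("input[2]-input[1] == 3", lambda b: b[2] - b[1] == 3),
    ("input[2] == 'd'",        lambda b: b[2] == ord("d")),
]

def run_concretely(text):
    """Run the program on a concrete input; return (target, branch outcomes)."""
    values = [ord(c) for c in text]
    outcomes = []
    for _, test in TESTS:
        taken = test(values)
        outcomes.append(taken)
        if not taken:
            return "login", outcomes
    return "backdoor", outcomes

def solve(constraints):
    """Brute-force stand-in for a solver: constraints are (predicate, wanted)."""
    for candidate in product(ALPHABET, repeat=3):
        values = [ord(c) for c in candidate]
        if all(pred(values) == wanted for pred, wanted in constraints):
            return "".join(candidate)
    return None

def concolic_search(seed, max_runs=10):
    """Repeatedly negate the last branch of the current path and re-run."""
    text, history = seed, []
    for _ in range(max_runs):
        target, outcomes = run_concretely(text)
        history.append((text, target))
        if target == "backdoor":
            return history
        # Every test seen on this path must now hold: the earlier ones
        # already did, and the last one (which failed) is flipped.
        constraints = [(TESTS[i][1], True) for i in range(len(outcomes))]
        text = solve(constraints)
        if text is None:
            break
    return history

history = concolic_search("good")
print(history)
```

Starting from `good`, this sketch reaches the backdoor with `bad`, but its intermediate inputs differ from the slides' `egg` and `port`: any assignment satisfying the path condition is acceptable, so the generated inputs depend on the solver.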
Inaccurate expressions
• Some variables depend on input
• Replacing these variables with concrete values may yield inaccurate expressions
• Solving an inaccurate path condition may produce input that does not take the desired path
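A small, made-up example shows how this failure arises. Everything here is an assumption for illustration: `hard` stands in for a subexpression the solver cannot handle, the branch `hard(x) + y == 20` is invented, and `solve` is a brute-force placeholder for a real solver.

```python
def hard(x):
    """Stand-in for a subexpression the solver cannot handle (assumption)."""
    return (x * x) % 251

def takes_branch(x, y):
    """The branch we want to take: hard(x) + y == 20."""
    return hard(x) + y == 20

def solve(pred, domain=range(32)):
    """Brute-force stand-in for a solver over two small integer inputs."""
    for x in domain:
        for y in domain:
            if pred(x, y):
                return x, y
    return None

# Concrete run with x=3, y=0: hard(3) evaluates to 9, so the symbolic
# executor records the simplified condition 9 + y == 20.
seed_x, seed_y = 3, 0
concrete = hard(seed_x)

# Inaccurate: the constant 9 is only valid while x stays 3, but nothing
# in the condition says so, and the solver is free to change x.
inaccurate = solve(lambda x, y: concrete + y == 20)

# One possible remedy (an assumption, not from the slides): also pin the
# input that the concrete value was derived from.
pinned = solve(lambda x, y: x == seed_x and concrete + y == 20)

print(inaccurate, takes_branch(*inaccurate))
print(pinned, takes_branch(*pinned))
```

The first solution satisfies the recorded condition yet misses the branch when the program is re-run, which is the situation the last bullet describes. Re-running the generated input concretely is what reveals the mismatch.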
Concolic execution system design
[Diagram: Program, Input, Instructions, Concrete Executor, Symbolic Executor, Path Conditions, Solver, Generated Input, Path Selector]
Concolic execution system design
[Diagram: Program, Input, Instructions, Concrete Executor (Dyninst, ProcControlAPI), Symbolic Executor (SymEval), Path Conditions, STP (Solver), Generated Input, Path Selector]
Concrete execution components
Concrete Executor
• Redirects program input
• Reads actual values of instruction operands
• Tracks path taken
Dyninst
• Assists with static analysis
ProcControlAPI
• Runs program using single-stepping or breakpoints
Symbolic execution components
Symbolic Executor
• Symbolic memory
• Identify input
• Update symbolic memory
• Extract conditional predicates
SymEval
• Represents instruction semantics as ASTs
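To give a feel for "instruction semantics as ASTs", the sketch below builds a tree for the first comparison of the running example (the `sub` and `cmp $0x2` sequence) and shows how a conditional predicate falls out of it. The node classes and helper functions are hypothetical and are not the SymEval API; sign extension and operand widths are deliberately ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class InputByte:
    index: int  # symbolic input[index]

@dataclass(frozen=True)
class Sub:
    left: object
    right: object

@dataclass(frozen=True)
class Eq:
    left: object
    right: object

def evaluate(node, data):
    """Evaluate an AST against concrete input bytes."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, InputByte):
        return data[node.index]
    if isinstance(node, Sub):
        return evaluate(node.left, data) - evaluate(node.right, data)
    if isinstance(node, Eq):
        return evaluate(node.left, data) == evaluate(node.right, data)
    raise TypeError(f"unknown node: {node!r}")

def show(node):
    """Render an AST in the notation used on the slides."""
    if isinstance(node, Const):
        return str(node.value)
    if isinstance(node, InputByte):
        return f"input[{node.index}]"
    if isinstance(node, Sub):
        return f"{show(node.left)}-{show(node.right)}"
    if isinstance(node, Eq):
        return f"{show(node.left)} == {show(node.right)}"
    raise TypeError(f"unknown node: {node!r}")

# Symbolic register contents after the loads, the sub, and the cmp.
edx = InputByte(2)              # load of the byte at 0x80bd886
eax = InputByte(0)              # load of the byte at 0x80bd884
ecx = Sub(edx, eax)             # mov %edx,%ecx ; sub %eax,%ecx
predicate = Eq(ecx, Const(2))   # cmp $0x2,%eax ; jne ...

print(show(predicate))
print(evaluate(predicate, b"good"), evaluate(predicate, b"bad"))
```

The same tree serves both purposes named on the slide: printed, it is a path-condition term; evaluated, it confirms which way a concrete input goes.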
Path searching components
Path Conditions
• One term for each branch taken
STP (Solver)
• Designed for program analysis applications
• Handles bit-vector data types
Path Selector
• Decides where to branch off from current path
• Is a depth-first search for now
• Other strategies will use static CFG analysis
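Why bit-vector support matters for binary code can be seen on the first comparison of the running example, where each byte is sign-extended with movsbl before a 32-bit subtraction. The sketch below models those two operations in Python and compares them with a naive model over unbounded integers; the helper names are hypothetical and nothing here uses STP itself.

```python
MASK32 = 0xFFFFFFFF

def movsbl(byte):
    """Sign-extend an 8-bit value to 32 bits, as movsbl does."""
    byte &= 0xFF
    return (byte - 0x100 if byte & 0x80 else byte) & MASK32

def sub32(a, b):
    """32-bit subtraction with wraparound."""
    return (a - b) & MASK32

def first_test(data):
    """Bit-vector model of the first comparison: input[2]-input[0] == 2."""
    return sub32(movsbl(data[2]), movsbl(data[0])) == 2

def first_test_unbounded(data):
    """Naive model over unbounded, unsigned integers."""
    return data[2] - data[0] == 2

tricky = bytes([0xFF, 0x00, 0x01])  # input[0] = 0xFF is -1 after movsbl
print(first_test(tricky), first_test_unbounded(tricky))
print(first_test(b"bad"), first_test_unbounded(b"bad"))
```

For printable input such as `bad` the two models agree, but for the byte 0xFF only the bit-vector model matches what the machine code computes, so a solver without fixed-width semantics would miss or misjudge such inputs.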
Previous Work in Binary Concolic Execution
• IDS signature generation [Song, et al. 2008]
  • Combined exploit strings to create signatures
  • Required an initial exploit, or a patch for the vulnerability
• Program testing [Godefroid, et al. 2008]
  • Created test cases with maximum code coverage in mind
  • Used instruction-level tracing for concrete execution
Potential Benefits of our Approach
• Our approach will be capable of finding the initial exploit
• We will do concrete execution with instrumentation, which gives us the flexibility to instrument selectively
• We plan to develop smarter path selection techniques using static control flow analysis
Status
• Concrete execution partially implemented using ProcControlAPI
  • Using standard input
  • Will support network and environment as inputs
• Symbolic execution and path selection not implemented yet
• Driving development of SymEval
  • Instruction semantics
  • AST simplification
Conclusion
Finding the first exploit with binary concolic execution using instrumentation
[Diagram: Program control flow through the three comparison blocks (cmp $0x2,%eax; cmp $0x3,%eax; cmp $0x64,%al), each with jne 8048302, leading to the Exploit]
movzbl  0x80bd886,%eax
cmp     $0x64,%al          input[2] == 'd'
jne     8048302
call    804825c            Exploit