MemSherlock: An Automated Debugger for Unknown Memory

Computer Science
Post-Attack Analysis of Unknown Vulnerabilities
Peng Ning
With Emre C. Sezer, Chongkyung Kil, and Jun Xu
Motivation
• Vulnerability analysis
– Essential for
• Patching
• Vulnerability based signature generation
– Painstakingly slow
• Depends on human effort
• Existing approaches
– Static analysis (e.g., [Chen et al. 04], [Feng et al. 04], [Larochelle & Evans 01])
• False positives
– Dynamic analysis (e.g., Minos [Crandall et al. 04], TaintCheck [Newsome & Song 05], DIRA [Smirnov & Chiueh 05])
• Used for detection; inadequate vulnerability information
– Symbolic execution (e.g., EXE [Cadar et al. 06], DACODA [Crandall et al. 05])
• Scalability issues
– Recovery (e.g., STEM [Sidiroglou et al. 05], SEAD [Locasto et al. 07])
• May change application semantics
Nov 14, 2007
2007 GMU-CSA Workshop
MemSherlock
• MemSherlock is an automated debugger
– Automated analysis of unknown memory corruption vulnerabilities
– Appeared in ACM CCS ’07
• MemSherlock provides
– Statement that causes the memory corruption
– Dynamic program slice leading to the corruption
– Program variables involved in the vulnerability
– All presented at the programming language level
• Implications
– Enables generation of vulnerability conditions
– Speeds up signature and patch generation
General Framework: Web Application Example
[Figure: components are the Program, a Logger recording its traffic, a Light-weight IDS acting as the trigger, a Replayer, and the Instrumented Program analyzed by MemSherlock]
MemSherlock Overview
• Goal: provide vulnerability information that is
– Intuitive and easy for the programmer to understand
• Not only the corruption point, but also:
– Slice of the program involved in the vulnerability
– Effects of user inputs
– Program variables involved
– Variable relationships (e.g., pointer aliasing)
– Type of vulnerability (e.g., stack buffer overflow)
• MemSherlock performs two important tasks
– Finding the corruption point
– Tracking program state
MemSherlock: Finding Corruption Point
• Observation: a memory object is modified by only a small set of statements (inspired by AccMon)
• For a memory object m, the write set WS(m) is the set of statements that legitimately modify m
• Security condition: memory object m should only be updated by statements in WS(m)
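As a minimal sketch, assuming a write set is stored as a plain array of <f,l> source locations (the type and function names below are hypothetical, not MemSherlock's), the security condition reduces to a membership test:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A statement, identified by its source location <f,l>. */
typedef struct {
    const char *file;
    int line;
} stmt_loc;

/* WS(m): the statements allowed to modify memory object m (hypothetical layout). */
typedef struct {
    const stmt_loc *stmts;
    size_t count;
} write_set;

/* Security condition: a write to m by statement s is legitimate only if s is in WS(m). */
bool write_is_legitimate(const write_set *ws, stmt_loc s)
{
    for (size_t i = 0; i < ws->count; i++) {
        if (ws->stmts[i].line == s.line && strcmp(ws->stmts[i].file, s.file) == 0)
            return true;
    }
    return false;
}
```

A real monitor needs this test to be fast, since it runs on every monitored write; a linear scan is enough here because write sets are small, which is the observation the condition rests on.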
MemSherlock: Assembly Line
• Pre-Debugging Phase
– Instruments the program for the debugging phase
– Extracts program information via static analysis
– Needs to be performed only once
• Debugging Phase
– Tracks program state
– Monitors memory writes and checks for violations of the security condition
– Tracks tainted data and its propagation
MemSherlock Architecture
[Figure: MemSherlock architecture, pre-debugging phase highlighted. Components: original C source files, library specification, source code rewriting, static analyzer, compiler, program executable, debugging information (proc, var, addr), debugging agent, malicious input, vulnerability information]
Pre-debugging: Generating Write Sets
• MemSherlock analyzes the source code to determine write sets
• For a program variable v, WS(v) includes
– Assignment statements (e.g., v = expr)
– Library function calls where v is passed as an argument that can be modified (e.g., memcpy(&v, src, n))
• MemSherlock treats DLLs as black boxes
– Assumption: a DLL is internally secure, but may be used insecurely from outside
• e.g., no stack overflows inside the library functions
• Reasonable for common, well-tested libraries (e.g., the C library)
– Requires library specifications
• For each DLL, a list of functions and the arguments they may modify
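A library specification can be pictured as a table mapping each black-box function to the pointer arguments it may write through. The sketch below is hypothetical: the table layout, its entries, and `lib_may_modify` are assumptions, not MemSherlock's actual specification format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical specification entry: bit i of modified_args is set if the
 * function may write through its i-th (0-based) pointer argument. */
typedef struct {
    const char *func;
    unsigned long modified_args;
} lib_spec_entry;

static const lib_spec_entry lib_spec[] = {
    { "memcpy",  1ul << 0 },  /* memcpy(dest, src, n) writes *dest */
    { "memset",  1ul << 0 },  /* memset(s, c, n) writes *s */
    { "strcpy",  1ul << 0 },  /* strcpy(dest, src) writes *dest */
    { "sprintf", 1ul << 0 },  /* sprintf(str, fmt, ...) writes *str */
    { "recv",    1ul << 1 },  /* recv(fd, buf, len, flags) writes *buf */
};

/* True if a call to func may modify the object passed as argument arg_index.
 * Functions missing from the table are assumed to write nothing, so a write
 * they do perform would later be reported (a false positive). */
bool lib_may_modify(const char *func, unsigned arg_index)
{
    for (size_t i = 0; i < sizeof lib_spec / sizeof lib_spec[0]; i++) {
        if (strcmp(lib_spec[i].func, func) == 0)
            return arg_index < 32 && ((lib_spec[i].modified_args >> arg_index) & 1ul);
    }
    return false;
}
```

With such a table, a call memcpy(&v, src, n) at line l would add l to WS(v), because argument 0 of memcpy is marked as modified.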
Dealing with Pointers
(a) Code example
1: int i = 0;
2: int *p = &i;
3: *p = 1;
4: p = NULL;

(b) Write sets after static analysis
WS(i) = {1}
WS(p) = {2,4}
WS(ref(p)) = {3}

(c) ref(p) and WS(i) during monitoring
Line | ref(p) | WS(i)
1    | N/A    | {1}
2    | i      | {1,3}
3    | i      | {1,3}
4    | NULL   | {1}
• For a pointer variable p, two write sets are kept
– WS(p): statements that modify p
– WS(ref(p)): statements that modify the referent (e.g., *p=5)
• ref(p) is resolved at runtime (during debugging); while ref(p) = i, the statements in WS(ref(p)) are added to WS(i), as in (c)
• The same analysis is performed for pointer-type function arguments at function calls
– Removes the need for inter-procedural static analysis
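The runtime bookkeeping behind table (c) can be sketched as follows. This is a simplified, hypothetical model that tracks only variable i and pointer p from example (a): whenever p is assigned, WS(ref(p)) is merged into the write set of the new referent and withdrawn from the old one.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WS 8

/* A write set, stored as line numbers of example (a). */
typedef struct {
    int lines[MAX_WS];
    size_t count;
} line_set;

bool set_has(const line_set *s, int line)
{
    for (size_t k = 0; k < s->count; k++)
        if (s->lines[k] == line)
            return true;
    return false;
}

void set_add(line_set *s, int line)
{
    if (!set_has(s, line) && s->count < MAX_WS)
        s->lines[s->count++] = line;
}

void set_remove(line_set *s, int line)
{
    for (size_t k = 0; k < s->count; k++)
        if (s->lines[k] == line) {
            s->lines[k] = s->lines[--s->count];  /* order is irrelevant */
            return;
        }
}

/* Monitoring state for variable i and pointer p (hypothetical model). */
typedef struct {
    line_set ws_i;      /* WS(i): static write set plus runtime additions */
    line_set ws_ref_p;  /* WS(ref(p)): statements that write through p */
    bool p_points_to_i; /* true while ref(p) == i */
} monitor;

/* Called after every assignment to p. Simplification: if a statement were in
 * both the static WS(i) and WS(ref(p)), removal would need a reference count. */
void on_pointer_update(monitor *m, bool now_points_to_i)
{
    if (m->p_points_to_i == now_points_to_i)
        return;
    for (size_t k = 0; k < m->ws_ref_p.count; k++) {
        if (now_points_to_i)
            set_add(&m->ws_i, m->ws_ref_p.lines[k]);    /* p = &i: WS(i) gains WS(ref(p)) */
        else
            set_remove(&m->ws_i, m->ws_ref_p.lines[k]); /* p = NULL: WS(i) loses it again */
    }
    m->p_points_to_i = now_points_to_i;
}
```

Starting from WS(i) = {1} and WS(ref(p)) = {3}, calling `on_pointer_update(&m, true)` after line 2 and `on_pointer_update(&m, false)` after line 4 reproduces the WS(i) column of table (c): {1}, {1,3}, {1,3}, {1}.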
Chained Dereferences
Before rewriting:
int z;
int *y = &z;
int **x = &y;
**x = 10;

After rewriting:
int z;
int *y = &z;
int **x = &y;
int *temp = *x;
*temp = 10;
• The write-set technique above handles only simple dereferences such as *p
• Source code rewriting converts every chained dereference into a sequence of simple dereferences
• All other non-simple dereferences are converted in the same manner
Output of Pre-debugging Phase
• Simplified program
– Simplified pointer dereferences
– Compiled with debugging options
• Input file for the debugger
– Program variables and their write sets
– Addresses of global symbols
– Frame pointer offsets of local variables
– Other flags that help the debugger
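The per-variable entries of the debugger input file might be modeled by a record like the one below. The record layout and `var_address` are hypothetical; the point is that a global is resolved by its absolute address, while a local is resolved relative to the current frame pointer, so the same record yields a different address in each activation of its function.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { VAR_GLOBAL, VAR_LOCAL } var_kind;

/* Hypothetical record for one program variable in the debugger input file. */
typedef struct {
    const char *name;
    const char *function;   /* enclosing function for locals, NULL for globals */
    var_kind kind;
    intptr_t location;      /* absolute address (global) or frame-pointer offset (local) */
    size_t size;
    const int *ws_lines;    /* WS(v) as line numbers in the variable's source file */
    size_t ws_count;
} var_record;

/* Resolve the start address of a variable in the current activation;
 * frame_pointer is ignored for globals. */
uintptr_t var_address(const var_record *v, uintptr_t frame_pointer)
{
    if (v->kind == VAR_GLOBAL)
        return (uintptr_t)v->location;
    /* Locals usually sit at negative offsets below the frame pointer;
     * unsigned wrap-around makes the addition correct for them too. */
    return frame_pointer + (uintptr_t)v->location;
}
```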
MemSherlock Architecture: Debugging
[Figure: MemSherlock architecture, debugging phase highlighted. The debugging agent runs the program executable on the malicious input, guided by the debugging information (proc, var, addr), and outputs the vulnerability information]
Debugging: Dynamic Monitoring
• Runtime monitoring
– State Maintenance
– Incorporates taint analysis from TaintCheck
• Produces a dynamic slice of the program leading to the vulnerability
• Write Checking
– Monitors and validates memory writes
– Write sets consist of (file name, line number) pairs <f,l>
• The instruction pointer IP of each write is translated into <f,l>
– Write sets are associated with program variables
• The destination address of each write is translated into a program variable
Keeping Program State
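[Figure: two snapshots of the virtual address space, Program State 1 and Program State 2. Each stack grows from the stack base with frames for main and fnc A, followed by fnc B in one state and fnc C in the other; a memory write hits the same address 0xABABABAB in both states]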
• A given memory region may correspond to different program variables, depending on the program state
• The dynamic monitor keeps track of the mapping between memory regions and program variables
Debugging: Key Data Structures
• The debugger keeps two lists of memory regions
– ActiveMemoryRegions
• Memory corresponding to program variables or to their referent memory regions
– NonWritableRegions
• Saved registers, return addresses, and the metadata surrounding dynamically allocated memory regions
Debugging: State Maintenance
• Function calls/returns (memory regions)
– Local variable addresses are calculated and added to ActiveMemoryRegions
– Locations of the return address and saved registers are added to the NonWritableRegions list
• Heap memory (memory regions)
– malloc/free calls are intercepted
– Allocated memory is added to ActiveMemoryRegions
– The metadata surrounding the buffer is added to NonWritableRegions
• Pointer value updates (write sets)
– ActiveMemoryRegions is searched to find the new referent, and its write set is updated
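A minimal sketch of these state-maintenance hooks follows, assuming x86-32 stack frames (saved frame pointer at [fp], return address at [fp+4]) and an 8-byte glibc-style chunk header immediately before each heap buffer. The region lists, hook names, and fixed capacity are hypothetical simplifications, not MemSherlock's Valgrind code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64
#define CHUNK_HEADER 8   /* assumed size of the heap metadata before each buffer */

typedef struct {
    uintptr_t start;
    size_t size;
    const char *owner;   /* program variable, or a label for protected data */
} region;

typedef struct {
    region items[MAX_REGIONS];
    size_t count;
} region_list;

bool region_add(region_list *l, uintptr_t start, size_t size, const char *owner)
{
    if (l->count == MAX_REGIONS)
        return false;
    l->items[l->count++] = (region){ start, size, owner };
    return true;
}

/* Remove the region that starts exactly at start. */
void region_remove(region_list *l, uintptr_t start)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->items[i].start == start) {
            l->items[i] = l->items[--l->count];
            return;
        }
}

/* Return the region containing addr, or NULL. */
const region *region_find(const region_list *l, uintptr_t addr)
{
    for (size_t i = 0; i < l->count; i++)
        if (addr >= l->items[i].start && addr - l->items[i].start < l->items[i].size)
            return &l->items[i];
    return NULL;
}

/* Function call: protect the saved frame pointer and the return address.
 * Locals would be added to the active list here too, at fp + offset. */
void on_call(region_list *nonwritable, uintptr_t fp)
{
    region_add(nonwritable, fp, 4, "saved frame pointer");
    region_add(nonwritable, fp + 4, 4, "return address");
}

void on_return(region_list *nonwritable, uintptr_t fp)
{
    region_remove(nonwritable, fp);
    region_remove(nonwritable, fp + 4);
}

/* Intercepted malloc: the buffer becomes active, its header non-writable. */
void on_malloc(region_list *active, region_list *nonwritable, uintptr_t ptr, size_t size)
{
    region_add(active, ptr, size, "heap var");
    region_add(nonwritable, ptr - CHUNK_HEADER, CHUNK_HEADER, "heap metadata");
}

void on_free(region_list *active, region_list *nonwritable, uintptr_t ptr)
{
    region_remove(active, ptr);
    region_remove(nonwritable, ptr - CHUNK_HEADER);
}
```

Linear lists keep the sketch short; a monitor handling many regions would likely use an ordered structure for `region_find`, since it is queried on every write.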
Debugging: Write Checking
• When instruction IP modifies memory m
– if m is in ActiveMemoryRegions
• determines the variable v it belongs to
• converts IP into <f,l>
• checks if <f,l> is in WS(v)
• If the memory write check fails or m is in NonWritableRegions
– Marks the operation as a memory corruption
– Displays the vulnerability information
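The write check can be sketched in C as below. Everything here is a hypothetical model: the line table stands in for the debugging information used to translate IP into <f,l>, and the two region arrays stand in for ActiveMemoryRegions and NonWritableRegions. The order of the two region checks does not matter as long as the lists do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *file; int line; } src_loc;

/* Line-table entry: instructions in [ip_lo, ip_hi) belong to source location loc. */
typedef struct { uintptr_t ip_lo, ip_hi; src_loc loc; } line_entry;

/* An ActiveMemoryRegions entry, resolved to program variable var with write set WS(var). */
typedef struct {
    uintptr_t start;
    size_t size;
    const char *var;
    const src_loc *ws;
    size_t ws_count;
} active_region;

typedef struct { uintptr_t start; size_t size; } nonwritable_region;

typedef enum { WRITE_OK, WRITE_UNTRACKED, CORRUPTION_WS, CORRUPTION_NONWRITABLE } write_verdict;

static int contains(uintptr_t start, size_t size, uintptr_t addr)
{
    return addr >= start && addr - start < size;
}

/* Translate the instruction pointer into <f,l>; returns 0 if the IP is unknown. */
static int ip_to_loc(const line_entry *tab, size_t n, uintptr_t ip, src_loc *out)
{
    for (size_t i = 0; i < n; i++)
        if (ip >= tab[i].ip_lo && ip < tab[i].ip_hi) {
            *out = tab[i].loc;
            return 1;
        }
    return 0;
}

/* Check one memory write by instruction ip to address m. */
write_verdict check_write(uintptr_t ip, uintptr_t m,
                          const active_region *act, size_t n_act,
                          const nonwritable_region *nw, size_t n_nw,
                          const line_entry *tab, size_t n_tab)
{
    for (size_t i = 0; i < n_nw; i++)
        if (contains(nw[i].start, nw[i].size, m))
            return CORRUPTION_NONWRITABLE;

    for (size_t i = 0; i < n_act; i++) {
        if (!contains(act[i].start, act[i].size, m))
            continue;
        src_loc loc;
        /* Simplification: a write from an unmapped IP is treated as outside WS(v). */
        if (!ip_to_loc(tab, n_tab, ip, &loc))
            return CORRUPTION_WS;
        for (size_t k = 0; k < act[i].ws_count; k++)
            if (act[i].ws[k].line == loc.line && strcmp(act[i].ws[k].file, loc.file) == 0)
                return WRITE_OK;
        return CORRUPTION_WS;
    }
    return WRITE_UNTRACKED;  /* m is not mapped to any program variable */
}
```

Both `CORRUPTION_*` verdicts correspond to "marks the operation as a memory corruption"; the variable name and <f,l> found along the way are exactly what the report needs to display.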
Generating Vulnerability Information
• The slice of program contributing to the vulnerability
– Statements that have propagated tainted values
– Statements that have modified related memory regions
• Dependency between memory objects involved in the vulnerability
– Points-to analysis shows the memory regions involved and how they were accessed
• Program state
– Call stack information
– Write set information
Example Test Case: Null HTTP
http.c
 91: void ReadPOSTData(int sid) {
     …
100:     conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
101:     if (conn[sid].PostData==NULL) { ...
107:     do {
108:         rc=recv(conn[sid].socket, pPostData, 1024, 0);
109:     …

Error Report:
--20361-- Error type: Heap Buffer Overflow
--20361-- Dest Addr: 3AB3E360
--20361-- IP: 0x804E5C7: ReadPOSTData (http.c:108)
--20361-- Dest address resolved to:
--20361-- Global variable "heap var" @ 3AB3E280 (size: 224)
--20361--
--20361-- Memory allocated by 0x804E531: ReadPOSTData (http.c:100)
--20361-- TAINTED destination 3AB3E360
--20361-- Fully tainted from:
--20361-- 0x804E5C7: ReadPOSTData (http.c:108)
--20361--
--20361-- TAINTED size used during allocation
--20361-- Tainted from:
--20361-- 0x804E456: ReadPOSTData (http.c:100)
--20361-- 0x804FBB5: read_header (http.c:153)
--20361-- 0x805121B: sgets (server.c:211)
Vulnerability Analysis Example
http.c
 91: void ReadPOSTData(int sid) {
 92:     char *pPostData;
     ...
100:     conn[sid].PostData=calloc(
             conn[sid].dat->in_ContentLength+1024, sizeof(char));   <- Create heap object
     ...
107:     do {
108:         rc=recv(conn[sid].socket, pPostData, 1024, 0);
     ...
Vulnerability Analysis Example
http.c
119: int read_header(int sid) {
121:     char line[2048];
     ...
127:     do {
128:         memset(line, 0, sizeof(line));
129:         sgets(line, sizeof(line)-1, conn[sid].socket);
     ...
153:         conn[sid].dat->in_ContentLength=atoi((char *)&line+16);   <- Taint object
     ...
169:     if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
170:         ReadPOSTData(sid);

http.c
 91: void ReadPOSTData(int sid) {
 92:     char *pPostData;
     ...
100:     conn[sid].PostData=calloc(
             conn[sid].dat->in_ContentLength+1024, sizeof(char));   <- Use
     ...
107:     do {
108:         rc=recv(conn[sid].socket, pPostData, 1024, 0);
     ...
Vulnerability Analysis Example
http.c
119: int read_header(int sid) {
121:     char line[2048];
     ...
127:     do {
128:         memset(line, 0, sizeof(line));
129:         sgets(line, sizeof(line)-1, conn[sid].socket);
     ...
153:         conn[sid].dat->in_ContentLength=atoi((char *)&line+16);   <- Taint object
     ...
169:     if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
170:         ReadPOSTData(sid);
     ...

server.c
202: int sgets(char *buffer, int max, int fd)
203: {
     ...
209:     conn[sid].atime=time((time_t*)0);
210:     while (n<max) {
211:         if ((rc=recv(conn[sid].socket, buffer, 1, 0))<0) {   <- Create taint object
     ...
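The report's numbers are consistent with a negative Content-Length slipping past the signed check at line 169. The heap buffer starts at 0x3AB3E280 and is 224 bytes long, so the faulting destination 0x3AB3E360 is exactly the first byte past its end. Since line 100 allocates in_ContentLength + 1024 bytes, a 224-byte buffer corresponds to in_ContentLength = -800, while each recv() at line 108 may write up to 1024 bytes. The sketch below reproduces this arithmetic; the value -800 is inferred from the report rather than stated on the slides, and the MAX_POSTSIZE value is a placeholder.

```c
#include <stdbool.h>

/* Hypothetical value: the actual definition of MAX_POSTSIZE is not shown in the excerpt. */
#define MAX_POSTSIZE 32768

/* Mirrors the size logic of http.c lines 153, 169 and 100 (sketch):
 * returns true if the request reaches ReadPOSTData() and stores the
 * number of bytes calloc() is asked for. */
bool post_alloc_size(int content_length, long *alloc_bytes)
{
    if (!(content_length < MAX_POSTSIZE))   /* line 169: signed upper bound only */
        return false;
    /* line 100: calloc(in_ContentLength+1024, sizeof(char)); a negative sum
     * would become a huge size_t in the original and calloc would fail */
    *alloc_bytes = (long)content_length + 1024;
    return true;
}
```

For example, `post_alloc_size(-800, &n)` returns true with n = 224, far smaller than the 1024 bytes a single recv() call at line 108 may write.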
Implementation
• Source code is rewritten using CIL (C Intermediate Language)
• CodeSurfer was used to extract program variables and their write sets
– A commercial static analysis tool
• objdump and dwarfdump were used to extract global symbol information
• Dynamic monitoring is implemented in Valgrind
– An open-source dynamic binary instrumentation framework
Evaluation
• Tested 11 real-world applications with known memory corruption vulnerabilities
• Test cases included
– Stack and heap buffer overflows, format string vulnerabilities
– Both control-flow and non-control-data attacks
• Testing methodology
– Programs were run under MemSherlock
– Exploit programs were used to attack the applications
– The log-and-replay mechanism was not used
Evaluation Results
Application | Vuln. type | Description                                | Captured? | #FP
GHTTP       | S          | A small HTTP server                        | Yes       | 7
Icecast     | S          | An mp3 broadcast server                    | Yes       | 0
Sumus       | S          | A game server for ‘mus’                    | Yes       | 0
Monit       | S          | Multi-purpose anomaly detector             | Yes       | 0
Newspost    | S          | Automatic news posting                     | Yes       | 2
Prozilla    | S          | A download accelerator for Linux           | No        | 0
NullHTTP    | H          | An HTTP server                             | Yes       | 0
Xtelnet     | H          | A telnet server                            | Yes       | 4
Wsmp3       | H          | Web server with mp3 broadcasting           | Yes       | 0
OpenVMPS    | F          | Open source VLan management policy server  | Yes       | 2
Power       | F          | UPS monitoring utility                     | Yes       | 10
Type abbreviations: (S)tack overflow, (H)eap overflow, (F)ormat string; #FP: number of false positives
False Negatives
• Prozilla:
– memcpy uses a kernel function to manipulate page tables when copying entire pages
– Valgrind cannot trace into the kernel
– Could be prevented with function wrappers
• Other false negatives are theoretically possible
– structs within unions or arrays
• The current implementation does not support unions
• Elements of an array are not currently differentiated
– Memory corruption errors inside DLLs
False Positives
• Embedded assembly
• Incomplete library specification
– library functions keeping internal state (e.g., strtok(NULL, delim))
– library functions that modify global variables as side effects (e.g., optarg, errno)
– pointers that point to hidden global structures (e.g., localtime() in time.h)
• struct pointers
– void pointers that are type-cast to modify struct variables
– since the pointer is not of a struct type, MemSherlock fails to update the write sets accordingly
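The strtok case shows why a per-call library specification is incomplete: the second call below receives NULL rather than buf, yet it still writes into buf through a pointer that strtok saved internally, so the write cannot be attributed to any argument of that call. The helper name is hypothetical; the strtok behavior is standard C.

```c
#include <string.h>

/* Tokenizes buf on ','; the second strtok() call writes into buf
 * even though buf is not one of its arguments. */
void two_tokens(char *buf, const char **first, const char **second)
{
    *first = strtok(buf, ",");    /* for "a,b,c": writes buf[1] = '\0' */
    *second = strtok(NULL, ",");  /* writes buf[3] = '\0' via strtok's saved state */
}
```

After `two_tokens` runs on "a,b,c", buf holds "a\0b\0c", and the terminator at buf[3] was written by a call whose only pointer argument was NULL.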
Conclusion
• Fully automated vulnerability analysis
• The analysis output is intuitive and human readable
• Future Challenges
– Automated, long-term fix of vulnerabilities
• Preserving semantic consistency is a major challenge
– Automated, temporary fix of vulnerabilities
• Generating vulnerability condition
• Improving signature generation
Thank You