Taint-Enhanced Policy Enforcement
RAMSES (Regeneration And iMmunity SErviceS): A Cognitive Immune System
James E. Just, Nathan Li, Mark Cornwell (Global Infotek)
R. Sekar (Stony Brook University)
21 July 2015
Goals
1. Prevent most attacks from compromising the application
  But even stopped exploits cause some damage:
    Program crashes, requiring restart
    Wasted computation or memory
2. Refine the response to preserve application availability
  Minimize the performance impact of unsuccessful attacks
Mimic the biological immune system…
Innate immune response is nonspecific and causes significant “collateral damage”
Acquired immune response is much more effective and targeted
Biological systems “learn” from “attacks” and “automatically develop responses targeted at a pathogen”
Our goal: build an artificial immune system that is more like the acquired immune system
Technologies feeding into RAMSES
DAWSON: address-space randomization for Windows [ACSAC ’06]
Taint-enhanced policy enforcement for accurate detection of a wide range of attacks (source code/Linux) [USENIX Security ’06]
Automated vulnerability-oriented signature generation for buffer overflow attacks (COTS/Linux) [ACM CCS ’05, ACSAC ’05]
Learning security-relevant data flows in COTS applications (Linux) [Oakland ’06]
System Architecture
[Figure: architecture diagram with the components Library Interposition, External Environment, Behavior Model, Input Filter, Protected Process, Logger, Signature Generator, and Detector]
Elements of RAMSES
Instrumentation
  To intercept/monitor/alter application behavior
  Targeted for Windows
Accurate attack detection
  Address-space randomization
  Taint-enhanced policy enforcement
  Taint-enhanced anomaly detection
Efficient taint tracking for blackbox COTS
Automated signature generation/filtering
  Vulnerability-oriented signatures
  Based on policy projection
  Use of program context to minimize false positives (FPs)
  Continued evaluation of signatures
Instrumentation
Instrument important APIs
Working at the API level avoids:
  Need for source code
  Need to know the representation or semantics of application-specific data structures
  Complex analyses or transformations on binaries
Instrumentation will support:
  Logging of relevant operations, including parameters and return values
  Interposition of filter functions, including injection of failure returns at appropriate points to ensure error recovery
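A minimal sketch of API-level logging of an operation with its parameters and return value, written in C for Linux/POSIX for brevity (RAMSES itself targets Windows). The wrapper name logged_read, the log_entry record, and the fixed-size in-memory log are hypothetical. Here the wrapper is called explicitly; a shared-library interposer on Linux would typically define read itself and obtain the original with dlsym(RTLD_NEXT, "read").

```c
#define _POSIX_C_SOURCE 200809L   /* expose read(), pipe(), ssize_t */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical log record: one entry per intercepted call. */
struct log_entry {
    char op[16];     /* operation name */
    int fd;          /* parameter: file descriptor */
    size_t count;    /* parameter: requested byte count */
    ssize_t ret;     /* return value */
    int err;         /* errno when ret < 0, else 0 */
};

#define LOG_CAP 64
static struct log_entry call_log[LOG_CAP];
static int log_len = 0;

static void log_call(const char *op, int fd, size_t count, ssize_t ret, int err) {
    assert(log_len < LOG_CAP);
    struct log_entry *e = &call_log[log_len++];
    snprintf(e->op, sizeof e->op, "%s", op);
    e->fd = fd;
    e->count = count;
    e->ret = ret;
    e->err = err;
}

/* Wrapper for read(): forward the call, then record parameters and result.
 * errno is preserved so the caller sees exactly what read() reported. */
ssize_t logged_read(int fd, void *buf, size_t count) {
    ssize_t ret = read(fd, buf, count);
    int saved = errno;
    log_call("read", fd, count, ret, ret < 0 ? saved : 0);
    errno = saved;
    return ret;
}
```

The same record could carry a hook for a filter function, which is where failure returns would be injected.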
Instrumentation Issues
System-call vs. API instrumentation
Instrumentation at the “right” point
  e.g., after URL or other decoding on web servers
Being able to instrument “internal” calls
  Need expertise in various wrapping and linking techniques on Windows
Being able to obtain program context
  Walk the stack, etc.
Attack Detection
Address-Space Randomization
Detects a wide range of memory-corruption-based attacks
Very low overhead
Technology developed in the DAWSON project (SRS Phase I) for Windows XP
Challenge … accurate detection of a wider range of attacks
CVE vulnerabilities by category (Ver. 20040901):
  Config errors: 3%
  Tempfile: 4%
  Memory errors (stack-smashing, heap overflow, integer overflow, data attacks): 27%
  Other logic errors: 22%
  Format string: 4%
  Input validation/DoS: 9%
  Generalized injection attacks:
    Directory traversal: 10%
    Cross-site scripting: 4%
    SQL injection: 2%
    Command injection: 15%
A Unified View of Attacks
Input Interface: attacker injects malicious input data
  $name=$_GET['name']
Program: attacker-provided data propagated in program
  $query="SELECT price FROM products WHERE name='" . $name . "'"
Security-Sensitive Operations: attacker-provided data used as argument to corrupt system/data
  sql_query($query)
Example attack:
Input Interface:
  $name="xyz'; UPDATE products SET price=0 WHERE name='iPod"
Program:
  $query="SELECT price FROM products WHERE name='xyz'; UPDATE products SET price=0 WHERE name='iPod'"
Security-Sensitive Operations:
  sql_query($query)
Attack: iPods priced to $0
Overview of Approach
Input Interface: marking untrusted data as tainted
  Marking using wrapper functions
  Typical: network inputs marked as tainted
Program: fine-grained taint tracking
  Current approaches:
    Source transformation (not useful for COTS)
    Dynamic translation of binaries (very high overheads)
  RAMSES approach: infer (most) flows using a light-weight learning technique
Security-Sensitive Operations: policy checking using taint-enhanced policies
  Policies as predicates on arguments of security-sensitive operations
Taint-Enhanced Policy Enforcement
Key requirements:
  Fine-grained (byte-level) taint tracking
  Taint-enhanced policies
Input Interface → Program (fine-grained taint tracking) → Security-Sensitive Operations
  $name="xyz'; UPDATE products SET price=0 WHERE name='iPod"
  $query="SELECT price FROM products WHERE name='xyz'; UPDATE products SET price=0 WHERE name='iPod'"
Policy: SQL commands within $query shouldn't be tainted
Detects SQL injection!
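The SQL policy above can be illustrated with a small C sketch that carries one taint flag per byte of the query and checks it before the query is issued. The struct tstr type, the build_query helper, the keyword list, and the lexing rules are simplifying assumptions for illustration, not the tokenizer used in [USENIX Security ’06]. Tainted data inside a string literal and tainted non-keyword words (such as a number) are allowed; a tainted quote or a tainted keyword outside a literal is a violation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAXQ 512

/* A string whose bytes each carry a taint flag (1 = attacker-provided). */
struct tstr {
    char s[MAXQ];
    unsigned char taint[MAXQ];
    size_t len;
};

/* Append src, marking every appended byte with the given taint flag. */
static void tappend(struct tstr *dst, const char *src, unsigned char tainted) {
    size_t n = strlen(src);
    assert(dst->len + n < MAXQ);
    for (size_t i = 0; i < n; i++) {
        dst->s[dst->len] = src[i];
        dst->taint[dst->len] = tainted;
        dst->len++;
    }
    dst->s[dst->len] = '\0';
}

/* Hypothetical helper mirroring the PHP example: only $name is tainted. */
static void build_query(struct tstr *q, const char *name) {
    tappend(q, "SELECT price FROM products WHERE name='", 0);
    tappend(q, name, 1);
    tappend(q, "'", 0);
}

static const char *const keywords[] = {
    "SELECT", "UPDATE", "INSERT", "DELETE", "DROP", "UNION",
    "OR", "AND", "SET", "WHERE", "FROM"
};

/* Case-insensitive match of w[0..n) against the illustrative keyword list. */
static bool is_keyword(const char *w, size_t n) {
    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++) {
        if (strlen(keywords[k]) != n) continue;
        size_t j = 0;
        while (j < n && toupper((unsigned char)w[j]) == keywords[k][j]) j++;
        if (j == n) return true;
    }
    return false;
}

/* Policy: outside string literals no tainted byte may belong to an SQL
 * keyword or meta-character; a tainted quote may not open or close a literal. */
bool sql_policy_ok(const struct tstr *q) {
    bool in_literal = false;
    size_t i = 0;
    while (i < q->len) {
        unsigned char c = (unsigned char)q->s[i];
        if (c == '\'') {                          /* literal delimiter */
            if (q->taint[i]) return false;
            in_literal = !in_literal;
            i++;
        } else if (in_literal || isspace(c)) {    /* literal data or whitespace */
            i++;
        } else if (isalnum(c) || c == '_') {      /* a word: keyword or identifier */
            size_t start = i;
            bool tainted = false;
            while (i < q->len && (isalnum((unsigned char)q->s[i]) || q->s[i] == '_')) {
                tainted |= q->taint[i] != 0;
                i++;
            }
            if (tainted && is_keyword(q->s + start, i - start)) return false;
        } else {                                  /* meta-characters: ; = ( ) - ... */
            if (q->taint[i]) return false;
            i++;
        }
    }
    return true;
}
```

With the attack input from the slide, the check fails at the tainted quote that closes the name literal, before the injected UPDATE is reached.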
Preliminary Results on Taint-Enhanced Policy Enforcement
Uses source code transformation on C programs on Linux
  For details, see [USENIX Security ’06]
Idea
  Runtime representation of taint: use a bit-array tagmap to store a taint tag for each byte of memory
  Tag(a): the taint bits of the bytes at address a in the tagmap
  Update the tagmap for each assignment
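A minimal sketch of a bit-array tagmap with one taint bit per byte. To stay self-contained it covers a single static region instead of the whole address space; region, tag_set, and tag_get are hypothetical names, and tag_get returns Tag(a) for an n-byte object as the OR of the bits of its bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative assumption: the tagmap covers one region of "program memory"
 * rather than the whole address space. */
#define REGION_BYTES 4096
static unsigned char region[REGION_BYTES];
static uint8_t tagmap[REGION_BYTES / 8];   /* one taint bit per byte of region */

/* Byte offset of address a inside the region (asserts that a lies inside it). */
static size_t offset_of(const void *a) {
    uintptr_t off = (uintptr_t)a - (uintptr_t)region;
    assert(off < REGION_BYTES);
    return (size_t)off;
}

/* Set (tainted != 0) or clear the taint bits of n bytes starting at a. */
void tag_set(const void *a, size_t n, int tainted) {
    size_t off = offset_of(a);
    assert(n <= REGION_BYTES - off);
    for (size_t i = off; i < off + n; i++) {
        uint8_t bit = (uint8_t)(1u << (i % 8));
        if (tainted) tagmap[i / 8] |= bit;
        else         tagmap[i / 8] &= (uint8_t)~bit;
    }
}

/* Tag(a) for an n-byte object: 1 if any of its bytes is tainted. */
int tag_get(const void *a, size_t n) {
    size_t off = offset_of(a);
    assert(n <= REGION_BYTES - off);
    for (size_t i = off; i < off + n; i++)
        if (tagmap[i / 8] & (1u << (i % 8))) return 1;
    return 0;
}
```

Using one bit per byte keeps the tagmap at one eighth of the size of the memory it shadows.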
Transformation Overview
Assignment      Transformation
x = y + z;      Tag(&x) = Tag(&y) || Tag(&z);
x = *p;         Tag(&x) = Tag(p);
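The two rules can be mimicked by hand in C. This sketch tracks one flag per variable address rather than one bit per byte, and the helpers SetTag, assign_sum, and assign_deref are hypothetical; only Tag follows the notation above. A source-to-source transformer would typically emit the tag update inline next to each assignment instead of calling helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified shadow store: one taint flag per object address
 * (a real tagmap keeps one bit per byte). */
#define MAX_TAGS 32
static const void *tag_addr[MAX_TAGS];
static int tag_val[MAX_TAGS];
static int n_tags = 0;

/* Tag(a): taint of the object at address a; unknown memory is untainted. */
static int Tag(const void *a) {
    for (int i = 0; i < n_tags; i++)
        if (tag_addr[i] == a) return tag_val[i];
    return 0;
}

static void SetTag(const void *a, int t) {
    for (int i = 0; i < n_tags; i++)
        if (tag_addr[i] == a) { tag_val[i] = t; return; }
    assert(n_tags < MAX_TAGS);
    tag_addr[n_tags] = a;
    tag_val[n_tags] = t;
    n_tags++;
}

/* Transformed form of  x = y + z;  (x, y, z are the variables' addresses) */
static void assign_sum(int *x, const int *y, const int *z) {
    *x = *y + *z;
    SetTag(x, Tag(y) || Tag(z));   /* Tag(&x) = Tag(&y) || Tag(&z); */
}

/* Transformed form of  x = *p;  */
static void assign_deref(int *x, const int *p) {
    *x = *p;
    SetTag(x, Tag(p));             /* Tag(&x) = Tag(p); */
}
```

For example, if y holds untrusted input, x = y + z makes x tainted, and a later w = *p with p pointing at x taints w as well.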
Taint-Enhanced Policies
Attack type, role of tainted data, and taint-enhanced policy:
Control-flow hijack
  Role of tainted data: code pointers
  Policy: jmp(addr) | addr should not be tainted
Format string
  Role of tainted data: format specifiers (e.g., “%n”)
  Policy: vfprintf(fmtstr) | fmtstr should not contain tainted format specifiers
Directory traversal
  Role of tainted data: directory traversal strings (e.g., “/../”)
  Policy: open(path) | path should not contain tainted directory traversal strings
Cross-site scripting
  Role of tainted data: scripts in dynamic HTML
  Policy: html_print(str) | str should not contain tainted script tags (e.g., <script>)
SQL injection
  Role of tainted data: SQL keywords or meta-chars
  Policy: sql_query(qstr) | qstr should not contain tainted SQL keywords/meta-chars
Shell command injection
  Role of tainted data: shell commands or meta-chars
  Policy: popen(cmd) | cmd should not contain tainted shell commands/meta-chars
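As one example from this list, the format-string policy can be sketched in C as a scan of fmtstr that rejects the call if any byte of a conversion specifier is tainted. The function name fmt_policy_ok, the per-byte taint-array interface, and the simplified specifier grammar (flag, width, precision, and length characters are skipped up to the conversion character) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified check for  vfprintf(fmtstr) | fmtstr should not contain tainted
 * format specifiers.  taint[i] != 0 marks fmt[i] as attacker-provided.
 * A specifier is taken to span '%' through its conversion character. */
bool fmt_policy_ok(const char *fmt, const unsigned char *taint) {
    size_t n = strlen(fmt);
    for (size_t i = 0; i < n; i++) {
        if (fmt[i] != '%') continue;
        size_t j = i + 1;
        if (j < n && fmt[j] == '%') { i = j; continue; }    /* "%%" prints '%' */
        /* skip flags, field width, precision and length modifiers */
        while (j < n && strchr("-+ #0123456789.*hlLqjzt", fmt[j]) != NULL) j++;
        size_t end = (j < n) ? j : n - 1;                    /* conversion char */
        for (size_t k = i; k <= end; k++)
            if (taint[k]) return false;
        i = end;
    }
    return true;
}
```

Tainted text that contains no specifier, or only the literal "%%", passes; tainted "%n" is rejected.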
Effectiveness Evaluation
CVE#            Program              Lang.  Attack type
CAN-2003-0201   Samba 2.2.8          C      Stack smashing
CVE-2000-0573   wu-ftpd 2.6.0        C      Format string
CAN-2005-1365   Pico server 3.2      C      Directory traversal
CAN-2003-0486   phpBB 2.0.5          PHP    SQL injection
CAN-2005-0258   phpBB 2.0.5          PHP    Directory traversal
CAN-2002-1341   SquirrelMail 1.2.10  PHP    Cross-site scripting
CAN-2003-0990   SquirrelMail 1.4.0   PHP    Command injection
CAN-2005-1921   PHP XML-RPC          PHP    Command injection
CVE-1999-0045   nph-test-cgi         Bash   Shell meta-character
Detected all attacks
Reported no false positives
Learning Dataflows …
Instrument important APIs to record operations, operands, and return codes
Define “interesting relationships” on operands to these operations
Estimate dynamic information flow from operand X to Y from:
  Dataflow model
  Runtime observed values of X, Y
Possible Dataflow Relationships
Unary relation: property of an individual operand
  Represented as X R c, where
    X: an argument name
    c: a constant value
    R: a unary relation
  Examples of unary relation R:
    equal => X takes only a single value, always equal to c
    elementOf => X takes any value from the set c
    range => X takes values in the range c (e.g., c = (0, 2))
    isWithinDir => X is a file name argument that is always contained within a specified directory c
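A sketch of inferring the unary relations equal, elementOf, and range from observed integer values of one argument, then checking new runtime values against the learned relation. The MAX_SET cutoff that decides when a value set is generalized to a range is a hypothetical parameter; isWithinDir, which applies to file names, is not covered here.

```c
#include <assert.h>
#include <stddef.h>

enum rel_kind { REL_EQUAL, REL_ELEMENT_OF, REL_RANGE };

#define MAX_SET 4   /* hypothetical: larger value sets fall back to a range */

struct unary_rel {
    enum rel_kind kind;
    long set[MAX_SET];   /* equal: set[0]; elementOf: set[0..n_set) */
    size_t n_set;
    long lo, hi;         /* range: [lo, hi] */
};

/* Learn the most specific relation consistent with every observed value. */
struct unary_rel infer_unary(const long *vals, size_t n) {
    struct unary_rel r = {0};
    int too_many = 0;
    assert(n > 0);
    r.lo = r.hi = vals[0];
    for (size_t i = 0; i < n; i++) {
        if (vals[i] < r.lo) r.lo = vals[i];
        if (vals[i] > r.hi) r.hi = vals[i];
        size_t k = 0;
        while (k < r.n_set && r.set[k] != vals[i]) k++;
        if (k == r.n_set) {
            if (r.n_set < MAX_SET) r.set[r.n_set++] = vals[i];
            else too_many = 1;
        }
    }
    if (too_many)          r.kind = REL_RANGE;
    else if (r.n_set == 1) r.kind = REL_EQUAL;
    else                   r.kind = REL_ELEMENT_OF;
    return r;
}

/* At run time: does a new value satisfy the learned relation? */
int unary_holds(const struct unary_rel *r, long v) {
    switch (r->kind) {
    case REL_EQUAL:
        return v == r->set[0];
    case REL_ELEMENT_OF:
        for (size_t k = 0; k < r->n_set; k++)
            if (r->set[k] == v) return 1;
        return 0;
    case REL_RANGE:
        return v >= r->lo && v <= r->hi;
    }
    return 0;
}
```

Preferring equal over elementOf over range keeps the learned model as tight as the observations allow, so a value outside it can be reported as a deviation.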
Possible Dataflow Relationships
Binary relations: capture relationships between operands of two operations
  Represented as X R Y, where
    X, Y: argument names
    R: a binary relation
  Examples of binary relation R:
    equal => equality between X and Y
    isWithinDir => file/directory X is within directory Y
    contains => directory X contains file/directory Y
    hasSameDirAs => X and Y are within a common directory
    hasSameBaseAs => X and Y have the same base (e.g., a.c, a.h)
    hasSameExtensionAs => X and Y have the same extension (e.g., a.c, b.c)
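Several of these binary relations reduce to simple predicates on path strings. This C sketch assumes normalized paths without trailing slashes, and its function names are hypothetical renderings of the relation names. Counting X == Y as isWithinDir is an assumption, chosen to match the example that follows, where F6 isWithinDir I holds even though the first opendir call receives I itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Paths are assumed to be normalized, without a trailing '/'. */

/* Final path component: "/opt/proj/a.c" -> "a.c". */
static const char *last_component(const char *p) {
    const char *s = strrchr(p, '/');
    return s ? s + 1 : p;
}

/* Length of the directory part before the last '/' (0 if there is none). */
static size_t dir_len(const char *p) {
    const char *s = strrchr(p, '/');
    return s ? (size_t)(s - p) : 0;
}

/* Length of a file name without its extension: "a.c" -> 1. */
static size_t stem_len(const char *name) {
    const char *dot = strrchr(name, '.');
    return dot ? (size_t)(dot - name) : strlen(name);
}

/* X isWithinDir Y (X == Y also counts, by assumption). */
bool is_within_dir(const char *x, const char *y) {
    size_t n = strlen(y);
    return strncmp(x, y, n) == 0 && (x[n] == '\0' || x[n] == '/');
}

/* X contains Y: Y is within directory X. */
bool contains(const char *x, const char *y) {
    return is_within_dir(y, x);
}

bool has_same_dir_as(const char *x, const char *y) {
    size_t dx = dir_len(x), dy = dir_len(y);
    return dx == dy && strncmp(x, y, dx) == 0;
}

bool has_same_base_as(const char *x, const char *y) {
    const char *bx = last_component(x), *by = last_component(y);
    size_t sx = stem_len(bx), sy = stem_len(by);
    return sx == sy && strncmp(bx, by, sx) == 0;
}

bool has_same_extension_as(const char *x, const char *y) {
    const char *ex = strrchr(last_component(x), '.');
    const char *ey = strrchr(last_component(y), '.');
    return ex != NULL && ey != NULL && strcmp(ex, ey) == 0;
}
```

Checking the character after the directory prefix prevents "/opt/project" from being treated as within "/opt/proj".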
Example
Program:
1.  int main(int argc, char **argv) {
2.    source_dir = argv[1], target_file = argv[2];
3.    target_fd = open(target_file, WR);
4.    push(source_dir);
5.    while ((dir_name = pop()) != NULL) {
6.      dir = opendir(dir_name);
7.      foreach (dir_entry ∈ dir) {
8.        if (isdirectory(dir_entry))
9.          push(dir_entry);
10.       else {
11.         source_fd = open(dir_entry, RD);
12.         read(source_fd, buf);
13.         write(target_fd, buf);
14.         close(source_fd);
15.       }
16.     }
17.   }
18.   close(target_fd);
    }
Observed trace:
  start("/opt/proj", "/tmp/proj.tar")
  open("/tmp/proj.tar", WR) = 3
  opendir("/opt/proj")
  isdirectory("/opt/proj/README")
  open("/opt/proj/README", RD) = 4
  read(4, …)
  write(3, …)
  close(4)
  isdirectory("/opt/proj/src")
  opendir("/opt/proj/src")
[Figure: learned dataflow model, an automaton over the program locations START, L3, L6, L8, L11, L12, L13, L14, and L18; its transitions are labeled:]
  START: start(I, O)
  L3: FD3 = open(F3, M3) | F3 equal O; M3 elementOf {WR}
  L6: opendir(F6) | F6 isWithinDir I
  L8: isdirectory(F8) | F8 isWithinDir F6
  L11: FD11 = open(F11, M11) | F11 equal F8; M11 elementOf {RD}
  L12: read(FD12) | FD12 equal FD11
  L13: write(FD13) | FD13 equal FD3
  L14: close(FD14) | FD14 equal FD11
  L18: close(FD18) | FD18 equal FD3
Challenges in Dataflow Inference
What operations to monitor
  Useful to monitor internal operations: strcpy,…
Transformations, e.g., URL encoding …
  Mitigate by monitoring internal functions that perform such decoding
Scaling learning algorithms to deal with bulk data (arguments of read/write)
  Previous work addressed mainly file names
More complex relationships
  Not just equality or substring
Capturing absence of flows
Taint-Enhanced Anomaly Detection
Extend the learning technique to detect anomalies
  Anomalous structure of operands
  Anomalous relationships between operands to different operations
Key point: utilize taint information in the structure
Example (SQL injection):
  In a normal SQL query, the tainted characters are alphanumeric
  An anomalous query contains tainted characters of several delimiter types
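A sketch of the SQL example using taint information: training records which character classes occur among the tainted bytes of benign queries, and detection counts the tainted character classes never seen in training. The class scheme (all alphanumerics in one class, every other byte value separate) and the names model_train and model_novel_classes are assumptions; how many novel classes should raise an alarm is left to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define N_CLASSES 257
#define ALNUM_CLASS 256

/* All letters and digits share one class; every other byte value
 * (each delimiter, space, quote, ...) is a class of its own. */
static int char_class(unsigned char c) {
    return isalnum(c) ? ALNUM_CLASS : c;
}

/* Learned model: classes that appeared among tainted bytes of benign queries. */
struct taint_model {
    bool seen[N_CLASSES];
};

/* Training on a benign query; taint[i] != 0 marks q[i] as tainted. */
void model_train(struct taint_model *m, const char *q, const unsigned char *taint) {
    for (size_t i = 0; q[i] != '\0'; i++)
        if (taint[i]) m->seen[char_class((unsigned char)q[i])] = true;
}

/* Detection: number of distinct classes among tainted bytes never seen in training. */
int model_novel_classes(const struct taint_model *m, const char *q,
                        const unsigned char *taint) {
    bool counted[N_CLASSES] = {false};
    int novel = 0;
    for (size_t i = 0; q[i] != '\0'; i++) {
        if (!taint[i]) continue;
        int c = char_class((unsigned char)q[i]);
        if (!m->seen[c] && !counted[c]) {
            counted[c] = true;
            novel++;
        }
    }
    return novel;
}
```

Trained on a query whose tainted part is "xyz", the attack query from the earlier slide yields four novel tainted classes: the quote, ';', space, and '='.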
Filter Generation
Automatic Synthesis of Filters
To learn a generalized filter from:
  many benign samples
  few (maybe just one) attack samples
Challenge
  To distinguish essential attack features from nonessential features that can be easily changed
Possible solutions:
  1. “Understand” the meaning of inputs
     Requires extensive human involvement
  2. Rely on protocol standards
     Possible in some cases, e.g., CGI for web apps
  3. Leverage program context
Leveraging Program Context ...
Observation:
  Programs already “understand” their inputs and know how to process them
  They implicitly encode the characteristics of inputs that are handled correctly, and of those that aren’t
Our approach:
  Leverage this implicit knowledge
  Do this without requiring any manual effort
  Approximate the implicit knowledge using the context in which an input is processed
  Leverage a model of program behavior to identify context
Light-weight Recovery
Recovery steps:
  Release resources for the current request
  Return control to await the next request
How can this be done automatically?
  Observation: servers expect and handle transient network errors
Approach
  Return a network error code when dropping input
  Break TCP connections
  Graceful handling if benign input is mistakenly dropped
Error returns may work on non-input operations as well
  Choose operations where error returns have been seen before
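A sketch of dropping input by returning a network error, in C on POSIX. filtered_read, the input_filter type, the example length filter, and the choice of ECONNRESET as the injected error are assumptions; a real deployment would also break the underlying TCP connection, which this pipe-based sketch omits.

```c
#define _POSIX_C_SOURCE 200809L   /* expose read(), pipe(), ssize_t, ECONNRESET */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical input filter: returns true if the input must be dropped. */
typedef bool (*input_filter)(const char *buf, size_t len);

/* Example filter (illustrative only): drop any input longer than 8 bytes. */
static bool drop_if_longer_than_8(const char *buf, size_t len) {
    (void)buf;
    return len > 8;
}

/* Read from fd; if the filter rejects the data, discard it and report a
 * transient network error (ECONNRESET), the kind of error a server is
 * expected to handle by abandoning the request and awaiting the next one. */
ssize_t filtered_read(int fd, char *buf, size_t len, input_filter drop) {
    ssize_t n = read(fd, buf, len);
    if (n > 0 && drop(buf, (size_t)n)) {
        memset(buf, 0, (size_t)n);   /* never expose dropped bytes to the app */
        errno = ECONNRESET;
        return -1;
    }
    return n;
}
```

Because the error looks like an ordinary network failure, a benign input that is dropped by mistake costs only a retry rather than a crash.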
Preliminary Results: Buffer Overflow Signatures on Linux ...
Essential characteristics of BO attacks:
  Excessive input length
  Anomalous character distribution (CD)
Identify an anomalous characteristic that is
  present in the suspicious input
  absent in every benign input
Check for an anomaly in length first, then in CD
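The check order above can be sketched as follows. Excessive length is tested first; for the character-distribution step this sketch substitutes a much cruder test (a byte value absent from every benign input), which is an assumption rather than the actual CD model. derive_signature and struct sample are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sig_kind { SIG_NONE, SIG_LENGTH, SIG_CHARSET };

struct sample {
    const unsigned char *data;
    size_t len;
};

/* Find a characteristic present in the attack input but absent from every
 * benign input: excessive length first, then (crude stand-in for CD) a byte
 * value that never occurs in any benign input. */
enum sig_kind derive_signature(const struct sample *benign, size_t nb,
                               struct sample attack, size_t *max_benign_len) {
    bool seen[256] = {false};
    size_t max_len = 0;
    assert(max_benign_len != NULL);
    for (size_t i = 0; i < nb; i++) {
        if (benign[i].len > max_len) max_len = benign[i].len;
        for (size_t j = 0; j < benign[i].len; j++)
            seen[benign[i].data[j]] = true;
    }
    *max_benign_len = max_len;
    if (attack.len > max_len)
        return SIG_LENGTH;              /* signature: length > max_benign_len */
    for (size_t j = 0; j < attack.len; j++)
        if (!seen[attack.data[j]])
            return SIG_CHARSET;         /* signature: contains an unseen byte */
    return SIG_NONE;                    /* no distinguishing feature found */
}
```

Trying length first yields the simplest signature when it works, since a size bound is cheap to check at run time.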
Signature Example
Current context, size-based (ntpd):
  At read@(S1, 0xB3D0, 0xE857FF04): Size > 500
  “read() library call made at offset 0xB3D0 of segment S1, with the return addresses hashed to 0xE857FF04, cannot receive input larger than 500”
A segment is a region of code, e.g., a shared library
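A sketch of representing and matching such a signature. The struct size_sig fields mirror the (segment, offset, stack hash) context and the size bound; the FNV-1a function hash_return_addrs is a stand-in, since the slide does not say which hash produced 0xE857FF04.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack-context hash (32-bit FNV-1a over the return addresses). */
uint32_t hash_return_addrs(const uintptr_t *ret, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        uintptr_t a = ret[i];
        for (size_t b = 0; b < sizeof a; b++) {
            h ^= (uint32_t)(a & 0xFFu);
            h *= 16777619u;
            a >>= 8;
        }
    }
    return h;
}

/* A context-sensitive, size-based signature such as
 *   At read@(S1, 0xB3D0, 0xE857FF04): Size > 500 */
struct size_sig {
    int segment;          /* code segment, e.g. S1 (a shared library) */
    uintptr_t offset;     /* call-site offset within the segment */
    uint32_t stack_hash;  /* hash of the return addresses */
    size_t max_size;      /* largest input this context may receive */
};

/* True if an input of `size` bytes in this calling context should be dropped. */
bool size_sig_matches(const struct size_sig *s, int segment, uintptr_t offset,
                      uint32_t stack_hash, size_t size) {
    assert(s != NULL);
    return s->segment == segment && s->offset == offset &&
           s->stack_hash == stack_hash && size > s->max_size;
}
```

Recording the segment and offset rather than an absolute address keeps the context meaningful even when a shared library is loaded at a different base address.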
Effectiveness Summary
[Figure: bar chart of the attack-to-benign ratio (y axis 0 to 20) for wu-ftpd, apache-ssl, ntpd, ircd, lsh, gtkftpd, samba, epic4, cvs, passlogd, and oops]
Server Availability
[Figure: availability (0 to 1) of named and named/Protected versus attack rate (0.1 to 100 attacks per second)]
• Similar results on ntpd
• Improvement for Apache is less dramatic due to use of threads
• Still, availability improved by a factor of 10 or more
Signature Generation for Generalized Injection Attacks
Available information:
  Bytes S involved in the attack
  Earliest operation Op(S’) where S’ flowed to S
  Context C in which Op(S’) is invoked
Types of filters:
  Exact filter: deny Op(S’) in context C
  Statistical filter: deny Op(S’’) in context C if S’ and S’’ share a statistical property P
  Filter derived by policy projection:
    Let P denote the taint-enhanced policy violated in the attack
    Compute a predicate P’ so that P’(S’) => P(S)
Policy Projection
Input: P (policy automaton), S (attack argument)
Output: P’ (filter automaton)
Steps:
  Match S in P and find the tainted segment; this becomes the initial P’
  (Generalization) Add other tainted transitions between states in S into P’
Example: S: password=‘Bad’ OR 1==1’
[Figure: the policy automaton P (states S0, S1, S2) and the filter automaton P’ derived for this S; transition labels include ’, #, Bad, and negated character classes such as !(’|#)]
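A toy C illustration of the projection idea, under heavy simplifying assumptions: P is a three-state SQL lexer automaton (S0 code, S1 inside a quoted literal, S2 comment) that requires tainted bytes to stay in S1; the entry state is found by running P up to the first tainted byte; and P’ reruns P on the raw input alone from that state. This is not the RAMSES projection algorithm, and its generalization is only implicit: P’ accepts any input that keeps P in S1, not just variations of the observed attack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified SQL lexer automaton (an assumption, not the RAMSES automaton):
 * S0 = SQL code, S1 = inside a '...' literal, S2 = '#' comment to end of line. */
enum sql_state { S0, S1, S2 };

static enum sql_state step(enum sql_state s, char c) {
    switch (s) {
    case S0: return c == '\'' ? S1 : (c == '#' ? S2 : S0);
    case S1: return c == '\'' ? S0 : S1;
    case S2: return c == '\n' ? S0 : S2;
    }
    return s;
}

/* Policy P on the whole query: every tainted byte must be consumed in S1
 * and leave the automaton in S1 (tainted data stays literal data). */
bool policy_ok(const char *q, const unsigned char *taint) {
    enum sql_state s = S0;
    for (size_t i = 0; q[i] != '\0'; i++) {
        enum sql_state t = step(s, q[i]);
        if (taint[i] && (s != S1 || t != S1)) return false;
        s = t;
    }
    return true;
}

/* State of P just before the first tainted byte: where the input is spliced in. */
enum sql_state entry_state(const char *q, const unsigned char *taint) {
    enum sql_state s = S0;
    for (size_t i = 0; q[i] != '\0' && !taint[i]; i++) s = step(s, q[i]);
    return s;
}

/* Projected filter P' on the raw input alone: starting from the entry state,
 * every input byte must keep P in S1.  In this simplified model, an input
 * accepted here satisfies P when spliced in at the same point of the query. */
bool projected_filter_ok(const char *input, enum sql_state entry) {
    enum sql_state s = entry;
    for (size_t i = 0; input[i] != '\0'; i++) {
        enum sql_state t = step(s, input[i]);
        if (s != S1 || t != S1) return false;
        s = t;
    }
    return true;
}
```

The filter needs only the input and the entry state, so it can run at the input interface, before the tainted data ever reaches the query.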
Success Criteria
Detect/respond/immunize 50% of attacks
  RAMSES goal: 75-80% of remote targeted attacks
Preserve availability while immunizing: false alarm rate < 10%
  RAMSES goal: < 2%
Generate responses within 250 ms
  RAMSES goal: 100 ms
RAMSES Project Schedule
[Figure: Gantt chart spanning CY06 Q3 through CY09 Q3]
Baseline tasks:
  1. Refine RAMSES Requirements
  2. Design RAMSES
  3. Develop Components
  4. Integrate System
  5. Analyze & Test RAMSES
  6. Coordinate & Report
  Prototypes: 1, 2, 3
Optional tasks:
  O.1 Enhancements
  O.2 Cross-Area Exp. 1
  O.3 Cross-Area Exp. 2
  O.4 Transition Support
Questions?