Taint-Enhanced Policy Enforcement

RAMSES (Regeneration And iMmunity SErviceS): A Cognitive Immune System
James E. Just, Nathan Li, Mark Cornwell (Global Infotek)
R. Sekar (Stony Brook University)
21 July 2015
Goals
1. Prevent most attacks from compromising the application
    • But stopped exploits still cause some damage:
        • Program crashes, requiring restart
        • Wasted computation or memory
2. Refine response to preserve application availability
    • Minimize performance impact of unsuccessful attacks
Mimic biological immune system…
• Innate immune response is nonspecific and causes significant “collateral damage”
• Acquired immune response is much more effective and targeted
• Biological systems “learn” from “attacks” and “automatically develop responses targeted at a pathogen”
Our Goal: Build an artificial immune system that is more like the acquired immune system
Technologies feeding into RAMSES
• DAWSON: Address-space randomization for Windows [ACSAC ’06]
• Taint-enhanced policy enforcement for accurate detection of a wide range of attacks (source code/Linux) [USENIX Security ’06]
• Automated vulnerability-oriented signature generation for buffer overflow attacks (COTS/Linux) [ACM CCS ’05, ACSAC ’05]
• Learning security-relevant data flows in COTS applications (Linux) [Oakland ’06]
System Architecture
[Figure: architecture diagram. Labeled components: External Environment, Library Interposition, Protected Process, Input Filter, Logger, Detector, Signature Generator, Behavior Model]
Elements of RAMSES
• Instrumentation
    • To intercept/monitor/alter application behavior
    • Targeted for Windows
• Accurate attack detection
    • Address-space randomization
    • Taint-enhanced policy enforcement
    • Taint-enhanced anomaly detection
• Efficient taint tracking for blackbox COTS
• Automated signature generation/filtering
    • Vulnerability-oriented signatures
    • Based on policy projection
    • Use of program context to minimize FPs
• Continued evaluation of signatures
Instrumentation
• Instrument important APIs
• Working at API level avoids
    • Need for source code
    • Need to know representation or semantics of application-specific data structures
    • Complex analyses or transformations on binaries
• Instrumentation will support
    • Logging of relevant operations
        • Including parameters and return values
    • Interposition of filter functions
        • Including injection of failure returns at appropriate points to ensure error recovery
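A minimal sketch of API-level interposition, assuming a Python decorator stands in for the library wrapper. The names `interpose`, `too_long`, and `handle_input` are hypothetical and are not RAMSES interfaces. The wrapper logs parameters and return values and lets a filter function inject a failure return.

```python
import functools

LOG = []  # (api name, arguments, result) records


def interpose(filters=()):
    """Wrap an API so every call is logged and can be rejected by a filter."""
    def decorate(api):
        @functools.wraps(api)
        def wrapper(*args):
            for is_bad in filters:
                if is_bad(*args):
                    LOG.append((api.__name__, args, "DENIED"))
                    # Inject a failure the caller is assumed to handle already.
                    raise OSError("injected failure return")
            result = api(*args)
            LOG.append((api.__name__, args, result))
            return result
        return wrapper
    return decorate


def too_long(data):
    # Hypothetical filter function: reject oversized input.
    return len(data) > 16


@interpose(filters=[too_long])
def handle_input(data):
    # Stand-in for an instrumented API of the protected process.
    return len(data)
```

Calling `handle_input(b"hello")` is logged and passes through, while an oversized argument raises `OSError` before the wrapped function runs.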
Instrumentation Issues
• System-call vs. API instrumentation
• Instrumentation at the “right” point
    • e.g., after URL or other decoding on web servers
• Being able to instrument “internal” calls
    • Need expertise in various wrapping and linking techniques on Windows
• Being able to obtain program context
    • Walk the stack etc.
Attack Detection

Address-Space Randomization
• Detects a wide range of memory corruption based attacks
• Very low overhead
• Technology developed in DAWSON project (SRS Phase I) for Windows XP
Challenge …
• Accurate detection of a wider range of attacks
[Pie chart: CVE Vulnerabilities (Ver. 20040901)]
    Memory errors (stack-smashing, heap overflow, integer overflow, data attacks): 27%
    Other logic errors: 22%
    Command injection: 15%
    Directory traversal: 10%
    Input validation/DoS: 9%
    Format string: 4%
    Tempfile: 4%
    Cross-site scripting: 4%
    Config errors: 3%
    SQL injection: 2%
    Chart annotation: Generalized Injection Attacks
A Unified View of Attacks
• Input Interface: attacker injects malicious input data
    $name = $_GET['name']
• Program: attacker-provided data propagated in program
    $query = "SELECT price FROM products WHERE name='" . $name . "'"
• Security-Sensitive Operations: attacker-provided data used as argument to corrupt system/data
    sql_query($query)
A Unified View of Attacks
• Input Interface
    $name = $_GET['name']
    Value: $name = "xyz'; UPDATE products SET price=0 WHERE name='iPod"
• Program
    $query = "SELECT price FROM products WHERE name='" . $name . "'"
    Value: $query = "SELECT price FROM products WHERE name='xyz'; UPDATE products SET price=0 WHERE name='iPod'"
• Security-Sensitive Operations
    sql_query($query)
• Attack: iPods priced to $0
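The attack on this slide can be reproduced with a short sketch, assuming Python's built-in `sqlite3` module in place of the PHP `sql_query` call. The table contents and the starting price are made up for illustration, and `lookup_price` is a hypothetical name.

```python
import sqlite3


def lookup_price(conn, name):
    # Vulnerable on purpose: attacker-controlled text is pasted into the SQL string.
    query = "SELECT price FROM products WHERE name='" + name + "'"
    conn.executescript(query)  # executescript() runs every statement in the string
    return query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('iPod', 199.0)")  # assumed sample data
conn.commit()

# The attacker-supplied value from the slide.
lookup_price(conn, "xyz'; UPDATE products SET price=0 WHERE name='iPod")
price = conn.execute("SELECT price FROM products WHERE name='iPod'").fetchone()[0]
```

After the call, the injected `UPDATE` has run alongside the intended `SELECT`, so the stored iPod price has been overwritten.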
Overview of Approach
• Input Interface: marking untrusted data as tainted
    • Marking using wrapper functions
    • Typical: network inputs marked as tainted
• Program: fine-grained taint tracking
    • Current approaches:
        • Source transformation (not useful for COTS)
        • Dynamic translation of binaries (very high overheads)
    • RAMSES approach: infer (most) flows using a light-weight learning technique
• Security-Sensitive Operations: policy checking using taint-enhanced policies
    • Policies as predicates on arguments of security-sensitive operations
Taint-Enhanced Policy Enforcement
Key requirements:
• Fine-grained taint tracking (byte-level)
• Taint-enhanced policies
[Figure: Input Interface, Program (Fine-grained Taint Tracking), Security-Sensitive Operations]
• Input Interface: $name = "xyz'; UPDATE products SET price=0 WHERE name='iPod"
• Program: $query = "SELECT price FROM products WHERE name='xyz'; UPDATE products SET price=0 WHERE name='iPod'"
• Policy: SQL commands within $query shouldn’t be tainted
• Detects SQL Injection!
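A minimal sketch of the idea on this slide, assuming per-character taint flags carried alongside a string. The `Tainted` class, the token list in `SQL_TOKEN`, and `build_query` are illustrative assumptions, not the RAMSES implementation.

```python
import re


class Tainted:
    """A string plus one taint flag per character."""

    def __init__(self, text, taint=None):
        self.text = text
        self.taint = list(taint) if taint is not None else [False] * len(text)

    def __add__(self, other):
        # Concatenation propagates taint character by character.
        return Tainted(self.text + other.text, self.taint + other.taint)


def untrusted(text):
    return Tainted(text, [True] * len(text))


# Assumed token classes: a few SQL keywords and meta-characters.
SQL_TOKEN = re.compile(
    r"\b(SELECT|UPDATE|INSERT|DELETE|DROP|SET|WHERE|OR|AND)\b|[';]|--",
    re.IGNORECASE,
)


def violates_sql_policy(q):
    """True if any SQL keyword or meta-character in q contains a tainted character."""
    return any(any(q.taint[m.start():m.end()]) for m in SQL_TOKEN.finditer(q.text))


def build_query(name):
    return Tainted("SELECT price FROM products WHERE name='") + name + Tainted("'")


benign = build_query(untrusted("iPod"))
attack = build_query(untrusted("xyz'; UPDATE products SET price=0 WHERE name='iPod"))
```

The benign query passes because its keywords and quotes come from the program, while the attack query is rejected because its quotes, semicolon, and keywords are tainted.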
Preliminary Results on
Taint-Enhanced Policy Enforcement
• Uses source code transformation on C programs on Linux
    • For details, see [USENIX Security ’06]
• Idea
    • Runtime representation of taint
        • Use bit array tagmap to store taint tags for each byte of memory
        • Tag(a): representing taint bits of bytes at address a in tagmap
    • Update tagmap for each assignment
Transformation Overview
Assignment      Tag update
x = y + z;      Tag(&x) = Tag(&y) || Tag(&z);
x = *p;         Tag(&x) = Tag(p);
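The two tag-update rules can be simulated with a small sketch, assuming a flat simulated address space. The `TagMap` class and the addresses `X`, `Y`, `Z`, `BUF` are hypothetical. The real transformation inserts such updates into C source, which is not shown here.

```python
class TagMap:
    """One taint tag per byte of a simulated address space."""

    def __init__(self, size):
        # One byte per tag for clarity; a real tagmap would pack bits.
        self.bits = bytearray(size)

    def tag(self, addr, width=1):
        return int(any(self.bits[addr:addr + width]))

    def set_tag(self, addr, value, width=1):
        self.bits[addr:addr + width] = bytes([value]) * width


X, Y, Z, BUF = 0, 4, 8, 12  # assumed addresses of 4-byte variables
WIDTH = 4
tm = TagMap(16)
tm.set_tag(Z, 1, WIDTH)  # z came from untrusted input

# x = y + z;   becomes   Tag(&x) = Tag(&y) || Tag(&z);
tm.set_tag(X, tm.tag(Y, WIDTH) | tm.tag(Z, WIDTH), WIDTH)
after_add = tm.tag(X, WIDTH)

# x = *p;  with p pointing at BUF   becomes   Tag(&x) = Tag(p);
p = BUF
tm.set_tag(X, tm.tag(p, WIDTH), WIDTH)
after_deref = tm.tag(X, WIDTH)
```

Here `x` becomes tainted after the addition because `z` is tainted, and becomes untainted again after it is overwritten from the untainted buffer.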
Taint-Enhanced Policies
Attack Type               Role of Tainted Data                        Taint-Enhanced Policy
Control-flow hijack       Code pointers                               jmp(addr) | addr should not be tainted
Format string             Format specifiers (e.g. “%n”)               vfprintf(fmtstr) | fmtstr should not contain tainted format specifiers
Directory traversal       Directory traversal strings (e.g. “/../”)   open(path) | path should not contain tainted directory traversal strings
Cross-site scripting      Scripts in dynamic HTML                     html_print(str) | str should not contain tainted script tags (e.g. <script>)
SQL injection             SQL keywords or meta-chars                  sql_query(qstr) | qstr should not contain tainted SQL keywords/meta-chars
Shell command injection   Shell commands or meta-chars                popen(cmd) | cmd should not contain tainted shell commands/meta-chars
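Three rows of the table can be written as predicates over an argument and its taint mask. This is a sketch under stated assumptions: the regular expressions in `POLICIES` are rough stand-ins for the real token definitions, and `mark` is a hypothetical helper that builds test data.

```python
import re


def tainted_match(pattern, text, taint):
    """True if some match of pattern in text overlaps a tainted character."""
    return any(any(taint[m.start():m.end()]) for m in re.finditer(pattern, text))


# Hypothetical policy table: operation name -> pattern that must not be tainted.
POLICIES = {
    "open": r"\.\./",                     # directory traversal strings
    "vfprintf": r"%[-+ #0-9.]*[a-zA-Z]",  # format specifiers such as %n
    "html_print": r"(?i)<script",         # script tags
}


def allowed(op, text, taint):
    return not tainted_match(POLICIES[op], text, taint)


def mark(trusted, untrusted):
    """Build (text, taint) for a trusted prefix followed by untrusted input."""
    return trusted + untrusted, [False] * len(trusted) + [True] * len(untrusted)
```

A format specifier written by the program stays allowed, while the same specifier arriving in untrusted input is rejected, which is the distinction plain pattern matching cannot make.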
Effectiveness Evaluation
CVE#            Program               Lang.   Attack Type
CAN-2003-0201   Samba 2.2.8           C       Stack smashing
CVE-2000-0573   wu-ftpd 2.6.0         C       Format string
CAN-2005-1365   Pico server 3.2       C       Directory traversal
CAN-2003-0486   phpBB 2.0.5           PHP     SQL injection
CAN-2005-0258   phpBB 2.0.5           PHP     Directory traversal
CAN-2002-1341   SquirrelMail 1.2.10   PHP     Cross site scripting
CAN-2003-0990   SquirrelMail 1.4.0    PHP     Command injection
CAN-2005-1921   PHP XML-RPC           PHP     Command injection
CVE-1999-0045   nph-test-cgi          Bash    Shell meta-character
• Detected all attacks
• Reported no false positives
Learning Dataflows …
• Instrument important APIs to record operations, operands and return codes
• Define “interesting relationships” on operands to these operations
• Estimate dynamic information flow from operand X to Y from
    • Dataflow model
    • Runtime observed values of X, Y
Possible Dataflow Relationships
• Unary relation: property of an individual operand
    • Represented as X R c, where
        X: an argument name,
        c: a constant value,
        R: a unary relation
    • Examples of unary relation R:
        • equal => X takes only a single value always equal to c
        • elementOf => X takes any value from the set c
        • range => X takes values in the range c (e.g., c = (0, 2))
        • isWithinDir => X is a file name argument that is always contained within a specified directory c
Possible Dataflow Relationships
• Binary relations: capture relationships between operands of two operations
    • Represented as X R Y, where
        X, Y: argument names
        R: a binary relation
    • Examples of binary relation R:
        • equal => equality between X and Y
        • isWithinDir => file/directory X is within directory Y
        • contains => directory X contains file/directory Y
        • hasSameDirAs => X and Y are within a common directory
        • hasSameBaseAs => X and Y have same base (e.g., a.c, a.h)
        • hasSameExtensionAs => X and Y have same extension (e.g., a.c, b.c)
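The binary relations above can be sketched as predicates on path strings, assuming POSIX-style absolute paths. Two readings are assumptions: `hasSameDirAs` is taken to mean "same parent directory", and `isWithinDir` is taken to be strict (a directory is not within itself).

```python
import posixpath


def is_within_dir(x, y):
    """X isWithinDir Y: x lies strictly inside directory y (absolute paths assumed)."""
    x, y = posixpath.normpath(x), posixpath.normpath(y)
    return x != y and posixpath.commonpath([x, y]) == y


def has_same_dir_as(x, y):
    return posixpath.dirname(x) == posixpath.dirname(y)


def _base(path):
    return posixpath.splitext(posixpath.basename(path))[0]


def has_same_base_as(x, y):
    return _base(x) == _base(y)


def has_same_extension_as(x, y):
    return posixpath.splitext(x)[1] == posixpath.splitext(y)[1]


BINARY_RELATIONS = {
    "equal": lambda x, y: x == y,
    "isWithinDir": is_within_dir,
    "contains": lambda x, y: is_within_dir(y, x),
    "hasSameDirAs": has_same_dir_as,
    "hasSameBaseAs": has_same_base_as,
    "hasSameExtensionAs": has_same_extension_as,
}


def relations_between(x, y):
    """Names of all relations R for which x R y holds."""
    return {name for name, rel in BINARY_RELATIONS.items() if rel(x, y)}
```

Normalizing paths before the containment check means a name such as `/opt/proj/../etc/passwd` is not reported as being within `/opt/proj`.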
Example
Program:
1.  int main(int argc, char **argv) {
2.    source_dir = argv[1], target_file = argv[2];
3.    target_fd = open(target_file, WR);
4.    push(source_dir);
5.    while ((dir_name = pop()) != NULL) {
6.      dir = opendir(dir_name);
7.      foreach (dir_entry in dir) {
8.        if (isdirectory(dir_entry))
9.          push(dir_entry);
10.       else {
11.         source_fd = open(dir_entry, RD);
12.         read(source_fd, buf);
13.         write(target_fd, buf);
14.         close(source_fd);
15.       }
16.     }
17.   }
18.   close(target_fd);
    }
Operations observed at run time, by program line:
    start("/opt/proj", "/tmp/proj.tar")
    Line 3:  open("/tmp/proj.tar", WR) = 3
    Line 6:  opendir("/opt/proj"), opendir("/opt/proj/src")
    Line 8:  isdirectory("/opt/proj/README"), isdirectory("/opt/proj/src")
    Line 11: open("/opt/proj/README", RD) = 4
    Line 12: read(4, …)
    Line 13: write(3, …)
    Line 14: close(4)
[Figure: dataflow model for the example program. States: START, L3, L6, L8, L11, L12, L13, L14, L18. Transitions carry the labels below.]
    start(I, O)
    FD3 = open(F3, M3) | F3 equal O; M3 elementOf {WR}
    opendir(F6) | F6 isWithinDir I
    isdirectory(F8) | F8 isWithinDir F6
    FD11 = open(F11, M11) | F11 equal F8; M11 elementOf {RD}
    read(FD12) | FD12 equal FD11
    write(FD13) | FD13 equal FD3
    close(FD14) | FD14 equal FD11
    close(FD18) | FD18 equal FD3
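A minimal sketch of how `equal` relations like those in the model could be inferred from observed values. The rule used here is an assumption chosen for brevity: keep "X equal Y" only if every observed value of X matched the most recent value of Y. The function name and the trace encoding are hypothetical.

```python
def learn_equal_relations(trace):
    """trace: list of (operand_name, value) pairs in order of observation.

    Returns the pairs (x, y) for which every observed value of x was equal
    to the most recent value of y seen before it.
    """
    last = {}        # operand name -> most recent value
    candidates = {}  # x -> set of y still consistent with "x equal y"
    for name, value in trace:
        matches = {y for y, v in last.items() if v == value and y != name}
        if name in candidates:
            candidates[name] &= matches  # drop relations this observation refutes
        else:
            candidates[name] = matches
        last[name] = value
    return {(x, y) for x, ys in candidates.items() for y in ys}


# Operand values taken from the example run, named as in the model.
trace = [
    ("O", "/tmp/proj.tar"), ("F3", "/tmp/proj.tar"), ("FD3", 3),
    ("F8", "/opt/proj/README"), ("F11", "/opt/proj/README"), ("FD11", 4),
    ("FD12", 4), ("FD13", 3), ("FD14", 4), ("FD18", 3),
]
learned = learn_equal_relations(trace)
```

With a single short trace the result also contains coincidental pairs such as `("FD14", "FD12")`; more training runs would be needed to prune relations that hold only by chance.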
Challenges in Dataflow Inference
• What operations to monitor
    • Useful to monitor internal operations: strcpy,…
• Transformations, e.g., URL encoding …
    • Mitigate by monitoring internal functions that perform such decoding
• Scaling learning algorithms to deal with bulk data (arguments of read/write)
    • Previous work addressed mainly file names
• More complex relationships
    • Not just equality or substring
• Capturing absence of flows
Taint-Enhanced Anomaly Detection
• Extend learning technique to detect anomalies
    • Anomalous structure of operands
    • Anomalous relationships between operands to different operations
• Key point: utilize taint information in structure
• Example: (SQL injection)
    • Normal SQL query contains alphanumeric tainted characters
    • Anomalous query contains several types of delimiter characters
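The SQL example can be sketched as a profile of which character classes have appeared in the tainted part of an operand. The class boundaries in `char_class` and the helper `query_for` are assumptions made for illustration.

```python
def char_class(ch):
    if ch.isalnum():
        return "alnum"
    if ch.isspace():
        return "space"
    return "delim:" + ch


def tainted_classes(text, taint):
    """Character classes that occur among the tainted characters only."""
    return {char_class(c) for c, t in zip(text, taint) if t}


class TaintProfile:
    def __init__(self):
        self.seen = set()

    def train(self, text, taint):
        self.seen |= tainted_classes(text, taint)

    def is_anomalous(self, text, taint):
        return not tainted_classes(text, taint) <= self.seen


def query_for(value):
    """Hypothetical helper: a query whose only tainted part is value."""
    prefix, suffix = "SELECT price FROM products WHERE name='", "'"
    text = prefix + value + suffix
    taint = [False] * len(prefix) + [True] * len(value) + [False] * len(suffix)
    return text, taint


profile = TaintProfile()
for sample in ("iPod", "Nano2"):
    profile.train(*query_for(sample))
```

Because only tainted characters are profiled, the quotes and keywords the program itself writes never count against a benign query.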
Filter Generation

Automatic Synthesis of Filters
• To learn a generalized filter from
    • many benign samples
    • few (maybe just one) attack samples
• Challenge
    • To distinguish essential attack features from nonessential features that can be easily changed
• Possible solutions:
    1. “Understand” the meaning of inputs
        • Requires extensive human involvement
    2. Rely on protocol standards
        • Possible in some cases, e.g., CGI for web apps
    3. Leverage program context
Leveraging Program Context ...
• Observation:
    • Programs already “understand” their inputs and know how to process them
    • They implicitly encode the characteristics of inputs that are correctly handled, and those that aren’t handled correctly
• Our Approach:
    • Leverage this implicit knowledge
    • Do this without requiring any manual effort
• Approximate implicit knowledge using the context in which an input is processed
    • Leverage a model of program behavior to identify context
Light-weight Recovery
• Recovery steps:
    • Release resources for current request
    • Return control to await next request
• How to do this automatically?
    • Observation: Servers expect and handle transient network errors
    • Approach
        • Return network error code when dropping input
        • Break TCP connections
• Graceful handling if benign input mistakenly dropped
• Error returns may work on non-input operations as well
    • Choose operations where error returns seen before
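A sketch of the recovery idea, assuming a toy server loop: the filter reports a dropped input as a network error, and the server's existing error handling releases the request and moves on. `FakeConn`, `filtered_recv`, and `serve` are hypothetical names, and no real sockets are used.

```python
class FakeConn:
    """Stand-in for a TCP connection (illustration only)."""

    def __init__(self, payload):
        self.payload = payload
        self.closed = False

    def recv(self):
        return self.payload

    def close(self):
        self.closed = True


def filtered_recv(conn, is_attack):
    data = conn.recv()
    if is_attack(data):
        # Drop the input and report it as a transient network error.
        raise ConnectionResetError("input dropped by filter")
    return data


def serve(connections, is_attack):
    handled = []
    for conn in connections:
        try:
            handled.append(filtered_recv(conn, is_attack).upper())
        except ConnectionError:
            pass  # the server's ordinary handling of network errors
        finally:
            conn.close()  # release resources, then await the next request
    return handled


conns = [FakeConn(b"get /"), FakeConn(b"A" * 600), FakeConn(b"get /x")]
handled = serve(conns, lambda data: len(data) > 500)
```

The second request is dropped without crashing the loop, and the third request is still served.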
Preliminary Results: Buffer
Overflow Signatures on Linux ...
• Essential characteristics of BO attacks:
    • Excessive input length
    • Anomalous character distribution (CD)
• Identify anomalous characteristic
    • present in suspicious input
    • absent in every benign input
• Check for anomaly in length, then CD
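A sketch of the "length first, then character distribution" check. The CD measure is an assumption made for illustration: the relative frequency of each byte value, compared against the highest frequency seen in any benign input. The measure actually used in the cited work may differ.

```python
from collections import Counter


def char_distribution(data):
    """Relative frequency of each byte value in data."""
    counts = Counter(data)
    return {b: n / len(data) for b, n in counts.items()}


def max_byte_freqs(benign):
    """For each byte value, the highest relative frequency in any benign input."""
    limit = Counter()
    for sample in benign:
        for b, f in char_distribution(sample).items():
            limit[b] = max(limit[b], f)
    return limit


def make_signature(benign, attack):
    """Pick a characteristic present in the attack and absent in every benign input."""
    max_len = max(len(s) for s in benign)
    if len(attack) > max_len:  # check length first
        return ("length", max_len)
    limits = max_byte_freqs(benign)
    for b, f in char_distribution(attack).items():  # then character distribution
        if f > limits[b]:
            return ("char", b, limits[b])
    return None


def matches(sig, data):
    if sig is None or not data:
        return False
    if sig[0] == "length":
        return len(data) > sig[1]
    _, b, limit = sig
    return char_distribution(data).get(b, 0.0) > limit


benign = [b"GET /index.html", b"GET /a"]
length_sig = make_signature(benign, b"A" * 600)
cd_sig = make_signature(benign, b"\x90\x90\x90\x90")
```

The first attack yields a length signature, while the second is short enough to pass the length check and is caught by its byte distribution instead.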
Signature Example
• Current context, size-based (ntpd)
    At read@(S1, 0xB3D0, 0xE857FF04)
    Size > 500
• “read() library call made at offset 0xB3D0 of segment S1, with the return addresses hashed to 0xE857FF04, cannot receive input larger than 500”
• Segment is a region of code, e.g., a shared library
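The signature above can be represented as a lookup keyed by calling context. This is a sketch: the dictionary layout and `should_drop` are assumptions, and the hash of the return addresses is taken as given rather than computed.

```python
# (operation, segment, call-site offset, hash of return addresses) -> max input size
SIGNATURES = {
    ("read", "S1", 0xB3D0, 0xE857FF04): 500,
}


def should_drop(op, segment, offset, context_hash, size):
    """True if an input of this size violates a signature for this exact context."""
    limit = SIGNATURES.get((op, segment, offset, context_hash))
    return limit is not None and size > limit
```

Keying on the full context means a large read elsewhere in the program is not affected by the ntpd signature.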
Effectiveness Summary
[Bar chart: Attack to Benign Ratio (y-axis, 0 to 20) per program: wu-ftpd, apache-ssl, ntpd, ircd, lshd, gtkftpd, samba, epic4, cvs, passlogd, oops]
Server Availability
[Line chart: Availability (y-axis, 0 to 1) versus Attack rate (per second) (x-axis, 0.1 to 100). Series: named, named/Protected]
• Similar results on ntpd
• Improvement for Apache is less dramatic due to use of threads
• Still, availability improved by a factor of 10 or more
Signature Generation for
Generalized Injection Attacks
• Available information
    • Bytes S involved in attack
    • Earliest operation Op(S’) where S’ flowed to S
    • Context C in which S’ is invoked
• Types of filters
    • Exact filter: deny Op(S’) in context C
    • Statistical filter: deny Op(S’’) in context C if S’ and S’’ share a statistical property P
    • Filter derived by policy projection:
        • Let P denote taint-enhanced policy violated in the attack
        • Compute predicate P’ so that P’(S’) => P(S)
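The first two filter types can be contrasted in a short sketch. The property used for the statistical filter, the set of delimiter characters in the input, is an assumed example of P, and the context string is hypothetical. Policy projection is not sketched here.

```python
def exact_filter(attack_input, context):
    """Deny the very same input S' when it arrives in context C."""
    return lambda data, ctx: ctx == context and data == attack_input


def statistical_filter(attack_input, context, prop):
    """Deny any input S'' in context C that shares property P with S'."""
    target = prop(attack_input)
    return lambda data, ctx: ctx == context and prop(data) == target


def delimiter_kinds(data):
    """Assumed property P: the set of non-alphanumeric, non-space characters used."""
    return frozenset(c for c in data if not c.isalnum() and not c.isspace())


attack = "Bad' OR 1==1"
ctx = "login:password"  # hypothetical context label
exact = exact_filter(attack, ctx)
stat = statistical_filter(attack, ctx, delimiter_kinds)
```

A trivially modified attack such as `Bad' OR 2==2` slips past the exact filter but is still caught by the statistical one, which is the motivation for generalizing.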
Policy Projection
• Input: P (policy automaton), S (attack argument)
• Output: P’ (filter automaton)
• Steps:
    • Match S in P and find out tainted segment. This becomes initial P’
    • (Generalization) Add other tainted transitions between states in S into S
• Example: S: password=‘Bad’ OR 1==1’
[Figure: automaton diagrams for P and S, with states S0, S1, S2 and transitions labeled with quote characters, #, their negations, and Bad]
Success Criteria
• Detect/respond/immunize 50% of attacks
    • RAMSES Goal: 75-80% remote targeted attacks
• Preserve availability while immunizing
• False alarm rate < 10%
    • RAMSES Goal: <2%
• Generate responses within 250 ms.
    • RAMSES Goal: 100 ms
RAMSES Project Schedule
[Gantt chart: quarters from Q3 CY06 through Q3 CY09]
Baseline Tasks
1. Refine RAMSES Requirements
2. Design RAMSES
3. Develop Components
4. Integrate System
5. Analyze & Test RAMSES
6. Coordinate & Rept
Prototypes: 1, 2, 3
Optional Tasks
O.1 Enhancements
O.2 Cross-Area Exp. 1
O.3 Cross-Area Exp. 2
O.4 Transition Support
Questions?