Enhancing Security of Real-World Systems with a Better Understanding of Threats Shuo Chen Ph.D.

Enhancing Security of Real-World Systems
with a Better Understanding of Threats
Shuo Chen
Ph.D. Candidate in Computer Science
Center for Reliable and High Performance Computing
University of Illinois at Urbana-Champaign
My Dissertation

Security Threat Analysis and Mitigations in Real-World
Systems
– How do errors in hardware and software pose security threats to
real-world systems? (common characteristics?)
– How effective are current defense techniques? (substantial
deficiencies?)
– How can better defenses be built?

Analysis-centric research approach
– Study the impact of hardware memory errors on system security
– Software vulnerabilities reported in Bugtraq and CERT
databases, source code of vulnerable applications
– Current attack methods and defense techniques
– Analysis results motivate the development of new defense
techniques.

Many areas related to my dissertation
I as a System Hacker/Builder

Summer’01, Avaya Labs, Basking Ridge, NJ
– Port Libsafe to Windows NT/2000.

Summer’02, Bell Labs, Holmdel, NJ
– Detection of network denial of service attacks
– Hack FreeBSD TCP/IP, network card drivers

Summer’03, Microsoft Research, Redmond, WA
– Audit-enhanced authentication in Kerberos
– NTOS security subsystem, Kerberos, LSA, NTDLL

Summer’04, Microsoft Research, Redmond, WA
– A tracing technique to identify the dependencies of
Windows applications on Administrator privileges
– NTOS security subsystem, access/privilege checking,
application interactions with NTOS
Outline

Analyses
Analyzing and Identifying Security Threats
on Real-World Systems
– Security compromises due to HW/SW memory
corruptions
– A type of memory corruption attack currently
believed to be rare is a realistic threat.
– Deficiencies of current defense techniques

Solutions
New Defense Techniques Towards a Better
Security Protection
– A common characteristic of memory corruption
attacks: pointer taintedness
– A theorem proving based program analysis
– A runtime detection technique
Analyzing and Identifying Security
Threats on Real-World Systems
Threat of Hardware Memory Errors
Due to hardware memory errors, users can log in with arbitrary passwords
[Diagram: Attacker, Network server (FTP and SSH)]
Due to hardware memory errors, packets can penetrate firewalls
[Diagram: Attacker, Firewall (IPChains and Netfilter), Target host]
Emulate random hardware memory errors
A stochastic model to estimate such threats in real environments
Motivate other researchers to conduct physical fault injections
– Java type system subverted due to random hardware memory errors.
Threat of Software Vulnerabilities
[Pie chart: breakdown of vulnerabilities by type]
Buffer Overflow: 44%
Heap Corruption: 8%
Format String: 7%
Integer Overflow: 6%
Globbing: 2%
Other: 33%
CERT Advisories: 66% of vulnerabilities are low-level
memory errors in software.
Widely exploited by attackers, worms and viruses.
State Machine Model: WU-FTP Server Attack
[State machine diagram]
Program states (FTP_service()):
– Authentication; x = user ID
– seteuid(x)
– get an FTP command (repeat)
– SITE_EXEC(fn)
– printf(fn,…)
Attack steps:
– Embed malicious contents in input
– Overwrite a return address
– Execute malicious code: seteuid(0); exec(“/bin/sh”)
State Machine Model: NULL-HTTP Server Attack
[State machine diagram]
Program states (HTTP_service(), repeated per request):
– process HTTP header
– HTTP_POST()
– p=malloc(…)
– recv(p,…)
– free(p)
– *foo()
Attack steps:
– Corrupt heap structure
– Overwrite function pointer foo
– Execute malicious code: seteuid(0); exec(“/bin/sh”)
Control Data Attack: Well-Known, Dominant

Control data:
– data used as targets of call, return and jump.
– widely understood as security critical elements


Control data attack: the most dominant form
of memory corruption attacks [CERT and
Microsoft Security Bulletin]
Many current defense techniques enforce
program control flow integrity to provide
security.
Non-control-data attacks



Currently very rare in reality.
One instance suggested by Young and
McHugh in 1987.
How applicable are such attacks
to real-world software?
– Not studied yet, but important.
An Important Question

Are attackers in general incapable of mounting non-control-data attacks against many real systems?
– PROBABLY NOT!
– Random hardware memory errors can subvert the security of
real-world systems with a non-negligible probability.
– Software vulnerabilities are more deterministic and more
amenable to attacks.
– Each attack exploiting software vulnerabilities is composed of
multiple primitive components, which allows potentially polymorphic
attacks. Dangerous.
Our Claim: General Applicability of
Non-control-data Attacks

We claim:
– Many real-world software applications are
susceptible to non-control-data attacks.
– The severity of the attack consequences is
equivalent to that due to control data attacks.

Validate the claim by constructing non-control-data attacks to get the root privilege on major
network servers
– FTP, HTTP, SSH and Telnet servers
– Over 1/3 of vulnerabilities in CERT advisories

Non-control-data attacks are realistic threats.
Non-control-data attack against WU-FTP
Server (via a format string bug)
int x;
FTP_service(...) {
  authenticate();                          /* x uninitialized, run as EUID 0 */
  x = user ID of the authenticated user;   /* x=109, run as EUID 0 */
  seteuid(x);                              /* x=109, run as EUID 109. Lose the root privilege! */
  while (1) {
    get_FTP_command(...); //vulnerable
    if (a data command?)
      getdatasock(...);
  }
}
getdatasock( ... ) {
  seteuid(0);                              /* x=0, run as EUID 0 */
  setsockopt( ... );
  seteuid(x);                              /* x=0, run as EUID 0 */
}
Attack steps:
– Get a special SITE EXEC command. Exploit a format string vulnerability. x=0, still run as EUID 109.
– Get a data command (e.g., PUT), which calls getdatasock().
– When control returns to the service loop, the server still runs as EUID 0 (root).
– This allows me to upload /etc/passwd, so I can grant myself the root privilege!
Only corrupt an integer, not a control data attack.
Non-control-data attack against
NULL-HTTP Server (via a heap overflow bug)

Attack the configuration string of CGI-BIN path.
Mechanism of CGI
– suppose server name = www.foo.com
CGI-BIN = /usr/local/httpd/exe
– Requested URL = http://www.foo.com/cgi-bin/bar
– The server executes /usr/local/httpd/exe/bar

Our attack
– Exploit the vulnerability to overwrite CGI-BIN to /bin
– Request URL http://www.foo.com/cgi-bin/sh
– The server executes /bin/sh
The server gives me a root shell!
Only overwrite four characters in the CGI-BIN string.
Not a control data attack.
Non-control-data attack against SSH Communications
SSH Server (via an integer overflow bug)
void do_authentication(char *user, ...) {
  int auth = 0;                              /* auth = 0 */
  ...
  while (!auth) {                            /* auth = 0 */
    /* Get a packet from the client */
    type = packet_read();                    /* auth = 1 */
    switch (type) {
      ...
      case SSH_CMSG_AUTH_PASSWORD:
        if (auth_password(user, password))   /* Password incorrect, but auth = 1 */
          auth =1;
      case ...
    }
    if (auth) break;                         /* auth = 1 */
  }
  /* Perform session preparation. */
  do_authenticated(…);                       /* Logged in without correct password */
}
More non-control-data attacks

Against NetKit Telnet server (default Telnet
server of Redhat Linux)
– Exploit a heap overflow bug
– Overwrite two strings:
/bin/login -h foo.com -p (normal scenario)
/bin/sh -h -p -p (attack scenario)
– The server runs /bin/sh when it tries to authenticate
the user.

Against GazTek HTTP server
– Exploit a stack buffer overflow bug
– Send a legitimate URL http://www.foo.com/cgi-bin/bar
– The server checks that “/..” is not embedded in the URL
– Exploit the bug to change the URL to
http://www.foo.com/cgi-bin/../../../../bin/sh
– The server executes /bin/sh
Implications of Non-Control-Data Attacks



Control flow integrity is not a
sufficiently accurate approximation of
software security.
Many types of non-control data are critical
to security.
Once attackers have the incentive, they
are likely to succeed in non-control-data attacks.
Re-Examining Current Defense Techniques





Many of them are based on control flow
integrity
– Monitor system call sequences
– Protect control data
– Non-executable stack and heap
Pointer encryption (PointGuard)
Address space randomization
StackGuard, Libsafe and FormatGuard
Building a generic and secure defense
technique: still an open problem.
Pointer Taintedness Detection:
Towards a Better Security
Protection for Real-World Systems
Pointer Taintedness



Pointer Taintedness: a pointer value,
including a return address, is derived
from user input.
Most memory corruption attacks are due
to pointer taintedness.
Pointer taintedness: a unifying
perspective for reasoning about many
security attacks.
Most Memory Corruption Attacks are Due
to Pointer Taintedness

Format string attack
– Taint an argument pointer of functions such
as printf, sprintf and syslog.

Stack buffer overflow (stack smashing)
– Taint a frame pointer or a return address.

Heap corruption
– Taint the free-chunk doubly-linked list
maintaining the heap structure.

Globbing attack
– User input resides in a location that is used
as a pointer by the parent function of glob().
Internals of Stack Buffer Overflow Attacks
Vulnerable code:
char buf[100];
strcpy(buf,user_input);
[Stack diagram, stack growth from High to Low addresses]
High
Return addr
Frame pointer
buf[99]
…
buf[1]
buf[0]
Low
user_input is copied into buf.
Frame pointer or return address can be tainted.
Internals of Format String Attacks
Vulnerable code:
recv(buf);
printf(buf);   /* should be printf("%s",buf) */
User input in buf: \xdd \xcc \xbb \xaa %d %d %d %n
[Stack diagram, High to Low addresses: …, %n, %d, %d, %d, 0xaabbccdd (contents of buf)]
fmt: format string pointer
ap: argument pointer
In vfprintf(),
if (fmt points to “%n”)
then **ap = (character count)
*ap is a tainted value.
Internals of Heap Corruption Attacks
Vulnerable code:
buf = malloc(1000);
recv(sock,buf,1024);   /* user input */
free(buf);
[Heap layout diagram: Free chunk A, Allocated buffer buf, Free chunk B (fd=A, bk=C), Free chunk C]
In free():
B->fd->bk=B->bk;
B->bk->fd=B->fd;
When B->fd and B->bk are tainted, the effect of free() is to
write a user specified value to a user specified address.
Building Defense Techniques
based on Pointer Taintedness

Static code analysis: analyze the
source code to extract the conditions
under which the possibility of pointer
taintedness exists.
– To uncover potential vulnerabilities

Runtime detection: monitor at runtime
whether a tainted value is
dereferenced as a pointer.
– To defeat memory corruption attacks
Static Analysis about Pointer Taintedness:
To Extract Security Specifications of Library Functions
IFIP International Information Security Conference 2004
Library function specifications are
crucial to secure programming

Library function specifications are specified
empirically
– printf(fmt,…), strcpy(d,s), free(p), glob(p),
strtok(s,del), savestr(p), ….

A unified reason why these specifications are
required
– Required to eliminate pointer taintedness.


Extraction of security specifications of a
function is reduced to a theorem proving task
Formal and complete specifications required
by compiler techniques to check application
source code for security.
Semantics of Pointer Taintedness

Formal definition of program semantics is required for
theorem proving.
– Currently defined using an equational logic framework

Taintedness-aware memory model
– The logic framework defines operations to fetch the content
and test the taintedness (true/false) of each memory
location.

Incorporate pointer taintedness into program
semantics
– Define program semantics at the assembly level to reason
about memory layout.
– Load/Store/ALU instructions: propagate taintedness from
source data to destination data.
– Input functions (scanf, recv and recvfrom)

Axiom: The memory locations in the receiving buffer are tainted
immediately after these function calls.
Extracting Function Specifications
by Theorem Prover
[Flow diagram]
– C source code of a library function
– Automatically translated to formal semantic representation
– Theorem generation: for each pointer dereference in an
assignment, generate a theorem stating that the pointer is not tainted
– Theorem proving
– Result: a set of sufficient conditions that imply the validity of the theorems.
They are the security specifications of the analyzed function.
Example: vfprintf()
int vfprintf (FILE *s, const char *format, va_list ap)
{ char * p, *q; int done,data,n,state;
  char buf[10];
  p=format; done=0; if (p==NULL) return 0; state=NO_PENDING;
  while (*p != 0) {
    if (state==NO_PENDING) {
      if (*p=='%') state=PENDING;
      else outchar(s,*p); }
    else {
      switch (*p) {
        case '%':
          outchar(s,'%');
          break;
        case 'd':
          data=va_arg (ap, int);
          if (data<0) { outchar(s,'-'); data=-data; }
          n=0;
          while (data>0 && n<10) {
            buf[n]=data%10+'0';   /* Theorem1: buf+n should not be a tainted value */
            data/=10;
            n++; }
          while (n>0) { n--; outchar(s,buf[n]); }
          break;
        case 's':
          q=va_arg (ap, char *);
          if (q==NULL) break;
          while (*q!=0) {
            outchar(s,*q);
            q++; }
          break;
        case 'n':
          q= va_arg(ap,void*) ;
          *(int*) q = done;       /* Theorem2: q should not be a tainted value */
          break;
        default:
          outchar(s,*p);
      }
      state=NO_PENDING;
    }
    p++;
  } return done;
}
Extracting the Specifications of vfprintf()
Iterate:
Try to prove the two theorems
The theorem prover cannot complete the proof initially
– the theorems are only valid under certain preconditions.

Add these preconditions as axioms to the theorem
prover.
Repeat until both theorems are proved.

Four preconditions are added: the specifications of
vfprintf (FILE *s, const char *format, va_list ap)
– ap never points to any location within the current function
frame.
– *ap never points to the location of variable ap, i.e., *ap ≠ &ap
– Suppose the memory segment that ap sweeps over is called
ap_activity_range, then *ap never points to any location
within ap_activity_range.
– No locations within ap_activity_range are tainted before
vfprintf() is called.
These preconditions suggest the scenario of format string vulnerability.
Other Studied Examples

Function strcpy()
– Four security specifications indicating buffer overflow, buffer
overlapping and buffer underflow scenarios causing pointer
taintedness.

Function free() of a heap management system
– Seven security specifications are extracted, including several
specifications indicating heap corruption vulnerabilities.

Socket read functions of Apache HTTP Server and
NULL HTTP Server
– Apache function is proven to be free of pointer taintedness.
– Two (known) vulnerabilities are exposed in the theorem
proving process of NULL HTTP Server function.
Runtime Pointer Taintedness Detection:
To Defeat Memory Corruption Attacks
To appear in IEEE Conference on Dependable Systems and Networks, 2005.
The Technique

A processor architectural level mechanism
to detect pointer taintedness
– On SimpleScalar simulator
  Implemented a taintedness-aware memory
  system
  Extended instructions to track taintedness
– To show the validity of pointer taintedness
concept on whole programs of real
applications
  Network servers
  SPEC 2000 integer benchmarks

Evaluations on Real-World Software

Evaluation
– Effectiveness of detection
– No false alarm in any application evaluated
– Transparent to applications
– A small number of potential attack scenarios
undetected.
Pointer taintedness detection can be applied
to the whole program of real software
– offers a substantial improvement on security
protection.
Conclusions
Conclusions

Many real-world software applications can be compromised by
corrupting non-control data.
– It is insufficient to rely on control flow integrity for
software security.


Pointer taintedness is a unifying perspective to
reason about most memory corruption
vulnerabilities/attacks.
Reasoning about pointer taintedness is a promising
direction to enhance security on real-world systems
– A theorem proving based code analysis approach
– A runtime pointer taintedness detection mechanism
Future Directions

Short term goals
– Provide a higher degree of automation for the theorem
proving technique.
– Reduce the intrusiveness of the runtime pointer
taintedness detection technique


Combine with the theorem proving technique. The processor
only checks function preconditions.
Long term goals
– Extract programming styles susceptible to security attacks.
Can compilers detect bad programming styles?
– Identify a broader range of non-traditional security
threats.
– Study historical data about how security vulnerabilities
were discovered, reported and patched.
– Decompose the behaviors of viruses, worms and rootkits
to a number of basic building blocks.
Summary of My Research Methodology

Analysis-centric approach
– A significant amount of effort in my dissertation is
on analysis.
– Start from reality (usually a mess) to define
problems!

I am a data analysis person
– Excited to analyze real data and incidents
– Tedious? Sometimes, but it is a step toward a lot of
fun.
– Rewarding? Definitely. Especially important for
systems research.
– Goal: strongly motivate research topics that solve
real-world problems.