PPT - TSYS School of Computer Science

A Fast Approximate Detector
for the Win32.Simile Malware
Edna Milgo and Yasmine Kandissounon
TSYS Department of Computer Science
Columbus State University
Mid-East Chapter of ACM 2008 Fall conference
Nov. 20-21, 2008
Gatlinburg, Tennessee, USA.
Terminology
 Malware
Program designed to disrupt or damage a system’s normal function
 Malware signature
A malware’s unique characteristic or pattern
 Malware detection
Identification of malware, usually by looking for a signature
 Metamorphic malware
Malware that alters its own code (and hence its signature) to thwart detection
Signatures are Specific
 Many forms of malware exist
 Malware detector extracts patterns (bytes or behavior) that uniquely identify a malware instance
[Figure: malware instances (Malware 1, Malware 2, Malware 3) and the malware detector's signatures (Signature 1, Signature 2, Signature 3)]
Metamorphic Engine Transforms the Malware
[Figure: W32.Simile's metamorphic engine (M) applied to W32.Simile ("Simile Eve") and its signature, producing Generation 1, Generation 2, ..., Generation N]
 Too many signatures (Signature 1, Signature 2, ..., Signature N)
 Challenges the AV scanners: storage and time
Targeted Metamorphic Malware
We Target W32.Simile
 Very sophisticated
 Various transformations
 Code substitution (expansion, compression)
 Permutation
 Code encryption
Code Substitution
lea eax,[ecx+3]
mov eax,ecx
add eax,3
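In the substitution above, the single `lea` and the `mov`/`add` pair leave the same value in `eax` (the `add` form also updates the flags). Rewriting the short form into the long one is expansion, and the reverse is compression. The following is a minimal sketch of such a rewriting pass, assuming instructions are plain strings and using a one-entry rule table (`EXPANSIONS`, a hypothetical name). It illustrates the idea only and is not W32.Simile's actual engine.

```python
# Hypothetical rule table: one instruction -> an equivalent longer sequence.
EXPANSIONS = {
    "lea eax,[ecx+3]": ["mov eax,ecx", "add eax,3"],
}

def expand(program):
    """Replace each instruction that has a rule with its longer equivalent."""
    out = []
    for instr in program:
        out.extend(EXPANSIONS.get(instr, [instr]))
    return out

def compress(program):
    """Replace each known longer sequence with its single-instruction form."""
    out = []
    i = 0
    while i < len(program):
        for short, long_form in EXPANSIONS.items():
            n = len(long_form)
            if program[i:i + n] == long_form:
                out.append(short)
                i += n
                break
        else:
            # No rule matched at position i: copy the instruction unchanged.
            out.append(program[i])
            i += 1
    return out

original = ["push ebp", "lea eax,[ecx+3]", "ret"]
expanded = expand(original)
print(expanded)
print(compress(expanded) == original)
```

Each pass changes the byte pattern of the program while keeping its computed values, which is why a fixed byte signature stops matching from one generation to the next.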
Current Detection Method
 Look for the string "metAPHOR 1b BY tHe MeNTAl drilLER/29A"
 Current engine and DAT files
 But storing one signature per variant is time-consuming
 Updating signature databases online is also expensive
Proposed Approach: Faster detection
 No need to store one signature for each variant
 Faster since only disassembly is needed
Instruction Frequency Vector
Definition
Maps opcode mnemonics (instructions) to the frequency of their occurrence in an assembly language program.
 IFVs are easy to compute in linear time!
mov  add  sub  push
3    3    2    1
Example
Program P:
mov add mov add sub push add mov sub
IFV(P):
(mov, 3),(add, 3),(sub, 2),(push, 1)
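The example can be reproduced with a single counting pass over the opcode sequence, which is what makes the IFV linear-time to compute. This is a minimal sketch, assuming the program has already been disassembled into a list of mnemonics; the function name `instruction_frequency_vector` is illustrative and not from the original work.

```python
from collections import Counter

def instruction_frequency_vector(opcodes):
    """Count how often each opcode mnemonic occurs (one pass over the input)."""
    return Counter(opcodes)

# Program P from the example above, as a list of mnemonics.
program_p = "mov add mov add sub push add mov sub".split()
ifv_p = instruction_frequency_vector(program_p)
print(ifv_p.most_common())
# [('mov', 3), ('add', 3), ('sub', 2), ('push', 1)]
```

Because only counts are kept, reordering the instructions of P leaves IFV(P) unchanged.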
Experimental Set Up
 Implemented a W32.Simile simulator
 Code expansion, code compression
 Generated variants
 10 for each of the first 5 generations of W32.Simile
 Generated benign samples
 50 random benign opcode sequences
 Generated the IFVs for each variant / benign sample
 Modeled IFV evolution as a Markov chain
 The transition matrix of the chain captures the probability that the IFV of variant A mutates to the IFV of variant B.
(We hence selected 10 IFVs for the matrix)
Classification
 Calculate the Euclidean distances between IFVs:
$d = \sqrt{\sum_{i=0}^{n} (IFV_{x,i} - IFV_{y,i})^2}$
where $IFV_{x,i}$ is the frequency of the i-th opcode in program x.
 Given a suspect program, set a threshold ε
 pv = percentage of variants within ε of suspect
 pnv = percentage of non-variants within ε of suspect
 If pv > pnv then suspect is a variant
else suspect is not a variant
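The decision rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: IFVs are represented as opcode-to-count dictionaries, "within ε" is taken to mean distance ≤ ε, fractions are used instead of percentages (the comparison is the same), and the sample data and function names are made up for the example.

```python
import math

def euclidean_distance(ifv_x, ifv_y):
    """Euclidean distance between two IFVs given as opcode -> count dicts."""
    opcodes = set(ifv_x) | set(ifv_y)
    return math.sqrt(
        sum((ifv_x.get(op, 0) - ifv_y.get(op, 0)) ** 2 for op in opcodes)
    )

def fraction_within(suspect, samples, eps):
    """Fraction of samples whose distance to the suspect is at most eps."""
    if not samples:
        return 0.0
    hits = sum(euclidean_distance(suspect, s) <= eps for s in samples)
    return hits / len(samples)

def is_variant(suspect, variants, non_variants, eps):
    """Apply the rule: the suspect is a variant if pv > pnv."""
    pv = fraction_within(suspect, variants, eps)
    pnv = fraction_within(suspect, non_variants, eps)
    return pv > pnv

# Made-up toy data, for illustration only.
variants = [
    {"mov": 3, "add": 3, "sub": 2, "push": 1},
    {"mov": 4, "add": 3, "sub": 2, "push": 1},
]
benign = [
    {"mov": 40, "call": 25, "jmp": 10},
    {"mov": 1, "nop": 50},
]
suspect = {"mov": 3, "add": 4, "sub": 2, "push": 1}
print(is_variant(suspect, variants, benign, eps=5.0))
```

Opcodes missing from one vector are treated as having frequency 0, so vectors built from different programs can still be compared.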
Evaluation/Results
- Accuracy: Markov chain theory
- Perfect classifier
- Smallest/largest recorded distances (srd/lrd) between variants (v) and benign samples (b):
lrd(v,v) = 109.27
srd(v,v) = 10.90
lrd(v,b) = 5370.79
srd(v,b) = 4343.57
- A good ε is any threshold such that 109.27 < ε < 4343.57
Conclusion and Future Work
Impact
 IFV overcomes code permutation
 Faster since it simulates only the opcodes
 No need for per-variant signatures
Limitations / Future Work
 Only a filter/quick check; more may be needed. Program analysis to the rescue.
 Detection is currently limited to only 10 samples; more are needed.
 Mine Markov chain theory for more.