PPT - TSYS School of Computer Science
A Fast Approximate Detector
for the Win32.Simile Malware
Edna Milgo and Yasmine Kandissounon
TSYS Department of Computer Science
Columbus State University
Mid-East Chapter of ACM 2008 Fall Conference
Nov. 20-21, 2008
Gatlinburg, Tennessee, USA.
Terminology
Malware: a program designed to disrupt or damage a system’s normal function
Malware signature: a malware’s unique characteristic or pattern
Malware detection: identification of malware, usually by looking for a signature
Metamorphic malware: malware that alters its own code (and hence its signature) to thwart detection
[Figure: Malware 1, Malware 2 and Malware 3 each have their own signature (Signature 1, Signature 2, Signature 3) in the malware detector]
Signatures are Specific
Many forms of malware exist.
The malware detector extracts patterns (bytes or behavior) that uniquely identify a malware instance.
[Figure: W32.Simile’s engine acting on W32.Simile (M), with labels “Simile Eve” and “W32.Simile Signature”]
Metamorphic Malware /
Metamorphic Engine
[Figure: over time, Win32.Simile (M) passes through Generation 1, Generation 2, …, Generation N, each needing its own signature (Signature 1, Signature 2, …, Signature N) in the malware detector]
Too many signatures challenge the AV scanners’ storage and time!
Metamorphic Engine Transforms the Malware
Targeted Metamorphic Malware
We target W32.Simile:
Very sophisticated
Various transformations:
Code substitution
Code expansion
Code compression
Permutation
Code encryption
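Code Substitution
lea eax,[ecx+3]
is replaced by the equivalent pair (apart from add also updating the flags, which lea leaves untouched):
mov eax,ecx
add eax,3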
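To make the substitution concrete, here is a minimal sketch of one expansion rule and its inverse compression rule on a toy instruction list. The tuple representation, the register set and the helpers `expand`, `compress` and `render` are hypothetical illustrations, not W32.Simile’s engine or the authors’ simulator; the rendered text stands in for the bytes a signature would match.

```python
from typing import List, Tuple

# Hypothetical toy model of one substitution rule (not W32.Simile's actual engine).
Instr = Tuple[str, str, str]  # (mnemonic, destination, source)
REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def expand(code: List[Instr]) -> List[Instr]:
    """Code expansion: rewrite 'lea R,[S+imm]' as 'mov R,S' followed by 'add R,imm'."""
    out: List[Instr] = []
    for op, dst, src in code:
        inner = src[1:-1] if src.startswith("[") and src.endswith("]") else ""
        base, _, imm = inner.partition("+")
        if op == "lea" and dst in REGS and base in REGS and imm.isdigit():
            out.append(("mov", dst, base))
            out.append(("add", dst, imm))
        else:
            out.append((op, dst, src))
    return out


def compress(code: List[Instr]) -> List[Instr]:
    """Code compression: fold 'mov R,S' + 'add R,imm' back into 'lea R,[S+imm]'.

    NOTE: add sets the flags and lea does not; a real engine would have to check
    that no later instruction reads those flags. This toy ignores that.
    """
    out: List[Instr] = []
    i = 0
    while i < len(code):
        if i + 1 < len(code):
            (op1, d1, s1), (op2, d2, s2) = code[i], code[i + 1]
            if ((op1, op2) == ("mov", "add") and d1 == d2 and d1 in REGS
                    and s1 in REGS and s2.isdigit()):
                out.append(("lea", d1, f"[{s1}+{s2}]"))
                i += 2
                continue
        out.append(code[i])
        i += 1
    return out


def render(code: List[Instr]) -> str:
    """Assembly-like text, standing in for the bytes a signature would match."""
    return "; ".join(f"{op} {dst},{src}" for op, dst, src in code)


original = [("lea", "eax", "[ecx+3]"), ("sub", "ebx", "eax")]
variant = expand(original)
print(render(original))  # lea eax,[ecx+3]; sub ebx,eax
print(render(variant))   # mov eax,ecx; add eax,3; sub ebx,eax
```

A fixed pattern such as `lea eax,[ecx+3]` no longer occurs in the expanded variant, while `compress(variant)` restores the original list; an engine that applies such rules at random produces many byte-level forms of the same behavior.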
Current Detection Method
Look for the string: metAPHOR 1b BY tHe MeNTAl drilLER/29A
Current engine and DAT files
But: it is time-consuming to store one signature per variant
It is also expensive to update signature databases online
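The signature approach above can be sketched as a lookup of byte patterns in a file’s contents. This is a minimal illustration, assuming the string shown on the slide can be treated as a plain byte pattern; the table name `SIGNATURES` and the function `scan` are hypothetical, and the real engine and DAT file formats are not described here.

```python
from typing import Dict, List

# Hypothetical signature table: one entry (name -> byte pattern) per known form.
SIGNATURES: Dict[str, bytes] = {
    "W32.Simile marker string": b"metAPHOR 1b BY tHe MeNTAl drilLER/29A",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    # Each pattern is searched separately, so cost grows with the table size.
    return [name for name, pattern in signatures.items() if pattern in data]


sample = b"\x90\x90" + b"metAPHOR 1b BY tHe MeNTAl drilLER/29A" + b"\x00"
print(scan(sample))         # ['W32.Simile marker string']
print(scan(b"\x90" * 64))   # []
```

Every new variant that the pattern misses needs another table entry, which is the storage and update cost the slide points out.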
Proposed Approach: Faster detection
No need to store one signature for each variant
Faster since only disassembly is needed
Instruction Frequency Vector
Definition
Maps opcode mnemonics (instructions) to the frequency
of their occurrence in an assembly language program.
IFVs are easy to compute in linear time!
Opcode:     mov  add  sub  push
Frequency:  3    3    2    1
Example
Program P:
mov add mov add sub push add mov sub
IFV(P):
(mov, 3),(add, 3),(sub, 2),(push, 1)
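The example can be reproduced with a single counting pass over the opcode sequence, which is where the linear-time claim comes from. A minimal sketch, assuming the program is already disassembled into a list of mnemonics; the function name `ifv` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def ifv(opcodes: List[str]) -> Dict[str, int]:
    """Instruction frequency vector: opcode mnemonic -> number of occurrences (one O(n) pass)."""
    return dict(Counter(opcodes))


program_p = "mov add mov add sub push add mov sub".split()
print(ifv(program_p))  # {'mov': 3, 'add': 3, 'sub': 2, 'push': 1}
```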
Experimental Setup
Implemented a W32.Simile simulator
Code expansion, code compression
Generated variants
10 for each of the first 5 generations of W32.Simile
Generated benign programs
50 random benign opcode sequences
Generated the IFVs for each variant and each benign program
Modeled IFV evolution as a Markov chain
The transition matrix of the chain captures the probability
that the IFV of variant A mutates to the IFV of variant B.
(We hence selected 10 IFVs for the matrix)
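The slides do not say how the transition probabilities were obtained. One common way, shown here purely as an assumption, is to count observed parent-to-child mutations between IFV states and normalize each row. The state indexing, the self-loop rule for unobserved states and the names `transition_matrix` and `step` are hypothetical.

```python
from typing import List, Tuple


def transition_matrix(k: int, observed: List[Tuple[int, int]]) -> List[List[float]]:
    """Estimate P[a][b] = Pr(IFV state a mutates to IFV state b) from counts.

    observed holds (parent_state, child_state) pairs with states 0..k-1.
    Assumption: a state with no observed mutation gets a self-loop so that
    every row still sums to 1.
    """
    counts = [[0] * k for _ in range(k)]
    for a, b in observed:
        counts[a][b] += 1
    matrix: List[List[float]] = []
    for a, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            matrix.append([1.0 if b == a else 0.0 for b in range(k)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def step(dist: List[float], p: List[List[float]]) -> List[float]:
    """Advance a probability distribution over IFV states by one mutation."""
    k = len(p)
    return [sum(dist[a] * p[a][b] for a in range(k)) for b in range(k)]


# Toy example with 3 states (the slides use 10 IFVs).
P = transition_matrix(3, [(0, 1), (0, 1), (0, 2), (1, 2)])
print(P[0])                    # row 0 is approximately [0.0, 0.667, 0.333]
print(step([1.0, 0.0, 0.0], P))
```

Repeated calls to `step` give the distribution over IFV states after several generations, which is the kind of question a Markov chain model lets one ask.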
Classification
Calculate the Euclidean distances between IFVs:
$$ d = \sqrt{\sum_{i=0}^{n} \left(\mathrm{IFV}_x[i] - \mathrm{IFV}_y[i]\right)^2} $$
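The distance formula needs both IFVs indexed over the same opcodes. A minimal sketch, assuming IFVs are stored as opcode-to-count maps and that an opcode absent from one program counts as 0 in that program’s vector; `ifv_distance` is a hypothetical name.

```python
import math
from typing import Dict


def ifv_distance(x: Dict[str, int], y: Dict[str, int]) -> float:
    """Euclidean distance between two IFVs over the union of their opcodes (missing = 0)."""
    opcodes = set(x) | set(y)
    return math.sqrt(sum((x.get(op, 0) - y.get(op, 0)) ** 2 for op in opcodes))


p = {"mov": 3, "add": 3, "sub": 2, "push": 1}
q = {"mov": 2, "add": 3, "sub": 2, "push": 1, "lea": 1}
print(ifv_distance(p, q))  # sqrt(1 + 1), about 1.4142
```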
Given a suspect program, set a threshold ε:
pv = percentage of variants within ε of the suspect
pnv = percentage of non-variants within ε of the suspect
If pv > pnv, the suspect is a variant;
otherwise, the suspect is not a variant.
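The decision rule can be sketched directly on top of the IFV distance. This assumes "within ε" means a distance of at most ε, and the names `percent_within` and `is_variant` as well as the toy IFVs are hypothetical.

```python
import math
from typing import Dict, List

IFV = Dict[str, int]


def ifv_distance(x: IFV, y: IFV) -> float:
    """Euclidean distance over the union of opcodes (missing opcode = 0)."""
    opcodes = set(x) | set(y)
    return math.sqrt(sum((x.get(op, 0) - y.get(op, 0)) ** 2 for op in opcodes))


def percent_within(suspect: IFV, group: List[IFV], eps: float) -> float:
    """Percentage of IFVs in group whose distance to suspect is at most eps."""
    hits = sum(1 for g in group if ifv_distance(suspect, g) <= eps)
    return 100.0 * hits / len(group)


def is_variant(suspect: IFV, variants: List[IFV], benign: List[IFV], eps: float) -> bool:
    pv = percent_within(suspect, variants, eps)   # variants within eps
    pnv = percent_within(suspect, benign, eps)    # non-variants within eps
    return pv > pnv


variants = [{"mov": 3, "add": 3}, {"mov": 4, "add": 3}]
benign = [{"push": 40, "call": 25}]
print(is_variant({"mov": 3, "add": 4}, variants, benign, eps=5.0))     # True
print(is_variant({"push": 39, "call": 25}, variants, benign, eps=5.0))  # False
```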
Evaluation/Results
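- Accuracy: Markov chain theory
- Perfect classifier
- Smallest/largest recorded distances (v = variant, b = benign):
lrd(v,v) = 109.27
srd(v,v) = 10.90
lrd(v,b) = 5370.79
srd(v,b) = 4343.57
- A good ε is any threshold such that
109.27 < ε < 4343.57
(every variant then lies within ε of every other variant, and no benign program lies within ε of any variant)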
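The admissible range for ε follows from two numbers: the largest variant-to-variant distance and the smallest variant-to-benign distance. A minimal sketch using the recorded extremes from the slide; the function name `epsilon_interval` is hypothetical, and taking the midpoint is only one possible choice inside the interval.

```python
from typing import List, Tuple


def epsilon_interval(intra: List[float], inter: List[float]) -> Tuple[float, float]:
    """Open interval (lo, hi) of thresholds separating variants from benign programs.

    lo = largest variant-variant distance, hi = smallest variant-benign distance.
    """
    lo, hi = max(intra), min(inter)
    if lo >= hi:
        raise ValueError("no threshold separates variants from benign programs")
    return lo, hi


# Recorded extremes from the results: srd(v,v), lrd(v,v) and srd(v,b), lrd(v,b).
lo, hi = epsilon_interval([10.90, 109.27], [4343.57, 5370.79])
eps = (lo + hi) / 2  # midpoint: an assumption, any value strictly inside (lo, hi) works
print(lo, hi, eps)
```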
Conclusion and Future Work
Impact
IFV overcomes code permutation
Faster since it simulates only the opcodes
No need for variant signature
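The claim that IFVs overcome code permutation follows from the fact that counting ignores order. A minimal check, assuming permutation only reorders the existing instructions; any instructions a real engine adds to preserve control flow would still change the counts, which this toy does not model.

```python
import random
from collections import Counter

program = "mov add mov add sub push add mov sub".split()

# Toy "permutation": reorder the same instructions (seeded so the run is repeatable).
permuted = program[:]
random.Random(2008).shuffle(permuted)

# Counting ignores order, so both orderings yield the same IFV.
print(Counter(program) == Counter(permuted))  # True
```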
Limitations / Future Work
It is just a filter/quick check; more may be
needed. Program analysis to the rescue.
Detection is currently limited to only 10 samples;
more are needed.
Mine Markov chain theory for more.