METAMORPHIC SOFTWARE FOR GOOD AND EVIL Wing Wong Mark Stamp


METAMORPHIC SOFTWARE FOR GOOD AND EVIL

Wing Wong & Mark Stamp November 20, 2006

Outline

I. Metamorphic software
   - What is it?
   - Good and evil uses
II. Metamorphic virus construction kits
III. How effective are metamorphic engines?
   - How to compare two pieces of code?
   - Similarity of viruses/normal code
IV. Can we detect metamorphic viruses?
   - Commercial virus scanners
   - HMMs and similarity index
V. Conclusion

PART I

Metamorphic Software

What is Metamorphic Software?

- Software is metamorphic provided:
  - All copies do the same thing
  - Internal structure differs
- Today, almost all software is cloned
- "Good" metamorphic software: mitigate buffer overflow attacks
- "Bad" metamorphic software: avoid virus/worm signature detection

Metamorphic Software for Good?

- Suppose a program has a buffer overflow
- If we clone the program:
  - One attack breaks every copy
  - Break once, break everywhere (BOBE)
- If instead we have metamorphic copies:
  - Each copy still has a buffer overflow
  - But one attack does not work against every copy
  - BOBE-resistant
- Analogous to genetic diversity in biology
- A little metamorphism does a lot of good!

Metamorphic Software for Evil?

- A cloned virus/worm can be detected:
  - Common signature on every copy
  - Detect once, detect everywhere (DODE?)
- If instead the virus/worm is metamorphic:
  - Each copy has a different signature
  - The same detection may not work against every copy
  - Provides DODE-resistance?
- Analogous to genetic diversity in biology
- Effective use of metamorphism here is tricky!

Crypto Analogy

- Consider WWII ciphers
- German Enigma:
  - Broken by Polish and British cryptanalysts
  - Design was (mostly) known to cryptanalysts
- Japanese Purple:
  - Broken by American cryptanalysts
  - Design was (mostly) unknown to cryptanalysts

Crypto Analogy

- Cryptanalysis: break a (known) cipher
- Diagnosis: determine how an unknown cipher works (from ciphertext alone)
- Which was the greater achievement, breaking Enigma or Purple?
  - Cryptanalysis of Enigma was harder
  - Diagnosis of Purple was harder
  - A reasonable case can be made for either...

Crypto Analogy

- What does this have to do with metamorphic software?
- Suppose the good guys generate metamorphic copies of software
- Bad guys can attack individual copies
- Can bad guys attack all copies?
  - If they can diagnose our metamorphic generator, maybe
  - But that's a diagnosis problem...

Crypto Analogy

- What about the case where the bad guys write metamorphic code?
  - Metamorphic viruses, for example
- Do the good guys need to solve the diagnosis problem?
  - If so, the good guys are in trouble
  - Not if the good guys "only" need to detect the metamorphic code (not diagnose it)
- Not claiming the good guys' job is easy
- Just claiming that there is hope...

Virus Evolution

- Viruses first appeared in the 1980s (Fred Cohen)
- Viruses must avoid signature detection
  - A virus can alter its "appearance"
- Techniques employed:
  - Encryption
  - Polymorphism
  - Metamorphism

Virus Evolution – Encryption

- Virus consists of:
  - A decrypting module (decryptor)
  - An encrypted virus body
- Different encryption key → different virus body signature
- Weakness: the decryptor can be detected (see the toy sketch below)
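To make the "different key → different signature" point concrete, here is a toy Python sketch (invented for illustration; it XORs a placeholder byte string and has nothing to do with any real virus): the body bytes differ for each key, while the decoding routine itself stays constant, which is exactly the weakness noted above.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR 'encryption' of a placeholder body."""
    return bytes(b ^ key for b in data)

body = b"placeholder program body"   # stands in for the invariant payload

copy_1 = xor_bytes(body, 0x41)
copy_2 = xor_bytes(body, 0x7F)

print(copy_1.hex())   # different byte patterns under different keys ...
print(copy_2.hex())   # ... so a signature taken from the body changes per copy
print(xor_bytes(copy_1, 0x41) == body)   # the same routine recovers the body

# The decoding routine (xor_bytes) is identical in every copy; that constant
# "decryptor" is what signature scanners can still detect.
```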

Virus Evolution – Polymorphism

- Try to hide the signature of the decryptor
- A code emulator can be used to decrypt a putative virus dynamically
- Decrypted virus body is constant
  - Once (partially) decrypted, signature detection is possible

Virus Evolution – Metamorphism

- Change the virus body itself
- Mutation techniques (toy illustration below):
  - Permutation of subroutines
  - Insertion of garbage/jump instructions
  - Substitution of instructions
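As a purely illustrative sketch of why such mutations defeat fixed signatures (the instruction strings, substitution table, and probabilities below are invented, not taken from any real engine), consider a toy Python transformer that applies equivalent-instruction substitution and garbage insertion to a short opcode listing:

```python
import random

# Toy "equivalent instruction" table (illustrative only).
EQUIVALENTS = {
    "add eax, 1": "inc eax",        # add 1  -> increment
    "mov ebx, 0": "xor ebx, ebx",   # load 0 -> xor with itself
    "sub esp, 4": "add esp, -4",    # subtract -> add negative
}

GARBAGE = ["nop", "xchg eax, eax"]  # do-nothing filler instructions

def mutate(code, seed=None):
    """Return a functionally equivalent but differently shaped listing."""
    rng = random.Random(seed)
    out = []
    for insn in code:
        # Substitute an equivalent instruction where one is known.
        out.append(EQUIVALENTS.get(insn, insn))
        # Randomly insert garbage between real instructions.
        if rng.random() < 0.5:
            out.append(rng.choice(GARBAGE))
    return out

original = ["mov ebx, 0", "add eax, 1", "sub esp, 4", "ret"]
print(mutate(original, seed=1))
print(mutate(original, seed=2))
# The two variants do the same thing, yet a fixed byte/opcode signature
# extracted from one will generally not match the other.
```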

PART II

Virus Construction Kits

Virus Construction Kits – PS-MPC

According to Peter Szor: "... PS-MPC [Phalcon/Skism Mass-Produced Code generator] uses a generator that effectively works as a code-morphing engine ... the viruses that PS-MPC generates are not [only] polymorphic, but their decryption routines and structures change in variants ..."

Virus Construction Kits – G2

From the documentation of G2 (Second Generation virus generator): "... different viruses may be generated from identical configuration files ..."

Virus Construction Kits – NGVCK

From the documentation for NGVCK (Next Generation Virus Creation Kit): "... all created viruses are completely different in structure and opcode ... variability of the entire code [is] nearly 100% ... impossible to catch all variants with one or more scanstrings ..."

Oh, really?

PART III

How Effective Are Metamorphic Engines?

How We Compare Two Pieces of Code

- Disassemble both programs (Program X and Program Y) and extract their opcode sequences
- Mark a match at position (i, j) whenever three consecutive opcodes of X agree with three consecutive opcodes of Y
- Plot the matches on a grid (opcodes of X, indices 0 to m-1, versus opcodes of Y, indices 0 to n-1) and keep only matched line segments of length greater than 5
- Score = average percentage of opcodes in each program that lie on a retained match (see the sketch below)

[Diagram: opcode sequences of Program X (call, pop, mov, sub, ..., jmp) and Program Y (push, mov, sub, and, ..., retn), the graph of three-opcode matches, and the graph of retained matches (segments longer than 5) from which the score is computed]
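A minimal Python sketch of this scoring procedure, based on the description above: mark a match wherever three consecutive opcodes agree, keep only diagonal runs longer than five, and average the covered fraction of each program. Exact run accounting and tie-breaking are assumptions here; the authors' precise definition is in the Wong and Stamp paper.

```python
def opcode_similarity(x, y, window=3, min_run=5):
    """Approximate similarity score between two opcode sequences x and y."""
    m, n = len(x), len(y)

    # 1. Mark every (i, j) where `window` consecutive opcodes agree.
    match = [[False] * n for _ in range(m)]
    for i in range(m - window + 1):
        for j in range(n - window + 1):
            if x[i:i + window] == y[j:j + window]:
                for k in range(window):
                    match[i + k][j + k] = True

    # 2. Keep only diagonal runs longer than `min_run`, recording which
    #    opcodes of each program they cover.
    covered_x, covered_y = set(), set()
    for d in range(-(m - 1), n):          # each diagonal j - i = d
        run = []
        i = max(0, -d)
        while i < m and i + d < n:
            if match[i][i + d]:
                run.append(i)
            else:
                if len(run) > min_run:
                    covered_x.update(run)
                    covered_y.update(r + d for r in run)
                run = []
            i += 1
        if len(run) > min_run:
            covered_x.update(run)
            covered_y.update(r + d for r in run)

    # 3. Score = average fraction of opcodes covered in each program.
    return (len(covered_x) / m + len(covered_y) / n) / 2

# Hypothetical opcode sequences for illustration (min_run lowered for toy data).
prog_x = ["push", "mov", "sub", "and", "call", "pop", "mov", "sub", "jmp", "retn"]
prog_y = ["call", "pop", "mov", "sub", "jmp", "retn", "push", "mov", "add", "xor"]
print(opcode_similarity(prog_x, prog_y, min_run=2))
```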

Virus Families – Test Data

- Four generators, 45 viruses:
  - 20 viruses by NGVCK
  - 10 viruses by G2
  - 10 viruses by VCL32
  - 5 viruses by MPCGEN
- 20 normal utility programs from the Cygwin bin directory

Similarity within Virus Families – Results

[Scatter plot: similarity scores (0.0 to 1.0) versus comparison number (0 to 200) for NGVCK viruses and normal files]

Similarity within Virus Families – Results

Minimum, maximum, and average similarity scores:

Generator   Minimum   Maximum   Average
NGVCK       0.01493   0.21018   0.10087
G2          0.62845   0.84864   0.74491
VCL32       0.34376   0.92907   0.60631
MPCGEN      0.44964   0.96568   0.62704
Normal      0.13603   0.93395   0.34689

Similarity within Virus Families – Results

[Bubble chart: minimum versus maximum similarity score for NGVCK, G2, VCL32, MPCGEN, and normal files; size of bubble = average similarity]

Similarity within Virus Families – Results

[Match graphs for selected pairs: IDA_NGVCK0 vs. IDA_NGVCK8 (11.9%), IDA_G4 vs. IDA_G7 (75.2%), IDA_VCL0 vs. IDA_VCL9 (60.2%), IDA_MPC1 vs. IDA_MPC3 (58.0%)]

NGVCK Similarity to Virus Families

- NGVCK versus other viruses:
  - 0% similar to G2 and MPCGEN viruses
  - 0 – 5.5% similar to VCL32 viruses (43 out of 100 comparisons have score > 0)
  - 0 – 1.2% similar to normal files (only 8 out of 400 comparisons have score > 0)

NGVCK Metamorphism/Similarity

- NGVCK:
  - By far the highest degree of metamorphism of any kit tested
  - Virtually no similarity to other viruses or normal programs
  - Undetectable???

PART IV

Can Metamorphic Viruses Be Detected?

Commercial Virus Scanners

- Tested three virus scanners:
  - eTrust version 7.0.405
  - avast! antivirus version 4.7
  - AVG Anti-Virus version 7.1
- Each scanned 37 files:
  - 10 NGVCK viruses
  - 10 G2 viruses
  - 10 VCL32 viruses
  - 7 MPCGEN viruses

Commercial Virus Scanners – Results

- eTrust and avast! detected 17 viruses (G2 and MPCGEN)
- AVG detected 27 viruses (G2, MPCGEN, and VCL32)
- None of the NGVCK viruses were detected by any scanner tested

Virus Detection with HMMs

- Use hidden Markov models (HMMs) to represent statistical properties of a set of metamorphic virus variants
- Train the model on a family of metamorphic viruses
- Use the trained model to determine whether a given program is similar to the viruses the HMM represents

Virus Detection with HMMs – Data

- Data set: 200 NGVCK viruses (160 for training, 40 for testing)
- Comparison set:
  - 40 normal executables from Cygwin
  - 25 other "non-family" viruses (G2, MPCGEN, and VCL32)
- 25 HMM models generated and tested

Virus Detection with HMMs – Methodology

Training:
(1) Training set: 160 NGVCK virus files
(2) Train an HMM on the training set
(3) Score every file in the data set and the comparison set with the trained HMM; each file receives a log likelihood per opcode (LLPO) score (e.g., virus0: -2.0, virus1: -2.3, ..., random0: -11.3, ..., other0: -8.9)
(4) Choose a threshold that separates family scores from comparison scores

Data set: test set of 40 family viruses
Comparison set: 40 normal programs and 25 other viruses

Classifying:
Score Program A against the trained HMM; if its LLPO exceeds the threshold, classify it as a family virus (a sketch of this step follows below)
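A Python sketch of the scoring/classification step, assuming a trained model λ = (A, B, π) is already in hand (the training itself, via Baum-Welch, is not shown): the scaled forward algorithm computes log P(O | λ), and dividing by the sequence length gives the LLPO that is compared against the threshold. The matrices, opcode encoding, and threshold below are placeholders, not the authors' trained values.

```python
import numpy as np

def log_likelihood_per_opcode(obs, A, B, pi):
    """Scaled forward algorithm: return log P(O | lambda) / T (the LLPO)."""
    T = len(obs)
    log_prob = 0.0

    # Initialization.
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha /= c
    log_prob += np.log(c)

    # Induction, with per-step scaling to avoid numerical underflow.
    for t in range(1, T):
        alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()
        alpha /= c
        log_prob += np.log(c)

    return log_prob / T

# Placeholder 2-state model over a toy 4-symbol opcode alphabet.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])                 # state transition probabilities
B  = np.array([[0.5, 0.2, 0.2, 0.1],
               [0.1, 0.1, 0.3, 0.5]])       # observation probabilities per state
pi = np.array([0.6, 0.4])                   # initial state distribution

opcode_ids = {"mov": 0, "push": 1, "pop": 2, "call": 3}   # hypothetical encoding
program    = ["mov", "push", "call", "pop", "mov"]
obs = [opcode_ids[op] for op in program]

threshold = -2.5                            # would be set from the comparison set
llpo = log_likelihood_per_opcode(obs, A, B, pi)
print(llpo, "family virus" if llpo > threshold else "not a family virus")
```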

Virus Detection with HMMs – Results

[Scatter plot, test set 0, N = 2: LLPO scores (roughly 0 to -160) by file number for family viruses and normal files]

Virus Detection with HMMs – Results

- Detect some other viruses "for free"

[Scatter plot, test set 0, N = 3: LLPO scores by file number for family viruses, non-family viruses, and normal files]

Virus Detection with HMMs

- Summary of experimental results:
  - All normal programs were distinguished from family viruses
  - VCL32 viruses had scores close to the NGVCK family viruses
  - With a proper threshold, 17 HMM models had a 100% detection rate and 10 models had a 0% false positive rate
  - No significant difference in performance between HMMs with 3 or more hidden states

Virus Detection with HMMs – Trained Models

- Converged probabilities in the HMM matrices may give insight into the features of the represented viruses
- We observe:
  - Opcodes grouped into "hidden" states
  - Most opcodes appear in one state only
- What does this mean? We are not sure...

Detection via Similarity Index

- A straightforward similarity index can be used as a detector
- To determine whether a program belongs to the NGVCK virus family, compare it to any randomly chosen NGVCK virus
- NGVCK similarity to non-NGVCK code is small
- We can use this fact to detect metamorphic NGVCK variants

Detection via Similarity Index

Threshold determination:
- Randomly choose a virus V from the data set D
- Pairwise compare V to a randomly chosen subset of D (Virus 0, Virus 1, ..., Virus X)
- The resulting similarity scores (e.g., Virus 0: 0.035, Virus 1: 0.041, ..., Virus X: 0.189) determine the threshold

Classifying:
- Compute the similarity score of Program A against virus V
- Score > threshold => family virus; otherwise => not a family virus
(A sketch of this procedure follows below.)
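A sketch of this detector, with hypothetical numbers: it reuses an opcode-level similarity function such as the opcode_similarity sketch from Part III, and the threshold rule shown is just one reasonable choice, not necessarily the authors'.

```python
def choose_threshold(family_scores, margin=0.5):
    """Place the threshold a safety margin below the smallest family score."""
    return margin * min(family_scores)

def classify(program_opcodes, virus_v_opcodes, threshold, score_fn):
    """Family virus if the program's similarity to virus V exceeds the threshold."""
    score = score_fn(program_opcodes, virus_v_opcodes)
    return "family virus" if score > threshold else "not a family virus"

# Hypothetical similarity scores of a randomly chosen virus V against a
# random subset of the NGVCK family (cf. the 0.035, 0.041, ..., 0.189 above).
family_scores = [0.035, 0.041, 0.189]
threshold = choose_threshold(family_scores)   # e.g. 0.0175
# classify(program, virus_v, threshold, score_fn=opcode_similarity)
```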

Detection via Similarity Index

- Experiment: compare 105 programs to one selected NGVCK virus
- Results:
  - 100% detection, 0% false positives
  - The results do not depend on the specific NGVCK virus selected

PART V

Conclusion

Conclusion

- Metamorphic generators vary a lot:
  - NGVCK has the highest metamorphism (10% similarity on average)
  - Other generators are far less effective (about 60% similarity on average)
  - Normal files are about 35% similar, on average
- But NGVCK viruses can be detected!
  - NGVCK viruses are too different from other viruses and normal programs

Conclusion

- NGVCK viruses were not detected by the commercial scanners we tested
- A hidden Markov model (HMM) detects NGVCK (and other) viruses with high accuracy
- NGVCK viruses are also detectable via the similarity index

Conclusion

- All metamorphic viruses tested were detectable because of:
  - High similarity within the family, and/or
  - Being too different from normal programs
- Effective use of metamorphism by a virus/worm requires:
  - A high degree of metamorphism, and
  - Similarity to other (normal) programs
- This is not trivial!

The Bottom Line

- Metamorphism for "good":
  - Buffer overflow mitigation, BOBE resistance
  - A little metamorphism does a lot of good
- Metamorphism for "evil":
  - For example, trying to evade virus/worm signature detection
  - Requires a high degree of metamorphism and similarity to normal programs
  - Not impossible, but not easy...

The Bottom Bottom Line

- All too often in security, the advantage lies with the bad guys
- For metamorphic software, perhaps the inherent advantage lies with the good guys


References

- X. Gao, Metamorphic Software for Buffer Overflow Mitigation, MS thesis, Department of Computer Science, San Jose State University, 2005
- P. Szor, The Art of Computer Virus Research and Defense, Addison-Wesley, 2005
- M. Stamp, Information Security: Principles and Practice, Wiley InterScience, 2005
- M. Stamp, Applied Cryptanalysis: Breaking Ciphers in the Real World, Wiley, 2007
- W. Wong, Analysis and Detection of Metamorphic Computer Viruses, MS thesis, Department of Computer Science, San Jose State University, 2006
- W. Wong and M. Stamp, Hunting for Metamorphic Engines, Journal in Computer Virology, Vol. 2, No. 3, 2006, pp. 211–229

Appendix

Bonus Material

Hidden Markov Models (HMMs)

- State machines in which transitions between states have fixed probabilities
- Each state has a probability distribution for observing a set of observation symbols
- The states correspond to features of the input data; the transition and observation probabilities capture the statistical properties of those features
- An HMM can be "trained" to represent a set of data (given in the form of observation sequences)

HMM Example – the Occasionally Dishonest Casino

- Fair die: each face 1–6 has probability 1/6; stay with the fair die with probability 0.95, switch to the loaded die with probability 0.05
- Loaded die: faces 1–5 each have probability 1/10, face 6 has probability 1/2; stay with the loaded die with probability 0.9, switch to the fair die with probability 0.1

HMM Example – the Occasionally Dishonest Casino

- 2 states: fair and loaded
- The switch between dice is a Markov process
- The outcomes of a roll have different probabilities in each state
- If we can only see a sequence of rolls, the state sequence is hidden
- We want to understand the underlying Markov process from the observations (see the sketch below)
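The casino model above can be written down directly. A small Python sketch that encodes these probabilities and simulates rolls, to emphasize that an observer sees only the rolls, never the fair/loaded state (the 50/50 initial state distribution is an added assumption, since the slide does not give one):

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["Fair", "Loaded"]
A = np.array([[0.95, 0.05],     # Fair -> Fair, Fair -> Loaded
              [0.10, 0.90]])    # Loaded -> Fair, Loaded -> Loaded
B = np.array([[1/6] * 6,                          # fair die: uniform faces
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])    # loaded die favors 6
pi = np.array([0.5, 0.5])       # assumed: either die equally likely at the start

def roll_sequence(T):
    """Simulate T rolls; return the hidden state sequence and the visible rolls."""
    s = rng.choice(2, p=pi)
    hidden, rolls = [], []
    for _ in range(T):
        hidden.append(states[s])
        rolls.append(int(rng.choice(6, p=B[s])) + 1)   # die face 1..6
        s = rng.choice(2, p=A[s])
    return hidden, rolls

hidden, rolls = roll_sequence(20)
print(rolls)    # what the observer sees
print(hidden)   # the hidden state sequence behind those rolls
```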

HMMs – the Three Problems

1. Find the likelihood of an observation sequence O given a model λ, i.e., P(O | λ)
2. Find an optimal state sequence that could have generated a given sequence O
3. Find the model parameters λ that best fit a given sequence O

Efficient algorithms exist to solve all three problems (stated formally below).
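Stated a little more formally, in the standard notation λ = (A, B, π) for the model and O for the observation sequence (this formalization and the algorithm names follow the usual HMM literature rather than the slide itself):

```latex
\begin{align*}
\text{Problem 1 (scoring):}  \quad & \text{given } O, \lambda, \text{ compute } P(O \mid \lambda)
  && \text{forward algorithm} \\
\text{Problem 2 (decoding):} \quad & \text{given } O, \lambda, \text{ find } \hat{Q} = \arg\max_{Q} P(Q \mid O, \lambda)
  && \text{Viterbi algorithm} \\
\text{Problem 3 (training):} \quad & \text{given } O, \text{ find } \hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)
  && \text{Baum--Welch (a local maximum)}
\end{align*}
```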

HMM Application – Determining the Properties of English Text

- Given: a large quantity of written English text
- Input: a long observation sequence over 27 symbols (the 26 lower-case letters and the word space)
- Train a model to find the most probable parameters, i.e., solve Problem 3 (one way to reproduce this is sketched below)
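One possible way to reproduce this experiment, sketched with the third-party hmmlearn library (not the implementation used in the original work, and it requires a recent hmmlearn release with CategoricalHMM): encode the text as integers 0–26, fit a 2-state model, and inspect the converged emission probabilities.

```python
import numpy as np
from hmmlearn import hmm   # third-party library, assumed available

# Toy corpus; in the experiment this would be a long run of real English text.
text = ("the quick brown fox jumps over the lazy dog " * 200).lower()
alphabet = "abcdefghijklmnopqrstuvwxyz "          # 26 letters + word space
symbols = [alphabet.index(c) for c in text if c in alphabet]
X = np.array(symbols).reshape(-1, 1)              # shape (T, 1) of symbol ids

model = hmm.CategoricalHMM(n_components=2, n_iter=200, random_state=0)
model.fit(X)                                      # Baum-Welch (Problem 3)

# Columns of emissionprob_ show which state each symbol "belongs" to;
# with real English text the two states separate into consonants and vowels.
for i, ch in enumerate(alphabet):
    probs = model.emissionprob_[:, i]
    print(repr(ch), probs.round(3), "-> state", int(np.argmax(probs)))
```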

HMM Application – Initial and Final Observation Probability Distributions

HMM Application - Results

- The observation probabilities converged; each letter belongs to one of the two hidden states
- The two states correspond to consonants and vowels
- The trained model can score any unknown sequence of letters to determine whether it corresponds to English text (i.e., Problem 1)
- Note:
  - No a priori assumption about the language was made
  - The HMM effectively recovered a statistically significant feature inherent in English

HMM Application - Results

- The probabilities can be sensibly interpreted for up to n = 12 hidden states
- The trained model could be used to detect English text, even if the text is "disguised" by, say, a simple substitution cipher or similar transformation

HMMs – The Trained Models

[Bar chart: converged observation probabilities (0.00 to 0.40) by opcode (pop, retn, push, jb, rcl, jnb, ja, div, adc, ror, shr, rol, add, sar, bound, cmpsb, retf, mov, xor, dec, not, imul, movsb, stosd, lodsw, lodsd, lodsb, in, repe, movsd, fnstenv, cmc, jns, jle, clc, rcr, fild, out) for states 0, 1, and 2]

HMMs – Run Time of Training Process

- 5 to 38 minutes, depending on the number of states N

[Chart: training time versus number of states N, for 500 and 800 iterations]

HMMs – Run Time of Classifying Process

- 0.008 to 0.4 milliseconds, depending on N and the number of opcodes T

[Chart: classification time versus length of observation sequence T (0 to 1500), for N = 2 through 6]

AVG Anti-Virus Scanning Result