Malware Variant Generation


‘Supervised Automation’ for Malware
Variant Generation: Theoretical and
Practical Implications
July 7, 2015
Rachit Mathur
Research Scientist
McAfee
18th EICAR Annual Conference
9th – 12th May, 2009
Berlin, Germany
Agenda
Introduction & Malware Growth
Supervised-Automation
Compare With Metamorphism
Real-World Examples
Detection Challenges
Conclusions & Future work
Questions
Malware Growth – All known samples
+180%
Malware Growth – Families vs Variants
[Chart: Rogue AV Unique Binaries Discovered, by month, Jan-08 to Mar-09; y-axis 0 to 140,000]
Sample Count Explosion
• Lots of variants per family
• New variants released even before a signature for
previous ones gets released
• Money-motivated organized malware gangs
– ‘Professional products’
– Pose serious detection challenges
• Difficult to anticipate changes
• Short-term, per-family proactive detection is the minimum requirement
– Use bleeding-edge technology
• Conficker – crypto algorithms
• MBR rootkit – stealth techniques
• Evading detection is the primary motive
Morphing Malware
• Not the traditional poly or metamorphics
• Do not carry the mutator
• Delivered through the cloud (server-side)
– Drive-by downloads, social engineering, self-updating malware
– Binaries change often
• Now adopted by all
– Backdoor, PWS, AdClicker, Proxy, Worms etc
• Morphing services
– Tibs-Packed: Storm worm, downloader, uploader, spam-bot,
backdoors etc.
– FakeAV-looking downloaders, backdoors, worms
• Human supervised automated variant generation system
Supervised-Automation
• Supervised Automation (SA) is a semi-automated method of
generating malware variants with sporadic human
intervention
• Loosely related to the concept of metamorphism
• Not based off of any particular malware family
Supervised Automation
[Diagram: supervised-automation pipeline]
• Input: malicious binary & info (B, Info)
• Select and apply encryption, giving E(B), Info
– ADD
– SUB
– XOR
– ROT
– RC4
• Select and apply morphing, giving M(E(B)), Info
– Dead Code Insertion
– Junk Code Insertion
– CFG Obfuscation
– Instruction Substitution
– Decryption Key Obfuscation
– Geometric Fuzzyfication
• Black-box signature extraction
• Human: loop-back to re-encrypt, or release-to-world
Supervised-Automation
• Generate any number of new variants at the desired
frequency
• Motive is to evade detection and not ‘blindly’ generate
variants
• Different pattern of operation observed in Tibs-Packed,
FakeAV, GamePWS trojans
SA vs. Metamorphism
• Generally speaking, virus detection is undecidable
• Solutions for specific sub cases have been proposed
• Let us see what existing results from comparable
technology apply to SA
• Purely automatic variant generation, i.e. metamorphism,
has been studied
SA vs. Metamorphism
• Do not carry the engine
• Transformation logic is not self-contained
• Transformation rules not constant
• No feed-back loop
• Transformations not limited
• Anti debugging, anti disassembly, anti
emulation: anti analyses
[Diagram: Metamorphic engine: Locate own code, Decode, Analyze, Transform]
Normalization based approach
• Transformation rules modelled as Term Rewriting Systems
(TRS) and related to formal grammars
• Proving equivalence between two programs w.r.t. a
rewriting system reduces to the famous word problem
– Undecidable in general
– Unless TRS is confluent and terminating
– Some approximation based approaches
[Figure: example rewrite rules, short form = expanded form (condition)]
mov edi, 0x04 = push ecx; mov ecx, 0x04; mov edi, ecx; pop ecx (unconditional)
push eax = push eax; mov eax, 0x04 (eax not live)
push 0x04 = mov eax, 0x04; push eax (eax not live)
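On the detection side, a normalizer applies such rules in the shrinking direction until nothing more matches. The Python sketch below is only an illustration of that idea, not the speaker's implementation. It encodes the unconditional rule from this slide; the tuple encoding of instructions, the "?" pattern variables, and the names RULES, match, guard, and normalize are assumptions made for the example. The two conditional rules are left out because they need a liveness analysis.

```python
from typing import Dict, List, Optional, Tuple

Insn = Tuple[str, ...]  # e.g. ("mov", "edi", "0x04"); hypothetical encoding

# Hypothetical encoding of the unconditional rule: tokens that start
# with "?" are pattern variables, everything else must match literally.
RULES: List[Tuple[List[Insn], List[Insn]]] = [
    (
        [("push", "?r"), ("mov", "?r", "?imm"), ("mov", "?d", "?r"), ("pop", "?r")],
        [("mov", "?d", "?imm")],
    ),
]


def is_immediate(token: str) -> bool:
    """True if the token parses as an integer literal such as 0x04."""
    try:
        int(token, 0)
        return True
    except ValueError:
        return False


def match(pattern: List[Insn], window: List[Insn]) -> Optional[Dict[str, str]]:
    """Bind pattern variables against a window of equal length, or return None."""
    binding: Dict[str, str] = {}
    for pat, insn in zip(pattern, window):
        if len(pat) != len(insn):
            return None
        for p, actual in zip(pat, insn):
            if p.startswith("?"):
                if binding.setdefault(p, actual) != actual:
                    return None
            elif p != actual:
                return None
    return binding


def guard(binding: Dict[str, str]) -> bool:
    """The rewrite is only sound if ?imm is a constant and ?d differs from ?r."""
    return is_immediate(binding["?imm"]) and binding["?d"] != binding["?r"]


def normalize(code: List[Insn]) -> List[Insn]:
    """Rewrite until no rule applies; each rewrite shortens the code, so this terminates."""
    code = list(code)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            n = len(lhs)
            for i in range(len(code) - n + 1):
                b = match(lhs, code[i:i + n])
                if b is not None and guard(b):
                    code[i:i + n] = [tuple(b.get(t, t) for t in insn) for insn in rhs]
                    changed = True
                    break
            if changed:
                break
    return code


variant = [("push", "ecx"), ("mov", "ecx", "0x04"),
           ("mov", "edi", "ecx"), ("pop", "ecx"), ("ret",)]
print(normalize(variant))  # [('mov', 'edi', '0x04'), ('ret',)]
```

Because every rule here strictly shortens the instruction list, termination is guaranteed; confluence still depends on the rule set as a whole, which is where changing rule sets hurt this approach.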
Normalization based approach
[Diagram: rewrite systems RS1, RS2, RS3 in use at successive points along a time axis]
• Multiple TRS bad news for some solutions
• Q: Do multiple TRS really make a difference?
• Same worst case for a ‘well-designed’ system
• But multiple TRS does make things worse
Approaches
• Approaches that are agnostic of rule systems can be
useful against such systems
• Smart byte-based detection schemes
• Normalization based on general optimization techniques
and program semantics based detection methods
• Behaviour based detection may be useful today
• Emulation based techniques have been proposed earlier
to identify detectable behaviours but emulation has a host
of well known problems
Example – Storm worm
• Locate the start address of
encrypted data and size/end of the
data
• Calculate key(s): key[i]
• Apply key(s)
• Transfer control to decrypted code
[Figure: disassembly annotated with: start of encrypted code; add, rotate; end of encrypted code; fake call returns -1]
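From the analyst's side, a layer built from simple add and rotate steps can sometimes be recovered without running the sample, by trying every key and checking for a known plaintext marker (often called X-raying). The Python sketch below illustrates that under loud assumptions: the per-byte transform (add a one-byte key, then rotate left), the key space, and the names rol8, decode, and xray are hypothetical and are not the actual Storm worm algorithm, which the slide does not specify.

```python
from typing import Optional, Tuple


def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def decode(data: bytes, key: int, rot: int) -> bytes:
    """Assumed decoder: add the key to each byte, then rotate left."""
    return bytes(rol8((b + key) & 0xFF, rot) for b in data)


def xray(data: bytes, marker: bytes) -> Optional[Tuple[int, int]]:
    """Try all 256 * 8 (key, rot) pairs; return the first that reveals the marker."""
    for key in range(256):
        for rot in range(8):
            if marker in decode(data, key, rot):
                return key, rot
    return None
```

A scanner would call xray(section_bytes, marker) with a string expected in the decrypted body. The approach only works while the transform family stays inside what the scanner models, which is exactly what frequent algorithm changes are meant to defeat.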
Example – Storm worm
• High, medium and low frequency changes
[Diagram labels: Base Variant (BV); encryption algorithms Algorithm A, Algorithm B, Algorithm C, … Algorithm N producing EBV1, EBV2, EBV3, …; morphing steps M1, M2, … Mn producing M11, M12, … M1n, M21, M22, … M2n; keys K; timeline Day 1, Day 2, … Day m, Day m+1, Day m+2, … Day n, Day n+1, Day n+2, … Day o]
Example – DNSChanger
• Uses obfuscated calls
Possible call targets
Rules can be conditional
Example – PWS dll
• Rules change often
• Constructs strings
– HBXYXND-0109-NEW
– WM_HOOKEX_RK
– Explorer.exe
– act=getpos&account=%s
Example – FakeAV
• junk code
• variable renaming
• register liveness
• second one is reversed
Detection Challenges
• Virus authors want to evade detection, and stay
undetected once a machine is compromised
• AV update should detect the ‘current’ variant – somewhat
‘proactive’
• Able to detect all automatically generated variants up to
the next human-driven update
• Resistant to non-functional changes
Signatures
• Goal is to find ‘enough’ evidence to detect and classify a
file for practical purposes such that it will not generate
any false positives
– Generic
– Reliable: no false positives
– “my virus botnet, attack ms08-067 ping”
Signatures
• Simple byte sequence based not useful
– Hash based
– Detection worthy strings
– Detection worthy code sequence
• Multiple sets of wildcard based byte sequences at various
locations that remain constant
• Emulation
• Decryption or cryptanalysis based
– Presence of a technique can lend itself to detection
• Geometry based
• Combination provides the right balance
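As an illustration of the "multiple sets of wildcard based byte sequences" item, the Python sketch below requires every pattern in a set to be present somewhere in the file. It is a minimal sketch, not any product's signature format: the "??" wildcard syntax, the function names, and the sample byte patterns are all assumptions made for the example.

```python
import re
from typing import List


def compile_signature(pattern: str) -> "re.Pattern":
    """Turn 'B9 ?? ?? ?? ?? 81 31' into a bytes regex; '??' matches any one byte."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that a wildcard also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def matches_all(data: bytes, signatures: List[str]) -> bool:
    """Detect only if every pattern in the set occurs somewhere in data."""
    return all(compile_signature(sig).search(data) for sig in signatures)


# Hypothetical signature set: a constant opcode skeleton with the
# variable operand bytes wildcarded, plus a second constant sequence.
SIGNATURE_SET = ["B9 ?? ?? ?? ?? 81 31", "90 C3"]

sample = bytes.fromhex("90b91000000081319090c3")
print(matches_all(sample, SIGNATURE_SET))  # True
```

Requiring several independent patterns lowers the false-positive risk of any single short sequence, at the cost of missing variants in which one of the "constant" regions changes.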
Conclusions & Future Work
• Stakes are getting bigger with increasingly critical,
sensitive, high-value information at risk
• Adoption of cutting-edge research concepts and
innovation skills by virus authors
• More automation and more understanding of ‘correct’
transformation techniques is expected
• Interesting to formalize some results in the realm of SA
based malware
• Detection solutions that are agnostic of rewrite systems
need to be investigated.
• It will also be interesting to see how behaviour evolution
materializes in reality and any forward looking research
around that is very relevant
References
• Bruschi, D., Martignoni, L., & Monga, M. (2006). Detecting Self-mutating Malware Using Control-Flow Graph Matching. Lecture Notes in Computer Science, 4064/2006 (Detection of Intrusions and Malware & Vulnerability Assessment), 129-143.
• Bruschi, D., Martignoni, L., & Monga, M. (2006). Using code normalization for fighting self-mutating malware. International Symposium on Secure Software Engineering. Washington, DC, USA: IEEE.
• Chess, D. M., & White, S. R. (2000). An undetectable computer virus. In Proceedings of Virus Bulletin Conference.
• Christodorescu, M., & Jha, S. (2003). Static analysis of executables to detect malicious patterns. SSYM'03: Proceedings of the 12th conference on USENIX Security Symposium (pp. 12-30). USENIX Association.
• Christodorescu, M., & Jha, S. (2004). Testing malware detectors. ACM SIGSOFT Software Engineering Notes, 29 (4), 34-44.
• Christodorescu, M., Jha, S., Seshia, S. A., Song, D., & Bryant, R. E. (2005). Semantics-Aware Malware Detection. IEEE Symposium on Security and Privacy (pp. 32-46). ACM Press.
• Filiol, E. (2006). Malware Pattern Scanning Schemes Secure Against Black-box Analysis. Journal in Computer Virology, 35-50.
• Filiol, E. (2007). Metamorphism, Formal Grammars and Undecidable Code Mutation. International Journal of Computer Science.
• Filiol, E., & Josse, S. (2007). A statistical model for undecidable viral detection. Journal in Computer Virology, 3, 65-74.
• Filiol, E., Jacob, G., & Liard, M. L. (2006). Evaluation methodology and theoretical model for antiviral behavioural detection strategies. Journal in Computer Virology, 23-37.
• Kapoor, A., & Mathur, R. (2008, June). STRIKE ME DOWN, AND I SHALL BECOME MORE POWERFUL! VIRUS BULLETIN, pp. 8-10.
• Lakhotia, A., Kapoor, A., & Kumar, E. U. (2005, January). Are metamorphic viruses really invincible? - part II. Virus Bulletin, pp. 9-12.
• Mathur, R. (2006, December). Normalizing Metamorphic Malware using Term-Rewriting. M.S. Thesis. University of Louisiana at Lafayette.
• Mathur, R., & Kapoor, A. (2007, December). Exploring The Evolutionary Patterns Of Tibs-Packed Executables. Virus Bulletin, pp. 6-9.
• Soeder, D., & Permeh, R. (2005). BootRoot. Retrieved from eEye: http://research.eeye.com/html/tools/RT20060801-7.html
• Szor, P., & Ferrie, P. (2001). Hunting for metamorphic. 11th International Virus Bulletin Conference.
• Tan, X. (2007). Anti-unpack Tricks in Malicious Code. AVAR. Seoul.
• Walenstein, A., Mathur, R., Chouchane, M. R., & Lakhotia, A. (2008). Constructing malware normalizers using term rewriting. Journal in Computer Virology, 307-322.
• Walenstein, A., Mathur, R., Chouchane, M. R., & Lakhotia, A. (2007). The Design Space of Metamorphic Malware. Proceedings of the 2nd International Conference on Information Warfare. Monterey, CA, U.S.A.
• Webster, M., & Malcolm, G. (2008, July). Detection of metamorphic and virtualization-based malware using algebraic specification. Journal in Computer Virology.
Thank You! (Danke!)
Suggestions & Questions:
Email: [email protected]