DroidScope: Seamlessly Reconstructing the OS and Dalvik


DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis

Lok Kwong Yan and Heng Yin
Syracuse University, Air Force Research Laboratory
USENIX 2012
Presentation: 2012-09-11, 曾毓傑

Outline

• Introduction
• Background
• Architecture
• Interface & Plugins
• Evaluation
• Discussion & Conclusion


INTRODUCTION


Introduction

• Malicious applications appear in both official and unofficial marketplaces, at rates of 0.02% and 0.2% respectively
• Virtualization-based analysis approach
  • The analysis runs underneath the entire virtual machine
  • It is difficult for an attack within the VM to disrupt the analysis
  • Semantic context is lost when the analysis component is moved out of the box
• We need to intercept certain kernel events and parse kernel data structures to reconstruct the semantic knowledge


DroidScope

• Reconstructs two levels of semantic knowledge
  • OS-level: understand the activities of the malware process and its native components
  • Java-level: comprehend the behaviors of the Java components
• Built on top of the QEMU emulator
• Provides tools for analysis
  • Native instruction tracer
  • Dalvik instruction tracer
  • API tracer
  • Taint tracker

BACKGROUND


Android System Overview

[Figure: Android system overview. Labels: "Android System"; "Parent process for all Android processes"; libdvm.so; "provide Java-level abstraction"; "Kernel data structure"]

DroidScope Overview


ARCHITECTURE


Architecture

• Integrates the changes into the QEMU emulator
  • The emulator comes from the Android SDK
  • Leaves the Android system unchanged
  • Different virtual devices can be loaded
• Reconstructs the OS-level and Java-level views
  • Monitors how the malware's Java components communicate with the Android Java framework
  • Monitors how the malware's native components interact with the Linux kernel
  • Monitors how the malware's Java components and native components communicate through the JNI interface


Reconstructing OS-level View

• Basic instrumentation
  • Insert extra instructions during the code translation phase to observe system status
[Figure: target instructions → Tiny Code Generator (TCG), which adds additional code for detection → native instructions]
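The idea of inserting extra code at translation time can be pictured with a toy translator. This is only a sketch under loud assumptions: the tuple instruction format, `translate_block`, and the hook predicate are all hypothetical, and QEMU's real TCG operates on intermediate ops in C, not on Python callables.

```python
# Toy model of translation-time instrumentation (not QEMU's actual API).
from typing import Callable, List, Tuple

Insn = Tuple[str, int]  # (mnemonic, address); hypothetical representation


def translate_block(block: List[Insn],
                    wants_hook: Callable[[Insn], bool],
                    hook: Callable[[Insn], None]) -> List[Callable[[], None]]:
    """Return callables standing in for the generated native code."""
    translated = []
    for insn in block:
        if wants_hook(insn):
            # Extra "instruction" emitted once, at translation time,
            # so it runs every time the translated block executes.
            translated.append(lambda i=insn: hook(i))
        # Stand-in for the translation of the original instruction.
        translated.append(lambda i=insn: None)
    return translated


seen = []
block = [("mov", 0x1000), ("svc", 0x1004), ("bx", 0x1008)]
code = translate_block(block, lambda i: i[0] == "svc", seen.append)
for op in code:
    op()
print(seen)
```

Only the `svc` instruction gets a hook, so the other instructions run without extra cost in this model.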


Reconstructing OS-level View (Cont.)

• For example, a context switch on the ARM architecture changes the c2_base0 and c2_base1 registers, which store the page table address
• Extract semantic knowledge
  • System calls
  • Running processes and threads
  • Memory maps
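A minimal sketch of how a write to the page table base register could be turned into a context switch event. The class, the callback name `on_ttbr_write`, and the process names are hypothetical; the slides only state that c2_base0 and c2_base1 change on a context switch.

```python
class ContextSwitchMonitor:
    """Tracks the current process by watching page table base writes."""

    def __init__(self):
        self.current_pgd = None
        self.pgd_to_name = {}  # shadow map: page table base -> process name
        self.switches = []     # (old_pgd, new_pgd) pairs, in order

    def register_process(self, pgd, name):
        self.pgd_to_name[pgd] = name

    def on_ttbr_write(self, new_pgd):
        # Hypothetical callback fired when the guest writes c2_base0/c2_base1.
        if new_pgd != self.current_pgd:
            self.switches.append((self.current_pgd, new_pgd))
            self.current_pgd = new_pgd

    def current_process(self):
        return self.pgd_to_name.get(self.current_pgd, "<unknown>")


mon = ContextSwitchMonitor()
mon.register_process(0x8000, "app_a")
mon.register_process(0xC000, "app_b")
for value in (0x8000, 0x8000, 0xC000):
    mon.on_ttbr_write(value)
print(mon.current_process(), len(mon.switches))
```

Rewriting the same base address is ignored, so only real changes of address space are recorded.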


Reconstructing OS-level View (Cont.)

• System calls
  • The ARM architecture uses the supervisor call instruction svc #0 to make system calls; the system call number is in register R7
• Processes and threads
  • Read the task_struct structure for process information: pid, tgid, pgd, uid, gid, euid, egid, comm, cmdline, thread_info
  • The sys_fork, sys_execve, sys_clone, and sys_prctl system calls trigger the information update
• Memory maps
  • Read the mm_struct structure
  • sys_mmap2 triggers the information update
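The update triggers above can be sketched as a small dispatcher. This is an illustration, not DroidScope's code: `on_instruction` and the `view` dictionary are hypothetical, the numbers are taken from the ARM EABI Linux system call table, and only the ARM-mode encoding of svc #0 is checked (Thumb code is ignored).

```python
SVC_ARM = 0xEF000000  # ARM-mode `svc #0` with the "always" condition

# System call numbers from the ARM EABI Linux table.
SYSCALLS = {
    2: "sys_fork",
    11: "sys_execve",
    120: "sys_clone",
    172: "sys_prctl",
    192: "sys_mmap2",
}
PROCESS_UPDATES = {"sys_fork", "sys_execve", "sys_clone", "sys_prctl"}


def on_instruction(opcode, regs, view):
    """Mark which shadow structures need re-reading after a system call."""
    if opcode != SVC_ARM:
        return None
    name = SYSCALLS.get(regs["r7"])  # system call number lives in R7
    if name in PROCESS_UPDATES:
        view["stale"].add("processes")    # re-read task_struct entries
    elif name == "sys_mmap2":
        view["stale"].add("memory_maps")  # re-read mm_struct entries
    return name


view = {"stale": set()}
first = on_instruction(SVC_ARM, {"r7": 120}, view)
second = on_instruction(SVC_ARM, {"r7": 192}, view)
third = on_instruction(0xE1A00000, {"r7": 2}, view)  # not an svc, ignored
print(first, second, third, sorted(view["stale"]))
```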


Reconstructing Java-level View

• Dalvik instructions
  • Goal: know which instruction is executing right now
  • Register R15 (the ARM program counter) indicates the currently executing Dalvik instruction
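One way R15 could identify the Dalvik instruction is by its offset into the mterp handler table. The sketch below assumes a fixed 64-byte handler per opcode, starting at the mterp base address held in R8; that layout comes from Dalvik's ARM interpreter and is not stated on the slides, so treat it as an assumption. The function name is hypothetical.

```python
HANDLER_SIZE = 64  # assumed bytes per opcode handler in the ARM mterp build


def current_dalvik_opcode(r15, mterp_base, num_opcodes=256):
    """Map the native program counter to a Dalvik opcode, or None."""
    offset = r15 - mterp_base
    if not 0 <= offset < HANDLER_SIZE * num_opcodes:
        return None  # native code outside the interpreter handlers
    return offset // HANDLER_SIZE


base = 0x40000000
print(current_dalvik_opcode(base + 0x6E * HANDLER_SIZE + 8, base))
print(current_dalvik_opcode(base - 4, base))
```

Any address inside a handler maps to the same opcode, so the check works at every native instruction of the handler, not only at its entry point.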


Reconstructing Java-level View (Cont.)

• Just-In-Time compiler
  • Hot, heavily used instructions are compiled into native machine code
  • Execution of that code skips the mterp component
  • dvmGetCodeAddr() is called for the address of compiled code
  • To disable the JIT function: flush the JIT cache, return NULL, and reset the counter
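A toy model of disabling JIT for a region of bytecode by hooking the code address lookup. `ToyJit` and `disable_jit_for` are hypothetical stand-ins for the DVM's JIT cache and dvmGetCodeAddr(); the counter reset mentioned above is left out because the slides do not describe it in enough detail.

```python
class ToyJit:
    """Stand-in for the JIT cache: bytecode address -> compiled code address."""

    def __init__(self):
        self.cache = {}

    def get_code_addr(self, dalvik_pc):
        return self.cache.get(dalvik_pc)


def disable_jit_for(jit, lo, hi):
    """Force interpretation of bytecode in the range [lo, hi)."""
    original = jit.get_code_addr
    jit.cache.clear()  # flush translations compiled before the hook

    def hooked(dalvik_pc):
        if lo <= dalvik_pc < hi:
            return None  # None plays the role of NULL: no compiled code
        return original(dalvik_pc)

    jit.get_code_addr = hooked


jit = ToyJit()
jit.cache[0x100] = 0xA000
disable_jit_for(jit, 0x100, 0x200)
jit.cache[0x120] = 0xB000  # compiled after the hook, inside the region
jit.cache[0x300] = 0xC000  # compiled after the hook, outside the region
print(jit.get_code_addr(0x120), jit.get_code_addr(0x300))
```

Returning no address only for the analyzed range keeps the rest of the system running at JIT speed.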


Reconstructing Java-level View (Cont.)

• Dalvik virtual machine states
  • Record registers R4 to R8, which store the DVM states
    R4: Program Counter
    R5: Stack Frame Pointer
    R6: InterpState Structure
    R7: Instruction Counter
    R8: mterp Base Address
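The register assignment above can be captured as a small record. The `DvmState` type and `read_dvm_state` helper are hypothetical names; the field meanings follow the register table on this slide.

```python
from dataclasses import dataclass


@dataclass
class DvmState:
    program_counter: int      # R4
    frame_pointer: int        # R5
    interp_state: int         # R6, address of the InterpState structure
    instruction_counter: int  # R7
    mterp_base: int           # R8


def read_dvm_state(regs):
    """Build the DVM state from a list of the 16 ARM core registers."""
    return DvmState(regs[4], regs[5], regs[6], regs[7], regs[8])


regs = list(range(100, 116))  # fake register file: R0=100 ... R15=115
state = read_dvm_state(regs)
print(state)
```

The values are only meaningful while the interpreter is running, so a real tool would read them when R15 is inside the mterp code region.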


Reconstructing Java-level View (Cont.)

• Java objects
  • Obtain the data inside Java objects, such as string data


Symbol Information

• Native library symbols
  • Use objdump to retrieve symbol information
  • Malware is often stripped of all symbol information
• Dalvik or Java symbols
  • Use dexdump to retrieve symbol information
  • The data structures of the DVM also contain some symbol information
    • The InterpState structure (register R6) has a method field that points to the Method structure for the currently executing method
    • The Method structure has a name field that points to the method name
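The pointer chain from R6 to the method name can be sketched over a fake guest memory. The field offsets below are made up for the example and do not reflect the real InterpState or Method layouts; the helper names are hypothetical too.

```python
import struct

METHOD_FIELD_OFFSET = 8  # hypothetical offset of InterpState.method
NAME_FIELD_OFFSET = 16   # hypothetical offset of Method.name


def read_u32(mem, addr):
    """Read a little-endian 32-bit pointer from guest memory."""
    return struct.unpack_from("<I", mem, addr)[0]


def read_cstring(mem, addr):
    """Read a NUL-terminated ASCII string from guest memory."""
    end = mem.index(b"\0", addr)
    return mem[addr:end].decode("ascii")


def current_method_name(mem, r6):
    method_ptr = read_u32(mem, r6 + METHOD_FIELD_OFFSET)        # InterpState.method
    name_ptr = read_u32(mem, method_ptr + NAME_FIELD_OFFSET)    # Method.name
    return read_cstring(mem, name_ptr)


# Fake guest memory: InterpState at 0, Method at 32, name string at 64.
mem = bytearray(128)
struct.pack_into("<I", mem, 0 + METHOD_FIELD_OFFSET, 32)
struct.pack_into("<I", mem, 32 + NAME_FIELD_OFFSET, 64)
mem[64:72] = b"onCreate"
print(current_method_name(mem, 0))
```

This route works even when the app's symbols were never dumped offline, because the names are read from the running DVM.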

INTERFACE & PLUGINS


Interface & Plugins

• APIs for analysis customization
  • The instrumentation logic in DroidScope is complex and dynamic
  • An event-based interface facilitates custom analysis tool development


Sample Plugin

• Set up which program to analyze and print all Dalvik opcode information
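The slide's plugin listing did not survive in this transcript. As a rough illustration of what an event-based plugin of this kind might look like, here is a sketch in Python; `EventBus`, `OpcodePrinter`, the `dalvik_insn` event name, and the package name are all hypothetical and are not DroidScope's actual API.

```python
from collections import defaultdict


class EventBus:
    """Minimal event dispatcher: plugins register callbacks per event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, **info):
        for handler in self._handlers[event]:
            handler(**info)


class OpcodePrinter:
    """Plugin: follow one program and record its Dalvik opcodes."""

    def __init__(self, bus, target):
        self.target = target
        self.lines = []
        bus.register("dalvik_insn", self.on_dalvik_insn)

    def on_dalvik_insn(self, process, pc, opcode):
        if process == self.target:  # ignore every other process
            self.lines.append(f"{pc:#x}: opcode {opcode:#04x}")


bus = EventBus()
plugin = OpcodePrinter(bus, "com.example.app")
bus.emit("dalvik_insn", process="com.example.app", pc=0x4000, opcode=0x6E)
bus.emit("dalvik_insn", process="com.other.app", pc=0x5000, opcode=0x0E)
print("\n".join(plugin.lines))
```

The plugin only states what it wants to see; deciding where to place instrumentation stays inside the framework.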


API Implementation

• API tracer
  • Instruments the invoke* and execute* Dalvik bytecodes to identify and log method invocations
• Native instruction tracer
  • Gathers information about each instruction, including the raw instruction, its operands, and their values
• Dalvik instruction tracer
  • Decodes instructions into dexdump format, including values and all available symbol information
• Taint tracker
  • Monitors sensitive information and keeps track of data propagation
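Taint propagation can be illustrated with a very small tracker. This is a generic sketch of the technique, not the DroidScope implementation; the location names and the "IMEI" label are hypothetical.

```python
class TaintTracker:
    """Tracks which sensitive sources each location's value derives from."""

    def __init__(self):
        self.taint = {}  # location -> set of source labels

    def mark(self, loc, label):
        """Introduce taint at a source of sensitive data."""
        self.taint.setdefault(loc, set()).add(label)

    def propagate(self, dst, *srcs):
        """dst is overwritten with a value computed from srcs."""
        merged = set()
        for src in srcs:
            merged |= self.taint.get(src, set())
        if merged:
            self.taint[dst] = merged
        else:
            self.taint.pop(dst, None)  # overwritten with clean data

    def check_sink(self, loc):
        """Labels that would leak if loc reached a sink."""
        return sorted(self.taint.get(loc, set()))


t = TaintTracker()
t.mark("v0", "IMEI")
t.propagate("v1", "v0", "v2")  # v1 = v0 op v2, so v1 inherits v0's taint
leaked = t.check_sink("v1")
t.propagate("v1", "v2")        # v1 = v2, clean data clears the taint
print(leaked, t.check_sink("v1"))
```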

EVALUATION


Evaluation

• Benchmarks check efficiency and capability
• 7 benchmark apps
  • AnTuTu Benchmark
  • AnTuTu CaffeineMark
  • CaffeineMark
  • CF-Bench
  • Mobile Processor Benchmark
  • Benchmark by Softweg
  • Linpack


Evaluation

• Performance
• Capability
  • Analysis of DroidKongFu
  • Analysis of DroidDream


DISCUSSION & CONCLUSION


Discussion

• Limited code coverage
  • A drawback of dynamic analysis
  • Manipulating the return values of function calls may increase code coverage
• Other Dalvik analysis tools
  • Dalvik/Java static analysis: Woodpecker, DroidMOSS
  • Native static analysis: IDA, binutils, BAP
  • Android dynamic analysis: TaintDroid, DroidRanger
  • Linux kernel dynamic analysis: logcat, adb


Conclusion

• We presented DroidScope, a fine-grained dynamic binary instrumentation tool for Android that rebuilds two levels of semantic information