Virtunoid: Breaking out of KVM


Nelson Elhage Black Hat USA 2011

Outline:
- Introduction
- Related work
- Background Knowledge
- Attack Detailed
  - CVE-2011-1751 Bug Detailed
  - Exploit Detailed (Take Control of %rip)
  - Inject Shellcode into host
  - Disable non-executable page
  - Bypassing ASLR
- Conclusions
- Reference

- It was found that the PIIX4 Power Management emulation layer in qemu-kvm did not properly check for hot plug eligibility during device removals. A privileged guest user could use this flaw to crash the guest or, possibly, execute arbitrary code on the host. (CVE-2011-1751)

- QEMU is a generic and open source machine emulator and virtualizer.

Three components:
- kvm.ko
- kvm-intel.ko or kvm-amd.ko
- qemu-kvm

kvm.ko:
- The core KVM kernel module
- Provides ioctls for communicating with the kernel module
- Primarily responsible for emulating the virtual CPU and MMU
- Emulates a few devices in-kernel for efficiency
- Contains an emulator for a subset of x86, used in handling certain traps

kvm-intel.ko / kvm-amd.ko:
- Provides support for Intel's VMX and AMD's SVM virtualization extensions
- Relatively small compared to the rest of KVM

qemu-kvm:
- Provides the most direct user interface to KVM
- Based on the classic QEMU emulator
- Implements the bulk of the virtual devices a VM uses
- Implements a wide variety of possible devices and buses
- An order of magnitude more code than the kernel module


    static QEMUTimer *active_timers[QEMU_NUM_CLOCKS];

    struct QEMUTimer {
        QEMUClock *clock;
        int64_t expire_time;
        QEMUTimerCB *cb;          /* callback function */
        void *opaque;             /* parameter passed to cb */
        struct QEMUTimer *next;   /* linked list */
    };

[Figure: active_timers points to a linked list of QEMUTimer structures.]

Related functions:
- qemu_new_timer: allocates a memory region for the new timer.
- qemu_mod_timer: modifies the timer's expire time and adds it to the linked list.
- qemu_run_timers: loops through the linked list and executes each timer's callback function with opaque as the parameter.

- The main_loop_wait function iterates through active_timers and calls qemu_run_timers().
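The timer functions described above can be sketched as a small, self-contained model. This is a simplified illustration, not QEMU's code: it keeps a single list instead of one per clock, and the names sketch_new_timer, sketch_mod_timer, sketch_del_timer, sketch_run_timers, and count_hit are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef void TimerCB(void *opaque);

typedef struct Timer {
    int64_t expire_time;
    TimerCB *cb;            /* callback function */
    void *opaque;           /* parameter passed to cb */
    struct Timer *next;     /* linked list */
} Timer;

/* Single active list; the real code keeps one list per clock. */
static Timer *active_timers;

/* Allocate a timer. It is not on the active list until it is armed. */
Timer *sketch_new_timer(TimerCB *cb, void *opaque)
{
    Timer *t = calloc(1, sizeof(*t));
    if (t != NULL) {
        t->cb = cb;
        t->opaque = opaque;
    }
    return t;
}

/* Unlink t from the active list if it is there. */
void sketch_del_timer(Timer *t)
{
    Timer **pt = &active_timers;
    while (*pt != NULL) {
        if (*pt == t) {
            *pt = t->next;
            t->next = NULL;
            return;
        }
        pt = &(*pt)->next;
    }
}

/* Arm (or re-arm) t: unlink it, then insert it sorted by expire_time. */
void sketch_mod_timer(Timer *t, int64_t expire_time)
{
    Timer **pt;
    assert(t != NULL);
    sketch_del_timer(t);
    pt = &active_timers;
    while (*pt != NULL && (*pt)->expire_time <= expire_time)
        pt = &(*pt)->next;
    t->expire_time = expire_time;
    t->next = *pt;
    *pt = t;
}

/* Fire every timer that has expired; returns how many callbacks ran. */
int sketch_run_timers(int64_t now)
{
    int fired = 0;
    while (active_timers != NULL && active_timers->expire_time <= now) {
        Timer *t = active_timers;
        active_timers = t->next;    /* unlink before calling back */
        t->next = NULL;
        t->cb(t->opaque);           /* cb and opaque are trusted blindly */
        fired++;
    }
    return fired;
}

/* Example callback: increments the int that opaque points to. */
void count_hit(void *opaque)
{
    *(int *)opaque += 1;
}
```

The point to notice is in sketch_run_timers: whatever cb and opaque hold when the timer fires is what gets called, so anyone who controls a timer structure on the list controls a function call.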

MC146818 RTC:
- A real-time clock that keeps track of the current time
- The MC146818 RTC hardware manual can be found at http://wiki.qemu.org/File:MC146818AS.pdf

RTCState structure:

    struct RTCState {
        .....
        QEMUTimer *second_timer;
        QEMUTimer *second_timer2;
    }

Related functions:
- rtc_initfn: initializes the RTC.
- rtc_update_second: updates the expire time of the QEMUTimer and adds it to the linked list.

rtc_initfn:

    RTCState *s = ....
    s->second_timer = qemu_new_timer(rtc_clock, rtc_update_second, s);
    s->second_timer2 = qemu_new_timer(rtc_clock, rtc_update_second2, s);
    qemu_mod_timer(s->second_timer2, s->next_second_time);

[Figure: RTCState's second_timer and second_timer2 fields each point to a QEMUTimer (cb, opaque, next). The cb fields point to rtc_update_second and rtc_update_second2, and the timers are linked from active_timers.]

PIIX4:
- A south bridge chip
- The default south bridge chip used by qemu-kvm
- Includes ACPI, a PCI-ISA bridge, and an embedded MC146818 RTC
- Supports PCI device hotplug, driven by writing values to I/O port 0xae08
- QEMU uses qdev_free to emulate removing a hot-pluggable device.

- Certain devices don't support hotplug, but QEMU didn't check this.
- It should not be possible to unplug the ISA bridge.
- KVM's emulated RTC is not designed to be unplugged.

[Code figure annotations: the unplug path did not check whether the device can be unplugged; while the RTC is being deallocated, the second timer is added to the linked list.]

    #include <sys/io.h>

    int main(void)
    {
        iopl(3);            /* allow this process to access all I/O ports */
        outl(2, 0xae08);    /* write 2 to the PIIX4 hotplug port */
        return 0;
    }

[Figure sequence: unplugging the RTC.
 1. Unplug RTC: the RTCState is freed and becomes a dummy memory region, while a QEMUTimer whose opaque points to it stays linked from active_timers.
 2. Return to main_loop_wait, which calls qemu_run_timers.
 3. The QEMUTimer callback runs: rtc_update_second(opaque).
 4. Next main_loop_wait.]

1. Inject a controlled QEMUTimer into qemu-kvm.
2. Eject the ISA bridge.
3. Force an allocation into the freed RTCState, with second_timer pointing to our fake QEMUTimer.

- The guest RAM is backed by an mmap()ed region inside the qemu-kvm process.
- Allocate the fake timer in guest RAM and calculate its host address with the following formula:

    hva = physmem_base + gpa
    gpa = page_translation(gva)    (see Linux kernel project 1)

  where
    gva = guest virtual address
    gpa = guest physical address
    hva = host virtual address
    physmem_base = start of the mmap()ed region

- For now, assume we know physmem_base (no ASLR).
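The address arithmetic above is small enough to write out. The sketch below is an illustration under assumptions: the source only names a page_translation step, so using a Linux /proc/self/pagemap entry as the input (page frame number in bits 0 to 54, "present" flag in bit 63) is one possible way to obtain gpa inside a Linux guest, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGEMAP_PRESENT  (1ULL << 63)          /* page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)    /* bits 0-54: page frame number */

/*
 * Turn one pagemap entry for the page containing gva into a guest physical
 * address. Returns 0 on success, -1 if the page is not present.
 */
int pagemap_entry_to_gpa(uint64_t entry, uint64_t gva,
                         uint64_t page_size, uint64_t *gpa)
{
    if (!(entry & PAGEMAP_PRESENT))
        return -1;
    *gpa = (entry & PAGEMAP_PFN_MASK) * page_size + (gva % page_size);
    return 0;
}

/* hva = physmem_base + gpa, valid only for addresses inside guest RAM. */
uint64_t gpa_to_hva(uint64_t physmem_base, uint64_t gpa, uint64_t ram_size)
{
    assert(gpa < ram_size);
    return physmem_base + gpa;
}
```

In a real guest the entry would be read from /proc/self/pagemap at offset (gva / page_size) * 8, which needs sufficient privilege, and the page would have to be locked in memory so the mapping does not change.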

- Force QEMU to call malloc.
- Utilize the qemu-kvm user-mode networking stack:
  - qemu-kvm implements a DHCP server, a DNS server, and a NAT gateway in its user-mode networking stack.
  - The user-mode stack normally handles packets synchronously.
  - To prevent recursion, if a second packet is emitted while handling a first packet, the second packet is queued using malloc.
- Trigger: ICMP ping.
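The queuing behaviour described above, where a packet emitted during packet handling is copied into a malloc()ed buffer, can be modelled as follows. This is a hypothetical sketch of the pattern only: send_packet, echo_handler, and the Packet layout are illustrative names and do not come from the qemu-kvm source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Packet {
    struct Packet *next;
    size_t len;
    unsigned char data[];   /* sender-controlled bytes live on the heap */
} Packet;

typedef void PacketHandler(const unsigned char *data, size_t len);

static Packet *queue_head;
static Packet **queue_tail = &queue_head;
static int delivering;          /* nonzero while a packet is being handled */
static size_t queued_allocs;    /* how many packets took the malloc path */

void send_packet(PacketHandler *handler, const unsigned char *data, size_t len)
{
    if (delivering) {
        /* Re-entrant send: copy the bytes into a fresh heap buffer. */
        Packet *p = malloc(sizeof(*p) + len);
        assert(p != NULL);
        p->next = NULL;
        p->len = len;
        memcpy(p->data, data, len);
        *queue_tail = p;
        queue_tail = &p->next;
        queued_allocs++;
        return;
    }
    delivering = 1;
    handler(data, len);
    /* Drain everything that was queued while the handler ran. */
    while (queue_head != NULL) {
        Packet *p = queue_head;
        queue_head = p->next;
        if (queue_head == NULL)
            queue_tail = &queue_head;
        handler(p->data, p->len);
        free(p);
    }
    delivering = 0;
}

/* Example handler: answers a request ('Q...') with a reply ('R...'). */
void echo_handler(const unsigned char *data, size_t len)
{
    if (len > 0 && data[0] == 'Q') {
        unsigned char reply[64];
        size_t n = len < sizeof(reply) ? len : sizeof(reply);
        memcpy(reply, data, n);
        reply[0] = 'R';
        send_packet(echo_handler, reply, n);    /* emitted while handling */
    }
}
```

What matters for the exploit is the malloc path: the reply carries bytes chosen by the sender, so a request of the right size gives the guest a heap allocation whose size and contents it controls.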

1. Allocate a fake QEMUTimer.
2. Calculate the fake timer's address.
3. Unplug the ISA bridge.
4. Ping the gateway with a packet containing pointers to your fake timer.

[Figure sequence: taking control of the timer callback.
 1. Allocate fake QEMUTimer: a fake QEMUTimer (cb, opaque, next) whose cb points to the evil function (shellcode).
 2. Unplug ISA bridge and ping the gateway: the memory of the freed RTCState now holds a pointer to the fake QEMUTimer.
 3. First main_loop_wait.
 4. Second main_loop_wait: the fake QEMUTimer is linked from active_timers, with cb pointing to the evil function (shellcode).]

1. We have %rip control.
2. Where is the evil function?
   - Inject shellcode into host virtual memory.
   - Host virtual memory has page protection (NX bit).
3. Solutions:
   A. ROP
   B. Something clever

1. We can control the QEMUTimer data structure.
2. Create multiple QEMUTimer objects and chain them together.

[Figure: three chained QEMUTimer structures (cb, opaque, next) whose callbacks are invoked in turn as F1(X), F2(Y), and F3(Z).]

- We now have multiple one-argument function calls.
- We want to make function calls with more arguments. For example, mprotect takes three arguments.

- Arguments of types _Bool, char, short, int, long, long long, and pointers are in the INTEGER class.
- If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8, and %r9 is used.
- For more detail, see reference [7].

- Suppose we can find a function with the following property:

    set_rsi: movq %rdi, %rsi; ret

- Let f1(x) be set_rsi.
- In most QEMU versions, the %rsi register is not modified by qemu_run_timers() between callbacks.
- Therefore f2(y) becomes f2(y, x), since we control %rsi through f1(x).

    void cpu_outl(pio_addr_t addr, uint32_t val)
    {
        ioport_write(2, addr, val);
    }

- This function copies its first parameter into the second parameter of ioport_write.
- %rdi carries the first parameter and %rsi the second, so cpu_outl has the property described earlier (in effect, mov %rdi, %rsi).
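Portable C cannot observe a register value surviving between calls, so the sketch below simulates the two argument registers with a struct. It is a model of the reasoning only: Regs, SimTimer, run_chain, and two_arg_target are hypothetical names, and set_rsi stands in for a function such as cpu_outl that leaves its first argument in %rsi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated argument registers. */
typedef struct Regs {
    uint64_t rdi;   /* first integer argument */
    uint64_t rsi;   /* second integer argument */
} Regs;

typedef void SimFn(Regs *r);

typedef struct SimTimer {
    SimFn *cb;
    uint64_t opaque;
    struct SimTimer *next;
} SimTimer;

static uint64_t seen_first, seen_second;

/* Stand-in for the gadget: copies the first argument into %rsi. */
void set_rsi(Regs *r)
{
    r->rsi = r->rdi;
}

/* Stand-in for a two-argument target: records what it was called with. */
void two_arg_target(Regs *r)
{
    seen_first = r->rdi;
    seen_second = r->rsi;
}

/*
 * Model of the timer loop: it loads only %rdi (from opaque) before each
 * callback and never touches %rsi, so %rsi carries over between callbacks.
 */
void run_chain(SimTimer *t)
{
    Regs r = {0, 0};
    for (; t != NULL; t = t->next) {
        assert(t->cb != NULL);
        r.rdi = t->opaque;
        t->cb(&r);
    }
}
```

Chaining a timer whose cb is set_rsi in front of the real target turns the one-argument call f2(y) into f2(y, x), which is the step the slides rely on.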

- mprotect prototype: mprotect(addr, len, prot)
- PROT_EXEC = 4
- Use the function ioport_readl_thunk:
  - We control the "opaque/ioport" argument through the QEMUTimer and the "addr" argument through set_rsi().
  - It seems we control everything in this function.

- Allocate a fake IORangeOps with
    fake_ops->read = mprotect
- Allocate a page-aligned IORange with
    fake_ioport->ops = fake_ops
    fake_ioport->base = -PAGE_SIZE
- Copy shellcode immediately after the IORange
- Construct a timer chain that calls
    cpu_outl(0, *)
    ioport_readl_thunk(fake_ioport, 0)
    fake_ioport + 1

[Figure: QEMUTimer chain. Three chained timers (cb, opaque, next) have callbacks cpu_outl, ioport_readl_thunk, and the shellcode. The page-aligned IORange has its ops field pointing to an IORangeOps whose read field is mprotect, and the space after the IORange is filled with shellcode.]

Two addresses are needed:
- The base address of the qemu-kvm binary, to find code addresses (such as mprotect ....)
- physmem_base, the address of the physical memory mapping inside qemu-kvm

Solutions:
- Find an information leak.
- Assume non-PIE: every major distribution compiles qemu-kvm as a non-position-independent executable.
- What about physmem_base?

fw_cfg:
- Emulated I/O ports 0x510 (address) and 0x511 (data)
- Used to communicate various tables to the QEMU BIOS (e820 map, ACPI tables, etc.)
- Also provides support for exporting writable tables to the BIOS
- However, fw_cfg_write doesn't check whether the target table is supposed to be writable.
- Several fw_cfg areas are backed by statically allocated buffers.
- Net result: nearly 500 writable bytes inside static variables.

- mprotect needs a page-aligned address, so these buffers aren't suitable for our shellcode.
- We can construct fake timer chains in this space to build a read4() primitive (creating an information leak).
- Follow pointers from static variables to find physmem_base.
- Proceed as before.

- Sandbox qemu-kvm
- Build qemu-kvm as PIE
- Lazily mmap/mprotect guest RAM
- XOR-encode key function pointers
- More auditing and fuzzing of qemu-kvm

Conclusions:
- VM breakouts aren't magic.
- Hypervisors are just as vulnerable as anything else.
- Device drivers are the weak spot.

Reference:
[1] http://qemu.weilnetz.de/qemu-tech.html
[2] http://qemu.weilnetz.de/doxygen/structRTCState.html
[3] http://www.linuxinsight.com/files/kvm_whitepaper.pdf
[4] https://www.ibm.com/developerworks/cn/linux/l-virtio/
[5] http://smilejay.com/kvm_theory_practice/
[6] http://www.linux-kvm.org/page/Documents
[7] http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf
[8] http://linuxfromscratch.xtra-net.org/hlfs/view/unstable/glibc 2.4/chapter02/pie.html
- QEMU source code