Transcript Slide 1
Geiger: Monitoring the Buffer Cache in a Virtual Machine Environment
Stephen T. Jones, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Department of Computer Sciences

Slide 2
Buffer Cache
• In modern OSes, the file system buffer cache and the virtual memory system are unified
  – When a file is first accessed, its data is buffered in a memory page
  – Under memory pressure, a page is evicted
    • If the page is dirty, it is first written to swap space or the file system
    • Then the page can be reused
    • Later, if the data is needed, a page fault occurs
      – A free page is allocated and the data is reloaded from disk into it

Slide 3
Useful Information About Buffer Cache
• If the VMM knows about eviction/promotion events, it can
  – Tell whether a guest OS is thrashing and how much more memory it needs to stop thrashing
  – Guide eviction-based cache placement
    • Exclusive cache: on a hit, the data item is removed from the cache
    • A transparent secondary cache may be desirable
      – E.g. a 32-bit OS running on a host with 16 GB of memory
    • Why does an exclusive cache work?
      – Normally, once a page has been read from disk, the OS will not read it again without evicting it first
      – Increases cache utilization

Slide 4
Services in a VMM
• The VMM layer is an attractive development target
  – Security (isolation from OS and apps)
  – Portability (transparent to OS)
• Our target services
  – VMM-driven eviction-based cache placement
    • Increase hit ratio for remote storage caches
    • Transparent to guest OS
  – Working set size estimation for thrashing VMs
    • Complement ESX Server technique

Slide 5
VMM Services Need Information
• Information about guest operating systems
• For our target services
  – Information about OS buffer cache
• Hidden from the VMM
  – Layered design approach
  – Narrow interface (virtual architecture)

Slide 6
Geiger Monitors Buffer Cache
• Virtual machine monitor extension
• Implicitly observes buffer cache events
  – Uses only information intrinsically available to the VMM
  – Explicit approach possible, but has drawbacks
• No guest OS modifications required
• Applicable to closed and legacy OSes
• Accurate (usually less than
5% error)
• Low cost (usually less than 3% overhead)
• Enables service implementation in VMM

Slide 7
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application

Slide 8
Buffer Cache Events
• Cache promotion
  – Disk block inserted into buffer cache
• Cache demotion
  – Disk block removed from cache

Slide 9
Detecting Promotion
• Block read
• Block write
• Disk reads and writes are visible to the VMM
• Associated Disk Location (ADL)
[Figure: user process, buffer cache pages with their ADLs, and disk blocks A, B, C]

Slide 10
Detecting Demotion
• Detect when a page is removed from the cache
• The VMM cannot observe a page free directly
• Instead, look for page reuse
• If a cache page is reused, it was logically freed in the interim
• Reuse inconsistent with ADL -> eviction
[Figure: buffer cache pages with their ADLs and disk blocks A, B, C]

Slide 11
Read / Write Evictions
• Read eviction
  – A non-free page is reused for reading from a different disk location
  – E.g. reading a large file/memory space
• Write eviction
  – A non-free page is reused for writing; the reuse (eviction) is detected only when the page is written back
  – Lag

Slide 12
Existing Techniques
• Promotion via reads and writes
• Demotion via reads and writes
• Chen et al., USENIX 2003
  – Within OS (pseudo device driver)
• Initial basis for Geiger

Slide 13
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application

Slide 14
New Geiger Techniques
• Other ways buffer cache pages are evicted
• Unified buffer cache/virtual memory system
• Non-I/O allocations cause eviction
• Two new eviction detection heuristics
  – Copy-on-write
  – Anonymous allocation

Slide 15
When Does Eviction Happen?
• Explicit eviction
  – Read eviction
  – Write eviction
• Implicit eviction
  – A non-free page is reused without a disk read or write
    • Page allocation or copy-on-write
  – E.g.
when a process requests a new page, a non-dirty page is allocated to it

Slide 16
Detecting Allocation Eviction
• Page not-present fault
• Page allocation (possible reuse)
• New writable mapping
• Detect eviction
• Invalidate ADL
[Figure: user process, buffer cache, and disk]

Slide 17
Filesystem Issues
• Filesystem features cause false positives
• Filesystem blocks can be deleted
  – Leads to a dangling ADL and a spurious eviction
• Journaling causes aliasing
  – The same cache page is written to both the journal and its filesystem location
  – Interferes with the write-eviction heuristic

Slide 18
Geiger Is Filesystem Aware
• Uses static filesystem info
  – Journal location and size
  – Block allocation bitmaps
• Ignores writes to the journal
• Tracks allocation bitmap updates and invalidates ADLs when blocks are deallocated
• Significantly reduces Geiger false positives

Slide 19
Block Liveness
• Reusing a free page is not an eviction
  – Geiger infers the liveness of a page from the liveness of its block
• A block dies when
  – A file is deleted or truncated
  – A process with virtual memory usage terminates

Slide 20
Block Liveness for Files
• Observing the writes to the superblock
  + They are at a special disk location
  – The OS caches them in memory and syncs them to disk only every 30 seconds or more
• Pages used to cache them are marked read-only
  – Write attempts cause page faults
  – Invalidate affected ADLs

Slide 21
Block Liveness for Swap Space
• No on-disk structure tracks block usage
  – When a disk block is written from a different memory page, the original block is considered "dead"
  – Maintain a reverse mapping between blocks and ADLs
  – Invalidate ADLs when blocks are overwritten
  – Without an overwrite, dead blocks cannot be detected
    • Leads to as many as 37% false-positive evictions

Slide 22
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application

Slide 23
Evaluation Goals
• Measure Geiger accuracy
  – Missed evictions (false negatives)
  – Spurious evictions (false positives)
• Measure Geiger timeliness
  – Lag between actual event and
detection

Slide 24
Experimental Environment
• Xen 2.0.7 VMM [Barham et al., SOSP '03]
  – Extensions to observe page faults, page table updates, and I/O requests/completions
• Linux 2.4 and 2.6 guests
• Microbenchmarks
  – Isolate specific eviction types
  – Read, write, COW, allocation
• Application benchmarks
  – Dbench, Mogrify, TPC-W, SPC disk trace

Slide 25
Eviction Detection Accuracy
Workload       False Neg %   False Pos %
Read Evict     0.96%         0.58%
Write Evict    1.68%         0.03%
COW Evict      2.47%         1.45%
Alloc Evict    0.17%         0.17%

Slide 26
Eviction Detection Lag
[Figure: eviction detection lag, annotated ~3s]

Slide 27
Application Accuracy
Workload    Geiger Opt            False Neg %   False Pos %
Dbench      w/o block liveness    1.10%         30.23%
Dbench      w/ block liveness     2.30%         5.72%
Mogrify     w/o block liveness    0.05%         22.99%
Mogrify     w/ block liveness     0.65%         2.46%
TPC-W                             0.14%         3.12%
SPC Web2                          2.24%         0.32%

Slide 28
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
  – Eviction-based cache placement

Slide 29
Application: Eviction-based Cache Placement
• Disk cache utilization is critical to performance
• Storage servers have large caches
• Demand-based placement => poor utilization
• Increase cache utilization via exclusivity
• Use client cache eviction as a placement hint [Chen et al., USENIX '03; Wong and Wilkes, USENIX '02]
• Use VMM-based, implicit eviction information to inform a remote storage cache
• No change to client or OS storage interfaces

Slide 30
Cache Placement Results
[Figure: cache placement results, annotated 13% and 51%]
• Geiger outperforms demand placement
• Mogrify: buffer misses too many evictions
• Mogrify: false positives are fortuitous
• Dbench: lag causes OS to outperform Geiger

Slide 31
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
  – Eviction-based cache placement
  – Working set size estimator

Slide 32
LRU Miss Ratio Curve
• LRU Queue: pages in LRU order
• Hit Histogram: associated with each LRU position
• Fault Curve: number of faults at each memory size, derived from the hit histogram
[Figure: LRU queue, hit histogram, and fault curve (axes: faults vs. pages)]

Slide 33
Application: Working Set Size Estimator
• MemRx:
  – Observe evictions/reloads
  – Compute miss ratio curve
• WSS = current memory allocation + LRU estimation
• Only works when WSS > current memory size

Slide 34
Estimation Results: Microbenchmarks
• Virtual machine is configured with 128 MB of memory
• Each benchmark accesses a 256 MB file or memory region
• FS: file access
• VM: memory access

Slide 35
Estimation Results: Applications

Slide 36
Summary
• System services in a VMM
• Need information about the guest OS
• Implicit information about the buffer cache
  – No guest OS modification
  – Accurate
  – Low overhead
• Build services and optimizations in a VMM
  – Eviction-based cache placement
  – Working set size estimation

Slide 37
Computer Sciences Department
Advanced Systems Laboratory
http://cs.wisc.edu/adsl
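
Sketch: working set size estimation from evictions and reloads
The working set size estimator slides describe observing evictions/reloads, keeping an LRU queue with a hit histogram, and deriving a fault curve. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the MemRx implementation: the event format ("evict"/"load" tuples) and the names fault_curve and extra_pages_needed are hypothetical, and the queue here tracks only evicted blocks, so position p means "this reload would have been a hit with p+1 more pages".

```python
def fault_curve(events, max_extra):
    """Return faults[k]: reloads that would still miss with k extra pages.

    events: iterable of (kind, block), kind is "evict" or "load"
            (hypothetical format; a VMM would infer these implicitly).
    max_extra: largest amount of extra memory (in pages) to consider.
    """
    evicted = []             # evicted blocks, most recently evicted first
    hist = [0] * max_extra   # hist[p]: reloads found at LRU position p
    reloads = 0              # loads of blocks that had been evicted earlier

    for kind, block in events:
        if kind == "evict":
            if block in evicted:
                evicted.remove(block)
            evicted.insert(0, block)
        elif kind == "load" and block in evicted:
            reloads += 1
            pos = evicted.index(block)
            evicted.pop(pos)
            if pos < max_extra:
                hist[pos] += 1
        # Loads of never-evicted blocks are ignored: extra memory
        # would not have avoided them.

    faults = [reloads]
    for h in hist:
        faults.append(faults[-1] - h)
    return faults


def extra_pages_needed(faults, tolerable=0):
    """Smallest k with faults[k] <= tolerable, or None if not reached."""
    for k, f in enumerate(faults):
        if f <= tolerable:
            return k
    return None


if __name__ == "__main__":
    trace = [("evict", "A"), ("evict", "B"), ("evict", "C"),
             ("load", "A"), ("load", "C"), ("load", "D")]
    curve = fault_curve(trace, max_extra=4)
    print(curve)                      # [2, 1, 1, 0, 0]
    print(extra_pages_needed(curve))  # 3
```

With this trace, the estimate would be the current allocation plus 3 pages. Because only evictions and reloads are observed, the curve says nothing when the guest is not evicting, which matches the slide's note that the estimator only works when WSS exceeds the current memory size.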