Transcript Slide 1

Geiger: Monitoring the Buffer Cache in a
Virtual Machine Environment
Stephen T. Jones
Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
Department of Computer Sciences
1
Buffer Cache
• In modern OSes, the file system buffer cache and the
virtual memory system are unified
– When a file is first accessed, its data is buffered in a
memory page
– Under memory pressure, a page will be
evicted
•If the page is dirty, it is first written to swap space or the file
system
•Then the page can be reused
•Later, if the data is needed, a page fault occurs
– Allocate a free page, reload the data from disk into the page
2
Useful Information About Buffer
Cache
• If the VMM knows about eviction/promotion events
– Tell whether the guest OS is thrashing and how much
more memory is needed to prevent it
– Guide eviction-based cache placement
•Exclusive cache: on a hit, the data item is removed
•A transparent secondary cache may be desirable
– E.g. a 32-bit OS running on a host with 16 GB of memory
•Why does an exclusive cache work?
– Normally, once a page is read from disk, the OS will not
read it again without evicting it first
– Increases cache utilization
3
Services in a VMM
• The VMM layer is an attractive development target
– Security (isolation from OS and apps)
– Portability (transparent to OS)
• Our target services
– VMM-driven eviction-based cache placement
•Increase hit-ratio for remote storage caches
•Transparent to guest OS
– Working set size estimation for thrashing VMs
•Complement ESX server technique
4
VMM Services Need Information
• Information about guest operating systems
• For our target services
– Information about OS buffer cache
• Hidden from the VMM
– Layered design approach
– Narrow interface (virtual architecture)
5
Geiger Monitors Buffer Cache
• Virtual machine monitor extension
• Implicitly observes buffer cache events
– Uses only information intrinsically available to VMM
– An explicit approach is possible, but has drawbacks
• No guest OS modifications required
• Applicable to closed and legacy OS
• Accurate (usually less than 5% error)
• Low cost (usually less than 3% overhead)
• Enables service implementation in VMM
6
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
7
Buffer Cache Events
• Cache promotion
– Disk block inserted into buffer cache
• Cache demotion
– Disk block removed from cache
8
Detecting Promotion
• Block read
• Block write
• Disk reads and writes visible to the VMM
• Associated Disk Location (ADL)
[Figure: user process, buffer cache, and disk, with blocks A, B, C and an ADL label]
9
Detecting Demotion
• Detect when a page is removed from the cache
• VMM cannot observe page free directly
• Instead, look for page reuse
• If cache page data is reused, the page was
logically freed in the interim
• Reuse inconsistent with ADL -> eviction
[Figure: buffer cache and disk, with blocks A, B, C and an ADL label]
10
Read / Write Evictions
– Read eviction
•A non-free page is reused for reading from a
different disk location
•E.g. when reading a large file or memory region
– Write eviction
•A non-free page is reused for writing; the reuse
(eviction) is detected only when the page is written back
•Detection therefore lags the actual eviction
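The promotion and read/write eviction heuristics can be sketched as a small tracker. This is an illustrative sketch under stated assumptions, not Geiger's implementation: the class `AdlTracker`, the method `on_disk_io`, and the event tuples are hypothetical names, and pages and disk blocks are modeled as plain integers.

```python
class AdlTracker:
    """Tracks the associated disk location (ADL) of each guest memory page.

    Hypothetical sketch: names and event format are assumptions.
    """

    def __init__(self):
        self.adl = {}      # page number -> disk block last transferred to/from it
        self.events = []   # ("promote" | "evict", page, block), in observed order

    def on_disk_io(self, page, block):
        """Call for every disk read or write the VMM observes for a page."""
        old = self.adl.get(page)
        if old is not None and old != block:
            # The page is being used for a different disk location, so its
            # previous contents must have been evicted in the interim.
            self.events.append(("evict", page, old))
        if old != block:
            # New association: the block now lives in the buffer cache.
            self.events.append(("promote", page, block))
        self.adl[page] = block
```

Because a write eviction only becomes visible at write-back, the "evict" event in this sketch is recorded at the time of the later I/O, not at the time the guest actually reused the page.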
11
Existing Techniques
• Promotion via reads and writes
• Demotion via reads and writes
• Chen et al. -- USENIX 2003
– Within OS (pseudo device driver)
• Initial basis for Geiger
12
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
13
New Geiger Techniques
• Other ways buffer cache pages are evicted
• Unified buffer cache/virtual memory system
• Non-I/O allocations cause eviction
• Two new eviction detection heuristics
– Copy-on-write
– Anonymous allocation
14
When Does Eviction Happen?
• Explicit Eviction
– Read eviction
– Write eviction
• Implicit Eviction
– A non-free page is reused without any disk write
or read
•Page allocation or Copy-on-Write
– E.g. when a process requests a new page, a non-dirty
page is allocated to it
15
Detecting Allocation Eviction
[Figure: user process, buffer cache, and disk, annotated with the steps below]
• Page not-present fault
• Page allocation (possible reuse)
• New writable mapping
• Detect eviction
• Invalidate ADL
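The last two steps of the allocation heuristic (detect eviction, invalidate ADL) can be sketched as follows. This is a minimal sketch, assuming the VMM calls a hook whenever it sees a new writable mapping established after a not-present fault; the function name `on_new_writable_mapping` and the dictionary representation of ADLs are hypothetical.

```python
def on_new_writable_mapping(adl, page):
    """Handle a new writable mapping seen after a page not-present fault.

    adl maps page number -> disk block the page was last known to cache
    (an assumed representation). If the page still has an ADL, the guest
    has reused a cache page without any disk I/O, so the old block is
    inferred to be evicted and the ADL is invalidated.

    Returns the evicted block, or None if the page had no ADL.
    """
    return adl.pop(page, None)
```

Returning the block rather than just a flag lets a caller forward the eviction to a consumer such as a secondary cache.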
16
Filesystem Issues
• Filesystem features cause false positives
• Filesystem blocks can be deleted
– Leads to dangling ADL and spurious eviction
• Journaling causes aliasing
– Same cache page written to both the journal
and filesystem locations
– Interferes with write-eviction heuristic
17
Geiger Is Filesystem Aware
• Uses static filesystem info
– Journal location and size
– Block allocation bitmaps
• Ignore writes to the journal
• Track allocation bitmap updates and
invalidate ADLs when blocks deallocated
• Significantly reduces Geiger false positives
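The two filesystem-aware rules can be sketched as below. This is an illustration only: the class `FsAwareTracker`, its methods, and the idea of passing the journal extent as a start block plus length are assumptions, and how bitmap updates are decoded into deallocated block numbers is left out.

```python
class FsAwareTracker:
    """Write-eviction tracking that ignores journal writes (hypothetical sketch)."""

    def __init__(self, journal_start, journal_blocks):
        # Static filesystem info: where the journal lives on disk.
        self.journal = range(journal_start, journal_start + journal_blocks)
        self.adl = {}         # page -> filesystem block
        self.evictions = []   # blocks inferred to have been evicted

    def on_disk_write(self, page, block):
        if block in self.journal:
            # The journal copy aliases the page's real location; ignoring it
            # avoids a spurious ADL change.
            return
        old = self.adl.get(page)
        if old is not None and old != block:
            self.evictions.append(old)
        self.adl[page] = block

    def on_block_deallocated(self, block):
        """Call when an allocation bitmap update shows the block was freed."""
        for page in [p for p, b in self.adl.items() if b == block]:
            del self.adl[page]   # drop the dangling ADL
```

With the dangling ADL removed, a later reuse of the same page for another block is not reported as an eviction.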
18
Block Liveness
• Reusing a free page is not an eviction
– Geiger infers the liveness of a page from the
liveness of its block
• A block dies when
– A file is deleted or truncated
– A process with virtual memory usage
terminates
19
Block Liveness for Files
• Observe writes to the superblock
+: They go to a special disk location
–: The OS caches them in memory and syncs to disk only
every 30 secs or more
• Pages used to cache them are marked
read-only
– Write attempts will cause page faults
– Invalidate affected ADLs
20
Block Liveness for Swap Space
• No on-disk structure to track block usage
– When a disk block is written from a different
memory page, the original block contents are considered
to be “dead”
– Maintain a reverse mapping between
blocks and ADLs
– Invalidate ADLs when blocks are overwritten
– Without overwriting, dead blocks can’t be detected
•Leads to as much as 37% false-positive evictions
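The reverse-mapping idea for swap can be sketched as follows. This is a sketch under assumptions, not the actual Geiger data structure: the class `SwapLiveness`, its fields, and the integer page and block identifiers are hypothetical.

```python
class SwapLiveness:
    """Infers dead swap blocks from overwrites (hypothetical sketch)."""

    def __init__(self):
        self.adl = {}     # page -> swap block the page is associated with
        self.owner = {}   # reverse mapping: swap block -> page

    def on_swap_write(self, page, block):
        """Record a swap write; return the page whose ADL was invalidated, if any."""
        invalidated = None
        prev_page = self.owner.get(block)
        if prev_page is not None and prev_page != page:
            # The block is overwritten from a different page, so the data it
            # held for prev_page is dead and prev_page's ADL is stale.
            del self.adl[prev_page]
            invalidated = prev_page
        old_block = self.adl.get(page)
        if old_block is not None and old_block != block:
            del self.owner[old_block]   # keep the reverse mapping consistent
        self.adl[page] = block
        self.owner[block] = page
        return invalidated
```

A block that is never overwritten keeps its stale ADL in this sketch, which mirrors the limitation noted in the last bullet.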
21
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
22
Evaluation Goals
• Measure Geiger accuracy
– Missed evictions (false negatives)
– Spurious evictions (false positives)
• Measure Geiger timeliness
– Lag between actual event and detection
23
Experimental Environment
• Xen 2.0.7 VMM [Barham et al., SOSP03]
– Extensions to observe page faults, page table
updates, and I/O requests/completions
• Linux 2.4 and 2.6 guests
• Microbenchmarks
– Isolate specific eviction types
– Read, write, COW, allocation
• Application benchmarks
– Dbench, Mogrify, TPC-W, SPC disk trace
24
Eviction Detection Accuracy
Workload      False Neg %   False Pos %
Read Evict    0.96%         0.58%
Write Evict   1.68%         0.03%
COW Evict     2.47%         1.45%
Alloc Evict   0.17%         0.17%
25
Eviction Detection Lag
~3s
26
Application Accuracy
Workload   Geiger Opt           False Neg %   False Pos %
Dbench     w/o block liveness   1.10%         30.23%
Dbench     w/ block liveness    2.30%         5.72%
Mogrify    w/o block liveness   0.05%         22.99%
Mogrify    w/ block liveness    0.65%         2.46%
TPC-W                           0.14%         3.12%
SPC Web2                        2.24%         0.32%
27
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
– Eviction-based cache placement
28
Application:
Eviction-based Cache Placement
• Disk cache utilization is critical to performance
• Storage servers have large caches
• Demand-based placement => poor utilization
• Increase cache utilization via exclusivity
• Use client cache eviction as placement hint
[Chen et al., USENIX ’03; Wong and Wilkes, USENIX ’02]
• Use VMM-based, implicit eviction information to
inform a remote storage cache
• No changes to client or OS storage interfaces
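The difference from demand-based placement can be seen in a small sketch of a second-level cache that is filled only by client evictions and gives up a block when the client reads it back. This is a minimal illustration, assuming LRU replacement in the storage cache; the class `EvictionBasedCache` and its method names are hypothetical and do not describe the storage server used in the talk.

```python
from collections import OrderedDict


class EvictionBasedCache:
    """Exclusive second-level cache filled by client evictions (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block -> True, least recently inserted first

    def on_client_eviction(self, block):
        """Place a block when the client (as inferred by the VMM) evicts it."""
        self.blocks.pop(block, None)
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # drop the oldest block

    def on_client_read(self, block):
        """Return True on a hit. The block is removed on a hit, because the
        client now caches it and keeping a second copy would waste space."""
        return self.blocks.pop(block, None) is not None
```

Since the two caches never hold the same block in this scheme, their combined effective size is close to the sum of their capacities.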
29
Cache Placement Results
13%
51%
• Geiger outperforms demand placement
• Mogrify: buffer misses too many evictions
• Mogrify: false positives are fortuitous
• Dbench: Lag causes OS to outperform Geiger
30
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
– Eviction-based cache placement
– Working set size estimator
31
LRU Miss Ratio Curve
LRU Queue: pages in LRU order
Hit Histogram: associated with each LRU position
Fault Curve: faults vs. pages
$fault_i = \sum_{j=i+1}^{\infty} hist_j$
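A fault curve can be derived from a hit histogram with a running subtraction. The sketch below assumes the usual LRU stack property: an access that hits at LRU position j would be a fault for any memory size smaller than j pages. The function name `fault_curve` and the `cold_misses` parameter (accesses that hit no position at all) are hypothetical.

```python
def fault_curve(hist, cold_misses=0):
    """Compute the number of faults for each memory size.

    hist[j - 1] is the number of hits observed at LRU position j.
    Returns a list where entry i is the fault count with i pages of memory,
    for i = 0 .. len(hist).
    """
    remaining = sum(hist) + cold_misses   # with 0 pages, every access faults
    faults = [remaining]
    for hits_at_position in hist:
        # One more page turns all hits at this position from faults into hits.
        remaining -= hits_at_position
        faults.append(remaining)
    return faults
```

Dividing each entry by the total number of accesses would give the miss ratio curve instead of raw fault counts.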
32
Application:
Working Set Size Estimator
• MemRx:
• Observe evictions/reloads
• Compute miss ratio curve
• WSS = current memory allocation + LRU estimation
• Only works when WSS > current memory size
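One way to turn observed evictions and reloads into an estimate is to keep an LRU queue of evicted blocks and record how deep each reload hits. The sketch below is a hypothetical stand-in, not MemRx itself: the class `EvictionLru`, its methods, and the rule of sizing for every observed reload are assumptions made for illustration.

```python
class EvictionLru:
    """LRU queue over evicted blocks (hypothetical sketch, not MemRx)."""

    def __init__(self):
        self.queue = []   # evicted blocks, most recently evicted first
        self.hist = []    # hist[d - 1] = reloads that hit at queue depth d

    def on_eviction(self, block):
        if block in self.queue:
            self.queue.remove(block)
        self.queue.insert(0, block)

    def on_reload(self, block):
        if block not in self.queue:
            return   # never seen evicted; extra memory would not have helped
        depth = self.queue.index(block) + 1
        while len(self.hist) < depth:
            self.hist.append(0)
        self.hist[depth - 1] += 1   # depth extra pages would have kept it cached
        self.queue.remove(block)

    def estimate_wss(self, current_pages):
        """Current allocation plus the pages needed to absorb every observed reload."""
        return current_pages + len(self.hist)
```

If no reloads are observed, the estimate equals the current allocation, which matches the caveat that the approach only works when the working set exceeds current memory.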
33
Estimation Results:
Microbenchmarks
• Virtual machine is configured with 128 MB of memory
• Each benchmark accesses a 256 MB file or memory region
• FS: file access
• VM: memory access
34
Estimation Results:
Applications
35
Summary
• System services in a VMM
• Need information about the guest OS
• Implicit information about the buffer cache
– No guest OS modification
– Accurate
– Low overhead
• Build services and optimizations in a VMM
– Eviction-based cache placement
– Working set size estimation
36
Computer Sciences Department
Advanced Systems Laboratory
http://cs.wisc.edu/adsl
37