

X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs
Lakshmi N. Bairavasundaram
Muthian Sivathanu
Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
ADvanced Systems Laboratory
Computer Sciences Department
University of Wisconsin – Madison
Introduction
Caching in modern systems
  Multiple levels
  Storage: 2-level hierarchy
Level 1: File system (FS) cache
  Software-managed
  Main memory of host/client
  LRU-like cache replacement
Level 2: RAID cache
  Firmware-managed
  Memory inside RAID system
  Usually LRU replacement
[Figure: an application on the host reads through the file system cache; below the host, the RAID holds the RAID cache.]
Introduction – contd.
LRU
  Cache placement on read
  Replace LRU block
[Figure: Read Block no. 10. The cache list (LRU to MRU) changes from 39, 23, …, 45 to 23, …, 45, 10.]
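
The LRU behaviour on this slide can be sketched in a few lines. This is a minimal illustration and not code from the talk: the `LRUCache` class and its method names are hypothetical, and the three-block capacity is chosen only so that the run mirrors the figure (39 is evicted, 10 becomes the MRU block).

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache of block numbers with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # ordered from LRU (first) to MRU (last)

    def read(self, block):
        """Return True on a hit, False on a miss; the block ends up at MRU."""
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # replace the LRU block
            self.blocks[block] = True            # cache placement on read
        return hit

    def contents(self):
        """Block numbers from LRU to MRU."""
        return list(self.blocks)


cache = LRUCache(capacity=3)
for b in (39, 23, 45):
    cache.read(b)
cache.read(10)           # miss: evicts 39, places 10 at MRU
print(cache.contents())  # [23, 45, 10]
```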
Introduction – contd.
LRU
  Cache placement on read
  Replace LRU block
2 levels of LRU
  Redundant contents
[Figure: Read Block no. 10. The FS cache and the RAID cache each place block 10 at the MRU end of their own LRU list, so both hold the same block.]
Introduction – contd.
LRU
  Cache placement on read
  Replace LRU block
2 levels of LRU
  Redundant contents
Goal: Exclusive caching
[Figure: the FS cache holds block 10; the RAID cache (LRU to MRU) holds 11, 12 and no copy of block 10.]
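
The redundancy that two stacked LRU caches produce can be reproduced with a small simulation. This is a sketch under stated assumptions: the helper `lru_read`, the cache sizes, and the access sequence are illustrative choices, not taken from the talk.

```python
from collections import OrderedDict


def lru_read(cache, capacity, block):
    """Touch `block` in an LRU-ordered OrderedDict; return True on a hit."""
    if block in cache:
        cache.move_to_end(block)
        return True
    if len(cache) >= capacity:
        cache.popitem(last=False)  # evict the LRU block
    cache[block] = True
    return False


FS_SIZE = RAID_SIZE = 3  # illustrative sizes
fs_cache, raid_cache = OrderedDict(), OrderedDict()

for block in (10, 11, 12, 10, 11, 12):
    if not lru_read(fs_cache, FS_SIZE, block):
        # An FS miss becomes a disk read, which the RAID cache also keeps.
        lru_read(raid_cache, RAID_SIZE, block)

duplicated = set(fs_cache) & set(raid_cache)
print(sorted(duplicated))  # [10, 11, 12]
```

In this run every block in the RAID cache is also in the FS cache, so the RAID cache can never serve a read that the FS cache would miss. Exclusive caching aims to fill that memory with different blocks instead.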
Improved RAID Caching
Multi-Queue (Zhou et al. 2001)
  Add frequency component to cache policy
  Not strictly exclusive!
DEMOTE (Wong and Wilkes 2002)
  Change interface to disk
  File system issues “cache place” command
  Has perfect information and hence perfectly exclusive caches
  Interface changes – difficult to deploy
Ideal RAID Cache
Exclusive caching
  File system and RAID caches should have different contents
Global LRU
  Known to work well
  RAID cache should be a victim cache
No interface changes
[Figure: a block read enters the FS cache at the MRU end; the victim block leaving the FS cache moves into the RAID cache, so the two caches together form one LRU list.]
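
The victim-cache arrangement described above can be sketched as follows. Note the assumption: this simulation has perfect knowledge of every FS eviction, which a real RAID does not have without an interface change. The class `ExclusiveHierarchy` and its return strings are hypothetical names used only for illustration.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    """FS cache backed by a RAID victim cache; together they act as one LRU list."""

    def __init__(self, fs_size, raid_size):
        self.fs_size, self.raid_size = fs_size, raid_size
        self.fs = OrderedDict()    # LRU -> MRU
        self.raid = OrderedDict()  # LRU -> MRU, holds FS victims only

    def read(self, block):
        if block in self.fs:
            self.fs.move_to_end(block)
            return "fs-hit"
        result = "raid-hit" if block in self.raid else "disk"
        self.raid.pop(block, None)  # block moves up, caches stay disjoint
        if len(self.fs) >= self.fs_size:
            victim, _ = self.fs.popitem(last=False)
            self.raid[victim] = True  # FS victim enters the RAID cache at MRU
            if len(self.raid) > self.raid_size:
                self.raid.popitem(last=False)
        self.fs[block] = True
        return result


h = ExclusiveHierarchy(fs_size=2, raid_size=2)
results = [h.read(b) for b in (10, 11, 12, 13, 10)]
print(results)       # ['disk', 'disk', 'disk', 'disk', 'raid-hit']
print(list(h.raid))  # [11, 12]
print(list(h.fs))    # [13, 10]
```

The final read of block 10 misses in the two-block FS cache but hits in the RAID cache, which is the behaviour a four-block global LRU would give.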
X-RAY
Observes disk traffic
  Reads and writes to data and metadata
Builds a model of the FS cache
  Uses semantic knowledge
  Predicts size and contents of FS cache
Identifies set of exclusive blocks
  Recent victims of the FS cache
Reads blocks from disk into cache
Result
  A nearly exclusive cache without interface changes
[Figure: the host holds the file system cache; the RAID holds X-RAY, its model of the FS cache, and the RAID cache.]
Talk Outline

Introduction

File Systems

Information and Inferences

X-RAY Cache Design

Results

Conclusion
File System Operation

Applications perform file reads and writes

File system (Unix)

Translates file accesses to disk block requests

Metadata

To maintain application data on disk and manage disk blocks

Periodically written to disk

Examples: inodes, bitmap blocks
File System Operation
Inode
  Pointers to data blocks
  File access information
  Latest access time
[Figure: a file consists of an inode and data blocks; the inode holds pointers to the data blocks.]
File System Operation


File access

Use inode to obtain pointers to disk data blocks

Read corresponding blocks from disk if they are not in FS cache

Update the access time information in inode
Metadata updates

Periodically check for “dirty” inodes and write to disk
The Problem
To observe disk traffic and infer the contents of FS cache
Why difficult?
  FS cache size changes over time
    Shares main memory with virtual memory system
The Problem
To observe disk traffic and infer the contents of FS cache
Why difficult?
  FS cache size changes over time
  Disk cannot observe all FS-level accesses
[Figure: the FS cache (in the host) and the FS cache model (in the RAID) both hold 10, 11, 12 from LRU to MRU. Block 10 is read and then block 13; only the read of block 13 reaches the RAID as a disk read. Afterwards the FS cache holds 12, 10, 13 while the model holds 11, 12, 13.]
Key observation
  We need information about accesses that hit in FS cache
  File system maintains access information in inodes
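
The divergence shown in the figure can be replayed directly. This sketch assumes a three-block FS cache and a RAID-side model that applies plain LRU to the disk reads it sees; the helper name `lru_touch` is hypothetical.

```python
from collections import OrderedDict


def lru_touch(cache, capacity, block):
    """Apply one access to an LRU-ordered OrderedDict; return True on a hit."""
    hit = block in cache
    if hit:
        cache.move_to_end(block)
    else:
        if len(cache) >= capacity:
            cache.popitem(last=False)
        cache[block] = True
    return hit


SIZE = 3
fs_cache = OrderedDict.fromkeys([10, 11, 12])  # real FS cache, LRU -> MRU
model = OrderedDict.fromkeys([10, 11, 12])     # RAID's model of it

for block in (10, 13):
    hit = lru_touch(fs_cache, SIZE, block)
    if not hit:
        lru_touch(model, SIZE, block)  # only FS misses reach the disk

print(list(fs_cache))  # [12, 10, 13]
print(list(model))     # [11, 12, 13]
```

The model never saw the hit on block 10, so it wrongly treats 10 as evicted and 11 as still cached. This is the gap that the inode access times are used to close.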
Information
Obtain information from observing disk traffic
Knowledge of file system structures and operations
  → File system maintains time of last access in inodes
  → Periodic inode writes
  → Assuming whole file access, all blocks are in FS cache
Assume file system cache policy is LRU
Inferences
Read for data block
  → Block will be placed in file system cache (MRU block)
Read for previously read data block
  → Block became victim in file system cache
  → Blocks with an earlier access time should also be victims
Inode write: new access time, no disk read observed
  → All blocks belonging to file are in FS cache
  → Other blocks with later access time should also be present
Design
Recency list (R-list)
  → List of data blocks ordered by access time
Cache Begin (CB) pointer
  → Divides R-list into inclusive and exclusive regions
RAID Cache contents
  → Subset of blocks in exclusive region
[Figure: R-list from LRU to MRU, each entry a block number and its access time: A,1  B,1  C,2  D,3 | CB | E,3  F,5. Exclusive region (LRU side of CB): blocks the RAID should cache. Inclusive region (MRU side of CB): blocks expected to be in FS cache.]
Disk Read
Read Block ‘D’ ; time = 6
[Figure: R-list from LRU to MRU; CB separates the exclusive region (LRU side) from the inclusive region (MRU side).
  Before the read:  A,1  B,1  C,2 | CB | D,3  E,3  F,4
  CB advances:      A,1  B,1  C,2  D,3  E,3 | CB | F,4
  D moves to MRU:   A,1  B,1  C,2  E,3  F,4  D,6]
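
The disk-read update can be sketched as a small function over the R-list. This is an illustration, not the authors' implementation. The function name `on_disk_read`, the list-of-pairs representation, and the use of an integer `cb` (the number of entries in the exclusive region) are assumptions. Ties are handled as the figure suggests: E, with the same access time as D, is also treated as a victim. The sketch further assumes F stays in the inclusive region and that `now` is later than every access time already in the list.

```python
from bisect import bisect_right


def on_disk_read(rlist, cb, block, now):
    """Update the R-list after observing a disk read of `block` at time `now`.

    rlist: list of (block, access_time) pairs ordered LRU -> MRU.
    cb:    number of entries in the exclusive region (position of CB).
    Returns the new (rlist, cb).
    """
    rlist = list(rlist)
    index = next((i for i, (b, _) in enumerate(rlist) if b == block), None)
    if index is not None:
        old_time = rlist[index][1]
        times = [t for _, t in rlist]
        # The block had to be re-read, so it was a victim; entries with an
        # access time no later than its old one are treated as victims too.
        cb = max(cb, bisect_right(times, old_time))
        del rlist[index]
        cb -= 1  # the removed entry was inside the exclusive region
    rlist.append((block, now))  # the block is now the MRU entry
    return rlist, cb


rlist = [("A", 1), ("B", 1), ("C", 2), ("D", 3), ("E", 3), ("F", 4)]
rlist, cb = on_disk_read(rlist, cb=3, block="D", now=6)
print([b for b, _ in rlist[:cb]])  # exclusive: ['A', 'B', 'C', 'E']
print([b for b, _ in rlist[cb:]])  # inclusive: ['F', 'D']
```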
Inode Write – Access time change
Inode “23” : access time = 6
Semantic knowledge
  Inode “23” == blocks D & E
  Blocks D, E : access time = 6
[Figure: R-list from LRU to MRU, divided by CB into exclusive and inclusive regions. Before the update: A,1  B,1  C,2  D,3  E,4  F,5  G,7. After the update, D and E carry access time 6 and sit between F and G: A,1  B,1  C,2  F,5  D,6  E,6  G,7.]
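
The inode-write update can be sketched in the same style. Several points here are assumptions rather than details from the slides: the function name `on_inode_write`, the integer `cb` (number of entries in the exclusive region), the starting CB position in the example, and the rule that CB moves back to the re-timed blocks so that they and all later entries fall in the inclusive region. The caller is assumed to have verified that no disk read of the file's blocks was observed, and whole-file access is assumed.

```python
from bisect import bisect_right


def on_inode_write(rlist, cb, file_blocks, atime):
    """Apply an inode write whose access time advanced to `atime`.

    rlist: list of (block, access_time) pairs ordered LRU -> MRU.
    cb:    number of entries in the exclusive region (position of CB).
    Returns the new (rlist, cb).
    """
    members = set(file_blocks)
    kept, new_cb = [], 0
    for i, (b, t) in enumerate(rlist):
        if b in members:
            continue  # re-inserted below with the new access time
        kept.append((b, t))
        if i < cb:
            new_cb += 1  # entry stays in the exclusive region
    pos = bisect_right([t for _, t in kept], atime)
    moved = [(b, atime) for b in file_blocks]
    rlist = kept[:pos] + moved + kept[pos:]
    # The file was accessed without a disk read, so its blocks and every
    # entry with a later access time are taken to be in the FS cache.
    return rlist, min(new_cb, pos)


rlist = [("A", 1), ("B", 1), ("C", 2), ("D", 3), ("E", 4), ("F", 5), ("G", 7)]
rlist, cb = on_inode_write(rlist, cb=6, file_blocks=["D", "E"], atime=6)
print([b for b, _ in rlist[:cb]])  # exclusive: ['A', 'B', 'C', 'F']
print([b for b, _ in rlist[cb:]])  # inclusive: ['D', 'E', 'G']
```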
X-RAY Cache
RAID Cache (size = 2 blocks)
Keep track of additions to window in exclusive region
Read newly-added blocks from disk
Replace blocks no longer in the window
Additional disk bandwidth
  Idle time, extra internal bandwidth, freeblock scheduling
[Figure: R-list from LRU to MRU: A,1  B,1  C,2  F,5  D,6  E,6  G,7, divided by CB into exclusive and inclusive regions; the RAID cache (size = 2 blocks) holds a window of blocks in the exclusive region.]
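
The window bookkeeping can be sketched as a planning step that compares the window against what the RAID cache currently holds. The assumptions are as follows: the window is taken to be the `raid_size` exclusive entries closest to CB (the most recent FS victims), CB is taken to sit after F in the example, and the previous cache contents B and C are invented for illustration. The function name `plan_raid_cache` is hypothetical.

```python
def plan_raid_cache(rlist, cb, raid_size, cached):
    """Return (to_fetch, to_evict) so the RAID cache matches the window.

    rlist:  list of (block, access_time) pairs ordered LRU -> MRU.
    cb:     number of entries in the exclusive region (position of CB).
    cached: blocks currently held in the RAID cache.
    """
    start = max(0, cb - raid_size)
    window = [b for b, _ in rlist[start:cb]]  # most recent FS victims
    cached = set(cached)
    to_fetch = [b for b in window if b not in cached]  # read during idle time
    to_evict = sorted(cached - set(window))            # no longer in the window
    return to_fetch, to_evict


rlist = [("A", 1), ("B", 1), ("C", 2), ("F", 5), ("D", 6), ("E", 6), ("G", 7)]
to_fetch, to_evict = plan_raid_cache(rlist, cb=4, raid_size=2, cached={"B", "C"})
print(to_fetch)  # ['F']
print(to_evict)  # ['B']
```

Fetching the newly added blocks costs extra disk reads, which is why the slide lists idle time, extra internal bandwidth, and freeblock scheduling as sources of bandwidth.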
Talk Outline
Introduction
File Systems
Information and Inferences
X-RAY Cache Design
Results
  Tracking FS Cache Contents
  RAID Cache Performance
Conclusion
Results – Tracking
Accurate size and content prediction
Highly responsive to FS cache size changes
Tolerates changes in inode write interval
Partial file reads
  X-RAY performs well if fewer than 40% of files are partially accessed (typical traces have fewer than 30%)
Results – Cache Performance
Performs better than LRU and Multi-Queue
Close to DEMOTE, in spite of imperfect information
Hit rate advantage translates to lower read latency
Additional Results
File system cache policy is not LRU
  Clock, 2Q
  X-RAY performs nearly as well as before
  It performs better than both LRU and Multi-Queue
Idle time requirements
  X-RAY reads blocks into cache only during idle time
  It performs well if the available idle time is greater than one-third of the idle time actually observed in the trace
More in the paper …
Conclusion
Easy deployment is an important goal in developing technology
Higher-level systems maintain various pieces of information about data they manage
Provide low-level systems with basic semantic knowledge
Semantic intelligence for managing RAID caches
Avoid interface changes – use non-invasive mechanisms
Use access information in metadata to track file system cache contents and cache exclusive blocks
In spite of imperfect information, X-RAY performs nearly as well as changing the interface
Semantically-smart Disk Systems

Availability, security and performance improvements
Questions ?
ADvanced Systems Laboratory (ADSL)
Computer Sciences, University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl