Transcript ppt
Hash in a Flash:
Hash Tables for Solid State Devices
Tyler Clemons*
Charu Aggarwal†
S M Faisal*
Shirish Tatikonda ‡
Srinivasan Parthasarathy*
*The Ohio State University, Columbus, Ohio
‡IBM Almaden Research Center, San Jose, California
†IBM T.J. Watson Research Center, Yorktown Heights, New York
Motivation and Introduction
Data is growing at a fast pace
Scientific data, Twitter, Facebook, Wikipedia, the WWW
Traditional Data Mining and IR algorithms require random out-of-core data access
Often the data is too large to fit in memory, so frequent random disk access is expected
Motivation and Introduction (2)
Traditional Hard Disk Drives can keep pace with storage requirements but NOT with random-access workloads
Moving parts are a physical limitation
They also contribute to rising energy consumption
Flash Devices have emerged as an alternative
Lack moving parts
Faster random access
Lower energy usage
But they have several drawbacks…
Flash Devices
Limited Lifetime
Supports a limited number of rewrites, also known as erasures or cleans
Erasures impact response time
Erasures are incurred at the block level
Blocks consist of pages; pages (4KB-8KB) are the smallest I/O unit
Poor Random Write Performance
Incurs many erasures and lowers lifetime
Efficient Sequential Write Performance
Lowers erasures and increases lifetime
On Flash Devices, DM, and IR
Flash Devices provide fast random read access
Common for many IR and DM algorithms and data structures
Hash Tables are common in both DM and IR
Useful for associating keys and values
Counting Hash Tables associate keys with a frequency
This is found in many algorithms that track word frequency
We will examine one such algorithm common in both DM and IR (TF-IDF)
Hash tables exhibit random access for writes and reads
Random writes are an issue for Flash Devices
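To make the data structure concrete, here is a minimal in-memory counting hash table sketched in Python. This is an illustrative sketch only (the class name, bucket count, and CRC32 hash are our own choices, not the paper's); the on-flash variants come later in the talk.

    import zlib

    class CountingHashTable:
        """Minimal counting hash table: maps each key to a frequency."""

        def __init__(self, num_buckets=1024):
            self.num_buckets = num_buckets
            self.buckets = [dict() for _ in range(num_buckets)]

        def _bucket(self, key):
            # Stable hash so bucket placement is reproducible across runs.
            return self.buckets[zlib.crc32(key.encode()) % self.num_buckets]

        def increment(self, key, delta=1):
            b = self._bucket(key)
            b[key] = b.get(key, 0) + delta  # a small random-access write

        def count(self, key):
            return self._bucket(key).get(key, 0)

Each increment is a small random-access write; performing these directly against a table stored on flash is what incurs erasures, which motivates the buffering designs that follow.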
Hash Tables for Flash Devices must:
Reduce erasures/cleans and reduce random writes to the SSD
Batch updates
Maintain reasonable query times
The data structure must not incur unreasonable disk overhead
Nor should it require unreasonable memory constraints
Our approach
Our approach makes two key contributions:
Optimize our designs for a counting hash table.
This has not been done by previous approaches (A. Anand '10; D. Andersen '09; B. Debnath '10; D. Zeinalipour-Yazti '05)
The Primary Hash Table resides on the Flash Device.
Many designs use the SSD as a cache for the HDD (D. Andersen '09; B. Debnath '10)
We anticipate data sets with high random access and throughput requirements
Hash Tables for Flash Devices must:
Reduce erasures/cleans and reduce random writes to the SSD
Batch updates
Create an in-memory structure
Target semi-random or block-level updates
Maintain reasonable query times
The data structure must not incur unreasonable disk overhead
Carefully index keys on disk
Nor should it require unreasonable memory constraints
The memory requirement is at most a fixed parameter
Memory Bounded (MB) Buffering
[Figure: example updates such as (64, 2) and (12, 7) hashed from the RAM buffer into disk buckets]
Updates are quickly combined in memory
Updates are hashed into a bucket in RAM
When full, batch updates to the corresponding disk buckets
If disk buckets are full, invoke the overflow region
Memory Bounded (MB) Buffering
Two-way hash
On-disk closed hash table
Hash at page level
Update via block level
Linear probing for collisions
In-memory open hash table
Hash at block level
Combine updates
Flush with merge() operation
Overflow segment
Holds closed hash table excess
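A sketch of the MB design just described, in Python, under a deliberately simplified model: a dict per block stands in for the on-disk closed hash table (the real design uses linear probing, not modeled here), and BLOCKS and MEM_CAPACITY are illustrative parameters, not the paper's.

    import zlib

    BLOCKS = 256                            # illustrative number of on-disk blocks
    disk = [dict() for _ in range(BLOCKS)]  # stand-in for the on-disk closed hash table
    pending = {}                            # in-memory open hash: key -> combined delta
    MEM_CAPACITY = 10000                    # fixed memory budget (keys buffered in RAM)

    def update(key, delta=1):
        # Combine repeated updates to the same key in RAM.
        pending[key] = pending.get(key, 0) + delta
        if len(pending) >= MEM_CAPACITY:
            flush()

    def flush():
        # Group pending updates by target block so each block is written at
        # most once per flush: batched, semi-random block updates instead of
        # one random write per update.
        by_block = {}
        for key, delta in pending.items():
            blk = zlib.crc32(key.encode()) % BLOCKS
            by_block.setdefault(blk, {})[key] = delta
        for blk, updates in by_block.items():
            for key, delta in updates.items():
                # A full disk bucket would spill to the overflow segment (not modeled).
                disk[blk][key] = disk[blk].get(key, 0) + delta
        pending.clear()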
Can we improve MB?
Reduces the number of write operations to the flash device
Query times are reasonable
Batch updates occur only when the memory buffer is full
Updates are semi-random
(Key, value) changes are maintained in memory
Memory buffer search is fast
Relatively fast SSD random access and linear probing (see paper)
Prefetch pages
But MB has disadvantages
Sequential page-level operations are preferred
Fewer block updates
Limited by the amount of available memory
Think large disk-resident datasets
Updates may be numerous
Introduce an On Disk Buffer
Batch updates from memory to disk are page-level
Reduces expensive block-level writes (time and cleans)
Increases sequential writes
Increases buffering capability
Reduces expensive non-semi-random block updates
May decrease cleans
Search space increases during queries
Incurred only if inserting and reading concurrently
However, less erasure time will decrease latency
On Disk Buffering
Change Segment (CS)
stage() operation
Flushes memory to the CS
Fast page-level operations
Sequential log structure enables sequential writes
merge() operation
Invoked when the CS is full
Combines the CS with the Data Segment
Less frequent than stage()
What is the structure of the CS?
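A sketch of the stage()/merge() pipeline in the same simplified Python model. The list-based Change Segment and per-block dicts are illustrative stand-ins, not the paper's on-disk layout, and CS_CAPACITY is an assumed parameter.

    CS_CAPACITY = 64                               # illustrative CS size, in pages
    change_segment = []                            # sequential log of (block_id, updates) pages
    data_segment = [dict() for _ in range(256)]    # stand-in for the primary hash table

    def stage(updates_by_block):
        # Fast, page-level, sequential appends from memory into the CS.
        for block_id, updates in updates_by_block.items():
            change_segment.append((block_id, updates))
        if len(change_segment) >= CS_CAPACITY:
            merge()

    def merge():
        # Invoked only when the CS fills: combine buffered pages with the Data
        # Segment using the expensive block-level (erasure-incurring) writes.
        for block_id, updates in change_segment:
            for key, delta in updates.items():
                data_segment[block_id][key] = data_segment[block_id].get(key, 0) + delta
        change_segment.clear()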
Change Segment Structure v1
Buckets are assigned specific Change Segment Buckets.
Change Segment Buckets are shared by multiple RAM buffer buckets.
Memory-Disk Bounded Buffer (MDB)
Associate a CS block with k data blocks
Semi-random writes
Only merge() full CS blocks
Frequently updated blocks may incur numerous (k-1) merge() operations
Query times incur an additional block read
Packed with unwanted data
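A sketch of the MDB change-segment policy under the same simplified model (K and CS_BLOCK_PAGES are illustrative parameters, not the paper's).

    K = 4                  # data blocks sharing one CS block (illustrative fan-in)
    CS_BLOCK_PAGES = 16    # pages a CS block holds before it must be merged
    cs_blocks = {}         # cs_id -> list of (data_block_id, updates) pages
    data_segment = [dict() for _ in range(256)]

    def stage_mdb(data_block_id, updates):
        cs_id = data_block_id // K                  # fixed CS partitioning
        cs_blocks.setdefault(cs_id, []).append((data_block_id, updates))
        if len(cs_blocks[cs_id]) >= CS_BLOCK_PAGES:
            # Merging a full CS block may rewrite up to K data blocks even when
            # only one of them is hot -- the drawback noted on this slide.
            for blk, page in cs_blocks.pop(cs_id):
                for key, delta in page.items():
                    data_segment[blk][key] = data_segment[blk].get(key, 0) + delta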
Change Segment Structure v2
As buckets are flushed, they are written sequentially to the change segment, one page at a time
MDB-L
No partitions in the CS
Allows frequently updated blocks to have maximum space
merge() all blocks when the CS is full
Potentially expensive
Very infrequent
Queries are supported by pointers
As blocks are staged onto the CS, their pages are recorded for later retrieval
Prefetch
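A sketch of MDB-L in the same model: the CS is one unpartitioned sequential log, and per-block page pointers keep queries from scanning the whole log. Structure and function names are illustrative; a full CS would trigger a merge of all blocks (expensive but very infrequent), not shown here.

    cs_log = []          # append-only log of (data_block_id, updates) pages
    page_pointers = {}   # data_block_id -> indices of its pages in cs_log
    data_segment = [dict() for _ in range(256)]

    def stage_mdbl(data_block_id, updates):
        # Record where this block's page lands so queries can prefetch it later.
        page_pointers.setdefault(data_block_id, []).append(len(cs_log))
        cs_log.append((data_block_id, updates))   # sequential, page-level write

    def query_mdbl(data_block_id, key):
        count = data_segment[data_block_id].get(key, 0)
        # Follow the recorded pointers to read only this block's CS pages.
        for idx in page_pointers.get(data_block_id, []):
            count += cs_log[idx][1].get(key, 0)
        return count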
Expectations
MB will incur more cleans than MDB or MDBL
Frequent merge() operations will incur block erasures
MDB and MDBL will incur slightly higher query times
Due to the addition of the CS
MDB and MDBL will have superior I/O performance
Most operations are page level
Fewer erasures lower latency
Experimental Setup (Application)
TF-IDF
Term Frequency-Inverse Document Frequency
Word importance is highest for words that are infrequent across the corpus
Requires a counting hash table
Useful in many data mining and IR applications (document classification and search)
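One standard formulation of the score:

    tf-idf(t, d) = tf(t, d) × log(N / df(t))

where tf(t, d) is the frequency of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. Both tf and df are natural fits for a counting hash table keyed by term, which is why this workload exercises the designs above.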
Experimental Setup (Datasets)
100,000 random Wikipedia articles
136M keywords
9.7M entries
MemeTracker (Aug 2009 dump)
402M total entries
17M unique
Experimental Setup (Method)
1M random queries were issued during the insertion phase
10 random workloads; queries need not be in the table
Measured query performance, I/O time, and cleans
Used three SSD configurations
One Single-Level Cell (SLC) vs. two Multi-Level Cell (MLC) configurations
MLC is more popular: cheaper per GB but with a shorter lifetime
SLC has a lower internal error rate and faster response times (see paper for specific configurations)
DiskSim and the Microsoft SSD plugin
Used for benchmarking and fine-tuning our SSD
Results (AVERAGE Query Time)
By varying the in-memory buffer, as a percentage of the data segment, the average query time is reduced only by fractions of a second. This suggests the majority of the query time is incurred by the disk.
Results (AVERAGE Query Time)
By varying the on-disk buffer, as a percentage of the data segment, the average query time decreases substantially for MDBL. This reduction is seen in both datasets.
MDB requires block reads in the CS.
Results (AVERAGE Query Time)
Using the Wiki dataset, we compared SLC with MLC
We observed consistent performance
Results (AVERAGE I/O)
In this experiment, we set the in-memory buffer to 5% and the CS to 12.5% of the primary hash table size.
Simulation time is highest for MB because of the block erasures (next slide).
MDBL is faster than MDB because of the increased page-level operations.
Results (Cleans/Erasures)
Cleans are extremely low for both MDB and MDBL relative to MB.
This is caused by the page-level sequential operations.
Queries are affected by cleans because the SSD must allocate resources to cleaning and moving data.
Discussion and Conclusion
Flash Devices are gaining popularity
Low Latency, High Random Read Performance, Low Energy
Limited lifetime, poor random write performance
Hash tables are useful data structures in many data mining and IR algorithms
They exhibit random write patterns
Challenging for Flash Devices
We have demonstrated that a proper hash table for Flash Devices will have:
An in-memory buffer for batched memory-to-disk updates
An on-disk buffer with page-level operations
Future work
Our current designs rely on hash functions that use the mod operator
Extendible hashing
Checkpoint methods for crash recovery
Examine on a real SSD
DiskSim is great for fine-tuning and examining statistics
Questions?