Cache-Oblivious Priority Queue and Graph Algorithm

I/O-Algorithms
Thomas Mølhave
Spring 2012
February 9, 2012
Heavily based on slides by Lars Arge
Massive Data
• Pervasive use of computers and sensors
• Increased ability to acquire/store/process data
  → Massive data collected everywhere
• Society increasingly “data driven”
  → Access/process data anywhere any time
Nature/Science special issues
• 2/06, 9/08, 2/11
• Scientific data size growing exponentially, while quality and availability improving
• Paradigm shift: Science will be about mining data
Obviously not only in sciences:
• Economist 02/10:
– From 150 Billion Gigabytes five years ago to 1200 Billion today
– Managing data deluge difficult; doing so will transform business/public life
Example: LIDAR Terrain Data
• Massive (irregular) point sets (~1m resolution)
– Becoming relatively cheap and easy to collect
• Sub-meter resolution using mobile mapping
• ~2 million points at 30 meter (<1GB)
• ~18 billion points at 1 meter (>1TB)
Example: Detailed Data Essential
• Mandø with 2 meter sea-level rise
[Figure: 80 meter terrain model compared with 2 meter terrain model]
Random Access Machine Model
[Figure: memory labeled RAM]
• Standard theoretical model of computation:
– Infinite memory
– Uniform access cost
• Simple model crucial for success of computer industry
Hierarchical Memory
[Figure: memory hierarchy with levels L1, L2, RAM]
• Modern machines have complicated memory hierarchy
– Levels get larger and slower further away from CPU
– Data moved between levels using large blocks
Slow I/O
• Disk access is $10^6$ times slower than main memory access
[Figure: disk with track, read/write head, read/write arm, magnetic surface]
“The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)
– Disk systems try to amortize large access time transferring large contiguous blocks of data (8-16Kbytes)
• Important to store/access data to take advantage of blocks (locality)
Scalability Problems
• Most programs developed in RAM-model
– Run on large datasets because OS moves blocks as needed
• Modern OS utilizes sophisticated paging and prefetching strategies
– But if program makes scattered accesses even good OS cannot take advantage of block access
→ Scalability problems!
[Figure: running time as a function of data size]
External Memory Model
N = # of items in the problem instance
B = # of items per disk block
M = # of items that fit in main memory
T = # of items in output
I/O: Move block between memory and disk
[Figure: processor P, main memory M, and disk D, with block I/O between M and D]
Scalability Problems: Block Access Matters
• Example: Traversing linked list (List ranking)
– Array size N = 10 elements
– Disk block size B = 2 elements
– Main memory size M = 4 elements (2 blocks)
1 5 2 6 3 8 9 4 7 10
Algorithm 1: N=10 I/Os
1 2 10 9 5 6 3 4 8 7
Algorithm 2: N/B=5 I/Os
• Difference between N and N/B is large since block size is large
– Example: $N = 256 \times 10^6$, $B = 8000$, 1 ms disk access time
→ N I/Os take $256 \times 10^3$ sec = 4266 min = 71 hr
→ N/B I/Os take 256/8 sec = 32 sec
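The two I/O counts in the linked-list example can be reproduced with a small simulation. This is only a sketch under assumptions the slide does not state: each number is read as the list rank of the element stored at that array position, memory holds M/B blocks, and blocks are evicted in least-recently-used order. The name `count_ios` is hypothetical.

```python
from collections import OrderedDict

def count_ios(layout, block_size, memory_size):
    """Count block reads when visiting list ranks 1..N in order.

    layout[i] is the rank of the element stored at array position i
    (an assumed reading of the slide). Memory is an LRU cache of blocks.
    """
    position = {rank: i for i, rank in enumerate(layout)}
    capacity = memory_size // block_size      # blocks that fit in memory
    cache = OrderedDict()                     # block id -> True, in LRU order
    ios = 0
    for rank in range(1, len(layout) + 1):
        block = position[rank] // block_size
        if block in cache:
            cache.move_to_end(block)          # hit: mark as most recently used
        else:
            ios += 1                          # miss: one I/O to fetch the block
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)     # evict least recently used block
    return ios

layout_1 = [1, 5, 2, 6, 3, 8, 9, 4, 7, 10]
layout_2 = [1, 2, 10, 9, 5, 6, 3, 4, 8, 7]
print(count_ios(layout_1, block_size=2, memory_size=4))  # 10
print(count_ios(layout_2, block_size=2, memory_size=4))  # 5

# The timing example, in integer arithmetic (1 ms per I/O).
n, b, ms_per_io = 256 * 10**6, 8000, 1
print(n * ms_per_io // 1000)          # 256000 seconds for N I/Os
print((n // b) * ms_per_io // 1000)   # 32 seconds for N/B I/Os
```

With the second layout every block fetched serves two consecutive visits, which is where the factor B comes from.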
Fundamental Bounds
                Internal        External
• Scanning:     $N$             $N/B$
• Sorting:      $N \log N$      $\frac{N}{B} \log_{M/B} \frac{N}{B}$
• Searching:    $\log_2 N$      $\log_B N$
• Note:
– Linear I/O: $O(N/B)$
– B factor VERY important: $\frac{N}{B} < \frac{N}{B} \log_{M/B} \frac{N}{B} \ll N$
– Cannot sort optimally with search tree
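To get a feel for the size of these bounds, the expressions can be evaluated numerically. This is a rough sketch that drops all constant factors; the function names are hypothetical, and the memory size M used below is an assumed value (N and B are taken from the earlier timing example).

```python
import math

def scan_ios(n, b):
    """Scanning bound N/B, rounded up to whole blocks."""
    return math.ceil(n / b)

def sort_ios(n, m, b):
    """Sorting bound (N/B) * log_{M/B}(N/B), constants ignored."""
    blocks = n / b
    return blocks * math.log(blocks, m / b)

def search_ios(n, b):
    """Searching bound log_B N, constants ignored."""
    return math.log(n, b)

n, b = 256 * 10**6, 8000
m = 8 * 10**6                      # assumed memory size, not from the slides

print(scan_ios(n, b))              # 32000
print(round(sort_ios(n, m, b)))    # a small multiple of the scanning bound
print(round(search_ios(n, b), 2))  # compare with log2(N), which is about 28
```

For these parameters the sorting bound stays within a small factor of N/B and far below N, which is the inequality the note on the B factor expresses.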
External Search Trees
• Binary search tree:
– Standard method for search among N elements
– Search traces at least one root-leaf path
– If nodes stored arbitrarily on disk
→ Search in $O(\log_2 N)$ I/Os
[Figure: binary search tree of height $\Theta(\log_2 N)$]
External Search Trees
• BFS blocking:
– Block height $O(\log_2 N) / O(\log_2 B) = O(\log_B N)$
– Output elements blocked
→ Range search in $\Theta(\log_B N + T/B)$ I/Os
• Optimal: $O(N/B)$ space and $\Theta(\log_B N + T/B)$ query
[Figure: tree partitioned into blocks, each holding $\Theta(B)$ nodes in a subtree of height $\Theta(\log_2 B)$]
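The effect of BFS blocking on a root-leaf path can be checked on a complete binary tree. The sketch below assumes heap numbering (root is node 1, children of i are 2i and 2i+1) and groups every `levels_per_block` consecutive levels, so that one block holds a subtree of $2^h - 1$ nodes for $h$ levels. Both function names are hypothetical.

```python
def block_of(node, levels_per_block):
    """Identify a node's block by the heap index of the block's top node."""
    depth = node.bit_length() - 1                       # root has depth 0
    band_top = (depth // levels_per_block) * levels_per_block
    return node >> (depth - band_top)                   # ancestor at depth band_top

def blocks_on_path(node, levels_per_block):
    """Number of distinct blocks touched walking from node up to the root."""
    blocks = set()
    while node >= 1:
        blocks.add(block_of(node, levels_per_block))
        node //= 2                                      # move to the parent
    return len(blocks)

leaf = 2**19   # leftmost leaf of a 20-level tree
print(blocks_on_path(leaf, 1))   # 20: one node per block, as with arbitrary placement
print(blocks_on_path(leaf, 5))   # 4: blocks of 31 nodes, 20 / 5 levels
```

Dividing the tree height by the block height is exactly the $O(\log_2 N) / O(\log_2 B)$ computation on the slide.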
External Search Trees
• Maintaining BFS blocking during updates?
– Balance normally maintained in search trees using rotations
[Figure: rotation exchanging the positions of nodes x and y]
• Seems very difficult to maintain BFS blocking during rotation
B-trees
• BFS-blocking naturally corresponds to tree with fan-out $\Theta(B)$
• B-trees balanced by allowing node degree to vary
– Rebalancing performed by splitting and merging nodes
(a,b)-tree
• T is an (a,b)-tree (a≥2 and b≥2a-1)
– All leaves on the same level and contain between a and b elements
– Except for the root, all nodes have degree between a and b
– Root has degree between 2 and b
[Figure: example (2,4)-tree]
• (a,b)-tree uses linear space and has height $O(\log_a N)$
→ Choosing a,b = $\Theta(B)$ each node/leaf stored in one disk block
→ $O(N/B)$ space and $\Theta(\log_B N + T/B)$ query
(a,b)-Tree Insert
• Insert:
Search and insert element in leaf v
DO v has b+1 elements/children
    Split v:
        make nodes v’ and v’’ with $\lfloor (b+1)/2 \rfloor \le b$ and $\lceil (b+1)/2 \rceil \ge a$ elements
        insert element (ref) in parent(v)
        (make new root if necessary)
    v=parent(v)
[Figure: node v with b+1 elements split into v’ and v’’ with $\lfloor (b+1)/2 \rfloor$ and $\lceil (b+1)/2 \rceil$ elements]
• Insert touches $\Theta(\log_a N)$ nodes
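The insert procedure can be sketched in memory as follows. This is an illustrative sketch, not the slides' implementation: the class and method names are hypothetical, elements are assumed distinct, internal nodes keep routing keys (the smallest element of each right-hand subtree), and no disk blocks are modeled.

```python
import bisect

class Node:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf
        self.keys = keys or []          # leaf: elements; internal: routing keys
        self.children = children or []  # internal: len(children) == len(keys) + 1

class ABTree:
    def __init__(self, a, b):
        assert a >= 2 and b >= 2 * a - 1
        self.a, self.b = a, b
        self.root = Node(leaf=True)

    def _size(self, v):
        return len(v.keys) if v.leaf else len(v.children)

    def _split(self, v):
        """Split v (b+1 elements/children) in two; return (separator, new right node)."""
        if v.leaf:
            mid = len(v.keys) // 2
            right = Node(True, keys=v.keys[mid:])
            v.keys = v.keys[:mid]
            return right.keys[0], right
        mid = len(v.children) // 2
        sep = v.keys[mid - 1]           # key between the two halves moves up
        right = Node(False, keys=v.keys[mid:], children=v.children[mid:])
        v.keys, v.children = v.keys[:mid - 1], v.children[:mid]
        return sep, right

    def insert(self, x):
        # Search for the leaf, remembering the path of (parent, child index).
        path, v = [], self.root
        while not v.leaf:
            i = bisect.bisect_right(v.keys, x)
            path.append((v, i))
            v = v.children[i]
        bisect.insort(v.keys, x)
        # Split upwards while v holds b+1 elements/children.
        while self._size(v) > self.b:
            sep, right = self._split(v)
            if not path:                # v was the root: make a new root
                self.root = Node(False, keys=[sep], children=[v, right])
                return
            parent, i = path.pop()
            parent.keys.insert(i, sep)
            parent.children.insert(i + 1, right)
            v = parent

    def elements(self, v=None):
        """All stored elements in sorted order (left-to-right leaf scan)."""
        v = v or self.root
        if v.leaf:
            return list(v.keys)
        return [x for c in v.children for x in self.elements(c)]

tree = ABTree(2, 4)
for x in [5, 1, 9, 3, 7, 2, 8, 6, 4]:
    tree.insert(x)
print(tree.elements())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because b ≥ 2a-1, both halves of a split node hold at least a elements, so splits never violate the lower degree bound.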
(2,4)-Tree Insert
[Figure: (2,4)-tree insert example]
(a,b)-Tree Delete
• Delete:
Search and delete element from leaf v
DO v has a-1 elements/children
    Fuse v with sibling v’:
        move children of v’ to v
        delete element (ref) from parent(v)
        (delete root if necessary)
    If v has >b (and ≤ a+b-1<2b) children split v
    v=parent(v)
[Figure: node v with a-1 children fused with a sibling, giving a node v with ≥ 2a-1 children]
• Delete touches $O(\log_a N)$ nodes
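The degree arithmetic behind the fuse step can be checked in isolation. The sketch below tracks only child counts, not an actual tree, and the function name is hypothetical: an underfull node with a-1 children is fused with a sibling, and the result is split again if it exceeds b.

```python
def fuse_then_maybe_split(deg_v, deg_sibling, a, b):
    """Return the child counts of the node(s) that replace v and its sibling."""
    assert a >= 2 and b >= 2 * a - 1
    assert deg_v == a - 1 and a <= deg_sibling <= b
    fused = deg_v + deg_sibling          # between 2a-1 and a+b-1
    if fused <= b:
        return [fused]                   # fused node is legal as it is
    left = fused // 2                    # too large: split into two halves
    return [left, fused - left]

print(fuse_then_maybe_split(1, 2, a=2, b=4))  # [3]
print(fuse_then_maybe_split(1, 4, a=2, b=4))  # [2, 3]
```

A split is needed only when the fused node has more than b children; since it then has at least b+1 ≥ 2a and fewer than 2b children, both halves end up with between a and b children.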
(2,4)-Tree Delete
[Figure: (2,4)-tree delete example]
Summary/Conclusion: B-tree
• B-trees: (a,b)-trees with a,b = $\Theta(B)$
– $O(N/B)$ space
– $O(\log_B N)$ query
– $O(\log_B N)$ update
• B-trees with elements in the leaves sometimes called B+-tree
• Construction in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os
– Sort elements and construct leaves
– Build tree level-by-level bottom-up
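The bottom-up construction can be sketched with nested lists standing in for nodes. The assumptions here are mine: Python's in-memory `sorted` replaces the external sort, a node is simply a list of at most b items or child nodes, and nodes on each level are filled as evenly as possible so that none falls below the minimum degree. The names `even_chunks` and `build_bottom_up` are hypothetical.

```python
def even_chunks(seq, b):
    """Cut seq into ceil(len/b) consecutive chunks of nearly equal size (each <= b)."""
    count = -(-len(seq) // b)                 # ceiling division
    size, extra = divmod(len(seq), count)
    chunks, start = [], 0
    for i in range(count):
        end = start + size + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks

def build_bottom_up(items, b):
    """Return (root, number of levels) of a tree with elements in the leaves."""
    assert items and b >= 3
    level = even_chunks(sorted(items), b)     # sort elements and construct leaves
    levels = 1
    while len(level) > 1:                     # build one level of parents at a time
        level = even_chunks(level, b)
        levels += 1
    return level[0], levels

root, levels = build_bottom_up(range(1000), b=4)
print(levels)      # 5
print(len(root))   # 4 children at the root
```

Each level is produced by one pass over the level below it, so after sorting the construction itself is a sequence of scans.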
Merge Sort
• Merge sort:
– Create N/M memory sized sorted runs
– Merge runs together M/B at a time
→ $O(\log_{M/B} \frac{N}{M})$ phases using $O(N/B)$ I/Os each
• Distribution sort similar (but harder – partition elements)
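The two steps of merge sort can be simulated in memory to count merge phases. This is a sketch only: lists stand in for runs on disk, no I/O is performed, and the function name `external_merge_sort` is hypothetical. It follows the slide in merging M/B runs at a time.

```python
import heapq

def external_merge_sort(items, M, B):
    """Sort items with run formation plus (M/B)-way merging; return (result, phases)."""
    fan_in = M // B
    assert fan_in >= 2
    # Run formation: N/M sorted runs of at most M items each.
    runs = [sorted(items[i:i + M]) for i in range(0, len(items), M)]
    phases = 0
    while len(runs) > 1:
        # One phase: merge groups of fan_in runs, reading every item once.
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        phases += 1
    return (runs[0] if runs else []), phases

data = [(i * 7919) % 1000 for i in range(1000)]    # a permutation of 0..999
result, phases = external_merge_sort(data, M=16, B=4)
print(result == sorted(data))  # True
print(phases)                  # 3
```

Here N/M is about 63 runs and M/B = 4, so the run count shrinks 63 → 16 → 4 → 1, matching $\log_{M/B} \frac{N}{M}$ rounded up.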