Persistent Data Structures

Apr 17, 2013
Definitions
• An immutable data structure is one that, once created, cannot be modified
• Immutable data structures can (usually) be copied, with modifications, to create a new version
• The modified copy takes up as much memory as the original version
• A persistent data structure is one that, when modified, retains both the old and the new values
• Persistent data structures are effectively immutable, in that prior references to them do not see any change
• Modifying a persistent data structure may copy part of the original, but the new version shares memory with the original
• This definition is unrelated to persistent storage, which means keeping a copy of data on disk between program executions
Why persistent data structures?
• Functional programming is based on the idea of immutable data, or persistent data, which is effectively immutable
• The use of immutable data structures greatly simplifies concurrent programming
• Synchronization is expensive, and immutable data structures don’t need to be synchronized
• Copying large data structures is expensive and wastes space, but persistent data structures can use sophisticated structure sharing to reduce the cost
Lists
• Lists are the original persistent data structures, and are very heavily used in functional programming
[Figure: a linked list w → x → y → z, with references labeled “insert w”, “original”, and “delete x”]
As you can see, persistence is automatic with a
list, and requires no additional effort
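
The labels in the figure ("insert w", "original", "delete x") can be illustrated with a minimal sketch. It assumes the original list is x, y, z; the names `Cons` and `to_pylist` are invented here for illustration and do not come from the slides.

```python
from typing import NamedTuple, Optional

class Cons(NamedTuple):
    """One immutable list cell: a value plus a reference to the rest of the list."""
    head: str
    tail: Optional["Cons"]

def to_pylist(cell):
    """Walk the cells and collect the values, for display only."""
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out

# Assumed original list: x -> y -> z
original = Cons("x", Cons("y", Cons("z", None)))

# "insert w": one new cell whose tail is the entire original list
inserted = Cons("w", original)

# "delete x": no new cells at all, just a reference to the original's tail
deleted = original.tail

print(to_pylist(inserted))  # ['w', 'x', 'y', 'z']
print(to_pylist(original))  # ['x', 'y', 'z']
print(to_pylist(deleted))   # ['y', 'z']
```

All three lists remain usable at the same time, and none of the cells is ever copied, which is why persistence comes for free when insertions and deletions happen at the head.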
Trees and binary trees

Trees and binary trees can also be implemented in a
persistent fashion, though it takes a bit more work
[Figure: a binary tree with nodes A to N, plus new nodes A’, C’, and G’ belonging to the modified version]
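
The "bit more work" is usually path copying: only the nodes between the root and the change are copied (presumably what A’, C’, and G’ represent in the figure), and everything else is shared. The following is a sketch of that idea on a binary search tree; the `Node`, `insert`, and `keys` names and the integer keys are assumptions made for this example.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """An immutable binary search tree node."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node, key):
    """Return a new tree containing key; the input tree is left untouched."""
    if node is None:
        return Node(key)
    if key < node.key:
        # Copy this node, pointing at a new left subtree; the right one is shared
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        # Copy this node, pointing at a new right subtree; the left one is shared
        return node._replace(right=insert(node.right, key))
    return node  # key already present: share the whole subtree

def keys(node):
    """In-order list of keys, for display only."""
    return [] if node is None else keys(node.left) + [node.key] + keys(node.right)

old = None
for k in (50, 30, 70, 20, 40, 60, 80):
    old = insert(old, k)

new = insert(old, 65)  # copies only the nodes 50, 70, and 60

print(keys(old))  # [20, 30, 40, 50, 60, 70, 80]
print(keys(new))  # [20, 30, 40, 50, 60, 65, 70, 80]
print(new.left is old.left)  # True: the untouched subtree is shared
```

For a balanced tree this copies O(log n) nodes per update instead of the whole structure.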
Arrays and vectors




It’s more difficult to implement a persistent array
The programming language Clojure implements
persistent vectors, which are like arrays but can be
expanded
Any location in a vector can be accessed in (almost)
O(1) time
Vectors are represented as “fat trees,” or more precisely,
as 32-tries
Tries



A trie is like a binary search
tree, only each node may
have many children
Tries are most often used
with strings (and have up to
26 children per node, one per letter)
Each node of a 32-trie may
have 32 children
Vector implementation I

A persistent vector in Clojure is implemented as an N-level trie (N <= 7),
where the root and internal nodes are arrays of 32 references, and the
leaves are arrays of 32 values

The depth of the trie (1 to 7) is also kept as an instance value

For example, consider accessing location 5000 in a vector

5000 decimal is 1001110001000 binary
To access element 5000 in a trie of depth 4, pad the binary number to 20 bits
and split it into four 5-bit groups: 00000 00100 11100 01000

Group 4 (00000) says to take reference 0 (counting from 0)
Group 3 (00100) says to take reference 4
Group 2 (11100) says to take reference 28
Group 1 (01000) says to take value 8
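
The grouping above amounts to shifting and masking the index five bits at a time. A minimal sketch follows; the names `BITS`, `MASK`, and `trie_path` are made up for this example and are not Clojure's own.

```python
BITS = 5                 # 2**5 == 32 children per node
MASK = (1 << BITS) - 1   # 0b11111

def trie_path(index, depth):
    """Return the child index to follow at each level, root first."""
    return [(index >> (BITS * level)) & MASK
            for level in range(depth - 1, -1, -1)]

print(trie_path(5000, 4))  # [0, 4, 28, 8]
```

Counting from zero, group 3 selects index 4 (the fifth child when counting from one), and the path checks out: 4 * 1024 + 28 * 32 + 8 = 5000.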
Vector implementation II

The trie can be treated as a “fat tree,” with the structure
sharing discussed earlier




Because the trie is fat (many children per node), there is a
high proportion of actual data to structure
Access time is “almost” O(1): a lookup follows one reference per level, so as
the size increases the cost grows from 1 to 7 steps (the depth of the trie)
This design is especially good for appending vectors
For adding single elements to the end of the vector,
there are additional special-case optimizations
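
To make the structure sharing concrete, here is a rough sketch of lookup and update on a 32-way trie built from Python tuples. It is not Clojure's actual implementation (which includes the special-case optimizations mentioned above); `build`, `get`, and `assoc` are names chosen for this illustration, and the depth is passed explicitly.

```python
BITS = 5
WIDTH = 1 << BITS   # 32 children per node
MASK = WIDTH - 1

def build(values, depth):
    """Build a trie of the given depth from a list of values (tuples as nodes)."""
    if depth == 1:
        return tuple(values)
    chunk = WIDTH ** (depth - 1)   # number of values under each child
    return tuple(build(values[i:i + chunk], depth - 1)
                 for i in range(0, len(values), chunk))

def get(node, depth, index):
    """Follow one reference per level, then read the value from the leaf."""
    for level in range(depth - 1, 0, -1):
        node = node[(index >> (BITS * level)) & MASK]
    return node[index & MASK]

def assoc(node, depth, index, value):
    """Return a new trie with index set to value, copying one node per level."""
    slot = (index >> (BITS * (depth - 1))) & MASK
    new_child = value if depth == 1 else assoc(node[slot], depth - 1, index, value)
    return node[:slot] + (new_child,) + node[slot + 1:]

old = build(list(range(8192)), 3)       # depth 3 holds up to 32**3 values
new = assoc(old, 3, 5000, "changed")

print(get(old, 3, 5000))   # 5000
print(get(new, 3, 5000))   # changed
print(new[0] is old[0])    # True: untouched subtrees are shared
```

An update copies at most one 32-slot node per level, so even a large vector shares almost all of its memory with the previous version.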
Persistent Hash Map

Since (in Java and Clojure) a hash code is a 32-bit integer, a hash map could
be implemented just like a vector
  For a vector, the additional space required for the trie structure is a reasonable
  proportion of the total space
  For a hash map, the additional space required is not reasonable
    There will be a large number of 32-element arrays which contain mostly nulls

The hard part is to use only as much space as needed
Basic approach:
  Use arrays of size N <= 32, where N is the number of non-null children
  Use a 32-bit word to indicate which children are actually present
    For example: 00010000000100010000000000101000 indicates 5 children
  Find a fast function to map numbers in the range [0, 31] into the range [0, N)
    Many processors have an instruction to count the number of 1 bits in a word

This would make a good assignment for the next time I teach this
course
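
The slides leave the mapping function as an exercise. One common choice, assumed here rather than taken from the slides, is to count the 1 bits below the slot in question: that count is the child's position in the packed array. The helper names `popcount` and `compact_index` are hypothetical.

```python
def popcount(word):
    """Count the 1 bits in a non-negative integer (a software stand-in for
    the single processor instruction mentioned on the slide)."""
    return bin(word).count("1")

def compact_index(bitmap, slot):
    """Map a logical slot in [0, 31] to a position in the packed child array,
    or return None if that slot has no child."""
    bit = 1 << slot
    if not bitmap & bit:
        return None
    # Children stored before this one are exactly the set bits below `bit`
    return popcount(bitmap & (bit - 1))

bitmap = 0b00010000000100010000000000101000   # the example word from the slide

print(popcount(bitmap))            # 5
print(compact_index(bitmap, 3))    # 0
print(compact_index(bitmap, 16))   # 2
print(compact_index(bitmap, 4))    # None
```

In this word the occupied slots are 3, 5, 16, 20, and 28 (counting bits from the right, starting at 0), so the packed array needs only 5 entries instead of 32.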
The End
Now this is not the end. It is not even the beginning of
the end. But it is, perhaps, the end of the beginning.
--Sir Winston Churchill, Speech in November 1942