Persistent Data Structures
Apr 17, 2013
Definitions
An immutable data structure is one that, once created, cannot be
modified
Immutable data structures can (usually) be copied, with modifications, to
create a new version
The modified version takes up as much memory as the original version
A persistent data structure is one that, when modified, retains
both the old and the new values
Persistent data structures are effectively immutable, in that prior references
to them do not see any change
Modifying a persistent data structure may copy part of the original, but the
new version shares memory with the original
This definition is unrelated to persistent storage, which means
keeping a copy of data on disk between program executions
Why persistent data structures?
Functional programming is based on the idea of immutable
data—or persistent data, which is effectively immutable
The use of immutable data structures greatly simplifies
concurrent programming
Synchronization is expensive, and immutable data structures
don’t need to be synchronized
Copying large data structures is expensive and wastes space, but
persistent data structures can use sophisticated structure sharing
to reduce the cost
Lists
Lists are the original persistent data structures, and are
very heavily used in functional programming
[Figure: a linked list of cells w, x, y, z with three references into it, labeled "insert w", "original", and "delete x"]
As you can see, persistence is automatic with a
list, and requires no additional effort
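The sharing described above can be sketched in a few lines of Python. This is only an illustration: the `Cons` class and the helper names are hypothetical, and the example assumes the original list is x, y, z, with "insert w" adding at the front and "delete x" removing from the front.

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    """One immutable list cell: a value plus a reference to the rest of the list."""
    head: str
    tail: Optional["Cons"]


def from_items(*items: str) -> Optional[Cons]:
    """Build a list back to front so that each cell points at the next one."""
    lst = None
    for item in reversed(items):
        lst = Cons(item, lst)
    return lst


def to_pylist(lst: Optional[Cons]) -> list:
    """Walk the cells and collect the values, for easy inspection."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


original = from_items("x", "y", "z")

# "Insert w" at the front: one new cell whose tail is the whole original list.
with_w = Cons("w", original)

# "Delete x" from the front: no new cells at all, just a reference to the tail.
without_x = original.tail
```

All three versions remain usable at the same time, and none of the existing cells was copied or changed.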
Trees and binary trees
Trees and binary trees can also be implemented in a
persistent fashion, though it takes a bit more work
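[Figure: a binary tree with nodes A through N, and a modified version whose new nodes A', C', and G' share all other nodes with the original]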
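The "bit more work" is usually path copying: an update copies only the nodes between the root and the change, and every other subtree is shared. The following is a minimal sketch using a binary search tree of integers; the `Node` class and `insert` function are hypothetical names chosen for this example, not taken from any library.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    """Return a new tree containing key; the input tree is left untouched."""
    if node is None:
        return Node(key)
    if key < node.key:
        # Copy this node with a new left child; the right subtree is shared.
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        # Copy this node with a new right child; the left subtree is shared.
        return node._replace(right=insert(node.right, key))
    return node  # key already present here: reuse this subtree as-is


old = None
for k in (50, 30, 70, 20, 40, 60, 80):
    old = insert(old, k)

# Inserting 65 copies the nodes 50, 70, and 60; everything else is shared.
new = insert(old, 65)
```

Here `new.left` is the very same object as `old.left`, while `old` still has no node with key 65.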
Arrays and vectors
It’s more difficult to implement a persistent array
The programming language Clojure implements
persistent vectors, which are like arrays but can be
expanded
Any location in a vector can be accessed in (almost)
O(1) time
Vectors are represented as “fat trees,” or more precisely,
as 32-tries
Tries
A trie is like a binary search
tree, only each node may
have many children
Tries are most often used
with strings (and have up to
26 children per node)
Each node of a 32-trie may
have 32 children
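As a rough illustration of the string case, here is a small trie keyed by letters. It is a plain mutable sketch meant only to show the branching; the class and function names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # letter -> TrieNode (at most 26 for lowercase words)
        self.is_word = False    # True if a stored word ends at this node


def add(root, word):
    """Walk down one node per letter, creating nodes that do not exist yet."""
    node = root
    for letter in word:
        node = node.children.setdefault(letter, TrieNode())
    node.is_word = True


def contains(root, word):
    """Follow the letters of word; it is present only if the walk ends on a word node."""
    node = root
    for letter in word:
        node = node.children.get(letter)
        if node is None:
            return False
    return node.is_word


root = TrieNode()
for w in ("car", "cart", "cat"):
    add(root, w)
```

A 32-trie has the same shape, except that each step consumes 5 bits of an integer key instead of one letter of a string.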
Vector implementation I
A persistent vector in Clojure is implemented as an N-level trie (N <= 7),
where the root and internal nodes are arrays of 32 references, and the
leaves are arrays of 32 values
The depth of the trie (1 to 7) is also kept as an instance value
For example, consider accessing location 5000 in a vector
5000 decimal is 1001110001000 binary; padded to 20 bits and split into
5-bit groups, that is 00000 00100 11100 01000
To access element 5000 in a trie of depth 4 (counting positions from zero):
Group 4 (00000) says to take reference 0
Group 3 (00100) says to take reference 4
Group 2 (11100) says to take reference 28
Group 1 (01000) says to take value 8
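The lookup can be sketched with shifts and masks. This is an illustrative sketch, not Clojure's actual code: nodes are plain Python lists, positions count from zero, and the names `path` and `get` are made up for the example. The small trie built at the end is sparse (mostly `None`) so that only the path to index 5000 exists.

```python
BITS = 5                 # 2**5 == 32 children per node
WIDTH = 1 << BITS        # 32
MASK = WIDTH - 1         # 0b11111


def path(index, depth):
    """Return the child positions to follow, from the root down to the leaf slot."""
    return [(index >> (BITS * level)) & MASK for level in range(depth - 1, -1, -1)]


def get(root, depth, index):
    """Follow one 5-bit group per level; the last group selects the value in a leaf."""
    node = root
    for position in path(index, depth):
        node = node[position]
    return node


# Sparse depth-4 trie that contains only the path to index 5000.
leaf = [None] * WIDTH
leaf[8] = "value at 5000"
level2 = [None] * WIDTH
level2[28] = leaf
level3 = [None] * WIDTH
level3[4] = level2
root = [None] * WIDTH
root[0] = level3

print(path(5000, 4))       # [0, 4, 28, 8]
print(get(root, 4, 5000))  # value at 5000
```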
Vector implementation II
The trie can be treated as a “fat tree,” with the structure
sharing discussed earlier
Because the trie is fat (many children per node), there is a
high proportion of actual data to structure
Access time is “almost” O(1), but as the size increases, the
constant factor grows from 1 to 7 (depth of trie)
This design is especially good for appending vectors
For adding single elements to the end of the vector,
there are additional special-case optimizations
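To make the structure sharing concrete, here is a hedged sketch of a persistent update on such a trie. Nodes are immutable tuples, and `build`, `get`, and `assoc` are hypothetical helper names for this example. An update copies one node per level and reuses every other node.

```python
BITS = 5
WIDTH = 1 << BITS
MASK = WIDTH - 1


def build(values, depth):
    """Pack a non-empty list of values into a trie of the given depth."""
    level = [tuple(values[i:i + WIDTH]) for i in range(0, len(values), WIDTH)]
    for _ in range(depth - 1):
        level = [tuple(level[i:i + WIDTH]) for i in range(0, len(level), WIDTH)]
    return level[0]


def get(node, depth, index):
    """Follow one 5-bit group of the index per level."""
    for level in range(depth - 1, -1, -1):
        node = node[(index >> (BITS * level)) & MASK]
    return node


def assoc(node, depth, index, value):
    """Return a new trie with index set to value, copying only the nodes on the path."""
    position = (index >> (BITS * (depth - 1))) & MASK
    if depth == 1:
        child = value
    else:
        child = assoc(node[position], depth - 1, index, value)
    return node[:position] + (child,) + node[position + 1:]


old = build(list(range(1024)), depth=2)   # 32 leaves of 32 values each
new = assoc(old, 2, 700, "changed")       # 700 == 21 * 32 + 28
```

After the update, 31 of the 32 leaves are the same objects in both versions; only the root and leaf 21 were copied.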
Persistent Hash Map
Since (in Java and Clojure) a hash code is a 32-bit integer, a hash map could
be implemented just like a vector
For a vector, the additional space required for the trie structure is a reasonable
proportion of the total space
For a hash map, the additional space required is not reasonable
There will be a large number of 32-element arrays which contain mostly nulls
The hard part is to use only as much space as needed
Basic approach:
Use arrays of size N <= 32, where N is the number of non-null children
Use a 32-bit word to indicate which children are actually present
For example: 00010000000100010000000000101000 indicates 5 children
Find a fast function to map numbers in the range [0, 31] into the range [0, N)
Many processors have an instruction to count the number of 1 bits in a word
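One way to build that mapping from a bit count is sketched below. It assumes the rightmost bit of the word stands for child position 0, which the slide does not specify, and the names `popcount` and `slot` are invented for the example. The index of a present child in the compact array is the number of 1 bits below its own bit.

```python
def popcount(word):
    """Count the 1 bits in a non-negative integer."""
    return bin(word).count("1")


def slot(bitmap, position):
    """Map a child position in [0, 31] to an index in the compact array, or None if absent."""
    bit = 1 << position
    if (bitmap & bit) == 0:
        return None
    # The index is the number of present children at lower positions.
    return popcount(bitmap & (bit - 1))


bitmap = 0b00010000000100010000000000101000   # children present at 3, 5, 16, 20, 28

print(popcount(bitmap))   # 5
print(slot(bitmap, 16))   # 2
print(slot(bitmap, 4))    # None
```

In Java the bit count is available as `Integer.bitCount`; the string-based count here is only for portability across Python versions.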
This would make a good assignment for the next time I teach this
course
The End
Now this is not the end. It is not even the beginning of
the end. But it is, perhaps, the end of the beginning.
--Sir Winston Churchill, Speech in November 1942