Stupid Columnsort Tricks Geeta Chaudhry Tom Cormen Dartmouth College


Stupid Columnsort Tricks
Geeta Chaudhry
Tom Cormen
Dartmouth College
Department of Computer Science
What Do We Know About Columnsort?
• Sorts N values on an r × s mesh
• Uses 8 steps
– Each step either sorts each column or performs a
fixed permutation
• Divisibility restriction: s divides r
• Height restriction: r ≥ 2s^2, relaxed to r ≥ 4s^(3/2)
– Exponent of s goes from 2 to 3/2
– Mesh need not be quite so tall and skinny
– Cost: 2 additional steps
– Can simultaneously remove the divisibility
restriction and relax the height restriction to
r ≥ 6s^(3/2)
Why Relax the Conditions?
• Columnsort applies in more circumstances
• Our motivation: out-of-core sorting
• Column height r is limited by amount of
memory
– Either per processor or in entire system
– N = rs, r ≥ 2s^2 ==> N ≤ r^(3/2) / 2^(1/2)
– N = rs, r ≥ 4s^(3/2) ==> N ≤ r^(5/3) / 4^(2/3)
– Reducing the exponent of s in the bound for r
allows us to sort more values with a given
amount of memory
• A similar technique works for applying
columnsort to in-core sorting
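To make the two bounds concrete, here is a minimal sketch that finds the largest s allowed by each height restriction for a given column height r, and hence the largest N = rs. The function names and the value r = 2^20 are illustrative assumptions, and the divisibility restriction is ignored here.

```python
def largest_s(r, fits):
    """Largest s >= 0 with fits(r, s) true, found by binary search."""
    lo, hi = 0, r + 1              # fits(r, 0) holds; fits(r, r + 1) does not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(r, mid):
            lo = mid
        else:
            hi = mid
    return lo

def basic_fits(r, s):
    return r >= 2 * s * s          # r >= 2s^2

def relaxed_fits(r, s):
    return r * r >= 16 * s ** 3    # r >= 4s^(3/2), squared to stay in integers

r = 2 ** 20                        # hypothetical column height
n_basic = r * largest_s(r, basic_fits)
n_relaxed = r * largest_s(r, relaxed_fits)
print(n_basic, n_relaxed)
```

For this r, the basic bound allows 724 columns and the relaxed bound allows 4096, so the same column height sorts more than five times as many values.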
This Talk
• Slabpose columnsort
– r ≥ 4s^(3/2)
– Requires divisibility restriction
• Also in the paper
– Subblock columnsort
• r ≥ 4s^(3/2) with divisibility restriction
• r ≥ 6s^(3/2) without divisibility restriction
– Proof that the divisibility restriction is
unnecessary in the basic columnsort algorithm
Columnsort Steps
1. Sort each column
2. Transpose entire mesh
3. Sort each column
4. Untranspose entire mesh
5. Sort each column
6. Shift down by half a column
7. Sort each column
8. Shift up by half a column
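The eight steps can be sketched in a few lines of Python. This is an illustrative in-memory version, not the out-of-core implementation the talk is about. It assumes the mesh is stored as a flat list in column-major order, that r is even, and that both restrictions (s divides r, r ≥ 2s^2) hold. "Transpose" here means picking items up in column-major order and laying them down in row-major order while keeping the r × s shape. Steps 6 to 8 are folded into sorting windows of r items that straddle adjacent columns, which has the same effect as shifting, sorting, and shifting back.

```python
def sort_columns(a, r):
    """Sort each run of r consecutive items (one column) independently."""
    return [x for j in range(0, len(a), r) for x in sorted(a[j:j + r])]

def transpose(a, r, s):
    """Read in column-major order, write in row-major order (shape stays r x s)."""
    out = [None] * len(a)
    for k, x in enumerate(a):
        out[(k % s) * r + k // s] = x
    return out

def untranspose(a, r, s):
    """Inverse of transpose: read in row-major order, write in column-major order."""
    return [a[(k % s) * r + k // s] for k in range(len(a))]

def columnsort(values, r, s):
    assert len(values) == r * s
    assert r % 2 == 0 and r % s == 0 and r >= 2 * s * s
    a = sort_columns(list(values), r)   # step 1
    a = transpose(a, r, s)              # step 2
    a = sort_columns(a, r)              # step 3
    a = untranspose(a, r, s)            # step 4
    a = sort_columns(a, r)              # step 5
    half = r // 2
    for j in range(s - 1):              # steps 6-8: sort each window straddling
        lo = j * r + half               # the bottom half of column j and the
        a[lo:lo + r] = sorted(a[lo:lo + r])  # top half of column j + 1
    return a                            # sorted in column-major order
```

For example, `columnsort(data, 18, 3)` sorts 54 values, since 18 ≥ 2 · 3^2 and 3 divides 18.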
Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns
5. Slabpose
(Steps 4 and 5 are oblivious!)
6. Sort each column
7. Untranspose entire mesh
8. Sort each column
9. Shift down by half a column
10. Sort each column
11. Shift up by half a column
Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns + slabpose (oblivious!)
5. Sort each column
6. Untranspose entire mesh
7. Sort each column
8. Shift down by half a column
9. Sort each column
10. Shift up by half a column
Why Work With Vertical Slabs?
• In regular columnsort, the matrix needs to
be tall and skinny
• Working with vertical slabs allows us to
change the aspect ratio to use tall and
skinny slabs
• We'll use slabs that are √s columns wide
• The mesh will have √s slabs
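As a sketch of what "transpose within vertical slabs" could look like in code, the function below applies columnsort's column-major to row-major transpose inside each slab separately. It assumes a flat column-major list and a slab width parameter w (a name introduced here for illustration). This is one reading of the slabpose step, not the authors' code.

```python
def slabpose(a, r, s, w):
    """Transpose within each vertical slab of w consecutive columns.

    Inside a slab, items are read in column-major order and written in
    row-major order, so each column of the slab becomes r / w rows of it.
    """
    assert len(a) == r * s and s % w == 0
    out = [None] * len(a)
    slab_size = r * w
    for base in range(0, len(a), slab_size):    # start of one slab
        for k in range(slab_size):              # column-major index in the slab
            out[base + (k % w) * r + k // w] = a[base + k]
    return out

# 4 x 4 mesh, slabs 2 columns wide: column [0, 1, 2, 3] becomes rows
# (0, 1) and (2, 3) of the first slab.
print(slabpose(list(range(16)), 4, 4, 2))
```

With w equal to s, the single slab is the whole mesh and the function reduces to the ordinary transpose step.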
0-1 Principle
• If an oblivious algorithm sorts all input sets
consisting solely of 0s and 1s, then it sorts
all input sets with arbitrary values
• Use the 0-1 Principle by looking at portions
of the r × s mesh
• Clean: all 0s or all 1s
• Dirty: may be mixed 0s and 1s
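The 0-1 Principle also gives a cheap way to test a small oblivious algorithm: run it on every 0-1 input of a given length. The sketch below does this for odd-even transposition sort, chosen here only as a familiar oblivious example; it is not part of columnsort.

```python
from itertools import product

def odd_even_transposition(values):
    """Oblivious sort: the compare-exchange positions never depend on the data."""
    a = list(values)
    n = len(a)
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

def sorts_all_zero_one_inputs(algorithm, n):
    """True if algorithm sorts every length-n input made of 0s and 1s."""
    return all(algorithm(bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

print(sorts_all_zero_one_inputs(odd_even_transposition, 8))
```

Checking 2^n inputs instead of n! permutations is what makes this practical for small n. The same idea drives the proof on the following slides, which tracks only clean and dirty regions.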
Step 1: Sort Each Column
[Figure: the r × s mesh after sorting each column: 0s at the top, 1s at the bottom, a dirty region in between]
Step 2: Slabpose
[Figure: each column maps into rows of its √s-slab; the mesh has √s slabs, each with ≤ √s dirty rows]
Step 3: Sort Each Column
[Figure: dirty area of ≤ √s rows]
Step 4: Shuffle
[Figure: mesh before and after the shuffle, each drawn as √s slabs (√s-slabs); dirty area of ≤ √s rows]
Step 5: Slabpose
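[Figure: mesh before and after the slabpose, each drawn as √s slabs (√s-slabs); each column becomes r/√s rows of its slab, with its dirty part in ≤ 2 rows, giving √s sets of dirty rows]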
Step 6: Sort Each Column
[Figure: dirty area of ≤ 2√s rows, i.e. ≤ 2s^(3/2) elements]
Step 7: Untranspose Entire Mesh
[Figure: dirty area of ≤ 2s^(3/2) elements]
Once the size of the dirty area is at most half a column,
the last four steps will finish up
r ≥ 4s^(3/2) ==> 2s^(3/2) ≤ r/2 ==> dirty area ≤ half a column
Step 8: Sort Each Column
dirty area resides in one column ==> done
Step 8: Sort Each Column
dirty area resides in two columns ==> no change
Step 9: Shift Down by Half a Column
dirty area resides
in one column
Step 10: Sort Each Column
dirty area resides
in one column
Step 11: Shift Up by Half a Column
sorted
Subblock Columnsort
• Adds two steps to columnsort
– Sort each column
– A fixed permutation
• The permutation is any one that distributes
all elements of each √s × √s subblock to all
s columns
• Like slabpose columnsort, the size of the
dirty area is ≤ 2s^(3/2) entering the last four
steps
• As long as 2s^(3/2) ≤ r/2 (half a column), the
last four steps complete the sorting
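The requirement on the permutation can be checked mechanically. The sketch below assumes the subblocks are √s × √s, that s is a perfect square, and that √s divides r. The particular permutation built here is a hypothetical one chosen because it is easy to verify; it is not necessarily the one used in the paper.

```python
import math

def subblock_permutation(r, s):
    """Hypothetical fixed permutation, as a dict (row, col) -> (row, col)."""
    q = math.isqrt(s)
    assert q * q == s and r % q == 0
    target = {}
    for i in range(r):
        for j in range(s):
            a, i2 = divmod(i, q)     # block row, row inside the subblock
            b, j2 = divmod(j, q)     # block column, column inside the subblock
            # The position inside the subblock picks the target column, so
            # the s elements of one subblock go to s different columns.
            target[(i, j)] = (a * q + b, i2 * q + j2)
    return target

def has_subblock_property(target, r, s):
    """True if target is a permutation sending each subblock to all s columns."""
    q = math.isqrt(s)
    if sorted(target.values()) != [(i, j) for i in range(r) for j in range(s)]:
        return False                 # not a permutation of the mesh
    for a in range(r // q):
        for b in range(q):
            cols = {target[(a * q + i2, b * q + j2)][1]
                    for i2 in range(q) for j2 in range(q)}
            if len(cols) != s:
                return False
    return True

print(has_subblock_property(subblock_permutation(8, 4), 8, 4))
```

The identity permutation fails the check, since each subblock stays within its own √s columns.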
Removing the Divisibility Restriction
from Columnsort
• With the divisibility restriction, the dirty
rows after the transpose step have only 0->1
transitions
• Without the divisibility restriction, there
may also be 1->0 transitions
• The proof shows that even with the 1->0
transitions, the size of the dirty area
entering the last four steps does not increase
• Thus r ≥ 2s^2 suffices, even without the
divisibility restriction
Conclusion
• We can get around the restrictions of
columnsort
• Reduce the exponent in the height
restriction from 2 to 3/2
– The mesh need not be quite so tall and skinny
– Cost: Two extra steps
– In out-of-core implementation, slabpose
columnsort requires no additional I/O
• The divisibility restriction is unnecessary
• Open question: Can we reduce the exponent
further within the columnsort framework?