The Perl Sorting Paradigm
• Complex sorting:
1. Preprocess the input to extract the sortkeys.
2. Sort the data by comparing the sortkeys.
3. Postprocess the output to retrieve the data.
@out =                       # These may be separate steps.
    map  POSTPROCESS($_) =>
    sort sortsub
    map  PREPROCESS($_) =>
    @in;
• @out = sort @in; # The default sort.
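The three steps can also be written as separate statements. The following is a minimal sketch, assuming a hypothetical word list and using string length as the sortkey; neither comes from the slides.

```perl
use strict;
use warnings;

# Hypothetical input: sort these words by length, shortest first.
my @in = qw(pear fig banana apple);

# 1. Preprocess: pair each operand with its sortkey (here, its length).
my @pairs = map { [ length($_), $_ ] } @in;

# 2. Sort: compare only the precomputed sortkeys.
my @sorted = sort { $a->[0] <=> $b->[0] } @pairs;

# 3. Postprocess: strip the sortkeys to retrieve the operands.
my @out = map { $_->[1] } @sorted;

print "@out\n";    # fig pear apple banana
```

Chaining the three statements into one expression gives the map/sort/map form shown above.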
Perl Sorting Techniques
• Naive (no pre- or postprocessing)
– Sortkeys recomputed on every comparison.
• Cached sortkeys; the Orcish Maneuver
– Sortkeys cached in hashes.
• The Schwartzian Transform
– Sortkeys cached in anonymous arrays.
• The Packed-Default Sort
– Sortkeys and operands packed in strings.
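To make the first two techniques concrete, here is a small sketch that counts sortkey computations for the naive sort and for the Orcish Maneuver. The word list and the `sortkey` helper are illustrative assumptions, not part of the talk.

```perl
use strict;
use warnings;

my @in = qw(pear fig banana apple);

my $calls = 0;
# Hypothetical sortkey extractor; the counter shows how often it runs.
sub sortkey { $calls++; return length $_[0] }

# Naive: the sortkey is recomputed for both operands on every comparison.
my @naive = sort { sortkey($a) <=> sortkey($b) } @in;
my $naive_calls = $calls;

# Orcish Maneuver: "or-cache" each sortkey in a hash on first use.
# Note that ||= recomputes a key whose value is false (0 or "").
$calls = 0;
my %cache;
my @orcish = sort {
    ($cache{$a} ||= sortkey($a)) <=> ($cache{$b} ||= sortkey($b))
} @in;
my $orcish_calls = $calls;

print "naive:  @naive ($naive_calls key computations)\n";
print "orcish: @orcish ($orcish_calls key computations)\n";
```

With the cache, each distinct operand's key is computed once; the naive count grows with the number of comparisons.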
Schwartzian Transform (ST)
Sort a list of strings by the dotted-quad IP address each one contains.
@out =
    map  $_->[0] =>
    sort { $a->[1] <=> $b->[1] ||
           $a->[2] <=> $b->[2] ||
           $a->[3] <=> $b->[3] ||
           $a->[4] <=> $b->[4] }
    map  [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ]
         => @in;
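A runnable version of this transform, with block-style map and sort, might look like the sketch below. The input lines are made up for illustration; a plain `sort @in` would put "10.0.0.10" before "10.0.0.2" and "9.255.0.1" last.

```perl
use strict;
use warnings;

# Hypothetical input lines, each containing one dotted-quad address.
my @in = (
    "10.0.0.2 gateway",
    "9.255.0.1 printer",
    "10.0.0.10 laptop",
    "192.168.1.1 router",
);

my @out =
    map  { $_->[0] }                               # retrieve the line
    sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] ||
           $a->[3] <=> $b->[3] || $a->[4] <=> $b->[4] }
    map  { [ $_, /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ] }  # [line, 4 octets]
    @in;

print "$_\n" for @out;
```

Each anonymous array holds the original line in slot 0 and the four octets in slots 1 to 4, so the comparison "ladder" never has to re-run the regex.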
ST with Packed Sortkeys
Concatenate the subkeys into a sortable string.
@out =
    map  $_->[0] =>
    sort { $a->[1] cmp $b->[1] }
    map  [ $_, pack('C4' =>
               /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
         ] => @in;
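The reason a single `cmp` suffices can be seen by comparing two addresses directly. This sketch uses two example addresses chosen for illustration.

```perl
use strict;
use warnings;

my ($x, $y) = ("10.0.0.2", "9.255.0.1");

# Compared as plain strings, "10..." sorts before "9..." (wrong order).
my $plain = $x cmp $y;

# pack('C4', ...) stores each octet as one unsigned byte, so the
# 4-byte keys compare bytewise in the same order as the addresses.
my $kx = pack 'C4', $x =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
my $ky = pack 'C4', $y =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
my $packed = $kx cmp $ky;

printf "plain cmp: %d, packed cmp: %d, key length: %d bytes\n",
    $plain, $packed, length $kx;
# plain cmp: -1, packed cmp: 1, key length: 4 bytes
```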
A Fresh Look at Efficient Perl Sorting
The Packed-Default Sort
Append the operands to the packed sortkeys.
@out =
    map  substr($_, 4) =>
    sort
    map  pack('C4' =>
              /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
         . $_ => @in;
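Written as separate steps, the packed-default sort looks like the sketch below. The input lines are hypothetical, and the hex dump of the key is added only to show what the 4-byte prefix contains.

```perl
use strict;
use warnings;

# Hypothetical input lines, each containing one dotted-quad address.
my @in = (
    "10.0.0.2 gateway",
    "9.255.0.1 printer",
    "10.0.0.10 laptop",
    "192.168.1.1 router",
);

# Preprocess: prefix each line with its 4-byte packed sortkey.
my @packed = map { pack('C4', /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ } @in;

# Sort: no sortsub at all, so the built-in string comparison is used.
my @sorted = sort @packed;

# Postprocess: drop the 4-byte key to recover the original line.
my @out = map { substr($_, 4) } @sorted;

printf "%s  %s\n", unpack('H8', $_), substr($_, 4) for @sorted;
```

Because the key has a fixed width of 4 bytes, `substr($_, 4)` always recovers the operand exactly.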
Selected Benchmarks
[Chart: CPU time (microseconds per line), y-axis 0 to 250, for the Laddered ST, Packed ST, and Packed-Default sorts; x-axis 100, 1000, 10000, 100000.]
The O(N log N) comparisons dominate the cost of the ST.
The O(N) preprocessing dominates the cost of the Packed-Default sort (P-D).
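A comparison of this kind can be set up with the core Benchmark module. The sketch below is not the benchmark behind the chart: the data (1000 random addresses), the iteration count, and the hash key names are assumptions, and absolute timings will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 random dotted-quad addresses.
my @in = map { join '.', map { int rand 256 } 1 .. 4 } 1 .. 1000;
my $re = qr/(\d+)\.(\d+)\.(\d+)\.(\d+)/;

my %sorts = (
    laddered_st => sub {
        return [
            map  { $_->[0] }
            sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] ||
                   $a->[3] <=> $b->[3] || $a->[4] <=> $b->[4] }
            map  { [ $_, /$re/ ] } @in
        ];
    },
    packed_st => sub {
        return [
            map  { $_->[0] }
            sort { $a->[1] cmp $b->[1] }
            map  { [ $_, pack('C4', /$re/) ] } @in
        ];
    },
    packed_default => sub {
        return [
            map  { substr($_, 4) }
            sort
            map  { pack('C4', /$re/) . $_ } @in
        ];
    },
);

# Run each sort 200 times and print a table of relative rates.
cmpthese(200, \%sorts);
```

Rerunning with larger inputs is the way to see how the gap between the techniques changes with N.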
Packing the Sortkeys
• Strings – fixed or varying lengths; ascending or descending; can be case-insensitive
• Integers – chars, shorts, or longs; signed or unsigned; ascending or descending
• Floating-point numbers – floats or doubles; ascending or descending
• Indexes of strings (to achieve stable sorting) or indexes of arrays or hashes (for retrieval)
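A few of these encodings can be combined in one key. The sketch below is one possible encoding, not the one used in the talk: it assumes hypothetical records with a `name` and a non-negative 32-bit `score`, and sorts by name (case-insensitive, ascending), then score (descending), with the array index as a stable tiebreak that also serves for retrieval. Signed integers and floating-point numbers need further encoding that is not shown here.

```perl
use strict;
use warnings;

# Hypothetical records.
my @in = (
    { name => "bob",   score => 7 },
    { name => "Alice", score => 3 },
    { name => "alice", score => 9 },
    { name => "Bob",   score => 7 },
);

my @keys = map {
    my $r = $in[$_];
    pack('A10', lc $r->{name})                # fixed width, case-insensitive
      . pack('N', 0xFFFFFFFF - $r->{score})   # unsigned 32-bit, descending
      . pack('N', $_)                         # index: tiebreak and retrieval
} 0 .. $#in;

# Default sort on the packed keys; the last 4 bytes give back the index.
my @out = map { $in[ unpack('N', substr($_, -4)) ] } sort @keys;

printf "%s %d\n", $_->{name}, $_->{score} for @out;
```

The 'A10' format pads with spaces and truncates names longer than 10 characters, and 'N' stores a big-endian integer so that byte order matches numeric order.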
The Sort::Records Module
• Combines the packed-default sort technique with automatic subkey extraction using a simple attribute/value syntax.
• Sort /etc/passwd by user name.
    $sort1 = Sort::Records->new([width => 10, split => [':', 0]]);
    @pw = $sort1->sort(`cat /etc/passwd`);
• Sort /etc/passwd by user ID.
    $sort2 = Sort::Records->new([type => 'int', split => [':', 2]]);
    @pw = $sort2->sort(`cat /etc/passwd`);
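For comparison, the user-ID sort can be written by hand as a packed-default sort. This sketch does not use Sort::Records and makes no claim about how the module works internally; the passwd lines are invented sample data, and user IDs are assumed to fit in an unsigned 32-bit integer.

```perl
use strict;
use warnings;

# Hypothetical passwd-style lines (not read from the real /etc/passwd).
my @passwd = (
    "root:x:0:0:root:/root:/bin/bash",
    "alice:x:1001:1001::/home/alice:/bin/sh",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
    "bob:x:100:100::/home/bob:/bin/sh",
);

my @pw =
    map  { substr($_, 4) }                    # drop the 4-byte key
    sort                                      # default string sort
    map  { pack('N', (split /:/)[2]) . $_ }   # field 2 is the user ID
    @passwd;

print "$_\n" for @pw;
```

Sorting by user name instead would swap the key for something like `pack('A10', (split /:/)[0])` and strip 10 bytes rather than 4.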
Conclusions
• Packing subkeys into sortable strings speeds up large sorts, whichever sorting method is used.
• Appending the operands to the sortkeys makes it possible to use the fast default lexicographic sort comparison.
• The module Sort::Records encapsulates the code conveniently.
• <URL:http://www.hpl.hp.com/personal/Larry_Rosler/sort/>
<URL:http://www.sysarch.com/perl/sort/>