Sorting Large Files
Part One:
Why even bother?
And a simple solution.
Starter Questions
Why sort a large data file?
speed of searching
Why not sort a large data file?
difficult to add and delete data
Searching Unsorted Files
Algorithm - Sequential Search
start at the top of the file and inspect each record until the target is found
Efficiency
best case: 1 compare
worst case: N compares
average case: N/2 compares
average search for 1,000,000 records is 500,000 compares
Big O: O(N)
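A minimal Python sketch of the sequential search above. It assumes the records are already available as a list of keys; a real file version would read one record at a time. The function name `sequential_search` is hypothetical, and it also counts compares so the best and worst cases are visible.

```python
def sequential_search(records, target):
    """Scan records from the top until target is found.

    Returns (index, compares); index is -1 if target is absent.
    """
    compares = 0
    for i, record in enumerate(records):
        compares += 1
        if record == target:
            return i, compares
    return -1, compares


records = ["dog", "cat", "bat", "giraffe", "aardvark"]  # unsorted
print(sequential_search(records, "dog"))       # best case: (0, 1)
print(sequential_search(records, "aardvark"))  # worst case: (4, 5)
print(sequential_search(records, "emu"))       # not found: (-1, 5)
```

A missing key always costs all N compares, because nothing in an unsorted file tells us we can stop early.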
Searching Sorted Files
Example 1: Sequential Search
Example 2: Binary Search
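Basic Algorithm
look at middle record
if (target == current record)
    found
else if (target < current record)
    look at front half
else
    look at end half
repeat on the chosen half until found or the half is empty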
Big O: O(log2 N)
a search of 1,000,000 records takes at most about 20 compares (log2 1,000,000 is about 19.9)
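A sketch of binary search applied directly to a sorted file, assuming every record is a fixed-length, space-padded key so that record i starts at byte i * RECORD_SIZE and can be reached with `seek`. `RECORD_SIZE`, `read_record`, and `binary_search_file` are hypothetical names chosen for illustration; an in-memory `io.BytesIO` stands in for the file.

```python
import io
import math

RECORD_SIZE = 16  # assumption: every record is a fixed 16-byte, space-padded key


def read_record(f, i):
    """Seek to record i and return its key with padding stripped."""
    f.seek(i * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()


def binary_search_file(f, n, target):
    """Binary search a sorted file of n fixed-length records.

    Returns (index, compares); index is -1 if target is absent.
    """
    lo, hi = 0, n - 1
    compares = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        current = read_record(f, mid)  # look at middle record
        compares += 1
        if target == current:
            return mid, compares
        if target < current:
            hi = mid - 1               # look at front half
        else:
            lo = mid + 1               # look at end half
    return -1, compares


keys = ["aardvark", "bat", "cat", "dog", "giraffe", "hippopotamus"]
f = io.BytesIO(b"".join(k.ljust(RECORD_SIZE).encode("ascii") for k in keys))
print(binary_search_file(f, len(keys), "dog"))  # (3, 3)
print(math.ceil(math.log2(1_000_000)))          # 20
```

Fixed-length records are what make the "look at middle record" step cheap: one seek per probe instead of scanning to find record boundaries.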
Editing Unsorted Files
How do you add data?
append new data to end of file
How do you delete data?
mark deleted records by overwriting them with Xs or 0s
periodically clean the file by rewriting it without the marked records
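A small sketch of editing an unsorted file as described above: append to add, overwrite with a marker to delete, and clean up later. A Python list stands in for the file's records; the `DELETED` marker and the function names are hypothetical.

```python
DELETED = "X" * 8  # assumption: a deleted record is overwritten with Xs


def append_record(records, record):
    """Add data by appending it to the end of the (unsorted) file."""
    records.append(record)


def delete_record(records, target):
    """Mark the first matching record as deleted instead of moving data."""
    for i, record in enumerate(records):
        if record == target:
            records[i] = DELETED
            return True
    return False


def clean(records):
    """Periodic cleanup: rewrite the file without the marked records."""
    return [r for r in records if r != DELETED]


records = ["dog", "cat", "bat"]
append_record(records, "emu")
delete_record(records, "cat")
print(records)         # ['dog', 'XXXXXXXX', 'bat', 'emu']
print(clean(records))  # ['dog', 'bat', 'emu']
```

Marking is cheap because no other record moves; the cost is deferred to the occasional cleanup pass.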
Editing Sorted Files
To delete records, we cannot overwrite the key field with Xs,
because that would break the sorted order that binary search depends on
Maintain 3 sorted files:
working data
data to delete
data to add
To update: merge all three files in a single pass
Example Update of Sorted File
Working Data:
aardvark
bat
cat
dog
giraffe
hippopotamus
Data to Delete:
cat
Data to Add:
elephant
ferret
New Working Data:
aardvark
bat
dog
elephant
ferret
giraffe
hippopotamus
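The update above can be sketched as one pass over three sorted inputs. Lists stand in for the three sorted files, and `update_sorted` is a hypothetical name. One assumption worth stating: deletions are matched only against the working data, not against the new additions.

```python
def update_sorted(working, to_delete, to_add):
    """Merge working data, deletions, and additions in one pass.

    All three inputs must be sorted; the output is sorted too.
    """
    result = []
    i = j = k = 0
    while i < len(working) or k < len(to_add):
        # Take whichever is smaller: the next working record or the next addition.
        take_working = k >= len(to_add) or (
            i < len(working) and working[i] <= to_add[k]
        )
        if take_working:
            record = working[i]
            i += 1
            # Advance the delete list past keys smaller than this record.
            while j < len(to_delete) and to_delete[j] < record:
                j += 1
            if j < len(to_delete) and to_delete[j] == record:
                j += 1  # this record is deleted: skip it
                continue
            result.append(record)
        else:
            result.append(to_add[k])
            k += 1
    return result


working = ["aardvark", "bat", "cat", "dog", "giraffe", "hippopotamus"]
new_working = update_sorted(working, ["cat"], ["elephant", "ferret"])
print(new_working)
# ['aardvark', 'bat', 'dog', 'elephant', 'ferret', 'giraffe', 'hippopotamus']
```

Because every input is read front to back exactly once, the same loop works when the lists are replaced by files read sequentially.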
Question
Why would we ever need to sort a file?
Wouldn't we build it sorted to begin with
and just keep it sorted?
sort a big block of new data
e.g., a list of today's transactions
sort a huge file by a different key
File Sorting Algorithms
Internal Sorts
when the whole file fits in main memory
algorithm:
1. read the unsorted file into memory
2. sort all at once
3. write to new file
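The three internal-sort steps map almost line for line onto Python, assuming a text file with one record per line. `internal_sort` is a hypothetical name; the usage creates a throwaway file in a temporary directory.

```python
import os
import tempfile


def internal_sort(in_path, out_path):
    """Sort a text file whose lines all fit in memory (one record per line)."""
    with open(in_path) as f:        # 1. read the unsorted file into memory
        records = f.read().splitlines()
    records.sort()                  # 2. sort all at once
    with open(out_path, "w") as f:  # 3. write to new file
        for record in records:
            f.write(record + "\n")


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "unsorted.txt")
    dst = os.path.join(d, "sorted.txt")
    with open(src, "w") as f:
        f.write("dog\ncat\nbat\n")
    internal_sort(src, dst)
    with open(dst) as f:
        sorted_records = f.read().split()

print(sorted_records)  # ['bat', 'cat', 'dog']
```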
External Sorts
when the file is too big to fit in memory
oversimplified algorithm:
while not eof
    read a big block of the data into memory
    sort that portion
    write it to a temp file
merge all those temp files
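A sketch of the oversimplified external sort above, assuming newline-terminated text records and a `block_size` (a hypothetical parameter) giving how many records fit in memory at once. The final merge uses the standard library's `heapq.merge`, which reads the sorted temp files lazily so they are never loaded whole.

```python
import heapq
import io
import tempfile


def external_sort(lines, out, block_size):
    """Sort an iterable of newline-terminated records using bounded memory."""
    temps = []
    block = []
    for line in lines:  # while not eof
        block.append(line)
        if len(block) == block_size:  # a big block is in memory
            temps.append(write_run(block))
            block = []
    if block:
        temps.append(write_run(block))
    out.writelines(heapq.merge(*temps))  # merge all those temp files
    for t in temps:
        t.close()


def write_run(block):
    """Sort one in-memory block and write it to a new temp file."""
    block.sort()
    t = tempfile.TemporaryFile("w+")
    t.writelines(block)
    t.seek(0)  # rewind so the merge can read it
    return t


data = io.StringIO("dog\ncat\nbat\ngiraffe\naardvark\n")
out = io.StringIO()
external_sort(data, out, block_size=2)
print(out.getvalue().split())  # ['aardvark', 'bat', 'cat', 'dog', 'giraffe']
```

Every record, including the last, must end in a newline; otherwise two records could be glued together in the merged output.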
2-Way Merge Sort
Create 2 sorted files
Read 1st half of file W into memory, sort it, then write to file X
Read 2nd half of W into memory, sort it, then write to file Y
Merge the 2 files into file Z
Read record x from X
Read record y from Y
While neither X nor Y is exhausted
    if x < y
        write x to Z
        read next x from X
    else
        write y to Z
        read next y from Y
If X is exhausted
    write y and the remainder of Y to Z
else
    write x and the remainder of X to Z
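A sketch of the whole 2-way merge sort, assuming newline-terminated text records. `merge_files` follows the merge pseudocode step by step, treating the empty string returned by `readline` as "file exhausted"; `two_way_merge_sort` counts the records first so that only one half of W is held in memory at a time. Both names are hypothetical, and `io.StringIO` stands in for W and Z.

```python
import io
import tempfile


def merge_files(X, Y, Z):
    """Merge two sorted files X and Y of newline-terminated records into Z."""
    x = X.readline()  # read record x from X ("" means X is exhausted)
    y = Y.readline()  # read record y from Y
    while x and y:    # while neither X nor Y is exhausted
        if x < y:
            Z.write(x)
            x = X.readline()
        else:
            Z.write(y)
            y = Y.readline()
    # One file is exhausted: write the pending record and the rest of the other.
    pending, rest = (y, Y) if not x else (x, X)
    Z.write(pending)
    Z.writelines(rest)


def two_way_merge_sort(W, Z):
    """Sort file W into Z: sort each half in memory, then merge the halves."""
    n = sum(1 for _ in W)  # count records, then rewind
    W.seek(0)
    X = tempfile.TemporaryFile("w+")
    Y = tempfile.TemporaryFile("w+")
    X.writelines(sorted(W.readline() for _ in range(n // 2)))  # 1st half -> X
    Y.writelines(sorted(W.readlines()))                        # 2nd half -> Y
    X.seek(0)
    Y.seek(0)
    merge_files(X, Y, Z)
    X.close()
    Y.close()


W = io.StringIO("dog\ncat\nbat\ngiraffe\naardvark\n")
Z = io.StringIO()
two_way_merge_sort(W, Z)
print(Z.getvalue().split())  # ['aardvark', 'bat', 'cat', 'dog', 'giraffe']
```

Writing the pending record before copying the remainder matters: the last record read from the surviving file has not been written yet when the loop ends.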
Next Time
Good internal sorts
Merging a small amount of unsorted new data into a big sorted file
N-Way Merge Sort