Transcript Slide 1

Searching
1
The Problem
Searching is an everyday occurrence.
2
Searching an Unsorted Array
• A method that uses a loop to search an
array.
public boolean contains(Object anEntry)
{
    boolean found = false;
    for (int index = 0; !found && (index < length); index++)
    {
        if (anEntry.equals(entry[index]))
            found = true;
    } // end for
    return found;
} // end contains
3
Searching an Unsorted Array
An iterative sequential search of an array that (a)
finds its target; (b) does not find its target
4
Searching an Unsorted Array
• Pseudocode for a recursive algorithm to
search an array.
Algorithm to search a[first] through a[last] for desiredItem
    if (there are no elements to search)
        return false
    else if (desiredItem equals a[first])
        return true
    else
        return the result of searching a[first+1] through a[last]
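The pseudocode above might translate to Java roughly as follows. This is a minimal sketch: the class name RecursiveSearch and the helper method search are illustrative assumptions, not names from the slides.

```java
class RecursiveSearch {
    // Entry point: searches the whole array for desiredItem.
    static boolean contains(Object[] a, Object desiredItem) {
        return search(a, 0, a.length - 1, desiredItem);
    }

    // Searches a[first] through a[last] for desiredItem.
    private static boolean search(Object[] a, int first, int last, Object desiredItem) {
        if (first > last) {
            return false;                          // no elements left to search
        } else if (desiredItem.equals(a[first])) {
            return true;                           // found at the front of the range
        } else {
            return search(a, first + 1, last, desiredItem);
        }
    }

    public static void main(String[] args) {
        String[] names = {"Mary", "Susan", "John", "Joe"};
        System.out.println(contains(names, "John"));
        System.out.println(contains(names, "Steve"));
    }
}
```

Each call shrinks the range by one element, so the recursion depth grows linearly with the array length.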
5
Searching an Unsorted Array
A recursive sequential search of
an array that (a) finds its target;
(b) does not find its target.
6
Efficiency of a Sequential Search
• Best case
O(1)
– Locate desired item first
• Worst case
O(n)
– Must look at all the items
• Average case
O(n)
$\frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n(n + 1)/2}{n} = \frac{n + 1}{2}$
7
Searching a Sorted Array
• A search can be more efficient if the data
is sorted
Coins sorted by their mint dates.
8
Binary Search of Sorted Array
Ignoring one-half of the data when
the data is sorted.
9
Binary Search of Sorted Array
• Algorithm for a binary search
Algorithm binarySearch(a, first, last, desiredItem)
    mid = (first + last)/2 // approximate midpoint
    if (first > last)
        return false
    else if (desiredItem equals a[mid])
        return true
    else if (desiredItem < a[mid])
        return binarySearch(a, first, mid-1, desiredItem)
    else // desiredItem > a[mid]
        return binarySearch(a, mid+1, last, desiredItem)
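One possible Java rendering of this algorithm, for an int array sorted in ascending order. It is a sketch rather than the slides' own code: the class name BinarySearchSketch is an assumption, the empty-range test is moved ahead of the midpoint computation, and the midpoint is written as first + (last - first) / 2 to avoid int overflow on very large indices.

```java
class BinarySearchSketch {
    // Returns true if desiredItem occurs in the sorted range a[first..last].
    static boolean binarySearch(int[] a, int first, int last, int desiredItem) {
        if (first > last) {
            return false;                        // empty range: not found
        }
        int mid = first + (last - first) / 2;    // approximate midpoint
        if (desiredItem == a[mid]) {
            return true;
        } else if (desiredItem < a[mid]) {
            return binarySearch(a, first, mid - 1, desiredItem);   // search left half
        } else {
            return binarySearch(a, mid + 1, last, desiredItem);    // search right half
        }
    }

    public static void main(String[] args) {
        int[] sorted = {2, 4, 5, 7, 8, 10, 12, 15, 18, 21, 24, 26};
        System.out.println(binarySearch(sorted, 0, sorted.length - 1, 8));
        System.out.println(binarySearch(sorted, 0, sorted.length - 1, 16));
    }
}
```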
10
Binary Search of Sorted Array
A recursive binary search of a sorted
array that (a) finds its target;
11
Binary Search of Sorted Array
A recursive binary search of a sorted array
that (b) does not find its target.
12
Java Class Library: The Method
binarySearch
• The class Arrays in java.util defines
versions of a static method with the following
specification:
/** Task: Searches an entire array for a given item.
 *  @param array the array to be searched
 *  @param desiredItem the item to be found in the array
 *  @return index of the array element that equals desiredItem;
 *          otherwise returns -belongsAt-1, where belongsAt is
 *          the index of the array element that should contain
 *          desiredItem */
public static int binarySearch(type[] array, type desiredItem);
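A short usage sketch of the int[] version of this method, showing how a negative result encodes belongsAt. The class name ArraysBinarySearchDemo and the helper belongsAt are illustrative assumptions; the array must already be sorted.

```java
import java.util.Arrays;

class ArraysBinarySearchDemo {
    // Converts a negative result of Arrays.binarySearch back into belongsAt.
    static int belongsAt(int result) {
        return -result - 1;
    }

    public static void main(String[] args) {
        int[] sorted = {10, 20, 30, 40};
        System.out.println(Arrays.binarySearch(sorted, 30));   // found: prints 2
        int miss = Arrays.binarySearch(sorted, 25);            // 25 is absent
        System.out.println(miss);                              // prints -3
        System.out.println(belongsAt(miss));                   // prints 2
    }
}
```

Because a miss is always negative, a caller can test result >= 0 to tell "found" from "not found" even when the item belongs at index 0.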
13
Efficiency of a Binary Search
• Best case
O(1)
– Locate desired item at the first midpoint examined
• Worst case
O(log n)
– Must keep halving the array until no items remain
• Average case
O(log n)
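To make the logarithmic bound concrete, the sketch below counts how many midpoints one unsuccessful search examines when the target is larger than every element, so the search always continues in the right half. The class and method names are assumptions made for illustration; no actual array is needed, only the index range.

```java
class ProbeCount {
    // Counts the midpoints examined by a binary search over n sorted items
    // when the target is greater than all of them.
    static int worstCaseProbes(int n) {
        int first = 0;
        int last = n - 1;
        int probes = 0;
        while (first <= last) {
            int mid = first + (last - first) / 2;
            probes++;
            first = mid + 1;   // target is larger: discard the midpoint and the left half
        }
        return probes;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseProbes(8));
        System.out.println(worstCaseProbes(1_000_000));
    }
}
```

For one million items this search examines 20 midpoints, whereas a sequential search that fails must look at all one million.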
14
Hash Tables
15
Motivations
• We want a data structure in which finds/searches are
very fast
– As close to O(1) as possible
• Inserts and deletes should be fast too
• Objects in hash tables have unique keys
– A key may be a single property/attribute value
– Or may be created from multiple properties/values
16
Hashing
• Say we have a class with 100 students.
– Each student is assigned a 5-digit student
number.
– Range of student numbers is [0, 10^5 - 1]
– Efficiency is the key issue here
• We use N to denote the size of the range
and n to denote the size of the data set
– For the above scenario:
• N = 100000
• n = 100
17
Suboptimal Hashing
1. Array of N buckets indexed by key
– Search Time: O(1)
– Storage Requirements: O(N)
• Huge amounts of wasted space
Index   Contents
00000   null
00001   null
00002   null
…
11254   Mary
11255   Susan
11256   John
…
27798   Joe
…
99999   null
2. Linked List of n elements
– Search and Storage are O(n)
Head -> (Mary, 11254) -> (Susan, 11255) -> (John, 11256) -> (Joe, 27798) -> null
3. Balanced Binary Tree with n nodes.
– Search: O(log n)
– Storage: O(n)
18
A Better Solution
• Hash Tables:
– O(1) search time
– O(M) storage space
• where M is the table size
– Like array implementation but we use a
function to map large range into a smaller
more manageable one.
– Example function: f(x) = x mod 100
• Maps keys into a relatively small range. Mapped
values used as indices, not original keys.
19
Simple Hashing Example
• Suppose we use a hashed array with 5 buckets
0: null | 1: null | 2: null | 3: null | 4: null
1. Insert (Steve, 99654)
• Calculate hash value = 99654 mod 5 = 4
• Therefore (Steve, 99654) is stored in slot indexed by 4
0: null | 1: null | 2: null | 3: null | 4: Steve 99654
2. Insert (Tanya, 35562)
• 35562 mod 5 = 2
0: null | 1: null | 2: Tanya 35562 | 3: null | 4: Steve 99654
20
Simple Hashing Example
• What happens if we get overlap?
• We call this a ‘collision’
3. Insert (John, 01197)
• 01197 mod 5 = 2
0: null | 1: null | 2: Tanya 35562 | 3: null | 4: Steve 99654
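The three inserts in this example can be replayed with a small sketch. The class SimpleHashTable and its methods are hypothetical names introduced for illustration; the sketch only detects a collision and does not resolve it. The key 01197 is written as 1197 because a Java integer literal with a leading zero is read as octal.

```java
class SimpleHashTable {
    private final String[] names;   // names[i] is null while slot i is empty
    private final int[] keys;

    SimpleHashTable(int buckets) {
        names = new String[buckets];
        keys = new int[buckets];
    }

    // Hash value = key mod table size (keys are assumed to be nonnegative).
    int hash(int key) {
        return key % names.length;
    }

    // Stores the entry and returns true, or returns false on a collision.
    boolean insert(String name, int key) {
        int slot = hash(key);
        if (names[slot] != null) {
            return false;           // slot already occupied: collision
        }
        names[slot] = name;
        keys[slot] = key;
        return true;
    }

    String nameAt(int slot) {
        return names[slot];
    }

    public static void main(String[] args) {
        SimpleHashTable table = new SimpleHashTable(5);
        System.out.println(table.insert("Steve", 99654));  // slot 4 is free
        System.out.println(table.insert("Tanya", 35562));  // slot 2 is free
        System.out.println(table.insert("John", 1197));    // slot 2 is taken by Tanya
    }
}
```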
21
Hash Functions and Hash Tables
• Let U be the universe of keys and let the table be an array of size m. A hash function h is a function from
U to {0, …, m - 1}, that is:
h: U → {0, …, m - 1}
Diagram: keys k1, k2, k3, k4, k6 in U (universe of keys) map to table slots 0 through 7
h(k2) = 2
h(k1) = h(k3) = 3
h(k6) = 5
h(k4) = 7
22
Hash Functions and Hash Tables
• A hash function h maps keys of a given type to integers
in a fixed interval [0, m - 1]
• Example:
h(x) = x mod m
is a hash function for integer keys
• The integer h(x) is called the hash value of key x
• A hash table for a given key type consists of
– Hash function h
– Array (called table) of size m
It is important that the key remain constant
for the lifetime of the object
23
Hash Functions Design
• A hash function is usually specified as the composition of two
functions:
Hash code:
h1: keys → integers
Compression function:
h2: integers → [0, m - 1]
• The hash code is applied first, and the compression function is
applied next on the result, i.e.,
h(x) = h2(h1(x))
• A good hash function has the following features:
o Fast to compute
o Distributes keys uniformly over the table (minimizes collisions)
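The composition h(x) = h2(h1(x)) can be sketched in Java as below. The class and method names are assumptions for illustration. Here h1 simply delegates to the key's own hashCode(), and h2 uses Math.floorMod so that negative hash codes still land in [0, m - 1].

```java
class ComposedHash {
    // h1: hash code, maps a key to an arbitrary int.
    static int hashCodeOf(Object key) {
        return key.hashCode();
    }

    // h2: compression function, maps any int into [0, m - 1].
    static int compress(int code, int m) {
        return Math.floorMod(code, m);
    }

    // h(x) = h2(h1(x))
    static int hash(Object key, int m) {
        return compress(hashCodeOf(key), m);
    }

    public static void main(String[] args) {
        System.out.println(hash(99654, 5));     // Integer keys hash to their own value
        System.out.println(hash("Tanya", 5));   // some slot between 0 and 4
    }
}
```

The plain % operator would return a negative remainder for a negative hash code, which is why the sketch uses Math.floorMod instead.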
24
Hash Codes
• Memory address:
– We reinterpret the memory address of the key object as
an integer (default hash code of all Java objects)
– Good in general, except for numeric and string keys
25
Java’s hashCode() method
public int hashCode()
• Returns a hash code value for the object. This method is supported for the benefit
of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
• Whenever it is invoked on the same object more than once during an execution of a
Java application, the hashCode method must consistently return the same integer,
provided no information used in equals comparisons on the object is modified. This
integer need not remain consistent from one execution of an application to another
execution of the same application.
• If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce the
same integer result.
• It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results. However, the
programmer should be aware that producing distinct integer results for
unequal objects may improve the performance of hashtables.
• As much as is reasonably practical, the hashCode method defined by class
Object does return distinct integers for distinct objects. (This is typically
implemented by converting the internal address of the object into an
integer, but this implementation technique is not required by the Java™
programming language.)
• Returns:
– a hash code value for this object.
26
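A class that overrides equals should override hashCode so that equal objects produce equal hash codes, as the contract requires. The Student class below is a hypothetical example, not one taken from the slides; it derives both methods from the same two fields.

```java
import java.util.Objects;

class Student {
    private final String name;
    private final int number;

    Student(String name, int number) {
        this.name = name;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Student)) return false;
        Student s = (Student) other;
        return number == s.number && name.equals(s.name);
    }

    // Uses exactly the fields compared in equals, so equal students
    // always return the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(name, number);
    }
}
```

Because both fields are final, the hash code of a Student cannot change while the object is stored in a hash table.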
Hash Codes
• Integer cast:
– We reinterpret the bits of the key as an integer
– Suitable for keys of length less than or equal to the
number of bits of the integer type (e.g., byte, short, int
and float in Java)
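A brief sketch of the integer cast for two of the listed types. The class and method names are illustrative assumptions; Float.floatToIntBits is the standard library call that exposes the 32 bits of a float as an int.

```java
class IntegerCastHash {
    // A short fits in an int, so widening it preserves its value.
    static int ofShort(short key) {
        return key;
    }

    // A float is 32 bits wide; reinterpret those bits as an int.
    static int ofFloat(float key) {
        return Float.floatToIntBits(key);
    }

    public static void main(String[] args) {
        System.out.println(ofShort((short) 42));
        System.out.println(Integer.toHexString(ofFloat(1.0f)));
    }
}
```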
27
Hash Codes
• Component sum:
– We partition the bits of the key into components of
fixed length (e.g., 16 or 32 bits) and we sum the
components (ignoring overflows)
– Suitable for numeric keys of fixed length greater than
or equal to the number of bits of the integer type (e.g.,
long and double in Java)
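For a 64-bit key, the component sum might look like the sketch below, which splits the key into two 32-bit halves and adds them. The names are assumptions for illustration. Note that Java's own Long.hashCode combines the two halves with XOR rather than addition, so this sketch follows the slide, not the library.

```java
class ComponentSumHash {
    // Splits a 64-bit key into two 32-bit components and sums them.
    static int ofLong(long key) {
        int high = (int) (key >>> 32);   // upper 32 bits
        int low = (int) key;             // lower 32 bits
        return high + low;               // int addition wraps around on overflow
    }

    // A double is first reinterpreted as its 64-bit pattern.
    static int ofDouble(double key) {
        return ofLong(Double.doubleToLongBits(key));
    }

    public static void main(String[] args) {
        System.out.println(ofLong(0x0000000100000002L));
        System.out.println(ofDouble(3.14));
    }
}
```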
28
Hash Codes
• Polynomial accumulation:
– We partition the bits of the key into a sequence of
components of fixed length (e.g., 8, 16 or 32 bits)
$a_0, a_1, \ldots, a_{n-1}$
– We evaluate the polynomial
$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$
at a fixed value z, ignoring overflows
– Especially suitable for strings
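For a string, the components can be taken to be its characters, and the polynomial can be evaluated with Horner's rule using one multiplication and one addition per character. This is a sketch under those assumptions; the class name and the choice z = 31 in the usage are illustrative, not prescribed by the slides.

```java
class PolynomialHash {
    // Evaluates p(z) = a0 + a1*z + ... + a(n-1)*z^(n-1), where a_i is the
    // i-th character of s, using Horner's rule. int overflow is ignored.
    static int hashCode(String s, int z) {
        int h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * z + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hashCode("ab", 31));   // 97 + 98 * 31
        System.out.println(hashCode("ba", 31));   // 98 + 97 * 31
    }
}
```

Unlike a plain component sum, the result depends on the order of the characters, so "ab" and "ba" hash differently. Java's String.hashCode uses the same scheme with z = 31 but with the coefficients in the opposite order.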
29
Compression Functions
• Division:
– h2(y) = y mod m
– The size m of the hash table is usually chosen
to be a prime
– The reason has to do with number theory and
is beyond the scope of this course
• Multiply, Add and Divide (MAD):
– h2(y) = (ay + b) mod m
– a and b are nonnegative integers such that
a mod m ≠ 0
– Otherwise, every integer would map to the same value b mod m
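Both compression functions can be sketched as follows. The class name, method names, and the sample constants in the usage are assumptions for illustration. The arithmetic is done in long to reduce the risk of overflow in a*y + b, and Math.floorMod keeps the result in [0, m - 1] even for negative hash codes.

```java
class Compression {
    // Division: h2(y) = y mod m
    static int division(int y, int m) {
        return Math.floorMod(y, m);
    }

    // MAD: h2(y) = (a*y + b) mod m, where a mod m must not be 0.
    static int mad(int y, long a, long b, int m) {
        if (a % m == 0) {
            throw new IllegalArgumentException("a mod m must not be 0");
        }
        return (int) Math.floorMod(a * y + b, (long) m);
    }

    public static void main(String[] args) {
        System.out.println(division(99654, 5));
        System.out.println(mad(10, 3, 7, 11));   // (3*10 + 7) mod 11
    }
}
```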
30
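Hash function for a hash table of size n
map
key -> {0, …, n-1}
typical function:
key -> integer % n
e.g. // student number key
int hash(String stuNo, int n)
{
    // Drop the first character of stuNo, parse the rest as an integer,
    // and reduce it modulo the table size n.
    return Integer.parseInt(stuNo.substring(1)) % n;
}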
31