Optimal XOR Hashing for a Linearly Distributed Address

Optimal XOR Hashing for a
Linearly Distributed Address
Lookup in Computer Networks
Christopher Martinez, Wei-Ming Lin, Parimal Patel
The University of Texas at San Antonio
October 28, 2005
Outline






Motivation
Hashing Background
Linear Distribution
Optimal Hashing
Simulation
Conclusion
Motivation

All network applications require some
searching



Switches, routers and intrusion detection systems
require searching for IP addresses or subnet IDs
Searching should be based on the distribution of
the records in the database
For computer networks, searching needs to
be real-time
Motivation (cont.)


A capture of network
traffic shows the non-uniform distribution of
Class C IP addresses
Since the IP addresses
entering the network
are non-uniformly distributed,
searching should take
this into account
Hashing Background




Straightforward sequential
searching is impractical for
large databases
Hashing reduces the
database into small
subsets
Searching subsets
reduces search time
Predictable time needed
for real-time applications
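
The idea of searching one small subset instead of the whole database can be sketched as below. This is a minimal illustration, not the scheme from the talk: the function names `build_buckets` and `lookup` are hypothetical, and the hash used here (address modulo the bucket count) is only a placeholder assumption.

```python
from collections import defaultdict


def build_buckets(records, num_buckets):
    """Split the records into num_buckets subsets.

    ASSUMPTION: a placeholder hash (address modulo bucket count) stands in
    for whatever hashing function is actually used.
    """
    buckets = defaultdict(list)
    for address in records:
        buckets[address % num_buckets].append(address)
    return buckets


def lookup(buckets, num_buckets, address):
    """Return (found, comparisons), scanning only the one matching bucket."""
    comparisons = 0
    for candidate in buckets.get(address % num_buckets, []):
        comparisons += 1
        if candidate == address:
            return True, comparisons
    return False, comparisons


if __name__ == "__main__":
    table = build_buckets([3, 7, 11, 20, 36], num_buckets=4)
    print(lookup(table, 4, 11))  # (True, 3): third entry of bucket 3
    print(lookup(table, 4, 5))   # (False, 0): bucket 1 is empty
```

The comparison count is bounded by the length of one bucket, which is why an even spread of records across buckets matters for predictable lookup time.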
Hashing Background



Hashing algorithms are well researched; we
look to provide new insight based on the
probability distribution
This work is not concerned with collisions: each
hashing key is assumed to have the same number of
collisions in a linked list
Hashing informed by the probability distribution should
limit the average number of searches in the
linked list
Hashing: Non-uniform Distribution
Linear Distribution

From our captured network traffic we can
approximate the non-uniform distribution by a
linear probability distribution function
XOR Hashing For Linear Distribution




We wanted a
straightforward hashing
scheme that can be used
for any database size and
hashing space
Define the hashing function
as P = (g_{m-1}, g_{m-2}, …, g_0),
where g_i is the number of bits
XORed to form hash bit i
Measure hashing functions
against each other by the
value δ
δ measures how close to
uniform the hashed
distribution is
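
A minimal sketch of an XOR-folding hash driven by the group vector P is shown below. Two assumptions are made that the slides do not state: each entry of P is taken to be the number of address bits XORed into one hash bit, and the groups are taken to be contiguous bit fields, most significant bits first. The name `xor_hash` is hypothetical.

```python
from typing import Sequence


def xor_hash(address: int, groups: Sequence[int]) -> int:
    """Fold an n-bit address into an m-bit hash value.

    groups = (g_{m-1}, ..., g_0), n = sum(groups), m = len(groups).
    ASSUMPTION: group g_i covers a contiguous field of g_i address bits,
    assigned from the most significant bit downward.
    """
    n = sum(groups)
    if not 0 <= address < (1 << n):
        raise ValueError("address does not fit in sum(groups) bits")
    hash_value = 0
    shift = n
    for g in groups:
        shift -= g
        field = (address >> shift) & ((1 << g) - 1)  # the g bits of this group
        parity = bin(field).count("1") & 1           # XOR of those bits
        hash_value = (hash_value << 1) | parity
    return hash_value


if __name__ == "__main__":
    # 4-bit to 2-bit example on address 1011
    print(xor_hash(0b1011, (2, 2)))  # 2: fields 10 | 11 -> bits 1, 0
    print(xor_hash(0b1011, (3, 1)))  # 1: fields 101 | 1 -> bits 0, 1
    print(xor_hash(0b1011, (1, 3)))  # 2: fields 1 | 011 -> bits 1, 0
```

Because the group sizes are the only parameter, the same function covers any address width n and hash width m, which matches the stated goal of a scheme usable for any database size and hashing space.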
XOR Hashing for Linear Distribution
4-bit to 2-bit Example
P=(2,2)
XOR Hashing for Linear Distribution
4-bit to 2-bit Example
P=(3,1)
XOR Hashing for Linear Distribution
4-bit to 2-bit Example
P=(1,3)
XOR Hashing Observation

Observations:




g_i > 1: leads to equal partitioning
g_i = 1: leads to unequal partitioning
δ: difference between highest hash
distribution density and mean
To find δ: we need to determine highest
final hash distribution density
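
The observations above can be checked numerically for the 4-bit to 2-bit example. The sketch below is illustrative only and rests on assumptions not given in the slides: the linear distribution is modeled as a probability proportional to (address + 1), and groups are contiguous bit fields taken from the most significant bit downward. The names `xor_hash`, `linear_pmf` and `delta` are hypothetical.

```python
from fractions import Fraction


def xor_hash(address, groups):
    """XOR-fold an address; groups = (g_{m-1}, ..., g_0), MSB-first fields."""
    shift = sum(groups)
    hash_value = 0
    for g in groups:
        shift -= g
        field = (address >> shift) & ((1 << g) - 1)
        hash_value = (hash_value << 1) | (bin(field).count("1") & 1)
    return hash_value


def linear_pmf(n):
    """ASSUMPTION: probability of address x is proportional to x + 1."""
    weights = [x + 1 for x in range(1 << n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def delta(groups):
    """Highest hashed density minus the uniform mean 1/2^m."""
    n, m = sum(groups), len(groups)
    bins = [Fraction(0)] * (1 << m)
    for address, p in enumerate(linear_pmf(n)):
        bins[xor_hash(address, groups)] += p
    return max(bins) - Fraction(1, 1 << m)


if __name__ == "__main__":
    for P in [(2, 2), (3, 1), (1, 3)]:
        print(P, delta(P))
    # (2, 2) 0
    # (3, 1) 1/68
    # (1, 3) 2/17
```

Under these assumptions, the grouping with every g_i > 1 gives δ = 0, while either grouping containing a single-bit group leaves a positive δ, in line with the equal versus unequal partitioning noted above.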
Optimal XOR Hashing for Linear
Distribution





Hashing consists of m steps (from step m-1
to step 0)
p_i: highest density value after step i
Derive p_i from p_{i+1} at step i
p_m = A = 1/2^n (original mean before hashing)
δ = p_0 - 1/2^m
Optimal XOR Hashing for Linear
Distribution
δ vs. P for Linear Distribution

Optimal solution comes from all groups
XORing more than 1 bit
Simulation



Goal: Demonstrate that lower δ leads to
better search performance
Hashing: map from 2^n to 2^m
Each simulation performs 2^m hash lookups
Simulation

Three performance measurements
Number of Empty Bins (NEB)
 Average maximum Search Length (ASL)
 Maximum Search Length (MSL)
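
A rough sketch of how the three measurements might be computed for one simulation run follows. The slides do not define the metrics precisely, so the definitions here are assumptions: a bin's search length is the number of lookups that landed in it, ASL is the mean of that length over non-empty bins, and MSL is the largest such length. The names `bin_metrics` and `simulate`, the linear weighting (address + 1), and the placeholder hash in the usage example are all hypothetical.

```python
import random
from collections import Counter


def bin_metrics(bin_indices, num_bins):
    """Compute NEB, ASL and MSL from the bin index of each lookup.

    ASSUMPTION: search length of a bin = number of entries chained in it;
    ASL averages over non-empty bins only.
    """
    counts = Counter(bin_indices)
    chain_lengths = [counts[b] for b in range(num_bins) if counts[b] > 0]
    neb = num_bins - len(chain_lengths)
    msl = max(chain_lengths, default=0)
    asl = sum(chain_lengths) / len(chain_lengths) if chain_lengths else 0.0
    return {"NEB": neb, "ASL": asl, "MSL": msl}


def simulate(hash_fn, n, m, seed=0):
    """Draw 2^m addresses from an assumed linear distribution and hash them."""
    rng = random.Random(seed)
    addresses = list(range(1 << n))
    weights = [x + 1 for x in addresses]  # ASSUMPTION: linear in the address
    sample = rng.choices(addresses, weights=weights, k=1 << m)
    return bin_metrics([hash_fn(a) for a in sample], 1 << m)


if __name__ == "__main__":
    print(bin_metrics([0, 0, 1, 3], num_bins=4))
    # Placeholder hash: keep the low 2 bits of a 4-bit address.
    print(simulate(lambda a: a & 0b11, n=4, m=2))
```

Passing different hashing functions as `hash_fn` with the same seed gives a like-for-like comparison of the three measurements.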

Simulation

Improvement from best
δ over worst δ



NEB: 18%
ASL: 12%
MSL: 17%
Simulation
Future Work



Find optimal XOR hashing for exponential
distribution and partial linear distribution
Investigate in more depth which
applications exhibit a linear distribution
Find performance gain of using this hashing
scheme in an intrusion detection system
Conclusion




Network applications demonstrate non-uniform distributions, making known search
techniques less than optimal
Linear distribution can benefit from the XOR
folding property
Optimal XOR grouping can be easily
identified to minimize error in hashing
distribution
Theory in linear case can be applied to other
non-uniform distributions
