Optimal XOR Hashing for a Linearly Distributed Address
Optimal XOR Hashing for a
Linearly Distributed Address
Lookup in Computer Networks
Christopher Martinez, Wei-Ming Lin, Parimal Patel
The University of Texas at San Antonio
October 28, 2005
Outline
Motivation
Hashing Background
Linear Distribution
Optimal Hashing
Simulation
Conclusion
Motivation
All network applications require some
searching
Switches, routers and intrusion detection systems
require searching of IP addresses or subnet IDs
Searching should be based on distribution of
the records in the database
For computer networks, searching needs to
be real-time
Motivation (cont.)
A capture of network
traffic shows the non-uniform distribution of
class C IP addresses
Since IP addresses
entering the network
are non-uniformly distributed,
searching should take
this into account
Hashing Background
Straightforward sequential
searching impractical for
large databases
Hashing reduces the
database into small
subsets
Searching subsets
reduces search time
Predictable time needed
for real-time applications
Hashing Background
Hashing algorithms are well researched; we
look to provide new insight based on the
probability distribution
This work is not concerned with collisions; each
hashing key will have the same number of
collisions in a linked list
Hashing informed by the probability distribution should
limit the average number of searches in the
linked list
Hashing: Non-uniform Distribution
Linear Distribution
From our captured network traffic we can
approximate the non-uniform distribution by a
linear probability distribution function
XOR Hashing For Linear Distribution
We wanted a
straightforward hashing
scheme that can be used
for any size database and
hashing space
Define the hashing function
as P = (g_{m-1}, g_{m-2}, …, g_0)
Measure hashing functions
against each other by the
value δ
δ measures how close to
uniform the hashed distribution is
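A minimal sketch of such a grouped XOR hash, under assumptions the slides do not spell out: each g_i is read as the number of address bits XORed together to form hash bit i, the groups are contiguous bit fields, and g_{m-1} covers the most significant bits. The name xor_group_hash is hypothetical.

```python
def xor_group_hash(address, groups):
    """Hash an n-bit address to m bits, with groups = (g_{m-1}, ..., g_0).

    Assumption: the groups are contiguous bit fields, most significant
    first, and each group is XOR-folded into a single output bit.
    """
    shift = sum(groups)          # n, the total number of address bits
    result = 0
    for g in groups:
        shift -= g
        field = (address >> shift) & ((1 << g) - 1)   # extract this group
        parity = bin(field).count("1") & 1            # XOR of its bits
        result = (result << 1) | parity
    return result


# 4-bit address 1011 with P = (2, 2): 1^0 = 1 and 1^1 = 0
print(xor_group_hash(0b1011, (2, 2)))  # → 2 (binary 10)
```

Because P only lists group sizes, the same function covers any database size n and hashing space m as long as the sizes sum to n.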
XOR Hashing for Linear Distribution
4-bit to 2-bit Example
P=(2,2)
XOR Hashing for Linear Distribution
4-bit to 2-bit Example
P=(3,1)
XOR Hashing for Linear Distribution
4-bit to 2-bit Example
P=(1,3)
XOR Hashing Observation
Observations:
g_i > 1: leads to equal partitioning
g_i = 1: leads to unequal partitioning
δ: difference between highest hash
distribution density and mean
To find δ: we need to determine the highest
final hash distribution density
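These observations can be checked numerically for the 4-bit to 2-bit case. The sketch below assumes a discrete linear distribution in which address x has probability proportional to x + 1, and the same contiguous, most-significant-first grouping convention; both are illustrative assumptions, not taken from the slides. Exact fractions avoid rounding noise.

```python
from fractions import Fraction


def xor_group_hash(address, groups):
    # Assumed convention: contiguous groups, most significant first,
    # each group XOR-folded into one hash bit.
    shift = sum(groups)
    result = 0
    for g in groups:
        shift -= g
        field = (address >> shift) & ((1 << g) - 1)
        result = (result << 1) | (bin(field).count("1") & 1)
    return result


def delta(groups):
    """Highest hashed bin probability minus the uniform mean 1/2^m."""
    n, m = sum(groups), len(groups)
    weights = [x + 1 for x in range(1 << n)]   # assumed linear shape
    total = sum(weights)
    bins = [Fraction(0)] * (1 << m)
    for address, w in enumerate(weights):
        bins[xor_group_hash(address, groups)] += Fraction(w, total)
    return max(bins) - Fraction(1, 1 << m)


for groups in [(2, 2), (3, 1), (1, 3)]:
    print(groups, delta(groups))
# (2, 2) 0
# (3, 1) 1/68
# (1, 3) 2/17
```

Under these assumptions the grouping with every g_i > 1 hashes to an exactly uniform result, while either grouping containing a single-bit group leaves a positive δ.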
Optimal XOR Hashing for Linear
Distribution
Hashing consists of m steps (from step m-1
to step 0)
p_i: highest density value after step i
Derive p_i from p_{i+1} at step i
p_m = A = 1/2^n (original mean before hashing)
δ = p_0 − 1/2^m
Optimal XOR Hashing for Linear
Distribution
δ vs. P for Linear Distribution
Optimal solution comes from all groups
XORing more than 1 bit
Simulation
Goal: Demonstrate that lower δ leads to
better search performance
Hashing: map from 2^n to 2^m
Each simulation performs 2^m hash lookups
Simulation
Three performance measurements
Number of Empty Bins (NEB)
Average maximum Search Length (ASL)
Maximum Search Length (MSL)
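A rough sketch of how one such simulation run could be set up. The slides do not define the measurements precisely, so the following are assumptions: addresses are drawn with probability proportional to x + 1, NEB counts bins that receive no address, MSL is the longest chain, and ASL is the mean position of an address within its chain. The function simulate and its seed parameter are hypothetical.

```python
import random
from collections import Counter


def xor_group_hash(address, groups):
    # Assumed convention: contiguous groups, most significant first,
    # each group XOR-folded into one hash bit.
    shift = sum(groups)
    result = 0
    for g in groups:
        shift -= g
        field = (address >> shift) & ((1 << g) - 1)
        result = (result << 1) | (bin(field).count("1") & 1)
    return result


def simulate(groups, seed=0):
    """Run 2^m lookups and return (NEB, ASL, MSL) under the assumed definitions."""
    n, m = sum(groups), len(groups)
    rng = random.Random(seed)
    addresses = range(1 << n)
    weights = [x + 1 for x in addresses]            # assumed linear shape
    lookups = rng.choices(addresses, weights=weights, k=1 << m)
    chains = Counter(xor_group_hash(a, groups) for a in lookups)
    neb = (1 << m) - len(chains)                    # bins never hit
    msl = max(chains.values())                      # longest chain
    # A chain of length c costs 1 + 2 + ... + c probes in total.
    asl = sum(c * (c + 1) / 2 for c in chains.values()) / len(lookups)
    return neb, asl, msl


print(simulate((2, 2, 2, 2)))   # 8-bit addresses hashed to 4 bits
print(simulate((5, 1, 1, 1)))   # a grouping with single-bit groups
```

A single run is noisy; averaging over many seeds would be needed before comparing groupings.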
Simulation
Improvement from best
δ over worst δ
NEB: 18%
ASL: 12%
MSL: 17%
Simulation
Future Work
Find optimal XOR hashing for exponential
distribution and partial linear distribution
Investigate in more depth which
applications exhibit a linear distribution
Find performance gain of using this hashing
scheme in an intrusion detection system
Conclusion
Network applications demonstrate non-uniform distributions, making known search
techniques less than optimal
Linear distribution can benefit from the XOR
folding property
Optimal XOR grouping can be easily
identified to minimize error in hashing
distribution
Theory in linear case can be applied to other
non-uniform distributions