
LOAD SPREADING
APPROACHES
David Allan, Scott Mansfield, Eric Gray, Janos Farkas
Ericsson
DISCLAIMER
› I’m not talking about TTL, nor do I have a proposal for how to encode it, process it, or place it on a baggy pants diagram, etc.
› If we believe TTL is necessary to market Ethernet
networking in the future it should not be confined
to ECMP/load spreading applications
–Hence bundling TTL and ECMP together in a single
header is potentially a mistake
› What I do want to discuss is ECMP
Ericsson | 2010-09-10 | Page 2
INTRODUCTION - ASSUMPTIONS
› We want load spreading to operate within the Ethernet
layer
– We do not have per hop “layer violations” in frame processing
– OAM can “fate share” with ECMP flows without having to
impersonate payload
› We can offer, at a minimum, a decent ping/traceroute paradigm
› This requires a frame to carry entropy information such
that an equal cost multi-path decision can be made at
each hop
– Entropy information is populated by a packet inspection function at
the network ingress: some digest of packet information
– The function that generates entropy information is designed to
produce a common value for packets in a specific flow
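The ingress function just described can be sketched minimally. This is only an illustration: the choice of flow-identifying fields (C-MAC addresses plus I-SID) and the FNV-1a-style mixing are assumptions, not part of any proposal.

```c
#include <stdint.h>

/* Minimal sketch of an ingress packet inspection function: mix the
   fields that identify a flow so every packet of that flow yields the
   same entropy value. Field choice and mixing constants are
   illustrative assumptions only. */
static uint16_t flow_entropy(uint64_t src_mac, uint64_t dst_mac, uint32_t i_sid)
{
    uint64_t h = 0xcbf29ce484222325ULL;        /* FNV-1a offset basis */
    const uint64_t prime = 0x100000001b3ULL;   /* FNV-1a prime */

    h = (h ^ src_mac) * prime;
    h = (h ^ dst_mac) * prime;
    h = (h ^ i_sid)   * prime;

    /* XOR-fold down to 12 bits so it fits a B-VID-sized field */
    h ^= h >> 32;
    h ^= h >> 16;
    return (uint16_t)(h & 0x0fff);
}
```

Because the digest depends only on the flow key, every packet of a flow maps to the same value, which is the property the slide requires.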
FIRST, THE OBVIOUS; IF THE SPREADING FUNCTION IS THE SAME AT EVERY HOP:
Common hash and common next hop count at each stage is pathological.
This is an extreme example of how correlation of entropy values and correlation of
randomization functions produces an undesirable result
The size of the entropy field will not change this
RESULT WITH A UNIQUE RANDOMIZATION FUNCTION AT EACH HOP/STAGE
Need to re-randomize each subset of traffic at each
stage in the hierarchy to use all links
HOW ENTROPY INFORMATION IN THE FRAME IS EXPECTED TO WORK
› Each node determines the next hop as a function of both
the entropy information in the frame, and locally generated
random information
– In order to try to perfectly re-randomize any arbitrary subset of flow
labels received
› It is desirable to minimize the amount of processing when
making such a decision while maintaining the quality of
randomness
› First blush is BIG is better for quantity of entropy carried in
the packet, but is this really necessary?
THOUGHT EXPERIMENT
› Question: How much entropy information do we need?
› Experiment: for a given size of entropy information field, let’s run it
through a number of stages and observe whether and how it degenerates
compared to perfection
– For 3 interfaces at each hop, perfection is 1/3, 1/9, 1/27 etc. of original
number of flows assuming we perfectly randomize at every hop
› Practical reality is we will not perfectly randomize progressive subsets of the
original complete set of flows
› The example shows the average of the CV from “perfection” for 10 tries
of 5 stages of randomization with the fan out being 3 interfaces per
stage
SIDEBAR – QUALITY OF RANDOMIZATION
› Using a function of a random number and packet entropy information for
randomization at each hop will tend to produce subsets correlated in some
fashion
› This leads to poorer randomization at each hop
– Interface = f(packet entropy, random value) % (# of interfaces)
› What we want is a non-lossy and non-correlated transform in order to
maximize randomization of any arbitrary subset of flow IDs
› Using a table lookup with the table populated with random values allows
us to produce non-lossy uncorrelated subsets with much of the work done
up front
– Interface = table [packet entropy] % (# of interfaces)
› However, the larger the packet entropy field is, the less practical a table
look up is to implement
– hence we illustrate results for both in the large entropy field case
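The correlation point can be shown directly: XOR with any per-node constant preserves the pairwise XOR-difference between labels, so every hop sees structurally related subsets, whereas a table populated by a shuffle is an arbitrary non-lossy permutation. The Fisher–Yates shuffle below is one assumed way to populate such a table.

```c
#include <stdlib.h>

/* XOR with a per-node constant r is a bijection, but it preserves the
   XOR-difference between any two labels: (a^r) ^ (b^r) == a ^ b for
   every r, so subsets stay correlated from hop to hop. */
int xor_preserves_structure(int a, int b, int r)
{
    return ((a ^ r) ^ (b ^ r)) == (a ^ b);
}

/* A table built by a Fisher-Yates shuffle is also non-lossy (a
   permutation of 0..4095) but carries no algebraic relation between
   entries, giving uncorrelated subsets at each hop. */
void build_table(int table[4096])
{
    int i;
    for (i = 0; i < 4096; i++)
        table[i] = i;
    for (i = 4095; i > 0; i--) {
        int j = rand() % (i + 1);
        int t = table[i];
        table[i] = table[j];
        table[j] = t;
    }
}
```

The permutation property is what makes the table lookup “non-lossy”: no two entropy values collapse onto the same table entry.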
HOW MUCH ENTROPY INFORMATION IS NEEDED?
[Chart: CV of load spreading vs. size of entropy label — CV on a log scale (10 down to 0.000001) plotted against the number of stages of randomization (1 to 5), with curves for “Table 6 bit”, “Table 12 bit”, “Function 20 bit”, and “Table 20 bit”]
Table methods are very sensitive to the quality of random number generation. This simulation was generated using the C rand() function; this likely can be improved upon.
ADVANTAGES OF USING THE B-VID AS A FLOW LABEL
› The CV is still < 20% of load after 5 stages of switching for 12 bits of
entropy
› Use of the VID offers backwards compatibility with older
implementations, offering some of the benefits of ECMP load spreading
while using those implementations
– They cannot do the fine grained processing of the B-VID such that each VID
value gets unique treatment as it would blow out the FDB
› But each SVL VID range normally associated with an MSTI could have a
randomly selected outgoing interface as part of FDB design
- Hence the 6 bits of entropy example on the curves
› This still requires SPB “intelligence” at nodes utilizing existing
implementations
– A completely “brain dead” node that simply behaves as a shared segment
will cause packet duplication when flooding… as all SPB/ECMP peers are
receiving promiscuously
– This is true no matter what flow label approach we adopt
ADVANTAGES OF USING THE B-VID AS A FLOW LABEL/2
› Much smaller entropy label allows randomization
transforms that produce an uncorrelated set
– Indirection table allows for superior randomization
› Much smaller entropy label simplifies troubleshooting
– A full “blind” test needs to exercise only 4094 path permutations
› 802.1ag CFM will require minimal or no changes to work
CONCLUSIONS
› We do not need a large entropy token to get acceptable
results
› There are advantages to using the B-VID as such a token
– We can offer some ECMP benefits with primarily control plane
changes now
– We can incrementally modify the technology base over time to
improve the quality of load spreading
› O(MSTI) evolving to O(B-VID) for the amount of per-hop entropy
available
– The edge function is common
› B-VID = f(entropy source)
– OAM procedures are simplified and can work with the existing
toolset
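The common edge function B-VID = f(entropy source) might look like the sketch below. The base and size of the B-VID block allocated to ECMP are purely hypothetical placeholders; only the shape of the mapping is suggested by the slides.

```c
#include <stdint.h>

/* Hypothetical constants: which block of B-VIDs is set aside for ECMP.
   These values are illustrative assumptions, not from the proposal. */
#define ECMP_VID_BASE  256   /* assumed first B-VID of the ECMP block */
#define ECMP_VID_COUNT 64    /* assumed number of B-VIDs in the block */

/* Edge function: fold an ingress flow hash into the ECMP B-VID block,
   so the B-VID itself carries the per-flow entropy. */
static uint16_t entropy_to_bvid(uint32_t flow_hash)
{
    return (uint16_t)(ECMP_VID_BASE + (flow_hash % ECMP_VID_COUNT));
}
```

Keeping the mapping confined to a configured block matters because VID values 0 and 4095 are reserved and other B-VIDs remain available for non-ECMP use.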
BACKUP
› Function method
– At node initialization:
node_random_value = rand() & 0x0fff;
– When packet processing:
interface = (B-VID ^ node_random_value) % next_hop_interface_count;
› Table method
– At node initialization:
/* populate the table */
for (i = 0; i < 4096; i++) {
    table[i] = i;
}
/* scramble the table */
for (i = 0; i < 4096; i++) {
    swap_index = rand() & 0x0fff;
    swap_value = table[i];
    table[i] = table[swap_index];
    table[swap_index] = swap_value;
}
– When packet processing:
interface = table[B-VID] % next_hop_interface_count;