
Strategies for Implementing Dynamic Load Sharing
Problems with Static Load Sharing
• The work distribution must be fixed at compile time.
• For many algorithms, the processing time of the program is unknown in advance.
• The load on each processor is unknown and can change at any time.
• If a processor finishes early, it remains idle while the others continue working.
Advantages of Dynamic Load Sharing
• Partitions of work are modified at run time.
• The hope is that load shifts during run time so that no processor sits idle until all processors run out of work.
• Sender and Receiver Based
– Depending on the algorithm, either the overloaded or the under-loaded processor can initiate the transfer of work.
Hybrid Load Sharing
• Load is initially distributed statically.
• During run time, the work distribution is
modified dynamically.
• Avoids the complexity of computing a precise partitioning of the workload up front.
Conditions for Dynamic Load Sharing to be Worthwhile
• The work at each processor must be partitionable into independent pieces as long as the processor holds more than a “minimal” amount.
• The cost of splitting work and sending it to another processor must be less than the cost of processing that work locally.
• A method of splitting the data must exist.
Receiver Initiated Dynamic Load Sharing Algorithms
Asynchronous Round Robin
• Each processor keeps a “target” variable.
• When a processor runs out of work, it sends a request to the processor named in “target”.
• Not desirable because multiple processors can end up requesting work from the same processor at nearly the same time. Depending on the network topology, reaching all processors from one node can incur high overhead.
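A minimal Python sketch of the per-processor “target” bookkeeping described above (function and variable names are illustrative assumptions, not from the slides):

```python
def make_arr_requester(num_procs, my_id):
    """Asynchronous Round Robin sketch: each processor keeps its own
    independent 'target' counter; when idle, it requests work from the
    current target and then advances the counter round-robin,
    skipping its own id."""
    state = {"target": (my_id + 1) % num_procs}

    def next_target():
        t = state["target"]
        nxt = (t + 1) % num_procs
        if nxt == my_id:                 # never target ourselves
            nxt = (nxt + 1) % num_procs
        state["target"] = nxt
        return t

    return next_target
```

Because each counter advances independently, several idle processors can hold the same target value and hit the same donor at once, which is the drawback noted above.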
Nearest Neighbor
• An idle processor will send requests to its
nearest neighbors in a round robin scheme.
• If a network has an identical distance
between processors, same as Asynchronous
Round Robin method.
• The major problem with this method is that
if there is a localized concentration of data
it will take a long time to be distributed
Global Round Robin
• Solves the problems of distributing work
evenly and of a processor receiving work
from multiple other processors.
• Global “target” variable so that workload
gets distributed relatively evenly.
• Problem is that processors will be in
contention for the target variable.
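A sketch of the shared global “target” variable, assuming a shared-memory counter protected by a lock (the lock models the contention point mentioned above; all names are illustrative):

```python
import threading

class GlobalTarget:
    """Global Round Robin sketch: a single shared counter, read and
    advanced atomically, so successive idle processors are pointed at
    different donors. The lock is the contention bottleneck."""
    def __init__(self, num_procs):
        self.num_procs = num_procs
        self._target = 0
        self._lock = threading.Lock()

    def next_target(self, requester_id):
        with self._lock:                 # every idle processor contends here
            t = self._target
            if t == requester_id:        # never target the requester itself
                t = (t + 1) % self.num_procs
            self._target = (t + 1) % self.num_procs
            return t
```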
Global Round Robin with Message Combining
• Avoids the contention problem when accessing the target variable.
• All requests to read the target value are combined at intermediate nodes.
• Has only been used in research.
[Figure: requests to read the target variable are combined at intermediate nodes; hypercube node labels 000–101 carry request counts (2, 4, x, x + 2) that accumulate toward node 000.]
Random Polling
• Simplest method of load balancing.
• An idle processor sends a work request to another processor chosen at random.
• Each processor has an equal probability of receiving a request, so the work is distributed roughly evenly.
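The random donor selection can be sketched in a few lines (names are illustrative; the shift trick simply excludes the requester’s own id):

```python
import random

def random_poll_target(num_procs, my_id, rng=random):
    """Random Polling sketch: an idle processor picks a donor
    uniformly at random from all *other* processors."""
    t = rng.randrange(num_procs - 1)     # draw from num_procs - 1 slots
    return t if t < my_id else t + 1     # shift to skip our own id
```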
Scheduler Based
• One processor is designated as the scheduler.
• The scheduler keeps a FIFO queue of processors that can donate work and forwards each work request to the processor at the head of the queue.
• A work request is never sent to a node that has no work to give.
• The disadvantage is routing time, because every request must go through the scheduler node.
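The scheduler’s FIFO matching can be sketched as follows (a simplification with illustrative names; real schemes would also re-register donors that still hold work):

```python
from collections import deque

class Scheduler:
    """Scheduler-based balancing sketch: the designated scheduler keeps
    a FIFO queue of processors known to have donatable work and matches
    each incoming request to the donor at the head of the queue."""
    def __init__(self):
        self.donors = deque()

    def register_donor(self, proc_id):
        self.donors.append(proc_id)

    def handle_request(self, requester_id):
        # Skip the requester itself if it happens to be queued.
        while self.donors and self.donors[0] == requester_id:
            self.donors.popleft()
        if not self.donors:
            return None                  # no work anywhere: requester idles
        return self.donors.popleft()     # request goes to a node with work
```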
Gradient Model
• Based on two threshold parameters: the High-Water-Mark (HWM) and the Low-Water-Mark (LWM).
• These determine whether a node’s load is light, moderate, or heavy.
• A node’s proximity is defined as the shortest distance to a lightly loaded node.
Gradient Model (cont…)
• Tasks are routed through the system toward the nearest underloaded processor.
• The gradient map may change while work is in transit through the network.
• The time to propagate updates to the gradient map can vary greatly.
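The proximity values that make up the gradient map can be sketched as a multi-source breadth-first search from every lightly loaded node (an assumed formulation with illustrative names; real implementations compute this incrementally from neighbor updates):

```python
from collections import deque

def gradient_map(adjacency, loads, lwm):
    """Gradient model sketch: each node's value is its shortest-path
    distance to the nearest lightly loaded node (load below the
    Low-Water-Mark), computed via multi-source BFS."""
    INF = float("inf")
    dist = {n: (0 if loads[n] < lwm else INF) for n in adjacency}
    queue = deque(n for n in adjacency if dist[n] == 0)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if dist[v] == INF:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```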
Receiver Initiated Diffusion
• Each processor only looks at the load information in its own domain.
• An implementation of Nearest Neighbor, but with a threshold value below which a processor requests work, so a processor should never actually become idle.
• Eventually causes each processor to hold a roughly even share of the work.
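One balancing step of this scheme can be sketched as follows (an assumed simplification: the under-threshold processor pulls half the load difference from its most loaded neighbor; names are illustrative):

```python
def rid_step(loads, adjacency, threshold):
    """Receiver Initiated Diffusion sketch: any processor whose load
    falls below `threshold` requests work from its most heavily loaded
    neighbor, which transfers half the load difference."""
    new = dict(loads)
    for node, nbrs in adjacency.items():
        if new[node] < threshold:
            donor = max(nbrs, key=lambda n: new[n])
            if new[donor] > new[node]:
                transfer = (new[donor] - new[node]) // 2
                new[donor] -= transfer
                new[node] += transfer
    return new
```

Repeating such steps spreads the work until every processor holds a roughly even share.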
Sender Initiated Dynamic Load Balancing Algorithms
Single Level Load Balancing
• Breaks tasks down into smaller subtasks.
• Each processor is responsible for more than one subtask.
• This ensures that each processor does roughly the same amount of work.
• A manager processor controls the creation and distribution of subtasks.
• Not scalable, because the manager must distribute all of the work.
Multilevel Load Balancing
• Processors are arranged in trees.
• The root processor of each tree is responsible for distributing super-subtasks to its successor processors.
• With only one tree, this behaves the same as single level load balancing.
Sender Initiated Diffusion
• Each processor sends a load update to its neighbors. If an update shows that a neighbor does not have much work, one of its neighbors sends it work.
• Similar to Receiver Initiated Diffusion.
• If one area of the network holds most of the work, it takes a long time for that work to spread across the network.
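The sender-side counterpart of the diffusion step can be sketched the same way (an assumed simplification with illustrative names: a processor that sees an under-threshold neighbor pushes half the load difference to it):

```python
def sid_step(loads, adjacency, threshold):
    """Sender Initiated Diffusion sketch: each processor inspects its
    neighbors' load updates and, on seeing a neighbor below
    `threshold`, pushes part of its own surplus to that neighbor."""
    new = dict(loads)
    for node, nbrs in adjacency.items():
        for nbr in nbrs:
            if new[nbr] < threshold and new[node] > new[nbr]:
                transfer = (new[node] - new[nbr]) // 2
                new[node] -= transfer
                new[nbr] += transfer
    return new
```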
Hierarchical Balancing Method
• Organizes the processors into balancing domains.
• Specific processors are given responsibility for balancing different levels of the hierarchy.
• A tree structure works well.
• If a branch is overloaded, it sends work to another branch of the tree; each node has a corresponding node in the opposite branch.
Dimensional Exchange Method
• Balances small domains first, then progressively larger domains.
• A domain is defined as a dimension of a hypercube.
• Can be extended to a mesh by folding the mesh into sections.
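On a hypercube of 2^d processors, the method can be sketched as d rounds of pairwise averaging, one per dimension (floats are used here for exact averaging; a sketch, not a message-passing implementation):

```python
def dimension_exchange(loads):
    """Dimensional Exchange sketch: for each dimension of the
    hypercube, every processor averages its load with the partner
    whose id differs in that single bit. After all d dimensions the
    load is balanced."""
    n = len(loads)
    d = n.bit_length() - 1
    assert 1 << d == n, "number of processors must be a power of two"
    loads = [float(x) for x in loads]
    for bit in range(d):                 # balance one dimension at a time
        for i in range(n):
            j = i ^ (1 << bit)           # partner across this dimension
            if i < j:
                avg = (loads[i] + loads[j]) / 2
                loads[i] = loads[j] = avg
    return loads
```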