Election Algorithms

Topics
• Issues
• Detecting Failures
• Bully algorithm
• Ring algorithm
Readings
• Van Steen and Tanenbaum: 5.4
• Coulouris: 11.3
Election Algorithms
• Remember using Lamport clocks for total order
• Can you think of another way to do this?
• It turns out that you can use a sequencer.
  - All operations go to a sequencer
  - The sequencer assigns a number to each message before
    the message goes to each replica
  - What if the sequencer goes down?
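A minimal sketch of the sequencer idea, assuming an in-memory setup with no failures. The class names Sequencer and Replica and the method names submit and deliver are hypothetical and chosen only for illustration.

```python
from itertools import count


class Replica:
    """Hypothetical replica that records operations in delivery order."""

    def __init__(self):
        self.log = []

    def deliver(self, seq, op):
        self.log.append((seq, op))


class Sequencer:
    """Hypothetical sequencer: the single point that numbers every operation."""

    def __init__(self, replicas):
        self._numbers = count(1)      # 1, 2, 3, ...
        self._replicas = replicas

    def submit(self, op):
        seq = next(self._numbers)     # assign the sequence number first
        for replica in self._replicas:
            replica.deliver(seq, op)  # then forward to every replica
        return seq


replicas = [Replica(), Replica()]
sequencer = Sequencer(replicas)
sequencer.submit("write x")
sequencer.submit("write y")
# Both replicas now hold the same numbered log.
```

Because every operation passes through one process, all replicas see the same order. That same property makes the sequencer a single point of failure, which is what motivates electing a replacement.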
Election Algorithms
• Many distributed algorithms require a
process to act as a coordinator.
• The coordinator can be any process that
organizes actions of other processes.
• A coordinator may fail
• How is a new coordinator chosen or
elected?
Election Algorithms
Assumptions
Each process has a unique number that distinguishes
it from the others.
One process per machine (which suggests that
an IP address can be the unique identifier)
Processes know each other’s process number
Processes do not know which ones are currently
up and which ones are down.
General Approach
Locate the process with the
highest process number and designate it as the
coordinator.
Election algorithms differ in how they do this.
Issues in Dealing with
Coordinator Failure
Detecting Failure
• Any node might detect failure first
• Multiple processes might detect failure at once.
Election
• Must run without coordination
• Must deal with arbitrary process failures
• All nodes must agree on when election is over and who
the new coordinator is.
Detecting Failures
Timeouts are used to detect failures
T = 2 T_trans + T_process
• Where T_trans is the maximum transmission delay and T_process
is the maximum delay for processing a
message.
If a process fails to respond to a message
request within T seconds then an election
is initiated.
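As a small worked example of the timeout rule above, the sketch below computes T and checks whether a request has gone unanswered for too long. The function names and the millisecond values are hypothetical, picked only to make the arithmetic concrete.

```python
def failure_timeout(t_trans, t_process):
    """T = 2 * T_trans + T_process: request out, processing, reply back."""
    return 2 * t_trans + t_process


def should_start_election(sent_at, now, t_trans, t_process):
    """True if no reply has arrived within T of sending the request."""
    return now - sent_at > failure_timeout(t_trans, t_process)


# Assumed bounds: 50 ms per transmission, 20 ms processing.
T = failure_timeout(50, 20)                          # 120 ms
late = should_start_election(0, 150, 50, 20)         # True: 150 ms > 120 ms
on_time = should_start_election(0, 100, 50, 20)      # False: still within T
```

The rule is only as good as the two bounds: if the assumed maximum delays are too small, a slow but live coordinator is wrongly suspected.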
Bully Algorithm
When a process, P, notices that the
coordinator is no longer responding to
requests, it initiates an election.
P sends an ELECTION message to all processes
with higher numbers.
If no one responds, P wins the election and
becomes the coordinator.
If one of the higher-ups answers, it takes over.
P’s job is done.
Bully Algorithm
When a process gets an ELECTION
message from one of its lower-numbered
colleagues:
Receiver sends an OK message back to the
sender to indicate that it is alive and will take
over.
Receiver holds an election, unless it is already
holding one.
Eventually, all processes give up but one, and
that one is the new coordinator.
The new coordinator announces its victory by
sending all processes a message telling them
that starting immediately it is the new
coordinator.
Bully Algorithm
If a process that was previously down
comes back:
It holds an election.
If it happens to be the highest process
currently running, it will win the election and
take over the coordinator’s job.
“Biggest guy” always wins and hence the
name “bully” algorithm.
The Bully Algorithm (Example)
The bully election algorithm
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
The Bully Algorithm (Example)
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
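The example can be replayed with a short simulation. This is a sketch under simplifying assumptions, not a networked implementation: messages are delivered instantly, no process fails during the election, and the setup of eight processes numbered 0 to 7 with process 7 down is assumed for illustration. The function name bully_election is hypothetical.

```python
def bully_election(initiator, all_ids, alive):
    """Simulate one bully election.

    Returns (coordinator, messages), where messages is a list of
    (kind, sender, receiver) tuples in the order they are sent.
    """
    messages = []
    to_run = [initiator]   # processes that must hold an election
    holding = set()        # processes that have already held one
    coordinator = None
    while to_run:
        p = to_run.pop(0)
        if p in holding:
            continue       # a process holds at most one election
        holding.add(p)
        answered = False
        # p does not know who is up, so it contacts every higher id.
        for q in sorted(i for i in all_ids if i > p):
            messages.append(("ELECTION", p, q))
            if q in alive:
                messages.append(("OK", q, p))   # q is alive and takes over
                to_run.append(q)
                answered = True
        if not answered:
            coordinator = p                     # nobody higher replied
    # The winner announces itself to every other process id.
    for q in sorted(set(all_ids) - {coordinator}):
        messages.append(("COORDINATOR", coordinator, q))
    return coordinator, messages


winner, log = bully_election(4, range(8), alive={0, 1, 2, 3, 4, 5, 6})
# winner is 6: processes 5 and 6 answer 4, 6 answers 5, nobody answers 6.
```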
Bully Algorithm
Analysis
Best case
The node with second highest identifier
detects failure
Total messages = N-2
• One message for each of the other processes
indicating the process with the second highest
identifier is the new coordinator.
Worst case
The node with lowest identifier detects failure.
This causes N-1 processes to initiate the
election algorithm each sending messages to
processes with higher identifiers.
Total messages = O(N^2)
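To see where the quadratic growth comes from, the sketch below tallies messages for N processes numbered 1 to N, where process N (the old coordinator) has crashed. The counting convention is an assumption: every ELECTION, OK, and COORDINATOR message is counted in the worst case, while the best case follows the count given above and includes only the COORDINATOR announcements. The function name is hypothetical.

```python
def bully_message_counts(n):
    """Return (best, worst) message counts for n processes, coordinator n down."""
    # Best case: process n-1 detects the failure and announces itself
    # to the n-2 other live processes.
    best = n - 2
    # Worst case: process 1 detects the failure, so every live process
    # i = 1..n-1 ends up holding an election.
    election = sum(n - i for i in range(1, n))   # i contacts i+1..n
    ok = sum(n - 1 - i for i in range(1, n))     # live i+1..n-1 reply
    coordinator = n - 2                          # winner n-1 tells the rest
    worst = election + ok + coordinator
    return best, worst


best, worst = bully_message_counts(8)   # (6, 55)
```

The ELECTION term alone is N(N-1)/2, which is why the total is O(N^2).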
Bully Algorithm Discussion
How many processes are used to detect a
coordinator failure?
As many as you want. You could have all other
processes check out the coordinator.
It is impossible for two processes to be
elected at the same time.
Ring Algorithm
Use a ring (processes are physically or logically
ordered, so that each process knows who its
successor is).
Algorithm
When a process notices that the coordinator is not functioning, it:
• Builds an ELECTION message (containing its own process
number)
• Sends the message to its successor (if successor is down,
sender skips over it and goes to the next member along the ring,
or the one after that, until a running process is located).
• At each step, sender adds its own process number to the list in
the message.
Ring Algorithm
Algorithm (continued)
When the message gets back to the process that
started it all:
• Process recognizes the message because it contains its own
process number
• Changes message type to COORDINATOR
• Circulates message once again to inform everyone else
who the new coordinator is (the list member with the highest
number) and who the members of the new ring are.
• When the message has circulated once, it is removed.
Even if two ELECTIONS are started at once, everyone
will pick the same leader, since the node with the highest
identifier is picked.
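The ring steps can be sketched as a simulation. It assumes a single initiator, instant delivery, and no failures during the election; the ring of eight processes numbered 0 to 7 with process 7 down is an assumed setup, and the function name ring_election is hypothetical.

```python
def ring_election(initiator, ring, alive):
    """Simulate one ring election.

    ring is the list of process ids in ring order.
    Returns (coordinator, members, message_count).
    """
    n = len(ring)

    def next_alive(p):
        # Skip over dead successors until a running process is found.
        i = ring.index(p)
        for step in range(1, n + 1):
            q = ring[(i + step) % n]
            if q in alive:
                return q

    members = [initiator]            # the list carried in the ELECTION message
    hops = 1
    current = next_alive(initiator)
    while current != initiator:
        members.append(current)      # each process adds its own number
        current = next_alive(current)
        hops += 1
    coordinator = max(members)       # highest number in the list wins
    # The COORDINATOR message makes one more full trip around the ring.
    return coordinator, members, 2 * hops


leader, members, total = ring_election(4, list(range(8)), alive={0, 1, 2, 3, 4, 5, 6})
# leader is 6, members is [4, 5, 6, 0, 1, 2, 3], total is 14
```

With N = 8 processes and one of them down, each trip around the ring takes N-1 = 7 messages, so the run uses 14 in total.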
Ring Algorithm
Initiation:
1. Process 4 sends an
ELECTION message to
its successor (or next
alive process) with its ID
Ring Algorithm
Initiation:
2. Each process adds its own
ID and forwards the
ELECTION message
Ring Algorithm contd…
Leader Election:
3. Message comes back to initiator, here
the initiator is 4.
4. Initiator announces the winner by
sending another message around the ring
Ring Algorithm Analysis
• At best 2(N-1) messages are passed
• One round for the ELECTION message
• One round for the COORDINATOR message
• Assumes that only a single process
starts an election.
Multiple elections cause an increase in
messages but do no real harm.
Summary
Synchronization between processes often
requires that one process acts as a
coordinator.
The coordinator is not fixed.
Election algorithms determine the
coordinator.