






Material Type: Notes; Class: Distributed Software Develop; Subject: Computer Science; University: University of San Francisco (CA); Term: Unknown 1989;
Typology: Study notes
Chris Brooks
Department of Computer Science
University of San Francisco
Department of Computer Science — University of San Francisco – p. 1/
??
Time is a big challenge in systems that don’t share a clock.
Insight: we often don’t need to know the exact time that events occur.
Instead, we need to know the order in which they happened.
Cause and effect can be used to produce a partial ordering.
Local events are ordered by identifier.
Send and receive events are ordered.
If $p_1$ sends a message $m_1$ to $p_2$, $send(m_1)$ must occur before $receive(m_1)$.
Assume that messages are uniquely identified.
If two events do not influence each other, even indirectly, we won’t worry about their order.
The happens before relation is denoted $\rightarrow$.
Happens before is defined:
If $e_i^k$, $e_i^l$ and $k < l$, then $e_i^k \rightarrow e_i^l$ (sequentially ordered events in the same process)
If $e_i = send(m)$ and $e_j = receive(m)$, then $e_i \rightarrow e_j$ (send must come before receive)
If $e \rightarrow e'$ and $e' \rightarrow e''$, then $e \rightarrow e''$ (transitivity)
If $e \not\rightarrow e'$ and $e' \not\rightarrow e$, then we say that $e$ and $e'$ are concurrent ($e \| e'$).
These events are unrelated, and could occur in either order.
Happens before provides a partial ordering over the global history: $(H, \rightarrow)$.
We call this a distributed computation.
A distributed computation can be represented with a space-time diagram.
[Figure: space-time diagram of processes $p_1$, $p_2$, $p_3$, each with four events $e_i^1, e_i^2, e_i^3, e_i^4$ on its timeline.]
Arrows indicate messages sent between processes.
Causal relation between events is easy to detect.
Is there a directed path between events?
$e_1^1 \rightarrow e_3^4$
$e_1^2 \| e_3^1$
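The directed-path test can be sketched as a graph search. This is a minimal illustration, not the diagram from the slide: events are written as (process, index) pairs, and the two message edges are hypothetical, chosen only so that the two relations stated above hold.

```python
from collections import deque

def build_graph(events_per_process, messages):
    """Edge u -> v means u directly precedes v (local order or send/receive)."""
    graph = {}
    for p, count in events_per_process.items():
        for k in range(1, count + 1):
            graph.setdefault((p, k), [])
            if k < count:
                graph[(p, k)].append((p, k + 1))  # next event in the same process
    for send, recv in messages:
        graph[send].append(recv)                  # send happens before receive
    return graph

def happens_before(graph, a, b):
    """True if a directed path with at least one edge leads from a to b."""
    queue = deque(graph[a])
    seen = set(graph[a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def concurrent(graph, a, b):
    """Neither event can reach the other."""
    return not happens_before(graph, a, b) and not happens_before(graph, b, a)

# Hypothetical messages: p1's first event sends to p2, p2's third event sends to p3.
graph = build_graph({1: 4, 2: 4, 3: 4},
                    [((1, 1), (2, 2)), ((2, 3), (3, 4))])
print(happens_before(graph, (1, 1), (3, 4)))  # path runs through p2
print(concurrent(graph, (1, 2), (3, 1)))      # no path in either direction
```

Breadth-first search is used here only for simplicity; any reachability algorithm gives the same answer.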
Recall that we want to know what the global state of the system is at some point in time.
Active monitoring won’t work
- Updates from different processes may arrive out of order.
We need to restrict our monitor to looking at consistent cuts
A cut $C$ is consistent if, for all events $e$ and $e'$:
- $(e \in C) \wedge (e' \rightarrow e) \Rightarrow e' \in C$
In other words, we retain causal ordering and preserve the ’happens before’ relation.
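The consistency condition can be checked mechanically. The sketch below assumes a cut is given as a frontier (how many events of each process it includes) and that messages are listed as (send event, receive event) pairs; both representations, and the sample messages, are assumptions made for the example.

```python
def in_cut(cut, event):
    """cut maps process -> number of events included; event is (process, index)."""
    process, index = event
    return index <= cut.get(process, 0)

def is_consistent(cut, messages):
    """A frontier cut is consistent if no message is received inside it
    but sent outside it.

    A frontier already contains every earlier event of each process, so
    checking the send/receive pairs covers the whole happens-before relation.
    """
    return all(in_cut(cut, send)
               for send, recv in messages
               if in_cut(cut, recv))

# Hypothetical messages between three processes.
messages = [((1, 1), (2, 2)), ((2, 3), (3, 4))]
print(is_consistent({1: 1, 2: 2, 3: 0}, messages))  # receive and its send both included
print(is_consistent({1: 0, 2: 2, 3: 0}, messages))  # receive included, send missing
```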
How could we solve this problem with synchronous communication and a global clock?
Assume FIFO delivery, delays are bounded by $\delta$
$send(i) \rightarrow send(j) \Rightarrow deliver(i) \rightarrow deliver(j)$
Receiver must buffer out-of-order messages.
Each event $e$ is stamped with the global clock: $RC(e)$
When a process notifies $p_0$ of event $e$, it includes $RC(e)$ as a timestamp.
At time $t$, $p_0$ can process all messages with timestamps up to $t - \delta$ in increasing order.
No earlier message can arrive after this point.
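A minimal sketch of the monitor $p_0$ under these assumptions follows. The class and method names are hypothetical; the only logic taken from the slide is "buffer everything, then release timestamps up to $t - \delta$ in increasing order".

```python
import heapq

class Monitor:
    """Buffers notifications and releases them in timestamp order."""

    def __init__(self, delta):
        self.delta = delta   # assumed bound on message delay
        self.buffer = []     # min-heap of (timestamp, event)

    def notify(self, timestamp, event):
        """Called when a notification stamped with RC(e) arrives."""
        heapq.heappush(self.buffer, (timestamp, event))

    def process_until(self, now):
        """Release every buffered event with timestamp <= now - delta."""
        released = []
        while self.buffer and self.buffer[0][0] <= now - self.delta:
            released.append(heapq.heappop(self.buffer))
        return released

monitor = Monitor(delta=2)
monitor.notify(5, "b")   # arrives first but was stamped later
monitor.notify(3, "a")
print(monitor.process_until(now=6))  # only timestamp 3 is safe to process
print(monitor.process_until(now=8))  # timestamp 5 becomes safe
```

The heap stands in for the receiver-side buffer; it lets out-of-order arrivals be released in increasing timestamp order.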
If we assume a delay of $\delta$, at time $t$, all messages sent before $t - \delta$ have arrived.
By processing them in increasing order, causality is preserved.
$e \rightarrow e' \Rightarrow RC(e) < RC(e')$
But we don’t have a global clock!!
Each process maintains a logical clock ($LC$).
Maps events to natural numbers (0, 1, 2, 3, ...).
In the initial state, all LCs are 0.
Each message $m$ contains a timestamp $TS(m)$ indicating the logical clock of the sending process.
After each event, the logical clock for a process is updated as follows:
- $LC(e) = LC + 1$ if $e$ is a local or send event.
- $LC(e) = \max(LC, TS(m)) + 1$ if $e = receive(m)$.
The LC is updated to be greater than both the previous clock and the timestamp.
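The update rules fit in a few lines of code. This sketch assumes the usual logical clock convention that local and send events advance the clock by one; the class and method names are illustrative only.

```python
class LogicalClock:
    """Logical clock for one process; starts at 0."""

    def __init__(self):
        self.value = 0

    def tick(self):
        """Local event: advance the clock by one (assumed convention)."""
        self.value += 1
        return self.value

    def send(self):
        """Send event: advance the clock and return the timestamp to attach."""
        return self.tick()

    def receive(self, timestamp):
        """Receive event: jump past both the local clock and the timestamp."""
        self.value = max(self.value, timestamp) + 1
        return self.value

p1, p2 = LogicalClock(), LogicalClock()
p1.tick()               # p1's clock is now 1
ts = p1.send()          # p1's clock is now 2, message carries 2
p2.tick()               # p2's clock is now 1
print(p2.receive(ts))   # max(1, 2) + 1
```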
[Figure: space-time diagram of $p_1$, $p_2$, $p_3$ with each event labeled by its logical clock value (1 through 7).]
Causal delivery gives us almost all of the functionality that we need from a global clock.
We can build on top of this to solve more complex coordination problems.
Coordination often requires not only that all processes agree on state, but that all processes can ensure that every other process sees the same state.
A fundamental problem in distributed systems is getting a set of processes or nodes to agree on one or more values.
- Is a procedure continuing or aborted?
- What value is stored in a distributed database?
- Which process is serving as coordinator?
- Has a node failed?
There is a set of related problems that require a set of processes to coordinate their states or actions.
An example:
Two people (A and B) want to meet at dusk tomorrow evening at a local hangout.
Each wants to show up only if the other one will be there.
They can send email to each other, but email may not arrive.
Can either one guarantee that the other will be there?
We’ll want to distinguish what sorts of failures these algorithms can tolerate.
No failure
Some of the algorithms we’ll see can’t tolerate a failure.
Crash failure
This means that a node stops working and fails to respond to all messages.
Byzantine failure
A node can exhibit arbitrary behavior.
This makes things pretty hard for us ...
How can we detect whether a failure has happened?
A simple method:
- Every $t$ seconds, each process sends an “I am alive” message to all other processes.
- Process $p$ knows that process $q$ is either unsuspected, suspected, or failed.
If $p$ sees $q$’s message, it knows $q$ is alive, and sets its status to unsuspected.
What if it doesn’t receive a message?
Depends on our communication model.
Synchronous communication: if after $d$ seconds (where $d$ is the maximum delay in message delivery) we haven’t received a message from $p$, $p$ has failed.
Asynchronous or unreliable communication: if the message is not received, we can say that $p$ is suspected of failure.
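The heartbeat scheme and its two interpretations can be sketched as follows. The class is hypothetical, and the timeout of $t + d$ since the last heartbeat is an assumption about how "after $d$ seconds" is measured: the next heartbeat is sent at most $t$ after the previous one and takes at most $d$ to arrive.

```python
class FailureDetector:
    """Tracks heartbeats and classifies each peer."""

    def __init__(self, interval, max_delay, synchronous):
        self.timeout = interval + max_delay  # assumed: t + d since last heartbeat
        self.synchronous = synchronous       # True if delays really are bounded by d
        self.last_seen = {}                  # process -> arrival time of last heartbeat

    def heartbeat(self, process, now):
        """Record an "I am alive" message from process."""
        self.last_seen[process] = now

    def status(self, process, now):
        last = self.last_seen.get(process, float("-inf"))
        if now - last <= self.timeout:
            return "unsuspected"
        # A missed deadline proves failure only if the delay bound holds.
        return "failed" if self.synchronous else "suspected"

fd = FailureDetector(interval=1.0, max_delay=0.5, synchronous=False)
fd.heartbeat("q", now=10.0)
print(fd.status("q", now=11.0))  # heartbeat still within the timeout
print(fd.status("q", now=12.0))  # deadline missed on an asynchronous network
```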
Other problems:
What if $d$ is fairly large?
We can think processes are still running that have in fact crashed.
This is what’s called an unreliable failure detector.
It will make mistakes, but, given enough information, it may still be of use.
Can provide hints and partial information.
As we look at different algorithms, we’ll need to think about whether we can detect that a process has failed.
The Coulouris chapter talks quite a bit about how to achieve different properties with multicast communication.
- Reliable multicast
- Ordered multicast
  - FIFO ordering
  - Total ordering
  - Causal ordering
The punchline: Totally ordered multicast is equivalent to the consensus problem.
Consider that a process needs to send a message to a group of other processes.
It could:
Send a point-to-point message to every other process.
- Inefficient, plus need to know all other processes in group.
Broadcast to all processes in subnet.
- Wasteful, won’t work in wide-area network.
Multicast allows the process to do a single send. Packet is delivered to all members of the group.
Notice that multicast is a packet-oriented communication.
Same send/receive semantics as UDP
A process joins a multicast group (designated by an IP address)
It then receives all messages sent to that IP address.
Groups can be closed or open.
Multicast can be effectively used to do shared whiteboards, video or audio conferencing, or to broadcast speeches or presentations.
Middleware needed to provide ordering.
Mutual exclusion is a familiar problem from operating systems.
- There is some resource that is shared by several processes.
- Only one process can use the resource at a time.
- Shared file, database, communications medium
Processes request to enter their critical section, then enter, then exit.
In a centralized system, this can be negotiated with shared objects (locks or mutexes).
Distributed systems rely only on message passing!
Our goals for mutual exclusion:
safety: Only one process uses the resource at a time.
liveness: everyone eventually gets a turn.
- This implies no deadlock or starvation.
ordering: if process $i$’s request to enter its CS happens-before (in the causal sense) process $j$’s, then process $i$ should enter first.
Example: consider $p_1$, $p_2$, $p_3$.
$p_3$ doesn’t need CS.
$T(p_1) > T(p_2)$
$p_1$ and $p_2$ request CS.
$p_3$ replies immediately to both.
When $p_2$ gets $p_1$’s request, it queues it.
$p_1$ replies to $p_2$ immediately.
Once $p_2$ exits, it replies to $p_1$.
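The behavior in this example is consistent with a timestamp-based rule in which a process defers its reply when its own pending request is earlier. The sketch below is an assumption about that rule, not the slide's own pseudocode: the state names, the (timestamp, process id) request format, and the concrete timestamps are all hypothetical.

```python
def on_request(my_state, my_request, incoming_request):
    """Decide how to answer an incoming request for the critical section.

    my_state is "released", "wanted", or "held" (hypothetical names).
    Requests are (timestamp, process_id) tuples, so ties on the
    timestamp are broken by process id.
    """
    if my_state == "held":
        return "queue"    # in the CS: answer after exiting
    if my_state == "wanted" and my_request < incoming_request:
        return "queue"    # own request is earlier: it goes first
    return "reply"        # not competing, or the other request is earlier

# Hypothetical timestamps chosen so that p2's request is the earlier one.
p1_request = (41, 1)
p2_request = (34, 2)
print(on_request("released", None, p1_request))      # p3 answering p1
print(on_request("wanted", p2_request, p1_request))  # p2 answering p1
print(on_request("wanted", p1_request, p2_request))  # p1 answering p2
```

Comparing whole tuples rather than bare timestamps gives a total order on requests, which is what keeps two processes from deferring to each other.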
Provides liveness, safety
Also provides ordering
- That’s the reason for logical clocks.
Still can’t deal with failure.
Also scaling problems.
Optimization: can enter the CS when a majority of replies are received.
If a failure occurs, it must first be detected.
As we’ve seen, this can be difficult.
Once failure is detected, a new group can be formed and the protocol restarted.
Group formation involves a two-phase protocol.
Coordinator broadcasts group change to all members.
Once all reply, a commit is broadcast to all members.
Once all members reply to the commit, a new group is formed.
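The two phases can be sketched from the coordinator's side. Everything here is illustrative: `send` is a hypothetical transport callback that returns True when a member acknowledges, and aborting on a missing reply is an assumption, since the slide does not say what happens in that case.

```python
def form_group(members, send):
    """Run the two-phase group change; return the new group or None.

    send(member, message) -> bool is a hypothetical transport call that
    reports whether the member replied.
    """
    # Phase 1: announce the group change and wait for every reply.
    if not all(send(m, ("group-change", tuple(members))) for m in members):
        return None  # assumed behavior: give up if any member is silent
    # Phase 2: broadcast the commit and wait for every reply again.
    if not all(send(m, ("commit", tuple(members))) for m in members):
        return None
    return set(members)

log = []
def always_ack(member, message):
    log.append((member, message[0]))
    return True

print(form_group(["a", "b"], always_ack))
print(log)  # both group-change messages precede both commits
```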
How can we decide which process should play the role of server or coordinator?
We need all processes to agree.
We can do this by means of an election.
Any process can start an election, for example, if it notices that the coordinator has failed.
We would still like safety (only one process is chosen) and liveness (the election process is guaranteed to find a winner).
Even when more than one election is started simultaneously.
Assume each process has an identifying value.
Largest value will be the new leader.
- We could use load, or uptime, or a random number.
Assume processes are arranged in a logical ring.
A process starts an election by placing its identifier and value in a message and sending it to its neighbor.
When a message is received:
If the value is greater than its own, it saves the identifier and forwards the value to its neighbor.
Else if the receiver’s value is greater and the receiver has not participated in an election already, it replaces the identifier and value with its own and forwards the message.
Else if the receiver has already participated in an election, it discards the message.
If a process receives its own identifier and value, it knows it is elected. It then sends an elected message to its neighbor.
When an elected message is received, it is forwarded to the next neighbor.
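A single election on the ring can be simulated in a few lines. The sketch assumes distinct values, no failures, and one initiator, and it uses list indices as process identifiers; it stops once the winner recognizes its own identifier and omits the circulation of the elected message.

```python
def ring_election(values, starter):
    """Simulate one election; values[i] is the value of process i.

    Process i's neighbor is (i + 1) % n. Returns the index of the
    elected process.
    """
    n = len(values)
    participated = [False] * n
    participated[starter] = True
    message = (starter, values[starter])   # (identifier, value)
    current = (starter + 1) % n
    while True:
        ident, value = message
        if ident == current:
            return current                 # own identifier came back: elected
        if value > values[current]:
            participated[current] = True   # forward the message unchanged
        elif not participated[current]:
            participated[current] = True
            message = (current, values[current])  # substitute own identifier and value
        else:
            return None                    # discard; not reached with a single election
        current = (current + 1) % n

print(ring_election([3, 7, 5, 2], starter=0))  # process with value 7 wins
```

With one initiator the message always carries the largest value seen so far, so it returns to the process holding the global maximum after at most two trips around the ring.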
Safety is guaranteed - only one value can be largest and make it all the way through the ring.
Liveness is guaranteed if there are no failures.
Inability to handle failure once again ...
The bully algorithm can deal with crash failures.
Assumption: synchronous, reliable communication
When a process notices that the coordinator has failed, it sends an election message to all higher-numbered processes.
If no one replies, it declares itself coordinator and sends a new-coordinator message to all processes.
If someone replies, its job is done.
When process $q$ receives an election message from a lower-numbered process:
Return a reply.
Start an election.
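The outcome of these rules can be traced with a short recursive sketch. It models crashes as absence from a set of live process numbers and treats replies as instantaneous, which leans on the synchronous, reliable communication assumption; the function name and representation are hypothetical.

```python
def bully_election(initiator, alive):
    """Return the coordinator chosen when initiator starts an election.

    alive is the set of process numbers that have not crashed; crashed
    processes never reply.
    """
    higher = [q for q in sorted(alive) if q > initiator]
    if not higher:
        return initiator   # no one replies: declare self coordinator
    # Each higher process that replies starts its own election;
    # following any one of them leads to the same winner.
    return bully_election(higher[0], alive)

# Processes 4 and 6 have crashed; process 2 notices the coordinator is gone.
print(bully_election(2, alive={1, 2, 3, 5}))  # highest live process wins
```

In a real run every replying process starts its own election concurrently; the recursion follows just one of those chains because they all end at the highest-numbered live process.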
Guarantees safety and liveness.
Can deal with crash failures
Assumes that there is bounded message delay
Otherwise, how can we distinguish between a crash and a long delay?
All of these algorithms are examples of the consensus problem.
- All processes must agree on a state
Let’s take a step back and think about when the consensus problem can be solved.
We’ll start with a set of processes $p_1, p_2, \ldots, p_n$.
All processes can propose a value, and everyone must agree at the end.
We’ll assume that communication is reliable.
Processes can fail.
Both Byzantine and crash failures.
We’ll also specify whether processes can digitally sign messages.
This limits the damage Byzantine failures can do.
We’ll specify whether communication is synchronous or asynchronous.
We can survive Byzantine failures of fewer than 1/3 of the processes in a synchronous system with reliable delivery.
In an asynchronous system, we can’t guarantee consensus after a single crash failure.
Without reliable communication, consensus is impossible to guarantee.
In general, we can trade off process reliability for network reliability.
Consensus can take a number of forms:
- Mutual exclusion
- Leader election
- Consensus
Many special-purpose algorithms exist.
General results about what is possible can help in designing a system or deciding how (or whether) to tackle a problem.