Formal design of a distributed mutual exclusion protocol

We consider the problem of mutual exclusion in distributed systems which cannot provide an atomic test-and-set instruction. We use model-checking to design an algorithm that solves this problem. We start from a design that we prove incorrect. Then, we use the counter-example provided by the TLC model-checker to improve the design, until we reach an algorithm that matches the specification.

A first proposal

The solution below is a first attempt to design a distributed mutual exclusion algorithm. To make the problem simpler, we consider only 2 processes. We assume that the processes can share memory as this can be implemented in a distributed system.

EXTENDS Naturals, TLC

CONSTANTS N (* Number of processes MUST BE = 2 *)

(* PlusCal options (-sf) *)

(* --algorithm Mutex1 {
  variables flags = [ i \in 1..N |-> FALSE ];  (* flags[i]=TRUE when process i wants to enter CS *)

  process (P \in 1..N) {
  loop:    while (TRUE) {
  test:      await( \A q \in 1..N: flags[q]=FALSE );
  set:       flags[self] := TRUE;
  cs:        skip;              (* In CS *)
  release:   flags[self] := FALSE;
           }
  }

} *)

\* BEGIN TRANSLATION
\* END TRANSLATION

(* Atomic propositions *)
CS(p) == (pc[p] = "cs")
REQ(p) == (pc[p] = "test")

(* In every state, there is at most one process in the critical section *)
Mutex == [](\A p \in 1..N: \A q \in (1..N \{p}): ~(CS(p) /\ CS(q)))

(* Every process that tries to enter the critical section is eventually granted
the access to the critical section *) 
NoStarvation == \A p \in 1..N: [](REQ(p) => <>CS(p))
  1. Explain why checking properties Mutex and NoStarvation is not sufficient to guarantee correctness of our algorithm, and checking absence of deadlock if required as well?
  2. Model-check the algorithm w.r.t. to the properties Mutex, NoStarvation above, as well as absence of deadlock. Which property is violated?
  3. We propose to fix the algorithm by swapping the statements that correspond to labels test and set. Justify this proposition with respect to the counter-example provided by the model-checker.

A second proposal

The second proposal implements the modification suggested above. In order to enter the critical section, a process first advertises that it requires access to the critical section, then it checks that no other process wants to access the critical section.

EXTENDS Naturals, TLC

CONSTANTS N (* Number of processes MUST BE = 2 *)

(* PlusCal options (-sf) *)

(* --algorithm Mutex2 {
  variables flags = [ i \in 1..N |-> FALSE ];  (* flags[i]=TRUE when process i wants to enter CS *)

  process (P \in 1..N) {
  loop:    while (TRUE) {
  set:       flags[self] := TRUE;
  test:      await( \A q \in (1..N \ {self}): flags[q]=FALSE );
  cs:        skip;              (* In CS *)
  release:   flags[self] := FALSE;
           }
  }

} *)

\* BEGIN TRANSLATION
\* END TRANSLATION

(* Atomic propositions *)
CS(p) == (pc[p] = "cs")
REQ(p) == (pc[p] = "test")

(* In every state, there is at most one process in the critical section *)
Mutex == [](\A p \in 1..N: \A q \in (1..N \{p}): ~(CS(p) /\ CS(q)))

(* Every process that tries to enter the critical section is eventually granted
the access to the critical section *) 
NoStarvation == \A p \in 1..N: [](REQ(p) => <>CS(p))
  1. Model-check the algorithm w.r.t. properties Mutex and NoStarvation, as well as for absence of deadlock. Which property is violated?

Peterson’s algorithm

We consider Peterson’s mutual exclusion algorithm below (see http://en.wikipedia.org/wiki/Peterson%27s_algorithm for extra information).

Shared variables:
  - for each process: a boolean variable, flag, with the same meaning as for the
  second proposal above
  - a shared integer variable, turn, that ranges over the set of process
  identifiers (1..N)

Initial values:
  - all flags have value FALSE
  - turn is set to any value in its domain

Algorithm executed by process P in order to access the critical section:
  - set flag[P] to TRUE
  - set turn to some other process
  - wait until all flags are FALSE (except P's own flag) or turn is P

  ... critical section...

Algorithm executed by P when exiting the critical section:
  - set flag[P] to FALSE
  1. Justify the solution proposed by Peterson with respect to the counter-example provided by TLC for the second proposal above.
  2. Write a PlusCal model of Peterson’s algorithm including the requirements Mutex and NoStarvation
  3. Check that the algorithm is correct for N=2 processes.
  4. The algorithm is incorrect for 3 processes. Use the model-checker to get a counter-example. From the counter-example, tell which invariant of the algorithm is violated for more than 2 processes.

Lamport’s bakery algorithm

Lamport’s bakery algorithm (see [http://research.microsoft.com/en-us/um/people/lamport/pubs/bakery.pdf] (original publication) and http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm for extra information) is a solution to the mutual exclusion problem for any number of processes. The algorithm can be described as follows:

Shared variables:
  - for each process, a boolean variable, choosing, that tells if a process is
   currently choosing a number
  - for each process, an integer variable, number, that gives an access number

Initial values:
  - all choosing flags are set to FALSE   (i.e. initially not choosing a number)
  - all numbers have value 0              (i.e. initially no access number)

Algorithm executed by process P in order to access the critical section:
  - set choosing[P] to TRUE
  - set number[P] to 1 + the maximum number among all other processes
  - set choosing[P] to FALSE
  - for every other process Q:
    . wait until Q has obtained a number (i.e. choosing[Q] is FALSE)
    . then if Q has priority over P, then P waits until Q exits the CS. By
    priority we mean: P waits until (number[Q]=0) or
    (number[P],P) < (number[Q],Q)

  ... critical section ...

Algorithm executed by process P when exiting the critical section:
  - set number[P] back to 0

In the algorithm above, (number[P],P) < (number[Q],Q) stands for: (number[P] < number[Q]) or (number[P]=number[Q] and P < Q).

  1. Justify the solution of Lamport with respect to the counter-example to the correctness of Peterson’s algorithm with 3 processes.
  2. Explain why this algorithm cannot be model-checked with TLC, nor implemented, as is.