Formal design of a distributed mutual exclusion algorithm

We consider the problem of mutual exclusion in distributed systems which cannot provide an atomic test-and-set instruction. We use model-checking to design an algorithm that solves this problem. We start from a design that we prove incorrect. Then, we use the counter-example provided by the TLC model-checker to improve the design, until we reach an algorithm that matches the specification.

A first proposal

The solution below is a first attempt to design a distributed mutual exclusion algorithm. To make the problem simpler, we consider only 2 processes. We assume that the processes can share memory as this can be implemented in a distributed system.

EXTENDS Naturals, TLC

CONSTANTS N (* Number of processes MUST BE = 2 *)

(* PlusCal options (-sf) *)

(* --algorithm Mutex1 {
  variables flags = [ i \in 1..N |-> FALSE ];  (* flags[i]=TRUE when process i wants to enter CS *)

  process (P \in 1..N) {
  loop:    while (TRUE) {
  test:      await( \A q \in 1..N: flags[q]=FALSE );
  set:       flags[self] := TRUE;
  cs:        skip;              (* In CS *)
  release:   flags[self] := FALSE;
           }
  }

} *)

\* BEGIN TRANSLATION
\* END TRANSLATION

(* Add the requirements as TLA+ specifications *)
  1. Add the requirements (see previous lecture) and model-check the algorithm. Which property is violated?
  2. We propose to fix the algorithm by swapping the statements that correspond to labels test and set. Justify this proposition with respect to the counter-example provided by the model-checker.

A second proposal

The second proposal implements the modification suggested above. In order to enter the critical section, a process first requires the access to the critical section, then it checks that no other process wants to access the critical section.

EXTENDS Naturals, TLC

CONSTANTS N (* Number of processes MUST BE = 2 *)

(* PlusCal options (-sf) *)

(* --algorithm Mutex2 {
  variables flags = [ i \in 1..N |-> FALSE ];  (* flags[i]=TRUE when process i wants to enter CS *)

  process (P \in 1..N) {
  loop:    while (TRUE) {
  set:       flags[self] := TRUE;
  test:      await( \A q \in (1..N \ {self}): flags[q]=FALSE );
  cs:        skip;              (* In CS *)
  release:   flags[self] := FALSE;
           }
  }

} *)

\* BEGIN TRANSLATION
\* END TRANSLATION

(* Add the requirements as TLA+ specifications *)
  1. Add the requirements and model-check the algorithm. Which property is violated?

Peterson's algorithm

We consider Peterson's mutual exclusion algorithm below (see Wikipedia page).

  Shared variables:
  - for each process: a boolean variable, flag, with the same meaning as for the
  second proposal above
  - a shared integer variable, turn, that ranges over the set of process
  identifiers (1..N)

  Initial values:
  - all flags have value FALSE
  - turn is set to any value in its domain

  Algorithm executed by process P in order to access the critical section:
  - set flag[P] to TRUE
  - set turn to some other process
  - wait until all flags are FALSE (except P's own flag) or turn is P

  ... critical section...

  Algorithm executed by P when exiting the critical section:
  - set flag[P] to FALSE
  1. Justify the solution proposed by Peterson with respect to the counter-example provided by TLC for the second proposal above.
  2. Write a PlusCal model of Peterson's algorithm including the requirements.
  3. Check that the algorithm is correct for 2 processes.
  4. The algorithm is incorrect for 3 processes. Use TLC to get a counter-example. From the counter-example, tell which invariant of the algorithm is violated for more than 2 processes.

Lamport's bakery algorithm

Lamport's bakery algorithm (see original publication, see Wikipedia page) is a solution to the mutual exclusion problem for any number of processes. The algorithm can be described as follows:

  Shared variables:
  - for each process, a boolean variable, choosing, that tells if a process is
   currently choosing a number
  - for each process, an integer variable, number, that gives an access number

  Initial values:
  - all choosing flags are set to FALSE   (i.e. initially not choosing a number)
  - all numbers have value 0              (i.e. initially no access number)

  Algorithm executed by process P in order to access the critical section:
  - set choosing[P] to TRUE
  - set number[P] to 1 + the maximum number among all other processes
  - set choosing[P] to FALSE
  - for every other process Q:
    . wait until Q has obtained a number (i.e. choosing[Q] is FALSE)
    . then if Q has priority over P, then P waits until Q exits the CS. By
    priority we mean: P waits until (number[Q]=0) or
    (number[P],P) < (number[Q],Q)

  ... critical section ...

  Algorithm executed by process P when exiting the critical section:
  - set number[P] back to 0

In the algorithm above, (number[P],P) < (number[Q],Q) stands for: (number[P]<number[Q]) or (number[P]=number[Q] and P<Q).

  1. Justify the solution of Lamport with respect to the counter-example to the correctness of Peterson's algorithm with 3 processes.
  2. Explain why this algorithm cannot be model-checked with TLC as is.

Taubenfeld's black-white bakery algorithm (optionnel)

Several solutions have been proposed to this problem, in particular a solution by Szymanski in 1988 (see Wikipedia page) and more recently, an other one by Taubenfeld in 2004 (see original publication). We study the solution by Taubenfeld that is closer to Lamport's algorithm. The idea is to use two sets of access numbers: one "black" set and one "white" set. Initially, a color is chosen, say black. All the processes that want to access the critical section obtain a black number by using the same mechanism as in Lamport's algorithm. The first process that enters the critical section switches the color to white. This has two effects. The first effect is that from now on, the new numbers obtained by processes willing to access the critical section are white. The second effect is that all the processes with a black number will enter and exit the critical section before any process with a white number enters the critical section. Then, the first process with a white number that enters the critical section switches the color to black, and the same process goes on.

The main properties of the algorithm is that if the current color is c, then all the processes that get a new number will have color c, and all the processes which have the other color will access the critical section before any process with color c. Observe that when the first process with a given color has accessed the critical section, then the new numbers start back to 1 as the color changes. Taubenfeld has proven that the numbers are bounded by the number of processes N. Here follows a more precise description of Taubenfeld's algorithm:

  Shared variables:
  - a global variable, color, of type {black, white} that gives the current
  color
  - for each process: a variable, choosing, that tells if a process is choosing
  a number
  - for each process: an integer variable, number, that contains the
  access number
  - for each process: a variable, mycolor, of type {black, white} storing the
  color of the process

  Initial values:
  - all choosing flags are set to FALSE   (i.e. initially not choosing a number)
  - all numbers have value 0              (i.e. initially no access number)
  - the initial values of the other variables are immaterial

  Algorithm executed by process P in order to access the critical section:
  - set choosing[P] to TRUE
  - set mycolor[P] to the global color
  - set number[P] to 1 + the maximum number among all other processes with the
  same color
  - set choosing[P] to FALSE
  - for every other process Q:
    . wait until Q has obtained a number (i.e. choosing[Q] is FALSE)
    . then,
      * if P and Q have the same color, the access condition is similar to
      Lamport's algorithm. More precisely, P waits until: number[Q]=0, or
      (number[P],P) < (number[Q],Q), or mycolor[P] != mycolor[Q]
      * if P and Q have different colors, then P waits until: number[Q]=0, or
      mycolor[P] != color, or mycolor[P] = mycolor[Q]

  ... critical section ...

  Algorithm executed by process P when exiting the critical section:
  - set the global color to the opposite of mycolor[P]
  - set number[P] back to 0

In the algorithm above, testing if P and Q have the same color (or not) and waiting for the corresponding access condition are two distinct atomic steps. This is why, waiting is stopped when the color of Q changes. The test (number[P],P) < (number[Q],Q) is defined as for Lamport's bakery algorithm.

  1. Write a PlusCal model of Taubenfeld's black-white bakery algorithm including the requirements. Use the fact that all numbers are bounded by N.
  2. Check that the algorithm is correct for 2 processes, and then for 3 processes.
  3. Add an extra requirement to check that all the numbers are bounded by N. Use the TLC model-checker to check that the algorithm satisfies this requirement.