Introduction to formal design of software

The goal of this class is to study how model-checking can be used to design software. In the first part, we introduce a formalism (PlusCal) and a logic (Linear Temporal Logic, LTL) for model-checking. In the second part, we study how model-checking can used to design software.

Configuration of the TLA toolbox

A local installation of the TLA toolbox is available under ~herbrete/public/TLA/toolbox (add the path to your environment variable PATH). The TLA toolbox is launched by entering the command: toolbox. The PlusCal user's manual is available at ~herbrete/public/TLA/c-manual.pdf.

A first example

We model in PlusCal a program that consists of a shared variable x, initialized to 0, and three processes that runs the following non atomic instructions:

processus Inc:    while true do   if (x < 200) then x:=x+1 end    end
processus Dec:    while true do   if (x > 0)   then x:=x-1 end    end
processus Reset:  while true do   if (x = 200) then x:=0 end      end
  
  1. Define the transition system that gives the semantics of this program.
  2. Formalize in LTL the requirement "the value of x is always between 0 and 200".
  3. Model-check the formula on the PlusCal model. From the counter-example, explain why the program does not satisfy the requirement.

Formal design of a distributed mutual exclusion protocol

We consider the problem of mutual exclusion in distributed systems which cannot provide an atomic test-and-set instruction. We use model-checking to design an algorithm that solves this problem. We start from a design that we prove incorrect. Then, we use the counter-example provided by the TLC model-checker to improve the design, until we reach an algorithm that matches the specification.

A first proposal

The solution below is a first attempt to design a distributed mutual exclusion algorithm. To make the problem simpler, we consider only 2 processes. We assume that the processes can share memory as this can be implemented in a distributed system.

EXTENDS Naturals, TLC

CONSTANTS N (* Number of processes MUST BE = 2 *)

(* PlusCal options (-sf) *)

(* --algorithm Mutex1 {
  variables flags = [ i \in 1..N |-> FALSE ];  (* flags[i]=TRUE when process i wants to enter CS *)

  process (P \in 1..N) {
  loop:    while (TRUE) {
  test:      await( \A q \in 1..N: flags[q]=FALSE );
  set:       flags[self] := TRUE;
  cs:        skip;              (* In CS *)
  release:   flags[self] := FALSE;
           }
  }

} *)

\* BEGIN TRANSLATION
\* END TRANSLATION

(* Add the requirements as TLA+ specifications *)
  1. What is the specification of a mutual exclusion algorithm?
  2. Verify the model using TLC. Which property is violated?
  3. We propose to fix the algorithm by swapping the statements that correspond to labels testcs and getcs. Justify this proposition with respect to the counter-example provided by the model-checker.

A second proposal

The second proposal implements the modification suggested above. In order to enter the critical section, a process first requires the access to the critical section, then it checks that no other process wants to access the critical section.

EXTENDS Naturals, TLC

CONSTANTS N (* Number of processes MUST BE = 2 *)

(* PlusCal options (-sf) *)

(* --algorithm Mutex2 {
  variables flags = [ i \in 1..N |-> FALSE ];  (* flags[i]=TRUE when process i wants to enter CS *)

  process (P \in 1..N) {
  loop:    while (TRUE) {
  set:       flags[self] := TRUE;
  test:      await( \A q \in (1..N \ {self}): flags[q]=FALSE );
  cs:        skip;              (* In CS *)
  release:   flags[self] := FALSE;
           }
  }

} *)

\* BEGIN TRANSLATION
\* END TRANSLATION

(* Add the requirements as TLA+ specifications *)
  1. Model-check the algorithm. Which property is violated?

Peterson's algorithm

We consider Peterson's mutual exclusion algorithm below (see Wikipedia page).

  Shared variables:
  - for each process: a boolean variable, flag, with the same meaning as for the
  second proposal above
  - a shared integer variable, turn, that ranges over the set of process
  identifiers (1..N)

  Initial values:
  - all flags have value FALSE
  - turn is set to any value in its domain

  Algorithm executed by process P in order to access the critical section:
  - set flag[P] to TRUE
  - set turn to some other process
  - wait until all flags are FALSE (except P's own flag) or turn is P

  ... critical section...

  Algorithm executed by P when exiting the critical section:
  - set flag[P] to FALSE
  1. Justify the solution proposed by Peterson with respect to the counter-example provided by TLC for the second proposal above.
  2. Check that the PlusCal model of Peterson's algorithm is correct for 2 processes using TLC.
  3. The algorithm is incorrect for 3 processes. Use TLC to get a counter-example. From the counter-example, tell which invariant of the algorithm is violated for more than 2 processes.

Lamport's bakery algorithm

Lamport's bakery algorithm (see original publication, see Wikipedia page) is a solution to the mutual exclusion problem for any number of processes. The algorithm can be described as follows:

  Shared variables:
  - for each process, a boolean variable, choosing, that tells if a process is
   currently choosing a number
  - for each process, an integer variable, number, that gives an access number

  Initial values:
  - all choosing flags are set to FALSE   (i.e. initially not choosing a number)
  - all numbers have value 0              (i.e. initially no access number)

  Algorithm executed by process P in order to access the critical section:
  - set choosing[P] to TRUE
  - set number[P] to 1 + the maximum number among all other processes
  - set choosing[P] to FALSE
  - for every other process Q:
    . wait until Q has obtained a number (i.e. choosing[Q] is FALSE)
    . then if Q has priority over P, then P waits until Q exits the CS. By
    priority we mean: P waits until (number[Q]=0) or
    (number[P],P) < (number[Q],Q)

  ... critical section ...

  Algorithm executed by process P when exiting the critical section:
  - set number[P] back to 0

In the algorithm above, (number[P],P) < (number[Q],Q) stands for: (number[P]<number[Q]) or (number[P]=number[Q] and P<Q).

  1. Justify the solution of Lamport with respect to the counter-example to Peterson's algorithm with 3 processes.
  2. Explain why this algorithm cannot be model-checked with TLC or implemented as is.
  3. Show that we can still build a PlusCal model that is equivalent to Lamport's bakery algorithm. Model-check the requirements on this PlusCal model.