Verification of PlusCal algorithms

Formal semantics of sequential algorithms

During the run of an algorithm, the control flows between the statements of the algorithm. The control points in PlusCal algorithms are described by labeling the statements. The control flow is represented by a variable pc (program counter) that "points" to the current label, and that is switched to the next label when a statement is executed. Euclid's algorithm may for instance be labeled as:

--algorithm EuclidAlg {
  variables u \in 1..MAXINT;  (* 1st integer *)
            v \in 1..MAXINT;  (* 2nd integer *)
 
{
l0:    print <<u, v>>;
    
l1:    while (u /= 0) {
l2:      if (u < v) {
           u := v || v := u
         };
l3:      u := u - v
       }

(* Implicit label "Done:" here *)
} }

At the beginning of the run, pc is equal to l0. Executing the print statement changes pc to l1. If the test of the while statement is true, pc moves to l2; otherwise it is set to the special label Done, which implicitly labels the terminal control point of the algorithm. From label l2, the parallel assignment is performed when the test u < v is true. In either case, the program counter moves to l3. Finally, from l3, the program counter moves back to l1 after executing the assignment.

In this framework, an algorithm is thus equipped with a set of variables that includes its global and local variables (u and v in the example above) and the program counter pc. The semantics of the algorithm describes how these variables evolve for each statement of the algorithm. Since statements may be non-deterministic, the semantics is described as a relation between the values of the variables before and after executing the statement. This relation can be described symbolically as a formula. The goal of the translation performed by the TLA+ Toolbox is to compute this formula for the given PlusCal algorithm. The translation process is described in section 3.8 of the PlusCal user manual (available at ~herbrete/public/TLA/c-manual.pdf).
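To make the idea of a relation between "before" and "after" values concrete, here is a minimal Python sketch. Python is used only for illustration; it is not what the translator produces. A state is modeled as a tuple (pc, u, v), and the function name successors is an assumption of this sketch, not part of PlusCal or TLA+. It follows the informal description of the control flow given above, and treats the print statement as having no effect on the variables.

```python
def successors(state):
    """Return the states reachable in one step from state = (pc, u, v)."""
    pc, u, v = state
    if pc == "l0":                      # print: only the program counter moves
        return [("l1", u, v)]
    if pc == "l1":                      # test of the while statement
        return [("l2", u, v)] if u != 0 else [("Done", u, v)]
    if pc == "l2":                      # conditional parallel assignment (swap)
        return [("l3", v, u)] if u < v else [("l3", u, v)]
    if pc == "l3":                      # u := u - v, then back to the loop test
        return [("l1", u - v, v)]
    return []                           # "Done": no statement left to execute


# Follow one run from the initial state pc = "l0", u = 4, v = 6.
state = ("l0", 4, 6)
trace = [state]
while successors(state):
    state = successors(state)[0]        # every state has at most one successor here
    trace.append(state)

print(trace[-1])                        # ('Done', 0, 2)
```

Each state has at most one successor because the statements of this algorithm are deterministic; a non-deterministic statement would yield several. The only non-determinism in the example is the choice of the initial values of u and v.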

The translation of Euclid's algorithm above is shown below:

\* BEGIN TRANSLATION
VARIABLES u, v, pc

vars == << u, v, pc >>

Init == (* Global variables *)
        /\ u \in 1..MAXINT
        /\ v \in 1..MAXINT
        /\ pc = "l0"

l0 == /\ pc = "l0"
      /\ PrintT(<<u, v>>)
      /\ pc' = "l1"
      /\ UNCHANGED << u, v >>

l1 == /\ pc = "l1"
      /\ IF u /= 0
            THEN /\ pc' = "l2"
            ELSE /\ pc' = "Done"
      /\ UNCHANGED << u, v >>

l2 == /\ pc = "l2"
      /\ IF u < v
            THEN /\ /\ u' = v
                    /\ v' = u
            ELSE /\ TRUE
                 /\ UNCHANGED << u, v >>
      /\ pc' = "l3"

l3 == /\ pc = "l3"
      /\ u' = u - v
      /\ pc' = "l1"
      /\ v' = v

Next == l0 \/ l1 \/ l2 \/ l3
           \/ (* Disjunct to prevent deadlock on termination *)
              (pc = "Done" /\ UNCHANGED vars)

Spec == /\ Init /\ [][Next]_vars
        /\ WF_vars(Next)

Termination == <>(pc = "Done")

\* END TRANSLATION

  1. What is defined by the tuple vars?
  2. What is defined by the formula Init?
  3. In the formulas l0, l1, l2 and l3, some variables are primed and some are not. What do these two sets of variables describe?
  4. Check that the formulas l0, l1, l2 and l3 define the semantics of the corresponding statements in the algorithm. What does UNCHANGED <<u,v>> stand for? Why is it needed?
  5. What is the purpose of formulas Next and Spec?

An overview of program verification

Correctness and termination

Consider the PlusCal model of Euclid's algorithm above. The semantics of the algorithm has been described in terms of states and transitions between states. Hence, the set of executions of a PlusCal algorithm can be seen as a graph G whose vertices are the states of the algorithm and whose edges are the transitions from state to state.

  1. The correctness of Euclid's algorithm is stated as an assertion. How can it be restated as a graph problem on G?
  2. How can the termination problem be restated as a problem on the graph G?
  3. Are these two problems decidable?

Verification algorithms

An invariant is a set of states that is stable under the transition relation: every transition that starts in the set also ends in the set. The empty set is a trivial invariant. One way to prove the correctness of an algorithm is to compute an invariant that contains the initial states and satisfies the specification (assertions, etc.), if any. For instance, an invariant for Euclid's algorithm is the set of states that guarantee that v = gcd(u_init, v_init) at the end of the algorithm. One can prove that it contains all the reachable states, and it obviously satisfies the specification.
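The stability requirement in the definition of an invariant can be checked mechanically on a bounded set of states. The Python sketch below is an illustration under stated assumptions: the bound BOUND, the helper is_stable, and the two candidate predicates non_negative and strengthened are all hypothetical names introduced here. The check only covers states with u and v in 0..BOUND, so it is a sanity test, not a proof.

```python
from itertools import product

LABELS = ["l0", "l1", "l2", "l3", "Done"]
BOUND = 6  # hypothetical bound on u and v, so that the check is finite


def successors(state):
    """One-step successors of state = (pc, u, v) for the labeled algorithm."""
    pc, u, v = state
    if pc == "l0":
        return [("l1", u, v)]
    if pc == "l1":
        return [("l2", u, v)] if u != 0 else [("Done", u, v)]
    if pc == "l2":
        return [("l3", v, u)] if u < v else [("l3", u, v)]
    if pc == "l3":
        return [("l1", u - v, v)]
    return [state]  # "Done": stuttering step, as in the Next formula


def is_stable(candidate):
    """True if every transition from a candidate state ends in a candidate state."""
    states = product(LABELS, range(BOUND + 1), range(BOUND + 1))
    return all(
        candidate(t)
        for s in states
        if candidate(s)
        for t in successors(s)
    )


def non_negative(state):
    _, u, v = state
    return u >= 0 and v >= 0


def strengthened(state):
    pc, u, v = state
    return u >= 0 and v >= 0 and (pc != "l3" or u >= v)


print(is_stable(non_negative))   # False
print(is_stable(strengthened))   # True
```

The first candidate is not stable: the state (l3, 0, 1) belongs to it, but its successor has u = -1. Adding the condition that u >= v whenever pc is l3 rules out such states and makes the set stable on the bounded domain.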

  1. Which well-known graph algorithms can be used to determine if an algorithm is correct, or, in other words, to compute an invariant that contains all the reachable states of the algorithm?
  2. Which well-known graph algorithms can be used to solve the termination problem?