Debugging a Java multi-threaded program

We consider the Java program described in This Java program consists in several threads that communicate using a single buffer. Some of the threads are producers that write to the buffer. The other threads are consumers that read data from the buffer. The buffer is implemented by synchronized methods to ensure mutual exclusion.

As explained in the document above, this program has a bug: the system may deadlock. This was detected on a production version of the application. The author has been able to reproduce the bug through testing. However, it required to let the system run for a very long time before the bug occured. This is unsatisfactory as this mostly relies on luck. Moreover, the resulting trace is so long that debugging from the trace is almost unachievable.

A first approach

The author then models the system in TLA+ and he uses the TLC model-checker to detect the bug and get a short trace leading to a deadlock. Our goal is to reproduce his methodology in a slightly different settings:

  1. Read the document at the link above. You need to fully understand the Java code, the issue that is identified by the author, and his solution

Improving the model

Our goal is to provide a PlusCal model that improves the TLA+ model of the author. Your contribution will consists in three files:

Notice that the your justifications are as valuable as your model: a model without convincing justifications will not be considered. The file will be written in the markdown file format.

  1. Which data structures need to be modeled? How do you propose to model them?
  2. Implement your data structures using PlusCal macros and definitions
  3. How do you suggest to model the behavior of the Java synchronized keyword?
  4. Model the producer and the consumer processes
  5. Specify the correct runs of the algorithm as an LTL property
  6. Check that your model violate the property in the settings given in the paper (3 producers, 2 consumers and a buffer of capacity 2)
  7. Explain why replacing the notify() calls by notifyAll() calls in the Java code fixes the bug. Why is this only a short term fix and not an actual solution?
  8. Propose a true solution, and model it in the file java_prodcons_solved.tla. Explain how your solution can be implemented in Java (we don’t ask for the Java code, but only for the principle of an implementation).
  9. Check that your solution is correct w.r.t. the specification in the previous settings. Also, check that it holds for 12 producers, 10 consumers and a buffer of capacity 10