Introduction to Pragmatic Formal Modeling

Introduction
The modeling and model testing language
I want to run the examples without looking at the learning material!
What does modeling get us? (A Simple Example)

Introduction

A formal model is a mathematical description of a system, generally to be implemented in hardware or software. They are useful for two reasons. Firstly, mathematics, unlike English, is precise and unambiguous. Even if this is where you stop, it forces you to understand the system you are describing. Secondly, mathematically based models can be checked. You can describe success criteria, and if they are violated you can see the exact series of steps that led to that. This is particularly useful for distributed systems, from multiple threads on a computer to thousands of computers in a cloud service.

When you think about formal modeling, it’s easy (and intimidating) to jump straight to the most complex use cases. Sure, those algorithm geniuses who design the fundamental algorithms of distributed systems might need it, but what about me? I care about the quality of my work. But I work in industry. Maybe I work in the cloud, coordinating microservices. Maybe I’m a game developer writing the next multiplayer networking library. Maybe I’m building a peer to peer file storage solution. Regardless, the question remains:

How can I model the system I’m building in a way that’s useful, practical, and has good ROI for myself and my company?

For every example on this site, we’re going to try to take a pragmatic approach that mirrors the engineering design lifecycle. We’ll start with UML diagrams and relatively precise descriptions, and then convert them into a formal specification language. Then we’ll see how we can check for design errors, and get concrete examples as to how they occur. Finally we will show how to use the detailed model errors to progressively refine your designs.

The modeling and model testing language

TLA+ is a specification language developed by Leslie Lamport, one of the computer science greats. It has a model checker: TLC, which works in a brute force manner to check every possible state of your modelled system.

TLA+ is used in industry. AWS is one of its most enthusiastic users, along with Microsoft and Intel.

While it looks very math-y, it is surprisingly accessible and practical. TLA+ was designed as a tool for engineers, not just algorithmists. But current examples tend to fall into one of two categories: toy problems, or complex algorithms. One of my goals with this site is to present examples that show how it can be used in an engineering process for “normal” engineers.

My advice is to skim the examples and see if they resonate. Could this help you in your day-to-day? Does it pique your interest? If so, go through the learning material and revisit this post.

However if you are mathematically inclined and obsessed with complex distributed system algorithms already, I’d advise you just jump straight to the TLA+ Video Series: Leslie Lamport will be able to sell you better than I can.

I want to run the examples without looking at the learning material!

Ok, here’s how: First download VSCode TLA+

For each example you want to run:

Click Download Code. The file will look like “modelfoo.tla”
Scroll down (or hit Next Section), where you will see a Download Configuration link. Click it. The file will look like “modelfoo.cfg”
Place both files in the same folder. Make sure they have the same name (other than the extension). If necessary rename “modelfoo_small.cfg” to “modelfoo.cfg”
Open the folder in VSCode and open the TLA file
Right click inside the editor and click Check Model with TLC
An output window will open and you will get a result. Or you will get an error, probably because you didn’t rename the .cfg file

For any example that relies on additional model files, simply download them and place them in the same folder before clicking Check Model with TLC.

What does modeling get us? (A Simple Example)

We will start with a toy problem, Tic-Tac-Toe, just to demonstrate some of the core benefits of modeling and model checking. This is the only toy problem, so feel free to skip ahead to the practical part. If you want to get a bit more of an intuitive understanding, keep reading.

The Model

I advise that you look at the code both in LaTex and code form. The LaTex is generated directly from the code and can let you appreciate the math a bit better.

---------------------------- MODULE tictactoe ----------------------------

EXTENDS Naturals

VARIABLES 
    board, \* board[1..3][1..3] A 3x3 tic-tac-toe board
    nextTurn \* who goes next

Pieces == {"X", "O", "_"} \* "_" represents a blank square

Init ==
        /\ nextTurn = "X" \* X always goes first
        \* Every space in the board states blank
        /\ board = [i \in 1..3 |-> [j \in 1..3 |-> "_"]]

Move(player) ==
    \E i \in 1..3: \E j \in 1..3: \* There exists a position on the board
        /\ board[i][j] = "_" \* Where the board is currently empty
        (********************************************************************)
        (* The future state of board is the same, except a piece is in that *)
        (* spot                                                             *)
        (********************************************************************)
        /\ board' = [board EXCEPT
                        ![i][j] = player] 

MoveX == 
    /\ nextTurn = "X" \* Only enabled on X's turn
    /\ Move("X")
    /\ nextTurn' = "O" \* The future state of next turn is O

MoveO ==
    /\ nextTurn = "O" \* Only enabled on O's turn 
    /\ Move("O")
    /\ nextTurn' = "X" \* The future state of next turn is X

\* Every state, X will move if X's turn, O will move on O's turn
Next == MoveX \/ MoveO

\* A description of every possible game of tic-tac-toe
\* will play until the board fills up, even if someone won
Spec == Init /\ [][Next]_<<board,nextTurn>>

(***************************************************************************)
(* Invariants: The things we are checking for.                             *)
(***************************************************************************)

WinningPositions == {
    \* Horizonal wins
    {<<1,1>>, <<1,2>>, <<1,3>>},
    {<<2,1>>, <<2,2>>, <<2,3>>},
    {<<3,1>>, <<3,2>>, <<3,3>>},
    \* Vertical wins
    {<<1,1>>, <<2,1>>, <<3,1>>},
    {<<1,2>>, <<2,2>>, <<3,2>>},
    {<<1,3>>, <<2,3>>, <<3,3>>},
    \* Diagonal wins
    {<<1,1>>, <<2,2>>, <<3,3>>},
    {<<3,1>>, <<2,2>>, <<1,3>>}
}

Won(player) == 
    \* A player has won if there exists a winning position
    \E winningPosition \in WinningPositions:
        \* Where all the needed spaces
        \A neededSpace \in winningPosition:
            \* are occupied by one player
            board[neededSpace[1]][neededSpace[2]] = player

XHasNotWon == ~Won("X")
OHasNotWon == ~Won("O")

BoardFilled ==
    \* There does not exist
    ~\E i \in 1..3, j \in 1..3:
        \* an empty space
        LET space == board[i][j] IN
        space = "_"

\* It's not a stalemate if one player has won or the board is not filled
NotStalemate ==
    \/ Won("X")
    \/ Won("O")
    \/ ~BoardFilled

=============================================================================

Essentially, this model describes every game of Tic-Tac-Toe that could be played, including games where someone has already won, but the board isn’t full. The model checker actually shows us how many possible board/move combinations exist:

Next Section

Download Configuration

State Name	Total States	Distinct States
Init	1	1
MoveX	9963	2907
MoveO	9144	3138

It ends when there are no more spaces left to fill. None of the Next actions are activated, so the system “deadlocks.” Sometimes this behavior is an error, but for us it’s a feature. You can configure how to treat it.

Checking the Model

All right, that’s kind of cool, but so what? Say that it would be really bad if O were to win. We can check a model against the OHasNotWon invariant to make sure it’s impossible for O to win. Let’s run that now.

Next Section

Download Configuration

Oh no, it is possible for O to win. Invariant OHasNotWon is violated.

1. Initial predicate
- board
  1. : ___
  2. : ___
  3. : ___
- nextTurn
  1. : X
2. MoveX
- board
  1. : X__
  2. : ___
  3. : ___
- nextTurn
  1. : O
3. MoveO
- board
  1. : XO_
  2. : ___
  3. : ___
- nextTurn
  1. : X
4. MoveX
- board
  1. : XOX
  2. : ___
  3. : ___
- nextTurn
  1. : O
5. MoveO
- board
  1. : XOX
  2. : _O_
  3. : ___
- nextTurn
  1. : X
6. MoveX
- board
  1. : XOX
  2. : XO_
  3. : ___
- nextTurn
  1. : O
7. MoveO
- board
  1. : XOX
  2. : XO_
  3. : _O_
- nextTurn
  1. : X

Now the interesting thing is not that O can win Tic-Tac-Toe. We probably knew that. But the model checker uses Breadth-First-Search, so not only is this a possible O victory, there are no faster victories. What’s also interesting is how clearly it is presented. The data above is EXACTLY what comes out of the model checker, prettied up with generic css and a bit of annotation. Look how much clearer it is than a standard code debugger. Because we’re not debugging code, we’re debugging logic.

Let’s see what happens when X wins.

Next Section

Download Configuration

X can also win. Invariant XHasNotWon is violated.

1. Initial predicate
- board
  1. : ___
  2. : ___
  3. : ___
- nextTurn
  1. : X
2. MoveX
- board
  1. : X__
  2. : ___
  3. : ___
- nextTurn
  1. : O
3. MoveO
- board
  1. : XO_
  2. : ___
  3. : ___
- nextTurn
  1. : X
4. MoveX
- board
  1. : XO_
  2. : X__
  3. : ___
- nextTurn
  1. : O
5. MoveO
- board
  1. : XOO
  2. : X__
  3. : ___
- nextTurn
  1. : X
6. MoveX X can win in 5 moves, because X goes first
- board
  1. : XOO
  2. : X__
  3. : X__
- nextTurn
  1. : O

Finally, let’s look at a stalemate. In those past wins, it looked like the other player wasn’t trying very hard. That’s because we saw the fastest possible wins. Let’s see a potential stalemate:

Next Section

Download Configuration

Stalemate is also a thing that can happen in Tic-Tac-Toe. Invariant NotStalemate is violated.

1. Initial predicate
- board
  1. : ___
  2. : ___
  3. : ___
- nextTurn
  1. : X
2. MoveX
- board
  1. : X__
  2. : ___
  3. : ___
- nextTurn
  1. : O
3. MoveO
- board
  1. : XO_
  2. : ___
  3. : ___
- nextTurn
  1. : X
4. MoveX
- board
  1. : XOX
  2. : ___
  3. : ___
- nextTurn
  1. : O
5. MoveO
- board
  1. : XOX
  2. : O__
  3. : ___
- nextTurn
  1. : X
6. MoveX
- board
  1. : XOX
  2. : OX_
  3. : ___
- nextTurn
  1. : O
7. MoveO
- board
  1. : XOX
  2. : OX_
  3. : O__
- nextTurn
  1. : X
8. MoveX
- board
  1. : XOX
  2. : OXX
  3. X is about to win : O__
- nextTurn
  1. : O
9. MoveO
- board
  1. : XOX
  2. : OXX
  3. O must block : O_O
- nextTurn
  1. : X
10. MoveX So that a stalemate occurs
- board
  1. : XOX
  2. : OXX
  3. : OXO
- nextTurn
  1. : O

So O blocked an X win here, but there is no intelligence yet. There is another world where O didn’t block X, and it was an X win. This is just one of the possible stalemates. So in our current system it will be possible for O to win. What can we do to fix that?

Playing not to lose

Let’s imagine it is really important that O never win. If they do, the casino you work for loses millions of dollars (why they introduced high-stakes Tic-Tac-Toe is above our pay grade).

O will still play every possible game available to it. But the casino is X, meaning we can change its strategy. How do we do that? We put stricter limits on what is considered an allowable move for X:

The previous version of MoveX let X be put into any unfilled space.
In this updated version, a programmer read the WikiHow on “How to Play Tic-Tac-Toe” and made a best attempt at a strategy.

We need to prove the O will not win if this strategy is used. We do this by encoding it to TLA+ and running the same check (to see the full implementation, click Download Code or Download PDF):

---------------------------- MODULE tictactoexstrat ----------------------------

MoveX ==
    /\ nextTurn = "X" \* Only enabled on X's turn
    /\ ~Won("O") \* And X has not won
    \* This specifies the spots X will move on X's turn
    /\ \/ /\ BoardEmpty
          /\ StartInCorner
       \/ /\ ~BoardEmpty \* If it's not the start
          /\ \/ /\ CanWin
                /\ Win
             \/ /\ ~CanWin
                /\  \/ /\ CanBlockWin
                       /\ BlockWin
                    \/ /\ ~CanBlockWin
                       /\ \/ /\ CanTakeCenter
                             /\ TakeCenter
                          \/ /\ ~CanTakeCenter
                             /\ \/ /\ CanSetupWin
                                   /\ SetupWin
                                \/ /\ ~CanSetupWin
                                   /\ MoveToEmpty("X") \* No more strategies. Pick spot
    /\ nextTurn' = "O" \* The future state of next turn is O

=============================================================================

Let’s see what happens when we set it up with invariant OHasNotWon:

Next Section

Download Configuration

State Name	Total States	Distinct States
Init	1	1
MoveX	800	382
MoveO	648	488

We didn’t get an error trace! Normally that’s a cause for suspicion; did we mess up the test? But by building incrementally we can have more confidence: we’ve seen this test fail. So we did it! O wil never win. So what exactly does that mean?

The strategy isn’t optimal (i.e. wins in the smallest number of moves) or deterministic. For any given turn it may allow multiple moves. The TLC model checker tested all of those moves and ensured that O would never win, no matter which of those moves we pick. This is true no matter what O does. We will call all moves allowed by the strategy Strategy Moves.

If we wanted to maximize winning, we could run a machine learning algorithm that tries to predict which move would lead to victory. It doesn’t even need to be completely logical, it could try to psych out its opponent, or realize that people wearing red shirts are more likely to move left. As long as it only picks moves that are Strategy Moves we will never lose.

This is a silly example. But combining machine learning with logical safeguards designed and tested with TLA+ has a lot of potential. TLA+ is logical modeling; it can’t give you statistical optimization or tell you how many 9s of reliability you will get. But it can let you update algorithms confidently, knowing that critical parameters will be met.

Playing to win

“So you stopped us from losing,” says your boss, “but that was yesterday, and anyway, we’re a company of winners.” “Right on it boss,” you respond, because you know what to do next. To prove that our system is going to win, we have to first describe winning. Let’s start with what we already know.

Won("X") is true when X has Won
[]Won("X") means always Won("X").

But at the start of the game you haven’t won. This would fail. We need a new operator.

<>Won("X") means Won("X") must EVENTUALLY be true

This is what’s called a Temporal Property, a property that is measured across steps, and we can test for it. X eventually winning sounds exactly like what we want.

If a little thing called “Eventual Consistency” is important to you, don’t worry, you’ll see this again!

So let’s test our code.

A series of steps. The last one is <<Stuttering>>.

We get an error, but it’s unlike any we’ve seen before. What is a Stuttering Step? Well, so far we’ve been thinking about this one state after another. But another valid thing that can happen is nothing. And nothing can happen forever. There’s a world where one of the participants just walks away from the game board, which means X will never win. How cool is it that our modeling tool can point that out to us?

But don’t worry, our casino won’t allow that to happen. If a player can take a move, they will eventually take a move. We say that formally with the concept of Fairness. Weak Fairness (represented as WF) roughly means that if a move can be made, and that fact doesn’t change, the move will eventually be made. We define it below.

---------------------------- MODULE tictactoexwin ----------------------------


XMustEventuallyWin == <>Won("X")

Spec == Init /\ [][Next]_vars /\ WF_vars(Next)


=============================================================================

Note: Fairness isn’t something we are checking for, it’s a property of how the system works.

Great, let’s run it again.

Next Section

Download Configuration

We run into a stalemate. Which means that X won’t always eventually win.

1. Initial predicate
- board
  1. : ___
  2. : ___
  3. : ___
- nextTurn
  1. : X
2. MoveX
- board
  1. : X__
  2. : ___
  3. : ___
- nextTurn
  1. : O
3. MoveO
- board
  1. : XO_
  2. : ___
  3. : ___
- nextTurn
  1. : X
4. MoveX
- board
  1. : XO_
  2. : _X_
  3. : ___
- nextTurn
  1. : O
5. MoveO Click here to see X in a interesting position
- board
  1. : XO_
  2. : _X_
  3. : __O
- nextTurn
  1. : X
6. MoveX then click here to see X set up the win
- board
  1. : XO_
  2. : _XX
  3. : __O
- nextTurn
  1. : O
7. MoveO which is sadly foiled
- board
  1. : XO_
  2. : OXX
  3. : __O
- nextTurn
  1. : X
8. MoveX
- board
  1. : XOX
  2. : OXX
  3. : __O
- nextTurn
  1. : O
9. MoveO
- board
  1. : XOX
  2. : OXX
  3. : O_O
- nextTurn
  1. : X
10. MoveX While a stalemate still occurs, look how much harder X fought
- board
  1. : XOX
  2. : OXX
  3. : OXO
- nextTurn
  1. : O

So we didn’t succeed. X does not always eventually win. The house may not always win, but it never loses. We can live with that. But if you figured out an algorithm to make X always win, this is a way to prove it.

Tic-Tac-Toe is a solved problem, and there is no such algorithm, but don’t let that limit you.

Next: Coordinating a Database and Blob Store