Link Search Menu Expand Document

(Implementing New Requirements) A naive update

  1. Updating the design
    1. Storage Cleaner Run
  2. Modeling the design
  3. Verifying the design
    1. Summary

Updating the design

The Storage Cleaner is going to to query the blob store and get a batch of keys. It will then query the database in one query and find all the images that are missing a database entry. Then it will delete those unused keys using a batch API call.

Storage Cleaner Run

Storage CleanerStorage CleanerDatabaseDatabaseBlob StoreBlob StoreGets batch of keysReturns batch of keysQueries for unused keysReturns unused keysBatch deletes unused keysReturns statusalt[any failure]Repeats from beginning

Modeling the design

This is the first example in our modeling tasks in which the model will not match the solution 1-1.

  • Relaxing a constraint: While the design calls for batches, for simplicity’s sake we will model it as if the entire blob store keyset can fit into one batch. This hides the complexity of figuring out which items have already been checked and how large of a batch size to use; we can either handle these considerations in implementation or model them separately. In this model, we will need to handle new keys being added after we query for keys, as well as the deletion process failing before all key are deleted. This should also alert us to problems that may be introduced by batching. Note: This design decision is a judgment call that may or may not be correct, but it holds for the current examples.
  • Enhancing a constraint: Deleting from the blob store will be modeled as a one by one operation, even though it is submitted in one API call. This is because blob stores don’t provide transactions. A batch delete may happen over the course of time.

The Storage Cleaner state diagram looks like this:

WaitingCleanerStartGetBlobKeysCleanerGetUnusedKeysCleanerFailCleanerDeletingKeysCleanerFinished

Only the core additions to the spec are shown here. Click Download Code or Download PDF to see the whole thing.

Show Code Show LaTex
---------------------------- MODULE storagecleanernaive ----------------------------

(***************************************************************************)
(* Cleaner States                                                          *)
(***************************************************************************)

CleanerStartGetBlobKeys(c) ==
    LET current == cleanerStates[c] IN
    \* Starts only from waiting
    /\ current.state = "waiting"
    /\ cleanerStates' = [
        cleanerStates EXCEPT
            ![c].state = "got_blob_keys",
            \* All keys that are set in blockstore
            ![c].blobKeys = {k \in UUIDS: blobStoreState[k] # "UNSET"}
        ]
    /\ UNCHANGED <<serverStates, databaseState, blobStoreState, operations>>

CleanerGetUnusedKeys(c) ==
    LET current == cleanerStates[c] IN
    \* From blob keys, get unused keys from database
    /\ current.state = "got_blob_keys"
    /\ cleanerStates' = [
        cleanerStates EXCEPT
            ![c].state = "got_unused_keys",
            ![c].unusedBlobKeys = 
                {k \in current.blobKeys: \* Keys in blob keys
                    \A u \in USERIDS: \* That are not in the database
                        databaseState[u].imageId # k}
        ]
    /\ UNCHANGED <<serverStates, databaseState, blobStoreState, operations>>

CleanerDeletingKeys(c) ==
    LET current == cleanerStates[c] IN
    \* When we have unused keys, keep deleting
    /\ current.state \in {"got_unused_keys", "deleting_keys"}
    /\ Cardinality(current.unusedBlobKeys) # 0
    /\ \E k \in current.unusedBlobKeys: \* Pick a key to delete
        /\ blobStoreState' = [blobStoreState EXCEPT ![k] = "UNSET"]
        /\ cleanerStates' = [
            cleanerStates EXCEPT
                \* Remove the key from set
                ![c].unusedBlobKeys = current.unusedBlobKeys \ {k}
            ]
    /\ UNCHANGED <<serverStates, databaseState, operations>>

CleanerFinished(c) ==
    LET current == cleanerStates[c] IN
    /\ current.state = "deleting_keys"
    \* When we have no more unused keys to delete, finish
    /\ Cardinality(current.unusedBlobKeys) = 0
    /\ cleanerStates' = [
        cleanerStates EXCEPT
            ![c].state = "waiting",
            ![c].blobKeys = {},
            ![c].unusedBlobKeys = {}
        ]
    /\ UNCHANGED <<serverStates, databaseState, blobStoreState, operations>>

CleanerFail(c) ==
    LET current == cleanerStates[c] IN
    \* Cleaner can fail from any active state
    /\ current.state \in {"got_blob_keys", "got_unused_keys", "deleting_keys"}
    (************************************************************************)
    (* Failure represented by cleaner losing state. Any partial operations  *)
    (* stay partially finished.                                             *)
    (************************************************************************)
    /\ cleanerStates' = [
        cleanerStates EXCEPT
            ![c].state = "waiting",
            ![c].blobKeys = {},
            ![c].unusedBlobKeys = {}
        ]
    /\ UNCHANGED <<serverStates, databaseState, blobStoreState, operations>>

(***************************************************************************)
(* Specification / Next                                                    *)
(***************************************************************************)
Next ==
    \* For every step, we either trigger a server or cleaner to take a step
    \/  \E s \in SERVERS: 
            \/ ServerStartWrite(s)
            \/ ServerWriteBlob(s)
            \/ ServerWriteMetadataAndReturn(s)
            \/ ServerFailWrite(s)
            \/ ServerStartRead(s)
            \/ ServerReadMetadata(s)
            \/ ServerReadMetadataAndReturnEmpty(s)
            \/ ServerReadBlobAndReturn(s)
    \/ \E c \in CLEANERS: \* All the steps a cleaner can take
            \/ CleanerStartGetBlobKeys(c)
            \/ CleanerGetUnusedKeys(c)
            \/ CleanerDeletingKeys(c)
            \/ CleanerFinished(c)
            \/ CleanerFail(c)



Spec == Init /\ [][Next]_vars


(***************************************************************************)
(* Invariants                                                              *)
(***************************************************************************)

NoOrphanFiles ==
    \* There does not exist a key
    ~\E k \in UUIDS:
        \* That is in the block store
        /\ blobStoreState[k] # "UNSET"
        \* And not in the database
        /\ \A u \in USERIDS:
            databaseState[u].imageId # k

=============================================================================

Verifying the design

Let’s start small and see what happens:

CONSTANTS
    SERVERS = {s1}
    CLEANERS = {c1}

Invariant ConsistentReads is violated.

Let’s try it again with two servers and two cleaners to see if we get different behavior.

CONSTANTS
    SERVERS = {s1, s2}
    CLEANERS = {c1, c2}

Same behavior. Invariant ConsistentReads is violated.

Adding more servers and cleaners didn’t change the failure mode. We’ve likely hit upon the essential failure of this design.

Summary

Clearly this solution isn’t going to work as is. It can delete images that were part of records being created at that moment. Normal cleanup systems don’t do that; normally they wait a little while…