▲ Overview

Layer 2 LL-free (Gen 3)

Making the STM bundle/unbundle protocol livelock-free — with a formal proof

KAME's STM commits optimistically while bundling / unbundling subtrees of the node tree. This deck explains its livelock-free (LL-free) variant (the third-generation spec BundleUnbundle_2level_LLfree.tla). The core idea is "older transactions win" priority arbitration (privileged-TID negotiate): on CAS contention the older transaction is given priority, so retry counts are structurally bounded and TLC can exhaust the entire state space without any CONSTRAINT — which is itself the formal proof of livelock-freedom.

Where this deck fits

Roadmap: notation & new elements → three generations of verification → priorityTagCanProceed → tag-update rules → PreemptTag/ClearMyTagsEventuallyAllDone (liveness proof) → older generations & the wait-free distinction.

Notation & new elements — delta from the base 2-level spec

BundleUnbundle_2level_LLfree.tla is a variant of the base 2-level spec that adds a privileged-TID negotiate mechanism, making livelock-freedom formally checkable by TLC. This deck only describes the delta — see slides_layer2_en.html for the shared base.

Where this variant sits: KAME's TLA+ specs split into three generations by how they bound finiteness. LLfree is the latest, Gen 3: it does not rely on CONSTRAINT and instead bounds the state space structurally via the priority mechanism. That means TLC's full exhaustion itself is the formal proof of livelock-freedom. The spec directly mirrors m_link->negotiate() / release_privileged_tidstamp() from the C++.

Setting Privilege = FALSE disables the priority machinery, the state space diverges, and TLC fails to terminate — the most direct counter-example showing the priority mechanism is required (detailed in the final slide).

New — variables, operators, actions, properties

SymbolMeaningDefinition (summary)
priorityTag[n] Privileged-TID tag for node n Null | <<iter, tid>> per node
iter(t) Transaction age for thread t (derived quantity) MaxCommits - iterBudget[t]
MyTag(t) t's current tag value <<iter(t), t>>
TagOlder(a, b) a is older than b (= takes priority) Lexicographic on (a[1], a[2]), a smaller
CanProceed(t, n) Can t execute a CAS at node n? Tag is Null or matches mine (same 2 conditions as C++ tag_as_contender())
TagAfterFail(t, n) priorityTag[n] after CAS fail 3-stage ladder (Null→mine, mine→refresh, older-wins overwrite)
TagAfterSuccess(t, n) priorityTag[n] after CAS success priorityTag[n] ← unchanged (Transaction-scope persistence)
ClearMyTags(t) Release tags on commit success only (held across retries) All my tags → Null (C++ finalizeCommitment → drop_tags)
PreemptTag(t, n) New action: an older t steals a younger tag Independent action (parallel to the CanProceed path)
PROPERTY EventuallyAllDone Liveness check (new) <>AllDone — every thread reaches termination

Removed / relaxed

ItemBase 2-levelLLfree
CONSTRAINT SerialBound Required (Nat serial grows unboundedly otherwise) Not used (priorityTag mechanism guarantees finiteness structurally)
SYMMETRY ThreadSymmetry Applied Removed (TLC warns on liveness + symmetry; Threads must be ordered Nat anyway)
Threads type model values {t1, t2} {1, 2} (Nat — type-compatible with TagOlder's a[2] < b[2])
MaxSerial / MaxPayload Set in cfg Removed — serial is Lamport (Nat; never referenced inside ops); payload is a monotone Nat counter
DebugSerialBound Undefined == TRUE (always passes; Lamport serial is unbounded so the numeric check is removed)

Unchanged from the base spec: tree structure (Parent + Child1 + Child2), 4-phase bundle protocol, commit pipeline, unbundle, coarse / fine / superfine triples, safety invariants (SnapshotConsistency, NoPriorityLoss, BundleRefConsistency, MissingPropagation, TerminalPayloadCheck) — identical to the base. LLfree adds only a gating layer that guarantees livelock-freedom.

Overview: three generations of verification

KAME's bundle/unbundle TLA+ specs split into three generations based on how they bound finiteness. LLfree is Gen 3; that this variant exhausts without CONSTRAINT constitutes a structural proof of livelock-freedom.

Three generations compared

GenerationSerialCONSTRAINTPriority mechanismTLC behaviorWhat it proves
Gen 1 (modular) % MaxSerial wrap None None Fails on SerialWrapAround violation The need for LL-free machinery, as a counter-example
Gen 2 (Nat+CONSTRAINT) Nat monotone Cut off at SerialBound None Terminates but relies on bound Safety within a bounded search
Gen 3 (LLfree) Lamport (unbounded Nat) None Always on (priorityTag) Finiteness holds structurally; TLC exhausts; liveness PASS Safety + livelock-freedom proven structurally

Tree structure (same as base)

    Parent
      ├── Child1
      └── Child2

2-level tree. The bundle/unbundle protocol itself, the 4 phases, the commit pipeline, and the coarse/fine/superfine triple are identical to the base spec. LLfree only adds the gating layer that ensures livelock-freedom.

Module header (excerpt)

\* Variant of BundleUnbundle_2level that mirrors KAME's livelock-free
\* negotiate mechanism explicitly so TLC's exhaustive search reaches a
\* finite state space without modular-serial wraparound.
\*
\* Key differences from the base 2level spec:
\*   - Lamport serial (TID encoded in lower digit, counter in upper).
\*     No globalSerial — C++ has none. No MaxSerial constant.
\*   - New variable priorityTag[n]: per-node (Null | <<iter, tid>>) tag.
\*   - Set by a thread when its CAS fails at a "negotiate point";
\*     others see the tag and only proceed if it's Null or matches mine.
\*   - Older transactions (smaller iter, then smaller tid) win.
\*   - Tag cleared at Transaction boundaries via ClearMyTags —
\*     mirroring C++ release_privileged_tidstamp().

TLC verification results

665,218 distinct states / depth 89 / 28 s / queue 0 (exhaustion)

Superfine also terminates

BundleUnbundle_2level_LLfree_superfine_mc.cfg (Phase 1 prestamp CAS + Phase 3 DISTURBED detection) likewise exhausts. The most C++-faithful mirror works.

priorityTag variable + helper operators

LLfree's single new variable priorityTag[n] and the helpers that manipulate it. They mirror C++'s m_priority_tidstamp / negotiate() directly.

Variable declaration and Init

VARIABLES
    serial, linkage, pc, op, target, local,
    iterBudget, childQueue,
    priorityTag   \* [Nodes -> Null | <>]

\* globalSerial is gone (no C++ counterpart). serial uses Lamport encoding.
vars == <<serial, linkage, pc, op, target, local,
         iterBudget, childQueue, priorityTag>>

\* Serial encoding: EncodeSerial(cnt, tid) = cnt * SerialBase + tid
\*   SerialBase = 1 + |Threads|  (> max TID)
Init ==
    /\ ...  \* same init as base 2-level
    /\ serial = [t \in Threads |-> EncodeSerial(0, t)]  \* TID embedded
    /\ priorityTag = [n \in Nodes |-> Null]

Derived quantities and helpers

\* iter(t): number of completed iterations; combined with t to form
\* a total order on <> (smaller = older = wins).
iter(t) == MaxCommits - iterBudget[t]

MyTag(t) == <<iter(t), t>>

\* a is older than b. lex order on (iter, tid) with a smaller.
TagOlder(a, b) ==
    \/ a[1] < b[1]
    \/ (a[1] = b[1] /\ a[2] < b[2])

\* Lamport serial helpers (C++ transaction.h:547-576)
SerialBase == 1 + Cardinality(Threads)
SerialCounter(s) == s \div SerialBase
SerialTID(s)     == s % SerialBase
EncodeSerial(cnt, tid) == cnt * SerialBase + tid

Each operator's meaning and C++ counterpart

OperatorMeaningC++ counterpart
priorityTag[n] Registered tag for node n (Null | <<iter, tid>>) Linkage::m_priority_tidstamp
iter(t) Derived: number of completed iterations (no extra variable) Coarse approximation of the Lamport stamp from now_us_tagged()
MyTag(t) The tag t should register right now Transaction::priority_tidstamp()
TagOlder(a, b) a is older (= wins contention) C++ lex compare (iter, tid)

Why iter(t) is a derived quantity

It is computed from iterBudget[t] without adding a new variable, so the state space is unchanged. TLC simply evaluates iter(t) as MaxCommits - iterBudget[t]; no new state axis is introduced.

Why Threads must be Nat

cfg requirement: Threads must be defined as Nat (e.g. {1, 2}).

CanProceed(t, n) — the CAS precondition

The core gate. Every CAS attempt in the base spec is preceded by CanProceed(t, n) /\ ...; if the registered tag forbids it, that thread cannot proceed. This is the heart of the priority mechanism.

Definition

\* CanProceed: gate for any CAS attempt at node n by thread t.
\* Matches C++ tag_as_contender(): a thread proceeds when:
\*   (a) no privileged tidstamp registered, OR
\*   (b) the registered tidstamp's tid is mine.
CanProceed(t, n) ==
    LET tag == priorityTag[n] IN
    \/ tag = Null
    \/ tag /= Null /\ tag[2] = t

Two disjuncts

ConditionMeaningC++ counterpart
tag = Null No tag registered — free entry m_priority_tidstamp.load() == 0
tag[2] = t My own tag (any iter) — same Tx retry or a leftover from earlier == priority_tidstamp()

Where it applies (priority-gated CAS sites)

proof_semantics.md §8: verified 1:1 against C++ at all 8 priority-gated CAS sites. The 10 spec call sites:

ActionNodeMeaning
BundlePhase1 (superfine prestamp)ParentSerial bump at bundle start
BundlePhase2ParentCAS parent to missing=TRUE
BundlePhase3 coarseeach c \in ChildrenConvert all children to BundledRef in one step
BundlePhase3 fine successeach cOne child per step → BundledRef
BundlePhase3 fine faileach cRollback + restart on failure
BundlePhase4ParentFinal CAS to missing=FALSE
CommitParentParentParent payload commit
CommitTryCAStarget[t]Direct child commit
UnbundleCASAncestorsParentRestore parent to missing=TRUE
UnbundleCASChildtarget[t]Restore child to priority

Typical CAS pattern

\* CommitParent as example
CommitParent(t) ==
    /\ pc[t] = "commit_parent"
    /\ ...
    /\ CanProceed(t, Parent)                  \* ★ gate: blocked unless allowed
    /\ LET ser == GenSerial(t, pw.serial)    \* Lamport serial
       IN
       \/ \* CAS success
          /\ ...
          /\ priorityTag' = [priorityTag EXCEPT ![Parent] = TagAfterSuccess(t, Parent)]
          /\ ...
       \/ \* CAS fail
          /\ ...
          /\ priorityTag' = [priorityTag EXCEPT ![Parent] = TagAfterFail(t, Parent)]

Directly from proof_semantics.md §4: "Other threads wait when the tag is neither Null nor mine (CanProceed negation)"; "there is always exactly one 'preferred' thread at every moment"; "that thread's retry count is bounded by the number of other threads"; "the cumulative retry across all threads is bounded too"; "→ serial growth is bounded → state space finite → TLC terminates." This chain is the structural proof of livelock-freedom (§2).

TagAfterFail / TagAfterSuccess — tag-update rules

Two functions that decide how priorityTag[n] updates after CAS success or failure. Fail = 4-step ladder; success = unchanged (Transaction-scope persistence).

TagAfterFail(t, n) — CAS fail update

TagAfterFail(t, n) ==
    IF   priorityTag[n] = Null
    THEN MyTag(t)
    ELSE IF priorityTag[n][2] = t
         THEN MyTag(t)
         ELSE IF TagOlder(MyTag(t), priorityTag[n])
              THEN MyTag(t)
              ELSE priorityTag[n]

3-step ladder

ConditionActionIntent
tag = Null Register MyTag(t) New registration (signal of contention)
tag is mine (any iter) Refresh to MyTag(t) If iter advanced, refresh to the latest
holder is younger Overwrite with MyTag(t) (older-wins) Mirrors C++ tag_as_contender() — older Tx steals younger tags
holder is older Leave it alone Don't steal older threads' registrations

TagAfterSuccess(t, n) — CAS success update

TagAfterSuccess(t, n) == priorityTag[n]   \* ← unchanged (KEEP)

Why per-CAS clearing would cause livelock: "Clear my tag back to Null on every CAS success" lets another Transaction squeeze in between phase CASes of the holding Transaction. That peer's bundle/unbundle reset the first Transaction back to Phase 1, and the ping-pong repeats indefinitely.

The right design: tags are not cleared per-CAS, but only on commit success (CommitParent success / CommitDone success). On failure/retry, tags persist into the next snapshot (matching C++ operator++ not calling drop_tags_n_privilege()).

Correspondence with C++ ScopedNegotiateLinkage

C++TLA+ LLfree
ScopedNegotiateLinkage tag(*link) at commit_xxx() outer scope CanProceed check at the head of SnapCheck / CommitParent / CommitTryCAS
m_priority_tidstamp.store(my_stamp) on CAS fail priorityTag' = [priorityTag EXCEPT ![n] = TagAfterFail(t, n)]
No store on CAS success TagAfterSuccess(t, n) == priorityTag[n] (no-op)
Destructor: release_privileged_tidstamp() ClearMyTags(t) at Transaction-end (see the PreemptTag / ClearMyTags slide)

PreemptTag + ClearMyTags

LLfree's independent action (parallel to CAS) and the bulk tag release at Transaction-end. Both are required for the "older threads are never permanently locked out" property.

PreemptTag(t, n) — independent action

PreemptTag(t, n) ==
    /\ priorityTag[n] /= Null
    /\ priorityTag[n][2] /= t              \* not already mine
    /\ TagOlder(MyTag(t), priorityTag[n])
    /\ priorityTag' = [priorityTag EXCEPT ![n] = MyTag(t)]
    /\ UNCHANGED <<serial, linkage, pc, op, target,
                 local, iterBudget, childQueue>>

Why an independent action is needed

Scenario: A younger thread tyoung grabs the tag on a CAS fail. Then an older thread told appears; its CAS is gated out by CanProceed. But TagAfterFail updates the tag only on actually attempted CAS — and if the CAS isn't enabled, no Fail fires either. So an independent action PreemptTag is needed to let told steal the tag without trying CAS.

Without it: as proof_semantics.md §4 explains, a younger thread holding the tag could push an older thread's retry count to infinity. PreemptTag ensures TLC's state graph always contains "older steals tag," and under WF_vars(NextStep) progress is guaranteed.

ClearMyTags(t) — bulk release at Transaction-end

ClearMyTags(t) ==
    [n \in Nodes |->
        IF priorityTag[n] = Null
        THEN priorityTag[n]
        ELSE IF priorityTag[n][2] = t
             THEN Null
             ELSE priorityTag[n]]

Where it fires (only on commit success)

ActionpriorityTag updateMeaning
CommitParent success ClearMyTags(t) Commit done → release all tags (C++ finalizeCommitment)
CommitParent fail [priorityTag EXCEPT ![Parent] = TagAfterFail(t, Parent)] Tags persist into retry; only Parent's tag updated
CommitDone success ClearMyTags(t) Per-child commit done → release all tags
CommitDone fail UNCHANGED priorityTag Tags persist into retry

EventuallyAllDone PROPERTY + structural LL-freedom

What does LLfree formally prove? And why is the chain of argument in proof_semantics.md §2 the proof of livelock-freedom?

Spec structure

\* AllDone: terminal condition — every thread consumed its full budget and returned to idle.
AllDone == \A t \in Threads : pc[t] = "idle" /\ iterBudget[t] = 0

NextStep ==
    \E t \in Threads :
        \/ SnapRead(t) \/ SnapCheck(t)
        \/ BundlePhase1(t) \/ ... \/ BundlePhase4(t)
        \/ CommitParent(t) \/ ... \/ CommitDone(t)
        \/ UnbundleWalk(t) \/ ... \/ UnbundleCASChild(t)
        \/ \E n \in Nodes : PreemptTag(t, n)   \* ★ also a progress step

Terminating ==
    /\ AllDone
    /\ UNCHANGED vars

Next == NextStep \/ Terminating

Spec == Init /\ [][Next]_vars /\ WF_vars(NextStep)
                              \* ↑ Terminating is outside WF: otherwise
                              \* "stutter forever" trivially satisfies it
                              \* and liveness becomes meaningless

EventuallyAllDone PROPERTY

EventuallyAllDone == <>AllDone

Argument chain (proof_semantics.md §2)

"Termination + no CONSTRAINT + Nat serial monotone → livelock-free automatically"

(1) TLC exhausts (queue 0) ⇒ the state space is genuinely finite (not artificially cut off — every reachable state was enumerated)
(2) Serial is Nat, monotone (no modular wrap)
(3) Every livelock-candidate action (including retries) must bump the serial
(4) A cycle in the state graph ⇒ returns to the same state ⇒ same serial ⇒ contradicts monotonicity ⇒ no cycle exists
(5) No cycle ⇒ no infinite behavior ⇒ livelock-free

Verification results (current, 2-level LLfree)

cfgThreadsStatesDepthTimeResult
coarse micro (MaxCommits=1)2665,2188928sPass + liveness (laptop)
superfine22,676,1961293:12Pass + liveness
superfine confC (all-root)3137,333,348966:35Pass (ohtaka)
MaxCommits=2 superfine2127,586,5993114:40Pass (ohtaka)
dynamic release superfine live2413,884,5163207:13Pass + liveness (ohtaka)

Conclusion: across all configs, safety + livelock-freedom are proven structurally. Details: kamestm/tests/VERIFICATION.md §3, doc/verification_log.md.

Related variants

SpecDifferenceDeck
BundleUnbundle_3level_LLfree.tla3-level tree (Grand → Parent → Child); recursive inner bundle; multi-level unbundle walkslides_layer2_LLfree_3level_en.html
BundleUnbundle_2level_LLfree_dynamic.tla
BundleUnbundle_3level_LLfree_dynamic.tla
Online insert(online=true) / release; InsertThreads / ReleaseThreads role configslides_layer2_LLfree_dynamic_en.html
BundleUnbundle_hardlink_*.tla (5 specs)2-parent / 1-child hard-link topologies; Phase 4 reachability gate; bug repro + fix; per-action fairnessslides_hardlink_en.html

Why preserve older generations + the wait-free distinction

Why the old-generation specs should not be deleted (proof_semantics §5), and how to distinguish "livelock-free" from "wait-free" (proof_semantics §7).

Why keep older generations (proof_semantics §5)

Gen 1 (oldest, modular serial) — fails on SerialWrapAround

Why it matters: An empirical violation proves the LL-free mechanism is necessary. The strongest argument for Gen 3 (proof_semantics §5).

Gen 2 (Nat + CONSTRAINT) — state-space divergence

Why it matters: Provides Gen 3's motivation. CONSTRAINT termination only guarantees safety inside the cut-off, not liveness (proof_semantics §3).

Privilege=FALSE — disabling the priority mechanism

The 3-level spec (BundleUnbundle_3level_LLfree.tla) has an explicit CONSTANT Privilege. Setting Privilege=FALSE disables the priority mechanism entirely, equivalent to the original non-LL-free behavior.

Privilege=TRUE (LLfree)Privilege=FALSE (non-LL-free)
CanProceed(t, n) Tag-based gate — blocks younger threads Always TRUE — no gate
TagAfterFail(t, n) 4-step ladder, registers/refreshes tag no-op — tag unchanged
State space Finite → TLC exhausts Diverges — retries unbounded
EventuallyAllDone PASS Violation (livelock present)

Running TLC with Privilege=FALSE directly confirms that switching to Nat serial alone is not LL-free — the priority mechanism is essential.

Wait-free is a separate question

Lock-free / LL-free (KAME)Wait-free
Definition Under fairness, every thread eventually progresses Bounded steps to completion, independent of other threads
Fairness dependence Liveness requires WF Independent of fairness
CAS-retry-based OK (preferred thread progresses, others retry bounded times) Not possible by design
How to prove in TLA+ <>AllDone + WF suffices Per-thread bounded-step invariant needed

KAME proves lock-free + livelock-free. Wait-free does not follow from a CAS-retry design (by intent).

What formal verification can and cannot show

PropertyFormal verificationC++ implementation
Safety Proven by CONSTRAINT-free exhaustion No violation observed under large stress tests
LL-free EventuallyAllDone PASS (+ Privilege=FALSE divergence as counter-example) No livelock observed
Wait-free Not designed to prove Not achievable via CAS-retry
References: spec source → BundleUnbundle_2level_LLfree.tla / BundleUnbundle_3level_LLfree.tla | base spec → slides_layer2_en.html | 3-level / dynamic → slides_layer2_LLfree_3level_en.html / slides_layer2_LLfree_dynamic_en.html | hardlink → slides_hardlink_en.html