Layer 2: LL-free specification

Making the STM bundle/unbundle protocol livelock-free — with a formal proof

KAME's STM commits optimistically while bundling / unbundling subtrees of the node tree. This deck explains its Layer 2 spec BundleUnbundle_2level_LLfree.tla. The core idea is "older transactions win" priority arbitration (privileged-TID negotiate): on CAS contention the older transaction is given priority, so retry counts are structurally bounded and TLC can exhaust the entire state space without any CONSTRAINT — which is itself the formal proof of livelock-freedom.

Where this deck fits

What this deck covers: the priority-arbitration gating layer that guarantees LL-freedom (priorityTag / CanProceed / PreemptTag / ClearMyTags).
Related decks: for the bundle/unbundle tree structure and protocol detail, see also the Layer 2 protocol deck. The 3-level, dynamic, and hard-link specs are reachable from the coverage overview.

Roadmap: notation & core elements → spec & verification results → priorityTag → CanProceed → tag-update rules → PreemptTag/ClearMyTags → EventuallyAllDone (liveness proof) → the wait-free distinction.

Notation & core elements

BundleUnbundle_2level_LLfree.tla is the 2-level spec equipped with a privileged-TID negotiate mechanism, making livelock-freedom formally checkable by TLC. For the bundle/unbundle tree structure and protocol detail, see also slides_layer2_en.html.

Key property: LLfree does not rely on CONSTRAINT and instead bounds the state space structurally via the priority mechanism. That means TLC's full exhaustion itself is the formal proof of livelock-freedom. The spec mirrors tag_as_contender() / drop_tags_n_privilege() over the per-linkage Linkage::m_transaction_started_time slot from the C++ — at the coarse per-iteration granularity (iter is a work-done counter, not the C++ µs stamp, so this is a sound coarsening, not an order-isomorphism).

Setting Privilege = FALSE disables the priority machinery: the state space then diverges and TLC fails to terminate, which is the most direct validation that the priority mechanism is required (detailed in the final slide).

New — variables, operators, actions, properties

Symbol	Meaning	Definition (summary)
`priorityTag[n]`	Privileged-TID tag for node n	`Null | <<iter, tid>>` per node
`iter(t)`	Transaction age for thread t (derived quantity)	`MaxCommits - iterBudget[t]`
`MyTag(t)`	t's current tag value	`<<iter(t), t>>`
`TagOlder(a, b)`	a is older than b (= takes priority)	Lexicographic on `(a[1], a[2])`, a smaller
`CanProceed(t, n)`	Can t execute a CAS at node n?	Tag is Null or matches mine. (Spec hard-gates on any foreign tag; the C++ counterpart is the advisory `fair_mode_blocks_me` gate, hard only for Reserved stamps — the spec is deliberately stricter.)
`TagAfterFail(t, n)`	`priorityTag[n]` after CAS fail	4-branch ladder (Null→mine, mine→refresh, older-wins overwrite, younger→keep)
`TagAfterSuccess(t, n)`	`priorityTag[n]` after CAS success	`priorityTag[n]` ← unchanged (Transaction-scope persistence)
`ClearMyTags(t)`	Release tags on commit success only (held across retries)	All my tags → Null (C++ `finalizeCommitment → drop_tags`)
`PreemptTag(t, n)`	New action: an older t steals a younger tag	Independent action (parallel to the CanProceed path)
`PROPERTY EventuallyAllDone`	Liveness check (new)	`<>AllDone` — every thread reaches termination

Finiteness-related setup

Item	How LLfree handles it
CONSTRAINT SerialBound	Not used (priorityTag mechanism guarantees finiteness structurally)
SYMMETRY ThreadSymmetry	Not used (TLC warns on liveness + symmetry; Threads must be ordered Nat anyway)
Threads type	`{1, 2}` (Nat — type-compatible with `TagOlder`'s `a[2] < b[2]`)
MaxSerial / MaxPayload	Unused — serial is Lamport (Nat; never referenced inside ops); payload is a monotone Nat counter
DebugSerialBound	`== TRUE` (always passes; Lamport serial is unbounded so there is no numeric check)

Safety-related setup: tree structure (Parent + Child1 + Child2), 4-phase bundle protocol, commit pipeline, unbundle, coarse / fine / superfine triples, safety invariants (SnapshotConsistency, NoPriorityLoss, BundleRefConsistency, MissingPropagation, TerminalPayloadCheck). On top of these, LLfree carries a gating layer that guarantees livelock-freedom.

Overview: spec & verification results

That LLfree exhausts without CONSTRAINT constitutes a structural proof of livelock-freedom. Below: the tree structure and protocol body carrying the priority mechanism (priorityTag).

Tree structure

    Parent
      ├── Child1
      └── Child2

2-level tree. On top of the bundle/unbundle protocol body, the 4 phases, the commit pipeline, and the coarse/fine/superfine triple, LLfree carries the gating layer that ensures livelock-freedom.

Module header (excerpt)

\* Variant of BundleUnbundle_2level that mirrors KAME's livelock-free
\* negotiate mechanism explicitly so TLC's exhaustive search reaches a
\* finite state space without modular-serial wraparound.
\*
\* Key differences from the base 2level spec:
\*   - Lamport serial (TID encoded in lower digit, counter in upper).
\*     No globalSerial — C++ has none. No MaxSerial constant.
\*   - New variable priorityTag[n]: per-node (Null | <<iter, tid>>) tag.
\*   - Set by a thread when its CAS fails at a "negotiate point";
\*     others see the tag and only proceed if it's Null or matches mine.
\*   - Older transactions (smaller iter, then smaller tid) win.
\*   - Tag cleared at Transaction boundaries via ClearMyTags —
\*     mirroring C++ drop_tags_n_privilege() (per-linkage default).

TLC verification results

867,696 distinct states / depth 89 / ~35 s / queue 0 (exhaustion)

Config: BundleUnbundle_2level_LLfree_micro_mc.cfg
MaxCommits = 1, |Threads| = 2, fine atomicity
No CONSTRAINT — finiteness held by priorityTag structurally
No SYMMETRY — Threads are ordered Nat; SYMMETRY + liveness don't compose
All safety invariants PASS + EventuallyAllDone == <>AllDone liveness PASS

Superfine also terminates

BundleUnbundle_2level_LLfree_superfine_mc.cfg (Phase 1 prestamp CAS + Phase 3 DISTURBED detection) likewise exhausts, so the most C++-faithful variant of the spec also passes.

`priorityTag` variable + helper operators

LLfree's single new variable priorityTag[n] and the helpers that manipulate it. They mirror C++'s per-linkage Linkage::m_transaction_started_time / tag_as_contender() directly.

Variable declaration and Init

VARIABLES
    serial, linkage, pc, op, target, local,
    iterBudget, childQueue,
    priorityTag   \* [Nodes -> Null | <<iter, tid>>]

\* globalSerial is gone (no C++ counterpart). serial uses Lamport encoding.
vars == <<serial, linkage, pc, op, target, local,
         iterBudget, childQueue, priorityTag>>

\* Serial encoding: EncodeSerial(cnt, tid) = cnt * SerialBase + tid
\*   SerialBase = 1 + |Threads|  (> max TID)
Init ==
    /\ ...  \* tree / linkage / pc initialization
    /\ serial = [t \in Threads |-> EncodeSerial(0, t)]  \* TID embedded
    /\ priorityTag = [n \in Nodes |-> Null]

Derived quantities and helpers

\* iter(t): number of completed iterations; combined with t to form
\* a total order on <<iter, tid>> (smaller = older = wins).
iter(t) == MaxCommits - iterBudget[t]

MyTag(t) == <<iter(t), t>>

\* a is older than b. lex order on (iter, tid) with a smaller.
TagOlder(a, b) ==
    \/ a[1] < b[1]
    \/ (a[1] = b[1] /\ a[2] < b[2])

\* Lamport serial helpers (C++ SerialGenerator, transaction.h:816-844)
SerialBase == 1 + Cardinality(Threads)
SerialCounter(s) == s \div SerialBase
SerialTID(s)     == s % SerialBase
EncodeSerial(cnt, tid) == cnt * SerialBase + tid
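
The encoding above can be sketched in Python as a quick sanity check. This is a minimal illustration, assuming the two-thread setup `{1, 2}` from the micro cfg (so SerialBase = 3); the Python function names are renderings of the spec operators chosen for this sketch, not part of the spec or of the C++ code.

```python
# Illustrative model of the spec's Lamport serial helpers.
# THREADS = {1, 2} is an assumption taken from the micro cfg described in this deck.
THREADS = {1, 2}
SERIAL_BASE = 1 + len(THREADS)  # SerialBase == 1 + Cardinality(Threads)


def encode_serial(cnt: int, tid: int) -> int:
    """EncodeSerial(cnt, tid) == cnt * SerialBase + tid."""
    return cnt * SERIAL_BASE + tid


def serial_counter(s: int) -> int:
    """SerialCounter(s): integer division by SerialBase recovers the counter."""
    return s // SERIAL_BASE


def serial_tid(s: int) -> int:
    """SerialTID(s): the remainder modulo SerialBase recovers the TID."""
    return s % SERIAL_BASE


# Round trip: the TID sits in the low digit, the counter in the upper digits.
s = encode_serial(4, 2)
print(s, serial_counter(s), serial_tid(s))
```

Because SerialBase is strictly larger than every TID, the remainder never collides with the counter part, which is why decoding is unambiguous.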

Each operator's meaning and C++ counterpart

Operator	Meaning	C++ counterpart
`priorityTag[n]`	Registered tag for node n (`Null | <<iter, tid>>`)	`Linkage::m_transaction_started_time` (atomic per-linkage slot)
`iter(t)`	Derived: number of completed iterations (no extra variable)	Age component of `Snapshot::m_started_time`, compared by `signed_diff_us_packed`
`MyTag(t)`	The tag t should register right now	`Snapshot::m_started_time` (tid-packed µs stamp from `now_us_tagged()`)
`TagOlder(a, b)`	a is older (= wins contention)	C++ compares started-time µs only (tid excluded); the spec adds a tid tie-break to totalize the order

Why `iter(t)` is a derived quantity

It is computed from iterBudget[t] without adding a new variable, so the state space is unchanged. TLC simply evaluates iter(t) as MaxCommits - iterBudget[t]; no new state axis is introduced.
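
As a rough illustration of why no extra state is needed, the following Python sketch derives iter(t) and MyTag(t) from a budget map alone. MAX_COMMITS = 1 follows the micro cfg quoted in this deck. The assumption that iterBudget starts at MaxCommits and counts down is inferred from the definitions here rather than stated by the spec excerpts, and the function names are illustrative.

```python
# Illustrative only: iter(t) and MyTag(t) computed from iterBudget, no extra state.
MAX_COMMITS = 1  # assumption: value from the micro cfg


def iter_of(iter_budget: dict, t: int) -> int:
    """iter(t) == MaxCommits - iterBudget[t]."""
    return MAX_COMMITS - iter_budget[t]


def my_tag(iter_budget: dict, t: int) -> tuple:
    """MyTag(t) == <<iter(t), t>>, rendered as a Python tuple."""
    return (iter_of(iter_budget, t), t)


budget = {1: MAX_COMMITS, 2: MAX_COMMITS}  # assumed initial budgets
print(my_tag(budget, 1))  # tag before thread 1 completes an iteration
budget[1] -= 1            # thread 1 completes one iteration
print(my_tag(budget, 1))  # the same thread now carries a younger tag
```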

Why Threads must be Nat

cfg requirement: Threads must be a set of natural numbers (e.g. {1, 2}).

TagOlder(a, b) compares a[2] < b[2] over Threads, which requires the elements to be ordered numbers. Model values such as {t1, t2} can only be tested for equality, so they cannot be used with <.
SYMMETRY ThreadSymmetry is also removed: with ordered Nat, Permutations(Threads) isn't meaningful, and TLC warns that liveness + SYMMETRY risks missed violations. LLfree's exhaustion is fast enough without SYMMETRY anyway.
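
The ordering requirement can be seen in a small Python sketch of TagOlder. It assumes integer TIDs as in the cfg above; note that TLA+ tuples are 1-indexed, so `a[1]` and `a[2]` in the spec become indices 0 and 1 here. The function name is an illustrative rendering, not spec or C++ code.

```python
from typing import Tuple

Tag = Tuple[int, int]  # (iter, tid)


def tag_older(a: Tag, b: Tag) -> bool:
    """TagOlder(a, b): lexicographic on (iter, tid), with a strictly smaller."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])


# With integer TIDs this coincides with Python's built-in tuple ordering.
tags = [(0, 1), (0, 2), (1, 1), (1, 2)]
assert all(tag_older(a, b) == (a < b) for a in tags for b in tags)

# Totality: any two distinct tags are ordered one way or the other,
# which is what the tid tie-break buys over comparing iter alone.
assert all(tag_older(a, b) != tag_older(b, a) for a in tags for b in tags if a != b)
```

The second assertion is the property that unordered model values could not provide: two threads at the same iter would otherwise have no winner.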

`CanProceed(t, n)` — the CAS precondition

The core gate. Every CAS attempt is preceded by CanProceed(t, n) /\ ...; if the registered tag forbids it, that thread cannot proceed. This is the heart of the priority mechanism.

Definition

\* CanProceed: gate for any CAS attempt at node n by thread t.
\* Matches C++ tag_as_contender(): a thread proceeds when:
\*   (a) no privileged tidstamp registered, OR
\*   (b) the registered tidstamp's tid is mine.
CanProceed(t, n) ==
    LET tag == priorityTag[n] IN
    \/ tag = Null
    \/ tag /= Null /\ tag[2] = t

Two disjuncts

Condition	Meaning	C++ counterpart
`tag = Null`	No tag registered — free entry	`Linkage::m_transaction_started_time.load() == 0`
`tag[2] = t`	My own tag (any iter) — same Tx retry or a leftover from earlier	slot's tid == mine (`i_am_privileged_now` / `fair_mode_blocks_me` gate over `Snapshot::m_started_time`)
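
A minimal Python sketch of the gate, with Null rendered as `None` and priorityTag as a dict keyed by node name. The node names follow the tree in this deck; the function name and the sample tag values are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

Tag = Tuple[int, int]  # (iter, tid); index 1 here is tag[2] in the TLA+ spec


def can_proceed(priority_tag: Dict[str, Optional[Tag]], t: int, n: str) -> bool:
    """CanProceed(t, n): the tag at n is Null, or its tid component is t."""
    tag = priority_tag[n]
    return tag is None or tag[1] == t


# Sample state (made up for illustration): thread 1 holds Parent,
# Child1 is untagged, thread 2 holds Child2 with a tag from a later iter.
tags = {"Parent": (0, 1), "Child1": None, "Child2": (1, 2)}

for t in (1, 2):
    for n in ("Parent", "Child1", "Child2"):
        print(t, n, can_proceed(tags, t, n))
```

The iter component plays no role in the gate itself: a thread's own tag admits it regardless of which iteration registered the tag.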

Where it applies (priority-gated CAS sites)

proof_semantics.md §8: verified 1:1 against C++ at all 8 priority-gated CAS sites. The 10 spec call sites:

Action	Node	Meaning
`BundlePhase1` (superfine prestamp)	`Parent`	Serial bump at bundle start
`BundlePhase2`	`Parent`	CAS parent to missing=TRUE
`BundlePhase3` coarse	each `c \in Children`	Convert all children to BundledRef in one step
`BundlePhase3` fine success	each `c`	One child per step → BundledRef
`BundlePhase3` fine fail	each `c`	Rollback + restart on failure
`BundlePhase4`	`Parent`	Final CAS to missing=FALSE
`CommitParent`	`Parent`	Parent payload commit
`CommitTryCAS`	`target[t]`	Direct child commit
`UnbundleCASAncestors`	`Parent`	Restore parent to missing=TRUE
`UnbundleCASChild`	`target[t]`	Restore child to priority

Typical CAS pattern

\* CommitTryCAS (a mid-transaction CAS) as example — keeps its tag on success.
\* (CommitParent/CommitDone SUCCESS instead run ClearMyTags: Transaction end.)
CommitTryCAS(t) ==
    /\ pc[t] = "commit_try_cas"
    /\ ...
    /\ CanProceed(t, Parent)                  \* ★ gate: blocked unless allowed
    /\ LET ser == GenSerial(t, pw.serial)    \* Lamport serial
       IN
       \/ \* CAS success
          /\ ...
          /\ priorityTag' = [priorityTag EXCEPT ![Parent] = TagAfterSuccess(t, Parent)]
          /\ ...
       \/ \* CAS fail
          /\ ...
          /\ priorityTag' = [priorityTag EXCEPT ![Parent] = TagAfterFail(t, Parent)]

Directly from proof_semantics.md §4: "Other threads wait when the tag is neither Null nor mine (CanProceed negation)"; "there is always exactly one 'preferred' thread at every moment"; "that thread's retry count is bounded by the number of other threads"; "the cumulative retry across all threads is bounded too"; "→ serial growth is bounded → state space finite → TLC terminates." This chain is the structural proof of livelock-freedom (§2).

`TagAfterFail` / `TagAfterSuccess` — tag-update rules

Two operators that decide how priorityTag[n] is updated after a CAS success or failure. Fail = 4-branch ladder; success = unchanged (Transaction-scope persistence).

`TagAfterFail(t, n)` — CAS fail update

TagAfterFail(t, n) ==
    IF   priorityTag[n] = Null
    THEN MyTag(t)
    ELSE IF priorityTag[n][2] = t
         THEN MyTag(t)
         ELSE IF TagOlder(MyTag(t), priorityTag[n])
              THEN MyTag(t)
              ELSE priorityTag[n]

4-branch ladder

Condition	Action	Intent
tag = `Null`	Register `MyTag(t)`	New registration (signal of contention)
tag is mine (any iter)	Refresh to `MyTag(t)`	If iter advanced, refresh to the latest
holder is younger	Overwrite with `MyTag(t)` (older-wins)	Mirrors C++ `tag_as_contender()` — older Tx steals younger tags
holder is older	Leave it alone	Don't steal older threads' registrations
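
The four branches can be exercised with a small Python sketch. It takes the current tag and the caller's tag as plain values (Null rendered as `None`); the function names are illustrative renderings of the spec operators, and the sample tags are made up for the example.

```python
from typing import Optional, Tuple

Tag = Tuple[int, int]  # (iter, tid)


def tag_older(a: Tag, b: Tag) -> bool:
    """Lexicographic order on (iter, tid); smaller means older."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])


def tag_after_fail(current: Optional[Tag], mine: Tag) -> Tag:
    """TagAfterFail: value of priorityTag[n] after the caller's CAS fails."""
    if current is None:
        return mine              # branch 1: new registration
    if current[1] == mine[1]:
        return mine              # branch 2: refresh my own tag
    if tag_older(mine, current):
        return mine              # branch 3: older wins, overwrite
    return current               # branch 4: holder is older, keep it


print(tag_after_fail(None, (0, 2)))
print(tag_after_fail((0, 1), (1, 1)))
print(tag_after_fail((1, 2), (0, 1)))
print(tag_after_fail((0, 1), (0, 2)))
```

Three of the four branches return the caller's tag, so the ladder reduces to "keep the current tag only when a different, older thread holds it"; the spec spells the branches out separately to mirror the intent of each case.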

`TagAfterSuccess(t, n)` — CAS success update

TagAfterSuccess(t, n) == priorityTag[n]   \* ← unchanged (KEEP)

Why per-CAS clearing would cause livelock: a rule of "clear my tag back to Null on every CAS success" lets another Transaction squeeze in between the phase CASes of the holding Transaction. That peer's bundle/unbundle resets the first Transaction back to Phase 1, and the ping-pong can repeat indefinitely.

The right design: tags are not cleared per-CAS, but only on commit success (CommitParent success / CommitDone success). On failure/retry, tags persist into the next snapshot (matching C++ operator++ not calling drop_tags_n_privilege()).

Correspondence with C++ per-linkage priority slot

C++	TLA+ LLfree
`i_am_privileged_now` / `fair_mode_blocks_me` gate over `Linkage::m_transaction_started_time`	`CanProceed` check at the head of SnapCheck / CommitParent / CommitTryCAS
`tag_as_contender(link)` on CAS fail (store-and-verify write, oldest-wins; pushes onto `m_tagged_linkages`; the spec/C11 model the CAS semantics)	`priorityTag' = [priorityTag EXCEPT ![n] = TagAfterFail(t, n)]`
No store on CAS success	`TagAfterSuccess(t, n) == priorityTag[n]` (no-op)
On commit success: `drop_tags_n_privilege()` walks `m_tagged_linkages`, zeroing matching slots	`ClearMyTags(t)` at Transaction-end (see the `PreemptTag` / `ClearMyTags` slide)

`PreemptTag` + `ClearMyTags`

LLfree's independent action (parallel to CAS) and the bulk tag release at Transaction-end. Both are required for the "older threads are never permanently locked out" property.

`PreemptTag(t, n)` — independent action

PreemptTag(t, n) ==
    /\ priorityTag[n] /= Null
    /\ priorityTag[n][2] /= t              \* not already mine
    /\ TagOlder(MyTag(t), priorityTag[n])
    /\ priorityTag' = [priorityTag EXCEPT ![n] = MyTag(t)]
    /\ UNCHANGED <<serial, linkage, pc, op, target,
                 local, iterBudget, childQueue>>

Why an independent action is needed

Scenario: A younger thread t_young grabs the tag on a CAS fail. Then an older thread t_old appears; its CAS is gated out by CanProceed. But TagAfterFail updates the tag only when a CAS is actually attempted, and a CAS that CanProceed keeps disabled never reaches its fail branch. So an independent action, PreemptTag, is needed to let t_old steal the tag without attempting a CAS.

Without it: as proof_semantics.md §4 explains, a younger thread holding the tag could push an older thread's retry count to infinity. PreemptTag ensures TLC's state graph always contains "older steals tag," and under WF_vars(NextStep) progress is guaranteed.
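
The scenario can be replayed in a short Python sketch. It models only the priorityTag part of the action (the UNCHANGED variables are omitted), returns `None` when the action is not enabled, and uses made-up tags for a younger thread 2 and an older thread 1; all names are illustrative.

```python
from typing import Dict, Optional, Tuple

Tag = Tuple[int, int]  # (iter, tid)
Tags = Dict[str, Optional[Tag]]


def tag_older(a: Tag, b: Tag) -> bool:
    return a[0] < b[0] or (a[0] == b[0] and a[1] < b[1])


def can_proceed(tags: Tags, t: int, n: str) -> bool:
    tag = tags[n]
    return tag is None or tag[1] == t


def preempt_tag(tags: Tags, mine: Tag, n: str) -> Optional[Tags]:
    """PreemptTag: enabled only if n is tagged by someone else and I am older."""
    current = tags[n]
    enabled = (current is not None
               and current[1] != mine[1]
               and tag_older(mine, current))
    if not enabled:
        return None              # action disabled in this state
    updated = dict(tags)         # next-state copy; the input is not mutated
    updated[n] = mine
    return updated


young, old = (0, 2), (0, 1)      # same iter, so the smaller tid is older
tags = {"Parent": young}         # the younger thread registered first

blocked_before = not can_proceed(tags, old[1], "Parent")
after = preempt_tag(tags, old, "Parent")
print(blocked_before, after, preempt_tag(after, young, "Parent"))
```

After the preempt step the older thread passes the gate and the younger one cannot steal the tag back, which is the one-directional behavior the slide relies on.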

`ClearMyTags(t)` — bulk release at Transaction-end

ClearMyTags(t) ==
    [n \in Nodes |->
        IF priorityTag[n] = Null
        THEN priorityTag[n]
        ELSE IF priorityTag[n][2] = t
             THEN Null
             ELSE priorityTag[n]]

Where it fires (only on commit success)

Action	priorityTag update	Meaning
`CommitParent` success	`ClearMyTags(t)`	Commit done → release all tags (C++ `finalizeCommitment`)
`CommitParent` fail	`[priorityTag EXCEPT ![Parent] = TagAfterFail(t, Parent)]`	Tags persist into retry; only Parent's tag updated
`CommitDone` success	`ClearMyTags(t)`	Per-child commit done → release all tags
`CommitDone` fail	`UNCHANGED priorityTag`	Tags persist into retry
`~Transaction()` (abort / RAII)	— (no spec action)	C++ also drops tags on abort / RAII and at standalone-`Snapshot` exit; the spec models only the commit-success release, so "only on commit success" is a spec property, not a C++ one
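
A minimal Python sketch of the bulk release, again with Null as `None` and priorityTag as a dict. The sample state is invented for illustration and the function name is a rendering of the spec operator, not C++ code.

```python
from typing import Dict, Optional, Tuple

Tag = Tuple[int, int]  # (iter, tid)
Tags = Dict[str, Optional[Tag]]


def clear_my_tags(priority_tag: Tags, t: int) -> Tags:
    """ClearMyTags(t): every tag whose tid is t becomes Null; others are kept."""
    return {
        n: (None if tag is not None and tag[1] == t else tag)
        for n, tag in priority_tag.items()
    }


# Thread 1 holds Parent, thread 2 holds Child1, Child2 is untagged.
before = {"Parent": (0, 1), "Child1": (0, 2), "Child2": None}
after = clear_my_tags(before, 1)
print(after)
```

Only the committing thread's slots are released; tags registered by other threads survive, so a peer that was waiting behind its own tag keeps its place.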

`EventuallyAllDone` PROPERTY + structural LL-freedom

What does LLfree formally prove? And why is the chain of argument in proof_semantics.md §2 the proof of livelock-freedom?

Spec structure

\* AllDone: terminal condition — every thread consumed its full budget and returned to idle.
AllDone == \A t \in Threads : pc[t] = "idle" /\ iterBudget[t] = 0

NextStep ==
    \E t \in Threads :
        \/ SnapRead(t) \/ SnapCheck(t)
        \/ BundlePhase1(t) \/ ... \/ BundlePhase4(t)
        \/ CommitParent(t) \/ ... \/ CommitDone(t)
        \/ UnbundleWalk(t) \/ ... \/ UnbundleCASChild(t)
        \/ \E n \in Nodes : PreemptTag(t, n)   \* ★ also a progress step

Terminating ==
    /\ AllDone
    /\ UNCHANGED vars

Next == NextStep \/ Terminating

Spec == Init /\ [][Next]_vars /\ WF_vars(NextStep)
                              \* ↑ Terminating is outside WF: otherwise
                              \* "stutter forever" trivially satisfies it
                              \* and liveness becomes meaningless

`EventuallyAllDone` PROPERTY

EventuallyAllDone == <>AllDone

Argument chain (proof_semantics.md §2)

"Termination + no CONSTRAINT + Nat serial monotone → livelock-free automatically"

(1)	TLC exhausts (queue 0) ⇒ the state space is genuinely finite (not artificially cut off — every reachable state was enumerated)
(2)	Serial is `Nat`, monotone (no modular wrap)
(3)	Every livelock-candidate action (including retries) must bump the serial
(4)	A cycle in the state graph ⇒ returns to the same state ⇒ same serial ⇒ contradicts monotonicity ⇒ no cycle exists
(5)	Finite state graph (1) with no cycle (4) ⇒ every behavior takes only finitely many non-stuttering steps ⇒ livelock-free
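
The shape of this argument can be tried on a toy transition system in Python. This is a sketch under strong assumptions: the two toy systems below are invented for illustration and are far simpler than the spec. One bumps a Nat serial on every step with a bounded retry budget; the other wraps its serial modulo 3. The exhaustive explorer and the cycle check are generic.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def explore(init: Hashable,
            successors: Callable[[Hashable], Iterable[Hashable]]) -> Dict[Hashable, List[Hashable]]:
    """Enumerate every reachable state; terminates only if the space is finite."""
    graph: Dict[Hashable, List[Hashable]] = {}
    frontier = [init]
    while frontier:
        s = frontier.pop()
        if s in graph:
            continue
        graph[s] = list(successors(s))
        frontier.extend(graph[s])
    return graph


def has_cycle(graph: Dict[Hashable, List[Hashable]]) -> bool:
    """Iterative three-color DFS; a grey successor means a back edge (cycle)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {s: WHITE for s in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GREY:
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                color[node] = BLACK  # all successors explored
                stack.pop()
    return False


def toy_monotone(state):
    """State = (serial, retries_left, done); every step bumps the serial."""
    serial, retries_left, done = state
    if done:
        return []
    steps = [(serial + 1, retries_left, True)]               # commit
    if retries_left > 0:
        steps.append((serial + 1, retries_left - 1, False))  # bounded retry
    return steps


def toy_wrapping(serial):
    """Retry forever with a serial that wraps modulo 3."""
    return [(serial + 1) % 3]


monotone_graph = explore((0, 2, False), toy_monotone)
wrapping_graph = explore(0, toy_wrapping)
print(len(monotone_graph), has_cycle(monotone_graph), has_cycle(wrapping_graph))
```

In the monotone toy, exploration finishes and no cycle exists, so every path is finite; in the wrapping toy the state space is also finite but contains a cycle, which is why steps (2) and (3) are needed in addition to exhaustion.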

Verification results (current, 2-level LLfree)

cfg	Threads	States	Depth	Time	Result
micro / fine (MaxCommits=1)	2	867,696	89	~35s	Pass + liveness (laptop)
superfine	2	2,676,196	129	3:12	Pass + liveness
superfine confC (all-root)	3	137,333,348	96	2h 57min	Pass + liveness (ohtaka)
MaxCommits=2 superfine	2	127,586,599	311	4:40	Pass (ohtaka)
dynamic release superfine live	2	413,884,516	320	7:13	Pass + liveness (ohtaka)

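Conclusion: every config listed exhausts with all safety invariants passing, and EventuallyAllDone passes on every row marked "+ liveness"; the MaxCommits=2 superfine row records a safety pass only. Details: kamestm/tests/VERIFICATION.md §3, doc/verification_log.md.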

Related specs

Spec	Difference	Deck
`BundleUnbundle_3level_LLfree.tla`	3-level tree (Grand → Parent → Child); recursive inner bundle; multi-level unbundle walk	`slides_layer2_LLfree_3level_en.html`
`BundleUnbundle_2level_LLfree_dynamic.tla` `BundleUnbundle_3level_LLfree_dynamic.tla`	Online `insert(online=true)` / release; `InsertThreads` / `ReleaseThreads` role config	`slides_layer2_LLfree_dynamic_en.html`
`BundleUnbundle_hardlink_*.tla` (7 specs)	2-parent / 1-child hard-link topologies; Phase 4 reachability gate; bug repro + fix; per-action fairness (incl. the conditional `nested_external` gate-scope model)	`slides_hardlink_en.html`

Privilege=FALSE validation + the wait-free distinction

The Privilege=FALSE validation showing the priority mechanism is essential (proof_semantics §5), and how to distinguish "livelock-free" from "wait-free" (proof_semantics §7).

Privilege=FALSE — disabling the priority mechanism

The 3-level spec (BundleUnbundle_3level_LLfree.tla) has an explicit CONSTANT Privilege. Setting Privilege=FALSE disables the priority mechanism entirely. It is the switch that validates the priority mechanism is essential: with it disabled, the state space diverges.

	`Privilege=TRUE` (LLfree)	`Privilege=FALSE` (non-LL-free)
`CanProceed(t, n)`	Tag-based gate — blocks younger threads	Always `TRUE` — no gate
`TagAfterFail(t, n)`	4-branch ladder, registers/refreshes tag	no-op (tag unchanged)
State space	Finite → TLC exhausts	Diverges — retries unbounded
`EventuallyAllDone`	PASS	not checkable — state space diverges (no liveness run recorded)

Running TLC with Privilege=FALSE directly confirms that switching to Nat serial alone is not LL-free — the priority mechanism is essential.

Wait-free is a separate question

	Lock-free / LL-free (KAME)	Wait-free
Definition	Under fairness, every thread eventually progresses	Bounded steps to completion, independent of other threads
Fairness dependence	Liveness requires WF	Independent of fairness
CAS-retry-based	OK (preferred thread progresses, others retry bounded times)	Not possible by design
How to prove in TLA+	`<>AllDone` + WF suffices	Per-thread bounded-step invariant needed

KAME proves lock-free + livelock-free. Wait-free does not follow from a CAS-retry design (by intent).

What formal verification can and cannot show

Property	Formal verification	C++ implementation
Safety	Proven by CONSTRAINT-free exhaustion	No violation observed under large stress tests
LL-free	`EventuallyAllDone` PASS (+ Privilege=FALSE divergence as counter-example)	No livelock observed
Wait-free	Not designed to prove	Not achievable via CAS-retry

References: spec source → BundleUnbundle_2level_LLfree.tla / BundleUnbundle_3level_LLfree.tla | protocol detail → slides_layer2_en.html | 3-level / dynamic → slides_layer2_LLfree_3level_en.html / slides_layer2_LLfree_dynamic_en.html | hardlink → slides_hardlink_en.html