# Formal Verification of KAME's Lock-Free STM Primitives

## Overview

Three complementary verification approaches covering the full stack.

- **Layer 0 (informal)**: C11 RC11 memory model — verified by GenMC.
- **Layer 1 (TLA+)**: `atomic_shared_ptr` tagged-pointer primitive **and**
  commit-style primitives (`compareAndSet_impl_`, `scoped_atomic_view`,
  drain `release_tag_ref_`) — merged: the modern API exposes commit-level
  operations directly, subsuming the formerly separate "stm_commit" layer.
- **Layer 2 (TLA+)**: 2/3-level bundle/unbundle with LL-free negotiate,
  plus hard-link topologies (sibling parents and root-with-intermediate
  self-collision).

| Layer | Tool | Target | What it verifies |
|---|---|---|---|
| 0 (informal) | GenMC v0.17 (RC11) | `atomic_shared_ptr` | `memory_order_relaxed` / `acq_rel` safety under weak memory |
| 1 (TLA+) | TLC | `atomic_shared_ptr` | Tagged-pointer protocol + drain release + scoped_atomic_view RAII (TagHeld + CAS set + dtor). Safety only — no liveness (livelock-free is a Layer 2 property) |
| 2 (TLA+) | TLC | 2/3-level bundle/unbundle | Multi-phase CAS + LL-free priority for Parent→Children / Grand→Parent→Children; recursive inner bundle; static + dynamic; **safety + liveness** |
---

## 1. GenMC: `atomic_shared_ptr` Memory Model Verification

**Directory:** `tests/cds_atomic_shared_ptr/`

### What it tests

The core lock-free reference counting protocol from `kame/atomic_smart_ptr.h`, extracted into
standalone C11 programs with all `memory_order` annotations exactly matching the original:

| Operation | Ordering in original |
|---|---|
| `compare_set_weak` success | `memory_order_acq_rel` |
| `compare_set_weak` failure | `memory_order_relaxed` |
| `load_shared_` global refcnt `fetch_add` | `memory_order_relaxed` |
| `compareAndSwap_` transfer `fetch_add` | `memory_order_relaxed` |
| `compareAndSwap_` rollback `fetch_add(negative)` | `memory_order_relaxed` |
| `compareAndSwap_` final `fetch_sub` (step 6) | `memory_order_acq_rel` |
| `release_tag_ref_` fallback `decAndTest` | `memory_order_acq_rel` |
| `local_shared_ptr::reset` `decAndTest` | `memory_order_acq_rel` |

### Test files

| File | Scenario | Threads |
|---|---|---|
| `cds_test_load.c` | 3 threads concurrently `load_shared_()` + `release_tag_ref_()` | 3 |
| `cds_test_cas.c` | Reader (`load_shared_`) vs writer (`compareAndSwap_<false>`) | 2 |
| `cds_test_multi_cas.c` | 2 threads race `compareAndSet` on same target | 2 |
| `cds_test_swap.c` | `local_shared_ptr::swap(atomic_shared_ptr&)` under contention | 2 |
| `cds_test_multi_cas_excess.c` | Drain `release_tag_ref_(pref, T)` with `cas_rcnt` parameter | 3 |
| `cds_test_swap_excess.c` | Swap with drain release | 2 |
| `cds_test_cas_excess.c` | CAS with drain release excess undo | 2 |
| `cds_test_cas_noacquire.c` | `compareAndSet_impl_<NOSWAP=true>` no-acquire optimization | 2 |
| `cds_test_scoped_weak.c` | `scoped_atomic_view` + `compareAndSetWeak` race (Layer 1 SCOPED path) | 2 |

### Key modeled details

- **Tagged pointer scheme**: `CAPACITY=8`, pointer in upper bits, local refcount in lower 3 bits
- **Intrusive reference counting**: `Obj.refcnt` as `_Atomic(uintptr_t)`
- **`acquire_tag_ref`**: CAS loop incrementing local refcount in tagged pointer
- **`release_tag_ref`**: CAS to decrement local refcount; falls back to global `decAndTest(acq_rel)` if pointer was swapped
- **`load_shared_`**: acquire_tag_ref → `fetch_add(1, relaxed)` on global refcnt → release_tag_ref
- **`compareAndSwap_`**: pre-increment newr → acquire_tag_ref → transfer local→global → CAS → release old
- **`compare_and_set` (NOSWAP=true)**: callers pre-acquire global refs before spawning threads (matching real `const local_shared_ptr<T>&` calling convention)

### Assertions

- No use-after-free (`assert(p->destroyed == 0)` while holding reference)
- No double-free (refcnt reaches 0 exactly once)
- No memory leaks (all objects destroyed, all refcounts reach 0)
- GenMC additionally checks: no data races on non-atomic accesses

### Results

All 9 tests pass under RC11 (`make run` in `tests/cds_atomic_shared_ptr/`):

| Test | Complete executions | Blocked executions | Wall time | Result |
|---|---|---|---|---|
| 1: load_shared / release_tag_ref safety | 5,757 | 6,428 | 1s | **Pass** |
| 2: load + compareAndSwap_ race | 240 | 7 | 0.06s | **Pass** |
| 3: multiple compareAndSwap_ race | 464,164 | 705,296 | ~270s | **Pass** |
| 4: swap + local_reset safety | 4 | 0 | 0.1s | **Pass** |
| 5: multi CAS with release_tag_ref drain | (large) | (large) | (varies) | **Pass** |
| 6: CAS+load with release_tag_ref drain | (large) | (large) | (varies) | **Pass** |
| 7: swap with release_tag_ref cas_rcnt | 120,118 | 401,606 | 117s | **Pass** |
| 8: compareAndSet_ (NOSWAP=true) no-acquire | 74 | 3 | 0.13s | **Pass** |
| 9: scoped_atomic_view + compareAndSetWeak race | 85 | 12 | 0.04s | **Pass** |

"Blocked executions" are partial executions pruned by GenMC's DPOR algorithm as redundant.

### TLA+-derived GenMC tests (`tests/tlaplus/test_*.c`)

C11 programs **mechanically translated** from the TLA+ specifications, with each
TLA+ atomic step corresponding 1:1 to a C atomic operation:

| File | Source spec | Status |
|---|---|---|
| `test_atomic_shared_ptr.c` | `atomic_shared_ptr.tla` (Layer 1, core) | Pass |
| `test_scoped_atomic_view.c` | `atomic_shared_ptr.tla` (Layer 1, scope) | Pass — 96 executions, 0.28s |
| `test_stm_commit.c` | (legacy stm_commit layer) | Pass |
| `test_bundle_2level.c`, `test_bundle_2level_LLfree.c` | `BundleUnbundle_2level*.tla` | Pass |
| `test_bundle_3level.c`, `test_bundle_3level_LLfree.c` | `BundleUnbundle_3level*.tla` | Pass |

### Key findings

1. **`fetch_add(1, relaxed)` in `load_shared_` is safe** — the preceding `acq_rel` CAS in `acquire_tag_ref_` pins the pointer before the global refcount increment.
2. **`relaxed` transfer/rollback in `compareAndSwap_` is safe** — the final `acq_rel` CAS that installs the new pointer synchronizes-with subsequent observers.
3. **The tagged-pointer local refcount scheme prevents premature deallocation** — no execution was found where a thread observes a freed object.

### Build & run

```bash
cd tests/cds_atomic_shared_ptr

# One-time: build GenMC (requires LLVM 15-20, NOT 22+)
brew install llvm@20 hwloc
git clone https://github.com/MPI-SWS/genmc.git
cd genmc && mkdir build && cd build
cmake .. \
  -DCMAKE_PREFIX_PATH=/opt/homebrew/opt/llvm@20 \
  -DCMAKE_C_COMPILER=/opt/homebrew/opt/llvm@20/bin/clang \
  -DCMAKE_CXX_COMPILER=/opt/homebrew/opt/llvm@20/bin/clang++ \
  -DCMAKE_EXE_LINKER_FLAGS="-L/opt/homebrew/lib" \
  -DCMAKE_BUILD_TYPE=Release
make -j$(sysctl -n hw.ncpu)
cd ../..

# Run all tests
make run

# Or individually
genmc/build/genmc --rc11 --unroll=5 /full/path/to/cds_test_load.c
genmc/build/genmc --rc11 --unroll=5 /full/path/to/cds_test_cas.c
genmc/build/genmc --rc11 --unroll=5 /full/path/to/cds_test_multi_cas.c
```

**Note:** GenMC compiles source files directly via its LLVM frontend. Pass absolute paths.
The `--unroll=5` flag bounds CAS retry loops. Increase if GenMC warns about insufficient unrolling.

### Reproduction audit (2026-06-09, GenMC v0.17.0 / LLVM 20.1.8, Apple M3)

The full suite was re-run on a freshly built GenMC v0.17.0 to confirm the
numbers above are reproducible on current toolchains. **Every documented
complete-execution count was reproduced to the digit:**

| Test | Re-run complete | Doc | Wall | |
|---|---|---|---|---|
| 1: `cds_test_load` | 5,757 | 5,757 | 1s | ✅ |
| 2: `cds_test_cas` | 240 | 240 | <1s | ✅ |
| 3: `cds_test_multi_cas` | 464,164 | 464,164 | 275s | ✅ |
| 4: `cds_test_swap` | 4 | 4 | <1s | ✅ |
| 5: `cds_test_multi_cas_excess` | 410,364 | (large) | 271s | ✅ pass |
| 6: `cds_test_cas_excess` | 240 | (large) | 1s | ✅ pass |
| 7: `cds_test_swap_excess` | 120,118 | 120,118 | 117s | ✅ |
| 8: `cds_test_cas_noacquire` | 74 | 74 | <1s | ✅ |
| 9: `cds_test_scoped_weak` | 85 | 85 | <1s | ✅ |
| TLA+ `test_scoped_atomic_view.c` | 96 | 96 | <1s | ✅ |

Because the **complete**-execution counts match the recorded values exactly,
the `--unroll=5` bound is *complete* for these tests — every CAS-retry loop
terminates within the bound, so no execution is truncated and the
verification is not vacuous. (Contrast: a test whose loops exceed the unroll
reports `complete executions: 0` with all paths *blocked* — a vacuous pass.
Always confirm `complete executions > 0`.)

`test_atomic_shared_ptr.c` (the monolithic 3-thread load+CAS+swap union of
tests 1/2/4) is verification-heavy — it did **not** finish a local 25-minute
run (`>1551s`, no result), consistent with the 66.5M-state scale of the
analogous TLC `atomic_shared_ptr_all_mc.cfg`. The focused decomposition in
the table above plus that TLA+/TLC config (§2) are the tractable,
authoritative coverage of the full protocol.

**macOS build note (GenMC v0.17.0):** on macOS 26 SDK + Homebrew `llvm@20`,
building GenMC *itself* with the keg's `clang++`/`libc++` (as the recipe
above implies) fails to link with an unresolved weak symbol
`operator new[](size_t, std::__type_descriptor_t)` (a type-aware-allocation
ABI symbol the prebuilt keg `libc++.dylib` does not export). Building GenMC
with **Apple `clang++` (system libc++)** while still pointing
`-DLLVM_DIR=$(brew --prefix llvm@20)/lib/cmake/llvm` at the keg (for
`libLLVM.dylib` + the frontend `clang`) resolves it — GenMC's own objects no
longer emit that symbol, and the LLVM C-API boundary is unaffected. Also
add `hwloc` to the link path (`-L$(brew --prefix hwloc)/lib`), and note the
v0.17.0 binary installs at `build/bin/genmc/genmc` (nested).

---

## 2. TLA+ Layer 1: `atomic_shared_ptr` + Commit Primitives Verification

**Directory:** `tests/tlaplus/`
**Spec:** `atomic_shared_ptr.tla`

### What it tests

The full Layer 1 vocabulary of `kame/atomic_smart_ptr.h` under sequential
consistency.  Complements GenMC (which checks memory ordering) by exhaustively
exploring all thread interleavings of:

- the tagged-pointer reference-counting protocol
  (`acquire_tag_ref_`, `load_shared_`, `release_tag_ref_`),
- the **modern drain release pattern**
  (`release_tag_ref_(pref, added_global_rcnt)` with `cas_rcnt` parameter),
- the **modern bulk-transfer `load_shared_`**
  (`pref->refcnt.fetch_add(rcnt)` + drain), which steals concurrent threads'
  tag IOUs and shifts contention from `m_ref` to `refcnt`,
- the unified **`compareAndSet_impl_<OldrT, NewrT, false (no WEAK), RETAIN_NEWR>`**
  template (covering Set / Swap variants),
- **`scoped_atomic_view<T>` RAII lifecycle** (TagHeld state):
  acquire → CAS set (SCOPED variant with step4 = +T, fetch_sub(2) on success) → dtor.

### Modeled operations

| Operation | C++ source | Key detail |
|---|---|---|
| `acquire_tag_ref_()` | `atomic_smart_ptr.h:1058-1108` | Single atomic load of `m_ref` + CAS to +1 local tag |
| `load_shared_()` (bulk) | `atomic_smart_ptr.h:1116-1128` | `fetch_add(rcnt)` to global + drain `release_tag_ref_(pref, rcnt)` |
| `release_tag_ref_(pref, T)` | `atomic_smart_ptr.h:1158-1206` | Drain `min(local_rc, T)` tags in one CAS + fetch_sub the excess |
| `compareAndSwap_()` (legacy) | `atomic_smart_ptr.h:550-650` | 6-phase: pre-inc, acquire, check, transfer, CAS, cleanup/undo |
| `local_shared_ptr::swap(asp&)` | `atomic_smart_ptr.h:628-649` | Like CAS but unconditional and hold-transfer |
| `compareAndSet_impl_<SCOPED>` | `atomic_smart_ptr.h:1240-1450` | No acquire (scope holds +1); step4 = +T (full); fetch_sub(2) on success |
| `scoped_atomic_view` ctor | `atomic_smart_ptr.h:598-700` | Acquire → TagHeld |
| `scoped_atomic_view` dtor | `atomic_smart_ptr.h:730-845` | `release_tag_ref_(pref, 1)` if TagHeld |
| `local_shared_ptr::reset()` | `atomic_smart_ptr.h:433-444` | `fetch_sub(1, acq_rel)` + delete check |

### Key modeling decisions

1. **WEAK CAS spurious failure NOT modelled** — `compare_exchange_strong` semantics
   used throughout.  Sound for safety: weak-CAS behaviors are a superset of
   strong-CAS behaviors, and spurious failures exercise the same undo paths
   covered by genuine failures.  The weak case is empirically validated by
   GenMC RC11 (`cds_test_scoped_weak.c`).
2. **ABA-by-recycling NOT modelled** — prevented by the shared-pointer
   ownership contract (`oldr->refcnt >= 1` keeps oldr alive); not a
   structural property of the tagged-pointer protocol.  See spec header.
3. **`scoped_atomic_view` simplified to 2 states** — `"none"` and `"tagheld"`.
   The `"empty"` and `"owned"` C++ states collapse to `"none"` in this model
   (no observable behavior depends on the distinction at Layer 1).
4. **Drain modelled by extending the existing release CAS** — `cas_rcnt`,
   `added_global_rcnt`, and `drained` parameters generalise the 1-tag
   release to arbitrary K-tag release.

### Verified invariants (10)

| Invariant | Description |
|---|---|
| `TypeOK` | Type invariant |
| `MemorySafety` | While a thread holds a tag/ref on pref, pref is not freed |
| `NoUseAfterFree` | Objects held by `local_shared_ptr`s are not freed |
| `GlobalRCNonNeg` | Global refcount >= 0 for live objects |
| `FreedImpliesZeroRC` | Freed objects have `global_rc = 0` |
| `InstalledNotFreed` | Object currently in `atomic_shared_ptr.m_ref` is not freed |
| `LocalRCBounded` | `local_rc <= 2 * |Threads|` (each thread: 1 scope tag + 1 in-flight acquire) |
| `QuiescentCheck` | All-threads-idle implies `freed[o] = (global_rc[o] = 0)` |
| `TerminalCheck` | All-budgets-exhausted + no held refs + scope="none" → ptr has rc=1, others freed |
| `ScopeConsistent` | `scope_state` and `scope_pref` agree; tagheld implies pref not freed |

### Liveness

**Liveness `<>AllDone` is INTENTIONALLY VIOLATED** at Layer 1.  TLC finds a
lasso where two threads enter mutual CAS retry (`cas_retry` ctx → `cas_acquire` →
`atr_cas` → ...) — internal CAS retries do not consume `iterBudget`, so an
adversarial scheduler can sustain the loop indefinitely.  This is the expected
behavior: livelock-freedom is supplied by Layer 2's priority older-wins
(LL-free negotiate) mechanism, not by Layer 1.

### Results

All configurations enable `SYMMETRY AllSymmetry` (Threads ∪ Objects permutable)
except 1-thread and liveness configs.

| Config | Threads | MaxOps | Ops | Distinct states | Depth | Time | Result |
|---|---|---|---|---|---|---|---|
| `atomic_shared_ptr_1thr_mc.cfg` | 1 | 3 | load+CAS+swap+scope | 4,014 | 33 | <1s | **Pass** (complete) |
| `atomic_shared_ptr_small_mc.cfg` | 2 | 2 | load+CAS+scope | 613,990 | 69 | 12s | **Pass** (complete) |
| `atomic_shared_ptr_mc.cfg` | 2 | 3 | load+CAS+scope | (subset of all_mc.cfg) | — | — | superseded |
| `atomic_shared_ptr_all_mc.cfg` | 2 | 3 | load+CAS+swap+scope (full) | **66,537,058** | 84 | 1h 30min | **Pass** (complete) |
| `atomic_shared_ptr_3thr_cas_mc.cfg` | 3 | 2 | load+CAS+scope | — | — | — | (running on local) |
| `atomic_shared_ptr_live_mc.cfg` | 2 | 2 | `<>AllDone` liveness | (lasso) | — | 40s | **Fail (expected)** — confirms Layer 1 has no LL-freedom |

### Key findings

- **All modelled safety invariants hold** under exhaustive 1-thread (4k
  states) and 2-thread MaxOps=2 (614k states) exploration — no memory safety
  violations, no refcount leaks, no use-after-free.
- **Drain release preserves refcount balance** under arbitrary `cas_rcnt`
  values: excess undo (`fetch_sub(added - drained)`) keeps `global_rc`
  consistent whether the drain succeeds or falls through.
- **Scope tag CAS-consume invariant holds**: SCOPED CAS's `fetch_sub(2)` on
  success uniformly absorbs both `m_ref`'s release and scope's tag-share,
  regardless of whether the scope was in the ABSORBED or DRAINED logical
  state at CAS time.
- **Layer 1 has no liveness guarantee** — formally confirmed via the
  `EventuallyAllDone` counter-example, motivating the need for Layer 2's
  priority older-wins mechanism.

---

## 3. TLA+ Layer 2, 2-level bundle/unbundle Verification:

**Directory:** `tests/tlaplus/`  
**Specs:** `BundleUnbundle_2level_LLfree.tla`, `BundleUnbundle_2level_LLfree_dynamic.tla`

### What it tests

The multi-phase CAS bundle/unbundle protocol for a 2-level tree (Parent → {Child1, Child2}),
plus the livelock-free (LL-free) priority mechanism. Models `bundle()`, `unbundle()`,
`commit()`, and `snapshot()` from `kame/transaction_impl.h` for the 2-level case.

### Priority (LL-free) mechanism

`priorityTag[n] ∈ {Null} ∪ ({0..MaxIter} × Threads)` per node. Older transaction (smaller
iter, then smaller tid) wins. CAS failure → set own tag if older (`TagAfterFail`). Older tag
blocks younger threads (`CanProceed`). `PreemptTag` lets an active older thread snatch a tag.
Tags cleared only on commit success (`ClearMyTags`). Mirrors `m_priority_tidstamp` /
`ScopedNegotiateLinkage` in `transaction_impl.h`.

### Specification generations

| Generation | Serial | Finiteness | Liveness | Status |
|---|---|---|---|---|
| Gen 1 (modular) | `% MaxSerial` wrap | Structural | Fails (SerialWrapAround violated) | Counter-example: shows LL-free necessary |
| Gen 2 (Nat+CONSTRAINT) | Nat monotone | `CONSTRAINT SerialBound` cutoff | Not proven | Shows Nat alone insufficient |
| **Gen 3 (LL-free)** | Nat monotone | **Structural** (priority bounds retries) | **Proven** | **Current reference** |

### Static spec: `BundleUnbundle_2level_LLfree.tla`

- 4-phase bundle: collect sub-packets → CAS parent (missing=TRUE) → CAS each child to
  BundledRef → CAS parent (missing=FALSE, finalize)
- Unbundle for commit: mark child slot Null (missing=TRUE) → restore child priority
- 3 atomicity granularities: `coarse` / `fine` / `superfine` (most C++-faithful)

### Dynamic spec: `BundleUnbundle_2level_LLfree_dynamic.tla`

Extends the static spec with online child insertion (`insert(online=true)`) and release.
Thread roles configurable via `InsertThreads`, `RootThreads`, `LeafThreads`, `ReleaseThreads`.

### Verified invariants

| Invariant | Description |
|---|---|
| `SnapshotConsistency` | If node has `missing=FALSE`, all sub-packets exist |
| `NoPriorityLoss` | Non-root nodes always have `hasPriority=TRUE` or `bundledBy≠Null` |
| `BundleRefConsistency` | If child is bundled to parent, parent has priority |
| `MissingPropagation` | `missing=TRUE` propagates to all ancestors |
| `TerminalPayloadCheck` | At termination each child received exactly the expected increments |
| `EventuallyAllDone` (PROPERTY) | All threads eventually complete — formal liveness proof |

### Selected results (Gen 3, CONSTRAINT-free exhaustion)

| Config | Threads | Distinct states | Depth | Time | Result |
|---|---|---|---|---|---|
| 2-thread coarse | 2 | 665,218 | 89 | 28 s | **Pass + liveness** |
| 2-thread superfine | 2 | 2,676,196 | 129 | 3:12 | **Pass + liveness** |
| 3-thread superfine confC (all-root) | 3 | 137,333,348 | 96 | 6:35 | **Pass** (ohtaka) |
| MaxCommits=2 superfine | 2 | 127,586,599 | 311 | 4:40 | **Pass** (ohtaka) |
| dynamic release superfine live | 2 | 413,884,516 | 320 | 7:13 | **Pass + liveness** (ohtaka) |

Full results: `tests/tlaplus/doc/verification_log.md`

### Equivalence of the existing models with the new C++ Phase 4 reachability gate

The hard-link work (§5) introduced a `Packet::allSubReachable` gate
in the C++ bundle Phase 4 (`transaction_impl.h`, commits `87892b35`,
`92b15f62`, `404fa137`, `b12e1895`): the `is_bundle_root=true`
override now only clears `m_missing` when every Null sub-slot has a
`reverseLookup`-able subnode, otherwise the bundle returns
`BundledStatus::DISTURBED` so the outer `snapshot()` retry loop
re-attempts.  The gate's order matters: `newpacket->m_missing` is
cleared *before* `allSubReachable` / `checkConsistensy` are called,
because both functions short-circuit their Null-slot reverseLookup
when the root they observe is `missing` (mid-bundle semantics).  On
reachability failure the flag is restored to `true` before returning
DISTURBED (see `b12e1895` for the order-of-operations subtlety
introduced by the global-root parameter).

`checkConsistensy` and `allSubReachable` were also given an optional
`globalroot` parameter (commit `1ffd8dce`) so callers with a non-root
`pkt` can supply the true global tree root for the Null-slot
`reverseLookup` — necessary for hard-link Case B (where the
hard-linked child's packet lives in a sibling sub-tree).  At the
Phase 4 call site, `newpacket` aliases the global root for
`is_bundle_root=true` (see `reverseLookup` line 1440 — when
`&superpacket->node() == this` the function returns `superpacket`
itself), so the default `globalroot = {}` is correct without
explicit threading.

**The 2level / 3level LL-free (static + dynamic) TLA+ models do not
need to be updated** for these C++ changes.  In all non-hard-link
topologies the gate is provably a no-op:

1. Each child node has exactly one parent.  When `bundle(P)` reaches
   Phase 4 with a Null sub-slot for an active child `c`, the only way
   `reverseLookup` could fail is if `c`'s packet lives *outside*
   `P`'s subtree — but with a single parent that is impossible by
   construction.
2. The existing `SnapshotConsistency` invariant in these specs
   (`2level_LLfree_dynamic.tla:1093-1096`,
   `3level_LLfree_dynamic.tla:1831-1835`) states exactly the
   property the gate enforces:

   ```tla
   (pw.hasPriority /\ ~pw.packet.missing) =>
       (\A c \in ActiveChildren : pw.packet.sub[c] /= Null)
   ```

   In words: if a node is published with `missing=FALSE`, no active
   child has a Null sub-slot.  This is precisely the precondition
   `allSubReachable` checks before letting the publish proceed.
3. The 413 M-state `dynamic release superfine live` run and all
   smaller dynamic / static configurations have verified this
   invariant without violation.  Therefore the C++ `else { return
   DISTURBED }` branch is provably unreachable in non-hard-link
   topology, and the models accurately represent the implementation
   without modification.

The gate is only effectful in the hard-link topology, where multiple
parents share a child and the multi-step release can leave the child
transiently Null in two parents' `sub[]`.  That case is modelled in
§5.

### Build & run

```bash
cd tests/tlaplus
# tla2tools.jar included; requires OpenJDK 21+

# 1-thread sanity (< 1 s)
java -XX:+UseParallelGC -Xmx8g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_2level_LLfree_coarse_mc.cfg \
  BundleUnbundle_2level_LLfree.tla

# 2-thread coarse with liveness (~30 s)
java -XX:+UseParallelGC -Xmx14g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_2level_LLfree_coarse_mc.cfg \
  BundleUnbundle_2level_LLfree.tla
```

---

## 4. TLA+ Layer 2, 3-level bundle/unbundle Verification:

**Directory:** `tests/tlaplus/`  
**Spec:** `BundleUnbundle_3level_LLfree.tla`

### What it tests

Same protocol as the 2-level Layer 2 (§3) extended to a 3-level tree
(Grand → Parent → {Child1, Child2}).  Additionally verifies recursive
bundling (snapshot of Grand triggers inner bundle of Parent) and
multi-level unbundle walk.

### Additional coverage vs the 2-level spec

- **Recursive inner bundle**: `InnerPhase2/3/4` model the inner `bundle()` call when
  `snapshot(Grand)` encounters Parent with `missing=TRUE`
- **Multi-level unbundle walk**: `commit(Child)` when Child is bundled 2 levels deep
  (Child → Parent → Grand); walk traverses `bundledBy` chain up to Grand
- **`BundleChainValid`** / **`BundledByCorrect`** invariants: structural correctness of
  the 3-level `bundledBy` chain

### Verified invariants (adds to the 2-level set)

| Invariant | Description |
|---|---|
| `BundleChainValid` | Bundled node's `bundledBy` target is priority or itself bundled |
| `BundledByCorrect` | `bundledBy` always points to the structural parent |
| `GrandAlwaysPriority` | Root (Grand) node always has priority |

### Selected results (Gen 3, CONSTRAINT-free exhaustion)

| Config | Threads | Distinct states | Depth | Time | Result |
|---|---|---|---|---|---|
| 2-thread coarse | 2 | 1,497,098 | 98 | 1:35 | **Pass + liveness** |
| 2-thread superfine | 2 | 14,109,731 | 148 | 19:13 | **Pass + liveness** |
| 3-thread superfine confC (all-root) | 3 | 640,894,951 | 88 | 15:25 | **Pass + liveness** (ohtaka) |

Full results: `tests/tlaplus/doc/verification_log.md`

### Build & run

```bash
cd tests/tlaplus

# 1-thread sanity (< 1 s)
java -XX:+UseParallelGC -Xmx8g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_3level_LLfree_1thr_superfine_mc.cfg \
  BundleUnbundle_3level_LLfree.tla

# 2-thread coarse with liveness (~2 min)
java -XX:+UseParallelGC -Xmx14g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_3level_LLfree_coarse_mc.cfg \
  BundleUnbundle_3level_LLfree.tla
```

Large configs (3-thread superfine) require a supercomputer; see
`tests/tlaplus/doc/ohtaka_handoff.md`.

### Liveness (Layer 2 only)

**Layer 2 is lock-free + livelock-free**, formally proven via the
`EventuallyAllDone` PROPERTY.  Priority gating (TagOlder older-wins
arbitration) makes unbounded retries structurally impossible: serials
increment monotonically, the state graph is acyclic, and TLC terminates
without `CONSTRAINT`.  See `tests/tlaplus/doc/proof_semantics.md` §2–§4
for the full argument.

**Layer 1 is NOT livelock-free** — see §2 above for the counter-example.
Liveness is supplied by the Layer 2 priority mechanism, not by the bare
tagged-pointer protocol.

Wait-free is not claimed at either layer — CAS-retry-based with fairness
(`WF_vars`) required for per-thread liveness.  See `proof_semantics.md` §7.

---

## 5. TLA+ Layer 2, Hard-link Verification (2-parent + 1-child topologies)

**Directory:** `tests/tlaplus/`
**Specs:**
- `BundleUnbundle_hardlink_dynamic.tla` — sibling-parents variant (Parent1 ‖ Parent2 share DynChild1)
- `BundleUnbundle_hardlink_self_collision.tla` — self-collision variant (R-root + intermediate parent A, both directly parent of C)
- `BundleUnbundle_hardlink_4node.tla` — Root-A-B-C topology with C hard-linked under both A and B (production-race repro, mirrors the C++ Phase 4 reachability-gating fix)
- `BundleUnbundle_hardlink_external.tla` — minimal 4-node external-parent repro (`P2` hard-linked external to `P1`, `bundle(GN1)` triggers `SnapshotConsistency` violation without the fix)
- `BundleUnbundle_hardlink_external_migration.tla` — cross-tree migration (bundle on `GN1` reaches into `P1`'s tree to pull `P2` into `GN2.sub[P2]`)

### Background

The "hard-link" case — one child node referenced from two distinct parents
— had been called out informally in `CLAUDE.md` / `README.md` as a
"transactions may always fail and retry" liveness caveat.  Production
reports of low-frequency dyn_node aborts ("30/30 abort on another env")
prompted formal modelling of the protocol under hard-link topologies.

Five complementary topologies are modelled:

| Spec | Topology | Bug surface |
|---|---|---|
| `_hardlink_dynamic` | Parent1 ‖ Parent2 → DynChild1 (sibling parents) | migration race when one parent's bundle pulls the child from the other parent's `sub[]` |
| `_hardlink_self_collision` | R → A → C and R → C (root is also direct parent) | bundle(R) walks both R→C and R→A→C; the `is_bundle_root` Phase 4 `m_missing=false` override interacts with the doubled path |
| `_hardlink_4node` | Root → A, Root → B, A → C, B → C (C shared between A and B) | bundle(Root) Phase 4 reachability gating + outer `DISTURBED` retry; production-race repro |
| `_hardlink_external` | GN1 → P2 (hardlink), P1 → P2 (external owner of P2's packet) | bundle(GN1) finalises `GN1 ~missing` while `GN2.sub[P2]=Null` is unreachable from GN1 → SnapshotConsistency violation without Phase 4 gating |
| `_hardlink_external_migration` | as above, but the bundle is allowed to migrate P2 from P1's tree | bundle(GN1) must atomically pull P2 out of P1's `sub[]` and into GN2's `sub[]` while the peer races on P1 |

### Self-collision: bug repro and fix simulation

The self-collision spec (`_hardlink_self_collision`) reproduces a real
race captured in the production trace and confirms a proposed fix.

**Bug surface:** during `bundle(R)` Phase 2 (R missing=TRUE, child sub-packets
collected), a peer thread's `insert(R, C)` (hard-link registration) CAS
of R may execute, overwriting R's wrapper with `missing=FALSE`.  The
bundling thread's Phase 4 CAS then fails; on retry, Phase 1 collects C
via a now-stale `bundledBy=A` branch and writes `R.sub[A].sub[C] = Null`,
losing the packet.  `SnapshotConsistency` (mirroring C++
`Packet::checkConsistensy` at `transaction_impl.h:870-871`) is violated.

**Proposed fix:** `BundlePhase3` should only CAS-tag child wrappers
whose packets actually *move* into the parent's `sub[]`.  Hard-link
references — where `parent.sub[c] = Null` indicates the packet lives
elsewhere — must be skipped:

```cpp
// pseudo-diff in bundle() Phase 3
for (child in subnodes) {
    if (local.subpackets[child] == nullptr) continue;  // ← NEW
    // existing CAS to BundledRef(this)
}
```

A naive alternative — gating `insert(R, C)` on `~R.missing` (waiting for
in-flight bundle to clear) — was rejected: it breaks lock-freedom and
risks deadlock at 3+ threads.  The Phase 3 skip lives entirely inside
the bundle protocol and needs no external coordination.

### Impact on non-hard-link models / tests

The fix only triggers when `local.subpackets[child] == nullptr`.  In
the non-hard-link 2level/3level dynamic models and the existing C++
tests (`dyn_node_test`, `3level_mixed_test`, `payload_integrity_*`),
every inserted child's packet lives in exactly one parent's `sub[]` —
no `Null` sub-slot occurs in steady state — so the fix is a no-op.

### Verified invariants

| Invariant / Property | Description |
|---|---|
| `SnapshotConsistency` | mirrors `Packet::checkConsistensy` — a `Null` sub-slot is consistent only if reachable via `reverseLookup` (hard-link path) |
| `HardlinkExclusive` | at most one parent's `sub[]` holds the child's packet (no duplicate homing) |
| `BundleRefConsistency` | `child.bundledBy` parent has priority and either holds the packet or the child wrapper carries an `InsertedRef` |
| `NoPriorityLoss`, `NoMissingHole`, `MissingPropagation` | structural sanity |
| `EventuallyAllDone` (PROPERTY) | all threads eventually complete — checked on every hard-link cfg (uniform safety+liveness coverage as of 2026-05-20) |

### Results

`_hardlink_dynamic` (sibling parents, superfine):

| Config | Threads | Distinct states | Result |
|---|---|---|---|
| 1-thread, MaxCommits=1 | 1 | 7 | **Pass** + liveness |
| 2-thread, MaxCommits=1 | 2 | 62 | **Pass** + liveness |
| 2-thread, MaxCommits=2 | 2 | 703 | **Pass** |

`_hardlink_self_collision` (R-A-C, before fix):

| Config | Result |
|---|---|
| 2-thread | **FAIL** — `SnapshotConsistency` violated at 376 states / 222 distinct |

`_hardlink_self_collision` (after Phase 3 skip-Null fix):

| Config | Threads | Distinct states | Result |
|---|---|---|---|
| 2-thread | 2 | 114 | **Pass** + liveness |
| 3-thread | 3 | 760 | **Pass** + liveness |

`_hardlink_4node` (R + A + B + shared C, production-race repro
with Phase 4 reachability gating + outer-retry semantics):

| Config | Threads | Distinct states | Result |
|---|---|---|---|
| 2-thread (production-race, no fix) | 2 | 97 | **FAIL** — SnapshotConsistency violated |
| 2-thread (Phase 4 reachability gating + DISTURBED retry) | 2 | 531 | **Pass** + liveness under per-action `WF(MigrateCToA)` |

`_hardlink_external` (P2 hard-linked under both GN1 and P1, no migration —
direct repro of `SnapshotConsistency` violation when Phase 4 finalises
GN1 ~missing while GN2.sub[P2] is unreachable):

| Config | Threads | Distinct states | Result |
|---|---|---|---|
| 1-thread | 1 | 15 | **Pass** + liveness (with Phase 4 reachability gate) |
| 2-thread | 2 | 136 | **Pass** + liveness |

`_hardlink_external_migration` (bundle on GN1 atomically migrates P2 from
P1's tree into GN2.sub[P2]; per-action WF on every progress step):

| Config | Threads | Distinct states | Result |
|---|---|---|---|
| 1-thread | 1 | 8 | **Pass** + liveness |
| 2-thread | 2 | (~hundreds) | **Pass** + liveness |
| 3-thread | 3 | 1,202 | **Pass** + liveness under per-action WF on every BundlePhase* / BundlePullP1 / BundleCASP2 / BundleUpdateGN1 step |

### First per-action fairness in the KAME TLA+ corpus

The `_hardlink_4node` (with the C++ fix mirrored) is the first KAME
TLA+ spec to use **per-action fairness** rather than the blanket
`WF_vars(NextStep)`:

```tla
Spec == Init /\ [][Next]_vars
        /\ WF_vars(NextStep)
        /\ \A t \in Threads : WF_vars(MigrateCToA(t))
```

`_hardlink_external_migration` extends the same idea further — every
progress action of every thread gets its own per-action WF, since
each step of the bundle pipeline must be guaranteed to fire against
arbitrary peer retries:

```tla
Spec == Init /\ [][Next]_vars
        /\ WF_vars(NextStep)
        /\ \A t \in Threads :
            /\ WF_vars(BundlePhase1(t))
            /\ WF_vars(BundlePullP1(t))
            /\ WF_vars(BundleCASP2(t))
            /\ WF_vars(BundleUpdateGN1(t))
            /\ WF_vars(BundlePhase4(t))
            /\ WF_vars(BundlePhase5(t))
```

Rationale.  With the production-faithful retry semantics (Phase 4 on
reachability failure returns to `bundle_phase1` — matching the C++
`BundledStatus::DISTURBED` return out to `snapshot()`'s outer retry
loop), the model admits executions where the bundling thread retries
arbitrarily many times.  The retry only terminates after the peer's
release-step-2 (`MigrateCToA` in the model) fires, restoring
`A.sub[C]` to a reachable state.

`WF_vars(NextStep)` (blanket) is too weak here.  Weak fairness on a
disjunction guarantees "some enabled action fires" — an infinite
sequence firing only `BundlePhase1A/B/2/4` (= bundle retries)
satisfies it, even though `MigrateCToA` is continuously enabled in
parallel.  Such a trace violates the production fairness assumption,
but TLC's liveness checker accepts it.

`WF_vars(MigrateCToA(t))` — per-action weak fairness — excludes
those traces: if `MigrateCToA(t)` is continuously enabled (and it
is, in this model — no peer action re-disables it once enabled), it
must eventually fire.  After it fires, the next bundle attempt sees
the reachable state and finalizes cleanly.

(SF would also work but is overkill here — SF only adds value when
the action can oscillate between enabled and disabled.  WF and SF
produce the same 13,056-distinct-state verification with
`EventuallyAllDone` PASS.)

### Doesn't "older-wins" subsume this fairness?

The implementation's LL-free `negotiate` provides `older-wins`
arbitration on CAS contention.  Doesn't it already cover progress?

No, because `older-wins` arbitrates **on the same linkage**.  The
race here spans **three different linkages**:

* T1's bundle CAS — operates on `Root`'s `m_link`
* T2's release step 1 (`ReleaseBCNoMigrate`) — operates on **B**'s `m_link`
* T2's release step 2 (`MigrateCToA`) — operates on **A**'s `m_link`

T1 and T2 never CAS the same linkage in this scenario, so
`older-wins` has nothing to arbitrate between them.  T1 can retry
its Root-CAS arbitrarily many times without ever blocking T2 from
running its B-CAS or A-CAS.

Production progress for T2 therefore depends on **OS-level thread
scheduling** (each thread gets CPU time), not on negotiate.  The
TLA+ model abstracts the OS-scheduling guarantee as
`WF_vars(MigrateCToA)`.

All earlier KAME TLA+ specs (`BundleUnbundle_2level_LLfree`,
`_3level_LLfree`, both `_dynamic` variants, the other hardlink
specs, and `atomic_shared_ptr.tla`) terminate without needing
per-action fairness because their race surfaces were on a single
linkage chain, where LL-free priority gating bounds retries
structurally and `WF_vars(NextStep)` suffices.

### State-space bounding policy

User feedback 2026-05-20 led to a uniform policy across all hardlink
models — **no artificial retry bound, no CONSTRAINT, no MOD wrap-around**:

| Model | retryCount / DebugRetryBound | Bounding mechanism |
|---|---|---|
| `_hardlink_4node` | **removed** (gold standard) | `"in_release"` pc state binds the 2-step release into one atomic-from-caller API call (mirrors C++ `release(B, C)`); finite because every other variable has a finite domain |
| `_hardlink_self_collision` | **removed** | finite by bounded domains |
| `_hardlink_external` | **removed** | finite by bounded domains |
| `_hardlink_external_migration` | **removed** | per-action WF on every progress step replaces the retry-counter role; finite by bounded domains |
| `_hardlink_dynamic` | never had one | finite by bounded domains + `MaxCommits=1` |

Rationale.  CONSTRAINT silently prunes long-retry paths and risks
false-negative liveness verdicts; INVARIANT bounds on a retry counter
false-positive on legitimate retries that recover.  Removing both
gives an honest verification: TLC terminates naturally on the bounded
state space, and `EventuallyAllDone` is checkable end-to-end.

The `_hardlink_4node` and `_hardlink_external_migration` models in
particular demonstrate two complementary recipes:

* **4-node**: a 2-step release becomes a single "logical action" via
  a binding pc state — mirrors how the C++ `release(B, C)` API is a
  single non-preemptible call from the caller's standpoint.  No
  per-action fairness on the bundling thread is needed because the
  retry/release pairing is structural.
* **external_migration**: cross-tree races where three different
  linkages are involved use **per-action WF** on every progress step
  for every thread.  This models OS-level thread scheduling
  fairness, since in-linkage older-wins negotiate doesn't arbitrate
  between threads that never CAS the same linkage.

### Non-atomic test pattern (`BundleUnbundle_hardlink_nonatomic.tla`)

A separate spec verifies whether the test pattern restored by remote
branch claude/refactor-negotiate-scoped-f7de2 commit b23fa954 — non-tx
`p1->insert(p2)` / `p1->release(p2)` interleaved with transactional
`gn1`/`gn2` operations — exposes a real liveness gap in master, or
merely a scheduler artifact.

The spec models:
* Per-thread sequence ❶ NonTxInsertAC → ❷ TxInsertHardlink →
  ❸ NonTxReleaseAC (leaves C in **limbo**: `bundledBy = A` while A no
  longer references C) → ❹ TxReleaseHardlink → destructor
  finalises A and C in limbo.
* Two finalize variants (CONSTANT `UseFixVariant`):
  - `FALSE` (master): bundle-fall-through, walks bundledBy chain →
    requires scope on Root (heavily contended by the peer's TX).
  - `TRUE` (b23fa954): direct self-promote CAS on C's linkage only
    — no chain walk, no Root contention.
* Two fairness levels: `Spec` (per-action WF on every progress
  step) and `WeakSpec` (only blanket `WF_vars(NextStep)`).

Result (all four configurations):

| Spec / Variant | Distinct states | Result |
|---|---|---|
| `Spec` + master | 308 | **Pass + liveness** |
| `Spec` + fix | 308 | **Pass + liveness** |
| `WeakSpec` + master | 308 | **Pass + liveness** |
| `WeakSpec` + fix | 308 | **Pass + liveness** |

**Conclusion:** at this modelling abstraction, both paths are
theoretically live.  The b23fa954 self-promote is a **CAS-count
optimization** — it reduces the chain walk's `O(chain × retry)`
CASes to `O(1)`, but the master fall-through is not livelocked in
principle.  The ~10% residual hang reported on the remote branch is
therefore consistent with real-OS scheduling artifacts (real
schedulers do not strictly meet `WF` in finite time), not a logic
gap.

The optimization is gated by `KAME_STM_OPTIONAL_OPTIMIZATION`
(defined to `1` by default in `kame/transaction_definitions.h`,
commits `ead762be`, `b7a4d882`).  Compile with
`-DKAME_STM_OPTIONAL_OPTIMIZATION=0` to disable the self-promote
shortcut and the related `bundle()` "peer-completed early return";
the resulting build matches the strictly TLA+-modelled master path
with no opt-in shortcuts.

**Note**: these shortcuts are NOT covered by the TLA+ models in
`tests/tlaplus/` — the unconditional master path is verified, the
shortcuts are CAS-count optimizations that bypass the modelled
control flow.

### Other STM correctness / perf knobs touched in this work

* **`KAME_ENABLE_SPIN_BAND_GATE` default flipped to `0`** on Apple
  Silicon (commit `472d193d`).  Re-benchmarked on M3 (4-thread):
  3level `payload_integrity` ~5% faster (mean, lower variance) with
  the gate OFF, others within noise.  Older sweeps had the gate ON
  by default for iMac Pro / Linux x86.  Override with
  `-DKAME_ENABLE_SPIN_BAND_GATE=1` if a target shows positive A-B.
* **`fair_mode_blocks_me` / `i_am_privileged_now` TID compare**
  (commit `9a0f9848`).  Fixed nested-Tx self-deadlock in the
  per-linkage privilege path: the per-linkage branches used
  `strip_kind` (which preserves the US timestamp field) for
  "is this slot mine?" identity, mis-identifying a nested inner Tx
  on the same thread as a peer.  Now use `stamp_tid` (TID only),
  matching the well-documented global-mode logic.  Measurable side
  effect: `payload_integrity` runs faster than before due to
  removed spurious CV-sleeps.

### Build & run

```bash
cd tests/tlaplus

# Self-collision (after fix) 2-thread (<1 s)
java -XX:+UseParallelGC -Xmx4g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_hardlink_self_collision_2thr_mc.cfg \
  BundleUnbundle_hardlink_self_collision.tla

# 4-node production-race + fix 2-thread (~1 s)
java -XX:+UseParallelGC -Xmx6g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_hardlink_4node_2thr_mc.cfg \
  BundleUnbundle_hardlink_4node.tla

# External migration 3-thread (<1 s)
java -XX:+UseParallelGC -Xmx4g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_hardlink_external_migration_3thr_mc.cfg \
  BundleUnbundle_hardlink_external_migration.tla

# Dynamic 2-thread (sibling parents, with liveness)
java -XX:+UseParallelGC -Xmx4g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_hardlink_dynamic_2thr_mc.cfg \
  BundleUnbundle_hardlink_dynamic.tla

# Non-atomic test pattern — master vs b23fa954 fix comparison
java -XX:+UseParallelGC -Xmx4g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_hardlink_nonatomic_master_mc.cfg \
  BundleUnbundle_hardlink_nonatomic.tla
java -XX:+UseParallelGC -Xmx4g -cp tla2tools.jar tlc2.TLC \
  -workers auto -config BundleUnbundle_hardlink_nonatomic_fix_mc.cfg \
  BundleUnbundle_hardlink_nonatomic.tla
```

To reproduce the original bug, revert the Phase 3 `subpackets[c] == Null`
skip in `BundlePhase3` and `BundlePhase3Fail` (re-add the `CAS C` branch
and the `local[t].cWrapper /= Null` disjunct), or in `_hardlink_4node`
remove the `canFinalize` gating in `BundlePhase4`.

---

## 6. Runtime Stress Tests (C++)

**Directory:** `tests/`

Complementing the exhaustive model checking above, the C++ stress tests exercise the real
implementation under high contention with many iterations, catching bugs that depend on
timing, memory layout, or compiler-specific behavior.

### Test suite

| Test | Threads | Iterations | What it tests |
|---|---|---|---|
| `atomic_shared_ptr_test` | 4 | 400K/thread | `atomic_shared_ptr` scan/CAS/swap/reset under concurrent access; verifies object lifecycle balance (construction = destruction), use_count correctness |
| `atomic_queue_test` | 4 | 100K pushes/thread | Lock-free queues (`atomic_queue`, `atomic_pointer_queue`, `atomic_queue_reserved`); verifies no lost/phantom elements, totals match |
| `atomic_scoped_ptr_test` | 4 | 1M/thread | Atomic unique-pointer swaps and resets; verifies no leaks (destructor/constructor count balance) |
| `mutex_test` | 8 | 100K lock/unlock/thread | Mutual exclusion correctness; verifies critical section invariant (`g_cnt1` stays balanced) |
| `transaction_test` | 4 | 2,500/thread | STM with tree node operations; verifies **snapshot consistency** (`gn2 <= gn3` invariant across concurrent commits), insert/release under transactions, final sums |
| `transaction_negotiation_test` | 6 (2 slow + 4 fast) | 10 slow, 5M fast | STM fairness under asymmetric contention; verifies slow threads complete despite fast threads (no starvation) |
| `transaction_dynamic_node_test` | 4 | 2,500/thread | STM with dynamic node creation/deletion (ComplexNode subtrees); verifies tree consistency during concurrent restructuring |

### Build & run

```bash
cd tests
# Via qmake (Qt Creator)
qmake tests.pro && make

# Via GNU Make
make -f Makefile.tests check   # builds and runs all tests
```

### Relationship to formal verification

| Property | Stress tests | GenMC | TLA+ |
|---|---|---|---|
| Memory ordering correctness | Probabilistic (timing-dependent) | **Exhaustive** (RC11 model) | N/A (seq. consistency) |
| Protocol logic (refcount + drain + scope) | Probabilistic | Assertion-based | **Exhaustive** (Layer 1, this work) |
| STM commit correctness | `gn2 <= gn3` invariant | N/A | **Exhaustive** (Layer 1 + Layer 2) |
| Bundle/unbundle protocol | Implicit (via STM tests) | N/A | **Exhaustive** (Layer 2, 622M states) |
| Livelock-freedom (LL-free) | Asymmetric stress | N/A | **Exhaustive** (Layer 2 only; Layer 1 has counter-example) |
| Fairness / starvation | `transaction_negotiation_test` | N/A | Not modeled |
| Real compiler/hardware bugs | **Yes** (actual binary) | No (abstract model) | No (abstract model) |

The stress tests and formal methods are complementary: stress tests catch implementation-level
bugs (compiler optimizations, platform-specific alignment, actual memory ordering on hardware)
while model checking guarantees protocol correctness across **all** possible interleavings
within the modeled abstraction.

---

## Paper-ready citations

- **GenMC:** Kokologiannakis, M., Raad, A., & Vafeiadis, V. "Model Checking for Weakly Consistent Libraries." PLDI 2019.
- **RC11:** Lahav, O., Vafeiadis, V., Kang, J., Hur, C.-K., & Dreyer, D. "Repairing Sequential Consistency in C/C++11." PLDI 2017.
- **TLA+:** Lamport, L. "Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers." Addison-Wesley, 2002.
- **TLC:** Yu, Y., Manolios, P., & Lamport, L. "Model Checking TLA+ Specifications." CHARME 1999.
