| Notation | Meaning | Example |
|---|---|---|
== | Definition ("means") | Init == ... means "Init is defined as..." |
/\ | AND (logical conjunction) | A /\ B means "A and B" |
\/ | OR (logical disjunction) | A \/ B means "A or B" |
=> | Implication ("if...then") | A => B means "if A then B" |
~ | NOT (negation) | ~A means "not A" |
\in | Element of set | x \in S means "x is in S" |
\notin | Not element of set | x \notin S |
\cup | Union | S \cup T |
\subseteq | Subset | S \subseteq T |
\A | Universal quantifier (∀, for all) | \A t \in Threads : P(t) |
\E | Existential quantifier (∃, exists) | \E o \in Objects : ... |
| Notation | Meaning | Example |
|---|---|---|
VARIABLES x, y | State variable declaration | Constitute the model state |
x' | Value of x in the next state | x' = x + 1 means "increment x by 1" |
UNCHANGED <<x, y>> | x, y remain unchanged | No change in this action |
Init | Initial state definition | Specifies initial values for all variables |
Next | Next-state relation (transition) | Disjunction of possible actions |
[][Next]_vars | Always Next or stuttering | Temporal formula of the specification |
WF_vars(Next) | Weak fairness (enabled actions eventually execute) | Required for liveness |
| Notation | Meaning | Example |
|---|---|---|
[x \in S |-> e] | Function definition (returns e for each x in S) | [t \in Threads |-> 0] initializes all threads to 0 |
f[x] | Value of f at x | global_rc[o] is the RC of object o |
[f EXCEPT ![x] = e] | New function with only f[x] changed to e | [pc EXCEPT ![t] = "done"] |
@ | Current value inside EXCEPT | [f EXCEPT ![x] = @ + 1] means +1 |
[a |-> 1, b |-> 2] | Record (struct) | r.a for field access |
| Notation | Meaning | Example |
|---|---|---|
IF P THEN A ELSE B | Conditional expression | Standard if-then-else |
CASE p1 -> e1 [] p2 -> e2 | Pattern match | p1: if p1 is true then e1, if p2 then e2 |
LET x == e IN ... | Local definition | auto x = e; in C++equivalent |
CHOOSE x \in S : P(x) | Pick one element of S satisfying P | Nondeterministic choice |
| Notation | Meaning | Example |
|---|---|---|
CONSTANTS | Model parameters | Threads, Objects, MaxLocalRC |
0..N | 0to N integer set | 0..MaxVal |
Nat | Natural numbers | For type checking |
BOOLEAN | {TRUE, FALSE} | For type checking |
The CONSTANTS in a TLA+ spec define model parameters. Concrete values are specified in
.cfg files (TLC configuration files).
| Spec (.tla) | Config (.cfg) | Meaning |
|---|---|---|
CONSTANTS Threads, Objects | Threads = {t1, t2} | Set concrete values for model checking |
CONSTANTS EnableCAS, EnableRecycle | EnableCAS = TRUE | Feature enable/disable flags |
Increasing thread/object count causes exponential state space growth. Typically verified with 2-3 threads and 2 objects, with symmetry reduction.
| Notation | Location | Meaning |
|---|---|---|
CONSTRAINT | .cfg file | When a state violating this condition is reached, exploration beyond it is pruned. Used to make the state space finite. Not a safety check. |
INVARIANT | .cfg file | Checks that this condition holds in every reachable state. Violations are reported as bugs. |
Important distinction:
CONSTRAINT means "only explore within this range" (search restriction).
INVARIANT means "does this always hold?" (safety verification).
e.g., global_rc[o] <= 5 in CONSTRAINT: states where global_rc exceeds 5 are not explored (potential misses).
Same expression in INVARIANT: reported as an error if global_rc exceeds 5.
This model does not use StateConstraint.
State space is made structurally finite via C++ scope destruction and modular arithmetic,
exploring all reachable states without artificial pruning.
{a, b, c}{"scan", "cas", "idle"}Reading tips: A TLA+ action (e.g., ReserveScanCAS(t)):
is a declaration: "when these conditions hold, change variables as follows."
/\: all conjuncts must be true for the action to be enabled.
x'is the next state; UNCHANGED means no change.
2-layer architecture: KAME's TLA+ verification is two layers.
Layer 1 = atomic_shared_ptr (tagged-pointer protocol) plus commit-style primitives
(compareAndSet_impl_, scoped_atomic_view, drain release_tag_ref_).
The former "stm_commit" layer has been merged into the modern API (commit fd5b2522).
Layer 2 = bundle/unbundle + LL-free negotiate. Layer 1 is safety only;
liveness (livelock-freedom) is supplied by Layer 2's older-wins mechanism — see end of deck.
The full vocabulary of kamepoolalloc/atomic_smart_ptr.h under sequential consistency.
Complements GenMC (memory-ordering checks) by exhausting all thread interleavings:
acquire_tag_ref_ / load_shared_ / release_tag_ref_)release_tag_ref_(pref, added_global_rcnt) with cas_rcnt — frees K tags in a single CASload_shared_: refcnt.fetch_add(rcnt) + drain steals concurrent threads' tag IOUscompareAndSet_impl_<OldrT, NewrT, false, RETAIN_NEWR> template (Set / Swap)scoped_atomic_view<T> RAII: acquire → TagHeld → CAS set (SCOPED, fetch_sub(2)) / dtor| Element | TLA+ Representation | C++ Counterpart |
|---|---|---|
| Shared pointer | ptr + local_rc | m_ref (uintptr_t; upper=ptr, lower=tag) |
| Objects | Objects (Finite set) | Ref struct |
| Reference count | global_rc[o] | Intrusive o->refcnt (atomic) |
| Freed state | freed[o] | delete call |
| Thread state | pc[t], thr_*, scope_*[t] | Local variables + scope object |
VARIABLES
ptr, \* Shared pointer (Object or NULL)
local_rc, \* Local reference counter (tag)
global_rc, \* global_rc[o]: Global RC per object
freed, \* freed[o]: Freed flag
pc, \* Per-thread program counter
thr_op, \* "load", "cas", "idle"
thr_pref, \* Pointer read by acquire_tag_ref_
thr_rcnt, \* Local RC read by acquire_tag_ref_
thr_old, \* CAS: expected old object
thr_new, \* CAS: new object
thr_holds, \* Objects held via local_shared_ptr
thr_rtr_ctx, \* release_tag_ref_ return target (load_done / cas_retry / scope_release / ...)
iterBudget, \* Per-thread remaining-operations count
(* drain parameters for release_tag_ref_(pref, added_global_rcnt) *)
thr_cas_rcnt, \* Tag count to drain in one CAS
thr_added_grc, \* Global rcnt the caller pre-added
thr_drained, \* Output: how many tags were actually drained
(* scoped_atomic_view RAII *)
scope_state, \* scope_state[t] ∈ {"none", "tagheld"}
scope_pref \* The scope's m_pref
Modeling decisions:
(1) WEAK CAS spurious failure not modeled — strong-CAS covers a superset; weak is checked separately by GenMC RC11.
(2) ABA-by-recycling not modeled — the oldr->refcnt ≥ 1 shared-pointer ownership contract makes it structurally impossible.
(3) Scope "empty" / "owned" sub-states collapse to "none" — observationally equivalent.
m_ref.load() in C++is a single atomic operation; pointer and local RC are
always derived from the same loaded value. TLA+ models this as a single step.
(* Atomic read: read ptr and local_rc simultaneously *)
ReserveScanRead(t) ==
/\ pc[t] = "rs_read"
/\ thr_pref' = [thr_pref EXCEPT ![t] = ptr] \* ← read ptr
/\ thr_rcnt' = [thr_rcnt EXCEPT ![t] = local_rc] \* ← read local_rc (simultaneously)
/\ pc' = [pc EXCEPT ![t] = "rs_cas"]
(* CAS — succeeds only if both ptr and RC match *)
ReserveScanCAS(t) ==
/\ pc[t] = "rs_cas"
/\ thr_pref[t] /= NULL
/\ \/ (* CAS succeeds: both ptr AND local_rc match *)
/\ ptr = thr_pref[t]
/\ local_rc = thr_rcnt[t]
/\ local_rc' = thr_rcnt[t] + 1 \* Pinned!
\/ (* CAS fails: retry from read *)
/\ ~(ptr = thr_pref[t] /\ local_rc = thr_rcnt[t])
/\ pc' = [pc EXCEPT ![t] = "rs_read"]
Key point: Even if another thread changes ptr/local_rc between read and CAS, the CAS performs full-word comparison (both ptr + local_rc), so inconsistencies are always caught.
(* Step 1: acquire_tag_ref_ → Atomic Read + CAS to increment local RC [see previous slide] *)
(* Step 2: Increment global RC — safe with relaxed *)
ScanIncGlobal(t) ==
/\ pc[t] = "scan_inc"
/\ global_rc' = [global_rc EXCEPT ![thr_pref[t]] = @ + 1]
/\ pc' = [pc EXCEPT ![t] = "scan_leave"]
(* Step 3: release_tag_ref_ → decrement local RC *)
LeaveScanCAS(t) ==
\/ (* CAS succeeds: ptr unchanged → decrement local RC *)
/\ ptr = thr_pref[t] /\ local_rc = thr_rcnt[t]
/\ local_rc' = thr_rcnt[t] - 1
\/ (* CAS fails, ptr same → retry *)
/\ ptr = thr_pref[t]
/\ pc' = [pc EXCEPT ![t] = "ls_read"]
\/ (* CAS fails, ptr changed → fallback: global_rc-- *)
/\ ptr /= thr_pref[t]
/\ pc' = [pc EXCEPT ![t] = "ls_global"]
(* Fallback: the swapping thread already transferred the local RC *)
LeaveScanGlobal(t) ==
/\ global_rc' = [global_rc EXCEPT ![o] = @ - 1]
/\ IF global_rc[o] = 1
THEN freed' = [freed EXCEPT ![o] = TRUE] \* → freed!
This fallback is essential: without it, refcount leak or use-after-free. Confirmed by TLA+ verification.
| Phase | TLA+ Action | Operation |
|---|---|---|
| 1 | CASPreInc | Increment new object's global_rc++ |
| 2 | CASReserve→rs_* | Pin via acquire_tag_ref_ |
| 3 | CASCheck | pref != oldrthen return failure |
| 4 | CASTransfer | Transfer local RC to global_rc |
| 5 | CASSwap | CAS (acq_rel): Swap pointer |
| 6 | CASCleanup | Decrement old object's global_rc |
(* Phase 4: Transfer local RC to global *)
CASTransfer(t) ==
/\ IF thr_pref[t] /= NULL /\ thr_rcnt[t] /= 1
THEN global_rc' = [global_rc EXCEPT
![thr_pref[t]] = @ + thr_rcnt[t] - 1]
(* Phase 5: CAS — Full-word comparison *)
CASSwap(t) ==
\/ /\ ptr = thr_pref[t] /\ local_rc = thr_rcnt[t] \* match
/\ ptr' = thr_new[t]
/\ local_rc' = 0
\/ /\ ~(...) \* CAS fails → Undo → release_tag_ref_ → retry
Relation to ABA: Caller holds oldr via local_shared_ptr
→ global_rc > 0 → never freed → ABA is structurally impossible.
However, in load_shared_() the thread holds no reference, so TLA+ models ABA recycling as
a sound over-approximation.
Key difference from CAS: swap does not check the old pointer — it unconditionally
replaces it. No PreInc phase needed (ownership is exchanged, not added).
StartSwap(t) ==
/\ EnableSwap
/\ pc[t] = "idle"
/\ \E newObj \in Objects \cup {NULL} :
/\ (newObj /= NULL => thr_holds[t][newObj] > 0)
/\ thr_new' = [thr_new EXCEPT ![t] = newObj]
/\ pc' = [pc EXCEPT ![t] = "cas_acquire"] \* skip PreInc
/\ thr_op' = [thr_op EXCEPT ![t] = "swap"]
\* After acquire_tag_ref_ succeeds:
\* CASTransfer: global_rc[pref] += local_rc (tag→global transfer)
\* CASSwap: CAS ptr from pref to newObj, local_rc → 0
\* No CASCleanup needed (no PreInc to undo)
Reset(t) ==
/\ pc[t] \in {"idle", "done"}
/\ \E o \in Objects :
/\ thr_holds[t][o] > 0
/\ thr_holds' = [thr_holds EXCEPT ![t][o] = @ - 1]
/\ global_rc' = [global_rc EXCEPT ![o] = @ - 1]
\* decAndTest: if global_rc reaches 0, mark as freed
/\ freed' = [freed EXCEPT ![o] =
IF global_rc[o] = 1 THEN TRUE ELSE @]
When a swapper replaces the pointer while a loader has an in-flight
acquire_tag_ref_, the loader's release_tag_ref_ detects the pointer change
and falls back to global_rc decrement. The swapper's CASTransfer compensates
by folding the outstanding local_rc into global_rc. This tag-to-global
transfer path is the core of atomic_shared_ptr's correctness and is only
exercised when swap and load run concurrently.
fd5b2522)RAII type that holds an external scope's +1 tag ref on oldr for the lifetime of the scope.
Inside the scope, compareAndSet_impl_<scoped, ...> skips acquire (the scope already guarantees +1),
and on CAS success fetch_sub(2) absorbs both the m_ref release and the scope's tag-share.
ScopeAcquire(t) == \* scoped_atomic_view ctor
/\ scope_state[t] = "none"
/\ \* run acquire_tag_ref_ to grab a single tag
/\ scope_state' = [scope_state EXCEPT ![t] = "tagheld"]
/\ scope_pref' = [scope_pref EXCEPT ![t] = thr_pref[t]]
ScopeCASConsume(t) == \* compareAndSet_impl_<scoped> success
/\ scope_state[t] = "tagheld"
/\ \* step4 = +T (full add); on CAS success fetch_sub(2) consumes tag+ref simultaneously
/\ local_rc' = local_rc - 2
/\ scope_state' = [scope_state EXCEPT ![t] = "none"]
ScopeDestruct(t) == \* scoped_atomic_view dtor
/\ scope_state[t] = "tagheld"
/\ \* invoke release_tag_ref_(scope_pref, 1)
/\ thr_cas_rcnt' = 1
/\ pc' = [pc EXCEPT ![t] = "rtr_read"]
\* release_tag_ref_(pref, T) — drain min(local_rc, T) tags in one CAS
ReleaseTagRefCAS(t) ==
/\ pc[t] = "rtr_cas"
/\ \/ \* CAS success: ptr/local_rc match the read values
/\ ptr = thr_pref[t] /\ local_rc = thr_rcnt[t]
/\ LET drained == Min(thr_rcnt[t], thr_cas_rcnt[t]) IN
/\ local_rc' = thr_rcnt[t] - drained
/\ thr_drained' = drained
\/ \* CAS failure → retry or fallback to global path
/\ ~(ptr = thr_pref[t] /\ local_rc = thr_rcnt[t])
\* After drain: cancel excess global rcnt with fetch_sub
ReleaseTagRefExcess(t) ==
/\ LET excess == thr_added_grc[t] - thr_drained[t] IN
global_rc' = [global_rc EXCEPT ![pref] = @ - excess]
Why this matters: Drain release enables the bulk-transfer path of load_shared_.
Instead of releasing one tag at a time, the caller does refcnt.fetch_add(rcnt) to pre-secure global RCs,
then release_tag_ref_(pref, rcnt) drains multiple tags in one CAS — stealing concurrent threads' tag IOUs.
This shifts contention from m_ref to refcnt and improves throughput substantially (commit d93fc7de, +23%).
ABA is out of scope: The old Recycle action and EnableRecycle flag have been removed
(686460d5). The shared-pointer ownership contract (oldr->refcnt ≥ 1) keeps oldr alive,
making ABA-by-recycling structurally impossible for the tagged-pointer protocol. We declare it out of scope explicitly.
| # | Invariant | Meaning |
|---|---|---|
| 1 | TypeOK | Type invariant |
| 2 | MemorySafety | Object held with tag/ref is not freed |
| 3 | NoUseAfterFree | Objects in local_shared_ptr hands are not freed |
| 4 | GlobalRCNonNeg | For live objects: global_rc ≥ 0 |
| 5 | FreedImpliesZeroRC | If freed then global_rc = 0 |
| 6 | InstalledNotFreed | Object referenced by ptr is not freed |
| 7 | LocalRCBounded | local_rc ≤ 2 × |Threads| (each thread: 1 scope tag + 1 in-flight acquire) |
| 8 | QuiescentCheck | All-threads-idle implies freed[o] = (global_rc[o] = 0) |
| 9 | TerminalCheck | iterBudget=0 + no held refs + scope none → ptr has rc=1, others freed |
| 10 | ScopeConsistent | scope_state and scope_pref agree; tagheld → scope_pref not freed |
ScopeConsistent ==
\A t \in Threads :
\/ scope_state[t] = "none" /\ scope_pref[t] = NULL
\/ scope_state[t] = "tagheld"
/\ scope_pref[t] /= NULL
/\ freed[scope_pref[t]] = FALSE
EventuallyAllDone == <>AllDone fails at Layer 1.
TLC finds a lasso where two threads enter mutual CAS retry
(cas_retry → cas_acquire → atr_cas → ...).
Internal CAS retries do not consume iterBudget, so an adversarial scheduler can
sustain the loop indefinitely. This is the intended outcome:
livelock-freedom is supplied by Layer 2's priority older-wins (LL-free negotiate) mechanism,
not by the bare tagged-pointer protocol.
| cfg | Threads | MaxOps | Ops | States | Depth | Time | Result |
|---|---|---|---|---|---|---|---|
1thr_mc.cfg | 1 | 3 | load+CAS+swap+scope | 4,014 | 33 | <1s | ✅ Pass (complete) |
small_mc.cfg | 2 | 2 | load+CAS+scope | 613,990 | 69 | 12s | ✅ Pass (complete) |
all_mc.cfg | 2 | 3 | load+CAS+swap+scope (full) | 66,537,058 | 84 | 1h 30min | ✅ Pass (complete) |
3thr_cas_mc.cfg | 3 | 2 | load+CAS+scope | — | — | — | local run in progress |
live_mc.cfg | 2 | 2 | <>AllDone liveness | (lasso) | — | 40s | ❌ Fail (expected) |
Details: kamestm/tests/VERIFICATION.md §2
slides_layer2_LLfree.html)| TLA+ Variable | C++ (atomic_smart_ptr.h) | Generated C11 / GenMC test |
|---|---|---|
ptr + local_rc | m_ref (uintptr_t; upper=ptr, lower=tag) | _Atomic(uintptr_t) g_ref |
global_rc[o] | o->refcnt (intrusive atomic) | _Atomic(uintptr_t) obj_refcnt[] |
freed[o] | delete call | int obj_freed[] |
thr_pref[t] / thr_rcnt[t] | Ref* pref / Refcnt rcnt (local) | locals |
thr_holds[t][o] | local_shared_ptr reference count | thread-local hold index |
thr_cas_rcnt / thr_added_grc / thr_drained | release_tag_ref_(pref, added_global_rcnt) cas_rcnt arg | cds_test_multi_cas_excess.c |
scope_state[t] / scope_pref[t] | scoped_atomic_view<T> m_pref + state | cds_test_scoped_weak.c |
| TLA+ Action | C++ Function (atomic_smart_ptr.h) | memory_order |
|---|---|---|
AcquireTagRefRead + AcquireTagRefCAS | acquire_tag_ref_() (lines 1058-1108) | relaxed → acq_rel |
LoadSharedBulkAdd (new) | load_shared_() bulk transferrefcnt.fetch_add(rcnt) + drain (lines 1116-1128) | relaxed |
ReleaseTagRefCAS + ReleaseTagRefExcess | release_tag_ref_(pref, T) drain (lines 1158-1206)min(local_rc, T) tags in one CAS + excess fetch_sub | acq_rel |
CASPreInc → CASSwap → CASCleanup | compareAndSwap_() 6 phases (legacy, lines 550-650) | relaxed / acq_rel |
compareAndSet_impl_ path | Unified template compareAndSet_impl_<OldrT, NewrT, false, RETAIN_NEWR> (lines 1240-1450)One implementation for both Set and Swap | relaxed / acq_rel |
compareAndSet_impl_<SCOPED> path | Scope hold avoids acquire; step4 = +T (full add); on success fetch_sub(2) | acq_rel |
ScopeAcquire → ScopeCASConsume / ScopeDestruct | scoped_atomic_view ctor / CAS / dtor (lines 598-845) | acq_rel |
Reset | local_shared_ptr::reset() fetch_sub(1) + delete check (lines 433-444) | acq_rel |
Verification triangle:
C++ implementation (written by author) ← author cross-references independently → TLA+ spec (sequential consistency)
TLA+ → generated C11 (tests/tlaplus/test_*.c) → GenMC (RC11 memory order)
Plus author-written C++-derived tests (tests/cds_atomic_shared_ptr/cds_test_*.c) verified independently by GenMC.
The triangle plus the independent tests together confirm TLA+ model fidelity.