01

Reversibility, formally specified.

A reference architecture for systems that can undo — rollback databases, event-sourced ledgers, version control engines, and fault-tolerant pipelines.

DEF PRIM-001

Forward operation

A deterministic state transition f(s, e) → s' that advances the system from state s to s' by applying event e.

committed
DEF PRIM-002

Inverse operation

A paired transition f-1(s', e) → s that recovers the prior state. Existence of f-1 defines reversibility.

pending
DEF PRIM-003

Causal lineage

Directed acyclic ordering ei ≺ ej identifying the dependency closure that must be undone alongside any reverted event.

committed
0canonical primitives
0tracked operations
0checkpoint slots
0isolation levels
02

Primitives

The four atomic constructs from which every reversible system is composed. A complete implementation must specify each.

Checkpoint

Periodic snapshot of the full state vector. Recovery distance bounded by checkpoint cadence Δtc.

checkpoint(s) → σ

Journal

Append-only log of intent records. Replayable in either direction; the foundation of write-ahead and event-sourced systems.

append(j, e) → j'

Lineage

Causal dependency graph connecting events. A revert must include the transitive closure of dependents to remain consistent.

deps(e) → G′

Compensation

Forward-only inverse for non-reversible side effects (network calls, payments). Records a counteracting transaction rather than rewinding.

comp(e) → e′
03

System architecture

An isometric view of the canonical write-and-replay pipeline. Boxes are subsystems; dashed lines are pending or asynchronous channels.

CLIENT writer tx.begin()
ROUTER coordinator 2pc
JOURNAL wal.log append-only
STORE state.db page-versioned
SNAPSHOT checkpoint.σ Δt = 60s
REPLAY undo.engine backward
RESTORE recover.run forward
OBSERVER audit.tail subscribe
FIG. 03-A — canonical write & replay pipeline. Solid lines: synchronous; dashed: asynchronous.
04

Comparative approaches

A specification table of the four dominant rollback strategies. Cells are indicative; constants depend on workload.

Approach Granularity Recovery cost Storage Reversible State
checkpoint & restart full state O(σ) O(n · |s|) partial committed
WAL replay (ARIES) page O(j) O(j) full committed
event sourcing event O(j) + projection O(j) full pending
compensation (sagas) transaction O(k) O(k) logical pending
copy-on-write page/object O(1) ptr swap O(Δ) full committed
shadow paging page tree O(log n) O(Δ) full deprecated
05

Operation timeline

A trace through ten ordered events. Click an event to highlight its transitive lineage.

    SELECTED · eₐ₀₀₁ | committed | deps: ∅
    06

    Failure model

    Reversibility is only meaningful under a defined failure envelope. The matrix below enumerates supported and unsupported faults.

    F-101

    Process crash

    Recoverable via journal replay from the last checkpoint σ. Bounded recovery time O(j).

    F-102

    Power loss (clean)

    Durable WAL guarantees no committed transition is lost. Re-attach and redo.

    F-201

    Network partition

    Quorum-based coordinator stalls writes; reads degrade gracefully under chosen consistency level.

    F-202

    Side-effect leak

    Compensating transactions emitted; cannot fully rewind external state, but ledger reaches consistency.

    F-301

    Byzantine corruption

    Out of scope. Requires cryptographic lineage and external attestation.

    F-302

    Storage device loss

    Unrecoverable without off-host replica. Mitigation: synchronous replication to remote journal.

    07

    References

    Selected literature establishing the formal basis of reversibility in distributed systems.

    • [1] Mohan, C. et al. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM TODS, 1992.
    • [2] Gray, J., Reuter, A. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
    • [3] Garcia-Molina, H., Salem, K. Sagas. SIGMOD, 1987.
    • [4] Kleppmann, M. Designing Data-Intensive Applications. O'Reilly, 2017.
    • [5] Lamport, L. Time, Clocks, and the Ordering of Events in a Distributed System. CACM, 1978.