CRDTs and Collaborative Editing

TL;DR

Collaborative editing is concurrent writing with no time to coordinate: every keystroke applies locally and instantly (zero-latency optimism), then replicas must converge to the same result despite edits arriving in different orders. Two families solve it. Operational Transformation (OT) rewrites operations against the concurrent ones a central server has already accepted — proven (Google Docs), but correctness lives in fiendish transform functions and a required server ordering. CRDTs (Conflict-free Replicated Data Types) design the data itself so that merges are commutative, associative, and idempotent — replicas converge mathematically, with no central transformer, which is what makes offline-first and peer-to-peer sync possible (Yjs, Automerge). Production systems are pragmatic hybrids: Figma uses server-arbitrated last-writer-wins per property; text editors embed sequence CRDTs; and everything still runs through a server for persistence, presence, and permissions. Use CRDTs for data shaped like sets, counters, registers, and text; keep global invariants (inventory, balances) on a single writer — convergence is not correctness.

The Problem: Concurrency Without Coordination

Two users, same document, same moment:

Start:  "color"
Alice (offline, position 5): insert "s"      → intends "colors"
Bob   (position 0):          insert "the "   → intends "the color"

Apply both naively by position and one replica gets "the colosr" — Alice's index 5 meant her document, not Bob's. Locking is off the table (every keystroke would pay a round trip; offline users could never edit), so the system must accept concurrent operations and guarantee strong eventual consistency: any two replicas that have seen the same set of operations are in the same state, regardless of arrival order (Consistency Models). That guarantee has exactly two industrial solutions.

Path 1: Operational Transformation

Keep operations index-based, but transform each incoming operation against the concurrent ones already applied:

The transform function T(op1, op2) answers: how does op1 change if op2 already happened? For plain text insert/delete this is tractable; add rich-text attributes, tables, and nested structures and the case explosion is notorious — several published OT algorithms were later proven wrong, and the practical fix (used by Google Docs, derived from the Jupiter system) is a central server that serializes operations so each client only ever transforms against one ordering. That works extremely well online, at the cost of: no server, no merge — long-offline edits and P2P are awkward, and the server is a correctness component, not just a relay.

Path 2: CRDTs

Flip the burden from operations to data structure design: make the merge function itself order-insensitive. A state-based CRDT needs merge(a, b) to be commutative, associative, and idempotent — then replicas can gossip states in any order, with duplicates, and still converge.

python

class GCounter:
    """Grow-only counter: one slot per replica; merge = element-wise max."""
    def __init__(self):
        self.slots: dict[str, int] = {}

    def increment(self, replica: str):
        self.slots[replica] = self.slots.get(replica, 0) + 1

    def value(self) -> int:
        return sum(self.slots.values())

    def merge(self, other: "GCounter"):
        for r, n in other.slots.items():
            self.slots[r] = max(self.slots.get(r, 0), n)   # idempotent, commutative


class ORSet:
    """Observed-remove set: add wins over concurrent remove.
    Each add gets a unique tag; remove deletes only the tags it has SEEN."""
    def __init__(self):
        self.adds: dict[str, set[str]] = {}      # element -> live tags

    def add(self, elem: str):
        self.adds.setdefault(elem, set()).add(uuid4().hex)

    def remove(self, elem: str):
        self.adds.get(elem, set()).clear()        # clears observed tags only

    def contains(self, elem: str) -> bool:
        return bool(self.adds.get(elem))

    def merge(self, other: "ORSet"):
        for elem, tags in other.adds.items():
            self.adds.setdefault(elem, set()).update(tags)

The standard toolbox: PN-Counter (increment/decrement as two G-Counters), LWW-Register (last-writer-wins by timestamp — convergent, but concurrent writes lose data silently; see Conflict Resolution), OR-Set (add-wins semantics chosen deliberately), and sequence CRDTs for text. CRDTs don't eliminate conflict semantics — they force you to choose them up front (add-wins or remove-wins? LWW or multi-value?) and then guarantee everyone computes the same answer.

Text: sequence CRDTs

The text trick is replacing fragile integer indexes with stable unique identifiers per character. Each character gets an ID (replica, counter) and a reference to its left neighbor at insertion time; deletes leave tombstones rather than shifting positions. Alice's "insert s after the character with ID (a,5)" means the same thing on every replica forever — no transformation needed. The engineering history since the early designs (Logoot, RGA, and successors like YATA/Yjs and Automerge's columnar encoding) is mostly about taming the costs: metadata per character (solved with run-length blocks of consecutive runs), tombstone growth (periodic compaction once all replicas have seen the delete), and interleaving anomalies — two users typing concurrently at the same spot getting their words shuffled character-by-character — which naive position schemes exhibit and modern algorithms specifically order around. The practical takeaway: never hand-roll a sequence CRDT; use Yjs or Automerge, whose data structures make documents with millions of edits load in milliseconds.

OT vs CRDT

	OT	CRDT
Correctness lives in	Transform functions (hard, historically buggy)	Data-type merge laws (provable, local)
Server	Required, in the correctness path	Optional relay/persistence
Offline / P2P	Awkward	Native — merge is the whole model
Metadata overhead	Low (plain ops)	Per-element IDs + tombstones (engineered down)
Rich text / trees	Mature in products	Mature libraries (Yjs types, Automerge); JSON CRDTs
Semantic intent	Transform can encode richer intent	Merge is structural; intent must fit the type's law
Used by	Google Docs, classic Office co-editing	Figma-adjacent tools, Linear-style sync engines, Apple Notes, multiplayer libraries

The Sync Engine: What Production Actually Ships

A merge algorithm is maybe 20% of a collaborative product. The architecture around it:

Local-first write path: apply to the local store instantly, queue the op, sync when connected (WebSockets for the live channel; the queue survives offline periods). The UI never waits on the network — that's the entire feel of "multiplayer."
Persistence = snapshot + compacted log: replay-from-genesis doesn't scale; periodically materialize a snapshot and garbage-collect tombstones/ops that every known replica has acknowledged — which requires tracking replica liveness, the quietly hard part of long-lived documents.
Presence is a separate, ephemeral channel. Cursors, selections, and "who's here" change at 10× document rate and must not pollute document history (Presence).
The server still rules. Even with CRDTs, production servers enforce auth and per-document permissions (Authorization), validate schema, rate-limit, and arbitrate the things math can't: Figma's multiplayer deliberately uses server-side last-writer-wins per object property instead of full CRDTs — concurrent edits to the same property are rare in design tools, the server provides a total order anyway, and the simplification buys them throughput and debuggability. Linear-style sync engines similarly center on a server-ordered op log with client-side optimistic application. The lesson: pick the weakest convergence machinery your conflict patterns actually require.
Undo must be local-intent undo — undoing my last edit, not the global last edit — which both OT and CRDT libraries support but applications must wire deliberately.

Where CRDTs are the wrong tool

Convergence ≠ invariants. A CRDT happily converges both replicas to a state where the last concert seat sold twice — agreeing on the overbooking. Anything with global constraints (inventory, balances, uniqueness) needs a serialization point: single-writer per key, consensus, or transactions. CRDTs shine where the merge semantics are the business semantics: documents, whiteboards, sets/flags/counters at the edge, shopping carts (the original Dynamo case), and offline-tolerant mobile state.

References

A comprehensive study of Convergent and Commutative Replicated Data Types — Shapiro et al., 2011; the founding taxonomy
Interleaving anomalies in collaborative text editors — Kleppmann et al.; why naive sequence CRDTs shuffle your words
Yjs and Automerge — production CRDT libraries; their docs are excellent systems reading
How Figma's multiplayer technology works — the deliberate not-quite-CRDT design
High-latency, low-bandwidth windowing in the Jupiter collaboration system — the OT architecture behind Docs-style editors
Local-first software — Ink & Switch; the architectural philosophy CRDTs enable

CRDTs and Collaborative Editing ​

TL;DR ​

The Problem: Concurrency Without Coordination ​

Path 1: Operational Transformation ​

Path 2: CRDTs ​

Text: sequence CRDTs ​

OT vs CRDT ​

The Sync Engine: What Production Actually Ships ​

Where CRDTs are the wrong tool ​

References ​