Figma System Design
TL;DR
Figma made a browser tab feel like a native design tool shared by a whole team. Three load-bearing decisions: a C++ engine compiled to WebAssembly rendering to WebGL (the browser is the distribution, not the architecture); multiplayer sync that is deliberately not a full CRDT — every open file is owned by exactly one stateful multiplayer server process that applies per-property last-writer-wins with server-arbitrated ordering, because design files tolerate property-level LWW but not text-style merge anomalies; and a single Postgres-centered metadata plane that survived hypergrowth through a now-canonical sequence — vertical splitting, then application-coordinated horizontal sharding behind a query proxy (colos, DBProxy) without changing the application's mental model. The overall lesson: pick the weakest consistency machinery your domain truly needs, and put a single writer wherever you can get away with it.
Core Requirements
Functional
- Real-time co-editing — many cursors in one file, every change visible to everyone in ~100ms
- Design-tool fidelity — vector rendering, huge documents, 60fps pan/zoom in a browser
- Offline tolerance — keep editing through disconnects; reconcile on reconnect
- File organization — teams, projects, permissions, comments, version history
- Plugins & embeds — third-party code against live documents
Non-Functional
- Latency — local edits apply instantly (optimistic); remote propagation ~100ms
- Convergence — all participants reach the same document state, always
- Durability — no lost work; recoverable history (checkpoints)
- Scale — millions of files; each file's session fits one server, but there are very many files
Multiplayer: One File, One Server, Property-Level LWW
The defining choice. A Figma document is a tree of objects (frames, shapes, text nodes), each object a bag of properties. Concurrent edits are resolved like this:
- Granularity is the design. Conflicts resolve per (object, property): two designers editing the same rectangle's fill race (last write wins, server order), but one editing fill while another edits width merge perfectly. In a design tool, simultaneous same-property edits are rare and visually self-correcting, so LWW's "lost update" is a non-event: the Conflict Resolution trade, made consciously.
- Why not a full CRDT? Figma's team studied them and kept only the parts they needed: CRDTs buy convergence without a central authority, but Figma always has one (the file's server), so it can take the simpler model, a total order, and far less metadata. Two genuinely CRDT-ish techniques survive where LWW fails: object identity (IDs, not array indexes, so concurrent inserts don't collide) and fractional indexing for child order, where position is a real number between neighbors and concurrent reorders converge without renumbering (CRDTs and Collaborative Editing covers the spectrum this point sits on).
- Tree-structure edge cases get server arbitration: reparenting an object whose parent was concurrently deleted, cycle prevention — cases where property-wise merging is semantically wrong and the single writer simply decides.
- Offline = client keeps its op log, replays against the server's current state on reconnect; the server's order is truth, the client reconciles (the sync-engine shape). Undo is local-intent undo, computed against your own ops.
- The file server is a single-writer partition: routing pins file → process (cell-router thinking at per-file granularity); crash recovery = reload last checkpoint + replay the tail (WAL logic at the application layer).
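The two core mechanics in the list above, per-property last-writer-wins under a server-assigned order and fractional indexing for child order, can be sketched in a few lines. This is a minimal illustration, not Figma's implementation: FileServer, apply, value, and position_between are hypothetical names, positions are modeled as exact fractions in (0, 1), and tie-breaking for two clients that pick the same position concurrently is left out.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Any, Dict, Optional, Tuple


@dataclass
class FileServer:
    """Hypothetical single-writer server for one open file."""
    seq: int = 0
    # (object_id, property) -> (server sequence number, value)
    props: Dict[Tuple[str, str], Tuple[int, Any]] = field(default_factory=dict)

    def apply(self, object_id: str, prop: str, value: Any) -> int:
        """Stamp a write with the next sequence number and store it.

        Arrival order at the server is the total order, so a later
        arrival overwrites an earlier one for the same (object, property).
        """
        self.seq += 1
        self.props[(object_id, prop)] = (self.seq, value)
        return self.seq

    def value(self, object_id: str, prop: str) -> Any:
        return self.props[(object_id, prop)][1]


def position_between(before: Optional[Fraction],
                     after: Optional[Fraction]) -> Fraction:
    """Return a position strictly between two neighbours (None = open end).

    Assumes every position lies in the open interval (0, 1).
    """
    low = Fraction(0) if before is None else before
    high = Fraction(1) if after is None else after
    return (low + high) / 2


server = FileServer()
server.apply("rect1", "fill", "red")    # designer A
server.apply("rect1", "width", 120)     # designer B, different property: no conflict
server.apply("rect1", "fill", "blue")   # designer B, same property: later write wins

a = position_between(None, None)        # first child
c = position_between(a, None)           # appended after a
b = position_between(a, c)              # inserted between a and c, no renumbering
```

Because the server stamps every write, clients never compare clocks; they only need to accept the server's order. Inserting b between a and c leaves both neighbours' positions untouched, which is why concurrent reorders do not force a renumbering pass.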
Rendering: The Browser Is a Deployment Target
The editor is a C++ scene-graph-and-renderer compiled to WebAssembly, drawing via WebGL/WebGPU — not DOM, not SVG. Documents are loaded into a compact binary format; rendering is a tile-based pipeline with culling, much closer to a game engine than a web app. Systems consequences: deterministic performance independent of DOM diffing; one engine shared across browser/desktop (Electron)/mobile; and the multiplayer protocol speaks the engine's compact object model rather than JSON trees — bandwidth and GC pressure stay bounded even on 100MB documents.
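To make "tile-based pipeline with culling" concrete, here is a generic sketch of the idea: work out which tiles the viewport overlaps and skip objects whose bounding boxes fall outside it. Everything here is an assumption for illustration (the Rect type, the 256-unit TILE size, the visible_tiles and cull helpers); the real engine is C++ and its tiling scheme is not described in the source.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other: "Rect") -> bool:
        """True when the two axis-aligned rectangles overlap."""
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


TILE = 256.0  # assumed tile edge, in document units


def visible_tiles(viewport: Rect) -> List[Tuple[int, int]]:
    """Return (col, row) for every tile the viewport overlaps."""
    c0 = math.floor(viewport.x / TILE)
    c1 = math.ceil((viewport.x + viewport.w) / TILE)
    r0 = math.floor(viewport.y / TILE)
    r1 = math.ceil((viewport.y + viewport.h) / TILE)
    return [(c, r) for r in range(r0, r1) for c in range(c0, c1)]


def cull(objects: Iterable[Tuple[str, Rect]], viewport: Rect) -> List[str]:
    """Keep only the objects whose bounding box touches the viewport."""
    return [name for name, box in objects if box.intersects(viewport)]


viewport = Rect(100, 100, 400, 300)
tiles = visible_tiles(viewport)
drawn = cull([("near", Rect(450, 350, 100, 100)),
              ("far", Rect(2000, 2000, 50, 50))], viewport)
```

The cost of a frame then scales with what is on screen rather than with document size, which is the property a very large document needs.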
The Metadata Plane: Postgres, Stretched Then Sharded
Files are blobs + op logs; everything else — users, teams, file metadata, permissions, comments — lived in one Postgres instance far longer than folklore says is possible. The scaling sequence (told across Figma's 2023–24 engineering posts) is a reusable playbook:
- Buy headroom first: bigger boxes, read replicas, PgBouncer-style connection pooling, query tuning. These boring moves deferred architecture changes for years.
- Vertical partitioning: peel high-traffic table groups into their own Postgres instances ("colos"), chosen so cross-group joins/transactions are rare — a domain decomposition, not a data one.
- Horizontal sharding, application-transparent: for the tables that still outgrew a box, shard by a small set of keys, fronted by DBProxy — a query-engine-aware proxy that parses SQL, routes by shard key, scatter-gathers the few cross-shard queries, and rejects the patterns sharding can't honor. Critically: logical sharding before physical (views simulating shard boundaries on one box to validate the application), then dual-write/verify cutover per table group.
- The principles they stated outright: shard as little as possible, keep the relational model and transactional islands per shard (Database Sharding), and make the proxy — not 500 call sites — own routing.
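The routing idea behind a sharding proxy, and the reason logical sharding can precede physical sharding, can be sketched as follows. This is a simplified illustration under stated assumptions, not DBProxy itself: the real proxy parses SQL to find the shard key, whereas this sketch takes the key as an argument, and NUM_LOGICAL_SHARDS, LOGICAL_TO_PHYSICAL, logical_shard, route, and the db-0/db-1 names are all hypothetical.

```python
import hashlib
from typing import Dict, List, Optional

NUM_LOGICAL_SHARDS = 8  # assumed; fixed up front so keys never rehash

# Logical shard -> physical database. Everything starts on one box,
# which is what lets the application be validated before any data moves.
LOGICAL_TO_PHYSICAL: Dict[int, str] = {
    shard: "db-0" for shard in range(NUM_LOGICAL_SHARDS)
}


def logical_shard(shard_key: str) -> int:
    """Map a shard key to a stable logical shard via a hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def route(shard_key: Optional[str], allow_scatter: bool = False) -> List[str]:
    """Return the physical databases a query must touch.

    Keyed queries hit one database; keyless queries either fan out to
    all of them (scatter-gather) or are rejected outright.
    """
    if shard_key is not None:
        return [LOGICAL_TO_PHYSICAL[logical_shard(shard_key)]]
    if allow_scatter:
        return sorted(set(LOGICAL_TO_PHYSICAL.values()))
    raise ValueError("query has no shard key and is not allowed to scatter")


shard_before = logical_shard("file:123")
db_before = route("file:123")

# Physical split: move half of the logical shards to a second box.
for shard in range(4, NUM_LOGICAL_SHARDS):
    LOGICAL_TO_PHYSICAL[shard] = "db-1"

shard_after = logical_shard("file:123")
db_after = route("file:123")
all_dbs = route(None, allow_scatter=True)
```

The key's logical shard never changes; only the logical-to-physical map does. That indirection is what keeps routing in one place instead of in every call site, and it is why a physical split can be rehearsed and rolled back without touching application queries.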
Lessons
- Choose consistency machinery by domain, not fashion. Property-LWW + object identity + fractional indexing covers a design tool; full text-CRDT machinery would add metadata and anomaly classes for no product benefit. The weakest sufficient model wins.
- A single writer per natural unit (the file) deletes whole problem classes — conflict resolution, distributed locking, fan-in ordering — at the price of a routing layer and per-unit capacity ceilings you must monitor.
- WASM changed the boundary of "web app": shipping the same native engine everywhere collapsed platform divergence — an architecture decision disguised as a performance one.
- The Postgres saga is the modern default path: vertical split → logical shard rehearsal → proxy-mediated horizontal shard, each step reversible. Distributed-database rewrites are the move of last resort.
- Multiplayer is a product feature with an architecture bill: presence, cursors, and instant feedback (Presence, WebSockets) ride the same session server — co-locating them with document authority is what makes them cheap.
References
- How Figma's multiplayer technology works — the LWW/CRDT reasoning, firsthand
- Realtime editing of ordered sequences (fractional indexing)
- Building a professional design tool on the web (WebAssembly engine)
- The growing pains of database architecture (vertical partitioning) and How Figma's databases team lived to tell the scale (DBProxy sharding)
- CRDTs and Collaborative Editing — the pattern article this case study grounds