
FinOps and Cost Engineering

TL;DR

Cloud moved infrastructure spend from a procurement decision to an engineering decision made implicitly, thousands of times a day, by whoever writes the code — FinOps is the discipline of making those decisions visible and deliberate. The practice: measure unit economics (cost per request, per tenant, per feature) rather than staring at the total bill; allocate spend via enforced tagging so every dollar has an owner; attack the levers in efficiency order — turn it off, right-size, tier the storage, mind the egress, then rate-optimize with commitments and spot; and wire cost into the engineering loop the way you wired latency: cost-diff estimates on infrastructure PRs, anomaly alerts within hours not at month-end, and per-tenant attribution feeding pricing decisions. Cost is just another operational signal — treat it with the same machinery as SLOs, and remember the goal is not minimal spend but maximal margin: unspent efficiency, like unspent error budget, is wasted velocity.


Unit Economics Beat Bill Watching

A monthly bill of $480K is uninterpretable on its own: terrifying when revenue is flat, excellent when usage has tripled. The signal lives in ratios:

unit cost   = spend attributable to a workload / units of value it produced
            = $ / 1K requests · $ / active tenant · $ / GB ingested ·
              $ / model training run · $ / 1M tokens served

Unit costs separate growth (bill up, unit cost flat — fine, that's success) from regression (unit cost up — something got less efficient) and make engineering trade-offs commensurable: "this cache tier costs $8K/month and cuts cost-per-request 22% while halving p99" is a sentence both finance and engineering can evaluate (Caching decisions are FinOps decisions). Pick 3–5 unit metrics that mirror your journey-level SLOs, trend them per service, and alert on their derivative.
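
As a rough illustration of the growth-versus-regression distinction, the sketch below computes a unit cost per period and labels the change. The `Period` type, the 5% `tolerance` noise band, and the monthly figures are assumptions made for the example, not values from any billing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    spend_usd: float  # spend attributed to the workload in this period
    units: float      # units of value produced (here: thousands of requests)


def unit_cost(p: Period) -> float:
    """Dollars per unit of value; undefined when nothing was produced."""
    if p.units <= 0:
        raise ValueError("unit cost is undefined for a period with no units")
    return p.spend_usd / p.units


def classify(prev: Period, curr: Period, tolerance: float = 0.05) -> str:
    """Label a period-over-period change.

    tolerance is an assumed noise band: relative unit-cost moves smaller
    than this are treated as flat.
    """
    change = unit_cost(curr) / unit_cost(prev) - 1.0
    if change > tolerance:
        return "regression"
    if change < -tolerance:
        return "improvement"
    return "growth" if curr.spend_usd > prev.spend_usd else "flat"


# Hypothetical months: usage triples with the bill, then usage drops while the bill holds.
march = Period(spend_usd=160_000, units=400_000)
april = Period(spend_usd=480_000, units=1_200_000)
may = Period(spend_usd=480_000, units=900_000)

print(classify(march, april))  # growth
print(classify(april, may))    # regression
```

The same $480K bill reads as success in April and as a regression in May, which is why the alert belongs on the ratio and not on the total.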

Allocation: every dollar needs an owner

Unit economics require knowing which spend belongs to which workload — the unglamorous foundation:

  • Tag at creation, enforce in CI: team, service, environment, tenant-tier on every resource, validated by IaC policy (untagged = blocked at plan time, not lamented at month-end). GitOps makes this enforceable because all resources flow through reviewed code.
  • Shared platforms need metering: Kubernetes clusters, data platforms, and internal ML serving are one line on the bill but many consumers — allocate by requested resources (requests reserve capacity whether used or not; OpenCost-style allocation), and meter multi-tenant services per tenant (the same tenant-tagged metrics you built for noisy-neighbor analysis double as the cost meter; whale-tenant gross margin is a number your pricing team needs).
  • Accept imperfection structurally: shared costs (NAT, observability, support plans) get a published split rule (proportional to direct spend is fine). An 85%-allocated bill with clear ownership beats a 100%-allocated one nobody trusts.
  • Mind the feedback latency: billing data lags hours-to-a-day; anomaly detection on provider cost APIs + your own usage metrics catches the runaway training job today instead of on the invoice (Alerting: cost anomalies page the owning team, scaled by burn rate — a 10×-normal hourly burn is an incident).
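
A minimal sketch of the two allocation rules above: a shared cluster bill split by requested resources, then a shared cost spread in proportion to direct spend. Team names and dollar amounts are hypothetical, and the example considers CPU requests only; production allocators typically weigh memory and other resources as well.

```python
def allocate_cluster_cost(cluster_cost: float, requested_cpu: dict[str, float]) -> dict[str, float]:
    """Split a shared cluster bill in proportion to requested CPU cores."""
    total = sum(requested_cpu.values())
    if total <= 0:
        raise ValueError("no requested resources to allocate against")
    return {team: cluster_cost * cpu / total for team, cpu in requested_cpu.items()}


def spread_shared_cost(direct: dict[str, float], shared_cost: float) -> dict[str, float]:
    """Add each owner's share of a shared cost, proportional to direct spend."""
    total_direct = sum(direct.values())
    if total_direct <= 0:
        raise ValueError("no direct spend to apportion against")
    return {
        owner: spend + shared_cost * spend / total_direct
        for owner, spend in direct.items()
    }


# Hypothetical cluster: $30,000/month, CPU cores requested per team.
by_team = allocate_cluster_cost(30_000, {"checkout": 120, "search": 60, "ml-serving": 20})

# Hypothetical shared costs (NAT, observability, support): $6,000/month.
fully_loaded = spread_shared_cost(by_team, 6_000)

for team, cost in fully_loaded.items():
    print(f"{team}: ${cost:,.0f}")
# checkout: $21,600
# search: $10,800
# ml-serving: $3,600
```

Publishing the split rule as code makes it reviewable, which does more for trust in the numbers than any refinement of the formula.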

The Levers, In Order

Efficiency before rates: optimizing the price of waste is still waste.

| # | Lever | Mechanism | Typical impact |
|---|-------|-----------|----------------|
| 1 | Turn it off | Idle dev/staging nights+weekends, zombie resources (unattached volumes, idle LBs, forgotten snapshots), scale-to-zero for spiky internal tools | 10–30% of many bills is nothing |
| 2 | Right-size | Fit instances/requests to observed p95 usage, not founding-era guesses; one size class down ≈ −30–50% on that fleet | Continuous, automatable |
| 3 | Storage lifecycle | Hot → infrequent → archive policies; snapshot/log retention limits; compress + columnar (Parquet) | Storage grows monotonically unless told otherwise |
| 4 | Egress & topology | Cross-AZ and cross-region traffic, NAT processing, internet egress: the silent line items. Co-locate chatty services; cache at the edge (CDN); move compute to data, not data to compute | Often the most shocking audit finding |
| 5 | Commitments | Reserved/savings plans for the measured baseline (~60–80% coverage; review quarterly) | −30–60% on committed compute, zero code changes |
| 6 | Spot/preemptible | Interruption-tolerant work: batch, CI, stateless fleets with headroom, training with checkpoints (i.e., workloads you already made idempotent and resumable) | −60–90% on eligible compute |
| 7 | Architecture | Tiered tenancy (pool the long tail), batch over per-event processing where latency allows, async over sync chains, ARM/efficiency silicon | The compounding, slow lever |

Two notes on the table. Commitments are a forecasting bet — commit to the floor you're sure of, cover spikes on-demand/spot; over-commitment converts the discount into lock-in. Spot is an architecture test: if interruption with a 2-minute warning breaks the workload, that fragility was already a reliability bug (Retries, checkpointing) that spot merely prices.
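
To make the "commit to the floor" advice concrete, here is a sketch that prices a commitment level against an hourly usage profile. It assumes a simplified model: committed units are paid for every hour at a discounted rate, and anything above the commitment is billed on-demand. The usage profile, the rate, and the 40% discount are illustrative, not real provider terms.

```python
def hourly_cost(usage: list[float], commit: float, on_demand_rate: float, discount: float) -> float:
    """Average hourly cost when `commit` units are committed.

    Committed units are paid for every hour, used or not; usage above
    the commitment is billed at the on-demand rate.
    """
    committed = commit * on_demand_rate * (1.0 - discount)
    overflow = sum(max(0.0, u - commit) for u in usage) / len(usage) * on_demand_rate
    return committed + overflow


def best_commitment(usage: list[float], on_demand_rate: float, discount: float) -> float:
    """Return the cheapest commitment level for this usage sample.

    Cost is piecewise linear in the commitment with kinks at the observed
    usage levels, so checking those levels (plus zero) is sufficient.
    """
    candidates = sorted(set(usage) | {0.0})
    return min(candidates, key=lambda c: hourly_cost(usage, c, on_demand_rate, discount))


# Hypothetical day: 18 hours at 40 instances, 6 peak hours at 100.
usage = [40.0] * 18 + [100.0] * 6
rate, discount = 1.0, 0.40

for commit in (0.0, 40.0, 100.0):
    print(f"commit {commit:>5.0f}: ${hourly_cost(usage, commit, rate, discount):.2f}/hour")
# commit     0: $55.00/hour
# commit    40: $39.00/hour
# commit   100: $60.00/hour

print(best_commitment(usage, rate, discount))  # 40.0
```

In this model, committing to the peak costs more than committing to nothing: a committed unit only pays off if it is in use for more than `1 - discount` of the hours, and the peak capacity here is busy a quarter of the time.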

The LLM-era addendum

Token spend is the fastest-growing line on many 2026 bills, and it behaves like a utility: cost per solved task is the unit metric (LLM Evaluation), and the levers have their own ranking — prompt-cache hit rate first, model tiering second, batch-tier routing for async work, output-length discipline, then provider negotiation (LLM Infrastructure and Harness Engineering cover the mechanics). The FinOps Foundation's scope extension to SaaS/AI spend reflects the same shift: the bill you can engineer is no longer only the IaaS bill.
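
A small sketch of the cost-per-solved-task metric and its sensitivity to prompt-cache hit rate. The `Prices` values are hypothetical placeholders, not any provider's price list, and the token and task counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per 1M tokens. All values used below are hypothetical."""
    input: float
    cached_input: float
    output: float


def cost_per_solved_task(
    input_tokens: float,
    cache_hit_rate: float,
    output_tokens: float,
    solved_tasks: int,
    prices: Prices,
) -> float:
    """Total token spend divided by tasks actually solved.

    Spend on failed attempts stays in the numerator, so a cheaper model
    that solves fewer tasks can still come out more expensive per solved task.
    """
    if solved_tasks <= 0:
        raise ValueError("no solved tasks to divide by")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total_usd = (
        uncached * prices.input
        + cached * prices.cached_input
        + output_tokens * prices.output
    ) / 1_000_000
    return total_usd / solved_tasks


prices = Prices(input=3.0, cached_input=0.3, output=15.0)  # hypothetical

for hit_rate in (0.2, 0.8):
    cost = cost_per_solved_task(
        input_tokens=2_000_000_000,
        cache_hit_rate=hit_rate,
        output_tokens=100_000_000,
        solved_tasks=50_000,
        prices=prices,
    )
    print(f"cache hit rate {hit_rate:.0%}: ${cost:.4f} per solved task")
# cache hit rate 20%: $0.1284 per solved task
# cache hit rate 80%: $0.0636 per solved task
```

With these assumed prices, raising the hit rate from 20% to 80% roughly halves the unit cost without touching the model, consistent with ranking cache hit rate first among the levers.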


Wiring Cost Into the Engineering Loop

The cultural failure mode is cost-as-quarterly-cleanup: a heroic audit, 25% savings, regrowth within two quarters. The fix is the same as for quality and reliability — move the signal to where decisions happen:

  • Cost-diff on PRs: infrastructure changes show their monthly delta in review, exactly like a bundle-size or coverage check. A reviewer who sees "+$4,200/mo" asks questions a month-end report never provokes.
  • Budgets as SLOs: each service gets a unit-cost target and an absolute guardrail; breaches open tickets through the normal incident/error-budget machinery, not a finance email thread. The rule is symmetric: a service chronically under its cost target while its latency SLOs slip has been over-optimized, so spend the headroom.
  • Showback before chargeback: publish per-team dashboards first (visibility changes behavior on its own); move to internal billing only where incentives genuinely need teeth — chargeback wars over allocation rules can cost more attention than they save money.
  • Forecast architecture, not just trends: the big cost events are step functions, such as a new feature, a tenant 10× the median, or a region addition (going multi-region roughly doubles infrastructure cost, so budget it as a planned line item rather than meeting it as a surprise).
  • Make the efficient path the default path: golden IaC modules with lifecycle policies, autoscaling, and right-sized defaults baked in beat any amount of after-the-fact policing — platform engineering is where FinOps compounds.
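
The cost-diff check from the first bullet can be sketched as a function that prices the resources before and after a change and formats the monthly delta for a review comment. The price table, the instance kinds, and the $1,000 review threshold are hypothetical; a real check would read resource counts from the IaC plan and prices from a pricing source.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

# Hypothetical on-demand prices in USD per hour, keyed by instance kind.
HOURLY_PRICE_USD = {"small": 0.05, "large": 0.20}


def monthly_cost(resources: dict[str, int]) -> float:
    """Monthly cost of a set of always-on instances.

    Raises KeyError for an instance kind missing from the price table.
    """
    hourly = sum(HOURLY_PRICE_USD[kind] * count for kind, count in resources.items())
    return hourly * HOURS_PER_MONTH


def cost_diff(before: dict[str, int], after: dict[str, int]) -> float:
    """Monthly cost delta introduced by a change (positive means more spend)."""
    return monthly_cost(after) - monthly_cost(before)


def review_comment(before: dict[str, int], after: dict[str, int], threshold_usd: float = 1_000.0) -> str:
    """Format the delta the way a reviewer would see it on the PR."""
    delta = cost_diff(before, after)
    sign = "+" if delta >= 0 else "-"
    note = " (above review threshold)" if delta > threshold_usd else ""
    return f"{sign}${abs(delta):,.0f}/mo{note}"


# Hypothetical PR: scales the "large" pool from 4 to 12 instances.
before = {"small": 10, "large": 4}
after = {"small": 10, "large": 12}

print(review_comment(before, after))  # +$1,168/mo (above review threshold)
```

The threshold only decides when the comment asks for attention; the delta is posted on every infrastructure change so that small increases stay visible too.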

Checklist

  • [ ] 3–5 unit-cost metrics defined, trended per service, alerted on derivative
  • [ ] Tagging enforced at IaC plan time; shared platforms metered (K8s by requests, multi-tenant services by tenant)
  • [ ] Cost anomaly detection on hours-latency data, paging the owning team
  • [ ] Idle/zombie sweep automated; right-sizing recommendations applied on a cadence
  • [ ] Storage lifecycle + retention policies on every bucket, log group, and snapshot chain
  • [ ] Egress/cross-AZ topology reviewed; chatty services co-located
  • [ ] Commitments cover the measured baseline only; coverage reviewed quarterly
  • [ ] Spot adopted for interruption-tolerant tiers (and the interruption-tolerance actually tested)
  • [ ] Cost-diff visible on infra PRs; unit-cost budgets wired to the incident process
  • [ ] Per-tenant cost attribution feeding pricing/margin decisions; LLM cost-per-solved-task tracked


A practical reference for distributed system design. Released under the MIT License.