mfen.de

Jan 28, 2026 · infra · Evergreen · 15 min read

Shipping platform changes safely with preview environments

Preview environments can be great. They can also be wasteful and expensive, and they can trick you into thinking you’re “safer” while you’re mostly just running another pipeline.

I’m not a believer in “spin up an entire Azure subscription / AWS account / GCP project for every PR” as the default. You can do that, and sometimes you probably should, but as a baseline it’s an incredible amount of operational and financial weight to attach to every diff.

The hard part isn’t “can we provision an environment?” The hard part is: what do you put in it, what does it prove, and what compromises are you quietly making?

My default is: be honest about what’s changing, and pick the cheapest validation that actually increases confidence.

I’m talking about both platform repo diffs (Terraform, clusters, control plane changes) and service delivery pipelines. The decision logic is similar, even if the tooling differs.

Two quick definitions, because people tend to mix these up:

  • Service previews: deploy the app (and enough dependencies) to answer “does this behave end-to-end?”
  • Platform previews: validate infra/control plane changes with plan artifacts and sandbox applies to answer “will this lock us out or change blast radius?”

TL;DR

Preview environments are a tool, not a default. Start with the diff and pick the cheapest validation that actually buys confidence. Most PRs don’t need a preview; they need better plans, better tests, and more deterministic pipelines.

Use previews when the change is inherently runtime-y (migrations, edge auth defaults, controllers/CRDs, dependency-heavy behavior), and if you build them, treat them like a product (TTL, auth at the edge, per-preview secrets, quotas, and teardown that revokes credentials). Previews reduce pre-merge risk; progressive delivery limits post-merge blast radius.

Anti-goals (when previews are mostly placebo)

If your preview environment doesn’t make a decision easier or safer, it’s probably just an expensive ritual.

Common “placebo preview” shapes:

  • It’s “green” but it doesn’t include the dependencies that actually break you (identity, data, networking).
  • It rebuilds the world for changes that are deterministic in CI (docs, formatting, small refactors).
  • It runs against shared staging services in a way that can’t be isolated (every PR becomes an incident generator).
  • It’s “green” because it quietly talks to prod (shared queues, prod APIs, prod identity).
  • It’s so slow people don’t use it, so it exists mainly to look “mature”.

A five-line decision tree

  1. Docs/formatting/static config only? → no preview.
  2. IaC/policy change? → plan + policy checks; preview only if lockout risk or irreversible/high-blast changes.
  3. Service/runtime behavior change? → unit/integration tests first; preview when real dependencies or traffic paths are the unknown.
  4. DB migrations/stateful changes? → CI with ephemeral DB; preview when sequencing and runtime coupling are the unknown.
  5. Controllers/CRDs/operators/control plane? → test cluster + preview (skew/upgrade paths bite).

The decision tree is a first pass; use the matrix below for nuance.
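The tree above can be sketched as a tiny routing function. The names and return strings here are illustrative, not from any real tool:

```python
from enum import Enum, auto

class ChangeKind(Enum):
    DOCS_ONLY = auto()
    IAC_POLICY = auto()
    SERVICE_RUNTIME = auto()
    DB_MIGRATION = auto()
    CONTROL_PLANE = auto()

def validation_for(kind: ChangeKind, high_blast_radius: bool = False) -> str:
    """Return the cheapest validation step for a change, per the decision tree."""
    if kind is ChangeKind.DOCS_ONLY:
        return "no preview"
    if kind is ChangeKind.IAC_POLICY:
        return "sandbox preview" if high_blast_radius else "plan + policy checks"
    if kind is ChangeKind.SERVICE_RUNTIME:
        return "unit/integration tests first; preview if deps are the unknown"
    if kind is ChangeKind.DB_MIGRATION:
        return "CI with ephemeral DB; preview if sequencing is the unknown"
    # CONTROL_PLANE: skew/upgrade paths bite, so always rehearse
    return "test cluster + preview"
```

The point of writing it down as code is that "do we need a preview?" stops being a per-PR debate and becomes a reviewable default.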

Start with “what changed?”

If your pipeline treats every PR like a full rebuild of the world, you’re leaving a lot of signal on the floor.

I try to answer these questions up front:

  1. Is this a pure Terraform change? (modules, variables, policies, IAM, networking)
  2. Is this a Kubernetes manifest / Helm change? (deployments, services, RBAC, policies)
  3. Does this touch control plane things? (CRDs, admission, controllers, CNI policies)
  4. Does it touch the stuff that only fails at runtime? (migrations, stateful rollouts, weird dependencies)
  5. Does it change blast radius? (authn/authz defaults, network boundaries, “who can reach what”)

Most of the time, you can get very far without provisioning anything new.

A quick decision matrix

This is intentionally reductive, but it helps avoid “preview everything” by default:

| Change surface | Cheapest confidence | Preview worth it? | Preview means | Notes |
| --- | --- | --- | --- | --- |
| Pure Terraform (IAM/policy/network) | plan + policy checks + drift awareness | sometimes | apply in sandbox + promote | a long-lived sandbox account is often better than per-PR previews |
| K8s/Helm app changes | template + schema validate + unit/integration tests | sometimes | deploy app + deps (ns) | previews pay off when runtime deps matter |
| Edge authn/authz defaults | policy tests + canary plan | often | shadow eval + canary route | lockout risk is high and hard to simulate |
| DB migrations/stateful changes | migration tests + ephemeral DB in CI | often | run migration + smoke tests | previews help catch runtime coupling and sequencing |
| Controllers/CRDs/operators | kind/k3d test cluster + skew testing | often | upgrade rehearsal + skew | CRD lifecycle and skew are where “green CI” lies |

What “preview” means for platform/IaC changes

For platform changes, a “preview” is rarely “spin up a whole new account per PR”. The pattern that tends to stay boring:

  • Prefer a long-lived sandbox account/project/cluster with prod-like guardrails and monitoring (including drift awareness).
  • Produce speculative plan artifacts (terraform plan, cluster upgrade plans, policy diffs) that humans can review.
  • Apply in sandbox, run the right smoke checks, then promote using the same artifacts/pipeline shape you’ll use for prod.

Reach for per-PR platform previews only when a sandbox isn’t enough to buy confidence — for example, upgrades with lots of moving parts (new cluster version + addons + controllers) or changes with unusual blast radius.

The checks that carry their weight (pre-commit, CI, and “diff confidence”)

Before you build a preview platform, squeeze the boring validation as hard as you can:

  • terraform fmt / validate and a speculative terraform plan against real remote state.
  • Policy checks where you have them (OPA/Sentinel/whatever your org runs).
  • helm template / helm lint + schema validation (kubeconform, etc.).
  • Server-side dry runs / diffs where possible (kubectl diff, SSA dry-run).
  • Unit and integration tests that actually exercise the change surface.
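One way to keep this cheap is to derive the check list from the diff itself rather than running everything on every PR. A minimal sketch, assuming a conventional repo layout (the glob patterns and command strings are placeholders, not real pipeline config):

```python
import fnmatch

# Illustrative mapping from changed paths to the cheap checks worth running.
# The patterns and commands are assumptions about repo layout, not a standard.
CHECKS = [
    ("*.tf", ["terraform fmt -check", "terraform validate", "speculative terraform plan"]),
    ("charts/*", ["helm lint", "helm template | kubeconform"]),
    ("*.md", []),  # docs: nothing beyond review
]

def checks_for(changed_files: list[str]) -> list[str]:
    """Union of checks triggered by the changed files, in stable order."""
    out: list[str] = []
    for pattern, cmds in CHECKS:
        if any(fnmatch.fnmatch(f, pattern) for f in changed_files):
            for c in cmds:
                if c not in out:
                    out.append(c)
    return out
```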

Don’t rerun the plan you just reviewed

One small thing that saves a lot of confusion: if you apply Terraform as part of a preview flow, don’t silently generate a brand new plan at apply time.

Review a plan artifact, then apply that. If the world changed underneath you and the plan can’t be applied, that’s a real signal.

This assumes the apply job uses the same Terraform and provider versions (and compatible backend context). Treat plan files as artifacts tied to that execution environment.
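A sketch of that gate, assuming you parse the plan artifact's metadata before applying. `terraform show -json` does expose a `terraform_version` field; the `backend` key below is an invented stand-in for "same state, same world":

```python
def safe_to_apply(plan_meta: dict, runner: dict) -> tuple[bool, str]:
    """Check that a reviewed plan artifact matches the apply job's environment.

    `plan_meta` mimics metadata parsed from a plan artifact; only
    `terraform_version` corresponds to a real field, the rest is illustrative.
    """
    if plan_meta.get("terraform_version") != runner.get("terraform_version"):
        return False, "terraform version mismatch: re-plan and re-review"
    if plan_meta.get("backend") != runner.get("backend"):
        return False, "different backend/state: this plan is for another world"
    return True, "apply the reviewed artifact, not a fresh plan"
```

If this check fails, the answer is a new plan and a new review, not a silent re-plan at apply time.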

The point isn’t dogma. It’s that a lot of “preview environments” are compensating for missing CI signal. If a local pre-commit hook plus CI can prove the change is safe, don’t pay the preview tax.

No preview, still safe (what replaces it)

If you don’t build previews by default, you need something else to carry risk. The “no preview” bundle I trust looks like this:

  • Deterministic CI: pinned tool versions, reproducible builds, no “works on my machine” drift.
  • Contract tests: verify the interface to shared dependencies (APIs, queues, identity) without needing a full environment (for example: schema tests for an API, “expected OIDC claims” tests, message format checks for queues).
  • Real integration tests where it matters: run a DB in CI for migration tests, run the controller in a test cluster, etc. (for example: ephemeral Postgres in CI for migrations, kind/k3d for controller tests with golden manifests).
  • Progressive delivery: canary, staged rollouts, and automatic rollback based on real signals.
  • Feature flags and dark launches: separate deploy from release; measure before you expose.

Previews reduce pre-merge risk. Progressive delivery reduces post-merge risk. If you only do previews, you’re still gambling after merge.
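A contract test can be as small as asserting the claims your service actually reads from the identity provider. The claim names below are examples, not a spec:

```python
# Minimal contract test: the token payload must carry the claims the
# service depends on. Which claims matter is service-specific; these
# four are just a plausible example set.
REQUIRED_CLAIMS = {"sub", "aud", "exp", "email"}

def missing_claims(payload: dict) -> set[str]:
    """Claims the service requires but the token does not carry."""
    return REQUIRED_CLAIMS - payload.keys()
```

Run it against a recorded or sandbox-issued token in CI; when the IdP changes its claim set, this fails before any environment does.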

The dependency problem (and why previews get weird)

The moment you try to make a preview environment meaningful, you run into the part people skip in the pitch deck:

Your service is not your service. It’s your dependencies.

Databases, caches, queues, object storage, IAM, third-party APIs, identity providers, and “the one internal service nobody owns but everything calls”.

You can always provision “an environment”. The question is which of these things are real, which are shared, and which are faked.

There’s no free lunch, but the trade-offs are predictable:

  1. Full isolation: separate account/project/subscription per preview.
  2. Namespace isolation: same cluster/account, separate namespace per preview.
  3. Shared staging: one long-lived staging env, feature flags and careful change control.
  4. No preview: local + CI, maybe a small sandbox used by humans when needed.

Full isolation is the cleanest story for security boundaries and “this might nuke everything” changes. It’s also the most expensive and operationally heavy. Namespace isolation is the common compromise. Shared staging is cheap but politically hard. “No preview” is underrated.

The rest of this post is basically about making those compromises explicit.

Databases: choose your pain

If your preview environments touch a database, you’re immediately in “what does this prove?” territory.

A preview that runs against a totally fake database proves your code compiles and your YAML is valid. It does not prove runtime behavior. A preview that runs against a shared staging database proves something, but it can also turn every PR into a migration incident.

The options I see most often:

Option A: Managed DB, per-preview database or schema (common compromise)

You keep the managed database (RDS/Cloud SQL/Azure DB) and carve isolated slices per preview:

  1. Create a per-preview database (or schema) and a per-preview role.
  2. Inject credentials as part of preview provisioning.
  3. Run migrations for the preview slice.
  4. Tear it down on TTL expiry: drop DB/schema, revoke role, delete secrets.

This is the “good enough” answer if your production DB is managed and you want preview behavior to be at least adjacent.
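The provisioning and teardown steps can be sketched as paired SQL generators (Postgres-flavoured; credential injection and secret storage are deliberately out of scope here):

```python
def provision_sql(preview_id: str) -> list[str]:
    """SQL to carve an isolated slice for one preview (Postgres-flavoured sketch)."""
    db, role = f"preview_{preview_id}", f"preview_{preview_id}_rw"
    return [
        f'CREATE ROLE "{role}" LOGIN;',
        f'CREATE DATABASE "{db}" OWNER "{role}";',
    ]

def teardown_sql(preview_id: str) -> list[str]:
    """Teardown must mirror provisioning: drop the DB, then drop the role."""
    db, role = f"preview_{preview_id}", f"preview_{preview_id}_rw"
    return [
        f'DROP DATABASE IF EXISTS "{db}";',
        f'DROP ROLE IF EXISTS "{role}";',
    ]
```

The symmetry is the point: if teardown isn't generated from the same inputs as provisioning, roles and secrets outlive the preview.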

The migration caveat is the real work: if your migration strategy is “apply breaking changes whenever”, previews will amplify that pain. If your strategy is backwards-compatible migrations (expand/contract, dual-write where needed, no destructive steps baked into deploy), previews become much less dramatic.

One more subtle failure mode: schema drift across previews. Multiple PRs can introduce migrations that each work fine in isolation, then conflict at merge time because ordering/IDs diverged from the shared baseline. If you do previews, enforce a migration discipline that survives parallel work (linearized migrations, timestamp/sequence conventions, and conflict resolution at merge).
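One cheap guard is a merge-time check that two branches haven't claimed the same migration slots beyond the shared baseline. A sketch with integer sequence IDs (timestamp conventions work the same way):

```python
def migration_conflicts(baseline: list[int],
                        branch_a: list[int],
                        branch_b: list[int]) -> set[int]:
    """Sequence numbers claimed by both branches beyond the shared baseline.

    A non-empty result means two PRs will collide at merge even though each
    preview was green in isolation. Integer IDs are an assumption here.
    """
    new_a = set(branch_a) - set(baseline)
    new_b = set(branch_b) - set(baseline)
    return new_a & new_b
```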

Option B: In-cluster Postgres operator per preview (fast, but different)

Spinning up a Postgres operator instance in the namespace feels clean: every preview gets its own DB, credentials are local, teardown is easy.

The problem is parity. If production is “managed DB with backups, HA, weird settings, extensions, and performance knobs”, an in-cluster Postgres is not the same thing. You might still choose it if the preview is meant to validate:

  • application wiring
  • migrations mechanically apply
  • basic queries and correctness

Just be honest that it won’t validate managed-service behavior (latency, failover, parameter groups, backup/restore, etc.).

Option C: One shared DB for all previews (cheap, sharp edges)

This can work, but you need strong boundaries:

  1. Never share a schema between previews.
  2. Avoid cross-preview migrations (migrate per schema, not globally).
  3. Build cleanup as a first-class concern (garbage collect old schemas).

If your migration tool assumes a single global schema, this option becomes a trap fast.

“But what about data?”

Empty databases are safe and boring. They’re also a great way to miss the thing that breaks only with real-ish data.

The compromise I like is: small, synthetic seed data that proves basic flows without becoming a compliance nightmare. If you need realistic behavior, build a separate sandbox environment with curated test data and treat it like its own system, not “a preview”.

Identity, secrets, and access (the part that bites you later)

If preview URLs are accessible to humans, you just created a new access surface.

Things I try to make boring:

  • Authn at the edge: require login for preview access (even internally).
  • Redirects/callbacks: avoid wildcard OIDC redirect URIs; manage preview hostnames via a controlled pattern + allowlist automation.
  • Per-preview credentials: never reuse prod secrets; prefer short-lived tokens/roles.
  • Scoped egress: previews should not be able to call prod dependencies by accident.
  • Clear visibility: a link, a TTL, and a way to see “what commit is this running?”
  • Secrets must die with the preview: teardown should revoke credentials and delete secrets, not just delete pods.

If you don’t do this, previews turn into semi-public shadow environments with “temporary” secrets that live forever.

Reality check: some orgs can’t fully isolate previews from prod-like shared services. If you’re in that world, make it explicit:

  • treat shared services as governed dependencies (tight IAM, scoped credentials, strong auditing)
  • use an explicit allowlist (not “everything can reach everything”)
  • make exceptions visible and temporary (reviewed and monitored)
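The allowlist point is worth making mechanical rather than procedural. A default-deny sketch (the host patterns are examples, not a policy language):

```python
from fnmatch import fnmatch

# Hosts previews may reach; everything else is denied by default.
# Patterns are illustrative, not real infrastructure.
EGRESS_ALLOWLIST = ["*.staging.internal"]

def egress_allowed(host: str) -> bool:
    """Default-deny: a preview may only call explicitly allowlisted hosts."""
    return any(fnmatch(host, pattern) for pattern in EGRESS_ALLOWLIST)
```

In practice you'd enforce this with network policy or an egress proxy; the sketch just shows the shape of "explicit allowlist, deny by default".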

Lifetime, destroy, and the operational tax

Preview environments only stay cheap if they die reliably.

The boring checklist:

  1. TTL by default (hours or a couple of days, not weeks).
  2. Garbage collection that runs even when CI is broken.
  3. Deletion is a feature: teardown should be as tested as provisioning.
  4. Escape hatches: allow extending TTL for a specific preview when it’s actively used.

If you can’t destroy reliably, you don’t have a preview system. You have a slow-moving environment sprawl generator.
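The sweeper itself is simple; the discipline is running it from a scheduler that does not depend on CI. A sketch with made-up record fields:

```python
from datetime import datetime, timedelta, timezone

def expired_previews(previews: dict[str, dict], now: datetime) -> list[str]:
    """Preview IDs whose TTL (plus any explicit extension) has elapsed.

    Field names (`created_at`, `ttl_hours`, `extension_hours`) are invented
    for the sketch; run this from a cron/scheduler, never only from CI.
    """
    dead = []
    for pid, p in previews.items():
        hours = p["ttl_hours"] + p.get("extension_hours", 0)
        if now >= p["created_at"] + timedelta(hours=hours):
            dead.append(pid)
    return sorted(dead)
```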

Cost control needs one more layer: enforce limits, don’t just ask nicely.

Previews should have hard quotas by default, and exceeding them should fail fast.

  • Kubernetes: ResourceQuota, LimitRange, and per-namespace budgets (CPU/memory/storage caps).
  • Cloud: labels/tags on everything + automated sweeper + budget alerts.
  • Process: cap concurrent previews per repo/team so one noisy service can’t explode spend.

A “golden” preview spec

If you want previews to be boring to operate, I like a minimal spec:

  • TTL by default, with explicit extension
  • auth at the edge (no anonymous preview URLs)
  • quotas/limits so one preview can’t melt the cluster/account
  • secrets scoped per preview and revoked on teardown
  • teardown tested (not “best effort”) and backed by a sweeper
  • tagging/labels on every resource so cost attribution is real
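That spec is small enough to lint in provisioning code, so violations fail fast instead of being discovered in a cost report. A sketch with illustrative field names and limits:

```python
from dataclasses import dataclass, field

@dataclass
class PreviewSpec:
    """Minimal 'golden' preview spec; field names and defaults are illustrative."""
    ttl_hours: int = 24
    edge_auth: bool = True
    secrets_scoped: bool = True
    labels: dict = field(default_factory=dict)

def spec_violations(spec: PreviewSpec) -> list[str]:
    """Lint a preview spec against the golden rules above."""
    problems = []
    if spec.ttl_hours <= 0 or spec.ttl_hours > 72:
        problems.append("TTL must be set and short (hours to a couple of days)")
    if not spec.edge_auth:
        problems.append("no anonymous preview URLs: require auth at the edge")
    if not spec.secrets_scoped:
        problems.append("secrets must be per-preview and die on teardown")
    if "owner" not in spec.labels:
        problems.append("label resources so cost attribution is real")
    return problems
```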

About Helm CRDs (still… not great)

If your preview story depends on Helm handling CRDs cleanly, you’re going to have a bad time.

In practice, CRDs are their own thing. A pattern that’s boring enough to survive:

  1. Treat CRD changes like a migration with a version-skew plan.
  2. Roll CRDs out explicitly (often outside the main Helm release).
  3. Wait for CRDs to be established before you upgrade controllers.
  4. Keep controllers/operators compatible across versions during the transition.
  5. Assume Helm won’t manage the lifecycle the way you want it to (especially for upgrades).

CRD upgrades are frequently non-atomic; “establish before controller” is one of the easiest ways to avoid a bad day.
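The "establish before controller" gate is mechanical to check: the API server reports an `Established` condition on each CustomResourceDefinition. A sketch against that status shape:

```python
def crd_established(crd_status: dict) -> bool:
    """True when the CRD reports an Established=True condition.

    `crd_status` mirrors the `.status.conditions` shape of a
    CustomResourceDefinition as returned by the Kubernetes API.
    """
    return any(
        c.get("type") == "Established" and c.get("status") == "True"
        for c in crd_status.get("conditions", [])
    )

def ready_to_upgrade_controller(crds: list[dict]) -> bool:
    """Gate the controller rollout on every CRD being established first."""
    return all(crd_established(c.get("status", {})) for c in crds)
```

In a real pipeline you would poll this (e.g. via `kubectl wait --for=condition=Established`) between the CRD step and the controller step.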

Previews don’t fix the CRD upgrade story; they just let you discover the pain earlier.

When a preview environment is actually worth it

I still reach for previews when the change is inherently runtime-y and the cheap checks don’t buy enough confidence:

  • controllers/operators
  • networking policy changes with real traffic paths
  • authn/authz at the edge (where defaults can brick you)
  • dependency-heavy services where “works in CI” doesn’t mean “works in the real network”

If you’re only going to invest in three preview types, I’d pick: edge auth changes, controllers/operators, and migration-heavy stateful changes.

And even then, I try to keep it minimal:

  1. A namespace or small cluster with only the service under test
  2. A slim ingress path with real authn
  3. Realistic dependencies where they matter, stubs where they don’t

Two decision examples

  • Must preview: an edge auth change where a default-deny policy can lock everyone out, a controller/operator rollout with CRD version-skew, or a migration that depends on real runtime behavior.
  • Probably don’t: documentation-only diffs, formatting/refactors, pure unit-test changes, or “static” config edits where a plan/diff and validation already give you the same confidence.

Previews don’t replace safe rollout

Even with a perfect preview story, you still need a production deployment strategy that assumes things will go wrong.

The stuff that actually saves you after merge:

  • canary + gradual traffic shifting
  • error-budget / SLO-based gating (don’t “ship on green” if users are burning)
  • automated rollback criteria (latency/error spikes, bad health signals)
  • release controls (feature flags, dark launches, targeted enablement)

Previews are about catching integration failures early. Progressive delivery is about limiting blast radius when reality disagrees.

Preview environments aren’t a religion. They’re a tool. They only pay off when they close a confidence gap you can’t close with a diff, a test suite, and a deterministic pipeline.

When a preview run fails because of an unreachable endpoint, that’s a win. It means you caught a breaking change before it reached production.

How you know it’s working

If preview environments are worth the operational tax, you should be able to point to numbers:

  • How often previews are actually used (and by whom).
  • What classes of failures they catch pre-merge (migrations, IAM, networking, dependency contracts).
  • Average TTL and cleanup reliability.
  • Monthly cost, and which repos/services are responsible.

If you can’t answer those, you may have built a very elaborate placebo.