Shadow Mode: A Production Dry-Run for Rules

$ cat --info "shadow-mode-production-dry-run.mdx"

> published: 2026-05-20 | read_time: 4 min read | category: engineering

A rule you've never run is a guess

When you write a guardrail rule in TinyOps — "comment on PRs with no tests," "block deploys during an incident" — you're making a bet. You think the condition is right. You think it'll match the cases you care about and skip the ones you don't. But you don't actually know until the rule has seen real traffic.

And a guardrail rule that's wrong is worse than no rule at all. It comments on PRs that are fine. It blocks deploys it shouldn't. It pings the whole team for nothing. The first time a rule is wrong, people stop trusting all of them.

So there's a chicken-and-egg problem: you can't trust a rule until it has run against production, but running it against production is the exact risk you're trying to avoid.

Shadow mode: run everything, skip the action

Every rule in TinyOps has a mode: disabled, shadow, or live. Shadow mode is the answer to that problem.

A shadow rule does everything a live rule does — it fires on the same triggers, fetches the same real data from GitHub or Vercel, evaluates its condition exactly the same way — with one difference: it never performs its action. No comment gets posted. No deploy gets blocked. Instead, TinyOps records what the rule would have done.

Here's the entire mechanism, from the worker:

if (rule.mode === RULE_MODES.SHADOW) {
  await db.update(executions).set({
    status: EXECUTION_STATUS.SUCCESS,
    conditionResult: conditionPassed,
    actionResult: {
      shadow: true,
      wouldExecute: { provider: actionCfg.provider, method: actionCfg.method },
    },
    completedAt: new Date(),
  }).where(/* this execution */);
 
  await db.update(rules).set({
    shadowSuccessCount: sql`shadow_success_count + 1`,
    shadowEvaluationCount: sql`shadow_evaluation_count + 1`,
  }).where(eq(rules.id, rule.id));
 
  return; // the action dispatch below never runs
}

Why it was almost free to build

That's the whole feature — one if block and an early return. It's cheap for a structural reason.

A TinyOps rule is three parts: a trigger, a condition, and an action. The trigger and the condition are pure observation — they read state, they don't change anything. The action is the only part with side effects. So a dry run doesn't need a special evaluation path or a mock environment. It just needs to stop one step early.
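To make the "stop one step early" idea concrete, here is a minimal synchronous sketch of the three-part split. It is not the TinyOps worker: the `Rule` shape, the `Outcome` labels, `runRule`, and the example rule are all hypothetical names chosen for illustration, and the real engine is asynchronous and persists its results.

```typescript
// Hypothetical shapes for illustration; not the actual TinyOps types.
type RuleMode = 'disabled' | 'shadow' | 'live';

interface Rule<E> {
  mode: RuleMode;
  trigger: (event: E) => boolean;   // pure: does this event concern the rule?
  condition: (event: E) => boolean; // pure: does the rule match?
  action: (event: E) => void;       // the only part with side effects
}

type Outcome = 'skipped' | 'no-match' | 'would-execute' | 'executed';

function runRule<E>(rule: Rule<E>, event: E): Outcome {
  if (rule.mode === 'disabled' || !rule.trigger(event)) return 'skipped';
  if (!rule.condition(event)) return 'no-match';
  // Shadow mode stops one step early: same evaluation, no side effect.
  if (rule.mode === 'shadow') return 'would-execute';
  rule.action(event);
  return 'executed';
}

// Example rule: "comment on PRs with no tests", with an in-memory comment log.
interface RepoEvent {
  kind: 'pull_request' | 'deploy';
  hasTests: boolean;
}

const postedComments: string[] = [];

const noTestsRule: Rule<RepoEvent> = {
  mode: 'shadow',
  trigger: (e) => e.kind === 'pull_request',
  condition: (e) => !e.hasTests,
  action: () => {
    postedComments.push('This PR has no tests.');
  },
};
```

In this sketch, switching `mode` from `'shadow'` to `'live'` is the only change needed to go from recording an outcome to performing it. The trigger and condition code paths are identical in both modes.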

The architecture made the feature trivial. That's the payoff of keeping side effects in one place.

Turning runs into confidence

A dry run is only useful if you can see the results. Shadow executions accumulate into two counters:

shadowEvaluationCount — every time the rule triggered and was evaluated
shadowSuccessCount — every time the condition passed, i.e. the action would have fired

The dashboard turns those into a confidence score against a threshold:

const requiredRuns =
  plan === 'team' || plan === 'business' ? 3 : plan === 'pro' ? 5 : 10;
// readyToGoLive once shadowSuccessCount >= requiredRuns

Two counters instead of one, deliberately. With both, the UI can tell three different stories apart: a rule that never triggered (evaluationCount === 0), a rule that triggers but whose condition never matches (evaluationCount > 0, successCount === 0), and a rule that's matching and ready.
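As a rough sketch of how the two counters separate those stories, the function below classifies a rule from its counts. The state labels, `requiredRunsFor`, and `shadowState` are hypothetical names, and the `'collecting'` state (matching, but not yet at the threshold) is an assumption added so the function is total. The plan thresholds are copied from the snippet above.

```typescript
// Hypothetical labels; 'collecting' is an assumed in-between state.
type ShadowState = 'never-triggered' | 'never-matched' | 'collecting' | 'ready';

// Same thresholds as the requiredRuns expression in the post.
function requiredRunsFor(plan: string): number {
  return plan === 'team' || plan === 'business' ? 3 : plan === 'pro' ? 5 : 10;
}

function shadowState(
  evaluationCount: number,
  successCount: number,
  requiredRuns: number,
): ShadowState {
  if (evaluationCount === 0) return 'never-triggered'; // the trigger never fired
  if (successCount === 0) return 'never-matched';      // fired, condition never passed
  return successCount >= requiredRuns ? 'ready' : 'collecting';
}
```

With a single counter, the first two branches would be indistinguishable: both would simply read as zero.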

The stalled-rule check

That middle state — triggering but never matching — is the most common way a shadow rule quietly fails. The rule looks busy. The counter just never moves. Left alone, it would sit in shadow forever.

So there's a small heuristic for it:

const isStalled = evaluationCount > requiredRuns * 3 && successCount < requiredRuns;

If a rule has been evaluated more than three times its target and still hasn't cleared the threshold, the UI stops showing a progress bar and says: your condition may be too restrictive. Because the check compares successCount against requiredRuns rather than against zero, it also flags rules that match only occasionally, not just rules that never match. It uses the gap between the two counters to diagnose a rule that is unlikely to graduate on its own. One line, and it catches the failure mode that would otherwise waste the most time.
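A few worked cases show where the boundary falls. This sketch wraps the same predicate in a function (the `isStalled` function signature and the `stalledExamples` array are illustrative names) and evaluates it for a threshold of 5, which puts the evaluation cutoff at 15.

```typescript
// Same predicate as the one-liner above, wrapped in a function.
function isStalled(
  evaluationCount: number,
  successCount: number,
  requiredRuns: number,
): boolean {
  return evaluationCount > requiredRuns * 3 && successCount < requiredRuns;
}

// Worked cases for requiredRuns = 5 (evaluation cutoff: 5 * 3 = 15).
const stalledExamples = [
  { evaluations: 15, successes: 0, stalled: isStalled(15, 0, 5) }, // false: 15 is not > 15
  { evaluations: 16, successes: 0, stalled: isStalled(16, 0, 5) }, // true: never matches
  { evaluations: 40, successes: 4, stalled: isStalled(40, 4, 5) }, // true: matches too rarely
  { evaluations: 40, successes: 5, stalled: isStalled(40, 5, 5) }, // false: threshold cleared
];
```

The comparison is strict, so a rule sitting at exactly three times its target is still shown as in progress. The warning appears on the next evaluation.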

What I'd do differently

One honest note: readyToGoLive is advisory. The system shows you when a rule has cleared its threshold, but nothing stops an admin from promoting a rule with zero shadow runs. The real gate is permissions — only admins can move a rule to live.

That was deliberate. I didn't want shadow mode to become a box-checking ritual between you and your own rule. But the honest tradeoff is that the safety is a recommendation, not an enforcement. If I revisited it, I'd at least block promotion while a rule's shadow runs are actively failing — the data to do that is already collected; nothing acts on it yet.
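One way that stricter gate might look is sketched below. Everything here is hypothetical: `PromotionRequest`, `PromotionDecision`, and `checkPromotion` do not exist in TinyOps, and the sketch assumes "actively failing" means the stalled heuristic is firing, which is only one possible reading. A rule with too few shadow runs is still promotable, with a warning, so the threshold stays advisory.

```typescript
// Hypothetical types; nothing like this exists in the system described above.
interface PromotionRequest {
  isAdmin: boolean;
  evaluationCount: number;
  successCount: number;
  requiredRuns: number;
}

type PromotionDecision =
  | { allowed: true; warning?: string }
  | { allowed: false; reason: string };

function checkPromotion(req: PromotionRequest): PromotionDecision {
  // The existing hard gate: only admins can move a rule to live.
  if (!req.isAdmin) {
    return { allowed: false, reason: 'only admins can move a rule to live' };
  }
  // Assumed meaning of "actively failing": the stalled heuristic is true.
  const stalled =
    req.evaluationCount > req.requiredRuns * 3 &&
    req.successCount < req.requiredRuns;
  if (stalled) {
    return { allowed: false, reason: 'shadow runs suggest the condition is too restrictive' };
  }
  // Below the threshold but not stalled: allow, and keep the advice visible.
  if (req.successCount < req.requiredRuns) {
    return { allowed: true, warning: 'rule has not cleared its shadow threshold' };
  }
  return { allowed: true };
}
```

This keeps zero-run promotion possible for an admin who knows what they are doing, and only refuses when the collected data points the other way.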

Shadow mode isn't a clever algorithm. It's a small idea the architecture happened to make easy: run the whole rule, skip the one part that bites, and count what happens.
