Tutorial

Stopping a $5,000 Surprise Bill From An AI Agent

Akshay Sarode Jul 14, 2025

Direct answer

Set a monthly USD budget per agent. Track spend per action. Alert at 80%. Refuse at 100%. Allow a one-time +50% override that's audit-logged. The agent stops at the cap; you don't wake up to a four-figure bill.

In October 2024, a viral tweet showed someone burning $5,000 in 8 hours on Claude tokens because of a loop bug. That was the week "agent budget" went from theoretical to required.

The shape of the problem

Frontier models cost $0.005–$0.075 per 1k input tokens
An agent in a tight loop can do 100+ requests/minute
Each request can be 5k–50k tokens
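Math: at 100 requests/minute, 5k-token requests at $0.005 per 1k tokens already cost $150 per hour, and larger requests on pricier models run to thousands of dollars per hour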
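The arithmetic behind that estimate can be checked with a few lines of Python. This is a minimal sketch: the function name `hourly_cost_usd` is made up for illustration, and it counts input tokens only, so a real bill that includes output tokens would be higher.

```python
def hourly_cost_usd(requests_per_minute: float,
                    tokens_per_request: float,
                    usd_per_1k_tokens: float) -> float:
    """Estimate the hourly spend of an agent stuck in a loop (input tokens only)."""
    requests_per_hour = requests_per_minute * 60
    thousands_of_tokens = requests_per_hour * tokens_per_request / 1000
    return thousands_of_tokens * usd_per_1k_tokens


# Low end of the ranges above: 100 requests/minute, 5k tokens, $0.005 per 1k
low_end = hourly_cost_usd(100, 5_000, 0.005)
print(f"${low_end:,.2f} per hour")  # prints "$150.00 per hour"
```

Plugging in larger requests or a pricier model shows how quickly the figure grows.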

You can't watch every agent every minute. You need infrastructure that watches.

What "Governor" does

The Ujex Governor is middleware for callables. Every billable action is checked against the agent's budget before it executes. If the action would exceed the budget, it returns HTTP 429 (Too Many Requests) instead of running.

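// Middleware (pseudo): `db` is an assumed data-access layer in scope
async function requireQuota(agentId, costUsd) {
  const budget = await db.budgets.get(agentId);
  // A one-time override raises the ceiling to 150% of the monthly cap
  const limitUsd = budget.overrideActive
    ? budget.monthlyUsd * 1.5
    : budget.monthlyUsd;
  if (budget.usedUsd + costUsd > limitUsd) {
    // Surface as HTTP 429 so the caller stops instead of retrying blindly
    const err = new Error(`agent ${agentId} is over budget`);
    err.status = 429;
    throw err;
  }
  // Debit before the action runs. In production the read and the increment
  // should be one atomic transaction so concurrent calls cannot both
  // slip under the cap.
  await db.budgets.increment(agentId, costUsd);
}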

Three thresholds

Threshold	Behavior
0–80%	Spend tracked, no notification
80%	FCM push to owner: "agent X at 80%"
100%	All callables return 429; no further spend
Override (+50%, one-time)	User explicitly extends; audit-logged
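The table can be read as a small decision function. The sketch below is one possible reading, not the Governor's actual code: the `Budget` class, its field names, and the three return strings are invented for illustration, and it assumes the 80% alert fires only once per budget.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    monthly_usd: float
    used_usd: float
    alerted_80: bool = False       # set once the 80% push has been sent
    override_active: bool = False  # one-time +50% extension


def evaluate(budget: Budget, cost_usd: float) -> str:
    """Return 'allow', 'allow_and_alert', or 'refuse' for a pending action."""
    limit = budget.monthly_usd * (1.5 if budget.override_active else 1.0)
    projected = budget.used_usd + cost_usd
    if projected > limit:
        return "refuse"            # the caller maps this to HTTP 429
    if projected >= 0.8 * budget.monthly_usd and not budget.alerted_80:
        return "allow_and_alert"   # notify the owner, then set alerted_80
    return "allow"


print(evaluate(Budget(monthly_usd=50.0, used_usd=10.0), 1.0))  # allow
print(evaluate(Budget(monthly_usd=50.0, used_usd=39.5), 1.0))  # allow_and_alert
print(evaluate(Budget(monthly_usd=50.0, used_usd=49.5), 1.0))  # refuse
```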

What gets metered

Every callable that has a cost is metered. tools.invoke, postbox.send, memory.embed, recall.searchEpisodes: each declares a per-call USD figure. The Governor middleware debits that figure before allowing the action.
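One way to picture "declare a cost, debit before the action" is a decorator. This is a toy sketch under stated assumptions: `InMemoryGovernor`, `metered`, and `OverBudget` are hypothetical names, not part of the Ujex SDK, and amounts are held in integer cents to avoid floating-point drift when many small debits are summed.

```python
import functools


class OverBudget(Exception):
    """Raised when a debit would exceed the budget (maps to HTTP 429)."""


class InMemoryGovernor:
    """Toy stand-in for the Governor; tracks one agent's spend in cents."""

    def __init__(self, monthly_cents: int):
        self.monthly_cents = monthly_cents
        self.used_cents = 0

    def debit(self, cost_cents: int) -> None:
        if self.used_cents + cost_cents > self.monthly_cents:
            raise OverBudget(f"cap of {self.monthly_cents} cents reached")
        self.used_cents += cost_cents


def metered(governor: InMemoryGovernor, cost_cents: int):
    """Declare a per-call cost and debit it before the wrapped action runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            governor.debit(cost_cents)  # raises before any work is done
            return func(*args, **kwargs)
        return wrapper
    return decorator


gov = InMemoryGovernor(monthly_cents=10)


@metered(gov, cost_cents=4)
def send_message(text: str) -> str:
    return f"sent: {text}"


send_message("one")        # used: 4 cents
send_message("two")        # used: 8 cents
try:
    send_message("three")  # would reach 12 cents, so it is refused
except OverBudget as exc:
    print("refused:", exc)
```

Because the debit happens first, a refused call never runs its body, which is what stops a loop from doing further billable work.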

Important caveat: model-token billing happens at the LLM provider, not at Ujex. The Governor counts actions, not actual GCP / OpenAI / Anthropic spend. We're working on real USD via Billing API integration; today the count is approximate but stops the runaway loop case (which is what matters most).

Override flow

When an agent hits 100%, the owner has already received the 80% notification and now gets a second one: "your agent is now refusing actions". They can:

Let it stay refused — agent stops, owner reviews on Monday
Bump the budget — increase monthlyUsd
One-time override — Governor allows +50% above the cap, single use, audit-logged
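The third option, a single-use extension that leaves an audit record, might look like the sketch below. The `AgentBudget` class and its field names are hypothetical, and the sketch assumes "one-time" means once per budget; the text above does not say whether the override resets each month.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentBudget:
    monthly_usd: float
    override_used: bool = False   # the override is single use
    extra_usd: float = 0.0
    audit_log: list = field(default_factory=list)

    @property
    def limit_usd(self) -> float:
        return self.monthly_usd + self.extra_usd

    def grant_override(self, reason: str) -> float:
        """Raise the ceiling by 50% of the monthly cap, once, and record it."""
        if self.override_used:
            raise RuntimeError("override already used")
        self.override_used = True
        self.extra_usd = 0.5 * self.monthly_usd
        self.audit_log.append({
            "event": "override_granted",
            "extra_usd": self.extra_usd,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return self.limit_usd


budget = AgentBudget(monthly_usd=50.0)
print(budget.grant_override("shipping deadline"))  # prints 75.0
```

A second call to `grant_override` raises, so the only way to go higher is the explicit "bump the budget" path.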

Code

import os

from ujex_governor import Governor

gov = Governor(api_key=os.environ['UJEX_API_KEY'])

# Set the budget once
gov.set_budget(agent_id='abc', monthly_usd=50.0)

# Read the current state
budget = gov.get_budget(agent_id='abc')
# {monthlyUsd: 50.0, usedUsd: 38.42, percent: 76.84, alerted80: False}

# One-time override (rare): 25.0 is +50% of the 50.0 cap
gov.override(agent_id='abc', extra_usd=25.0, reason='shipping deadline')

Pair with audit

Every quota event lands in the audit log: budget set, threshold crossed, action refused, override granted. When you're investigating "what happened on Friday", the budget timeline and the audit log together tell the whole story.
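Reconstructing that story is mostly a filter and a sort. The records below are invented sample data, and the field names (`at`, `agent`, `event`) are assumptions rather than the real audit schema.

```python
from datetime import datetime

# Hypothetical audit records; the field names are illustrative only
events = [
    {"at": "2025-07-11T09:02:00", "agent": "abc", "event": "threshold_80"},
    {"at": "2025-07-10T08:00:00", "agent": "abc", "event": "budget_set"},
    {"at": "2025-07-11T14:30:00", "agent": "abc", "event": "action_refused"},
    {"at": "2025-07-11T14:45:00", "agent": "xyz", "event": "budget_set"},
    {"at": "2025-07-11T15:10:00", "agent": "abc", "event": "override_granted"},
]


def timeline(records, agent_id, day):
    """Return one agent's events for a given ISO date, oldest first."""
    selected = [
        r for r in records
        if r["agent"] == agent_id
        and datetime.fromisoformat(r["at"]).date().isoformat() == day
    ]
    return sorted(selected, key=lambda r: r["at"])


for record in timeline(events, "abc", "2025-07-11"):
    print(record["at"], record["event"])
```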

FAQ

How is this different from OpenAI's spend cap?

OpenAI's cap is account-wide. Governor is per-agent — each agent has its own budget within your account.

Does this stop the runaway loop in real time?

Within one action. Each callable check takes under 10 ms and runs before the action executes, so the first action that would push spend past the cap is refused.

Can I set caps per day instead of per month?

Today: monthly only. Daily/weekly is on the roadmap.