Stopping a $5,000 Surprise Bill From An AI Agent
Set a monthly USD budget per agent. Track spend per action. Alert at 80%. Refuse at 100%. Allow a one-time +50% override that's audit-logged. The agent stops at the cap; you don't wake up to a four-figure bill.
In October 2024, a viral tweet showed someone burning $5,000 in 8 hours on Claude tokens via a loop bug. That was the week "agent budget" went from theoretical to required.
The shape of the problem
- Frontier models cost $0.005–$0.075 per 1k input tokens
- An agent in a tight loop can do 100+ requests/minute
- Each request can be 5k–50k tokens
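- Math: cost per hour = requests/minute × 60 × (tokens per request ÷ 1,000) × price per 1k tokens. Even 20 requests/minute at 5k tokens and $0.005 per 1k is $30 per hour; 100 requests/minute at 50k tokens passes $1,000 per hour at any price in that range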
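The arithmetic behind the list above can be sketched as a small function. The function name and the three sample loops are illustrative assumptions (the 20 requests/minute case in particular is not a figure from the list), and only input tokens are counted.

```python
def hourly_cost_usd(requests_per_minute: float,
                    tokens_per_request: float,
                    usd_per_1k_tokens: float) -> float:
    """Estimated input-token spend for one hour of a sustained loop."""
    requests_per_hour = requests_per_minute * 60
    return requests_per_hour * (tokens_per_request / 1000) * usd_per_1k_tokens


# Three illustrative loops, all at the cheapest price in the list above.
slow_loop = hourly_cost_usd(20, 5_000, 0.005)
tight_loop = hourly_cost_usd(100, 5_000, 0.005)
big_context_loop = hourly_cost_usd(100, 50_000, 0.005)

for label, cost in [("slow", slow_loop), ("tight", tight_loop), ("big context", big_context_loop)]:
    print(f"{label}: ${cost:,.2f} per hour")
```

Plugging in your own request rate and model price gives a rough idea of how fast a monthly budget would drain if a loop went unnoticed.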
You can't watch every agent every minute. You need infrastructure that watches.
What "Governor" does
The Ujex Governor is a callable middleware. Every billable action checks against a budget before executing. If the action would push spend past the budget, it is refused with HTTP 429 and nothing is debited.
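// Middleware (sketch; `db` stands in for your datastore client)
async function requireQuota(agentId, costUsd) {
  const budget = await db.budgets.get(agentId);
  if (!budget) throw new Error(`no budget set for agent ${agentId}`);
  // Assumption: overrideActive means the one-time +50% is in effect.
  // It raises the cap; it never removes it.
  const capUsd = budget.overrideActive ? budget.monthlyUsd * 1.5 : budget.monthlyUsd;
  if (budget.usedUsd + costUsd > capUsd) {
    const err = new Error("over budget");
    err.status = 429; // surfaced to the caller as HTTP 429
    throw err;
  }
  // Run the read and the increment in one transaction in production,
  // so two concurrent calls cannot both pass the check.
  await db.budgets.increment(agentId, costUsd);
}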
Three thresholds
| Threshold | Behavior |
|---|---|
| 0–80% | Spend tracked, no notification |
| 80% | FCM push to owner — "agent X at 80%" |
| 100% | All callables return 429; no spend |
| Override (+50%, one-time) | User explicitly extends; audit-logged |
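The rows of the table can be read as one decision made per action. The following is a minimal sketch of that decision, assuming an in-memory `Budget` record and a `debit` helper; both names are hypothetical and not part of the Ujex API.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    monthly_usd: float
    used_usd: float = 0.0
    override_active: bool = False  # assumption: the one-time +50% is in effect

    @property
    def cap_usd(self) -> float:
        return self.monthly_usd * (1.5 if self.override_active else 1.0)


def debit(budget: Budget, cost_usd: float) -> str:
    """Apply one action's cost and report which table row applies."""
    if budget.used_usd + cost_usd > budget.cap_usd:
        return "refuse"  # 100% row: answer 429, spend nothing
    before = budget.used_usd
    budget.used_usd += cost_usd
    alert_at = 0.8 * budget.monthly_usd
    if before < alert_at <= budget.used_usd:
        return "allow+alert"  # 80% row: the first crossing triggers the push
    return "allow"  # 0-80% row, or already alerted
```

Returning the alert only on the first crossing of 80% keeps the owner from being paged on every later action.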
What gets metered
Every callable that has a cost is metered. tools.invoke, postbox.send, memory.embed, and recall.searchEpisodes each declare a per-call USD figure, and the Governor middleware debits that figure before allowing the action.
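One way to picture "declare a price, debit before acting" is a decorator around each callable. This is a sketch under stated assumptions: the prices in `PRICE_USD`, the `metered` decorator, the `OverBudget` exception, and the dict-based ledger are all invented for illustration.

```python
from functools import wraps

# Hypothetical per-call prices in USD; real figures are declared by each callable.
PRICE_USD = {"tools.invoke": 0.002, "memory.embed": 0.0005}


class OverBudget(Exception):
    """Raised instead of running the action; would map to HTTP 429."""


def metered(name, ledger):
    """Debit the declared price for `name` before the wrapped function runs."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            cost = PRICE_USD[name]
            if ledger["used_usd"] + cost > ledger["monthly_usd"]:
                raise OverBudget(name)  # refused: the action never executes
            ledger["used_usd"] += cost  # debit first, then act
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# A deliberately tiny budget so the cap is easy to hit when experimenting.
ledger = {"monthly_usd": 0.005, "used_usd": 0.0}


@metered("tools.invoke", ledger)
def invoke_tool(tool: str) -> str:
    return f"ran {tool}"
```

With this ledger, two calls to `invoke_tool` succeed and the third raises `OverBudget` before the tool runs.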
Important caveat: model-token billing happens at the LLM provider, not at Ujex. The Governor counts actions, not actual GCP / OpenAI / Anthropic spend. We're working on real USD via Billing API integration; today the count is approximate but stops the runaway loop case (which is what matters most).
Override flow
An agent hits 100%. The owner, who already received the 80% notification, now gets a second one: "your agent is now refusing actions". They can:
- Let it stay refused: agent stops, owner reviews on Monday
- Bump the budget: increase monthlyUsd
- One-time override: Governor allows +50% above the cap, single use, audit-logged
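The third option has two properties worth making concrete: it can be used only once, and it always leaves an audit record. Here is a minimal sketch, assuming a dict-based budget and an in-memory `audit_log` list; the field names and the `grant_override` helper are hypothetical.

```python
from datetime import datetime, timezone

audit_log = []  # stand-in for a durable, append-only audit store


def grant_override(budget: dict, reason: str, actor: str) -> float:
    """Raise the cap by 50% of monthly_usd exactly once and record who did it."""
    if budget.get("override_used"):
        raise PermissionError("one-time override already used")
    extra_usd = 0.5 * budget["monthly_usd"]
    budget["override_used"] = True
    budget["cap_usd"] = budget["monthly_usd"] + extra_usd
    audit_log.append({
        "event": "override_granted",
        "agent_id": budget["agent_id"],
        "actor": actor,
        "extra_usd": extra_usd,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return budget["cap_usd"]
```

Requiring a `reason` at the call site means the audit entry explains why the cap moved, not only that it did.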
Code
import os
from ujex_governor import Governor
gov = Governor(api_key=os.environ['UJEX_API_KEY'])
# Set the budget once
gov.set_budget(agent_id='abc', monthly_usd=50.0)
# Read current
budget = gov.get_budget(agent_id='abc')
# {monthlyUsd: 50.0, usedUsd: 38.42, percent: 76.84, alerted80: True}
# Override (rare)
gov.override(agent_id='abc', extra_usd=25.0, reason='shipping deadline')
Pair with audit
Every quota event lands in the audit log: budget set, threshold crossed, action refused, override granted. When you're investigating "what happened on Friday" the budget timeline + the audit log together tell the whole story.
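To illustrate the kind of question such a log answers, here is a sketch that pulls one agent's quota events for a single day. The record shape, the `kind` values, and the sample entries are made up for the example and do not reflect the actual audit log format.

```python
from datetime import date, datetime

# Hypothetical audit records; field names and values are illustrative only.
events = [
    {"at": datetime(2024, 10, 11, 14, 30), "kind": "action_refused", "agent_id": "abc"},
    {"at": datetime(2024, 10, 11, 9, 5), "kind": "threshold_crossed", "agent_id": "abc"},
    {"at": datetime(2024, 10, 11, 15, 0), "kind": "override_granted", "agent_id": "abc"},
    {"at": datetime(2024, 10, 10, 8, 0), "kind": "budget_set", "agent_id": "abc"},
    {"at": datetime(2024, 10, 11, 10, 0), "kind": "action_refused", "agent_id": "xyz"},
]


def timeline(records, agent_id: str, day: date) -> list:
    """Return one agent's quota events for a given day, oldest first."""
    matching = (r for r in records if r["agent_id"] == agent_id and r["at"].date() == day)
    return sorted(matching, key=lambda r: r["at"])


for record in timeline(events, "abc", date(2024, 10, 11)):
    print(record["at"].strftime("%H:%M"), record["kind"])
```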
FAQ
How is this different from OpenAI's spend cap?
OpenAI's cap is account-wide. Governor is per-agent — each agent has its own budget within your account.
Does this stop the runaway loop in real time?
Within one action. Each callable check is <10ms; the agent's next action gets refused if the previous one tipped it over the cap.
Can I set caps per day instead of per month?
Today: monthly only. Daily/weekly is on the roadmap.