Human-in-the-Loop AI Agents: A Production Setup Guide
Wrap every irreversible action in an approval primitive. Use FCM push to a phone (sub-second). Long-press notification → biometric → approve. TTL'd; first responder wins; multi-channel fanout. Audit-logged. The whole loop takes 1.5–6 seconds when the human is around, and it defaults to a safe denial if they're not.
The autonomous agent that does everything without checking with you sounds great until it pushes to main, sends an email to the wrong person, or runs up a $4,000 API bill. The fix is human-in-the-loop on the things that matter — and only those things.
This is the production setup. Not "a tutorial about a feature" — the design choices that hold at scale.
What needs approval (and what doesn't)
Approval gates add friction. Use them on actions that are:
- Irreversible (sending email, git push, payment)
- Costly above some threshold (API call > $X, large compute job)
- Outside the agent's normal capability sandbox
- Reaching external humans (sending message to user, customer)
Don't gate:
- Reads (memory queries, file reads inside workspace)
- Idempotent writes inside the workspace
- Tool calls within the agent's existing capability bundle
The asymmetry: false positives (gating something that didn't need it) cost developer time. False negatives (auto-approving something dangerous) cost real money or trust. Skew toward gating.
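The two lists above can be read as a small policy function. The sketch below is illustrative only: the `Action` fields, the `COST_THRESHOLD_USD` value, and `needs_approval` are hypothetical names chosen for this example, not part of any Ujex API.

```python
from dataclasses import dataclass

# Hypothetical threshold; the article leaves the actual value ($X) open.
COST_THRESHOLD_USD = 5.00


@dataclass(frozen=True)
class Action:
    """Illustrative description of something an agent wants to do."""
    name: str
    reversible: bool = True
    estimated_cost_usd: float = 0.0
    inside_sandbox: bool = True
    reaches_external_human: bool = False


def needs_approval(action: Action) -> bool:
    """Gate the action if any one of the four criteria applies."""
    return (
        not action.reversible
        or action.estimated_cost_usd > COST_THRESHOLD_USD
        or not action.inside_sandbox
        or action.reaches_external_human
    )


read_file = Action(name="read_file")
git_push = Action(name="git_push", reversible=False)
big_job = Action(name="batch_inference", estimated_cost_usd=40.0)
```

Because the criteria are OR-ed together, an action with unknown properties can be made to fail closed by setting pessimistic defaults, which matches the advice to skew toward gating.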
The flow
- Agent calls ask(prompt, detail, ttl), which writes to approvals/{id} with status pending
- Cloud Function fires FCM push to all registered approval channels
- Phone receives push (200–800ms typical)
- Long-press notification → quick action sheet → Approve / Deny
- Biometric (Face ID / Touch ID) confirms the approve gesture
- App writes status: approved to Firestore
- Agent's snapshot listener fires (~50ms); agent resumes
End-to-end: 1.5–6s when the phone is in hand. TTL of 300s typical; agent gets a denial-equivalent if no decision.
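To make the record lifecycle concrete, here is a minimal in-memory sketch of the first and sixth steps of the flow. It is not the real schema: the `Approval` fields beyond prompt, detail, TTL, and status are assumptions, the dictionary stands in for the Firestore collection, and the `denied` status is inferred from the Approve / Deny sheet.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict

# "pending", "approved", and "timeout" appear in the article; "denied" is inferred.
PENDING, APPROVED, DENIED, TIMEOUT = "pending", "approved", "denied", "timeout"


@dataclass
class Approval:
    """Hypothetical shape of one approvals/{id} record."""
    prompt: str
    detail: str
    ttl_sec: int
    status: str = PENDING
    created_at: float = field(default_factory=time.time)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


# In-memory stand-in for the approvals collection.
approvals: Dict[str, Approval] = {}


def ask(prompt: str, detail: str, ttl_sec: int = 300) -> Approval:
    """Agent side: write a pending record and hand it back."""
    approval = Approval(prompt=prompt, detail=detail, ttl_sec=ttl_sec)
    approvals[approval.id] = approval
    return approval


def app_writes_decision(approval_id: str, decision: str) -> None:
    """App side: record the human's choice on the existing record."""
    approvals[approval_id].status = decision


request = ask("Push to main?", "Will deploy commit 4f3a91 (changes auth flow)")
status_before = approvals[request.id].status
app_writes_decision(request.id, APPROVED)
status_after = approvals[request.id].status
```

In a real deployment the push and the snapshot listener sit between these two calls; the sketch only shows which side writes which state.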
Why mobile, not Slack/email
Slack is async by reputation. Email is slower. SMS is fine but lacks the "long-press for action" affordance. Phone push with haptic feedback is sub-second and biometric-authable. The phone is also where the user is — at their desk, on the bus, in a meeting. Slack is where the user might check in 20 minutes.
Why biometric on the approve gesture
The push notification can be tapped without auth (just shows the prompt). The approve action requires biometric. Reasoning:
- The push payload is non-sensitive — just a hint
- The approve write is a privileged action — Face ID / Touch ID
- If the phone is unattended on a desk, a passerby can read the prompt but can't approve
Why TTL'd
An agent that waits 24 hours for approval is broken. Default TTL: 300 seconds. After that, the approval transitions to timeout; the agent gets a denial-equivalent. The agent decides what to do — most often: log + retry later, or escalate to a different action.
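The timeout rule can be sketched as a pure function of the clock. The names `effective_status` and `may_proceed` are hypothetical, and the sketch assumes a decision that arrives after expiry is rejected at write time, so any recorded decision is a valid one.

```python
from typing import Optional


def effective_status(
    decision: Optional[str], created_at: float, ttl_sec: int, now: float
) -> str:
    """Resolve what the agent should see at time `now`."""
    if decision is not None:
        return decision  # a human answered in time
    if now - created_at >= ttl_sec:
        return "timeout"  # nobody answered within the TTL
    return "pending"


def may_proceed(status: str) -> bool:
    """Only an explicit approval unblocks the action; timeout acts like a denial."""
    return status == "approved"


still_waiting = effective_status(None, created_at=0.0, ttl_sec=300, now=10.0)
expired = effective_status(None, created_at=0.0, ttl_sec=300, now=300.0)
answered = effective_status("approved", created_at=0.0, ttl_sec=300, now=10.0)
```

Passing `now` in explicitly keeps the rule testable without sleeping; production code would pass the server's clock.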
Multi-channel fanout
One owner can register multiple approval channels (phone, watch, SMS fallback, web). FCM push fans out to all of them. First-responder wins; once decided, subsequent taps see "already decided." Useful for teams with rotating on-call.
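First-responder-wins amounts to a compare-and-set on the status field. The sketch below uses a `threading.Lock` as a stand-in for whatever atomic write the backing store provides (with Firestore, that would presumably be a transaction); the class and method names are hypothetical.

```python
import threading
from typing import Optional


class ApprovalSlot:
    """One approval that accepts exactly one decision."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.status = "pending"
        self.decided_by: Optional[str] = None

    def decide(self, decision: str, device_id: str) -> str:
        """Record the decision only if nobody has decided yet."""
        with self._lock:
            if self.status != "pending":
                return "already_decided"
            self.status = decision
            self.decided_by = device_id
            return "recorded"


slot = ApprovalSlot()
first = slot.decide("approved", device_id="phone")
second = slot.decide("denied", device_id="watch")
```

The second device's tap does not overwrite the first; it only learns that the request is closed.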
Code (agent side)
import logging
import os

from ujex_mobile import Mobile

log = logging.getLogger(__name__)

m = Mobile(api_key=os.environ['UJEX_API_KEY'], agent_id='abc')

# Ask the human; this writes a pending approval and triggers the push.
approval = m.ask(
    prompt='Push to main?',
    detail='Will deploy commit 4f3a91 (changes auth flow)',
    ttl_sec=300,
)
result = approval.wait()  # blocks until decided or timeout

if result['status'] == 'approved':
    git_push()  # your own deploy step, defined elsewhere
elif result['status'] == 'timeout':
    log.warning('No human reached; deferring push')
else:
    log.warning('Push denied; will surface in summary')
Code (registering an approval channel)
# Phone app, one-time setup
from ujex_client import channels

channels.register(
    uid='your-uid',
    kind='fcm',
    token=fcm_device_token,  # device token issued by the FCM SDK
)
Code (decide endpoint, app side)
# In the iOS / Android app's notification action handler
async def on_approve_tapped(approval_id):
    auth = await biometric_auth()  # Face ID / Touch ID
    if not auth.success:
        return  # no write without a successful biometric check
    await ujex.client.approvals.decide(
        id=approval_id,
        decision='approved',
        device_id=current_device_id(),
    )
Audit trail
Every step lands in the hash-chained audit log: ask, push, decide, timeout. Reconstructing "who approved what when" is a query, not a guess.
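For readers unfamiliar with hash chaining, the idea is that each entry's hash covers the previous entry's hash, so editing any past entry breaks every hash after it. A minimal sketch using SHA-256 follows; the entry layout and function names are assumptions for illustration, not the actual log format.

```python
import copy
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: Dict[str, str]) -> str:
    """Hash the previous hash together with a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: List[dict], event: Dict[str, str]) -> None:
    """Add an event, linking it to the current tail of the chain."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log: List[dict]) -> bool:
    """Recompute every link; any edited entry makes this return False."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


audit_log: List[dict] = []
append(audit_log, {"step": "ask", "approval_id": "a1"})
append(audit_log, {"step": "push", "approval_id": "a1"})
append(audit_log, {"step": "decide", "approval_id": "a1", "decision": "approved"})

# Simulate someone rewriting history in a copy of the log.
tampered = copy.deepcopy(audit_log)
tampered[2]["event"]["decision"] = "denied"
```

Here `verify(audit_log)` holds while `verify(tampered)` does not, which is what makes the log tamper-evident rather than merely append-only by convention.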
Idempotency
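The agent might re-call ask after a network blip. A deterministic ID (hash of action + args + timestamp) prevents duplicate approval requests, provided the timestamp is fixed when the action is first planned and reused on every retry; hashing the wall-clock time of each call would yield a new ID each time. The Cloud Function dedupes on conflict.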
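A minimal sketch of such an ID, assuming the timestamp is captured once when the action is first planned and then reused on retries. The function names and the in-memory `store` (standing in for a create-if-absent write on the server) are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def approval_id(action: str, args: Dict[str, Any], planned_at: int) -> str:
    """Derive the same ID for the same action, arguments, and planning time."""
    canonical = json.dumps(
        {"action": action, "args": args, "planned_at": planned_at},
        sort_keys=True,  # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


store: Dict[str, dict] = {}


def create_if_absent(doc_id: str, record: dict) -> bool:
    """Return True if a new record was created, False if it was a duplicate."""
    if doc_id in store:
        return False
    store[doc_id] = record
    return True


planned_at = 1_700_000_000  # captured once, before the first attempt
first_id = approval_id("git_push", {"branch": "main", "commit": "4f3a91"}, planned_at)
retry_id = approval_id("git_push", {"commit": "4f3a91", "branch": "main"}, planned_at)
created_first = create_if_absent(first_id, {"status": "pending"})
created_retry = create_if_absent(retry_id, {"status": "pending"})
```

The retry maps to the same ID, so the second write is a no-op and the human sees one notification instead of two.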
Backpressure
What happens if 50 agents ask for approval at once? FCM batches; the human gets 50 notifications. Practical answer: rate-limit at the agent level (no single agent should ask more than 1/min on average), and give each notification a clear identifier (agent name + action) so triage is feasible.
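One way to enforce "no more than 1/min on average" per agent is a token bucket, which allows a small burst and then refills at one token per minute. The sketch below is an assumption about how that could look; the burst size of 3 and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Per-agent limiter: one new token every `refill_interval_sec` seconds."""
    refill_interval_sec: float = 60.0
    capacity: float = 3.0  # hypothetical burst allowance
    tokens: float = 3.0
    last_refill: float = 0.0

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed / self.refill_interval_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def notification_title(agent_name: str, action: str) -> str:
    """Clear identifier so the human can triage at a glance."""
    return f"{agent_name}: {action}"


bucket = TokenBucket()
burst = [bucket.allow(now=0.0) for _ in range(4)]
after_a_minute = bucket.allow(now=60.0)
title = notification_title("deploy-bot", "git push to main")
```

An agent that is refused by the limiter should queue or drop the request locally instead of retrying in a tight loop.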
Failure modes
| Mode | Behavior |
|---|---|
| No registered channel | ask returns status: no_channel immediately |
| FCM push undelivered | App polls Firestore on launch; user sees pending approvals |
| Phone offline | FCM can queue the push for up to ~28 days, but the agent does not wait that long: it times out per the approval TTL (300s default) |
| App crash mid-approve | Decision write is atomic; either approved or pending |
| Multiple devices simultaneously approve | First write wins; second sees "already decided" |
Compare to alternatives
| | Slack approval bot | Email approval link | Mobile push |
|---|---|---|---|
| Latency | 5–600s | 30s–24h | 1.5–6s |
| Biometric on action | ✗ | ✗ | ✓ |
| Haptic distinguishable | ✗ | ✗ | ✓ |
| Works without internet | ✗ | ✗ | Push queued by FCM |
| Multi-channel fanout | One channel | One inbox | Many devices |
What we ship
This is exactly the Ujex Mobile subsystem. mobile.ask + Cloud Function + FCM + native iOS/Android approval flow + audit log. Same code Celistra uses for permission approvals on supervised agents.
FAQ
Can I implement this without Ujex?
Yes — FCM is free, the Firestore schema is straightforward, and a basic implementation is ~500 LoC. Ujex pre-builds it and wires it to the rest of the agent stack.
Does this require iOS/Android dev experience?
If you self-build, yes. With Ujex you use the published apps (or fork them — Apache-2.0).
What about Telegram or WhatsApp instead of native apps?
Possible — register a Telegram bot as a channel kind. Latency is higher (1–10s); biometric isn't there. We support it as a fallback.
Is the prompt content encrypted in transit?
Yes. The FCM payload carries only the approval ID; the app fetches the full prompt from Firestore over an authenticated TLS connection when it opens.
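As an illustration of an ID-only payload, the sketch below builds a request body shaped like an FCM HTTP v1 message, with the approval ID as the sole data field. The function name is hypothetical, sending and authentication are omitted, and setting the Android push lifetime to match the approval TTL is an assumption, not something the article specifies.

```python
import json


def build_push_message(device_token: str, approval_id: str, ttl_sec: int = 300) -> dict:
    """Build a data-only message that never contains the prompt text."""
    return {
        "message": {
            "token": device_token,
            # FCM data values must be strings; only the ID travels in the push.
            "data": {"approval_id": approval_id},
            # Assumption: let the push expire when the approval itself would.
            "android": {"ttl": f"{ttl_sec}s"},
        }
    }


message = build_push_message("device-token-placeholder", "a1")
body = json.dumps(message)
```

Because the body holds no prompt or detail text, anything sensitive stays behind the authenticated Firestore read.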