A Hash-Chained Audit Log for Your AI Agent (And Why It Matters)
Every privileged action goes into an append-only log. Each row stores hash = sha256(prev_hash || row.body). An hourly verifier walks the chain. Tampering breaks the hash chain at exactly the modified row. Total code: ~200 LoC in TS or Python (Apache-2.0, @axy/audit-chain).
Same content as the Celistra deep-dive, oriented for general-purpose agent infra (not just process orchestration). For anyone building on Ujex, the audit subsystem is a primitive — every callable writes audit rows automatically; you don't have to instrument anything.
What goes in the log
- Agent created / authenticated / scope changed
- Capability granted / revoked / used
- Email sent / received / scored for prompt injection
- Memory written / read / searched
- Webhook dispatched / received
- Spend cap hit / quota refused
- Approval requested / approved / denied / timed out
Not in the log: every model token, every keystroke. Other tools (Langfuse, LangSmith, OpenLIT) are right for that.
The chain
row[N].prevHash = row[N-1].hash
row[N].hash = sha256(prevHash || canonicalize(row.body))
canonicalize = stable JSON serialization with sorted keys. The canonicalization scheme is part of the protocol: switch serializers mid-chain and every row written under the old scheme fails verification.
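For concreteness, here is one way those two lines could look in TypeScript. This is a sketch using Node's built-in crypto module, not the @axy/audit-chain internals; `canonicalize` and `rowHash` are illustrative names:

```typescript
import { createHash } from 'node:crypto';

// Stable JSON: recursively sort object keys so the same body always
// serializes to the same bytes, regardless of key insertion order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// hash = sha256(prevHash || canonicalize(body)), hex-encoded.
function rowHash(prevHash: string, body: unknown): string {
  return createHash('sha256')
    .update(prevHash)
    .update(canonicalize(body))
    .digest('hex');
}
```

Note that two bodies with the same fields in different order hash identically, which is exactly why canonicalization has to be pinned down before the first row is written.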
The library
@axy/audit-chain in TS and axy-audit-chain in Python. Apache-2.0. About 200 lines each. Append + verify primitives.
import { appendChainEntry, verifyChain } from '@axy/audit-chain';

await appendChainEntry(firestore, 'audit', {
  actor: 'agent:abc',
  action: 'postbox.send',
  target: 'user@example.com',
  body: { subject: 'Status update', recipient: '...', size: 412 },
});

// Hourly verification (Cloud Scheduler):
const result = await verifyChain(firestore, 'audit');
if (!result.valid) alert(`Chain broke at seq=${result.firstBadSeq}`);
What the verifier does
Walks the chain oldest-to-newest, recomputing each row's hash from its body and prevHash. If the recomputed hash differs from the stored hash, the row was modified; the verifier reports the first bad seq and stops. Verification is idempotent and cheap: a year of chain at under 1k events/day is under 500 MB and takes seconds to walk.
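A self-contained sketch of that walk, assuming rows shaped `{ seq, prevHash, hash, body }`, sorted-key JSON canonicalization, and an empty-string genesis prevHash (all assumptions; the names are not the library's API):

```typescript
import { createHash } from 'node:crypto';

const canonicalize = (v: unknown): string =>
  Array.isArray(v)
    ? `[${v.map(canonicalize).join(',')}]`
    : v !== null && typeof v === 'object'
      ? `{${Object.keys(v).sort()
          .map((k) => `${JSON.stringify(k)}:${canonicalize((v as Record<string, unknown>)[k])}`)
          .join(',')}}`
      : JSON.stringify(v);

const rowHash = (prev: string, body: unknown): string =>
  createHash('sha256').update(prev).update(canonicalize(body)).digest('hex');

interface AuditRow { seq: number; prevHash: string; hash: string; body: unknown }

// Oldest-to-newest walk: recompute every hash and stop at the first
// row whose stored hash or prevHash disagrees with the recomputation.
function verify(rows: AuditRow[]): { valid: boolean; firstBadSeq?: number } {
  let prev = ''; // genesis row chains from the empty string (an assumption)
  for (const row of rows) {
    if (row.prevHash !== prev || row.hash !== rowHash(prev, row.body)) {
      return { valid: false, firstBadSeq: row.seq };
    }
    prev = row.hash;
  }
  return { valid: true };
}
```

Tampering with any row's body changes its recomputed hash, so the walk fails exactly at that seq, which is the property the hourly job relies on.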
What this enables
| Question | Query |
|---|---|
| "Did agent X really email user Y?" | where actor=agent:X and action=postbox.send and target=user:Y |
| "How much did agent X spend last week?" | where actor=agent:X and action=quota.consumed sum body.usd |
| "When did the prompt-injection score get bypassed?" | where action=postbox.score and body.piScore > 0.7 and body.processed=true |
| "What capabilities did agent X have on Tuesday?" | find latest capability.granted rows before Tue, group by target=agent:X |
What this doesn't enable
Internal model reasoning. Whether a prompt was good or bad. The model's confidence on a tool call. For those, agent observability tools (Langfuse, LangSmith) are right.
Privacy. PII in the body stays there. Redact at append time if you can't retain certain fields.
Export and verify externally
Because the library is open source, anyone can verify a chain export. The standard procedure: dump the rows plus the last verified hash. The recipient runs the verifier; if the hashes match, the export wasn't tampered with.
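One plausible shape for the recipient-side check, reusing the row layout and canonicalization assumed earlier (`verifyExport` is an illustrative name, not a published API):

```typescript
import { createHash } from 'node:crypto';

const canonicalize = (v: unknown): string =>
  Array.isArray(v)
    ? `[${v.map(canonicalize).join(',')}]`
    : v !== null && typeof v === 'object'
      ? `{${Object.keys(v).sort()
          .map((k) => `${JSON.stringify(k)}:${canonicalize((v as Record<string, unknown>)[k])}`)
          .join(',')}}`
      : JSON.stringify(v);

const rowHash = (prev: string, body: unknown): string =>
  createHash('sha256').update(prev).update(canonicalize(body)).digest('hex');

interface AuditRow { seq: number; prevHash: string; hash: string; body: unknown }

// The export passes only if the chain is internally consistent AND
// terminates at the hash the exporter published out-of-band.
function verifyExport(rows: AuditRow[], trustedLastHash: string): boolean {
  let prev = '';
  for (const row of rows) {
    if (row.prevHash !== prev || row.hash !== rowHash(prev, row.body)) return false;
    prev = row.hash;
  }
  return prev === trustedLastHash;
}
```

Anchoring on the last verified hash is what makes truncation detectable: an export with trailing rows silently dropped is still internally consistent, but it no longer ends at the trusted hash.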
FAQ
Why hash chain instead of just signing each row?
Signing proves an individual row wasn't modified, but says nothing about rows being deleted, inserted, or reordered. The chain catches modification and insertion/deletion/reordering. Both are useful; the chain detects a strictly larger class of tampering.
Can I retrofit this onto an existing log?
Yes. Backfill: walk the existing log in order, compute prevHash + hash for each row, write the new fields. From then on, every append uses the chain.
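The backfill step above can be sketched as a single ordered pass (same assumed row shape and canonicalization as before; `backfill` is an illustrative name):

```typescript
import { createHash } from 'node:crypto';

const canonicalize = (v: unknown): string =>
  Array.isArray(v)
    ? `[${v.map(canonicalize).join(',')}]`
    : v !== null && typeof v === 'object'
      ? `{${Object.keys(v).sort()
          .map((k) => `${JSON.stringify(k)}:${canonicalize((v as Record<string, unknown>)[k])}`)
          .join(',')}}`
      : JSON.stringify(v);

const rowHash = (prev: string, body: unknown): string =>
  createHash('sha256').update(prev).update(canonicalize(body)).digest('hex');

interface AuditRow { seq: number; prevHash: string; hash: string; body: unknown }

// One-time backfill: walk the legacy rows in order and stamp each with
// prevHash/hash. New appends then continue the chain from the last hash.
function backfill(rows: { seq: number; body: unknown }[]): AuditRow[] {
  let prev = '';
  return rows.map((r) => {
    const hash = rowHash(prev, r.body);
    const out: AuditRow = { ...r, prevHash: prev, hash };
    prev = hash;
    return out;
  });
}
```

The caveat: the chain only attests to integrity from the moment of backfill onward. It cannot retroactively prove the legacy rows were never edited before they were chained.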
Is the hash chain GDPR-friendly?
The chain is. PII inside row bodies is your job. Redact at append, not after — modifying a row breaks the chain.
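A minimal redact-at-append helper, assuming you maintain a denylist of PII field names (the field names here are hypothetical, not part of any schema in this post):

```typescript
// Fields to strip before the row body is hashed. Once a row is
// appended its body can never change without breaking the chain,
// so redaction has to happen here, not after the fact.
const REDACTED_FIELDS = new Set(['email', 'ipAddress']); // assumed PII fields

function redactBody(body: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(body).filter(([key]) => !REDACTED_FIELDS.has(key)),
  );
}
```

Call this on the body before it ever reaches the append primitive; hashing the redacted body means the chain commits only to data you're allowed to retain.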