A Hash-Chained Audit Log for Your AI Agent (And Why It Matters)
Every privileged action goes into an append-only log. Each row stores hash = sha256(prev_hash || row.body). An hourly verifier walks the chain. Tampering breaks the hash chain at exactly the modified row. Total code: ~200 LoC in TS or Python (Apache-2.0, @axy/audit-chain).
Same content as the Celistra deep-dive, oriented for general-purpose agent infra (not just process orchestration). For anyone building on Ujex, the audit subsystem is a primitive — every callable writes audit rows automatically; you don't have to instrument anything.
What goes in the log
- Agent created / authenticated / scope changed
- Capability granted / revoked / used
- Email sent / received / scored for prompt injection
- Memory written / read / searched
- Webhook dispatched / received
- Spend cap hit / quota refused
- Approval requested / approved / denied / timed out
Not in the log: every model token, every keystroke. Other tools (Langfuse, LangSmith, OpenLIT) are right for that.
The chain
row[N].prevHash = row[N-1].hash
row[N].hash = sha256(prevHash || canonicalize(row.body))
canonicalize = stable JSON serialization with sorted keys. The canonicalization scheme is part of the protocol: switch serializers mid-chain and every row written under the old scheme fails verification.
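For concreteness, here is one way those two lines could look in TypeScript. This is a sketch using Node's built-in crypto module, not the @axy/audit-chain internals; `canonicalize` and `rowHash` are illustrative names:

```typescript
import { createHash } from 'node:crypto';

// Stable JSON: recursively sort object keys so the same body always
// serializes to the same bytes, regardless of key insertion order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// hash = sha256(prevHash || canonicalize(body)), hex-encoded.
function rowHash(prevHash: string, body: unknown): string {
  return createHash('sha256')
    .update(prevHash)
    .update(canonicalize(body))
    .digest('hex');
}
```

Note that two bodies with the same fields in different order hash identically, which is exactly why canonicalization has to be pinned down before the first row is written.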
The library
@axy/audit-chain in TS and axy-audit-chain in Python. Apache-2.0. About 200 lines each. Append + verify primitives.
import { appendChainEntry, verifyChain } from '@axy/audit-chain';

await appendChainEntry(firestore, 'audit', {
  actor: 'agent:abc',
  action: 'postbox.send',
  target: 'user@example.com',
  body: { subject: 'Status update', recipient: '...', size: 412 },
});

// Hourly verification (Cloud Scheduler):
const result = await verifyChain(firestore, 'audit');
if (!result.valid) alert(`Chain broke at seq=${result.firstBadSeq}`);
What the verifier does
Walks the chain oldest-to-newest, recomputing each row's hash from its body and prevHash. If the recomputed hash differs from the stored hash, the row was modified; the verifier reports the first bad seq and stops. Verification is idempotent and cheap: a year of chain at under 1k events/day is under 500 MB and takes seconds to walk.
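A self-contained sketch of that walk, assuming rows shaped `{ seq, prevHash, hash, body }`, sorted-key JSON canonicalization, and an empty-string genesis prevHash (all assumptions; the names are not the library's API):

```typescript
import { createHash } from 'node:crypto';

const canonicalize = (v: unknown): string =>
  Array.isArray(v)
    ? `[${v.map(canonicalize).join(',')}]`
    : v !== null && typeof v === 'object'
      ? `{${Object.keys(v).sort()
          .map((k) => `${JSON.stringify(k)}:${canonicalize((v as Record<string, unknown>)[k])}`)
          .join(',')}}`
      : JSON.stringify(v);

const rowHash = (prev: string, body: unknown): string =>
  createHash('sha256').update(prev).update(canonicalize(body)).digest('hex');

interface AuditRow { seq: number; prevHash: string; hash: string; body: unknown }

// Oldest-to-newest walk: recompute every hash and stop at the first
// row whose stored hash or prevHash disagrees with the recomputation.
function verify(rows: AuditRow[]): { valid: boolean; firstBadSeq?: number } {
  let prev = ''; // genesis row chains from the empty string (an assumption)
  for (const row of rows) {
    if (row.prevHash !== prev || row.hash !== rowHash(prev, row.body)) {
      return { valid: false, firstBadSeq: row.seq };
    }
    prev = row.hash;
  }
  return { valid: true };
}
```

Tampering with any row's body changes its recomputed hash, so the walk fails exactly at that seq, which is the property the hourly job relies on.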
What this enables
| Question | Query |
|---|---|
| "Did agent X really email user Y?" | where actor=agent:X and action=postbox.send and target=user:Y |
| "How much did agent X spend last week?" | where actor=agent:X and action=quota.consumed sum body.usd |
| "When did the prompt-injection score get bypassed?" | where action=postbox.score and body.piScore > 0.7 and body.processed=true |
| "What capabilities did agent X have on Tuesday?" | find latest capability.granted rows before Tue, group by target=agent:X |
What this doesn't enable
Internal model reasoning. Whether a prompt was good or bad. The model's confidence on a tool call. For those, agent observability tools (Langfuse, LangSmith) are right.
Privacy. PII in the body stays there. Redact at append time if you can't retain certain fields.
Export and verify externally
Because the library is open source, anyone can verify a chain export. The standard procedure: dump the rows plus the last verified hash. The recipient runs the verifier; if the hashes match, the export wasn't tampered with.
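One plausible shape for the recipient-side check, reusing the row layout and canonicalization assumed earlier (`verifyExport` is an illustrative name, not a published API):

```typescript
import { createHash } from 'node:crypto';

const canonicalize = (v: unknown): string =>
  Array.isArray(v)
    ? `[${v.map(canonicalize).join(',')}]`
    : v !== null && typeof v === 'object'
      ? `{${Object.keys(v).sort()
          .map((k) => `${JSON.stringify(k)}:${canonicalize((v as Record<string, unknown>)[k])}`)
          .join(',')}}`
      : JSON.stringify(v);

const rowHash = (prev: string, body: unknown): string =>
  createHash('sha256').update(prev).update(canonicalize(body)).digest('hex');

interface AuditRow { seq: number; prevHash: string; hash: string; body: unknown }

// The export passes only if the chain is internally consistent AND
// terminates at the hash the exporter published out-of-band.
function verifyExport(rows: AuditRow[], trustedLastHash: string): boolean {
  let prev = '';
  for (const row of rows) {
    if (row.prevHash !== prev || row.hash !== rowHash(prev, row.body)) return false;
    prev = row.hash;
  }
  return prev === trustedLastHash;
}
```

Anchoring on the last verified hash is what makes truncation detectable: an export with trailing rows silently dropped is still internally consistent, but it no longer ends at the trusted hash.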
FAQ
Why hash chain instead of just signing each row?
Signing proves an individual row wasn't modified, but says nothing about rows being deleted, inserted, or reordered. The chain catches modification and insertion/deletion/reordering. Both are useful; the chain detects a strictly larger class of tampering.
Can I retrofit this onto an existing log?
Yes. Backfill: walk the existing log in order, compute prevHash + hash for each row, write the new fields. From then on, every append uses the chain.
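The backfill step above can be sketched as a single ordered pass (same assumed row shape and canonicalization as before; `backfill` is an illustrative name):

```typescript
import { createHash } from 'node:crypto';

const canonicalize = (v: unknown): string =>
  Array.isArray(v)
    ? `[${v.map(canonicalize).join(',')}]`
    : v !== null && typeof v === 'object'
      ? `{${Object.keys(v).sort()
          .map((k) => `${JSON.stringify(k)}:${canonicalize((v as Record<string, unknown>)[k])}`)
          .join(',')}}`
      : JSON.stringify(v);

const rowHash = (prev: string, body: unknown): string =>
  createHash('sha256').update(prev).update(canonicalize(body)).digest('hex');

interface AuditRow { seq: number; prevHash: string; hash: string; body: unknown }

// One-time backfill: walk the legacy rows in order and stamp each with
// prevHash/hash. New appends then continue the chain from the last hash.
function backfill(rows: { seq: number; body: unknown }[]): AuditRow[] {
  let prev = '';
  return rows.map((r) => {
    const hash = rowHash(prev, r.body);
    const out: AuditRow = { ...r, prevHash: prev, hash };
    prev = hash;
    return out;
  });
}
```

The caveat: the chain only attests to integrity from the moment of backfill onward. It cannot retroactively prove the legacy rows were never edited before they were chained.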
Is the hash chain GDPR-friendly?
The chain is. PII inside row bodies is your job. Redact at append, not after — modifying a row breaks the chain.
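A minimal redact-at-append helper, assuming you maintain a denylist of PII field names (the field names here are hypothetical, not part of any schema in this post):

```typescript
// Fields to strip before the row body is hashed. Once a row is
// appended its body can never change without breaking the chain,
// so redaction has to happen here, not after the fact.
const REDACTED_FIELDS = new Set(['email', 'ipAddress']); // assumed PII fields

function redactBody(body: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(body).filter(([key]) => !REDACTED_FIELDS.has(key)),
  );
}
```

Call this on the body before it ever reaches the append primitive; hashing the redacted body means the chain commits only to data you're allowed to retain.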