How to add a proper audit trail to your LangChain agent in 10 minutes
LangSmith is great for debugging. But when your compliance officer asks what your AI decided and why, you need something tamper-evident — not a dev tool. Here's how to build that in one afternoon.
The moment every AI startup eventually hits
I've talked to a lot of teams using LangChain in production. Almost none of them have anything a compliance officer would recognise as a record.
It goes something like this. Your LangChain agent has been running in production for a few months. It's making decisions — flagging transactions, drafting contracts, routing support tickets, approving credit applications. Then one day your compliance person walks over and asks: "If a customer disputes a decision the AI made six weeks ago, what records do we have?"
You open LangSmith. You show them the traces. They squint at it and ask: "Can someone edit these? How do we know they haven't been changed? Is this admissible?"
And that's where LangSmith — which is an excellent tool, genuinely — hits its limits. It was built for engineers debugging models. It was not built for legal defensibility.
Why LangSmith isn't a compliance record
To be clear: this isn't a knock on LangSmith. It does exactly what it says on the tin. The problem is that "great for debugging" and "suitable as an audit trail" are different requirements, and confusing them will cause you pain later.
Here's what LangSmith doesn't give you:
- Tamper-evidence. Traces in LangSmith can be deleted, modified via the API, or simply aged out. There's no cryptographic proof that a record hasn't changed since it was written. If you're ever asked "how do you know this log wasn't altered?", you can't answer.
- Signed records. A proper audit trail needs a chain of custody. Who ran this agent? What exact prompt was used? What model version? LangSmith captures some of this, but not in a signed, non-repudiable format.
- Retention guarantees. LangSmith's data retention depends on your plan. Regulatory and audit frameworks like GDPR, CCPA, SOC 2, and the EU AI Act often require you to retain decision records for years — sometimes a decade or more. A SaaS plan is not a retention guarantee.
- Human-readable compliance output. Your legal team can't read a JSON trace. A compliance record needs to be renderable in plain English: what was the input, what did the model reason, what action was taken, and what was the outcome.
The core distinction: Observability tools answer "what happened?" for engineers. Audit trails answer "what happened?" for regulators, lawyers, and auditors — with cryptographic proof that the answer hasn't been altered.
What a proper AI agent audit trail actually needs
Before writing any code, it helps to know what you're actually building. A proper LangChain audit trail for compliance needs four things:
- Tamper-evident storage. Each record should be hashed and chained, so any modification is detectable. Think append-only ledger, not a mutable database row.
- Structured reasoning capture. Not just "input → output" but the intermediate steps: which tools were called, what they returned, what the model's chain-of-thought was, what decision was reached.
- Retention guarantees. Records should be immutably stored somewhere you control — or with contractual guarantees — for as long as your regulatory requirements demand.
- Human-readable output. The stored record should be exportable as something a non-engineer can read and verify. PDF, signed JSON, structured HTML — anything that a lawyer can put in front of a judge.
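To make the first requirement — tamper-evident storage — concrete, here's a minimal, stdlib-only sketch of a hash-chained append-only log. This is illustrative of the technique, not SealVera's actual storage format:

```python
import hashlib
import json


def _hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous record's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()


class AuditLedger:
    """Append-only log where each entry is chained to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, stored_hash) pairs

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = _hash(record, prev)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = self.GENESIS
        for record, stored in self.entries:
            if _hash(record, prev) != stored:
                return False
            prev = stored
        return True


ledger = AuditLedger()
ledger.append({"step": "llm_call", "output": "elevated risk"})
ledger.append({"step": "decision", "action": "DECLINE"})
assert ledger.verify()

# Tampering with any earlier record is detectable:
ledger.entries[0][0]["output"] = "low risk"
assert not ledger.verify()
```

The point of the chaining is that you can't quietly rewrite history: changing one record invalidates every hash after it, which is exactly the property a mutable database row lacks.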
LangChain's callback system is the right hook for this. Callbacks fire at every meaningful step: chain start, LLM call, tool use, chain end, errors. You can attach a custom callback handler that captures everything and ships it to compliant storage. That's exactly what SealVeraCallbackHandler does.
The code: SealVeraCallbackHandler
Here's the full handler. Drop this into your project, and it wires into LangChain's lifecycle automatically.
```python
# pip install sealvera langchain
from sealvera import SealVeraCallbackHandler
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
from langchain import hub

# 1. Instantiate the handler.
# Your API key, the environment tag, and a retention policy.
# "7y" means records are sealed for 7 years — common for financial regs.
sealvera_handler = SealVeraCallbackHandler(
    api_key="sv_live_...",
    environment="production",
    retention="7y",
    # Optional: attach metadata to every record in this session
    metadata={
        "service": "loan-underwriting-agent",
        "version": "2.4.1",
        "operator_id": "team-risk",
    },
)

# 2. Build your agent exactly as you normally would
llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = hub.pull("hwchase17/openai-tools-agent")
tools = [...]  # your tools here
agent = create_openai_tools_agent(llm, tools, prompt)

# 3. Pass the handler into AgentExecutor.
# That's it. No other changes to your agent logic.
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    callbacks=[sealvera_handler],  # <-- the only change
    verbose=False,  # verbose controls console logging, not auditing
)

# 4. Run normally
result = agent_executor.invoke({
    "input": "Assess the credit risk for application #CV-20481"
})
```
Let's walk through what's happening under the hood.
On on_chain_start: SealVera opens a new audit session. It records the exact inputs, a UTC timestamp, the agent class, and a session UUID. It generates a SHA-256 hash of this opening record and stores it as the chain anchor.
On on_llm_start / on_llm_end: Every LLM invocation is captured — the exact prompt sent (including the full system prompt and conversation history), the model name, the temperature, token counts, and the raw completion. This is the part that matters most for compliance: you have proof of exactly what the model was asked and what it said.
On on_tool_start / on_tool_end: Every tool call is logged with its arguments and return value. If your agent queries a database, calls an external API, or runs a calculation, those actions are in the record. The tool's output is hashed and linked to the LLM step that triggered it.
On on_chain_end: The session is closed. SealVera hashes the entire chain of events — anchors, LLM steps, tool steps — into a single root hash, then signs it with SealVera's private key. The signature and root hash are stored alongside the record. You can verify integrity at any time without contacting SealVera.
On on_chain_error: Errors are logged too. This matters: a half-completed agent run that errors out is still a decision event. You need to know it happened.
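As a rough mental model of the sealing step described above — illustrative only, not SealVera's actual wire format — hash each event, then fold the event hashes into a single root digest. That root is what gets signed, and any change to any event changes it:

```python
import hashlib
import json


def event_hash(event: dict) -> str:
    """Stable digest of one callback event."""
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()


def root_hash(events: list) -> str:
    """Fold every event hash into one digest; this is what gets signed."""
    combined = "".join(event_hash(e) for e in events)
    return hashlib.sha256(combined.encode()).hexdigest()


session = [
    {"hook": "on_chain_start", "input": "application #CV-20481"},
    {"hook": "on_llm_end", "completion": "elevated risk, DTI 0.71"},
    {"hook": "on_chain_end", "output": "DECLINE"},
]

sealed = root_hash(session)

# Re-verification later: recompute from the stored events and compare.
assert root_hash(session) == sealed

# Any edit, however small, produces a different root.
session[2]["output"] = "APPROVE"
assert root_hash(session) != sealed
```

This is why integrity can be checked offline: anyone holding the events, the root hash, and the signature can recompute and compare without trusting the storage layer.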
On privacy: If your agent handles PII, you can pass redact_fields=["ssn", "account_number"] to the handler. SealVera will redact those fields before hashing — you retain proof of the decision without storing the sensitive data in the audit record.
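The redaction step works roughly like this sketch — note that the recursive behaviour here is my assumption about the shape of the feature, not the SDK's actual code. Sensitive fields are replaced before hashing, so the seal covers the redacted form and the PII never enters the audit record:

```python
import hashlib
import json

# Hypothetical field list, mirroring the redact_fields argument above.
REDACT_FIELDS = {"ssn", "account_number"}


def redact(obj):
    """Recursively replace sensitive fields before the record is hashed."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k in REDACT_FIELDS else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj


record = {
    "applicant": {"name": "J. Doe", "ssn": "123-45-6789"},
    "decision": "DECLINE",
}

safe = redact(record)
digest = hashlib.sha256(json.dumps(safe, sort_keys=True).encode()).hexdigest()

print(safe["applicant"]["ssn"])  # [REDACTED]
print(safe["decision"])          # DECLINE
```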
The zero-touch alternative: SEALVERA_AUTOLOAD
If you're running multiple agents across a codebase and don't want to touch every file, there's an easier path. Set one environment variable:
```bash
# In your environment or .env file
SEALVERA_API_KEY=sv_live_...
SEALVERA_ENVIRONMENT=production
SEALVERA_RETENTION=7y
SEALVERA_AUTOLOAD=1
```
Then import the SealVera module anywhere before your LangChain code runs — at the top of your entrypoint, in your app's __init__.py, or as a startup hook:
```python
# entrypoint.py or app/__init__.py
import sealvera.autoload  # noqa — side-effect import, patches LangChain globally

# Everything below runs normally. All LangChain agents in this process
# are now automatically audited. No per-agent changes needed.
from myapp import run_server

run_server()
```
The autoload module monkey-patches LangChain's BaseCallbackManager to inject SealVeraCallbackHandler into every chain and agent that gets constructed. It's not pretty, but it works reliably across LangChain versions, and it's how you get coverage across a large codebase in under five minutes.
Tradeoff to be honest about: autoload means you're auditing everything, including dev and test runs if you're not careful. Either gate it behind an environment check (if os.getenv("ENV") == "production"), or use SealVera's environment tagging to filter your dashboard by environment after the fact.
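The environment gate is a one-line pattern. In this sketch, `APP_ENV` is a hypothetical variable name — substitute whatever convention your deploy pipeline already sets:

```python
import os

# APP_ENV is a hypothetical variable name; use your own convention.
env = os.getenv("APP_ENV", "development")
audit_enabled = env == "production"

if audit_enabled:
    # Side-effect import: patches LangChain globally (production only).
    import sealvera.autoload  # noqa: F401
else:
    print(f"SealVera autoload skipped (environment: {env})")
```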
What the output looks like
Once records are flowing, the SealVera dashboard gives you a few different views depending on who's looking:
For engineers: A trace view similar to LangSmith — steps, latency, token counts, tool calls. But each step has a verification indicator: a green checkmark if the record's hash chain is intact, a red flag if anything has been modified.
For compliance: A structured decision report, auto-generated from each chain run. It reads like a document: "At 14:32 UTC on 2026-02-14, the loan underwriting agent evaluated application #CV-20481. The agent used the CreditBureauTool and IncomeVerificationTool. The model determined the application presented elevated risk based on a debt-to-income ratio of 0.71. The agent returned: DECLINE." Plain English, timestamped, signed.
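Rendering a structured record into that kind of sentence is simple templating. Here's an illustrative sketch — the field names are my assumption, not SealVera's actual record schema:

```python
# Hypothetical record shape; SealVera's real schema may differ.
record = {
    "timestamp": "2026-02-14T14:32:00Z",
    "agent": "loan underwriting agent",
    "application_id": "#CV-20481",
    "tools_used": ["CreditBureauTool", "IncomeVerificationTool"],
    "finding": "elevated risk based on a debt-to-income ratio of 0.71",
    "action": "DECLINE",
}

TEMPLATE = (
    "At {time} UTC on {date}, the {agent} evaluated application "
    "{app}. The agent used the {tools}. The model determined the "
    "application presented {finding}. The agent returned: {action}."
)

# Split the ISO-8601 timestamp into date and HH:MM parts.
date_part, time_part = record["timestamp"].rstrip("Z").split("T")
report = TEMPLATE.format(
    time=time_part[:5],
    date=date_part,
    agent=record["agent"],
    app=record["application_id"],
    tools=" and ".join(record["tools_used"]),
    finding=record["finding"],
    action=record["action"],
)
print(report)
```

The value isn't the templating itself — it's that the sentence is generated from the sealed record, so the human-readable version and the cryptographically verified version can't drift apart.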
For export: Every record can be exported as a signed PDF or a JSON document with a detached signature. You can hand this to an auditor, attach it to a legal filing, or store it in your own document management system. The signature is verifiable against SealVera's public key, which is published and rotated on a known schedule.
For retention: Records are stored in append-only cold storage. You configure the retention period when you set up the handler — SealVera enforces it. Records can't be deleted before the retention period expires, by you or by SealVera. If you need records stored in a specific region for data residency, that's a config option too.
Do this now, not after the audit
I've talked to engineering teams who've been through regulatory audits on AI systems. The ones who had proper audit trails describe it as painful but survivable — produce the records, explain the decisions, move on. The ones who didn't have records describe something much worse: trying to reconstruct what an AI decided six months ago from server logs and LangSmith traces that may or may not be complete, explaining to regulators why there's no reliable record, and in some cases, defending against the assumption that records were deliberately not kept.
The EU AI Act is in force, with its obligations for high-risk AI systems phasing in now. The FTC has been issuing guidance on AI accountability since 2021. In financial services, FINRA and the SEC are actively examining AI decision records. In healthcare, the FDA is updating its AI/ML frameworks. The direction is clear, and it's coming faster than most startups expect.
The good news: this is a ten-minute fix. One import, one API key, one environment variable. You don't need to redesign your architecture. You don't need to hire a compliance engineer. You need a callback handler and somewhere tamper-evident to send the records.
Do it now, while it's easy. The alternative is explaining to a regulator — or a judge — why your AI system made thousands of decisions and you have no defensible record of any of them.
This is what SealVera does.
The callback handler above ships with the SDK. It's new software and we're actively improving it. If something doesn't work the way you'd expect, I want to know. Free to start, no credit card.
Try it free