Briefcase AI

Govern every decision your AI makes. Enforce controls before an action runs, capture the full context behind it, and keep a complete record you can verify later.


Install

pip install briefcase-ai

Briefcase is an open-source Python SDK (with a fast Rust core) that wraps the decision points in code you already have — bring your own LLM calls and storage. Apache-2.0 licensed.

Start where you are

I use an AI coding assistant: Let Cursor, Claude Code, or Copilot wire Briefcase in, via the docs or the MCP server.

I'm an engineer: Instrument a decision in minutes, send records anywhere, and replay to catch regressions.

I lead platform / governance: Enforce controls before actions run and route through versioned, reconstructable policies.

I review for reproducibility: Reconstruct a past decision exactly and verify a sealed, tamper-evident record.

How it works

A single decision moves through five acts — and Operate (cost, tracing, events) runs alongside all of them. The docs follow the same path, threaded by one running example: a support-ticket triage agent.

Capture → Control → Store & Query → Replay & Verify → Prove

Explore by act

Capture

Decision Recording

Build a complete, queryable record of every decision — inputs, outputs, parameters, and evidence. Record decisions →

Exporters

Send decision records to the console, a file, or your own backend in one line. Choose an exporter →

PII Sanitization

Strip sensitive values out of records before they are ever stored. Redact PII →
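To make the idea concrete, here is a minimal sketch of redaction applied to a record before it is stored. It is plain Python, not the Briefcase API: the `redact` function, the email pattern, and the `[REDACTED_EMAIL]` placeholder are all assumptions chosen for illustration, and a real sanitizer would cover many more PII types.

```python
import re

# Hypothetical pattern: matches common email shapes only (illustrative, not exhaustive).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Return a copy of `value` with email addresses replaced, recursing into dicts and lists."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


# A support-ticket record as it might look before storage (made-up data).
ticket_record = {
    "inputs": {"body": "Contact jane@example.com please"},
    "tags": ["billing"],
    "priority": 3,
}
clean_record = redact(ticket_record)
```

Because `redact` builds new containers instead of mutating its argument, the original record is left untouched and only the sanitized copy needs to reach storage.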

Control & Route

Guardrails

Decide whether an action is allowed before it runs. Actions are blocked unless a rule explicitly permits them (deny-by-default), and rules are composable. Add guardrails →
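The deny-by-default idea can be sketched in a few lines of plain Python. Nothing here is the Briefcase API: `Action`, `allow_named`, `allow_refunds_under`, and `is_allowed` are hypothetical names, and the choice that an explicit deny overrides any permit is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical action a triage agent wants to take."""
    name: str
    amount: float = 0.0


# A rule returns True to permit, False to deny, or None to abstain.
Rule = Callable[[Action], Optional[bool]]


def allow_named(*names: str) -> Rule:
    """Permit actions whose name is in `names`; abstain on everything else."""
    def rule(action: Action) -> Optional[bool]:
        return True if action.name in names else None
    return rule


def allow_refunds_under(limit: float) -> Rule:
    """Permit refunds up to `limit`, deny larger ones, abstain on non-refunds."""
    def rule(action: Action) -> Optional[bool]:
        if action.name != "refund":
            return None
        return action.amount <= limit
    return rule


def is_allowed(action: Action, rules: Iterable[Rule]) -> bool:
    """Deny-by-default: require at least one permit and no explicit deny."""
    verdicts = [v for v in (rule(action) for rule in rules) if v is not None]
    return bool(verdicts) and all(verdicts)


rules = [allow_named("tag_ticket"), allow_refunds_under(50.0)]
```

With these rules, tagging a ticket and a 20.00 refund pass, a 500.00 refund is denied by the refund rule, and an action no rule mentions (say, `delete_account`) is blocked because nothing permits it.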

Versioned Routing Policy

Route decisions through versioned policies and reconstruct which rule fired on any past date. Version policies →

Validation Engine

Check that a prompt’s references resolve against a versioned knowledge base before a model sees them. Validate prompts →

Store & Query

Storage Adapters

Persist decisions to a durable, queryable backend. Choose a backend →

Bitemporal Storage

Append-only records that track both when a fact was true and when you learned it — so you can reconstruct exactly what was known at any past instant. Store bitemporally →
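A small in-memory sketch may help show what tracking both timelines buys you. It is not Briefcase's storage layer: `Fact`, `BitemporalLog`, and `as_of` are hypothetical names, and the tie-breaking rule (latest valid time, then latest recorded time) is an assumption of this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List, Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: Any
    valid_from: datetime   # when the fact became true in the world
    recorded_at: datetime  # when the system learned about it


class BitemporalLog:
    """Append-only list of facts, queryable on both time axes."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def append(self, fact: Fact) -> None:
        self._facts.append(fact)  # never update or delete existing entries

    def as_of(self, key: str, valid_at: datetime, known_at: datetime) -> Optional[Any]:
        """Value of `key` at `valid_at`, using only what was recorded by `known_at`."""
        candidates = [
            f for f in self._facts
            if f.key == key and f.valid_from <= valid_at and f.recorded_at <= known_at
        ]
        if not candidates:
            return None
        # Most recent valid time wins; a later recording breaks ties (a correction).
        return max(candidates, key=lambda f: (f.valid_from, f.recorded_at)).value


log = BitemporalLog()
# Made-up data: the customer was on the free tier from Jan 1.
log.append(Fact("tier", "free", datetime(2024, 1, 1), datetime(2024, 1, 1)))
# On Jan 10 the system learns the customer upgraded on Jan 5 (late-arriving fact).
log.append(Fact("tier", "pro", datetime(2024, 1, 5), datetime(2024, 1, 10)))
```

Asking about Jan 6 with knowledge as of Jan 7 returns `free`, because the upgrade had not been recorded yet; asking the same question with knowledge as of Jan 11 returns `pro`. The first answer is what a decision made on Jan 7 actually saw.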

External Data

Snapshot the outside data a decision depended on and detect when it drifts. Track external data →

Replay & Verify

Deterministic Replay

Re-run a stored decision and compare the result against the original. Replay decisions →
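As a rough illustration of replay, the sketch below re-runs a stored decision against two versions of a triage function and compares each result with the original output. The record layout (`inputs`, `output`), the `replay` helper, and both triage functions are hypothetical and do not reflect the Briefcase API. The sketch also assumes the decision function is deterministic.

```python
from typing import Any, Callable, Dict, Mapping


def replay(record: Mapping[str, Any], decide: Callable[..., Any]) -> Dict[str, Any]:
    """Re-run `decide` on the stored inputs and compare with the stored output."""
    replayed = decide(**record["inputs"])
    return {
        "original": record["output"],
        "replayed": replayed,
        "matches": replayed == record["output"],
    }


def triage_v1(subject: str) -> str:
    """Version that produced the stored decision."""
    return "billing" if "invoice" in subject.lower() else "general"


def triage_v2(subject: str) -> str:
    """Candidate version with a new 'refunds' queue."""
    text = subject.lower()
    if "refund" in text:
        return "refunds"
    return "billing" if "invoice" in text else "general"


# Made-up stored decision for the support-ticket example.
stored = {"inputs": {"subject": "Refund for invoice 123"}, "output": "billing"}

same = replay(stored, triage_v1)
changed = replay(stored, triage_v2)
```

Here `same["matches"]` is true, while `changed["matches"]` is false because the new version routes the ticket to `refunds`. Whether that difference is a regression or an intended change is a judgment the comparison surfaces but does not make.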

Drift Detection

Measure how consistent outputs stay across repeated runs. Detect drift →
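One simple way to put a number on consistency is the share of runs that agree with the most common output. The sketch below assumes that metric and the name `consistency`. The source does not say which measure Briefcase uses, so treat this as one possible definition.

```python
from collections import Counter
from typing import Hashable, Sequence


def consistency(outputs: Sequence[Hashable]) -> float:
    """Fraction of runs whose output equals the most common output (1.0 = fully consistent)."""
    if not outputs:
        raise ValueError("need at least one output to measure consistency")
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)


# Made-up labels from four repeated runs of the same triage prompt.
runs = ["billing", "billing", "billing", "technical"]
score = consistency(runs)
```

For the four runs above the score is 0.75. A threshold on this value (chosen per use case) is one way to flag a prompt or model whose outputs have started to wander.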

Prove

As-of Reconstruction

Reconstruct exactly what was known at any past instant — the evidence and policy as they were then, not as they are now. Reconstruct as-of →

Audit Bundles

Seal a decision, its evidence, and its policy into one tamper-evident, verifiable artifact. Seal a bundle →
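Tamper evidence usually rests on a cryptographic digest over a canonical serialization. The sketch below shows that idea with SHA-256 and sorted-key JSON. The `seal` and `verify` functions and the bundle layout are assumptions for illustration, not Briefcase's bundle format.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(payload: Dict[str, Any]) -> bytes:
    """Serialize deterministically so the same content always hashes the same."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(decision: Dict[str, Any], evidence: Dict[str, Any], policy: Dict[str, Any]) -> Dict[str, Any]:
    """Bundle the three parts with a SHA-256 digest of their canonical form."""
    payload = {"decision": decision, "evidence": evidence, "policy": policy}
    return {"payload": payload, "sha256": hashlib.sha256(_canonical(payload)).hexdigest()}


def verify(bundle: Dict[str, Any]) -> bool:
    """Recompute the digest and compare it with the stored one."""
    expected = hashlib.sha256(_canonical(bundle["payload"])).hexdigest()
    return hmac.compare_digest(expected, bundle["sha256"])


# Made-up content for the support-ticket example.
bundle = seal(
    decision={"ticket": 42, "label": "billing"},
    evidence={"kb_version": "v3"},
    policy={"name": "triage", "version": 7},
)

# Simulate tampering on a deep copy of the bundle.
tampered = json.loads(json.dumps(bundle))
tampered["payload"]["decision"]["label"] = "general"
```

`verify(bundle)` succeeds and `verify(tampered)` fails. A bare digest only detects changes if the digest itself is kept somewhere the attacker cannot rewrite; signing it or using a keyed MAC closes that gap.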

Operate — alongside every act

Cost Tracking

Estimate token costs, compare models, and watch budgets. Track costs →
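Token cost estimation is simple arithmetic once per-token prices are known. In the sketch below the model names and prices are invented placeholders, and `estimate_cost` and `within_budget` are hypothetical helpers, not Briefcase functions.

```python
from typing import Dict, Tuple

# Hypothetical prices in USD per 1,000 tokens: (input, output). Not real vendor pricing.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "model-a": (0.50, 1.50),
    "model-b": (0.10, 0.30),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one call, given token counts."""
    input_price, output_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


def within_budget(spent: float, next_cost: float, budget: float) -> bool:
    """True if adding `next_cost` keeps total spend at or under `budget`."""
    return spent + next_cost <= budget


# Compare the same call (2,000 input tokens, 500 output tokens) across both models.
cost_a = estimate_cost("model-a", 2000, 500)
cost_b = estimate_cost("model-b", 2000, 500)
```

Under these placeholder prices the call costs about 1.75 on `model-a` and about 0.35 on `model-b`, which is the kind of side-by-side comparison a budget check can then act on.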

OpenTelemetry

Trace decisions with OpenTelemetry alongside your existing telemetry. Trace decisions →

Multi-Agent & Events

Correlate decisions across a multi-step agent workflow and emit events. Correlate agents →

Follow a real workflow

Audit a Decision End-to-End: One decision, from capture to a sealed, verifiable record.

Govern Agent Actions: Enforce controls before actions run and route through versioned policy.

Observe AI in Production: Exporters, cost, drift, tracing, and events working together.

Reproducible RAG: Version embeddings, snapshot sources, and validate references.

Integrations

MCP Server — expose Briefcase tools to Claude Code, Cursor, and other MCP clients.
lakeFS — one bundled versioned-data source; bring any other via the VCS protocol.

Pre-built framework integrations (LangChain, CrewAI, LlamaIndex, AutoGen, and more) are available in Briefcase AI Enterprise.