agent-assembly
agent-assembly is the open-source core of the AI Agent Assembly governance platform. It enforces policy on AI agents — what they may call, spend, and connect to — and records every decision in an immutable audit trail.
This book is the contributor and operator reference for the core. If you build with a language SDK instead, read the per-SDK guides below.
New here? Start with the Introduction — it explains what Agent Assembly is, the problem it solves, the core concepts, and the three-layer interception model. Then move on to the Quick Start.
Other docs: Docs Hub · Python SDK · Node SDK · Go SDK
Run it locally
Point the gateway at a bundled reference policy and you have a governing daemon listening on 127.0.0.1:50051:
git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly
cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml
From there, attach an SDK shim, the aa-proxy sidecar, or the eBPF layer to start intercepting agent actions. The Architecture chapter explains how those three layers fit together.
Where to go next
| You want to… | Read |
|---|---|
| Understand what this is and why | Introduction |
| Get a gateway running quickly | Quick Start |
| Look up an aasm command | CLI Reference |
| Follow a task end-to-end | Usage Guide |
| Understand the threat model and defenses | Security Model |
| See how the crates fit together | Architecture |
| Check which SDK versions are compatible | Compatibility matrix |
| Read the wire-protocol contract | Protocol changelog |
| See latency and build-time numbers | Benchmarks — baseline |
Audience
This book targets contributors and operators of agent-assembly. SDK users (Python, TypeScript, Go) should refer to the per-SDK guides in the sibling repositories.
See also
- README — top-level project overview, prerequisites, quickstart
- CONTRIBUTING — development workflow, branch naming, PR rules
- API reference — generate locally with
cargo doc --workspace --no-deps --open
Diagram rendering
This book renders Mermaid diagrams via the mdbook-mermaid preprocessor:
graph LR
SDK[SDK shim] --> Gateway[aa-gateway]
Proxy[aa-proxy] --> Gateway
eBPF[aa-ebpf] --> Gateway
Gateway --> Audit[(Audit log)]
Introduction
agent-assembly is a governance and security runtime for AI agents. It sits between an agent and the tools, models, and networks it reaches for, evaluates every action against policy and budget, and records the outcome in an immutable audit trail. It is the open-source core of the AI Agent Assembly platform.
This section is the place to start. It explains what the runtime is and the problem it solves, defines the handful of core concepts the rest of the book assumes, and gives a teaser of the three-layer interception model that lets the runtime see what an agent does no matter how the agent is built.
Read the pages in order:
| Page | What it covers |
|---|---|
| What it is & the problem | What Agent Assembly governs, why ungoverned agent tool-use is risky, and the value proposition. |
| Core concepts | Agents, policies, budgets, audit — the vocabulary used throughout the book. |
| The three-layer model | How the SDK, sidecar proxy, and eBPF layers compose so nothing slips through. |
When you are ready to run something, jump to the Quick Start. For the security rationale behind the design, read the Security Model; for the crate-level implementation, read Architecture.
What Agent Assembly is & the problem
In plain terms. AI agents act on their own — they run tools, call services, and spend money to get a job done. Agent Assembly is the set of guardrails around them: it checks every action an agent tries to take against rules you define, allows or blocks it before it happens, and keeps a permanent record of what was decided. Think of it as a security checkpoint that an AI agent cannot walk around.
It is for the people responsible for those agents — developers wiring them up, security and operations teams keeping them safe, and the planners who need to know the controls exist. With it you can decide which tools an agent may use, stop it from leaking data or overspending, and review exactly what every agent did and why.
What it is
agent-assembly is a governance-native runtime for AI agents. An AI agent —
an LLM wired up to tools, APIs, shells, and network access — is given a goal and
then decides, on its own, which actions to take to reach it. Agent Assembly
governs those actions. Every time an agent tries to call a tool, reach the
network, or spend money on a model call, the runtime evaluates that action
against a policy and a budget, returns allow or deny before the
action runs, and writes an immutable audit record of the decision.
A governing gateway, pointed at a reference policy, is one command away:
cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml
That daemon listens on 127.0.0.1:50051 and is ready for any interception layer
to connect. The rest of this book explains how to put it to work.
The problem: ungoverned agent tool-use is risky
A traditional program does exactly what its code says. An AI agent does not. It plans its own steps at runtime, so the set of actions it might take is open-ended and not knowable in advance. The moment you give an agent real capabilities — the ability to run shell commands, hit internal APIs, call third-party services, read files, or pay for tokens — that open-endedness becomes a concrete risk:
- Unbounded tool-use. An agent can invoke any tool it has been handed, in any order, with any arguments it constructs. A prompt-injected or simply confused agent may call a destructive tool it was never meant to use.
- Data exfiltration. An agent that can both read sensitive data and reach the network can leak that data — intentionally coerced by an attacker, or by accident — over an outbound request. Secrets and credentials are the highest-value target.
- Runaway spend. Agents loop. A planning loop that retries, fans out, or gets stuck can burn through an LLM budget in minutes with no natural stopping point.
- No accountability. When an agent does something it should not have, teams need to answer what did it do, when, and was it allowed? Without a tamper-evident record of every decision, that question has no answer.
- Bypass. Controls that live only inside the agent’s own code are only as trustworthy as the agent. An agent that skips the SDK, or is compromised, slips past anything that depended on its cooperation.
These risks are not hypothetical edge cases — they are the default behavior of a capable agent with no guardrails. Restricting the model’s prompt is not enough, because the model is exactly the component you cannot fully trust.
The value proposition
Agent Assembly turns “trust the agent to behave” into “the runtime enforces what the agent may do.” It provides:
- Policy enforcement at the action boundary. Allow/deny decisions are made by a central gateway before an action executes, driven by declarative policy rather than agent cooperation.
- Budget control. Per-team spend is tracked and enforced; a request that would breach the budget is denied, so a runaway loop is stopped, not just reported after the fact.
- An immutable audit trail. Every decision — allow and deny alike — is recorded, giving teams a complete, tamper-evident account of agent behavior for debugging, incident response, and compliance.
- Defense that does not depend on the agent. Enforcement is layered across three independent interception points (see the three-layer model), so governance holds even when an agent skips its SDK or actively tries to evade it.
Crucially, the agent does not have to cooperate. The whole point is that governance is enforced around the agent, by infrastructure the agent does not control. The Security Model section makes the trust boundaries explicit.
Who this book is for
This book is the reference for contributors and operators of the
agent-assembly core — people running the gateway, writing policy, and
deploying the interception layers. If you are instead building an application
with a language SDK, start from the per-SDK guides: Python
SDK, Node
SDK, Go
SDK.
Last updated: 2026-06-12 by Chisanan232
Core concepts
Four concepts recur throughout this book. Understanding them here makes every later chapter easier to read.
Agent
An agent is the workload being governed: an LLM-driven program that decides, at runtime, which actions to take to accomplish a goal. From the runtime’s point of view an agent is an identity that performs actions — calling a tool, making an LLM request, or reaching out over the network. Agents register with the gateway and are organized under a team and an org, which is the scope at which policy and budget are applied.
Each governed action is described by an action type (for example, a tool call or an LLM call), a target (what it is acting on), and a set of labels (metadata used by policy rules). This is the unit the runtime makes a decision about.
Policy
A policy is a declarative document — written in YAML or TOML — that states what agents are and are not allowed to do. Rules match on the action type, target, and labels of a request and resolve to allow or deny.
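As a rough illustration of how a rule can match on action type, target, and labels, here is a minimal Python sketch. The `Action` and `Rule` shapes, the exact-match semantics, the first-match-wins ordering, and the action-type names are all assumptions made for this example; the real rule schema is whatever the files under policy-examples/ and the gateway's policy engine define.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One governed action: its kind, what it targets, and its labels."""
    action_type: str                      # hypothetical names, e.g. "tool_call"
    target: str                           # what the action acts on
    labels: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Rule:
    """A hypothetical rule; a field left as None matches anything."""
    decision: str                         # "allow" or "deny"
    action_type: Optional[str] = None
    target: Optional[str] = None
    labels: dict = field(default_factory=dict)

    def matches(self, action: Action) -> bool:
        if self.action_type is not None and self.action_type != action.action_type:
            return False
        if self.target is not None and self.target != action.target:
            return False
        # Every label the rule names must be on the action with the same value.
        return all(action.labels.get(k) == v for k, v in self.labels.items())


def evaluate(rules: list, action: Action, default: str = "deny") -> str:
    """Return the decision of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(action):
            return rule.decision
    return default


rules = [
    Rule(decision="deny", action_type="tool_call", target="shell"),
    Rule(decision="allow", action_type="tool_call", labels={"env": "dev"}),
]
print(evaluate(rules, Action("tool_call", "shell", {"env": "dev"})))     # deny
print(evaluate(rules, Action("tool_call", "http_get", {"env": "dev"})))  # allow
print(evaluate(rules, Action("llm_call", "some-model")))                 # deny
```

The default decision is a parameter here because the source does not say what happens when no rule matches.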
Policies are scoped and they cascade. Rules can be attached at the org,
team, agent, and tool levels; when an action is evaluated, the gateway
walks those scopes and merges them with a most-restrictive-wins rule, so a
broad organizational deny cannot be loosened by a narrower scope. Policy is
evaluated server-side, in the gateway — never by the agent or a dashboard —
so the decision cannot be tampered with by the workload it governs. The reference
policies under policy-examples/ are a good starting point. The detailed
evaluation path is documented in Architecture.
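The most-restrictive-wins merge can be pictured with a small Python sketch. It assumes each scope contributes at most one decision ("allow", "deny", or no opinion) and that an action with no opinion at any scope is denied; both are simplifications for illustration, not the gateway's documented behavior.

```python
# Scopes as the text lists them; the merge result does not depend on order.
SCOPES = ("org", "team", "agent", "tool")


def merge_most_restrictive(decisions: dict) -> str:
    """Merge per-scope decisions so that a deny at any scope wins.

    `decisions` maps a scope name to "allow", "deny", or None (no opinion).
    """
    seen = [decisions.get(scope) for scope in SCOPES]
    if "deny" in seen:
        return "deny"
    if "allow" in seen:
        return "allow"
    return "deny"  # assumption: nothing said anywhere means deny


# A broad org-level deny cannot be loosened by a narrower agent-level allow.
print(merge_most_restrictive({"org": "deny", "agent": "allow"}))   # deny
print(merge_most_restrictive({"org": "allow", "tool": "allow"}))   # allow
print(merge_most_restrictive({"team": "allow", "tool": "deny"}))   # deny
```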
Budget
A budget caps how much a team may spend on agent activity, primarily the cost of LLM calls. The gateway tracks consumption per team against a cost model and treats the budget as part of the policy decision: a request that would breach the budget is downgraded from allow to deny. This makes budget a hard guardrail that stops runaway spend in the moment, rather than a billing report that arrives after the money is gone.
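The "downgrade from allow to deny" step can be sketched in a few lines of Python. The function name, the use of `Decimal` amounts, and the reading of "breach" as "spend so far plus this request's cost exceeds the limit" are assumptions for the example; the real cost model lives in the gateway.

```python
from decimal import Decimal


def apply_budget(policy_decision: str, spent: Decimal, limit: Decimal, cost: Decimal) -> str:
    """Fold a team budget into a policy decision.

    A deny stays a deny. An allow is downgraded to deny when the request
    would push the team's spend past its limit.
    """
    if policy_decision != "allow":
        return policy_decision
    if spent + cost > limit:
        return "deny"
    return "allow"


print(apply_budget("allow", Decimal("9.00"), Decimal("10.00"), Decimal("0.75")))  # allow
print(apply_budget("allow", Decimal("9.50"), Decimal("10.00"), Decimal("0.75")))  # deny
print(apply_budget("deny", Decimal("0.00"), Decimal("10.00"), Decimal("0.01")))   # deny
```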
Audit
The audit trail is the immutable, append-only record of every decision the gateway makes — both allows and denies — together with the action that prompted it. Because it is tamper-evident and complete, it answers the accountability question for any agent: what did it do, when, and was it permitted? Audit records use a single wire format regardless of which interception layer observed the action, so the gateway presents one unified history. Audit data underpins debugging, incident response, and compliance export.
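One common way to make an append-only log tamper-evident is to chain records by hash, so that changing any earlier record invalidates everything after it. The Python sketch below shows that general technique only; the source does not state how agent-assembly implements tamper evidence, and the record fields here are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log: list, action: dict, decision: str) -> dict:
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"action": action, "decision": decision, "prev_hash": prev_hash}
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash and check each record links to its predecessor."""
    prev_hash = GENESIS
    for record in log:
        body = {k: record[k] for k in ("action", "decision", "prev_hash")}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True


log = []
append_record(log, {"type": "tool_call", "target": "shell"}, "deny")
append_record(log, {"type": "llm_call", "target": "chat"}, "allow")
intact = verify_chain(log)
log[0]["decision"] = "allow"      # simulate someone rewriting history
still_valid = verify_chain(log)
print(intact, still_valid)        # True False
```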
With these four in hand (agents perform actions, policy decides allow/deny, budget caps spend, and audit records everything), the three-layer interception model explains how the runtime actually sees an agent’s actions in order to govern them.
Last updated: 2026-06-11 by Chisanan232
The three-layer interception model at a glance
To govern an action, the runtime first has to see it. Agent Assembly intercepts agent actions at three independent layers, each catching what the layers above it might miss, and routes every observed action to one central gateway for a decision. This page is a teaser; the Security Model covers why the layers are arranged this way and what each defends against, and Architecture covers how each is implemented.
The three layers
The layers are ordered by a deliberate trade-off, from lowest latency to highest detection authority:
| Layer | Runs in | Crate(s) | Latency | Catches | Trade-off |
|---|---|---|---|---|---|
| 1 — SDK (in-process) | The agent’s own process | aa-sdk-client + per-language shims, aa-wasm | Lowest | What the SDK is wired into | Fastest path; but requires the agent to adopt the SDK, and an agent could skip it. |
| 2 — Sidecar proxy | An adjacent process / sidecar | aa-proxy | Medium | Outbound HTTPS, with no code change | Catches network egress the SDK missed; sees only what is routed through it. |
| 3 — eBPF (kernel) | The Linux kernel | aa-ebpf and friends | Highest cost | Everything else, including bypass attempts | Highest detection authority; Linux-only and needs elevated privileges. |
The latency-vs-authority trade-off is the key idea. The in-process SDK is the cheapest place to make a decision, but it is also the easiest for an agent to avoid — it lives inside the very process you do not fully trust. The eBPF layer is the most expensive to run, but it watches from the kernel, below anything the agent can reach, so it catches actions the higher layers never saw — including deliberate attempts to bypass the SDK.
How they compose
The layers are not alternatives; they stack. A deployment runs whatever subset fits its constraints, and because every layer reports to the same gateway using the same audit wire format, the gateway sees one unified view no matter which layers produced the events. Coverage is the union of the layers you deploy: the SDK handles the fast common path, the proxy backstops network egress without touching agent code, and eBPF is the floor that catches what slips past both. Run all three and an action has nowhere to hide.
graph TD
classDef agent fill:#eef2ff,stroke:#6366f1
classDef l1 fill:#eaf6ee,stroke:#3aa55b
classDef l2 fill:#fff3d6,stroke:#c98a00
classDef l3 fill:#fdecea,stroke:#d75748
classDef gw fill:#e8f1ff,stroke:#5b8def
Agent["AI agent<br/>(tool / LLM / network calls)"]:::agent
subgraph Interception["Three interception layers"]
L1["Layer 1 — SDK shim<br/>in-process · lowest latency"]:::l1
L2["Layer 2 — Sidecar proxy<br/>aa-proxy · outbound HTTPS"]:::l2
L3["Layer 3 — eBPF<br/>kernel · highest authority"]:::l3
end
GW["Gateway (aa-gateway)<br/>policy · budget · decision"]:::gw
Audit[("Immutable audit log")]
Agent -->|"action"| L1
Agent -.->|"network egress"| L2
Agent -.->|"syscalls / TLS"| L3
L1 -->|"allow / deny request"| GW
L2 -->|"allow / deny request"| GW
L3 -->|"audit-only events"| GW
GW -->|"ALLOW / DENY"| Agent
GW --> Audit
The gateway is the single brain behind all three: it holds the agent registry, evaluates policy, enforces budgets, and appends the audit record before answering allow or deny.
Where to go next
- Security Model — the threat model and why this layered defense closes the gaps, including what each layer is and is not trusted to do.
- Architecture — the crate-level how: the gateway, the policy engine, the transports, and the full interception data flow.
Last updated: 2026-06-11 by Chisanan232
Requirements
Before you install Agent Assembly, make sure your machine meets the prerequisites below. The CLI and the governing gateway run on macOS and Linux; only the kernel-level eBPF interception layer is Linux-only.
At a glance
| You want to… | You need |
|---|---|
| Install and run the aasm CLI from a release | A supported OS (macOS or Linux); nothing else |
| Build the workspace from source | Rust stable ≥ 1.75, protoc, and a C toolchain |
| Run the SDK or sidecar-proxy interception layers | macOS or Linux |
| Run the eBPF interception layer | Linux only — a recent kernel with BTF and a nightly Rust toolchain |
Supported platforms
The three interception layers have different platform reach. The SDK shim and
the sidecar proxy (aa-proxy) run anywhere the runtime builds; kernel-level
eBPF interception is Linux-only.
| Platform | Runtime / CLI | Sidecar proxy (aa-proxy) | eBPF interception |
|---|---|---|---|
| Linux (x86_64 / arm64) | ✅ | ✅ | ✅ — kernel with BTF + nightly toolchain |
| macOS (Apple Silicon / Intel) | ✅ | ✅ | ❌ — Linux-only |
| Windows | ⚠️ via WSL2 | ⚠️ via WSL2 | ⚠️ via WSL2 |
On macOS, governance is enforced through the SDK and proxy layers; the
eBPF layer is unavailable. See aa-ebpf/README.md
for kernel requirements.
Installing the CLI only
If you just want the aasm operator CLI from a published release, you need
nothing more than a supported OS. The quick-install script
downloads a pre-built binary for x86_64/aarch64 on macOS
(apple-darwin) and Linux (unknown-linux-gnu). Jump straight to
Installation.
Building from source
To build the Cargo workspace yourself — for development, or to run the gateway
via cargo run — install the following.
Required
- Rust stable, ≥ 1.75 — install via rustup. The workspace uses the 2021 edition.
- protoc: the Protocol Buffers compiler, required by the aa-proto and aa-gateway build scripts.
  - macOS: brew install protobuf
  - Debian / Ubuntu: apt-get install protobuf-compiler
Recommended developer tooling
These are not needed to run the CLI but are used by the test and contribution workflow:
- cargo-nextest: the test runner used across the workspace.
- cargo-deny: dependency and license checks.
- Lefthook: git pre-commit / pre-push hooks.
Linux-only build dependencies
On Linux, the native-TLS path in aa-proxy additionally requires:
- pkg-config
- libssl-dev (Debian/Ubuntu) or openssl-devel (RHEL-family)
Requirements per interception layer
Each interception layer can be deployed independently. Pick the layers you need and install only their requirements.
| Layer | What it does | Requirements |
|---|---|---|
| SDK shim (in-process) | Fastest path; the agent adopts a language SDK that reports to the gateway | The relevant SDK: python-sdk, node-sdk, or go-sdk. Runs on macOS or Linux. |
| Sidecar proxy (aa-proxy) | Intercepts outbound HTTPS via MitM with a per-host CA; no code changes | macOS or Linux. On Linux, pkg-config + libssl-dev/openssl-devel. |
| eBPF (kernel) | Catches everything else, including bypass attempts | Linux only. A recent kernel with BTF enabled and a nightly Rust toolchain to build the BPF-target crates. Not available on macOS. |
The eBPF caveat. The aa-ebpf-probes and aa-ebpf-programs crates compile for the bpfel-unknown-none target and are intentionally outside the host Cargo workspace. They cannot be selected with cargo -p and do not build on macOS. If you are on macOS, you can still run and govern agents through the SDK and proxy layers; you simply do not get the kernel-level layer.
Next
With the prerequisites in place, continue to Installation.
Last updated: 2026-06-11 by Chisanan232
Installation
This page covers every supported way to get the aasm CLI onto your machine,
then how to verify it works. Pick one method:
| Method | Best for | Needs a published release? |
|---|---|---|
| Quick-install script | Fast, reproducible install on macOS / Linux | Yes |
| Homebrew tap | macOS / Linux users who already use Homebrew | Yes |
| Pre-built binaries | Air-gapped or scripted installs, custom verification | Yes |
| cargo install / from source | Contributors and bleeding-edge builds | No |
Alpha note. Agent Assembly is in the v0.0.1 pre-release series; published releases are GitHub pre-releases. The public API and wire protocol are not yet stable; do not use in production.
Quick-install script
The one-line installer downloads the matching pre-built tarball plus its
SHA256SUMS file from the GitHub Release, verifies the checksum, and installs
the aasm binary:
curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | sh
By default the binary is installed to /usr/local/bin if that directory is
writable, otherwise to ~/.local/bin (always user-writable, no sudo needed).
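The directory choice described above amounts to a writability check with a fallback. The Python sketch below mirrors that logic for illustration; it is not the installer's code (the installer is a shell script), and the function name and parameters are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def choose_install_dir(system_dir: str = "/usr/local/bin", home: Optional[str] = None) -> str:
    """Prefer the system directory when it exists and is writable,
    otherwise fall back to ~/.local/bin under the user's home."""
    if os.path.isdir(system_dir) and os.access(system_dir, os.W_OK):
        return system_dir
    home = home or str(Path.home())
    return os.path.join(home, ".local", "bin")


# Demo with stand-in directories so the result does not depend on this machine.
with tempfile.TemporaryDirectory() as writable:
    picked_system = choose_install_dir(system_dir=writable, home="/home/you")
picked_fallback = choose_install_dir(system_dir="/nonexistent-dir-for-demo", home="/home/you")
print(picked_fallback)
```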
The installer script lives in the repo at
scripts/install-cli.sh.
A short hosted alias (https://install.ai-agent-assembly.dev; hosted install script, coming soon) is planned but not yet live; use the raw.githubusercontent.com URL above for now.
If the install directory is not on your PATH, the script prints the line to add
to your shell profile, for example:
export PATH="$HOME/.local/bin:$PATH"
Pin a version or change the install directory
The installer honors these environment variables:
# Set the variable on `sh`, not on `curl`: a prefix assignment only reaches the command it precedes.
# Install a specific release tag (default: latest)
curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | AASM_VERSION=v0.0.1-alpha.5 sh
# Install to a custom directory
curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | AASM_INSTALL_DIR=/usr/local/bin sh
| Variable | Default | Purpose |
|---|---|---|
| AASM_INSTALL_DIR | /usr/local/bin or ~/.local/bin | Installation directory |
| AASM_VERSION | latest | Specific release tag to install |
| AASM_REQUIRE_SIGNATURE | 0 | When 1, a missing cosign signature aborts the install (see below) |
| AASM_NO_MODIFY_PATH | 0 | When 1, suppress the PATH hint |
Supply-chain verification (checksum + cosign)
The installer always enforces a SHA-256 checksum: it downloads SHA256SUMS
and aborts if the tarball’s hash does not match. The checksum file itself is
additionally signed with cosign (keyless, via
GitHub OIDC — Fulcio cert + Rekor log). If cosign is installed locally, the
installer verifies that signature against the release workflow’s identity before
trusting the checksums. To make a missing/unverifiable signature fatal:
curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | AASM_REQUIRE_SIGNATURE=1 sh
Releases published before signing was added carry no cosign bundle; with the default AASM_REQUIRE_SIGNATURE=0 the installer warns and falls back to checksum-only (the SHA-256 check is never skipped).
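The interplay between the mandatory checksum and the optional signature can be summarized as a small decision function. This Python sketch is only a model of the behavior described above: the function, its argument names, and the outcome strings are hypothetical, and treating a signature that fails verification as always fatal is an assumption drawn from the troubleshooting guidance ("Do not install").

```python
def verification_outcome(signature: str, checksum_ok: bool, require_signature: bool) -> str:
    """Model the installer's verification decision.

    signature is one of:
      "verified"    - cosign confirmed the signature on SHA256SUMS
      "failed"      - cosign ran and rejected the signature
      "unavailable" - no cosign bundle in the release, or cosign not installed
    """
    if signature == "failed":
        return "abort: signature verification failed"
    if signature == "unavailable" and require_signature:
        return "abort: signature required but unavailable"
    if not checksum_ok:
        # The SHA-256 check is never skipped, whatever the signature state.
        return "abort: checksum mismatch"
    if signature == "unavailable":
        return "install with warning: checksum only"
    return "install"


print(verification_outcome("verified", True, False))     # install
print(verification_outcome("unavailable", True, False))  # install with warning: checksum only
print(verification_outcome("unavailable", True, True))   # abort: signature required but unavailable
print(verification_outcome("verified", False, False))    # abort: checksum mismatch
```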
Homebrew (macOS / Linux)
Install the latest tagged aasm release from the
Homebrew tap:
brew install ai-agent-assembly/homebrew-agent-assembly/aasm
Pre-built binaries (manual)
Each GitHub Release
publishes per-platform tarballs plus a SHA256SUMS file and a
SHA256SUMS.cosign.bundle signature. Tarballs are named
aasm-<arch>-<os>.tar.gz, where <arch> is x86_64 or aarch64 and <os> is
apple-darwin (macOS) or unknown-linux-gnu (Linux).
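If you script the download, the tarball name can be derived from the host platform. The Python sketch below maps the values that `platform.machine()` and `platform.system()` typically return onto the naming scheme above; the alias table and function name are assumptions for illustration, not part of the release tooling.

```python
import platform

# Common spellings of the two supported architectures.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",   # how Apple Silicon macOS usually reports itself
}
OS_TRIPLES = {"Darwin": "apple-darwin", "Linux": "unknown-linux-gnu"}


def asset_name(machine: str, system: str) -> str:
    """Build aasm-<arch>-<os>.tar.gz for a given machine and OS name."""
    try:
        arch = ARCH_ALIASES[machine.lower()]
        os_part = OS_TRIPLES[system]
    except KeyError as exc:
        raise ValueError(f"no pre-built aasm tarball for {machine}/{system}") from exc
    return f"aasm-{arch}-{os_part}.tar.gz"


print(asset_name("arm64", "Darwin"))   # aasm-aarch64-apple-darwin.tar.gz
print(asset_name("x86_64", "Linux"))   # aasm-x86_64-unknown-linux-gnu.tar.gz
```

On a real host you would call `asset_name(platform.machine(), platform.system())`.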
To install and verify by hand:
VERSION=v0.0.1-alpha.5
ASSET=aasm-aarch64-apple-darwin.tar.gz # adjust for your platform
BASE="https://github.com/ai-agent-assembly/agent-assembly/releases/download/${VERSION}"
curl -sSfL "${BASE}/${ASSET}" -o "${ASSET}"
curl -sSfL "${BASE}/SHA256SUMS" -o SHA256SUMS
# Verify the checksum (use sha256sum on Linux, shasum -a 256 on macOS)
shasum -a 256 -c <(grep "${ASSET}" SHA256SUMS)
# (Optional) Verify the cosign signature on the checksum file
curl -sSfL "${BASE}/SHA256SUMS.cosign.bundle" -o SHA256SUMS.cosign.bundle
cosign verify-blob \
--bundle SHA256SUMS.cosign.bundle \
--certificate-identity-regexp '^https://github\.com/ai-agent-assembly/agent-assembly/\.github/workflows/release\.yml@refs/tags/v.*$' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
SHA256SUMS
tar -xzf "${ASSET}" aasm
install -m755 aasm ~/.local/bin/aasm
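For scripted or air-gapped installs where `shasum` or `sha256sum` is not convenient, the same checksum comparison can be done with Python's standard library. This is a sketch under the assumption that SHA256SUMS uses the usual `<hex digest>  <file name>` line format; the helper names are hypothetical, and the demo uses stand-in bytes rather than a real release tarball.

```python
import hashlib
import os
import tempfile


def expected_digest(sums_text: str, asset: str) -> str:
    """Find the hex digest listed for `asset` in SHA256SUMS-style text."""
    for line in sums_text.splitlines():
        parts = line.split()
        # Some tools prefix the file name with "*" to mark binary mode.
        if len(parts) == 2 and parts[1].lstrip("*") == asset:
            return parts[0].lower()
    raise KeyError(f"{asset} not listed in checksum file")


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, asset: str, sums_text: str) -> bool:
    return sha256_of(path) == expected_digest(sums_text, asset)


with tempfile.TemporaryDirectory() as tmp:
    asset = "aasm-aarch64-apple-darwin.tar.gz"
    path = os.path.join(tmp, asset)
    with open(path, "wb") as fh:
        fh.write(b"stand-in bytes, not a real release tarball")
    good = verify(path, asset, f"{sha256_of(path)}  {asset}\n")
    bad = verify(path, asset, "0" * 64 + f"  {asset}\n")
print(good, bad)  # True False
```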
Build from source
Contributors and anyone who wants the bleeding edge can build from the Cargo
workspace. This needs the build prerequisites
(Rust ≥ 1.75 and protoc).
git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly
cargo build -p aa-cli # produces ./target/debug/aasm
The compiled binary is at ./target/debug/aasm. Add it to your PATH or run it
by path. You can also install it onto your PATH with Cargo:
cargo install --path aa-cli # installs `aasm` into ~/.cargo/bin
The eBPF-target crates (aa-ebpf-probes, aa-ebpf-programs) are intentionally outside the workspace and are not built by cargo build -p aa-cli. See Requirements.
Verify the install
Confirm the binary is on your PATH and runs:
$ aasm --version
aasm 0.0.1-alpha.5
A fuller report — the CLI version plus whether a gateway and API are reachable —
comes from aasm version. With no control plane running yet, both report
unreachable, which is expected at this point:
$ aasm version
+-----------+---------------+-------------+
| COMPONENT | VERSION | STATUS |
+=========================================+
| cli | 0.0.1-alpha.5 | - |
|-----------+---------------+-------------|
| gateway | - | unreachable |
|-----------+---------------+-------------|
| api | - | unreachable |
+-----------+---------------+-------------+
List the available commands with aasm --help:
$ aasm --help
aasm — command-line tool for Agent Assembly
Usage: aasm [OPTIONS] <COMMAND>
Commands:
admin Gateway administrative operations
agent Manage monitored agent processes
alerts Manage governance alerts
audit Query audit log entries and export compliance reports
...
status Show fleet health, agents, approvals, and budget at a glance
topology Visualize agent topology, trees, lineage, and statistics
gateway Manage the aa-gateway governance daemon — agent registry, policy engine, audit log
start Start the locally-managed Agent Assembly gateway process
version Show CLI and gateway version information
...
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| aasm: command not found | Install dir not on PATH | Add the install dir to PATH (the installer prints the exact line) |
| could not determine latest release | The repo has no published release yet, or a network/API issue | Pin a tag with AASM_VERSION=..., or check the releases page |
| SHA256 mismatch | Corrupted or tampered download | Re-download; do not install. Report it if it persists |
| cosign signature verification FAILED | Bad or wrong-identity signature | Do not install; report it |
Next
Now configure the CLI to talk to your gateway — see Configuration.
Last updated: 2026-06-12 by Chisanan232
Configuration
The aasm CLI works with zero configuration — if you never create a config
file, it talks to a gateway API at http://localhost:8080. This page covers the
config file format, named contexts (connection profiles), the environment
variables the CLI reads, and the separate agent-assembly.toml runtime config
the gateway consumes.
Where the CLI connects, and how it decides
Every CLI command that talks to the control plane resolves three things — the API URL, an optional API key, and an output format — from the following sources, highest priority first:
- Explicit flags: --api-url, --api-key.
- A named context selected with --context <name>, or the default_context from the config file.
- The built-in default API URL: http://localhost:8080.
So aasm status with no flags and no config file connects to
http://localhost:8080. A --api-url flag always wins over any context.
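The precedence order can be expressed as a short resolver. The Python sketch below models it for the API URL only; the function name, the dictionary shape (which mirrors ~/.aa/config.yaml), and raising an error for an unknown context are assumptions for illustration, not the CLI's actual implementation.

```python
DEFAULT_API_URL = "http://localhost:8080"


def resolve_api_url(flag_url=None, context_name=None, config=None) -> str:
    """Resolve the API URL: explicit flag, then context, then built-in default."""
    config = config or {}
    if flag_url:
        return flag_url                      # --api-url always wins
    name = context_name or config.get("default_context")
    if name:
        contexts = config.get("contexts", {})
        if name not in contexts:
            raise KeyError(f"unknown context: {name}")
        return contexts[name]["api_url"]
    return DEFAULT_API_URL


config = {
    "default_context": "local",
    "contexts": {
        "local": {"api_url": "http://localhost:8080"},
        "production": {"api_url": "https://api.example.com"},
    },
}
print(resolve_api_url())                                          # http://localhost:8080
print(resolve_api_url(context_name="production", config=config))  # https://api.example.com
print(resolve_api_url(flag_url="http://localhost:9090", context_name="production", config=config))
```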
The CLI config file: ~/.aa/config.yaml
CLI configuration lives at ~/.aa/config.yaml. The file is optional; if it is
absent the CLI uses defaults. Its schema:
# Name of the context used when --context is not given (optional).
default_context: local
# Named connection profiles. Each has an api_url and an optional api_key.
contexts:
  local:
    api_url: http://localhost:8080
  production:
    api_url: https://api.example.com
    api_key: secret123 # optional; omit for unauthenticated endpoints
# Settings for `aasm dashboard start` (optional; shown with defaults).
dashboard:
  port: 3000
  auto_open: false
| Key | Type | Default | Purpose |
|---|---|---|---|
| default_context | string | (none) | Context used when --context is not passed |
| contexts.<name>.api_url | string | - | Base URL of the gateway API for this context |
| contexts.<name>.api_key | string | (none) | Bearer token sent with requests for this context |
| dashboard.port | integer | 3000 | Port the embedded dashboard SPA server listens on |
| dashboard.auto_open | bool | false | Open the browser automatically after the dashboard is ready |
Named contexts (connection profiles)
A context is a named API URL + key, so you can switch between, say, a local
gateway and a hosted one without retyping flags. Manage contexts with
aasm context; the commands read and write ~/.aa/config.yaml for you.
Create or update contexts:
$ aasm context set local --api-url http://localhost:8080
Context 'local' saved.
$ aasm context set production --api-url https://api.example.com --api-key secret123
Context 'production' saved.
Choose the default context:
$ aasm context use local
Switched to context 'local'.
List them (the * marks the default; keys are never printed, only flagged as set):
$ aasm context list
local * http://localhost:8080
production https://api.example.com (key set)
Once a default is set, every command uses it. Override per-invocation with
--context:
aasm status # uses default context (local)
aasm status --context production # one-off against production
aasm status --api-url http://localhost:9090 # ad-hoc URL, ignores contexts
Environment variables
The CLI reads these environment variables. Where one overlaps a flag or config value, the precedence is noted.
| Variable | Used by | Precedence |
|---|---|---|
| AASM_DASHBOARD_PORT | aasm dashboard | Highest; beats --port and dashboard.port in config |
| AASM_VERSION / AASM_INSTALL_DIR | the install script | Installer only |
| AA_POLICY | aasm gateway start | Default policy path; overridden by --policy |
| AA_DATA_DIR | gateway / proxy / dashboard | Directory for PID files and managed-process state |
| AA_PROXY_ADDR | aasm proxy start | Proxy listen address (default 127.0.0.1:8899) |
| AA_GATEWAY_URL | aasm proxy start | Gateway URL the proxy reports to |
| AA_CA_DIR | aasm proxy | Per-host CA material directory |
Note the two prefixes: AASM_* variables configure the CLI surface, while AA_* variables configure the underlying daemons the CLI launches (gateway, proxy). They are not interchangeable.
Output format
Most list/get commands accept --output table|json|yaml (default table). Use
json or yaml for scripting:
$ aasm version --output json
[
{
"component": "cli",
"version": "0.0.1-alpha.5",
"status": "-"
},
...
]
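A script can consume that JSON directly. The Python sketch below lists components that report unreachable. The sample payload is an assumption: only the cli entry appears in the output above, and the gateway and api entries are modeled on the table that `aasm version` prints, so check the real output before relying on the field values.

```python
import json

# Stand-in for the output of `aasm version --output json`.
SAMPLE = """
[
  {"component": "cli", "version": "0.0.1-alpha.5", "status": "-"},
  {"component": "gateway", "version": "-", "status": "unreachable"},
  {"component": "api", "version": "-", "status": "unreachable"}
]
"""


def unreachable_components(version_json: str) -> list:
    """Return the names of components whose status is "unreachable"."""
    rows = json.loads(version_json)
    return [row["component"] for row in rows if row.get("status") == "unreachable"]


print(unreachable_components(SAMPLE))  # ['gateway', 'api']
```

In a real script the text would come from running the command, for example with `subprocess.run(["aasm", "version", "--output", "json"], capture_output=True, text=True).stdout`.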
Gateway runtime config: agent-assembly.toml
The CLI config above is about how the CLI connects. The gateway itself
reads a separate runtime config — agent-assembly.toml — that selects its
persistence backends. A starter file ships at the repo root as
agent-assembly.toml.example:
# agent-assembly.toml — example runtime configuration
[storage]
policy_store = "redis"
audit_sink = "postgres"
session_store = "redis"
credential_store = "postgres"
rate_limit_counter = "redis"
lifecycle_store = "postgres"
# Per-driver connection settings live under [storage.<driver-name>].
[storage.redis]
url = "redis://localhost:6379"
[storage.postgres]
url = "postgresql://localhost:5432/assembly"
Each storage kind names a driver (memory, redis, or postgres); the runtime
resolves the name to a registered backend at boot, so you can switch backends
without recompiling.
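To make the name-to-backend idea concrete, here is a minimal Python sketch that checks a parsed [storage] table against a set of registered driver names. It only illustrates the lookup described above; the function is hypothetical, and the gateway's own registry and `aasm config validate` remain the authority on what is valid.

```python
# Driver names the text lists as available.
REGISTERED_DRIVERS = {"memory", "redis", "postgres"}


def unknown_drivers(storage: dict) -> dict:
    """Return {storage_kind: driver} for every kind naming an unregistered driver.

    Nested tables such as [storage.redis] hold connection settings, not
    driver names, so non-string values are skipped.
    """
    return {
        kind: driver
        for kind, driver in storage.items()
        if isinstance(driver, str) and driver not in REGISTERED_DRIVERS
    }


# The [storage] table from the example file, as a TOML parser would return it.
storage = {
    "policy_store": "redis",
    "audit_sink": "postgres",
    "session_store": "redis",
    "credential_store": "postgres",
    "rate_limit_counter": "redis",
    "lifecycle_store": "postgres",
    "redis": {"url": "redis://localhost:6379"},
    "postgres": {"url": "postgresql://localhost:5432/assembly"},
}
print(unknown_drivers(storage))                    # {}
print(unknown_drivers({"audit_sink": "sqlite"}))   # {'audit_sink': 'sqlite'}
```

On Python 3.11 or later, `tomllib.loads(text)["storage"]` produces a dictionary of this shape from the file's contents.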
Validate it before you boot
Use aasm config validate to check an agent-assembly.toml (currently the
[storage] section) before starting the gateway:
$ aasm config validate agent-assembly.toml.example
Config is valid: agent-assembly.toml.example
A valid file exits 0; an invalid one reports the problem and exits non-zero.
Next
You are configured. Walk through starting a gateway and observing an agent in First run.
Last updated: 2026-06-11 by Chisanan232
First run
This walkthrough takes you from a freshly installed aasm to a running
governance gateway that is ready for an agent to connect. Every command and its
output below was captured from a real v0.0.1-alpha.5 build.
The flow
flowchart LR
A["aasm gateway start<br/>--policy low-risk.yaml"] --> B["gRPC gateway<br/>127.0.0.1:50051"]
B --> C["aasm gateway status<br/>→ running"]
C --> D{"Connect an<br/>interception layer"}
D -->|SDK shim| E["Agent registers<br/>via gRPC"]
D -->|Sidecar proxy| E
D -->|eBPF on Linux| E
E --> F["aasm status / topology<br/>view the fleet"]
F --> G["aasm gateway stop"]
Two endpoints, one gateway. The gateway speaks gRPC on 127.0.0.1:50051; this is what SDK shims and the sidecar proxy connect to. The operator commands aasm status, aasm agent, and aasm topology talk to the gateway’s HTTP API on http://localhost:8080. In the OSS alpha the gRPC listener is what aasm gateway start brings up; until an HTTP API server is also serving on 8080, the HTTP-backed commands report unreachable. That is expected and called out at each step below.
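When it is unclear which of the two endpoints is actually up, a plain TCP probe of each port is a quick check. The Python sketch below is a generic reachability test and not part of aasm; a successful connection shows only that something is listening on the port, not that it is the gateway.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("gRPC 127.0.0.1:50051:", tcp_reachable("127.0.0.1", 50051))
    print("HTTP 127.0.0.1:8080: ", tcp_reachable("127.0.0.1", 8080))
```

In the OSS-only setup described above you would expect the first probe to succeed after `aasm gateway start` and the second to fail.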
1. Start the gateway
Point the gateway at one of the bundled reference policies. policy-examples/
ships low-risk.yaml, medium-risk.yaml, and high-risk.yaml; low-risk
allows and audits everything, which is the easiest starting point.
$ aasm gateway start --policy policy-examples/low-risk.yaml
Gateway started on grpc://127.0.0.1:50051 (pid 74472)
Logs: /Users/you/.aasm/logs/gateway.log
This spawns aa-gateway as a detached background process listening for gRPC on
127.0.0.1:50051. (If you built from source, ensure aa-gateway is reachable —
aasm gateway start looks in $PATH, ~/.cargo/bin, and ./target/{debug,release}.)
Alternative — from a source checkout without installing: the gateway can be run directly with Cargo, which is the form the rest of the book uses:
cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml
It listens on the same 127.0.0.1:50051.
2. Confirm it is running
$ aasm gateway status
Gateway: running pid=74472 listen=127.0.0.1:50051 uptime=5s
If nothing is running you get a non-zero exit and:
$ aasm gateway status
Gateway: not running
Tail the gateway log at any time with aasm gateway logs.
3. Check overall status
aasm status gives the fleet-wide picture — gateway health, registered agents,
pending approvals, and budget. It queries the HTTP API at http://localhost:8080:
$ aasm status
Agent Assembly Status
─────────────────────────────────────
Gateway: http://localhost:8080
Health: ✗ unreachable
─────────────────────────────────────
RUNTIME HEALTH
──────────────
API: ✗ unreachable
Uptime: 0s
Connections: 0
Lag: 0 ms
ACTIVE AGENTS
─────────────
(no agents registered)
PENDING APPROVALS
─────────────────
Count: 0
BUDGET STATUS
─────────────
Daily spend : $-- (no limit set)
Date: --
(no per-agent data)
Error: gateway is not running. Start it with: aasm start
The unreachable health here reflects the gRPC-vs-HTTP split described above:
the gRPC gateway from step 1 is up, but the HTTP API on 8080 is not being
served in this OSS-only setup. Once an API server is serving on 8080 (for
example through the hosted control plane, or a future OSS API server), Health
flips to reachable and registered agents appear in ACTIVE AGENTS.
Add --watch to auto-refresh the display every 5 seconds, or --json for a
machine-readable header suitable for scripting and CI.
4. Observe an agent
Agents register with the gateway through an interception layer — they are not created from the CLI. Wire one of the SDKs into your agent, or front it with the sidecar proxy, and point it at the gateway:
- SDK shim (in-process): install python-sdk, node-sdk, or go-sdk and follow that SDK’s quickstart. The shim reports every action to the gateway over gRPC.
- Sidecar proxy (no code changes): run aasm proxy start to intercept the agent’s outbound HTTPS and forward governance decisions to the gateway.
- eBPF (Linux only): kernel hooks catch everything else, including bypass attempts.
A quick way to exercise the sidecar path end-to-end is the bundled Docker
Compose stack, which runs aa-runtime as a sidecar against a stub agent:
cd examples/docker-compose
AA_API_KEY=dev-local-key docker compose up
The sidecar exposes the agent IPC socket at
/tmp/aa-runtime-my-agent-001.sock and a readiness probe at
http://localhost:8080/ready.
Once an agent is registered and the HTTP API is reachable, list the fleet:
aasm agent list # all registered agents
aasm agent inspect <id> # detail for one agent
Until then these commands report the API as unreachable:
$ aasm agent list
error: API request failed: error sending request for url (http://localhost:8080/api/v1/agents)
5. View the topology
aasm topology visualizes the agent fleet — trees, lineage, teams, and
aggregate stats. Like aasm status, it reads the HTTP API:
aasm topology overview # fleet-wide overview
aasm topology tree <id> # subtree rooted at an agent
aasm topology stats # aggregate statistics
With no reachable API it reports:
$ aasm topology overview
error: registry unreachable — check --api-url
6. Open a dashboard
For a live, interactive view there are two consoles:
- Web dashboard: aasm dashboard start serves the embedded SPA at http://127.0.0.1:3000 (port configurable; see Configuration). It blocks until Ctrl-C; use aasm dashboard open to launch your browser against an already-running server.
- Terminal (TUI) dashboard: aasm dashboard opens an interactive in-terminal dashboard for real-time monitoring, no browser required.
The web dashboard’s app shell looks like this after you sign in — the full governance navigation (Monitor / Control / Manage) down the left, with the approvals indicator, theme toggle, Settings, and Log out across the top:

The data panels are empty here because this is the open-source local-mode gateway, which serves the SPA but not the populated data API (that lives in the hosted control plane). See Observe in the dashboard for the full picture, including the live-operations and dark-mode views.
7. Stop the gateway
When you are done, shut the gateway down cleanly (SIGTERM, escalating to SIGKILL after the timeout):
aasm gateway stop
Where to go next
- CLI Reference: every aasm command and flag.
- Usage Guide: govern an agent end-to-end, author policies, and set budgets.
- Security Model: the threat model and the three-layer defense-in-depth rationale.
Last updated: 2026-06-12 by Chisanan232
CLI Reference — Overview
The aasm binary (crate aa-cli) is the operator front-end for Agent
Assembly. It talks to a running aa-gateway over its HTTP / OpenAPI surface
(default http://localhost:8080) for registry, policy, audit, approval, cost,
and topology operations, and manages local daemon processes (gateway, proxy,
dashboard) directly.
Invocation
aasm [OPTIONS] <COMMAND> [SUBCOMMAND] [ARGS]
Every command supports --help (-h for a one-line summary) at each layer:
aasm --help # list all top-level commands
aasm policy --help # list policy subcommands
aasm policy apply --help # flags + arguments for one subcommand
Global options
These flags are defined on the root parser (aa-cli/src/lib.rs) and are
global — they may be passed before the command or on any subcommand.
| Flag | Type | Default | Description |
|---|---|---|---|
| --context <CONTEXT> | string | (default context, if any) | Named context from ~/.aa/config.yaml to use for the API URL and key. |
| --output <OUTPUT> | table \| json \| yaml | table | Output format for list/get commands. |
| --api-url <API_URL> | string | http://localhost:8080 | Override the gateway API base URL. Takes precedence over the resolved context. |
| --api-key <API_KEY> | string | (none) | Override the API key. Takes precedence over the context’s stored key. |
| -h, --help | flag | — | Print help. |
| -V, --version | flag | — | Print the aasm version. |
Several commands also expose a local --output or --json flag that overrides the global --output for that command only (e.g. aasm logs --output json, aasm status --json, aasm gateway status --json). These are called out on the relevant command pages.
Output formats
--output (source: aa-cli/src/output.rs) selects how list/get commands
render:
- table (default): human-readable, colorized tables via comfy-table.
- json: machine-readable pretty JSON.
- yaml: machine-readable YAML.
Commands that stream (aasm logs --follow, aasm approvals watch),
visualize (aasm trace, aasm topology tree), or open a TUI (aasm dashboard) ignore --output where it does not apply.
Config and context resolution
CLI configuration lives at ~/.aa/config.yaml (source:
aa-cli/src/config.rs). It holds named contexts (connection profiles), an
optional default context, and dashboard settings:
default_context: production
contexts:
  production:
    api_url: https://api.example.com
    api_key: prod-key
  staging:
    api_url: https://staging.example.com
dashboard:
  port: 3000
  auto_open: false
The active API URL and key are resolved with this precedence (highest first):
- Explicit --api-url / --api-key flags.
- The named context: --context <name>, otherwise default_context.
- Built-in default URL http://localhost:8080 (no key).
Manage contexts with the aasm context command group.
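The precedence rules above can be sketched as a small function. This is an illustration in Python, not the logic in aa-cli/src/config.rs: the function name is hypothetical, and two details are assumptions not stated on this page (URL and key are resolved independently, and an unknown context name is treated as an error).

```python
DEFAULT_API_URL = "http://localhost:8080"  # built-in default, no key


def resolve_connection(config, context=None, api_url=None, api_key=None):
    """Return (api_url, api_key) using flags > named context > built-in default."""
    # --context wins over default_context when both are present.
    name = context or config.get("default_context")
    profile = {}
    if name is not None:
        contexts = config.get("contexts") or {}
        if name not in contexts:
            # Assumption: mirrors the "named context was not found" failure.
            raise LookupError(f"context {name!r} not found")
        profile = contexts[name]
    # Assumption: each field falls through the layers on its own.
    url = api_url or profile.get("api_url") or DEFAULT_API_URL
    key = api_key or profile.get("api_key")
    return url, key
```

With the example config above, no flags resolve to the production URL and key, `--context staging` resolves to the staging URL with no key, and an empty config falls back to http://localhost:8080.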
Note on paths. The CLI config file is ~/.aa/config.yaml. Separately, the locally-managed gateway uses ~/.aasm/ for its runtime artifacts: ~/.aasm/config.yaml (gateway config, see aasm start), ~/.aasm/policy.yaml, ~/.aasm/logs/gateway.log, and ~/.aasm/gateway.pid. These are distinct files.
Exit codes
aasm follows the standard convention:
- 0: success.
- non-zero: failure. Common causes: the gateway is unreachable, the API returned a non-2xx status, a named context was not found, a file failed to parse, or a validation/simulation step found problems.
Some commands give the exit code a documented meaning so it can gate CI:
| Command | Non-zero exit means |
|---|---|
aasm status | Gateway unreachable, any agent has violations, or storage health probe reports unavailable. |
aasm policy simulate | The simulation detected policy violations. |
aasm policy validate, aasm config validate | The file is invalid (error printed to stderr). |
aasm audit verify-chain | The audit hash chain failed verification. |
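Because these commands signal failure only through the exit code, a CI step can chain them and stop at the first non-zero result. The Python sketch below shows one way to do that; the file paths are placeholders, and the `run` parameter is an illustrative hook so the logic can be exercised without the CLI installed.

```python
import subprocess

# Placeholder paths; substitute the files your pipeline actually checks.
GATES = [
    ["aasm", "config", "validate", "agent-assembly.toml"],
    ["aasm", "policy", "validate", "policies/prod.yaml"],
    ["aasm", "audit", "verify-chain", "audit/session.jsonl"],
]


def first_failing_gate(gates, run=subprocess.call):
    """Run each command in order; return the first that exits non-zero, else None."""
    for argv in gates:
        # subprocess.call returns the child's exit code.
        if run(list(argv)) != 0:
            return list(argv)
    return None


if __name__ == "__main__":
    failed = first_failing_gate(GATES)
    if failed is not None:
        raise SystemExit("gate failed: " + " ".join(failed))
```

Stopping at the first failure keeps the CI log short; collecting every failure instead is an equally valid choice if you prefer a full report.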
Command groups
| Command | Talks to | Purpose |
|---|---|---|
aasm status | Gateway HTTP | Fleet health, agents, approvals, budget at a glance. |
aasm agent | Gateway HTTP | List, inspect, suspend, resume, kill registered agents. |
aasm policy | Gateway HTTP + local | Apply, version, diff, simulate, validate, show policies. |
aasm topology | Gateway HTTP | Visualize agent trees, teams, lineage, stats. |
aasm alerts | Gateway HTTP | List, inspect, resolve governance alerts. |
aasm approvals | Gateway HTTP + WS | Human-in-the-loop approval queue. |
aasm audit | Gateway HTTP + local | Query, export, verify, and compliance-export audit data. |
aasm logs | Gateway HTTP + WS | Query and stream audit-log events. |
aasm trace | Gateway HTTP | Visualize a single session trace. |
aasm cost | Gateway HTTP | Cost summary and monthly forecast. |
aasm dashboard | Gateway HTTP/WS + local | TUI dashboard and embedded SPA server. |
aasm gateway | Local process | Manage the aa-gateway daemon. |
aasm proxy | Local process | Manage the aa-proxy sidecar and its CA. |
aasm start / aasm stop | Local process | Start/stop the locally-managed gateway. |
aasm sandbox | Local | Run a WASM tool under the sandbox. |
aasm config | Local | Validate / boot an agent-assembly.toml. |
aasm context | Local | Manage ~/.aa/config.yaml contexts. |
aasm admin | Gateway HTTP | Administrative operations (retention). |
aasm version | Gateway HTTP | CLI + gateway/api versions. |
aasm completion | Local | Generate shell completion scripts. |
Developer-only commands. The source tree also defines aasm run (launch a governed AI dev tool) and aasm tools (discover installed AI dev tools). Both are gated behind the devtool region in aa-cli/src/commands/mod.rs and aa-cli/Cargo.toml and are stripped from the published crate by .ci/strip-for-publish.sh before release. They are intentionally not documented here because they are not part of the published aasm surface.
Last updated: 2026-06-11 by Chisanan232
aasm status
Show fleet health, agents, approvals, and budget at a glance. aasm status
fetches the deployment overview, runtime health, agent list, pending
approvals, cost rollup, and storage health from the gateway in one shot and
renders a dashboard-style summary.
Synopsis
aasm status [OPTIONS]
This command has no subcommands.
Options
| Flag | Type | Default | Description |
|---|---|---|---|
--watch | flag | off | Auto-refresh the status display every 5 seconds. Runs until interrupted (Ctrl-C). |
--json | flag | off | Print only the deployment-overview header as machine-readable JSON (the AAASM-1579 contract). Distinct from --output json, which serializes the full snapshot. |
Plus the global options.
Exit code
- 0: all healthy.
- non-zero: the gateway is unreachable, at least one agent has violations, or the storage health probe reports unavailable. All failure modes collapse to a single non-zero code so shell scripts can gate on it.
Examples
Show the full status summary:
aasm status
Agent Assembly Status
─────────────────────────────────────
Mode: local
Gateway: http://localhost:7391
Storage: sqlite (~/.aasm/local.db)
Version: 0.0.1
Uptime: 2h 15m 33s
Health: ✓ ok
─────────────────────────────────────
Active Agents
ID NAME FRAMEWORK STATUS SESSIONS VIOLATIONS LAST EVENT
a1b2… research-bot langgraph active 3 0 2m ago tool_call
Pending Approvals: 1 (oldest 2m 15s)
Budget: $12.50 / $50.00 daily ███████░░░░░░░░░░░░░ 25%
Continuously refresh:
aasm status --watch
Machine-readable deployment header for CI:
aasm status --json
{
"mode": "local",
"gateway_url": "http://localhost:7391",
"storage_backend": "sqlite",
"storage_path": "~/.aasm/local.db",
"version": "0.0.1",
"uptime_secs": 8133,
"health": "ok"
}
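A script that consumes this header might condense it into one line for a CI log. The sketch below is a minimal Python example that relies only on the field names shown in the sample above; the function names and the output layout are illustrative.

```python
import json


def format_uptime(secs: int) -> str:
    """Render seconds as hours, minutes, and seconds."""
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes}m {seconds}s"


def summarize(raw: str) -> str:
    """Condense the deployment header into a single log line."""
    header = json.loads(raw)
    return (
        f"{header['mode']} / {header['storage_backend']} / "
        f"up {format_uptime(header['uptime_secs'])} / health={header['health']}"
    )
```

For the sample header, `uptime_secs` of 8133 formats as 2h 15m 33s, matching the Uptime line in the full summary. The exit code remains the authoritative pass/fail signal; parsing the header is only for reporting.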
Full snapshot as JSON (every section):
aasm status --output json
Last updated: 2026-06-11 by Chisanan232
aasm agent
Manage monitored agent processes registered with the gateway.
Synopsis
aasm agent <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
list | List all registered agents. |
inspect | Show detailed information about one agent. |
suspend | Suspend a running agent. |
resume | Resume a suspended agent. |
kill | Deregister and terminate an agent. |
All subcommands accept the global options,
including --output table|json|yaml.
aasm agent list
List all registered agents, with optional client-side filters.
Options
| Flag | Type | Default | Description |
|---|---|---|---|
--status <STATUS> | string | — | Filter by agent status (e.g. Active, Suspended, Deregistered). |
--framework <FRAMEWORK> | string | — | Filter by agent framework (e.g. langgraph, crewai). |
--watch | flag | off | Auto-refresh the table every 2 seconds. |
Example
aasm agent list --status Active --framework langgraph
ID NAME FRAMEWORK VERSION STATUS TOOLS
a1b2c3… research-bot langgraph 1.2.0 Active search, fetch
aasm agent inspect
Render a detailed key-value view of a single agent: identity, status, tools, metadata, active sessions, recent events, and recent trace session IDs.
Arguments
| Argument | Type | Description |
|---|---|---|
<AGENT_ID> | string | Hex-encoded agent UUID to inspect. |
Example
aasm agent inspect a1b2c3d4e5f600112233445566778899
Agent a1b2c3d4…
Name: research-bot
Framework: langgraph 1.2.0
Status: Active
PID: 48213
Sessions: 3
Violations: 0
Tools: search, fetch, summarize
Recent traces:
7f3a… 2026-06-09T14:02:11Z (aasm trace 7f3a…)
aasm agent suspend
Suspend a running agent. The reason is logged for audit.
Arguments / options
| Name | Type | Default | Description |
|---|---|---|---|
<AGENT_ID> | string (arg) | — | Hex-encoded agent UUID to suspend. |
--reason <REASON> | string | required | Reason for suspending (logged for audit). |
--force | flag | off | Skip the confirmation prompt. |
Example
aasm agent suspend a1b2c3… --reason "investigating cost spike" --force
Suspended a1b2c3… : Active → Suspended
aasm agent resume
Resume a previously suspended agent.
Arguments
| Argument | Type | Description |
|---|---|---|
<AGENT_ID> | string | Hex-encoded agent UUID to resume. |
Example
aasm agent resume a1b2c3…
Resumed a1b2c3… : Suspended → Active
aasm agent kill
Deregister and terminate an agent.
Arguments / options
| Name | Type | Default | Description |
|---|---|---|---|
<AGENT_ID> | string (arg) | — | Hex-encoded agent UUID to kill. |
--force | flag | off | Skip the confirmation prompt. |
Example
aasm agent kill a1b2c3… --force
Killed a1b2c3… — deregistered and terminated.
Last updated: 2026-06-11 by Chisanan232
aasm policy
Manage governance policies — apply new versions, inspect history, roll back, diff, simulate, validate locally, and view effective policy.
Synopsis
aasm policy <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
apply | Apply a policy YAML file and save it to version history. |
history | List recent policy versions. |
rollback | Roll back to a previous version. |
diff | Show the diff between two versions. |
simulate | Dry-run a policy against historical events or live traffic. |
validate | Validate a policy YAML file locally (no apply). |
get | Show the active policy YAML (or a specific version). |
list | List all deployed policies. |
show | Show an agent’s effective policy view. |
All subcommands accept the global options.
aasm policy apply
Apply a policy YAML file and save it to version history.
| Name | Type | Default | Description |
|---|---|---|---|
<FILE> | path (arg) | — | Path to the policy YAML file. |
--applied-by <APPLIED_BY> | string | — | Identity of the person or system applying the policy. |
aasm policy apply ./policies/prod.yaml --applied-by alice@example.com
Applied policy 9f2c1a (version 2026-06-09T14:00:00Z) — active, 12 rules
aasm policy history
List recent policy versions.
| Name | Type | Default | Description |
|---|---|---|---|
-n, --limit <LIMIT> | integer | 10 | Maximum number of versions to show. |
aasm policy history -n 5
aasm policy rollback
Roll back to a previous policy version, making it active again.
| Name | Type | Description |
|---|---|---|
<VERSION> | string (arg) | Version identifier (SHA-256 prefix) to roll back to. |
aasm policy rollback 9f2c1a
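Version identifiers are described here as SHA-256 prefixes. This page does not say what is hashed or how long the prefix is, so the Python sketch below assumes the raw policy file bytes and a six-character prefix purely for illustration; both helper names are hypothetical.

```python
import hashlib


def version_id(policy_bytes: bytes, length: int = 6) -> str:
    """Assumed scheme: first `length` hex chars of SHA-256 over the file bytes."""
    return hashlib.sha256(policy_bytes).hexdigest()[:length]


def resolve_prefix(prefix: str, full_hashes):
    """Map a short prefix back to exactly one full hash, or fail loudly."""
    matches = [h for h in full_hashes if h.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"prefix {prefix!r} matched {len(matches)} versions")
    return matches[0]
```

Refusing ambiguous prefixes is the important part: a rollback should never guess between two versions that share the same leading characters.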
aasm policy diff
Show a colorized unified diff between two policy versions. Colors are suppressed when stdout is not a TTY.
| Name | Type | Description |
|---|---|---|
<VERSION_A> | string (arg) | First version identifier (SHA-256 prefix). |
<VERSION_B> | string (arg) | Second version identifier (SHA-256 prefix). |
aasm policy diff 9f2c1a 7ab310
aasm policy simulate
Simulate a policy against historical audit events or live traffic without enforcing it. Exits non-zero if the simulation detects any violation, so it can gate a CI pipeline.
| Flag | Type | Default | Description |
|---|---|---|---|
--policy <POLICY> | path | required | Path to the policy YAML file to simulate. |
--against <AGAINST> | path | — | Audit-log JSONL file to replay against the policy. |
--live | flag | false | Observe live agent traffic instead of replaying a file. |
--duration <DURATION> | string | — | Duration for live simulation (e.g. 60s, 5m). |
--output-file <OUTPUT_FILE> | path | — | Write the simulation report JSON here. (Named --output-file to avoid colliding with the global --output.) |
aasm policy simulate --policy ./candidate.yaml --against ./audit/session.jsonl
Simulation: 412 events, 3 would-be violations
deny file_write /etc/passwd (rule: block-system-paths)
exit status: 1
aasm policy validate
Validate a policy YAML file locally (no apply, no gateway contact). Exits 0
when valid, 1 with error details on stderr otherwise.
| Name | Type | Description |
|---|---|---|
<FILE> | path (arg) | Path to the policy YAML file to validate. |
aasm policy validate ./policies/prod.yaml
✓ policy valid — 12 rules
aasm policy get
Show the currently active policy YAML, or a specific version.
| Flag | Type | Default | Description |
|---|---|---|---|
--version <VERSION> | string | (latest active) | Version identifier (SHA-256 prefix) to retrieve. Omit for the active policy. |
aasm policy get --version 9f2c1a
aasm policy list
List all policies deployed to the governance runtime. Takes no flags of its
own (uses the global --output).
aasm policy list # default table output; pass --output json for JSON
NAME VERSION ACTIVE RULES
9f2c1a 2026-06-09T14:00:00Z yes 12
7ab310 2026-06-01T09:30:00Z no 11
aasm policy show
Show an agent’s effective policy view. By default prints the agent identity; add a flag to expand into the capability cascade or budget rollup.
| Name | Type | Default | Description |
|---|---|---|---|
<AGENT_ID> | string (arg) | — | Hex-encoded agent UUID (32 hex characters). |
--show-permissions | flag | off | Print the effective capability set with cascade provenance (granted-by / denied-by scope). |
--show-budget | flag | off | Print the budget rollup across agent / team / org / subtree. |
aasm policy show a1b2c3… --show-permissions
Capability Effective Granted by Denied by
search Allow team:research —
file_write Deny — org
Last updated: 2026-06-11 by Chisanan232
aasm topology
Visualize agent topology — fleet overview, delegation trees, teams, ancestry lineage, and aggregate statistics.
Synopsis
aasm topology <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
overview | Fleet-wide topology overview. |
tree | Render a subtree rooted at a given agent. |
team | Show all agents in a team. |
lineage | Show the ancestry chain for a given agent. |
stats | Show aggregate topology statistics. |
All subcommands accept the global options,
including --output table|json|yaml (tables render via box-drawing trees for
tree/lineage).
aasm topology overview
Show a fleet-wide topology overview across all teams and root agents.
| Flag | Type | Default | Description |
|---|---|---|---|
--status <STATUS> | string | — | Filter agents by status (active, suspended, deregistered). |
--show-budget | flag | off | Include governance level in agent nodes. |
aasm topology overview --status active
aasm topology tree
Render a delegation subtree rooted at one agent, using box-drawing characters.
| Name | Type | Default | Description |
|---|---|---|---|
<AGENT_ID> | string (arg) | — | Root agent ID (hex-encoded UUID). |
--max-depth <DEPTH> | integer | 10 | Maximum traversal depth from the root. |
--status <STATUS> | string | — | Filter tree nodes by status. |
--show-budget | flag | off | Include governance level in tree nodes. |
aasm topology tree a1b2c3… --max-depth 3
research-bot (a1b2c3…)
├── fetch-worker (d4e5f6…)
│ └── parse-worker (778899…)
└── summarize-worker (aabbcc…)
aasm topology team
Show all agents belonging to a single team.
| Name | Type | Default | Description |
|---|---|---|---|
<TEAM_ID> | string (arg) | — | Team ID. |
--status <STATUS> | string | — | Filter members by status. |
--show-budget | flag | off | Include governance level in agent nodes. |
aasm topology team research --status active
aasm topology lineage
Show an agent’s complete ancestry chain, ordered root-first.
| Name | Type | Default | Description |
|---|---|---|---|
<AGENT_ID> | string (arg) | — | Agent ID (hex-encoded UUID). |
--show-permissions | flag | off | After the lineage, also print the agent’s effective capability set with cascade provenance. |
aasm topology lineage 778899… --show-permissions
root-bot (a1b2c3…)
└── fetch-worker (d4e5f6…)
└── parse-worker (778899…) ← target
aasm topology stats
Show aggregate topology statistics — total/root/active/suspended counts, max
depth, team sizes, and depth/spawn histograms. Takes no flags of its own
(uses the global --output).
aasm topology stats # default text output; pass --output json for JSON
Total agents: 42
Root agents: 5
Max depth: 4
Active: 38 Suspended: 3 Deregistered: 1
Teams: 5
Avg children/parent: 2.31
Last updated: 2026-06-11 by Chisanan232
aasm alerts
Manage governance alerts — list, inspect, and resolve.
Synopsis
aasm alerts <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
list | List governance alerts. |
get | Show full detail for one alert. |
resolve | Resolve an alert. |
All subcommands accept the global options.
aasm alerts list
List governance alerts as a color-coded table, with optional filters.
| Flag | Type | Default | Description |
|---|---|---|---|
--agent <AGENT> | string | — | Filter by agent ID. |
--severity <SEVERITY> | string | — | Filter by severity (critical, warning, info). |
--status <STATUS> | string | unresolved | Filter by status (unresolved, acknowledged, resolved). |
aasm alerts list --severity critical
ID SEVERITY CATEGORY STATUS MESSAGE
al-301 critical budget unresolved team:research over daily cap
al-298 warning policy_violation unresolved file_write denied (agent a1b2c3…)
aasm alerts get
Render a detailed key-value view of one alert.
| Argument | Type | Description |
|---|---|---|
<ALERT_ID> | string | Alert ID to inspect. |
aasm alerts get al-301
aasm alerts resolve
Resolve an alert, optionally attaching a note.
| Name | Type | Default | Description |
|---|---|---|---|
<ALERT_ID> | string (arg) | — | Alert ID to resolve. |
--reason <REASON> | string | — | Optional resolution note. |
--force | flag | off | Skip the confirmation prompt. |
aasm alerts resolve al-301 --reason "raised team cap" --force
Resolved al-301.
Last updated: 2026-06-11 by Chisanan232
aasm approvals
Manage human-in-the-loop approval requests — list pending actions, approve or reject them, and watch for new requests in real time.
Synopsis
aasm approvals <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
list | List pending (or resolved) approval requests. |
get | Show details of one request. |
approve | Approve a pending action. |
reject | Reject a pending action. |
watch | Watch for new approval requests over WebSocket. |
All subcommands accept the global options.
aasm approvals list
List approval requests as a colored table. The TIMEOUT_IN column is
color-coded (red < 60s, yellow 60–180s, green > 180s).
| Flag | Type | Default | Description |
|---|---|---|---|
| --output <FORMAT> | table \| json \| yaml | global default | Per-command output override. |
| --status <STATUS> | pending \| approved \| rejected | pending | Filter by lifecycle status. Resolved history is bounded (default cap 1000). |
| --agent <AGENT> | string | — | Filter to approvals submitted by this agent ID (exact match). |
aasm approvals list --status pending
ID AGENT ACTION CONDITION SUBMITTED_AT TIMEOUT_IN
ap-77 a1b2c3… file_write /etc/hosts 2026-06-09T14:01:00Z 2m 30s
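The TIMEOUT_IN color bands for aasm approvals list can be written as a small threshold function. This Python sketch follows the documented ranges; treating exactly 60s and exactly 180s as yellow is an assumption, since the text gives the middle band as 60–180s without stating inclusivity.

```python
def timeout_color(seconds_left: int) -> str:
    """Map remaining time to the documented color band."""
    if seconds_left < 60:
        return "red"
    if seconds_left <= 180:
        # Assumption: both ends of the 60-180s band are inclusive.
        return "yellow"
    return "green"
```

The sample row above, with 2m 30s remaining (150 seconds), falls in the yellow band.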
aasm approvals get
Show details of a single pending approval request.
| Name | Type | Default | Description |
|---|---|---|---|
| <ID> | string (arg) | — | Approval request ID to look up. |
| --output <FORMAT> | table \| json \| yaml | global default | Per-command output override. |
aasm approvals get ap-77
aasm approvals approve
Approve a pending action.
| Name | Type | Default | Description |
|---|---|---|---|
<ID> | string (arg) | — | Approval request ID to approve. |
--reason <REASON> | string | — | Optional reason. May also be supplied on piped stdin. |
aasm approvals approve ap-77 --reason "verified safe"
Approved ap-77.
aasm approvals reject
Reject a pending action. A reason is required in non-interactive mode
(supply --reason or pipe it on stdin).
| Name | Type | Default | Description |
|---|---|---|---|
<ID> | string (arg) | — | Approval request ID to reject. |
--reason <REASON> | string | required (non-interactive) | Reason for rejection. May also be piped on stdin. |
aasm approvals reject ap-77 --reason "writes outside allowed path"
Rejected ap-77.
aasm approvals watch
Watch for new approval requests in real time over the gateway WebSocket
events endpoint (filtered to approval events).
| Flag | Type | Default | Description |
|---|---|---|---|
-i, --interactive | flag | off | Enable interactive mode with keyboard shortcuts (a=approve, r=reject, q=quit; arrow keys navigate). |
aasm approvals watch --interactive
▶ ap-78 a1b2c3… network_egress api.openai.com 3m 00s
a approve r reject ↑/↓ select q quit
Last updated: 2026-06-11 by Chisanan232
aasm audit
Query audit log entries and export tamper-evident compliance reports.
Synopsis
aasm audit <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
list | Query audit log entries with filters. |
export | Export audit data fetched from the gateway as CSV/JSON/JSONL. |
verify-chain | Verify the SHA-256 hash chain of a local JSONL audit file. |
compliance-export | Full-fidelity compliance export of a local JSONL audit file. |
All subcommands accept the global options.
Time filters. --since accepts a duration shorthand (30m, 2h, 1d) or an ISO 8601 timestamp; --until accepts an ISO 8601 timestamp.
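To make the two accepted forms of a time filter concrete, here is a minimal Python sketch that turns either one into an absolute timestamp. It is an illustration, not the CLI's parser: it handles only the m, h, and d units shown above, and the function name and the explicit `now` parameter are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_since(value: str, now: datetime) -> datetime:
    """Resolve a duration shorthand or ISO 8601 string to a datetime."""
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # Shorthand means "this long before now".
        return now - timedelta(**{_UNITS[unit]: amount})
    # Otherwise treat it as ISO 8601; map a trailing "Z" to an explicit
    # UTC offset so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


if __name__ == "__main__":
    print(parse_since("2h", datetime.now(timezone.utc)))
```

Passing `now` in explicitly keeps the function deterministic, which makes it easy to check that 2h really resolves to two hours before a known instant.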
aasm audit list
Query audit log entries from the gateway (GET /api/v1/logs) with optional
filters, rendered as a table (or --output json|yaml). The result column is
color-coded: allow=green, deny=red, pending=yellow.
| Flag | Type | Default | Description |
|---|---|---|---|
| --agent <AGENT> | string | — | Filter by agent identifier. |
| --action <ACTION> | string | — | Filter by action type (e.g. ToolCallIntercepted, PolicyViolation). |
| --result <RESULT> | allow \| deny \| pending | — | Filter by policy decision result. |
| --since <SINCE> | string | — | Show events after this duration or ISO 8601 timestamp. |
| --until <UNTIL> | string | — | Show events before this ISO 8601 timestamp. |
| --limit <LIMIT> | integer | 50 | Maximum number of entries to return. |
| --dry-run-only | flag | off | Show only observe-mode shadow events (dry_run: true). When off (default), shadow events are hidden so you see live enforcement decisions only. |
aasm audit list --result deny --since 2h --limit 20
SEQ TIMESTAMP AGENT EVENT RESULT
142 2026-06-09T14:01:00Z a1b2c3… PolicyViolation deny
aasm audit export
Export audit entries fetched from the gateway to CSV/JSON/JSONL, with optional
compliance metadata headers. Writes to stdout unless --output-file is given.
| Flag | Type | Default | Description |
|---|---|---|---|
| --format <FORMAT> | csv \| json \| jsonl | required | Export file format. JSONL is preferred for SIEM ingestion. |
| --compliance <COMPLIANCE> | eu-ai-act \| soc2 | — | Prepend a compliance metadata header. |
| --output-file <OUTPUT_FILE> | string | (stdout) | Write output to a file. (Named --output-file to avoid colliding with the global --output.) |
| --agent <AGENT> | string | — | Filter by agent identifier. |
| --action <ACTION> | string | — | Filter by action type. |
| --result <RESULT> | allow \| deny \| pending | — | Filter by policy decision result. |
| --since <SINCE> | string | — | Show events after this duration or ISO 8601 timestamp. |
| --until <UNTIL> | string | — | Show events before this ISO 8601 timestamp. |
| --limit <LIMIT> | integer | 1000 | Maximum number of entries to fetch. |
aasm audit export --format jsonl --compliance soc2 --since 1d \
--output-file audit-2026-06-09.jsonl
aasm audit verify-chain
Verify the SHA-256 hash chain of a local JSONL audit log file. Exits non-zero if the chain is broken (tamper evidence).
| Argument | Type | Description |
|---|---|---|
<PATH> | path | Path to the JSONL audit log file to verify. |
aasm audit verify-chain ./audit/session-7f3a.jsonl
✓ chain valid — 412 entries, genesis → entry 0xab12…
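For readers unfamiliar with hash chains, the idea behind this check can be sketched in a few lines of Python. The record layout here (the `prev_hash`, `payload`, and `hash` fields, the all-zero genesis value, and hashing the previous hash concatenated with canonical JSON) is entirely hypothetical; the gateway's actual JSONL schema is not documented on this page.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed anchor for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Add an entry whose hash commits to everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edit, reorder, or removal breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Each entry's hash depends on the one before it, so changing a single historical record invalidates every later link. That is what makes a broken chain usable as tamper evidence.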
aasm audit compliance-export
Full-fidelity compliance export of a local JSONL audit file. Preserves the SHA-256 hash chain anchors, credential findings (kind + offset only — never the raw secret), and delegation lineage for SIEM ingestion and regulatory review.
| Flag | Type | Default | Description |
|---|---|---|---|
| --input <INPUT> | path | required | Per-session audit JSONL file produced by the gateway. |
| --format <FORMAT> | csv \| json \| jsonl | jsonl | Export format. JSONL is preferred for SIEM/regulator ingestion. |
| --compliance <COMPLIANCE> | eu-ai-act \| soc2 | — | Prepend a compliance framework header. |
| --output-file <OUTPUT_FILE> | path | (stdout) | Write output to a file. |
| --agent <AGENT> | string | — | Filter by hex-encoded agent identifier (32 hex chars). |
| --event-type <EVENT_TYPE> | string | — | Filter by audit event-type label (e.g. PolicyViolation). |
| --since <SINCE> | string | — | Include entries after this duration shorthand or ISO 8601 timestamp. |
| --until <UNTIL> | string | — | Include entries before this ISO 8601 timestamp. |
aasm audit compliance-export --input ./audit/session-7f3a.jsonl \
--format jsonl --compliance eu-ai-act --output-file compliance.jsonl
Last updated: 2026-06-11 by Chisanan232
aasm logs
Query and stream audit-log events. In default mode it fetches recent entries
over HTTP; with --follow it streams events live over the gateway WebSocket
(like tail -f).
Synopsis
aasm logs [OPTIONS]
This command has no subcommands.
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| -f, --follow | flag | off | Stream events in real time over WebSocket. |
| --agent <AGENT> | string | — | Filter by agent identifier. |
| --type <TYPE> | comma-separated | — | Filter by event type(s). Accepted: violation, approval, budget. |
| --since <SINCE> | string | — | Show events after this duration (30m, 2h, 1d) or ISO 8601 timestamp. |
| --until <UNTIL> | string | — | Show events before this ISO 8601 timestamp. |
| --limit <LIMIT> | integer | 50 | Maximum number of entries in non-follow mode. |
| --no-color | flag | off | Disable colored output. |
| --output <FORMAT> | table \| json \| yaml | global default | Per-command output override. |
Plus the global options.
Examples
Show the last 50 entries:
aasm logs
2026-06-09T14:01:00Z [VIOLATION] a1b2c3… file_write denied: /etc/passwd
2026-06-09T14:01:05Z [APPROVAL] a1b2c3… network_egress pending: api.openai.com
Filter to violations and budget events for one agent:
aasm logs --agent a1b2c3… --type violation,budget --since 1h
Stream live (Ctrl-C to stop):
aasm logs --follow --type violation
Emit JSON for piping into jq:
aasm logs --output json --limit 200 | jq '.[].message'
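The `--since` flag in the options table accepts either a duration shorthand (30m, 2h, 1d) or an ISO 8601 timestamp. As a rough sketch of how such a value can be turned into a cutoff time, assuming only the m/h/d units listed above (the function name `resolve_since` is hypothetical and not part of the CLI):

```python
import re
from datetime import datetime, timedelta, timezone

# Units taken from the documented examples: 30m, 2h, 1d.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_SHORTHAND = re.compile(r"(\d+)([mhd])")


def resolve_since(value: str, now: datetime) -> datetime:
    """Return the cutoff instant for a duration shorthand or ISO 8601 string."""
    match = _SHORTHAND.fullmatch(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})
    # Not a shorthand: parse as ISO 8601, accepting a trailing "Z" for UTC.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

A duration is relative to "now", so the same command returns a moving window, while a timestamp pins the window to a fixed instant.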
Last updated: 2026-06-11 by Chisanan232
aasm trace
Visualize a single agent session trace as an indented tree or a horizontal timeline. The trace is fetched from the gateway and the flat span list is folded into a hierarchy (LLM calls, tool calls, tool results, policy allow/deny).
Synopsis
aasm trace [OPTIONS] <SESSION_ID>
This command has no subcommands.
Arguments
| Argument | Type | Description |
|---|---|---|
| `<SESSION_ID>` | string | Session ID to retrieve the trace for. |
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `--format <FORMAT>` | tree \| timeline | tree | Visualization format. tree = indented box-drawing tree; timeline = horizontal ASCII duration bars. |
Plus the global options.
Examples
Tree view (default):
aasm trace 7f3a1c2b
session 7f3a1c2b
├─ 🧠 llm: gpt-4 (1200ms)
│ ├─ 🔧 tool_call: search (340ms)
│ │ └─ 📥 tool_result: search (12ms)
│ └─ ⛔ deny: file_write — path outside allowlist
└─ 🧠 llm: gpt-4 (800ms)
Timeline view:
aasm trace 7f3a1c2b --format timeline
llm: gpt-4 ████████████████████ 1200ms
tool_call: search ██████ 340ms
llm: gpt-4 █████████████ 800ms
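Folding a flat span list into a hierarchy, as described at the top of this page, can be sketched as follows. The span fields `id`, `parent_id`, and `label` are hypothetical names chosen for the example; the gateway's real trace schema is not shown here.

```python
from collections import defaultdict


def fold_spans(spans):
    """Group spans by parent id, preserving their original order."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    return children


def render_tree(children, parent_id=None, prefix=""):
    """Render the grouped spans as box-drawing tree lines, depth first."""
    lines = []
    siblings = children.get(parent_id, [])
    for index, span in enumerate(siblings):
        last = index == len(siblings) - 1
        lines.append(f"{prefix}{'└─ ' if last else '├─ '}{span['label']}")
        # Descendants of a non-last sibling keep a vertical guide line.
        child_prefix = prefix + ("   " if last else "│  ")
        lines.extend(render_tree(children, span["id"], child_prefix))
    return lines
```

Root spans are those whose `parent_id` is `None`; everything else hangs off the span it names as its parent.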
Last updated: 2026-06-11 by Chisanan232
aasm cost
Query cost summary and forecast spending.
Synopsis
aasm cost <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
| `summary` | Show cost summary for the current period. |
| `forecast` | Forecast monthly spend from the current daily rate. |
Both subcommands accept the global options.
aasm cost summary
Show the cost summary for a time period, optionally grouped by a dimension.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--period <PERIOD>` | today \| month | today | Time period to report on. |
| `--group-by <GROUP_BY>` | agent | — | Group spend by dimension. |
aasm cost summary --period month --group-by agent
Cost Summary (month, 2026-06)
Total: $312.40 / $1,000.00 (31.2%)
AGENT MONTHLY SPEND
a1b2c3… $180.10
d4e5f6… $132.30
aasm cost forecast
Forecast monthly spending by extrapolating the current daily rate over the
remaining days of the month. Takes no flags of its own (uses the global
--output).
aasm cost forecast
Cost Forecast (2026-06-09, day 9 of 30)
Current daily spend: $12.50
Projected monthly spend: $375.00
Monthly limit: $1,000.00
Projected utilization: 37.5%
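The figures in the sample output can be reproduced with simple arithmetic. The sketch below assumes the "current daily rate" is month-to-date spend divided by elapsed days; that definition, the function name, and the month-to-date input of $112.50 (9 days at $12.50) are illustrative assumptions rather than the CLI's confirmed implementation.

```python
import calendar
from datetime import date


def forecast_month(spend_to_date: float, today: date, monthly_limit: float) -> dict:
    """Extrapolate the daily rate over the remaining days of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day  # assumed definition of the daily rate
    remaining_days = days_in_month - today.day
    projected = spend_to_date + daily_rate * remaining_days
    return {
        "daily_rate": daily_rate,
        "projected": projected,
        "utilization_pct": projected / monthly_limit * 100,
    }
```

For 2026-06-09 with $112.50 spent so far and a $1,000.00 limit, this yields a $12.50 daily rate, a $375.00 projection, and 37.5% projected utilization, matching the output above.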
Last updated: 2026-06-11 by Chisanan232
aasm dashboard
Real-time governance monitoring. With no subcommand, aasm dashboard opens an
interactive terminal (TUI) dashboard. The subcommands manage an embedded
single-page-app (SPA) web server instead.
Synopsis
aasm dashboard [SUBCOMMAND] [OPTIONS]
| Form | Purpose |
|---|---|
| `aasm dashboard` (no subcommand) | Open the interactive TUI dashboard. |
| `start` | Serve the embedded SPA over HTTP. |
| `open` | Open the browser to an already-running dashboard. |
| `stop` | Stop a dashboard server started with start. |
The TUI streams status over HTTP polling plus a WebSocket event feed. Panels:
fleet health + agents, event log, budget bars, and the pending-approvals queue
with countdown timers. Keyboard shortcuts: Tab/Shift-Tab to cycle panels,
arrows to select, a/r to approve/reject, p for the policy viewer, ? for help,
and q to quit.
The dashboard port resolves from (highest first): AASM_DASHBOARD_PORT env
var → --port flag → dashboard.port in ~/.aa/config.yaml (default
3000).
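The resolution order stated above can be expressed as a small lookup chain. This is a sketch only: the function name and the shape of the parsed config dictionary are assumptions, and it follows the order exactly as written in the preceding paragraph.

```python
def resolve_dashboard_port(env: dict, flag_port, config: dict, default: int = 3000) -> int:
    """Pick the dashboard port: env var, then --port, then config, then default."""
    raw = env.get("AASM_DASHBOARD_PORT")
    if raw:
        return int(raw)
    if flag_port is not None:
        return flag_port
    # Hypothetical parsed form of ~/.aa/config.yaml: {"dashboard": {"port": ...}}
    return config.get("dashboard", {}).get("port", default)
```

Passing the environment and config in as plain dictionaries keeps the precedence logic easy to test without touching the real process environment.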
aasm dashboard start
Serve the embedded SPA at http://127.0.0.1:<port>. Blocks until Ctrl-C.
Reverse-proxies /api/* to the configured gateway.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--port <PORT>` | integer | 3000 (config) | Port to listen on. Overrides config; also reads AASM_DASHBOARD_PORT. |
| `--open` | flag | off | Open the system browser once the server is ready. |
aasm dashboard start --port 8088 --open
Dashboard serving at http://127.0.0.1:8088 (Ctrl-C to stop)
Once the server is up, the browser opens to the dashboard home / overview — your confirmation that the dashboard is set up and running:

Navigating to the Live Operations route lays out the L1→L2→L3 traffic
pipeline, a tail -f event stream with filters, and the approval queue:

These screenshots were captured against the open-source local-mode gateway, which serves the SPA but not the live event/approval data API (that is the hosted control plane), so the stream shows “reconnecting…” and the pipeline columns are empty. The chrome and layout are real. See Observe in the dashboard for more.
aasm dashboard open
Open the system browser to an already-running dashboard server.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--port <PORT>` | integer | 3000 (config) | Port to connect to. Overrides config; also reads AASM_DASHBOARD_PORT. |
aasm dashboard open --port 8088
aasm dashboard stop
Stop a dashboard server previously started with aasm dashboard start. Takes
no flags.
aasm dashboard stop
Dashboard server stopped.
Last updated: 2026-06-12 by Chisanan232
aasm gateway
Manage the aa-gateway governance daemon directly — the process that holds
the agent registry, evaluates the policy engine, and writes the audit log.
`aasm gateway start` runs the gateway with low-level flags (listen address, socket, policy path). For the higher-level local developer workflow (deployment mode + dashboard), see `aasm start`.
Synopsis
aasm gateway <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
| `start` | Spawn aa-gateway as a detached background process. |
| `stop` | Terminate a running gateway (SIGTERM → SIGKILL fallback). |
| `status` | Report whether the gateway is running and serving gRPC. |
| `logs` | Tail the gateway log file. |
aasm gateway start
Spawn aa-gateway in the background (or foreground with --no-detach). The
binary is resolved from $PATH, then ~/.cargo/bin, then
./target/release, then ./target/debug.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--policy <POLICY>` | path | $AA_POLICY → ~/.aasm/policy.yaml → /etc/aasm/policy.yaml | Policy YAML file. |
| `--listen <LISTEN>` | string | 127.0.0.1:50051 | TCP listen address. |
| `--socket <SOCKET>` | path | — | Unix domain socket path. Takes precedence over --listen. |
| `--no-detach` | flag | off | Block the caller instead of detaching to the background. |
| `--log-file <LOG_FILE>` | path | ~/.aasm/logs/gateway.log | Log file for gateway stdout/stderr. |
aasm gateway start --listen 127.0.0.1:50051 --policy ./policy.yaml
aasm gateway stop
Terminate a running gateway gracefully (SIGTERM, escalating to SIGKILL). Takes no flags.
aasm gateway stop
aasm gateway status
Report whether aa-gateway is running and serving gRPC.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--json` | flag | off | Emit machine-readable JSON instead of human-readable text. |
aasm gateway status --json
{ "running": true, "pid": 48213, "listen": "127.0.0.1:50051", "uptime_seconds": 8133 }
aasm gateway logs
Tail the gateway log file, with optional level filtering. Non-JSON lines pass through so operator notes are preserved.
| Flag | Type | Default | Description |
|---|---|---|---|
| `-f`, `--follow` | flag | off | Stream new log entries in real time (like tail -f). |
| `--lines <LINES>` | integer | 50 | Number of lines to show from the end of the log. |
| `--level <LEVEL>` | log level | — | Filter entries by minimum severity. |
| `--log-file <LOG_FILE>` | path | ~/.aasm/logs/gateway.log | Path to the log file. |
aasm gateway logs --follow --level warn
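Minimum-severity filtering with pass-through for non-JSON lines can be sketched like this. The JSON field name `level` and the four-level ordering are assumptions for illustration; the gateway's actual log schema is not documented on this page.

```python
import json

# Assumed severity ordering, lowest to highest.
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def filter_lines(lines, min_level: str):
    """Yield JSON records at or above min_level; pass other lines through."""
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            yield line  # operator notes and other plain text are preserved
            continue
        if not isinstance(record, dict):
            yield line  # valid JSON but not a structured log record
            continue
        level = str(record.get("level", "")).lower()
        # Records with an unknown or missing level are kept rather than dropped.
        if LEVELS.get(level, threshold) >= threshold:
            yield line
```

Keeping unparseable lines errs on the side of showing too much, which suits a log viewer where a dropped operator note is harder to notice than an extra line.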
Last updated: 2026-06-11 by Chisanan232
aasm proxy
Manage the aa-proxy sidecar — its lifecycle, the per-host CA trust, and log
tailing. The proxy intercepts outbound HTTPS via MitM so network-egress policy
can be enforced without code changes (layer 2 of the three-layer model).
Synopsis
aasm proxy <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
| `start` | Spawn the proxy sidecar (background or foreground). |
| `stop` | Stop the running proxy. |
| `status` | Show whether the proxy is running. |
| `install-ca` | Install the proxy CA into the OS trust store. |
| `uninstall-ca` | Remove the proxy CA from the OS trust store. |
| `logs` | Tail the proxy log file. |
aasm proxy start
Spawn aa-proxy in the background (or foreground with --no-detach). The
binary is resolved from $PATH, then ~/.cargo/bin, then ./target/release.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--listen <LISTEN>` | string | 127.0.0.1:8899 (env AA_PROXY_ADDR) | Address the proxy listens on. |
| `--gateway <GATEWAY>` | string | env AA_GATEWAY_URL | Gateway URL to forward policy decisions to. |
| `--ca-dir <CA_DIR>` | path | env AA_CA_DIR | Directory for CA certificate and key storage. |
| `--no-detach` | flag | off | Run in the foreground instead of daemonizing. |
| `--log-file <LOG_FILE>` | path | — | Redirect proxy stdout/stderr to this file (background mode only). |
aasm proxy start --listen 127.0.0.1:8899 --gateway http://localhost:50051
aasm proxy stop
Stop the running proxy sidecar. Takes no flags.
aasm proxy stop
aasm proxy status
Show whether the proxy sidecar is running (confirmed via a TCP connect probe).
| Flag | Type | Default | Description |
|---|---|---|---|
| `--json` | flag | off | Emit machine-readable JSON output. |
aasm proxy status --json
aasm proxy install-ca
Install the proxy CA certificate into the OS trust store so intercepted TLS connections validate.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--ca-dir <CA_DIR>` | path | env AA_CA_DIR | Directory where the CA certificate and key are stored. |
| `--yes` | flag | off | Skip the confirmation prompt. |
aasm proxy install-ca --yes
aasm proxy uninstall-ca
Remove the proxy CA certificate from the OS trust store. Same options as
install-ca.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--ca-dir <CA_DIR>` | path | env AA_CA_DIR | Directory where the CA certificate and key are stored. |
| `--yes` | flag | off | Skip the confirmation prompt. |
aasm proxy uninstall-ca --yes
aasm proxy logs
Tail the proxy log file, with optional level/time filtering.
| Flag | Type | Default | Description |
|---|---|---|---|
| `-f`, `--follow` | flag | off | Stream new log entries continuously (like tail -f). |
| `--lines <LINES>` | integer | 50 | Number of lines to show from the end of the log. |
| `--level <LEVEL>` | string | — | Filter to lines at or above this level: error, warn, info, debug. |
| `--since <DURATION>` | string | — | Show only entries since a relative duration (e.g. 5m, 1h, 30s). |
aasm proxy logs --follow --level warn --since 10m
Last updated: 2026-06-11 by Chisanan232
aasm start / aasm stop
Start and stop the locally-managed Agent Assembly gateway. These are the
high-level developer-laptop commands: aasm start picks a deployment mode,
binds the right address, runs the gateway in the background, and (in local
mode) enables the dashboard. aasm stop terminates it gracefully and cleans
up the PID file.
For low-level gateway control (explicit listen address, Unix socket, policy path), see
aasm gateway.
aasm start
Synopsis
aasm start [OPTIONS]
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `--mode <MODE>` | local \| remote | local | Deployment mode. local binds 127.0.0.1 (loopback only); remote binds 0.0.0.0. |
| `--port <PORT>` | integer | 7391 | TCP port the gateway listens on. |
| `--config <CONFIG>` | path | ~/.aasm/config.yaml | YAML config file consumed by the gateway. |
| `--foreground` | flag | off | Stay in the foreground; do not daemonize. |
| `--no-dashboard` | flag | off | Disable dashboard serving (even in local mode). |
Behavior
- Resolve the listen address from `mode` + `port`.
- Exit early (idempotent) if a gateway is already running at that address, verified by a live PID file and a successful TCP probe.
- Spawn `aa-gateway` (background, or foreground with `--foreground`).
- In background mode, write the PID file and wait for the listener before printing the success banner.
Exit 0 on a normal start, an idempotent “already running” path, or a clean
foreground exit. Exit non-zero if the readiness probe times out or the spawn
fails.
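The address resolution and the idempotent "already running" check described above can be sketched as follows. The function names are hypothetical, the PID liveness test relies on POSIX `kill` with signal 0, and the real CLI may implement these checks differently.

```python
import os
import socket


def listen_address(mode: str, port: int):
    """local binds loopback only; remote binds all interfaces."""
    host = "127.0.0.1" if mode == "local" else "0.0.0.0"
    return host, port


def pid_is_alive(pid_file: str) -> bool:
    """True when the PID file exists and names a process that still exists."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False  # stale PID file
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """TCP connect probe; a wildcard bind is probed via loopback."""
    probe_host = "127.0.0.1" if host == "0.0.0.0" else host
    try:
        with socket.create_connection((probe_host, port), timeout=timeout):
            return True
    except OSError:
        return False


def already_running(pid_file: str, mode: str, port: int) -> bool:
    """Both conditions must hold: a live PID and a listener on the address."""
    host, port = listen_address(mode, port)
    return pid_is_alive(pid_file) and port_is_open(host, port)
```

Requiring both signals avoids two false positives: a stale PID file left by a crash, and an unrelated process that happens to occupy the port.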
Example
aasm start --mode local --port 7391
Agent Assembly gateway started (pid 48213)
Gateway: http://localhost:7391
Dashboard: http://localhost:7391
aasm stop
Synopsis
aasm stop [OPTIONS]
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `--timeout <TIMEOUT>` | integer (seconds) | 30 | Seconds to wait for graceful shutdown before sending SIGKILL. |
Behavior
Resolves the PID file (~/.aasm/gateway.pid) and chooses one of four terminal
states — no PID file, stale PID file, graceful SIGTERM, or escalated SIGKILL —
always cleaning up the PID file so the next aasm start sees a clean slate.
Example
aasm stop --timeout 15
Sent SIGTERM to pid 48213; exited gracefully.
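The SIGTERM-then-SIGKILL escalation can be illustrated with Python's standard `subprocess` module. This sketch stops a child process it holds a handle to, whereas `aasm stop` works from a PID file; the function name and return values are assumptions for the example.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float) -> str:
    """Ask the process to exit, then force it if the timeout elapses."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

The timeout plays the same role as `--timeout`: it bounds how long a process that ignores SIGTERM can delay shutdown.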
Last updated: 2026-06-11 by Chisanan232
aasm sandbox
Run a WebAssembly tool inside the Agent Assembly tool-execution sandbox, with
filesystem, CPU (instruction fuel), memory, and wall-clock isolation. This
surfaces the aa-sandbox runtime to the CLI without going through the cloud
/dispatch_tool HTTP route.
Synopsis
aasm sandbox <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
| `run` | Run a .wasm module inside a fresh sandbox. |
| `info` | Show the default sandbox runtime limits. |
aasm sandbox run
Run a WebAssembly module under WASI preview 1 inside a fresh sandbox and report the outcome. Unset limits fall back to the safe-by-default values.
| Name | Type | Default | Description |
|---|---|---|---|
| `<WASM>` | path (arg) | — | Path to a .wasm module to execute under WASI preview 1. |
| `--fuel <FUEL>` | integer | 10000000 (10M) | Wasmtime instruction-fuel budget. Raise for long-running tools. |
| `--memory-pages <MEMORY_PAGES>` | integer | 16 (1 MiB) | Maximum linear-memory pages (1 page = 64 KiB). |
| `--wall-clock-ms <WALL_CLOCK_MS>` | integer | 5000 (5s) | Wall-clock deadline in milliseconds. |
aasm sandbox run ./tool.wasm --fuel 50000000 --wall-clock-ms 10000
Sandbox run: ./tool.wasm
Outcome: completed
Fuel used: 3,201,884 / 50,000,000
Wall time: 812ms / 10000ms
aasm sandbox info
Show the default sandbox runtime limits. Takes no arguments.
aasm sandbox info
Default sandbox limits:
Fuel: 10,000,000 units
Memory pages: 16 (1 MiB)
Wall clock: 5000 ms
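When choosing a `--memory-pages` value, it helps to convert between pages and bytes. A small sketch of that arithmetic, using the 64 KiB page size stated in the table above (the helper names are illustrative only):

```python
WASM_PAGE_BYTES = 64 * 1024  # one WebAssembly linear-memory page


def pages_to_bytes(pages: int) -> int:
    """Total linear memory available for a given page limit."""
    return pages * WASM_PAGE_BYTES


def pages_for_bytes(size_bytes: int) -> int:
    """Smallest page count that covers size_bytes (rounds up)."""
    return -(-size_bytes // WASM_PAGE_BYTES)
```

The default of 16 pages works out to 1,048,576 bytes (1 MiB); a tool that needs 5 MiB of linear memory would need a limit of at least 80 pages.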
Last updated: 2026-06-11 by Chisanan232
aasm config
Validate and boot an agent-assembly.toml runtime configuration file. Both
subcommands operate on the runtime TOML (storage drivers, etc.), which is
distinct from the CLI’s own ~/.aa/config.yaml connection profiles (see
aasm context).
Synopsis
aasm config <SUBCOMMAND>
| Subcommand | Purpose |
|---|---|
| `validate` | Validate an agent-assembly.toml (currently the [storage] section). |
| `boot` | Build the [storage] backends and run a sample policy lookup. |
aasm config validate
Parse the TOML file and resolve every [storage] driver name against the
built-in driver registry. Exits 0 when valid; 1 with the error on stderr
otherwise. Unknown sections are ignored.
| Argument | Type | Description |
|---|---|---|
| `<FILE>` | path | Path to the agent-assembly.toml file to validate. |
aasm config validate ./agent-assembly.toml
✓ agent-assembly.toml valid — storage driver: memory
aasm config boot
Resolve every [storage] driver through the registry, build each backend, and
perform a sample policy lookup to confirm the configuration actually boots.
Exits 0 on success; 1 with the error on stderr.
| Argument | Type | Description |
|---|---|---|
| `<FILE>` | path | Path to the agent-assembly.toml file to boot from. |
aasm config boot ./agent-assembly.toml
✓ booted storage backends; sample policy lookup OK
Last updated: 2026-06-11 by Chisanan232
aasm context
Manage named API contexts (connection profiles) stored in
~/.aa/config.yaml. A context bundles an API URL and optional API key under a
name so you can switch between gateways with --context <name>.
See Config and context resolution for how the active context is resolved.
Synopsis
aasm context <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
| `list` | List all configured contexts. |
| `set` | Create or update a named context. |
| `use` | Switch the default context. |
aasm context list
List all configured contexts with their API URLs. Takes no arguments.
aasm context list
NAME API URL DEFAULT
production https://api.example.com *
staging https://staging.example.com
aasm context set
Create or update a named context.
| Name | Type | Default | Description |
|---|---|---|---|
| `<NAME>` | string (arg) | — | Name of the context to create or update. |
| `--api-url <API_URL>` | string | required | API URL for this context. |
| `--api-key <API_KEY>` | string | — | API key for this context (optional). |
aasm context set staging --api-url https://staging.example.com
Saved context 'staging'.
aasm context use
Switch the default context (the one used when --context is not passed).
| Argument | Type | Description |
|---|---|---|
| `<NAME>` | string | Name of the context to set as default. |
aasm context use production
Default context set to 'production'.
Last updated: 2026-06-11 by Chisanan232
aasm admin
Gateway administrative operations. The current scope is manual retention; more admin subcommands are added as the operator surface grows.
Synopsis
aasm admin <SUBCOMMAND> [OPTIONS]
| Subcommand | Purpose |
|---|---|
| `run-retention` | Trigger one manual retention pass against the running gateway. |
The subcommand accepts the global options,
honoring --output yaml (defaults to pretty JSON).
aasm admin run-retention
Trigger one manual retention pass (POST /api/v1/admin/retention-policy/run).
Exits 0 on a successful pass, non-zero when the gateway is unreachable or
returns a non-2xx status (the error chain is printed to stderr).
| Flag | Type | Default | Description |
|---|---|---|---|
| `--dry-run` | flag | off | Log what would be retained/dropped without taking any action. |
aasm admin run-retention --dry-run
{
"dry_run": true,
"audit_events_scanned": 14293,
"audit_events_dropped": 0
}
Last updated: 2026-06-11 by Chisanan232
aasm version
Show CLI and gateway version information. Prints the aasm CLI version, then
probes the gateway health endpoint (GET /api/v1/health) for the gateway and
API versions. When the gateway is unreachable, the gateway/api rows show an
unreachable marker.
Synopsis
aasm version
This command has no subcommands or flags of its own. It honors the global
--output and the resolved API context (--api-url / --context).
`aasm -V` / `aasm --version` prints only the CLI version (the standard clap flag). `aasm version` additionally reports the gateway and API versions.
Example
aasm version
COMPONENT VERSION
cli 0.0.1
gateway 0.0.1
api 0.0.1
JSON form:
aasm version --output json
Last updated: 2026-06-11 by Chisanan232
aasm completion
Generate a shell completion script for aasm and write it to stdout. Source
or install the output to get tab-completion for commands, subcommands, and
flags.
Synopsis
aasm completion <SHELL>
This command has no subcommands.
Arguments
| Argument | Type | Description |
|---|---|---|
| `<SHELL>` | shell | Shell to generate completions for. Supported values come from clap_complete::Shell: bash, elvish, fish, powershell, zsh. |
Examples
Bash (current session):
source <(aasm completion bash)
Zsh (install into a completions directory on $fpath):
aasm completion zsh > ~/.zfunc/_aasm
Fish:
aasm completion fish > ~/.config/fish/completions/aasm.fish
Last updated: 2026-06-11 by Chisanan232
Usage Guide
This guide walks through the real, day-to-day tasks an operator performs with
Agent Assembly, using the aasm CLI, the governance gateway, the three
interception layers, and the dashboard. Every command and every screenshot on
these pages was produced against the actual 0.0.1-alpha.5 build — where a
scenario needs a platform Agent Assembly does not target locally (for example
the Linux-only eBPF layer, or the SaaS control-plane API the web dashboard
talks to), the page says so explicitly rather than showing a mock-up.
What you can do
| Scenario | Goal | Page |
|---|---|---|
| Govern an agent | Launch a real AI dev tool under governance, end to end | Govern an agent end-to-end |
| Egress control | Restrict which hosts an agent may reach, and dry-run it before applying | Enforce an egress policy |
| Cost control | Set per-team spend caps and watch spend accumulate | Team budgets and cost |
| Observe | Watch the fleet in the web dashboard and the terminal TUI | Observe in the dashboard |
| Architecture in practice | Choose and combine the SDK, proxy, and eBPF layers | Choosing interception layers |
| When things break | Diagnose the most common local failures | Troubleshooting |
The shape of every scenario
Agent Assembly governance always has the same three moving parts:
- A gateway — the brain. It holds the agent registry, evaluates policy, tracks budgets, and writes the audit log. You start it once.
- At least one interception layer (the SDK shim, the `aa-proxy` sidecar, or the eBPF kernel hooks) that observes what an agent does and asks the gateway for an allow/deny decision.
- A policy, a YAML document describing what is allowed: capabilities, network egress, per-tool rules, budgets, and approval gates.
The operator surface for all of this is the aasm binary:
aasm — command-line tool for Agent Assembly
Commands:
admin Gateway administrative operations
agent Manage monitored agent processes
alerts Manage governance alerts
audit Query audit log entries and export compliance reports
logs Query and stream audit log events
policy Manage governance policies
context Manage named API contexts (connection profiles)
config Validate an `agent-assembly.toml` runtime configuration file
completion Generate shell completion scripts
status Show fleet health, agents, approvals, and budget at a glance
version Show CLI and gateway version information
trace Visualize a session trace (tree or timeline)
approvals Manage human-in-the-loop approval requests
cost Query cost summary and forecast spending
dashboard Open an interactive TUI dashboard for real-time governance monitoring
gateway Manage the aa-gateway governance daemon
run Launch an AI dev tool (claude, codex, copilot, windsurf) with governance wiring
sandbox Run a WebAssembly tool inside the Agent Assembly sandbox
tools List and manage AI dev tools on this system
topology Visualize agent topology, trees, lineage, and statistics
proxy Manage the aa-proxy sidecar — lifecycle, CA trust, and log tailing
start Start the locally-managed Agent Assembly gateway process
stop Stop the locally-managed Agent Assembly gateway process
Two global flags appear in nearly every example below:
- `--api-url <URL>`: where the CLI sends its requests. Defaults to the SaaS control-plane API on `http://localhost:8080`. When you run the local gateway (`aasm start` / `aa-gateway --mode local`) it serves its HTTP API on `http://127.0.0.1:7391`, so the local-mode examples pass `--api-url http://127.0.0.1:7391`.
- `--output <table|json|yaml>`: table for humans, `json`/`yaml` for scripting.
A note on ports. The gRPC policy server listens on `127.0.0.1:50051` (where SDKs and the proxy connect). The local control-plane HTTP API and the embedded dashboard are served on `127.0.0.1:7391`. The full web dashboard’s data API (`/api/v1/fleet`, `/api/v1/policies`, …) is provided by the SaaS/cloud control plane on port `8080`, which is not part of the open-source local runtime; see Observe in the dashboard for what renders locally and what needs the hosted backend.
Last updated: 2026-06-11 by Bryant
Govern an agent end-to-end
Goal. Take a real AI dev tool on your machine — Claude Code, Codex, Copilot, or Windsurf — and launch it so that everything it does runs through Agent Assembly governance: it is registered with the gateway, tagged to a team and trace, and routed through the proxy so its tool-calls and network requests are policy-checked and audited.
Prerequisites
- The `aasm` binary built (`cargo build -p aa-cli`; the binary is at `./target/debug/aasm`).
- The gateway binary on `PATH` for the `aasm start` helper (`cargo build -p aa-gateway --bin aa-gateway`).
- At least one supported AI dev tool installed.
Step 1 — See which tools Agent Assembly can govern
aasm discovers the AI dev tools already installed on the system and reports the
governance level it can apply to each. This is a real probe of the machine,
not a static list:
$ aasm tools list
+---------------+-----------------------+---------------------------------------------------------+------------------+
| TOOL | VERSION | PATH | GOVERNANCE LEVEL |
+====================================================================================================================+
| ClaudeCode | 2.1.172 (Claude Code) | /opt/homebrew/bin/claude | L3Native |
|---------------+-----------------------+---------------------------------------------------------+------------------|
| Codex | codex-cli 0.135.0 | /opt/homebrew/bin/codex | L2Enforce |
|---------------+-----------------------+---------------------------------------------------------+------------------|
| GitHubCopilot | 1.388.0 | /Users/you/.vscode/extensions/github.copilot-1.388.0 | L1Observe |
+---------------+-----------------------+---------------------------------------------------------+------------------+
The governance level reflects how deeply Agent Assembly can integrate with
that tool — from L3Native (the tool exposes a hook the runtime wires into
directly) down to L1Observe (the runtime can observe but not natively
intercept, so the proxy and eBPF layers do the enforcing).
Step 2 — Start the gateway
The gateway is the decision engine every governed action is checked against. For a local, in-process control plane:
$ aasm start --mode local --port 7391
This serves the HTTP control-plane API and the dashboard on
http://127.0.0.1:7391 with a local SQLite store. You can confirm it is up:
$ aasm --api-url http://127.0.0.1:7391 status
Agent Assembly Status
─────────────────────────────────────
Mode: local
Gateway: http://127.0.0.1:7391
Storage: sqlite
Version: 0.0.1-alpha.5
Uptime: 2m 24s
Health: ✓ ok
─────────────────────────────────────
STORAGE
───────
Backend: sqlite
Path: /Users/you/.aasm/local.db
DB Health: ✓ ok (0ms)
Rows: audit_events: 0 hot
agents: 0 | policies: 0
The fleet starts empty (`agents: 0`): nothing is governed until you launch a tool under `aasm run` in the next step.
Step 3 — Launch the tool under governance
aasm run <tool> is the heart of this scenario. It assigns the session an
agent identity, a team, and a trace id for lineage tracking, wires
in the proxy, and then execs the real tool. Before running it for real, use
--dry-run to see exactly what governance wiring will be applied — nothing is
launched:
$ aasm run claude --team-id research --agent-id research-bot-01 --dry-run
--- aasm run dry-run ---
agent_id: research-bot-01
trace_id: dry-run-daa9d73a-f2fc-4977-9d00-50f4c4025fa9
session_id: dry-run-0d7a0c16-25b2-456b-84e8-b7907fa963d1
--- managed settings ---
<dry-run: managed settings not generated>
--- launch command ---
claude
--- environment ---
AA_AGENT_ID=research-bot-01
AA_REGISTRATION_ID=dry-run-2b00ef56-3f35-4ef9-8164-ea899dfe90aa
AA_SESSION_ID=dry-run-0d7a0c16-25b2-456b-84e8-b7907fa963d1
AA_TEAM_ID=research
AA_TRACE_ID=dry-run-daa9d73a-f2fc-4977-9d00-50f4c4025fa9
AI_AGENT=claude-code_2-1-165_agent
CLAUDECODE=1
CLICKUP_API_TOKEN=***MASKED***
GITHUB_TOKEN=***MASKED***
JIRA_API_TOKEN=***MASKED***
SLACK_BOT_TOKEN=***MASKED***
...
Notice two things that are doing real work:
- The `AA_*` environment variables (`AA_AGENT_ID`, `AA_TEAM_ID`, `AA_TRACE_ID`, `AA_REGISTRATION_ID`, `AA_SESSION_ID`) are injected so the launched tool’s events carry identity and lineage back to the gateway.
- Secret-looking environment variables in your shell (API tokens, PATs) are masked (`***MASKED***`) in the launch environment that gets logged, so credentials never leak into the audit trail.
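Masking of this kind can be approximated with a name-based heuristic. The suffix list below is purely an assumption for illustration; the rules `aasm run` actually uses to decide what counts as "secret-looking" are not documented here.

```python
import re

# Hypothetical heuristic: variable names ending in a secret-like suffix.
SECRET_NAME = re.compile(r"(_TOKEN|_SECRET|_PASSWORD|_KEY|_PAT)$", re.IGNORECASE)
MASK = "***MASKED***"


def mask_environment(env: dict) -> dict:
    """Return a copy of env that is safe to log; the input is not modified."""
    return {
        name: (MASK if SECRET_NAME.search(name) else value)
        for name, value in env.items()
    }
```

Only the logged copy is masked. The launched tool still receives the real values, which is why the function returns a new dictionary instead of editing the original.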
When you drop --dry-run, the same wiring is applied for real and the tool
starts. Useful flags:
| Flag | Effect |
|---|---|
| `--team-id <id>` | Tag the session to a team (drives team budgets and topology). |
| `--governance-level <level>` | Override the level Agent Assembly applies. |
| `--enforcement-mode observe` (or `--observe`) | Compute and audit policy decisions but never block: a shadow run. |
| `--enforcement-mode enforce` | Default: deny blocks, redact strips. |
| `--no-proxy` | Skip proxy injection (not recommended for governed environments). |
| `--root-agent <id>` | Record a parent for multi-agent lineage. |
The --enforcement-mode distinction matters when rolling governance out: start
with --observe to see what would be blocked without breaking the agent, then
switch to enforce once the policy is right.
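The difference between the two modes comes down to whether a deny decision is acted on or only recorded. A minimal sketch, with a hypothetical function name and record shape:

```python
def apply_decision(decision: str, mode: str) -> dict:
    """Record every decision; block only when enforcing a deny."""
    if mode not in ("observe", "enforce"):
        raise ValueError(f"unknown enforcement mode: {mode}")
    blocked = mode == "enforce" and decision == "deny"
    # The decision is audited in both modes, so a shadow run shows
    # what would have been blocked.
    return {"decision": decision, "mode": mode, "blocked": blocked}
```

In observe mode a deny still appears in the audit record with `blocked` set to false, which is the signal to review before switching to enforce.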
Step 4 — Observe the governed agent
Once the tool is running under aasm run, the registered agent appears in the
fleet and its actions flow into the audit log. You inspect it with:
$ aasm agent list # all registered agents
$ aasm agent inspect <agent-id> # one agent in detail
$ aasm topology team research # the whole team
$ aasm status # fleet health at a glance
and watch its decisions live via the dashboard — see Observe in the dashboard.
Result
You now have a real AI tool running with a stable governed identity, every tool-call and outbound request routed through the gateway for an allow/deny decision, secrets scrubbed from the recorded environment, and a complete audit trail keyed to the agent, team, and trace you assigned in Step 3.
Last updated: 2026-06-11 by Bryant
Enforce an egress policy
Goal. Restrict the hosts an agent is allowed to reach, so a prompt-injected or confused agent cannot exfiltrate data to an arbitrary endpoint. You author a network allowlist, dry-run it against recorded traffic before applying it, and then enforce it at the proxy layer.
How egress enforcement works
Network egress is the job of the sidecar proxy (aa-proxy), the second of
the three interception layers. It terminates outbound HTTPS with a per-host CA
(MitM) and, for every CONNECT, asks: is this host on the policy’s allowlist?
Hosts that fail the check are refused before any bytes leave the machine — no
code change in the agent required.
The allowlist lives in the network section of a policy:
apiVersion: agent-assembly/v1
kind: Policy
metadata:
name: egress-allowlist
version: "1.0.0"
spec:
network:
allowlist:
- api.openai.com
- "*.githubusercontent.com"
Allowlist matching semantics
The proxy matches each requested host against every allowlist entry using these
rules (from aa_core::policy::is_host_allowed_by_egress_allowlist):
| Pattern | Matches | Does not match |
|---|---|---|
| `api.openai.com` | `api.openai.com` (case-insensitive, exact) | `evil.api.openai.com` |
| `*.githubusercontent.com` | `raw.githubusercontent.com`, `objects.githubusercontent.com` | bare `githubusercontent.com` |
| `*` | every host | — |
| (empty allowlist) | every host (no restriction) | — |
The leftmost-label wildcard (*.example.com) requires at least one extra label
to the left and anchors on the right, so it cannot be fooled by an
attacker-crafted host like example.com.evil.net.
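The matching rules in the table can be captured in a few lines. This is an illustrative Python re-statement of the documented semantics, not the Rust implementation in `aa_core`; in particular, it assumes a wildcard may match more than one extra label to the left.

```python
def host_allowed(host: str, allowlist: list) -> bool:
    """Apply the documented egress allowlist rules to one host."""
    if not allowlist:
        return True  # empty allowlist: no restriction
    host = host.lower()
    for pattern in allowlist:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".githubusercontent.com"
            # The host must end with the suffix and have something before it,
            # so the bare domain and look-alike hosts do not match.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False
```

Anchoring on the right with the leading dot is what defeats a host such as `example.com.evil.net`: it contains the allowed name but does not end with `.example.com`.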
Step 1 — Validate the policy locally
Validation parses and type-checks the YAML without contacting a gateway, and warns about unrecognised keys so you catch typos early:
$ aasm policy validate egress-policy.yaml
Policy is valid: egress-policy.yaml
Step 2 — Dry-run against recorded traffic
aasm policy simulate replays an audit-log JSONL file through the policy engine
and reports what each event would have decided — without enforcing anything.
This is how you prove a new allowlist before it can break production traffic.
A replay file is one JSON object per line; each line is an audit event whose
payload is the serialized governance action. For egress, the action is a
NetworkRequest:
{"event_type":"ToolCallIntercepted","agent_id":"researcher-1","payload":"{\"NetworkRequest\":{\"url\":\"https://api.openai.com/v1/chat/completions\",\"method\":\"POST\"}}"}
{"event_type":"ToolCallIntercepted","agent_id":"researcher-1","payload":"{\"NetworkRequest\":{\"url\":\"https://evil.example.com/exfil\",\"method\":\"POST\"}}"}
{"event_type":"ToolCallIntercepted","agent_id":"researcher-1","payload":"{\"NetworkRequest\":{\"url\":\"https://raw.githubusercontent.com/org/repo/main/README.md\",\"method\":\"GET\"}}"}
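Note that payload is itself a JSON document stored as a string, so each line is encoded twice. A minimal sketch of producing such lines follows; the helper name replay_line is hypothetical, and the field names are taken only from the sample above.

```python
import json

def replay_line(agent_id: str, method: str, url: str) -> str:
    """Build one replay line shaped like the samples above (hypothetical helper)."""
    # Inner document: the serialized governance action.
    payload = json.dumps(
        {"NetworkRequest": {"url": url, "method": method}},
        separators=(",", ":"),
    )
    # Outer document: the audit event, carrying the payload as a string.
    event = {
        "event_type": "ToolCallIntercepted",
        "agent_id": agent_id,
        "payload": payload,
    }
    return json.dumps(event, separators=(",", ":"))
```

Writing one such line per event to traffic.jsonl gives a file in the shape that the samples show.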
Run the simulation:
$ aasm policy simulate --policy egress-policy.yaml --against traffic.jsonl
Simulation Report
--------------------------------------------------
Total events: 3
Allowed: 1
Denied: 2
Approval required: 0
EVENT# ACTION DECISION REASON
----------------------------------------------------------------------
1 net:POST:https://evil.example.com/exfil deny host not in network allowlist
2 net:GET:https://raw.githubusercontent.com/org/repo/main/README.md deny host not in network allowlist
The report lists the flagged (non-allow) outcomes. api.openai.com (event 0)
was allowed and so does not appear in the flagged list; the exfiltration attempt
to evil.example.com was denied, as expected.
Honest caveat: two matchers, one allowlist. The raw.githubusercontent.com request was denied by the simulator above even though *.githubusercontent.com is on the allowlist. That is because the policy simulate decision path matches the host with an exact string comparison, whereas the live aa-proxy CONNECT path uses the glob-aware matcher described in the table above (which would allow it). When validating wildcard egress rules, confirm the live proxy behaviour as well as the simulation; treat a simulation deny on a wildcard host as “verify against the proxy”, not necessarily a real block.
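One way to act on this caveat is to triage the simulator's denies: any denied network event whose host is covered by a wildcard entry is a candidate for checking against the live proxy. The sketch below assumes the action string has the net:METHOD:URL shape seen in the report; both function names are hypothetical.

```python
from urllib.parse import urlsplit

def wildcard_covers(host: str, pattern: str) -> bool:
    """True when a '*.suffix' entry would match host under the glob rules."""
    if not pattern.startswith("*."):
        return False
    suffix = pattern[1:].lower()         # "*.example.com" -> ".example.com"
    return host.endswith(suffix) and len(host) > len(suffix)

def verify_against_proxy(flagged: list[dict], allowlist: list[str]) -> list[int]:
    """Return event indexes denied in simulation but covered by a wildcard entry."""
    to_check = []
    for outcome in flagged:
        if outcome["decision"] != "deny":
            continue
        kind, _method, url = outcome["action"].split(":", 2)
        if kind != "net":
            continue
        host = (urlsplit(url).hostname or "").lower()
        if any(wildcard_covers(host, p) for p in allowlist):
            to_check.append(outcome["event_index"])
    return to_check
```

Applied to the report above, only the raw.githubusercontent.com event is returned; the evil.example.com deny is not covered by any wildcard and stands as a real block.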
For scripting and CI gating, write the structured report to a file and key off the exit status:
$ aasm policy simulate --policy egress-policy.yaml --against traffic.jsonl \
--output-file report.json
$ cat report.json
{
"total_events": 3,
"denied": 2,
"allowed": 1,
"approval_required": 0,
"budget_impact_usd": null,
"flagged_outcomes": [
{ "event_index": 1, "action": "net:POST:https://evil.example.com/exfil",
"decision": "deny", "reason": "host not in network allowlist" },
{ "event_index": 2, "action": "net:GET:https://raw.githubusercontent.com/org/repo/main/README.md",
"decision": "deny", "reason": "host not in network allowlist" }
]
}
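The exit-status semantics of aasm are not spelled out here, so a CI job may prefer to gate on the report contents directly. The following is a sketch under that assumption; the function name gate and the max_blocked threshold are hypothetical, while the field names come from the report above.

```python
def gate(report: dict, max_blocked: int = 0) -> int:
    """Return a process exit code: 0 when the report is acceptable, 1 otherwise."""
    # Print each flagged outcome so the CI log explains a failure.
    for outcome in report.get("flagged_outcomes", []):
        print(
            f'event {outcome["event_index"]}: {outcome["decision"]} '
            f'{outcome["action"]} ({outcome["reason"]})'
        )
    blocked = report.get("denied", 0) + report.get("approval_required", 0)
    return 0 if blocked <= max_blocked else 1
```

A wrapper script could load report.json with json.load and pass the result of gate to sys.exit.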
You can also dry-run against live traffic for a fixed window instead of a file:
$ aasm policy simulate --policy egress-policy.yaml --live --duration 60s
Step 3 — Enforce at the proxy
Bring up the sidecar and trust its CA so TLS interception works:
$ aasm proxy install-ca # add the per-host CA to the OS trust store
$ aasm proxy start # listens on 127.0.0.1:8899 by default
$ aasm proxy status
aasm proxy start accepts --listen <addr> (default 127.0.0.1:8899),
--gateway <url> to point it at the gateway that owns the policy, and
--ca-dir <dir> for CA storage. Agents launched via aasm run have the proxy
injected automatically (Step 3 of
Govern an agent end-to-end); for other processes, route
their HTTPS through the proxy address.
When the policy is applied, the proxy refuses any CONNECT to a host outside the allowlist and the refusal is written to the audit log.
Result
Outbound traffic is now constrained to an explicit allowlist, verified with a dry-run before it could affect a running agent, and enforced at the network layer without modifying the agent’s code.
Last updated: 2026-06-11 by Bryant
Team budgets and cost
Goal. Put a hard spend cap on what an agent (and a team) can burn on model calls, so a runaway planning loop cannot run up an unbounded bill — and watch spend accumulate against that cap.
How budgets work
The gateway tracks per-agent and per-team spend and evaluates it on every
governed model call. Budgets are declared in the budget section of a policy.
These are the real fields the gateway parses:
apiVersion: agent-assembly/v1
kind: Policy
metadata:
name: research-budget
version: "1.0.0"
spec:
budget:
daily_limit_usd: 25.0 # per-agent cap, resets each day
monthly_limit_usd: 400.0 # per-agent cap, resets each month
org_daily_limit_usd: 100.0 # organisation-wide daily cap
org_monthly_limit_usd: 2000.0 # organisation-wide monthly cap
timezone: "Asia/Taipei" # IANA tz for the reset boundary (default UTC)
action_on_exceed: deny # "deny" (default) or "suspend"
window: "1h" # optional sub-day rollover window (humantime)
| Field | Meaning |
|---|---|
| daily_limit_usd / monthly_limit_usd | Per-agent spend caps. Omit for no limit. |
| org_daily_limit_usd / org_monthly_limit_usd | Organisation-wide caps, enforced independently of the per-agent caps. |
| timezone | IANA timezone that defines the daily/monthly reset boundary. Defaults to UTC. |
| action_on_exceed | What happens when the cap is hit: deny blocks further spend (default), suspend suspends the agent. |
| window | Optional sub-day rollover (e.g. "5s", "30m", "1h30m"). When absent, spend rolls over at the calendar-day boundary. |
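To make the window format concrete, here is a small sketch that converts values such as "1h30m" into seconds. It is not the gateway's parser: it handles only the single-letter units s, m, h, and d, which is an assumption narrower than the full humantime grammar, and the name window_seconds is hypothetical.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TOKEN = re.compile(r"(\d+)([smhd])")

def window_seconds(text: str) -> int:
    """Parse concatenated number+unit pairs, e.g. '1h30m' -> 5400."""
    position, total = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"unexpected text at offset {position}: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    # Reject empty input and any trailing characters the sketch does not understand.
    if position == 0 or position != len(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return total
```

Units outside the supported set (for example "10ms") are rejected rather than silently misread.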
Step 1 — Validate and apply the budget policy
$ aasm policy validate research-budget.yaml
Policy is valid: research-budget.yaml
$ aasm policy apply research-budget.yaml --applied-by alice@example.com
policy apply saves the policy to version history (see
aasm policy history / aasm policy rollback), so a budget change is auditable
and reversible.
Step 2 — Watch spend against the cap
aasm cost summary reports spend for the current period. By default it shows
today; pass --period month for the month, and --group-by agent to break it
down per agent:
$ aasm cost summary --period today
$ aasm cost summary --period month --group-by agent
Each command takes --output json|yaml for scripting.
To see where spend is heading, aasm cost forecast projects the month from the
current daily rate:
$ aasm cost forecast
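The exact formula behind the forecast is not documented in this guide. One plausible reading of "projects the month from the current daily rate" is a straight-line projection, sketched below as an assumption; forecast_month is a hypothetical name.

```python
import calendar
from datetime import date

def forecast_month(spend_to_date_usd: float, today: date) -> float:
    """Straight-line projection: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date_usd / today.day   # days elapsed, counting today
    return daily_rate * days_in_month
```

For example, 100 USD spent by the 10th of a 30-day month projects to 300 USD for the month.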
The fleet-level aasm status view also surfaces a budget block at a glance:
BUDGET STATUS
─────────────
Daily spend : $-- (no limit set)
Date: --
(no per-agent data)
(The example above is from a fresh gateway with no budget applied and no spend yet — once a budget policy is applied and agents start spending, the daily spend and per-agent rows populate.)
Step 3 — See budgets in topology
aasm topology team <team-id> lists every agent in a team; add --show-budget
to include each agent’s governance/budget posture in the tree:
$ aasm topology team research --show-budget
What happens at the cap
When an agent reaches its daily_limit_usd (or the org cap), the gateway
applies action_on_exceed:
- deny: the offending model call is denied and audited. The agent keeps running but cannot spend until the window resets.
- suspend: the agent is suspended (you can later aasm agent resume <id>).
Either way the decision lands in the audit log, so cost overruns are accountable after the fact, not just blocked in the moment.
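The cap behaviour can be sketched as a small decision function. This is an illustration, not the gateway's budget tracker: the Budget class and decide function are hypothetical, only the daily caps are modelled, and treating "spend already equal to the limit" as exceeded is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Budget:
    daily_limit_usd: Optional[float] = None      # per-agent cap; None means no limit
    org_daily_limit_usd: Optional[float] = None  # organisation-wide cap
    action_on_exceed: str = "deny"               # "deny" (default) or "suspend"

def decide(budget: Budget, agent_spent: float, org_spent: float) -> str:
    """Return 'allow', or the configured action once either cap has been reached."""
    caps = (
        (budget.daily_limit_usd, agent_spent),
        (budget.org_daily_limit_usd, org_spent),
    )
    for limit, spent in caps:
        # The two caps are checked independently; hitting either one is enough.
        if limit is not None and spent >= limit:
            return budget.action_on_exceed
    return "allow"
```

With daily_limit_usd set to 25.0, an agent at 24.99 is still allowed and an agent at 25.0 receives the configured action.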
Result
The team now has enforceable per-agent and organisation-wide spend caps with a defined reset boundary and a clear over-budget action, plus CLI views to track actual spend and forecast the month.
Last updated: 2026-06-11 by Bryant
Observe in the dashboard
Goal. Watch the governed fleet in real time. Agent Assembly ships two
observation surfaces from the same aasm binary: a web dashboard (a
Vite/React SPA) and an in-terminal TUI. This page shows what each looks like
and how to bring it up.
The web dashboard
The dashboard is a single-page React app. In production it is embedded into the
gateway and served at /; for UI development it runs under Vite on port 3000
and proxies /api to the control-plane API on port 8080.
Bring it up locally
The local-mode gateway serves the compiled SPA on its HTTP port (7391 by
default). Build the dashboard bundle once, then start the gateway pointed at it:
$ cd dashboard && pnpm install && pnpm build # produces dashboard/dist/
$ cd .. && aasm start --mode local --port 7391
# the dashboard is now at http://127.0.0.1:7391/
The login screen
The dashboard authenticates with an API key. This screen renders entirely client-side, so it is the same whether or not a backend is reachable:

The app shell and navigation
After authenticating, the canonical 12-route navigation appears, grouped into Monitor (Overview, Fleet, Topology, Live Ops, Alerts, Audit Log), Control (Capability, Policy, Secret Scrubbing), and Manage (Cost & Budget, Agent Groups, Members & Access). The header carries the approvals indicator, a light/dark theme toggle, Settings, and Log out:

An implemented page — Policies
The Policies page is the visual policy builder. It shows All / Active / Proposed tabs and a + new policy action; opening a row drops into the editor:

More implemented routes — Live Ops and Topology
The Live Operations route renders the real-time governance layout: the
L1→L2→L3 traffic pipeline (Identity → Capability → Scrub → External), a
tail -f event stream with agent/team/op-type/status filters and an auto-scroll
toggle, and the approval queue. Against the local-mode gateway the event stream
shows “reconnecting…” (no backend feed) and the columns are empty, but the full
operator layout is real:

The Topology route lists agents and teams; here it honestly reports
0 agents · 0 teams because the fleet data API is not part of the local runtime:

Light and dark themes
The header theme toggle flips the entire token-driven UI between light and dark. Here is the Overview route in dark mode:

Honest caveat: what renders locally vs. what needs the hosted backend. The screenshots above are all real captures of the 0.0.1-alpha.5 SPA served by the local-mode gateway. The data panels are empty (zero policies, zero agents, “not implemented yet” on some routes) because the dashboard’s data API (/api/v1/fleet, /api/v1/policies, /api/v1/capability/matrix, and the auth-token endpoint) is provided by the SaaS/cloud control plane on port 8080, which is not part of the open-source local runtime. The local-mode gateway on 7391 serves the SPA and a small set of endpoints (/healthz, /api/v1/admin/status), so the chrome, navigation, theming, and page shells are fully real while the populated tables require the hosted backend. Routes still marked “not implemented yet” (e.g. Overview) render a ComingSoon placeholder by design in this build.
The terminal TUI
For operators who live in a terminal, aasm dashboard (no subcommand) launches
an interactive full-screen TUI built on ratatui, with a live feed and
keyboard-driven approval handling:
$ aasm dashboard
# ...full-screen TUI; press 'q' to quit
$ aasm dashboard --help
Open an interactive TUI dashboard for real-time governance monitoring
Usage: aasm dashboard [OPTIONS] [COMMAND]
Commands:
start Serve the embedded SPA at http://127.0.0.1:<port>. Blocks until Ctrl-C
open Open the browser to an already-running dashboard
stop Stop a dashboard server started with `aasm dashboard start`
The TUI polls the control-plane REST API and subscribes to a WebSocket feed for
live events; selecting a pending approval lets you approve or reject it inline
(y / n).
Honest caveat: no live TUI screenshot here. The TUI requires an interactive terminal (it switches to the alternate screen and raw mode) and a reachable events/approvals API (port 8080) to display populated panels. Driven headlessly against the empty local backend it renders the frame but with no data to show, so a meaningful still capture is not reproducible in this environment. The launch command and --help above are real, and the panels populate once the hosted control plane (or a backend with live agents) is connected.
Serving the SPA without a browser launch helper
aasm dashboard start serves the embedded SPA directly and blocks until Ctrl-C;
aasm dashboard open opens your browser to an already-running server, and
aasm dashboard stop stops a server started with start. Pass --port (or set
AASM_DASHBOARD_PORT) to choose the port, and --open to launch the browser
once it is ready.
Result
You can observe the fleet either in the browser (rich, point-and-click) or in the terminal (fast, keyboard-driven), both from the same binary and both backed by the same gateway.
Last updated: 2026-06-12 by Chisanan232
Choosing interception layers
Goal. Decide which of the three interception layers to deploy, and how to combine them, for a given governance requirement. Agent Assembly enforces policy through three independently-deployable layers; this page is about the practical trade-offs, with the real commands for each.
The three layers at a glance
Listed lowest-latency-cost first, highest-detection-authority last:
| Layer | What it is | Catches | Cost / requirement |
|---|---|---|---|
| 1. SDK (in-process) | A thin Rust shim (aa-ffi-* over aa-sdk-client) the language SDKs call. Emits events to the gateway and applies pre-execution allow/deny via wrapper functions. | Anything the instrumented code path does. | Lowest latency, but requires the agent to adopt the SDK. |
| 2. Proxy sidecar (aa-proxy) | Intercepts outbound HTTPS via MitM with a per-host CA. Enforces network-egress policy with no code change. | Anything the SDK misses that goes over the network. | No code change; requires trusting the proxy CA. |
| 3. eBPF (aa-ebpf*) | Kernel hooks: uprobes on SSL libraries, kprobes/tracepoints on exec/file syscalls. | Everything else, including deliberate bypass attempts. | Highest authority; Linux-only. |
The gateway is the common brain for all three — every layer asks the same policy engine for its decision and writes to the same audit log.
When to use each
- Reach for the SDK layer when you control the agent’s code and want the lowest-overhead, most precise instrumentation — it sees tool-call arguments and results directly, in process.
- Add the proxy when you cannot or do not want to modify the agent, and the risk you care about is network egress / data exfiltration. It is the most practical way to govern a third-party or closed-source tool. See Enforce an egress policy.
- Add eBPF when you need defense-in-depth that an agent cannot bypass — e.g. it shells out, writes files, or makes raw connections that skip both the SDK and the proxy. This is the catch-all backstop.
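The guidance above can be condensed into a small decision helper. This is only a sketch of the reasoning on this page, not part of the aasm CLI; the function name and its four inputs are hypothetical.

```python
def recommend_layers(
    owns_code: bool,
    egress_risk: bool,
    needs_bypass_resistance: bool,
    linux: bool,
) -> list[str]:
    """Suggest which interception layers to deploy for a given requirement."""
    layers = []
    if owns_code:
        layers.append("sdk")       # precise, in-process instrumentation
    if egress_risk or not owns_code:
        layers.append("proxy")     # code-free control of the network path
    if needs_bypass_resistance and linux:
        layers.append("ebpf")      # kernel backstop, Linux-only
    return layers
```

For a closed-source tool on Linux where bypass resistance matters, the helper returns the proxy and eBPF layers; for code you own with an egress concern, it returns the SDK and the proxy.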
Combining layers
The layers are additive, not exclusive. A typical governed deployment runs the SDK and the proxy: the SDK gives rich, in-process tool-call governance, while the proxy backstops the network path for anything the SDK does not see. On Linux, eBPF sits underneath both as the bypass-proof floor.
aasm run reflects this in its governance level (see
Govern an agent end-to-end): a tool reported as
L3Native integrates at the SDK depth, while an L1Observe tool relies on the
proxy and eBPF layers to do the actual enforcing.
Layer 2 in practice — the proxy
$ aasm proxy install-ca # trust the per-host CA so TLS interception works
$ aasm proxy start # background sidecar on 127.0.0.1:8899
$ aasm proxy status # confirm it is running
$ aasm proxy logs # tail the proxy log
$ aasm proxy uninstall-ca # remove the CA when you are done
aasm proxy start takes --listen <addr> (default 127.0.0.1:8899),
--gateway <url>, and --ca-dir <dir>.
Layer 3 in practice — eBPF
The eBPF layer is Linux-only: its uprobes/kprobes/tracepoints attach to a running kernel.
$ aasm proxy status
not running
On macOS the eBPF userspace crate compiles with non-Linux stubs (the
KprobeManager/UprobeManager attach paths are #[cfg(target_os = "linux")]),
so it builds for development but does not attach probes. To exercise the real
kernel hooks — SSL-library uprobes for outbound TLS, exec/openat/unlink
kprobes, and the sched_process_exec tracepoint — run on Linux.
Honest caveat. This page does not show live eBPF probe output because the attaching code is gated to Linux and this build was exercised on macOS. The architecture (userspace aa-ebpf loading compiled aa-ebpf-probes and reading a shared BPF ring buffer) is real and documented in the crate; the live capture requires a Linux host with the privileges to load eBPF programs.
Result
You can match the interception layer (or stack of layers) to the requirement: SDK for precision where you own the code, proxy for code-free egress control, eBPF for a bypass-proof kernel backstop on Linux — all feeding one gateway and one audit log.
Last updated: 2026-06-11 by Bryant
Runnable examples
The pages in this guide explain how governance works. When you want to run
it, the framework-specific, end-to-end examples live in the dedicated
agent-assembly-examples
repository rather than in this book — that keeps the runnable code versioned and
testable on its own, while these pages stay focused on the concepts.
Every example is governed by the same three-layer interception model described
in Choosing interception layers: a gateway as the
brain, at least one interception layer (SDK shim, aa-proxy sidecar, or eBPF),
and a policy. Pick the language you are integrating, or browse the cross-cutting
scenarios:
- Node — examples-repo/node
- Python — examples-repo/python
- Go — examples-repo/go
- Scenarios (cross-cutting: approval-gates, audit-trace, budget-limits, policy-enforcement, sidecar-runtime) — examples-repo/scenarios
Last updated: 2026-06-14 by Chisanan232
Troubleshooting
Common local issues and the real diagnostics to resolve them. Every error
message below is reproduced verbatim from the 0.0.1-alpha.5 build.
aasm start fails: “failed to spawn aa-gateway”
$ aasm start --mode local --port 7391
aasm start: failed to spawn aa-gateway: No such file or directory (os error 2)
Cause. aasm start shells out to a separate aa-gateway binary, which must
be on your PATH.
Fix. Build it and put target/debug on PATH:
$ cargo build -p aa-gateway --bin aa-gateway
$ export PATH="$PWD/target/debug:$PATH"
$ aasm start --mode local --port 7391
aasm start fails: “--policy is required in legacy-grpc mode”
$ aasm start
Error: "--policy is required in legacy-grpc mode"
aasm start: gateway did not become ready within 5.000335375s
Cause. The aa-gateway binary defaults to its legacy gRPC mode, which
requires a policy file. For a local control plane with the HTTP API and
dashboard, you want local mode, which does not.
Fix. Run local mode directly:
$ aa-gateway --mode local
Agent Assembly [local mode] v0.0.1-alpha.5
Listening: http://127.0.0.1:7391
Dashboard: http://127.0.0.1:7391/
Storage: /Users/you/.aasm/local.db (SQLite)
Ctrl+C to stop.
For the legacy gRPC server, supply a policy:
aa-gateway --policy policy-examples/low-risk.yaml.
CLI commands say the gateway is “unreachable”
$ aasm status
Agent Assembly Status
─────────────────────────────────────
Gateway: http://localhost:8080
Health: ✗ unreachable
─────────────────────────────────────
...
Error: gateway is not running. Start it with: aasm start
$ aasm version
+-----------+---------------+-------------+
| COMPONENT | VERSION | STATUS |
+=========================================+
| cli | 0.0.1-alpha.5 | - |
|-----------+---------------+-------------|
| gateway | - | unreachable |
|-----------+---------------+-------------|
| api | - | unreachable |
+-----------+---------------+-------------+
Cause. The CLI defaults to the SaaS control-plane API on
http://localhost:8080. The local-mode gateway serves its API on 7391, not
8080, so the default target is unreachable.
Fix. Point the CLI at the local API:
$ aasm --api-url http://127.0.0.1:7391 status
Agent Assembly Status
─────────────────────────────────────
Mode: local
Gateway: http://127.0.0.1:7391
Storage: sqlite
Version: 0.0.1-alpha.5
Uptime: 2m 24s
Health: ✓ ok
─────────────────────────────────────
To avoid repeating the flag, save a named context with aasm context or set the
API URL in ~/.aa/config.yaml.
aasm gateway status says “not running” even though local mode is up
$ aasm gateway status
Gateway: not running
Cause. aasm gateway status tracks the legacy gRPC gateway via its PID
file. A gateway started in local mode (aa-gateway --mode local) is a
different process and is not reflected here.
Fix. Check local-mode liveness with the HTTP status instead:
$ aasm --api-url http://127.0.0.1:7391 status
or hit the health endpoint directly: curl http://127.0.0.1:7391/healthz.
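For scripts that need the same liveness check, a minimal sketch using only the Python standard library follows. It assumes that a healthy gateway answers /healthz with HTTP 200, which this page does not state explicitly; is_healthy is a hypothetical name.

```python
import urllib.error
import urllib.request

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True when GET <base_url>/healthz answers 200 within the timeout."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling is_healthy("http://127.0.0.1:7391") mirrors the curl command above and returns False rather than raising when nothing is listening.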
A dashboard page loads but its tables stay empty / skeleton
Cause. The dashboard SPA served by the local-mode gateway can render its
chrome and page shells, but its data endpoints (/api/v1/fleet,
/api/v1/policies, …) are served by the SaaS/cloud control plane on port
8080, which is not part of the open-source local runtime. With only the local
gateway running, data panels stay empty or in their loading state.
Fix. Connect a control plane that serves the /api/v1/* data routes (the
hosted backend), or use the CLI (aasm agent list, aasm policy list,
aasm cost summary) against the local API for the same data in the terminal.
See Observe in the dashboard.
policy validate prints “Unknown key … will be ignored”
$ aasm policy validate policy-examples/medium-risk.yaml
warning: tier — Unknown key 'tier' will be ignored
warning: rules — Unknown key 'rules' will be ignored
warning: notifications — Unknown key 'notifications' will be ignored
Policy is valid: policy-examples/medium-risk.yaml
Cause. These are warnings, not errors — the policy still validates. The
keys tier, rules, notifications, and similar are not part of the schema the
gateway enforces; the supported spec sections are network, schedule,
budget, data, tools, capabilities, approval, and scope.
Fix. Move the intended behaviour into a supported section (e.g. express
allow/deny via capabilities or tools, gating via approval), or ignore the
warnings if the extra keys are intentional annotations. The capability-policy.yaml
example validates with no warnings and is a good reference shape.
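The same check can be approximated before running the validator. The sketch below works on an already-parsed policy and compares the keys under spec with the supported sections listed above; that the warned keys sit under spec is an assumption, and unknown_spec_keys is a hypothetical helper, not part of aasm.

```python
SUPPORTED_SPEC_SECTIONS = {
    "network", "schedule", "budget", "data",
    "tools", "capabilities", "approval", "scope",
}

def unknown_spec_keys(spec: dict) -> list[str]:
    """Return the spec keys that are not supported sections, in document order."""
    return [key for key in spec if key not in SUPPORTED_SPEC_SECTIONS]
```

A non-empty result corresponds to keys that would be reported as unknown and ignored.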
A wildcard egress host is denied in policy simulate
If aasm policy simulate denies a host that your *.example.com allowlist entry
should permit, this is expected: the simulator’s decision path uses an exact
host comparison, while the live aa-proxy uses the glob-aware matcher. Confirm
the host against the running proxy rather than treating the simulation deny as a
real block — see the caveat in
Enforce an egress policy.
Quick reference
| Symptom | First thing to check |
|---|---|
| “failed to spawn aa-gateway” | aa-gateway on PATH? |
| “--policy is required” | Use aa-gateway --mode local, not the default |
| “unreachable” on every CLI call | Pass --api-url http://127.0.0.1:7391 |
| gateway status “not running” | Local mode ≠ legacy gRPC; use status / /healthz |
| Empty dashboard tables | Data API (port 8080) not running locally |
| validate warnings | Unknown keys ignored; move into a supported section |
Last updated: 2026-06-11 by Bryant
Security Model — Overview
Agent Assembly governs AI agents that you do not fully trust, running inside processes you do not fully control. The Security Model describes what the system protects, against whom, and how — and, just as importantly, where it refuses to place its trust.
This section is the why. For the how — concrete crates, types, and data paths — follow the cross-links into Architecture.
What the Security Model protects
An AI agent is, from a security standpoint, an attacker-shaped component: it executes language-model output, calls external tools, opens network connections, reads files, and spends money — all driven by prompts that may be adversarially crafted (prompt injection) or by a model that has been compromised or simply behaves unpredictably. The Security Model exists to keep that component inside a governed boundary. Concretely it protects:
- Tool and capability use — an agent may only invoke the tools its policy permits. Denied tool calls are refused before they execute.
- Network egress — outbound connections are constrained to an allowlist; exfiltration to an arbitrary host is blocked.
- Credentials and sensitive data — API keys, private keys, and connection strings are detected and redacted on every path before they are forwarded or persisted, so a leaked secret never lands in an upstream request or an audit record.
- Spend — per-team and per-org budgets cap how much an agent can cost; a runaway agent is denied or suspended when it exceeds its limit.
- The audit trail itself — every governed action produces a sanitized, tamper-evident record, so the system’s own evidence cannot be quietly poisoned with raw secrets or per-event noise.
Defense-in-depth philosophy
The Security Model rests on three principles, each developed in its own page.
1. Layered interception — see the action before you can govern it
To govern an action the system must first observe it. Agent Assembly
intercepts at three independent layers: the
in-process SDK shim (aa-sdk-client), the sidecar proxy (aa-proxy), and
kernel-level eBPF (aa-ebpf). They are ordered lowest-latency-first and
highest-detection-authority-last. The layers are not alternatives; they
stack, so an action that slips past one is caught by the next. Coverage is
the union of the layers you deploy.
2. The SDK is not a trust boundary — the runtime is authoritative
The fastest layer runs inside the agent’s own process, which is exactly the
component we do not trust. So the system treats SDK-side checks as
best-effort advisory only and re-does the authoritative work at a trusted
chokepoint: the runtime (aa-runtime) re-scans, re-redacts, and re-normalizes
every event unconditionally, and the gateway (aa-gateway) is the sole
source of truth for policy. This is recorded as a formal decision in
ADR 0002 and detailed in
Trust boundaries.
Invariant: nothing the SDK asserts can shorten the trusted side’s work. Position, not code, confers authority. The same aa-security scanner is advisory inside the SDK and authoritative inside aa-runtime.
3. Fail-closed by default
When the system cannot make a safe decision, it denies. An empty policy cascade
returns a fail-closed Deny (aa-gateway/src/engine/decision.rs), and a
secret-bearing field too large to fully scan is redacted whole rather than
forwarded raw (aa-runtime/src/pipeline/enforcement.rs,
OversizedPolicy::RedactWhole). See
Protection and enforcement.
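Both fail-closed behaviours can be illustrated in a few lines. This is a sketch of the principle, not the code in decision.rs or enforcement.rs: the function names, the redaction marker, and the size cap are all hypothetical values chosen for the example.

```python
from typing import Optional

REDACTED = "[REDACTED]"        # hypothetical marker
MAX_SCAN_BYTES = 64 * 1024     # hypothetical per-field scan cap

def final_decision(resolved: Optional[str]) -> str:
    """An empty cascade resolves to nothing; nothing means deny, never allow."""
    return "deny" if resolved is None else resolved

def scannable_or_redacted(field: str) -> str:
    """A field too large to scan fully is replaced whole instead of forwarded raw."""
    if len(field.encode("utf-8")) > MAX_SCAN_BYTES:
        return REDACTED
    return field
```

In both functions the unsafe outcome (allow, or forwarding unscanned data) has to be earned explicitly; the default path is the restrictive one.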
How the pages fit together
| Page | Question it answers |
|---|---|
| Threat model | What assets, adversaries, and threats are in scope? |
| Three-layer defense in depth | How do SDK, proxy, and eBPF compose so nothing slips through? |
| Protection and enforcement | How are policy, fail-closed, egress, scanning, and budgets enforced? |
| Trust boundaries | Why is the SDK untrusted and the runtime/gateway authoritative? |
| Audit and assurance | How is the audit trail kept tamper-evident and free of secrets? |
Last updated: 2026-06-11 by Chisanan232
Threat model
This page enumerates what the Security Model defends: the assets worth protecting, the adversaries who threaten them, and the concrete threats each control answers. It is specific to Agent Assembly — the system that governs AI agents at the three interception layers and a central gateway.
Assets
| Asset | Why it matters | Primary guard |
|---|---|---|
| Credentials & secrets — API keys, private keys, DB connection strings | Leaking one grants an attacker the agent’s privileges across third-party systems | aa-security credential scanner + redaction, run authoritatively in aa-runtime |
| Tool / capability surface | An unauthorized tool call is arbitrary code or data access on the agent’s behalf | Gateway policy engine (aa-gateway/src/policy/, aa-gateway/src/engine/) |
| Network egress paths | The channel through which data is exfiltrated | Network-egress policy (aa-gateway/src/policy/network.rs) + proxy MitM (aa-proxy) + eBPF SSL uprobes |
| Spend / budget | A looping or hijacked agent can burn unbounded cost | Budget control (aa-gateway/src/budget/) — deny or suspend on exceed |
| The audit trail | It is the evidence of what happened; if it can be poisoned, accountability is lost | Write-boundary sanitizer (aa-gateway/src/sanitizer/) + tamper-evident publish (aa-runtime/src/audit_publisher/) |
| Policy as source of truth | If the agent could author its own policy, governance is meaningless | Policy SoT is server-side in the gateway; the SDK cannot mutate it |
Adversaries
The model assumes the agent’s own process is not trusted. The relevant adversaries are:
- The prompt-injected agent. A legitimate, SDK-adopting agent whose model has been steered by adversarial input into making tool calls, network requests, or data accesses its operator never intended. It is cooperative at the binary level but hostile at the behavioral level.
- The compromised / modified agent. An agent whose code has been altered — including one running a patched, outdated, or stubbed SDK — that deliberately tries to suppress, weaken, or skip the in-process checks.
- The bypassing agent. An agent that ignores the SDK entirely and makes raw syscalls, opens its own sockets, or links TLS directly to avoid the fastest interception layer.
- The careless integrator. Not malicious, but ships secrets in tool arguments or LLM payloads, or wires an agent to an over-broad capability set.
Trust posture. Because the SDK runs inside an attacker-controllable process, any guarantee anchored in the SDK is not a guarantee. Security must hold even when the SDK does nothing — see Trust boundaries and ADR 0002.
Threat scenarios
- Unauthorized tool call: a prompt-injected agent invokes a tool outside its policy (e.g. terminal_exec, or read_file on /etc/...). Caught by the gateway tool allow/deny stage and, for MCP tools/call, by argument-level matching in the proxy (aa-proxy/src/intercept/mcp.rs).
- Credential / data exfiltration: the agent embeds an API key in an LLM prompt or a tool argument and sends it upstream. The aa-security scanner detects the secret (via Aho-Corasick literal patterns) and redacts it before forward and before audit, on every path.
- Network egress to an arbitrary host: the agent attempts to POST data to an attacker-controlled domain. The gateway’s network allowlist denies the request; the proxy enforces it at the wire even with no agent code change; eBPF SSL uprobes observe the plaintext if the agent bypasses both.
- Policy evasion / SDK bypass: the agent disables, stubs, or skips the SDK scanner, or asserts an “already clean” marker. Defeated structurally: there is no trust marker on the wire, and aa-runtime re-scans unconditionally. This is proven by the bypass-resistance suite (aa-runtime/tests/aaasm_2568_gate_verification.rs).
- Runaway spend: a looping agent issues unbounded LLM calls. The budget tracker denies further requests once the daily/monthly limit is met, or suspends the agent, per action_on_exceed.
- Audit poisoning / secret leakage into logs: a sender emits raw prompts, full payloads, or per-beat heartbeat noise into the audit stream. The write-boundary sanitizer strips banned keys recursively and collapses heartbeats before anything reaches audit_logs.
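The recursive key stripping mentioned in the last scenario can be sketched as follows. The banned key names here are hypothetical placeholders, since this page does not list the real set, and strip_banned is not the sanitizer's actual API.

```python
BANNED_KEYS = {"prompt", "raw_payload"}   # hypothetical examples, not the real list

def strip_banned(value):
    """Return a copy of value with banned keys removed at every nesting level."""
    if isinstance(value, dict):
        return {
            key: strip_banned(inner)
            for key, inner in value.items()
            if key not in BANNED_KEYS
        }
    if isinstance(value, list):
        return [strip_banned(item) for item in value]
    return value   # scalars pass through unchanged
```

Because the function recurses through both objects and arrays, a banned key cannot survive by being nested inside an otherwise harmless field.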
STRIDE mapping
| STRIDE category | Threat in this system | Control |
|---|---|---|
| Spoofing | Agent claims an identity / lineage it does not have to widen its policy scope | Gateway resolves agent lineage server-side from registered context; the policy cascade is keyed to it (aa-gateway/src/engine/mod.rs) |
| Tampering | Agent forges event fields or asserts “already scanned” to skip enforcement | No trust marker honored; aa-runtime re-scans every event; the sanitizer drops unknown top-level fields (aa-gateway/src/sanitizer/) |
| Repudiation | Action happens with no trustworthy record | Tamper-evident audit publish to NATS, subject assembly.audit.<tenant>.<agent> (aa-runtime/src/audit_publisher/subject.rs) |
| Information disclosure | Secrets leak via upstream requests or audit logs | aa-security scan + redact on every path; banned-key sanitizer as the final write-boundary backstop |
| Denial of service / spend exhaustion | Runaway agent burns budget or floods events | Budget deny/suspend (aa-gateway/src/budget/); fail-closed oversized-field cap bounds per-event scan cost |
| Elevation of privilege | Agent invokes a tool or egress beyond its grant | Policy engine deny stages; fail-closed empty cascade → Deny (aa-gateway/src/engine/decision.rs) |
Each control is detailed in Protection and enforcement.
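As a concrete instance of the Repudiation row, the audit subject can be assembled from its two variable tokens. This sketch is not the code in subject.rs; audit_subject is a hypothetical name, and the token validation reflects general NATS subject rules (tokens are dot-separated, and * and > are wildcards) rather than anything this page documents about the crate.

```python
def audit_subject(tenant: str, agent: str) -> str:
    """Build 'assembly.audit.<tenant>.<agent>', rejecting tokens that would alter the subject."""
    for token in (tenant, agent):
        # A dot would add a level; '*' and '>' are NATS wildcards; whitespace is not allowed.
        if not token or any(ch in token for ch in ".*> \t\r\n"):
            raise ValueError(f"invalid subject token: {token!r}")
    return f"assembly.audit.{tenant}.{agent}"
```

Keeping tenant and agent as separate tokens lets a subscriber scope its subscription to one tenant or one agent.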
Last updated: 2026-06-11 by Chisanan232
Three-layer defense in depth
To govern an action, Agent Assembly must first observe it. It does so at three independent interception layers, each catching what the layers above it might miss, and routes every observed action to one central gateway for the decision. This page explains why the layers are arranged the way they are and how they compose so an agent cannot quietly slip through. For the policy decision itself, see Protection and enforcement; for how implementation maps to crates, see Architecture.
The latency-vs-authority trade-off
The layers are ordered by a deliberate trade-off, from lowest latency to highest detection authority:
| Layer | Runs in | Crate(s) | Cost | Catches | Detection authority |
|---|---|---|---|---|---|
| 1 — SDK (in-process) | The agent’s own process | aa-sdk-client + per-language shims, aa-wasm | Lowest | What the SDK is wired into | Lowest — lives inside the untrusted process |
| 2 — Sidecar proxy | An adjacent process / sidecar | aa-proxy | Medium | Outbound HTTPS, no code change | Medium — sees only routed traffic |
| 3 — eBPF (kernel) | The Linux kernel | aa-ebpf, aa-ebpf-probes | Highest | Everything else, including bypass attempts | Highest — observes below anything the agent can reach |
The in-process SDK is the cheapest place to make a decision, but also the easiest for an agent to avoid, because it lives inside the very process we do not fully trust. The eBPF layer is the most expensive to run, but it watches from the kernel, below anything the agent can reach, so it catches actions the higher layers never saw, including deliberate attempts to bypass the SDK. Authority rises with cost: the cheaper a layer is, the less you can trust it to be present.
What each layer catches
Layer 1 — SDK shim (in-process)
The language SDKs call into a thin native shim over aa-sdk-client, which ships
events over a Unix domain socket to the runtime and applies pre-execution
allow/deny via wrapper functions. It is the fastest path and gives the richest
context (it sees the call before it happens), but it requires the agent to
adopt the SDK and can be skipped. Its security checks are advisory only —
see Trust boundaries.
Layer 2 — Sidecar proxy (aa-proxy)
The proxy terminates outbound TLS with a per-host certificate signed by a local
CA generated on first start (aa-proxy/src/tls/ca.rs), inspects the decrypted
request, and enforces network-egress and data policy at the wire — with no
change to agent code. The interceptor returns a VerdictDecision of
Forward, ForwardRedacted, Block, or AlertAndForward
(aa-proxy/src/intercept/mod.rs), and for MCP tools/call it can match on
arguments (aa-proxy/src/intercept/mcp.rs) — a precision the raw-bytes scanner
alone cannot reach. It catches egress the SDK missed, but sees only what is
routed through it.
Layer 3 — eBPF (kernel)
The kernel layer attaches uprobes to the SSL library — SSL_write (outbound
plaintext) and SSL_read entry/exit (inbound plaintext) in
aa-ebpf-probes/src/ssl_probes.rs — and tracepoints/kprobes for process exec and
file syscalls (aa-ebpf-probes/src/exec_probes.rs, aa-ebpf/src/kprobe.rs).
Because it observes at the syscall / library boundary, it sees TLS plaintext and
process activity even when the agent never adopted the SDK and never routed
through the proxy. It is the floor. It is Linux-only and needs elevated
privileges.
How the layers compose
The layers are not alternatives — they stack. A deployment runs whatever
subset fits its constraints, and because every layer reports to the same gateway
using the same audit wire format (aa-proto audit events), the gateway sees one
unified view no matter which layers produced the events. Coverage is the
union of the layers you deploy:
- the SDK handles the fast common path,
- the proxy backstops network egress without touching agent code,
- eBPF is the floor that catches what slips past both.
Run all three and an action has nowhere to hide — an attempt to evade a higher layer simply surfaces at a lower one.
graph TD
classDef agent fill:#eef2ff,stroke:#6366f1
classDef l1 fill:#eaf6ee,stroke:#3aa55b
classDef l2 fill:#fff3d6,stroke:#c98a00
classDef l3 fill:#fdecea,stroke:#d75748
classDef gw fill:#e8f1ff,stroke:#5b8def
Agent["AI agent<br/>(tool / LLM / network calls)"]:::agent
subgraph Interception["Three interception layers (union coverage)"]
L1["Layer 1 — SDK shim<br/>aa-sdk-client · in-process · lowest latency<br/><i>advisory checks only</i>"]:::l1
L2["Layer 2 — Sidecar proxy<br/>aa-proxy · MitM outbound HTTPS<br/>Forward / Redact / Block"]:::l2
L3["Layer 3 — eBPF<br/>aa-ebpf · kernel SSL uprobes + syscalls<br/>highest authority"]:::l3
end
GW["Gateway (aa-gateway)<br/>authoritative policy · budget · decision"]:::gw
RT["Runtime (aa-runtime)<br/>authoritative scan + redact"]:::gw
Audit[("Tamper-evident<br/>audit trail")]
Agent -->|"adopted SDK path"| L1
Agent -.->|"routed HTTPS"| L2
Agent -.->|"raw syscalls / TLS<br/>(bypass attempt)"| L3
L1 --> RT
L2 --> RT
L3 --> RT
RT -->|"unified audit wire format"| GW
GW --> Audit
flowchart LR
classDef catch fill:#eaf6ee,stroke:#3aa55b
classDef miss fill:#fdecea,stroke:#d75748
A["Agent action"] --> Q1{"SDK adopted<br/>& wired?"}
Q1 -->|yes| C1["Caught at Layer 1<br/>(SDK)"]:::catch
Q1 -->|"no / skipped"| Q2{"Routed<br/>through proxy?"}
Q2 -->|yes| C2["Caught at Layer 2<br/>(proxy egress)"]:::catch
Q2 -->|"no / direct socket"| Q3{"Linux + eBPF<br/>deployed?"}
Q3 -->|yes| C3["Caught at Layer 3<br/>(eBPF kernel)"]:::catch
Q3 -->|no| U["Uncovered<br/>(deploy eBPF to close)"]:::miss
The second diagram makes the composition explicit: an action only escapes governance if it evades every deployed layer. With eBPF present, the bypass path collapses to “caught at Layer 3.”
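The fall-through in the second diagram can be sketched as a small function. This is an illustration only: the `Layer` enum, the function name, and the three boolean flags are hypothetical stand-ins for deployment facts, not types from the codebase.

```rust
/// Hypothetical label for the layer that observes an action.
#[derive(Debug, PartialEq)]
pub enum Layer {
    Sdk,
    Proxy,
    Ebpf,
}

/// Walks the layers in the order the diagram shows and returns the first
/// one that would observe the action, or `None` if it is uncovered.
pub fn catching_layer(
    sdk_wired: bool,
    routed_via_proxy: bool,
    ebpf_deployed: bool,
) -> Option<Layer> {
    if sdk_wired {
        Some(Layer::Sdk)
    } else if routed_via_proxy {
        Some(Layer::Proxy)
    } else if ebpf_deployed {
        Some(Layer::Ebpf)
    } else {
        // No deployed layer saw the action.
        None
    }
}
```

With `ebpf_deployed` set to `true` the function can never return `None`, which is the "bypass path collapses" property in code form.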
Protection and enforcement
Once an action is observed (see Three-layer defense in depth), it must be decided on and, where necessary, blocked or scrubbed. This page covers the enforcement machinery: policy evaluation, fail-closed behavior, network-egress control, credential scanning & redaction, and budgets as a control. Every claim below is grounded in the gateway, runtime, and security crates; for the broader component picture see Architecture.
Policy evaluation
The gateway is the authoritative decision point. The policy engine
(aa-gateway/src/engine/mod.rs) evaluates an AgentContext + GovernanceAction
and returns a PolicyDecision — Allow, RequireApproval { reason, timeout_secs }, or Deny { reason, source_scope }
(aa-gateway/src/engine/decision.rs).
Evaluation runs as a staged pipeline. The single-policy path (evaluate_primary)
and the scoped-cascade path (evaluate_with_cascade) share the same stages:
| Stage | Check | Outcome on violation |
|---|---|---|
| 1 | Schedule / active-hours window | Deny “outside active hours” |
| 2 | Network allowlist (for NetworkRequest) | Deny “host not in network allowlist” |
| 3 | Tool allow/deny | Deny “tool denied by policy” |
| 4 | Tool rate limit | Deny “rate limit exceeded” |
| 5 | Approval condition (requires_approval_if) | RequireApproval |
| 6 | Credential / custom-pattern scan | redact in memory — never deny |
| 7 | Budget (monthly then daily) | Deny “budget exceeded” + optional SuspendAgent |
Stage 6 is notable: a credential finding redacts rather than denies, so a governed action still proceeds but the secret never travels upstream. Denial is reserved for policy, egress, rate, and budget violations.
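A minimal sketch of the stage ordering in the table, assuming each stage's check has already been reduced to a boolean. `StageInputs`, `evaluate`, and the `timeout_secs` value are hypothetical, and `Deny` is simplified here (the engine's `Deny` also carries `source_scope`). It is meant to show the short-circuit order and the fact that Stage 6 never denies, not the engine's actual code.

```rust
/// Simplified decision type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum PolicyDecision {
    Allow,
    RequireApproval { reason: String, timeout_secs: u64 },
    Deny { reason: String },
}

/// Hypothetical pre-computed result of each stage's check.
#[derive(Debug, Clone, Copy)]
pub struct StageInputs {
    pub within_active_hours: bool,
    pub host_allowed: bool,
    pub tool_allowed: bool,
    pub under_rate_limit: bool,
    pub needs_approval: bool,
    pub secret_found: bool,
    pub budget_exceeded: bool,
}

fn deny(reason: &str) -> PolicyDecision {
    PolicyDecision::Deny { reason: reason.to_string() }
}

/// Returns the decision plus a flag saying whether Stage 6 redacted anything.
pub fn evaluate(input: &StageInputs) -> (PolicyDecision, bool) {
    if !input.within_active_hours {
        return (deny("outside active hours"), false);
    }
    if !input.host_allowed {
        return (deny("host not in network allowlist"), false);
    }
    if !input.tool_allowed {
        return (deny("tool denied by policy"), false);
    }
    if !input.under_rate_limit {
        return (deny("rate limit exceeded"), false);
    }
    if input.needs_approval {
        let decision = PolicyDecision::RequireApproval {
            reason: "approval condition matched".to_string(),
            timeout_secs: 300, // illustrative value only
        };
        return (decision, false);
    }
    // Stage 6 never denies: a finding only marks the payload as redacted.
    let redacted = input.secret_found;
    if input.budget_exceeded {
        return (deny("budget exceeded"), redacted);
    }
    (PolicyDecision::Allow, redacted)
}
```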
Scoped cascade and most-restrictive-wins
When scoped policies are loaded, the engine collects a cascade of
PolicyDocuments along the agent’s lineage (Global → Org → Team → Agent) and
merges them with most-restrictive-wins semantics (merge_decisions in
aa-gateway/src/engine/decision.rs): any Deny short-circuits and wins;
otherwise the narrowest-scope RequireApproval wins; only an all-Allow cascade
returns Allow.
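The merge rule can be sketched as follows. This is an assumption-laden illustration rather than the real `merge_decisions`: the `Scope` ordering, the slice-of-pairs input shape, the tie-break between equal scopes, and the abbreviated reason string are all choices made for the example. The fail-closed result for an empty cascade follows the engine's documented behavior.

```rust
/// Scopes ordered from broadest to narrowest, so `Agent > Team > Org > Global`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Scope {
    Global,
    Org,
    Team,
    Agent,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PolicyDecision {
    Allow,
    RequireApproval { reason: String, timeout_secs: u64 },
    Deny { reason: String, source_scope: Scope },
}

/// Most-restrictive-wins merge over one decision per scope (sketch).
pub fn merge_decisions(cascade: &[(Scope, PolicyDecision)]) -> PolicyDecision {
    if cascade.is_empty() {
        // Fail closed: no policy means no permission. Wording abbreviated.
        return PolicyDecision::Deny {
            reason: "no policy (fail-closed)".to_string(),
            source_scope: Scope::Global,
        };
    }
    let mut approval: Option<(Scope, PolicyDecision)> = None;
    for (scope, decision) in cascade {
        match decision {
            // Any Deny short-circuits and wins.
            PolicyDecision::Deny { .. } => return decision.clone(),
            PolicyDecision::RequireApproval { .. } => {
                // Keep the approval coming from the narrowest scope seen so far.
                let narrower = approval.as_ref().map_or(true, |(s, _)| scope > s);
                if narrower {
                    approval = Some((*scope, decision.clone()));
                }
            }
            PolicyDecision::Allow => {}
        }
    }
    approval.map_or(PolicyDecision::Allow, |(_, d)| d)
}
```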
Fail-closed behavior
The system denies whenever it cannot make a safe decision. Two load-bearing examples:
- Empty policy cascade → Deny. merge_decisions returns a fail-closed Deny { reason: "no policy — fail-closed", source_scope: Global } for an empty cascade; it never silently allows (aa-gateway/src/engine/decision.rs).
- Unscannable field → redact whole. In the runtime enforcement stage, a secret-bearing field larger than max_field_bytes (default DEFAULT_MAX_FIELD_BYTES = 64 KiB) cannot be fully scanned, so it is replaced wholesale with OVERSIZED_MARKER = "[REDACTED:OVERSIZED]" rather than forwarded raw. This is OversizedPolicy::RedactWhole, the sole and default variant (aa-runtime/src/pipeline/enforcement.rs). The doc comment is explicit: “The runtime is a security gate, so the policy is fail-closed.”
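The oversized-field rule can be illustrated with a short sketch. The two constants take their values from the text above; the `enforce_field` function and its signature are hypothetical, and the normal scan of in-range fields is deliberately left out.

```rust
pub const DEFAULT_MAX_FIELD_BYTES: usize = 64 * 1024;
pub const OVERSIZED_MARKER: &str = "[REDACTED:OVERSIZED]";

/// Hypothetical helper: replaces a field wholesale when it is too large to
/// scan, instead of forwarding it raw.
pub fn enforce_field(field: &mut String, max_field_bytes: usize) {
    if field.len() > max_field_bytes {
        // Fail closed: an unscannable field is never forwarded as-is.
        *field = OVERSIZED_MARKER.to_string();
    }
    // A field within the limit would be scanned and redacted here (omitted).
}
```

The comparison is strict, so a field of exactly `max_field_bytes` is still treated as scannable; that boundary choice is an assumption read from the phrase "larger than".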
Null-as-no-match nuance. Inside a single policy document, an unresolvable graph variable contributes nothing to the decision: a deny condition that references it does not fire (aa-gateway/src/policy/context.rs). This is a deliberate per-clause evaluation rule (fail-open on missing context within a clause), distinct from the system-level fail-closed default that governs the absence of any policy.
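As a rough illustration of the null-as-no-match rule, consider a hypothetical deny clause that fires when a numeric graph variable exceeds a limit. The function name and the numeric comparison are assumptions for the example; only the "unresolved means no match" behavior comes from the text.

```rust
/// Hypothetical deny clause: fires when a resolved variable exceeds `limit`.
/// An unresolved variable (`None`) contributes nothing, so the clause stays quiet.
pub fn deny_clause_fires(variable: Option<i64>, limit: i64) -> bool {
    match variable {
        Some(value) => value > limit,
        None => false,
    }
}
```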
Network-egress control
Egress is enforced at two tiers. In the gateway, check_network_egress(host, policy) returns an EgressDecision against the policy’s allowlist
(aa-gateway/src/policy/network.rs); a NetworkRequest to a host outside a
non-empty allowlist is denied at Stage 2. At the wire, the proxy independently
enforces egress on decrypted traffic with no agent code change (see
Three-layer defense), and the eBPF SSL uprobes observe
egress plaintext even when the proxy is bypassed.
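A minimal sketch of the gateway-tier allowlist check, under stated assumptions: the real `check_network_egress` takes a policy rather than a bare slice, the `EgressDecision` variants shown here are invented for the example, matching is exact and case-insensitive, and an empty allowlist is read as "no restriction" because the text only denies against a non-empty one.

```rust
/// Hypothetical shape of the decision; the real variants may differ.
#[derive(Debug, PartialEq)]
pub enum EgressDecision {
    Allowed,
    Denied { host: String },
}

/// Exact-match allowlist check (sketch). Wildcard or suffix rules are not modeled.
pub fn egress_decision(host: &str, allowlist: &[&str]) -> EgressDecision {
    if allowlist.is_empty() {
        // Assumption: an empty allowlist places no restriction on egress.
        return EgressDecision::Allowed;
    }
    if allowlist.iter().any(|allowed| allowed.eq_ignore_ascii_case(host)) {
        EgressDecision::Allowed
    } else {
        EgressDecision::Denied { host: host.to_string() }
    }
}
```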
Credential scanning & redaction (aa-security)
The aa-security leaf crate is the credential-detection and redaction
primitive (extracted from aa-core per
ADR 0002, AAASM-2567). Its
CredentialScanner (aa-security/src/scanner.rs) compiles a single
Aho-Corasick automaton over literal secret prefixes and patterns, mapping
each match to a CredentialKind:
- LLM-provider keys (Anthropic, OpenAI),
- cloud keys (AKIA… AWS access key, GCP service account, Azure connection string),
- VCS tokens (ghp_ PAT, ghs_ app token),
- Slack tokens, database URLs (postgres, mysql, mongodb),
- private-key PEM blocks (RSA / EC / OpenSSH / generic / PGP).
A scan yields a ScanResult; redact() replaces each match with a
[REDACTED:<kind>] label, and the resulting Redaction
(aa-security/src/redaction.rs) stores only finding metadata — never the raw
secret value.
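The scan-then-redact idea can be sketched with the standard library alone. Note the substitution: this example uses a naive prefix search instead of an Aho-Corasick automaton, covers only two prefixes, and ends a match at the first character that is not alphanumeric or an underscore. The `Finding` struct, the `redact` signature, and the kind labels are hypothetical; only the `[REDACTED:<kind>]` label format and the metadata-only record come from the text.

```rust
/// Metadata about a match: kind and byte offset, never the secret itself.
#[derive(Debug, PartialEq)]
pub struct Finding {
    pub kind: &'static str,
    pub offset: usize,
}

/// Illustrative subset of literal prefixes, with made-up kind labels.
const PREFIXES: &[(&str, &str)] = &[("AKIA", "aws_access_key"), ("ghp_", "github_pat")];

/// Replaces each token that starts with a known prefix by a redaction label.
pub fn redact(input: &str) -> (String, Vec<Finding>) {
    let mut out = String::with_capacity(input.len());
    let mut findings = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let rest = &input[i..];
        let hit = PREFIXES.iter().find(|(prefix, _)| rest.starts_with(*prefix));
        if let Some((_, kind)) = hit {
            // The token runs until the first non-identifier character.
            let token_len = rest
                .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
                .unwrap_or(rest.len());
            findings.push(Finding { kind: *kind, offset: i });
            out.push_str(&format!("[REDACTED:{}]", kind));
            i += token_len;
        } else if let Some(ch) = rest.chars().next() {
            // Copy one character through unchanged.
            out.push(ch);
            i += ch.len_utf8();
        }
    }
    (out, findings)
}
```

The returned findings carry only a kind and an offset, which mirrors the property that the redaction record never stores the raw value.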
The same crate is wired in at every trusted point:
| Caller | Role |
|---|---|
| aa-runtime (pipeline/enforcement.rs, RuntimeScanner::enforce) | Authoritative: re-scans every event unconditionally on both the batch and the violation path |
| aa-gateway (engine/mod.rs Stage 6, audit.rs) | Scan-then-redact at evaluation and at the audit-write boundary |
| aa-proxy (intercept/mod.rs) | Wire-level scan driving Block / ForwardRedacted |
| SDK / aa-sdk-client | Advisory preflight only: best effort, never trusted |
The runtime’s RuntimeScanner holds one precompiled scanner, built once at
pipeline start and reused per event — it is never rebuilt per event — and only
the allowlisted secret-bearing fields of each Detail variant (ToolCall,
FileOp, Process) are scanned. Variants with no free-text secret fields
(LlmCall, Network, Violation, Approval) are matched explicitly with no
wildcard, so adding a new detail variant fails to compile until its
secret-bearing fields are triaged.
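The compile-time guard relies on an ordinary Rust property: a `match` with no wildcard arm must name every variant. A sketch with the variant names from the text follows; the field names (`args`, `path`, `command_line`) and the helper function are hypothetical.

```rust
/// Variant names follow the text; the fields are invented for illustration.
pub enum Detail {
    ToolCall { args: String },
    FileOp { path: String },
    Process { command_line: String },
    LlmCall,
    Network,
    Violation,
    Approval,
}

/// Returns the fields that must be scanned. There is no `_` arm, so adding a
/// new `Detail` variant makes this function fail to compile until it is listed.
pub fn secret_bearing_fields(detail: &mut Detail) -> Vec<&mut String> {
    match detail {
        Detail::ToolCall { args } => vec![args],
        Detail::FileOp { path } => vec![path],
        Detail::Process { command_line } => vec![command_line],
        Detail::LlmCall | Detail::Network | Detail::Violation | Detail::Approval => Vec::new(),
    }
}
```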
Budgets as a control
Budgets are a first-class security control against runaway spend, not merely a
cost report. The gateway’s BudgetTracker (aa-gateway/src/budget/) tracks
per-agent, per-team, and per-org daily/monthly spend. At Stage 7 the engine
checks monthly then daily limits; on exceed it returns a Deny whose
side-effect is driven by the policy’s action_on_exceed:
- ActionOnExceed::Deny: refuse the individual request, keep the agent active;
- ActionOnExceed::Suspend: attach DenyAction::SuspendAgent so the service layer suspends the agent.
The default when action_on_exceed is absent is Deny
(aa-gateway/src/policy/validator.rs).
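A sketch of the Stage 7 check, assuming spend and limits are tracked in integer cents and that "exceed" means strictly greater than the limit. `BudgetOutcome`, `check_budget`, and its parameters are hypothetical; the two `ActionOnExceed` variants, `DenyAction::SuspendAgent`, the monthly-then-daily order, and the `Deny` default come from the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum ActionOnExceed {
    #[default]
    Deny,
    Suspend,
}

#[derive(Debug, PartialEq)]
pub enum DenyAction {
    SuspendAgent,
}

/// Hypothetical result type for the budget stage.
#[derive(Debug, PartialEq)]
pub enum BudgetOutcome {
    WithinBudget,
    Deny { reason: &'static str, action: Option<DenyAction> },
}

/// Checks the monthly limit first, then the daily limit (amounts in cents).
pub fn check_budget(
    monthly_spent: u64,
    monthly_limit: u64,
    daily_spent: u64,
    daily_limit: u64,
    action_on_exceed: Option<ActionOnExceed>,
) -> BudgetOutcome {
    let exceeded = monthly_spent > monthly_limit || daily_spent > daily_limit;
    if !exceeded {
        return BudgetOutcome::WithinBudget;
    }
    // An absent `action_on_exceed` falls back to plain Deny.
    let action = match action_on_exceed.unwrap_or_default() {
        ActionOnExceed::Deny => None,
        ActionOnExceed::Suspend => Some(DenyAction::SuspendAgent),
    };
    BudgetOutcome::Deny { reason: "budget exceeded", action }
}
```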
Decision flow
flowchart TD
classDef gw fill:#e8f1ff,stroke:#5b8def
classDef deny fill:#fdecea,stroke:#d75748
classDef redact fill:#fff3d6,stroke:#c98a00
classDef allow fill:#eaf6ee,stroke:#3aa55b
A["GovernanceAction + AgentContext"]:::gw --> Casc{"Policy cascade<br/>empty?"}
Casc -->|yes| FC["Deny — fail-closed<br/>'no policy'"]:::deny
Casc -->|no| S1["1 Schedule"]:::gw --> S2["2 Network allowlist"]:::gw
S2 --> S3["3 Tool allow/deny"]:::gw --> S4["4 Rate limit"]:::gw
S4 --> S5{"5 Requires<br/>approval?"}
S5 -->|yes| RA["RequireApproval"]:::redact
S5 -->|no| S6["6 Credential scan<br/>(aa-security)"]:::redact
S6 -->|finding| RED["Redact in memory<br/>[REDACTED:kind] — proceed"]:::redact
S6 -->|clean| S7{"7 Budget<br/>exceeded?"}
RED --> S7
S7 -->|"yes / Deny"| BD["Deny 'budget exceeded'"]:::deny
S7 -->|"yes / Suspend"| SUS["Deny + SuspendAgent"]:::deny
S7 -->|no| OK["Allow"]:::allow
S1 -.->|outside hours| D1["Deny"]:::deny
S2 -.->|host not allowed| D2["Deny"]:::deny
S3 -.->|tool denied| D3["Deny"]:::deny
S4 -.->|over limit| D4["Deny"]:::deny
Trust boundaries
The single most important decision in Agent Assembly’s Security Model is where it places trust. The answer is recorded formally in ADR 0002 — SDK Security Boundary: the SDK is not a trust boundary; the runtime and gateway are authoritative. This page explains why, and how that decision is made bypass-resistant.
Why the SDK is not a trust boundary
The fastest interception layer, the SDK, runs inside the agent’s own process, which is exactly the component the model does not trust (see the threat model). An attacker who controls the agent can swap in a modified, outdated, or stubbed SDK. Therefore any guarantee anchored in the SDK is not a guarantee at all: security must hold even when the SDK does nothing.
ADR 0002 audited the prior state and found enforcement in the wrong place — the
only credential scan on the SDK fast-path was inside the SDK binding itself,
while the trusted runtime forwarded the SDK’s payload without independently
scanning it. A removed or bypassed SDK scanner would let raw secrets flow
SDK → runtime → gateway. The decision reversed this:
| Concern | Decision |
|---|---|
| Is the SDK a security boundary? | No. The SDK is untrusted. |
| Authoritative enforcement point | aa-runtime — scans, redacts, normalizes every event before forward/audit, unconditionally. |
| Source of truth | Gateway / control-plane — policy SoT; audit-write sanitizer as final backstop. |
| SDK-side detection | Best-effort advisory preflight only. No clean / already_scanned marker exists on the wire, and none is honored. |
Invariant. Nothing the SDK asserts can shorten the runtime’s work. The same
aa-security scanner is advisory inside the SDK and authoritative inside aa-runtime. Position, not code, confers authority.
Why the runtime / gateway is authoritative
The SDK event fast-path is SDK → UDS → aa-runtime → gRPC → gateway. The runtime
is the mandatory chokepoint: every event must pass through it. So that is
where the authoritative work lives. RuntimeScanner::enforce
(aa-runtime/src/pipeline/enforcement.rs) scans, redacts, and normalizes every
secret-bearing field, and the pipeline run() loop
(aa-runtime/src/pipeline/mod.rs) calls it before any forward or audit, on
every path — both the batched path and the immediate policy-violation path.
Its doc comment states the rule directly:
Runs unconditionally — no field of the event can request that scanning be skipped, and there is no SDK trust marker on the wire.
The gateway, in turn, is the source of truth for policy (the agent cannot author its own policy) and applies a final write-boundary sanitizer before anything is persisted.
Bypass resistance — eBPF catches what the SDK and proxy miss
Trust placement answers “what if the SDK lies?” Bypass resistance answers
“what if the agent skips the SDK entirely?” The
three-layer model is the structural answer: an action that evades a
higher layer simply surfaces at a lower one. An agent that skips the SDK and the
proxy and links TLS directly is still observed by the eBPF SSL_write /
SSL_read uprobes and exec/file syscall hooks
(aa-ebpf-probes/src/ssl_probes.rs, aa-ebpf-probes/src/exec_probes.rs),
because the kernel sits below anything the agent can reach.
This is verified, not asserted. The bypass-resistance suite drives the public
aa_runtime::pipeline::run loop end-to-end and proves every inbound event is
scanned + redacted before forward/audit on both paths, with the raw secret
never leaving the runtime regardless of SDK behavior
(aa-runtime/tests/aaasm_2568_gate_verification.rs). The “no trust marker” guard
is partly compile-time — the exhaustive, wildcard-free match over Detail
variants forces any new secret-bearing field to be triaged before it compiles.
Trust-boundary diagram
flowchart LR
classDef untrusted fill:#fdecea,stroke:#d75748,stroke-dasharray: 4 3
classDef trusted fill:#eaf6ee,stroke:#3aa55b
classDef sot fill:#e8f1ff,stroke:#5b8def
subgraph U["UNTRUSTED — agent-controllable process"]
SDK["Python / Node / Go SDK<br/>+ aa-sdk-client shim<br/><i>advisory preflight only</i>"]:::untrusted
end
subgraph T["TRUSTED ENFORCEMENT"]
RT["aa-runtime<br/>mandatory chokepoint<br/>scan · redact · normalize<br/><b>unconditional</b>"]:::trusted
PX["aa-proxy<br/>wire egress + scan"]:::trusted
BPF["aa-ebpf<br/>kernel uprobes / syscalls<br/>bypass floor"]:::trusted
end
subgraph S["SOURCE OF TRUTH"]
GW["aa-gateway<br/>policy SoT · budget<br/>audit-write sanitizer"]:::sot
end
SDK -->|"UDS · no trust marker"| RT
PX --> RT
BPF --> RT
RT -->|"gRPC"| GW
%% the boundary line
SDK -. "trust boundary" .-> RT
Everything left of the runtime is untrusted and can only advise; everything from the runtime rightward is authoritative. The dashed edge is the trust boundary itself — the SDK’s assertions stop there. See ADR 0002 for the full decision record and the boundary-first migration order that ensured SDK-side scanning was never removed before the runtime became authoritative.
Audit and assurance
Governance is only credible if there is a trustworthy record of what happened. Agent Assembly’s audit pipeline is designed so that the trail is free of secrets, tamper-evident, and supports non-repudiation — even when an upstream sender (an SDK, a proxy, an eBPF probe) emits something it should not. This page covers the write-boundary sanitizer, redaction, and the publish path. For where audit sits in the wider system, see Architecture.
The write-boundary sanitizer
Every audit event the gateway is about to persist passes first through
sanitize (aa-gateway/src/sanitizer/). The module’s own description states the
principle: “The sender is the first line of defense; this module is the last.”
It never trusts the inbound shape — it operates on the untyped JSON tree as
received and:
- strips banned keys recursively at any depth,
- drops unknown top-level fields, counting them so a newly-emitting sender is noticed (a drift signal), and
- collapses heartbeats into a single “last seen” update on the agent row instead of writing a per-beat record.
The four classes of “never store” data are removed regardless of what any
upstream emits: raw LLM prompts/completions, full tool-call payloads, eBPF
packet bodies, and per-heartbeat sequence records. The BANNED_KEYS list
(aa-gateway/src/sanitizer/rules.rs) is deliberately a superset — defense in
depth means erring toward dropping — and includes prompt, completion,
llm_input, llm_output, tool_payload, tool_response, tool_args,
tool_result, packet_body, packet_payload, and heartbeat_seq.
The sanitizer returns a SanitizeOutcome — either an Audit(SanitizedAuditEvent)
to persist, or a HeartbeatUpdate to fold into the agent’s “last seen” field
(aa-gateway/src/sanitizer/event.rs). The SanitizedAuditEvent type is a
constructor-guarded wrapper, so a value can only exist after it has been
through the banned-key pass.
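The recursive strip and the constructor-guarded wrapper can be sketched together. Everything here is a simplification: `Value` is a tiny stand-in for an untyped JSON tree, the banned-key list is a subset, and the unknown-field counting and heartbeat collapsing are omitted. The function and method names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for an untyped JSON tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Illustrative subset of the banned keys named in the text.
const BANNED_KEYS: &[&str] = &["prompt", "completion", "tool_payload", "packet_body", "heartbeat_seq"];

/// Removes banned keys at any depth.
fn strip_banned(value: &mut Value) {
    match value {
        Value::Object(map) => {
            map.retain(|key, _| !BANNED_KEYS.contains(&key.as_str()));
            for child in map.values_mut() {
                strip_banned(child);
            }
        }
        Value::List(items) => {
            for item in items.iter_mut() {
                strip_banned(item);
            }
        }
        Value::Text(_) => {}
    }
}

/// The inner field is private, so code outside this module can only obtain a
/// value through `sanitize`, that is, after the banned-key pass has run.
#[derive(Debug)]
pub struct SanitizedAuditEvent(Value);

impl SanitizedAuditEvent {
    pub fn as_value(&self) -> &Value {
        &self.0
    }
}

pub fn sanitize(mut raw: Value) -> SanitizedAuditEvent {
    strip_banned(&mut raw);
    SanitizedAuditEvent(raw)
}
```

Encoding the guarantee in the type means a persistence function that accepts only `SanitizedAuditEvent` cannot be handed an unsanitized tree by mistake.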
Redaction: secrets never reach the record
The sanitizer removes whole banned containers; the aa-security scanner removes
secrets that appear inside otherwise-legitimate fields. Both run on the audit
path. At the gateway audit-write boundary (aa-gateway/src/audit.rs) the
CredentialScanner detects a secret and redact() replaces it with a
[REDACTED:<kind>] label; the resulting Redaction
(aa-security/src/redaction.rs) stores only finding metadata — kind and
offset — never the raw value. Combined with the runtime’s authoritative
re-scan (see Protection and enforcement), a secret is
redacted before forward and again before persist, so it never lands in
audit_logs.
Tamper-evidence and non-repudiation
Audit events are published off the runtime via the NATS audit publisher
(aa-runtime/src/audit_publisher/). Each entry is published to a structured,
tenant- and agent-scoped subject derived by subject_for
(aa-runtime/src/audit_publisher/subject.rs):
assembly.audit.<tenant>.<agent>
where <tenant> is the entry’s org id (falling back to team id, then
default) and <agent> is the agent id rendered as a hyphenated UUID. Scoping
every record to an immutable tenant+agent identity means a record cannot be
silently reattributed, and routing through a durable message bus separates the
production of audit evidence (the runtime, which an agent cannot reach into)
from its consumption (the gateway/storage), so the trail is not rewritable
by the governed party. This separation, plus the constructor-guarded sanitized
type and metadata-only redaction, is what makes the record non-repudiable:
the governed action and its decision are recorded by trusted components, with no
path for the agent to alter or suppress its own history.
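The subject derivation can be sketched as below. The `AuditEntrySketch` struct, its field types, and the lowercase hex rendering are assumptions for the example; the `assembly.audit.<tenant>.<agent>` shape and the org → team → `default` fallback come from the text.

```rust
/// Hypothetical, trimmed-down audit entry.
pub struct AuditEntrySketch {
    pub org_id: Option<String>,
    pub team_id: Option<String>,
    pub agent_id: [u8; 16],
}

/// Renders 16 bytes as a hyphenated UUID string (8-4-4-4-12 hex digits).
fn hyphenated(id: &[u8; 16]) -> String {
    let hex: String = id.iter().map(|byte| format!("{:02x}", byte)).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// Builds the tenant- and agent-scoped subject for one entry.
pub fn subject_for(entry: &AuditEntrySketch) -> String {
    let tenant = entry
        .org_id
        .as_deref()
        .or(entry.team_id.as_deref())
        .unwrap_or("default");
    format!("assembly.audit.{}.{}", tenant, hyphenated(&entry.agent_id))
}
```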
End-to-end audit data flow
flowchart TD
classDef src fill:#eef2ff,stroke:#6366f1
classDef trusted fill:#eaf6ee,stroke:#3aa55b
classDef guard fill:#fff3d6,stroke:#c98a00
classDef store fill:#e8f1ff,stroke:#5b8def
SDK["SDK (advisory)"]:::src
PX["aa-proxy"]:::src
BPF["aa-ebpf"]:::src
RT["aa-runtime pipeline<br/>RuntimeScanner::enforce<br/>scan · redact · normalize<br/><b>unconditional</b>"]:::trusted
PUB["audit_publisher<br/>subject assembly.audit.<tenant>.<agent>"]:::trusted
BUS[["NATS bus<br/>(durable, append-oriented)"]]:::trusted
SAN["Gateway sanitizer<br/>strip BANNED_KEYS (recursive)<br/>drop unknown top-level (counted)<br/>collapse heartbeats"]:::guard
RED["aa-security redaction<br/>[REDACTED:kind] · metadata only"]:::guard
HB["agents.last_heartbeat<br/>update"]:::store
LOG[("audit_logs<br/>secret-free, attributed")]:::store
SDK --> RT
PX --> RT
BPF --> RT
RT --> PUB --> BUS --> SAN
SAN -->|"Audit(SanitizedAuditEvent)"| RED --> LOG
SAN -->|"HeartbeatUpdate"| HB
The record that reaches audit_logs has passed an authoritative redaction in
the runtime, a recursive banned-key strip in the sanitizer, and a final
metadata-only credential redaction — and is bound to an immutable tenant+agent
subject. No single compromised or careless sender can defeat the trail.
Architecture
This chapter is the engineering map of agent-assembly — the open-source core
that governs AI agents by intercepting their actions at three independent layers
and routing every action through one central gateway.
It is written for contributors and integrators who want to understand how the system is built, not just how to operate it. For the system-level overview, see System architecture; for the security rationale, see the Security Model.
Pages in this chapter
- System architecture — the big picture: the 28 workspace crates, the three interception layers, the gateway / API / runtime / storage split, and the gRPC / HTTP / UDS transport topology, with a mermaid system diagram.
- Component deep-dives — a per-crate tour of responsibilities, key types, and dependencies: gateway, policy engine, budgets, runtime, the three interception crates, API, CLI, foundation crates, storage, and cache.
- Key workflows — policy evaluation, agent registration, budget tracking & rollup, and the interception/enforcement path, each as a mermaid sequence or flow diagram grounded in the real code path.
- Data flows — how an intercepted event travels from a layer through the gateway, the policy engine, and the write-boundary sanitizer into durable, tamper-evident storage.
- Building & contributing — build, test, and lint basics for working on the workspace.
The model in one diagram
flowchart LR
Agent[AI agent] --> Layers["3 interception layers<br/>SDK · proxy · eBPF"]
Layers --> RT["aa-runtime<br/>chokepoint"]
RT -->|gRPC :50051| GW["aa-gateway<br/>policy · budget · audit"]
GW --> Store[("storage")]
GW --> API["aa-api<br/>HTTP :7700"]
API --> Dash["dashboard / tooling"]
Start with System architecture.
System architecture
This page is the big-picture map of agent-assembly: the workspace crates, how
the three interception layers feed one central gateway, and which transport
each component speaks. Read it first; the component deep-dives,
key workflows, and data flows pages zoom into
each piece.
For the trust-boundary view of the same system — what each layer is trusted to do and where the authoritative checks live — see the Security Model.
The one-sentence model
Agents act; the three interception layers observe those actions and forward them to the gateway; the gateway evaluates policy, tracks budgets, and writes an audit record before returning allow or deny.
The gateway is the single decision-maker. The interception layers differ only in
where they sit and how easily an agent can bypass them; they all converge on the same
protobuf wire format defined in aa-proto and the same PolicyService RPC.
Workspace at a glance
The Cargo workspace declares 28 member crates in the top-level
Cargo.toml.
They group into a handful of architectural roles:
| Role | Crates | What they own |
|---|---|---|
| Foundation | aa-core, aa-proto, aa-security | Domain types (AgentId, AuditEntry, policy types), the gRPC/protobuf wire schema, and the credential scanner / redaction primitives. |
| Storage | aa-storage, aa-storage-memory, aa-storage-postgres, aa-storage-redis, aa-storage-sqlite-buffer, aa-cache | Storage trait facade + pluggable drivers, plus the in-process L1 cache. |
| Runtime / interception | aa-runtime, aa-ebpf, aa-ebpf-common, aa-proxy, aa-sdk-client, aa-wasm, aa-sandbox | The per-agent runtime chokepoint, the kernel/proxy/SDK interception layers, the FFI-agnostic SDK client, and the WASM tool sandbox. |
| Control plane | aa-gateway, aa-api, aa-cli | The governance gateway (gRPC), the HTTP/OpenAPI read API, and the aasm operator CLI. |
| Dev-tool adapters | aa-devtool, aa-devtool-claude-code, aa-devtool-codex, aa-devtool-copilot, aa-devtool-windsurf, aa-devtool-saas, plus the examples/aa-devtool-sample-myeditor sample | Adapters that wire common AI dev tools into the governance fabric. |
| Test / conformance | conformance, aa-integration-tests | The cross-crate trait conformance harness and the end-to-end integration suite. |
Two further eBPF crates — aa-ebpf-probes and aa-ebpf-programs — live
alongside the workspace but are intentionally out of workspace: they compile
for the bpfel-unknown-none BPF target and are built by aa-ebpf’s build.rs
via aya-build, so they cannot be selected with cargo -p.
The per-language SDK shims (Python / Node / Go) do not live in this
monorepo. They wrap aa-sdk-client and consume it via a pinned git SHA from the
sibling python-sdk / node-sdk / go-sdk repositories.
Crate / component map
The diagram highlights the core architectural crates; storage drivers,
dev-tool adapters, and test harnesses are folded into summary nodes for clarity.
Edges follow real path dependencies in each crate’s Cargo.toml.
graph TD
classDef foundation fill:#e8f1ff,stroke:#5b8def
classDef storage fill:#eef6ff,stroke:#5b8def
classDef ebpf fill:#fdecea,stroke:#d75748
classDef ffi fill:#eaf6ee,stroke:#3aa55b
classDef control fill:#fff3d6,stroke:#c98a00
classDef outOfWorkspace fill:#fdecea,stroke:#d75748,stroke-dasharray: 5 3
%% Foundation
aa_proto[aa-proto<br/><i>wire schema</i>]:::foundation
aa_core[aa-core<br/><i>domain types</i>]:::foundation
aa_security[aa-security<br/><i>scanner / redaction</i>]:::foundation
%% Storage
aa_storage[aa-storage<br/><i>trait facade</i>]:::storage
aa_cache[aa-cache<br/><i>L1 cache</i>]:::storage
storage_drivers["aa-storage-{memory,postgres,<br/>redis,sqlite-buffer}"]:::storage
%% Interception / runtime
aa_runtime[aa-runtime<br/><i>per-agent chokepoint</i>]:::ffi
aa_sdk_client[aa-sdk-client<br/><i>FFI-agnostic client</i>]:::ffi
aa_wasm[aa-wasm]:::ffi
aa_sandbox[aa-sandbox<br/><i>WASI tool sandbox</i>]:::ffi
aa_proxy[aa-proxy<br/><i>L2 sidecar</i>]:::ebpf
aa_ebpf[aa-ebpf<br/><i>L3 kernel</i>]:::ebpf
aa_ebpf_common[aa-ebpf-common]:::ebpf
aa_probes["aa-ebpf-probes /<br/>aa-ebpf-programs<br/><i>out-of-workspace BPF</i>"]:::outOfWorkspace
%% Control plane
aa_gateway[aa-gateway<br/><i>gRPC 50051</i>]:::control
aa_api[aa-api<br/><i>HTTP / OpenAPI</i>]:::control
aa_cli[aa-cli<br/><i>aasm</i>]:::control
aa_core --> aa_security
aa_storage --> aa_core
aa_cache --> aa_core
storage_drivers --> aa_storage
aa_runtime --> aa_core
aa_runtime --> aa_proto
aa_runtime --> aa_ebpf
aa_sdk_client --> aa_proto
aa_sdk_client -. preflight .-> aa_security
aa_wasm --> aa_core
aa_ebpf --> aa_core
aa_ebpf --> aa_ebpf_common
aa_probes --> aa_ebpf_common
aa_proxy --> aa_core
aa_proxy --> aa_proto
aa_proxy --> aa_runtime
aa_proxy --> aa_sandbox
aa_gateway --> aa_core
aa_gateway --> aa_proto
aa_gateway --> aa_runtime
aa_gateway --> aa_storage
aa_gateway --> aa_cache
aa_api --> aa_core
aa_api --> aa_gateway
aa_api --> aa_runtime
aa_cli --> aa_core
aa_cli --> aa_gateway
aa-core and aa-proto are the two foundation crates everything else builds on:
aa-core holds the Rust domain model and the storage traits, aa-proto holds
the protobuf schema that crosses every process boundary. aa-security sits
beneath aa-core as the leaf scanner crate.
How the layers, gateway, API, runtime, and storage fit together
flowchart TB
subgraph agent_host["Agent host"]
Agent[AI agent process]
subgraph layers["Three interception layers"]
L1["L1 — In-process SDK<br/>(aa-sdk-client shims, aa-wasm)"]
L2["L2 — Sidecar proxy<br/>(aa-proxy)"]
L3["L3 — eBPF<br/>(aa-ebpf, kernel)"]
end
RT["aa-runtime<br/>per-agent chokepoint"]
end
subgraph control["Control plane"]
GW["aa-gateway<br/>registry · policy · budget · audit"]
API["aa-api<br/>HTTP / OpenAPI read API"]
end
subgraph persistence["Storage"]
STORE[("aa-storage drivers<br/>memory / postgres / redis / sqlite-buffer")]
end
Dash["Dashboard / operators"]
CLI["aasm CLI"]
Agent --> L1 & L2 & L3
L1 -->|UDS IpcFrame| RT
L2 -->|forward| RT
L3 -->|ring buffer| RT
RT -->|gRPC PolicyService.CheckAction<br/>:50051| GW
GW --> STORE
API --> GW
Dash -->|HTTP / WS| API
CLI -->|gRPC| GW
- The interception layers are deployment-independent: a deployment can run any subset (SDK only, SDK + proxy, all three). Each layer turns an agent action into an event in the aa-proto schema.
- aa-runtime is the per-agent chokepoint. Because the SDK is untrusted, the runtime re-scans every event (the enforcement stage in aa-runtime/src/pipeline/enforcement.rs) before forwarding it.
- aa-gateway is the brain. It hosts the agent registry, the policy engine, per-team budgets, and the audit pipeline, and it serves gRPC on :50051.
- aa-api depends on aa-gateway in-process and re-exposes its read surfaces over HTTP with an OpenAPI schema (via utoipa) for the dashboard and tooling.
- Storage is a pluggable trait facade (aa-storage) with swappable drivers, fronted by an in-process L1 cache (aa-cache).
Transport topology
Every cross-process message rides one of three transports. All gRPC and
Unix-socket payloads share the aa-proto schema.
flowchart LR
SDK["SDK shim<br/>(aa-sdk-client)"] -- "UDS IpcFrame" --> RT["aa-runtime"]
RT -- "gRPC :50051" --> GW["aa-gateway"]
PROXY["aa-proxy"] -- "gRPC :50051" --> GW
EBPF["aa-ebpf"] -- "ring buffer → events" --> RT
GW -- "in-process dep" --> API["aa-api"]
DASH["Dashboard"] -- "HTTP / OpenAPI :7700" --> API
CLI["aasm CLI"] -- "gRPC :50051" --> GW
| Transport | Default endpoint | Carries | Who speaks it |
|---|---|---|---|
| gRPC | 127.0.0.1:50051 (TCP) or UDS | PolicyService, AuditService, AgentLifecycleService, TopologyService, ApprovalService, SecretsService, InvalidationService | aa-runtime, aa-proxy, aa-cli → aa-gateway |
| HTTP / OpenAPI | 127.0.0.1:7700 (AA_API_ADDR) | Read APIs: registry, topology, audit, costs, alerts, traces | Dashboard / tooling → aa-api |
| Unix domain socket (UDS) | per-agent socket | IpcFrame events from the in-process SDK | SDK shim → aa-runtime |
The seven gRPC services are registered together in
aa-gateway/src/server.rs;
the gateway can serve them over either TCP (serve_tcp) or a Unix socket
(serve_uds). The default gRPC listen address is 127.0.0.1:50051; the HTTP API
default bind is 127.0.0.1:7700 (constant DEFAULT_ADDR in
aa-api/src/config.rs,
overridable via AA_API_ADDR).
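The default-plus-override behaviour described above can be sketched as follows. This is a minimal illustration, not the code in aa-api/src/config.rs: the function name `resolve_api_addr` is hypothetical, and returning an error on a malformed override (rather than falling back to the default) is an assumption.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Default HTTP bind address quoted in this chapter.
const DEFAULT_ADDR: &str = "127.0.0.1:7700";

/// Resolve the API bind address.
/// `override_value` stands in for the contents of the AA_API_ADDR variable;
/// `None` means the variable is unset, so the default applies.
/// Assumption: a malformed override is reported as an error, not ignored.
fn resolve_api_addr(override_value: Option<&str>) -> Result<SocketAddr, AddrParseError> {
    override_value.unwrap_or(DEFAULT_ADDR).trim().parse()
}
```

A caller would typically pass `std::env::var("AA_API_ADDR").ok().as_deref()` as the argument.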
Where to go next
- Component deep-dives — per-crate responsibilities, key types, and dependencies.
- Key workflows — policy evaluation, agent registration, budget rollup, and the enforcement path as sequence diagrams.
- Data flows — how an intercepted event travels from a layer through the gateway to the audit log and storage.
- Security Model — the same system viewed through trust boundaries and defense-in-depth.
Last updated: 2026-06-11 by Chisanan232
Component deep-dives
This page walks the major crates one by one: what each owns, its key types, and who it depends on. For the bird’s-eye map and the dependency diagram, start with System architecture.
All paths link into the
master tree on GitHub.
aa-gateway — the governance brain
aa-gateway is the central decision-maker. It hosts the agent registry, the
policy engine, per-team budgets, the audit pipeline, approvals, anomaly
detection, and the seven gRPC services. Its module tree is large; the load-bearing
sub-modules are:
| Module | Responsibility |
|---|---|
| registry/ | Agent registry: AgentRecord / AgentRegistry backed by DashMap, lineage, orphan handling, token issuance, storage bridge. |
| policy/ | The policy engine (parse → validate → compile → evaluate). See below. |
| budget/ | Per-agent and per-team spend tracking, pricing tables, and rollup. See below. |
| engine/ | Decision caching, rate limiting, scope index, and the policy file watcher. |
| service/ | gRPC service impls: policy_service, audit_service, lifecycle_service, topology_service, approval_service, secrets_service. |
| audit.rs, audit_consumer.rs, audit_reader.rs | The audit write path (AuditWriter), the NATS JetStream consumer, and the read API. |
| sanitizer/ | The write-boundary sanitize() pass that drops “never store” data before persistence. |
| invalidation/ | The push-invalidation hub that broadcasts policy/approval changes to subscribers. |
| anomaly/, approval/, edges/, iam/, secrets/, ops/ | Anomaly baselines + responder, human-in-the-loop approvals, cross-team edge tracking, IAM, secret dispatch, and in-flight ops. |
| server.rs | Registers all seven services and serves over TCP (serve_tcp) or UDS (serve_uds). |
Key types: AgentRecord, AgentRegistry, AgentStatus
(registry/store.rs).
Depends on: aa-core, aa-proto, aa-runtime, aa-storage, aa-cache.
Serves: gRPC on 127.0.0.1:50051.
The policy engine (aa-gateway/src/policy/)
The engine turns a YAML/TOML policy bundle into a decision. Entry point is
validator::PolicyValidator::from_yaml.
| Module | Role |
|---|---|
| raw.rs | Deserialise the policy bundle (raw, untyped shape). |
| validator.rs | Structural validation → PolicyValidator, PolicyValidatorOutput. |
| expr.rs | Compile rule predicates into a typed expression tree. |
| document.rs | The evaluated PolicyDocument and its scoped policies (ToolPolicy, NetworkPolicy, BudgetPolicy, DataPolicy, SchedulePolicy). |
| scope.rs | PolicyScope plus OrgId / TeamId: the org → team → agent → tool cascade. |
| network.rs | check_network_egress → EgressDecision for L2 proxy egress checks. |
| rbac.rs | required_role_for, CallerRole, MutationKind: who may mutate which scope. |
| history/, context.rs, error.rs | Version history, evaluation context, and the PolicyParseError / ValidationError types. |
The evaluation flow is detailed on the Key workflows page.
Budgets (aa-gateway/src/budget/)
| Module | Role |
|---|---|
| tracker.rs | BudgetTracker: per-agent / per-team / global spend, daily + monthly windows, alert thresholds at 80 % / 95 %. |
| pricing.rs | PricingTable: per-model cost tables used to price an action. |
| rollup.rs | BudgetRollup / BudgetRow: composes agent / team / org / subtree rows for the dashboard, SDK, and CLI. |
| persistence.rs, types.rs | Durable budget state and the BudgetAlert / BudgetState / BudgetWindow types. |
A request that would breach a budget downgrades from allow to deny. See budget tracking & rollup.
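The two budget behaviours in this section, threshold alerts and the allow-to-deny downgrade, can be sketched as below. This is a hedged illustration under stated assumptions, not the BudgetTracker implementation: the type and function names are hypothetical, amounts are integer micro-dollars, "breach" is taken to mean strictly exceeding the limit, and a limit of zero is treated as "no limit configured".

```rust
/// Alert level for one spend window, using the 80 % / 95 % thresholds above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AlertLevel {
    Normal,
    Warning,  // at or above 80 % of the limit
    Critical, // at or above 95 % of the limit
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Allow,
    Deny,
}

/// Classify current spend against a limit (both in micro-dollars).
/// Assumption: a limit of 0 means "no limit configured".
fn alert_level(spent: u64, limit: u64) -> AlertLevel {
    if limit == 0 {
        return AlertLevel::Normal;
    }
    // Integer form of spent / limit >= 0.95 (and 0.80); u128 avoids overflow.
    let (spent, limit) = (spent as u128, limit as u128);
    if spent * 100 >= limit * 95 {
        AlertLevel::Critical
    } else if spent * 100 >= limit * 80 {
        AlertLevel::Warning
    } else {
        AlertLevel::Normal
    }
}

/// Downgrade an otherwise-allowed request to Deny if its cost would
/// push the window past the limit. A Deny is never upgraded.
fn apply_budget(verdict: Verdict, spent: u64, cost: u64, limit: u64) -> Verdict {
    match verdict {
        Verdict::Allow if spent.saturating_add(cost) > limit => Verdict::Deny,
        other => other,
    }
}
```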
aa-runtime — the per-agent chokepoint
aa-runtime sits between an agent’s interception layers and the gateway. It is
the mandatory chokepoint on the SDK fast-path (SDK → UDS → runtime → gateway).
Because the SDK is untrusted, the runtime re-scans every event before forwarding.
| Module | Role |
|---|---|
| layer.rs | LayerDetector / LayerSet bitflags: detects which of eBPF / proxy / SDK layers are active at startup. |
| ipc/ | UDS server, length-prefixed IpcFrame codec, and the ResponseRouter. |
| pipeline/ | Event aggregation: receive IpcFrames, enrich, batch, fan out; the enforcement.rs scan/redact stage; metrics.rs. |
| pipeline/enforcement.rs | The authoritative scan/redact stage: fail-closed, oversized fields redacted whole, no already_scanned wire marker is honoured. |
| gateway_client.rs | Optional gRPC PolicyServiceClient forwarding CheckAction to the gateway. |
| ebpf_bridge.rs | Bridges eBPF ring-buffer events into the pipeline. |
| l1_cache.rs, policy.rs | Local policy cache + PolicyRules for offline / local-mode decisions. |
| approval.rs, approval_sink.rs | Approval queue and the wait_for_approval sink (timeout ⇒ Decision::Pending). |
| invalidation_client.rs | Subscribes to the gateway’s push-invalidation stream. |
| audit_publisher/, correlation/, health/ | NATS audit publishing, correlation IDs, and health checks. |
Key types: LayerSet, EnforcementConfig, PipelineEvent, EnrichedEvent.
Depends on: aa-core, aa-proto, aa-ebpf.
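The table above mentions a length-prefixed IpcFrame codec. The general technique can be sketched as follows; the concrete layout here (a 4-byte big-endian length followed by the payload) and the 1 MiB cap are assumptions for illustration, not the actual IpcFrame wire format, and both function names are hypothetical.

```rust
/// Illustrative cap on a single frame (an assumption, not a documented limit).
const MAX_FRAME_BYTES: usize = 1024 * 1024;

/// Encode a payload as: 4-byte big-endian length, then the payload bytes.
/// Returns None if the payload exceeds the cap.
fn encode_frame(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > MAX_FRAME_BYTES {
        return None;
    }
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Try to decode one frame from the front of `buf`.
/// Ok(Some((payload, consumed))) when a whole frame is available,
/// Ok(None) when more bytes are needed, Err when the declared length
/// exceeds the cap (the reader should drop the connection).
fn decode_frame(buf: &[u8]) -> Result<Option<(Vec<u8>, usize)>, String> {
    if buf.len() < 4 {
        return Ok(None);
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len > MAX_FRAME_BYTES {
        return Err(format!("frame of {len} bytes exceeds cap"));
    }
    if buf.len() < 4 + len {
        return Ok(None);
    }
    Ok(Some((buf[4..4 + len].to_vec(), 4 + len)))
}
```

Checking the declared length before allocating keeps an untrusted peer from forcing a large allocation with a forged prefix.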
The three interception layers
L1 — In-process SDK: aa-sdk-client (+ aa-wasm)
aa-sdk-client is the FFI-agnostic SDK runtime client. The per-language
shims (Python / Node / Go, in their own repos) are thin wrappers over it.
| Module | Role |
|---|---|
| config.rs | Resolve gateway endpoint / socket path / agent identity. |
| codec.rs | Wire codec for IpcFrame framing. |
| ipc.rs | UDS transport to aa-runtime. |
| client.rs | Lifecycle + send-event surface. |
| preflight.rs | Optional, feature-gated advisory credential preflight using aa-security. |
| error.rs | Client error taxonomy. |
aa-wasm is a separate in-workspace target compiling governance components to
WebAssembly (via wasm-bindgen) for browser / edge agents without a native
sidecar.
Trust note: the SDK is not a security boundary — anything it asserts is re-verified by
aa-runtime. See trust boundaries.
L2 — Sidecar proxy: aa-proxy
Intercepts outbound HTTPS via MitM with a per-host CA, enforcing network-egress policy without code changes.
| Module | Role |
|---|---|
| tls/ | Per-host CA (ca.rs), leaf-cert minting (cert.rs), OS keychain integration (keychain.rs). |
| intercept/ | Detect, extract, and classify intercepted requests (detect.rs, extract.rs, event.rs), including MCP traffic (mcp.rs). |
| proxy/ | The HTTP forwarding core (http.rs). |
| mcp_enforce.rs | MCP-specific enforcement. |
| audit_jsonl.rs | Local JSONL audit fallback. |
Depends on: aa-core, aa-proto, aa-runtime, aa-sandbox.
L3 — eBPF: aa-ebpf (+ aa-ebpf-common, out-of-workspace probes)
Kernel hooks watching SSL libraries (uprobes) and process exec / file syscalls. Linux-only, lowest bypass risk.
| Module | Role |
|---|---|
| loader.rs, maps.rs, ringbuf.rs | Load BPF programs, manage maps, drain the ring buffer to userspace. |
| uprobe.rs | Attach SSL_write / SSL_read uprobes to OpenSSL for plaintext capture. |
| kprobe.rs, kprobes/, tracepoint.rs, syscall.rs | Process exec / file syscall hooks. |
| agent_discover.rs, lineage.rs, shell_detect.rs | Discover governed processes, track lineage, detect shells. |
| events.rs, alert.rs, error.rs | Event types, alerts, error taxonomy. |
aa-ebpf-common holds types shared between userspace and the BPF programs.
aa-ebpf-probes / aa-ebpf-programs are the out-of-workspace BPF-target
crates built by aa-ebpf/build.rs via aya-build.
Depends on: aa-core, aa-ebpf-common.
aa-api — the HTTP / OpenAPI read API
aa-api depends on aa-gateway in-process and re-exposes its read surfaces
over HTTP (Axum) with an OpenAPI schema (utoipa). It is the dashboard’s backend.
| Module | Role |
|---|---|
| routes/ | One module per resource: agents, topology, policies, audit, costs, alerts, traces, approvals, edges, iam, dispatch, tools, destinations, logs, ops, admin, auth, capability. |
| openapi.rs | The generated OpenAPI document. |
| ws/, events.rs | WebSocket streaming + server-sent events for live dashboard updates. |
| middleware/, auth/ | Request middleware and authentication. |
| trace_store.rs, replay.rs, pagination.rs | Trace storage, replay, and paged responses. |
| server.rs, config.rs | Axum server bootstrap; default bind 127.0.0.1:7700 (DEFAULT_ADDR, overridable via AA_API_ADDR). |
Depends on: aa-core, aa-gateway, aa-runtime.
aa-cli — the aasm operator front-end
aa-cli ships the aasm binary. It talks gRPC to the gateway and HTTP to the
API. Common subcommands: aasm status, aasm topology, aasm policy,
aasm agent, aasm cost, aasm audit, aasm dashboard (TUI). The full surface
is documented in the CLI Reference.
Depends on: aa-core, aa-gateway.
Foundation crates
aa-core — domain model + storage traits
The leaf everything builds on. Holds the Rust domain types and the storage trait contracts (std-gated).
| Area | Contents |
|---|---|
| identity.rs | AgentId: an opaque 16-byte identity newtype. |
| types/ | The wire domain types: types::AgentId (a String wire id, distinct from identity::AgentId), AuditEvent, Credential, SessionCtx, policy types. |
| audit.rs | AuditEntry: hash-chained, tamper-evident audit record. |
| policy.rs, capability.rs, risk_tier.rs, dev_tool.rs | Policy types, capability model, RiskTier, GovernanceLevel. |
| storage/ | The six storage traits (PolicyStore, AuditSink, CredentialStore, LifecycleStore, SessionStore, RateLimitCounter), StorageError, and a conformance harness. |
| topology/, evaluators.rs, time.rs, config.rs | Topology edges + cycle detection, evaluators, time abstractions, config. |
aa-proto — the wire schema
Protobuf definitions (under proto/, package prefix assembly.*.v1) compiled
with prost / tonic. Defines the seven gRPC services and all wire messages.
Every cross-process payload — gRPC and UDS alike — uses these types.
aa-security — credential scanner + redaction
A small leaf crate (only aho-corasick + serde) holding CredentialScanner,
CredentialFinding, and Redaction. Extracted out of aa-core so both the
runtime enforcement stage and the SDK preflight can depend on it without pulling
in the full core.
Storage & cache
aa-storage — trait facade + driver registry
aa-storage re-exports the aa_core::storage traits and adds the runtime
driver registry: StorageConfig, a Registry, factory traits, ConfigError,
and register_builtin_drivers (memory / redis / postgres). It is the loader the
CLI’s aasm config validate / aasm config boot exercise.
Storage drivers
| Crate | Backend | Notable deps |
|---|---|---|
| aa-storage-memory | In-process DashMap / parking_lot | none beyond aa-storage + aa-core |
| aa-storage-postgres | PostgreSQL via sqlx | sqlx (postgres), testcontainers-modules |
| aa-storage-redis | Redis via redis + deadpool-redis | builds on aa-storage-memory for session fallback |
| aa-storage-sqlite-buffer | Local SQLite write-buffer | rusqlite (bundled), pinned to share libsqlite3-sys with sqlx-sqlite |
Each driver implements the aa-core storage traits and is verified against the
shared conformance harness.
aa-cache — in-process L1 cache
L1Cache<S: CacheSource> — a DashMap-backed, TTL’d, cache-aside wrapper over
any store. Concurrent misses for the same key collapse to a single backend load
(stampede protection). The gateway fronts its policy store with this cache.
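The cache-aside pattern with a TTL can be sketched in a few lines. This is a deliberately simplified, single-threaded illustration using only the standard library: `TtlCache` and `get_or_load` are hypothetical names, and the sketch omits the concurrency and stampede protection that L1Cache is described as providing.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Single-threaded cache-aside sketch with a per-entry TTL.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (V, Instant)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return the cached value if it is still fresh; otherwise call `load`
    /// (the backing store), remember the result, and return it.
    fn get_or_load<F: FnOnce() -> V>(&mut self, key: &str, load: F) -> V {
        if let Some((value, stored_at)) = self.entries.get(key) {
            if stored_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        let value = load();
        self.entries
            .insert(key.to_string(), (value.clone(), Instant::now()));
        value
    }
}
```

A concurrent version would additionally need a per-key in-flight marker so that simultaneous misses share one backend load.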
WASM tool sandbox: aa-sandbox
aa-sandbox hosts a wasmtime-based runtime that executes WASM-marked tools.
It enforces three isolation surfaces — filesystem allowlist (WASI preopened
dirs), CPU budget (wasmtime instruction fuel), and memory ceiling (Store
limiter) — each surfaced as a deterministic SandboxError. It is consumed by
aa-proxy via the tool-dispatch surface.
Test / conformance crates
- conformance: the cross-crate trait conformance harness; every storage driver runs the same suite.
- aa-integration-tests: end-to-end tests that wire multiple crates together (kept separate to avoid dependency cycles).
Last updated: 2026-06-11 by Chisanan232
Key workflows
This page traces the four workflows that define agent-assembly’s runtime
behaviour, each grounded in the real code path:
- Policy evaluation
- Agent registration
- Budget tracking & rollup
- Interception & enforcement
For component-level detail behind each box, see Component deep-dives; for the bird’s-eye map, see System architecture.
Policy evaluation
When aa-gateway receives a PolicyService.CheckAction RPC, the policy engine
under aa-gateway/src/policy/
walks parse → validate → compile → scope cascade → budget → decision, then audits the
result. The decision type (engine/decision.rs) is one of Allow, Deny,
or RequireApproval.
flowchart TD
Req["CheckActionRequest<br/>(action, target, labels)"] --> Cache{Decision<br/>cache hit?<br/>engine/cache.rs}
Cache -->|hit| Resp
Cache -->|miss| Parse["policy/raw.rs<br/>deserialise bundle"]
Parse --> Validate["policy/validator.rs<br/>structural validation"]
Validate --> Compile["policy/expr.rs<br/>compile predicates"]
Compile --> Cascade["policy/document.rs + scope.rs<br/>org → team → agent → tool<br/>most-restrictive-wins"]
Cascade --> Budget["budget/tracker.rs<br/>check team budget"]
Budget --> Decide{PolicyDecision}
Decide -->|Allow| Audit
Decide -->|Deny| Audit
Decide -->|RequireApproval| Approval["approval queue<br/>(timeout ⇒ Pending)"]
Approval --> Audit
Audit["audit.rs<br/>append hash-chained entry"] --> Resp["CheckActionResponse"]
- Decision cache: engine/cache.rs short-circuits repeat lookups for the same (scope, action) key.
- Parse + validate: policy/raw.rs deserialises the active bundle; policy/validator.rs enforces structural invariants (well-formed scopes, unique rule names).
- Compile: policy/expr.rs turns rule predicates into a typed expression tree evaluated against the request’s ActionType, target, and labels.
- Scope cascade: policy/document.rs + scope.rs walk org → team → agent → tool and merge most-restrictive-wins, with cycle detection on delegation.
- Budget check: budget/tracker.rs (priced via budget/pricing.rs) downgrades an otherwise-allowed request to Deny if it would breach a budget.
- Decision: engine/decision.rs yields Allow, Deny { reason }, or RequireApproval { timeout_secs }.
- Audit: every decision is appended to the hash-chained audit log via audit.rs before the response is returned.
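The most-restrictive-wins merge in the scope cascade step can be sketched as a maximum over an ordered decision type. This is an illustration under assumptions, not the engine's code: the enum here drops the `reason` and `timeout_secs` fields, the ordering Allow < RequireApproval < Deny is this sketch's reading of "most restrictive", and defaulting to Deny when no scope has a matching rule is assumed rather than documented.

```rust
/// Decision outcomes named on this page, ordered from least to most
/// restrictive. The derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

/// Merge per-scope results in cascade order (org, team, agent, tool).
/// A scope with no matching rule contributes `None`.
/// Assumption: if no scope has a rule at all, the result is Deny.
fn merge_scopes(per_scope: &[Option<Decision>]) -> Decision {
    per_scope
        .iter()
        .flatten() // skip scopes without a rule
        .copied()
        .max() // most restrictive wins
        .unwrap_or(Decision::Deny)
}
```

Because the merge is a maximum, the order in which scopes are visited does not change the outcome; a narrower scope can tighten a broader one but never loosen it.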
Latency targets and current p99 measurements live in Benchmarks — Policy Check p99.
Agent registration
Registration flows through AgentLifecycleService.Register
(aa-gateway/src/service/lifecycle_service.rs),
which validates delegation depth and writes into the DashMap-backed
AgentRegistry. Agents then keep their record live with periodic Heartbeats.
sequenceDiagram
autonumber
participant Agent
participant RT as aa-runtime
participant LS as AgentLifecycleService<br/>(aa-gateway)
participant Reg as AgentRegistry<br/>(registry/store.rs)
participant Store as Storage<br/>(storage_bridge.rs)
Agent->>RT: start with agent identity + parent
RT->>LS: gRPC Register(RegisterRequest)
LS->>LS: validate delegation depth<br/>(≤ DEFAULT_MAX_AGENT_DEPTH = 10)
alt depth OK and not already registered
LS->>Reg: insert AgentRecord (status Active)
Reg->>Store: persist via storage bridge
LS-->>RT: RegisterResponse (token)
else already registered / depth exceeded
LS-->>RT: AlreadyExists / FailedPrecondition
end
loop heartbeat interval
RT->>LS: Heartbeat(HeartbeatRequest)
LS->>Reg: refresh last-seen, recent events
LS-->>RT: HeartbeatResponse (control commands?)
end
- Delegation depth: a sub-agent’s depth must not exceed DEFAULT_MAX_AGENT_DEPTH (10); over-deep registrations are rejected.
- Lineage: the registry records parent/child links (registry/lineage.rs) so the topology tree and orphan handling (registry/orphan.rs) work.
- Control stream: ControlStream lets the gateway push commands (e.g. SuspendCommand) back to a live agent.
- Deregister: on shutdown the agent calls Deregister; orphaned children are handled per the configured OrphanMode.
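The delegation-depth rule above can be sketched by walking parent links. This is a minimal illustration, not the lifecycle service's code: the parent map stands in for the registry's lineage data, both function names are hypothetical, and counting a root agent as depth 0 is an assumption.

```rust
use std::collections::HashMap;

const DEFAULT_MAX_AGENT_DEPTH: usize = 10;

/// Count ancestors by following child -> parent links.
/// Assumption: a root agent (no parent) has depth 0.
fn delegation_depth(parents: &HashMap<String, String>, agent: &str) -> usize {
    let mut depth = 0;
    let mut current = agent;
    while let Some(parent) = parents.get(current) {
        depth += 1;
        if depth > parents.len() {
            break; // defensive stop if the map ever contains a cycle
        }
        current = parent.as_str();
    }
    depth
}

/// A new agent sits one level below its parent; reject it if that level
/// would exceed the maximum.
fn can_register(parents: &HashMap<String, String>, parent: Option<&str>) -> bool {
    match parent {
        None => true,
        Some(p) => delegation_depth(parents, p) < DEFAULT_MAX_AGENT_DEPTH,
    }
}
```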
Budget tracking & rollup
Every priced action updates the in-memory BudgetTracker; the dashboard, SDK,
and CLI read a composed BudgetRollup across agent / team / org / subtree
scopes.
flowchart LR
subgraph track["Tracking (write path)"]
Action["priced action<br/>(model + tokens)"] --> Price["budget/pricing.rs<br/>PricingTable"]
Price --> Tracker["budget/tracker.rs<br/>BudgetTracker"]
Tracker --> Windows["daily + monthly windows<br/>per agent / team / global"]
Windows --> Alert{"≥ 80% / 95%?"}
Alert -->|yes| Broadcast["BudgetAlert<br/>(broadcast channel)"]
end
subgraph roll["Rollup (read path)"]
Req["GET /agents/{id}/budget<br/>or aasm policy show --show-budget"] --> Rollup["budget/rollup.rs<br/>BudgetRollup"]
Rollup --> Rows["BudgetRow[]<br/>agent · team · org · subtree"]
end
Tracker -. read-only accessors .-> Rollup
- Pricing: budget/pricing.rs converts model + token counts into a USD cost.
- Windows: BudgetTracker keeps daily and monthly windows for each agent, each team, and the global total.
- Alerts: crossing 80 % or 95 % of a limit emits a BudgetAlert on a broadcast channel (capacity 64) for live dashboards.
- Rollup: budget/rollup.rs composes a BudgetRow per scope (agent, team:<id>, org, subtree) using the tracker’s read-only accessors, narrowest scope first. The same rollup drives both the HTTP endpoint and aasm policy show <agent_id> --show-budget.
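The pricing step (model + token counts into a cost) can be sketched as a table lookup followed by integer arithmetic. Everything concrete here is an assumption for illustration: the `PriceSheet` and `ModelPrice` types, the per-million-token rate unit, micro-dollar accounting, and rounding up are not taken from budget/pricing.rs.

```rust
use std::collections::HashMap;

/// Hypothetical rate entry: micro-dollars per one million tokens
/// (1 USD = 1_000_000 micro-dollars).
struct ModelPrice {
    input_per_mtok: u64,
    output_per_mtok: u64,
}

/// Hypothetical per-model price sheet.
struct PriceSheet {
    prices: HashMap<String, ModelPrice>,
}

impl PriceSheet {
    /// Cost of one action in micro-dollars, or None for an unknown model.
    /// Rounds up so that spend is never under-counted (an assumption).
    fn cost_micro_usd(&self, model: &str, input_tokens: u64, output_tokens: u64) -> Option<u64> {
        let p = self.prices.get(model)?;
        // u128 intermediate avoids overflow on large token counts.
        let raw = input_tokens as u128 * p.input_per_mtok as u128
            + output_tokens as u128 * p.output_per_mtok as u128;
        Some(((raw + 999_999) / 1_000_000) as u64)
    }
}
```

How an unknown model should be handled (deny, price at a default rate, or log and allow) is a policy choice this sketch leaves to the caller by returning `None`.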
Interception & enforcement
An agent action is observed by one of the three layers, normalised into the
aa-proto wire format, re-scanned by aa-runtime, then sent to the gateway for
a decision. The runtime is the mandatory chokepoint: it never trusts the
SDK’s assertions.
sequenceDiagram
autonumber
participant Agent
participant SDK as L1 SDK shim<br/>(aa-sdk-client)
participant Proxy as L2 proxy<br/>(aa-proxy)
participant eBPF as L3 eBPF<br/>(aa-ebpf)
participant RT as aa-runtime<br/>pipeline + enforcement
participant GW as aa-gateway<br/>PolicyService
alt L1 — in-process
Agent->>SDK: tool / LLM / network call
SDK->>RT: UDS IpcFrame (event)
else L2 — sidecar
Agent->>Proxy: outbound HTTPS (MitM)
Proxy->>RT: forwarded event
else L3 — kernel
Agent-->>eBPF: SSL_write / exec / file syscall
eBPF->>RT: ring-buffer event
end
RT->>RT: enrich (pipeline/event.rs)
RT->>RT: scan + redact (pipeline/enforcement.rs)<br/>fail-closed, oversized ⇒ redact whole
RT->>GW: CheckAction(CheckActionRequest)
GW-->>RT: Allow / Deny / RequireApproval
alt Allow
RT-->>Agent: pass-through
else Deny
RT-->>Agent: error / blocked
else RequireApproval
RT->>RT: approval_sink.wait_for_approval<br/>(timeout ⇒ Decision::Pending)
RT-->>Agent: allow or block on resolution
end
Key invariants from aa-runtime/src/pipeline/enforcement.rs:
- The runtime re-scans every event unconditionally: there is no already_scanned / clean wire marker, and none is honoured.
- Enforcement is fail-closed: a field larger than max_field_bytes (default 64 KiB) cannot be fully scanned, so it is redacted whole ([REDACTED:OVERSIZED]) rather than partially forwarded.
- The credential scanner / redaction primitives come from the aa-security leaf crate.
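The fail-closed oversize rule can be sketched as a guard in front of the scanner. This is an illustration only: `enforce_field` is a hypothetical name, and the scanner is passed in as a closure because the real CredentialScanner API is not shown on this page.

```rust
/// Default field cap quoted above (64 KiB).
const DEFAULT_MAX_FIELD_BYTES: usize = 64 * 1024;
/// Marker quoted above for fields that are too large to scan.
const OVERSIZED_MARKER: &str = "[REDACTED:OVERSIZED]";

/// Apply enforcement to one field.
/// A field larger than the cap is replaced whole, without scanning;
/// anything else goes through `scan_and_redact`, which stands in for
/// the real credential scanner.
fn enforce_field<F>(value: &str, max_field_bytes: usize, scan_and_redact: F) -> String
where
    F: Fn(&str) -> String,
{
    if value.len() > max_field_bytes {
        return OVERSIZED_MARKER.to_string();
    }
    scan_and_redact(value)
}
```

Replacing the whole field, rather than scanning a prefix and forwarding the rest, is what makes the rule fail-closed: no unscanned bytes leave the runtime.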
The eBPF layer is observe-and-forward for bypass-detection: it cannot block in-kernel, so it streams audit events while the SDK and proxy layers carry the synchronous allow/deny. For the trust rationale, see three-layer defense.
Where each event goes next
Once a decision is made, the event flows into the audit and storage pipeline — covered in detail on the Data flows page.
Last updated: 2026-06-11 by Chisanan232
Data flows
This page follows the data — not the control decisions — through the system: how an intercepted event becomes a decision, then a durable, tamper-evident audit record. For the decision logic itself, see Key workflows; for the trust view, see the Security Model.
End-to-end: layer → gateway → policy → audit → storage
flowchart TD
subgraph layers["Interception layers"]
L1["L1 SDK<br/>(aa-sdk-client)"]
L2["L2 proxy<br/>(aa-proxy)"]
L3["L3 eBPF<br/>(aa-ebpf)"]
end
subgraph runtime["aa-runtime"]
IPC["ipc/ — UDS IpcFrame"]
PIPE["pipeline — enrich + batch"]
ENF["enforcement — scan + redact<br/>(fail-closed)"]
PUB["audit_publisher — NATS"]
end
subgraph gateway["aa-gateway"]
POL["PolicyService.CheckAction"]
AW["AuditWriter (audit.rs)<br/>append-only JSONL"]
SAN["sanitizer/ — sanitize()<br/>drop 'never store' data"]
CONS["audit_consumer.rs<br/>JetStream pull-consumer"]
end
NATS[("NATS JetStream<br/>assembly.audit.>")]
JSONL[("per-session JSONL<br/>tamper-evident")]
PG[("aa-storage-postgres<br/>audit_logs")]
L1 -->|IpcFrame| IPC
L2 -->|event| IPC
L3 -->|ring buffer| IPC
IPC --> PIPE --> ENF
ENF --> POL
ENF --> PUB
POL -->|decision| AW
AW --> JSONL
AW -. dual sink .-> PG
PUB -->|publish| NATS
NATS --> CONS
CONS --> SAN --> PG
There are two paths an audit record can take, and the design is deliberately layered so neither is a single point of failure:
- Synchronous decision audit (in-gateway). Every CheckAction decision is appended by AuditWriter (aa-gateway/src/audit.rs) as one JSON line to a per-session JSONL file. The JSONL file is the tamper-evident primary record (hash-chained AuditEntry). When a durable StorageBackend is configured, the writer follows each JSONL append with storage.append_audit_event(...) (the dual-sink path); a storage failure is logged but never stops the pipeline, and a restart can replay missed entries from the JSONL file.
- Asynchronous event stream (via NATS). aa-runtime’s audit_publisher publishes audit records to the NATS subject assembly.audit.<tenant>.<agent> and returns control to the agent immediately (fire-and-forget). The gateway’s audit_consumer is a durable JetStream pull-consumer over assembly.audit.> that batches, sanitises, and persists to Postgres.
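Hash chaining, the property that makes the JSONL record tamper-evident, can be sketched as follows. This is a structural illustration only: `ChainedEntry` and its fields are hypothetical, not the real AuditEntry layout, and the standard library's `DefaultHasher` is used as a stand-in so the example has no dependencies. `DefaultHasher` is not cryptographic; a real tamper-evident log needs a cryptographic hash, and this page does not say which one agent-assembly uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical chained record: each entry commits to its predecessor.
#[derive(Debug, Clone)]
struct ChainedEntry {
    payload: String,
    prev_hash: u64,
    hash: u64,
}

/// Stand-in link hash over (previous hash, payload). NOT cryptographic.
fn link_hash(prev_hash: u64, payload: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    payload.hash(&mut hasher);
    hasher.finish()
}

/// Append an entry whose hash covers the previous entry's hash.
fn append(chain: &mut Vec<ChainedEntry>, payload: &str) {
    let prev_hash = chain.last().map_or(0, |entry| entry.hash);
    chain.push(ChainedEntry {
        payload: payload.to_string(),
        prev_hash,
        hash: link_hash(prev_hash, payload),
    });
}

/// Recompute every link; any edited, removed, or reordered entry
/// breaks the chain from that point on.
fn verify(chain: &[ChainedEntry]) -> bool {
    let mut expected_prev = 0;
    for entry in chain {
        if entry.prev_hash != expected_prev
            || entry.hash != link_hash(entry.prev_hash, &entry.payload)
        {
            return false;
        }
        expected_prev = entry.hash;
    }
    true
}
```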
The audit write path in detail
sequenceDiagram
autonumber
participant RT as aa-runtime<br/>audit_publisher
participant NATS as NATS JetStream<br/>assembly.audit.>
participant Cons as audit_consumer.rs<br/>(producer task)
participant Chan as bounded mpsc
participant Writer as audit_consumer.rs<br/>(DB-writer task)
participant San as sanitizer::sanitize
participant PG as audit_logs<br/>(Postgres)
RT->>NATS: publish AuditEvent (fire-and-forget)
NATS->>Cons: deliver (pull-consumer, AckPolicy::All)
Cons->>Chan: send().await (backpressure, never drop)
Chan->>Writer: drain up to batch_size
loop per batch
Writer->>San: sanitize(RawAuditEvent)
San-->>Writer: SanitizedAuditEvent / HeartbeatUpdate
Writer->>PG: multi-row INSERT … ON CONFLICT (event_id) DO NOTHING
Writer->>NATS: ack last message (acks whole batch)
end
Properties enforced by aa-gateway/src/audit_consumer.rs:
- Batching: the writer drains the channel into batches and writes each with a single multi-row INSERT, one DB round-trip and one ack per batch.
- Idempotency: each event becomes an AuditLogRecord keyed by its own event_id; ON CONFLICT (event_id) DO NOTHING dedupes retries and intra-batch repeats (bumping aa_audit_duplicates_total).
- At-least-once: AckPolicy::All acks the batch’s last message only after the whole batch persists; a failed batch is left un-acked so NATS redelivers after ack_wait.
- Backpressure: the channel is bounded; a full channel makes the producer await room rather than drop, so bursts queue durably in JetStream (aa_audit_consumer_channel_depth exposes the in-flight depth).
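The idempotency property above is what makes at-least-once delivery safe: a redelivered batch must not create duplicate rows. The in-memory sketch below models the effect of `ON CONFLICT (event_id) DO NOTHING` with a map keyed by event id. All type and function names are hypothetical, and no database is involved.

```rust
use std::collections::HashMap;

/// Hypothetical audit row; only the key matters for deduplication.
struct AuditRow {
    event_id: String,
    summary: String,
}

/// In-memory stand-in for the audit_logs table plus a duplicates counter.
#[derive(Default)]
struct AuditTable {
    rows: HashMap<String, String>,
    duplicates_total: u64,
}

impl AuditTable {
    /// Insert a batch, skipping any event_id that already exists
    /// (the same effect as ON CONFLICT (event_id) DO NOTHING).
    /// Returns the number of rows actually written.
    fn insert_batch(&mut self, batch: Vec<AuditRow>) -> usize {
        let mut inserted = 0;
        for row in batch {
            if self.rows.contains_key(&row.event_id) {
                self.duplicates_total += 1;
            } else {
                self.rows.insert(row.event_id, row.summary);
                inserted += 1;
            }
        }
        inserted
    }
}

/// Small helper for building rows in examples.
fn row(event_id: &str, summary: &str) -> AuditRow {
    AuditRow { event_id: event_id.to_string(), summary: summary.to_string() }
}
```

With this in place, the consumer can acknowledge a batch only after the insert succeeds and rely on redelivery for failures, since replaying a batch changes nothing but the duplicates counter.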
The write-boundary sanitizer
Before anything reaches audit_logs, the consumer runs the write-boundary
sanitize() pass (aa-gateway/src/sanitizer/). The sanitizer is the last line
of defense and never trusts the inbound shape — it operates on the untyped JSON
tree as received:
flowchart LR
Raw["RawAuditEvent<br/>(untyped JSON)"] --> Strip["strip banned keys<br/>recursively"]
Strip --> Drop["drop unknown top-level fields<br/>(count them as a metric)"]
Drop --> Beat{"heartbeat?"}
Beat -->|yes| Collapse["collapse into<br/>HeartbeatUpdate<br/>(last-seen, not per-beat)"]
Beat -->|no| Out["SanitizedAuditEvent"]
Collapse --> Out
Four classes of “never store” data are dropped at this boundary regardless of what an upstream SDK or proxy emitted: raw LLM prompts / completions, full tool-call payloads, eBPF packet bodies, and per-heartbeat sequence records. Counting unknown fields means a newly-emitting sender is noticed rather than silently persisted.
Two-layer defense: the sender (runtime enforcement) is the first line — it scans and redacts before forwarding; the sanitizer is the last line — it strips before persisting. Neither trusts the other. See trust boundaries.
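The first two sanitizer stages (strip banned keys recursively, then drop and count unknown top-level fields) can be sketched on a small untyped tree. This is an illustration under assumptions: the `Json` enum is a cut-down stand-in for a real JSON value type, the key names in `BANNED_KEYS` and `KNOWN_TOP_LEVEL` are hypothetical, and the heartbeat-collapse stage is omitted.

```rust
use std::collections::BTreeMap;

/// Cut-down untyped tree; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

// Hypothetical key names, chosen only for illustration.
const BANNED_KEYS: &[&str] = &["prompt", "completion", "tool_payload", "packet_body"];
const KNOWN_TOP_LEVEL: &[&str] = &["event_id", "agent_id", "action", "decision", "details"];

/// Remove banned keys at every depth of the tree.
fn strip_banned(value: &mut Json) {
    match value {
        Json::Object(map) => {
            map.retain(|key, _| !BANNED_KEYS.contains(&key.as_str()));
            for child in map.values_mut() {
                strip_banned(child);
            }
        }
        Json::Array(items) => {
            for item in items.iter_mut() {
                strip_banned(item);
            }
        }
        Json::Str(_) => {}
    }
}

/// Sanitize an event in place; returns how many unknown top-level
/// fields were dropped, so the caller can export it as a metric.
fn sanitize(event: &mut Json) -> usize {
    strip_banned(event);
    let mut dropped = 0;
    if let Json::Object(map) = event {
        let before = map.len();
        map.retain(|key, _| KNOWN_TOP_LEVEL.contains(&key.as_str()));
        dropped = before - map.len();
    }
    dropped
}

/// Helper for building objects in examples.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
```

Working on the untyped tree rather than a typed struct means a field the schema has never heard of is still seen, counted, and dropped instead of being silently ignored by deserialisation.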
Storage data flow
The gateway never talks to a concrete database directly — it goes through the
aa-storage trait facade, and the active driver decides where bytes land.
flowchart TD
GW["aa-gateway"] --> Facade["aa-storage<br/>trait facade + Registry"]
Facade --> Cache["aa-cache<br/>L1Cache (cache-aside, TTL)"]
Cache --> Driver{"active driver"}
Driver --> Mem[("aa-storage-memory<br/>DashMap")]
Driver --> PG[("aa-storage-postgres<br/>sqlx")]
Driver --> Redis[("aa-storage-redis<br/>deadpool")]
Driver --> SQLite[("aa-storage-sqlite-buffer<br/>local write-buffer")]
- L1 cache. Read-heavy stores (e.g. the policy store) are fronted by aa-cache::L1Cache, a DashMap-backed cache-aside layer with TTL and stampede protection: concurrent misses for the same key collapse to one backend load.
- Driver selection. aa-storage’s Registry + register_builtin_drivers resolves the configured backend at boot; aasm config validate and aasm config boot exercise this loader.
- Audit storage shape. audit_entry_to_storage_event (aa-gateway/src/storage/audit_bridge.rs) maps a hash-chained AuditEntry into the storage AuditEvent keyed by event_id; the Postgres driver writes it as a metadata-only audit_logs row (no raw payloads, which were already dropped by the sanitizer).
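Driver selection through a registry can be sketched as a map from driver name to factory function. This is a generic illustration of the pattern, not aa-storage's API: `Sink`, `DriverRegistry`, `memory_factory`, and the error string are all hypothetical, and the real Registry, factory traits, and ConfigError may be shaped quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical storage trait; the real traits live in aa-core.
trait Sink {
    fn backend_name(&self) -> &'static str;
}

struct MemorySink;

impl Sink for MemorySink {
    fn backend_name(&self) -> &'static str {
        "memory"
    }
}

/// A factory builds a boxed driver on demand.
type Factory = fn() -> Box<dyn Sink>;

fn memory_factory() -> Box<dyn Sink> {
    Box::new(MemorySink)
}

/// Name -> factory lookup, resolved once at boot from configuration.
#[derive(Default)]
struct DriverRegistry {
    factories: HashMap<String, Factory>,
}

impl DriverRegistry {
    fn register(&mut self, name: &str, factory: Factory) {
        self.factories.insert(name.to_string(), factory);
    }

    /// Build the configured driver, or report an unknown name.
    fn build(&self, name: &str) -> Result<Box<dyn Sink>, String> {
        self.factories
            .get(name)
            .map(|factory| factory())
            .ok_or_else(|| format!("unknown storage driver: {name}"))
    }
}
```

Because callers only ever hold a `Box<dyn Sink>`, swapping the backend is a configuration change rather than a code change, which is the point of the facade described above.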
Summary of the data’s journey
| Stage | Component | Form of the data |
|---|---|---|
| Observe | L1/L2/L3 layer | agent action → aa-proto event |
| Normalise | aa-runtime pipeline | EnrichedEvent |
| Redact | aa-runtime enforcement | secrets scanned, oversized redacted whole |
| Decide | aa-gateway policy engine | Allow / Deny / RequireApproval |
| Record (sync) | AuditWriter | hash-chained JSONL line (+ optional dual sink) |
| Publish (async) | audit_publisher → NATS | assembly.audit.<tenant>.<agent> |
| Sanitise | sanitizer::sanitize | “never store” data stripped |
| Persist | aa-storage-postgres | audit_logs row, deduped by event_id |
Last updated: 2026-06-11 by Chisanan232
Building & contributing
This page is the short version of building, testing, and linting the workspace.
The authoritative source is
CONTRIBUTING.md
at the repo root; read it before opening a pull request.
Prerequisites
- Rust stable (≥ 1.75) — install via rustup.
- cargo-nextest: cargo install cargo-nextest (the test runner).
- cargo-deny: cargo install cargo-deny (license / advisory checks).
- Lefthook: brew install lefthook (macOS) or see the Lefthook install guide. The hook configuration lives in lefthook.toml.
Setup
git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly
# Install git hooks (fmt, clippy, deny on commit; doc on push)
lefthook install
# Verify the workspace builds
cargo build --workspace
# Run the full test suite
cargo nextest run --workspace
Common commands
| Task | Command |
|---|---|
| Build everything | cargo build --workspace |
| Full test suite | cargo nextest run --workspace |
| Tests for one crate | cargo nextest run -p aa-gateway |
| A single test | cargo nextest run -p aa-gateway budget::types::tests::provider_variants_are_distinct |
| Format | cargo fmt --all |
| Lint | cargo clippy --all-targets -- -D warnings |
| License / advisory check | cargo deny check |
| Docs | cargo doc --workspace --no-deps |
Notes:
- eBPF crates (aa-ebpf*) compile with target-specific toolchains; cargo check -p aa-ebpf is sufficient on non-Linux environments. The out-of-workspace BPF crates (aa-ebpf-probes, aa-ebpf-programs) are built by aa-ebpf/build.rs via aya-build and cannot be selected with cargo -p.
- The CLI binary is aasm (shipped by aa-cli); smoke-test it with ./target/debug/aasm <subcommand>.
Faster builds (optional)
The dev profile already builds dependencies at opt-level = 1 with
line-tables-only debuginfo, so warm rebuilds link faster while backtraces stay
readable — no setup needed. A faster linker is opt-in: install it and
uncomment the block for your platform in
.cargo/config.toml
(mold + clang on Linux, lld via brew install llvm on macOS).
Commit & branch conventions
- Branches: <version>/<ticket-number>/<short-summary>, e.g. v0.0.1/AAASM-42/add_agent_registry.
- Commits: Gitmoji-prefixed, <emoji> (<scope>): <imperative summary>, one logical unit per commit, bisectable. Example: ✨ (aa-core): Add AgentId newtype wrapper.
Adding a new crate
- cargo new --lib aa-<name> from the repo root.
- Add aa-<name> to the members array in the top-level Cargo.toml.
- Inherit workspace metadata (version.workspace = true, etc.) and use the shared [workspace.lints.clippy] rather than redefining clippy lints per-crate.
Last updated: 2026-06-11 by Chisanan232
API reference
Build and browse the Rust API docs locally. The authoritative reference lives in rustdoc, generated directly from source — there is no hand-written API doc to drift out of date. Generate the whole workspace and open it in one command:
cargo doc --workspace --no-deps --open
The rest of this chapter covers the flags that matter and maps each crate to its rustdoc entry point.
Generating rustdoc locally
The whole-workspace rustdoc is built with cargo doc. The pre-push lefthook hook also runs this command, so the docs are guaranteed to compile on master.
# Build rustdoc for every workspace member without recursing into transitive deps.
cargo doc --workspace --no-deps
# Same, but also opens the index page in the default browser.
cargo doc --workspace --no-deps --open
# Document private items too — useful when working inside a single crate.
cargo doc -p aa-gateway --no-deps --document-private-items --open
The HTML output lands in target/doc/. Open target/doc/aa_core/index.html (or any other crate’s index) directly if you’d rather not use --open.
Note on eBPF crates: aa-ebpf* requires a nightly toolchain to build the BPF target. CI excludes these crates from the standard build matrix and validates them in a dedicated job. For rustdoc on macOS or non-Linux machines, run cargo doc --workspace --no-deps --exclude aa-ebpf to skip them.
Per-crate API surface
Once rustdoc is built (target/doc/<crate>/index.html), the most-frequented entry points are:
| Crate | rustdoc entry | Highlights |
|---|---|---|
| aa-core | target/doc/aa_core/index.html | Domain newtypes (AgentId, TeamId), ActionType enum, common traits |
| aa-proto | target/doc/aa_proto/index.html | Generated protobuf message types, the wire format source of truth |
| aa-runtime | target/doc/aa_runtime/index.html | Tokio runtime wrapper, agent lifecycle hooks |
| aa-proxy | target/doc/aa_proxy/index.html | MitM HTTPS proxy primitives |
| aa-gateway | target/doc/aa_gateway/index.html | Policy engine, agent registry, budget tracker |
| aa-api | target/doc/aa_api/index.html | HTTP layer with utoipa-generated OpenAPI spec |
| aa-cli | target/doc/aa_cli/index.html | aasm operator binary surface (clap commands) |
| aa-sdk-client | target/doc/aa_sdk_client/index.html | Shared SDK runtime-client (UDS transport, codec, lifecycle) the Python/Node/Go shims wrap |
| aa-wasm | target/doc/aa_wasm/index.html | wasm-bindgen surface for in-browser embedding |
| conformance | target/doc/conformance/index.html | Cross-SDK protocol vector harness |
The HTTP API (served by aa-api) additionally publishes a generated OpenAPI v1 spec. Validate the spec with npx @stoplight/spectral-cli lint openapi/v1.yaml.
Hosted documentation (deferred)
Publishing rustdoc to docs.rs and the mdBook to GitHub Pages is out of scope for v0.0.1. Both are tracked as follow-up Stories under Epic AAASM-13. Until then, run cargo doc --workspace --no-deps --open and mdbook serve docs --open locally.
Last updated: 2026-06-11 by Chisanan232
Version Compatibility Matrix
This document tracks which versions of aa-runtime are compatible with each SDK version. Update this file whenever any component version changes — see CI enforcement below.
CI enforcement for SDK version changes is pending cross-repo CI integration. Until then, SDK version bumps must be accompanied by a manual update to this file.
Compatibility Matrix
| aa-runtime | Python SDK (aa-ffi-python) | Node.js SDK (aa-ffi-node) | Go SDK (aa-ffi-go) | Protocol Version |
|---|---|---|---|---|
| v0.0.1-alpha.1 | v0.0.1-alpha.1 (PyPI 0.0.1a1) ✓ | v0.0.1-alpha.1 ✓ | v0.0.1-alpha.1 ✓ | protocol/v1 |
| v0.0.1-alpha.2 | v0.0.1-alpha.2 (PyPI 0.0.1a2) ✓ | v0.0.1-alpha.2 ✓ | v0.0.1-alpha.2 ✓ | protocol/v1 |
| v0.0.1-alpha.3 | v0.0.1-alpha.3 (PyPI 0.0.1a3) ✓ | v0.0.1-alpha.3 ✓ | v0.0.1-alpha.3 ✓ | protocol/v1 |
| v0.0.1 | v0.0.1 ✓ | v0.0.1 ✓ | v0.0.1 ✓ | protocol/v1 |
Legend:
- ✓ Compatible — fully supported
- ⚠️ Partial — works with known limitations (see notes)
- ✗ Incompatible — do not use together
Minimum Supported Runtime Version per SDK
| SDK | Minimum aa-runtime Version |
|---|---|
| Python SDK (aa-ffi-python) v0.0.1 | aa-runtime v0.0.1 |
| Node.js SDK (aa-ffi-node) v0.0.1 | aa-runtime v0.0.1 |
| Go SDK (aa-ffi-go) v0.0.1 | aa-runtime v0.0.1 |
Supported Protocol Versions per Runtime
A runtime version may support multiple protocol versions to allow SDK upgrades without simultaneous runtime upgrades.
| aa-runtime Version | Supported Protocol Versions |
|---|---|
| v0.0.1-alpha.1 | protocol/v1 |
| v0.0.1-alpha.2 | protocol/v1 |
| v0.0.1-alpha.3 | protocol/v1 |
| v0.0.1 | protocol/v1 |
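The table above can be read as a lookup from runtime version to the set of protocol versions it accepts. A minimal sketch of that lookup follows; the names SUPPORTED_PROTOCOLS and runtime_speaks are illustrative and not part of any SDK, and rejecting unlisted runtime versions is an assumption made here for safety.

```python
# Rows copied from the "Supported Protocol Versions per Runtime" table.
SUPPORTED_PROTOCOLS = {
    "v0.0.1-alpha.1": {"protocol/v1"},
    "v0.0.1-alpha.2": {"protocol/v1"},
    "v0.0.1-alpha.3": {"protocol/v1"},
    "v0.0.1": {"protocol/v1"},
}


def runtime_speaks(runtime_version: str, protocol: str) -> bool:
    """Return True when the runtime's row lists the protocol.

    Runtime versions missing from the table are treated as incompatible
    (an assumption of this sketch, not documented behaviour).
    """
    return protocol in SUPPORTED_PROTOCOLS.get(runtime_version, set())
```

Because each row holds a set, a future runtime that accepts two protocol versions only needs a second entry in its set.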
Dual-URL SDK configuration
Starting with the v0.0.1 SDK line, every SDK accepts two endpoint fields so a single install can target either a single-host OSS deployment or a split enterprise deployment (gRPC gateway and HTTP control plane on different hosts).
| Field (Python / Node / Go) | What it addresses | Scheme |
|---|---|---|
| gateway_url / gatewayUrl / WithGatewayURL | gRPC endpoint of the gateway | host:port, no scheme |
| control_plane_url / controlPlaneUrl / WithControlPlaneURL | HTTP base URL for the control plane: aa-api (OSS) or the FastAPI cloud (enterprise) | full URL with scheme |
The HTTP control plane serves agent registration, policy checks, and topology
edges (POST /agents/{id}/register, POST /agents/{id}/policy/check,
POST /topology/edges). The gRPC transport carries the streaming op-control,
lifecycle, audit, and approval flows and always reads gateway_url.
Backwards-compatible default
control_plane_url is optional. When it is not set, each SDK defaults it to
the resolved gateway_url, so a single-host OSS dev install keeps working with
only one endpoint configured — the pre-feature behaviour is preserved exactly.
It only needs a distinct value when the HTTP control plane and the gRPC gateway
live on separate hosts (the production enterprise topology).
Resolution order and environment variables
Each field resolves as explicit init argument > environment variable > unset:
| Field | Environment variable |
|---|---|
| gateway_url / gatewayUrl / WithGatewayURL | AA_GATEWAY_URL |
| control_plane_url / controlPlaneUrl / WithControlPlaneURL | AA_CONTROL_PLANE_URL |
If control_plane_url is still unset after this chain, it falls back to
gateway_url as described above.
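The resolution chain and the fallback can be sketched in a few lines. This is an illustration of the documented order only, not the SDK's actual resolver: the function names resolve_field and resolve_endpoints are hypothetical, and treating an empty string as unset is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_field(explicit: Optional[str], env_var: str,
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Explicit init argument wins, then the environment variable, else unset."""
    if explicit:
        return explicit
    # Assumption: an empty environment value counts as unset.
    return env.get(env_var) or None


def resolve_endpoints(gateway_url: Optional[str] = None,
                      control_plane_url: Optional[str] = None,
                      env: Mapping[str, str] = os.environ
                      ) -> Tuple[Optional[str], Optional[str]]:
    """Resolve both endpoint fields, then apply the control-plane fallback."""
    gateway = resolve_field(gateway_url, "AA_GATEWAY_URL", env)
    control_plane = resolve_field(control_plane_url, "AA_CONTROL_PLANE_URL", env)
    if control_plane is None:
        # Backwards-compatible default: reuse the resolved gateway endpoint.
        control_plane = gateway
    return gateway, control_plane
```

With only AA_GATEWAY_URL set, both returned values are the same endpoint, which is the single-host OSS case described above.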
Canonical AA_* prefix and the deprecated AAASM_* alias
AA_* is the canonical environment-variable prefix across all SDKs —
AA_GATEWAY_URL, AA_CONTROL_PLANE_URL, and AA_API_KEY. New configuration
should always use this prefix.
The legacy AAASM_* prefix — used by the older zero-config gateway resolver
in each SDK — is a deprecated alias. It is still honoured for
backwards-compatibility, but reading a value from an AAASM_* variable emits a
deprecation warning, and the alias will be removed in a future major version.
Migrate to the AA_* names.
This prefix reconciliation is tracked across the SDKs under AAASM-3019; sibling subtasks update the Python, Node, and Go resolvers.
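A resolver that honours the deprecated alias might look like the sketch below. It assumes each legacy variable mirrors the canonical suffix (for example AAASM_GATEWAY_URL for AA_GATEWAY_URL); the exact legacy names are not listed on this page, and read_env is a hypothetical helper rather than an SDK function.

```python
import os
import warnings
from typing import Mapping, Optional


def read_env(suffix: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read AA_<suffix>, falling back to the deprecated AAASM_<suffix> alias."""
    canonical = f"AA_{suffix}"
    legacy = f"AAASM_{suffix}"  # assumed naming: same suffix, legacy prefix
    if env.get(canonical):
        return env[canonical]
    if env.get(legacy):
        # The value is still honoured, but reading it emits a warning.
        warnings.warn(
            f"{legacy} is deprecated; use {canonical} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return env[legacy]
    return None
```

The canonical name is checked first, so a deployment that sets both variables during migration sees no warning and gets the AA_* value.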
Per-SDK notes
- Python (AAASM-2028): control_plane_url is a keyword argument on init_assembly, threaded into GatewayClient (httpx). The gRPC path (op_control) continues to read gateway_url.
- Node (AAASM-2029): controlPlaneUrl is an optional field on AssemblyConfig. When set, the gateway client routes its HTTP traffic to it; the gRPC transport (op-control) keeps using gatewayUrl.
- Go (AAASM-2030): assembly.WithControlPlaneURL stores the value on the runtime options for parity with the other SDKs. The Go SDK has no HTTP control-plane caller today (lifecycle is delegated to the aasm runtime), so the field is in place ready for the first HTTP caller; gRPC dial behaviour is unchanged.
Authoritative strategy source
The enterprise-vs-OSS connectivity strategy — why the second field exists, the
transport split, and the per-SDK survey — is owned by
agent-assembly-enterprise/docs/sdk-compatibility.md (filed under AAASM-1953).
This section documents the OSS-visible surface of that convention; the
enterprise doc is the authoritative source for the strategy.
CI Enforcement
A CI check (compat-matrix-check) enforces that this file is updated whenever version-carrying files change in a pull request.
Currently enforced (monorepo scope):
- Cargo.toml (workspace root)
- crates/*/Cargo.toml (all crate manifests)
Deferred — pending cross-repo CI integration:
- sdk/python/pyproject.toml (Python SDK)
- sdk/node/package.json (Node.js SDK)
- sdk/go/go.mod (Go SDK)
Until cross-repo CI exists, SDK version bumps require a manual update to this file before merging.
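The rule that compat-matrix-check enforces can be illustrated as a predicate over the list of files changed in a pull request. This is a sketch of the rule, not the real CI script: MATRIX_FILE is a hypothetical path for this page, and matrix_update_missing is an illustrative name.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Monorepo-scope patterns from the "Currently enforced" list.
VERSION_FILES = ("Cargo.toml", "crates/*/Cargo.toml")
# Hypothetical location of this page; the real path is not stated here.
MATRIX_FILE = "docs/src/compatibility.md"


def matrix_update_missing(changed: Iterable[str]) -> bool:
    """True when a version-carrying file changed but the matrix file did not."""
    changed = list(changed)
    touches_version = any(
        fnmatchcase(path, pattern)
        for path in changed
        for pattern in VERSION_FILES
    )
    return touches_version and MATRIX_FILE not in changed
```

A CI job following this shape would fail the pull request when the predicate returns True and pass otherwise.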
How to Update This File
When bumping a component version:
- Add a new row to the Compatibility Matrix table for the new version combination.
- Update the Minimum Supported Runtime Version table if the minimum changes.
- Update the Supported Protocol Versions table if the runtime adds or drops protocol version support.
- Commit the change in the same PR as the version bump.
See versioning.md for the full versioning and deprecation policy.
Workspace changes (non-version bumps)
| PR / Ticket | Change | Compatibility impact |
|---|---|---|
| AAASM-107 | Added conformance workspace crate (test infrastructure, not shipped) | None — internal tooling only |
| AAASM-39 | Added aa-ebpf-common workspace crate (shared eBPF types, not shipped standalone) | None — internal shared types only |
| AAASM-37 | Added aa-ebpf-common workspace crate (no_std shared eBPF event types, not shipped as a public API) | None — internal kernel/userspace bridge only |
| AAASM-39 (impl) | Added exec tracepoint BPF programs, ProcessLineageTracker, ShellDetector, ExecLoader in aa-ebpf | None — kernel-level monitoring, not a public API |
| AAASM-64 | Added aa-ffi-go workspace crate (Go C-ABI staticlib bindings) | None — new FFI crate, no existing API changes |
| AAASM-936 | Added examples/aa-devtool-sample-myeditor workspace crate (sample DevToolAdapter impl + plugin authoring reference; publish = false) | None — example only, not shipped, depends on existing aa-core API surface |
| AAASM-971 | Added aa-devtool-codex workspace crate (OpenAI Codex CLI DevToolAdapter implementation; detect() + governance_level() wired in this PR; generate_managed_settings, apply_settings, build_launch_command land in AAASM-978/983/988) | None — new adapter crate, no changes to existing public APIs |
| AAASM-204 | Added aa-devtool-windsurf workspace crate (DevToolAdapter for Windsurf Cascade; L2 governance via admin settings + MCP registry control; publish = false) | None — new adapter crate, no changes to existing public API surface |
| AAASM-997 | Added aa-devtool-copilot workspace crate (DevToolAdapter for GitHub Copilot — VS Code extension detection, publish = false); added semver v1 dependency for latest-version selection | None — new adapter crate, no changes to existing public API surface |
| AAASM-1006 | Implemented MCP governance in aa-devtool-copilot: list_mcp_servers() reads chat.mcp.servers from VS Code settings.json; apply_mcp_governance() filters the server set (keep allowed, remove denied) and sets chat.mcp.requireApproval: "always" when deny list is non-empty; build_launch_command() returns LaunchFailed (Copilot is IDE-resident, not CLI-launchable) | None — implementation only within existing aa-devtool-copilot crate; no new crates, no existing public API changes |
| AAASM-946 | Added aa-devtool-claude-code workspace crate (ClaudeCodeAdapter — detection layer for Claude Code CLI; publish = false pending AAASM-201 completion) | None — new crate, no existing API surface changed; depends on existing aa-core::DevToolAdapter trait |
| AAASM-918 | Added aa-devtool-saas workspace crate (SaaS coding-agent DevToolAdapter for Claude.ai, ChatGPT, Cursor cloud; L1Observe governance; HMAC-SHA256 webhook signature verification; MCP allowlist advisory overlay for Claude.ai; publish = false) | None — new adapter crate, no changes to existing public APIs |
| AAASM-205 | Added aa-devtool workspace crate (DiscoveryService + built-in adapters for Claude Code, Codex, GitHub Copilot, Windsurf) | None — new crate, no existing API changes; aa-api and aa-cli gain a new optional dependency on it |
| AAASM-949 | Added RBAC role enforcement on POST /api/v1/policies: CallerRole + MutationKind + PolicyScopeKind enums and required_role_for() in aa-gateway/src/policy/rbac.rs; PolicyWriteAuth extractor + PolicyAuthorizationDenied error in aa-api/src/auth/policy_auth.rs; optional scope field on CreatePolicyRequest; auto-generated docs/src/policy-rbac.md + .ci/check-policy-rbac-doc.sh | POST /api/v1/policies now requires authentication (401 when unauthenticated) and returns 403 when the caller’s role is insufficient for the target scope; CreatePolicyRequest gains an optional scope field (defaults to global). Read-only endpoints unchanged. |
| AAASM-956 | Restored aa-devtool, aa-devtool-claude-code, aa-devtool-codex, aa-devtool-saas, and aa-devtool-windsurf to workspace members (dropped by a prior merge conflict resolution); implemented apply_settings() and apply_mcp_governance() in aa-devtool-claude-code via new apply.rs module (SettingsPathResolver trait, atomic write, unmanaged-key merge) | None — workspace member restoration only; apply_settings/apply_mcp_governance are internal adapter implementations with no changes to existing public API surfaces |
| AAASM-1206 | Added [profile.release] to workspace Cargo.toml (opt-level="z", lto=true, codegen-units=1, strip=true, panic="abort") — build profile change only, no version bump | None — affects binary size of release builds only; no API, protocol, or ABI changes |
| AAASM-1076 | Added aa-topology-integration-tests workspace crate (in-process end-to-end test harness for the topology pipeline; publish = false, dev-dependencies only) | None — test-only crate, no shipped artifacts; depends on existing aa-api / aa-gateway / aa-runtime public surfaces with no API changes |
| AAASM-1448 | Renamed aa-topology-integration-tests workspace crate to aa-integration-tests (in preparation for AAASM-1258 CLI subcommand coverage). Renamed .github/workflows/topology-integration.yml to integration-tests.yml. | None — test-only crate, no shipped artifacts; dev-dependencies only; no public API change |
| AAASM-1419 | Added CallStackNode proto message + repeated CallStackNode call_stack = 28 field on AuditEvent; added CallStackNode to aa-api ViolationPayload::Audit (utoipa schema regenerated); wired through dashboard useLiveOpsStream.mapEvent | None on protocol/v1 — non-breaking proto field addition (default empty). SDK regeneration for aa-ffi-python / aa-ffi-node / aa-ffi-go tracked as separate follow-up Tasks against this revision; older SDKs continue to interoperate (the new field is ignored on decode). |
| AAASM-2015 | Added aa-sandbox workspace crate (wasmtime + wasmtime-wasi host runtime scaffold for F116 ST-W tool-execution sandbox; doc-only modules error, policy, runtime — real WASI host wiring lands in AAASM-2017, fuel + memory-store limits in AAASM-2018) | None — new internal crate, no public API or protocol change; aa-wasm browser-target stub untouched |
| AAASM-2340 | Workspace prepared for crates.io publish via cargo-workspaces topological order. Per-crate publish flags set: publishable (default) for aa-core, aa-proto, aa-runtime, aa-ebpf, aa-ebpf-common, aa-proxy, aa-sandbox, aa-gateway, aa-cli; publish = false for all aa-devtool* (dev-tool subsystem held back from this alpha; not yet feature-complete), all aa-ffi-* + aa-wasm (SDK FFI scaffolding; each language SDK repo carries its own copy and ships via PyPI / npm / Go module proxy), and aa-api / conformance / aa-integration-tests / examples/* (cloud/enterprise consumers + workspace-internal tooling). All publishable crates’ path-deps gained explicit version = "0.0.1-alpha.3" literals so cargo publish manifest verification passes. release.yml publish-crate job replaced with publish-crates (cargo-workspaces). Sibling content bundled into crate tarballs via _embedded/ mirrors so cargo install aasm ships the full product: aa-cli/_embedded/dashboard/dist/ (real SPA, not stub), aa-proto/_embedded/proto/ (gRPC contract), aa-ebpf/_embedded/aa-ebpf-probes/ (BPF source, compiled at install time when nightly + bpfel target are present, otherwise graceful stubs). New aasm sandbox run / aasm sandbox info subcommands expose the WASI tool-execution sandbox (highlight ④ of the product spec) to OSS users. Source tree keeps the full aasm surface including run and tools; the .ci/strip-for-publish.sh script removes the held-back aa-devtool* deps and the two consuming source files from the working tree right before cargo workspaces publish runs (driven by strip-for-publish:begin / :end markers in aa-cli/Cargo.toml and aa-cli/src/commands/mod.rs). Restores cargo install aasm as a supported install path. Resolves AAASM-2094 the right way (supersedes the closed AAASM-2338 / PR #840). | Behavior delta: published aasm binary on crates.io omits the run and tools subcommands. Local source builds (cargo build -p aa-cli) expose the full surface unchanged. To restore the subcommands on crates.io once dev-tool ships, remove the strip step from release.yml and flip the three aa-devtool* crates’ publish flags. No public Rust API, protocol, or ABI changes; new aasm sandbox CLI surface is additive. At 0.x.y SemVer, internal crates carry no API stability commitment; READMEs note ‘internal use only’. |
| AAASM-2343 | Bumped workspace + 22 path-dep version literals from 0.0.1-alpha.3 to 0.0.1-alpha.4. Fourth pre-release in the v0.0.1 dry-run series. Verifies AAASM-2340 (cargo-workspaces topological publish — first cargo install aasm ever), AAASM-2339 (curl smoke channel gated with if: false), and AAASM-2336 (notify-downstream → node-sdk + python-sdk repository_dispatch, supersedes AAASM-2328 retry workaround). Companion python-sdk listener AAASM-2342 lands in the same release cycle. | None — pre-release version bump; AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-2461 | Bumped workspace + 22 path-dep version literals from 0.0.1-alpha.4 to 0.0.1-alpha.5. Fifth pre-release in the v0.0.1 dry-run series. Validates the full release pipeline end-to-end with all alpha-4 recovery fixes baked in: AAASM-2346 (cargo workspaces publish --allow-dirty), AAASM-2455 / AAASM-2457 (smoke matrix restructure), AAASM-2456 (RUNBOOK + release-readiness.sh + per-channel aggregator), plus SDK companions node-sdk#67 (AAASM-2344) and python-sdk#74/#75/#76 (AAASM-2345 / AAASM-2459 / AAASM-2460). On crates.io, aa-core re-publishes at 0.0.1-alpha.5 alongside its existing 0.0.1-alpha.4 row from the partial alpha-4 publish; the other 8 crates publish for the first time. | None — pre-release version bump; AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-2767 | Bumped workspace + 35 path-dep version literals from 0.0.1-alpha.5 to 0.0.1-alpha.6. Sixth pre-release in the v0.0.1 dry-run series. Re-runs the full release pipeline with the two alpha-5 recovery fixes baked in: AAASM-2463 commit 1 (PR #871 — --no-verify on cargo workspaces publish, bypassing the cargo publish --verify source-mutation guard that aa-ebpf/build.rs’s Cargo.toml.embedded rename tripped) and AAASM-2463 commit 2 (PR #871 — removed the smoke-test: job that raced publish-crates and the homebrew tap PR merge). On crates.io, aa-core / aa-proto / aa-ebpf-common re-publish at 0.0.1-alpha.6 alongside their existing 0.0.1-alpha.5 rows from the partial alpha-5 publish; the other 6 crates (aa-ebpf, aa-runtime, aa-proxy, aa-sandbox, aa-gateway, aa-cli) publish for the first time. | None — pre-release version bump; AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-2786 | Bumped workspace + 35 path-dep version literals from 0.0.1-alpha.6 to 0.0.1-alpha.7. Seventh pre-release in the v0.0.1 dry-run series. Re-runs the full release pipeline with the AAASM-2775 strip-for-publish fix baked into master (PR #1021 — wrapped aa-integration-tests/Cargo.toml’s audit-consumer = ["aa-gateway/audit-consumer"] feature forward in strip-for-publish:begin audit-consumer / :end markers and added the file to MARKED_FILES in .ci/strip-for-publish.sh; the alpha-6 publish-crates failed at the cargo-workspaces resolver because the workspace graph still referenced the stripped feature). Also benefits from two companion SDK-workflow settings fixes applied via API: org-level “Allow GitHub Actions to create/approve PRs” enabled (unblocks node-sdk’s docs-version PR step), and go-sdk’s github-pages env adds a v* tag deployment policy (unblocks Pages deployment on tag pushes). On crates.io, aa-core / aa-proto / aa-ebpf-common re-publish at 0.0.1-alpha.7 alongside their existing 0.0.1-alpha.5 rows (the alpha-6 retries failed); the other 6 crates publish for the first time. | None — pre-release version bump; AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-2805 | Bumped workspace + 35 historical path-dep version literals AND 8 newly added storage/cache path-dep version literals (AAASM-2797 / PR #1024) from 0.0.1-alpha.7 to 0.0.1-alpha.8. Eighth pre-release in the v0.0.1 dry-run series. Re-runs the full release pipeline with the AAASM-2797 fix baked into master — 5 storage/cache crates (aa-storage, aa-storage-memory, aa-storage-redis, aa-storage-sqlite-buffer, aa-cache) had path-deps without the version = "..." literal that cargo publish demands. alpha-7’s publish-crates died after publishing only aa-core@0.0.1-alpha.7 because of this latent bug. On crates.io, all 14 publishable crates are expected to land for the first time end-to-end: the 9 historical (re-publish at alpha-8 alongside existing rows) plus the 5 storage/cache crates (publish for the first time ever). Still-open follow-up: Homebrew brew install + test (macOS) silent-SIGKILL investigation (the AAASM-2792 revert didn’t fix it; --release post-AAASM-2575 is the fast profile, not size-optimized; suspect is a new transitive dep added since alpha-5 such as redis 1.2 / deadpool-redis 0.23 via aa-storage-redis). | None — pre-release version bump; AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-2849 | Bumped workspace + 43 path-dep version literals from 0.0.1-alpha.8 to 0.0.1-alpha.9. Ninth pre-release in the v0.0.1 dry-run series. First coordinated release after the AAASM-2851 SDK release decoupling chapter — validates that the repository_dispatch fan-out still works end-to-end after the restructure of release-node.yml (publish_mode gating, dry-run input, Resolve refactor) and release-python.yml (resolve job, sync-version composite action rename). Carries agent-assembly docs polish (AAASM-2199, 2827, 2833, 2841, 2858) and drives @agent-assembly/sdk@0.0.1-alpha.9 (full AAASM-2851 chain + AAASM-2842 public GatewayClient + AAASM-2870 README polish) and agent-assembly==0.0.1a9 (symmetric python-sdk content + AAASM-2863 PEP 440 test + AAASM-2868 docs CI gate + AAASM-2869 runbook) downstream via repository_dispatch. On crates.io, all 14 publishable crates re-publish at 0.0.1-alpha.9 alongside their existing 0.0.1-alpha.8 rows. | None — pre-release version bump; AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-2951 | Bumped workspace + 16 path-dep version literals from 0.0.1-alpha.9 to 0.0.1-beta.1. First beta-channel pre-release in the v0.0.1 series — promotes the pre-release channel up from alpha after the alpha-1 → alpha-9 dry-run series stabilised every release channel. Coordinated release across agent-assembly + python-sdk + node-sdk + go-sdk; drives @agent-assembly/sdk@0.0.1-beta.1, agent-assembly==0.0.1b1, and github.com/ai-agent-assembly/go-sdk@v0.0.1-beta.1 downstream. Carries the AAASM-2934 SDK Examples documentation chapter (multi-page Examples sections in the node/python/go SDK docs + an agent-assembly core-docs Examples pointer). On crates.io, all 14 publishable crates re-publish at 0.0.1-beta.1 alongside their existing 0.0.1-alpha.9 rows. | None — pre-release version bump; AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-3004 | Bumped workspace + 16 path-dep version literals from 0.0.1-beta.1 to 0.0.1-beta.2. Second pre-release in the v0.0.1 beta channel — a forward-roll cut on top of 0.0.1-beta.1 (no channel promotion, no scope expansion) carrying the AAASM-3000 IPC deadlock fix in aa-sdk-client (event reporting is now fire-and-forget, closing the deadlock against a runtime that doesn’t ack) plus the AAASM-2959 release-tooling sync that keeps aa-ffi-python and aa-ffi-node Cargo.lock consistent with the bumped aa-sdk-client revision. Coordinated release across agent-assembly + python-sdk + node-sdk + go-sdk; drives @agent-assembly/sdk@0.0.1-beta.2, agent-assembly==0.0.1b2, and github.com/ai-agent-assembly/go-sdk@v0.0.1-beta.2 downstream. On crates.io, all 14 publishable crates re-publish at 0.0.1-beta.2 alongside their existing 0.0.1-beta.1 rows. | None — pre-release version bump + a behaviour-preserving deadlock fix on the SDK event-report path (the prior code blocked on an ack that the runtime didn’t send; consumers that already worked still work). AAASM-2340 behaviour delta (held-back aasm run / aasm tools on crates.io) carries forward unchanged. |
| AAASM-2372 | Added aa-storage-redis workspace crate (Redis L2 shared-cache driver implementing SessionStore, RateLimitCounter, and PolicyStore from aa-core::storage; redis 1.2 + deadpool-redis 0.23 pooling; RateLimitCounter uses an atomic Lua INCRBY+EXPIRE script). No version change. | None — new driver crate, no changes to existing public API surface. xxhash-rust BSL-1.0 (transitive via redis) is already allow-listed in deny.toml. |
| AAASM-2369 | Added aa-storage-postgres workspace crate (L3 primary PostgreSQL storage driver — ships sqlx migrations for the four MVP tables orgs/agents/policies/audit_logs and a [storage.postgres] connection-pool config; publish = false until the storage-driver subsystem is feature-complete). The aa_core::storage trait impls (PgPolicyStore / PgAuditSink / PgCredentialStore / PgLifecycleStore) land in AAASM-2370. No version change. | None — new internal driver crate; no existing public API, protocol, or ABI change |
| AAASM-2575 | Split the default [profile.release] into a fast build (opt-level=2, lto="thin", codegen-units=16; strip + panic="abort" unchanged) and added a size-optimized [profile.dist] (inherits release; opt-level="z", fat lto, codegen-units=1). release.yml now ships the binary with --profile dist. Build-profile change only, no version bump. | None — affects build speed and which profile produces the shipped binary; dist reproduces the previous size-optimized output. No API, protocol, or ABI change. |
| AAASM-2555 | Added a [workspace.dependencies] table to the root Cargo.toml centralizing third-party crates shared by ≥2 members, and converted those members to dep = { workspace = true } (single source of version truth). Pure manifest refactor — Cargo.lock byte-for-byte unchanged and cargo tree -d identical to the prior revision (108 duplicate nodes); no version bump. Single-member and intentionally-pinned crates (e.g. rusqlite per AAASM-2374) stay declared locally. | None — no version, protocol, or ABI change; resolved dependency graph is identical, so runtime behavior is unchanged |
| AAASM-2588 | Added [profile.dev] (debug="line-tables-only") and [profile.dev.package."*"] (opt-level=1, debug=false) to tune dev/test build time, plus an opt-in (commented) .cargo/config.toml faster-linker template and a CONTRIBUTING.md section. Raised the integration-tests job timeout-minutes 20→30 to absorb the slightly heavier optimized-deps build. Build-config change only, no version bump. | None — affects local/CI build speed and dev-build debuginfo verbosity only; no API, protocol, or ABI change. |
| AAASM-2623 | Added aa-sdk-client workspace crate (Story AAASM-2570 — the shared, FFI-agnostic SDK runtime-client: UDS transport, IPC wire codec, AssemblyClient lifecycle, and advisory non-authoritative credential preflight, extracted from aa-ffi-python). Scaffold only in this PR (publish = false until AAASM-2559 makes the shared crates pinnable); modules land in AAASM-2624/2625/2626. aa-ffi-python is untouched — its migration onto this crate is AAASM-2561. | None — new internal crate, no existing public API, protocol, or ABI change |
| AAASM-2646 | Removed the fat aa-ffi-python + aa-ffi-node members from root Cargo.toml and deleted the crates (Epic AAASM-2552 final story). The thin Node/Python shims now live in the sibling node-sdk / python-sdk repos on the pinned aa-sdk-client (AAASM-2560 / AAASM-2561); aa-ffi-go (C-ABI staticlib artifact consumed by go-sdk) and aa-sdk-client are retained, as is workspace.exclude = ["node-sdk"] (the e2e_sdk_node tests still build the sibling thin shim). Shrinks cargo build --workspace by dropping the pyo3 / napi / napi-derive / napi-build dep subtrees. | None — workspace member removal only; the Python/Node/Go SDKs ship from their own repos and keep their versions + protocol/v1 compatibility. No aa-runtime version, protocol, or ABI change |
| AAASM-2703 | Removed the aa-ffi-go member from root Cargo.toml, deleted the crate, and deleted its ffi-go-staticlib.yml build workflow (Epic AAASM-2552). The thin Go cgo shim now lives in the sibling go-sdk repo (native/aa-ffi-go) on the pinned aa-sdk-client (AAASM-2704), matching the Node/Python model — the monorepo no longer hosts any FFI shim. Amends ADR 0002 (which had kept aa-ffi-go in the workspace). | None — workspace member removal only; the Go SDK ships from its own repo and keeps its version + protocol/v1 compatibility. No aa-runtime version, protocol, or ABI change |
| PR #1059 (Dependabot) | Bumped the workspace tower-http dependency from 0.6.11 to 0.7.0 in root Cargo.toml (HTTP middleware used by aa-api / aa-gateway). Compiles and passes the full workspace test suite + clippy unchanged. A transitive tower-http 0.6 remains in Cargo.lock via an upstream dependency; both coexist. No version bump. | None — internal third-party dependency bump; no public API, protocol, or ABI change |
Last updated: 2026-06-16 by Chisanan232
Protocol versioning policy
Use this page to decide how a protocol change must be versioned before you ship it. It defines the versioning scheme, the rules for classifying a change as breaking or non-breaking, and the deprecation lifecycle. Every change to proto schemas, JSON schemas, IPC framing, and wire formats is governed by this policy.
The short version: add fields and RPCs freely (MINOR); never remove, rename, or retype an existing field without a MAJOR bump and a migration guide.
Versioning scheme
The protocol uses Semantic Versioning (MAJOR.MINOR.PATCH):
| Component | Meaning |
|---|---|
MAJOR | Breaking change — existing SDKs must be updated to remain compatible |
MINOR | Non-breaking addition — new fields, new RPCs, new enum values (backward compatible) |
PATCH | Non-breaking fix — documentation corrections, description updates, no wire format change |
The current protocol version is protocol/v1 (pre-stable: v0.0.1).
Change classification
Non-breaking changes (MINOR or PATCH)
These changes can be made without requiring SDK updates:
| Change | Classification | Reason |
|---|---|---|
| Add an optional field to a message | MINOR | Existing decoders ignore unknown fields (proto3) |
| Add a new RPC method to a service | MINOR | Existing clients simply don’t call it |
| Add a new enum value | MINOR | Unknown enum values fall back to _UNSPECIFIED = 0 |
| Add a new service | MINOR | Existing clients don’t depend on it |
| Rename a field description (not the field itself) | PATCH | No wire format change |
| Fix a typo in a comment or doc string | PATCH | No wire format change |
| Tighten a JSON Schema description | PATCH | No wire format change |
Breaking changes (MAJOR)
These changes require a MAJOR version bump and a migration guide:
| Change | Classification | Reason |
|---|---|---|
| Remove a field from a message | MAJOR | Existing encoders/decoders break |
| Rename a field | MAJOR | Field number stays but name change breaks JSON/gRPC-gateway |
| Change a field’s type | MAJOR | Wire encoding changes |
| Change a field number | MAJOR | Proto3 wire encoding is field-number based |
| Remove an RPC method | MAJOR | Existing callers get UNIMPLEMENTED errors |
| Remove an enum value | MAJOR | Existing code holding that value breaks |
| Add a required field | MAJOR | Existing messages missing the field become invalid |
| Change a JSON Schema type constraint | MAJOR | Existing valid documents become invalid |
| Narrow a JSON Schema constraint (e.g. add minLength) | MAJOR | Previously valid values may now fail validation |
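When a release bundles several protocol changes, the required bump is the highest level any single change demands. The sketch below encodes a subset of the two tables; the change-kind labels are hypothetical identifiers invented for this example, and treating an unlisted change as MAJOR is a conservative assumption rather than stated policy.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Bump(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical labels; the levels mirror the classification tables.
CHANGE_LEVEL = {
    "fix_doc_comment": Bump.PATCH,
    "tighten_schema_description": Bump.PATCH,
    "add_optional_field": Bump.MINOR,
    "add_rpc": Bump.MINOR,
    "add_enum_value": Bump.MINOR,
    "add_service": Bump.MINOR,
    "remove_field": Bump.MAJOR,
    "rename_field": Bump.MAJOR,
    "change_field_type": Bump.MAJOR,
    "change_field_number": Bump.MAJOR,
    "remove_rpc": Bump.MAJOR,
    "remove_enum_value": Bump.MAJOR,
    "add_required_field": Bump.MAJOR,
    "narrow_schema_constraint": Bump.MAJOR,
}


def required_bump(changes: Iterable[str]) -> Optional[Bump]:
    """Return the highest bump any change demands, or None for no changes."""
    # Unlisted change kinds are assumed to be breaking until classified.
    levels = [CHANGE_LEVEL.get(change, Bump.MAJOR) for change in changes]
    return max(levels, default=None)
```

For example, required_bump(["add_rpc", "rename_field"]) yields Bump.MAJOR because the rename dominates the addition.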
Deprecation lifecycle
Before a breaking change is introduced, the affected field, method, or value must go through a formal deprecation period:
Deprecated in vX.Y → Removed no earlier than v(X+2).0
Steps
- Deprecate: Mark the item as deprecated in the proto or JSON Schema with a deprecated annotation and a description explaining what to use instead. Bump MINOR version.
- Announce: Add an entry to CHANGELOG.md under Deprecated. Notify SDK maintainers.
- Support period: The deprecated item remains fully functional for at least two MAJOR versions after the deprecating release.
- Remove: Remove the item in a future MAJOR release (no earlier than v(X+2).0). Add a migration guide. Update CHANGELOG.md under Removed.
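The removal floor in the rule above is simple arithmetic on the MAJOR component. A minimal sketch, assuming versions are written as vMAJOR.MINOR with an optional PATCH; earliest_removal is an illustrative helper, not an existing tool.

```python
import re


def earliest_removal(deprecated_in: str) -> str:
    """Map 'vX.Y' (deprecating release) to the earliest removal release 'v(X+2).0'."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.\d+)?", deprecated_in)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR version: {deprecated_in!r}")
    major = int(match.group(1))
    # The MINOR component does not affect the floor; only MAJOR advances by two.
    return f"v{major + 2}.0"
```

A field deprecated in v1.3 therefore cannot be removed before v3.0, regardless of how many v1.x or v2.x releases ship in between.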
Runtime backward compatibility
Runtime N must support SDKs speaking protocol N-1.
This means an aa-runtime at protocol v2.x must continue to accept connections from SDKs still using protocol v1.x. SDKs have a two-major-version window to migrate before a runtime drops support for the older protocol.
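The N-1 rule can be expressed as the minimum set of protocol majors a runtime has to accept. The sketch below is illustrative only: must_support and accepts are hypothetical names, and a real runtime is free to accept more than this minimum.

```python
from typing import List


def must_support(runtime_major: int) -> List[int]:
    """Protocol majors a runtime at the given protocol major must accept."""
    if runtime_major < 1:
        raise ValueError("protocol majors start at 1")
    if runtime_major == 1:
        return [1]  # first version, nothing older exists
    return [runtime_major - 1, runtime_major]


def accepts(runtime_major: int, sdk_major: int) -> bool:
    """True when the rule guarantees the SDK's protocol is accepted."""
    return sdk_major in must_support(runtime_major)
```

Under this rule a protocol/v1 SDK is guaranteed to connect to a v2 runtime but not to a v3 runtime.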
Example: deprecating a field
// Before (v1.2 — field is still used)
message AgentId {
string org_id = 1;
string team_id = 2;
string agent_id = 3; // original field name
}
// After (v1.3 — field deprecated, replacement added)
message AgentId {
string org_id = 1;
string team_id = 2;
string agent_id = 3 [deprecated = true]; // deprecated: use `id` instead (removed in v3.0)
string id = 4; // replacement field
}
CHANGELOG entry at v1.3:
### Deprecated
- `AgentId.agent_id` — use `AgentId.id` instead. Will be removed in v3.0.
Example migration guide — AgentId.agent_id → AgentId.id
Breaking change introduced in: protocol/v3.0
Deprecated since: protocol/v1.3
Affected SDK versions: All SDKs using AgentId.agent_id
Estimated migration effort: Low
What changed
The field AgentId.agent_id (field number 3) was removed. Use AgentId.id (field number 4) instead. The semantic meaning is identical — the field carries the agent’s own identifier (DID).
Before (protocol/v1.x — v2.x)
Proto encoding:
AgentId {
org_id: "acme"
team_id: "platform"
agent_id: "did:key:z6Mk..." // field 3
}
Python SDK:
agent_id = AgentId(org_id="acme", team_id="platform", agent_id="did:key:z6Mk...")
After (protocol/v3.0+)
Proto encoding:
AgentId {
org_id: "acme"
team_id: "platform"
id: "did:key:z6Mk..." // field 4
}
Python SDK:
agent_id = AgentId(org_id="acme", team_id="platform", id="did:key:z6Mk...")
Migration steps
- Search your codebase for all usages of AgentId.agent_id (or the SDK-language equivalent).
- Replace each with AgentId.id.
- Run your SDK’s conformance test suite against an aa-runtime at protocol/v3.0.
- Deploy the updated SDK before upgrading aa-runtime past v2.x (runtime v2.x still supports protocol/v1 per the backward compatibility rule).
| Runtime protocol | Must support |
|---|---|
| protocol/v1 | protocol/v1 only (first version) |
| protocol/v2 | protocol/v1, protocol/v2 |
| protocol/v3 | protocol/v2, protocol/v3 (v1 support may be dropped) |
For the blank template to copy when writing a new migration guide, see docs/migration/template.md.
Last updated: 2026-06-11 by Chisanan232
Policy YAML Reference
A complete reference for the governance policy document the gateway loads,
validates, and enforces. Every field below is grounded in the policy engine’s
own types (aa-gateway/src/policy/) and the shared core
(aa-core). Validate any file locally before applying it:
aasm policy validate path/to/policy.yaml
Validation prints Policy is valid: <path> and exits 0 on success. Hard
constraint violations print error: <field>: <message> and exit 1.
Unrecognised keys are warnings, not errors — the file still validates, but
the unknown key is ignored at runtime, so a typo’d field silently does nothing.
Treat warnings as bugs.
Document formats
A policy may be written in either of two equivalent shapes.
Envelope format (recommended)
A Kubernetes-style wrapper. metadata.name and metadata.version are surfaced
in tooling; the actual policy lives under spec:.
apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: my-policy
  version: "1.0.0"
  description: Optional free text.
spec:
  budget:
    daily_limit_usd: 20.0
Flat format
The same content with no wrapper — every section sits at the top level. There
is no metadata, so name and version are absent.
version: "1.0"
budget:
  daily_limit_usd: 20.0
The validator auto-detects the format: if a top-level spec: key is present it
parses the envelope, otherwise it parses the flat form. The field tables below
describe the policy body (the content of spec:, or the whole document in flat
form).
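The auto-detection rule is simple enough to sketch. The snippet assumes the YAML has already been parsed into a Python mapping (parsing is left out to keep it dependency-free), and `policy_body` is a hypothetical name, not the validator's actual function.

```python
def policy_body(doc: dict) -> tuple[str, dict]:
    """Return (format, body) for a parsed policy document.

    A top-level "spec" key selects the envelope format; otherwise the
    whole document is treated as the flat form.
    """
    if "spec" in doc:
        return "envelope", doc["spec"]
    return "flat", doc


envelope = {"apiVersion": "agent-assembly/v1", "kind": "Policy",
            "spec": {"budget": {"daily_limit_usd": 20.0}}}
flat = {"version": "1.0", "budget": {"daily_limit_usd": 20.0}}
print(policy_body(envelope)[0], policy_body(flat)[0])
```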
Top-level fields
| Field | Type | Default | Example |
|---|---|---|---|
| version | string | (none) | version: "1.0" |
| scope | string | global | scope: team:platform |
| approval_timeout_secs | integer > 0 | 300 | approval_timeout_secs: 600 |
| network | section | (omitted → unrestricted) | see network |
| schedule | section | (omitted → always active) | see schedule |
| budget | section | (omitted → no cap) | see budget |
| data | section | (omitted → no scan rules) | see data |
| tools | map | (empty) | see tools |
| capabilities | section | (omitted) | see capabilities |
| approval | section | (omitted) | see approval |
scope accepts one of: global, org:<id>, team:<id>, agent:<uuid>, or
tool:<name>. The cascade evaluates policies in
Global → Org → Team → Agent → Tool order, most-restrictive-wins. An agent:
scope requires a valid hyphenated UUID; a team:/org:/tool: identifier must
not be empty. Any other shape is a validation error.
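The scope rules above can be approximated in a few lines. This is an illustrative sketch, not the gateway's validator; `validate_scope` is a hypothetical name.

```python
import uuid


def validate_scope(scope: str) -> bool:
    """Check a scope string against the shapes listed above."""
    if scope == "global":
        return True
    kind, sep, ident = scope.partition(":")
    if not sep or kind not in {"org", "team", "agent", "tool"}:
        return False
    if kind == "agent":
        try:
            # uuid.UUID also accepts un-hyphenated input, so compare the
            # canonical hyphenated form back against the identifier.
            return str(uuid.UUID(ident)) == ident.lower()
        except ValueError:
            return False
    return ident != ""


print(validate_scope("team:platform"), validate_scope("team:"))
```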
Complete example policy
A single policy exercising every section. This validates cleanly.
apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: complete-example
  version: "1.0.0"
  description: Demonstrates every policy section.
spec:
  scope: team:platform
  approval_timeout_secs: 300
  network:
    allowlist:
      - api.openai.com
      - "*.anthropic.com"
  schedule:
    active_hours:
      start: "09:00"
      end: "18:00"
      timezone: "Asia/Taipei"
  budget:
    daily_limit_usd: 25.0
    monthly_limit_usd: 500.0
    timezone: "Asia/Taipei"
    action_on_exceed: deny
  data:
    credential_action: redact_only
    sensitive_patterns:
      - "sk-[A-Za-z0-9]{20,}"
  capabilities:
    allow:
      - file_read
      - network_outbound
      - mcp_tool:git
    deny:
      - terminal_exec
  approval:
    timeout_seconds: 600
    escalation_role: org-admin
  tools:
    read_file:
      allow: true
      limit_per_hour: 120
    write_file:
      allow: true
      requires_approval_if: "path starts_with \"/etc\""
    shell:
      allow: false
network
Controls outbound (egress) connections. Backed by NetworkPolicy.
| Field | Type | Default | Example |
|---|---|---|---|
| allowlist | list of glob strings | [] | allowlist: ["api.openai.com"] |
Glob pattern semantics
The matcher (aa_core::policy::is_host_allowed_by_egress_allowlist) supports
exactly three pattern shapes:
| Pattern | Matches | Does not match |
|---|---|---|
| api.openai.com | exact host, case-insensitive | chat.openai.com, openai.com |
| *.openai.com | any sub-domain at any depth: api.openai.com, a.b.openai.com | the bare apex openai.com; attacker suffixes like evilopenai.com |
| * | every host (escape hatch) | (nothing) |
Matching is case-insensitive (DNS labels are case-insensitive per RFC 4343).
The leftmost-label wildcard *. requires at least one label before the suffix,
so *.openai.com deliberately excludes the bare openai.com — list both if you
need the apex too.
Default behavior
- No network: section → egress is unrestricted (default-open). The caller’s posture wins.
- network: present but allowlist empty or omitted → also unrestricted. An empty list means “no restriction”, not “deny all”. To deny by default, list only the hosts you trust; anything not matched is then denied.
An allowlist entry that is empty or whitespace-only is a validation error
(network.allowlist[i]: allowlist entry must not be empty).
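The three pattern shapes and the default-open behavior can be illustrated with a short Python sketch. It mirrors the documented semantics only; it is not the Rust implementation of is_host_allowed_by_egress_allowlist, and `host_allowed` is a hypothetical name.

```python
def host_allowed(host: str, allowlist: list[str]) -> bool:
    """Approximate the documented allowlist semantics."""
    if not allowlist:
        return True  # empty or omitted list means "no restriction"
    host = host.lower()
    for pattern in allowlist:
        pattern = pattern.lower()
        if pattern == "*":
            return True  # escape hatch: every host
        if pattern.startswith("*."):
            suffix = pattern[1:]  # ".openai.com"
            # Require at least one label before the suffix, so the bare
            # apex and look-alike suffixes do not match.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False


print(host_allowed("a.b.openai.com", ["*.openai.com"]))
print(host_allowed("openai.com", ["*.openai.com"]))
```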
tools
Per-tool allow/deny, rate limiting, and approval gating. A map keyed by tool
name; each value is a ToolPolicy.
| Field | Type | Default | Example |
|---|---|---|---|
| allow | bool | true | allow: false |
| limit_per_hour | integer | (unlimited) | limit_per_hour: 10 |
| requires_approval_if | expression string | (never) | requires_approval_if: "path starts_with \"/etc\"" |
allow defaults to true when omitted, so a tool entry that only sets
limit_per_hour is still permitted.
The * wildcard tool
A tool named * is the catch-all entry for any tool without its own named
rule. Pair "*": { allow: false } with explicit allow: true entries to get
deny-by-default behaviour (see the Strict example). Conversely
"*": { allow: true } is an explicit allow-everything default.
tools:
  "*":
    allow: false   # deny every tool not named below
  read_file:
    allow: true    # ...except read_file
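A lookup that honors both the named entry and the * fallback might look like the sketch below. The function name is hypothetical, and returning None when neither a named rule nor a wildcard exists is an assumption made here to avoid guessing at the engine's behavior in that case.

```python
from typing import Optional


def tool_allowed(tools: dict, name: str) -> Optional[bool]:
    """Resolve a tool against the tools map.

    A named rule wins over "*". `allow` defaults to True when omitted.
    Returns None when no rule applies at all (assumption: the decision
    then rests with the remainder of the policy).
    """
    rule = tools.get(name, tools.get("*"))
    if rule is None:
        return None
    return rule.get("allow", True)


tools = {
    "*": {"allow": False},
    "read_file": {"allow": True},
    "http_get": {"limit_per_hour": 30},  # allow omitted, so still permitted
}
print(tool_allowed(tools, "read_file"), tool_allowed(tools, "shell"))
```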
requires_approval_if expression syntax
requires_approval_if holds a boolean expression evaluated against the
in-flight action. When it evaluates true, the action is routed to
human-in-the-loop approval instead of executing immediately. The expression is
parsed and validated at load time (aa-gateway/src/policy/expr.rs): an
empty expression, an unknown variable, or an unknown governance level (L4+) is
a hard validation error.
Fail-safe at runtime: if the engine cannot evaluate an expression (parse error, malformed action), it returns true — approval required — never a silent allow.
Grammar
expr := clause (combinator clause)*
clause := field op literal
combinator := AND | OR # AND binds tighter than OR; no parentheses
AND/OR are uppercase. There are no parentheses in this version; an
expression is OR-groups of AND-connected clauses.
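To make the precedence concrete, the sketch below combines already-evaluated clause results as OR-groups of AND-connected clauses. It is illustrative only: `combine` is a hypothetical helper, it assumes a well-formed token sequence, and it does not parse expression strings.

```python
def combine(tokens: list) -> bool:
    """Combine clause results with AND binding tighter than OR.

    `tokens` alternates booleans and the strings "AND" / "OR",
    e.g. [True, "OR", False, "AND", False].
    """
    or_groups = [[]]
    for tok in tokens:
        if tok == "OR":
            or_groups.append([])        # start a new AND-group
        elif tok != "AND":
            or_groups[-1].append(bool(tok))
    return any(all(group) for group in or_groups)


# Read as: True OR (False AND False)
print(combine([True, "OR", False, "AND", False]))
```

A naive left-to-right reading of the same tokens, (True OR False) AND False, would give the opposite answer, which is why the grouping matters when mixing AND and OR in one expression.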
Operators
| Operator | Meaning | Operand types |
|---|---|---|
| == | equal | string, number, governance level, risk tier |
| != | not equal | string, number, governance level, risk tier |
| >, >=, <, <= | ordered comparison | number, governance level, risk tier, duration |
| contains | substring / membership | string |
| starts_with | prefix match | string |
| in | value in list | string against ["a", "b"] |
| not_in | value not in list | string against ["a", "b"] |
Literals
- String: double-quoted, e.g. "/etc". Escapes: \" and \\.
- Number: integer or float, e.g. 10, 1.5.
- List: ["read", "write"], for in / not_in.
- Governance level: L0, L1, L2, L3 (ordered). Any other L<n> is a validation error.
- Risk tier: Low, Medium, High, Critical (ordered).
- Duration: human-readable, digit-leading, e.g. 24h, 30m, 1h30m (compared as seconds: 24h == 86400).
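The duration-to-seconds conversion can be sketched as below. This handles only the h, m, and s units that appear in this reference; the unit set the engine actually accepts may be wider, and `duration_seconds` is a hypothetical name.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def duration_seconds(text: str) -> int:
    """Convert a digit-leading duration such as "1h30m" to seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject input with anything left over around the number/unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"not a duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


print(duration_seconds("24h"), duration_seconds("1h30m"))
```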
Operands (variables)
The variable on the left of each clause must be one of the names the evaluator knows. Unknown names are rejected at load time (with a typo suggestion when close). The recognised variables:
| Variable | Resolves against | Type |
|---|---|---|
| tool | the called tool’s name | string |
| path | a file-access path | string |
| url | a network-request URL | string |
| method | a network-request HTTP method | string |
| command | a process-exec command line | string |
| args.<key>[.<nested>] | a JSON field inside a tool call’s args body | string / number |
| tool_result.<key>[.<nested>] | a JSON field inside a tool result | string / number |
| tool_result | the entire serialised tool-result body | string (contains/starts_with only) |
| governance_level | the agent’s governance level | level (L0–L3) |
| agent.depth | delegation depth | number |
| agent.risk_tier | the agent’s risk tier | tier |
| agent.age | seconds since the agent registered | number / duration |
| agent.parent_agent_id | the agent’s parent id | string |
| agent.team_id | the agent’s team id | string |
| agent.children_count | number of direct children | number |
| agent.is_root | 1 when depth == 0, else 0 | number (==/!=) |
| agent.is_leaf | 1 when children_count == 0, else 0 | number (==/!=) |
| team.active_agents | running agents in the team | number |
| team.parallel_agents | alias of team.active_agents | number |
| team.budget_remaining | remaining monthly budget | number |
| child.tool | tool names across direct children | string |
| child.risk_tier | risk tier of a child being spawned | tier |
| parent.risk_tier | the parent agent’s risk tier | tier |
| source.team_id | sending team of a message | string |
| target.team_id | recipient team of a message | string |
| target.channel_id | message channel id | string |
The args.<key> and tool_result.<key> forms walk a JSON pointer
(args.path → /path, args.headers.authorization →
/headers/authorization). They are null-safe: a non-matching action variant,
malformed JSON, or an unresolved pointer evaluates to false (no match), not
fail-safe-true.
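The null-safe walk can be pictured with the following sketch. It is a simplified stand-in for the evaluator (object keys only, no array indexing), and both function names are hypothetical.

```python
import json


def resolve_args(variable: str, args_json: str):
    """Resolve e.g. "args.headers.authorization" inside a JSON body.

    Returns None for malformed JSON or an unresolved path.
    """
    try:
        node = json.loads(args_json)
    except json.JSONDecodeError:
        return None
    for key in variable.split(".")[1:]:  # drop the leading "args"
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def contains_clause(variable: str, needle: str, args_json: str) -> bool:
    """Evaluate `<variable> contains "<needle>"`; unresolved means False."""
    value = resolve_args(variable, args_json)
    return isinstance(value, str) and needle in value


body = '{"path": "/etc/passwd", "headers": {"authorization": "Bearer x"}}'
print(contains_clause("args.path", "/etc", body))
print(contains_clause("args.missing", "/etc", body))
```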
Example expressions
Each of the following is a valid requires_approval_if value:
- "path starts_with \"/etc\"": gate writes under /etc.
- "args.path contains \"/etc\"": same idea, reading the path out of a tool call’s JSON args.
- "command contains \"sudo\"": gate any shell command invoking sudo.
- "url contains \"internal\"": gate requests to internal hosts.
- "tool == \"delete_database\"": gate one specific tool by name.
- "agent.depth > 1": gate actions from agents deeper than one delegation hop.
- "agent.children_count > 10": gate agents that have spawned many children.
- "governance_level >= L2": gate when the agent runs at L2 (Enforce) or above.
- "agent.risk_tier >= High": gate high- and critical-risk agents.
- "agent.age < 24h": gate brand-new agents (registered under a day ago).
- "method == \"DELETE\" OR method == \"PUT\"": gate destructive HTTP verbs.
- "target.team_id in [\"finance\", \"security\"]": gate messages sent to sensitive teams.
- "tool_result contains \"sk-\"": gate when the response body looks like it carries a secret.
- "command contains \"rm\" AND agent.is_root == 0": gate rm from non-root (delegated) agents only.
Divergence note. Earlier drafts of this ticket used illustrative expressions such as "call_count > 10". There is no call_count variable in the engine; per-tool rate limiting is expressed with the limit_per_hour field instead, and “how many children” is agent.children_count. Only the variables in the table above are accepted; anything else fails validation.
data
Sensitive-data / credential handling. Backed by DataPolicy.
| Field | Type | Default | Example |
|---|---|---|---|
| sensitive_patterns | list of regex strings | [] | sensitive_patterns: ["sk-[A-Za-z0-9]{20,}"] |
| credential_action | enum | redact_only | credential_action: block |
credential_action values
| Value | Behaviour |
|---|---|
| block | Refuse the action; the engine returns Deny (reason credential detected) and the payload never reaches upstream. |
| redact_only | (default) Forward a redacted form of the payload upstream. Preserves historical behaviour. |
| alert_only | Forward the unmodified payload and raise an alert. A deliberate downgrade for low-risk, audit-only modes. |
Any other value is a validation error.
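The three behaviours can be contrasted with a toy dispatcher. Several things here are assumptions for illustration: the function name, the (decision, payload, alert) return shape, and the "[REDACTED]" placeholder. It models custom patterns only, and Python's re module is a backtracking engine, unlike the regex engine the gateway uses.

```python
import re

VALID_ACTIONS = {"block", "redact_only", "alert_only"}


def apply_credential_action(action: str, payload: str, patterns: list[str]):
    """Return (decision, forwarded_payload, alert_raised)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown credential_action: {action!r}")
    compiled = [re.compile(p) for p in patterns]
    if not any(rx.search(payload) for rx in compiled):
        return ("forward", payload, False)   # nothing detected
    if action == "block":
        return ("deny", None, False)          # payload never forwarded
    if action == "alert_only":
        return ("forward", payload, True)     # unmodified, plus alert
    redacted = payload
    for rx in compiled:
        redacted = rx.sub("[REDACTED]", redacted)
    return ("forward", redacted, False)       # redact_only


patterns = [r"sk-[A-Za-z0-9]{20,}"]
secret = "sk-" + "a" * 24
print(apply_credential_action("redact_only", "key=" + secret, patterns))
```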
sensitive_patterns regex syntax
Each entry is a regular expression compiled by the Rust regex crate (RE2-style
— linear-time, no backtracking, no look-around or backreferences). An invalid
regex is a hard validation error
(data.sensitive_patterns[i]: invalid regex: ...). Backslashes must be escaped
for YAML, e.g. a US-SSN pattern is written "\\b\\d{3}-\\d{2}-\\d{4}\\b".
Built-in vs custom
The runtime ships a built-in credential scanner (aa-security) that always
runs, independent of sensitive_patterns. It is an Aho-Corasick literal matcher
covering common high-confidence secret prefixes, including:
- API keys: sk- (OpenAI), sk-ant- (Anthropic), AKIA… (AWS), GCP service accounts, Azure connection strings.
- Tokens: ghp_ / ghs_ (GitHub), xoxb- / xoxp- / xoxa- (Slack).
- Database URLs: postgres://, mysql://, mongodb://.
- Private keys: RSA, EC, OpenSSH, PKCS#8, PGP PEM blocks.
sensitive_patterns is the custom layer on top: your own regexes for
organisation-specific identifiers (employee IDs, internal hostnames, PII shapes
like SSNs or emails) that the built-in literal set does not cover.
Performance notes
- The built-in scanner is pre-compiled once at construction; each scan pays zero pattern-compilation cost and runs in a single Aho-Corasick pass.
- Custom sensitive_patterns are compiled by the regex crate. Because that engine is backtracking-free, match time is linear in the input length, so there is no catastrophic-backtracking risk. Still, keep the pattern list small and anchored where possible; each pattern is an independent scan over the payload.
budget
Spend limits in US dollars. Backed by BudgetPolicy.
| Field | Type | Default | Example |
|---|---|---|---|
| daily_limit_usd | float > 0 | (no cap) | daily_limit_usd: 20.0 |
| monthly_limit_usd | float > 0, ≥ daily | (no cap) | monthly_limit_usd: 400.0 |
| org_daily_limit_usd | float > 0 | (no cap) | org_daily_limit_usd: 100.0 |
| org_monthly_limit_usd | float > 0, ≥ org daily | (no cap) | org_monthly_limit_usd: 2000.0 |
| timezone | IANA tz string | UTC | timezone: "America/New_York" |
| action_on_exceed | enum | deny | action_on_exceed: suspend |
| window | duration string | (calendar day) | window: "1h30m" |
Currency
All limits are USD. There is no currency selector — costs are computed from a USD pricing table and compared against these USD caps.
Per-agent vs global vs per-org
Spend is tracked per agent, and rolled up to team, org, and global totals.
- daily_limit_usd / monthly_limit_usd are the global caps (applied to the aggregate).
- org_daily_limit_usd / org_monthly_limit_usd add an independent per-org cap, enforced separately from the global cap. Either can trip first.
Timezone & reset behaviour
timezone (an IANA name such as Europe/London) sets the boundary at which the
daily and monthly counters reset. It defaults to UTC. An unparseable name is a
validation error (budget.timezone: '<x>' is not a valid IANA timezone name).
- Daily reset: counters reset at local midnight in the configured timezone. Reset is lazy — it happens on the next spend event once the stored date is earlier than “today” in that timezone, so an idle agent’s counter simply carries the old date until its next request.
- Monthly reset: triggers when the stored month differs from the current month in the configured timezone.
- window overrides the calendar-day rollover with a fixed rolling window (humantime duration, e.g. 5s, 30m, 1h). Must be a positive duration.
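The lazy daily reset can be sketched as follows. The `DailyCounter` type and `record_spend` function are hypothetical, the window override is not modelled, and the example uses a fixed UTC+8 offset purely to stay self-contained; a real caller would pass a named zone such as zoneinfo.ZoneInfo("Asia/Taipei").

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone, tzinfo


@dataclass
class DailyCounter:
    day: date
    spent_usd: float = 0.0


def record_spend(counter: DailyCounter, amount_usd: float,
                 now: datetime, tz: tzinfo) -> DailyCounter:
    """Add spend, resetting first if the stored date is before "today"
    in the configured timezone. `now` must be timezone-aware."""
    today = now.astimezone(tz).date()
    if counter.day < today:
        counter = DailyCounter(day=today)  # lazy reset on next spend event
    counter.spent_usd += amount_usd
    return counter


tz = timezone(timedelta(hours=8))  # stand-in for a named IANA zone
counter = DailyCounter(day=date(2026, 6, 14), spent_usd=4.0)
# 15:59 UTC is 23:59 local: still the same local day, no reset.
counter = record_spend(counter, 1.0, datetime(2026, 6, 14, 15, 59, tzinfo=timezone.utc), tz)
before_midnight = counter.spent_usd
# 16:01 UTC is 00:01 local on the next day: the counter resets first.
counter = record_spend(counter, 1.0, datetime(2026, 6, 14, 16, 1, tzinfo=timezone.utc), tz)
print(before_midnight, counter.day, counter.spent_usd)
```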
action_on_exceed
| Value | Behaviour |
|---|---|
| deny | (default) Deny individual over-budget requests but keep the agent active. |
| suspend | Suspend the agent entirely until the budget resets. |
Validation rules: every limit must be > 0; monthly_limit_usd must be
≥ daily_limit_usd (and the same for the org pair). Equal monthly/daily is
allowed; monthly without daily is allowed.
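These validation rules translate directly into a small checker. It is a sketch under stated assumptions: `validate_budget` is a hypothetical name and the error strings are illustrative, not the validator's real messages.

```python
_LIMIT_PAIRS = (
    ("daily_limit_usd", "monthly_limit_usd"),
    ("org_daily_limit_usd", "org_monthly_limit_usd"),
)


def validate_budget(budget: dict) -> list[str]:
    """Return a list of rule violations (empty when the section is valid)."""
    errors = []
    for pair in _LIMIT_PAIRS:
        for key in pair:
            value = budget.get(key)
            if value is not None and value <= 0:
                errors.append(f"budget.{key}: must be > 0")
    for daily_key, monthly_key in _LIMIT_PAIRS:
        daily, monthly = budget.get(daily_key), budget.get(monthly_key)
        # Equal values are fine; monthly without daily is fine.
        if daily is not None and monthly is not None and monthly < daily:
            errors.append(f"budget.{monthly_key}: must be >= {daily_key}")
    return errors


print(validate_budget({"daily_limit_usd": 20.0, "monthly_limit_usd": 10.0}))
```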
schedule
Time-of-day gating. Backed by SchedulePolicy → ActiveHours.
| Field | Type | Default | Example |
|---|---|---|---|
| active_hours.start | HH:MM 24h | (required if active_hours present) | start: "09:00" |
| active_hours.end | HH:MM 24h | (required if active_hours present) | end: "18:00" |
| active_hours.timezone | IANA tz string | (required if active_hours present) | timezone: "Asia/Taipei" |
When active_hours is set, the agent is permitted to run only inside the
[start, end) window in the given timezone. Omitting schedule entirely means
the agent is always active.
Validation rules
- start and end must be zero-padded HH:MM (e.g. 09:00, not 9:00), hours 00–23, minutes 00–59.
- start must be earlier than end (string comparison on HH:MM). A window that wraps past midnight (e.g. 22:00–06:00) is rejected; model overnight coverage as two policies or a single all-hours policy instead.
- All three fields are required once active_hours is present.
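The format and ordering checks, plus the half-open [start, end) window test, can be sketched like this. Function names and error strings are hypothetical, and timezone validation is left out.

```python
import re

_HHMM = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def validate_active_hours(start: str, end: str) -> list[str]:
    """Check zero-padded HH:MM format and that start precedes end."""
    errors = []
    for name, value in (("start", start), ("end", end)):
        if not _HHMM.fullmatch(value):
            errors.append(f"schedule.active_hours.{name}: expected HH:MM")
    # Zero-padded HH:MM strings sort the same way as the times they name.
    if not errors and not start < end:
        errors.append("schedule.active_hours: start must be earlier than end")
    return errors


def is_active(start: str, end: str, local_hhmm: str) -> bool:
    """True when the local wall-clock time falls inside [start, end)."""
    return start <= local_hhmm < end


print(validate_active_hours("22:00", "06:00"))
print(is_active("09:00", "18:00", "18:00"))
```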
IANA timezone strings
Use canonical IANA names: UTC, America/New_York, Europe/London,
Asia/Taipei, Asia/Tokyo, etc. Fixed offsets like GMT+8 are not IANA
names and should be avoided.
Multiple active windows
A single policy expresses one window. To grant several disjoint windows (e.g. a morning and an afternoon block), apply multiple policies at different scopes in the cascade, or widen to a single enclosing window.
DST & timezone edge cases
Because the window is interpreted in a named IANA zone (not a fixed offset), it
follows daylight-saving transitions automatically — 09:00–18:00 stays
“9am to 6pm local” across the spring-forward and fall-back shifts. Two edge
cases are inherent to wall-clock time:
- Spring forward (clocks jump, e.g. 02:00 → 03:00): a start/end that names the skipped hour refers to a wall-clock time that does not exist on that date. Prefer windows outside the local DST gap.
- Fall back (clocks repeat an hour): a time inside the repeated hour occurs twice. The window still opens and closes, but the repeated wall-clock hour is ambiguous. Avoid placing a boundary inside the local fall-back hour for predictable behaviour.
Keeping boundaries away from the very early-morning DST transition hours sidesteps both cases.
capabilities
Coarse-grained allow/deny of action categories. Backed by
aa_core::CapabilitySet. Merged across the scope cascade with
parent-deny-wins semantics.
| Field | Type | Default | Example |
|---|---|---|---|
| allow | list of capability strings | [] | allow: ["file_read"] |
| deny | list of capability strings | [] | deny: ["terminal_exec"] |
Recognised capability strings:
| String | Capability |
|---|---|
| file_read | read the filesystem |
| file_write | write the filesystem |
| network_outbound | outbound network |
| network_inbound | inbound network |
| terminal_exec | execute shell commands |
| agent_spawn | spawn child agents |
| mcp_tool:<name> | use a named MCP tool, e.g. mcp_tool:git |
| model:<name> | use a named model, e.g. model:gpt-4o |
An unknown capability string, or an mcp_tool: / model: with an empty name,
is a validation error.
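A capability string check following the table above could look like this. `is_valid_capability` is a hypothetical name and the sketch covers only the strings listed in this reference.

```python
_FIXED_CAPABILITIES = {
    "file_read", "file_write", "network_outbound",
    "network_inbound", "terminal_exec", "agent_spawn",
}
_NAMED_PREFIXES = {"mcp_tool", "model"}


def is_valid_capability(cap: str) -> bool:
    """Accept the fixed strings, or mcp_tool:<name> / model:<name>
    with a non-empty name."""
    if cap in _FIXED_CAPABILITIES:
        return True
    prefix, sep, name = cap.partition(":")
    return bool(sep) and prefix in _NAMED_PREFIXES and name != ""


print(is_valid_capability("mcp_tool:git"), is_valid_capability("mcp_tool:"))
```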
approval
Per-policy overrides for the approval-escalation routing. Backed by
ApprovalPolicy. When omitted, team routing defaults apply.
| Field | Type | Default | Example |
|---|---|---|---|
| timeout_seconds | integer | (team default) | timeout_seconds: 600 |
| escalation_role | string | (team default) | escalation_role: org-admin |
Note the distinction between the top-level approval_timeout_secs (the global
approval timeout for the document, default 300) and the approval.timeout_seconds
override inside this section.
Three complete example policies
These ship under policy-examples/ and all pass aasm policy validate.
Strict
Deny all unknown tools, $5/day budget, block all sensitive data. See
policy-examples/strict.yaml.
apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: strict
  version: "1.0.0"
  description: >
    Lock everything down. Deny all unknown tools, cap spend at $5/day,
    and block any payload that trips the sensitive-data scanner. Use this
    as the baseline for high-risk or untrusted agents.
spec:
  scope: global
  network:
    # Empty-but-present allowlist still allows any host (an empty list means
    # "no restriction"). To actually restrict egress, list the exact hosts.
    allowlist:
      - api.openai.com
      - api.anthropic.com
  budget:
    daily_limit_usd: 5.0
    monthly_limit_usd: 100.0
    timezone: "UTC"
    action_on_exceed: suspend
  data:
    # Block the payload outright when the scanner finds a credential.
    credential_action: block
    sensitive_patterns:
      - "sk-[A-Za-z0-9]{20,}"
      - "AKIA[0-9A-Z]{16}"
      - "-----BEGIN [A-Z ]*PRIVATE KEY-----"
  # Capability floor: deny the dangerous categories regardless of per-tool rules.
  capabilities:
    deny:
      - terminal_exec
      - file_write
      - network_inbound
  # Deny every tool that is not explicitly allowed below.
  tools:
    "*":
      allow: false
    read_file:
      allow: true
      limit_per_hour: 60
    http_get:
      allow: true
      limit_per_hour: 30
      requires_approval_if: "url contains \"internal\""
Balanced
Allowlist common tools, $20/day budget, PII detection on (redact). See
policy-examples/balanced.yaml.
apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: balanced
  version: "1.0.0"
  description: >
    A pragmatic default for trusted internal agents. Allowlist the common
    tools, cap spend at $20/day, and detect PII / credentials by redacting
    rather than blocking so workflows keep running.
spec:
  scope: global
  network:
    allowlist:
      - api.openai.com
      - "*.anthropic.com"
      - "*.slack.com"
      - api.github.com
  schedule:
    active_hours:
      start: "08:00"
      end: "20:00"
      timezone: "America/New_York"
  budget:
    daily_limit_usd: 20.0
    monthly_limit_usd: 400.0
    timezone: "America/New_York"
    action_on_exceed: deny
  data:
    # Redact-only: forward a scrubbed payload upstream instead of refusing it.
    credential_action: redact_only
    sensitive_patterns:
      # PII detection: US SSN and a generic email address.
      - "\\b\\d{3}-\\d{2}-\\d{4}\\b"
      - "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"
  tools:
    read_file:
      allow: true
      limit_per_hour: 120
    http_get:
      allow: true
      limit_per_hour: 60
    web_search:
      allow: true
      limit_per_hour: 30
    write_file:
      allow: true
      requires_approval_if: "path starts_with \"/etc\" OR path contains \"..\""
    shell:
      allow: true
      limit_per_hour: 10
      requires_approval_if: "command contains \"rm\" OR command contains \"sudo\""
Audit-only
Log everything, enforce nothing. See
policy-examples/audit-only.yaml.
apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: audit-only
  version: "1.0.0"
  description: >
    Observe everything, enforce nothing. Every tool is allowed and the
    sensitive-data scanner only raises an alert without modifying or blocking
    the payload. Use this to map an agent's behaviour before tightening rules.
spec:
  scope: global
  # No `network:` clause → egress is unrestricted (default-open).
  # No `budget:` clause → no spend cap is enforced.
  data:
    # alert_only: forward the unmodified payload and raise an alert side-effect.
    # Deliberate downgrade documented for low-risk, audit-only modes.
    credential_action: alert_only
    sensitive_patterns:
      - "sk-[A-Za-z0-9]{20,}"
  tools:
    # Wildcard allow: every tool is permitted; findings are logged, not enforced.
    "*":
      allow: true
See also
- L0–L3 Capability Matrix — what each governance level can do.
- Policy RBAC Role Matrix — who may mutate policy at each scope.
- aasm policy: the full policy command group (validate, apply, simulate, history, …).
Last updated: 2026-06-15 by Chisanan232
L0–L3 Governance Capability Matrix
This document defines the four governance tiers used across all AI Agent Assembly dev-tool adapters and declares the tier attained by each supported tool for each capability dimension. It is the single source of truth for “what does L2 mean for this tool” — adapter implementation Stories reference this document rather than defining tiers ad hoc.
Status: Codex, GitHub Copilot, and Windsurf Cascade tiers are final (adapters merged). Claude Code (AAASM-201) and SaaS coding-agent (AAASM-918) rows are placeholders pending those adapters landing.
Tier definitions
| Tier | Name | What AAASM can do |
|---|---|---|
| L0 | Discover | Auto-inventory the tool: name, version, config file paths. No runtime hooks. AAASM knows the tool is present but cannot observe or affect its actions. |
| L1 | Observe | Tool actions appear in the AAASM audit log. Policy rules are evaluated and results are visible to operators, but the tool is not blocked — it runs uninhibited. Provides real-time observability without enforcement. |
| L2 | Enforce | Policy overlay is active. AAASM evaluates rules and blocks, redirects, or redacts violating actions while AAASM is running. The tool cannot bypass enforcement, but may operate without constraint if AAASM is offline. |
| L3 | Native Governed | AAASM writes the tool’s own native configuration (settings files, sandbox config, MCP registry). Governance is baked into the tool’s startup state — even if AAASM goes offline, the last-written settings cap what the tool can do. Strongest enforcement tier. |
Capability matrix
Rows are the seven governance capability dimensions. Columns are the four tiers. A cell answers: “At this tier, is this capability available?”
| Capability | L0 Discover | L1 Observe | L2 Enforce | L3 Native Governed |
|---|---|---|---|---|
| Audit log capture | No | Yes — every action emits an audit event with agent attribution, timestamp, and tool context | Yes | Yes |
| Policy decision visibility | No | Yes — policy rules evaluated per action; results visible in dashboard and aasm policy check | Yes | Yes |
| MCP server allowlist enforcement | No | No — MCP server list is observed but not restricted | Yes — deny list enforced at proxy layer | Yes — allowed MCP server list written to tool’s native config; tool cannot load unlisted servers at startup |
| Terminal-exec block | No | No | Yes — exec calls intercepted at proxy or SDK layer; blocked when policy says deny | Partial — depends on tool-native sandbox support; see per-tool declarations below |
| File-write block | No | No | Yes — file-write events evaluated by policy; violations blocked at proxy or SDK layer | Partial — depends on tool-native sandbox support; see per-tool declarations below |
| Network-egress block | No | No | Yes — outbound HTTPS intercepted by aa-proxy; hosts not in allowlist receive 403 | Partial — some tools support native network restrictions in their config; see per-tool declarations below |
| Sub-agent governance | No | Yes — spawned agents are registered and appear in the topology tree | Yes — child agents inherit parent’s policy scope; budget shared | Yes — spawned agents are registered with governing tool’s team ID at the native config level |
Per-tool tier declarations
Codex
Adapter: AAASM-202 (Done) · Mechanism: sandbox policy sync + approval alignment + wrapper integration
| Capability | Tier | Notes |
|---|---|---|
| Audit log capture | L2 | Wrapper intercepts Codex API calls; audit events emitted for every tool invocation |
| Policy decision visibility | L2 | Policy evaluated per call; decisions surfaced via aasm topology and dashboard |
| MCP server allowlist | L3 | AAASM writes the Codex sandbox allowed_mcp_servers list at startup and on policy change |
| Terminal-exec block | L3 | Codex sandbox natively restricts exec; AAASM syncs the allowed-commands list from policy |
| File-write block | L3 | Codex sandbox file restrictions synced from AAASM policy (allowed_paths, denied_paths) |
| Network-egress block | L2 | Proxy layer intercepts outbound HTTPS; Codex sandbox network restrictions also synced (belt-and-suspenders) |
| Sub-agent governance | L2 | Sub-processes spawned by Codex register with AAASM via wrapper; inherit parent team policy |
Honest boundaries for Codex:
- If the user invokes Codex with --no-sandbox, all L3 enforcement is bypassed. AAASM detects this at L1 (audit event) but cannot enforce.
- Codex sandbox restrictions apply to the Codex subprocess only; they do not restrict processes Codex spawns via subprocess.run() unless the sandbox’s exec allowlist is set correctly.
- Approval-queue flows require the AAASM gateway to be reachable; offline mode defaults to the policy’s offline_action (allow or deny).
GitHub Copilot
Adapter: AAASM-203 (Done) · Mechanism: VS Code settings alignment + MCP governance
| Capability | Tier | Notes |
|---|---|---|
| Audit log capture | L1 | VS Code extension telemetry hooks emit audit events for Copilot chat messages and inline suggestions |
| Policy decision visibility | L1 | Policy decisions are visible in dashboard; enforcement is observability-only at this tier |
| MCP server allowlist | L3 | AAASM writes github.copilot.chat.mcp.enabled and the allowed MCP server list to VS Code settings.json via the settings sync adapter |
| Terminal-exec block | L0 | VS Code’s extension API does not expose a hook to block terminal commands initiated by Copilot. Blocking requires proxy layer (Layer 2) running alongside. |
| File-write block | L0 | VS Code extension API provides no file-write veto for inline edits. Observable via audit but not blockable at the extension level. |
| Network-egress block | L1 | Proxy layer can intercept outbound HTTPS from the VS Code process; no native Copilot setting restricts outbound hosts. |
| Sub-agent governance | L0 | Copilot does not expose a sub-agent spawning API that AAASM can intercept at the extension level. |
Honest boundaries for GitHub Copilot:
- Terminal-exec and file-write enforcement require aa-proxy (Layer 2) running as a system-level MitM. The VS Code extension adapter alone cannot provide L2+ enforcement for these capabilities.
- VS Code settings sync writes settings.json at the workspace level; a user can override at the user-settings level. Enterprise-grade enforcement requires VS Code managed device policies (outside AAASM scope).
- Network-egress block via proxy does not cover VS Code’s built-in Copilot HTTPS calls unless the proxy CA is trusted by the VS Code process.
Windsurf Cascade
Adapter: AAASM-204 (Done) · Mechanism: admin settings sync + MCP registry control
| Capability | Tier | Notes |
|---|---|---|
| Audit log capture | L1 | Windsurf telemetry hooks emit audit events for Cascade tool calls and agent spawning |
| Policy decision visibility | L1 | Policy evaluated and results visible; enforcement passive at this tier |
| MCP server allowlist | L3 | AAASM writes the Windsurf MCP registry (~/.codeium/windsurf/mcp_registry.json) via admin settings sync; unlisted servers are not loaded at Windsurf startup |
| Terminal-exec block | L1 | Cascade terminal actions are observable; no Windsurf-native exec block API exists. L2 blocking requires proxy layer. |
| File-write block | L1 | File edits are observable in audit log; no Windsurf-native veto API. L2 blocking requires proxy layer. |
| Network-egress block | L1 | Outbound HTTPS interceptable by proxy layer; no Windsurf-native network restriction config. |
| Sub-agent governance | L1 | Windsurf Cascade multi-agent flows are observable; child agents appear in topology but do not inherit policy scope automatically without the SDK. |
Honest boundaries for Windsurf Cascade:
- Windsurf does not expose a sandbox mode. L2 enforcement for exec and file operations requires aa-proxy running at the system level.
- Admin settings sync requires Windsurf’s config directory to be writable by the AAASM process. In multi-user environments, this requires elevated permissions or a per-user deployment.
- MCP registry control only governs MCP servers loaded by Windsurf at startup. A user can manually add servers to a workspace-level config that overrides the registry.
Claude Code
Adapter: AAASM-201 (Pending, in backlog) · Placeholder: do not rely on these declarations until AAASM-201 is merged
| Capability | Tier | Notes |
|---|---|---|
| Audit log capture | TBD | — |
| Policy decision visibility | TBD | — |
| MCP server allowlist | TBD | — |
| Terminal-exec block | TBD | — |
| File-write block | TBD | — |
| Network-egress block | TBD | — |
| Sub-agent governance | TBD | — |
SaaS Coding-Agent (Claude.ai / ChatGPT / Codex-web)
Adapter: AAASM-918 — Pending (in backlog) · Placeholder — tier declarations incomplete
| Capability | Tier | Notes |
|---|---|---|
| Audit log capture | L1 | SaaS agents emit L0–L1 events via the observability adapter (browser extension or API-level hook); execution is remote and not fully inspectable |
| Policy decision visibility | L1 | Policy decisions are visible but enforcement is not possible at the cloud execution layer |
| MCP server allowlist | L0 | Cloud-hosted tools do not expose an MCP allowlist config that AAASM can control |
| Terminal-exec block | L0 | Remote execution; no AAASM enforcement path |
| File-write block | L0 | Remote execution; no AAASM enforcement path |
| Network-egress block | L0 | Remote execution; egress is controlled by the SaaS provider, not AAASM |
| Sub-agent governance | L0 | SaaS multi-agent orchestration is opaque; AAASM cannot intercept spawn events |
Honest boundaries for SaaS coding-agents:
- SaaS-hosted tools execute remotely. AAASM’s enforcement capabilities (L2–L3) apply only to locally-running processes. This is a fundamental architectural limit, not a product gap.
- L1 observability is available only if the user installs the observability adapter (browser extension or API hook). Without it, even L1 is not available.
- These tools are out-of-scope for any enforcement stronger than L1 for v0.0.1.
Summary table
| Tool | Audit | Policy Vis. | MCP Allowlist | Exec Block | File Block | Net Block | Sub-agent |
|---|---|---|---|---|---|---|---|
| Codex | L2 | L2 | L3 | L3 | L3 | L2 | L2 |
| GitHub Copilot | L1 | L1 | L3 | L0† | L0† | L1 | L0 |
| Windsurf Cascade | L1 | L1 | L3 | L1† | L1† | L1 | L1 |
| Claude Code | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| SaaS Coding-Agent | L1 | L1 | L0 | L0 | L0 | L0 | L0 |
† These capabilities require aa-proxy (Layer 2) running alongside the tool for enforcement.
Without the proxy, the declared tier drops to L0 (discovery/inventory only).
Relationship to the three interception layers
The dev-tool adapter tier system is separate from but complementary to AAASM’s three interception layers (SDK / proxy / eBPF). The layers provide runtime enforcement regardless of which tool is active; the adapter tiers describe what each specific tool’s native API exposes:
| Layer | What it governs | Interaction with adapter tiers |
|---|---|---|
| Layer 1 — SDK shim (aa-ffi-*) | Agents that use the AAASM SDK explicitly | Provides L2 enforcement for SDK-aware tools independent of adapter tier |
| Layer 2 — aa-proxy | All outbound HTTPS from the machine | Provides L2 network/exec enforcement for any tool; fills gaps where adapter tier is L0 for exec/file/net |
| Layer 3 — aa-ebpf (Linux only) | SSL uprobes + exec/file syscalls at kernel level | Provides L1 detection + alerting for any tool; cannot modify traffic in flight (no redaction at this layer) |
In practice, for tools where the adapter tier is L0 or L1 for exec/file/network enforcement, deploying
aa-proxy alongside the tool upgrades effective enforcement to L2 for those dimensions without
requiring a new adapter.
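The proxy upgrade rule above can be sketched as a small function. This is a minimal illustration, assuming the tiers order as L0 < L1 < L2 < L3; the variant names and the `effective_level` helper are hypothetical and not taken from the `GovernanceLevel` enum in the codebase.

```rust
/// Governance tiers as used in the tables above (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum GovernanceLevel {
    L0,
    L1,
    L2,
    L3,
}

/// Effective tier for an exec/file/network dimension.
/// With aa-proxy deployed, anything below L2 is lifted to L2;
/// a tier that is already L2 or L3 is left as declared.
pub fn effective_level(adapter: GovernanceLevel, proxy_deployed: bool) -> GovernanceLevel {
    if proxy_deployed {
        adapter.max(GovernanceLevel::L2)
    } else {
        adapter
    }
}
```

Under these assumptions, a tool declared at L0 for exec blocking reports L2 once the proxy runs alongside it, while a tool declared at L3 keeps L3.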
References
- AAASM-199 — Agent Assembly SDK interception overview (DevToolAdapter trait + GovernanceLevel enum)
- AAASM-201 — Claude Code adapter (pending; will update Claude Code row above)
- AAASM-202 — Codex adapter
- AAASM-203 — GitHub Copilot adapter
- AAASM-204 — Windsurf Cascade adapter
- AAASM-206 — Governance level (L0–L3) classification in policy schema (governance_level field in AgentRecord and policy conditions)
- AAASM-918 — SaaS coding-agent adapter (pending; will finalize SaaS row above)
- docs/src/architecture/system-architecture.md — Three-layer interception model
- docs/src/policy-rbac.md — RBAC role matrix for policy mutations
Last updated: 2026-06-11 by Chisanan232
Policy RBAC Role Matrix
Auto-generated from the PolicyMutationRequiredRole table in aa-gateway/src/policy/rbac.rs. Do not edit by hand — run cargo run -p aa-api --bin generate_policy_rbac_doc to regenerate.
The 5 canonical RBAC roles in privilege order (highest → lowest):
OrgAdmin > TeamAdmin > Developer > Viewer > Auditor
Auditor may never mutate policies — all write attempts are denied.
| Scope | create | update | delete |
|---|---|---|---|
| global | org_admin | org_admin | org_admin |
| org | org_admin | org_admin | org_admin |
| team | team_admin | team_admin | team_admin |
| agent | developer | developer | developer |
| tool | developer | developer | developer |
Role Descriptions
- org_admin — Full policy mutation rights across all scopes.
- team_admin — Can mutate team-scoped policies and below (Agent, Tool).
- developer — Can mutate agent- and tool-scoped policies only.
- viewer — Read-only access — no writes permitted.
- auditor — Read-only audit access — all write attempts denied regardless of scope.
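The matrix and role descriptions above reduce to a single comparison: a role may mutate a scope when it ranks at or above the role that scope requires. The sketch below illustrates that reading. The `Role`, `Scope`, `required_role`, and `may_mutate` names are hypothetical and are not the contents of aa-gateway/src/policy/rbac.rs.

```rust
/// Roles declared lowest to highest so the derived `Ord` matches the
/// privilege order OrgAdmin > TeamAdmin > Developer > Viewer > Auditor.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Auditor,
    Viewer,
    Developer,
    TeamAdmin,
    OrgAdmin,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    Global,
    Org,
    Team,
    Agent,
    Tool,
}

/// Minimum role for create, update, and delete (the matrix uses the
/// same role for all three mutations within a scope).
pub fn required_role(scope: Scope) -> Role {
    match scope {
        Scope::Global | Scope::Org => Role::OrgAdmin,
        Scope::Team => Role::TeamAdmin,
        Scope::Agent | Scope::Tool => Role::Developer,
    }
}

/// True when `role` may mutate a policy at `scope`.
/// Viewer and Auditor rank below every required role, so they never pass.
pub fn may_mutate(role: Role, scope: Scope) -> bool {
    role >= required_role(scope)
}
```

Because the lowest required role is Developer, the read-only roles need no special case in this sketch.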
Last updated: 2026-05-08 by Chisanan232
Protocol Specification Changelog
Scope: This changelog covers the Agent Assembly protocol specification only — proto message schemas, JSON schema, IPC framing contract, and SDK protocol conformance requirements. For runtime/crate release notes, see the project CHANGELOG when it exists.
All notable changes to the protocol specification are documented here. Format follows Keep a Changelog. Protocol versioning follows the policy in docs/versioning.md.
[v0.0.1] — 2026-04-28
Initial release of the Agent Assembly protocol specification.
Added
Services
- AgentLifecycleService (proto/agent.proto) — RPC surface for agent registration, heartbeat, deregistration, and runtime control stream
- PolicyService (proto/policy.proto) — synchronous policy check RPC for intercepting agent actions before execution
- AuditService (proto/audit.proto) — event reporting and streaming RPC for immutable audit log ingestion
Agent lifecycle messages (proto/agent.proto)
- RegisterRequest — agent startup registration carrying identity, framework, tool list, risk tier, public key, and arbitrary metadata
- RegisterResponse — gateway issues credential token, assigns policy, sets heartbeat interval
- HeartbeatRequest — periodic liveness signal carrying active run count and cumulative action count
- HeartbeatResponse — gateway signals policy update and/or suspend request to agent
- DeregisterRequest — clean or forced agent shutdown with optional reason string
- DeregisterResponse — gateway confirms deregistration success and echoes agent identity
- ControlStreamRequest — opens persistent server-streaming channel for runtime control
- ControlCommand — oneof wrapper dispatching to one of four command variants:
  - SuspendCommand — instructs agent to pause execution
  - ResumeCommand — instructs agent to resume execution
  - PolicyUpdateCommand — delivers updated policy document inline
  - KillCommand — instructs agent to terminate with optional reason
Policy messages (proto/policy.proto)
- CheckActionRequest — policy check request carrying agent identity, credential token, trace/span IDs, action type, and action-specific context
- CheckActionResponse — policy decision carrying Decision enum, reason, policy rule reference, optional approval ID, optional redact instructions, and decision latency
- ActionContext — oneof wrapper for the five action context subtypes:
  - LLMCallContext — model name, prompt token count, and sampled prompt prefix
  - ToolCallContext — tool name, source (mcp/builtin), JSON args, and target URL
  - FileOpContext — operation type, file path, and byte count
  - NetworkCallContext — method, URL, and header names
  - ProcessExecContext — executable path and argument list
- RedactInstructions — container for one or more redaction rules
- RedactRule — field path (JSONPath) and replacement string for a single redaction
- BatchCheckRequest — wraps multiple CheckActionRequest items for bulk evaluation
- BatchCheckResponse — wraps corresponding CheckActionResponse items
Event messages (proto/event.proto)
- EnvelopedEvent — typed event envelope with agent identity, timestamp, sequence number, and oneof payload for the five event subtypes
- AlertTriggered — credential or policy violation alert with severity and matched pattern
- ApprovalRequested — human-in-the-loop approval request with timeout and context summary
- AgentStatusChanged — agent lifecycle state transition notification
- BudgetThresholdHit — token or cost budget threshold breach notification
- ApprovalDecision — outcome of a previously requested approval
Audit messages (proto/audit.proto)
- AuditEvent — immutable audit record with agent identity, timestamp, sequence number, SHA-256 hash chain field, and oneof payload for five detail subtypes:
  - LLMCallDetail — model, token counts, finish reason
  - ToolCallDetail — tool name, source, args hash, result hash
  - FileOpDetail — operation, path, byte count, hash
  - NetworkCallDetail — method, URL, status code, response byte count
  - ProcessExecDetail — executable, args hash, exit code
- PolicyViolation — policy rule reference, decision, and triggering action summary
- ApprovalEvent — approval request and decision pair linked by approval ID
- ReportEventsRequest / ReportEventsResponse — unary bulk event submission
- StreamEventsResponse — server acknowledgement for the streaming submission RPC
Common types (proto/common.proto)
- AgentId — composite agent identity: org_id, team_id, agent_id (DID string)
- Timestamp — millisecond-precision Unix timestamp (unix_ms int64)
- Decision enum — ALLOW, DENY, PENDING, REDACT
- ActionType enum — LLM_CALL, TOOL_CALL, FILE_OPERATION, NETWORK_CALL, PROCESS_EXEC, AGENT_SPAWN
- RiskTier enum — LOW, MEDIUM, HIGH, CRITICAL
JSON Schema
- schemas/policy/v1/policy-document.schema.json — PolicyDocument JSON Schema v1, defining the structure of policy rules evaluated by PolicyService
- Example policy documents: schemas/examples/strict.yaml, balanced.yaml, audit-only.yaml
IPC framing contract
- Transport: Unix domain socket (/var/run/aa-runtime.sock by default)
- Framing: prost varint length-delimited encoding — each frame is a varint-encoded byte length followed by the raw proto bytes
- Reference: prost::Message::encode_length_delimited / prost::Message::decode_length_delimited
- Conformance vectors: conformance/vectors/ipc_framing/ (10 vectors)
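To make the framing rule concrete, here is a dependency-free sketch of varint length-delimited framing. It illustrates the wire layout only (base-128 varint length, then the payload) and stands in for the prost helpers rather than reproducing them; `encode_frame` and `decode_frame` are hypothetical names.

```rust
/// Encode one frame: varint(length) followed by the payload bytes.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 10);
    let mut len = payload.len() as u64;
    loop {
        let byte = (len & 0x7f) as u8; // low 7 bits of the remaining length
        len >>= 7;
        if len == 0 {
            out.push(byte); // last varint byte: continuation bit clear
            break;
        }
        out.push(byte | 0x80); // more bytes follow: continuation bit set
    }
    out.extend_from_slice(payload);
    out
}

/// Decode one frame from the front of `buf`.
/// Returns the payload slice and the total bytes consumed, or `None`
/// when the buffer is incomplete or the varint exceeds 10 bytes.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    let mut len: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        len |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            let start = i + 1;
            if len > (buf.len() - start) as u64 {
                return None; // payload not fully received yet
            }
            let end = start + len as usize;
            return Some((&buf[start..end], end));
        }
    }
    None
}
```

A reader on the socket would call `decode_frame` on its receive buffer, hand the payload to the proto decoder, and discard the consumed bytes; a `None` result means it should wait for more data.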
Tagging runbook
Run the following commands only when AAASM-12 (Protocol Specification epic) is fully
closed and all protocol tickets have been merged into master:
# Create annotated tag for the initial spec release
git tag -a spec/v0.0.1 -m "Protocol Specification v0.0.1 — initial release"
# Push the tag to the upstream remote
git push origin spec/v0.0.1
Tag namespace convention: spec/<version> — coexists with future runtime/<version>,
sdk/<version> tags in the same monorepo without ambiguity.
Last updated: 2026-05-04 by Chisanan232
Migration Guide — [FILL IN: brief title, e.g. “AgentId.agent_id renamed to AgentId.id”]
Template instructions: Copy this file to docs/migration/<vX.Y-to-vZ.0>.md, fill in every [FILL IN] section, and delete these instruction lines. See the completed worked example in docs/versioning.md for a reference of what a finished guide looks like.
Breaking change introduced in: protocol/v[FILL IN]
Deprecated since: protocol/v[FILL IN] (omit if not previously deprecated)
Affected SDK versions: [FILL IN: e.g. “All SDKs using MessageName.field_name”]
Estimated migration effort: [FILL IN: Low / Medium / High]
Low — mechanical find-and-replace, no logic change. Medium — logic changes in a small number of call sites. High — widespread changes or dependent schema updates required.
What changed
[FILL IN: One or two paragraphs describing what was removed, renamed, or altered and why. Include the field number, message name, and proto file. Explain the motivation briefly — e.g. naming consistency, type safety, protocol simplification.]
Before (protocol/v[FILL IN].x)
Proto encoding:
[FILL IN: show the relevant message with the old field]
MessageName {
field_name: "example-value" // field N — old name/type
}
Python SDK:
[FILL IN: show the old API call]
obj = MessageName(field_name="example-value")
Node.js SDK:
[FILL IN: show the old API call]
const obj = new MessageName({ fieldName: 'example-value' });
Go SDK:
[FILL IN: show the old API call]
obj := &pb.MessageName{FieldName: "example-value"}
After (protocol/v[FILL IN].0+)
Proto encoding:
[FILL IN: show the relevant message with the new field]
MessageName {
new_field_name: "example-value" // field M — new name/type
}
Python SDK:
[FILL IN: show the new API call]
obj = MessageName(new_field_name="example-value")
Node.js SDK:
[FILL IN: show the new API call]
const obj = new MessageName({ newFieldName: 'example-value' });
Go SDK:
[FILL IN: show the new API call]
obj := &pb.MessageName{NewFieldName: "example-value"}
Migration steps
- [FILL IN: First step — e.g. “Search your codebase for all usages of MessageName.field_name.”]
- [FILL IN: Second step — e.g. “Replace each with MessageName.new_field_name.”]
- [FILL IN: Third step — e.g. “Run the conformance test suite to verify.”]
- [FILL IN: Deployment order step if relevant — e.g. “Deploy the updated SDK before upgrading aa-runtime past vN.x (runtime vN.x still supports protocol/v(N-1)).”]
Verification
Run the conformance suite against a runtime at protocol/v[FILL IN]:
[FILL IN: exact command, e.g.]
cargo test -p conformance
python conformance/runner/runner.py --verbose
Expected: all vectors pass with no failures referencing [FILL IN: old field name].
See also
- docs/versioning.md — change classification rules and deprecation lifecycle
- docs/protocol/CHANGELOG.md — full protocol changelog
- conformance/vectors/ — test vectors for the affected message category
Last updated: 2026-06-06 by Bryant
Event: topology.cross_team_edge
Published by aa-gateway whenever an edge is inserted between two agents
that belong to different teams. Both agents must have a non-NULL team_id
in the agent registry; if either is missing the event is suppressed and an
info-level log line is emitted instead.
Transport
Internal Tokio broadcast channel (tokio::sync::broadcast::Sender<CrossTeamEdgeEvent>).
Channel capacity: 64. Slow consumers receive RecvError::Lagged(n) when they
fall behind.
Subscribers call InMemoryEdgeRepo::subscribe_cross_team_events().
Payload
Rust type: aa_gateway::edges::CrossTeamEdgeEvent
| Field | Type | Description |
|---|---|---|
| edge_id | i64 | Auto-assigned id of the inserted edge |
| source_agent_id | AgentId ([u8; 16]) | Agent that originated the relationship |
| source_team_id | String | Team the source agent belongs to |
| target_agent_id | AgentId ([u8; 16]) | Agent that was the target |
| target_team_id | String | Team the target agent belongs to |
| edge_type | EdgeType | Semantic type: one of delegates_to, calls, reads, writes, approves, messages |
| occurred_at | DateTime<Utc> | UTC timestamp when the edge was recorded |
Example (JSON-serialised for illustration)
{
  "edge_id": 42,
  "source_agent_id": "01010101010101010101010101010101",
  "source_team_id": "team-alpha",
  "target_agent_id": "02020202020202020202020202020202",
  "target_team_id": "team-beta",
  "edge_type": "messages",
  "occurred_at": "2026-05-10T04:00:00Z"
}
Publishing conditions
| Scenario | Action |
|---|---|
| source.team_id != target.team_id (both set) | Publish CrossTeamEdgeEvent |
| Either team_id is NULL | Log at INFO; no event |
| source.team_id == target.team_id | No event |
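The three rows above amount to one match over the two optional team ids. The sketch below shows that decision in isolation; `EdgeOutcome` and `classify_edge` are hypothetical names, and a NULL team_id is modelled as `None`.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EdgeOutcome {
    /// Both teams are set and differ: publish a CrossTeamEdgeEvent.
    Publish,
    /// At least one team_id is missing: log at INFO, publish nothing.
    LogInfo,
    /// Same team on both sides: nothing to do.
    NoEvent,
}

/// Decide what happens when an edge is inserted between two agents.
pub fn classify_edge(source_team: Option<&str>, target_team: Option<&str>) -> EdgeOutcome {
    match (source_team, target_team) {
        (Some(s), Some(t)) if s != t => EdgeOutcome::Publish,
        (Some(_), Some(_)) => EdgeOutcome::NoEvent,
        _ => EdgeOutcome::LogInfo,
    }
}
```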
Consumer notes (AAASM-198)
- Subscribe before inserting edges: a broadcast receiver only sees events sent after it subscribes, so edges inserted earlier are never delivered to it.
- The broadcast channel drops the oldest events for receivers that fall more than 64 messages behind (the receiver then gets RecvError::Lagged(n)); design consumers to process promptly or buffer independently.
- edge_id can be used to fetch full edge metadata via GET /api/v1/agents/{id}/edges.
Last updated: 2026-05-10 by Chisanan232
In-Flight Ops Registry — Architecture
Status: Active design — PR-A landed (AAASM-1422).
Scope: Gateway-side tracking of agent operations from CheckActionRequest ingestion through to terminal Completing / Terminated states, the IPC protocol that lets the dashboard observe and control those operations, and the SDK return-channel that propagates control signals back to running agents.
1 — Why this exists
The original audit pipeline records what already happened (AuditEvent is
post-facto and immutable). The Live Ops dashboard (AAASM-1326, AAASM-1334)
needs a live view of operations currently in flight: which agents are
running right now, which are paused, which were just terminated. None of that
existed before AAASM-1525 / AAASM-1422.
AAASM-1415 shipped the POST /api/v1/ops/{id}/{pause,resume,terminate} route
shells as stubs that return 202 + log so the dashboard’s row-action menu
could be wired without 404-ing. AAASM-1525 added the OpsRegistry skeleton in
aa-api with a 3-state machine (Running / Paused / Terminated) and a
client-driven POST /api/v1/ops registration endpoint. AAASM-1422 closes
the remaining gap: gateway-side ingestion from the policy-check path, a
5-state model that distinguishes pre-allow from post-completion, and a sub-task
plan for the IPC protocol and SDK enforcement.
2 — Decisions recorded for this iteration
| Decision | Choice | Why |
|---|---|---|
| Op identifier (AC #2 of AAASM-1422) | op_id = "{trace_id}:{span_id}" | Already in CheckActionRequest; distributed-tracing-native; lets the dashboard re-match same-id OpStateChanged WebSocket events without a new id allocator. No protobuf changes required for PR-A. |
| Crate home | aa-gateway::ops, re-exported via aa_api::ops | Mirrors BudgetTracker, AgentRegistry, PolicyEngine. PolicyServiceImpl (in aa-gateway) can ingest without a reverse-crate dep into aa-api. |
| State model | 5 states: Pending, Running, Paused, Completing, Terminated | Distinguishes “policy allow not yet decided” (Pending) and “action finished, draining” (Completing) from the active middle states. Aligns with AAASM-1422 description. |
| Storage primitive | DashMap<String, OpRecord> | Lock-free concurrent reads, shard-level write locks. Identical to BudgetTracker.per_agent. |
| Ingestion entry point | OpsRegistry::ingest(op_id) -> OpRecord keyed by {trace_id}:{span_id}, idempotent | Called from PolicyServiceImpl::check_action before policy evaluation so the op appears in Pending state even if the policy decision takes time. |
| Allow transition | OpsRegistry::allow(op_id): Pending → Running | Called from PolicyServiceImpl::check_action after an Allow decision. |
| Complete transition | OpsRegistry::complete(op_id): Running → Completing | Drained-out terminal state; entries stay readable briefly so the dashboard can render the completion before they’re swept. |
| Sweep policy | Background tokio task on the registry drops Completing + Terminated entries older than 60 s. Tick every 10 s. Configurable via spawn_sweep_task_with(registry, tick, ttl_seconds). (AAASM-1657 PR-H) | Bounds registry memory while giving the dashboard ~10 s of grace to render the terminal state before it disappears. |
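The sweep rule in the last row can be sketched without tokio as a pure function over the map. This is an illustration under stated assumptions: entry age is measured from the last transition, timestamps are plain seconds, and `Entry` and `sweep` are hypothetical names rather than the registry's actual API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpState {
    Pending,
    Running,
    Paused,
    Completing,
    Terminated,
}

pub struct Entry {
    pub state: OpState,
    /// Seconds since some fixed epoch at which the op last changed state.
    pub updated_at_secs: u64,
}

/// Drop Completing and Terminated entries whose last transition is older
/// than `ttl_secs`. Returns how many entries were removed.
pub fn sweep(entries: &mut HashMap<String, Entry>, now_secs: u64, ttl_secs: u64) -> usize {
    let before = entries.len();
    entries.retain(|_, e| {
        let terminal = matches!(e.state, OpState::Completing | OpState::Terminated);
        let age = now_secs.saturating_sub(e.updated_at_secs);
        !(terminal && age > ttl_secs)
    });
    before - entries.len()
}
```

A background task would call something like this on every tick; active entries (Pending, Running, Paused) are never swept regardless of age.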
3 — Data model
// aa-gateway/src/ops/mod.rs
pub enum OpState {
    Pending,    // ingested, awaiting policy decision
    Running,    // policy allowed; agent is actively executing
    Paused,     // operator paused via POST /api/v1/ops/{id}/pause
    Completing, // action signalled complete, draining
    Terminated, // operator terminated, or policy denied
}

pub struct OpRecord {
    pub op_id: String,         // "{trace_id}:{span_id}"
    pub state: OpState,
    pub registered_at: String, // RFC 3339: first time the op id was seen
    pub updated_at: String,    // RFC 3339: most recent transition
}

pub enum OpsError {
    NotFound,
    InvalidTransition,
}

pub struct OpsRegistry { /* DashMap<String, OpRecord> */ }
4 — State machine
stateDiagram-v2
[*] --> Pending: ingest()
Pending --> Running: allow()
Pending --> Terminated: deny() / terminate()
Running --> Paused: pause()
Paused --> Running: resume()
Running --> Completing: complete()
Running --> Terminated: terminate()
Paused --> Terminated: terminate()
Completing --> [*]: (sweep — PR-H)
Terminated --> [*]: (sweep — PR-H)
Transition rules:
| From → To | Method | Notes |
|---|---|---|
| (none) → Pending | ingest(op_id) | Idempotent re-call returns the existing record unchanged. |
| Pending → Running | allow(op_id) | Called from policy-engine Allow path. |
| Pending → Terminated | terminate(op_id) | Policy Deny path may take this directly (PR-H). |
| Running → Paused | pause(op_id) | Operator action via HTTP. |
| Paused → Running | resume(op_id) | Operator action via HTTP. |
| Running → Completing | complete(op_id) | Called by SDK when the agent finishes the action (PR-E/F/G). |
| any non-terminal → Terminated | terminate(op_id) | Operator force-termination. |
| any other pair | (invalid) | Returns OpsError::InvalidTransition. |
The registry remains idempotent on terminal states: calling terminate on
an already-Terminated op returns the existing record without erroring.
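The transition table can be expressed as one pure function, which is a convenient shape for unit-testing the rules. The sketch below is illustrative only: the `Transition` enum and `next_state` are hypothetical, and it assumes that terminate on a Completing op counts as an invalid transition because Completing is treated as terminal.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpState {
    Pending,
    Running,
    Paused,
    Completing,
    Terminated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transition {
    Allow,
    Pause,
    Resume,
    Complete,
    Terminate,
}

#[derive(Debug, PartialEq, Eq)]
pub enum OpsError {
    NotFound,
    InvalidTransition,
}

/// Apply one transition to a state, following the table above.
pub fn next_state(from: OpState, via: Transition) -> Result<OpState, OpsError> {
    use OpState::*;
    use Transition::*;
    match (from, via) {
        (Pending, Allow) => Ok(Running),
        (Running, Pause) => Ok(Paused),
        (Paused, Resume) => Ok(Running),
        (Running, Complete) => Ok(Completing),
        // Any non-terminal state can be force-terminated.
        (Pending | Running | Paused, Terminate) => Ok(Terminated),
        // Idempotent: terminating an already-terminated op is not an error.
        (Terminated, Terminate) => Ok(Terminated),
        _ => Err(OpsError::InvalidTransition),
    }
}
```

A registry method such as `pause(op_id)` would look up the record (returning `NotFound` if absent), call a function like this, and store the new state on success.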
5 — Ingestion path
agent ──gRPC──▶ PolicyServiceImpl::check_action(req)
│
├─▶ ops_registry.ingest("{trace_id}:{span_id}")
│ // entry created in `Pending`
│
├─▶ engine.evaluate(req) ─▶ EvaluationResult
│
├─▶ if Allow:
│ ops_registry.allow(op_id) // Pending → Running
│ if Deny:
│ ops_registry.terminate(op_id) // Pending → Terminated (PR-H)
│
└─▶ Response { decision, reason, ... }
This means: by the time the SDK receives the CheckActionResponse, the
gateway-side registry has the op recorded and the dashboard sees it in the
correct state via the WebSocket stream (PR-B).
PR-A ships the ingest() + allow() call sites. The terminate() on Deny
is deferred to PR-H so PR-A keeps a small surface area.
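The PR-A ingestion flow can be approximated with a plain `HashMap` standing in for the `DashMap`-backed registry. Everything here is a simplified sketch: `Registry`, `op_id`, and `check_action` are hypothetical stand-ins, the policy engine is reduced to a closure, and only the two states PR-A touches are modelled.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpState {
    Pending,
    Running,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

#[derive(Default)]
pub struct Registry {
    ops: HashMap<String, OpState>,
}

impl Registry {
    /// Idempotent: a repeated call returns the existing state unchanged.
    pub fn ingest(&mut self, op_id: &str) -> OpState {
        *self.ops.entry(op_id.to_string()).or_insert(OpState::Pending)
    }

    /// Pending -> Running. Returns false if the op is missing or not Pending.
    pub fn allow(&mut self, op_id: &str) -> bool {
        match self.ops.get_mut(op_id) {
            Some(state) if *state == OpState::Pending => {
                *state = OpState::Running;
                true
            }
            _ => false,
        }
    }

    pub fn state(&self, op_id: &str) -> Option<OpState> {
        self.ops.get(op_id).copied()
    }
}

/// Build the op identifier from the tracing ids on the request.
pub fn op_id(trace_id: &str, span_id: &str) -> String {
    format!("{}:{}", trace_id, span_id)
}

/// Ingest first, evaluate second, transition on Allow.
pub fn check_action(
    reg: &mut Registry,
    trace_id: &str,
    span_id: &str,
    evaluate: impl Fn() -> Decision,
) -> Decision {
    let id = op_id(trace_id, span_id);
    reg.ingest(&id); // visible as Pending while the policy is evaluated
    let decision = evaluate();
    if decision == Decision::Allow {
        reg.allow(&id);
    }
    decision // on Deny the op stays Pending here, mirroring the PR-A scope
}
```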
6 — IPC sketch (PR-D)
Today the gateway → SDK channel is request/response only (CheckActionRequest
→ CheckActionResponse). For real pause / terminate enforcement, the SDK
must learn about state changes while the action is in flight.
Two viable shapes:
- Server-streaming OpControlStream — SDK opens a long-lived stream on register_agent. Gateway pushes {op_id, signal: pause|resume|terminate} messages. SDK acknowledges via a separate unary RPC. (Recommended in PR-D.)
- Bidirectional OperationChannel — replace per-action CheckAction with a single bidi stream. Heavier protocol churn; deferred.
The SDK then cooperatively yields on pause, resumes on resume, and
fast-fails on terminate. Each SDK (Python / Node / Go) ships its own
enforcement layer in PR-E / PR-F / PR-G.
7 — Dashboard correlation (PR-C)
Today the dashboard’s useLiveOpsStream hook builds an in-memory map keyed by
GovernanceEvent.id (monotonic, unique per event). Two events for the same
op therefore can’t be correlated — the override-clear logic in LiveOpsPage
never sees its target id again.
After PR-B/PR-C, the WebSocket emits a new OpStateChanged payload variant:
{
  "event_type": "ops_change",
  "agent_id": "agent-7",
  "payload": {
    "op_id": "trace-abc:span-1", // stable across the op's lifetime
    "state": "running", // OpState serialized snake_case
    "updated_at": "2026-05-20T09:32:20.822Z"
  }
}
The dashboard then keys its map by payload.op_id. The override-clear logic
matches on the same key, so a pause followed by the server’s confirming
paused event auto-clears the optimistic state without manual intervention.
8 — Sub-task plan
| Sub-task | Scope | Touches |
|---|---|---|
| PR-B | aa-proto + aa-api OpStateChanged event type & payload schema | proto/, aa-api/src/models/, OpenAPI |
| PR-C | Dashboard id-model rework — useLiveOpsStream correlates by op_id, override auto-clear | dashboard/src/ |
| PR-D | Gateway → SDK bidirectional return-channel: proto OpControlStream + aa-proto regen | proto/, SDK shims |
| PR-E | python-sdk cooperative pause + fast-fail terminate at shim layer | python-sdk repo |
| PR-F | node-sdk equivalent | node-sdk repo |
| PR-G | go-sdk equivalent | go-sdk repo |
| PR-H | Replace AAASM-1415 stub handlers with registry-backed transitions; emit OpStateChanged on each transition; add Pending → Terminated on policy Deny; add sweep policy | aa-api/src/routes/ops.rs, aa-gateway/src/service/policy_service.rs |
9 — Out of scope for this Task (AAASM-1422)
- Persistence across gateway restarts (registry is in-memory; restart re-empties it and the dashboard reconciles via the existing WS reconnect).
- Multi-gateway cluster coordination (sharded by agent_id-affinity in a later release; not on the roadmap for v0.0.1).
- Cross-team aggregation views beyond what the existing Live Ops page surfaces.
10 — References
- AAASM-1422 — this Task
- AAASM-1415 — stub /ops/{id}/{pause,resume,terminate} endpoints
- AAASM-1525 — OpsRegistry skeleton with 3-state machine
- AAASM-1326 / AAASM-1334 — Live Ops dashboard design + row actions
Last updated: 2026-05-21 by Chisanan232
Sandbox / Dry-Run Mode
Run any policy in observe-only mode for a few days before flipping the switch to live enforcement.
Sandbox mode is the governance analogue of a database transaction ROLLBACK: the gateway evaluates every rule, records every would-be decision in the audit log, and applies none of them. The agent proceeds as if no policy were in effect. Once you’ve reviewed the would-be violations and tuned the policy, you cut over to live enforce mode with a one-line change.
The feature is part of the open-source core — not an enterprise add-on.
How it works
Sandbox mode is an enforcement posture, not a separate runtime. It only changes what the gateway does after a policy decision is computed:
| Decision | Enforce mode (default) | Observe / dry-run mode |
|---|---|---|
| Allow | Action proceeds | Action proceeds (identical) |
| Deny | Action blocked; error returned | Action proceeds; dry_run: true shadow event written to the audit log |
| Redact | Payload sanitised | Unredacted payload forwarded; shadow event written |
| RequiresApproval | Action halts pending review | Action proceeds; shadow event written |
Every shadow event carries the full decision context: the decision the matched rule would have produced (shadow_decision), the rejection reason that would have been returned (shadow_reason), and a dry_run: true flag the audit consumer can filter on.
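The posture table can be read as a function from (decision, mode) to an effect plus a shadow-event flag. The sketch below encodes only the two modes the table describes; the type and function names are hypothetical and the disabled mode is left out because its behaviour is not specified here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
    Redact,
    RequiresApproval,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Enforce,
    Observe,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effect {
    Proceed,
    ProceedSanitised,
    Block,
    HaltForApproval,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Outcome {
    pub effect: Effect,
    /// True when a `dry_run: true` shadow event should be written.
    pub shadow_event: bool,
}

/// Map a computed policy decision to what the gateway actually does.
pub fn apply_posture(decision: Decision, mode: Mode) -> Outcome {
    match mode {
        // Observe: the action always proceeds; every non-Allow decision
        // is recorded as a shadow event instead of being applied.
        Mode::Observe => Outcome {
            effect: Effect::Proceed,
            shadow_event: decision != Decision::Allow,
        },
        // Enforce: the decision is applied and no shadow event is written.
        Mode::Enforce => Outcome {
            effect: match decision {
                Decision::Allow => Effect::Proceed,
                Decision::Deny => Effect::Block,
                Decision::Redact => Effect::ProceedSanitised,
                Decision::RequiresApproval => Effect::HaltForApproval,
            },
            shadow_event: false,
        },
    }
}
```

Keeping the posture as a final mapping step, separate from rule evaluation, is what lets both modes share one rule pipeline.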
Quick start — 5 steps
# 1. Author a policy in observe mode (zero risk to running agents)
cat > coding-team-sandbox.yaml << 'EOF'
name: coding-team-sandbox
enforcement_mode: observe   # ← the one new field
rules:
  - action: deny
    match:
      tool_name: bash
      command_pattern: "rm -rf"
  - action: redact
    match:
      output_contains_pattern: "(AKIA|ghp_)[A-Za-z0-9]+"
EOF
# 2. Apply the policy
aasm policy apply --file coding-team-sandbox.yaml
# 3. Run an agent under observe-mode governance
aasm run --observe claude --workspace .
# 4. After a few days, review what would have been blocked
aasm audit list --dry-run-only --since 7d
# 5. Confident the policy is right? Flip to live enforcement.
sed -i 's/enforcement_mode: observe/enforcement_mode: enforce/' coding-team-sandbox.yaml
aasm policy apply --file coding-team-sandbox.yaml
Policy configuration
enforcement_mode is a top-level optional field on the policy document:
name: my-policy
enforcement_mode: observe # "enforce" (default) | "observe" | "disabled"
rules: [ ... ]
When the field is omitted, the policy defaults to enforce — the pre-feature behaviour. Existing on-disk policies upgrade transparently.
Per-agent overrides via agent_overrides are also supported, so you can run a single experimental agent in observe mode while the rest of the team stays in live enforce:
name: coding-team-policy
enforcement_mode: enforce
agent_overrides:
  - agent_glob: "experimental-*"
    enforcement_mode: observe
Resolution order (highest priority first):
- Per-agent override — agent_overrides block in the policy YAML, or enforcement_mode on the agent’s RegisterAgent RPC payload.
- Policy document default — the top-level enforcement_mode field.
- Server-wide default — enforce.
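The resolution order above is a first-match-wins chain, which maps naturally onto `Option` combinators. In this sketch, `EnforcementMode` and `resolve_mode` are hypothetical names, and the per-agent override is assumed to have been looked up already (from agent_overrides or the registration payload).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementMode {
    Enforce,
    Observe,
    Disabled,
}

/// Resolve the effective mode: per-agent override first, then the
/// policy document default, then the server-wide default (enforce).
pub fn resolve_mode(
    agent_override: Option<EnforcementMode>,
    policy_default: Option<EnforcementMode>,
) -> EnforcementMode {
    agent_override
        .or(policy_default)
        .unwrap_or(EnforcementMode::Enforce)
}
```

For the coding-team-policy example, an agent matching experimental-* resolves to observe even though the document default is enforce, and a policy that omits the field resolves to enforce.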
CLI reference
aasm run --observe
Launches a managed AI dev tool with observe-mode governance for the duration of the session.
# Boolean shorthand — most common case
aasm run --observe claude --workspace .
# Explicit form — interchangeable with the above
aasm run --enforcement-mode observe claude --workspace .
# Disabled mode — only valid in hermetic test environments
aasm run --enforcement-mode disabled codex --workspace .
# Combine with --dry-run to preview the launch without executing the tool
aasm run --observe --dry-run claude --workspace .
When observe mode is active, a one-time banner prints to stderr ahead of any tool output:
⚠️ [AAASM] Running in sandbox/observe mode.
Policy decisions are recorded but NOT enforced.
Review captured events: aa audit list --dry-run-only
The child process inherits AA_ENFORCEMENT_MODE=observe in its environment so tools that env-sniff (or downstream wrappers) can surface their own observe-mode badge.
--observe and --enforcement-mode are mutually exclusive — passing both fails fast at clap-parse time.
aasm audit list --dry-run-only
Filters the audit log to shadow events only:
# Show shadow events from the last 24h
aasm audit list --dry-run-only --since 24h
# Compose with other filters
aasm audit list --dry-run-only --since 7d --agent "codex-*"
# Machine-readable output for CI gates
aasm audit list --dry-run-only --format json
The flag is exclusive: by default aasm audit list HIDES shadow events so operators don’t see them mixed with live decisions; --dry-run-only flips that to show ONLY shadow events.
SDK usage
All three SDKs expose the same posture surface. Pass an enforcement_mode (Python / Go) or enforcementMode (Node.js) at agent registration:
Python
from agent_assembly import init_assembly

ctx = init_assembly(
    gateway_url="http://localhost:8080",
    api_key="...",
    agent_id="experimental-agent-001",
    enforcement_mode="observe",  # "enforce" | "observe" | "disabled"
)
The parameter is keyword-only; the type is Literal["enforce", "observe", "disabled"]. Omitting it preserves the pre-feature wire shape (the gateway applies its server-side enforce default).
Node.js / TypeScript
import { initAssembly, type EnforcementMode } from "@agent-assembly/sdk";

const ctx = await initAssembly({
  gatewayUrl: "http://localhost:8080",
  apiKey: "...",
  agentId: "experimental-agent-001",
  enforcementMode: "observe", // 'enforce' | 'observe' | 'disabled'
});
The EnforcementMode union narrows at compile time; runtime validation catches typos from JS / JSON-config / dynamic-input callers with a RangeError.
Go
import "github.com/agent-assembly/go-sdk/assembly"

a, err := assembly.Init(ctx,
	assembly.WithGatewayURL("http://localhost:8080"),
	assembly.WithAPIKey("..."),
	assembly.WithSelfAgentID("experimental-agent-001"),
	assembly.WithEnforcementMode(assembly.EnforcementModeObserve),
)
assembly.EnforcementMode is a string-typed alias; the empty zero value omits the field from the registration body, preserving pre-feature wire shape.
CI integration — the policy-regression gate
A common observe-mode use case: gate every PR on “would my policy change block any existing agent workflow?”
# .github/workflows/policy-regression.yml
jobs:
policy-regression:
steps:
- name: Run agent under observe-mode governance
run: aasm run --observe codex -- codex "refactor src/auth.py"
- name: Fail the PR on any would-be deny
run: |
BLOCKS=$(aasm audit list --dry-run-only --format json \
| jq '[.[] | select(.shadow_decision == "deny")] | length')
if [ "$BLOCKS" -gt 0 ]; then
echo "Policy regression: $BLOCKS actions would be blocked"
aasm audit list --dry-run-only --format table
exit 1
fi
The exclusive-filter semantic of --dry-run-only means this gate doesn’t pick up unrelated live-enforcement events from other agents on the same gateway.
Dashboard
The dashboard exposes a SandboxSummaryCard component that renders the per-policy observe-mode aggregates:
┌─ SANDBOX SUMMARY ────────────────────────────────┐
│ coding-team-sandbox (last 24h) │
│ │
│ 47 12 3 │
│ Would-be Would-be Would-be │
│ denies redactions pending approvals │
│ │
│ Top matched rule: block-bash-rm-rf (31×) │
│ │
│ [View all events] [Export CSV] [Enable live →] │
└───────────────────────────────────────────────────┘
The amber colour is intentional — it visually contrasts with the dashboard’s red (live-deny) and green (live-allow) tokens so an operator can tell at a glance whether they’re looking at observe-mode aggregates or live enforcement data.
Status (2026-05): the card primitive is shipped (AAASM-1563). The full integration (wiring it into Policy detail, the audit-log toggle, the amber row badge, and the “Enable live enforcement” action) is tracked under AAASM-1911 and depends on aa-api surface changes that aren’t in this release.
Graduating to live enforcement
Once you’ve reviewed the shadow events and tuned the policy:
- Inspect the most-common would-be violations:
  aasm audit list --dry-run-only --since 7d --format json \
    | jq 'group_by(.shadow_decision) | map({decision: .[0].shadow_decision, count: length})'
- Adjust the policy: tighten matchers that fired too eagerly, relax ones that blocked legitimate work.
- Re-apply in observe mode for another short window to confirm the tuned policy behaves as expected.
- Flip to enforce:
  enforcement_mode: enforce
  aasm policy apply --file my-policy.yaml
The cutover is instantaneous from the next CheckAction call onward — no agent restart required. Already-in-flight actions evaluated before the swap keep their original posture.
FAQ
Does observe mode affect performance?
No measurable difference. The rule pipeline runs identically; the only added work is writing the shadow audit event when a non-Allow decision would have fired. That’s the same audit-write path live enforcement already uses, so the per-request cost is dominated by the rule evaluation itself.
Are redacted payloads ever stored in observe mode?
No. The redact decision in observe mode forwards the unredacted payload to the agent (that’s the whole point — “what would have happened if we’d enforced”). The shadow audit event records that a redact rule matched, but neither the would-be redacted version nor the raw payload is persisted as a separate artefact. The audit pipeline’s existing PII-scanner pass still applies before any event is written.
Can I set observe mode per-agent without changing the policy?
Yes, in three ways:
- CLI: aasm run --observe <tool> for the duration of that session.
- SDK: pass enforcement_mode="observe" (Python / Go) or enforcementMode: "observe" (Node.js) at initAssembly.
- Policy YAML: agent_overrides block targeting an agent_glob.
The per-agent override always wins over the policy document’s default.
What happens to an agent that’s mid-action when I flip from observe to enforce?
The action that’s already through CheckAction keeps its observe-mode disposition (allowed). The very next CheckAction call sees the new posture and starts enforcing. There’s no in-flight rollback.
Does the SDK have any guard against accidentally registering in observe mode?
The SDK doesn’t second-guess the operator; observe mode is a deliberate posture. What the SDK does is:
- Reject typos (e.g. "obesrve") with a clear error at init time.
- Default to “no opinion” (omits the field from the registration body) so a pre-feature SDK call gets the gateway’s server-side enforce default. Only operators who explicitly opt in get observe mode.
Can I use observe mode in production for a long-running agent?
That’s the recommended pattern for new policies: run them in observe mode for a week, review the shadow events, then cut over. The audit log retention follows your normal retention policy, so the shadow events are queryable for as long as live events.
See also
- L0–L3 Capability Matrix — sandbox mode applies at all governance tiers
- System architecture — where the policy evaluator sits in the request pipeline
Last updated: 2026-06-11 by Chisanan232
Compliance Export
aasm audit compliance-export produces a full-fidelity export of a
per-session audit JSONL file for downstream regulatory review and SIEM
ingestion. Unlike aasm audit export (which queries the live gateway
through /api/v1/logs and emits a slim summary view), this command reads
directly from the on-disk JSONL files written by the gateway’s
AuditWriter, preserving the hash chain, credential findings, and
delegation lineage that an auditor needs to verify integrity offline.
When to use
Use aasm audit compliance-export whenever the produced bytes will leave
the gateway operator’s trust boundary — for example:
- Annual EU AI Act / SOC 2 evidence packs.
- Continuous SIEM ingestion (Splunk, ELK, Datadog) where each entry is treated as one log line.
- Cold-storage archives that must survive a future schema upgrade.
Use aasm audit export for the operational summary view (CSV / JSON
array of the slim REST shape) when you only need a quick at-a-glance
report and the consumer does not need the hash chain.
Output format
The default --format jsonl emits one ComplianceRecord per
line. Each record carries:
| Field | Meaning |
|---|---|
| seq | Monotonic sequence within the session. |
| timestamp | ISO 8601 UTC. |
| event_type | ToolCallIntercepted, PolicyViolation, etc. |
| agent_id, session_id | Hex-encoded 16-byte identifiers. |
| payload | Pre-serialised JSON of the decision context. |
| previous_hash, entry_hash | Hex-encoded SHA-256 anchors of the tamper-evident chain. |
| credential_findings | Detected credential kinds + byte offsets (never the raw secret). |
| redacted_payload | Post-redaction text when the gateway substituted secrets, null when clean. |
| root_agent_id, parent_agent_id, team_id, delegation_reason, spawned_by_tool, depth | Lineage fields when the originating entry recorded them. |
--format json produces a pretty-printed JSON array of the same records
for human review. --format csv produces a flat spreadsheet view with
the regulator-relevant columns plus a credential_findings_count and a
boolean redacted flag; the payload body and lineage are dropped from
CSV to keep the file approachable in spreadsheet tools — use JSONL for
full fidelity.
Common invocations
Export an entire session in JSONL to a file:
aasm audit compliance-export \
--input /var/lib/aa-gateway/audit/session-<hex>.jsonl \
--format jsonl \
--output-file ./session.jsonl
Restrict to PolicyViolation entries in the last 24 hours and write to
stdout (pipe-friendly):
aasm audit compliance-export \
--input /var/lib/aa-gateway/audit/session-<hex>.jsonl \
--event-type PolicyViolation \
--since 24h
Generate an EU AI Act evidence pack with a regulatory header:
aasm audit compliance-export \
--input /var/lib/aa-gateway/audit/session-<hex>.jsonl \
--format jsonl \
--compliance eu-ai-act \
--output-file ./eu-ai-act-evidence.jsonl
The --compliance header lines begin with # so JSONL ingestors that
treat # as a comment skip them automatically; ingestors that do not
should be configured to strip the header band on the way in.
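For ingestors that do not treat # as a comment, stripping the header band takes only a few lines. This is a minimal sketch under the assumptions stated above: header lines start with # and every other non-empty line is one JSON record. The function name and the sample lines are hypothetical; the real header content is not reproduced here.

```python
import json
from typing import Iterable, Iterator


def iter_compliance_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per JSONL line, skipping '#' header lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or --compliance header band
        yield json.loads(stripped)


# Invented sample; a real export carries the full ComplianceRecord fields.
sample_export = [
    "# hypothetical header line\n",
    '{"seq": 0, "event_type": "ToolCallIntercepted"}\n',
    '{"seq": 1, "event_type": "PolicyViolation"}\n',
]
records = list(iter_compliance_records(sample_export))
```

The same generator works on an open file handle, since iterating a text file yields its lines.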
Verifying the export
The export carries the same hash chain as the source JSONL. To verify chain integrity offline, run:
aasm audit verify-chain /var/lib/aa-gateway/audit/session-<hex>.jsonl
verify-chain consumes the raw on-disk file rather than the export, so
the verifier sees exactly the bytes the gateway wrote. An auditor with
the export and a SHA-256 implementation can independently re-hash each
record’s canonical input (see the audit module
documentation for the canonical bytes layout) and
compare against the embedded entry_hash.
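A partial check an auditor can run without the canonical bytes layout is link continuity: each record's previous_hash should equal the entry_hash of the record before it. The sketch below checks only that linkage and deliberately does not recompute entry_hash, because the canonical hash input is defined in the audit module documentation and is not reproduced here. The function name and the sample hashes are hypothetical, and the value of the first record's previous_hash is an invented placeholder.

```python
from typing import Optional, Sequence


def first_broken_link(records: Sequence[dict]) -> Optional[int]:
    """Return the index of the first record whose previous_hash does not
    match the preceding record's entry_hash, or None if all links hold."""
    for index in range(1, len(records)):
        if records[index]["previous_hash"] != records[index - 1]["entry_hash"]:
            return index
    return None


# Invented, shortened hashes for illustration only.
intact = [
    {"seq": 0, "previous_hash": "00", "entry_hash": "aa"},
    {"seq": 1, "previous_hash": "aa", "entry_hash": "bb"},
    {"seq": 2, "previous_hash": "bb", "entry_hash": "cc"},
]
tampered = [dict(record) for record in intact]
tampered[2]["previous_hash"] = "ff"
```

Link continuity alone does not prove integrity; it complements, and does not replace, re-hashing each record or running aasm audit verify-chain on the source file.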
Security invariants
- The export never carries raw credential values. credential_findings records only kind, offset, and the [REDACTED:<Kind>] label.
- redacted_payload (when present) is the scanner’s substitution output, with raw secret bytes already replaced by [REDACTED:<Kind>] markers.
- payload retains the original (pre-redaction) string only when the source entry did so; the gateway’s default policy is to replace payload with redacted_payload on persistence when findings exist, so by default the export carries no raw secret. Operators who pipe pre-redaction payloads downstream do so explicitly via configuration.
Last updated: 2026-05-25 by Chisanan232
Agent-to-Agent Identity (Zero-trust A2A)
Agent Assembly enforces a zero-trust posture on every agent-to-agent
(A2A) tool dispatch: when agent A calls a tool exposed by agent B, the
gateway verifies that the caller’s credentials match the claimed
identity before any policy rule is evaluated. An impersonator (a third
agent C presenting A’s agent_id with C’s own credential_token) is
rejected at the front door and the attempt is recorded in the audit log.
How identity flows on an A2A call
agent A ── tool dispatch ──▶ agent B
                                │
                                ▼
                gateway PolicyService.CheckAction
                                │
                                ▼
              ┌───── validate_credential_token ─────┐
              │ registered token for agent_id       │
              │ matches the supplied token?         │
              └─────────────────┬───────────────────┘
                                │
                   ┌────────────┴────────────┐
                   ▼                         ▼
        Allow → evaluate policy    Reject → A2AImpersonationAttempted
                                   audit event + Deny response
- agent_id in the request = the callee (B, the agent performing the action).
- caller_agent_id in the request = the originator (A).
- credential_token is validated against the callee’s registered token; caller_agent_id is an attestation by the callee, not a credential.
Audit events
Two AuditEventType variants make A2A traffic explicit in the chain:
| Variant | Emitted when | Payload fields |
|---|---|---|
| A2ACallIntercepted | Allow decision on a request whose caller_agent_id differs from agent_id. | caller_agent_id, callee_agent_id, plus the usual action_type, decision, policy_rule, latency_us. |
| A2AImpersonationAttempted | Pre-policy-eval rejection because credential_token is empty or does not match the registered token for the claimed agent_id. | claimed_agent_id, credential_token_present (bool), reason, policy_rule = "a2a_identity_verification". |
Single-agent calls (no caller_agent_id, or caller equals callee) keep
emitting the existing ToolCallIntercepted / PolicyViolation
variants — nothing changes for non-A2A traffic.
Rejection rules
The gateway rejects before policy evaluation when:
- The claimed agent_id is registered AND the supplied credential_token is empty → Deny with reason "missing credential token".
- The claimed agent_id is registered AND the supplied credential_token is non-empty but does not match the registered token → Deny with reason "credential token mismatch".
When the claimed agent is not registered, the gateway skips
identity validation and lets the policy engine decide (this preserves
the lightweight detection-slice fixtures that bypass the registry
entirely). To opt into strict validation for a specific agent,
register it via the AgentRegistry — that’s the activation gesture.
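The rejection rules above can be sketched as a small pre-evaluation gate. This is an illustration only, not the gateway's code: the registry is modelled as a plain mapping from agent_id to token, the function name is hypothetical, and the constant-time comparison via hmac.compare_digest is an assumption about good practice rather than something the source states.

```python
import hmac
from typing import Mapping, Optional


def identity_gate(
    registry: Mapping[str, str], agent_id: str, credential_token: str
) -> Optional[str]:
    """Return a deny reason, or None when the request proceeds to policy evaluation."""
    registered_token = registry.get(agent_id)
    if registered_token is None:
        return None  # unregistered agent: identity validation is skipped
    if not credential_token:
        return "missing credential token"
    # Assumption: compare in constant time to avoid leaking token prefixes.
    if not hmac.compare_digest(registered_token.encode(), credential_token.encode()):
        return "credential token mismatch"
    return None


# Hypothetical registry content.
registry = {"agent-b": "token-b"}
```

Returning None for an unregistered agent reflects the opt-in behaviour described above: registering the agent is what activates strict validation.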
Operator visibility
Use the existing audit tooling to surface A2A activity:
# All A2A allows in the last hour
aasm audit list --since 1h --event-type A2ACallIntercepted
# Rejected impersonation attempts (security investigation)
aasm audit list --event-type A2AImpersonationAttempted
# Compliance export covering A2A traffic specifically
aasm audit compliance-export \
--input /var/lib/aa-gateway/audit/session-<hex>.jsonl \
--event-type A2ACallIntercepted \
--format jsonl \
--output-file ./a2a-traffic.jsonl
SDK expectations
When you build an A2A dispatch helper in your SDK, populate the
CheckActionRequest like this:
| Field | Set to |
|---|---|
| agent_id | The callee (the agent that will execute the tool). |
| credential_token | The callee’s registered token. |
| caller_agent_id | The originator of the dispatch, attested by the callee. |
The Python / Node / Go SDKs ship A2A helpers that wrap this for you.
For framework-level integrations that build CheckActionRequest
directly, the new field is optional and proto3-additive — single-agent
SDKs that don’t populate it continue working unchanged.
What does not change
- Single-agent tool calls — no behavioural change, no new audit events.
- The credential validation is scoped to registered agents — bypassing the registry continues to be the recommended path for in-process tests and CI fixtures that don’t model identity.
- The policy engine — A2A enforcement is a pre-evaluation gate, not a new policy clause; existing rules still apply once the call passes identity validation.
Last updated: 2026-05-25 by Chisanan232
Tool Execution Sandbox — Network Egress
Agent Assembly’s Tool Execution Sandbox enforces a network allowlist on outbound traffic from sandboxed tools: when a tool tries to CONNECT to a host that is not on the allowlist, the proxy returns HTTP 403 before any upstream dial and emits an audit event recording the blocked egress. This is the network half of spec highlight ④ (Tool Execution Sandbox); the filesystem-isolation half is tracked under AAASM-1965.
Configuration
The allowlist is configured on the aa-proxy process via the
AA_PROXY_NETWORK_ALLOWLIST environment variable. Comma-separated;
empty means “no allowlist filter” (the pre-AAASM-1943 default-open
posture is preserved when the variable is unset).
export AA_PROXY_NETWORK_ALLOWLIST='api.openai.com,*.anthropic.com,*.googleapis.com'
aa-proxy run
Equivalent policy-DSL form (operator-facing documentation; the proxy reads from the env var today, with policy-DSL → proxy-config sync tracked under the AAASM-1232 closeout matrix):
apiVersion: agent-assembly.dev/v1alpha1
kind: GovernancePolicy
metadata:
  name: prod-egress
  version: "1.0.0"
spec:
  network:
    allowlist:
      - api.openai.com
      - "*.anthropic.com"
      - "*.googleapis.com"
Pattern grammar
The same matcher (aa_core::policy::is_host_allowed_by_egress_allowlist)
is used by the proxy enforcement path and the gateway policy DSL. The
grammar is intentionally narrow:
| Pattern | Matches | Does NOT match |
|---|---|---|
| api.openai.com (exact) | api.openai.com (case-insensitive) | chat.openai.com, openai.com, attackerapi.openai.com |
| *.openai.com (leftmost-label wildcard) | api.openai.com, chat.openai.com, a.b.openai.com | openai.com (bare), evil.openai.com.attacker.net (suffix attack) |
| * (universal, escape hatch) | every host | (none) |
No mid-label *, no character classes, no full POSIX glob. Allowlist
patterns that look more permissive than they are have historically been
the source of egress-rule misconfigurations; the narrow grammar lets
operators reason about every pattern at a glance.
The attacker-crafted-suffix case (evil.openai.com.attacker.net against
*.openai.com) is a classic confusion attack: the attacker hopes a
permissive glob would match. The narrow grammar rejects it.
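The grammar in the table is small enough to restate as code. The sketch below is a Python re-implementation for illustration, not the aa_core matcher itself. It assumes that wildcard patterns are matched case-insensitively like exact ones, and that an empty list means "no filter", following the Configuration section; both function names are hypothetical.

```python
from typing import List, Sequence


def parse_allowlist(raw: str) -> List[str]:
    """Split a comma-separated AA_PROXY_NETWORK_ALLOWLIST value into patterns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_host_allowed(host: str, patterns: Sequence[str]) -> bool:
    """Apply the narrow grammar: exact, leftmost-label wildcard, or universal."""
    if not patterns:
        return True  # empty allowlist: default-open, no filtering
    host = host.lower()
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".openai.com", leading dot included
            # The host must end with the dotted suffix and have at least one
            # extra label in front, so the bare domain does not match.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False


allowlist = parse_allowlist("api.openai.com,*.anthropic.com")
```

Keeping the leading dot in the suffix is what defeats both near-miss cases: attackerapi.openai.com fails the exact comparison, and evil.openai.com.attacker.net does not end with .openai.com.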
Audit events
Both the allow and deny CONNECT paths emit PipelineEvent::Audit
events on the proxy’s broadcast channel. The deny path additionally
returns HTTP 403 Forbidden\r\nContent-Length: 0\r\n\r\n to the
sandboxed tool, which sees a connection refusal at its language-level
HTTP client.
Audit reviewers can correlate blocked-egress events to source tools
via the existing aasm logs / aasm audit list tooling. The audit
payload carries the target host so operators can spot patterns (e.g.
a tool repeatedly trying to reach a command-and-control (C2) server).
# Recent denied CONNECT attempts
aasm logs --since 1h --grep "denied by network allowlist"
# Compliance export of all network-policy violations
aasm audit compliance-export \
--input /var/lib/aa-gateway/audit/session-<hex>.jsonl \
--event-type PolicyViolation \
--format jsonl \
--output-file ./network-violations.jsonl
What this does NOT cover (deferred to AAASM-1965)
This page documents the network-egress half of spec highlight ④. The
filesystem-isolation half (“cat /etc/passwd from inside a sandboxed
tool blocked / redacted”) requires a WASM/WASI sandbox runtime that
doesn’t yet exist in the repo. Filed under
AAASM-1965
as a Story-point-8 follow-up:
- aa-wasm extended with wasmtime + WASI preview 1 host handlers.
- ToolRegistry distinguishing WASM-runnable tools from native / shell tools.
- Filesystem allowlist enforcement returning EACCES for paths outside the sandbox root.
- E2E tests for the cat /etc/passwd denial path.
The ST-W ignored placeholder in
aa-integration-tests/tests/e2e_tool_sandbox.rs::st_w_1_filesystem_isolation_for_sandboxed_tools
contains the exact assertion plan the follow-up will fill in.
Last updated: 2026-05-25 by Chisanan232
Org-Tier Isolation (Multi-Tenancy)
Agent Assembly enforces a three-tier isolation hierarchy — Org / Team / Agent — so a single gateway can safely host workloads from multiple tenants. AAASM-1524 covers the Agent and Team tiers; this guide describes the Org tier added in AAASM-2008.
What the Org tier guarantees
When agents are registered with a non-empty proto.AgentId.org_id, the
gateway enforces the following invariants:
| Surface | Org-tier behaviour |
|---|---|
| Audit log | Every audit entry carries the originating agent’s org_id on Lineage. GET /api/v1/logs?org_id=X filters to a single tenant. |
| Topology | GET /api/v1/topology/overview?org_id=X returns only X’s agents. The registry maintains an org_index secondary index for O(members) lookup. |
| Credential validation | An agent registered in Org A presenting its valid token but claiming agent_id.org_id = "B" is rejected with A2AImpersonationAttempted. The registry’s credential reverse-index catches cross-org reuse before any policy evaluation. |
| Policy scope | A policy with scope: org:<id> cascades only for agents in that org. (Requires the multi-document loader from AAASM-2023 — partial today.) |
| Budget | Every Org owns an independent spend envelope on the BudgetTracker.org_budgets map. record_cost rolls each charge into the agent’s org_id and enforces org_daily_limit_usd / org_monthly_limit_usd set via policy YAML or the with_org_*_limit builders. Exhausting one Org’s envelope never affects another. |
How to set up multi-tenancy
Register each agent with a non-empty org_id:
import os

init_assembly(
    gateway="grpc://gateway:50051",
    agent_id={
        "org_id": "acme",
        "team_id": "platform",
        "agent_id": "research-bot-001",
    },
    # Read the token from the environment rather than hard-coding it.
    credential_token=os.environ["AA_CREDENTIAL"],
)
The same convention applies via the Node and Go SDKs and via direct
PolicyService.CheckAction calls — the proto AgentId triple is the
canonical identity.
Querying by Org
Audit log
# Browser / curl
curl 'http://gateway/api/v1/logs?org_id=acme&per_page=50'
# Compliance export covering one org's audit trail
aasm audit compliance-export \
--input /var/lib/aa-gateway/audit/session-<hex>.jsonl \
--org-id acme \
--format jsonl \
--output-file ./acme-audit.jsonl
Audit entries written before the agent was registered with an org_id
(or by lightweight test fixtures that bypass the registry) carry
org_id = None on Lineage and never match an explicit org_id
filter. This is intentional — multi-tenancy isolation requires explicit
Org tagging on the entry at write time.
Topology
curl 'http://gateway/api/v1/topology/overview?org_id=acme'
The overview endpoint scopes via AgentRegistry::org_members(oid). The
other topology endpoints (tree, team, lineage, stats) accept the
org_id query parameter but currently ignore it — the next ticket in
the Org-tier rollout will wire each handler.
Cross-org credential reuse detection
When an agent in Org A presents its credential but claims agent_id.org_id = "B", the gateway:
- Computes the registry key from the claimed {org_id, team_id, agent_id} triple. Because org_id is part of the hash, the claimed key differs from the agent’s actual registration key.
- Looks up the claimed key, which fails (no agent registered there).
- Looks up the supplied credential_token in the reverse index, which finds the actual owner.
- Detects the mismatch, returns Deny with reason "credential token registered to a different agent", and emits an A2AImpersonationAttempted audit event with claimed_org_id in the payload.
A reviewer searching aasm audit list --event-type A2AImpersonationAttempted
sees these attempts grouped by the org the attacker tried to claim.
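The detection steps can be sketched with two dictionaries: a forward map from identity to token and a reverse index from token to owner. This is a simplified model, not the gateway's AgentRegistry: it uses the identity triple itself as the key where the real registry hashes it, and the class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple

AgentKey = Tuple[str, str, str]  # (org_id, team_id, agent_id)


class Registry:
    """Toy registry with a credential reverse index."""

    def __init__(self) -> None:
        self.tokens: Dict[AgentKey, str] = {}
        self.owner_by_token: Dict[str, AgentKey] = {}

    def register(self, key: AgentKey, token: str) -> None:
        self.tokens[key] = token
        self.owner_by_token[token] = key

    def check(self, claimed: AgentKey, token: str) -> Optional[str]:
        """Return a deny reason, or None if the identity gate passes."""
        if claimed in self.tokens:
            # Registered identity: ordinary token comparison.
            return None if self.tokens[claimed] == token else "credential token mismatch"
        # Claimed key is unknown; see whether the token belongs to someone else.
        owner = self.owner_by_token.get(token)
        if owner is not None and owner != claimed:
            return "credential token registered to a different agent"
        return None


registry = Registry()
registry.register(("a", "platform", "research-bot-001"), "token-a")
```

Because org_id is part of the key, changing only the claimed org is enough to miss the forward lookup, which is why the reverse index is needed to catch the reuse.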
Configuring Org-tier budget limits
Operator-facing knobs live in the budget: section of any Global-scoped
policy document:
budget:
  daily_limit_usd: 10000.0          # global cap across all orgs
  monthly_limit_usd: 250000.0
  org_daily_limit_usd: 1000.0       # AAASM-2022: per-org daily cap
  org_monthly_limit_usd: 25000.0    # AAASM-2022: per-org monthly cap
  timezone: "UTC"
  action_on_exceed: deny
Semantics:
- org_daily_limit_usd / org_monthly_limit_usd are uniform per-Org caps: the same envelope applies to every Org that records spend. Cross-Org isolation comes from the tracker maintaining an independent BudgetState per org_id, not from per-Org-customised limits.
- Enforcement order in record_cost is global → org → team → agent, with monthly checked before daily within each tier. The first tier that exceeds returns BudgetStatus::LimitExceeded and the deny is recorded.
- Limits enter the tracker via the with_org_daily_limit / with_org_monthly_limit builders during policy load. Restoring from a persisted snapshot preserves limits via the same path; the org_budgets map is empty on first restore until the migration in the AAASM-2022 follow-up lands.
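The enforcement order can be illustrated with a short walk over the tiers. This sketch is not the BudgetTracker implementation: the Tier class and first_exceeded function are hypothetical, and it assumes a limit counts as exceeded when spend plus the new cost is strictly greater than the limit, which the source does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tier:
    name: str
    spent_daily: float = 0.0
    spent_monthly: float = 0.0
    daily_limit: Optional[float] = None    # None means no limit configured
    monthly_limit: Optional[float] = None


def first_exceeded(tiers: List[Tier], cost: float) -> Optional[str]:
    """Walk tiers in order (global, org, team, agent), monthly before daily."""
    for tier in tiers:
        if tier.monthly_limit is not None and tier.spent_monthly + cost > tier.monthly_limit:
            return f"{tier.name}:monthly"
        if tier.daily_limit is not None and tier.spent_daily + cost > tier.daily_limit:
            return f"{tier.name}:daily"
    return None


# Limits taken from the YAML example above; spend figures are invented.
tiers = [
    Tier("global", spent_daily=500.0, spent_monthly=5000.0,
         daily_limit=10000.0, monthly_limit=250000.0),
    Tier("org", spent_daily=999.0, spent_monthly=2000.0,
         daily_limit=1000.0, monthly_limit=25000.0),
    Tier("team"),
    Tier("agent"),
]
```

With these figures a 2.00 USD charge trips the org daily cap while the global envelope still has room, which is the isolation property the per-Org state is meant to provide.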
Observing per-Org spend
// In-process accessor: current spend recorded for the "acme" org, if any.
let acme_spent = budget.org_state("acme").map(|s| s.spent_usd);
The dashboard / CLI surfaces for aasm budget status --org <id> are
queued under AAASM-1232
follow-up subtasks.
Known gaps
- Org-scoped policy E2E: PolicyEngine::load_from_file doesn’t populate the scope_index, so scope: org:<id> policies need a multi-document loader (AAASM-2023).
- Topology endpoints beyond overview: tree / team / lineage / stats accept the org_id query param but currently ignore it.
- Persistence schema for Org-tier spend: the on-disk snapshot does not yet carry the org_budgets map; a restored tracker starts with empty Org state.
The headline scenarios — audit isolation, topology overview scoping, cross-org credential rejection (AAASM-2008), and cross-org budget envelope isolation (AAASM-2022) — ship complete.
Last updated: 2026-05-25 by Chisanan232
Multi-Document Policy Cascade
PolicyEngine::load_cascade_from_dir(dir) loads every *.yaml file in a
directory and populates the gateway’s scope_index so each document
cascades by its declared scope (Global / Org(<id>) / Team(<id>) /
Agent(<id>)). This unlocks org-scoped, team-scoped, and agent-scoped
policy rules in the runtime evaluation path — a capability that
load_from_file (single-document) does not provide.
When to use
- Multi-tenant deployments where each org needs its own deny/allow overrides on top of a Global baseline.
- Team-level guardrails layered on top of the org’s rules (e.g. “platform team can use bash, but support cannot”).
- Per-agent escape hatches for a single high-risk agent that needs a narrower allowlist than its team’s default.
Single-policy deployments should continue using load_from_file — the
cascade adds zero value when there’s only one document.
Directory layout
policies/
├── 000-global-allow-all.yaml # scope: global (or omitted)
├── 100-org-acme-deny-bash.yaml # spec.scope: org:acme
├── 200-team-platform.yaml # spec.scope: team:platform
└── 300-agent-research-bot.yaml # spec.scope: agent:<UUID>
Filename prefixes are convention only — the loader sorts alphabetically so the cascade order is deterministic across filesystems. Use numeric prefixes to make precedence visually obvious.
Scope field placement (gotcha)
When using the envelope format (apiVersion / kind / metadata /
spec), the scope: field MUST live inside spec:, not at the outer
envelope level:
# CORRECT: scope inside spec
apiVersion: agent-assembly.dev/v1alpha1
kind: GovernancePolicy
metadata:
  name: org-acme-deny-bash
spec:
  scope: org:acme
  tools:
    bash:
      allow: false

# WRONG: scope at envelope level is SILENTLY IGNORED
apiVersion: agent-assembly.dev/v1alpha1
kind: GovernancePolicy
metadata:
  name: org-acme-deny-bash
scope: org:acme   # ← will be ignored; document defaults to Global
spec:
  tools:
    bash:
      allow: false
The validator’s envelope parser deserializes spec’s value as a
RawPolicyDocument — outer-level keys outside the envelope frame are
silently dropped. Always put scope: inside spec:.
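Because the misplaced key is dropped silently, a pre-commit or CI lint can catch it before the policy is applied. The sketch below is a hypothetical helper, not part of aasm: it takes an already-parsed document as a dictionary (any YAML parser can produce one) and flags a scope key sitting next to spec instead of inside it.

```python
from typing import Any, Mapping, Optional


def misplaced_scope_warning(document: Mapping[str, Any]) -> Optional[str]:
    """Warn when an envelope-format document carries scope outside spec."""
    is_envelope = "apiVersion" in document and "spec" in document
    if is_envelope and "scope" in document:
        return "scope found at envelope level; move it inside spec"
    return None


# Parsed forms of the two YAML examples above.
correct = {
    "apiVersion": "agent-assembly.dev/v1alpha1",
    "kind": "GovernancePolicy",
    "metadata": {"name": "org-acme-deny-bash"},
    "spec": {"scope": "org:acme", "tools": {"bash": {"allow": False}}},
}
wrong = {
    "apiVersion": "agent-assembly.dev/v1alpha1",
    "kind": "GovernancePolicy",
    "metadata": {"name": "org-acme-deny-bash"},
    "scope": "org:acme",
    "spec": {"tools": {"bash": {"allow": False}}},
}
```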
How the cascade is collected
At evaluation time, the gateway walks scopes from broadest to narrowest for the calling agent’s lineage:
- Global: every Global-scoped document.
- Org: documents matching the agent’s lineage.org_id. The org is resolved from ctx.metadata["org_id"] (populated by the SDK’s proto AgentId.org_id).
- Team: documents matching the agent’s lineage.team_id.
- Agent: documents matching the agent’s lineage.agent_id.
Each level augments the cascade — Global rules still apply for
agents in org-acme; the org-acme rules are added on top. The decision
merger (merge_decisions) resolves conflicts with narrower scopes
winning (Agent > Team > Org > Global).
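The collect-then-merge behaviour can be sketched in a few lines. This is a deliberately simplified model, not the gateway's merge_decisions: each document carries only a scope string and a tool allow map, and conflicts are resolved by letting the narrowest scope that mentions the tool override the broader ones. All names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class PolicyDoc(NamedTuple):
    scope: str              # "global", "org:<id>", "team:<id>" or "agent:<id>"
    tools: Dict[str, bool]  # tool name -> allow


def collect_cascade(
    docs: List[PolicyDoc], org_id: str, team_id: str, agent_id: str
) -> List[PolicyDoc]:
    """Return matching documents ordered from broadest to narrowest scope."""
    wanted = ["global", f"org:{org_id}", f"team:{team_id}", f"agent:{agent_id}"]
    return [doc for scope in wanted for doc in docs if doc.scope == scope]


def tool_allowed(cascade: List[PolicyDoc], tool: str) -> Optional[bool]:
    """Narrower scopes win: the last document that mentions the tool decides."""
    verdict = None
    for doc in cascade:
        if tool in doc.tools:
            verdict = doc.tools[tool]
    return verdict


docs = [
    PolicyDoc("global", {"bash": True}),
    PolicyDoc("org:acme", {"bash": False}),
    PolicyDoc("team:platform", {"bash": True}),
]
platform = collect_cascade(docs, "acme", "platform", "research-bot-001")
support = collect_cascade(docs, "acme", "support", "helper-001")
```

With these documents the platform team keeps bash while the support team inherits the org-level denial, matching the "platform team can use bash, but support cannot" example from When to use.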
How org_id flows from request to cascade
The cascade’s filtering by lineage.org_id works through two paths:
- From request context: convert.rs::request_to_core deposits proto.org_id into ctx.metadata["org_id"]. PolicyEngine::evaluate reads this first and uses it as the lineage hint. This is the primary path.
- From registry fallback: when ctx.metadata["org_id"] is empty (e.g. for traffic that doesn’t go through the SDK’s identity plumbing), the engine falls back to registry.lineage(agent_id).
The primary path is what makes scope: org:<id> work end-to-end: every
SDK call that populates AgentId.org_id lands in the right org’s
cascade automatically.
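The two-path lookup amounts to "metadata first, registry second". A minimal sketch follows, assuming the registry can be modelled as a mapping from agent_id to org_id; the real registry.lineage(agent_id) returns a richer lineage value, and the function name here is hypothetical.

```python
from typing import Mapping, Optional


def resolve_org_id(
    metadata: Mapping[str, str],
    registry_org_by_agent: Mapping[str, str],
    agent_id: str,
) -> Optional[str]:
    """Prefer the request-context org_id; fall back to the registry."""
    org_id = metadata.get("org_id")
    if org_id:  # a missing key and an empty string both fall through
        return org_id
    return registry_org_by_agent.get(agent_id)


# Hypothetical registry content.
registry_org_by_agent = {"research-bot-001": "acme"}
```

An agent that is neither tagged in the request nor known to the registry resolves to no org, so only Global-scoped documents would apply to it in this model.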
Programmatic loading
For tests or programmatic setups, call the loader directly:
use aa_gateway::PolicyEngine;
use tokio::sync::broadcast;

let (alert_tx, _) = broadcast::channel(64);
let engine = PolicyEngine::load_cascade_from_dir(
    std::path::Path::new("/etc/aa-gateway/policies/"),
    alert_tx,
)?;
The loader returns the same PolicyEngine type as load_from_file,
so it drops into existing service wiring without code changes.
Caveats
- No filesystem watcher: the cascade is static at load. Hot-reload across multiple files is a separate concern; restart the gateway to pick up changes.
- First Global doc supplies budget config: alphabetical order determines which Global document’s budget: block sets daily / monthly limits and data.sensitive_patterns. If two Global docs disagree on budget, the alphabetically-first one wins.
- Parse failures abort the whole load: partial loads would be a worse failure mode than the loud abort; the caller gets a PolicyParseError for the first bad file.
Related
- AAASM-2008: Org-tier isolation (closes the audit / topology / credential surfaces; deferred the policy-scope half to this ticket).
- aa-gateway/tests/cascade_merge_test.rs: pure-logic unit tests of the cascade evaluator (independent of the loader).
- aa-integration-tests/tests/e2e_org_isolation.rs::st_org_4_*: the E2E test that exercises this loader against a real gateway.
Last updated: 2026-05-25 by Chisanan232
Releases
This page tells you where to find a published build, which channels it ships to, and how the release is cut.
agent-assembly is in the v0.0.1 alpha pre-release series. The public API
and wire protocol are not yet stable.
Warning: every published tag is a pre-release. Do not run v0.0.1-alpha.* in production; the wire protocol can change between alphas.
Where releases live
- GitHub Releases: https://github.com/ai-agent-assembly/agent-assembly/releases is the source of truth for published tags and changelogs. The latest tag is a pre-release (v0.0.1-beta.2, 2026-06-15).
- Per-tag notes: the source-controlled release notes live under docs/release/ (one file per tag, e.g. docs/release/v0.0.1-beta.2.md).
- Top-level changelog: CHANGELOG.md.
Distribution channels
A single coordinated tag push fans out to every channel:
| Channel | Artifact |
|---|---|
| GitHub Releases | aasm-*.tar.gz binaries + SHA256SUMS |
| crates.io | Workspace crates at the tag version |
| Homebrew tap | aasm formula (homebrew-agent-assembly) |
| PyPI / npm | SDK packages |
| GHCR | Container image |
Release process
The mechanics (version bump, tag, changelog, multi-channel publish) are driven
by the automated release workflow. Operators follow the pre-tag checklist in the
release runbook at docs/release/RUNBOOK.md. See also the
Versioning Policy and Compatibility Matrix.
Last updated: 2026-06-15 by Chisanan232
Performance Benchmark Baseline
Baseline results recorded on 2026-04-29. Machine: Apple M-series (arm64), macOS Darwin 25.2.0.
All benchmarks run with cargo bench in release profile.
SDK Hook Overhead (aa-ffi-python)
Target: < 2 ms P99 per LLM call (AAASM-34 AC #6).
| Benchmark | Mean | Low | High |
|---|---|---|---|
| report_llm_call_channel | 237 ns | 229 ns | 245 ns |
Verdict: PASS. The 237 ns mean is more than three orders of magnitude below the 2 ms target.
Note (AAASM-2562): the aa-ffi-python SDK-hook benchmark (sdk_bench) moved to the python-sdk repo when the fat binding left this workspace; run it there with cargo bench --bench sdk_bench. The numbers above are retained as the historical 2026-04-29 baseline.
Proxy Intercept Latency (aa-proxy)
Target: < 5 ms P99 per intercepted request (AAASM-36 AC #5).
| Benchmark | Mean | Low | High |
|---|---|---|---|
| intercept/openai_response | 2.74 us | 2.74 us | 2.75 us |
| intercept/openai_with_credential_redaction | 3.82 us | 3.79 us | 3.86 us |
Verdict: PASS — both variants well below the 5 ms target. Credential redaction adds ~1 us overhead.
Gateway Policy Check (aa-gateway)
| Benchmark | Mean | Low | High |
|---|---|---|---|
| check_action_rpc/round_trip/minimal_llm_call | 79.6 us | 78.8 us | 80.5 us |
| check_action_rpc/round_trip/full_tool_call_1kb | 79.6 us | 78.3 us | 80.9 us |
| check_action_rpc/round_trip/worst_case_network | 76.3 us | 75.6 us | 76.9 us |
Credential Scanner Throughput (aa-core)
| Benchmark | Mean | Throughput |
|---|---|---|
| scanner/scan_1mb_payload | 6.31 ms | ~159 MB/s |
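The headline ratios in the tables above can be re-derived from the recorded means. This is plain arithmetic on the published numbers; the only assumption is that the scanner payload is 1 MB in the decimal sense (10^6 bytes). Note that the tables report means while the targets are stated as P99, so the headroom figures compare a mean against a P99 budget.

```python
# SDK hook: 237 ns mean against the 2 ms target.
sdk_mean_ns = 237
sdk_target_ns = 2_000_000
sdk_headroom = sdk_target_ns / sdk_mean_ns  # roughly 8,400x

# Proxy: extra cost of credential redaction, in microseconds.
redaction_overhead_us = 3.82 - 2.74

# Proxy: slower variant against the 5 ms target (5,000 us).
proxy_headroom = 5_000 / 3.82

# Scanner: 1 MB scanned in 6.31 ms, expressed in MB/s.
scanner_throughput_mb_s = 1 / 6.31e-3

print(f"SDK headroom: {sdk_headroom:,.0f}x")
print(f"Redaction overhead: {redaction_overhead_us:.2f} us")
print(f"Proxy headroom: {proxy_headroom:,.0f}x")
print(f"Scanner throughput: {scanner_throughput_mb_s:.1f} MB/s")
```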
Comparing Against Baseline
Run cargo bench to generate HTML reports in target/criterion/.
Each benchmark group produces a report/index.html with historical
comparison charts when prior runs exist.
To compare against this baseline:
- Run cargo bench on the baseline commit to populate target/criterion/.
- Run cargo bench on the new commit; Criterion auto-compares and reports percentage change with statistical significance.
Last updated: 2026-06-06 by Chisanan232
Build-Time Baseline
Before/after harness for Epic AAASM-2551 (Rust build & compile-time performance). This page records the build-time baseline established by Story AAASM-2557 so the profile (AAASM-2553), dev/linker (AAASM-2554), dependency-dedup (AAASM-2555), and CI (AAASM-2556) Stories can each quote a measured before/after against the same harness.
This is distinct from Baseline, which records runtime (cargo bench) numbers. This page measures how long the workspace takes to compile, not how fast it runs.
Harness
Run the full capture with:
make build-baseline # wraps scripts/build-baseline.sh
# or
bash scripts/build-baseline.sh
The harness records four measurements and archives the raw outputs (logs, the
cargo build --timings HTML, the top-crate extraction, and the cargo tree -d
report) under target/build-baseline/ (gitignored):
| # | Measurement | Command |
|---|---|---|
| 1 | Cold build | cargo clean then cargo build --workspace --timings |
| 2 | Warm rebuild | touch aa-cli/src/main.rs then cargo build --workspace |
| 3 | Test build | cargo nextest run --workspace --no-run (compile only) |
| 4 | Duplicate deps | cargo tree -d |
Measurement 3 deliberately compiles the test binaries without running
them: the build-time signal the profile/linker/dedup Stories move is the
compile cost, whereas the full suite’s run wall-clock is dominated by
Docker-backed integration tests and is sensitive to timing flakes. Set
BUILD_BASELINE_RUN_TESTS=1 to additionally run the full suite
(--no-fail-fast) and record its build+run wall-clock.
Why aa-ebpf is excluded
aa-ebpf requires a nightly toolchain plus bpf-linker, so the workspace’s own
make build-workspace and make test targets build with --exclude aa-ebpf.
The baseline mirrors that to measure the build path developers and the
non-eBPF CI jobs actually hit. Pass BUILD_BASELINE_INCLUDE_EBPF=1 to include
it on a nightly-capable host. Other tunables: BUILD_BASELINE_WARM_FILE,
BUILD_BASELINE_TOP_N, BUILD_BASELINE_OUT (see the script header).
Reproducibility notes
- Wall-clock is whole-second resolution from the shell; expect a few percent run-to-run variance, especially for the link-bound warm rebuild.
- Numbers are machine-specific. Always compare a before/after pair captured on the same machine — never an absolute number against a different host.
- The third-party registry cache (~/.cargo) is shared, so the cold build measures compile + link time, not crate download time.
Recorded baseline
Captured 2026-06-05 on Apple M-series (arm64, 16 logical CPUs, 128 GB),
macOS Darwin 25.4.0, cargo 1.95.0, cargo-nextest 0.9.133, default
[profile.dev] and [profile.release] (i.e. the pre-Epic configuration).
| Measurement | Wall-clock |
|---|---|
| Cold build (cargo build --workspace --timings) | 124 s |
| Warm rebuild (touch aa-cli/src/main.rs, relink) | 5 s |
| Test build (cargo nextest run --workspace --no-run) | 396 s |
| Packages built in >1 version (cargo tree -d) | 34 |
| Distinct duplicate (name, version) build units | 105 |
Local wall-clock is noisy: across three runs the cold build measured 91–211 s on this machine (background load / thermal). Treat these as the local order-of-magnitude; the Epic’s per-Story before/after pairs must be captured on the same idle machine, and CI numbers are authoritative.
Top longest-compiling crates
From the archived cargo build --timings HTML
(target/build-baseline/cargo-timing.html), summing each crate’s units
(build-script + lib + codegen):
| Rank | Compile (s) | Crate |
|---|---|---|
| 1 | 63.6 | aws-lc-sys 0.40.0 |
| 2 | 35.2 | wasmtime 45.0.0 |
| 3 | 33.7 | cranelift-codegen 0.132.0 |
| 4 | 29.8 | rustls 0.23.40 |
| 5 | 25.3 | object 0.39.1 |
| 6 | 25.2 | libsqlite3-sys 0.30.1 |
| 7 | 23.1 | asn1-rs 0.7.1 |
| 8 | 22.9 | thiserror 1.0.69 |
| 9 | 21.0 | rustix 1.1.4 |
| 10 | 21.0 | wasmtime-internal-jit-debug 45.0.0 |
The long poles are the WebAssembly stack (wasmtime, cranelift-codegen,
wasmtime-internal-jit-debug — pulled by aa-wasm) and crypto/TLS
(aws-lc-sys, rustls), confirming the Epic’s hypothesis. Per-crate seconds
shift run-to-run with build parallelism, but this set is stable.
Duplicate dependencies (dedup baseline for AAASM-2555)
cargo tree -d reports 34 packages built in more than one version
(105 distinct (name, version) units). The worst offenders:
| Versions | Package |
|---|---|
| 4 | hashbrown |
| 3 | rand, rand_core, getrandom |
| 2 | winnow, webpki-roots, wast, wasm-encoder, untrusted, toml, toml_datetime, thiserror-impl, … |
The complete set of multi-version packages — the committed dedup baseline for
AAASM-2555 to diff against — follows. The full cargo tree -d report (with the
inverted dependent trees) is also archived at
target/build-baseline/cargo-tree-dups.txt for the dependency paths.
block-buffer v0.10.4 v0.12.0
const-oid v0.9.6 v0.10.2
convert_case v0.10.0 v0.11.0
cpufeatures v0.2.17 v0.3.0
crypto-common v0.1.7 v0.2.1
deadpool v0.12.3 v0.13.0
deadpool-runtime v0.1.4 v0.3.1
digest v0.10.7 v0.11.3
fixedbitset v0.4.2 v0.5.7
foldhash v0.1.5 v0.2.0
getrandom v0.2.17 v0.3.4 v0.4.2
hashbrown v0.14.5 v0.15.5 v0.16.1 v0.17.1
hashlink v0.9.1 v0.10.0
hmac v0.12.1 v0.13.0
itertools v0.13.0 v0.14.0
lru v0.16.4 v0.18.0
petgraph v0.6.5 v0.8.3
phf v0.11.3 v0.12.1
phf_shared v0.11.3 v0.12.1
rand v0.8.6 v0.9.4 v0.10.1
rand_chacha v0.3.1 v0.9.0
rand_core v0.6.4 v0.9.5 v0.10.1
reqwest v0.12.28 v0.13.3
sha2 v0.10.9 v0.11.0
similar v2.7.0 v3.1.1
thiserror v1.0.69 v2.0.18
thiserror-impl v1.0.69 v2.0.18
toml v0.9.12 v1.1.2
toml_datetime v0.7.5 v1.1.1
untrusted v0.7.1 v0.9.0
wasm-encoder v0.248.0 v0.251.0
wast v35.0.2 v251.0.0
webpki-roots v0.26.11 v1.0.7
winnow v0.7.15 v1.0.2
AAASM-2555 should re-run cargo tree -d after centralizing
[workspace.dependencies] and confirm this count drops.
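One way to check that the count drops is to diff the committed listing against a fresh capture. The sketch below is a hypothetical helper, not part of scripts/build-baseline.sh. It assumes each input line has the `name vA vB ...` shape of the listing above, and the function names `parse_dups` and `resolved` are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Parse lines of the form `name v1 v2 ...` into a name -> versions map.
/// Lines that list fewer than two versions are ignored.
fn parse_dups(listing: &str) -> BTreeMap<String, Vec<String>> {
    let mut map = BTreeMap::new();
    for line in listing.lines() {
        let mut parts = line.split_whitespace();
        let Some(name) = parts.next() else { continue };
        let versions: Vec<String> = parts.map(str::to_string).collect();
        if versions.len() >= 2 {
            map.insert(name.to_string(), versions);
        }
    }
    map
}

/// Packages that were multi-version in `before` but are absent from `after`.
fn resolved(
    before: &BTreeMap<String, Vec<String>>,
    after: &BTreeMap<String, Vec<String>>,
) -> Vec<String> {
    before
        .keys()
        .filter(|name| !after.contains_key(*name))
        .cloned()
        .collect()
}
```

Feeding the committed listing as `before` and a post-change capture as `after` yields the packages that no longer build in more than one version; comparing `before.len()` with `after.len()` gives the change in the package count.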
Full test build+run (context)
The default harness records test compile time only, because the full
suite’s run wall-clock is dominated by integration-test execution rather than
the build. For reference, one BUILD_BASELINE_RUN_TESTS=1 capture on the same
machine measured 3452 s end-to-end build+run — of which the run phase was
Summary [2546 s] 3764 tests run: 3744 passed (228 slow, 4 leaky), 20 failed.
The 20 failures are local timing-sensitive integration assertions (e.g. the
aa-api L1-invalidation 100 ms check) and do not affect compile time. This
number is here for completeness; the profile/linker/dedup Stories should be
judged against the compile rows above, not this run-dominated figure.
Acceptance-criteria mapping (AAASM-2557)
| Acceptance criterion | Evidence |
|---|---|
| Baseline numbers for cold build, warm rebuild, and test build+run recorded | “Recorded baseline” → wall-clock table (cold/warm/test-build) + “Full test build+run (context)” |
| cargo build --timings HTML identifies the top 5 longest-compiling crates | “Top longest-compiling crates” table (target/build-baseline/cargo-timing.html) |
| cargo tree -d attached as the dedup baseline for AAASM-2555 | “Duplicate dependencies” table (target/build-baseline/cargo-tree-dups.txt) |
Last updated: 2026-06-05 by Chisanan232
PolicyService CheckAction RPC — Latency Benchmark Results
Environment
| Parameter | Value |
|---|---|
| CPU | Apple M3 Max |
| Memory | 128 GB |
| OS | macOS 26.2 (Darwin) |
| Rust | 1.95.0 (2026-04-14) |
| Tonic | 0.13.1 |
| Transport | TCP loopback (127.0.0.1) |
| Profile | --release (optimized) |
SLA Target
p99 < 5ms end-to-end round-trip (serialize + transport + evaluate + respond).
Criterion Micro-Benchmarks
Reused TCP connection, single client, 100 samples per variant.
| Payload Variant | Description | Mean | Std Dev |
|---|---|---|---|
| minimal_llm_call | LlmCallContext, no PII | 77.9 us | ~1 us |
| full_tool_call_1kb | ToolCallContext, ~1KB args_json | 82.2 us | ~1 us |
| worst_case_network | NetworkCallContext, long URL (~400 bytes) | 81.9 us | ~1 us |
Sustained Load Test (60 seconds)
1,000 req/sec sustained for 60 seconds, 10 concurrent clients, ToolCallContext payload.
| Metric | Value | vs SLA |
|---|---|---|
| Total requests | 60,000 | |
| Actual RPS | 999 | |
| p50 | 144 us | 34x headroom |
| p95 | 357 us | 14x headroom |
| p99 | 803 us | 6.2x headroom |
| p999 | 2.65 ms | 1.9x headroom |
| max | 10.89 ms | |
Verdict
PASS — p99 latency of 803 us is well under the 5ms SLA target with 6.2x headroom.
The max latency (10.89 ms) exceeds 5ms but this is expected for a single outlier in 60,000 requests on a non-isolated workstation. The p999 (2.65 ms) confirms the tail is well-bounded for all practical purposes.
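The headroom column is the SLA budget divided by the observed percentile. The sketch below shows that arithmetic together with a nearest-rank percentile. The percentile method used by the actual load test is not stated on this page, so nearest-rank is an assumption, and both function names are hypothetical.

```rust
/// Nearest-rank percentile over an already sorted slice of latencies
/// (microseconds). `p` is expected in (0, 100]. Panics on an empty slice.
fn percentile(sorted_us: &[u64], p: f64) -> u64 {
    assert!(!sorted_us.is_empty(), "need at least one sample");
    let rank = ((p / 100.0) * sorted_us.len() as f64).ceil() as usize;
    // Clamp so that very small or very large `p` still maps to a valid index.
    sorted_us[rank.clamp(1, sorted_us.len()) - 1]
}

/// How many times the observed latency fits inside the SLA budget.
fn headroom(sla_us: f64, observed_us: f64) -> f64 {
    sla_us / observed_us
}
```

With the 5 ms target expressed as 5000 us, `headroom(5000.0, 803.0)` is about 6.2 and `headroom(5000.0, 2650.0)` is about 1.9, matching the p99 and p999 rows.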
Last updated: 2026-05-04 by Chisanan232
CI/CD Pipeline Performance
Before/after record of the CI/CD workflow redesign delivered under Epic AAASM-2551 (Rust build & compile-time performance — local + CI). This page documents what changed and why, and quotes real GitHub Actions run data proving the speed-up.
This is distinct from Build-Time Baseline, which measures how long the workspace takes to compile. This page measures how long the CI pipeline takes end-to-end per change, and how much runner compute it consumes.
The problem (before)
ci.yml had ~30 jobs gated by a binary changes router (dorny/paths-filter
emitting only rust / dashboard / ebpf). Any edit under aa-*/** set
rust == true, which fanned out to ~22 Rust jobs regardless of which sub-area
changed — including the expensive ones that are almost never relevant to a
given change: the eBPF nightly build + sudo e2e, the proto breaking-check, the
OpenAPI drift + Spectral lint, the schema lint, the TimescaleDB and
migration-drift testcontainer jobs, full llvm-cov coverage, SonarCloud, and the
criterion benchmark. There was also no aggregate gate job, and the
aa-integration-tests suite ran twice on Linux.
The result: a one-line dependency bump paid for nearly the entire matrix.
What changed
| Story | Change |
|---|---|
| AAASM-2598 | Per-workflow concurrency groups; cancel-in-progress gated to pull_request (superseded PR runs are cancelled; pushes/releases never are). |
| AAASM-2599 | Fine-grained changes router — added proto / schema / openapi / storage outputs (each a strict subset of rust) and re-gated the single-purpose validators onto them. Added a single CI Success aggregate gate (needs every functional job, if: always(), fails on any failure/cancelled; coverage/sonar excluded as advisory). |
| AAASM-2600 | Docker / FFI images build PR-light (one arch, is_latest only) on PRs; full multi-arch + push only on v* tags. |
| AAASM-2601 | Relocated Coverage / SonarCloud / Benchmark behind push-or-label gates — they no longer run on every PR. |
| AAASM-2611 | Least-privilege permissions: contents: read at the top of every workflow; write elevated per-job only where needed. |
| AAASM-2628 | Closed a trigger-path gap — schemas/** (and openapi/**) were missing from ci.yml’s on.*.paths, so schema-only changes never ran schema-lint. |
| AAASM-2631 | Dropped the redundant Linux aa-integration-tests run — it already runs in ci.yml’s test job; the dedicated workflow is now macOS-only. |
The mechanism: a typical change now runs the always-on fast gate
(build, fmt, clippy, rustdoc, test, deny, no-std, conformance) plus
only the area(s) it actually touched. Everything else skips, and a single
CI Success status summarises the run.
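The aggregate gate rule from AAASM-2599 (fail on any failure or cancellation, tolerate skips) can be sketched as a small function. This is an illustration in Rust, not the workflow's actual YAML; the `JobResult` enum mirrors the result values GitHub Actions reports for needed jobs, and `ci_success` is a hypothetical name.

```rust
/// Result of one job the gate depends on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum JobResult {
    Success,
    Failure,
    Cancelled,
    Skipped,
}

/// The gate passes only if no needed job failed or was cancelled.
/// Skipped jobs (areas the change did not touch) do not fail the gate.
/// On failure, returns the names of the offending jobs in input order.
fn ci_success(needed: &[(&str, JobResult)]) -> Result<(), Vec<String>> {
    let bad: Vec<String> = needed
        .iter()
        .filter(|(_, result)| matches!(result, JobResult::Failure | JobResult::Cancelled))
        .map(|(name, _)| name.to_string())
        .collect();
    if bad.is_empty() {
        Ok(())
    } else {
        Err(bad)
    }
}
```

Treating `Skipped` as acceptable is what lets area routing and a single required status coexist: a job that was correctly routed away does not block the merge, while a job that ran and failed does.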
Measured results (real GitHub Actions runs)
Apples-to-apples: the identical dependency-bump PR, before and after
The same dependabot/cargo/master/async-nats-0.49.1 PR was re-run before and
after the redesign — same diff, same content:
| Metric | Before — run #2179 (2026-06-04) | After — run #2283 (2026-06-06) | Δ |
|---|---|---|---|
| Jobs executed | 23 of 30 | 16 of 32 | −7 jobs |
| Runner-minutes (Σ job durations) | 64.0 | 17.3 | −73 % |
| Wall-clock | 71.1 min | 10.0 min | −86 % (7.1× faster) |
Because async-nats is a transitive cargo bump that touches no proto / schema /
OpenAPI / storage / eBPF / dashboard code, the after-run correctly skips
Benchmark, Coverage, SonarCloud, Migration drift check, TimescaleDB Tests, Proto lint & breaking check (buf), Schema lint, OpenAPI drift,
OpenAPI lint, and both eBPF jobs — none of which it can affect.
Dashboard-only PR
A dashboard dependency bump now runs only the dashboard jobs:
| Metric | Before (run #2180) | After (run #2288) |
|---|---|---|
| Jobs executed | full dashboard + rust fan-out | 7 of 31 (24 skipped — every Rust job) |
| Wall-clock | 55.2 min | 10.4 min |
Master push (full coverage, incl. Coverage + SonarCloud)
Pushes still run the acceptance jobs (Coverage/Sonar are push-gated), yet
still benefit from area-routing, concurrency cancellation, and the shared
dashboard-assets artifact:
| Metric | Before (run #2200) | After (run #2292) |
|---|---|---|
| Runner-minutes | 80.8 | 44.1 |
| Wall-clock | 132 min | 29 min |
Methodology & caveats
- Data was pulled from the GitHub Actions REST API (/repos/.../actions/runs/<id>/jobs). Runner-minutes = the sum of each non-skipped job’s completed_at − started_at. Wall-clock = the run’s updated_at − run_started_at.
- Runner-minutes and job-count are deterministic measures of work performed. Wall-clock carries cache-warmth and runner-availability noise (a cold Swatinem/rust-cache or a busy runner pool inflates it), so treat the wall-clock figures as illustrative and the runner-minute / job-count figures as the load-bearing evidence.
- Run numbers are cited so each row can be re-inspected: gh api repos/ai-agent-assembly/agent-assembly/actions/runs/<id>/jobs.
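The runner-minutes definition above reduces to a sum over non-skipped jobs. The sketch below assumes the timestamps have already been parsed to Unix seconds and that a skipped job carries no timestamps; the real API returns ISO 8601 strings, and the `Job` struct and function names here are hypothetical.

```rust
/// One job from a run. Timestamps are Unix seconds; `None` marks a
/// skipped job that never started (an assumption of this sketch).
struct Job {
    started_at: Option<u64>,
    completed_at: Option<u64>,
}

/// Sum of `completed_at - started_at` over jobs that have both timestamps,
/// expressed in minutes.
fn runner_minutes(jobs: &[Job]) -> f64 {
    let secs: u64 = jobs
        .iter()
        .filter_map(|j| Some(j.completed_at?.saturating_sub(j.started_at?)))
        .sum();
    secs as f64 / 60.0
}

/// Relative change from `before` to `after`, in percent (negative = less).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Applied to the dependency-bump pair, `percent_change(64.0, 17.3)` is roughly −73 and `percent_change(71.1, 10.0)` is roughly −86, the deltas quoted in the table.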
Takeaway
For the common case — a focused change or a dependency bump — the pipeline does
~75 % less work and returns a result ~7× sooner, while a single
CI Success gate still guarantees nothing necessary was skipped: every functional
job is a dependency of the gate, and each area’s validators run whenever their own
inputs change.
Last updated: 2026-06-07 by Chisanan232
Local Development
This page covers the from-clone development loop for the agent-assembly
monorepo. For contribution conventions (commit style, PR process) see
CONTRIBUTING.md.
Prerequisites
- Rust stable (≥ 1.75) via rustup
- protoc: the Protocol Buffers compiler (brew install protobuf / apt-get install protobuf-compiler); required by the aa-proto and aa-gateway build scripts
- cargo-nextest, cargo-deny, and Lefthook
- Linux only for the proxy / eBPF layers; see Supported platforms.
Bootstrap
git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly
# Installs toolchains, clones the SDK polyrepos as siblings, installs git
# hooks, and builds the workspace.
make dev-setup
# Smoke-tests each SDK repo in parallel, then checks gateway health.
make dev-verify
Everyday loop
cargo build --workspace --exclude aa-ebpf # build (skip the BPF-target crate off Linux)
cargo nextest run --workspace # full test suite
cargo nextest run -p aa-core # one crate
cargo fmt --all # format
cargo clippy --all-targets -- -D warnings # lint
cargo deny check # dependency / license audit
The eBPF crates compile with a target-specific toolchain; on non-Linux hosts
cargo check -p aa-ebpf is sufficient.
Git hooks
Hooks are managed by Lefthook
(lefthook.toml). Install them once with lefthook install. The pre-commit
hook runs fmt, clippy, and deny scoped by file glob; the pre-push hook
runs cargo doc --workspace --no-deps.
Running locally
Point the gateway at a bundled reference policy and connect a sidecar:
cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml
See the CLI page for aasm operator commands and the README
“Running with Docker Compose” section for the sidecar stack.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| protoc / “Could not find protoc” build error | Protocol Buffers compiler missing | Install it (brew install protobuf or apt-get install protobuf-compiler); aa-proto and aa-gateway need it |
| cargo build fails on aa-ebpf* off Linux | eBPF crates target the BPF toolchain | Build with --exclude aa-ebpf; use cargo check -p aa-ebpf on non-Linux hosts |
| Pre-commit hook does not run | Lefthook hooks not installed | Run lefthook install once in the repo |
| Pre-push fails on cargo doc | A doc comment has a broken intra-doc link | Run cargo doc --workspace --no-deps locally and fix the reported link |
| make dev-verify skips the Go smoke test | go-sdk checkout is missing or has no internal/smoke/ | Expected when the Go SDK sibling repo is absent; clone it next to agent-assembly to enable it |
Last updated: 2026-06-11 by Chisanan232
Consuming the Shared Crates
The thin per-language SDK shims live in their own repositories
(python-sdk, node-sdk) but reuse Rust crates that are developed in this
monorepo. Four crates are consumed from outside the workspace:
| Crate | Role in the SDK shim |
|---|---|
| aa-core | wire types and traits |
| aa-proto | generated protobuf / gRPC wire types |
| aa-security | advisory, non-authoritative credential preflight |
| aa-sdk-client | UDS transport, IPC codec, AssemblyClient lifecycle |
Distribution mechanism: git SHA pin
The chosen distribution mechanism is a git SHA pin, not a registry publish. The rationale (crates.io was rejected; a bare branch name does not resolve once a crate consumes the dependency, so a full SHA is required) is recorded in ADR 0002 — SDK Security Boundary.
A consumer pins each crate to an exact commit:
[dependencies]
aa-core = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-core", features = ["serde"] }
aa-proto = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-proto" }
aa-security = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-security" }
aa-sdk-client = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-sdk-client" }
Notes:
- Use the full 40-character SHA, not a branch. cargo’s rev is a precise revspec; a bare branch name fails to resolve once another crate in the graph consumes the same dependency.
- A git dependency checks out the whole repository, so workspace inheritance (version.workspace, [lints] workspace, dep = { workspace = true }) and the proto/ sources at the workspace root resolve transparently; the consumer does not need to reproduce any of it.
- aa-sdk-client is publish = false on purpose: it is distributed only via the git pin, never to crates.io.
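A consumer repo could guard the full-SHA rule with a small check before accepting a rev value. The function below is a hypothetical helper, not something this repository ships; it only tests the shape of the string and does not verify that the commit exists.

```rust
/// True only for a full 40-character hexadecimal commit SHA.
/// Branch names, tags, and abbreviated SHAs are rejected.
fn is_full_sha(rev: &str) -> bool {
    rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit())
}
```

An abbreviated SHA or a branch name such as `master` fails the check, which matches the note above that only an exact commit is an acceptable pin.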
Regression guard
scripts/standalone-build-smoke.sh builds each of the four crates as a
git-SHA-pinned consumer from a clean checkout of HEAD, outside the workspace.
It runs in CI via the Crate Pinnability Smoke workflow on every pull request
and master push that touches a shared crate, so a path-coupling regression —
e.g. a shared crate gaining a dependency that resolves only inside the workspace
checkout — fails CI here before an SDK repo hits it.
Run it locally with:
make standalone-smoke
# or
bash scripts/standalone-build-smoke.sh
Last updated: 2026-06-07 by Chisanan232
Architecture Decision Records
This directory contains Architecture Decision Records (ADRs) for agent-assembly. Each ADR documents a significant architectural choice — the context that drove the decision, the alternatives considered, and the consequences accepted.
The format follows a lightweight variant of Michael Nygard’s template. New ADRs are numbered sequentially and never rewritten; superseded decisions are recorded by adding a new ADR that links back.
Index
| ADR | Title | Status |
|---|---|---|
| 0001 | Storage Architecture — SQLite (local) / PostgreSQL + TimescaleDB (production) | Accepted |
| 0002 | SDK Security Boundary, Shared-Crate Layout & Distribution | Accepted |
ADR 0001: Storage Architecture — SQLite (local) / PostgreSQL + TimescaleDB (production)
Status: Accepted · Date: 2026-05 · Spec reference: lines 7107–7215
Context
agent-assembly needs to persist three categories of data, and the spec (lines 7113–7134) is explicit that they have fundamentally different access patterns and must not be forced into a single store:
| Category | Nature | Query pattern |
|---|---|---|
| ① Audit events — tool-call records, policy decisions, behaviour log | write-heavy, append-only, strong time-series, large volume | time-range scan, filter by agent_id, filter by dry_run |
| ② Agent registry & config — online agents, identity, policy configuration | read-heavy, small volume, requires ACID | key lookup, simple joins |
| ③ Metrics / aggregates — token usage, cost, event rate, anomaly data | time-series, requires fast rollup | time-series range query, rollup, window functions |
The product ships in two deployment modes — Local Dev Mode (single machine, zero ops, fast feedback loop) and Production (multi-instance gateway behind a load balancer, durable retention, compliance evidence) — and a single backend cannot serve both well.
Without a deliberate decision recorded here, two failure modes become likely as Epic 18 lands:
- Future contributors encountering sqlite.rs and postgres.rs side by side propose replacing one to “simplify”; the asymmetric requirements of the two deployment modes are not visible from the code alone.
- A contributor reading “time-series workloads at thousands of events per second” reaches for Cassandra by reflex without seeing that the agent-registry ACID requirement and the operational cost rule it out at current scale.
Decision
| Concern | Choice |
|---|---|
| Local Dev Mode storage | SQLite (single file at ~/.aasm/local.db, WAL journal mode) |
| Production storage | PostgreSQL 15+ with the TimescaleDB 2.x extension |
| Policy hot-path cache | Redis 7+, optional, off by default; enable only when policy-eval latency becomes measurable |
| Wide-column / NoSQL audit store | Not used (see Why not Cassandra below) |
| Backend abstraction | A single StorageBackend trait in aa-gateway/src/storage/; both SQLite and Postgres implement it; business logic depends only on the trait |
| Compression / retention for warm data | TimescaleDB native column-store compression (production); manual rolling-delete (local dev) |
The StorageBackend trait surface, configuration schema, retention-policy structure, and environment-variable overrides are defined in Epic AAASM-1569.
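To illustrate what "business logic depends only on the trait" means in practice, here is a deliberately simplified sketch. The real trait surface lives in Epic AAASM-1569 and is not reproduced on this page, so every method, field, and type below other than the name StorageBackend is a hypothetical stand-in, and the real trait is likely asynchronous.

```rust
/// Hypothetical audit record; the real schema is defined elsewhere.
#[derive(Clone, Debug, PartialEq)]
struct AuditEvent {
    ts: u64,
    agent_id: String,
    dry_run: bool,
}

/// Simplified, synchronous stand-in for the storage trait.
trait StorageBackend {
    fn append_event(&mut self, event: AuditEvent);
    /// Events for one agent with `from <= ts < to`.
    fn events_in_range(&self, agent_id: &str, from: u64, to: u64) -> Vec<AuditEvent>;
}

/// In-memory backend that exists only to make the sketch runnable.
#[derive(Default)]
struct MemoryBackend {
    events: Vec<AuditEvent>,
}

impl StorageBackend for MemoryBackend {
    fn append_event(&mut self, event: AuditEvent) {
        self.events.push(event);
    }

    fn events_in_range(&self, agent_id: &str, from: u64, to: u64) -> Vec<AuditEvent> {
        self.events
            .iter()
            .filter(|e| e.agent_id == agent_id && e.ts >= from && e.ts < to)
            .cloned()
            .collect()
    }
}

/// Business logic sees only the trait, never a concrete backend.
fn count_live_events(store: &dyn StorageBackend, agent_id: &str, from: u64, to: u64) -> usize {
    store
        .events_in_range(agent_id, from, to)
        .iter()
        .filter(|e| !e.dry_run)
        .count()
}
```

Because `count_live_events` takes `&dyn StorageBackend`, a SQLite-backed or Postgres-backed implementation could be passed in without touching the function, which is the property the decision table relies on.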
Storage Stack
Local Dev Mode
SQLite (single file: ~/.aasm/local.db, journal_mode = wal)
├── Audit events — table with (ts, agent_id) index
├── Agent registry — table
├── Policy versions — table (BLOB for the YAML/JSON document)
└── Metrics — in-memory aggregation only; not persisted
(dev does not need historical trends)
Rationale: zero external dependencies, single process, single user. A developer can open the file in any SQLite browser. Performance is sufficient because dev volumes do not approach the multi-writer or multi-machine ceiling.
Production (Self-hosted / SaaS)
PostgreSQL 15+
+ TimescaleDB 2.x extension (same Postgres instance, single connection pool)
├── audit_events (hypertable, chunk_interval = 7 days,
│ compression policy = 30 days)
├── metrics (hypertable, chunk_interval = 1 day)
├── agent_registry (standard table, JSONB metadata column)
└── policy_versions (standard table, JSONB document column)
Redis 7+ (optional; enable only when measurements show it is needed)
├── Policy cache (TTL: 30s) — hot-path policy decisions
├── Session state — approval queue, pending decisions
└── Rate-limit counters — per-agent, per-team
Rationale: PostgreSQL alone handles the registry and policy store cleanly (ACID, JSONB for flexible schema, async-native via sqlx). TimescaleDB is a PostgreSQL extension — not a separate system — so it adds time-series partitioning and compression to the same instance with negligible operational overhead. Redis stays opt-in because policy-eval latency is acceptable straight from Postgres at current scale.
Alternatives Considered
Cassandra (rejected)
Cassandra is appropriate for workloads with extremely high sustained write volume, multi-region geo-distribution, and a tolerance for eventual consistency (the Netflix-scale event-stream archetype). It is the wrong fit here because:
- ACID is required for the agent registry. Registry mutations (agent online / offline, identity rotation, enforcement-mode change) must be linearizable; an eventually-consistent registry produces visible correctness bugs — for example, an agent that is “offline” in one node’s view and “online” in another’s, racing policy evaluations against itself.
- Current scale is far below Cassandra’s sweet spot. Early production deployments are in the low-thousands-of-events-per-second range; PostgreSQL + TimescaleDB handles this comfortably on commodity hardware.
- Operational complexity is disproportionate. Cassandra demands cluster sizing, repair scheduling, compaction tuning, and tombstone management. For a small operating team, this overhead is not justified by any benefit at the current data volume.
- No reuse of existing investment. Postgres expertise, sqlx integration, and the same TimescaleDB hypertable cover the time-series workload without introducing a second data system.
MongoDB (rejected)
Considered for the agent registry and policy store because of the JSON-document schema flexibility. Rejected because:
- PostgreSQL’s JSONB column type covers the same flexible-schema use case (indexed, queryable, schema-evolution-friendly) without introducing a second data system to operate.
- Strict ACID semantics for the registry are stronger in Postgres than in MongoDB’s default replication model.
- Splitting “events go to one DB, registry goes to another” complicates joins (for example, listing audit events grouped by registered-agent metadata) that PostgreSQL handles trivially.
Single SQLite for production (rejected)
Considered for symmetry with Local Dev Mode. Rejected because:
- SQLite has no network protocol; a multi-instance gateway cannot share a single database file safely.
- SQLite’s single-writer model becomes a hard bottleneck for the audit-event write rate seen in production.
- WAL mode improves concurrent reads but does not address the multi-machine or multi-writer requirement.
- Backup, replication, and point-in-time recovery — table-stakes in production — are not first-class in SQLite.
PostgreSQL alone (without TimescaleDB) (rejected)
Plain PostgreSQL is viable for the registry and policy store, but for audit_events:
- Time-bucketed query patterns degrade as the table grows; manual partition management is error-prone.
- Compression of old data requires an external tool or a custom ETL job.
- TimescaleDB provides both (hypertable partitioning + native compression) as a PostgreSQL extension, so adopting it costs only an extension install, with no separate process or operational target.
Since TimescaleDB is strictly additive (compatible with the rest of the Postgres schema and tooling), there is no reason to defer it.
Consequences
Positive
- Zero external dependencies for local development. A first-time contributor can run the gateway and immediately have a working, persistent store.
- Production-grade time-series performance via TimescaleDB hypertables and compression policies, without standing up a separate data system.
- Business logic stays storage-agnostic. All gateway code talks to the StorageBackend trait; swapping backends is a configuration change, not a code change.
- Compression and retention come for free in production via TimescaleDB compression policies; the application-level apply_retention only handles tier transitions (warm → cold archive or drop).
- Compliance posture is clean (GDPR, SOC 2 Type II, ISO 27001): retention is operator-configurable and audit-event durability is guaranteed once the row commits.
Negative / Accepted trade-offs
- Two backend implementations to maintain. The CI matrix must cover both SQLite and PostgreSQL. The StorageBackend trait constrains this cost: feature parity is enforced at compile time.
- TimescaleDB extension is an operational requirement for production PostgreSQL deployments. Managed-PG offerings (Aiven, Timescale Cloud, RDS with the extension available) cover this; self-hosted operators must install the extension package.
- Redis adds a moving part when enabled. The optional, off-by-default flag keeps it out of the dependency surface until measured latency justifies it.
- Local-dev and production semantics differ slightly (for example, no compression in SQLite). The differences are documented in the gateway config reference and reflected in aasm status output.
Spec Reference
| Spec lines | Topic |
|---|---|
| 7107–7215 | Complete storage architecture discussion (Q&A format) |
| 7113–7134 | Three data categories and their access patterns |
| 7140–7155 | Local Dev Mode storage stack (SQLite) |
| 7157–7191 | Production storage stack (PostgreSQL + TimescaleDB) |
| 7165–7172 | “Why not Cassandra” rationale |
| 7175–7213 | Recommended complete storage stack + hot / warm / cold tiering |
| 7213 | Architecture decision (one-sentence conclusion) |
| 7215 | Spec recommendation that this decision be recorded as an ADR |
Related
- Epic: AAASM-1569 — Durable Persistence Layer (this ADR is its S-L deliverable)
- Story: AAASM-1593 — ADR 0001 story ticket
- All E18 implementation stories (StorageBackend trait, SQLite backend, PostgreSQL backend, migration runner, retention engine, etc.) implement the decision recorded here.
Last updated: 2026-05-21 by Chisanan232
ADR 0002: SDK Security Boundary, Shared-Crate Layout & Distribution
Status: Accepted · Date: 2026-06 · Epic: AAASM-2552
Amendment (AAASM-2703 / AAASM-2704, 2026-06): the original decision below kept aa-ffi-go in the monorepo as a staticlib artifact. That has been reversed for consistency: the thin Go shim now lives in the go-sdk repo (native/aa-ffi-go/) as a thin C-ABI over the git-SHA-pinned aa-sdk-client, exactly like the Node/Python shims. The monorepo no longer hosts any FFI shim (AAASM-2703 removed aa-ffi-go; AAASM-2704 vendored it into go-sdk).
Context
Two problems in the SDK / FFI layer were audited on 2026-06-05 and must be resolved together, because the fix for one constrains the other.
1. Security enforcement is in the wrong place
CredentialScanner (in aa-core/src/scanner.rs) is the credential-detection/redaction primitive. Today it runs:
| Location | Trusted? | Authoritative? |
|---|---|---|
| aa-gateway (audit.rs, engine/mod.rs) | yes (server) | yes |
| aa-proxy (intercept/, audit_jsonl.rs) | yes (sidecar) | yes |
| aa-ffi-python (src/handle.rs) | no (in the SDK binding) | it is the only scan on the SDK fast-path |
| aa-runtime | yes (trusted) | no (it does not scan or redact at all) |
The SDK event fast-path is SDK → UDS → aa-runtime → gRPC → gateway. aa-runtime is the mandatory chokepoint, but its pipeline is only enrich → is_policy_violation (blocked_actions) → forward/batch — it forwards the SDK’s payload without independently scanning or redacting it. Therefore a removed or bypassed SDK scanner lets raw secrets flow SDK → runtime → gateway, where the only remaining guard is the gateway’s narrower banned-key sanitizer. The SDK is being trusted as a security boundary, and it must not be.
2. The FFI bindings are duplicated and diverged
The bindings are reimplemented per language rather than sharing one implementation:
| Binding | Form | Shared-crate use |
|---|---|---|
| agent-assembly/aa-ffi-python | 1,357 lines (codec/config/detect/handle/hooks/ipc/lib), path deps | in-workspace |
| python-sdk/rust/aa-ffi-python | 719-line lib.rs, imports aa_core + aa_proto | git-SHA-pinned (rev = ed4aa11a…) |
| node-sdk/native/aa-ffi-node | 178 lines, imports no aa_* crate | none (reimplemented) |
| go-sdk/internal/ffi | Go cgo consumer of the aa-ffi-go staticlib | consumes a built artifact |
The Node binding diverged precisely because it shares no code with the Python one: nothing forces it to track the same logic. Go originally kept one Rust artifact in the monorepo, consumed by the Go binding through cgo (later revised; see the amendment at the top: the Go shim now lives in go-sdk alongside the others).
Decision
| Concern | Choice |
|---|---|
| Is the SDK a security boundary? | No. The SDK is untrusted. |
| Authoritative enforcement point | aa-runtime — scans, redacts, and normalizes every event before forward/audit, unconditionally. |
| Source of truth | gateway / control-plane (policy SoT; audit-write sanitizer kept as final backstop). |
| SDK-side detection | Best-effort advisory preflight only. No clean / already_scanned marker exists on the wire, and none is honored. |
| Security primitives home | A new aa-security crate (scanner, redaction, audit-normalization) — moved out of aa-core. |
| Shared runtime-client home | A new aa-sdk-client crate (UDS transport, proto codec, AssemblyHandle lifecycle, event shipping, advisory preflight). |
| Per-language bindings | Thin pyo3 / napi / cgo shims over aa-sdk-client: ergonomic API, hooks, type translation, event capture — no security authority. |
| Dependency direction | aa-runtime, aa-gateway, aa-proxy, aa-sdk-client → aa-security (security logic is not in aa-core). |
| Shared-crate distribution | git SHA pin (see below). |
Trust model
UNTRUSTED TRUSTED ENFORCEMENT SOURCE OF TRUTH
Python/Node/Go SDK ──UDS──▶ aa-runtime (mandatory chokepoint) ──gRPC──▶ gateway / control-plane
• ergonomic API • scan (authoritative) • policy SoT
• hooks, event capture • redact (before forward + audit) • audit-write sanitizer
• type translation • policy / approval (already server-side) (final backstop)
• BEST-EFFORT preflight • normalize; re-scans EVERYTHING, always
(advisory only)
Invariant: nothing the SDK asserts can shorten the runtime’s work. The runtime scans unconditionally; aa-security running inside the SDK is advisory, the same crate running inside aa-runtime is authoritative. Position — not code — confers authority.
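The invariant can be made concrete with a toy pipeline stage. Everything here is illustrative: the "sk-" prefix rule is a made-up stand-in for the aa-security scanner, whose real detection rules are not shown in this ADR, and `sdk_says_clean` is a hypothetical claim included only to show that it is ignored (the decision table states no such marker exists on the wire).

```rust
/// Event as received from an SDK. `sdk_says_clean` models a claim an
/// SDK might try to make; it is hypothetical and never trusted.
struct SdkEvent {
    payload: String,
    sdk_says_clean: bool,
}

/// Toy redactor: replaces whitespace-separated tokens starting with "sk-"
/// and rejoins tokens with single spaces (so runs of whitespace collapse).
fn redact(payload: &str) -> String {
    payload
        .split_whitespace()
        .map(|tok| if tok.starts_with("sk-") { "[REDACTED]" } else { tok })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Runtime stage: always scans, whatever the SDK asserted.
fn runtime_forward(event: &SdkEvent) -> String {
    // The claim is read only to make explicit that it does not gate the scan.
    let _ignored = event.sdk_says_clean;
    redact(&event.payload)
}
```

The same `redact` function could also run inside an SDK as an advisory preflight; it becomes authoritative only because `runtime_forward` sits at the mandatory chokepoint and calls it unconditionally.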
Crate topology
| Crate | Role | Authority |
|---|---|---|
| aa-security (new) | scanner / redactor / normalization primitives | none (library) |
| aa-core | wire types, traits | none |
| aa-sdk-client (new) | UDS transport, proto codec, AssemblyHandle, event shipping, advisory preflight | none |
| aa-runtime | authoritative scan / redact / normalize + policy / approval | ✅ the boundary |
| aa-gateway | policy SoT + audit-write sanitizer (final backstop) | ✅ SoT |
| aa-ffi-{python,node,go} | thin pyo3 / napi / cgo shims | none |
Canonical bindings (resolved)
- Python: python-sdk/rust/aa-ffi-python (the git-pinned SDK consumer) is canonical; the monorepo agent-assembly/aa-ffi-python is the duplicate to retire. The two differ in size (719 vs 1,357 lines), so the shared logic must be reconciled into aa-sdk-client by diffing both, not by lifting either copy wholesale.
- Node: node-sdk/native/aa-ffi-node is the only Node binding, but it shares no code with the core (imports no aa_* crate). It is re-pointed onto aa-sdk-client, which makes the drift structurally impossible.
- Go: (revised by AAASM-2703 / AAASM-2704) aa-ffi-go is relocated into the go-sdk repo (native/aa-ffi-go/) as a thin C-ABI shim over the git-SHA-pinned aa-sdk-client, mirroring Node/Python; the monorepo no longer hosts it.
Distribution mechanism: git SHA pin
The shared crates (aa-core, aa-proto, and the new aa-security and aa-sdk-client) are consumed by the SDK repos as git dependencies pinned to an exact commit SHA. This is the established, in-production pattern: python-sdk/rust/aa-ffi-python/Cargo.toml already declares:
aa-core = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "ed4aa11a…", package = "aa-core", features = ["serde"] }
aa-proto = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "ed4aa11a…", package = "aa-proto" }
The decision is to extend this same mechanism to aa-security and aa-sdk-client, not to introduce a new one.
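An SDK repo could guard its pins with a small check like the one below. This is a sketch, not part of the project's tooling: `is_full_commit_sha` and `check_pins` are hypothetical helpers, the 40-character length assumes SHA-1 object ids, and requiring every shared crate to use the same rev is an assumption drawn from the two declarations above sharing one rev.

```rust
/// Returns true when `rev` looks like a full 40-character hex commit id.
/// Assumption: the repository uses SHA-1 object ids.
pub fn is_full_commit_sha(rev: &str) -> bool {
    rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Hypothetical lint over (crate name, rev) pairs read from a Cargo.toml.
/// Fails on branch names or short hashes, and on pins that disagree.
pub fn check_pins(pins: &[(&str, &str)]) -> Result<(), String> {
    let mut expected: Option<&str> = None;
    for &(krate, rev) in pins {
        if !is_full_commit_sha(rev) {
            return Err(format!("{}: rev `{}` is not a full commit SHA", krate, rev));
        }
        match expected {
            None => expected = Some(rev),
            Some(first) if first != rev => {
                return Err(format!("{}: rev differs from the other shared crates", krate));
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```

Calling `check_pins` with a branch name such as `"main"` for any crate returns an error, so a pin that is not an exact commit is caught before it reaches a build.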
Migration order (boundary-first, gated)
The Epic executes in this order so SDK-side scanning is never removed before the runtime is authoritative:
1. This ADR.
2. Extract aa-security (move scanner/redaction/normalization out of aa-core; temporary re-export for compat).
3. [GATE] aa-runtime authoritative scan/redact/normalize stage + guardrails.
4. SDK-bypass resistance test suite (proves the gate).
5. Make the shared crates pinnable.
6. Extract aa-sdk-client.
7. Node SDK → thin shim.
8. Python SDK → thin shim.
9. Remove fat aa-ffi-* from the workspace.
Steps 6–9 (anything that removes SDK-side scanning) are blocked on step 3.
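The gating rule can be expressed as a tiny predicate. The sketch below is illustrative only: `may_start` and `GATE_STEP` are hypothetical names, and it encodes just the one constraint stated here (steps 6 to 9 wait for step 3), not a full ordering of all nine steps.

```rust
/// Step number of the aa-runtime gate in the migration order.
pub const GATE_STEP: u8 = 3;

/// Returns Ok when `step` may start, given the steps already completed.
pub fn may_start(step: u8, completed: &[u8]) -> Result<(), String> {
    if !(1..=9).contains(&step) {
        return Err(format!("unknown step {}", step));
    }
    // Steps 6-9 remove SDK-side scanning, so they wait for the gate.
    if (6..=9).contains(&step) && !completed.contains(&GATE_STEP) {
        return Err(format!("step {} is blocked on step {}", step, GATE_STEP));
    }
    Ok(())
}
```

For example, `may_start(7, &[1, 2])` is rejected, while `may_start(7, &[1, 2, 3])` succeeds once the gate has landed.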
Alternatives Considered
Trust SDK-side scanning (rejected)
Treating the SDK as the scan boundary is the current accidental state. Rejected: the SDK is attacker-controllable (a bypassed, modified, or simply outdated SDK), so any guarantee anchored there is not a guarantee. Security must hold even when the SDK does nothing.
Keep security primitives in aa-core (rejected)
aa-core is depended on by everything, including the thin shims and storage drivers. Hosting the scanner there enlarges the security-review blast radius to the whole base crate and forces unrelated consumers to pull it in. A small, dedicated aa-security crate gives a reviewable surface and a clean dependency direction.
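During the extraction, existing callers can keep compiling through a temporary re-export. The sketch below models the two crates as modules in one file so it stays self-contained; the `redact` function and its signature are hypothetical, not the real primitive.

```rust
/// Stand-in for the new leaf crate.
pub mod aa_security {
    /// Hypothetical redaction primitive: masks every occurrence of `secret`.
    pub fn redact(input: &str, secret: &str) -> String {
        if secret.is_empty() {
            return input.to_string();
        }
        input.replace(secret, "[REDACTED]")
    }
}

/// Stand-in for the base crate during migration.
pub mod aa_core {
    // Temporary compatibility re-export: the old path keeps resolving to the
    // moved function. It is deleted once all consumers import aa_security.
    pub use super::aa_security::redact;
}
```

Both `aa_core::redact` and `aa_security::redact` resolve to the same function, so consumers can be repointed one at a time before the re-export is removed.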
Per-language reimplementation / pure-language transport (rejected)
Letting each SDK speak UDS + protobuf natively (no shared Rust) is internally coherent, but it reimplements the transport logic once per language. The current divergence (Python rich, Node reinvented, no shared types) is exactly this failure mode realized halfway: the SDKs pay the native-build cost and still duplicate the logic. One shared aa-sdk-client removes the duplication while keeping the shims idiomatic.
Publish shared crates to crates.io / a private registry (rejected)
A registry would enable prebuilt-artifact reuse, but crates.io publishing was already attempted and dropped (AAASM-2338), and it adds a publish pipeline plus version-bump discipline. git-SHA pinning is already working in python-sdk, requires no new infrastructure, and pins to an exact, reproducible commit. (cargo’s rev must be a SHA, not a bare branch name, or resolution fails once a crate consumes the dependency.)
Keep the bindings in the monorepo workspace (rejected for ownership)
Keeping aa-ffi-* in the workspace preserves atomic cross-crate changes, but couples each SDK’s release to the monorepo and keeps the FFI dep trees (pyo3/napi/prost/tokio) in the core build. Moving the thin shims into their SDK repos — consuming pinned shared crates — gives the SDKs independent release cadence and shrinks the core workspace, while the shared aa-sdk-client keeps a single source of truth. Go already demonstrates the artifact-consumption variant of this model.
Consequences
Positive
- The SDK can no longer weaken enforcement. Scan/redact/normalize run authoritatively at aa-runtime regardless of SDK behavior; this is proven by the bypass-resistance suite.
- Drift becomes structurally impossible. One aa-sdk-client implementation, consumed by thin shims, replaces N reimplementations.
- Reviewable security surface. aa-security is a small, leaf crate that the trusted enforcers depend on directly.
- Smaller core build. Removing the fat bindings drops pyo3/napi/prost/tokio FFI dep trees from cargo build --workspace.
- No new release infrastructure. Distribution reuses the git-SHA pin already in production.
Negative / accepted trade-offs
- Authoritative scanning adds hot-path cost. Payload inspection at the runtime is more work than the current blocked_actions check; the gate Story carries explicit guardrails (precompiled scanner, secret-bearing-fields only, size caps, metrics) and must stay within the policy-latency budget.
- The SDK repos rebuild the shared crates (no shared target/); org-wide CPU may rise unless sccache or prebuilt artifacts are added later.
- Pinned SHAs require deliberate bumps. SDK repos pick up core changes only when their pin is advanced, which is an explicit, visible step rather than implicit coupling.
- A temporary aa-core re-export of the moved primitives is needed during migration and must be removed once consumers are repointed.
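Two of the guardrails named in the first trade-off, the secret-bearing-fields filter and the size cap, might look like the following. This is a sketch under assumptions: `Guardrails`, `Outcome`, and `guarded_scan` are hypothetical, the detection rule is a placeholder, and how the caller treats `TooLarge` (reject or truncate) is left open because this section does not specify it.

```rust
use std::collections::HashSet;

/// Hypothetical guardrail settings for the authoritative scan.
pub struct Guardrails {
    /// Only these field names are inspected.
    pub scanned_fields: HashSet<&'static str>,
    /// Values longer than this many bytes are not scanned.
    pub max_value_bytes: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Field is not on the secret-bearing list.
    Skipped,
    /// Value exceeds the size cap; the caller decides what to do with it.
    TooLarge,
    /// Value was inspected; `hit` reports whether the placeholder rule matched.
    Scanned { hit: bool },
}

pub fn guarded_scan(g: &Guardrails, field: &str, value: &str) -> Outcome {
    if !g.scanned_fields.contains(field) {
        return Outcome::Skipped;
    }
    if value.len() > g.max_value_bytes {
        return Outcome::TooLarge;
    }
    // Placeholder rule; a real scanner would be compiled once and reused.
    Outcome::Scanned { hit: value.contains("BEGIN PRIVATE KEY") }
}
```

Checking the field list and the size cap before any pattern matching keeps the cheap rejections first, which is one way to bound the extra hot-path work.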
Related
- Epic: AAASM-2552 (SDK security boundary + FFI consolidation)
- Story: AAASM-2558 (this ADR)
- Gate: AAASM-2568, aa-runtime authoritative enforcement (blocks Stories 6–9)
- Follow-on stories: AAASM-2567 (aa-security), AAASM-2570 (aa-sdk-client), AAASM-2559 (pinnable crates), AAASM-2560 / AAASM-2561 (Node / Python shims), AAASM-2562 (remove fat bindings), AAASM-2569 (bypass tests)
Last updated: 2026-06-07 by Chisanan232