Performance Benchmark Baseline

Baseline results recorded on 2026-04-29. Machine: Apple M-series (arm64), macOS Darwin 25.2.0.

All benchmarks were run with cargo bench, which builds with optimizations (the bench profile inherits from release).

SDK Hook Overhead (aa-ffi-python)

Target: < 2 ms P99 per LLM call (AAASM-34 AC #6).

| Benchmark | Mean | Low | High |
|---|---|---|---|
| report_llm_call_channel | 237 ns | 229 ns | 245 ns |

Verdict: PASS. The 237 ns mean is roughly 8,400 times (nearly four orders of magnitude) below the 2 ms target.

Note (AAASM-2562): the aa-ffi-python SDK-hook benchmark (sdk_bench) moved to the python-sdk repo when the fat binding left this workspace — run it there with cargo bench --bench sdk_bench. The numbers above are retained as the historical 2026-04-29 baseline.

Proxy Intercept Latency (aa-proxy)

Target: < 5 ms P99 per intercepted request (AAASM-36 AC #5).

| Benchmark | Mean | Low | High |
|---|---|---|---|
| intercept/openai_response | 2.74 us | 2.74 us | 2.75 us |
| intercept/openai_with_credential_redaction | 3.82 us | 3.79 us | 3.86 us |

Verdict: PASS. Both variants are more than 1,000 times below the 5 ms target. Credential redaction adds about 1.1 us of overhead.
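
The verdict arithmetic can be sanity-checked with a small sketch. The helper names `headroom` and `overhead_ns` are illustrative and not part of aa-proxy. The inputs are the mean values from the table, so this compares means against a P99 target and is only a rough check, not a percentile measurement.

```rust
/// How many times smaller the measured latency is than the target.
fn headroom(target_ns: u64, measured_ns: u64) -> f64 {
    target_ns as f64 / measured_ns as f64
}

/// Extra latency of a variant relative to a base case, in nanoseconds.
/// Saturates at zero if the variant is actually faster.
fn overhead_ns(variant_ns: u64, base_ns: u64) -> u64 {
    variant_ns.saturating_sub(base_ns)
}

fn main() {
    const TARGET_NS: u64 = 5_000_000; // 5 ms target from this section
    const PLAIN_NS: u64 = 2_740; // intercept/openai_response mean
    const REDACTED_NS: u64 = 3_820; // intercept/openai_with_credential_redaction mean

    println!("plain headroom:     {:.0}x", headroom(TARGET_NS, PLAIN_NS));
    println!("redacted headroom:  {:.0}x", headroom(TARGET_NS, REDACTED_NS));
    println!("redaction overhead: {} ns", overhead_ns(REDACTED_NS, PLAIN_NS));
}
```

Working in integer nanoseconds keeps the overhead subtraction exact; only the headroom ratio uses floating point.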

Gateway Policy Check (aa-gateway)

| Benchmark | Mean | Low | High |
|---|---|---|---|
| check_action_rpc/round_trip/minimal_llm_call | 79.6 us | 78.8 us | 80.5 us |
| check_action_rpc/round_trip/full_tool_call_1kb | 79.6 us | 78.3 us | 80.9 us |
| check_action_rpc/round_trip/worst_case_network | 76.3 us | 75.6 us | 76.9 us |

Credential Scanner Throughput (aa-core)

| Benchmark | Mean | Throughput |
|---|---|---|
| scanner/scan_1mb_payload | 6.31 ms | ~159 MB/s |
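
The throughput figure follows from payload size divided by mean time. A minimal sketch, assuming the 1 MB payload is 1,000,000 bytes (the exact payload size used by the benchmark is not stated here; a 1 MiB payload would give a slightly higher figure). The function name `throughput_mb_per_s` is hypothetical.

```rust
/// Throughput in MB/s (1 MB = 1,000,000 bytes) for a payload
/// processed in `elapsed_us` microseconds.
fn throughput_mb_per_s(payload_bytes: u64, elapsed_us: u64) -> f64 {
    // Bytes per microsecond is numerically equal to megabytes per second.
    payload_bytes as f64 / elapsed_us as f64
}

fn main() {
    let payload_bytes = 1_000_000; // assumed size of the "1mb" payload
    let elapsed_us = 6_310; // 6.31 ms mean from the table
    println!(
        "scanner throughput: {:.1} MB/s",
        throughput_mb_per_s(payload_bytes, elapsed_us)
    );
}
```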

Comparing Against Baseline

Run cargo bench to generate HTML reports in target/criterion/. Each benchmark group produces a report/index.html with historical comparison charts when prior runs exist.

To compare against this baseline:

  1. Run cargo bench on the baseline commit to populate target/criterion/.
  2. Run cargo bench on the new commit — Criterion auto-compares and reports percentage change with statistical significance.
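
The percentage change Criterion prints is, at its core, the relative difference between the two estimates. The sketch below shows only that point-estimate arithmetic; it does not reproduce Criterion's significance testing. The `percent_change` name and the second measurement are hypothetical and used purely for illustration.

```rust
/// Relative change of `new_ns` versus `baseline_ns`, in percent.
/// Positive values mean the new run is slower.
fn percent_change(baseline_ns: f64, new_ns: f64) -> f64 {
    (new_ns - baseline_ns) / baseline_ns * 100.0
}

fn main() {
    let baseline = 79_600.0; // check_action_rpc/round_trip/minimal_llm_call mean, in ns
    let new_run = 83_580.0; // hypothetical later measurement, not real data
    println!("change: {:+.2}%", percent_change(baseline, new_run));
}
```

A change of this kind is only meaningful when both runs come from the same machine and configuration as the recorded baseline.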

Last updated: 2026-06-06 by Chisanan232