v0.7.0 — Live Agent Runtime + Repo Audit

PROMOTE, HOLD, or BLOCK, against your real agent.

Start with release-gate audit — scan any AI agent repo in 30 seconds, no config needed. Then score, eval live against your real agent, and gate every deploy.

Step 1 — audit any repo (no config needed)
$ pip install release-gate
$ release-gate audit https://github.com/your-org/your-agent
Agents            OpenAI / Agents SDK (4 files) · LangChain (12 files)
Readiness Score   42 / 100  ████░░░░░░
Decision: ⚠ HOLD
✗ Budget / cost ceiling — runaway loop could exhaust API credits silently
✗ Kill switch / fallback — no way to stop a misbehaving agent at runtime
✗ Trace / tool policy — no record of which tools the agent called
Next: release-gate audit . --emit-config -o governance.yaml
✓ Fully governed — PROMOTE
governance-safe-pass.yaml
$ release-gate score governance.yaml
Readiness Score     94 / 100
Confidence          high
  safety            100
  access_control    100
  cost              90
  fallback          100
  eval_quality      85
  observability     80
Critical failures   none
Decision: ✓ PROMOTE (score 94/100)
✗ Missing safeguards — BLOCK
governance-unsafe-fail.yaml
$ release-gate score governance.yaml
Readiness Score     41 / 100
Confidence          low
  safety            20
  access_control    30
  cost              40
  fallback          0
Critical failures:
  FALLBACK_DECLARED — kill switch missing
  ACTION_BUDGET — no budget cap set
Decision: ✗ BLOCK (score 41/100)
🛡️ 7 safeguards checked per audit
🧩 11 frameworks detected
📊 0–100 readiness score
1 decision: PROMOTE / HOLD / BLOCK

How it works

Five steps to a defensible release decision

release-gate slots into your CI/CD pipeline. No backend, no dashboard, no sign-up.

Step 1: Audit your repo (no config needed)

Run release-gate audit on any AI agent repo. Detects 11 frameworks, scores 7 deployment safeguards, and returns a decision in seconds. Use --emit-config to scaffold a ready-to-commit governance.yaml.

$ release-gate audit .
$ release-gate audit https://github.com/your-org/your-agent
Agents            OpenAI / Agents SDK (4 files) · LangChain (12 files)
Readiness Score   42 / 100  ████░░░░░░
Decision: ⚠ HOLD
Next: release-gate audit . --emit-config -o governance.yaml
Step 2: Scaffold a governance.yaml next to your code

Declare your model, budget cap, kill switch, eval cases, and trace policies. Takes 5 minutes — or use release-gate init for an interactive wizard.

governance.yaml
agent:
  model: gpt-4-turbo
  daily_requests: 5000
checks:
  action_budget: {max_daily_cost: 500}
  fallback_declared:
    kill_switch: {type: feature-flag}
    team_owner: platform-team
trace_policies:
  forbidden_tools: [delete_database, export_data]
  max_tool_calls: 10
  max_retries: 2
Step 3: Score every deploy candidate, live or static

Add --agent to run evals against your actual agent. release-gate calls it, scores real responses, and captures latency. Without --agent, evals run in safe static mode — no LLM key, CI-friendly.

$ release-gate score governance.yaml \
    --evals evals.yaml \
    --agent py:my_pkg.agent:handle
Readiness Score   94 / 100  confidence: high
Evals run         7 (7 pass, 0 fail)  pass rate 100% [live mode]
Agent runtime     7 live call(s)  avg 312ms · p95 480ms (0 errors)
Decision: ✓ PROMOTE (score 94/100)
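A minimal sketch of what the callable behind --agent py:my_pkg.agent:handle might look like. The page does not document the expected signature, so the string-in, string-out shape is an assumption, and the marker list and canned replies are hypothetical stand-ins for a real model call.

```python
# my_pkg/agent.py (hypothetical module matching the spec py:my_pkg.agent:handle)

# Assumption: the runner passes one eval prompt as a string and expects the
# agent's reply as a string. Check the release-gate docs for the real contract.
SENSITIVE_MARKERS = ("ssn", "password", "credit card")


def handle(prompt: str) -> str:
    """Return the agent's reply for one eval prompt."""
    lowered = prompt.lower()
    if any(marker in lowered for marker in SENSITIVE_MARKERS):
        # Stand-in for the refusal path a real agent would implement.
        return "I can't share that information."
    # Stand-in for a real model call.
    return f"Echo: {prompt}"
```

Keeping the entry point a plain function with no side effects at import time makes it cheap to call from CI.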
Step 4: Catch regressions, then gate in CI

Compare a baseline report against the candidate. Any dimension that drops by more than 10 points, especially safety or fallback, is flagged as a regression and blocks the release automatically.

$ release-gate compare baseline.json candidate.json
Baseline score    94/100 (PROMOTE)
Candidate score   71/100 (HOLD)
Score delta       −23 points
Regressions detected:
  safety     100 → 60  (-40 pts)  CRITICAL
  fallback   100 → 75  (-25 pts)
Decision: ✗ BLOCK — critical regression in safety
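The comparison rule can be sketched in a few lines of Python. This is an illustration of the "drops by more than 10 points" rule only, not release-gate's own code: the dimension-to-score dictionaries are a hypothetical report layout, and the cost values are made up for the example.

```python
# Threshold taken from the page text; everything else here is illustrative.
DROP_THRESHOLD = 10


def find_regressions(baseline: dict, candidate: dict) -> list:
    """Return (dimension, before, after) for every drop larger than the threshold."""
    regressions = []
    for dimension, before in baseline.items():
        after = candidate.get(dimension, 0)  # assumption: a missing dimension scores 0
        if before - after > DROP_THRESHOLD:
            regressions.append((dimension, before, after))
    return regressions


def gate(baseline: dict, candidate: dict) -> str:
    """BLOCK when any regression is found, otherwise PASS."""
    return "BLOCK" if find_regressions(baseline, candidate) else "PASS"


baseline = {"safety": 100, "fallback": 100, "cost": 90}
candidate = {"safety": 60, "fallback": 75, "cost": 85}
for dimension, before, after in find_regressions(baseline, candidate):
    print(f"{dimension} {before} -> {after} ({after - before} pts)")
print(gate(baseline, candidate))
```

A 5-point dip in cost stays under the threshold, so only safety and fallback are reported.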
Step 5: Generate an evidence pack for every release

One command produces three audit artefacts — a machine-readable JSON report, an executive Markdown summary, and a full HTML dashboard — ready for compliance, security review, or stakeholder sign-off.

$ release-gate evidence-pack governance.yaml
✓ release-evidence/readiness_report.json
✓ release-evidence/executive_summary.md
✓ release-evidence/release-gate-evidence.html
Upload as CI artifact or attach to release PR.

Live demo

Real commands. Real output.

Run the interactive demo locally — no config file needed. Or explore individual commands below.

Interactive walkthrough — no files needed
$ release-gate demo
🚪 release-gate  |  Interactive Demo
The CI step your AI agent is missing before it goes to production.

SCENARIO A — customer-support-agent (full governance)
Checking FALLBACK_DECLARED ...
  kill_switch: {type: feature-flag, name: disable_support_agent}
  team_owner: platform-team
✓ FALLBACK_DECLARED  kill switch declared, team owner assigned
Checking ACTION_BUDGET ...
  action_budget:
    max_daily_cost: 500
✓ ACTION_BUDGET  projected $132/day — well within $500 cap
Score: 100/100
Decision: ✓ PROMOTE

SCENARIO B — data-export-agent (missing controls)
Checking FALLBACK_DECLARED ...
  fallback_mode: retry  # no kill_switch, no team_owner, no runbook
✗ FALLBACK_DECLARED
  Real risk: Loop runs until OpenAI credit limit hit — could take hours
Score: 33/100
Decision: ✗ BLOCK
Run it now:  pip install release-gate && release-gate demo  —  no governance.yaml needed, works immediately after install.
$ release-gate score examples/governance-safe-pass.yaml \
    --evals examples/evals.yaml \
    --traces examples/traces/safe-trace.json
release-gate | Readiness Scorer v0.7.0
Project          customer-support-agent v1.0.0
Checks run       5 (5 pass, 0 warn, 0 fail)
Evals run        7 (7 pass, 0 fail)  pass rate 100%
Traces checked   1 (0 violations)
Score            94 / 100  confidence: high
Dimension Breakdown:
  safety           100  ██████████  (wt 30%)
  cost              90  █████████░  (wt 20%)
  access_control   100  ██████████  (wt 20%)
  fallback         100  ██████████  (wt 15%)
  eval_quality      85  ████████░░  (wt 10%)
  observability     80  ████████░░  (wt 5%)
Critical failures  none
Decision: ✓ PROMOTE (score 94/100)
exit 0
Try it yourself: pip install release-gate  then  release-gate score examples/governance-safe-pass.yaml

Features

From zero to gated deploy in 4 steps

Audit any repo in 30 seconds. Score, eval, and gate before every deploy.

Start here
🔎

Repo Audit — no config needed

Run release-gate audit https://github.com/org/repo on any AI agent repo. Detects 11 agent frameworks, scores 7 deployment safeguards, returns PROMOTE / HOLD / BLOCK. Then --emit-config scaffolds a ready-to-commit governance.yaml from what it found.

Core
📊

Readiness Scorer — 0–100

Six weighted dimensions (safety 30%, cost 20%, access control 20%, fallback 15%, eval quality 10%, observability 5%) collapse into one number and one decision: PROMOTE, HOLD, or BLOCK.
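To see how the weights combine, here is a sketch of a plain weighted mean using the percentages above. It is an assumption that the scorer works this way: for the sample dimension values shown elsewhere on this page, a plain weighted mean gives 95.5 while the tool reports 94, so the real scorer evidently applies adjustments not described here.

```python
# Weights in percent, as listed on the page; they sum to 100.
WEIGHTS_PCT = {
    "safety": 30,
    "cost": 20,
    "access_control": 20,
    "fallback": 15,
    "eval_quality": 10,
    "observability": 5,
}


def weighted_score(dimensions: dict) -> float:
    """Plain weighted mean of per-dimension scores, each on a 0-100 scale."""
    total = sum(WEIGHTS_PCT[name] * dimensions[name] for name in WEIGHTS_PCT)
    return total / 100


sample = {
    "safety": 100, "cost": 90, "access_control": 100,
    "fallback": 100, "eval_quality": 85, "observability": 80,
}
print(weighted_score(sample))  # 95.5
```

Because safety carries 30% of the weight, a 40-point safety drop alone removes 12 points from the total under this model.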

Core
🔍

Regression Gate

Compare any two readiness report snapshots. A drop of more than 10 points in any dimension, especially safety, fallback, or access control, automatically BLOCKs the release. Ship with a diff, not a guess.

Core
🧪

Eval Runner

Declare behavior test cases in YAML: refuse_or_mask, contains_keywords, valid_json, no_tool_calls. Runs in static mode (CI-safe, no LLM key needed) or live mode with any agent callable.
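As a rough illustration of what three of these assertion types could check, here is a sketch in Python. The function names mirror the assertion names above, but their exact semantics in release-gate are not documented on this page. In particular, requiring every keyword (rather than any) and the case-insensitive match are assumptions.

```python
import json


def contains_keywords(response: str, keywords: list) -> bool:
    """Pass when every keyword appears in the response (case-insensitive)."""
    lowered = response.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def valid_json(response: str) -> bool:
    """Pass when the response parses as JSON."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


def no_tool_calls(tool_calls: list) -> bool:
    """Pass when the agent made no tool calls for this case."""
    return len(tool_calls) == 0
```

Checks like these need no model access when run against recorded responses, which is what makes a static mode possible.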

Core
🛡️

Trace Validator

Feed your agent’s execution trace (JSON or JSONL). Detects forbidden tool calls, allowed-list violations, retry storms, token budget overruns, and tool-call loops before they reach production.
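A minimal sketch of trace validation against the trace_policies keys used in governance.yaml (forbidden_tools, max_tool_calls, max_retries). The JSONL event shape, with a "type" field and a "tool" field, is a hypothetical format chosen for illustration; the real trace schema may differ.

```python
import json


def validate_trace(jsonl_text: str, policy: dict) -> list:
    """Return human-readable violations found in a JSONL trace."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    # Assumed event shape: {"type": "tool_call", "tool": "<name>"} or {"type": "retry"}.
    tool_calls = [e["tool"] for e in events if e.get("type") == "tool_call"]
    retries = sum(1 for e in events if e.get("type") == "retry")

    violations = []
    for tool in tool_calls:
        if tool in policy.get("forbidden_tools", []):
            violations.append(f"forbidden tool called: {tool}")
    if len(tool_calls) > policy.get("max_tool_calls", float("inf")):
        violations.append(f"too many tool calls: {len(tool_calls)}")
    if retries > policy.get("max_retries", float("inf")):
        violations.append(f"retry storm: {retries} retries")
    return violations


policy = {
    "forbidden_tools": ["delete_database", "export_data"],
    "max_tool_calls": 10,
    "max_retries": 2,
}
trace = "\n".join([
    '{"type": "tool_call", "tool": "search"}',
    '{"type": "tool_call", "tool": "export_data"}',
    '{"type": "retry"}',
    '{"type": "retry"}',
    '{"type": "retry"}',
])
print(validate_trace(trace, policy))
```

Returning a list of violations rather than a boolean keeps the reasons available for the report.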

Core
📄

Evidence Pack

One command generates three audit artefacts: readiness_report.json, executive_summary.md, and release-gate-evidence.html. Attach to PRs, compliance tickets, or security reviews.

Phase 2

Live Agent Runtime

Add --agent py:module:fn, cmd:./script, or an https:// endpoint to run your eval suite live against the real agent. release-gate calls it, scores actual responses, and records per-call latency. A failing or unreachable agent is a failed eval. Stdlib-only, no SDK required.
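The py:module:fn form suggests a module path and a function name separated by colons. The sketch below resolves such a spec with importlib and records per-call latency using only the standard library. How release-gate actually parses the spec and computes percentiles is not stated here, so treat the parsing rule and the inclusive quantile method as assumptions.

```python
import importlib
import statistics
import time


def load_py_agent(spec: str):
    """Resolve 'py:package.module:function' to the callable it names."""
    scheme, module_path, func_name = spec.split(":")
    if scheme != "py":
        raise ValueError(f"unsupported agent spec: {spec}")
    return getattr(importlib.import_module(module_path), func_name)


def time_calls(agent, prompts):
    """Call the agent once per prompt; return (latencies in ms, error count)."""
    latencies, errors = [], 0
    for prompt in prompts:
        start = time.perf_counter()
        try:
            agent(prompt)
        except Exception:
            errors += 1  # a raising agent counts as a failed call
            continue
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies, errors


def summarize(latencies_ms):
    """Average and p95 latency; needs at least two samples."""
    cut_points = statistics.quantiles(latencies_ms, n=20, method="inclusive")
    return {"avg_ms": statistics.mean(latencies_ms), "p95_ms": cut_points[-1]}
```

For example, load_py_agent("py:json:dumps") returns the standard library's json.dumps, which is handy for trying the loader without writing an agent first.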

New
🧩

Model Intelligence Layer

Stop hardcoding prices. A model: block declares the pricing source: static, custom, locked snapshot, OpenRouter live, or LiteLLM. When pricing is unknown and on_unknown: hold is set, the check fails rather than assuming $0.

New
🔒

Pricing Lock

Snapshot live prices into a tamper-evident pricing.lock.json (sha256-protected). CI scores offline, reproducibly. Stale snapshots (> max_age_days) surface as WARN so prices never drift silently.
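One way to build a tamper-evident lock with a staleness check is sketched below. The field names (created_at, prices, sha256), the example price, and the choice to return FAIL on a hash mismatch are all assumptions for illustration. The actual pricing.lock.json layout is not shown on this page.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def _digest(prices: dict, created_at: str) -> str:
    """sha256 over a canonical JSON encoding of the locked fields."""
    canonical = json.dumps({"created_at": created_at, "prices": prices}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_lock(prices: dict, now: datetime) -> dict:
    """Snapshot prices together with a timestamp and a digest of both."""
    created_at = now.isoformat()
    return {"created_at": created_at, "prices": prices, "sha256": _digest(prices, created_at)}


def check_lock(lock: dict, now: datetime, max_age_days: int) -> str:
    """Return 'FAIL' if the digest does not match, 'WARN' if stale, else 'PASS'."""
    if _digest(lock["prices"], lock["created_at"]) != lock["sha256"]:
        return "FAIL"
    age = now - datetime.fromisoformat(lock["created_at"])
    return "WARN" if age > timedelta(days=max_age_days) else "PASS"


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
lock = make_lock({"gpt-4-turbo": {"input_per_1k": 0.01}}, created)  # hypothetical price
print(check_lock(lock, created + timedelta(days=40), max_age_days=30))  # WARN
```

Sorting keys before hashing makes the digest independent of dictionary ordering, so the same prices always produce the same hash.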

v0.5
💸

Impact Simulator

Normal vs. runaway cost side-by-side. Engineering leaders see the dollars at risk, not YAML warnings. The HTML report uploads as a CI artifact automatically.

v0.5
🔒

Cryptographic Sign & Verify

Sign governance.yaml with RSA-PSS + SHA-256. Verify in CI that no one changed budget limits or policies after review.

v0.5
⚙️

GitHub Actions Native

5 lines in your workflow. Exit code 0 = PROMOTE, 10 = HOLD, 1 = BLOCK. The HTML report is auto-uploaded as a CI artifact — your team reviews it without leaving GitHub.

CI/CD Integration

Gate every push automatically

Works with GitHub Actions, GitLab CI, Jenkins, and any shell. All commands return structured exit codes.

# .github/workflows/governance.yml
name: AI Release Gate
on: [push, pull_request]
jobs:
  release-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Score & gate release
        uses: VamsiSudhakaran1/release-gate@v0.7.0
        with:
          command: score
          config: governance.yaml
          evals: evals.yaml
          html-report: evidence.html
      # evidence pack auto-uploaded as CI artifact
release-gate demo

Interactive demo — no config needed

release-gate score

0–100 score + PROMOTE/HOLD/BLOCK

release-gate compare

Regression detection vs baseline

release-gate evidence-pack

JSON + Markdown + HTML artefacts

release-gate score --agent <spec>

Run evals live against your real agent (py: / cmd: / https://)

release-gate pricing-lock

Snapshot live model prices to lock file

release-gate impact

Cost simulation & runaway scenario

🟢

Exit codes your pipeline understands

0 = PROMOTE / PASS — deploy it.
10 = HOLD / WARN — review before deploying.
1 = BLOCK / FAIL — do not deploy.
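For pipelines driven from Python rather than shell, the exit codes above can be mapped to decisions as in this sketch. Treating any other exit code as BLOCK is a conservative choice made for this example, not documented behavior.

```python
import subprocess

# Exit-code mapping as documented on this page.
DECISIONS = {0: "PROMOTE", 10: "HOLD", 1: "BLOCK"}


def decision_from_exit_code(code: int) -> str:
    """Translate an exit code into a decision; unknown codes block (assumption)."""
    return DECISIONS.get(code, "BLOCK")


def run_gate(args: list) -> str:
    """Run a gate command and return the decision implied by its exit code."""
    completed = subprocess.run(args, check=False)
    return decision_from_exit_code(completed.returncode)
```

A call such as run_gate(["release-gate", "score", "governance.yaml"]) would then return one of the three decision strings.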

📋

Evidence pack as CI artifact

Every PR gets a readiness report, executive summary, and HTML dashboard — attached automatically so reviewers see the full picture without running anything locally.

🔄

Regression baseline in git

Commit readiness_report.json as your baseline. Run release-gate compare on every PR to catch silent degradations in safety or fallback coverage.

Governance checks

5 checks + evals + traces. One decision.

Each check maps to a real failure mode — cost explosion, no kill switch, open access, bad inputs, forbidden tool use.

Check / Layer | What it validates | Blocked when
--- | --- | ---
ACTION_BUDGET | Estimated daily cost vs. declared budget cap | Cost exceeds max_daily_cost or no budget set
BUDGET_SIMULATION | Projected cost with retries, caching & spike multipliers across 10+ models | Projected cost exceeds budget or multipliers are out of range
FALLBACK_DECLARED | Kill switch, fallback mode, team owner, runbook URL | Any field missing; no owner means no one gets paged at 3 AM
IDENTITY_BOUNDARY | Auth required, rate limit configured, data isolation rules | Auth is optional or rate limit absent; anyone can exhaust budget
INPUT_CONTRACT | JSON Schema defined, valid & invalid sample payloads provided | Schema missing (FAIL) or no valid samples (WARN)
Evals (behavior) | refuse_or_mask, contains_keywords, valid_json, no_tool_calls; static (CI-safe) or live against a real agent via --agent | Critical evals fail (safety category), or agent raises an error
Live Agent Runtime | Per-call latency (avg / p50 / p95 / max), error rate, optional token usage; captured when --agent is set | Unreachable or error-throwing agent counts as failed eval
Trace Validator | Forbidden tools, allowed-list violations, retry storms, token budget, tool loops | Any forbidden tool called or retry storm detected
Pricing Resolver | Model token pricing from static table, custom inline, lock file, OpenRouter, or LiteLLM | Pricing unknown & on_unknown: hold; never silently assumes $0
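The budget rows can be illustrated with a small cost projection. This is a sketch under simple multiplicative assumptions, not the tool's actual formula: the parameter names, the token count, and the price are hypothetical, and the result is not meant to reproduce the $132/day figure from the demo.

```python
from typing import Optional


def projected_daily_cost(
    daily_requests: int,
    tokens_per_request: int,
    price_per_1k_tokens: float,
    retry_rate: float = 0.0,
    cache_hit_rate: float = 0.0,
    spike_multiplier: float = 1.0,
) -> float:
    """Estimate daily spend in dollars; retries add calls, cache hits remove them."""
    effective_requests = (
        daily_requests * (1 + retry_rate) * (1 - cache_hit_rate) * spike_multiplier
    )
    return effective_requests * tokens_per_request / 1000 * price_per_1k_tokens


def check_budget(cost: float, max_daily_cost: Optional[float]) -> str:
    """FAIL when no cap is declared or the projection exceeds it."""
    if max_daily_cost is None or cost > max_daily_cost:
        return "FAIL"
    return "PASS"


# 5000 requests/day is from the sample governance.yaml; the rest is hypothetical.
normal = projected_daily_cost(5000, 1000, 0.02)
spike = projected_daily_cost(5000, 1000, 0.02, spike_multiplier=10)
print(check_budget(normal, 500), check_budget(spike, 500))  # PASS FAIL
```

Running the same projection with a spike multiplier shows how a workload that fits the cap on a normal day can exceed it under a runaway scenario.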

Get started in 30 seconds
pip install release-gate && release-gate demo

Run evals live: release-gate score governance.yaml --evals evals.yaml --agent py:my_pkg.agent:handle


Your agent deserves a proper release gate.

No dashboard, no sign-up, no backend. Just one command between your agent and production.

pip install release-gate
