Agentist is the harness that helps you manage your AI agents at scale: a human approves what matters, you coach agents the way you'd manage a person, and every action lands on one audit trail.
| run | agent | state | budget | age |
|---|---|---|---|---|
| incident #4821 | Cornelia | suspended · approval | 41k / 60k | 4m 12s |
| incident #4822 | Cornelia | running | 22k / 60k | 1m 40s |
| deploy-gate #903 | Rune | suspended · approval | 6k / 60k | 1m 02s |
| cost-report #77 | Vega | running | 9k / 60k | 0m 31s |
| backup-verify #88 | Atlas | done | 2k / 60k | 6h 09m |
| policy | applies to | effect |
|---|---|---|
| remediation-needs-approval | *.remediate · destructive | require approval |
| budget-per-run | * | cap budget ≤ $2 |
| pii-redaction | gateway · egress | redact |
| prod-data-read-only | gateway · data | deny writes |
| member | role | approval authority |
|---|---|---|
| you | admin | all gates |
| sre-team | operator | .approval("on-call") |
| deploys | operator | .approval("release") |
| finance | viewer | none |
| agent | version | owner | last eval |
|---|---|---|---|
| Cornelia · SRE | 1.4.0 | platform | 34 / 37 ✓ |
| Rune · Release | 0.9.2 | deploys | 28 / 28 ✓ |
| Vega · Cost | 1.1.3 | finops | 2 pending |
| Atlas · Ops | 2.0.1 | platform | 19 / 19 ✓ |
Every quarter the models get better, cheaper, and more interchangeable. A brilliant reasoner is something you buy — it's not where your agent's value, or its risk, is decided.
An agent is judged by what it's allowed to do — who signs off, what gets recorded, what happens when it's wrong. None of it comes from the model.
The structure that makes an agent bounded, observable, and accountable is a real discipline — harness engineering. It isn't instant or optional, but it is buildable.
Three things the harness gives everyone — no code, no job title required.
The agent proposes; it doesn't act. Anything destructive suspends the run — durably, for minutes or days — until someone with authority signs off, in the console or right in Slack.
An agent can't act on a hallucination, because it can't act alone. The sign-off isn't a courtesy notification after the fact — nothing happens without it.
in the framework: .approval("on-call") — one line
kubectl rollout undo deploy/checkout
Error rate is 9.2% since the last deploy and p95 is up 340ms. Rolling back restores the last good version (v411).
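One way to picture the suspend-until-approved behaviour is a run whose state is plain data, so it can sit in storage for minutes or days while it waits. The sketch below is an assumption-laden illustration, not Agentist's actual API: `Step`, `RunState`, `advance`, and `approve` are hypothetical names invented for this example.

```typescript
// Hypothetical types for illustration only; not Agentist's actual API.
type Step =
  | { kind: "step"; name: string; run: (log: string[]) => void }
  | { kind: "approval"; role: string };

interface RunState {
  cursor: number; // index of the next step to execute
  status: "running" | "suspended" | "done";
  waitingFor: string | null; // role whose sign-off is required, if any
  log: string[];
}

// Execute steps from the cursor until an approval gate or the end.
// The returned state is plain data, so it could be serialized while the run waits.
function advance(steps: Step[], state: RunState): RunState {
  const log = state.log.slice();
  let cursor = state.cursor;
  for (const step of steps.slice(state.cursor)) {
    if (step.kind === "approval") {
      return { cursor, status: "suspended", waitingFor: step.role, log };
    }
    step.run(log);
    cursor += 1;
  }
  return { cursor, status: "done", waitingFor: null, log };
}

// Resume past the gate only if the approver holds the role the gate asks for.
function approve(steps: Step[], state: RunState, approverRoles: string[]): RunState {
  if (state.status !== "suspended" || state.waitingFor === null) {
    throw new Error("run is not waiting for approval");
  }
  if (approverRoles.indexOf(state.waitingFor) === -1) {
    throw new Error(`approver lacks role "${state.waitingFor}"`);
  }
  return advance(steps, { ...state, cursor: state.cursor + 1 });
}

const steps: Step[] = [
  { kind: "step", name: "triage", run: (log: string[]) => { log.push("triage"); } },
  { kind: "approval", role: "on-call" },
  { kind: "step", name: "remediate", run: (log: string[]) => { log.push("remediate"); } },
];

const start: RunState = { cursor: 0, status: "running", waitingFor: null, log: [] };
const suspended = advance(steps, start);
// Round trip through JSON to mimic the state being stored and loaded later.
const restored: RunState = JSON.parse(JSON.stringify(suspended));
const finished = approve(steps, restored, ["on-call"]);
```

In this sketch the remediate step cannot run until `approve` is called with a matching role, which mirrors the claim that nothing destructive happens without sign-off. A production runtime would also need persistence, timeouts, and an identity check on the approver, none of which are modeled here.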
You can talk to your agents, and when one gets something wrong you don't rewrite a prompt; you correct the decision. The correction becomes either a test case the agent is graded against on every version or a note in its charter, the mission and values it works from.
Agents get better the way people do — through feedback that sticks, not folklore that drifts.
in the framework: charter · evals · versioned agents — coaching is recorded, not remembered
“Elevated latency — monitor.”
“p99 of 5s on checkout is customer-facing — page on-call.”
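The idea of a correction turning into a test case can be sketched as a small regression suite that every agent version is graded against. Everything here is assumed for illustration: the `Decision`, `LatencySignal`, and `EvalCase` shapes, the `recordCorrection` and `grade` helpers, and the two toy agent versions are hypothetical and do not come from the Agentist framework.

```typescript
// Hypothetical shapes for illustration only; not Agentist's actual API.
type Decision = "monitor" | "page on-call";

interface LatencySignal {
  service: string;
  p99Ms: number;
  customerFacing: boolean;
}

interface EvalCase {
  note: string; // the human's correction, kept alongside the case
  input: LatencySignal;
  expected: Decision;
}

type Agent = (input: LatencySignal) => Decision;

// A correction is stored as a permanent test case rather than a prompt tweak.
function recordCorrection(
  suite: EvalCase[],
  input: LatencySignal,
  expected: Decision,
  note: string
): EvalCase[] {
  return suite.concat([{ note, input, expected }]);
}

// Grade one agent version against every recorded case.
function grade(agent: Agent, suite: EvalCase[]): { passed: number; total: number; failed: string[] } {
  const failed = suite
    .filter((c) => agent(c.input) !== c.expected)
    .map((c) => c.note);
  return { passed: suite.length - failed.length, total: suite.length, failed };
}

// Toy version 1: only pages on very high latency.
const v1: Agent = (s) => {
  return s.p99Ms >= 10000 ? "page on-call" : "monitor";
};

// Toy version 2: also pages when a customer-facing service reaches 5s.
const v2: Agent = (s) => {
  const severe = s.p99Ms >= 10000;
  const customerImpact = s.customerFacing && s.p99Ms >= 5000;
  return severe || customerImpact ? "page on-call" : "monitor";
};

let suite: EvalCase[] = [];
suite = recordCorrection(
  suite,
  { service: "checkout", p99Ms: 5000, customerFacing: true },
  "page on-call",
  "p99 of 5s on checkout is customer-facing"
);

const before = grade(v1, suite);
const after = grade(v2, suite);
```

Under these assumptions, `before` reports the recorded case as failed and `after` reports it as passed, so the correction keeps being checked on later versions instead of living in someone's memory.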
Every step, every model call, every approval and denial lands on a single append-only journal — gapless and tamper-evident. Review across runs to judge whether the approvals were right, where to tighten, and where to step back.
Replay any captured call exactly — same input, same model, same route — and diff it against what happened. Oversight you can check, not vibes.
in the framework: event-sourced journal — the audit trail and the runtime state are the same record
| seq | time | agent | action | verdict |
|---|---|---|---|---|
| 1184 | 14:02:14 | Cornelia | resume: remediate | approved by you |
| 1182 | 14:02:11 | Cornelia | model call: diagnose | admitted |
| 1181 | 14:02:09 | Cornelia | data: metrics.query | redacted PII |
| 1180 | 14:02:08 | Cornelia | model call: triage | admitted |
| 1176 | 14:01:40 | Vega | step: scale 50 → 500 | denied · budget |
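A common way to make an append-only journal gapless and tamper-evident is to number entries consecutively and have each entry carry a hash of its predecessor. The sketch below assumes that design; the source does not say how Agentist implements its journal. `JournalEntry`, `append`, `firstBrokenSeq`, and `toyHash` are hypothetical names, and `toyHash` is a dependency-free 32-bit stand-in that a real system would replace with a cryptographic hash such as SHA-256.

```typescript
// Stand-in 32-bit FNV-1a-style hash so the sketch has no dependencies.
// ASSUMPTION: illustration only; it is not secure against deliberate tampering.
function toyHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, modulo 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h.toString(16);
}

// Hypothetical entry shape for illustration; not Agentist's actual schema.
interface JournalEntry {
  seq: number;
  agent: string;
  action: string;
  verdict: string;
  prevHash: string; // hash of the previous entry, or "genesis" for the first
  hash: string;
}

function digest(seq: number, agent: string, action: string, verdict: string, prevHash: string): string {
  return toyHash(JSON.stringify([seq, agent, action, verdict, prevHash]));
}

// Appending never edits earlier entries; it returns a new, longer journal.
function append(journal: readonly JournalEntry[], agent: string, action: string, verdict: string): JournalEntry[] {
  const last: JournalEntry | undefined = journal[journal.length - 1];
  const seq = last ? last.seq + 1 : 1;
  const prevHash = last ? last.hash : "genesis";
  const hash = digest(seq, agent, action, verdict, prevHash);
  return journal.concat([{ seq, agent, action, verdict, prevHash, hash }]);
}

// Returns the seq of the first entry that breaks the chain, or null if intact.
function firstBrokenSeq(journal: readonly JournalEntry[]): number | null {
  let prevHash = "genesis";
  let expectedSeq = 1;
  for (const e of journal) {
    const intact =
      e.seq === expectedSeq &&
      e.prevHash === prevHash &&
      e.hash === digest(e.seq, e.agent, e.action, e.verdict, e.prevHash);
    if (!intact) return e.seq;
    prevHash = e.hash;
    expectedSeq += 1;
  }
  return null;
}

let journal: JournalEntry[] = [];
journal = append(journal, "Cornelia", "model call: triage", "admitted");
journal = append(journal, "Vega", "step: scale 50 → 500", "denied · budget");
journal = append(journal, "Cornelia", "resume: remediate", "approved by you");

// Rewriting a verdict after the fact breaks the stored hash of that entry.
const tampered = journal.map((e) => (e.seq === 2 ? { ...e, verdict: "admitted" } : e));
// Dropping an entry leaves a gap in the sequence and in the hash chain.
const gapped = journal.filter((e) => e.seq !== 2);
```

With these definitions, `firstBrokenSeq(journal)` finds nothing wrong, while the tampered and gapped copies are flagged at the entry where the chain stops matching. Because each entry depends on the one before it, the same record can serve as both the runtime history and the audit trail.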
Three habits compose into a management loop. Decisions become judgment, judgment becomes policy — and the chance that an agent ever acts on a hallucination stops being something you hope against and becomes something you've engineered out.
Approve, coach, and audit are what everyone sees. Underneath, the same structure does the unglamorous work — this is the engineering, and the docs go all the way down.
No harness is automatic — and none should be scary. There are two ways to get yours.
Open-source and TypeScript-native. Your team defines agents as typed code; the runtime runs them durably in your own cloud, with the loop — approvals, coaching, audit — built in from the first line.
    // destructive steps wait for a human: one line
    export const incident = workflow("incident")
      .step("triage", oncall.duties.triage)
      .approval("on-call") // ← the loop, in the path
      .step("remediate", oncall.duties.remediate)
      .commit()
Harness engineering is what we do. For teams that want the loop around their agents without standing up the practice first, we build the harness with you — your agents, your cloud, engineered together.
We're onboarding a small group of design partners now. No packages or published pricing yet — this is a direction we're building openly, and it starts with a conversation.
Agentist is in active development. Reach out and we'll be in touch the moment you can get in.