Your agents should never act alone.

Agentist is the harness that helps you manage your AI agents at scale — a human approves what matters, you coach them like you'd manage a person, and every action lands on one audit trail.


Open-source · Self-hosted · Governed by default

agentist console: prod · demo · example data

Operate: Runs · Approvals (2)

Govern: Policies · Audit · Access

Manage: Registry · Coach

env: prod
● data in your boundary

Approvals

runs paused, waiting on a person · routed: Console + Slack

⏸ incident #4821 is paused. Cornelia wants to act, and can't until someone decides.

waiting 4m 12s

Cornelia (SRE on-call) proposes a destructive action · incident #4821

destructive

Proposed action

kubectl rollout undo deploy/checkout

Error rate is 9.2% since the last deploy and p95 is up 340ms. Rolling back restores the last good version (v411).

- image: checkout:v412
+ image: checkout:v411

your decision — and your reason — is journaled with the run

gate .approval("on-call") · also live in Slack #sre-oncall · suspends durably, for minutes or days

Rune (Release) · scale checkout 6 → 10: sustained 5xx under load · 1m

An agent here can't act on a hallucination — because it can't act alone.

Runs

prod · last 24h · 47 runs · 2 running

run	agent	state	budget	age
incident #4821	Cornelia	suspended · approval	41k / 60k	4m 12s
incident #4822	Cornelia	running	22k / 60k	1m 40s
deploy-gate #903	Rune	suspended · approval	6k / 60k	1m 02s
cost-report #77	Vega	running	9k / 60k	0m 31s
backup-verify #88	Atlas	done	2k / 60k	6h 09m

3 suspended on a human · 2 running · 1 failed

Policies

admission control: checked before every step · 6 active

policy	applies to	effect
remediation-needs-approval	*.remediate · destructive	require approval
budget-per-run	*	cap budget ≤ $2
pii-redaction	gateway · egress	redact
prod-data-read-only	gateway · data	deny writes

The agent proposes; the runtime commits — only if a policy allows it.
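The policy table above can be read as a function from a proposed step to a verdict. The sketch below is only an illustration of that idea, not Agentist's policy API: the `Step`, `Policy`, and `admit` names, the `costUsd` field, and the precedence rule (the strictest matching effect wins) are all assumptions made for this example.

```typescript
// Hypothetical types: these names are illustrative, not Agentist's API.
type Verdict = "admit" | "redact" | "require-approval" | "deny";

interface Step {
  action: string;       // e.g. "incident.remediate"
  destructive: boolean;
  costUsd: number;      // assumed: spend attributed to the run so far
}

interface Policy {
  name: string;
  matches: (step: Step) => boolean;
  effect: Exclude<Verdict, "admit">;
}

// Assumed precedence: the strictest matching effect wins.
const strictness: Record<Verdict, number> = {
  "admit": 0,
  "redact": 1,
  "require-approval": 2,
  "deny": 3,
};

// Checked before a step runs: the agent proposes, this function decides.
function admit(step: Step, policies: Policy[]): { verdict: Verdict; by: string[] } {
  let verdict: Verdict = "admit";
  const by: string[] = [];
  for (const p of policies) {
    if (!p.matches(step)) continue;
    by.push(p.name);
    if (strictness[p.effect] > strictness[verdict]) verdict = p.effect;
  }
  return { verdict, by };
}

const policies: Policy[] = [
  {
    name: "remediation-needs-approval",
    matches: (s) => /\.remediate$/.test(s.action) && s.destructive,
    effect: "require-approval",
  },
  {
    name: "budget-per-run",
    matches: (s) => s.costUsd > 2, // cap from the table: budget ≤ $2
    effect: "deny",
  },
];

const rollback = admit(
  { action: "incident.remediate", destructive: true, costUsd: 0.4 },
  policies,
);
const overBudget = admit(
  { action: "capacity.scale", destructive: false, costUsd: 3.1 },
  policies,
);
```

Here `rollback` matches only the approval policy and `overBudget` matches only the budget cap, so the two proposals receive different verdicts before anything is executed.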

Audit

append-only journal · tamper-evident · seq 1100–1184

seq	time	agent	action	verdict
1184	14:02:14	Cornelia	resume: remediate	approved by you
1182	14:02:11	Cornelia	model call: diagnose	admitted
1181	14:02:09	Cornelia	data: metrics.query	redacted PII
1176	14:01:40	Vega	step: scale 50 → 500	denied · budget

gapless ✓ · no gaps, no tampering

Access

roles & approval authority · SSO: Okta · SCIM ✓

member	role	approval authority
you	admin	all gates
sre-team	operator	.approval("on-call")
deploys	operator	.approval("release")
finance	viewer	none

Every console action — approve, rollback, policy change — is itself journaled.
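The access table above maps each member to the gates they may approve. A minimal sketch of that check follows, assuming a hypothetical `Member` shape and `canApprove` helper; neither is taken from Agentist itself.

```typescript
// Hypothetical shapes for illustration only.
type Role = "admin" | "operator" | "viewer";

interface Member {
  name: string;
  role: Role;
  gates: string[] | "all"; // approval authority
}

const members: Member[] = [
  { name: "you", role: "admin", gates: "all" },
  { name: "sre-team", role: "operator", gates: ["on-call"] },
  { name: "deploys", role: "operator", gates: ["release"] },
  { name: "finance", role: "viewer", gates: [] },
];

// True only if the member exists and holds authority for this gate.
function canApprove(memberName: string, gate: string): boolean {
  return members.some(
    (m) => m.name === memberName && (m.gates === "all" || m.gates.indexOf(gate) !== -1),
  );
}
```

Unknown members and members with an empty gate list are refused, so the default in this sketch is no authority.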

Registry

agents · versions · evals · prod

agent	version	owner	last eval
Cornelia · SRE	1.4.0	platform	34 / 37 ✓
Rune · Release	0.9.2	deploys	28 / 28 ✓
Vega · Cost	1.1.3	finops	2 pending
Atlas · Ops	2.0.1	platform	19 / 19 ✓

deploy · pin · rollback — versions are manifests, rollback-ready

Coach — Cornelia

run incident #4820 · step triage · eval set: 37 cases

Agent output

severity sev3

“Elevated latency — monitor.”

Your correction

severity sev2

“p99 of 5s on checkout is customer-facing — page on-call.”

Corrections become eval cases and charter notes — recorded, not remembered.

Why a harness

The model isn't where the trust lives.

Reasoning is now a commodity.

Every quarter the models get better, cheaper, and more interchangeable. A brilliant reasoner is something you buy — it's not where your agent's value, or its risk, is decided.

Production is decided around it.

An agent is judged by what it's allowed to do — who signs off, what gets recorded, what happens when it's wrong. None of it comes from the model.

That structure is engineered.

The structure that makes an agent bounded, observable, and accountable is a real discipline — harness engineering. It isn't instant or optional, but it is buildable.

Every agent that goes to production needs a harness.
The harness is the trust.

The loop

Manage agents the way you already manage people.

Three things the harness gives everyone — no code, no job title required. The consoles below are live; try them.

01 Approve · control

Consequential actions wait for a person.

The agent proposes; it doesn't act. Anything destructive suspends the run — durably, for minutes or days — until someone with authority signs off, in the console or right in Slack.

An agent can't act on a hallucination, because it can't act alone. The sign-off isn't a courtesy notification after the fact — nothing happens without it.

in the framework: .approval("on-call") — one line

agentist console: runs · demo · example data

incident #4821

agent Cornelia (SRE on-call) · suspended · approval

Cornelia (SRE on-call) proposes a destructive action

destructive

Proposed action

kubectl rollout undo deploy/checkout

Error rate is 9.2% since the last deploy and p95 is up 340ms. Rolling back restores the last good version (v411).

- image: checkout:v412
+ image: checkout:v411

also live in Slack #sre-oncall

gate .approval("on-call") · the run suspends durably — it resumes at the exact step

Nothing is committed without the decision — and the decision, with your reason, lands on the trail.
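One way to picture the gate: a run that reaches an approval step stops in a suspended state and can only move forward through a recorded decision. The sketch below assumes hypothetical `Run`, `Decision`, `propose`, and `decide` names, and it treats a non-empty reason as mandatory, which is an assumption. It also keeps everything in memory, so it does not attempt the durable suspension the runtime is described as providing.

```typescript
// Hypothetical names throughout; an in-memory illustration only.
type RunState = "running" | "suspended" | "committed" | "rejected";

interface Decision {
  approver: string;
  approved: boolean;
  reason: string; // recorded with the run
}

interface Run {
  id: string;
  state: RunState;
  proposedAction: string | null;
  decision: Decision | null;
}

// The agent can only propose; proposing suspends the run.
function propose(run: Run, action: string): Run {
  if (run.state !== "running") throw new Error(`run ${run.id} is not running`);
  return { ...run, state: "suspended", proposedAction: action };
}

// Only a decision moves a suspended run forward (assumed: a reason is required).
function decide(run: Run, decision: Decision): Run {
  if (run.state !== "suspended") throw new Error(`run ${run.id} is not awaiting approval`);
  if (decision.reason.trim() === "") throw new Error("a reason is required");
  return { ...run, state: decision.approved ? "committed" : "rejected", decision };
}

const started: Run = { id: "incident-4821", state: "running", proposedAction: null, decision: null };
const paused = propose(started, "kubectl rollout undo deploy/checkout");
const resumed = decide(paused, {
  approver: "you",
  approved: true,
  reason: "error rate 9.2% since v412; roll back to v411",
});
```

Because `decide` refuses any run that is not suspended, there is no code path in this sketch from a proposal to a committed action that skips the decision.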

02 Coach · management

Correct it once — like a 1:1.

You can talk to your agents, and when one gets something wrong you don't rewrite a prompt; you correct the decision. The correction becomes either a test case that every future version of the agent is graded against, or a note in its charter (the mission and values it works from).

Agents get better the way people do — through feedback that sticks, not folklore that drifts.

in the framework: charter · evals · versioned agents — coaching is recorded, not remembered

agentist console: coach · demo · example data

Coach — Cornelia

run incident #4820 · step triage · eval set: 37 cases

Agent output

severity sev3

“Elevated latency — monitor.”

Your correction

severity sev2

“p99 of 5s on checkout is customer-facing — page on-call.”

corrections are recorded, not remembered
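To make "a correction becomes an eval case" concrete, here is a small sketch under stated assumptions. The `Correction` and `EvalCase` shapes, the `toEvalCase` and `grade` helpers, and exact-match grading are hypothetical choices for illustration; the source does not describe how Agentist stores or scores evals.

```typescript
// Hypothetical shapes; exact-match grading is an assumption.
interface Correction {
  runId: string;
  step: string;
  input: string;       // what the agent saw
  agentOutput: string; // what it answered
  corrected: string;   // what the reviewer says it should have answered
  note: string;
}

interface EvalCase {
  id: string;
  step: string;
  input: string;
  expected: string;
  source: string; // provenance: the run that produced the case
}

function toEvalCase(c: Correction, index: number): EvalCase {
  return {
    id: `${c.step}-${index}`,
    step: c.step,
    input: c.input,
    expected: c.corrected,
    source: c.runId,
  };
}

// Grade one agent version; `agent` is any function from input to output.
function grade(cases: EvalCase[], agent: (input: string) => string): { passed: number; total: number } {
  const passed = cases.filter((c) => agent(c.input) === c.expected).length;
  return { passed, total: cases.length };
}

const correction: Correction = {
  runId: "incident-4820",
  step: "triage",
  input: "p99 of 5s on checkout",
  agentOutput: "sev3",
  corrected: "sev2",
  note: "p99 of 5s on checkout is customer-facing; page on-call.",
};

// Index 38 assumes the case is appended to the existing 37-case set.
const evalSet = [toEvalCase(correction, 38)];
const before = grade(evalSet, () => "sev3"); // the old behaviour fails the new case
const after = grade(evalSet, () => "sev2");  // a corrected version passes it
```

The point of the design is that the correction outlives the conversation: any later version that regresses to the old answer fails the same case.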

03 Audit · oversight

Every action, one trail.

Every step, every model call, every approval and denial lands on a single append-only journal — gapless and tamper-evident. Review across runs to judge whether the approvals were right, where to tighten, and where to step back.

Replay any captured call exactly — same input, same model, same route — and diff it against what happened. Oversight you can check, not vibes.

in the framework: event-sourced journal — the audit trail and the runtime state are the same record

agentist console: audit · demo · example data

Audit

append-only journal · tamper-evident · seq 1100–1184

seq	time	agent	action	verdict
1184	14:02:14	Cornelia	resume: remediate	approved by you
1182	14:02:11	Cornelia	model call: diagnose	admitted
1181	14:02:09	Cornelia	data: metrics.query	redacted PII
1180	14:02:08	Cornelia	model call: triage	admitted
1176	14:01:40	Vega	step: scale 50 → 500	denied · budget

gapless ✓ · no gaps, no tampering

re-issues the same input, model & route · diffs vs captured
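A common way to get a gapless, tamper-evident journal is a hash chain: each entry stores the hash of the previous one, so editing or removing an entry breaks every check after it. The sketch below illustrates that general technique and is not a description of Agentist's storage format. The `JournalEntry` shape and the `append` and `verify` helpers are hypothetical, and `toyHash` is a dependency-free stand-in that is not cryptographically secure; a real journal would use a cryptographic hash such as SHA-256.

```typescript
// Hypothetical entry shape for illustration.
interface JournalEntry {
  seq: number;
  action: string;
  verdict: string;
  prevHash: string;
  hash: string;
}

// Toy 32-bit checksum so the sketch has no dependencies.
// NOT tamper-resistant: swap in a cryptographic hash in real use.
function toyHash(text: string): string {
  let h = 7;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

function digest(seq: number, action: string, verdict: string, prevHash: string): string {
  return toyHash([seq, action, verdict, prevHash].join("|"));
}

// Append-only: returns a new journal with the next seq and a chained hash.
function append(journal: JournalEntry[], action: string, verdict: string): JournalEntry[] {
  const last = journal.length > 0 ? journal[journal.length - 1] : null;
  const seq = last ? last.seq + 1 : 1;
  const prevHash = last ? last.hash : "genesis";
  const hash = digest(seq, action, verdict, prevHash);
  return [...journal, { seq, action, verdict, prevHash, hash }];
}

// Returns the seq of the first bad entry, or null if the chain is intact.
function verify(journal: JournalEntry[]): number | null {
  let prevHash = "genesis";
  let expectedSeq = 1;
  for (const e of journal) {
    const gap = e.seq !== expectedSeq;
    const broken =
      e.prevHash !== prevHash || e.hash !== digest(e.seq, e.action, e.verdict, e.prevHash);
    if (gap || broken) return e.seq;
    prevHash = e.hash;
    expectedSeq = e.seq + 1;
  }
  return null;
}

let journal: JournalEntry[] = [];
journal = append(journal, "model call: triage", "admitted");
journal = append(journal, "data: metrics.query", "redacted PII");
journal = append(journal, "resume: remediate", "approved by you");

const intact = verify(journal);
const tampered = journal.map((e) => (e.seq === 2 ? { ...e, verdict: "admitted" } : e));
const withGap = journal.filter((e) => e.seq !== 2);
```

Verifying `tampered` or `withGap` reports the first entry where the chain stops matching, which is what lets a reviewer check the trail rather than trust it.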

Approve (control) → Audit (oversight) → Coach (management) → Policy (what you've learned)

↺ policy narrows what still needs your sign-off, so you step back deliberately, not hopefully

Three habits compose into a management loop. Decisions become judgment, judgment becomes policy — and the chance that an agent ever acts on a hallucination stops being something you hope against and becomes something you've engineered out.

Behind the loop

The loop runs on a deeper harness.

Approve, coach, and audit are what everyone sees. Underneath, the same structure does the unglamorous work — this is the engineering, and the docs go all the way down.

Admission control: Policy runs before every step; the agent proposes, the runtime commits. Preventive, not forensic.

Policy as code: Hard gates written once, enforced for every agent, so no one can forget them.

Durability: Every run is event-sourced. Crash mid-run and it resumes at the exact step, exactly once.

Isolation: Agents are untrusted by default: sandboxed, tenant-scoped, never holding a credential.

Access & roles: Who can approve, operate, or view. RBAC with SSO behind it, every change itself audited.

Sovereignty: The whole harness deploys into your own cloud. Your data, your keys, your models; nothing leaves.

Gateways: One governed boundary for every model, data, and tool call, each validated, budgeted, and audited.

The console: Where the loop lives: runs, approvals, coaching, the journal, policies, and the fleet.

This is the how — capability first, framework as the way there. Read the docs →
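The durability claim (resume at the exact step after a crash) follows from event sourcing: if every completed step is journaled, a restarted run can replay the journal and skip what is already done. The sketch below illustrates the idea with hypothetical `StepDef`, `StepEvent`, and `execute` names. It assumes a step's result is journaled before any crash can occur, which is the hard part a real runtime has to guarantee and which this sketch does not address.

```typescript
// Hypothetical names; an in-memory illustration of replay-based resume.
interface StepEvent { step: string; result: string }
interface StepDef { name: string; run: () => string }

// Replays the journal: steps already recorded are skipped, not re-run.
// `crashAfter` simulates the process dying right after that step is journaled.
function execute(steps: StepDef[], journal: StepEvent[], crashAfter?: string): StepEvent[] {
  const log = journal.slice();
  for (const s of steps) {
    if (log.some((e) => e.step === s.name)) continue; // done in an earlier attempt
    log.push({ step: s.name, result: s.run() });
    if (s.name === crashAfter) return log; // simulated crash
  }
  return log;
}

const calls: string[] = []; // records every real execution of a step
const steps: StepDef[] = [
  { name: "triage", run: () => { calls.push("triage"); return "sev2"; } },
  { name: "remediate", run: () => { calls.push("remediate"); return "rolled back"; } },
];

const afterCrash = execute(steps, [], "triage"); // first attempt dies after triage
const finished = execute(steps, afterCrash);      // second attempt resumes from the journal
```

After both attempts, `calls` holds each step name once: the second attempt finds `triage` in the journal and goes straight to `remediate`.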

Engineered, not instant

A harness is built. We make it the shortest path.

No harness is automatic — and none should be scary. There are two ways to get yours.

With your engineers

The framework

Open-source and TypeScript-native. Your team defines agents as typed code; the runtime runs them durably in your own cloud, with the loop — approvals, coaching, audit — built in from the first line.

src/incident.ts

// destructive steps wait for a human — one line
export const incident = workflow("incident")
  .step("triage", oncall.duties.triage)
  .approval("on-call")      // ← the loop, in the path
  .step("remediate", oncall.duties.remediate)
  .commit()

$ npx agentist init (coming soon)

With ours

Built alongside you

Harness engineering is what we do. For teams that want the loop around their agents without standing up the practice first, we build the harness with you — your agents, your cloud, engineered together.

We're onboarding a small group of design partners now. No packages or published pricing yet — this is a direction we're building openly, and it starts with a conversation.

Build with us: contact@agentist.dev

Be first to put your agents under management.