Agentist

Your agents should never act alone.

Agentist is the harness that helps you manage your AI agents at scale — a human approves what matters, you coach them like you'd manage a person, and every action lands on one audit trail.

Open-source · Self-hosted · Governed by default
agentist console — prod (demo · example data)
Runs · Approvals (2) · Policies · Audit · Access · Registry · Coach

Approvals

runs paused, waiting on a person · routed: Console + Slack
incident #4821 is paused. Cornelia wants to act — and can't, until someone decides. (waiting 4m 12s)
Cornelia — SRE on-call · proposes a destructive action · incident #4821
Proposed action
kubectl rollout undo deploy/checkout

Error rate is 9.2% since the last deploy and p95 is up 340ms. Rolling back restores the last good version (v411).

- image: checkout:v412
+ image: checkout:v411
your decision — and your reason — is journaled with the run
gate .approval("on-call") · also live in Slack #sre-oncall · suspends durably, for minutes or days
Rune — Release · scale checkout 6 → 10 — sustained 5xx under load · 1m
An agent here can't act on a hallucination — because it can't act alone.

Runs

prod · last 24h · 47 runs · 2 running
run | agent | state | budget | age
incident #4821 | Cornelia | suspended · approval | 41k / 60k | 4m 12s
incident #4822 | Cornelia | running | 22k / 60k | 1m 40s
deploy-gate #903 | Rune | suspended · approval | 6k / 60k | 1m 02s
cost-report #77 | Vega | running | 9k / 60k | 0m 31s
backup-verify #88 | Atlas | done | 2k / 60k | 6h 09m
3 suspended on a human · 2 running · 1 failed

Policies

admission control — checked before every step · 6 active
policy | applies to | effect
remediation-needs-approval | *.remediate · destructive | require approval
budget-per-run | * | cap budget ≤ $2
pii-redaction | gateway · egress | redact
prod-data-read-only | gateway · data | deny writes
The agent proposes; the runtime commits — only if a policy allows it.
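
A minimal sketch of how an admission check like this could work, in plain TypeScript. It is not the Agentist policy engine: `Step`, `Policy`, and `admit` are hypothetical names, and two rules here are assumptions of the sketch, namely that the strictest matching effect wins and that a step matching no policy is allowed. The policy names and effects mirror two rows of the table above.

```typescript
// Hypothetical sketch of admission control; not the Agentist policy engine.
type Effect = "allow" | "require approval" | "deny";

interface Step {
  name: string;         // e.g. "incident.remediate"
  destructive: boolean;
  costUsd: number;      // run spend so far, including this step (assumed unit)
}

interface Policy {
  id: string;
  applies: (step: Step) => boolean;
  effect: Effect;
}

const policies: Policy[] = [
  {
    id: "remediation-needs-approval",
    applies: (s) => s.name.endsWith(".remediate") && s.destructive,
    effect: "require approval",
  },
  {
    id: "budget-per-run",
    applies: (s) => s.costUsd > 2, // cap budget at $2 per run
    effect: "deny",
  },
];

// Assumption: the strictest matching effect wins (deny > require approval > allow).
const rank: Record<Effect, number> = { allow: 0, "require approval": 1, deny: 2 };

// Checked before every step: returns the effect and the policies that produced it.
function admit(step: Step): { effect: Effect; matched: string[] } {
  const matched = policies.filter((p) => p.applies(step));
  const effect = matched.reduce<Effect>(
    (worst, p) => (rank[p.effect] > rank[worst] ? p.effect : worst),
    "allow",
  );
  return { effect, matched: matched.map((p) => p.id) };
}

const verdict = admit({ name: "incident.remediate", destructive: true, costUsd: 0.4 });
console.log(verdict.effect); // "require approval"
```

Returning the matched policy ids alongside the effect means the verdict can be journaled with its reason, as in the "denied · budget" row of the audit view.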

Audit

append-only journal · tamper-evident · seq 1100–1184
seq | time | agent | action | verdict
1184 | 14:02:14 | Cornelia | resume: remediate | approved by you
1182 | 14:02:11 | Cornelia | model call: diagnose | admitted
1181 | 14:02:09 | Cornelia | data: metrics.query | redacted PII
1176 | 14:01:40 | Vega | step: scale 50 → 500 | denied · budget
gapless ✓ · no gaps, no tamper

Access

roles & approval authority · SSO: Okta · SCIM ✓
member | role | approval authority
you | admin | all gates
sre-team | operator | .approval("on-call")
deploys | operator | .approval("release")
finance | viewer | none
Every console action — approve, rollback, policy change — is itself journaled.

Registry

agents · versions · evals · prod
agent | version | owner | last eval
Cornelia · SRE | 1.4.0 | platform | 34 / 37 ✓
Rune · Release | 0.9.2 | deploys | 28 / 28 ✓
Vega · Cost | 1.1.3 | finops | 2 pending
Atlas · Ops | 2.0.1 | platform | 19 / 19 ✓
deploy · pin · rollback — versions are manifests, rollback-ready

Coach — Cornelia

run incident #4820 · step triage · eval set: 37 cases
Agent output
severity sev3

“Elevated latency — monitor.”

Your correction
severity sev2

“p99 of 5s on checkout is customer-facing — page on-call.”

Corrections become eval cases and charter notes — recorded, not remembered.

Why a harness

The model isn't where the trust lives.

01

Reasoning is now a commodity.

Every quarter the models get better, cheaper, and more interchangeable. A brilliant reasoner is something you buy — it's not where your agent's value, or its risk, is decided.

02

Production is decided around it.

An agent is judged by what it's allowed to do — who signs off, what gets recorded, what happens when it's wrong. None of it comes from the model.

03

That structure is engineered.

The structure that makes an agent bounded, observable, and accountable is a real discipline — harness engineering. It isn't instant or optional, but it is buildable.

Every agent that goes to production needs a harness.
The harness is the trust.

The loop

Manage agents the way you already manage people.

Three things the harness gives everyone — no code, no job title required. The consoles below are live; try them.

01 Approve · control

Consequential actions wait for a person.

The agent proposes; it doesn't act. Anything destructive suspends the run — durably, for minutes or days — until someone with authority signs off, in the console or right in Slack.

An agent can't act on a hallucination, because it can't act alone. The sign-off isn't a courtesy notification after the fact — nothing happens without it.

in the framework: .approval("on-call") — one line
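
A minimal sketch of the idea behind such a gate, in plain TypeScript. It is not the Agentist API: `Run`, `propose`, `decide`, and the `authority` table are hypothetical names chosen for illustration. What it shows is that the proposed action is held as data and moves to a committed state only after a decision from someone who holds authority for that gate.

```typescript
// Hypothetical sketch of an approval gate; not the Agentist API.
type Decision = { by: string; approve: boolean; reason: string };

type Run =
  | { state: "suspended"; gate: string; action: string }
  | { state: "committed"; action: string; decision: Decision }
  | { state: "rejected"; action: string; decision: Decision };

// Who may decide at which gate (assumed shape, loosely following the Access view).
const authority: Record<string, string[]> = {
  "sre-team": ["on-call"],
  deploys: ["release"],
  finance: [],
};

// The agent only proposes: the action is recorded and nothing executes.
function propose(gate: string, action: string): Run {
  return { state: "suspended", gate, action };
}

// A decision moves the run forward only if the decider holds authority for the gate.
function decide(run: Run, decision: Decision): Run {
  if (run.state !== "suspended") {
    throw new Error("run is not waiting on a person");
  }
  const gates = authority[decision.by] ?? [];
  if (!gates.includes(run.gate)) {
    throw new Error(`${decision.by} cannot decide gate ${run.gate}`);
  }
  return decision.approve
    ? { state: "committed", action: run.action, decision }
    : { state: "rejected", action: run.action, decision };
}

const suspended = propose("on-call", "kubectl rollout undo deploy/checkout");
const resumed = decide(suspended, {
  by: "sre-team",
  approve: true,
  reason: "error rate 9.2% since the last deploy",
});
console.log(resumed.state); // "committed"
```

Because the suspended run is plain data, it can be serialized and stored while it waits, which is one way a gate could stay open for minutes or days. The decision and its reason travel with the run, so they can be written to the trail.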

agentist console — runs (demo · example data)

incident #4821

agent Cornelia — SRE on-call · suspended · approval
Cornelia — SRE on-call · proposes a destructive action
Proposed action
kubectl rollout undo deploy/checkout

Error rate is 9.2% since the last deploy and p95 is up 340ms. Rolling back restores the last good version (v411).

- image: checkout:v412
+ image: checkout:v411
also live in Slack #sre-oncall
gate .approval("on-call") · the run suspends durably — it resumes at the exact step
Nothing is committed without the decision — and the decision, with your reason, lands on the trail.

02 Coach · management

Correct it once — like a 1:1.

You can talk to your agents, and when one gets something wrong you don't rewrite a prompt — you correct the decision. The correction becomes a test case the agent is graded against on every version, or a note to its charter, the mission and values it works from.

Agents get better the way people do — through feedback that sticks, not folklore that drifts.

in the framework: charter · evals · versioned agents — coaching is recorded, not remembered
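
A minimal sketch of how a correction could become a test case, in plain TypeScript. It is not the Agentist API: `Correction`, `EvalCase`, `toEvalCase`, and `grade` are hypothetical names, the input string is illustrative, and the two stand-in agents are toy functions rather than real model calls.

```typescript
// Hypothetical sketch: a correction recorded as a regression case. Not the Agentist API.
type Severity = "sev1" | "sev2" | "sev3";

interface Correction {
  run: string;
  step: string;
  input: string;         // what the agent saw (illustrative)
  agentOutput: Severity; // what the agent decided
  corrected: Severity;   // what the reviewer says it should have decided
  note: string;
}

interface EvalCase {
  id: string;
  input: string;
  expected: Severity;
  rationale: string;
}

// The correction is stored as data, so later agent versions can be graded against it.
function toEvalCase(c: Correction): EvalCase {
  return { id: `${c.run}/${c.step}`, input: c.input, expected: c.corrected, rationale: c.note };
}

// Grade one agent version: count how many recorded cases it gets right.
function grade(
  agent: (input: string) => Severity,
  cases: EvalCase[],
): { passed: number; total: number } {
  const passed = cases.filter((k) => agent(k.input) === k.expected).length;
  return { passed, total: cases.length };
}

const evalSet: EvalCase[] = [
  toEvalCase({
    run: "incident #4820",
    step: "triage",
    input: "checkout p99 latency 5s",
    agentOutput: "sev3",
    corrected: "sev2",
    note: "p99 of 5s on checkout is customer-facing; page on-call.",
  }),
];

// Two toy agent versions: the first under-rates, the second reflects the correction.
const before = (_input: string): Severity => "sev3";
const after = (input: string): Severity => (input.includes("checkout") ? "sev2" : "sev3");

console.log(grade(before, evalSet)); // { passed: 0, total: 1 }
console.log(grade(after, evalSet));  // { passed: 1, total: 1 }
```

Keeping the rationale on the case means a failing grade explains itself: the reviewer's reasoning is attached to the expectation it produced.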

agentist console — coach (demo · example data)

Coach — Cornelia

run incident #4820 · step triage · eval set: 37 cases
Agent output
severity sev3

“Elevated latency — monitor.”

Your correction
severity sev2

“p99 of 5s on checkout is customer-facing — page on-call.”

corrections are recorded, not remembered

03 Audit · oversight

Every action, one trail.

Every step, every model call, every approval and denial lands on a single append-only journal — gapless and tamper-evident. Review across runs to judge whether the approvals were right, where to tighten, and where to step back.

Replay any captured call exactly — same input, same model, same route — and diff it against what happened. Oversight you can check, not vibes.

in the framework: event-sourced journal — the audit trail and the runtime state are the same record
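
One common way to make an append-only log gapless and tamper-evident is a hash chain, where each entry stores the hash of the one before it. The source does not say how Agentist's journal is implemented, so the following is a sketch under that assumption. `Entry`, `append`, and `verify` are hypothetical names, and a small non-cryptographic hash (FNV-1a) stands in for a cryptographic one such as SHA-256 only to keep the example dependency-free.

```typescript
// Hypothetical sketch of a gapless, tamper-evident journal; not Agentist's storage format.
interface Entry {
  seq: number;
  agent: string;
  action: string;
  verdict: string;
  prev: string; // hash of the previous entry
  hash: string; // hash over this entry's fields plus prev
}

// FNV-1a (32-bit). A stand-in only: a real journal would use a cryptographic hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

function digest(e: Omit<Entry, "hash">): string {
  return fnv1a([e.seq, e.agent, e.action, e.verdict, e.prev].join("|"));
}

// Append-only: the single write operation adds one entry at the next sequence number.
function append(journal: Entry[], agent: string, action: string, verdict: string): Entry[] {
  const last = journal[journal.length - 1];
  const body = {
    seq: last ? last.seq + 1 : 1,
    agent,
    action,
    verdict,
    prev: last ? last.hash : "genesis",
  };
  return [...journal, { ...body, hash: digest(body) }];
}

// Returns null if the journal is intact, else the seq of the first entry that
// is out of sequence (a gap) or whose hash no longer matches (tampering).
function verify(journal: Entry[]): number | null {
  let prev = "genesis";
  let seq = 0;
  for (const e of journal) {
    seq += 1;
    if (e.seq !== seq || e.prev !== prev || e.hash !== digest(e)) return e.seq;
    prev = e.hash;
  }
  return null;
}

let journal: Entry[] = [];
journal = append(journal, "Cornelia", "model call: triage", "admitted");
journal = append(journal, "Cornelia", "data: metrics.query", "redacted PII");
journal = append(journal, "Cornelia", "resume: remediate", "approved");
console.log(verify(journal)); // null: intact

// Editing a past entry breaks the chain at that entry.
const forged = journal.map((e) => (e.seq === 2 ? { ...e, verdict: "admitted" } : e));
console.log(verify(forged)); // 2
```

In this sketch a deleted entry is caught by the sequence check and an edited one by the hash check, which is the sense in which a journal can be verified as gapless and untampered rather than assumed to be.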

agentist console — audit (demo · example data)

Audit

append-only journal · tamper-evident · seq 1100–1184
seq | time | agent | action | verdict
1184 | 14:02:14 | Cornelia | resume: remediate | approved by you
1182 | 14:02:11 | Cornelia | model call: diagnose | admitted
1181 | 14:02:09 | Cornelia | data: metrics.query | redacted PII
1180 | 14:02:08 | Cornelia | model call: triage | admitted
1176 | 14:01:40 | Vega | step: scale 50 → 500 | denied · budget
gapless ✓ · no gaps, no tamper
re-issues the same input, model & route · diffs vs captured
Approve · control
Audit · oversight
Coach · management
Policy · what you've learned
↺ policy narrows what ever needs your sign-off — you step back deliberately, not hopefully

Three habits compose into a management loop. Decisions become judgment, judgment becomes policy — and the chance that an agent ever acts on a hallucination stops being something you hope against and becomes something you've engineered out.

Behind the loop

The loop runs on a deeper harness.

Approve, coach, and audit are what everyone sees. Underneath, the same structure does the unglamorous work — this is the engineering, and the docs go all the way down.

This is the how — capability first, framework as the way there. Read the docs →

Engineered, not instant

A harness is built. We make it the shortest path.

No harness is automatic — and none should be scary. There are two ways to get yours.

With your engineers

The framework

Open-source and TypeScript-native. Your team defines agents as typed code; the runtime runs them durably in your own cloud, with the loop — approvals, coaching, audit — built in from the first line.

src/incident.ts
// destructive steps wait for a human — one line
export const incident = workflow("incident")
  .step("triage", oncall.duties.triage)
  .approval("on-call")      // ← the loop, in the path
  .step("remediate", oncall.duties.remediate)
  .commit()
$ npx agentist init · coming soon
Quickstart →

With ours

Built alongside you

Harness engineering is what we do. For teams that want the loop around their agents without standing up the practice first, we build the harness with you — your agents, your cloud, engineered together.

We're onboarding a small group of design partners now. No packages or published pricing yet — this is a direction we're building openly, and it starts with a conversation.

Build with us · contact@agentist.dev

Early access

Be first to put your agents under management.

Agentist is in active development. Reach out and we'll be in touch the moment you can get in.