Agentist
$ npx agentist init
Open-source · Self-hosted · Governed by default

Build, run & govern agents in production.

The agentic framework your platform engineers are missing.

Define agents as typed code. Our runtime runs them durably in your own cloud — with governance, audit, memory, and human approvals built in. Agentic for the intelligent enterprise.

  • Typed agents & workflows — no autonomous loops
  • Independent critics catch bad reasoning — no self-grading
  • Governance, audit & human approvals are built in — day one
  • Self-host on any cloud — Claude, GPT, or open models
src/incident.ts
import { agent, workflow, runtime } from "@agentist/sdk"
import { anthropic } from "@ai-sdk/anthropic"
import { triage, remediate } from "./duties"

// an agent is an identity with duties
const oncall = agent({
  id: "oncall",
  identity: { name: "Otto", role: "On-call SRE agent" },
  duties: { triage, remediate },
})

// destructive steps wait for a human
export const incident = workflow("incident")
  .step("triage", oncall.duties.triage)
  .approval("on-call")
  .step("remediate", oncall.duties.remediate)
  .commit()

// your runtime — your data, your keys, your cloud
const rt = runtime({
  store: process.env.DATABASE_URL,
  llm:   anthropic("claude-sonnet-4"),
})
rt.register(oncall, incident).serve()
Self-hosted in your cloud · connect Agentist Cloud anytime
Architecture

Everything an agent needs to run

You author agents with the SDK. An Engine runs their reasoning and is pluggable: ours, or ADK / LangGraph. The Runtime is the harness that governs and persists every step; the Console is where you operate agents; the Data plane gives governed access to your data lakes; and the AI Cloud is the infrastructure it all runs in, which is your own sovereign cloud.

☁ Your AI Cloud: your infrastructure · multi-cloud · sovereign
SDK: author · typed code
Runtime: govern · durable · hosts the engine
Console: operate · observe · govern
Engine: reason · pluggable (Agentist · ADK · LangGraph)
Data plane: secure gateways → your data lakes
↕ Optionally connect to Agentist Cloud for a managed control plane, connectors & burst.

The runtime doesn't think — it governs, persists, and operates whatever does. Bring your own engine or use ours; either way every model, data, and tool call crosses the runtime's gateways (admission, budgets, audit), runs durably, and lands in one console — a complete harness around the model, composed and deployed as TypeScript infrastructure-as-code.

Harness engineering

You're building a harness — do it right

Putting an LLM in production isn't prompt-writing — it's harness engineering: building the structure, governance, memory, and observability around the model that make it safe. That's the platform engineer's job, and Agentist is how you do it in code you own — not a pile of prompt hacks that drift.

Structure: typed workflows, not an open-ended loop
Governance: policy gates & human approvals, in the path
Memory: durable state & scoped recall
Gateways: governed access to models & data
Observability: every call traced, audited, replayable
Connectors: the outside world, typed & permissioned

Don't start from scratch: extend a base harness — on-call SRE, support, data analyst — with your own identity, charter, skills, and policies. Harness engineering by composition.

The model reasons; the harness keeps it bounded, observable, and accountable — the line between a demo and production, and between writing prompts and engineering a system.

Get started

Quickstart

From zero to a triaged alert in about ten minutes.

1 · Create a project

bash
npx agentist init oncall
cd oncall

2 · Run it locally

agentist dev starts a local runtime (in-memory) and a console at localhost:3000.

bash
agentist dev
agentist run oncall.triage "checkout p99 latency is 5s"   # positional prompt — no flag
json
{ "severity": "sev2" }

One typed contract, every caller

Every agent has a typed input schema. The runtime validates every invocation against it — from code, an event, or a person — so nobody hand-writes JSON.

API & SDK: post the typed shape
Webhooks & events: structured payloads, mapped
Console form: auto-generated from the schema

The Console renders a form straight from the schema, so an operator can trigger a run without touching code.

SDK · @agentist/sdk

The authoring layer

Define agents as typed TypeScript. The SDK compiles your code into a manifest the engine executes and the runtime governs — three distinct layers, one typed source.

Typed primitives: agent · duty · skill · tool · workflow · policy · charter, few enough to hold in your head
Schemas everywhere: Zod on every boundary; outputs validated, auto re-prompt on mismatch
Identity & culture: agents have a persona and inherit your mission & values
Skills & base harnesses: reusable bundles of tools & duties; extend a preset (SRE, support, data)
Generator ≠ evaluator: critics are separate agents, so there is no self-grading
Versioned: semver-pinned agents; manifests are rollback-ready

Each gets its own section below — Agents, Tooling, Workflows, Governance, Approvals, Conversations.
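The "auto re-prompt on mismatch" behavior listed above can be pictured as a validate-and-retry loop. The following is a dependency-free sketch, not the SDK's implementation: `callModel` is a hypothetical stand-in for a model call, and the hand-written `parseTriage` plays the role a Zod schema would.

```typescript
// Hypothetical stand-in for a model call: takes a prompt, returns raw text.
type ModelCall = (prompt: string) => string;

interface Triage {
  severity: "sev1" | "sev2" | "sev3";
}

// Hand-written validator standing in for a Zod schema: returns the typed
// value, or an error message that can be fed back to the model.
function parseTriage(raw: string): { ok: true; value: Triage } | { ok: false; error: string } {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output was not valid JSON" };
  }
  const sev = (data as { severity?: unknown } | null)?.severity;
  if (sev === "sev1" || sev === "sev2" || sev === "sev3") {
    return { ok: true, value: { severity: sev } };
  }
  return { ok: false, error: 'severity must be "sev1", "sev2", or "sev3"' };
}

// Validate the output; on a mismatch, re-prompt with the error appended,
// up to maxAttempts, then fail loudly instead of passing bad data on.
function runTyped(callModel: ModelCall, prompt: string, maxAttempts = 3): Triage {
  let current = prompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = parseTriage(callModel(current));
    if (result.ok) return result.value;
    current = `${prompt}\n\nYour previous answer was rejected: ${result.error}. Reply with JSON only.`;
  }
  throw new Error(`no valid output after ${maxAttempts} attempts`);
}
```

Feeding the validation error back into the prompt gives the model a concrete reason for the rejection; capping attempts keeps a persistently malformed answer from looping forever.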

SDK

Agents have an identity — and a culture

An agent isn't a pile of rules. It has an identity — who it is — and inherits your company's charter: the mission and values every agent shares. That's what gives it judgment and character, not rigidity.

typescript
// your company's charter — defined here, or pulled from Agentist Cloud
const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Bias to safety", "Explain your reasoning", "Escalate when unsure"],
})

const oncall = agent({
  extends: harness("on-call"),            // a base harness — extend, don't rebuild
  id: "oncall",
  identity: { name: "Otto", role: "On-call SRE", persona: "Calm, terse, evidence-driven." },
  charter: acme,                          // shared mission & culture
  skills: [k8s, prometheus, runbooks],    // reusable capabilities
  duties: { triage, remediate },
})
  • Identity — name, role, persona. The agent is someone, not a tone.
  • Charter — your mission and values, inherited by every agent. Define it in code, or manage it centrally in Agentist Cloud and pull it in — one mission across every team.
  • Duties — each is one typed call. Culture gives judgment; policies give hard guardrails. Character and control.
  • Skills — reusable capabilities (tools + duties) you compose in. Extend a base harness (SRE, support, data) instead of starting from zero.
SDK

Tooling

Typed functions an agent may call. The runtime runs them — the agent never touches a path or a raw query.

typescript
const queryMetrics = tool({
  id: "queryMetrics",
  input:  z.object({ service: z.string(), window: z.string() }),
  output: z.object({ p99Ms: z.number(), errorRate: z.number() }),
  run: ({ service, window }) => prometheus.query(service, window),
})

// inside a duty:
const m = await ctx.tools.queryMetrics({ service: a.service, window: "15m" })
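One way to read "the runtime runs them": a tool is a bundle of an input check, an output check, and a function, and only the runtime's `invoke` ever calls `run`. This sketch avoids dependencies, so small hand-written checks (`str`, `num`) stand in for the Zod schemas above; `invoke` and the stubbed metrics are hypothetical.

```typescript
// A check either returns the typed value or throws.
type Check<T> = (value: unknown) => T;

interface Tool<I, O> {
  id: string;
  input: Check<I>;
  output: Check<O>;
  run: (args: I) => O;
}

function field(v: unknown, key: string): unknown {
  return typeof v === "object" && v !== null ? (v as Record<string, unknown>)[key] : undefined;
}
function str(v: unknown, key: string): string {
  const x = field(v, key);
  if (typeof x !== "string") throw new TypeError(`${key} must be a string`);
  return x;
}
function num(v: unknown, key: string): number {
  const x = field(v, key);
  if (typeof x !== "number") throw new TypeError(`${key} must be a number`);
  return x;
}

// Hypothetical runtime entry point: validation happens on both sides of run().
function invoke<I, O>(tool: Tool<I, O>, rawArgs: unknown): O {
  const args = tool.input(rawArgs);          // reject malformed calls
  return tool.output(tool.run(args));        // reject malformed results
}

const queryMetrics: Tool<{ service: string; window: string }, { p99Ms: number; errorRate: number }> = {
  id: "queryMetrics",
  input: (v) => ({ service: str(v, "service"), window: str(v, "window") }),
  output: (v) => ({ p99Ms: num(v, "p99Ms"), errorRate: num(v, "errorRate") }),
  // Stubbed backend with made-up numbers; a real tool would query Prometheus.
  run: ({ service }) => ({ p99Ms: service === "checkout" ? 5000 : 120, errorRate: 0.01 }),
};
```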
SDK

Workflows

Compose duties into a typed graph. The control flow is explicit — not hidden inside the model.

typescript
export const incident = workflow("incident")
  .input(Alert)
  .step("triage", oncall.duties.triage)
  .delegate("findings", (alert) =>
    ["app","db","network"].map((area) => investigate.duties.scan.with({ alert, area })),
  )                                   // parallel sub-agents — durable & governed
  .step("diagnose", oncall.duties.diagnose, (s) => ({ findings: s.findings }))
  .approval("on-call")
  .step("remediate", oncall.duties.remediate)
  .commit()
  • .step · .branch on a typed field · .delegate to sub-agents in parallel · .approval for a human · .critic to evaluate.
  • .delegate runs sub-agents concurrently — each a durable, admission-checked step, gathered under a shared budget and one trace.
  • .commit() freezes the graph into the manifest the engine runs and the runtime governs.
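To see why an explicit graph keeps control flow out of the model, here is a deliberately tiny, hypothetical builder: `.step` appends a named async function, `.delegate` fans out with `Promise.all`, and each result lands in a shared state object under its step name. It is a sketch of the idea only; durability, approvals, critics, and governance are what the runtime adds on top.

```typescript
type State = Record<string, unknown>;

type WfNode =
  | { kind: "step"; name: string; run: (s: State) => Promise<unknown> }
  | { kind: "delegate"; name: string; fanOut: (s: State) => Promise<unknown>[] };

// The graph is plain data, so the order of work is fixed before anything runs.
class Workflow {
  readonly id: string;
  private readonly nodes: WfNode[] = [];
  constructor(id: string) { this.id = id; }

  step(name: string, run: (s: State) => Promise<unknown>): this {
    this.nodes.push({ kind: "step", name, run });
    return this;
  }

  // Parallel sub-tasks; their results are gathered, in order, under `name`.
  delegate(name: string, fanOut: (s: State) => Promise<unknown>[]): this {
    this.nodes.push({ kind: "delegate", name, fanOut });
    return this;
  }

  async execute(input: unknown): Promise<State> {
    const state: State = { input };
    for (const node of this.nodes) {
      state[node.name] =
        node.kind === "step" ? await node.run(state) : await Promise.all(node.fanOut(state));
    }
    return state;
  }
}
```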
SDK

Governance

Governance is enforced with policies — hard gates the runtime checks before every step. Agents propose, the runtime commits, and every decision is recorded.

typescript
const remediationGate = policy({
  id: "remediation-needs-approval",
  applies: { duty: "oncall.remediate" },
  check: ({ state }) =>
    state.remediate.destructive
      ? { effect: "approval", reason: "Destructive remediation needs approval" }
      : { effect: "allow" },
})

runtime({ policies: [remediationGate] })   // applies to every agent — no one can forget it
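When more than one policy applies to a step, their verdicts have to be combined. The sketch below assumes a most-restrictive-wins rule (deny over approval over allow); that rule, the `admit` function, and the `Policy` shape (modeled on `remediationGate` above) are illustrative assumptions, not documented Agentist behavior.

```typescript
type Effect = "allow" | "approval" | "deny";
interface Verdict { effect: Effect; reason?: string }

interface Policy {
  id: string;
  applies: { duty: string };
  check: (ctx: { state: Record<string, any> }) => Verdict;
}

const rank: Record<Effect, number> = { allow: 0, approval: 1, deny: 2 };

// Evaluate every policy that applies to the duty and keep the most
// restrictive verdict, recording which policy produced it.
function admit(policies: Policy[], duty: string, state: Record<string, any>): Verdict & { policy?: string } {
  let result: Verdict & { policy?: string } = { effect: "allow" };
  for (const p of policies) {
    if (p.applies.duty !== duty) continue;
    const v = p.check({ state });
    if (rank[v.effect] > rank[result.effect]) result = { ...v, policy: p.id };
  }
  return result;
}

const remediationGate: Policy = {
  id: "remediation-needs-approval",
  applies: { duty: "oncall.remediate" },
  check: ({ state }) =>
    state.remediate?.destructive
      ? { effect: "approval", reason: "Destructive remediation needs approval" }
      : { effect: "allow" },
};
```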
SDK

Approvals

An approval pauses a run until a human signs off. The workflow suspends durably, waits in the Console, and resumes exactly where it paused — even days later.

typescript
workflow("remediate")
  .step("plan", oncall.duties.plan)
  .approval("on-call")                     // pauses → waits in the approvals inbox
  .step("apply", oncall.duties.apply)
  .commit()

await rt.resume(runId, "on-call", { by: "kc@kc.io", decision: "approve" })

Approvals don't have to live in the Console — route one to Slack, Jira, GitHub, or email through connectors, wherever your on-call already is. Same audit trail, anywhere.
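Underneath, a durable approval is a run record that stops at a gate and remembers where it stopped. A minimal sketch, assuming an in-memory `Map` in place of the durable store and hypothetical `advance`/`resume` functions (the real call is `rt.resume` as shown above): the run executes until it reaches an approval, records its position, and a later `resume` from the matching role continues from exactly that point.

```typescript
type Stage = { kind: "step"; name: string } | { kind: "approval"; role: string };

interface Run {
  id: string;
  stages: Stage[];
  position: number;                       // index of the next stage to execute
  status: "running" | "awaiting_approval" | "completed" | "rejected";
  log: string[];                          // what happened, in order
}

const runs = new Map<string, Run>();      // stand-in for the durable store

function advance(run: Run): Run {
  while (run.position < run.stages.length) {
    const stage = run.stages[run.position];
    if (stage.kind === "approval") {
      run.status = "awaiting_approval";   // suspend: nothing runs until resume()
      runs.set(run.id, run);
      return run;
    }
    run.log.push(`ran ${stage.name}`);
    run.position++;
  }
  run.status = "completed";
  runs.set(run.id, run);
  return run;
}

function resume(runId: string, role: string, by: string, decision: "approve" | "reject"): Run {
  const run = runs.get(runId);
  if (!run || run.status !== "awaiting_approval") throw new Error(`run ${runId} is not awaiting approval`);
  const gate = run.stages[run.position];
  if (gate.kind !== "approval" || gate.role !== role) throw new Error(`run ${runId} is not waiting on ${role}`);
  run.log.push(`${decision} by ${by}`);
  if (decision === "reject") {
    run.status = "rejected";
    runs.set(run.id, run);
    return run;
  }
  run.position++;                         // step past the gate and continue
  run.status = "running";
  return advance(run);
}
```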

SDK

Agents you can talk to

Your agents are always on and addressable. You talk to them; they talk back; and when something needs doing, they can start the conversation. But this isn't a chatbot bolted on — the conversation runs on the same governed, durable, audited substrate as everything else. The chat is the control plane.

typescript
const otto = rt.agent("oncall")   // every agent is addressable, always on

// talk to it — its typed duties run underneath the conversation
await otto.say("why is checkout slow right now?")

// it can open the thread itself — proactive, on its own triggers
await otto.notify("on-call", "p99 is climbing on checkout — I'd roll back 4f2a.")

// a destructive step surfaces as an approval *in the same thread*,
// admission-checked like any other step — reply to resume the run

Because a run is already an event-sourced thread and an agent already has an identity, "talk to your agent" and "continue this run" are the same thing. The approval, the chat, and the audit entry are one journaled trail — not three separate surfaces.

Always on: every agent is addressable the moment it's registered, and any past or running execution resumes as a conversation
Proactive: agents open the thread when something needs you, on a schedule, an event, or a budget breach
Governed conversation: every turn passes admission control; an agent chats freely but cannot act destructively without policy clearing it
Conversation = approval = audit: a gated step appears inline; reply to resume, and the chat, the approval, and the audit are one journaled trail
It's someone: consistent identity, persona & charter, an agent with character anywhere it runs (Console, Slack, your app)
You ↔ Otto: always-on, addressable
Destructive step: admission check
Approve in thread: reply "go"
Resume · audited: one journaled trail
Engine · @agentist/engine

The reasoning engine — pluggable

The engine is the only layer that thinks: it runs the reason → act → observe loop and multi-agent orchestration. Everything around it — authoring, governance, durability, operations — is the runtime. The engine plugs in behind one contract — it proposes a step; the runtime commits it — so whether you run the Agentist engine or bring ADK, every reasoning step, tool call, and delegation is governed the same way.

reason: plan next step
admit: policy · budget · types
act: tool · model · delegate
observe: journal result
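The four stages read as one loop. The sketch below illustrates the propose/commit split under stated assumptions; it is not the engine's code. `propose` stands in for the model choosing the next action, `admit` for the runtime's policy, budget, and type checks, and `act` for execution; every decision, including a refusal, is journaled.

```typescript
type Action = { tool: string; args: unknown } | { done: true; answer: string };

interface JournalEntry { step: number; action: Action; verdict: "admitted" | "denied"; result?: unknown }

interface Loop {
  propose: (history: JournalEntry[]) => Action;   // reason: the engine plans
  admit: (action: Action) => boolean;              // admit: policy · budget · types
  act: (tool: string, args: unknown) => unknown;   // act: the runtime executes
}

function runLoop(loop: Loop, maxSteps = 10): { answer?: string; journal: JournalEntry[] } {
  const journal: JournalEntry[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = loop.propose(journal);
    if ("done" in action) return { answer: action.answer, journal };
    if (!loop.admit(action)) {
      journal.push({ step, action, verdict: "denied" });            // refusals are journaled too
      continue;
    }
    const result = loop.act(action.tool, action.args);
    journal.push({ step, action, verdict: "admitted", result });     // observe: journal the result
  }
  return { journal };
}
```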
Engine

Reasoning steps

Each turn of the loop is typed and governed: the engine proposes a step, the runtime admits it, the result is journaled. The whole reasoning trace is replayable — you see exactly what the agent thought and did.

reason   p99 on checkout is 5s; check the last deploy
act      k8s.deploys.recent()   ✓ admitted
observe  { deploy: v412, at: 14:02 }
reason   v412 correlates; propose a rollback to v411
act      remediate(v411)   ⏸ approval · on-call
  • Typed steps — each step has typed inputs and outputs, not free text.
  • Governed per step — admitted before it runs; budget enforced as it goes.
  • Replayable — the full trace is journaled; re-issue any captured step.
Engine

Delegate to sub-agents

The engine hands work to sub-agents in parallel and gathers typed results. Each branch is a durable, admission-checked step under a shared budget — in one trace.

typescript
const [diagnosis, signals, logFindings] = await ctx.delegate([
  k8s.duties.diagnose(a),
  metrics.duties.analyze(a),
  logs.duties.scan(a),
])   // parallel sub-agents — durable, governed, gathered
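A shared budget across parallel branches can be modeled as one pool that every branch must draw from before it starts. In this sketch `SharedBudget`, `delegateAll`, and the per-branch costs are hypothetical, and each branch declares its cost up front instead of having tokens measured.

```typescript
class SharedBudget {
  private remaining: number;
  constructor(tokens: number) { this.remaining = tokens; }

  // Reserve tokens before spending them; refuse if the pool would go negative.
  reserve(tokens: number, who: string): void {
    if (tokens > this.remaining) {
      throw new Error(`${who}: budget exceeded (${tokens} requested, ${this.remaining} left)`);
    }
    this.remaining -= tokens;
  }

  get left(): number { return this.remaining; }
}

interface Branch<T> { name: string; cost: number; run: () => Promise<T> }

// Run all branches concurrently against one budget. Promise.allSettled keeps
// one failed branch from discarding the others' results.
async function delegateAll<T>(budget: SharedBudget, branches: Branch<T>[]) {
  return Promise.allSettled(
    branches.map(async (b) => {
      budget.reserve(b.cost, b.name);
      return { name: b.name, value: await b.run() };
    }),
  );
}
```

Because each async callback reserves synchronously before its first `await`, reservations happen in array order, so which branch gets refused is deterministic.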
Engine

Tool use, governed

The engine selects and calls tools — typed functions and MCP servers alike. Every call crosses the gateway, so it's validated, budgeted, and audited; the agent never touches a credential or a path.

typescript
const recentDeploys = tool({
  input: z.object({ ns: z.string() }),
  run: (a) => kube.deploys(a.ns),     // server-side, validated I/O
})
// the engine calls it; the runtime admits & journals every invocation
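Keeping credentials out of agent reach implies a particular split of responsibilities: the agent supplies a tool name and arguments, and the gateway looks up the secret, runs the call, and writes the audit record. This is a sketch of that split only; `Gateway`, `registerTool`, and the secret names are hypothetical, and validation and budgeting are left out.

```typescript
interface AuditRecord { tool: string; args: unknown; ok: boolean; at: number }

type ToolImpl = (args: Record<string, unknown>, secret: string) => unknown;

class Gateway {
  private readonly tools = new Map<string, { impl: ToolImpl; secretName: string }>();
  private readonly secrets: Map<string, string>;
  readonly audit: AuditRecord[] = [];

  constructor(secrets: Map<string, string>) { this.secrets = secrets; }

  registerTool(name: string, secretName: string, impl: ToolImpl): void {
    this.tools.set(name, { impl, secretName });
  }

  // The only entry point an agent gets: a tool name and arguments, never a key.
  call(name: string, args: Record<string, unknown>): unknown {
    const entry = this.tools.get(name);
    const secret = entry ? this.secrets.get(entry.secretName) : undefined;
    if (!entry || secret === undefined) {
      this.audit.push({ tool: name, args, ok: false, at: Date.now() });
      throw new Error(`tool ${name} is not available`);
    }
    try {
      const result = entry.impl(args, secret);   // the secret is injected here
      this.audit.push({ tool: name, args, ok: true, at: Date.now() });
      return result;
    } catch (err) {
      this.audit.push({ tool: name, args, ok: false, at: Date.now() });
      throw err;
    }
  }
}
```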
Engine

Context engineering

The hardest part of any agent loop is deciding what the model reasons over. The engine gives you tools to shape the working window — pin facts, pull in recall, compact old steps, drop noise — all budget-aware. (Distinct from durable state and long-term recall.)

typescript
// shape what the model reasons over, mid-loop
ctx.context.pin(runbook)                                   // keep for the whole run
const hits = await ctx.context.recall("past p99 incidents", { topK: 5 })
ctx.context.compact({ olderThan: 12 })                    // summarize old steps
ctx.context.drop("raw.logs")                              // evict noisy output

Or declare a context policy on the engine, applied every step:

typescript
agentist({
  context: {
    budget:  "32k",                 // token budget for the window
    include: [charter, runbooks],   // always in context
    compact: "summarize",           // when it fills, summarize the oldest
  },
})
  • Pin & include — keep the charter, runbooks, or key facts in the window for the whole run.
  • Recall into context — pull only the relevant, policy-checked slices from the data plane.
  • Compact & drop — summarize old steps and evict noisy tool output to stay in budget.
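A budget-aware window can be pictured as a list of items in which pinned items are never evicted and the oldest unpinned items are folded away once the total exceeds the budget. Assumptions: tokens are approximated as characters divided by four, and compaction here simply evicts and counts items, leaving a marker where an engine would insert a model-written summary. `ContextWindow` is a hypothetical name.

```typescript
interface CtxItem { id: string; text: string; pinned: boolean }

// Rough heuristic (about four characters per token); a real engine would
// count with the model's tokenizer.
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

class ContextWindow {
  items: CtxItem[] = [];
  compacted = 0;                        // how many items have been folded away
  private readonly budget: number;
  constructor(budget: number) { this.budget = budget; }

  add(id: string, text: string, pinned = false): void {
    this.items.push({ id, text, pinned });
    this.fit();
  }

  drop(id: string): void {
    this.items = this.items.filter((i) => i.id !== id);
  }

  total(): number {
    return this.items.reduce((n, i) => n + approxTokens(i.text), 0);
  }

  // Evict the oldest unpinned items until the window fits the budget,
  // or until only pinned items remain.
  private fit(): void {
    while (this.total() > this.budget) {
      const idx = this.items.findIndex((i) => !i.pinned);
      if (idx === -1) break;
      this.items.splice(idx, 1);
      this.compacted++;
    }
  }

  // What the model would see: a placeholder where a summary would go
  // (not counted against the budget here), then the live items.
  render(): string[] {
    const marker = this.compacted > 0 ? [`[${this.compacted} earlier step(s) summarized]`] : [];
    return [...marker, ...this.items.map((i) => i.text)];
  }
}
```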
Engine

Bring a proven engine

An adapter conforms an existing framework to the engine port — so it runs inside the Agentist harness, in your cloud, durable and governed, without rewriting your agents.

typescript
import { runtime } from "@agentist/sdk"
import { adk } from "@agentist/engine-adk"   // also: langgraph · autogen · crewai

runtime({ engine: adk() })   // ADK reasons; the runtime governs every call
  • Conforms to the port — propose→commit; the harness, console, and contracts don't change.
  • Durable at the boundary — each call is a journaled step (the native engine adds mid-loop durability).
  • ADK · LangGraph · AutoGen · CrewAI — swap without touching governance.
Engine

The Agentist engine

The first-party engine goes where a wrapper can't — because it owns the loop, governance and durability reach inside the reasoning, not only the egress. It's the default.

typescript
import { runtime } from "@agentist/sdk"
import { agentist } from "@agentist/engine"

runtime({ engine: agentist() })   // typed reasoning, governed end to end
  • Admission at the reasoning step — shape what the model may propose, not only what it runs.
  • Mid-loop durable execution — event-source every step; resume mid-thought.
  • Typed end to end — the reasoning graph is typed; delegate is a primitive.
  • Charter-native judgment — your mission & values govern the reasoning as it happens.
Engine

MCP & A2A

Whichever engine runs, your agents stay reachable and composable: expose them as MCP tools and A2A servers, and consume external MCP tools and A2A agents — all through the governed gateway.

typescript
rt.mcp({ port: 8080 })                     // every agent → an MCP tool
oncall.a2a()                               // expose as an A2A server (Agent Card)
const peer = a2a("https://acme.dev/sre")   // consume an external A2A agent
Runtime · @agentist/runtime

The harness — it governs, it doesn't think

The runtime is the harness, not the brain: it hosts the engine behind one contract — the engine proposes, the runtime commits — and runs every step as a durable state machine in your own cloud, committing only if policy, budget, and types allow. Swap the engine; the harness is unchanged.

Every run is a durable state machine

Triggers: cron · event · webhook · API · MCP
→ admit: idempotency-keyed · dedup
→ durable queue: backlog · worker leases
→ per step: admit → gateway → journal
→ complete

A trigger admits a run under an idempotency key; a worker leases each step, runs it through governance and the gateway, and journals the result before moving on. Crash mid-run — a deploy, an OOM, a lost node — and it resumes from the last journaled step. A step can also suspend for days awaiting an approval, a timer, or a signal, then resume at the exact point. Exactly once, every time.
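The worker lease is what lets another worker pick up a step after a crash: a lease that is not completed before it expires can be taken over, and the stale holder can no longer commit. A minimal in-memory sketch with an explicit clock; `LeaseQueue` and the 30-second lease length are assumptions, not the runtime's actual design or defaults.

```typescript
interface Lease { stepId: string; worker: string; expiresAt: number }

class LeaseQueue {
  private readonly pending: string[];
  private readonly leases = new Map<string, Lease>();
  private readonly leaseMs: number;

  constructor(stepIds: string[], leaseMs = 30_000) {
    this.pending = [...stepIds];
    this.leaseMs = leaseMs;
  }

  // Hand out the first pending step that is unleased or whose lease expired.
  acquire(worker: string, now: number): Lease | undefined {
    for (const stepId of this.pending) {
      const current = this.leases.get(stepId);
      if (!current || current.expiresAt <= now) {
        const lease: Lease = { stepId, worker, expiresAt: now + this.leaseMs };
        this.leases.set(stepId, lease);
        return lease;
      }
    }
    return undefined;
  }

  // Only the live holder may complete a step. A worker whose lease expired
  // and was taken over is refused, so the step is committed once.
  complete(lease: Lease, now: number): boolean {
    const current = this.leases.get(lease.stepId);
    if (!current || current.worker !== lease.worker || current.expiresAt <= now) return false;
    this.pending.splice(this.pending.indexOf(lease.stepId), 1);
    this.leases.delete(lease.stepId);
    return true;
  }
}
```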

Configure once — it applies to every agent

typescript
const rt = runtime({
  store:     postgres(process.env.DATABASE_URL),  // durable, resumable state
  llm:       anthropic("claude-sonnet"),          // default model
  policies:  [remediationGate],                   // admission control — runs before every step
  memory:    pgvector(),                           // scoped, per-tenant recall
  isolation: "container",                          // sandbox each agent
  audit:     true,                                 // OpenTelemetry + replayable log
})

rt.register(oncall, incident)
rt.serve()   // exposes the API, MCP, and the scheduler

Each concern below — durability, admission control, isolation, scale, observability, memory, triggers — is configured once here and enforced uniformly across every agent you register.

Runtime · Local

Local Development

One command. In-memory runtime, local console, instant feedback — then deploy the same manifest to your cloud.

bash
agentist dev          # runtime + console at localhost:3000
Runtime

Durability & recovery

Every run is event-sourced: each step's result is appended to a journal in Postgres before the next step begins. State lives in the database you already run — there's no separate workflow cluster to operate.

typescript
const rt = runtime({ store: postgres(process.env.DATABASE_URL) })
// crash mid-run? it resumes from the last journaled step —
await rt.resume(runId)        // exactly once, on any worker
Event-sourced execution: every step appended to an ordered journal; a run's state is the replay of its events, not in-memory hope
Crash-resume: deploy, OOM, or a lost node; the run resumes from the last journaled step on any worker, no work redone
Exactly-once: idempotency keys on every trigger and step de-dupe redeliveries and retries, so an agent acts once, never twice
Suspend & resume: park a run for days on an approval, timer, or external signal; it survives restarts and resumes at the exact point
Retries & dead-letter: per-step retry policy with backoff; exhausted steps land in a dead-letter queue for inspection & replay
Storage adapters: Postgres primary; SQLite & Redis for dev and queues; object storage for large artifacts
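Crash-resume follows from one rule: a step's result is appended to the journal before the next step starts, so after a restart any step with a journaled result is skipped and its recorded output reused. This sketch assumes an in-memory array in place of the Postgres journal and a hypothetical `runWithJournal` helper with a simulated crash.

```typescript
interface Journaled { runId: string; step: string; output: unknown }

type StepFn = (prev: unknown) => unknown;

// Execute steps in order, skipping any whose result is already journaled.
// `executed` lists real executions, so a resume can be shown to redo nothing.
function runWithJournal(
  runId: string,
  steps: Array<[string, StepFn]>,
  journal: Journaled[],
  crashAfter?: string,
): { output: unknown; executed: string[] } {
  let prev: unknown = undefined;
  const executed: string[] = [];
  for (const [name, fn] of steps) {
    const done = journal.find((e) => e.runId === runId && e.step === name);
    if (done) { prev = done.output; continue; }          // replay, don't re-run
    prev = fn(prev);
    journal.push({ runId, step: name, output: prev });   // append before moving on
    executed.push(name);
    if (name === crashAfter) throw new Error(`simulated crash after ${name}`);
  }
  return { output: prev, executed };
}
```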
Runtime

Admission control

Most frameworks watch an agent act and escalate afterward. Agentist runs policy as an admission check before every step and state change — the agent proposes an action, and the runtime commits it only if policy, budget, and types allow. Preventive, not forensic.

typescript
const gate = policy({
  applies: { duty: "oncall.remediate" },
  check: (a, ctx) => a.sev === "1"
    ? ctx.approval("on-call")   // pause for a human
    : ctx.allow(),              // otherwise admit
})
Propose → commit: every tool call, model call, and state write is gated by policy before it runs; nothing side-effects without admission
Typed boundaries: Zod-validated I/O on every step; malformed output is rejected at the boundary, not discovered downstream
Budgets in-path: per-run caps on tokens & tool calls, enforced as the run executes; cut off before overspend, not after
Human approvals: .approval(role) suspends the run durably and routes to Slack, Jira, GitHub, or the Console
Verdicts journaled: every admission decision (admitted, denied, redacted, approved-by) is recorded for audit & replay
Invocation context: every call carries caller, tenant isolation key, trace id, and budget, so policy decides with full context
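Enforcing a budget in the path means the check runs before each call is allowed to spend, not in a report afterward. A sketch assuming per-run caps on tokens and tool calls and a caller-supplied token estimate; `RunBudget` and its `charge` method are hypothetical.

```typescript
interface Caps { tokens: number; toolCalls: number }

class RunBudget {
  private used: Caps = { tokens: 0, toolCalls: 0 };
  private readonly caps: Caps;
  constructor(caps: Caps) { this.caps = caps; }

  // Called before the call is made; throws instead of letting the run overspend.
  // Usage is only recorded when the call is admitted.
  charge(kind: "model" | "tool", estTokens: number): void {
    const next: Caps = {
      tokens: this.used.tokens + estTokens,
      toolCalls: this.used.toolCalls + (kind === "tool" ? 1 : 0),
    };
    if (next.tokens > this.caps.tokens) throw new Error("denied: token budget exceeded");
    if (next.toolCalls > this.caps.toolCalls) throw new Error("denied: tool-call budget exceeded");
    this.used = next;
  }

  get usage(): Caps { return { ...this.used }; }
}
```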
Runtime

Isolation & security

Agents are untrusted code by default. Each runs sandboxed, scoped to its tenant, with secrets and provider keys held by the runtime — never within agent reach.

typescript
runtime({ isolation: "microvm" })  // process · container · microvm
// per-agent sandbox; the runtime holds secrets, never the agent
Isolation tiers: process now, container for untrusted code, microVM on the horizon; pick the blast radius per agent
Tenant isolation: memory, state, and recall scoped per tenant by isolation key; one tenant never sees another's data
Secrets vault: credentials and provider keys held by the runtime; agents reference them, never read them
mTLS everywhere: service-to-service traffic mutually authenticated and encrypted inside your cluster
Egress control: agents reach the network only through the gateway; no unsanctioned outbound calls
Sovereign by default: the whole runtime runs in your boundary; agents and data never leave it
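Egress control comes down to a question the gateway asks for every outbound request: is this destination sanctioned for this agent? A minimal sketch using the standard `URL` parser; the per-agent allowlist and the HTTPS-only rule are illustrative assumptions, and a check like this only holds if the sandbox has no other network route out.

```typescript
// Hypothetical allowlist: agent id → hosts it may reach through the gateway.
const egressAllow: Record<string, string[]> = {
  oncall: ["prometheus.internal", "api.github.com"],
};

function parseTarget(target: string): URL | null {
  try {
    return new URL(target);
  } catch {
    return null;                                    // unparseable targets are refused
  }
}

function egressAllowed(agentId: string, target: string): boolean {
  const url = parseTarget(target);
  if (!url || url.protocol !== "https:") return false;   // plaintext egress refused
  const hosts = egressAllow[agentId] ?? [];
  return hosts.includes(url.hostname);                   // exact host match, no wildcards
}
```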
Runtime

Scaling & concurrency

Workers are stateless and the queue is durable, so throughput is just a function of how many workers you run. One pool runs both live agents and batch jobs; scale horizontally, autoscale on backlog, and keep tenants fair.

typescript
runtime({ workers: { min: 2, max: 50, scaleOn: "queueDepth" } })
// stateless workers lease a durable queue — add more to scale
Horizontal workers: stateless workers; run one or a hundred, and the durable queue load-balances leases across them
Autoscaling: scale on queue depth & lease backlog; idle to zero, burst on a flood of triggers
Concurrency & fairness: per-tenant and per-agent concurrency caps so one workload can't starve another
Rate & backpressure: admission throttles when downstream models or data sources saturate; no thundering herd
Burst beyond your cluster: overflow to Agentist Cloud or a second cloud when local capacity saturates
Zero-downtime deploys: drain leases and roll workers without dropping in-flight runs; they resume from the journal
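Scaling on queue depth reduces to a small calculation: divide the backlog by the number of items one worker should hold, round up, and clamp to the configured range. The `targetPerWorker` knob is an assumption; the configuration above shows only `min`, `max`, and `scaleOn`.

```typescript
interface ScaleConfig { min: number; max: number; targetPerWorker: number }

// Workers needed for the current backlog, clamped to [min, max].
function desiredWorkers(queueDepth: number, cfg: ScaleConfig): number {
  const needed = Math.ceil(queueDepth / cfg.targetPerWorker);
  return Math.min(cfg.max, Math.max(cfg.min, needed));
}
```

With `min: 0`, an empty queue yields zero workers, which is the "idle to zero" case.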
Runtime

Observability & tracing

Every step, call, and decision emits a span and a journal entry — so you can watch the fleet live, trace one request across many agents, and reconstruct any past run exactly.

typescript
runtime({ audit: true })       // OTel spans + append-only journal
const out = await rt.replay(seq)   // re-issue the exact call
// same input, model & route — diff vs what was captured
OpenTelemetry native: traces, metrics & logs on every step and gateway call; export to Datadog, Grafana, Honeycomb, anything OTLP
Multi-agent tracing: one distributed trace spans agent→agent and agent→tool calls, so you can follow a request across the whole fleet, not a single process
Performance monitoring: per-step latency, token & tool-call throughput, queue depth and lease lag; the operational vitals, in your APM
Append-only audit: a tamper-evident journal of every step, call, and governance verdict; gapless sequence, evidence on tap
Exact replay: re-issue any captured model or data call with the same inputs and route; diff the new output against the original
Live run inspection: attach to a running trace in the console; spans, budgets, and verdicts as they happen
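A common way to make an append-only log tamper-evident is a hash chain: each entry stores the hash of its predecessor, so changing an entry breaks every later link, and sequence numbers expose gaps. The text does not say which scheme the runtime uses; this is an illustrative sketch built on Node's `node:crypto`.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry { seq: number; event: string; prevHash: string; hash: string }

const digest = (seq: number, event: string, prevHash: string): string =>
  createHash("sha256").update(`${seq}|${event}|${prevHash}`).digest("hex");

function append(log: AuditEntry[], event: string): void {
  const prev = log[log.length - 1];
  const seq = prev ? prev.seq + 1 : 0;
  const prevHash = prev ? prev.hash : "genesis";
  log.push({ seq, event, prevHash, hash: digest(seq, event, prevHash) });
}

// Recompute every link. Edits, reordering, or removal of any entry but the
// last are detected; protecting the tail needs the latest hash kept elsewhere.
function verify(log: AuditEntry[]): boolean {
  return log.every((e, i) => {
    const prevHash = i === 0 ? "genesis" : log[i - 1].hash;
    return e.seq === i && e.prevHash === prevHash && e.hash === digest(e.seq, e.event, prevHash);
  });
}
```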
Runtime

Durable state

The runtime persists run and agent state with the journal — it survives restarts and is available on resume. Long-term recall lives in the data plane; the engine's short-term working context is separate.

typescript
await ctx.state.put("incident.severity", "sev1")   // per-run, journaled
const sev = await ctx.state.get("incident.severity")
Persisted with the journal: per-run and per-agent state; survives restarts, available on resume
Tenant-scoped: partitioned by isolation key; state never crosses a tenant boundary
TTL & pinning: expire ephemeral values, pin durable facts; control what persists and for how long
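Tenant scoping and TTLs both come down to how state is keyed and read. In this sketch (an in-memory map with an injected clock; `ScopedState` is hypothetical and only loosely mirrors `ctx.state.put`/`get`), every key is combined with the tenant's isolation key, values written without a TTL stay pinned, and expired values read as missing.

```typescript
interface Slot { value: unknown; expiresAt: number | null }   // null = pinned, never expires

class ScopedState {
  private readonly slots = new Map<string, Slot>();
  private readonly now: () => number;
  constructor(now: () => number) { this.now = now; }

  // JSON-encoding the pair avoids collisions such as ("a:b", "c") vs ("a", "b:c").
  private slotId(tenant: string, key: string): string {
    return JSON.stringify([tenant, key]);
  }

  put(tenant: string, key: string, value: unknown, ttlMs?: number): void {
    const expiresAt = ttlMs === undefined ? null : this.now() + ttlMs;
    this.slots.set(this.slotId(tenant, key), { value, expiresAt });
  }

  get(tenant: string, key: string): unknown {
    const id = this.slotId(tenant, key);
    const slot = this.slots.get(id);
    if (!slot) return undefined;
    if (slot.expiresAt !== null && slot.expiresAt <= this.now()) {
      this.slots.delete(id);             // lazily expire on read
      return undefined;
    }
    return slot.value;
  }
}
```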
Runtime

Triggers & scheduling

A run can start from anywhere — and every entry point is idempotency-keyed, so a redelivered event or a double-click never fires an agent twice.

typescript
rt.cron("0 9 * * 1", weekly.report)    // schedule
rt.on("deploy.failed", oncall.triage)  // internal event
rt.serve()   // + webhooks, API & MCP — all idempotency-keyed
Cron & schedules: time-based runs with timezone-aware schedules and a missed-run policy
Events: fire on internal domain events or an external bus; fan out to many agents from one event
Webhooks: turn any external system's webhook into a governed, typed run
API & SDK: POST a run or call the typed client (see Invocation)
MCP: every agent is exposed as an MCP tool; external MCP tools are consumed as agent tools
Idempotency-keyed: every trigger carries a key; redeliveries and retries de-dupe to exactly-once
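Exactly-once triggering is usually built as at-least-once delivery plus deduplication: each trigger carries a key, and a repeat delivery with a known key returns the original run instead of starting another. A minimal in-memory sketch with a hypothetical `admitTrigger`; a real store would enforce the key with a unique constraint so that two workers cannot race past the check.

```typescript
interface Trigger { idempotencyKey: string; agent: string; payload: unknown }

const admitted = new Map<string, string>();   // idempotency key → run id
let nextRun = 1;

// Returns the run id and whether this delivery created it.
function admitTrigger(t: Trigger): { runId: string; created: boolean } {
  const existing = admitted.get(t.idempotencyKey);
  if (existing) return { runId: existing, created: false };
  const runId = `run-${nextRun++}`;
  admitted.set(t.idempotencyKey, runId);
  return { runId, created: true };
}
```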
Runtime · Gateway

The gateway

One governed boundary, both directions: agents reach out to models, data, and tools, and the world reaches in to call your agents. Every call — egress or invocation — is validated, policy-checked, and audited.

Agent
→ Gateway: policy · keys · audit
→ Inference: Claude · GPT · open
→ Data: lakes · warehouses
→ Connectors: Slack · GitHub · …

Pick a model per agent — or per duty. The gateway routes each to whatever your cloud runs.

typescript
const oncall = agent({
  id: "oncall",
  model: "claude-sonnet",                            // default for this agent
  duties: {
    triage:   duty({ model: "claude-haiku", /* … */ }),   // fast & cheap
    diagnose: duty({ model: "claude-opus",  /* … */ }),   // deep reasoning
  },
})
// gateway maps models per cloud: AWS→Bedrock · GCP→Vertex · Azure→Azure OpenAI · bare-metal→vLLM
Multi-model routing: Claude, GPT, open models; per agent, per duty, or by rule
Fallback & retries: degrade to a backup model on error or rate limit
Caching: dedupe identical calls; cut latency and spend
Cost & rate controls: per-agent budgets and limits, enforced in the path
Prompt governance: PII redaction & policy checks before anything leaves your boundary
Key management: provider keys held by the gateway, never by agents
Bring your own endpoint: self-hosted or private models via the LLMProvider interface
Full audit: every model call captured and replayable
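Per-duty model selection and fallback can be read as two small rules: a duty's model overrides the agent's default, and on an error the call moves down a fallback list. The sketch below encodes those two rules; the fallback list, the synchronous `CallModel` signature, and the function names are assumptions, not the gateway's documented behavior.

```typescript
interface AgentSpec { model: string; duties: Record<string, { model?: string }> }

type CallModel = (model: string, prompt: string) => string;

// Duty override first, then the agent default.
function resolveModel(agent: AgentSpec, duty: string): string {
  return agent.duties[duty]?.model ?? agent.model;
}

// Try the resolved model, then each fallback in order; rethrow the last error.
function callWithFallback(
  call: CallModel,
  primary: string,
  fallbacks: string[],
  prompt: string,
): { model: string; text: string } {
  let lastError: unknown;
  for (const model of [primary, ...fallbacks]) {
    try {
      return { model, text: call(model, prompt) };
    } catch (err) {
      lastError = err;                      // e.g. rate limited; try the next model
    }
  }
  throw lastError;
}
```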

How it's deployed

The gateway ships inside the runtime — it deploys with it, in your cluster. Provider keys live in the gateway, in your boundary; models connect through each cloud's native service (Bedrock, Vertex, Azure OpenAI) or a self-hosted endpoint (vLLM, Ollama). Self-hosted, nothing routes through us.

With Agentist Cloud

Connect Agentist Cloud and the gateway gains a managed layer — pooled model rates, cross-cloud routing & GPU burst, a shared response cache, automatic provider fallback, and org-wide spend governance — while your prompts and data still never leave your boundary.

Security & compliance

Because every model and data call crosses the gateway, it's the one place compliance is both enforced and evidenced. Provider keys stay in-vault, services talk over mTLS, PII is redacted in-path, and every egress and invocation is journaled to the tamper-evident audit log — so SOC 2 evidence is a query, not a quarter-long scramble. Nothing leaves your boundary.

Invocation — one input, every caller

The same gateway governs the inbound side. An agent or duty declares one typed input schema; every caller — a human, another agent, your app, an API, or an MCP client — satisfies that one contract. Each entry is policy-checked, idempotency-keyed, and audited like any other call.

Human · CLI: a positional prompt is the default (agentist run oncall.triage "p99 5s"); use flags or --json for typed multi-field input
Human · Console: start a run from the console with a prompt box or a typed form
Agent → agent: a typed call, triage.with({ … }), traced end to end across agents
API · SDK: typed client, or POST /v1/runs/<agent.duty> with a JSON body matching the schema
MCP: the input schema becomes the tool's inputSchema; call from Claude or Cursor
Webhook · event · cron: map an external payload to the schema; validated & idempotency-keyed on the way in
typescript
// one typed input contract — a single field is positional on the CLI
const triage = duty({ input: z.object({ prompt: z.string() }), /* … */ })

// HUMAN · CLI — the prompt is the default; no flag needed
//   agentist run oncall.triage "checkout p99 latency is 5s"

// AGENT → AGENT — typed, traced across agents
const out = await triage.with({ prompt: "checkout p99 5s" })

// API · SDK — same contract, over the network
const run = await client.run(oncall.triage, { prompt: "checkout p99 5s" })
//   POST /v1/runs/oncall.triage   { "prompt": "checkout p99 5s" }

// MCP — the schema is the tool; expose every agent
rt.mcp({ port: 8080 })
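Deriving every caller's interface from one input schema is mostly mechanical. This sketch uses a hypothetical, minimal field map instead of Zod and shows two derivations: an MCP tool descriptor (the MCP specification describes a tool by `name`, `description`, and a JSON Schema `inputSchema`) and CLI parsing in which a lone string field takes the positional argument. The `--field=value` syntax is an assumption for illustration, not the `agentist` CLI's documented flag format.

```typescript
type FieldType = "string" | "number" | "boolean";
interface InputSpec { name: string; description: string; fields: Record<string, FieldType> }

// MCP-style tool descriptor: the input schema becomes a JSON Schema object.
function toMcpTool(spec: InputSpec) {
  const properties: Record<string, { type: FieldType }> = {};
  for (const [field, type] of Object.entries(spec.fields)) properties[field] = { type };
  return {
    name: spec.name,
    description: spec.description,
    inputSchema: { type: "object", properties, required: Object.keys(spec.fields) },
  };
}

// CLI: a schema with a single string field accepts one positional argument;
// otherwise each field is passed as --field=value (assumed syntax). Values stay
// strings here; a real CLI would coerce them against the schema.
function parseCli(spec: InputSpec, argv: string[]): Record<string, string> {
  const names = Object.keys(spec.fields);
  const positional = names.length === 1 && spec.fields[names[0]] === "string";
  if (positional && argv.length === 1 && !argv[0].startsWith("--")) {
    return { [names[0]]: argv[0] };
  }
  const out: Record<string, string> = {};
  for (const arg of argv) {
    const m = /^--([^=]+)=(.*)$/.exec(arg);
    if (!m || !(m[1] in spec.fields)) throw new Error(`unexpected argument: ${arg}`);
    out[m[1]] = m[2];
  }
  return out;
}
```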
Infrastructure · @agentist/infra

The platform, composed in code

Defining agents is half the job; the other half is standing up what they run on — a runtime, a store, model endpoints, the gateway, a vector store, sandboxes. Agentist gives you that as TypeScript infrastructure-as-code: compose the platform's components in one typed file, and agentist deploy materializes them onto Kubernetes — in your cloud or ours, from the same code.

typescript
import { stack, gateway, models, vectors, postgres, image } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"

export default stack({
  target: "k8s",                          // the universal substrate
  store:  postgres({ instance: "db.r6g.large" }),
  models: [
    models.serve("llama-3.1-70b", {
      gpu: "H100:2",                      // multi-GPU, tensor-parallel
      image: image.vllm("0.6.3"),         // the serving image, in code
      autoscale: { min: 0, max: 8, scaledown: "60s" },   // scale-to-zero
      concurrency: 32,                    // inputs in flight per replica
      snapshot: true,                     // GPU snapshot → cold starts in seconds
    }),
    models.bind(bedrock("claude-sonnet-4")),   // or bind your cloud's
  ],
  gateway:   gateway({ egress: "deny", redactPII: true }),
  vectors:   vectors.pgvector(),
  sandboxes: { isolation: "microvm", idleTimeout: "5m" },
})
//   agentist deploy --to your-cluster   # your cloud or ours, same code
stack.ts: TypeScript IaC
agentist deploy: compile to k8s
Runtime · Gateway
Models on GPUs
Postgres · pgvector
Sandboxes
Your cloud: sovereign
One language end to end: infrastructure in the same typed TypeScript as your agents, with types and autocomplete across the whole platform
Kubernetes everywhere: one definition runs in your cloud, ours, on-prem, or air-gapped; k8s is the only assumption
Granular per-workload control: GPU type & count, CPU, memory, autoscaling, concurrency & batching, set on each component independently
Provider abstractions: declare a need (object storage, a Postgres, a model) and bind it to Bedrock, S3, RDS, GCS, or self-hosted
Sovereign, not hosted: unlike hosted infra clouds, the components land in your boundary; data and models never leave
Scale-to-zero: GPU model endpoints autoscale on demand and idle down to nothing; pay per second of use
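To make "compile to k8s" concrete, here is roughly what a GPU model component could lower to: a standard `apps/v1` Deployment whose container requests GPUs through the `nvidia.com/gpu` resource exposed by NVIDIA's device plugin. The rest is assumption: `ModelServe` mirrors the `models.serve` options above, the GPU-type node label key is made up, and autoscaling, snapshots, and the Service are omitted.

```typescript
interface ModelServe { name: string; gpu: string; image: string; cpu?: number; memory?: string }

// "H100:2" → { type: "H100", count: 2 }; a bare "H100" means one GPU.
function parseGpu(spec: string): { type: string; count: number } {
  const [type, count = "1"] = spec.split(":");
  const n = Number(count);
  if (!type || !Number.isInteger(n) || n < 1) throw new Error(`bad gpu spec: ${spec}`);
  return { type, count: n };
}

function toDeployment(m: ModelServe) {
  const gpu = parseGpu(m.gpu);
  const limits: Record<string, string> = { "nvidia.com/gpu": String(gpu.count) };
  if (m.cpu !== undefined) limits.cpu = String(m.cpu);
  if (m.memory !== undefined) limits.memory = m.memory;
  return {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: m.name, labels: { app: m.name } },
    spec: {
      replicas: 1,                                   // an autoscaler would own this in practice
      selector: { matchLabels: { app: m.name } },
      template: {
        metadata: { labels: { app: m.name } },
        spec: {
          nodeSelector: { "example.com/gpu-type": gpu.type },   // hypothetical label key
          containers: [{ name: "server", image: m.image, resources: { limits } }],
        },
      },
    },
  };
}
```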
Infrastructure

The stack

A stack is the unit of infrastructure — every component an agent platform needs, declared in one typed file: runtime, store, models, gateway, vectors, sandboxes, tool backends, schedules and queues. agentist deploy plans the diff and applies it incrementally; the stack's typed outputs wire straight into runtime({…}).

typescript
import platform from "./stack"     // the typed infra definition
await platform.plan()              // show the diff first
const out = await platform.apply() // incremental & reversible

// the stack's typed outputs wire straight into the runtime
runtime({ store: out.store, llm: out.models.claude })
Infra as typed code: the same TypeScript as your agents — types, autocomplete & refactors across the whole platform
The component set: runtime, store, gateway, vectors, models, sandboxes, tools, schedules & queues — composed, not hand-wired
Plan & apply: every deploy shows a diff first; apply is incremental and reversible — no surprise teardowns
Typed outputs: a stack returns typed handles (store, models, gateway) that the runtime consumes directly; no copy-pasting URLs
Environments & modules: dev, staging & prod from one definition; group components into reusable modules shared across teams
State & drift: deploy state is tracked; drift is detected and reconciled on the next apply
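At its core, the plan step is a diff between the components the stack declares and the state recorded at the last apply. The sketch below shows that create/update/delete classification under simple assumptions. Each component is a hypothetical name-to-config entry, and configs are compared through canonical JSON so that key order alone never triggers a spurious update. It illustrates the idea only and is not Agentist's planner.

```typescript
// Hypothetical planner sketch; not Agentist's implementation.
type Config = Record<string, unknown>;

type Change =
  | { op: "create"; name: string; after: Config }
  | { op: "update"; name: string; before: Config; after: Config }
  | { op: "delete"; name: string; before: Config };

// Serialize with sorted keys so that key order alone never looks like a change.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Config;
    const fields = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// desired: what the stack file declares; current: what the last apply recorded.
function plan(desired: Record<string, Config>, current: Record<string, Config>): Change[] {
  const changes: Change[] = [];
  for (const name of Object.keys(desired).sort()) {
    const after = desired[name];
    const before: Config | undefined = current[name];
    if (before === undefined) {
      changes.push({ op: "create", name, after });
    } else if (canonical(before) !== canonical(after)) {
      changes.push({ op: "update", name, before, after });
    }
    // identical configs produce no entry, which is what keeps apply incremental
  }
  for (const name of Object.keys(current).sort()) {
    if (!(name in desired)) changes.push({ op: "delete", name, before: current[name] });
  }
  return changes;
}
```

A real planner would also order changes by dependency, for example creating the store before the runtime that consumes its outputs. This sketch omits that step.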
Infrastructure

Compute & autoscaling

Every workload — a model endpoint, a tool backend, a batch job — gets fine-grained, per-component control over resources and scaling. The granularity of a serverless platform, declared in code and run on your own Kubernetes.

typescript
models.serve("llama-3.1-70b", {
  gpu: "H100:2", cpu: 8, memory: "32Gi",   // per-workload resources
  autoscale: { min: 0, max: 8, scaledown: "60s" },
  concurrency: 32,                          // inputs in flight / replica
  batch: { maxSize: 16, window: "10ms" },   // dynamic batching
  snapshot: true,                           // cold start in seconds
})
Per-workload resources: set GPU type & count, CPU, and memory on each component independently — right-size every piece
Autoscale knobs: min, max & buffer replicas, target concurrency, and a scaledown window — per workload, not one global setting
Scale-to-zero: idle workloads drop to zero replicas and cold-start on demand — pay only for what actually runs
Concurrency & batching: inputs-in-flight per replica and dynamic request batching for throughput under load
Fast cold starts: memory & GPU snapshots restore warm state — a scaled-to-zero endpoint resumes in seconds, not minutes
Timeouts & retries: per-component execution timeouts, retry policy, and backpressure when downstream saturates
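The sketch below shows one plausible way the autoscale knobs could combine, under stated assumptions. The target replica count is the number of in-flight inputs divided by the per-replica concurrency, rounded up and clamped to min and max. Scale-up takes effect immediately. Scale-down waits until the lower target has held for the full scaledown window. `nextReplicas` and its types are hypothetical and are not Agentist's scheduler.

```typescript
// Hypothetical autoscaler sketch; field names are illustrative, not Agentist's API.
interface AutoscaleSpec {
  min: number;          // replica floor; 0 allows scale-to-zero
  max: number;          // replica ceiling
  concurrency: number;  // target inputs in flight per replica
  scaledownMs: number;  // how long a lower target must hold before shrinking
}

interface ScalerState {
  replicas: number;
  lowerSince: number | null; // when the target first fell below the current count
}

function nextReplicas(spec: AutoscaleSpec, state: ScalerState, inFlight: number, now: number): ScalerState {
  const wanted = Math.ceil(inFlight / spec.concurrency);
  const target = Math.min(spec.max, Math.max(spec.min, wanted));
  if (target >= state.replicas) {
    // scale up (or hold) immediately and cancel any pending scale-down
    return { replicas: target, lowerSince: null };
  }
  const since = state.lowerSince ?? now;
  if (now - since >= spec.scaledownMs) {
    return { replicas: target, lowerSince: null };
  }
  return { replicas: state.replicas, lowerSince: since }; // keep waiting out the window
}
```

The delayed scale-down stops a bursty endpoint from flapping between zero and a few replicas. With `min: 0`, an idle endpoint still reaches zero once the window has passed.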
Infrastructure

Images & environments

Define the exact runtime a component runs in — base image, system packages, dependencies — in the same TypeScript. Builds are layered and content-addressed, so a one-line change rebuilds one layer, not the world.

typescript
const env = image
  .debianSlim()
  .apt("git", "ffmpeg")
  .pip("vllm==0.6.3", "transformers")
  .run("python -m warm_cache")     // build-time step, cached layer
Images in code: declare base image, apt & pip/npm deps, and build steps with a typed, chainable builder
Layered cache: content-addressed layers — change one dependency, rebuild one layer
Bring a Dockerfile: start from an existing Dockerfile or registry image if you prefer
GPU build steps: run build-time steps on a GPU — compile kernels, warm caches, bake weights
Pinned & reproducible: lockfile-pinned, content-hashed images — the same build everywhere
Per-component: each model, sandbox, or tool backend runs its own image — no monolith
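Content addressing is what lets a small edit skip most of the rebuild. This sketch assumes a simple chained scheme in which each layer's key is the SHA-256 of its parent's key plus its own build step. Unchanged steps before an edit keep their keys and hit the cache. The edited step and every step after it get new keys. `layerKeys` and `layersToRebuild` are hypothetical helpers; only `node:crypto` is a real API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helpers illustrating chained, content-addressed layer keys.
function layerKeys(base: string, steps: string[]): string[] {
  const keys: string[] = [];
  let parent = base;
  for (const step of steps) {
    // key = sha256(parent key, step): same prefix of steps => same keys => cache hits
    parent = createHash("sha256").update(parent).update("\n").update(step).digest("hex");
    keys.push(parent);
  }
  return keys;
}

// Indexes of the steps whose layer is not already in the cache.
function layersToRebuild(base: string, steps: string[], cache: Set<string>): number[] {
  return layerKeys(base, steps)
    .map((key, i) => (cache.has(key) ? -1 : i))
    .filter((i) => i >= 0);
}
```

Under this scheme, slow and rarely changed steps (system packages, pinned weights) belong early in the chain, and frequently edited steps belong late.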
Infrastructure

Models & GPUs

Serve open models on your own GPUs, or bind to your cloud's hosted models — the same models interface either way, every call routed through the governed gateway.

typescript
// self-host on your GPUs — OpenAI-compatible endpoint
const llama  = models.serve("llama-3.1-70b", { gpu: "H100" })
// or bind your cloud's hosted model — no servers to run
const claude = models.bind(bedrock("claude-sonnet-4"))
Self-host on GPUs: Llama, Qwen, Mistral on T4 → H100/H200 — single- or multi-GPU with tensor parallelism
vLLM-backed: high-throughput serving with continuous batching behind an OpenAI-compatible API
Bind hosted models: Bedrock, Vertex, Anthropic, or OpenAI in one line — no servers to run
Scale-to-zero + snapshots: GPU endpoints idle to nothing and cold-start in seconds from a snapshot
Weight & cache volumes: model weights and compile artifacts cached on a volume across restarts
Per-duty routing: cheap duties to a small model, deep reasoning to a large one — per agent, per duty, or by rule
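Per-duty routing reduces to an ordered lookup with a default. The sketch assumes hypothetical rules keyed by `agent.duty` patterns with `*` wildcards. Rules are evaluated top to bottom and the first match wins. The model names are placeholders taken from this page.

```typescript
// Hypothetical routing rules; not Agentist's configuration format.
interface Route {
  match: string; // "agent.duty" pattern; "*" matches any run of characters
  model: string;
}

// Turn a pattern such as "investigate.*" into an anchored regex, escaping metacharacters.
function toRegex(pattern: string): RegExp {
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`);
}

function routeFor(routes: Route[], agentId: string, duty: string): string {
  const key = `${agentId}.${duty}`;
  const hit = routes.find((r) => toRegex(r.match).test(key)); // first match wins
  if (!hit) throw new Error(`no route for ${key}; add a "*" default`);
  return hit.model;
}
```

Matching stops at the first hit, so specific rules go above broad ones and `*` goes last.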
Infrastructure

Sandboxes for untrusted code

When an agent writes code, that code runs in a disposable sandbox: an isolated container provisioned on demand, scoped, and torn down afterward. This is the execution substrate for tools that run generated code.

typescript
const box = await sandbox.create({ image: env, isolation: "microvm" })
const { stdout, exitCode } = await box.exec("python solve.py")
await box.snapshot("checkpoint")   // pause & resume later
await box.terminate()
On-demand isolation: a fresh sandbox per task — process, container, or microVM blast radius
Exec & stream: run commands, stream stdout/stderr, capture exit codes
Ephemeral by default: auto-terminate on idle or timeout — nothing lingers
Snapshots: checkpoint a sandbox's filesystem to pause and resume long-running sessions
Tunnels & mounts: scoped network tunnels and only the volumes & secrets a task needs
Governed egress: a sandbox reaches the network only through the gateway and policy
Infrastructure

Jobs & batch

Not everything is a live request. Embedding a corpus, an eval suite, or a nightly backfill is a batch job — run on the same worker pool as live agents, burst up for the work and back to zero when it drains. (Sub-agent parallelism within a run is handled by delegate; this is data-level parallelism over a collection. Scheduling lives in Triggers.)

typescript
// map a function across a collection — data parallelism on the worker pool
const embeddings = await job.map(corpus, embed)   // e.g. 10k docs; results gathered in input order
// or long-running work, collected later
const handle  = await job.spawn(nightlyBackfill)
Parallel map: map across thousands of inputs on the worker pool — ordered results, failures isolated per item
Background jobs: spawn long-running work and collect it later — no request held open
Batch pipelines: chain stages with durable queues between them — ingest → embed → index
Burst & scale-to-zero: the worker pool scales up for the batch and back to nothing when it's done
Same governance: every job step is admission-checked, budgeted, and audited like any run
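The parallel-map contract (ordered results, failures isolated per item) can be sketched in a single process with a bounded pool of async workers. `parallelMap` is hypothetical. Agentist's `job.map` fans out across a worker pool and adds durability and governance, and this local sketch attempts none of that.

```typescript
// Hypothetical local sketch of a parallel map; not the @agentist job API.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: string };

async function parallelMap<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  limit: number,
): Promise<Outcome<R>[]> {
  const results = new Array<Outcome<R>>(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await fn(items[i]) };
      } catch (e) {
        // A failing item is recorded, not rethrown, so the other items keep going.
        results[i] = { ok: false, error: e instanceof Error ? e.message : String(e) };
      }
    }
  }
  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results; // results[i] always corresponds to items[i]
}
```

Each failure is recorded as a value instead of being rethrown, so one bad document cannot abort the other 9,999.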
Infrastructure

Provider abstractions

A component declares an abstract need — object storage, a Postgres, a model. You bind it to a cloud-specific provider. Swap the binding, not the code.

typescript
stack({
  store:   rds("agentist-prod"),   // or cloudSql() · alloyDb() · a URL
  storage: s3("artifacts"),         // or gcs() · azureBlob() · minio()
  secrets: awsSecrets(),            // or vault() · gcpSecrets()
})
Object storage: S3, GCS, Azure Blob, or in-cluster MinIO — behind one interface
Managed Postgres: RDS, Cloud SQL, AlloyDB, CloudNativePG, or an existing connection URL
Model providers: Bedrock, Vertex, Anthropic, OpenAI, or a self-hosted vLLM endpoint
Secrets: your cloud's secret manager or the runtime vault, injected at deploy time
Networking: VPC, private endpoints, and ingress mapped to each cloud's primitives
One matrix, your pick: choose per environment — in-cluster for dev, managed for prod
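"Swap the binding, not the code" amounts to programming against an interface. The sketch uses a hypothetical `ObjectStore` interface with an in-memory binding that stands in for S3, GCS, or MinIO. The SDK's real bindings, such as `s3()` and `gcs()`, are not reproduced here.

```typescript
// Hypothetical interface; the SDK's real bindings (s3(), gcs(), minio()) are not shown here.
interface ObjectStore {
  put(key: string, body: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
}

// In-memory stand-in for local dev; an S3 or GCS binding would implement the same two methods.
class MemoryStore implements ObjectStore {
  private objects = new Map<string, string>();

  async put(key: string, body: string): Promise<void> {
    this.objects.set(key, body);
  }

  async get(key: string): Promise<string | undefined> {
    return this.objects.get(key);
  }
}

// Application code sees only the interface, so changing the binding needs no edits here.
async function saveReport(store: ObjectStore, runId: string, text: string): Promise<string> {
  const key = `runs/${runId}/report.txt`;
  await store.put(key, text);
  return key;
}
```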
Infrastructure

Deploy anywhere

agentist deploy compiles the stack to Kubernetes and rolls it out — your cloud, ours, on-prem, or air-gapped. Kubernetes is the only assumption.

bash
agentist deploy --plan          # dry-run: show the diff first
agentist deploy --to prod       # compile → apply to your cluster
agentist deploy --to air-gapped --offline
One command: compile, diff, and apply the whole platform with agentist deploy
GitOps-ready: the stack is code — review infrastructure in PRs and deploy from CI
Zero-downtime rollouts: drain and roll workers without dropping in-flight runs — they resume from the journal
Air-gapped: deploy into disconnected environments — no phone-home required
Your cloud or ours: identical definition; managed conveniences in Agentist Cloud, never lock-in
Helm & operators underneath: standard Kubernetes artifacts you can inspect, own, and extend
Cloud

Your AI Cloud, composed in the cloud

The whole platform runs in your cloud — deploy the runtime with one Helm chart into AWS, GCP, Azure, DigitalOcean, or bare metal. Your agents and data never leave your boundary, and it's all stood up as infrastructure-as-code.

Optional: Agentist Cloud

Plug in Agentist Cloud — the managed control plane to govern the whole agentic estate (charter, identities, RBAC, policies, registry, cost) across every team, with connectors and burst capacity. Your data plane still never leaves your boundary.

☁ Your AI Cloud: AWS · GCP · Azure · DigitalOcean · bare metal
Agentist Runtime: your agents & data · in your boundary
Gateways: your models · your data lakes
↕ Optionally connect Agentist Cloud — managed control plane, connectors & burst.
bash
# deploy to any cloud, then connect Agentist
agentist deploy --target k8s
agentist cloud connect
☁ Your AI Cloud (self-hosted runtime + data) ↔ Agentist Cloud (managed plane): Charter & culture · Agent registry · Identity & RBAC · Policy management · Connectors & compliance · Cost & multi-cloud burst
Charter & culture: manage mission & values org-wide; every agent inherits them
Agent registry: catalog every agent, skill & base harness across teams, versioned
Identity & RBAC: SSO, SCIM, roles & scopes per team and per agent
Policy management: org-wide policies & approval routes, centrally enforced
Skills & harness library: publish & share base harnesses and skills company-wide
Training & evals: turn runs & coaching feedback into eval sets and improved agents
Secrets & keys: a central vault — credentials rotated and scoped per agent
Cost & quota governance: budgets, limits & spend analytics per team and agent
Managed control plane: hosted Console & one view across every team's fleet
Connector marketplace: vetted, enterprise-grade connectors & vertical packs
Compliance packs: SOC2 / HIPAA bundles + exportable evidence
Multi-cloud, burstable: burst compute & GPUs across AWS, GCP & Azure on demand
Support & LTS: SLAs, long-term support, dedicated assistance

Self-host vs. Agentist Cloud

| | Self-host (open core) | + Agentist Cloud |
|---|---|---|
| Runtime & data | Your cloud / cluster | Your cloud, unchanged |
| Compute | Your cluster | + multi-cloud burst & GPUs |
| Control plane | Self-hosted Console | Managed / hosted Console |
| Connectors | Open core + DIY | Marketplace + vertical packs |
| Compliance | DIY | SOC2 / HIPAA packs + evidence |
| Support | Community | SLAs · LTS |
Cloud · AWS

Deploy on AWS

Run the runtime on EKS, with RDS for Postgres and S3 for object storage. Models run on Amazon Bedrock (Claude, Llama, Titan); burst to EC2 GPU instances for self-hosted models.

bash
agentist deploy --target eks \
  --set postgres.url=$RDS_URL \
  --set objectStore=s3://acme-agentist \
  --set gateway.models=bedrock          # Claude on Bedrock
Cloud · GCP

Deploy on Google Cloud

Run on GKE, with Cloud SQL for Postgres and GCS for object storage. Models run on Vertex AI (Claude, Gemini); burst to GPU node pools on demand.

bash
agentist deploy --target gke \
  --set postgres.url=$CLOUDSQL_URL \
  --set objectStore=gs://acme-agentist \
  --set gateway.models=vertex           # Claude / Gemini on Vertex
Cloud · Azure

Deploy on Azure

Run on AKS, with Azure Database for PostgreSQL and Blob Storage. Models run on Azure OpenAI or Azure-hosted endpoints.

bash
agentist deploy --target aks \
  --set postgres.url=$AZURE_PG_URL \
  --set objectStore=az://acme-agentist \
  --set gateway.models=azure-openai
Cloud · DigitalOcean

Deploy on DigitalOcean

Run on DOKS, with Managed Postgres and Spaces for object storage. Use the Anthropic API or self-host models on GPU droplets.

bash
agentist deploy --target doks \
  --set postgres.url=$DO_PG_URL \
  --set objectStore=spaces://acme-agentist \
  --set gateway.models=anthropic
Cloud · Bare metal

Deploy on bare metal

Air-gapped or on-prem. Run on k3s or kubeadm, with self-managed Postgres and MinIO. Self-host models with vLLM or Ollama — no external calls leave the building.

bash
agentist deploy --target k8s \
  --set postgres.url=$PG_URL \
  --set objectStore=minio://agentist \
  --set gateway.models=vllm \
  --set airgapped=true
Cloud · Multi-cloud

Multi-cloud & burst

Run the data plane in your primary cloud and burst compute, GPUs, or inference into others on demand — one logical fleet, one control plane, governed centrally.

Why fleets need it: agent load is spiky — one incident or batch can fire hundreds of agents at once, each making GPU-heavy inference calls. Bursting spills that spike to spare GPUs in another cloud instead of queuing or failing, so latency stays flat and you pay for peak capacity only when you actually hit it.

☁ Your AI Cloud: one fleet across providers
Primary: AWS · runtime + data
Burst: GCP · GPU + inference
Burst: bare metal · private models
Cost- & latency-aware scheduling routes work to the cheapest healthy capacity; data stays pinned to your primary.
bash
# primary cloud + burst targets — one fleet
agentist deploy --target eks --set primary=true
agentist cloud burst add --target gke --gpu
agentist cloud burst add --target k8s --models=vllm
One fleet: many clouds, one control plane & audit trail
GPU burst: spill to another cloud's GPUs under load
Model routing: per-agent models mapped to each cloud's provider
Cost-aware: schedule to the cheapest healthy capacity
Data stays put: data plane pinned to primary; only compute bursts
Failover: drain & reschedule if a region or cloud degrades
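The scheduling rule described above (cheapest healthy capacity, data pinned to the primary) can be sketched as a filter followed by a sort. The sketch makes three assumptions. Each pool reports health, free GPUs, a price per GPU-hour, and latency to the primary. Pools over a latency budget are skipped. Any work that touches data must stay on the primary. `Pool`, `Work`, and `pickPool` are hypothetical.

```typescript
// Hypothetical capacity model; not Agentist's scheduler.
interface Pool {
  name: string;
  primary: boolean;        // holds the data plane
  healthy: boolean;
  freeGpus: number;
  pricePerGpuHour: number;
  latencyMs: number;       // round trip to the primary
}

interface Work {
  gpus: number;
  touchesData: boolean;    // must run next to the data plane
}

function pickPool(pools: Pool[], work: Work, maxLatencyMs: number): Pool | undefined {
  const candidates = pools.filter(
    (p) =>
      p.healthy &&
      p.freeGpus >= work.gpus &&
      p.latencyMs <= maxLatencyMs &&
      (!work.touchesData || p.primary), // data stays pinned; only compute bursts
  );
  // cheapest first; lower latency breaks ties
  candidates.sort((a, b) => a.pricePerGpuHour - b.pricePerGpuHour || a.latencyMs - b.latencyMs);
  return candidates[0];
}
```

When nothing qualifies, `pickPool` returns `undefined` and the caller has to queue the work. Queuing is exactly the outcome bursting is meant to avoid.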
Console

Operate · observe · coach

The control plane for your agents — watch every run, audit any decision, and coach agents to get better. CLI-first: launch agentist console from your terminal, or host it centrally in your cloud's data plane. Self-hosted, or managed via Agentist Cloud.

[Console screenshot: agentist console, incident fleet (prod, self-hosted, data in your boundary). Panels: runs, approvals, observe signals, per-agent latency & spend, audit journal, call detail & replay, coach, policies & decision log, registry & versions, identity & charter, gateway & cost, data gateway, invocation surfaces, security & compliance, and access.]
Console

Operate

Run the fleet live — watch executions, clear the approvals agents are waiting on, and talk to them directly. Every action is governed and lands on one trail.

Runs — list, then drill into the trace

A live list of every execution; open one to drop into its typed trace — spans, budgets, and the exact model and data calls.

runs · prod · last 24h
incident #4821 · Cornelia·SRE · suspended (approval) · 12m
incident #4820 · Cornelia·SRE · done · 1.4s
checkout-watch · Otto·SRE · running
47 runs · 3 suspended on human · 2 running · 1 failed · ⏎ open trace

Approvals

Every run suspended on a human, with just enough to decide in one keystroke — the proposed command, the rationale, and the diff.

incident #4821 · restore deploy v411
error rate 9.2% post-deploy · p95 +340ms → restores last-good v411
[ deny ] [ edit ] [ approve ] · also live in Slack #sre-oncall

Chat

Talk to any agent, or pick a run up as a conversation. A gated step shows up inline — answer it and the run resumes.

Cornelia · SRE > p99 on checkout is climbing. I'd roll back deploy 4f2a.
⏸ approval needed · k8s.rollback(4f2a)
you › go
→ resumed · recorded to audit
Console

Observe

Watch the fleet's health, shape, and spend. Every tile is a real operator question with a threshold and a drill target — not a vanity number.

Signals

What's stuck on a human, where admission is denying, who's near budget — each tile clicks through to the run or policy behind it.

stuck on humans: 3 ▲ → Approvals
admission denials: 11 ▲ → Policies log
budget pressure: 2 → Runs ≥80% budget
gateway egress: ok

Fleet

The bird's-eye view — agents, how they call one another, and per-agent health and cost. Drill from the map into one agent's dashboard.

Cornelia·SRE · 98.7% ok · $42/24h
Otto·SRE · 99.1% ok · $18/24h
billing-bot · 91.2% ok · $7/24h ▲ error rate
12 agents · 2 flagged

Cost

Spend and token throughput per agent and per model, ranked — find the expensive ones before finance does.

by model: claude-sonnet 1.2M tok $310 · llama-70b (self) 4.4M tok $61
by agent: Cornelia·SRE $42 · Otto·SRE $18 · billing-bot $7
budget: $900 / $1,500 this month
Console

Govern

The trust surface — the rules that gate every step, the tamper-evident record of what happened, and who's allowed to do what.

Policies

Every admission policy, what it gates, and its live hit rate, so a policy fighting the fleet or a misconfiguration stands out at a glance.

budget-per-run · gate: every step · hits 4.4k · denies 0
destructive-approval · gate: k8s.* db.* · hits 312 · approvals 28
pii-redact · gate: data egress · redactions 1.1k
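Admission control runs every applicable policy before a step and keeps the strictest verdict. The sketch assumes a hypothetical ordering of deny > require-approval > allow, glob-style `appliesTo` patterns such as `*.remediate`, and a $2 per-run budget cap. The policy names mirror this page, but the evaluation semantics are an assumption, not Agentist's documented behavior.

```typescript
// Hypothetical admission sketch; not Agentist's policy engine.
type Verdict = "allow" | "require-approval" | "deny";
const severity: Record<Verdict, number> = { allow: 0, "require-approval": 1, deny: 2 };

interface Step { name: string; destructive: boolean; spentUsd: number; estimateUsd: number }
interface Policy { id: string; appliesTo: string; check: (s: Step) => Verdict }

// "*.remediate" matches "oncall.remediate"; "*" matches every step.
function matches(pattern: string, name: string): boolean {
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`).test(name);
}

function admit(policies: Policy[], step: Step): { verdict: Verdict; by: string[] } {
  let verdict: Verdict = "allow";
  let by: string[] = [];
  for (const p of policies) {
    if (!matches(p.appliesTo, step.name)) continue;
    const v = p.check(step);
    if (severity[v] > severity[verdict]) {
      verdict = v; // a stricter verdict replaces the current one
      by = [p.id];
    } else if (v !== "allow" && v === verdict) {
      by.push(p.id); // record every policy that agrees on the current verdict
    }
  }
  return { verdict, by };
}

const policies: Policy[] = [
  { id: "budget-per-run", appliesTo: "*", check: (s) => (s.spentUsd + s.estimateUsd > 2 ? "deny" : "allow") },
  { id: "remediation-needs-approval", appliesTo: "*.remediate", check: (s) => (s.destructive ? "require-approval" : "allow") },
];
```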

Audit

The append-only, tamper-evident journal — every step, gateway call, and verdict, gapless and replayable. Re-issue any captured call.

seq 1182 · call:diagnose · Cornelia·acme · admitted
seq 1183 · egress:bedrock · claude-sonnet · 1,840 tok
seq 1184 · state:write · incident #4821
seq 1100–1184 · gapless ✓ · [ replay exact call ]

Access

Who can do what — roles, scopes, and agent identities — with the SSO and SCIM behind them. Every change is itself audited.

kc@kc.io · admin · all
oncall@acme · operator · approve · resume · chat
Cornelia (agent) · identity SRE · charter: acme
SSO: Okta · SCIM provisioned
Console

Manage

Curate the fleet — the catalog of agents and versions, who each one is, and how they get better over time.

Registry

The catalog of every agent — its version, skills, and the policies bound to it. Deploy, pin, or roll back from here.

Cornelia · SRE · v4.3.1 · skills: k8s, prom, runbooks
Otto · SRE · v2.0.0 · skills: k8s
billing-bot · v1.4.0 ▲ 1 version behind
[ deploy ] [ pin ] [ rollback ]

Identity

Each agent's identity and charter — name, role, persona, and the mission and values it inherits. The agent is someone.

Cornelia · On-call SRE
persona: calm, terse, evidence-driven
charter: acme — keep systems reliable & trust intact
duties: triage · diagnose · remediate

Coach

Turn corrections into evals: review a run's reasoning, save the fix as a test case, or add a note to the charter.

Cornelia · diagnose · over-confident cause attribution
[ save as eval ] [ note to charter ]
eval set: Cornelia (37 cases) · 4 pending review
Connectors

Pluggable connectors

Connectors plug the platform into the tools your teams already use — as triggers, actions, and approval channels. Install one, and your agents and your people reach it through a typed, governed interface.

Triggers: start workflows from GitHub, PagerDuty, Alertmanager, cron or webhooks
Actions: post to Slack, open a GitHub PR, page on-call, update ServiceNow
Approvals anywhere: resolve an approval in Slack, Jira, GitHub, email or the Console — same audit trail
Claude & MCP: expose agents to Claude/Cursor over MCP; use Claude as a connector
Typed & governed: every connector call is schema-validated and policy-checked
Secrets handled: credentials held by the runtime, never by agents
Build your own: a small interface — ship a connector in an afternoon
Marketplace: install vetted connectors; enterprise packs via Agentist Cloud
Approval requested → Slack · Jira · GitHub · Email · Console → Approved → resume
Data plane

Secure gateways to your data lakes

Agents are only as intelligent as the data they can reach. The data plane is a governed gateway to your lakes and warehouses — agents query real enterprise data through it, never with raw credentials, and every access is policy-checked and audited.

Agent → Secure gateway (policy · masking · audit) → Snowflake · BigQuery · Databricks · S3 · Postgres
Data gateways: Snowflake · BigQuery · Databricks · S3 · Postgres — reached through the gateway, never raw creds
Governed access: row- & column-level policy, PII masking, per-agent scopes — enforced at the gateway
Grounded intelligence: semantic + SQL retrieval; agents reason over your real data
Every query audited: who asked, what was read, and why — replayable
Sovereign: data never leaves your boundary; the gateway mediates every read
Connectors: warehouses, lakes, vector stores, and internal APIs
Caching & freshness: fast repeat reads with controlled staleness
Telemetry out: OpenTelemetry & audit streamed back to your observability stack
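Column-level policy and PII masking act on rows before they reach the agent. The minimal sketch below assumes a hypothetical per-agent `Scope` that lists the columns an agent may read and the columns it may see only in redacted form. A single email pattern stands in for broader PII detection.

```typescript
// Hypothetical gateway-side row filter; not Agentist's data gateway API.
type Row = Record<string, string | number | null>;

interface Scope {
  allowColumns: string[]; // columns the agent may see at all
  maskColumns: string[];  // allowed columns that come back redacted
}

// One simple email pattern; real PII detection covers far more than this.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function applyScope(rows: Row[], scope: Scope): Row[] {
  return rows.map((row) => {
    const out: Row = {};
    for (const col of scope.allowColumns) {
      if (!(col in row)) continue;
      const value = row[col];
      if (scope.maskColumns.includes(col)) {
        out[col] = "***";
      } else {
        // free-text fields can still leak PII, so scrub emails from any string value
        out[col] = typeof value === "string" ? value.replace(EMAIL, "[email]") : value;
      }
    }
    return out; // columns outside allowColumns never leave the gateway
  });
}
```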
Data plane

The event store

Durable state, the audit journal, and memory all live in one Postgres you run. The runtime owns the schema — you point it at a database.

typescript
import { runtime, postgres } from "@agentist/sdk"
runtime({ store: postgres(process.env.DATABASE_URL) })
// event-sourced: state, journal & vectors in one database
// SQLite for local dev · Redis for queues
Event-sourced: every run's state is the ordered log of its events, not a mutable row to corrupt
The audit journal: append-only, tamper-evident, gapless — the same store is your compliance evidence
One database: state, journal & vector memory in the Postgres you already operate — no extra service
Migrations owned: the runtime manages its schema and migrations; you just bring the connection
Pluggable: Postgres in prod; SQLite for local dev; Redis for queues
Backups & PITR: it's your database — your backups, your point-in-time recovery
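A common way to make a log "append-only, tamper-evident, gapless" is a hash chain. Each entry stores a sequence number and the previous entry's hash, so an edited, dropped, or reordered entry fails verification. The sketch shows that general technique with a hypothetical `Entry` shape. It is not Agentist's journal format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical journal entry; illustrates a hash chain, not Agentist's schema.
interface Entry { seq: number; event: string; prev: string; hash: string }

function digest(seq: number, event: string, prev: string): string {
  return createHash("sha256").update(`${seq}|${prev}|${event}`).digest("hex");
}

// Append-only: returns a new log with one more entry linked to the last one.
function append(log: Entry[], event: string): Entry[] {
  const last = log[log.length - 1];
  const seq = last ? last.seq + 1 : 1;
  const prev = last ? last.hash : "genesis";
  return [...log, { seq, event, prev, hash: digest(seq, event, prev) }];
}

// Gapless: seq rises by exactly 1. Untampered: each hash recomputes and links to its predecessor.
function verify(log: Entry[]): boolean {
  return log.every((e, i) => {
    const prev = i === 0 ? "genesis" : log[i - 1].hash;
    const seq = i === 0 ? e.seq : log[i - 1].seq + 1;
    return e.seq === seq && e.prev === prev && e.hash === digest(e.seq, e.event, e.prev);
  });
}
```

A chain only exposes tampering if its latest hash is also anchored somewhere the writer cannot rewrite, such as an external log or a signed checkpoint. Without that anchor, anyone with write access could recompute every hash.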
Data plane

Memory

Agents remember — durably, and scoped to whoever they're acting for. More than a chat buffer: working memory, semantic recall, and pinned facts, all policy-aware.

typescript
await ctx.memory.put("user.tier", "enterprise")   // working memory
const tier = await ctx.memory.get("user.tier")
const hits = await ctx.memory.search("past incidents", { topK: 5 })
Working memory: structured, persistent facts about the task or user, carried across turns
Semantic recall: retrieve past context by meaning, not just recency, over pgvector
Scoped per tenant: memory partitioned by isolation key; recall never crosses a boundary
Policy-aware: retrieval passes through admission control — redaction and access rules apply to what returns
TTL & pinning: expire ephemeral context, pin durable facts — control what an agent remembers
Typed get · put · search: a small, typed API — no bespoke memory plumbing per agent
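Tenant scoping, TTL, and pinning can be sketched with a map keyed by isolation key. The sketch assumes three things. Every key is prefixed with its tenant, so lookups cannot cross a boundary. Entries carry an optional expiry that is enforced lazily on read. Pinned entries ignore TTL. `ScopedMemory` is hypothetical and in-process, whereas the runtime keeps memory in Postgres.

```typescript
// Hypothetical in-process memory store; not the ctx.memory implementation.
interface Item { value: unknown; expiresAt: number | null; pinned: boolean }

class ScopedMemory {
  private items = new Map<string, Item>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now; // injectable clock, handy for tests
  }

  private key(tenant: string, name: string): string {
    return `${tenant}\u0000${name}`; // tenant prefix keeps namespaces disjoint
  }

  put(tenant: string, name: string, value: unknown, opts: { ttlMs?: number; pinned?: boolean } = {}): void {
    const pinned = opts.pinned ?? false;
    const expiresAt = !pinned && opts.ttlMs !== undefined ? this.now() + opts.ttlMs : null;
    this.items.set(this.key(tenant, name), { value, expiresAt, pinned });
  }

  get(tenant: string, name: string): unknown {
    const k = this.key(tenant, name);
    const item = this.items.get(k);
    if (!item) return undefined;
    if (item.expiresAt !== null && this.now() >= item.expiresAt) {
      this.items.delete(k); // expired context is dropped on read
      return undefined;
    }
    return item.value;
  }
}
```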
Data plane

Retrieval & RAG

A complete retrieval pipeline — chunk, embed, search, rerank — so agents reason over your documents and data with precision, not a similarity guess.

typescript
const kb = retrieval({
  chunk:  { strategy: "recursive", size: 800, overlap: 100 },
  embed:  "text-embedding-3-large",
  store:  vectors.pgvector(),
  rerank: "rerank-v3",                  // precision pass before the model
})
await kb.add(docs)
const hits = await kb.query("why did checkout fail?", { topK: 8 })
Chunking: recursive, sliding-window & structure-aware strategies with overlap control
Embeddings: any embedding model through the gateway; batch-embed at ingest
Hybrid search: keyword + vector, ranked together — exact and semantic in one query
Reranking: a reranker reorders candidates for precision before they reach the model
Graph retrieval: traverse links between chunks to surface connected context, not just nearest neighbors
Grounded & cited: answers carry their sources; every retrieval is policy-checked and audited
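With `size: 800, overlap: 100`, consecutive chunks share 100 units, so each window starts 700 units after the previous one. The sketch shows plain sliding-window chunking measured in characters for simplicity. Real chunkers often count tokens, and a recursive strategy first splits on paragraph and sentence boundaries. `slidingChunks` is hypothetical.

```typescript
// Hypothetical sliding-window chunker, measured in characters.
function slidingChunks(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const step = size - overlap; // 800 - 100 = 700 for the config above
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reaches the end
  }
  return chunks;
}
```

For a 2,000-character document, this yields windows starting at 0, 700, and 1,400. The last window is shorter, at 600 characters.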
Data plane

Vectors

Vector storage is built in on pgvector — no extra database to run — and pluggable to an external store when you want one.

typescript
const v = vectors.pgvector()           // or pinecone() · qdrant()
const hits = await v.query(embedding, {
  topK: 10,
  filter: { tenant: "acme", kind: "runbook" },   // metadata filter
})
pgvector by default: vectors live in the same Postgres as state & the journal — one less thing to operate
External stores: bind Pinecone, Qdrant, Weaviate, or Milvus when you need them
Metadata filtering: combine vector similarity with structured filters in one query
Index tuning: HNSW / IVF parameters exposed — tune recall against latency
Tenant-scoped: vector namespaces partitioned by isolation key
Governed: every vector read flows through admission control like any other data access
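Metadata filtering plus similarity ranking can be sketched as a brute-force scan. Keep the candidates whose tags match exactly, score the survivors by cosine similarity, and return the top K. The `Doc` shape and `searchVectors` helper are hypothetical. pgvector answers the same question with an HNSW or IVF index instead of a full scan, and that is where the recall-versus-latency tuning comes in.

```typescript
// Hypothetical brute-force search; an index replaces the full scan in practice.
interface Doc { id: string; embedding: number[]; meta: Record<string, string> }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function searchVectors(docs: Doc[], q: number[], topK: number, filter: Record<string, string>): Doc[] {
  return docs
    .filter((d) => Object.entries(filter).every(([k, v]) => d.meta[k] === v)) // exact-match metadata filter
    .map((d) => ({ d, score: cosine(q, d.embedding) }))
    .sort((x, y) => y.score - x.score) // most similar first
    .slice(0, topK)
    .map((x) => x.d);
}
```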
Data plane

Object storage

Large artifacts — files an agent reads or writes, model weights, run outputs — live in object storage, bound to your cloud's bucket.

typescript
const bucket = storage(s3("artifacts"))   // or gcs · azureBlob · minio
const key = `runs/${runId}/report.pdf`
await bucket.put(key, bytes)
const url = await bucket.signedUrl(key, { expires: "1h" })
Artifacts: inputs and outputs of a run persisted and linked straight from its trace
Bind your bucket: S3, GCS, Azure Blob, or in-cluster MinIO behind one interface
Volumes: shared, mountable volumes for model weights & caches — see Models
Lifecycle: retention and expiry policies on ephemeral artifacts
Scoped access: agents reach storage through the gateway, scoped to their tenant
Large-file safe: stream big objects without loading them into memory
Examples

The whole use case, in one file

Each example is a single typed file — the entire implementation, infrastructure and all: charter, agent, tools, workflow, governance, the engine, and the stack that deploys it to your cloud. Not snippets — the real thing, top to bottom. Five builds, each leaning on a different part of the platform. Each ends with how it's invoked — a trigger, a typed call, or a prompt; one input contract, every caller. (More to come, plus a dedicated examples repo.)

Examples

Incident

Otto, the on-call SRE agent: a paging alert arrives, he triages it, delegates parallel investigators across app, db & network, diagnoses the root cause, and proposes a fix — but anything destructive waits for a human. Shows: charter · skills · governed tools · parallel delegation · admission control · approvals · memory · engine · infrastructure — one file.

alert → triage → delegate → diagnose → approval → remediate
typescript
// src/incident.ts — Otto, the on-call SRE: triage → investigate → fix, one file.
import { agent, duty, tool, workflow, policy, charter, harness, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway, vectors } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

// ── Charter — the culture every agent inherits ───────────────────────────
const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Bias to safety", "Explain your reasoning", "Escalate when unsure"],
})

// ── Schemas ───────────────────────────────────────────────────────────────
const Alert = z.object({ service: z.string(), symptom: z.string(), severity: z.enum(["sev1","sev2","sev3"]) })

// ── Tools — server-side & validated; the agent never sees a credential ──
const queryMetrics = tool({
  input:  z.object({ service: z.string(), window: z.string().default("30m") }),
  output: z.object({ p99: z.number(), errorRate: z.number() }),
  run: (a, ctx) => ctx.tool("prometheus.query", a),          // via the egress gateway
})
const rollback = tool({
  input: z.object({ deploy: z.string() }),
  meta:  { destructive: true },                              // surfaced to admission control
  run: (a, ctx) => ctx.tool("k8s.rollback", a),
})

// ── Investigator — one cheap agent, fanned out across subsystems ─────────
const investigate = agent({
  id: "investigate",
  identity: { name: "Iris", role: "Investigator", persona: "Methodical, fast." },
  charter: acme,
  tools: [queryMetrics],
  duties: {
    scan: duty({
      input:  z.object({ alert: Alert, area: z.enum(["app","db","network"]) }),
      output: z.object({ area: z.string(), finding: z.string() }),
      run: (a, ctx) => ctx.llm({ prompt: `Inspect ${a.area} for ${a.alert.symptom}` }),
    }),
  },
})

// ── Otto — the on-call SRE, extends a base harness ───────────────────────
export const oncall = agent({
  extends: harness("on-call"),           // reusable SRE preset — extend, don't rebuild
  id: "oncall",
  identity: { name: "Otto", role: "On-call SRE", persona: "Calm, terse, evidence-driven." },
  charter: acme,
  tools: [queryMetrics, rollback],
  duties: {
    triage: duty({
      input:  Alert,
      output: z.object({ summary: z.string() }),
      run: async (a, ctx) => {
        await ctx.context.recall("past incidents on " + a.service, { topK: 5 })   // remember
        return ctx.llm({ prompt: `Triage: ${a.symptom}` })
      },
    }),
    diagnose: duty({
      input:  z.object({ findings: z.array(z.object({ area: z.string(), finding: z.string() })) }),
      output: z.object({ rootCause: z.string(), destructive: z.boolean() }),
      run: (a, ctx) => ctx.llm({ prompt: `Root cause from: ${JSON.stringify(a.findings)}` }),
    }),
    remediate: duty({
      input: z.object({ rootCause: z.string() }),
      run: (a, ctx) => ctx.tool("k8s.rollback", { deploy: "latest" }),
    }),
  },
})

// ── Policy — admission control: destructive fixes wait for a human ───────
const remediationGate = policy({
  id: "remediation-needs-approval",
  applies: { duty: "oncall.remediate" },
  check: ({ state }) => state.diagnose.destructive
    ? { effect: "approval", reason: "Destructive remediation needs approval" }
    : { effect: "allow" },
})

// ── Workflow — the typed path an incident takes ──────────────────────────
export const incident = workflow("incident")
  .input(Alert)
  .step("triage", oncall.duties.triage)
  .delegate("findings", (alert) =>
    ["app","db","network"].map((area) => investigate.duties.scan.with({ alert, area })),
  )                                      // parallel sub-agents — durable & governed
  .step("diagnose", oncall.duties.diagnose, (s) => ({ findings: s.findings }))
  .approval("on-call")                   // only fires when remediationGate demands it
  .step("remediate", oncall.duties.remediate, (s) => ({ rootCause: s.diagnose.rootCause }))
  .commit()

// ── Infrastructure — provision the platform into your own cloud (IaC) ─────
const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny", redactPII: true }),
  vectors: vectors.pgvector(),
})

// ── Runtime — engine + governance, wired to the infra above ──────────────
const rt = runtime({
  engine:   agentist(),                  // or adk() · langgraph()
  store:    infra.store,
  llm:      infra.models.claude,
  policies: [remediationGate],
  memory:   infra.vectors,
})
rt.register(oncall, investigate, incident)
rt.on("deploy.failed", oncall.duties.triage)   // page Otto on a failed rollout
rt.serve()                               // API · MCP · the Console
//   agentist deploy --target k8s           # provisions infra + rolls out the runtime

Invoke it. Start the local runtime and run the workflow from the CLI — it stops for a human before anything destructive:

$ agentist dev                      # local runtime + console on :3000
$ agentist run incident --json '{"service":"checkout","symptom":"p99 5s","severity":"sev2"}'
  triage     "checkout p99 5s — likely the latest deploy"
  delegate   app · db · network · 3 findings
  diagnose   root cause: v411 regressed the cache path
  remediate  ⏸ paused — destructive, needs approval
$ agentist approve inc_9f2a --by kc@kc.io
  remediate  ✓ rolled back v411
  ✔ resolved
journaled locally · replay in the console
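The `remediationGate` policy in the Incident file follows a simple contract. Before a step runs, each applicable policy returns allow, deny, or approval. The sketch below is self-contained, plain TypeScript showing one way to combine those answers. It assumes the most restrictive answer wins (deny over approval over allow); that ordering is an assumption, not documented Agentist behavior. The `Policy`/`Proposal` types and the `admit` function are hypothetical, not the Agentist runtime API.

```typescript
// Hypothetical shapes; the real Agentist policy types may differ.
type Decision =
  | { effect: "allow" }
  | { effect: "approval"; reason: string }
  | { effect: "deny"; reason: string }

interface Proposal {
  duty: string                      // e.g. "oncall.remediate"
  state: Record<string, any>        // outputs of earlier workflow steps
}

interface Policy {
  id: string
  applies: (p: Proposal) => boolean
  check: (p: Proposal) => Decision
}

// Assumption: deny outranks approval, approval outranks allow.
const rank = { allow: 0, approval: 1, deny: 2 } as const

// Evaluate every applicable policy and keep the most restrictive decision.
function admit(p: Proposal, policies: Policy[]): Decision & { policy?: string } {
  let result: Decision & { policy?: string } = { effect: "allow" }
  for (const policy of policies) {
    if (!policy.applies(p)) continue
    const d = policy.check(p)
    if (rank[d.effect] > rank[result.effect]) result = { ...d, policy: policy.id }
  }
  return result
}

// Mirrors remediationGate: destructive fixes pause for a human.
const remediationGate: Policy = {
  id: "remediation-needs-approval",
  applies: (p) => p.duty === "oncall.remediate",
  check: (p) => p.state.diagnose?.destructive
    ? { effect: "approval", reason: "Destructive remediation needs approval" }
    : { effect: "allow" },
}
```

Only an `allow` lets the step proceed. An `approval` corresponds to the paused `remediate` step in the transcript, and a `deny` would reject the step outright.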

Analyst

Dana turns a plain-English question into read-only SQL, runs it through the data gateway, and explains the result — but a critic checks the query first, and a policy makes writes impossible by construction. Shows: the data plane · governed SQL · a critic gate · read-only admission control — one file.

question → plan → critic → run → explain
typescript
// src/analyst.ts — Dana, the data analyst: ask in English, get a governed answer.
import { agent, duty, tool, workflow, policy, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Show your work", "Cite the query", "Never guess at a number"],
})

// ── Tool — read-only SQL, executed server-side through the data gateway ──
const runSql = tool({
  input:  z.object({ sql: z.string() }),
  output: z.array(z.record(z.any())),
  meta:   { readOnly: true },                  // surfaced to admission control
  run: (a, ctx) => ctx.db.query(a.sql),        // PII redacted at the gateway
})

// ── Agent — turns a question into SQL, then reads the result back ────────
export const dana = agent({
  id: "analyst",
  identity: { name: "Dana", role: "Data analyst", persona: "Precise; cites the query." },
  charter: acme,
  tools: [runSql],
  duties: {
    plan: duty({
      input:  z.object({ question: z.string() }),
      output: z.object({ sql: z.string() }),
      run: (a, ctx) => ctx.llm({ prompt: `Write read-only SQL for: ${a.question}` }),
    }),
    explain: duty({
      input:  z.object({ rows: z.array(z.record(z.any())) }),
      output: z.object({ answer: z.string() }),
      run: (a, ctx) => ctx.llm({ prompt: `Summarize these rows: ${JSON.stringify(a.rows)}` }),
    }),
  },
})

// ── Policy — the analyst is read-only by construction; writes can't happen ─
const readOnly = policy({
  id: "analyst-read-only",
  applies: { tool: "*" },
  check: ({ tool }) => tool.meta?.readOnly       // tools without meta are denied too
    ? { effect: "allow" }
    : { effect: "deny", reason: "Analyst is read-only" },
})

// ── Workflow — plan → critic vets the SQL → run → explain ────────────────
export const ask = workflow("ask")
  .input(z.object({ question: z.string() }))
  .step("plan", dana.duties.plan)
  .critic("sql-safe", (s) => `Is this SQL read-only and correct? ${s.plan.sql}`)
  .step("run", runSql, (s) => ({ sql: s.plan.sql }))
  .step("explain", dana.duties.explain, (s) => ({ rows: s.run }))
  .commit()

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny", redactPII: true }),
})

const rt = runtime({
  engine:   agentist(),
  store:    infra.store,
  llm:      infra.models.claude,
  policies: [readOnly],
})
rt.register(dana, ask)
rt.serve()                               // ask it in the Console, or over MCP

Invoke it. Run it locally — the question is one field, so it's positional on the CLI:

$ agentist dev
$ agentist run ask "error rate on checkout last week"
  plan     SELECT day, errors::float/total AS rate FROM events ...
  critic   ✓ read-only · touches no PII columns · approved
  run      7 rows
  explain  "Averaged 0.4%, peaking 1.9% on Tuesday's deploy."
  ✔ done
the answer cites the exact query it ran
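In the Analyst file, Dana's read-only promise is expressed three ways: the `readOnly` tool flag, the `analyst-read-only` policy, and the `sql-safe` critic. A cheap syntactic pre-check can also reject obvious writes before the critic runs. Below is a minimal sketch assuming a Postgres-style dialect. `looksReadOnly` is hypothetical and is only a heuristic, not a SQL parser. The real guarantee should come from running the query under a database role that has read-only grants.

```typescript
// Hypothetical heuristic pre-check (not an Agentist API). Rejects statements
// that are not a single SELECT/WITH, or that contain write keywords.
const WRITE_KEYWORDS = /\b(insert|update|delete|merge|create|alter|drop|truncate|grant|revoke|copy|into)\b/i

// Remove comments and string literals so their contents can't trigger (or hide) keywords.
function stripCommentsAndStrings(sql: string): string {
  return sql
    .replace(/--[^\n]*/g, " ")             // line comments
    .replace(/\/\*[\s\S]*?\*\//g, " ")     // block comments
    .replace(/'(?:[^']|'')*'/g, "''")      // string literals, incl. escaped ''
}

function looksReadOnly(sql: string): { ok: boolean; reason?: string } {
  const s = stripCommentsAndStrings(sql).trim().replace(/;\s*$/, "")
  if (s.includes(";")) return { ok: false, reason: "multiple statements" }
  if (!/^(select|with)\b/i.test(s)) return { ok: false, reason: "must start with SELECT or WITH" }
  const hit = s.match(WRITE_KEYWORDS)       // also catches data-modifying CTEs and SELECT INTO
  if (hit) return { ok: false, reason: `write keyword: ${hit[1].toLowerCase()}` }
  return { ok: true }
}
```

A check like this fails closed on anything unusual, which suits a guard whose only job is to stop writes. It will occasionally reject a legitimate read, such as `SELECT ... FOR UPDATE`, and that is an acceptable cost here.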

Review

Rex reviews a pull request — but first it clones the branch and runs the test suite inside a throwaway microVM sandbox, so untrusted code never touches your runtime. A GitHub webhook starts the run; the review posts back through an MCP tool. Shows: sandbox isolation · webhook triggers · MCP tools — one file.

PR → tests → review → comment
typescript
// src/review.ts — Rex, the code reviewer: tests run in a sandbox before review.
import { agent, duty, tool, workflow, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway, sandbox } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Evidence over opinion", "Be specific", "Be kind"],
})

const PR = z.object({ repo: z.string(), number: z.number(), sha: z.string() })

// ── Tool — run the PR's tests in a fresh microVM; nothing lingers ────────
const runTests = tool({
  input:  PR,
  output: z.object({ pr: PR, passed: z.boolean(), log: z.string() }),
  run: async (a, ctx) => {
    const box = await sandbox.create({ image: "node:20", isolation: "microvm" })
    try {
      // clone from GitHub, then fetch the PR head so commits from forks resolve too
      await box.exec(`git clone https://github.com/${a.repo}.git app && cd app && git fetch origin pull/${a.number}/head && git checkout ${a.sha}`)
      const { exitCode, stdout } = await box.exec("cd app && npm ci && npm test")
      return { pr: a, passed: exitCode === 0, log: stdout }
    } finally {
      await box.terminate()                    // ephemeral: torn down even if a command throws
    }
  },
})

// ── Agent — reviews the diff, and posts the verdict back to GitHub ───────
export const rex = agent({
  id: "reviewer",
  identity: { name: "Rex", role: "Code reviewer", persona: "Terse, specific, kind." },
  charter: acme,
  tools: [runTests],
  duties: {
    review: duty({
      input:  z.object({ pr: PR, tests: z.object({ passed: z.boolean(), log: z.string() }) }),
      output: z.object({ verdict: z.enum(["approve", "request-changes"]), notes: z.string() }),
      run: async (a, ctx) => {
        const r = await ctx.llm({ prompt: `Review PR #${a.pr.number}; tests: ${a.tests.log}` })
        await ctx.tool("github.comment", { ...a.pr, body: r.notes })   // post via MCP tool
        return r
      },
    }),
  },
})

// ── Workflow — test in isolation, then review ────────────────────────────
export const reviewPR = workflow("review-pr")
  .input(PR)
  .step("tests", runTests)
  .step("review", rex.duties.review, (s) => ({ pr: s.tests.pr, tests: s.tests }))
  .commit()

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny" }),        // the sandbox reaches the net only through here
})

const rt = runtime({ engine: agentist(), store: infra.store, llm: infra.models.claude })
rt.mcp({ consume: ["github"] })                // GitHub's MCP server → agent tools
rt.register(rex, reviewPR)
rt.on("github.pull_request.opened", reviewPR)  // webhook → typed, governed run
rt.serve()

Invoke it. Test it locally on a PR — the tests run first, inside a sandbox:

$ agentist dev
$ agentist run review-pr --json '{"repo":"acme/api","number":1421,"sha":"4f2a9c"}'
  tests    microVM sandbox · npm test ✓ 42/42 passed
  review   request-changes · "missing null-check · handler.ts:88"
  ✔ done
no untrusted code touched the runtime · verdict in the console
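The review run starts from a GitHub webhook (`rt.on("github.pull_request.opened", reviewPR)`). Whatever receives that webhook should confirm the delivery really came from GitHub before starting a run. GitHub signs each delivery with HMAC-SHA256 over the raw request body, keyed by the webhook secret, and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex>`. The minimal sketch below uses Node's `crypto` module. It illustrates GitHub's documented scheme; it does not describe how the Agentist runtime handles webhooks, and the function name is hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Hypothetical helper: true only if the header matches the HMAC-SHA256 of the
// exact raw body bytes under the shared webhook secret.
function verifyGithubSignature(
  rawBody: Buffer | string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader?.startsWith("sha256=")) return false
  const expected = Buffer.from("sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex"), "utf8")
  const received = Buffer.from(signatureHeader, "utf8")
  // timingSafeEqual throws on length mismatch, so compare lengths first;
  // the constant-time compare avoids leaking how many leading bytes matched.
  return expected.length === received.length && timingSafeEqual(expected, received)
}
```

Verify against the raw bytes before any JSON parsing. Re-serializing a parsed body can change whitespace or key order and break the match.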

Knowledge

Quinn answers questions from your docs with citations — and refuses to answer beyond its sources. It indexes your docs into pgvector, retrieves with hybrid search and reranking, and is always-on: talk to it in the Console, Slack, or over MCP. Shows: retrieval & vectors · grounded answers · a conversational, addressable agent — one file.

index → ask → retrieve → answer
typescript
// src/docs.ts — Quinn, the knowledge assistant: grounded answers, with citations.
import { agent, duty, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway, vectors, retrieval } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Cite your sources", "Say 'I don't know'", "Never invent an answer"],
})

// ── Retrieval — chunk, embed & index your docs into pgvector ────────────
const kb = retrieval({
  chunk:  { strategy: "recursive", size: 800, overlap: 100 },
  embed:  "text-embedding-3-large",
  store:  vectors.pgvector(),
  rerank: "rerank-v3",                         // precision pass before the model
})

// ── Agent — grounded Q&A; answers only from what it retrieved ───────────
export const quinn = agent({
  id: "docs",
  identity: { name: "Quinn", role: "Knowledge assistant", persona: "Helpful, grounded, honest." },
  charter: acme,
  duties: {
    answer: duty({
      input:  z.object({ question: z.string() }),
      output: z.object({ answer: z.string(), sources: z.array(z.string()) }),
      run: async (a, ctx) => {
        const hits = await kb.query(a.question, { topK: 8 })   // hybrid search + rerank
        return ctx.llm({ prompt: a.question, ground: hits })   // answer only from sources
      },
    }),
  },
})

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny" }),
  vectors: vectors.pgvector(),
})

const rt = runtime({
  engine: agentist(),
  store:  infra.store,
  llm:    infra.models.claude,
  memory: infra.vectors,
})
rt.register(quinn)
await kb.add("./docs/**/*.md")           // ingest once; re-runs are incremental

const q = rt.agent("docs")               // every agent is addressable & always-on
await q.ask("how do I rotate the gateway's signing key?")
rt.serve()                               // Console chat · Slack · MCP

Invoke it. Index your docs, then ask from the CLI — answers cite their sources:

$ agentist dev
$ agentist run docs.answer "how do I rotate the gateway's signing key?"
  retrieve  hybrid search + rerank · 3 chunks
  answer    "Run `agentist secrets rotate gateway-key` — reloads with no downtime."
  sources   ops/gateway.md#rotation · runbooks/secrets.md
  ✔ grounded
says "I don't know" when the docs don't cover it
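Quinn's retrieval is described as hybrid search followed by reranking. One common way to do the hybrid step is reciprocal rank fusion (RRF). Run a keyword search and a vector search separately, then score each document by summing 1 / (k + rank) over every list it appears in, with k = 60 as the conventional constant. The minimal sketch below assumes, without confirmation from the source, that `retrieval()` fuses results this way. The ranked lists in the usage are made up.

```typescript
// Reciprocal rank fusion over several ranked lists of document IDs (best first).
function reciprocalRankFusion(lists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>()
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1                                    // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank))
    })
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)                          // highest fused score first
}

// Hypothetical rankings for "rotate the gateway's signing key".
const keyword = ["runbooks/secrets.md", "ops/gateway.md#rotation", "faq.md"]
const vector  = ["ops/gateway.md#rotation", "ops/gateway.md#tls", "runbooks/secrets.md"]
const fused = reciprocalRankFusion([keyword, vector])
// fused[0].id === "ops/gateway.md#rotation": it ranks highly in both lists.
```

RRF uses only ranks, never raw scores, so BM25 scores and cosine similarities never need to be put on a common scale. The fused top-k would then go to the reranker before reaching the model.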

Audit

Cass runs every night: it pulls your service catalog, audits all of them in parallel with delegate, and a critic throws out the noisy flags before it posts a waste report. Shows: cron triggers · parallel delegation · a critic for quality · a cheap model for batch — one file.

cron → catalog → delegate → critic → report
typescript
// src/audit.ts — Cass, the cost auditor: every night, flag over-provisioned services.
import { agent, duty, tool, workflow, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Bias to safety", "Show the numbers", "No false alarms"],
})

const Service = z.object({ name: z.string(), monthlySpend: z.number() })

// ── Tools — pull the catalog & post the report (external, via connectors) ─
const listServices = tool({
  input:  z.object({}),
  output: z.object({ services: z.array(Service) }),
  run: (a, ctx) => ctx.tool("billing.services", {}),
})
const postReport = tool({
  input: z.object({ findings: z.array(z.any()) }),
  run: (a, ctx) => ctx.tool("slack.post", { channel: "#finops", ...a }),
})

// ── Agent — judges one service against its utilization ───────────────────
export const cass = agent({
  id: "auditor",
  identity: { name: "Cass", role: "Cost auditor", persona: "Skeptical, exact." },
  charter: acme,
  duties: {
    audit: duty({
      input:  Service,
      output: z.object({ service: z.string(), wasteUsd: z.number(), reason: z.string() }),
      run: async (a, ctx) => {
        const util = await ctx.tool("prometheus.query", { service: a.name })
        return ctx.llm({ prompt: `Is ${a.name} over-provisioned? util=${JSON.stringify(util)}` })
      },
    }),
  },
})

// ── Workflow — catalog → audit all in parallel → critic → report ─────────
export const nightly = workflow("nightly-audit")
  .input(z.object({}))
  .step("catalog", listServices)
  .delegate("findings", (s) =>                 // every service, in parallel — durable & governed
    s.catalog.services.map((svc) => cass.duties.audit.with(svc)))
  .critic("no-false-alarms", (s) =>            // drop flags the evidence doesn't support
    `Are these waste flags justified? ${JSON.stringify(s.findings)}`)
  .step("report", postReport, (s) => ({ findings: s.findings }))
  .commit()

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-haiku-4")],        // cheap model — it's a big nightly batch
  gateway: gateway({ egress: "deny" }),
})

const rt = runtime({ engine: agentist(), store: infra.store, llm: infra.models.claude })
rt.register(cass, nightly)
rt.cron("0 6 * * *", nightly)            // 6am daily — timezone-aware, with a missed-run policy
rt.serve()

Invoke it. Run it on demand from the CLI to test it (in production a cron fires it nightly):

$ agentist dev
$ agentist run nightly-audit
  catalog   12 services
  delegate  audited 12 in parallel · claude-haiku-4
  critic    dropped 2 weak flags · 4 confirmed
  ✔ done
every step admission-checked & journaled · results in the console
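`rt.cron("0 6 * * *", nightly)` uses the standard five cron fields: minute, hour, day of month, month, and day of week. `"0 6 * * *"` means minute 0 of hour 6, every day. The minimal matcher below shows how a scheduler decides whether a given minute is due. It supports only `*`, single numbers, and comma lists; ranges, steps, and names are not handled. `cronMatches` is hypothetical and checks UTC only, whereas the comment in the file says Agentist's scheduler is timezone-aware.

```typescript
// Hypothetical minimal 5-field cron matcher: "*", numbers, comma lists only. UTC only.
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true
  return field.split(",").some((part) => Number(part) === value)
}

function cronMatches(expr: string, at: Date): boolean {
  const fields = expr.trim().split(/\s+/)
  if (fields.length !== 5) throw new Error(`expected 5 fields, got ${fields.length}`)
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields
  const timeOk =
    fieldMatches(minute, at.getUTCMinutes()) &&
    fieldMatches(hour, at.getUTCHours()) &&
    fieldMatches(month, at.getUTCMonth() + 1)                // cron months are 1-12
  const domOk = fieldMatches(dayOfMonth, at.getUTCDate())
  const dowOk = fieldMatches(dayOfWeek, at.getUTCDay())      // 0 = Sunday
  // Classic cron rule: if both day fields are restricted, matching either one is enough.
  const dayOk = dayOfMonth !== "*" && dayOfWeek !== "*" ? domOk || dowOk : domOk && dowOk
  return timeOk && dayOk
}
```

A real scheduler would evaluate this once per minute in the configured timezone. It would also apply a missed-run policy for minutes when the process was down. Both are beyond this sketch.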
Reference

CLI

One binary for the whole lifecycle — author, run, deploy, operate.

bash
agentist init <name>                            # scaffold a project
agentist dev                                    # local runtime + console on :3000
agentist run <workflow|agent.duty> "<prompt>"   # positional input (default); --json for typed input
agentist approve <run> --by <user>              # approve a step paused for a human
agentist deploy --target k8s|docker             # deploy runtime + manifest
agentist cloud connect                          # plug in Agentist Cloud
agentist logs <run>                             # tail a run
agentist rollback <agent> <version>             # roll back a version
agentist secrets set <key>                      # manage secrets
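`agentist run` takes either a positional prompt or `--json`. The Analyst example notes that its question "is one field, so it's positional on the CLI." The sketch below shows one way a CLI could map both forms onto a single typed input contract. It assumes the rule is "a positional value fills the schema's only string field." `toTypedInput`, `positional`, and the flat `InputSchema` shape are all hypothetical; this is not the actual Agentist CLI, which would validate against the Zod schemas.

```typescript
// Hypothetical flat description of a duty's input schema: field name → type.
type FieldType = "string" | "number" | "boolean"
type InputSchema = Record<string, FieldType>

// Positional form: only valid when the schema has exactly one string field (or none).
function positional(schema: InputSchema, args: string[]): Record<string, unknown> {
  const fields = Object.keys(schema)
  if (fields.length === 0) return {}
  if (fields.length !== 1 || schema[fields[0]] !== "string") {
    throw new Error("positional input needs a single string field; use --json")
  }
  return { [fields[0]]: args.join(" ") }
}

function toTypedInput(schema: InputSchema, args: string[]): Record<string, unknown> {
  const jsonFlag = args.indexOf("--json")
  const input: Record<string, unknown> =
    jsonFlag >= 0 ? JSON.parse(args[jsonFlag + 1] ?? "{}") : positional(schema, args)
  // Both paths end in the same validation, so every caller hits one contract.
  for (const [field, type] of Object.entries(schema)) {
    if (typeof input[field] !== type) throw new Error(`field "${field}" must be a ${type}`)
  }
  return input
}
```

Because both forms converge on the same check, a malformed input fails the same way regardless of how it was supplied.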
Reference · Comparison

How Agentist compares

Today you'd stitch together a handful of tools — Mastra or Google's ADK for typed agents, Trinity or Kagent to run them in your own cloud, Modal for GPU infrastructure. Agentist is the only one that brings them together — everything a platform engineer needs in a single framework.

Marks: native · partial · none · not applicable.

Typed & code-owned
Capability · Agentist · Mastra · Trinity · Modal · Kagent · ADK
Zod-typed boundaries everywhere
Code-owned, not UI or YAML
Typed, deterministic workflows
A small primitive set
Pluggable models, any provider
Critics that don't self-grade
Durable, governed & sovereign
Capability · Agentist · Mastra · Trinity · Modal · Kagent · ADK
Durable, event-sourced execution
Runs in your cloud (sovereign)
Per-agent isolation tiers
Append-only, tamper-evident audit
Human approvals on one trail
Workers · leases · retries
Always-on agents you talk to
Infrastructure & deployment
Capability · Agentist · Mastra · Trinity · Modal · Kagent · ADK
Infra-as-code (compute & images)
Model endpoints on GPUs
Granular compute & autoscaling
Sandboxes for untrusted code
Scale-to-zero compute
Provisions its own components
Deploys into your cloud
Only Agentist
Capability · Agentist · Mastra · Trinity · Modal · Kagent · ADK
Admission control, per step
Typed and durable together
Durable, no separate cluster
One log = audit = replay = SOC 2
Conversation = approval = audit
One typed invocation contract
Charter inherited by every agent
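The "append-only, tamper-evident audit" row usually means a hash chain. Each journal entry stores the hash of the previous entry, so editing any past entry breaks every hash after it. The minimal sketch below uses Node's `crypto` module. The entry shape and the function names are hypothetical, and this does not describe how Agentist's journal is actually implemented.

```typescript
import { createHash } from "node:crypto"

// Hypothetical journal entry shape.
interface Entry {
  seq: number
  event: string          // e.g. "step.admitted", "approval.granted"
  data: unknown
  prevHash: string
  hash: string
}

const GENESIS = "0".repeat(64)

function hashOf(seq: number, event: string, data: unknown, prevHash: string): string {
  return createHash("sha256").update(JSON.stringify([seq, event, data, prevHash])).digest("hex")
}

// Append-only: returns a new log with one more entry chained to the last one.
function append(log: Entry[], event: string, data: unknown): Entry[] {
  const prevHash = log.length ? log[log.length - 1].hash : GENESIS
  const seq = log.length
  return [...log, { seq, event, data, prevHash, hash: hashOf(seq, event, data, prevHash) }]
}

// Recompute every link; returns the index of the first bad entry, or -1 if intact.
function verify(log: Entry[]): number {
  let prev = GENESIS
  for (let i = 0; i < log.length; i++) {
    const e = log[i]
    if (e.seq !== i || e.prevHash !== prev || e.hash !== hashOf(e.seq, e.event, e.data, e.prevHash)) return i
    prev = e.hash
  }
  return -1
}
```

The chain alone does not detect entries deleted from the tail. Systems built this way typically also anchor the latest hash somewhere the writer cannot rewrite.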
FAQ

Questions, answered

The things engineers ask first — answered plainly. For anything else, reach out.

When can I try it?
We're still building it — Agentist isn't generally available yet. Send us an email and we'll let you know the moment you can get in.
How is this different from LangGraph, CrewAI, or a prompt framework?
Those give you the reasoning loop. Agentist is the harness around it: admission control before every step, a durable event-sourced runtime, governed gateways, isolation, and an operator console — and it runs your existing engine. See the comparison.
Can my AI coding assistant build Agentist agents, or just humans?
Both — by design. Agents are typed code, and the docs are written to be read by machines as much as people, so you can point Claude, Cursor, or any coding agent at them and have it author duties, tools, policies, and the infrastructure. Because everything is typed and the runtime admission-checks every step, what the assistant writes is verified and governed — not taken on faith.
Do I have to rewrite my agents to adopt Agentist?
No. The engine is pluggable — bring an existing agent on ADK or LangGraph through an adapter and Agentist governs, persists, and operates it unchanged, or author natively with the SDK. You govern at the boundary, not by rewriting the loop.
What language do I write agents in?
TypeScript, typed end to end — agents, duties, tools, policies, workflows, and even the infrastructure are one typed source. The SDK compiles it to a manifest the engine runs and the runtime governs.
Which models can I use — and can I run my own?
Any. Every model call crosses the gateway, so you can run Bedrock, Anthropic, or OpenAI, or your own open models on GPUs (vLLM) — and switch providers without touching agent code.
Does my data ever leave my cloud?
Not unless you send it. The whole platform is composed as TypeScript infrastructure-as-code and deploys into your own cloud — models, data, and the journal all stay with you. Agentist Cloud is an optional managed control plane, never a required hop.
What does "admission control" actually do?
A policy runs before every step an agent proposes — a tool call, a model call, a data read. It can allow it, deny it, or pause for human approval, and every decision is journaled. The agent proposes; the runtime commits.
What happens if an agent crashes mid-run?
Nothing is lost. Runs are event-sourced and journaled, so a workflow resumes exactly where it stopped — including across an approval that waits for days. That same journal is your audit trail and replay log.
Can I use my existing tools and MCP servers?
Yes. Tools are typed functions with validated I/O. External MCP servers are consumed as agent tools, and every agent is exposed as an MCP tool in return — so Claude, Cursor, or another agent can call it.
How does local development work?
No cloud account needed: agentist dev runs the full runtime and console on your machine, and agentist run <agent.duty> "<prompt>" invokes a duty straight from the CLI. See the examples.
Is it open-source, and what does it cost?
The SDK and runtime are open-source and self-hostable in your own cloud. Agentist Cloud — the managed control plane, connectors, and burst capacity — is the optional paid layer.
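The crash-recovery answer above rests on one idea. The journal records each completed step's output, so on restart the runtime replays recorded results instead of re-running those steps, and the first unrecorded step is the first to actually execute. The minimal, in-memory sketch below shows that replay logic. `runWorkflow`, the `Step` signature, and the `Journal` type are hypothetical. A real runtime would persist the journal durably; the examples above use `postgres(...)` as the store.

```typescript
// Hypothetical step: reads earlier outputs from `state`, returns its own output.
type Step = { name: string; run: (state: Record<string, unknown>) => unknown }

// Stand-in for a durable journal, keyed by step name.
type Journal = Map<string, unknown>

function runWorkflow(steps: Step[], journal: Journal): Record<string, unknown> {
  const state: Record<string, unknown> = {}
  for (const step of steps) {
    if (journal.has(step.name)) {
      state[step.name] = journal.get(step.name)   // replay: reuse the recorded output
      continue
    }
    const output = step.run(state)                // execute only unrecorded steps
    journal.set(step.name, output)                // record before moving on
    state[step.name] = output
  }
  return state
}
```

A step that crashed partway through runs again on resume. Steps with external side effects therefore need to be idempotent, or keyed so a retry can be detected.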
Early access

Be first to build on Agentist

We're onboarding a small group of design partners now. Reach out and we'll be in touch.

Build with us