Agentist Docs — Harness Engineering for AI Agents

Architecture

Everything an agent needs to run

You author agents with the SDK; an Engine runs their reasoning — pluggable, ours or ADK / LangGraph; the Runtime is the harness that governs and persists every step; the Console is where you operate them; the Data plane gives governed access to your lakes; and the AI Cloud is the infrastructure it all runs in — your cloud, sovereign.

☁ Your AI Cloud: your infrastructure · multi-cloud · sovereign

SDK (author · typed code) → Runtime (govern · durable · hosts the engine) → Console (operate · observe · govern)

↑ proposes · commits ↓

Engine (reason · pluggable: Agentist · ADK · LangGraph)

↑

Data plane (secure gateways → your data lakes)

↕ Optionally connect to Agentist Cloud for a managed control plane, connectors & burst.

The runtime doesn't think — it governs, persists, and operates whatever does. Bring your own engine or use ours; either way every model, data, and tool call crosses the runtime's gateways (admission, budgets, audit), runs durably, and lands in one console — a complete harness around the model, composed and deployed as TypeScript infrastructure-as-code.

Harness engineering

You're building a harness — do it right

Putting an LLM in production isn't prompt-writing — it's harness engineering: building the structure, governance, memory, and observability around the model that make it safe. That's the platform engineer's job, and Agentist is how you do it in code you own — not a pile of prompt hacks that drift.

Structure: typed workflows, not an open-ended loop

Governance: policy gates & human approvals, in the path

Memory: durable state & scoped recall

Gateways: governed access to models & data

Observability: every call traced, audited, replayable

Connectors: the outside world, typed & permissioned

Don't start from scratch: extend a base harness — on-call SRE, support, data analyst — with your own identity, charter, skills, and policies. Harness engineering by composition.

The model reasons; the harness keeps it bounded, observable, and accountable — the line between a demo and production, and between writing prompts and engineering a system.

Get started

Quickstart

From zero to a triaged alert in about ten minutes.

1 · Create a project

bash

npx agentist init oncall
cd oncall

2 · Run it locally

agentist dev starts a local runtime (in-memory) and a console at localhost:3000.

bash

agentist dev
agentist run oncall.triage "checkout p99 latency is 5s"   # positional prompt — no flag

json

{ "severity": "sev2" }

One typed contract, every caller

Every agent has a typed input schema. The runtime validates every invocation against it — from code, an event, or a person — so nobody hand-writes JSON.

API & SDK: post the typed shape

Webhooks & events: structured payloads, mapped

Console form: auto-generated from the schema

The Console renders a form straight from the schema, so an operator can trigger a run without touching code.
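To make "validated against one schema" concrete, here is a minimal sketch in plain TypeScript (no Zod, so it runs without dependencies). `TriageInput`, `parseTriageInput`, `fromCliArgs`, and `fromJsonBody` are hypothetical names, not Agentist APIs; the point is that a positional CLI prompt and a JSON request body both reduce to the same check.

```typescript
// Hypothetical input contract for a triage duty (illustrative, not the Agentist API).
interface TriageInput {
  prompt: string;
}

type Parsed<T> = { ok: true; value: T } | { ok: false; error: string };

// The single validation path every caller goes through.
function parseTriageInput(raw: unknown): Parsed<TriageInput> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "input must be an object" };
  }
  const prompt = (raw as { prompt?: unknown }).prompt;
  if (typeof prompt !== "string" || prompt.length === 0) {
    return { ok: false, error: "prompt must be a non-empty string" };
  }
  return { ok: true, value: { prompt } };
}

// CLI adapter: positional arguments map onto the single string field.
function fromCliArgs(args: string[]): Parsed<TriageInput> {
  return parseTriageInput({ prompt: args.join(" ") });
}

// API adapter: the request body is parsed, then validated the same way.
function fromJsonBody(body: string): Parsed<TriageInput> {
  try {
    return parseTriageInput(JSON.parse(body));
  } catch {
    return { ok: false, error: "body is not valid JSON" };
  }
}
```

Because every entry point funnels into `parseTriageInput`, a malformed call fails the same way no matter who made it.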

SDK · @agentist/sdk

The authoring layer

Define agents as typed TypeScript. The SDK compiles your code into a manifest the engine executes and the runtime governs — three distinct layers, one typed source.

Typed primitives: agent · duty · skill · tool · workflow · policy · charter, few enough to hold in your head

Schemas everywhere: Zod on every boundary; outputs are validated, and the model is re-prompted automatically on a mismatch

Identity & culture: agents have a persona and inherit your mission & values

Skills & base harnesses: reusable bundles of tools & duties; extend a preset (SRE, support, data)

Generator ≠ evaluator: critics are separate agents, so no agent grades its own output

Versioned: semver-pinned agents; manifests are rollback-ready

Each gets its own section below — Agents, Tooling, Workflows, Governance, Approvals, Conversations.

SDK

Agents have an identity — and a culture

An agent isn't a pile of rules. It has an identity — who it is — and inherits your company's charter: the mission and values every agent shares. That's what gives it judgment and character, not rigidity.

typescript

// your company's charter — defined here, or pulled from Agentist Cloud
const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Bias to safety", "Explain your reasoning", "Escalate when unsure"],
})

const oncall = agent({
  extends: harness("on-call"),            // a base harness — extend, don't rebuild
  id: "oncall",
  identity: { name: "Otto", role: "On-call SRE", persona: "Calm, terse, evidence-driven." },
  charter: acme,                          // shared mission & culture
  skills: [k8s, prometheus, runbooks],    // reusable capabilities
  duties: { triage, remediate },
})

Identity — name, role, persona. The agent is someone, not a tone.
Charter — your mission and values, inherited by every agent. Define it in code, or manage it centrally in Agentist Cloud and pull it in — one mission across every team.
Duties — each is one typed call. Culture gives judgment; policies give hard guardrails. Character and control.
Skills — reusable capabilities (tools + duties) you compose in. Extend a base harness (SRE, support, data) instead of starting from zero.

SDK

Tooling

Typed functions an agent may call. The runtime runs them — the agent never touches a path or a raw query.

typescript

import { z } from "zod"

const queryMetrics = tool({
  id: "queryMetrics",
  input:  z.object({ service: z.string(), window: z.string() }),
  output: z.object({ p99Ms: z.number(), errorRate: z.number() }),
  run: ({ service, window }) => prometheus.query(service, window),
})

// inside a duty:
const m = await ctx.tools.queryMetrics({ service: a.service, window: "15m" })

SDK

Workflows

Compose duties into a typed graph. The control flow is explicit — not hidden inside the model.

typescript

export const incident = workflow("incident")
  .input(Alert)
  .step("triage", oncall.duties.triage)
  .delegate("findings", (alert) =>
    ["app","db","network"].map((area) => investigate.duties.scan.with({ alert, area })),
  )                                   // parallel sub-agents — durable & governed
  .step("diagnose", oncall.duties.diagnose, (s) => ({ findings: s.findings }))
  .approval("on-call")
  .step("remediate", oncall.duties.remediate)
  .commit()

.step · .branch on a typed field · .delegate to sub-agents in parallel · .approval for a human · .critic to evaluate.
.delegate runs sub-agents concurrently — each a durable, admission-checked step, gathered under a shared budget and one trace.
.commit() freezes the graph into the manifest the engine runs and the runtime governs.
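A sketch of what `.commit()` freezing the graph could look like, assuming a builder that records steps and returns an immutable manifest. `WorkflowBuilder`, `Step`, and `Manifest` are hypothetical and model only `.step` and `.approval`; branching, delegation, and critics are omitted.

```typescript
// Hypothetical step kinds; only .step and .approval are modeled.
type Step =
  | { kind: "step"; name: string; duty: string }
  | { kind: "approval"; role: string };

interface Manifest {
  readonly name: string;
  readonly steps: readonly Step[];
}

class WorkflowBuilder {
  private readonly name: string;
  private readonly steps: Step[] = [];
  private committed = false;

  constructor(name: string) {
    this.name = name;
  }

  step(name: string, duty: string): this {
    this.assertOpen();
    this.steps.push({ kind: "step", name, duty });
    return this;
  }

  approval(role: string): this {
    this.assertOpen();
    this.steps.push({ kind: "approval", role });
    return this;
  }

  // Copy the recorded steps into a manifest frozen at every level; the builder closes afterwards.
  commit(): Manifest {
    this.assertOpen();
    this.committed = true;
    const steps = Object.freeze(this.steps.map((s) => Object.freeze({ ...s })));
    return Object.freeze({ name: this.name, steps });
  }

  private assertOpen(): void {
    if (this.committed) throw new Error(`workflow "${this.name}" is already committed`);
  }
}
```

Copying before freezing means later edits to the builder (which are rejected anyway) could never leak into a manifest the engine is already running.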

SDK

Governance

Governance is enforced with policies — hard gates the runtime checks before every step. Agents propose, the runtime commits, and every decision is recorded.

typescript

const remediationGate = policy({
  id: "remediation-needs-approval",
  applies: { duty: "oncall.remediate" },
  check: ({ state }) =>
    state.remediate.destructive
      ? { effect: "approval", reason: "Destructive remediation needs approval" }
      : { effect: "allow" },
})

runtime({ policies: [remediationGate] })   // applies to every agent — no one can forget it
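The page does not say how several policies on the same duty combine. One common convention, assumed here, is that the strictest verdict wins (deny over approval over allow). `Policy`, `Verdict`, and `admit` are hypothetical names; `applies` is modeled as a predicate for brevity, and `"deny"` is added because the runtime journals denials.

```typescript
// Effects used in the examples above, plus "deny" (the runtime journals denials).
type Effect = "allow" | "approval" | "deny";

interface Verdict {
  effect: Effect;
  reason?: string;
}

interface Policy<S> {
  id: string;
  applies: (duty: string) => boolean;
  check: (state: S) => Verdict;
}

type Admission = Verdict & { policyId?: string };

// Higher rank = stricter.
const rank: Record<Effect, number> = { allow: 0, approval: 1, deny: 2 };

// Evaluate every policy that applies to the duty; the strictest verdict wins.
function admit<S>(duty: string, state: S, policies: Policy<S>[]): Admission {
  let result: Admission = { effect: "allow" };
  for (const p of policies) {
    if (!p.applies(duty)) continue;
    const v = p.check(state);
    if (rank[v.effect] > rank[result.effect]) result = { ...v, policyId: p.id };
  }
  return result;
}
```

Recording which policy produced the winning verdict is what lets the journal explain a denial or a pause.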

SDK

Approvals

An approval pauses a run until a human signs off. The workflow suspends durably, waits in the Console, and resumes exactly where it paused — even days later.

typescript

workflow("remediate")
  .step("plan", oncall.duties.plan)
  .approval("on-call")                     // pauses → waits in the approvals inbox
  .step("apply", oncall.duties.apply)
  .commit()

await rt.resume(runId, "on-call", { by: "kc@kc.io", decision: "approve" })

Approvals don't have to live in the Console — route one to Slack, Jira, GitHub, or email through connectors, wherever your on-call already is. Same audit trail, anywhere.
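A minimal in-memory model of the suspend/resume cycle, assuming a run is a list of steps plus a cursor. `Run`, `advance`, and `resume` are hypothetical; in the real runtime the cursor and log would be persisted to the journal so the pause can outlast a restart.

```typescript
// Hypothetical in-memory model of a run that pauses at an approval step.
type Step = { kind: "duty"; name: string } | { kind: "approval"; role: string };
type Status = "running" | "awaiting_approval" | "completed" | "rejected";

interface Run {
  steps: Step[];
  cursor: number;   // index of the next step to execute
  status: Status;
  log: string[];    // stand-in for the journal
}

// Execute steps until the run completes or reaches an approval gate.
function advance(run: Run): Run {
  while (run.cursor < run.steps.length) {
    const step = run.steps[run.cursor];
    if (step.kind === "approval") {
      run.status = "awaiting_approval";
      run.log.push(`suspended: approval by ${step.role}`);
      return run;
    }
    run.log.push(`ran ${step.name}`);
    run.cursor++;
  }
  run.status = "completed";
  return run;
}

// Resume exactly where the run paused, recording who decided.
function resume(run: Run, role: string, by: string, decision: "approve" | "reject"): Run {
  const step = run.steps[run.cursor];
  if (run.status !== "awaiting_approval" || step.kind !== "approval" || step.role !== role) {
    throw new Error("run is not waiting on this approval");
  }
  run.log.push(`${decision} by ${by}`);
  if (decision === "reject") {
    run.status = "rejected";
    return run;
  }
  run.cursor++;
  run.status = "running";
  return advance(run);
}
```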

SDK

Agents you can talk to

Your agents are always on and addressable. You talk to them; they talk back; and when something needs doing, they can start the conversation. But this isn't a chatbot bolted on — the conversation runs on the same governed, durable, audited substrate as everything else. The chat is the control plane.

typescript

const otto = rt.agent("oncall")   // every agent is addressable, always on

// talk to it — its typed duties run underneath the conversation
await otto.say("why is checkout slow right now?")

// it can open the thread itself — proactive, on its own triggers
await otto.notify("on-call", "p99 is climbing on checkout — I'd roll back 4f2a.")

// a destructive step surfaces as an approval *in the same thread*,
// admission-checked like any other step — reply to resume the run

Because a run is already an event-sourced thread and an agent already has an identity, "talk to your agent" and "continue this run" are the same thing. The approval, the chat, and the audit entry are one journaled trail — not three separate surfaces.

Always on: every agent is addressable the moment it's registered — and any past or running execution resumes as a conversation

Proactive: agents open the thread when something needs you — on a schedule, an event, or a budget breach

Governed conversation: every turn passes admission control — an agent chats freely but cannot act destructively without policy clearing it

Conversation = approval = audit: a gated step appears inline; reply to resume — and the chat, the approval, and the audit are one journaled trail

It's someone: consistent identity, persona & charter — an agent with character, anywhere it runs (Console, Slack, your app)

You ↔ Otto (always-on, addressable) → Destructive step (admission check) → Approve in thread (reply "go") → Resume · audited (one journaled trail)
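Because the chat and the approval live in one journal, "is this run waiting on me?" can be answered from the thread itself. The sketch below assumes a `Thread` with three entry kinds and treats a reply of "go" from a holder of the pending role as the approval; all names are hypothetical.

```typescript
// Hypothetical journal entry kinds: chat turns and approvals share one thread.
type Entry =
  | { kind: "message"; from: string; text: string }
  | { kind: "approval_requested"; step: string; role: string }
  | { kind: "approval_granted"; step: string; by: string };

interface Logged {
  seq: number;
  entry: Entry;
}

class Thread {
  private readonly log: Logged[] = [];

  append(entry: Entry): number {
    const seq = this.log.length + 1;
    this.log.push({ seq, entry });
    return seq;
  }

  // Replies are journaled like any turn; a "go" from a holder of the pending role also records the grant.
  reply(from: string, text: string, roles: ReadonlySet<string>): void {
    this.append({ kind: "message", from, text });
    const pending = this.pendingApproval();
    if (pending && text.trim().toLowerCase() === "go" && roles.has(pending.role)) {
      this.append({ kind: "approval_granted", step: pending.step, by: from });
    }
  }

  // Derived from the journal alone: the latest request with no matching grant.
  pendingApproval(): { step: string; role: string } | undefined {
    const granted = new Set<string>();
    for (let i = this.log.length - 1; i >= 0; i--) {
      const e = this.log[i].entry;
      if (e.kind === "approval_granted") granted.add(e.step);
      else if (e.kind === "approval_requested" && !granted.has(e.step)) return { step: e.step, role: e.role };
    }
    return undefined;
  }

  entries(): readonly Logged[] {
    return this.log;
  }
}
```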

Engine · @agentist/engine

The reasoning engine — pluggable

The engine is the only layer that thinks: it runs the reason → act → observe loop and multi-agent orchestration. Everything around it — authoring, governance, durability, operations — is the runtime. The engine plugs in behind one contract — it proposes a step; the runtime commits it — so whether you run the Agentist engine or bring ADK, every reasoning step, tool call, and delegation is governed the same way.

reason (plan next step) → admit (policy · budget · types) → act (tool · model · delegate) → observe (journal result) ↻
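The reason → admit → act → observe cycle can be sketched as a loop in which the engine only returns proposals and the runtime decides whether to execute them. `Proposal`, `Engine`, `Admit`, and `runLoop` are hypothetical; token estimates stand in for the real policy and budget checks.

```typescript
// Hypothetical shapes for the propose -> commit contract.
interface Proposal {
  kind: "tool" | "model" | "done";
  name: string;
  estimatedTokens: number;
}

interface JournalEntry {
  proposal: Proposal;
  verdict: "admitted" | "denied";
  result?: string;
}

type Engine = (history: readonly JournalEntry[]) => Proposal;   // reasons, never acts
type Admit = (p: Proposal, spentTokens: number) => boolean;     // policy + budget check
type Execute = (p: Proposal) => string;                         // the only side-effecting step

function runLoop(engine: Engine, admit: Admit, execute: Execute, maxSteps = 10): JournalEntry[] {
  const journal: JournalEntry[] = [];
  let spent = 0;
  for (let i = 0; i < maxSteps; i++) {
    const proposal = engine(journal);                          // reason
    if (proposal.kind === "done") break;
    if (!admit(proposal, spent)) {                             // admit
      journal.push({ proposal, verdict: "denied" });
      break;
    }
    const result = execute(proposal);                          // act
    spent += proposal.estimatedTokens;
    journal.push({ proposal, verdict: "admitted", result });   // observe
  }
  return journal;
}
```

Because the engine sees only the journal, swapping one engine for another does not change who holds the side effects.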

Engine

Reasoning steps

Each turn of the loop is typed and governed: the engine proposes a step, the runtime admits it, the result is journaled. The whole reasoning trace is replayable — you see exactly what the agent thought and did.

reason   p99 on checkout is 5s; check the last deploy
act      k8s.deploys.recent()   ✓ admitted
observe  { deploy: v412, at: 14:02 }
reason   v412 correlates; propose a rollback to v411
act      remediate(v411)   ⏸ approval · on-call

Typed steps — each step has typed inputs and outputs, not free text.
Governed per step — admitted before it runs; budget enforced as it goes.
Replayable — the full trace is journaled; re-issue any captured step.

Engine

Delegate to sub-agents

The engine hands work to sub-agents in parallel and gathers typed results. Each branch is a durable, admission-checked step under a shared budget — in one trace.

typescript

const [diagnosis, signals, logFindings] = await ctx.delegate([
  k8s.duties.diagnose(a),
  metrics.duties.analyze(a),
  logs.duties.scan(a),   // the logs agent; its result is bound to logFindings, not to `logs`
])   // parallel sub-agents — durable, governed, gathered
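A sketch of parallel delegation under one shared budget, assuming each branch declares a cost up front. `SharedBudget`, `Branch`, and `delegate` are hypothetical names, not the `ctx.delegate` implementation.

```typescript
// Hypothetical budget shared by every branch of one delegation.
class SharedBudget {
  private remaining: number;

  constructor(total: number) {
    this.remaining = total;
  }

  reserve(amount: number): boolean {
    if (amount > this.remaining) return false;
    this.remaining -= amount;
    return true;
  }

  left(): number {
    return this.remaining;
  }
}

interface Branch<T> {
  name: string;
  cost: number;
  run: () => Promise<T>;
}

type Outcome<T> = { name: string; ok: true; value: T } | { name: string; ok: false; error: string };

// Fan out concurrently and gather typed outcomes in input order.
async function delegate<T>(branches: Branch<T>[], budget: SharedBudget): Promise<Outcome<T>[]> {
  return Promise.all(
    branches.map(async (b): Promise<Outcome<T>> => {
      if (!budget.reserve(b.cost)) return { name: b.name, ok: false, error: "budget exhausted" };
      try {
        return { name: b.name, ok: true, value: await b.run() };
      } catch (e) {
        return { name: b.name, ok: false, error: String(e) };
      }
    }),
  );
}
```

Reservations happen before each branch's first `await`, so branches claim budget in input order; a branch that does not fit is reported as a failed outcome instead of running.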

Engine

Tool use, governed

The engine selects and calls tools — typed functions and MCP servers alike. Every call crosses the gateway, so it's validated, budgeted, and audited; the agent never touches a credential or a path.

typescript

import { z } from "zod"

const recentDeploys = tool({
  id: "recentDeploys",
  input: z.object({ ns: z.string() }),
  run: (a) => kube.deploys(a.ns),     // server-side, validated I/O
})
// the engine calls it; the runtime admits & journals every invocation
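The "every call crosses the gateway" idea, sketched as a wrapper that checks a per-agent grant, validates input, runs the tool, and writes an audit record for each outcome. `ToolDef`, `ToolGateway`, and the grant map are hypothetical; validation is a plain function here instead of a Zod schema.

```typescript
// Hypothetical tool definition: validation throws on malformed input.
interface ToolDef<I, O> {
  id: string;
  validate: (raw: unknown) => I;
  run: (input: I) => O;
}

interface AuditRecord {
  seq: number;
  agent: string;
  tool: string;
  outcome: "ok" | "rejected" | "error";
}

class ToolGateway {
  private readonly audit: AuditRecord[] = [];
  private readonly grants: Map<string, Set<string>>;   // agent id -> permitted tool ids

  constructor(grants: Map<string, Set<string>>) {
    this.grants = grants;
  }

  call<I, O>(agent: string, tool: ToolDef<I, O>, raw: unknown): O {
    const record = (outcome: AuditRecord["outcome"]): void => {
      this.audit.push({ seq: this.audit.length + 1, agent, tool: tool.id, outcome });
    };
    if (!this.grants.get(agent)?.has(tool.id)) {
      record("rejected");
      throw new Error(`${agent} may not call ${tool.id}`);
    }
    let validated = false;
    try {
      const input = tool.validate(raw);
      validated = true;
      const output = tool.run(input);
      record("ok");
      return output;
    } catch (e) {
      record(validated ? "error" : "rejected");   // bad input vs. a failing tool
      throw e;
    }
  }

  log(): readonly AuditRecord[] {
    return this.audit;
  }
}
```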

Engine

Context engineering

The hardest part of any agent loop is deciding what the model reasons over. The engine gives you tools to shape the working window — pin facts, pull in recall, compact old steps, drop noise — all budget-aware. (Distinct from durable state and long-term recall.)

typescript

// shape what the model reasons over, mid-loop
ctx.context.pin(runbook)                                   // keep for the whole run
const hits = await ctx.context.recall("past p99 incidents", { topK: 5 })
ctx.context.compact({ olderThan: 12 })                    // summarize old steps
ctx.context.drop("raw.logs")                              // evict noisy output

Or declare a context policy on the engine, applied every step:

typescript

agentist({
  context: {
    budget:  "32k",                 // token budget for the window
    include: [charter, runbooks],   // always in context
    compact: "summarize",           // when it fills, summarize the oldest
  },
})

Pin & include — keep the charter, runbooks, or key facts in the window for the whole run.
Recall into context — pull only the relevant, policy-checked slices from the data plane.
Compact & drop — summarize old steps and evict noisy tool output to stay in budget.
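A sketch of a budget-aware working window with pinning, dropping, and compaction. It assumes each item carries a token count and that `summarize` (a model call in practice, a stub here) returns a smaller item; `ContextWindow` and `Item` are hypothetical names, not the `ctx.context` API.

```typescript
// Hypothetical working-window model; token counts are supplied by the caller.
interface Item {
  id: string;
  text: string;
  tokens: number;
  pinned: boolean;   // pinned items are never compacted
}

class ContextWindow {
  private items: Item[] = [];
  private readonly budget: number;
  private readonly summarize: (batch: Item[]) => Item;

  constructor(budget: number, summarize: (batch: Item[]) => Item) {
    this.budget = budget;
    this.summarize = summarize;
  }

  add(item: Item): void {
    this.items.push(item);
    this.fit();
  }

  // Evict an item outright, e.g. noisy raw tool output.
  drop(id: string): void {
    this.items = this.items.filter((i) => i.id !== id);
  }

  used(): number {
    return this.items.reduce((sum, i) => sum + i.tokens, 0);
  }

  view(): readonly Item[] {
    return this.items;
  }

  // While over budget, fold the two oldest unpinned items into one summary in their place.
  private fit(): void {
    while (this.used() > this.budget) {
      const unpinned = this.items.filter((i) => !i.pinned);
      if (unpinned.length < 2) return;               // nothing left to fold
      const batch = unpinned.slice(0, 2);
      const before = this.used();
      const at = this.items.indexOf(batch[0]);
      this.items = this.items.filter((i) => !batch.includes(i));
      this.items.splice(at, 0, this.summarize(batch));
      if (this.used() >= before) return;             // summary did not shrink the window
    }
  }
}
```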

Engine

Bring a proven engine

An adapter conforms an existing framework to the engine port — so it runs inside the Agentist harness, in your cloud, durable and governed, without rewriting your agents.

typescript

import { runtime } from "@agentist/sdk"
import { adk } from "@agentist/engine-adk"   // also: langgraph · autogen · crewai

runtime({ engine: adk() })   // ADK reasons; the runtime governs every call

Conforms to the port — propose→commit; the harness, console, and contracts don't change.
Durable at the boundary — each call is a journaled step (the native engine adds mid-loop durability).
ADK · LangGraph · AutoGen · CrewAI — swap without touching governance.
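An adapter's job is translation, not execution. The sketch assumes a hypothetical `EnginePort` that returns a proposal (or `null` when finished) and a stand-in `ForeignAgent` interface; neither is the real ADK or LangGraph API.

```typescript
// Hypothetical port: the runtime only ever receives proposals from an engine.
interface Proposal {
  action: string;
  args: Record<string, unknown>;
}

interface EnginePort {
  propose(observations: readonly string[]): Proposal | null;   // null = finished
}

// Stand-in for a third-party framework's own "what next?" interface.
interface ForeignAgent {
  nextAction(history: string[]): { tool: string; params: Record<string, unknown> } | undefined;
}

// The adapter reshapes the foreign decision; executing it stays with the runtime.
function adapt(agent: ForeignAgent): EnginePort {
  return {
    propose(observations) {
      const next = agent.nextAction([...observations]);
      return next ? { action: next.tool, args: next.params } : null;
    },
  };
}
```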

Engine

The Agentist engine

The first-party engine goes where a wrapper can't — because it owns the loop, governance and durability reach inside the reasoning, not only the egress. It's the default.

typescript

import { runtime } from "@agentist/sdk"
import { agentist } from "@agentist/engine"

runtime({ engine: agentist() })   // typed reasoning, governed end to end

Admission at the reasoning step — shape what the model may propose, not only what it runs.
Mid-loop durable execution — event-source every step; resume mid-thought.
Typed end to end — the reasoning graph is typed; delegate is a primitive.
Charter-native judgment — your mission & values govern the reasoning as it happens.

Engine

MCP & A2A

Whichever engine runs, your agents stay reachable and composable: expose them as MCP tools and A2A servers, and consume external MCP tools and A2A agents — all through the governed gateway.

typescript

rt.mcp({ port: 8080 })                     // every agent → an MCP tool
oncall.a2a()                               // expose as an A2A server (Agent Card)
const peer = a2a("https://acme.dev/sre")   // consume an external A2A agent

Runtime · @agentist/runtime

The harness — it governs, it doesn't think

The runtime is the harness, not the brain: it hosts the engine behind one contract — the engine proposes, the runtime commits — and runs every step as a durable state machine in your own cloud, committing only if policy, budget, and types allow. Swap the engine; the harness is unchanged.

Every run is a durable state machine

cron · event · webhook · API · MCP → admit (idempotency-keyed · dedup) → durable queue (backlog · worker leases) → per step (admit → gateway → journal) → complete

A trigger admits a run under an idempotency key; a worker leases each step, runs it through governance and the gateway, and journals the result before moving on. Crash mid-run — a deploy, an OOM, a lost node — and it resumes from the last journaled step. A step can also suspend for days awaiting an approval, a timer, or a signal, then resume at the exact point. Because every trigger and step is idempotency-keyed, a redelivery or retry never makes a step take effect twice.

Configure once — it applies to every agent

typescript

const rt = runtime({
  store:     postgres(process.env.DATABASE_URL),  // durable, resumable state
  llm:       anthropic("claude-sonnet"),          // default model
  policies:  [remediationGate],                   // admission control — runs before every step
  memory:    pgvector(),                           // scoped, per-tenant recall
  isolation: "container",                          // sandbox each agent
  audit:     true,                                 // OpenTelemetry + replayable log
})

rt.register(oncall, incident)
rt.serve()   // exposes the API, MCP, and the scheduler

Each concern below — durability, admission control, isolation, scale, observability, memory, triggers — is configured once here and enforced uniformly across every agent you register.

Runtime · Local

Local Development

One command. In-memory runtime, local console, instant feedback — then deploy the same manifest to your cloud.

bash

agentist dev          # runtime + console at localhost:3000

Runtime

Durability & recovery

Every run is event-sourced: each step's result is appended to a journal in Postgres before the next step begins. State lives in the database you already run — there's no separate workflow cluster to operate.

typescript

const rt = runtime({ store: postgres(process.env.DATABASE_URL) })
// crash mid-run? it resumes from the last journaled step —
await rt.resume(runId)        // exactly once, on any worker

Event-sourced execution: every step appended to an ordered journal; a run's state is the replay of its events, not in-memory hope

Crash-resume: deploy, OOM, or a lost node — the run resumes from the last journaled step on any worker; no completed step is redone

Exactly-once: idempotency keys on every trigger and step de-dupe redeliveries and retries — an agent acts once, never twice

Suspend & resume: park a run for days on an approval, timer, or external signal; it survives restarts and resumes at the exact point

Retries & dead-letter: per-step retry policy with backoff; exhausted steps land in a dead-letter queue for inspection & replay

Storage adapters: Postgres primary; SQLite & Redis for dev and queues; object storage for large artifacts
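Crash-resume from a journal can be sketched as: replay recorded results, and execute only the steps with no entry. `Journal`, `StepDef`, and `execute` are hypothetical, and an in-memory `Map` stands in for the Postgres journal.

```typescript
// Hypothetical journal: step name -> recorded result.
type Journal = Map<string, string>;

interface StepDef {
  name: string;
  run: () => string;   // the side-effecting work
}

// Replay recorded results; execute only the steps missing from the journal.
function execute(steps: StepDef[], journal: Journal): string[] {
  const results: string[] = [];
  for (const step of steps) {
    let result = journal.get(step.name);
    if (result === undefined) {
      result = step.run();
      journal.set(step.name, result);   // the real system appends durably before moving on
    }
    results.push(result);
  }
  return results;
}
```

In this sketch a step that crashes before its result is journaled runs again on resume; idempotency keys are what keep that second attempt from repeating an external side effect.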

Runtime

Admission control

Most frameworks watch an agent act and escalate afterward. Agentist runs policy as an admission check before every step and state change — the agent proposes an action, and the runtime commits it only if policy, budget, and types allow. Preventive, not forensic.

typescript

const gate = policy({
  applies: { duty: "oncall.remediate" },
  check: (a, ctx) => a.sev === "1"
    ? ctx.approval("on-call")   // pause for a human
    : ctx.allow(),              // otherwise admit
})

Propose → commit: every tool call, model call, and state write is gated by policy before it runs; nothing side-effects without admission

Typed boundaries: Zod-validated I/O on every step; malformed output is rejected at the boundary, not discovered downstream

Budgets in-path: per-run caps on tokens & tool calls, enforced as the run executes — cut off before overspend, not after

Human approvals: .approval(role) suspends the run durably and routes to Slack, Jira, GitHub, or the Console

Verdicts journaled: every admission decision — admitted, denied, redacted, approved-by — is recorded for audit & replay

Invocation context: every call carries caller, tenant isolation key, trace id, and budget — policy decides with full context
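One way to enforce caps in the path is to reserve a call's worst-case cost before it runs and refund the unused part afterwards. This is an assumption about mechanism, not a description of Agentist internals; `RunBudget` and its methods are hypothetical.

```typescript
// Hypothetical per-run caps, checked before each call rather than reconciled afterwards.
interface Caps {
  tokens: number;
  toolCalls: number;
}

class RunBudget {
  private tokens = 0;
  private toolCalls = 0;
  private readonly caps: Caps;

  constructor(caps: Caps) {
    this.caps = caps;
  }

  // Admit a call only if its worst-case cost still fits under both caps.
  admit(call: { kind: "model" | "tool"; maxTokens: number }): boolean {
    const nextCalls = this.toolCalls + (call.kind === "tool" ? 1 : 0);
    if (this.tokens + call.maxTokens > this.caps.tokens) return false;
    if (nextCalls > this.caps.toolCalls) return false;
    this.tokens += call.maxTokens;   // reserve the worst case up front
    this.toolCalls = nextCalls;
    return true;
  }

  // Refund the unused part of a reservation once actual usage is known.
  settle(reserved: number, actual: number): void {
    this.tokens -= Math.max(0, reserved - actual);
  }

  usage(): Caps {
    return { tokens: this.tokens, toolCalls: this.toolCalls };
  }
}
```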

Runtime

Isolation & security

Agents are untrusted code by default. Each runs sandboxed, scoped to its tenant, with secrets and provider keys held by the runtime — never within agent reach.

typescript

runtime({ isolation: "microvm" })  // process · container · microvm
// per-agent sandbox; the runtime holds secrets, never the agent

Isolation tiers: process now, container for untrusted code, microVM on the horizon — pick the blast radius per agent

Tenant isolation: memory, state, and recall scoped per tenant by isolation key — one tenant never sees another's data

Secrets vault: credentials and provider keys held by the runtime; agents reference them, never read them

mTLS everywhere: service-to-service traffic mutually authenticated and encrypted inside your cluster

Egress control: agents reach the network only through the gateway — no unsanctioned outbound calls

Sovereign by default: the whole runtime runs in your boundary; agents and data never leave it

Runtime

Scaling & concurrency

Workers are stateless and the queue is durable, so throughput is just a function of how many workers you run. One pool runs both live agents and batch jobs; scale horizontally, autoscale on backlog, and keep tenants fair.

typescript

runtime({ workers: { min: 2, max: 50, scaleOn: "queueDepth" } })
// stateless workers lease a durable queue — add more to scale

Horizontal workers: stateless workers — run one or a hundred; the durable queue load-balances leases across them

Autoscaling: scale on queue depth & lease backlog — idle to zero, burst on a flood of triggers

Concurrency & fairness: per-tenant and per-agent concurrency caps so one workload can't starve another

Rate & backpressure: admission throttles when downstream models or data sources saturate — no thundering herd

Burst beyond your cluster: overflow to Agentist Cloud or a second cloud when local capacity saturates

Zero-downtime deploys: drain leases and roll workers without dropping in-flight runs — they resume from the journal
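A sketch of the lease mechanics behind stateless workers, with an injected clock so expiry can be exercised. `LeaseQueue` is hypothetical and in-memory; a durable version would keep the same fields in the store.

```typescript
// Hypothetical lease-based queue; `now` is injected so lease expiry is testable.
interface Job {
  id: string;
  leaseOwner?: string;
  leaseUntil?: number;
  done: boolean;
}

class LeaseQueue {
  private readonly jobs: Job[] = [];
  private readonly leaseMs: number;
  private readonly now: () => number;

  constructor(leaseMs: number, now: () => number) {
    this.leaseMs = leaseMs;
    this.now = now;
  }

  enqueue(id: string): void {
    this.jobs.push({ id, done: false });
  }

  // Hand out the first unfinished job that is not under a live lease.
  lease(worker: string): string | undefined {
    const t = this.now();
    const job = this.jobs.find((j) => !j.done && (j.leaseUntil === undefined || j.leaseUntil <= t));
    if (!job) return undefined;
    job.leaseOwner = worker;
    job.leaseUntil = t + this.leaseMs;
    return job.id;
  }

  // Only the current, unexpired lease holder may complete a job.
  complete(worker: string, id: string): boolean {
    const job = this.jobs.find((j) => j.id === id);
    if (!job || job.done || job.leaseOwner !== worker || (job.leaseUntil ?? 0) <= this.now()) return false;
    job.done = true;
    return true;
  }

  // Unfinished jobs: the queue-depth signal an autoscaler could watch.
  depth(): number {
    return this.jobs.filter((j) => !j.done).length;
  }
}
```

If a worker dies, its lease simply expires and another worker picks the job up; the late worker's `complete` is refused, which is what lets any number of workers share one queue.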

Runtime

Observability & tracing

Every step, call, and decision emits a span and a journal entry — so you can watch the fleet live, trace one request across many agents, and reconstruct any past run exactly.

typescript

runtime({ audit: true })       // OTel spans + append-only journal
const out = await rt.replay(seq)   // re-issue the exact call
// same input, model & route — diff vs what was captured

OpenTelemetry native: traces, metrics & logs on every step and gateway call — export to Datadog, Grafana, Honeycomb, anything OTLP

Multi-agent tracing: one distributed trace spans agent→agent and agent→tool calls — follow a request across the whole fleet, not a single process

Performance monitoring: per-step latency, token & tool-call throughput, queue depth and lease lag — the operational vitals, in your APM

Append-only audit: a tamper-evident journal of every step, call, and governance verdict — gapless sequence, evidence on tap

Exact replay: re-issue any captured model or data call with the same inputs and route; diff the new output against the original

Live run inspection: attach to a running trace in the console — spans, budgets, and verdicts as they happen
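Tamper evidence is commonly built by hash-chaining: each entry's hash covers the previous entry's hash, so editing, dropping, or reordering an entry breaks verification from that point on. The sketch uses 32-bit FNV-1a only to stay dependency-free; it is not cryptographic, and a real log would use SHA-256 and sign or externally anchor the chain head. `AuditLog` is a hypothetical name.

```typescript
// FNV-1a over UTF-16 code units: a compact, NON-cryptographic stand-in for SHA-256.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

interface AuditEntry {
  seq: number;
  event: string;
  prev: string;   // hash of the previous entry
  hash: string;   // hash over seq, prev, and event
}

class AuditLog {
  private readonly entries: AuditEntry[] = [];

  append(event: string): AuditEntry {
    const seq = this.entries.length + 1;
    const prev = seq === 1 ? "genesis" : this.entries[seq - 2].hash;
    const entry = { seq, event, prev, hash: fnv1a(`${seq}|${prev}|${event}`) };
    this.entries.push(entry);
    return entry;
  }

  // Recompute the chain; an edited, missing, or reordered entry fails the check.
  verify(entries: readonly AuditEntry[] = this.entries): boolean {
    let prev = "genesis";
    for (let i = 0; i < entries.length; i++) {
      const e = entries[i];
      if (e.seq !== i + 1 || e.prev !== prev || e.hash !== fnv1a(`${e.seq}|${e.prev}|${e.event}`)) return false;
      prev = e.hash;
    }
    return true;
  }

  snapshot(): AuditEntry[] {
    return this.entries.map((e) => ({ ...e }));
  }
}
```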

Runtime

Durable state

The runtime persists run and agent state with the journal — it survives restarts and is available on resume. Long-term recall lives in the data plane; the engine's short-term working context is separate.

typescript

await ctx.state.put("incident.severity", "sev1")   // per-run, journaled
const sev = await ctx.state.get("incident.severity")

Persisted with the journal: per-run and per-agent state — survives restarts, available on resume

Tenant-scoped: partitioned by isolation key; state never crosses a tenant boundary

TTL & pinning: expire ephemeral values, pin durable facts — control what persists and for how long
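A sketch of tenant-scoped state with optional TTLs, assuming that values stored without a TTL are pinned. `ScopedState` is hypothetical and in-memory; the clock is injected so expiry is easy to exercise.

```typescript
// Hypothetical slot: expiresAt undefined means the value is pinned.
interface Slot {
  value: string;
  expiresAt: number | undefined;
}

class ScopedState {
  private readonly data = new Map<string, Slot>();
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  // The tenant is part of every key, so a lookup can never reach another tenant's value.
  private key(tenant: string, k: string): string {
    return `${tenant}\u0000${k}`;
  }

  put(tenant: string, k: string, value: string, ttlMs?: number): void {
    const expiresAt = ttlMs === undefined ? undefined : this.now() + ttlMs;
    this.data.set(this.key(tenant, k), { value, expiresAt });
  }

  get(tenant: string, k: string): string | undefined {
    const full = this.key(tenant, k);
    const slot = this.data.get(full);
    if (!slot) return undefined;
    if (slot.expiresAt !== undefined && slot.expiresAt <= this.now()) {
      this.data.delete(full);   // evict expired values lazily, on read
      return undefined;
    }
    return slot.value;
  }
}
```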

Runtime

Triggers & scheduling

A run can start from anywhere — and every entry point is idempotency-keyed, so a redelivered event or a double-click never fires an agent twice.

typescript

rt.cron("0 9 * * 1", weekly.report)    // schedule
rt.on("deploy.failed", oncall.triage)  // internal event
rt.serve()   // + webhooks, API & MCP — all idempotency-keyed

Cron & schedules: time-based runs with timezone-aware schedules and a missed-run policy

Events: fire on internal domain events or an external bus — fan out to many agents from one event

Webhooks: turn any external system's webhook into a governed, typed run

API & SDK: POST a run or call the typed client — see Invocation

MCP: every agent is exposed as an MCP tool; external MCP tools are consumed as agent tools

Idempotency-keyed: every trigger carries a key; redeliveries and retries de-dupe to exactly-once
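Exactly-once admission reduces to a lookup table keyed by the trigger's idempotency key: the first delivery creates a run, and later deliveries get the same run id back. `IdempotentAdmission` and the `keyFor` key formats are assumptions for illustration; a durable version would rely on a unique constraint in the store so concurrent workers agree.

```typescript
// Hypothetical admission table: idempotency key -> run id.
class IdempotentAdmission {
  private readonly seen = new Map<string, string>();
  private next = 1;

  // The first delivery of a key starts a run; any repeat returns that same run.
  admit(key: string): { runId: string; duplicate: boolean } {
    const existing = this.seen.get(key);
    if (existing !== undefined) return { runId: existing, duplicate: true };
    const runId = `run-${this.next++}`;
    this.seen.set(key, runId);
    return { runId, duplicate: false };
  }
}

type Trigger =
  | { source: "cron"; schedule: string; slot: string }   // slot: the scheduled fire time
  | { source: "webhook"; deliveryId: string };           // the sender's delivery id

// Assumed key formats, for illustration only.
function keyFor(t: Trigger): string {
  return t.source === "cron" ? `cron:${t.schedule}:${t.slot}` : `webhook:${t.deliveryId}`;
}
```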

Runtime · Gateway

The gateway

One governed boundary, both directions: agents reach out to models, data, and tools, and the world reaches in to call your agents. Every call — egress or invocation — is validated, policy-checked, and audited.

Agent → Gateway (policy · keys · audit) → Inference (Claude · GPT · open models), Data (lakes · warehouses), Connectors (Slack · GitHub · …)

Pick a model per agent — or per duty. The gateway routes each to whatever your cloud runs.

typescript

const oncall = agent({
  id: "oncall",
  model: "claude-sonnet",                            // default for this agent
  duties: {
    triage:   duty({ model: "claude-haiku", /* … */ }),   // fast & cheap
    diagnose: duty({ model: "claude-opus",  /* … */ }),   // deep reasoning
  },
})
// gateway maps models per cloud: AWS→Bedrock · GCP→Vertex · Azure→Azure OpenAI · bare-metal→vLLM

Multi-model routing: Claude, GPT, open models — per agent, per duty, or by rule

Fallback & retries: degrade to a backup model on error or rate limit

Caching: dedupe identical calls; cut latency and spend

Cost & rate controls: per-agent budgets and limits, enforced in the path

Prompt governance: PII redaction & policy checks before anything leaves your boundary

Key management: provider keys held by the gateway, never by agents

Bring your own endpoint: self-hosted or private models via the LLMProvider interface

Full audit: every model call captured and replayable
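Per-duty model choice, fallback, and caching compose naturally: resolve the duty's model (falling back to the agent default), walk an ordered fallback chain, and serve identical calls from a cache. `RouteConfig`, `resolveModel`, and `callWithFallback` are hypothetical; the model ids are the ones used in the agent example above.

```typescript
// Hypothetical provider call; throws on an error or a rate limit.
type Provider = (model: string, prompt: string) => string;

interface RouteConfig {
  agentModel: string;                    // default for the agent
  dutyModels: Record<string, string>;    // per-duty overrides
  fallbacks: Record<string, string[]>;   // model -> backups, in order
}

function resolveModel(cfg: RouteConfig, duty: string): string {
  return cfg.dutyModels[duty] ?? cfg.agentModel;
}

// Try the resolved model, then each backup; identical (model, prompt) calls hit the cache.
function callWithFallback(
  cfg: RouteConfig,
  duty: string,
  prompt: string,
  provider: Provider,
  cache: Map<string, string>,
): { model: string; text: string } {
  const primary = resolveModel(cfg, duty);
  const chain = [primary, ...(cfg.fallbacks[primary] ?? [])];
  let lastError: unknown = new Error("no model in the chain");
  for (const model of chain) {
    const key = `${model}\u0000${prompt}`;
    const hit = cache.get(key);
    if (hit !== undefined) return { model, text: hit };
    try {
      const text = provider(model, prompt);
      cache.set(key, text);
      return { model, text };
    } catch (e) {
      lastError = e;   // degrade to the next model in the chain
    }
  }
  throw lastError;
}
```

In this sketch the failing primary is retried on every call; a production gateway could add a circuit breaker so a rate-limited model is skipped for a while.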

How it's deployed

The gateway ships inside the runtime — it deploys with it, in your cluster. Provider keys live in the gateway, in your boundary; models connect through each cloud's native service (Bedrock, Vertex, Azure OpenAI) or a self-hosted endpoint (vLLM, Ollama). Self-hosted, nothing routes through us.

With Agentist Cloud

Connect Agentist Cloud and the gateway gains a managed layer — pooled model rates, cross-cloud routing & GPU burst, a shared response cache, automatic provider fallback, and org-wide spend governance — while your prompts and data still never leave your boundary.

Security & compliance

Because every model and data call crosses the gateway, it's the one place compliance is both enforced and evidenced. Provider keys stay in-vault, services talk over mTLS, PII is redacted in-path, and every egress and invocation is journaled to the tamper-evident audit log — so SOC 2 evidence is a query, not a quarter-long scramble. Nothing leaves your boundary.
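In-path redaction can be as simple as a set of detectors applied before a prompt is sent. The two regexes below (email and IPv4) are illustrative only; real PII detection needs far broader coverage. `redact` is a hypothetical name, and the per-label counts are the kind of finding that could be journaled without the values themselves.

```typescript
// Illustrative detectors only; production PII detection needs much broader coverage.
const detectors: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "IPV4", pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

// Replace each match with a placeholder and count what was removed.
function redact(text: string): { text: string; findings: Record<string, number> } {
  const findings: Record<string, number> = {};
  let out = text;
  for (const { label, pattern } of detectors) {
    out = out.replace(pattern, () => {
      findings[label] = (findings[label] ?? 0) + 1;
      return `[${label}]`;
    });
  }
  return { text: out, findings };
}
```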

Invocation — one input, every caller

The same gateway governs the inbound side. An agent or duty declares one typed input schema; every caller — a human, another agent, your app, an API, or an MCP client — satisfies that one contract. Each entry is policy-checked, idempotency-keyed, and audited like any other call.

Human · CLI: a positional prompt is the default — agentist run oncall.triage "p99 5s"; use flags or --json for typed multi-field input

Human · Console: start a run from the console with a prompt box or a typed form

Agent → agent: a typed call — triage.with({ … }) — traced end to end across agents

API · SDK: a typed client, or POST /v1/runs/<agent.duty> with a JSON body matching the schema

MCP: the input schema becomes the tool's inputSchema — call from Claude or Cursor

Webhook · event · cron: map an external payload to the schema; validated & idempotency-keyed on the way in

typescript

import { z } from "zod"

// one typed input contract — a single field is positional on the CLI
const triage = duty({ input: z.object({ prompt: z.string() }), /* … */ })

// HUMAN · CLI — the prompt is the default; no flag needed
//   agentist run oncall.triage "checkout p99 latency is 5s"

// AGENT → AGENT — typed, traced across agents
const out = await triage.with({ prompt: "checkout p99 5s" })

// API · SDK — same contract, over the network
const run = await client.run(oncall.triage, { prompt: "checkout p99 5s" })
//   POST /v1/runs/oncall.triage   { "prompt": "checkout p99 5s" }

// MCP — the schema is the tool; expose every agent
rt.mcp({ port: 8080 })
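Under the Model Context Protocol, a tool is advertised with a `name`, a `description`, and an `inputSchema` that is a JSON Schema object. The sketch derives that descriptor from a simple field spec standing in for a Zod schema; `FieldSpec`, `toMcpTool`, and the `oncall_triage` naming are assumptions, not how Agentist actually maps duties to tools.

```typescript
// Hypothetical field spec standing in for a Zod schema; strings and numbers only, for brevity.
interface FieldSpec {
  type: "string" | "number";
  description?: string;
  optional?: boolean;
}

interface McpToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
}

// Every non-optional field becomes a required JSON Schema property.
function toMcpTool(name: string, description: string, fields: Record<string, FieldSpec>): McpToolDescriptor {
  const properties: McpToolDescriptor["inputSchema"]["properties"] = {};
  const required: string[] = [];
  for (const [key, spec] of Object.entries(fields)) {
    properties[key] = spec.description ? { type: spec.type, description: spec.description } : { type: spec.type };
    if (!spec.optional) required.push(key);
  }
  return { name, description, inputSchema: { type: "object", properties, required } };
}
```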

Infrastructure · @agentist/infra

The platform, composed in code

Defining agents is half the job; the other half is standing up what they run on — a runtime, a store, model endpoints, the gateway, a vector store, sandboxes. Agentist gives you that as TypeScript infrastructure-as-code: compose the platform's components in one typed file, and agentist deploy materializes them onto Kubernetes — in your cloud or ours, from the same code.

typescript

import { stack, gateway, models, vectors, postgres, image } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"

export default stack({
  target: "k8s",                          // the universal substrate
  store:  postgres({ instance: "db.r6g.large" }),
  models: [
    models.serve("llama-3.1-70b", {
      gpu: "H100:2",                      // multi-GPU, tensor-parallel
      image: image.vllm("0.6.3"),         // the serving image, in code
      autoscale: { min: 0, max: 8, scaledown: "60s" },   // scale-to-zero
      concurrency: 32,                    // inputs in flight per replica
      snapshot: true,                     // GPU snapshot → cold start in seconds
    }),
    models.bind(bedrock("claude-sonnet-4")),   // or bind your cloud's
  ],
  gateway:   gateway({ egress: "deny", redactPII: true }),
  vectors:   vectors.pgvector(),
  sandboxes: { isolation: "microvm", idleTimeout: "5m" },
})
//   agentist deploy --to your-cluster   # your cloud or ours, same code

stack.ts (TypeScript IaC) → agentist deploy (compile to k8s) → Runtime · Gateway, Models on GPUs, Postgres · pgvector, Sandboxes → Your cloud (sovereign)

One language end to end: infrastructure in the same typed TypeScript as your agents — types and autocomplete across the whole platform

Kubernetes everywhere: one definition runs in your cloud, ours, on-prem, or air-gapped — k8s is the only assumption

Granular per-workload control: GPU type & count, CPU, memory, autoscaling, concurrency & batching — set on each component independently

Provider abstractions: declare a need — object storage, a Postgres, a model — and bind it to Bedrock, S3, RDS, GCS, or self-hosted

Sovereign, not hosted: unlike hosted infra clouds, the components land in your boundary — data and models never leave

Scale-to-zero: GPU model endpoints autoscale on demand and idle down to nothing — pay per second of use

Infrastructure

The stack

A stack is the unit of infrastructure — every component an agent platform needs, declared in one typed file: runtime, store, models, gateway, vectors, sandboxes, tool backends, schedules and queues. agentist deploy plans the diff and applies it incrementally; the stack's typed outputs wire straight into runtime({…}).

typescript

import platform from "./stack"     // the typed infra definition
await platform.plan()              // show the diff first
const out = await platform.apply() // incremental & reversible

// the stack's typed outputs wire straight into the runtime
runtime({ store: out.store, llm: out.models.claude })

Infra as typed code: the same TypeScript as your agents — types, autocomplete & refactors across the whole platform

The component set: runtime, store, gateway, vectors, models, sandboxes, tools, schedules & queues — composed, not hand-wired

Plan & apply: every deploy shows a diff first; apply is incremental and reversible — no surprise teardowns

Typed outputs: a stack returns typed handles (store, models, gateway) that the runtime consumes directly; no copy-pasting URLs

Environments & modules: dev, staging & prod from one definition; group components into reusable modules shared across teams

State & drift: deploy state is tracked; drift is detected and reconciled on the next apply
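The plan step amounts to a diff between desired components and recorded deploy state. A minimal sketch, assuming each component is a flat spec of primitive settings; `plan` and `Spec` are hypothetical, and applying the plan is out of scope.

```typescript
// Hypothetical flat component spec.
type Spec = Record<string, string | number | boolean>;

interface Plan {
  create: string[];
  update: string[];
  remove: string[];
}

const has = (o: object, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

function sameSpec(a: Spec, b: Spec): boolean {
  return Object.keys(a).concat(Object.keys(b)).every((k) => a[k] === b[k]);
}

// Compare desired components with recorded (or observed) state, by component name.
function plan(desired: Record<string, Spec>, current: Record<string, Spec>): Plan {
  const result: Plan = { create: [], update: [], remove: [] };
  for (const name of Object.keys(desired)) {
    if (!has(current, name)) result.create.push(name);
    else if (!sameSpec(desired[name], current[name])) result.update.push(name);
  }
  for (const name of Object.keys(current)) {
    if (!has(desired, name)) result.remove.push(name);
  }
  return result;
}
```

If `current` is read from the live cluster instead of the last recorded state, the same diff doubles as drift detection.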

Infrastructure

Compute & autoscaling

Every workload — a model endpoint, a tool backend, a batch job — gets fine-grained, per-component control over resources and scaling. The granularity of a serverless platform, declared in code and run on your own Kubernetes.

typescript

models.serve("llama-3.1-70b", {
  gpu: "H100:2", cpu: 8, memory: "32Gi",   // per-workload resources
  autoscale: { min: 0, max: 8, scaledown: "60s" },
  concurrency: 32,                          // inputs in flight / replica
  batch: { maxSize: 16, window: "10ms" },   // dynamic batching
  snapshot: true,                           // restore from a snapshot: cold starts in seconds
})

Per-workload resources: set GPU type & count, CPU, and memory on each component independently, so every piece is right-sized

Autoscale knobs: min, max & buffer replicas, target concurrency, and a scaledown window, set per workload rather than as one global setting

Scale-to-zero: idle workloads drop to zero replicas and cold-start on demand; pay only for what actually runs

Concurrency & batching: inputs in flight per replica, plus dynamic request batching for throughput under load

Fast cold starts: memory & GPU snapshots restore warm state, so a scaled-to-zero endpoint resumes in seconds, not minutes

Timeouts & retries: per-component execution timeouts, retry policy, and backpressure when downstream services saturate
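The knobs above interact roughly as follows. Demand divided by target concurrency sets the replica count, plus a buffer, and `min`/`max` clamp the result. The scaledown window delays shrinking so that a brief lull does not trigger a cold start. This is a sketch under those assumptions. `Autoscaler` and its fields are hypothetical, not the Agentist API.

```typescript
// Hypothetical sketch of per-workload autoscaling; not the Agentist implementation.
interface AutoscaleConfig {
  min: number         // floor; 0 allows scale-to-zero
  max: number         // ceiling
  buffer: number      // spare warm replicas kept above demand while there is load
  concurrency: number // target inputs in flight per replica
  scaledownMs: number // demand must stay lower for this long before shrinking
}

class Autoscaler {
  private readonly cfg: AutoscaleConfig
  private replicas: number
  private lowSince: number | null = null

  constructor(cfg: AutoscaleConfig) {
    this.cfg = cfg
    this.replicas = cfg.min
  }

  // Replicas needed for the current load, clamped to [min, max].
  target(inFlight: number): number {
    const needed = inFlight === 0 ? 0 : Math.ceil(inFlight / this.cfg.concurrency) + this.cfg.buffer
    return Math.min(this.cfg.max, Math.max(this.cfg.min, needed))
  }

  // Scale up immediately; scale down only once the lower target has held for the window.
  observe(inFlight: number, nowMs: number): number {
    const want = this.target(inFlight)
    if (want >= this.replicas) {
      this.replicas = want
      this.lowSince = null
    } else if (this.lowSince === null) {
      this.lowSince = nowMs
    } else if (nowMs - this.lowSince >= this.cfg.scaledownMs) {
      this.replicas = want
      this.lowSince = null
    }
    return this.replicas
  }
}

const scaler = new Autoscaler({ min: 0, max: 8, buffer: 1, concurrency: 32, scaledownMs: 60000 })
scaler.observe(100, 0)   // ceil(100 / 32) = 4, plus 1 buffer: 5 replicas
scaler.observe(0, 1000)  // load gone; the scaledown window starts, still 5
scaler.observe(0, 61000) // 60s elapsed: scale to zero
```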

Infrastructure

Images & environments

Define the exact runtime a component runs in — base image, system packages, dependencies — in the same TypeScript. Builds are layered and content-addressed, so a one-line change rebuilds one layer, not the world.

typescript

const env = image
  .debianSlim()
  .apt("git", "ffmpeg")
  .pip("vllm==0.6.3", "transformers")
  .run("python -m warm_cache")     // build-time step, cached layer

Images in code: declare the base image, apt & pip/npm deps, and build steps with a typed, chainable builder

Layered cache: content-addressed layers; change one dependency, rebuild one layer

Bring a Dockerfile: start from an existing Dockerfile or registry image when you'd rather

GPU build steps: run build-time steps on a GPU to compile kernels, warm caches, or bake in weights

Pinned & reproducible: lockfile-pinned, content-hashed images; the same build everywhere

Per-component: each model, sandbox, or tool backend runs its own image, not one monolith
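Content addressing is what makes the cache work. Each layer's ID is a hash of its parent's ID plus its own build step, so an unchanged prefix of steps resolves to IDs already in the cache. This is a minimal sketch that assumes a simple linear chain of layers. `layerIds` and `reusable` are hypothetical helpers, and FNV-1a stands in for a real content hash such as SHA-256 only to keep the example dependency-free. In this chained model, the changed layer and any layers built on top of it get new IDs.

```typescript
// FNV-1a (32-bit): a stand-in for a real content hash such as SHA-256.
function fnv1a(text: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h.toString(16).padStart(8, "0")
}

// Each layer ID covers the parent ID plus this step, so an ID changes only
// when this step or a step beneath it changes.
function layerIds(steps: string[]): string[] {
  const ids: string[] = []
  let parent = ""
  for (const step of steps) {
    parent = fnv1a(parent + "\n" + step)
    ids.push(parent)
  }
  return ids
}

// How many layers of a new build are already in the cache.
function reusable(steps: string[], cache: Set<string>): number {
  return layerIds(steps).filter((id) => cache.has(id)).length
}

const before = ["debian-slim", "apt git ffmpeg", "pip vllm==0.6.3 transformers", "run python -m warm_cache"]
const after = ["debian-slim", "apt git ffmpeg", "pip vllm==0.6.4 transformers", "run python -m warm_cache"]
// The base and apt layers hit the cache; the pip layer and the step above it rebuild.
console.log(reusable(after, new Set(layerIds(before))))
```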

Infrastructure

Models & GPUs

Serve open models on your own GPUs, or bind to your cloud's hosted models — the same models interface either way, every call routed through the governed gateway.

typescript

// self-host on your GPUs — OpenAI-compatible endpoint
const llama  = models.serve("llama-3.1-70b", { gpu: "H100" })
// or bind your cloud's hosted model — no servers to run
const claude = models.bind(bedrock("claude-sonnet-4"))

Self-host on GPUs: Llama, Qwen, Mistral on T4 through H100/H200, single- or multi-GPU with tensor parallelism

vLLM-backed: high-throughput serving with continuous batching behind an OpenAI-compatible API

Bind hosted models: Bedrock, Vertex, Anthropic, or OpenAI in one line, with no servers to run

Scale-to-zero + snapshots: GPU endpoints idle to nothing and cold-start in seconds from a snapshot

Weight & cache volumes: model weights and compile artifacts cached on a volume across restarts

Per-duty routing: cheap duties to a small model, deep reasoning to a large one, set per agent, per duty, or by rule
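Per-duty routing can be modeled as an ordered rule list evaluated per call, with a default model for anything unmatched. This sketch assumes first-match-wins semantics. `route`, `Rule`, and the example rules are hypothetical, though the model names echo the gateway routes shown in the Console.

```typescript
// Hypothetical sketch: resolve which model serves a call.
interface Call {
  agent: string
  duty: string
  promptTokens: number
}

interface Rule {
  match: (call: Call) => boolean
  model: string
}

// First matching rule wins; anything unmatched goes to the default model.
function route(call: Call, rules: Rule[], fallback: string): string {
  for (const rule of rules) {
    if (rule.match(call)) return rule.model
  }
  return fallback
}

const rules: Rule[] = [
  // per agent and duty: deep reasoning goes to a large model
  { match: (c) => c.agent === "oncall" && c.duty === "diagnose", model: "claude-sonnet" },
  // by rule: very long prompts go to a self-hosted model
  { match: (c) => c.promptTokens > 50000, model: "llama-3.1-70b" },
]

console.log(route({ agent: "oncall", duty: "triage", promptTokens: 1200 }, rules, "claude-haiku"))
```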

Infrastructure

Sandboxes for untrusted code

When an agent writes code, it runs in a disposable sandbox — an isolated container provisioned on demand, scoped, and torn down. The execution substrate for tools that run generated code.

typescript

const box = await sandbox.create({ image: env, isolation: "microvm" })
const { stdout, exitCode } = await box.exec("python solve.py")
await box.snapshot("checkpoint")   // pause & resume later
await box.terminate()

On-demand isolation: a fresh sandbox per task, with a process, container, or microVM blast radius

Exec & stream: run commands, stream stdout/stderr, capture exit codes

Ephemeral by default: auto-terminate on idle or timeout; nothing lingers

Snapshots: checkpoint a sandbox's filesystem to pause and resume long-running sessions

Tunnels & mounts: scoped network tunnels, and only the volumes & secrets a task needs

Governed egress: a sandbox reaches the network only through the gateway and policy
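"Ephemeral by default" implies a reaper that terminates any sandbox past its idle timeout or its hard lifetime. This is a minimal sketch that assumes both limits are tracked in milliseconds. `reap`, `SandboxRecord`, and `Limits` are hypothetical names, not the Agentist sandbox API.

```typescript
// Hypothetical sketch: terminate sandboxes that are idle too long or past a hard lifetime.
interface SandboxRecord {
  id: string
  createdAt: number    // ms timestamp
  lastActiveAt: number // ms timestamp, bumped on every exec
}

interface Limits {
  idleMs: number
  maxLifetimeMs: number
}

function expired(box: SandboxRecord, limits: Limits, now: number): boolean {
  return now - box.lastActiveAt >= limits.idleMs || now - box.createdAt >= limits.maxLifetimeMs
}

// One sweep: which sandboxes to terminate, and which survive.
function reap(boxes: SandboxRecord[], limits: Limits, now: number): { terminate: string[]; keep: SandboxRecord[] } {
  const terminate: string[] = []
  const keep: SandboxRecord[] = []
  for (const box of boxes) {
    if (expired(box, limits, now)) terminate.push(box.id)
    else keep.push(box)
  }
  return { terminate, keep }
}
```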

Infrastructure

Jobs & batch

Not everything is a live request. Embedding a corpus, running an eval suite, or a nightly backfill is a batch job. It runs on the same worker pool as live agents, bursts up for the work, and drops back to zero when the queue drains. (Sub-agent parallelism within a run is handled by delegate; this is data-level parallelism over a collection. Scheduling lives in Triggers.)

typescript

// map a function across a collection — data parallelism on the worker pool
const vectors = await job.map(corpus, embed)   // e.g. 10k docs; results gathered in input order
// or long-running work, collected later
const handle  = await job.spawn(nightlyBackfill)

Parallel map: map across thousands of inputs on the worker pool, with ordered results and failures isolated per item

Background jobs: spawn long-running work and collect it later, with no request held open

Batch pipelines: chain stages with durable queues between them (ingest → embed → index)

Burst & scale-to-zero: the worker pool scales up for the batch and back to nothing when it's done

Same governance: every job step is admission-checked, budgeted, and audited like any run
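The "ordered results, failures isolated per item" contract of a parallel map can be sketched in two parts. Inputs are split into one batch per worker, and each item's error is captured instead of failing the whole job. This is a sketch under those assumptions. `mapIsolated`, `partition`, and `Outcome` are hypothetical, and the batches run sequentially here where a real worker pool would run them concurrently.

```typescript
// Hypothetical sketch of parallel-map semantics with per-item failure isolation.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: string }

// Split inputs into at most `workers` contiguous batches.
function partition<T>(items: T[], workers: number): T[][] {
  const size = Math.max(1, Math.ceil(items.length / workers))
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size))
  return batches
}

function mapIsolated<T, R>(items: T[], fn: (item: T) => R, workers: number): Outcome<R>[] {
  const results: Outcome<R>[] = []
  // A pool would run batches concurrently; concatenating them in order keeps results ordered.
  for (const batch of partition(items, workers)) {
    for (const item of batch) {
      try {
        results.push({ ok: true, value: fn(item) })
      } catch (err) {
        results.push({ ok: false, error: err instanceof Error ? err.message : String(err) })
      }
    }
  }
  return results
}

const lengths = mapIsolated(["doc a", "", "doc c"], (doc) => {
  if (doc === "") throw new Error("empty doc")
  return doc.length
}, 2)
console.log(lengths.map((o) => (o.ok ? o.value : o.error)))
```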

Infrastructure

Provider abstractions

A component declares an abstract need — object storage, a Postgres, a model. You bind it to a cloud-specific provider. Swap the binding, not the code.

typescript

stack({
  store:   rds("agentist-prod"),   // or cloudSql() · alloyDb() · a URL
  storage: s3("artifacts"),         // or gcs() · azureBlob() · minio()
  secrets: awsSecrets(),            // or vault() · gcpSecrets()
})

Object storage: S3, GCS, Azure Blob, or in-cluster MinIO behind one interface

Managed Postgres: RDS, Cloud SQL, AlloyDB, CloudNativePG, or an existing connection URL

Model providers: Bedrock, Vertex, Anthropic, OpenAI, or a self-hosted vLLM endpoint

Secrets: your cloud's secret manager or the runtime vault, injected at deploy time

Networking: VPC, private endpoints, and ingress mapped to each cloud's primitives

One matrix, your pick: choose per environment, e.g. in-cluster for dev and managed for prod
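"Swap the binding, not the code" is ordinary dependency inversion: application code depends on an interface, and each environment supplies an implementation. This is a sketch of that pattern. `ObjectStore`, `InMemoryStore`, `bindings`, and `saveReport` are hypothetical, and both bindings are in-memory stand-ins where real S3 or MinIO clients would implement the same interface.

```typescript
// Hypothetical sketch: callers depend on an abstract store; environments bind providers.
interface ObjectStore {
  readonly provider: string
  put(key: string, body: string): void
  get(key: string): string | undefined
}

class InMemoryStore implements ObjectStore {
  readonly provider: string
  private readonly objects = new Map<string, string>()

  constructor(provider: string) {
    this.provider = provider
  }

  put(key: string, body: string): void {
    this.objects.set(key, body)
  }

  get(key: string): string | undefined {
    return this.objects.get(key)
  }
}

type Env = "dev" | "prod"

// One factory per environment; swapping a binding touches only this table.
const bindings: Record<Env, () => ObjectStore> = {
  dev: () => new InMemoryStore("minio (stand-in)"),
  prod: () => new InMemoryStore("s3 (stand-in)"),
}

// Application code never names a provider.
function saveReport(store: ObjectStore, runId: string, body: string): string {
  const key = `runs/${runId}/report.txt`
  store.put(key, body)
  return key
}

const store = bindings.dev()
console.log(store.get(saveReport(store, "r-42", "checkout recovered")))
```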

Infrastructure

Deploy anywhere

agentist deploy compiles the stack to Kubernetes and rolls it out — your cloud, ours, on-prem, or air-gapped. Kubernetes is the only assumption.

bash

agentist deploy --plan          # dry-run: show the diff first
agentist deploy --to prod       # compile → apply to your cluster
agentist deploy --to air-gapped --offline   # disconnected environment: no phone-home

One command: compile, diff, and apply the whole platform with agentist deploy

GitOps-ready: the stack is code, so review infrastructure in PRs and deploy from CI

Zero-downtime rollouts: drain and roll workers without dropping in-flight runs; they resume from the journal

Air-gapped: deploy into disconnected environments, with no phone-home required

Your cloud or ours: the identical definition runs in both; Agentist Cloud adds managed conveniences, never lock-in

Helm & operators underneath: standard Kubernetes artifacts you can inspect, own, and extend
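Resuming from the journal works because a step's recorded output, not its re-execution, is the source of truth on restart. This is a minimal in-memory sketch of that idea. `Journal`, `runStep`, and `incidentRun` are hypothetical, and a real runtime would persist the journal in its store rather than in a `Map`.

```typescript
// Hypothetical sketch: completed steps are journaled, so a run restarted on a
// fresh worker replays recorded results instead of re-executing side effects.
class Journal {
  private readonly entries = new Map<string, unknown>()
  has(step: string): boolean {
    return this.entries.has(step)
  }
  get(step: string): unknown {
    return this.entries.get(step)
  }
  record(step: string, output: unknown): void {
    this.entries.set(step, output)
  }
}

function runStep<T>(journal: Journal, step: string, fn: () => T): T {
  if (journal.has(step)) return journal.get(step) as T // replay: no side effects
  const output = fn()
  journal.record(step, output)
  return output
}

// A three-step run; `drainAfter` simulates the worker being drained mid-run.
function incidentRun(journal: Journal, executed: string[], drainAfter?: string): string {
  let last = ""
  for (const step of ["triage", "diagnose", "report"]) {
    last = runStep(journal, step, () => {
      executed.push(step) // a real side effect would happen here
      return `${step}:done`
    })
    if (step === drainAfter) throw new Error(`worker drained after ${step}`)
  }
  return last
}
```

If the first worker is drained after `diagnose`, a second worker given the same journal executes only `report`.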

Cloud

Your AI Cloud, composed in the cloud

The whole platform runs in your cloud — deploy the runtime with one Helm chart into AWS, GCP, Azure, DigitalOcean, or bare metal. Your agents and data never leave your boundary, and it's all stood up as infrastructure-as-code.

Optional: Agentist Cloud

Plug in Agentist Cloud — the managed control plane to govern the whole agentic estate (charter, identities, RBAC, policies, registry, cost) across every team, with connectors and burst capacity. Your data plane still never leaves your boundary.

☁ Your AI Cloud: AWS · GCP · Azure · DigitalOcean · bare metal

Agentist Runtime: your agents & data · in your boundary

Gateways: your models · your data lakes

↕ Optionally connect Agentist Cloud — managed control plane, connectors & burst.

bash

# deploy to any cloud, then connect Agentist
agentist deploy --target k8s
agentist cloud connect

☁ Your AI Cloud (self-hosted runtime + data) ⇄ Agentist Cloud (managed plane) → Charter & culture · Agent registry · Identity & RBAC · Policy management · Connectors & compliance · Cost & multi-cloud burst

Charter & culture: manage mission & values org-wide; every agent inherits them

Agent registry: catalog every agent, skill & base harness across teams, versioned

Identity & RBAC: SSO, SCIM, roles & scopes per team and per agent

Policy management: org-wide policies & approval routes, centrally enforced

Skills & harness library: publish & share base harnesses and skills company-wide

Training & evals: turn runs & coaching feedback into eval sets and improved agents

Secrets & keys: a central vault, with credentials rotated and scoped per agent

Cost & quota governance: budgets, limits & spend analytics per team and agent

Managed control plane: hosted Console & one view across every team's fleet

Connector marketplace: vetted, enterprise-grade connectors & vertical packs

Compliance packs: SOC2 / HIPAA bundles + exportable evidence

Multi-cloud, burstable: burst compute & GPUs across AWS, GCP & Azure on demand

Support & LTS: SLAs, long-term support, dedicated assistance

Self-host vs. Agentist Cloud

Capability	Self-host (open core)	+ Agentist Cloud
Runtime & data	Your cloud / cluster	Your cloud — unchanged
Compute	Your cluster	+ multi-cloud burst & GPUs
Control plane	Self-hosted Console	Managed / hosted Console
Connectors	Open core + DIY	Marketplace + vertical packs
Compliance	DIY	SOC2 / HIPAA packs + evidence
Support	Community	SLAs · LTS

Cloud · AWS

Deploy on AWS

Run the runtime on EKS, with RDS for Postgres and S3 for object storage. Models run on Amazon Bedrock (Claude, Llama, Titan); burst to EC2 GPU instances for self-hosted models.

bash

agentist deploy --target eks \
  --set postgres.url=$RDS_URL \
  --set objectStore=s3://acme-agentist \
  --set gateway.models=bedrock          # Claude on Bedrock

Cloud · GCP

Deploy on Google Cloud

Run on GKE, with Cloud SQL for Postgres and GCS for object storage. Models run on Vertex AI (Claude, Gemini); burst to GPU node pools on demand.

bash

agentist deploy --target gke \
  --set postgres.url=$CLOUDSQL_URL \
  --set objectStore=gs://acme-agentist \
  --set gateway.models=vertex           # Claude / Gemini on Vertex

Cloud · Azure

Deploy on Azure

Run on AKS, with Azure Database for PostgreSQL and Blob Storage. Models run on Azure OpenAI or Azure-hosted endpoints.

bash

agentist deploy --target aks \
  --set postgres.url=$AZURE_PG_URL \
  --set objectStore=az://acme-agentist \
  --set gateway.models=azure-openai

Cloud · DigitalOcean

Deploy on DigitalOcean

Run on DOKS, with Managed Postgres and Spaces for object storage. Use the Anthropic API or self-host models on GPU droplets.

bash

agentist deploy --target doks \
  --set postgres.url=$DO_PG_URL \
  --set objectStore=spaces://acme-agentist \
  --set gateway.models=anthropic

Cloud · Bare metal

Deploy on bare metal

Air-gapped or on-prem. Run on k3s or kubeadm, with self-managed Postgres and MinIO. Self-host models with vLLM or Ollama — no external calls leave the building.

bash

agentist deploy --target k8s \
  --set postgres.url=$PG_URL \
  --set objectStore=minio://agentist \
  --set gateway.models=vllm \
  --set airgapped=true

Cloud · Multi-cloud

Multi-cloud & burst

Run the data plane in your primary cloud and burst compute, GPUs, or inference into others on demand — one logical fleet, one control plane, governed centrally.

Why fleets need it: agent load is spiky — one incident or batch can fire hundreds of agents at once, each making GPU-heavy inference calls. Bursting spills that spike to spare GPUs in another cloud instead of queuing or failing, so latency stays flat and you pay for peak capacity only when you actually hit it.

☁ Your AI Cloud: one fleet across providers

Primary (AWS · runtime + data) ⇄ Burst (GCP · GPU + inference) ⇄ Burst (bare metal · private models)

Cost- & latency-aware scheduling routes work to the cheapest healthy capacity; data stays pinned to your primary.

bash

# primary cloud + burst targets — one fleet
agentist deploy --target eks --set primary=true
agentist cloud burst add --target gke --gpu
agentist cloud burst add --target k8s --models=vllm

One fleet: many clouds, one control plane & audit trail

GPU burst: spill to another cloud's GPUs under load

Model routing: per-agent models mapped to each cloud's provider

Cost-aware: schedule to the cheapest healthy capacity

Data stays put: data plane pinned to primary; only compute bursts

Failover: drain & reschedule if a region or cloud degrades
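Cost-aware burst scheduling with pinned data reduces to a filter and a minimum. Keep the targets that are healthy and have capacity, restrict data-touching work to the primary, then take the cheapest. This is a sketch under those assumptions. The `Target` fields, the prices, and `schedule` are hypothetical.

```typescript
// Hypothetical sketch: route work to the cheapest healthy target with capacity.
interface Target {
  name: string
  primary: boolean
  healthy: boolean
  freeGpus: number
  costPerGpuHour: number
}

interface Work {
  gpus: number
  touchesData: boolean // work reading pinned data must stay on the primary
}

// Returns the chosen target's name, or null to queue until capacity frees up.
function schedule(work: Work, targets: Target[]): string | null {
  const eligible = targets.filter(
    (t) => t.healthy && t.freeGpus >= work.gpus && (!work.touchesData || t.primary),
  )
  if (eligible.length === 0) return null
  return eligible.reduce((best, t) => (t.costPerGpuHour < best.costPerGpuHour ? t : best)).name
}

// Illustrative fleet; prices are made up.
const fleet: Target[] = [
  { name: "eks", primary: true, healthy: true, freeGpus: 0, costPerGpuHour: 4.0 },
  { name: "gke", primary: false, healthy: true, freeGpus: 8, costPerGpuHour: 3.2 },
  { name: "bare-metal", primary: false, healthy: true, freeGpus: 2, costPerGpuHour: 1.5 },
]
console.log(schedule({ gpus: 4, touchesData: false }, fleet))
```

Failover falls out of the same rule: mark a target unhealthy and its work reschedules elsewhere, or waits.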

Console

Operate · observe · coach

The control plane for your agents — watch every run, audit any decision, and coach agents to get better. CLI-first: launch agentist console from your terminal, or host it centrally in your cloud's data plane. Self-hosted, or managed via Agentist Cloud.

agentist console — incident fleet

env: prod · ● self-hosted · data in your boundary · last 24h

Runs · Approvals (2) · Observe · Audit · Coach · Policies · Registry · Identity · Gateway · Access

runs · prod · last 24h

filters: suspended · running · failed · denied · mine

run	agent	state	waiting-on	budget	trigger	age
incident #4821	Cornelia·SRE	suspended(approval)	on-call · 4m 12s	41k/60k	webhook·a3f	4m 12s
deploy-gate #903	Rune·Release	suspended(approval)	release · 1m 02s	6k/60k	api·77c	1m 02s
backup-verify #88	Atlas·Ops	suspended(timer)	timer · 17h 51m	2k/60k	cron·d10	6h 09m
incident #4822	Cornelia·SRE	running	—	22k/60k	webhook·b1e	1m 40s
cost-report #77	Vega·Cost	running	—	9k/60k	cron·e22	0m 31s
reindex #410	Atlas·Ops	failed	— step 3/6	18k/60k	event·f90	2m ago
incident #4820	Cornelia·SRE	done	—	31k/60k	webhook·9c2	3m 41s

47 runs · 3 suspended on human · 2 running · 1 failed · ⏎ open trace

approvals · 2 pending · routed to #sre-oncall (Slack), Console

destructive gate .approval("on-call") · 4m 12s in suspend

Cornelia · SRE on-call · incident #4821

kubectl rollout undo deploy/checkout

error rate 9.2% post-deploy · p95 +340ms — restores last-good v411.

- image: checkout:v412
+ image: checkout:v411

also live in Slack #sre-oncall

gate .approval("release") · 1m 02s in suspend

Rune · Release · deploy-gate #903

scale checkout 6 → 10

sustained 5xx under load — +4 replicas, within budget.

observe · prod · last 24h · ▲ vs prior 24h

⛔ stuck on humans · 3 runs · oldest 4m ▲ → Approvals

◷ suspend backlog · 11 runs · 2 overdue ▲ → Runs: suspended

⛔ admission denials · ▲ +9 vs 24h → Policies: log

⚠ budget pressure · 2 runs · ≥80% budget → Runs: budget

✓ failed & retrying · 1 failed · retries exhausted → Runs: failed

✓ lease health · 8/8 · 0 orphaned workers → workers

⚠ gateway egress · data 2% 429s · fallback ON · inference ok → Gateway

✓ p95 step latency · 2.3s · within SLA · per-agent below → traces

by agent · latency & spend live here, where they're actionable

agent	runs	p95 step	denied	suspended	budget cut-offs	spend
Cornelia·SRE	612	2.6s	9	3	1	$22
Rune·Release	418	1.9s	2	1	0	$11
Atlas·Ops	254	3.1s	3	1	1	$9

audit · append-only journal · tamper-evident

seq	time	agent · tenant	step / call	verdict	model / source	idem
1184	14:02:14	Cornelia·acme	resume:remediate	approved-by kc	—	a3f-04
1182	14:02:11	Cornelia·acme	call:diagnose (infer)	admitted	claude-sonnet	a3f-03
1181	14:02:09	Cornelia·acme	data:metrics.query (data)	redacted PII	prometheus·ro	a3f-03
1180	14:02:08	Cornelia·acme	call:triage (infer)	admitted	claude-haiku	a3f-02
1176	14:01:40	Vega·acme	step:scale 50→500	denied · budget	—	e22-01

seq 1100–1184 · gapless ✓ (no tamper) · ⏎ open call detail

call detail · seq 1182 · call:diagnose · Cornelia · acme

prompt (captured): DiagnoseInput { errorRate: 0.092, p95: 340, deploy: v412 } …
output (captured): Diagnosis { cause: "v412 regression", confidence: 0.86 }
verdict: admitted · policy budget-per-run ok · gateway inference · claude-sonnet
run: incident #4821 · span diagnose

re-issues same input/model/route · diffs vs captured output

coach · Cornelia · SRE · run incident #4820 · step triage

agent output	your correction
severity sev3 "elevated latency, monitor"	severity sev2 "p99 5s on checkout is customer-facing → page"

eval set: Cornelia (37 cases) · 4 pending review

pending corrections

Cornelia · diagnose · over-confident cause attribution · 2h ago · review →

Rune · review · missed migration step · 5h ago · review →

policies · 6 active · admission control runs BEFORE every step

policy	applies to	effect
remediation-needs-approval	*.remediate · destructive	require-approval(on-call)
no-public-buckets	iac.apply	deny
budget-per-run	*	cap-budget ≤ $2
pii-redaction	gateway.inference + data	redact
prod-data-read-only	gateway.data	allow read · deny write

decision log · admission verdicts (the wedge in action)

denied · Vega · scale 50→500: over budget (budget-per-run) · 2m ago → run

approval · Cornelia · rollout undo: destructive remediation · 4m ago → run

redacted · Cornelia · metrics.query: PII masked before egress · 4m ago → run

admitted · Rune · scale 6→10: within budget · 6m ago → run

denials ▲ +9 vs 24h — review budget-per-run · simulate a policy edit against a past run before activating.

registry · agents · skills · harnesses · prod

name	kind	version	owner	isolation	last eval
oncall · Cornelia	agent	1.4.0	platform	container	34/37 ✓
release · Rune	agent	0.9.2	deploys	container	28/28 ✓
k8s	skill	2.1.0	platform	—	—
runbooks	skill	1.0.6	sre	—	—
on-call	harness	3.0.0	agentist	container	—

oncall · Cornelia · versions

1.4.0 · live (prod) · eval 34/37 · 2d ago

1.3.1 · eval 31/37 · [ diff ] [ rollback → make live ] · 9d ago

1.3.0 · eval 30/37 · [ diff ] [ rollback ] · 21d ago

identity · agents

Cornelia · SRE on-call

persona: calm, evidence-first incident responder
charter: mission: restore service safely · value: never act destructively without on-call approval (Cloud-managed)
skills: k8s 2.1.0 · runbooks 1.0.6
harness: on-call 3.0.0 · isolation container
gates: triggers .approval("on-call") on *.remediate (destructive)

charter · Cornelia · source: Cloud-managed · v7

mission: Restore service safely and quickly.
values:
- Never take a destructive action without on-call approval.
- Prefer the smallest reversible change.

amendments are versioned & journaled

gateway & cost · prod · one governed boundary, both directions

inference gateway · keys: vault (in boundary) ✓

route	model	provider	fallback	cache
Cornelia · diagnose	claude-sonnet	Bedrock	→ haiku	31% hit
Rune · review	gpt-4o	OpenAI	→ sonnet	12% hit
* (default)	claude-haiku	Bedrock	—	44% hit

budgets in-path · spend $42 / cap $80 · 2 runs cut off at budget today → Runs: budget

data gateway

source	access	redaction	query audit
prometheus·ro	read	PII mask	142 queries → Audit
pg · acme	read · deny write	PII mask	37 queries → Audit

invocation · inbound (gateway feature)

exposed: Cornelia, Rune as MCP tools · governed by inference gateway + policies

consumed: github-mcp, pagerduty-mcp · budgeted, audited as data egress

surfaces: REST · SDK · MCP · webhook · event; every call policy-checked & idempotency-keyed

security & compliance

SOC 2: every egress + invocation journaled to the tamper-evident audit log; evidence on tap

controls: key custody in-vault · mTLS service-to-service · PII redaction in-path · nothing leaves your boundary

access · roles & scopes · SSO: Okta · SCIM ✓

member	role	scope	approval authority
kc@kc.io	admin	org	all gates
sre-team	operator	oncall · approvals	.approval("on-call")
deploys	operator	release · policies	.approval("release")
viewer@acme	viewer	read-only	none

every console action (approve, activate policy, rollback) is journaled → Audit

Console

Operate

Run the fleet live — watch executions, clear the approvals agents are waiting on, and talk to them directly. Every action is governed and lands on one trail.

Runs — list, then drill into the trace

A live list of every execution; open one to drop into its typed trace — spans, budgets, and the exact model and data calls.

runs · prod · last 24h
incident #4821 · Cornelia·SRE · suspended(approval) · 12m
incident #4820 · Cornelia·SRE · done · 1.4s
checkout-watch · Otto·SRE · running · —
47 runs · 3 suspended on human · 2 running · 1 failed · ⏎ open trace

Approvals

Every run suspended on a human, with just enough to decide in one keystroke — the proposed command, the rationale, and the diff.

incident #4821 · restore deploy v411
error rate 9.2% post-deploy · p95 +340ms → restores last-good v411
[ deny ] [ edit ] [ approve ] · also live in Slack #sre-oncall

Chat

Talk to any agent, or pick a run up as a conversation. A gated step shows up inline — answer it and the run resumes.

Cornelia · SRE > p99 on checkout is climbing. I'd roll back deploy 4f2a.
⏸ approval needed · k8s.rollback(4f2a)
you › go
→ resumed · recorded to audit

Console

Observe

Watch the fleet's health, shape, and spend. Every tile is a real operator question with a threshold and a drill target — not a vanity number.

Signals

What's stuck on a human, where admission is denying, who's near budget — each tile clicks through to the run or policy behind it.

stuck on humans · 3 ▲ → Approvals
admission denials · 11 ▲ → Policies log
budget pressure · 2 → Runs ≥80% budget
gateway egress · ok

Fleet

The bird's-eye view — agents, how they call one another, and per-agent health and cost. Drill from the map into one agent's dashboard.

Cornelia·SRE · 98.7% ok · $42/24h ●
Otto·SRE · 99.1% ok · $18/24h ●
billing-bot · 91.2% ok · $7/24h ▲ error rate
12 agents · 2 flagged

Cost

Spend and token throughput per agent and per model, ranked — find the expensive ones before finance does.

by model · claude-sonnet 1.2M tok $310 · llama-70b (self) 4.4M tok $61
by agent · Cornelia·SRE $42 · Otto·SRE $18 · billing-bot $7
budget $900 / $1,500 this month
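Budget views like these are backed by a budget enforced in the request path. Each call's cost is checked before it runs, the run is flagged under pressure at 80% of its cap, and a call that would exceed the cap is refused. This is a sketch under those assumptions, counting in integer cents to avoid floating-point drift. `RunBudget` and its threshold parameter are hypothetical.

```typescript
// Hypothetical sketch of an in-path, per-run budget.
type BudgetVerdict = "ok" | "pressure" | "cut-off"

class RunBudget {
  private spent = 0
  private readonly capCents: number
  private readonly pressureAt: number

  constructor(capCents: number, pressureAt = 0.8) {
    this.capCents = capCents
    this.pressureAt = pressureAt
  }

  // Checked before the call runs: a call that would exceed the cap is refused and not charged.
  charge(costCents: number): BudgetVerdict {
    if (this.spent + costCents > this.capCents) return "cut-off"
    this.spent += costCents
    return this.spent >= this.capCents * this.pressureAt ? "pressure" : "ok"
  }

  get spentCents(): number {
    return this.spent
  }
}
```

With a $2 cap (`new RunBudget(200)`), calls costing 100 and 70 cents return `"ok"` and then `"pressure"`, and a further 40-cent call is cut off.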

Console

Govern

The trust surface — the rules that gate every step, the tamper-evident record of what happened, and who's allowed to do what.

Policies

Every admission policy, what it gates, and its live hit-rate — spot policy fighting the fleet or a misconfiguration at a glance.

budget-per-run · gate: every step · hits 4.4k · denies 0
destructive-approval · gate: k8s.* db.* · hits 312 · approvals 28
pii-redact · gate: data egress · redactions 1.1k
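Admission control as described here evaluates every applicable policy before a step runs. This sketch assumes the most restrictive verdict wins: deny over approval over allow. `admit` and the `Policy` shape are hypothetical, with effect names borrowed from the `policy()` check in the Incident example.

```typescript
// Hypothetical sketch of admission control over a set of policies.
type Effect = "allow" | "approval" | "deny"

interface Step {
  duty: string
  destructive: boolean
  estimatedCostUsd: number
}

interface Policy {
  id: string
  applies: (step: Step) => boolean
  check: (step: Step) => Effect
}

// Higher rank = more restrictive; the most restrictive verdict wins.
const rank: Record<Effect, number> = { allow: 0, approval: 1, deny: 2 }

function admit(step: Step, policies: Policy[]): { effect: Effect; by: string[] } {
  let effect: Effect = "allow"
  let by: string[] = []
  for (const p of policies) {
    if (!p.applies(step)) continue
    const verdict = p.check(step)
    if (rank[verdict] > rank[effect]) {
      effect = verdict
      by = [p.id]
    } else if (verdict !== "allow" && verdict === effect) {
      by.push(p.id)
    }
  }
  return { effect, by }
}

// Two policies from the table above, re-expressed as plain functions.
const policies: Policy[] = [
  {
    id: "remediation-needs-approval",
    applies: (s) => s.duty.endsWith(".remediate"),
    check: (s) => (s.destructive ? "approval" : "allow"),
  },
  {
    id: "budget-per-run",
    applies: () => true,
    check: (s) => (s.estimatedCostUsd > 2 ? "deny" : "allow"),
  },
]
```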

Audit

The append-only, tamper-evident journal — every step, gateway call, and verdict, gapless and replayable. Re-issue any captured call.

seq 1182 · call:diagnose · Cornelia·acme · admitted
seq 1183 · egress:bedrock · claude-sonnet · 1,840 tok
seq 1184 · state:write · incident #4821
seq 1100–1184 · gapless ✓ · [ replay exact call ]
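Tamper evidence of this kind is commonly built as a hash chain. Each entry records the hash of the one before it, so editing, deleting, or reordering any entry breaks verification from that point on, and a sequence check catches gaps. This is a minimal sketch, not the runtime's actual journal format. `append`, `verify`, and `Entry` are hypothetical, and FNV-1a stands in for a cryptographic hash such as SHA-256 only to keep the example dependency-free.

```typescript
// FNV-1a (32-bit) stands in for SHA-256 here; it is NOT tamper-proof on its own.
function fnv1a(text: string): string {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h.toString(16).padStart(8, "0")
}

interface Entry {
  seq: number
  body: string
  prev: string // hash of the previous entry ("genesis" for the first)
  hash: string
}

function digest(seq: number, prev: string, body: string): string {
  return fnv1a(`${seq}|${prev}|${body}`)
}

// Append-only: returns a new log; existing entries are never modified.
function append(log: Entry[], body: string): Entry[] {
  const last = log.length > 0 ? log[log.length - 1] : undefined
  const seq = last ? last.seq + 1 : 1
  const prev = last ? last.hash : "genesis"
  return [...log, { seq, body, prev, hash: digest(seq, prev, body) }]
}

// Returns the first seq that fails verification, or null if the chain is intact.
function verify(log: Entry[]): number | null {
  let prev = "genesis"
  let expected = 1
  for (const e of log) {
    if (e.seq !== expected) return e.seq // gap: an entry is missing
    if (e.prev !== prev) return e.seq // broken link: reordered or spliced
    if (e.hash !== digest(e.seq, e.prev, e.body)) return e.seq // body edited
    prev = e.hash
    expected++
  }
  return null
}
```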

Access

Who can do what — roles, scopes, and agent identities — with the SSO and SCIM behind them. Every change is itself audited.

kc@kc.io · admin · all
oncall@acme · operator · approve · resume · chat
Cornelia (agent) · identity SRE · charter: acme
SSO: Okta · SCIM provisioned

Console

Manage

Curate the fleet — the catalog of agents and versions, who each one is, and how they get better over time.

Registry

The catalog of every agent — its version, skills, and the policies bound to it. Deploy, pin, or roll back from here.

Cornelia · SRE · v4.3.1 · skills: k8s, prom, runbooks
Otto · SRE · v2.0.0 · skills: k8s
billing-bot · v1.4.0 ▲ 1 version behind
[ deploy ] [ pin ] [ rollback ]

Identity

Each agent's identity and charter — name, role, persona, and the mission and values it inherits. The agent is someone.

Cornelia · On-call SRE
persona: calm, terse, evidence-driven
charter: acme · keep systems reliable & trust intact
duties: triage · diagnose · remediate

Coach

Turn corrections into evals: review a run's reasoning, then save the fix as a test case or attach it as a note to the charter.

Cornelia · diagnose · over-confident cause attribution
[ save as eval ] [ note to charter ]
eval set: Cornelia (37 cases) · 4 pending review

Connectors

Pluggable connectors

Connectors plug the platform into the tools your teams already use — as triggers, actions, and approval channels. Install one, and your agents and your people reach it through a typed, governed interface.

Triggers: start workflows from GitHub, PagerDuty, Alertmanager, cron or webhooks

Actions: post to Slack, open a GitHub PR, page on-call, update ServiceNow

Approvals anywhere: resolve an approval in Slack, Jira, GitHub, email or the Console, with the same audit trail

Claude & MCP: expose agents to Claude/Cursor over MCP; use Claude as a connector

Typed & governed: every connector call is schema-validated and policy-checked

Secrets handled: credentials held by the runtime, never by agents

Build your own: a small interface; ship a connector in an afternoon

Marketplace: install vetted connectors; enterprise packs via Agentist Cloud

Approval requested → Slack · Jira · GitHub · Console → Approved → resume
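Resolving one approval from several channels needs a single source of truth. The first authorized answer settles the gate, later or unauthorized answers are rejected, and every attempt lands on the same trail. This is a sketch of that rule. `ApprovalGate`, `AuditLine`, and the approver set are hypothetical, not the connector API.

```typescript
// Hypothetical sketch: one approval gate, many channels, one audit trail.
type Channel = "slack" | "jira" | "github" | "email" | "console"
type Decision = "approve" | "deny"

interface AuditLine {
  gate: string
  channel: Channel
  who: string
  decision: Decision
  accepted: boolean
}

class ApprovalGate {
  readonly id: string
  private readonly approvers: Set<string>
  private readonly audit: AuditLine[]
  private decision: Decision | null = null

  constructor(id: string, approvers: Set<string>, audit: AuditLine[]) {
    this.id = id
    this.approvers = approvers
    this.audit = audit
  }

  // The first authorized answer wins, whatever the channel; every attempt is audited.
  resolve(channel: Channel, who: string, decision: Decision): boolean {
    const accepted = this.decision === null && this.approvers.has(who)
    if (accepted) this.decision = decision
    this.audit.push({ gate: this.id, channel, who, decision, accepted })
    return accepted
  }

  // The suspended run resumes only after an accepted approval.
  get canResume(): boolean {
    return this.decision === "approve"
  }
}
```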

Data plane

Secure gateways to your data lakes

Agents are only as intelligent as the data they can reach. The data plane is a governed gateway to your lakes and warehouses — agents query real enterprise data through it, never with raw credentials, and every access is policy-checked and audited.

Agent → Secure gateway (policy · masking · audit) → Snowflake · BigQuery · Databricks · S3 · Postgres

Data gateways: Snowflake · BigQuery · Databricks · S3 · Postgres, reached through the gateway, never with raw creds

Governed access: row- & column-level policy, PII masking, per-agent scopes, enforced at the gateway

Grounded intelligence: semantic + SQL retrieval; agents reason over your real data

Every query audited: who asked, what was read, and why; replayable

Sovereign: data never leaves your boundary; the gateway mediates every read

Connectors: warehouses, lakes, vector stores, and internal APIs

Caching & freshness: fast repeat reads with controlled staleness

Telemetry out: OpenTelemetry & audit streamed back to your observability stack
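Enforcing policy at the gateway means the agent only ever sees the filtered, masked result. This is a minimal sketch that assumes scopes list the readable tables and the PII columns, and that rows carry a `tenant` column. `read`, `Scope`, and `Row` are hypothetical, not the data-plane API.

```typescript
// Hypothetical sketch of gateway-side row filtering and column masking.
type Row = Record<string, string | number>

interface Scope {
  tenant: string      // row-level: only this tenant's rows
  tables: Set<string> // which tables this agent may read
  masked: Set<string> // column-level: PII columns masked before egress
}

function read(scope: Scope, table: string, rows: Row[]): Row[] {
  if (!scope.tables.has(table)) throw new Error(`denied: ${table} is not in scope`)
  return rows
    .filter((row) => row["tenant"] === scope.tenant)
    .map((row) => {
      const out: Row = {}
      for (const [col, val] of Object.entries(row)) {
        out[col] = scope.masked.has(col) ? "***" : val
      }
      return out
    })
}
```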

Data plane

The event store

Durable state, the audit journal, and memory all live in one Postgres you run. The runtime owns the schema — you point it at a database.

typescript

import { runtime, postgres } from "@agentist/sdk"

runtime({ store: postgres(process.env.DATABASE_URL) })
// event-sourced: state, journal & vectors in one database
// SQLite for local dev · Redis for queues

Event-sourced: every run's state is the ordered log of its events, not a mutable row to corrupt

The audit journal: append-only, tamper-evident, gapless; the same store is your compliance evidence

One database: state, journal & vector memory in the Postgres you already operate, with no extra service

Migrations owned: the runtime manages its schema and migrations; you just bring the connection

Pluggable: Postgres in prod; SQLite for local dev; Redis for queues

Backups & PITR: it's your database, so the backups and point-in-time recovery are yours too
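Event sourcing means a run's current state is derived, never stored. Fold its ordered events and you get the state at any point, which is also what keeps replay and audit consistent. This is a minimal sketch. The event names and the `RunState` shape are hypothetical, not the runtime's schema.

```typescript
// Hypothetical sketch: a run's state is rebuilt by folding its ordered events.
type RunEvent =
  | { type: "started"; input: string }
  | { type: "step.completed"; step: string }
  | { type: "suspended"; waitingOn: string }
  | { type: "resumed"; by: string }
  | { type: "failed"; error: string }

interface RunState {
  status: "running" | "suspended" | "failed"
  completed: string[]
  waitingOn: string | null
}

const initial: RunState = { status: "running", completed: [], waitingOn: null }

// Pure transition: the same events always produce the same state.
function applyEvent(state: RunState, e: RunEvent): RunState {
  switch (e.type) {
    case "started":
      return initial
    case "step.completed":
      return { ...state, completed: [...state.completed, e.step] }
    case "suspended":
      return { ...state, status: "suspended", waitingOn: e.waitingOn }
    case "resumed":
      return { ...state, status: "running", waitingOn: null }
    case "failed":
      return { ...state, status: "failed" }
  }
}

function replay(events: RunEvent[]): RunState {
  return events.reduce<RunState>(applyEvent, initial)
}
```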

Data plane

Memory

Agents remember — durably, and scoped to whoever they're acting for. More than a chat buffer: working memory, semantic recall, and pinned facts, all policy-aware.

typescript

await ctx.memory.put("user.tier", "enterprise")   // working memory: write a fact
const tier = await ctx.memory.get("user.tier")    // read it back on a later turn
const hits = await ctx.memory.search("past incidents", { topK: 5 })   // semantic recall

Working memory: structured, persistent facts about the task or user, carried across turns

Semantic recall: retrieve past context by meaning, not just recency, over pgvector

Scoped per tenant: memory partitioned by isolation key; recall never crosses a boundary

Policy-aware: retrieval passes through admission control; redaction and access rules apply to what returns

TTL & pinning: expire ephemeral context and pin durable facts to control what an agent remembers

Typed get · put · search: a small, typed API, with no bespoke memory plumbing per agent
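Tenant scoping, TTLs, and pinning can all live in how entries are keyed and stamped. This is a minimal in-memory sketch that assumes the isolation key is a tenant ID and that time is passed in explicitly. `ScopedMemory` is hypothetical; per this page, the real memory store lives in the runtime's Postgres rather than a `Map`.

```typescript
// Hypothetical sketch of tenant-scoped memory with TTL and pinning.
interface MemoryEntry {
  value: string
  expiresAt: number | null // null: never expires
}

class ScopedMemory {
  private readonly entries = new Map<string, MemoryEntry>()

  // The tenant is part of every key, so one tenant can never read another's entries.
  private slot(tenant: string, key: string): string {
    return `${tenant}\u0000${key}`
  }

  // `pin` wins over `ttlMs`; with neither, the entry persists.
  put(tenant: string, key: string, value: string, opts: { now: number; ttlMs?: number; pin?: boolean }): void {
    let expiresAt: number | null = null
    if (!opts.pin && opts.ttlMs !== undefined) expiresAt = opts.now + opts.ttlMs
    this.entries.set(this.slot(tenant, key), { value, expiresAt })
  }

  get(tenant: string, key: string, now: number): string | undefined {
    const slot = this.slot(tenant, key)
    const entry = this.entries.get(slot)
    if (entry === undefined) return undefined
    if (entry.expiresAt !== null && now >= entry.expiresAt) {
      this.entries.delete(slot) // lazily evict expired context
      return undefined
    }
    return entry.value
  }
}
```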

Data plane

Retrieval & RAG

A complete retrieval pipeline — chunk, embed, search, rerank — so agents reason over your documents and data with precision, not a similarity guess.

typescript

const kb = retrieval({
  chunk:  { strategy: "recursive", size: 800, overlap: 100 },
  embed:  "text-embedding-3-large",
  store:  vectors.pgvector(),
  rerank: "rerank-v3",                  // precision pass before the model
})
await kb.add(docs)
const hits = await kb.query("why did checkout fail?", { topK: 8 })

Chunking: recursive, sliding-window & structure-aware strategies with overlap control

Embeddings: any embedding model through the gateway; batch-embed at ingest

Hybrid search: keyword + vector, ranked together, so exact and semantic matches come back in one query

Reranking: a reranker reorders candidates for precision before they reach the model

Graph retrieval: traverse links between chunks to surface connected context, not just nearest neighbors

Grounded & cited: answers carry their sources; every retrieval is policy-checked and audited
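The `size` and `overlap` settings in the retrieval config describe a window. Each chunk starts `size - overlap` units after the previous one, so neighboring chunks share context across the boundary. This is a minimal sketch of the sliding-window strategy only, not the recursive one the example config names. It measures in characters where a real pipeline would more likely count tokens, and `chunk` is a hypothetical helper.

```typescript
// Hypothetical sketch: sliding-window chunking with overlap, measured in characters.
function chunk(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("need size > 0 and 0 <= overlap < size")
  }
  const step = size - overlap
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size))
    if (start + size >= text.length) break // this window reached the end
  }
  return chunks
}

console.log(chunk("abcdefghij", 4, 1).join(" | ")) // "abcd | defg | ghij"
```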

Data plane

Vectors

Vector storage is built in on pgvector — no extra database to run — and pluggable to an external store when you want one.

typescript

const v = vectors.pgvector()           // or pinecone() · qdrant()
await v.query(embedding, {
  topK: 10,
  filter: { tenant: "acme", kind: "runbook" },   // metadata filter
})

pgvector by default: vectors live in the same Postgres as state & the journal, so there is one less thing to operate

External stores: bind Pinecone, Qdrant, Weaviate, or Milvus when you need them

Metadata filtering: combine vector similarity with structured filters in one query

Index tuning: HNSW / IVF parameters exposed to trade recall against latency

Tenant-scoped: vector namespaces partitioned by isolation key

Governed: every vector read flows through admission control like any other data access
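A filtered similarity query does two things: structured filters narrow the candidate set, then similarity ranks what is left. This is a brute-force sketch under those assumptions. `Doc`, `cosine`, and `query` are hypothetical, and pgvector would answer the same question with an HNSW or IVF index instead of scanning every row.

```typescript
// Hypothetical sketch: metadata filter, then cosine similarity, then top-K.
interface Doc {
  id: string
  embedding: number[]
  meta: Record<string, string>
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0)
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  const denom = norm(a) * norm(b)
  return denom === 0 ? 0 : dot / denom
}

function query(
  docs: Doc[],
  q: number[],
  opts: { topK: number; filter: Record<string, string> },
): { id: string; score: number }[] {
  return docs
    .filter((d) => Object.entries(opts.filter).every(([k, v]) => d.meta[k] === v))
    .map((d) => ({ id: d.id, score: cosine(q, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, opts.topK)
}
```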

Data plane

Object storage

Large artifacts — files an agent reads or writes, model weights, run outputs — live in object storage, bound to your cloud's bucket.

typescript

const bucket = storage(s3("artifacts"))   // or gcs · azureBlob · minio
const key = `runs/${runId}/report.pdf`
await bucket.put(key, bytes)
const url = await bucket.signedUrl(key, { expires: "1h" })   // time-limited download link

Artifacts: a run's inputs and outputs, persisted and linked straight from its trace

Bind your bucket: S3, GCS, Azure Blob, or in-cluster MinIO behind one interface

Volumes: shared, mountable volumes for model weights & caches (see Models)

Lifecycle: retention and expiry policies on ephemeral artifacts

Scoped access: agents reach storage through the gateway, scoped to their tenant

Large-file safe: stream big objects without loading them into memory
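Lifecycle rules are commonly expressed as retention periods attached to key prefixes. This sketch assumes that the most specific (longest) matching prefix wins and that objects with no matching rule are kept. `LifecycleRule`, `StoredObject`, and `expiredKeys` are hypothetical, not the storage API.

```typescript
// Hypothetical sketch: prefix-based retention rules for artifacts.
interface LifecycleRule {
  prefix: string
  retainMs: number
}

interface StoredObject {
  key: string
  createdAt: number // ms timestamp
}

// The longest matching prefix is the most specific rule.
function ruleFor(key: string, rules: LifecycleRule[]): LifecycleRule | undefined {
  return rules
    .filter((r) => key.startsWith(r.prefix))
    .reduce<LifecycleRule | undefined>(
      (best, r) => (best === undefined || r.prefix.length > best.prefix.length ? r : best),
      undefined,
    )
}

function expiredKeys(objects: StoredObject[], rules: LifecycleRule[], now: number): string[] {
  return objects
    .filter((o) => {
      const rule = ruleFor(o.key, rules)
      return rule !== undefined && now - o.createdAt >= rule.retainMs
    })
    .map((o) => o.key)
}
```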

Examples

The whole use case, in one file

Each example is a single typed file — the entire implementation, infrastructure and all: charter, agent, tools, workflow, governance, the engine, and the stack that deploys it to your cloud. Not snippets — the real thing, top to bottom. Five builds, each leaning on a different part of the platform. Each ends with how it's invoked — a trigger, a typed call, or a prompt; one input contract, every caller. (More to come, plus a dedicated examples repo.)

Examples

Incident

Otto, the on-call SRE agent: a paging alert arrives, he triages it, delegates parallel investigators across app, db & network, diagnoses the root cause, and proposes a fix — but anything destructive waits for a human. Shows: charter · skills · governed tools · parallel delegation · admission control · approvals · memory · engine · infrastructure — one file.

alert → triage → delegate → diagnose → approval → remediate

typescript

// src/incident.ts — Otto, the on-call SRE: triage → investigate → fix, one file.
import { agent, duty, tool, workflow, policy, charter, harness, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway, vectors } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

// ── Charter — the culture every agent inherits ───────────────────────────
const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Bias to safety", "Explain your reasoning", "Escalate when unsure"],
})

// ── Schemas ───────────────────────────────────────────────────────────────
const Alert = z.object({ service: z.string(), symptom: z.string(), severity: z.enum(["sev1","sev2","sev3"]) })

// ── Tools — server-side & validated; the agent never sees a credential ──
const queryMetrics = tool({
  input:  z.object({ service: z.string(), window: z.string().default("30m") }),
  output: z.object({ p99: z.number(), errorRate: z.number() }),
  run: (a, ctx) => ctx.tool("prometheus.query", a),          // via the egress gateway
})
const rollback = tool({
  input: z.object({ deploy: z.string() }),
  meta:  { destructive: true },                              // surfaced to admission control
  run: (a, ctx) => ctx.tool("k8s.rollback", a),
})

// ── Investigator — one cheap agent, fanned out across subsystems ─────────
const investigate = agent({
  id: "investigate",
  identity: { name: "Iris", role: "Investigator", persona: "Methodical, fast." },
  charter: acme,
  tools: [queryMetrics],
  duties: {
    scan: duty({
      input:  z.object({ alert: Alert, area: z.enum(["app","db","network"]) }),
      output: z.object({ area: z.string(), finding: z.string() }),
      run: (a, ctx) => ctx.llm({ prompt: `Inspect ${a.area} for ${a.alert.symptom}` }),
    }),
  },
})

// ── Otto — the on-call SRE, extends a base harness ───────────────────────
export const oncall = agent({
  extends: harness("on-call"),           // reusable SRE preset — extend, don't rebuild
  id: "oncall",
  identity: { name: "Otto", role: "On-call SRE", persona: "Calm, terse, evidence-driven." },
  charter: acme,
  tools: [queryMetrics, rollback],
  duties: {
    triage: duty({
      input:  Alert,
      output: z.object({ summary: z.string() }),
      run: async (a, ctx) => {
        const past = await ctx.context.recall("past incidents on " + a.service, { topK: 5 })   // remember
        return ctx.llm({ prompt: `Triage ${a.severity} on ${a.service}: ${a.symptom}`, ground: past })   // use what was recalled
      },
    }),
    diagnose: duty({
      input:  z.object({ findings: z.array(z.object({ area: z.string(), finding: z.string() })) }),
      output: z.object({ rootCause: z.string(), destructive: z.boolean() }),
      run: (a, ctx) => ctx.llm({ prompt: `Root cause from: ${JSON.stringify(a.findings)}` }),
    }),
    remediate: duty({
      input: z.object({ rootCause: z.string() }),
      run: (a, ctx) => ctx.tool("k8s.rollback", { deploy: "latest" }),
    }),
  },
})

// ── Policy — admission control: destructive fixes wait for a human ───────
const remediationGate = policy({
  id: "remediation-needs-approval",
  applies: { duty: "oncall.remediate" },
  check: ({ state }) => state.diagnose.destructive
    ? { effect: "approval", reason: "Destructive remediation needs approval" }
    : { effect: "allow" },
})

// ── Workflow — the typed path an incident takes ──────────────────────────
export const incident = workflow("incident")
  .input(Alert)
  .step("triage", oncall.duties.triage)
  .delegate("findings", (alert) =>
    (["app", "db", "network"] as const).map((area) => investigate.duties.scan.with({ alert, area })),   // as const keeps the z.enum literal type
  )                                      // parallel sub-agents — durable & governed
  .step("diagnose", oncall.duties.diagnose, (s) => ({ findings: s.findings }))
  .approval("on-call")                   // only fires when remediationGate demands it
  .step("remediate", oncall.duties.remediate, (s) => ({ rootCause: s.diagnose.rootCause }))
  .commit()

// ── Infrastructure — provision the platform into your own cloud (IaC) ─────
const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny", redactPII: true }),
  vectors: vectors.pgvector(),
})

// ── Runtime — engine + governance, wired to the infra above ──────────────
const rt = runtime({
  engine:   agentist(),                  // or adk() · langgraph()
  store:    infra.store,
  llm:      infra.models.claude,
  policies: [remediationGate],
  memory:   infra.vectors,
})
rt.register(oncall, investigate, incident)
rt.on("deploy.failed", incident)               // page Otto on a failed rollout: starts the whole incident workflow
rt.serve()                               // API · MCP · the Console
//   agentist deploy --target k8s          # provisions infra + rolls out the runtime
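The `.delegate` step runs the three `scan` duties side by side. What the runtime does when one investigator fails is not stated here, so this is a plain-TypeScript sketch of one reasonable semantics, assuming partial findings are more useful to `diagnose` than none: every task starts without waiting for the others, and each outcome is kept as either a value or an error. `fanOut`, `Settled`, and the stub `scan` are hypothetical names, not SDK exports.

```typescript
// Hypothetical sketch, not the Agentist SDK: `fanOut` and `Settled` are
// illustrative names for a fan-out that tolerates partial failure.
type Settled<T> = { ok: true; value: T } | { ok: false; error: string }

// Start every task without waiting for the others, then wait for all of them.
// A rejection is recorded as { ok: false } instead of failing the batch, and
// results keep input order (Promise.all preserves it).
function fanOut<I, O>(inputs: readonly I[], task: (input: I) => Promise<O>): Promise<Settled<O>[]> {
  return Promise.all(
    inputs.map((input) =>
      Promise.resolve()
        .then(() => task(input))                 // also turns a synchronous throw into a rejection
        .then(
          (value): Settled<O> => ({ ok: true, value }),
          (err: unknown): Settled<O> => ({ ok: false, error: err instanceof Error ? err.message : String(err) }),
        ),
    ),
  )
}

// Hypothetical stub investigator: the network probe fails, the other two report.
type Area = "app" | "db" | "network"
async function scan(area: Area): Promise<{ area: Area; finding: string }> {
  if (area === "network") throw new Error("probe timed out")
  return { area, finding: `${area}: no anomaly found` }
}

fanOut(["app", "db", "network"] as const, scan).then((results) => {
  const reported = results.filter((r) => r.ok).length
  console.log(`${reported} of ${results.length} investigators reported`)
})
```

Because result `i` always belongs to input `i`, a diagnose prompt built from these results can say exactly which area came back empty.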

Invoke it. Start the local runtime and run the workflow from the CLI — it stops for a human before anything destructive:

$ agentist dev                 # local runtime + console on :3000
$ agentist run incident --json '{"service":"checkout","symptom":"p99 5s","severity":"sev2"}'
  triage     "checkout p99 5s — likely the latest deploy"
  delegate   app · db · network   3 findings
  diagnose   root cause: v411 regressed the cache path
  remediate  ⏸ paused — destructive, needs approval
$ agentist approve inc_9f2a --by kc@kc.io
  remediate  ✓ rolled back v411
  ✔ resolved
journaled locally · replay in the console
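In the transcript the run stops at `remediate` and continues only after `agentist approve`. A minimal sketch of that pause as a small state machine, assuming a run is an ordered list of steps and an approval covers only the step that was waiting. `Run`, `advance`, and `approve` are hypothetical names for illustration; per the FAQ, the real runtime journals this state rather than holding it in memory.

```typescript
// Hypothetical sketch, not the Agentist runtime: a run that parks before a
// gated step until a human approves that specific step.
type Status = "running" | "paused" | "done"

interface Step { name: string; needsApproval: boolean }

interface Run {
  id: string
  steps: Step[]
  next: number            // index of the next step to execute
  status: Status
  log: string[]           // what happened, in order
  approvedStep?: number   // index of the gated step a human signed off on
  approvedBy?: string
}

// Execute steps in order until the run finishes or reaches an unapproved gate.
function advance(run: Run): Run {
  while (run.next < run.steps.length) {
    const step = run.steps[run.next]
    if (step.needsApproval && run.approvedStep !== run.next) {
      run.status = "paused"
      run.log.push(`${step.name}: paused, needs approval`)
      return run
    }
    run.log.push(`${step.name}: done`)
    run.next += 1
  }
  run.status = "done"
  return run
}

// Approving records who signed off, then resumes from the step that was waiting.
function approve(run: Run, by: string): Run {
  if (run.status !== "paused") throw new Error(`run ${run.id} is not waiting for approval`)
  run.approvedStep = run.next
  run.approvedBy = by
  run.status = "running"
  return advance(run)
}

const run = advance({
  id: "inc_9f2a",
  steps: [
    { name: "triage", needsApproval: false },
    { name: "diagnose", needsApproval: false },
    { name: "remediate", needsApproval: true },
  ],
  next: 0,
  status: "running",
  log: [],
})
approve(run, "kc@kc.io")
console.log(run.log.join("\n"))
```

Tying the approval to one step index, rather than to the run as a whole, means a later destructive step in the same run would pause again.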

Examples

Analyst

Dana turns a plain-English question into read-only SQL, runs it through the data gateway, and explains the result — but a critic checks the query first, and a policy makes writes impossible by construction. Shows: the data plane · governed SQL · a critic gate · read-only admission control — one file.

question → plan → critic → run → explain

typescript

// src/analyst.ts — Dana, the data analyst: ask in English, get a governed answer.
import { agent, duty, tool, workflow, policy, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Show your work", "Cite the query", "Never guess at a number"],
})

// ── Tool — read-only SQL, executed server-side through the data gateway ──
const runSql = tool({
  input:  z.object({ sql: z.string() }),
  output: z.array(z.record(z.string(), z.any())),  // two-arg z.record works in Zod 3 and 4
  meta:   { readOnly: true },                  // surfaced to admission control
  run: (a, ctx) => ctx.db.query(a.sql),        // PII redacted at the gateway
})

// ── Agent — turns a question into SQL, then reads the result back ────────
export const dana = agent({
  id: "analyst",
  identity: { name: "Dana", role: "Data analyst", persona: "Precise; cites the query." },
  charter: acme,
  tools: [runSql],
  duties: {
    plan: duty({
      input:  z.object({ question: z.string() }),
      output: z.object({ sql: z.string() }),
      run: (a, ctx) => ctx.llm({ prompt: `Write read-only SQL for: ${a.question}` }),
    }),
    explain: duty({
      input:  z.object({ rows: z.array(z.record(z.string(), z.any())) }),
      output: z.object({ answer: z.string() }),
      run: (a, ctx) => ctx.llm({ prompt: `Summarize these rows: ${JSON.stringify(a.rows)}` }),
    }),
  },
})

// ── Policy — the analyst is read-only by construction; writes can't happen ─
const readOnly = policy({
  id: "analyst-read-only",
  applies: { tool: "*" },
  check: ({ tool }) => tool.meta?.readOnly       // a tool without meta is denied, not a crash
    ? { effect: "allow" }
    : { effect: "deny", reason: "Analyst is read-only" },
})

// ── Workflow — plan → critic vets the SQL → run → explain ────────────────
export const ask = workflow("ask")
  .input(z.object({ question: z.string() }))
  .step("plan", dana.duties.plan)
  .critic("sql-safe", (s) => `Is this SQL read-only and correct? ${s.plan.sql}`)
  .step("run", runSql, (s) => ({ sql: s.plan.sql }))
  .step("explain", dana.duties.explain, (s) => ({ rows: s.run }))
  .commit()

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny", redactPII: true }),
})

const rt = runtime({
  engine:   agentist(),
  store:    infra.store,
  llm:      infra.models.claude,
  policies: [readOnly],
})
rt.register(dana, ask)
rt.serve()                               // ask it in the Console, or over MCP
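In the example, the `readOnly` policy checks the tool's `meta.readOnly` flag, not the SQL text that `runSql` executes. Read-only guarantees usually come from the database side (a role with only `SELECT` grants, or a Postgres `READ ONLY` transaction). As a cheap extra layer, here is a minimal sketch of a lexical guard that accepts only a single `SELECT`/`WITH` statement with no write keywords. `isReadOnlySql` is a hypothetical helper, not part of the SDK, and a keyword check like this is deliberately conservative rather than a SQL parser.

```typescript
// Hypothetical defense-in-depth check on model-written SQL, not part of the
// Agentist SDK. Conservative by design: anything it cannot cheaply prove is a
// single read is rejected, even at the cost of some false positives.
const WRITE_WORDS = [
  "insert", "update", "delete", "merge", "drop", "alter", "create", "truncate",
  "grant", "revoke", "copy", "into", "call", "lock", "vacuum",
]

function isReadOnlySql(sql: string): boolean {
  const stripped = sql
    .replace(/--[^\n]*/g, " ")            // drop line comments
    .replace(/\/\*[\s\S]*?\*\//g, " ")    // drop block comments
    .trim()
    .replace(/;\s*$/, "")                 // tolerate one trailing semicolon
  if (stripped.indexOf(";") !== -1) return false        // a second statement
  const lower = stripped.toLowerCase()
  if (!/^(select|with)\b/.test(lower)) return false     // must start as a query
  // Reject write keywords anywhere, e.g. WITH d AS (DELETE ...) or SELECT ... INTO t.
  const words: string[] = lower.match(/[a-z_]+/g) || []
  return words.every((w) => WRITE_WORDS.indexOf(w) === -1)
}
```

Called at the top of `runSql`'s `run` (throwing when it returns false), a guard like this fails closed. It cannot see side effects hidden inside functions a `SELECT` may call, so it complements a database role without write grants rather than replacing it.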

Invoke it. Run it locally — the question is one field, so it's positional on the CLI:

$ agentist dev
$ agentist run ask "error rate on checkout last week"
  plan     SELECT day, errors::float/total AS rate FROM events ...
  critic   ✓ read-only · touches no PII columns · approved
  run      7 rows
  explain  "Averaged 0.4%, peaking 1.9% on Tuesday's deploy."
  ✔ done
the answer cites the exact query it ran

Examples

Review

Rex reviews a pull request — but first it clones the branch and runs the test suite inside a throwaway microVM sandbox, so untrusted code never touches your runtime. A GitHub webhook starts the run; the review posts back through an MCP tool. Shows: sandbox isolation · webhook triggers · MCP tools — one file.

→ tests → review → comment

typescript

// src/review.ts — Rex, the code reviewer: tests run in a sandbox before review.
import { agent, duty, tool, workflow, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway, sandbox } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Evidence over opinion", "Be specific", "Be kind"],
})

const PR = z.object({ repo: z.string(), number: z.number(), sha: z.string() })

// ── Tool — run the PR's tests in a fresh microVM; nothing lingers ────────
const runTests = tool({
  input:  PR,
  output: z.object({ pr: PR, passed: z.boolean(), log: z.string() }),
  run: async (a, ctx) => {
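    const box = await sandbox.create({ image: "node:20", isolation: "microvm" })
    try {
      // Clone the base repo, then fetch the PR's head ref so commits from fork PRs resolve too.
      await box.exec(`git clone https://github.com/${a.repo}.git app && cd app && git fetch origin pull/${a.number}/head && git checkout ${a.sha}`)
      const { exitCode, stdout } = await box.exec("cd app && npm ci && npm test")
      return { pr: a, passed: exitCode === 0, log: stdout }
    } finally {
      await box.terminate()                    // ephemeral: torn down even if a step throws
    }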
  },
})

// ── Agent — reviews the diff, and posts the verdict back to GitHub ───────
export const rex = agent({
  id: "reviewer",
  identity: { name: "Rex", role: "Code reviewer", persona: "Terse, specific, kind." },
  charter: acme,
  tools: [runTests],
  duties: {
    review: duty({
      input:  z.object({ pr: PR, tests: z.object({ passed: z.boolean(), log: z.string() }) }),
      output: z.object({ verdict: z.enum(["approve", "request-changes"]), notes: z.string() }),
      run: async (a, ctx) => {
        const r = await ctx.llm({ prompt: `Review PR #${a.pr.number}; tests: ${a.tests.log}` })
        await ctx.tool("github.comment", { ...a.pr, body: r.notes })   // post via MCP tool
        return r
      },
    }),
  },
})

// ── Workflow — test in isolation, then review ────────────────────────────
export const reviewPR = workflow("review-pr")
  .input(PR)
  .step("tests", runTests)
  .step("review", rex.duties.review, (s) => ({ pr: s.tests.pr, tests: s.tests }))
  .commit()

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny" }),        // the sandbox reaches the net only through here
})

const rt = runtime({ engine: agentist(), store: infra.store, llm: infra.models.claude })
rt.mcp({ consume: ["github"] })                // GitHub's MCP server → agent tools
rt.register(rex, reviewPR)
rt.on("github.pull_request.opened", reviewPR)  // webhook → typed, governed run
rt.serve()
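`rt.on("github.pull_request.opened", reviewPR)` implies the webhook body becomes the workflow's `PR` input, but the mapping itself isn't shown. In GitHub's `pull_request` webhook payload the event name is in `action`, the base repository is `repository.full_name`, and the commit to test is `pull_request.head.sha`. A minimal sketch of that mapping, with `toPrInput` and `PullRequestEvent` as hypothetical names; because `repo` and `sha` end up inside a shell command in `runTests`, the sketch also validates both.

```typescript
// Hypothetical mapping, not Agentist's built-in one: GitHub `pull_request`
// webhook body -> the workflow's { repo, number, sha } input. Only the
// payload fields used here are typed.
interface PullRequestEvent {
  action: string
  repository: { full_name: string }
  pull_request: { number: number; head: { sha: string } }
}

interface PrInput { repo: string; number: number; sha: string }

// Returns null for actions other than "opened" (e.g. "synchronize", "closed"),
// so no run starts for them. Values headed for a shell command are validated.
function toPrInput(event: PullRequestEvent): PrInput | null {
  if (event.action !== "opened") return null
  const repo = event.repository.full_name          // "owner/name" of the base repository
  const sha = event.pull_request.head.sha          // the commit the tests must run against
  if (!/^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/.test(repo)) throw new Error(`unexpected repo name: ${repo}`)
  if (!/^[0-9a-f]{40}$/.test(sha)) throw new Error(`unexpected head sha: ${sha}`)
  return { repo, number: event.pull_request.number, sha }
}
```

Webhook payloads carry the full 40-character SHA, so the strict pattern rejects anything else, including strings with shell metacharacters.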

Invoke it. Test it locally on a PR — the tests run first, inside a sandbox:

$ agentist dev
$ agentist run review-pr --json '{"repo":"acme/api","number":1421,"sha":"4f2a9c"}'
  tests    microVM sandbox · npm test ✓ 42/42 passed
  review   request-changes · "missing null-check · handler.ts:88"
  ✔ done
no untrusted code touched the runtime · verdict in the console

Examples

Knowledge

Quinn answers questions from your docs with citations — and refuses to answer beyond its sources. It indexes your docs into pgvector, retrieves with hybrid search and reranking, and is always-on: talk to it in the Console, Slack, or over MCP. Shows: retrieval & vectors · grounded answers · a conversational, addressable agent — one file.

index → ask → retrieve → answer

typescript

// src/docs.ts — Quinn, the knowledge assistant: grounded answers, with citations.
import { agent, duty, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway, vectors, retrieval } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Cite your sources", "Say 'I don't know'", "Never invent an answer"],
})

// ── Retrieval — chunk, embed & index your docs into pgvector ────────────
const kb = retrieval({
  chunk:  { strategy: "recursive", size: 800, overlap: 100 },
  embed:  "text-embedding-3-large",
  store:  vectors.pgvector(),
  rerank: "rerank-v3",                         // precision pass before the model
})

// ── Agent — grounded Q&A; answers only from what it retrieved ───────────
export const quinn = agent({
  id: "docs",
  identity: { name: "Quinn", role: "Knowledge assistant", persona: "Helpful, grounded, honest." },
  charter: acme,
  duties: {
    answer: duty({
      input:  z.object({ question: z.string() }),
      output: z.object({ answer: z.string(), sources: z.array(z.string()) }),
      run: async (a, ctx) => {
        const hits = await kb.query(a.question, { topK: 8 })   // hybrid search + rerank
        return ctx.llm({ prompt: a.question, ground: hits })   // answer only from sources
      },
    }),
  },
})

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-sonnet-4")],
  gateway: gateway({ egress: "deny" }),
  vectors: vectors.pgvector(),
})

const rt = runtime({
  engine: agentist(),
  store:  infra.store,
  llm:    infra.models.claude,
  memory: infra.vectors,
})
rt.register(quinn)
await kb.add("./docs/**/*.md")           // ingest once; re-runs are incremental

const q = rt.agent("docs")               // every agent is addressable & always-on
await q.ask("how do I rotate the gateway's signing key?")
rt.serve()                               // Console chat · Slack · MCP
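The comment on `kb.query` says hybrid search plus rerank, but not how the keyword and vector result lists are merged. One common, simple method is reciprocal rank fusion (RRF): a document's fused score is the sum of 1/(k + rank) over every list it appears in, with k = 60 as the conventional constant. A minimal sketch under that assumption; it is not necessarily what `retrieval()` does, and the reranker would then run over the fused top results.

```typescript
// Reciprocal rank fusion over ranked lists of document ids (rank starts at 1).
// A sketch of one common hybrid-search merge, not Agentist's implementation.
function reciprocalRankFusion(lists: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>()
  lists.forEach((list) => {
    list.forEach((id, index) => {
      const rank = index + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank))
    })
  })
  // Documents ranked well by several searches accumulate the highest scores.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score)
}

// e.g. keyword (BM25) results and vector results for the same question
const fused = reciprocalRankFusion([
  ["ops/gateway.md#rotation", "runbooks/secrets.md", "faq.md"],
  ["runbooks/secrets.md", "ops/gateway.md#rotation", "deploy.md"],
])
console.log(fused.map((d) => d.id))
```

Because RRF only looks at ranks, it needs no calibration between keyword scores and cosine similarities, which is why it is a popular default for hybrid retrieval.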

Invoke it. Index your docs, then ask from the CLI — answers cite their sources:

$ agentist dev
$ agentist run docs.answer "how do I rotate the gateway's signing key?"
  retrieve  hybrid search + rerank   3 chunks
  answer    "Run `agentist secrets rotate gateway-key` — reloads with no downtime."
  sources   ops/gateway.md#rotation · runbooks/secrets.md
  ✔ grounded
says "I don't know" when the docs don't cover it
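Quinn answers "I don't know" when the docs don't cover a question. One deterministic way to get that behavior, before the model is even called, is to refuse when no retrieved chunk clears a relevance threshold. A minimal sketch, assuming reranker scores in [0, 1] and an arbitrary `minScore` of 0.5; `groundOrRefuse` and `Hit` are hypothetical names, and whether `ctx.llm({ ground })` applies a cut-off of its own is not stated above.

```typescript
// Hypothetical grounding gate: hand chunks to the model only when at least one
// is relevant enough; otherwise answer "I don't know" without a model call.
interface Hit { source: string; text: string; score: number }   // score: rerank relevance in [0, 1]

type Grounded =
  | { kind: "answer"; context: Hit[]; sources: string[] }
  | { kind: "refuse"; answer: string; sources: string[] }

function groundOrRefuse(hits: Hit[], minScore = 0.5, maxChunks = 3): Grounded {
  const relevant = hits
    .filter((h) => h.score >= minScore)      // filter copies, so sorting doesn't mutate `hits`
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks)
  if (relevant.length === 0) {
    return { kind: "refuse", answer: "I don't know: the docs don't cover this.", sources: [] }
  }
  // Cite each source document once, in relevance order.
  const sources = relevant.map((h) => h.source).filter((s, i, all) => all.indexOf(s) === i)
  return { kind: "answer", context: relevant, sources }
}
```

Refusing in code rather than only in the prompt means the "never invent an answer" value in the charter holds even if the model would have guessed.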

Examples

Audit

Cass runs every night: it pulls your service catalog, audits all of them in parallel with delegate, and a critic throws out the noisy flags before it posts a waste report. Shows: cron triggers · parallel delegation · a critic for quality · a cheap model for batch — one file.

cron → catalog → delegate → critic → report

typescript

// src/audit.ts — Cass, the cost auditor: every night, flag over-provisioned services.
import { agent, duty, tool, workflow, charter, runtime, postgres } from "@agentist/sdk"
import { agentist } from "@agentist/engine"
import { stack, gateway } from "@agentist/infra"
import { bedrock } from "@agentist/infra/aws"
import { z } from "zod"

const acme = charter({
  mission: "Keep systems reliable and customer trust intact.",
  values: ["Bias to safety", "Show the numbers", "No false alarms"],
})

const Service = z.object({ name: z.string(), monthlySpend: z.number() })

// ── Tools — pull the catalog & post the report (external, via connectors) ─
const listServices = tool({
  input:  z.object({}),
  output: z.object({ services: z.array(Service) }),
  run: (a, ctx) => ctx.tool("billing.services", {}),
})
const postReport = tool({
  input: z.object({ findings: z.array(z.any()) }),
  run: (a, ctx) => ctx.tool("slack.post", { channel: "#finops", ...a }),
})

// ── Agent — judges one service against its utilization ───────────────────
export const cass = agent({
  id: "auditor",
  identity: { name: "Cass", role: "Cost auditor", persona: "Skeptical, exact." },
  charter: acme,
  duties: {
    audit: duty({
      input:  Service,
      output: z.object({ service: z.string(), wasteUsd: z.number(), reason: z.string() }),
      run: async (a, ctx) => {
        const util = await ctx.tool("prometheus.query", { service: a.name })
        return ctx.llm({ prompt: `Is ${a.name} over-provisioned? util=${JSON.stringify(util)}` })
      },
    }),
  },
})

// ── Workflow — catalog → audit all in parallel → critic → report ─────────
export const nightly = workflow("nightly-audit")
  .input(z.object({}))
  .step("catalog", listServices)
  .delegate("findings", (s) =>                 // every service, in parallel — durable & governed
    s.catalog.services.map((svc) => cass.duties.audit.with(svc)))
  .critic("no-false-alarms", (s) =>            // drop flags the evidence doesn't support
    `Are these waste flags justified? ${JSON.stringify(s.findings)}`)
  .step("report", postReport, (s) => ({ findings: s.findings }))
  .commit()

const infra = stack({
  target:  "k8s",
  store:   postgres({ instance: "db.r6g.large" }),
  models:  [bedrock("claude-haiku-4")],        // cheap model — it's a big nightly batch
  gateway: gateway({ egress: "deny" }),
})

const rt = runtime({ engine: agentist(), store: infra.store, llm: infra.models.claude })
rt.register(cass, nightly)
rt.cron("0 6 * * *", nightly)            // 6am daily — timezone-aware, with a missed-run policy
rt.serve()
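`delegate` in the nightly audit starts one `audit` per service. At 12 services that is harmless; against a catalog of thousands, a fan-out usually caps how many run at once so the batch stays inside model rate limits. Whether the runtime applies such a cap is not specified above, so this is a minimal plain-TypeScript sketch of a concurrency-limited map; `mapWithLimit` is a hypothetical helper.

```typescript
// Hypothetical concurrency-limited map: at most `limit` tasks run at once,
// and results come back in input order regardless of finish order.
// The first failure rejects the returned promise.
async function mapWithLimit<I, O>(items: readonly I[], limit: number, task: (item: I) => Promise<O>): Promise<O[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new Error("limit must be a positive integer")
  const results = new Array<O>(items.length)
  let nextIndex = 0
  // Each worker claims the next unclaimed index until none are left. Claiming
  // needs no lock: the read-and-increment runs without an intervening await.
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++
      results[i] = await task(items[i])
    }
  }
  const workers: Promise<void>[] = []
  for (let w = 0; w < Math.min(limit, items.length); w++) workers.push(worker())
  await Promise.all(workers)
  return results
}
```

A worker pool keeps exactly `limit` tasks in flight as long as work remains, which uses the budget better than splitting the catalog into fixed batches that each wait for their slowest member.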

Invoke it. Run it on demand from the CLI to test it (in production a cron fires it nightly):

$ agentist dev
$ agentist run nightly-audit
  catalog   12 services
  delegate  audited 12 in parallel · claude-haiku-4
  critic    dropped 2 weak flags · 4 confirmed
  ✔ done
every step admission-checked & journaled · results in the console

Reference

CLI

One binary for the whole lifecycle — author, run, deploy, operate.

bash

agentist init <name>                            # scaffold a project
agentist dev                                    # local runtime + console :3000
agentist run <workflow|agent.duty> "<prompt>"   # positional prompt (default); --json for typed input
agentist approve <run> --by <user>              # approve a run paused for a human
agentist deploy --target k8s|docker             # deploy runtime + manifest
agentist cloud connect                          # plug in Agentist Cloud
agentist logs <run>                             # tail a run
agentist rollback <agent> <version>             # roll back a version
agentist secrets set <key>                      # manage secrets

Reference · Comparison

How Agentist compares

Today you'd stitch together a handful of tools — Mastra or Google's ADK for typed agents, Trinity or Kagent to run them in your own cloud, Modal for GPU infrastructure. Agentist is the only one that brings them together — everything a platform engineer needs in a single framework.

Marks: ✅ native · ◑ partial · ✕ none · — not applicable.

Typed & code-owned

Capability	Agentist	Mastra	Trinity	Modal	Kagent	ADK
Zod-typed boundaries everywhere	✅	✅	✕	—	✕	✅
Code-owned, not UI or YAML	✅	✅	◑	—	◑	✅
Typed, deterministic workflows	✅	✅	✕	—	✕	✅
A small primitive set	✅	◑	◑	—	◑	◑
Pluggable models, any provider	✅	✅	◑	◑	✅	✅
Critics that don't self-grade	✅	◑	✕	—	✕	◑

Durable, governed & sovereign

Capability	Agentist	Mastra	Trinity	Modal	Kagent	ADK
Durable, event-sourced execution	✅	◑	✅	◑	✕	◑
Runs in your cloud (sovereign)	✅	✕	✅	✕	✅	◑
Per-agent isolation tiers	✅	✕	◑	✅	◑	◑
Append-only, tamper-evident audit	✅	✕	✅	◑	✕	◑
Human approvals on one trail	✅	◑	✅	✕	◑	◑
Workers · leases · retries	✅	✕	✅	✅	✕	✕
Always-on agents you talk to	✅	✕	✅	—	◑	◑

Infrastructure & deployment

Capability	Agentist	Mastra	Trinity	Modal	Kagent	ADK
Infra-as-code (compute & images)	✅	◑	◑	✅	◑	✕
Model endpoints on GPUs	✅	✕	✕	✅	✕	✕
Granular compute & autoscaling	✅	◑	✕	✅	◑	◑
Sandboxes for untrusted code	✅	✕	◑	✅	◑	✅
Scale-to-zero compute	✅	◑	✕	✅	◑	◑
Provisions its own components	✅	✕	◑	◑	✕	✕
Deploys into your cloud	✅	✕	✅	✕	✅	◑

Only Agentist

Capability	Agentist	Mastra	Trinity	Modal	Kagent	ADK
Admission control, per step	✅	✕	✕	—	✕	✕
Typed and durable together	✅	✕	✕	—	✕	✕
Durable, no separate cluster	✅	✕	✕	✕	✕	✕
One log = audit = replay = SOC 2	✅	✕	◑	✕	✕	✕
Conversation = approval = audit	✅	✕	✕	—	✕	✕
One typed invocation contract	✅	✕	✕	—	✕	✕
Charter inherited by every agent	✅	✕	◑	—	✕	✕

FAQ

Questions, answered

The things engineers ask first — answered plainly. For anything else, reach out.

When can I try it?

We're still building it — Agentist isn't generally available yet. Send us an email and we'll let you know the moment you can get in.

How is this different from LangGraph, CrewAI, or a prompt framework?

Those give you the reasoning loop. Agentist is the harness around it: admission control before every step, a durable event-sourced runtime, governed gateways, isolation, and an operator console. The engine is pluggable, so an existing ADK or LangGraph agent can run inside that harness. See the comparison.

Can my AI coding assistant build Agentist agents, or just humans?

Both, by design. Agents are typed code, and the docs are written to be read by machines as much as by people, so you can point Claude, Cursor, or any coding agent at them and have it author duties, tools, policies, and the infrastructure. Because everything is typed and the runtime admission-checks every step, what the assistant writes is type-checked before it ships and governed while it runs, not taken on faith.

Do I have to rewrite my agents to adopt Agentist?

No. The engine is pluggable — bring an existing agent on ADK or LangGraph through an adapter and Agentist governs, persists, and operates it unchanged, or author natively with the SDK. You govern at the boundary, not by rewriting the loop.

What language do I write agents in?

TypeScript, typed end to end — agents, duties, tools, policies, workflows, and even the infrastructure are one typed source. The SDK compiles it to a manifest the engine runs and the runtime governs.

Which models can I use — and can I run my own?

Any. Every model call crosses the gateway, so you can run Bedrock, Anthropic, or OpenAI, or your own open models on GPUs (vLLM) — and switch providers without touching agent code.

Does my data ever leave my cloud?

Not unless you send it. The whole platform is composed as TypeScript infrastructure-as-code and deploys into your own cloud — models, data, and the journal all stay with you. Agentist Cloud is an optional managed control plane, never a required hop.

What does "admission control" actually do?

A policy runs before every step an agent proposes — a tool call, a model call, a data read. It can allow it, deny it, or pause for human approval, and every decision is journaled. The agent proposes; the runtime commits.
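With several policies registered, one step can get different answers. A minimal sketch of combining them, assuming the most restrictive decision wins (deny over approval over allow) and that a step no policy objects to is allowed; the verdict is journaled before anything runs. `admit`, `Proposal`, and `JournalEntry` are hypothetical names; only the allow/approval/deny effects and the `check` idea come from the `policy()` examples above.

```typescript
// Hypothetical admission controller, not the Agentist runtime: every policy
// inspects a proposed step, the most restrictive decision wins, and the
// verdict is appended to the journal before the step may execute.
type Effect = "allow" | "approval" | "deny"

interface Decision { effect: Effect; reason?: string }
interface Proposal { kind: "tool" | "model" | "data"; name: string; destructive?: boolean }
interface Policy { id: string; check: (p: Proposal) => Decision }
interface JournalEntry { proposal: Proposal; effect: Effect; policyId?: string; reason?: string }

const severity: Record<Effect, number> = { allow: 0, approval: 1, deny: 2 }

function admit(proposal: Proposal, policies: Policy[], journal: JournalEntry[]): Effect {
  let verdict: JournalEntry = { proposal, effect: "allow" }   // assumed default: nothing objected
  for (const policy of policies) {
    const d = policy.check(proposal)
    if (severity[d.effect] > severity[verdict.effect]) {
      verdict = { proposal, effect: d.effect, policyId: policy.id, reason: d.reason }
    }
  }
  journal.push(verdict)   // journaled whether the step is allowed, denied, or paused
  return verdict.effect
}
```

With a gate like `remediation-needs-approval` from the on-call example, a destructive `k8s.rollback` proposal comes back as `approval`, and the journal records which policy asked for it and why.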

What happens if an agent crashes mid-run?

Nothing is lost. Runs are event-sourced and journaled, so a workflow resumes exactly where it stopped — including across an approval that waits for days. That same journal is your audit trail and replay log.
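Resuming "exactly where it stopped" follows from event sourcing: only events are persisted, so after a crash the state is rebuilt by replaying the journal, and any step that already has a completion event is skipped. A minimal sketch of that replay, with an event shape (`run.started`, `step.completed`, `approval.granted`) invented for illustration; the real journal format is not documented here.

```typescript
// Hypothetical journal format, invented for illustration; the real journal's
// event shapes are not documented here.
type RunEvent =
  | { type: "run.started"; input: unknown }
  | { type: "step.completed"; step: string; output: unknown }
  | { type: "approval.granted"; step: string; by: string }

interface RunState {
  input?: unknown
  outputs: Record<string, unknown>   // step name -> recorded output
  approvals: string[]                // steps a human has approved
}

// Fold the journal, in order, into the current state. Nothing but events is
// persisted, so this is also how state is recovered after a crash.
function replay(events: RunEvent[]): RunState {
  const state: RunState = { outputs: {}, approvals: [] }
  for (const e of events) {
    if (e.type === "run.started") state.input = e.input
    else if (e.type === "step.completed") state.outputs[e.step] = e.output
    else state.approvals.push(e.step)
  }
  return state
}

// Resume at the first step with no completion event; finished steps are not re-run.
function resumePoint(steps: string[], state: RunState): string | undefined {
  return steps.find((s) => !Object.prototype.hasOwnProperty.call(state.outputs, s))
}
```

In this model, a run waiting days for a human is simply a journal with no `approval.granted` event yet, so nothing has to stay in memory while it waits.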

Can I use my existing tools and MCP servers?

Yes. Tools are typed functions with validated I/O. External MCP servers are consumed as agent tools, and every agent is exposed as an MCP tool in return — so Claude, Cursor, or another agent can call it.
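Under MCP, calling a tool is a JSON-RPC 2.0 request with method `tools/call` whose `params` hold the tool `name` and its `arguments`; the result carries a `content` array (text items, for example) and an optional `isError` flag. A minimal sketch of what a client such as Claude or Cursor sends to an Agentist agent exposed as a tool. The tool name `incident` and its arguments are assumptions borrowed from the on-call example, since how Agentist names exposed tools isn't stated here.

```typescript
// Shapes from the MCP spec for a tool call (JSON-RPC 2.0), trimmed to
// text-only content for this sketch.
interface ToolCallRequest {
  jsonrpc: "2.0"
  id: number
  method: "tools/call"
  params: { name: string; arguments: Record<string, unknown> }
}

interface ToolCallResult {
  content: { type: "text"; text: string }[]
  isError?: boolean
}

let nextId = 1

// Build a request; JSON-RPC ids let the client match responses to requests.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } }
}

// Join the text parts of a result; a result flagged isError becomes an exception.
function resultText(result: ToolCallResult): string {
  const text = result.content.map((c) => c.text).join("\n")
  if (result.isError) throw new Error(text)
  return text
}

// Hypothetical: the on-call workflow exposed as an MCP tool named "incident".
const req = toolCall("incident", { service: "checkout", symptom: "p99 5s", severity: "sev2" })
console.log(JSON.stringify(req))
```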

How does local development work?

No cloud account needed: agentist dev runs the full runtime and console on your machine, and agentist run <agent.duty> "<prompt>" invokes a duty straight from the CLI. See the examples.

Is it open-source, and what does it cost?

The SDK and runtime are open-source and self-hostable in your own cloud. Agentist Cloud — the managed control plane, connectors, and burst capacity — is the optional paid layer.

Build, run & govern agents in production.
