Workflows are the orchestration unit in AgentOS. A defworkflow is a deterministic state machine
that consumes canonical events, updates state, emits domain events, and requests declared effects
through the kernel.
This document describes the active workflow runtime contract: what a workflow owns, how it is invoked, how effects and receipts move through the system, and which invariants make replay, audit, and governance reliable.
Workflow orchestration is code-defined and event-driven:
- defworkflow is the orchestration/state-machine unit.
- defmodule supplies the runtime/artifact used by the workflow implementation.
- Manifest startup and domain ingress wiring use routing.subscriptions.
In practice, a workflow owns the end-to-end progression of a business process:
- it receives a domain event or receipt continuation
- it decides the next state transition
- it emits follow-up domain events for other workflows or observers
- it requests side effects when external work is required
- it resumes when receipts arrive for previously emitted intents
Workflow instances may be unkeyed or keyed. Keyed workflows partition state by instance key and use cells, described below.
Workflows own:
- domain state
- business invariants
- transition logic
- retry and compensation decisions
Kernel + execution runtime own:
- deterministic stepping
- declared-effect admission
- effect emission and open-work tracking
- continuation admission and receipt ingestion
Executors/adapters own:
- side-effect execution
- non-authoritative progress reporting
- signed receipt production
This split keeps orchestration logic in workflow code while preserving a small deterministic runtime. The workflow decides what should happen, the kernel decides whether and when it is admitted, and executors perform external work and return auditable continuations.
The owner/executor seam for open external work is defined in spec/05-effects.md.
- Only workflows may originate workflow-emitted effects.
- Workflows must declare workflow.effects_emitted.
- Emitted effects must name effects, not semantic effect strings.
- Kernel rejects undeclared effects before enqueue.
- Multiple effects per step are allowed; deterministic kernel output limits apply.
- Event payloads are schema-validated and canonicalized on ingress.
- Effect params are schema-validated and canonicalized before intent hashing/enqueue.
- Receipt payloads are schema-validated/canonicalized before continuation delivery.
- Journal + snapshot persist canonical CBOR forms used for replay.
- Runtime decode fallbacks for non-canonical event/receipt payload shapes are not part of the active contract.
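The allowlist rule above can be sketched as a check that runs before anything is enqueued. This is a minimal illustration, not the kernel's implementation: WorkflowDecl, EffectIntent, AdmitError, admit_effects, and the max_effects_per_step parameter are hypothetical names, and rejecting the whole step on the first undeclared effect is an assumption. Schema validation and canonicalization of params are omitted.

```rust
use std::collections::BTreeSet;

/// Hypothetical slice of a workflow declaration: only the allowlist matters here.
struct WorkflowDecl {
    effects_emitted: BTreeSet<String>,
}

/// Hypothetical effect intent returned by one workflow step.
#[derive(Debug, Clone, PartialEq)]
struct EffectIntent {
    effect: String,
    params: Vec<u8>, // raw params; validation and canonicalization are not shown
}

#[derive(Debug, PartialEq)]
enum AdmitError {
    Undeclared(String),
    TooManyEffects { limit: usize, got: usize },
}

/// Admit every intent from one step, or reject before anything is enqueued.
fn admit_effects(
    decl: &WorkflowDecl,
    intents: Vec<EffectIntent>,
    max_effects_per_step: usize, // assumed stand-in for the kernel output limit
) -> Result<Vec<EffectIntent>, AdmitError> {
    if intents.len() > max_effects_per_step {
        return Err(AdmitError::TooManyEffects {
            limit: max_effects_per_step,
            got: intents.len(),
        });
    }
    for intent in &intents {
        if !decl.effects_emitted.contains(&intent.effect) {
            return Err(AdmitError::Undeclared(intent.effect.clone()));
        }
    }
    Ok(intents)
}
```

A workflow that declares only payment/charge@1 would have a payment/charge@1 intent admitted, while any other effect name would be rejected with Undeclared.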
Receipt continuation routing is keyed by recorded origin identity:
- origin workflow
- origin workflow hash when available
- origin instance key
- intent hash identity
Intent identity binds origin instance identity to avoid ambiguous concurrent wakeups. Continuation
routing is manifest-independent. routing.subscriptions is for domain-event ingress only.
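To illustrate why continuation routing needs no manifest lookup, the sketch below records the origin when an effect is opened and resolves the receipt purely from that record. Origin, OpenWork, record, and settle are hypothetical names, and the 32-byte IntentHash is an assumed stand-in for the real intent identity.

```rust
use std::collections::HashMap;

/// Assumed stand-in for the intent hash identity.
type IntentHash = [u8; 32];

/// Origin identity recorded when the effect is emitted.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Origin {
    workflow: String,
    workflow_hash: Option<String>,  // present only when available
    instance_key: Option<Vec<u8>>,  // None for unkeyed workflows
}

/// Hypothetical open-work table: pending intents and who emitted them.
#[derive(Default)]
struct OpenWork {
    pending: HashMap<IntentHash, Origin>,
}

impl OpenWork {
    /// Remember the origin at emission time.
    fn record(&mut self, intent: IntentHash, origin: Origin) {
        self.pending.insert(intent, origin);
    }

    /// Settle a receipt: the target is whatever was recorded, never a
    /// routing.subscriptions lookup. A second settle of the same intent finds nothing.
    fn settle(&mut self, intent: IntentHash) -> Option<Origin> {
        self.pending.remove(&intent)
    }
}
```

Because each intent maps to exactly one recorded origin, two concurrent instances of the same workflow cannot be woken by each other's receipts.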
Settled effects produce a generic workflow receipt envelope (sys/EffectReceiptEnvelope@1) with at
least:
- origin workflow identity
- origin instance key when keyed
- intent identity
- effect identity
- executor module/entrypoint identity when resolved
- optional issuer reference echoed from the emitted effect
- receipt payload bytes
- receipt status
- emitted sequence metadata
If receipt payload decoding/normalization fails:
- The failing intent is settled and removed from pending.
- If the workflow event schema supports sys/EffectReceiptRejected@1, the kernel emits it.
- If not supported, the kernel marks the workflow instance failed and drops remaining pending receipts for that instance.
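The failure handling above reduces to a small decision. The sketch below is illustrative only: Instance, Outcome, and on_receipt_decode_failure are hypothetical names, intents are modeled as plain u64 identifiers, and pending receipts are approximated by the instance's pending intent list.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// The workflow event schema accepts sys/EffectReceiptRejected@1, so it is emitted.
    EmitRejected { intent: u64 },
    /// The workflow cannot observe the rejection, so the instance is failed.
    FailInstance { dropped_pending: usize },
}

/// Hypothetical per-instance runtime record.
struct Instance {
    pending: Vec<u64>,
    failed: bool,
}

fn on_receipt_decode_failure(
    inst: &mut Instance,
    intent: u64,
    supports_rejected: bool,
) -> Outcome {
    // The failing intent is always settled and removed from pending.
    inst.pending.retain(|i| *i != intent);
    if supports_rejected {
        Outcome::EmitRejected { intent }
    } else {
        // No way to surface the failure to the workflow: fail and drop the rest.
        inst.failed = true;
        let dropped_pending = inst.pending.len();
        inst.pending.clear();
        Outcome::FailInstance { dropped_pending }
    }
}
```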
Kernel persists workflow instance runtime state, conceptually including:
- state bytes
- inflight intent set/map
- lifecycle status: running | waiting | completed | failed
- last processed sequence marker
- workflow/module version metadata for diagnostics
Replay must restore this state deterministically.
Manifest apply is blocked when any of the following hold:
- non-terminal workflow instances exist
- any workflow has inflight intents
- effect queue/scheduler still has pending work
No implicit abandonment or clearing of in-flight workflow state occurs during apply.
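The strict-quiescence rule can be sketched as a function that lists every reason an apply is blocked. All names here (Lifecycle, InstanceSummary, ApplyBlocker, apply_blockers) are hypothetical, and treating completed and failed as the terminal statuses is an assumption based on the lifecycle values listed earlier.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lifecycle {
    Running,
    Waiting,
    Completed,
    Failed,
}

impl Lifecycle {
    /// Assumption: completed and failed are the terminal statuses.
    fn is_terminal(self) -> bool {
        matches!(self, Lifecycle::Completed | Lifecycle::Failed)
    }
}

/// Hypothetical per-instance summary used for the check.
struct InstanceSummary {
    status: Lifecycle,
    inflight_intents: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum ApplyBlocker {
    NonTerminalInstances(usize),
    InflightIntents(usize),
    PendingEffectWork(usize),
}

/// Returns every blocker; an empty list means manifest apply may proceed.
fn apply_blockers(instances: &[InstanceSummary], queued_effects: usize) -> Vec<ApplyBlocker> {
    let mut blockers = Vec::new();
    let non_terminal = instances.iter().filter(|i| !i.status.is_terminal()).count();
    if non_terminal > 0 {
        blockers.push(ApplyBlocker::NonTerminalInstances(non_terminal));
    }
    let inflight: usize = instances.iter().map(|i| i.inflight_intents).sum();
    if inflight > 0 {
        blockers.push(ApplyBlocker::InflightIntents(inflight));
    }
    if queued_effects > 0 {
        blockers.push(ApplyBlocker::PendingEffectWork(queued_effects));
    }
    blockers
}
```

Reporting all blockers instead of stopping at the first one is a design choice of this sketch; it makes a blocked apply easier to diagnose.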
Shadow/governance reporting is bounded to the observed execution horizon:
- observed effects so far
- pending workflow receipts/intents
- workflow instance statuses
- workflow effect allowlists
- relevant state deltas
Shadow does not promise complete static future-effect prediction for unexecuted branches.
- Domain event is appended and canonicalized.
- Router evaluates routing.subscriptions and delivers to matching workflows.
- Workflow entrypoint runs deterministically with current state + event.
- Workflow returns new state, domain events, and effect intents.
- Kernel enforces workflow.effects_emitted, validates effect params, then records open work.
- The unified node publishes opened async effects only after durable frame flush.
- Executors emit stream frames and terminal receipts.
- Kernel canonicalizes admitted continuations and routes them to the recorded origin instance.
Workflows declare:
- workflow.state: state schema
- workflow.event: event schema
- workflow.context: optional context schema
- workflow.annotations: optional annotation schema
- workflow.key_schema: optional key schema for cells
- workflow.effects_emitted: required list of effect names
- impl.module and impl.entrypoint: runtime implementation target
sys/WorkflowContext@1 includes deterministic time/entropy, journal metadata, manifest hash,
workflow identity, optional workflow hash, optional key, and cell_mode.
routing.subscriptions maps event schema to workflow:
- required fields are event and op
- key_field is used for keyed workflow delivery
- deterministic evaluation order is manifest order
- matching subscriptions fan out in order
A subscription is deliverable when its event schema exactly equals the target workflow's
workflow.event, or when the workflow event schema is a variant whose arm references the
subscription event schema. In the variant-arm case, runtime delivery wraps the incoming event as that
variant arm before invoking the workflow.
Continuation delivery from receipts does not use this routing table.
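The deliverability rule for subscriptions can be sketched as follows. The schema model is deliberately simplified and every name (EventSchema, Arm, Delivery, resolve_delivery) is hypothetical; the arm tag and the schema name com.acme/OrderCreated@1 used in the usage note are invented for illustration.

```rust
/// One arm of a variant schema: a tag plus the schema it references.
struct Arm {
    tag: String,
    schema_ref: String,
}

/// Simplified model of a workflow's declared workflow.event schema.
enum EventSchema {
    Plain { name: String },
    Variant { name: String, arms: Vec<Arm> },
}

#[derive(Debug, PartialEq, Eq)]
enum Delivery {
    /// Subscription event equals workflow.event: deliver as-is.
    Direct,
    /// Subscription event matches a variant arm: wrap the value in that arm first.
    WrapInArm(String),
}

/// Returns how to deliver, or None when the subscription is not deliverable.
fn resolve_delivery(subscription_event: &str, workflow_event: &EventSchema) -> Option<Delivery> {
    match workflow_event {
        EventSchema::Plain { name } => {
            if name == subscription_event {
                Some(Delivery::Direct)
            } else {
                None
            }
        }
        EventSchema::Variant { name, arms } => {
            if name == subscription_event {
                return Some(Delivery::Direct);
            }
            arms.iter()
                .find(|arm| arm.schema_ref == subscription_event)
                .map(|arm| Delivery::WrapInArm(arm.tag.clone()))
        }
    }
}
```

With a variant workflow event com.acme/OrderEvent@1 whose Created arm references com.acme/OrderCreated@1, a subscription on the variant itself resolves to Direct, a subscription on com.acme/OrderCreated@1 resolves to WrapInArm("Created"), and any other schema is not deliverable.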
Cells are the keyed-instance model for workflows. They let one workflow manage many independent
instances of the same state machine, where each instance is identified by a key such as order_id,
ticket_id, or note_id.
Use cells when:
- each entity should have isolated state and pending work
- events should route directly to the correct instance
- receipts should resume only the instance that emitted the originating effect
- scheduling should remain deterministic across many active instances
- Workflow (keyed): one workflow whose state is partitioned by key.
- Cell: an instance of a keyed workflow identified by key bytes.
- Workflow work unit: scheduler unit for ready cells and queued workflow work.
Keyed workflows use the same canonical CBOR envelopes as unkeyed workflows:
- Input: { version:1, state: bytes|null, event:{schema:Name, value:bytes, key?:bytes}, ctx?:bytes }
- Output: { state:bytes|null, domain_events?:[...], effects?:[...], ann?:bytes }
When a workflow declares sys/WorkflowContext@1, ctx carries key and cell_mode. In cell mode,
the workflow receives only that cell's state and key is required. Returning state = null in cell
mode deletes the cell.
Effect authority is structural:
- only workflows may emit effects
- emitted effects must be declared in workflow.effects_emitted
- the effect must be present in the active manifest
workflow.key_schema documents the key type for a keyed workflow.
manifest.routing.subscriptions[].key_field marks routed events whose value field contains the key
to target a cell:
{
  "event": "com.acme/OrderEvent@1",
  "workflow": "com.acme/order.step@1",
  "key_field": "order_id"
}

For variant event schemas, key_field typically points into the wrapped value, for example
$value.note_id.
On domain ingress, the kernel extracts the key from the event value, validates it against
workflow.key_schema, and targets (workflow, key). If the cell is missing, the kernel invokes
the workflow with state = null so the workflow can create the instance.
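The cell lifecycle described above (create on first delivery with a null state, delete when the step returns a null state) can be sketched with an in-memory map. CellStore, deliver, and counter_step are hypothetical names; key extraction from the event value and validation against workflow.key_schema are assumed to have happened already.

```rust
use std::collections::BTreeMap;

type Key = Vec<u8>;
type State = Vec<u8>;

/// Hypothetical in-memory stand-in for one keyed workflow's cells.
#[derive(Default)]
struct CellStore {
    cells: BTreeMap<Key, State>,
}

impl CellStore {
    /// Deliver one event to the cell identified by key.
    fn deliver<F>(&mut self, key: Key, event: &[u8], step: F)
    where
        F: Fn(Option<&[u8]>, &[u8]) -> Option<State>,
    {
        // A missing cell is presented to the workflow as state = null (None).
        let next = step(self.cells.get(&key).map(|s| s.as_slice()), event);
        match next {
            Some(state) => {
                self.cells.insert(key, state);
            }
            // Returning null state in cell mode deletes the cell.
            None => {
                self.cells.remove(&key);
            }
        }
    }
}

/// Toy workflow step: counts events in a single byte; "close" deletes the cell.
fn counter_step(state: Option<&[u8]>, event: &[u8]) -> Option<State> {
    if event == b"close" {
        return None;
    }
    let count = state.and_then(|s| s.first().copied()).unwrap_or(0);
    Some(vec![count.saturating_add(1)])
}
```

Each key evolves independently: delivering two events to order-1 and one to order-2 leaves counts of 2 and 1, and a close event for order-1 removes only that cell.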
Each cell has its own mailbox for domain events and receipt events. Delivery appends to the journal and marks the cell ready. The scheduler uses deterministic fair round-robin across ready cells and other queued workflow work.
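One way to picture deterministic fair round-robin over ready cells is a FIFO of ready keys plus one mailbox per cell. This is a sketch under assumptions, not the kernel scheduler: RoundRobin, deliver, and pop_next are hypothetical names, messages are plain strings, and journal appends and other queued workflow work are left out.

```rust
use std::collections::{BTreeMap, VecDeque};

type CellKey = Vec<u8>;

#[derive(Default)]
struct RoundRobin {
    /// Ready cells in the order they became ready.
    ready: VecDeque<CellKey>,
    /// Per-cell mailbox holding domain events and receipt events.
    mailboxes: BTreeMap<CellKey, VecDeque<String>>,
}

impl RoundRobin {
    /// Append to the cell's mailbox and mark the cell ready if it was idle.
    fn deliver(&mut self, key: &[u8], message: &str) {
        let mailbox = self.mailboxes.entry(key.to_vec()).or_default();
        if mailbox.is_empty() {
            self.ready.push_back(key.to_vec());
        }
        mailbox.push_back(message.to_string());
    }

    /// Take one message from the next ready cell. A cell with more mail goes
    /// to the back of the queue, so no cell can starve the others.
    fn pop_next(&mut self) -> Option<(CellKey, String)> {
        let key = self.ready.pop_front()?;
        let mailbox = self.mailboxes.get_mut(&key)?;
        let message = mailbox.pop_front()?;
        if !mailbox.is_empty() {
            self.ready.push_back(key.clone());
        }
        Some((key, message))
    }
}
```

The order depends only on the delivery sequence, so replaying the same deliveries yields the same schedule. Delivering a1 and a2 to cell a and then b1 to cell b is processed as a1, b1, a2.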
Receipt continuation routing is manifest-independent and keyed by recorded origin identity:
- origin workflow
- origin instance key
- intent hash identity
For keyed workflows, origin_instance_key maps directly to the target cell. This prevents receipt
cross-delivery between concurrent instances of the same workflow.
CAS stays immutable as logical hash -> bytes. Physical backends may pack many logical blobs into
one immutable backing object, but that does not change the logical CAS contract.
Per keyed workflow, the kernel maintains a content-addressed CellIndex:
key_hash -> { key_bytes, state_hash, size, last_active_ns }
The live head view is layered:
- base layer: snapshot-anchored CellIndex root
- hot cache: recently used clean cells
- delta layer: dirty overrides (upsert or delete)
Reads use delta -> hot cache -> CellIndex/CAS. Writes stage cell updates in the delta layer until
snapshot materialization.
The hot cache is bounded by entry count and defaults to 4096 cells per workflow
(AOS_CELL_CACHE_SIZE / kernel config).
Dirty delta entries may keep state bytes resident in memory, but large or old resident entries spill to CAS while remaining logically dirty. Spilling changes only storage residency; logical head state stays in the delta layer.
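The layered read path (delta, then hot cache, then CellIndex/CAS) can be sketched with three maps. CellView, Delta, read, and write are hypothetical names; KeyHash is an assumed u64 stand-in, the base map stands in for the CellIndex plus CAS lookup, and cache eviction and spilling are omitted.

```rust
use std::collections::HashMap;

type KeyHash = u64; // assumed stand-in for the real key hash
type State = Vec<u8>;

/// Dirty override staged since the last snapshot.
enum Delta {
    Upsert(State),
    Delete,
}

#[derive(Default)]
struct CellView {
    delta: HashMap<KeyHash, Delta>, // dirty overrides, authoritative for the head
    hot: HashMap<KeyHash, State>,   // recently used clean cells
    base: HashMap<KeyHash, State>,  // stand-in for CellIndex + CAS
}

impl CellView {
    /// Read order: delta -> hot cache -> base.
    fn read(&mut self, key: KeyHash) -> Option<State> {
        if let Some(delta) = self.delta.get(&key) {
            return match delta {
                Delta::Upsert(state) => Some(state.clone()),
                Delta::Delete => None, // a staged delete hides the base entry
            };
        }
        if let Some(state) = self.hot.get(&key) {
            return Some(state.clone());
        }
        let state = self.base.get(&key)?.clone();
        self.hot.insert(key, state.clone()); // fill the cache on a base hit
        Some(state)
    }

    /// Writes only stage an override; the base changes at snapshot materialization.
    fn write(&mut self, key: KeyHash, state: Option<State>) {
        self.hot.remove(&key); // assumption: keep the hot cache strictly clean
        let delta = match state {
            Some(state) => Delta::Upsert(state),
            None => Delta::Delete,
        };
        self.delta.insert(key, delta);
    }
}
```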
Snapshot creation requires runtime quiescence, then materializes pending cell deltas into each
workflow's CellIndex.
During materialization:
- resident dirty states are written to CAS if needed
- CellIndex entries are upserted/deleted
- new per-workflow root hashes are produced
- flushed clean entries are promoted back into hot cache
- dirty delta layers are cleared
Snapshots persist the resulting cell_index_root values. In-memory caches are derived runtime
state. Replay restores roots and repopulates caches lazily.
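Materialization can be sketched as folding the delta layer into the index and deriving a new root. Everything here is a toy under stated assumptions: KeyedCells, Delta, toy_hash, and materialize are hypothetical names, the index maps key_hash directly to state_hash, std's DefaultHasher stands in for a cryptographic content hash, and hot-cache promotion is omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

type KeyHash = u64;

enum Delta {
    Upsert(Vec<u8>),
    Delete,
}

/// Toy content hash; a real CAS would use a cryptographic hash.
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
struct KeyedCells {
    index: BTreeMap<KeyHash, u64>,  // key_hash -> state_hash (CellIndex stand-in)
    cas: BTreeMap<u64, Vec<u8>>,    // state_hash -> bytes
    delta: BTreeMap<KeyHash, Delta>, // dirty overrides awaiting a snapshot
}

impl KeyedCells {
    /// Flush dirty cells into CAS and the index, clear the delta, return the new root.
    fn materialize(&mut self) -> u64 {
        for (key, delta) in std::mem::take(&mut self.delta) {
            match delta {
                Delta::Upsert(bytes) => {
                    let state_hash = toy_hash(&bytes);
                    self.cas.insert(state_hash, bytes);
                    self.index.insert(key, state_hash);
                }
                Delta::Delete => {
                    self.index.remove(&key);
                }
            }
        }
        // The root is derived from index entries in key order, so it depends
        // only on the index contents.
        let mut hasher = DefaultHasher::new();
        for (key, state_hash) in &self.index {
            key.hash(&mut hasher);
            state_hash.hash(&mut hasher);
        }
        hasher.finish()
    }
}
```

Materializing twice with no intervening writes yields the same root, which is the property replay relies on when it restores roots and repopulates caches lazily.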
GC walks from snapshot-pinned roots. No side-channel CAS refs act as roots.
Journal entries for cell-scoped delivery include workflow identity plus key correlation for domain and receipt records. CLI/inspect supports listing cells, showing cell state, tailing events, and tracing per-cell timelines. Trace/diagnose correlates receipt continuations via intent identity.
Best when business transitions, retries, and compensations are tightly coupled.
Best when contexts or teams are split; workflows communicate through domain events.
Best for deadlines, backoff, and long-running lifecycle checkpoints.
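// Program counter for one order's charge flow.
// Copy lets the match below read state.pc without moving it.
#[derive(Clone, Copy)]
enum Pc { Idle, AwaitingCharge, Done, Failed }

// Inside the workflow step; state, event, effects, params, and emit
// are provided by the surrounding step function.
match (state.pc, event) {
    (Pc::Idle, Event::OrderCreated { order_id, amount_cents }) => {
        state.order_id = order_id;
        state.pc = Pc::AwaitingCharge;
        // Request the charge; the receipt arrives later as a continuation.
        effects.push(emit("payment/charge@1", params, Some("payments")));
    }
    (Pc::AwaitingCharge, Event::EffectReceiptEnvelope { status, .. }) => {
        state.pc = if status == "ok" { Pc::Done } else { Pc::Failed };
    }
    // Ignore events that do not apply in the current state.
    _ => {}
}

{
  "routing": {
    "subscriptions": [
      {
        "event": "com.acme/OrderEvent@1",
        "workflow": "com.acme/order.step@1",
        "key_field": "order_id"
      }
    ]
  }
}

- Include stable correlation fields in events and effect params.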
- Use explicit idempotency keys for externally visible effects.
- Treat all continuation payloads as schema-bound inputs.
- Keep terminal states and duplicate fences in workflow state.
- Model retries with explicit attempt/backoff state.
- Transition tests: (state,event) -> (state,events,effects).
- Receipt progression tests for ok/error/timeout/fault paths.
- Replay-or-die snapshot equivalence tests.
- Concurrency tests: no cross-delivery between keyed instances.
- Apply-safety tests: strict-quiescence block/unblock behavior.
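The guidance on duplicate fences, idempotency keys, and explicit attempt/backoff state can be combined in one pure transition function, which also makes transition tests straightforward. This is a sketch under assumptions: ChargeState, Action, on_charge_receipt, the WaitingBackoff state, the key format, and the doubling backoff policy are all hypothetical, and the returned delay is assumed to be handed to some timer mechanism that is not shown.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Pc {
    AwaitingCharge,
    WaitingBackoff,
    Done,
    Failed,
}

/// Retry bookkeeping lives in workflow state, so replay reproduces it exactly.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ChargeState {
    pc: Pc,
    attempt: u32,
    max_attempts: u32,
    base_backoff_ms: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Nothing,
    ScheduleRetry { delay_ms: u64, idempotency_key: String },
}

fn on_charge_receipt(state: &mut ChargeState, order_id: &str, status: &str) -> Action {
    // Duplicate fence: a receipt is only meaningful while a charge is awaited.
    if state.pc != Pc::AwaitingCharge {
        return Action::Nothing;
    }
    if status == "ok" {
        state.pc = Pc::Done;
        return Action::Nothing;
    }
    state.attempt += 1;
    if state.attempt >= state.max_attempts {
        state.pc = Pc::Failed;
        return Action::Nothing;
    }
    // Exponential backoff: base * 2^attempt, with the shift capped to avoid overflow.
    let delay_ms = state.base_backoff_ms.saturating_mul(1u64 << state.attempt.min(16));
    state.pc = Pc::WaitingBackoff;
    Action::ScheduleRetry {
        delay_ms,
        // The key is stable across attempts so the external system can deduplicate.
        idempotency_key: format!("{}:charge", order_id),
    }
}
```

Because the function depends only on its arguments, a transition test is a call followed by equality checks on the resulting state and action. For example, a first error receipt with base_backoff_ms = 100 moves the state to WaitingBackoff and schedules a retry after 200 ms.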