
Observability

Production agents need observability. Aether Forge provides typed runtime events, health and readiness checks, Prometheus metrics, structured JSON logs, and replay-based debugging.

Structured Runtime Events

The runner and runtime can emit typed observability events through an EventSink. The built-in sinks depend only on the Python standard library:

| Sink | Use case |
|---|---|
| ListEventSink | Tests and local inspection |
| LoggingEventSink | Route events through Python logging |
| JsonlEventSink | Append events directly to a JSONL file |
| CompositeEventSink | Fan out to multiple sinks |
```python
from aether_forge import Forge, ListEventSink

# Collect events in memory for one sandbox tick, without persisting memory or replays.
sink = ListEventSink()
project = Forge.open("./my-agent")
project.run(
    environment="sandbox",
    max_ticks=1,
    event_sink=sink,
    persist_memory=False,
    persist_replays=False,
)

for event in sink.events:
    print(event.kind, event.severity, event.details)
```

Every event serializes to this shape:

```json
{
  "eventId": "evt_...",
  "kind": "runner.tick.completed",
  "recordedAt": "2026-05-19T...",
  "severity": "info",
  "artifactSetId": "aset_...",
  "environment": "sandbox",
  "sessionId": "session_...",
  "tick": 1,
  "stepId": "step_3",
  "capabilityId": "cap-market-btc-price",
  "message": "Capability action executed.",
  "details": {"success": true, "outputType": "dict", "outputKeys": ["price_usd"]}
}
```
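Because every event serializes to this shape, a file written by JsonlEventSink can be summarized with the standard library alone. The sketch below assumes one serialized event per line with the kind and severity fields shown above; summarize_events is a hypothetical helper, not part of aether_forge.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> Counter:
    """Count events by (kind, severity) from JSONL lines.

    Assumes each non-blank line is one serialized event with the
    "kind" and "severity" fields shown in the event shape above.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        event = json.loads(line)
        counts[(event["kind"], event["severity"])] += 1
    return counts


# Inline sample lines; in practice, pass an open file handle instead.
sample = [
    '{"kind": "runner.tick.completed", "severity": "info"}',
    '{"kind": "policy.denied", "severity": "info"}',
    '{"kind": "runner.tick.completed", "severity": "info"}',
]
print(summarize_events(sample).most_common())
```

Since iterating over an open text file yields lines, the same function works on a JSONL file opened with open(path, encoding="utf-8").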

Core event kinds:

| Kind | Emitted when |
|---|---|
| runner.tick.started / runner.tick.completed / runner.tick.failed | A runner tick starts, completes, or fails |
| runtime.session.started / runtime.session.completed / runtime.session.failed / runtime.session.held / runtime.session.paused | A runtime session starts, completes, fails, is held, or is paused |
| planner.fallback | The prompt-driven planner records a parse failure in last_planner_parse_failure and falls back |
| policy.denied | Policy denies or holds a capability proposal |
| action.executed / action.failed | A capability execution succeeds or fails |
| memory.read / memory.write / memory.promote | A native memory operation completes |
| memory.write_failed | The runner fails to persist its tick summary |
| security.prompt_injection_detected | Capability output is sanitized before it enters the prompt context |
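When forwarding events to an alerting channel, one option is to alert only on failure-like kinds. The selection below is an illustrative local policy drawn from the table of event kinds, not a grouping Aether Forge defines; should_alert and ALERT_KINDS are hypothetical names.

```python
from typing import Any, Mapping

# One possible selection of non-".failed" kinds worth alerting on.
# This is a local policy choice, not something Aether Forge defines.
ALERT_KINDS = frozenset({
    "policy.denied",
    "memory.write_failed",
    "security.prompt_injection_detected",
})


def should_alert(event: Mapping[str, Any]) -> bool:
    """Return True for kinds ending in ".failed" or listed in ALERT_KINDS."""
    kind = event.get("kind", "")
    return kind.endswith(".failed") or kind in ALERT_KINDS


events = [
    {"kind": "runner.tick.completed"},
    {"kind": "action.failed"},
    {"kind": "policy.denied"},
]
print([e["kind"] for e in events if should_alert(e)])
```

Note that memory.write_failed uses an underscore rather than a ".failed" suffix, which is why it is listed explicitly.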

Health Server

Run with --health-port 8080 to expose:

| Endpoint | Purpose |
|---|---|
| GET /health | Liveness: the process is responsive |
| GET /ready | Readiness: the agent is in a healthy state to do work |
| GET /status | Current agent state (status, ticks, errors) |
| GET /ticks | Last 20 tick summaries |
| GET /metrics | Metrics in Prometheus text format |

Liveness vs Readiness

  • /health always returns 200 while the process is alive; use it for Kubernetes liveness probes
  • /ready returns 503 when any of the following holds:
    • The kill switch is active (halt file present)
    • The last 5 consecutive ticks have failed
    • The agent is warming up (no tick has completed yet)
```bash
curl localhost:8080/ready
# {"ready": false, "reason": "last 5 ticks failed (last status: timeout)"}
```
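The three readiness conditions can be expressed as a small pure function, which helps when reasoning about when a probe will flip. This is a sketch of the documented rules, not Aether Forge's code. The function name, the "kill switch active" and "warming up" reason strings, the reading of "warming up" as "no tick has finished at all", and the choice to treat any status other than complete as a failure are all assumptions; only the failure reason mirrors the response shown above.

```python
from typing import Optional, Sequence, Tuple

FAILURE_WINDOW = 5  # from the docs: the last 5 consecutive ticks have failed


def readiness(
    halt_file_present: bool,
    recent_statuses: Sequence[str],
) -> Tuple[bool, Optional[str]]:
    """Apply the documented /ready rules to tick statuses (oldest first).

    Assumption: any status other than "complete" (e.g. "failed",
    "timeout") counts as a failed tick.
    """
    if halt_file_present:
        return False, "kill switch active"  # hypothetical wording
    if not recent_statuses:
        return False, "warming up"  # hypothetical wording
    last = recent_statuses[-FAILURE_WINDOW:]
    if len(last) == FAILURE_WINDOW and all(s != "complete" for s in last):
        return False, f"last {FAILURE_WINDOW} ticks failed (last status: {last[-1]})"
    return True, None


print(readiness(False, ["failed"] * 4 + ["timeout"]))
```

A single complete tick inside the window is enough to make the agent ready again, since the failures must be consecutive.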

Prometheus Metrics

/metrics returns Prometheus text exposition format:

```text
# HELP aether_ticks_total Total ticks completed
# TYPE aether_ticks_total counter
aether_ticks_total{agent="aset_eth-swing_abc",env="paper"} 47
# HELP aether_ticks_failed_total Tick failures
# TYPE aether_ticks_failed_total counter
aether_ticks_failed_total{agent="aset_eth-swing_abc",env="paper"} 2
# HELP aether_steps_per_tick_avg Average steps per recent tick
# TYPE aether_steps_per_tick_avg gauge
aether_steps_per_tick_avg{agent="aset_eth-swing_abc",env="paper"} 4.32
# HELP aether_agent_running 1 if agent is running, 0 otherwise
# TYPE aether_agent_running gauge
aether_agent_running{agent="aset_eth-swing_abc",env="paper"} 1
# HELP aether_agent_ready 1 if agent is ready to do work
# TYPE aether_agent_ready gauge
aether_agent_ready{agent="aset_eth-swing_abc",env="paper"} 1
# HELP aether_pending_approvals Steps waiting for human approval
# TYPE aether_pending_approvals gauge
aether_pending_approvals{agent="aset_eth-swing_abc",env="paper"} 0
```

Scrape with Prometheus, visualize in Grafana.
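Prometheus and Grafana are the intended consumers, but for a quick check in a script or test, simple samples like these can be parsed with the standard library. This is a minimal sketch covering only a subset of the text exposition format; parse_metrics is a hypothetical helper. For anything beyond quick checks, the official prometheus_client package provides prometheus_client.parser.text_string_to_metric_families.

```python
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]


def parse_metrics(text: str) -> Dict[Tuple[str, Labels], float]:
    """Map (metric name, sorted label pairs) to the sample value.

    Handles only simple samples like name{k="v",...} value: no escaped
    quotes or commas inside label values, and no trailing timestamps.
    """
    samples: Dict[Tuple[str, Labels], float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        series, value = line.rsplit(None, 1)
        name, labels = series, ()
        if "{" in series:
            name, label_str = series.split("{", 1)
            pairs = []
            for item in label_str.rstrip("}").split(","):
                if not item:
                    continue  # empty label set or trailing comma
                key, val = item.split("=", 1)
                pairs.append((key.strip(), val.strip().strip('"')))
            labels = tuple(sorted(pairs))
        samples[(name, labels)] = float(value)
    return samples


sample = """\
# TYPE aether_ticks_total counter
aether_ticks_total{agent="aset_eth-swing_abc",env="paper"} 47
aether_ticks_failed_total{agent="aset_eth-swing_abc",env="paper"} 2
"""
key = (("agent", "aset_eth-swing_abc"), ("env", "paper"))
metrics = parse_metrics(sample)
print(metrics[("aether_ticks_failed_total", key)], metrics[("aether_ticks_total", key)])
```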

Per-Tick Timeout

Each tick is bounded by tick_timeout_seconds (default 120 s). If the LLM call hangs, the tick is marked with status timeout and the loop continues with the next tick.
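The runner's own implementation is not shown here. As an illustration of the technique, this asyncio sketch bounds one tick with asyncio.wait_for, which cancels the awaited coroutine when the budget runs out, and converts the overrun into a timeout status so the loop keeps going. The function names are hypothetical; only the 120-second default comes from the docs.

```python
import asyncio
from typing import Awaitable, Callable, List

TICK_TIMEOUT_SECONDS = 120.0  # documented default for tick_timeout_seconds


async def run_tick_with_timeout(
    tick: Callable[[], Awaitable[str]],
    timeout: float = TICK_TIMEOUT_SECONDS,
) -> str:
    """Run one tick and return its status, or "timeout" if it overruns."""
    try:
        return await asyncio.wait_for(tick(), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the hung tick; just report it.
        return "timeout"


async def main() -> List[str]:
    async def hung_tick() -> str:
        await asyncio.sleep(10)  # simulates an LLM call that never returns in time
        return "complete"

    async def quick_tick() -> str:
        return "complete"

    statuses = []
    for tick in (hung_tick, quick_tick):
        # A short timeout keeps the demo fast; the loop continues after a timeout.
        statuses.append(await run_tick_with_timeout(tick, timeout=0.05))
    return statuses


print(asyncio.run(main()))
```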

Circuit Breaker

After circuit_breaker_threshold consecutive failures (default 5), the agent enters a cooldown for circuit_breaker_cooldown_seconds (default 60 s) before retrying. This prevents runaway costs when the LLM provider is down.
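A consecutive-failure breaker with a fixed cooldown fits in a few lines. This sketch mirrors the documented knobs and defaults but is an illustration, not Aether Forge's implementation; the class name and the injectable clock (used so the demo runs instantly) are assumptions.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Blocks new ticks for a cooldown after N consecutive failures."""

    def __init__(
        self,
        threshold: int = 5,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._open_until = 0.0

    def allow(self) -> bool:
        """Return True if a new tick may run now."""
        return self._clock() >= self._open_until

    def record(self, success: bool) -> None:
        """Update the consecutive-failure count after a tick finishes."""
        if success:
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.threshold:
            # Trip: block ticks for the cooldown, then allow a retry.
            self._open_until = self._clock() + self.cooldown_seconds
            self._failures = 0


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record(success=False)
print(breaker.allow())  # False: cooling down
now[0] += 60
print(breaker.allow())  # True: retry permitted
```

This sketch resets the failure count when it trips, so a retry gets a fresh budget of failures; a stricter variant would re-enter the cooldown on the first failed retry.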

Replay Debugging

Every tick writes a replay file with the full step ledger:

```bash
forge replays ./my-agent
# TICK  STATUS    STEPS  TIME
# ----------------------------------------------------------
# 1     complete  10     2026-04-15T14:30:01
# 2     complete  20     2026-04-15T14:30:35
# 3     failed    3      2026-04-15T14:31:12
```

```bash
forge replay-show ./my-agent/replays/tick_0003.json
# Replay: tick_0003.json
# Tick: 3
# Status: failed
#
# Step Ledger (3 steps):
#
#   [ 1] reason —
#        desc: Checking ETH price before placing order
#        result: complete
#
#   [ 2] use-capability cap-market-btc-price
#        desc: Get current ETH price
#        result: failed
```

Add --full to see complete payloads and outputs.

Structured JSON Logs

--json-log /path/to/agent.jsonl writes one JSON object per log record. The file is written through a RotatingFileHandler that rotates at 50 MB and keeps 3 backups.

tail -f agent.jsonl | jq -c 'select(.level=="ERROR") | {ts, msg}'
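For readers wiring up something similar in their own tooling, a standard-library version of this setup is a custom logging.Formatter plus RotatingFileHandler with the documented limits. This is a sketch of an equivalent configuration, not Aether Forge's code: JsonFormatter and make_json_handler are hypothetical names, and only the ts, level, and msg field names are taken from the jq filter above.

```python
import json
import logging
from datetime import datetime, timezone
from logging.handlers import RotatingFileHandler


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 timestamp in UTC, from the record's creation time.
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        return json.dumps(payload)


def make_json_handler(path: str) -> RotatingFileHandler:
    """Build a handler that rotates at 50 MB and keeps 3 backups."""
    handler = RotatingFileHandler(
        path,
        maxBytes=50 * 1024 * 1024,
        backupCount=3,
        encoding="utf-8",
        delay=True,  # open the file lazily, on the first emitted record
    )
    handler.setFormatter(JsonFormatter())
    return handler
```

Attach it with logging.getLogger().addHandler(make_json_handler("agent.jsonl")); because the records carry level, ts, and msg, the jq filter above applies to this output as well.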

When JSON logging is enabled, observability events appear in their log records under the aetherEvent key:

tail -f agent.jsonl | jq -c 'select(.aetherEvent.kind=="policy.denied") | .aetherEvent'

Production Deploy

The generated Dockerfile uses a multi-stage build, runs as a non-root user, and uses /ready for its health check:

```bash
docker build -t my-agent .
docker run -p 8080:8080 \
  -e OPENROUTER_API_KEY=$OPENROUTER_API_KEY \
  my-agent
```
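Slim Python base images often ship without curl, so container health checks sometimes use a small Python probe instead. This sketch is not taken from the generated Dockerfile; it shows one way to turn the /ready endpoint into a process exit code, assuming the endpoint returns 200 when ready (only the 503 case is documented above). Docker's HEALTHCHECK treats exit code 0 as healthy and 1 as unhealthy. The probe name and the 3-second timeout are assumptions.

```python
import sys
import urllib.error
import urllib.request


def probe(url: str = "http://localhost:8080/ready", timeout: float = 3.0) -> int:
    """Return 0 if the endpoint answers 200, else 1 (a HEALTHCHECK exit code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses, including a 503 from /ready.
        print(exc.read().decode("utf-8", "replace"), file=sys.stderr)
        return 1
    except (urllib.error.URLError, OSError):
        return 1  # connection refused, DNS failure, or timeout
```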