The agentic chat loop, end to end

Follow one chat request from the SSE endpoint through the reactive turn loop to tool dispatch — the system's beating heart, doc claims bound to the code.

2 · The agentic chat loop

The heart of the system is agentic_chat_async.py (~2,188 LOC). A single POST to /api/v1/agentic/chat (fastapi_app.py:128) opens a Server-Sent Events stream and drives a reactive turn loop until the model stops calling tools, a turn/budget limit is hit, or an approval gate fires.

Concurrency. Before the loop starts, an AsyncSessionGuard takes a file-based O_EXCL lock keyed by session_id (session_guard.py); a second concurrent request for the same session gets a 409 (fastapi_app.py:166).

The turn loop (per iteration, capped by max_turns, default 50, and max_budget_usd):

Context check — estimate tokens with tiktoken o200k_base; if usage ≥ 85 % of the model's context window, auto-compact via an LLM summarizer (compactor.py, thresholds in context_manager.py).
Stream the LLM call — content, reasoning (thinking), and tool-call deltas are accumulated; usage and cost are tallied per turn.
Dispatch tool calls — each call runs through AsyncToolExecutor (tool_executor.py), which awaits async handlers and offloads sync handlers to a thread pool, emits metrics, and returns a ToolResult.
Approval gates — enter_plan_mode / exit_plan_mode do not execute; they end the stream with an awaiting_approval marker, and the next request carries the user's decision back in.
Exit — when the model returns text with no tool calls, the loop streams the final answer and ends.

Tool-call rescue. If a model emits no native tool_calls but writes pseudo-XML (<tool>…</tool>) in its content, the loop parses it, validates the names against the allowlist, and synthesizes proper tool calls (tool_call_rescue.py).

flowchart TD
  start(["POST /api/v1/agentic/chat"]) --> lock{"acquire<br/>session lock?"}
  lock -->|no| err409["409 — session busy"]
  lock -->|yes| build["build messages:<br/>system prompt + memory layers<br/>+ thread history + user msg"]
  build --> turn{"turn < max_turns<br/>& cost < budget?"}
  turn -->|no| done["emit 'end' + release lock"]
  turn -->|yes| ctx{"context ≥ 85%?"}
  ctx -->|yes| compact["auto-compact (LLM summary)"] --> llm
  ctx -->|no| llm["stream chat.completions"]
  llm --> rescue{"native tool_calls?"}
  rescue -->|no, XML found| parse["tool_call_rescue → synth calls"] --> hastools
  rescue -->|yes| hastools{"tool calls present?"}
  parse --> hastools
  hastools -->|no| final["stream final text"] --> done
  hastools -->|plan/exit_plan| gate["emit awaiting_approval<br/>end stream"]
  hastools -->|yes| exec["AsyncToolExecutor.execute_async<br/>(async await / sync→thread)"]
  exec --> append["append tool results"] --> turn

Planner mode (enter_plan_mode/exit_plan_mode, plus a plan-and-execute path) is off by default — the tools are only registered when ANYLEGAL_PLANNER_MODE=enabled (workspace_tools.py); the endpoint also rejects planner_mode: true defensively when the flag is off (fastapi_app.py:156).

SSE event types emitted: start, text_chunk, thinking, tool_call, tool_result, system_message, document_created, error, end (agentic_chat_async.py, re-wrapped as event:/data: frames in fastapi_app.py:209).