The agentic chat loop, end to end
Follow one chat request from the SSE endpoint through the reactive turn loop to tool dispatch — the system's beating heart, doc claims bound to the code.
2 · The agentic chat loop
The heart of the system is
agentic_chat_async.py
(~2,188 LOC). A single POST to /api/v1/agentic/chat
(fastapi_app.py:128) opens a
Server-Sent Events stream and drives a reactive turn loop until the model
stops calling tools, a turn/budget limit is hit, or an approval gate fires.
Concurrency. Before the loop starts, an AsyncSessionGuard takes a
file-based O_EXCL lock keyed by session_id
(session_guard.py); a second
concurrent request for the same session gets a 409
(fastapi_app.py:166).
The turn loop (per iteration, capped by max_turns, default 50, and
max_budget_usd):
- Context check — estimate tokens with tiktoken
o200k_base; if usage ≥ 85 % of the model's context window, auto-compact via an LLM summarizer (compactor.py, thresholds incontext_manager.py). - Stream the LLM call — content, reasoning (
thinking), and tool-call deltas are accumulated; usage and cost are tallied per turn. - Dispatch tool calls — each call runs through
AsyncToolExecutor(tool_executor.py), which awaits async handlers and offloads sync handlers to a thread pool, emits metrics, and returns aToolResult. - Approval gates —
enter_plan_mode/exit_plan_modedo not execute; they end the stream with anawaiting_approvalmarker, and the next request carries the user's decision back in. - Exit — when the model returns text with no tool calls, the loop streams the final answer and ends.
Tool-call rescue. If a model emits no native tool_calls but writes
pseudo-XML (<tool>…</tool>) in its content, the loop parses it, validates the
names against the allowlist, and synthesizes proper tool calls
(tool_call_rescue.py).
flowchart TD
start(["POST /api/v1/agentic/chat"]) --> lock{"acquire<br/>session lock?"}
lock -->|no| err409["409 — session busy"]
lock -->|yes| build["build messages:<br/>system prompt + memory layers<br/>+ thread history + user msg"]
build --> turn{"turn < max_turns<br/>& cost < budget?"}
turn -->|no| done["emit 'end' + release lock"]
turn -->|yes| ctx{"context ≥ 85%?"}
ctx -->|yes| compact["auto-compact (LLM summary)"] --> llm
ctx -->|no| llm["stream chat.completions"]
llm --> rescue{"native tool_calls?"}
rescue -->|no, XML found| parse["tool_call_rescue → synth calls"] --> hastools
rescue -->|yes| hastools{"tool calls present?"}
parse --> hastools
hastools -->|no| final["stream final text"] --> done
hastools -->|plan/exit_plan| gate["emit awaiting_approval<br/>end stream"]
hastools -->|yes| exec["AsyncToolExecutor.execute_async<br/>(async await / sync→thread)"]
exec --> append["append tool results"] --> turn
Planner mode (enter_plan_mode/exit_plan_mode, plus a plan-and-execute
path) is off by default — the tools are only registered when
ANYLEGAL_PLANNER_MODE=enabled
(workspace_tools.py);
the endpoint also rejects planner_mode: true defensively when the flag is off
(fastapi_app.py:156).
SSE event types emitted: start, text_chunk, thinking, tool_call,
tool_result, system_message, document_created, error, end
(agentic_chat_async.py,
re-wrapped as event:/data: frames in
fastapi_app.py:209).