The DOCX tracked-change pipeline
How a plain-text edit from the model becomes surgical Word tracked-change markup that preserves formatting — from the skill the model reads to the OOXML engine and LibreOffice finalization.
The most distinctive subsystem treats OOXML as a compilation target: the LLM
edits in plain text, and the backend generates surgical Word tracked-change
markup that preserves the original run formatting. The dispatcher is
`document_tools.py` (`edit_document`); the OOXML engine is
`docx_xml_service.py` (~2,455 LOC).
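To make the "compilation target" idea concrete, here is a minimal sketch of turning one plain-text replacement into a `<w:del>`/`<w:ins>` pair while copying the run's `<w:rPr>`. It uses only the standard library, and the helper name `tracked_replace`, its parameters, and the sample author and date are assumptions for illustration; this is not the code in `docx_xml_service.py`.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def tracked_replace(paragraph, run, new_text, rev_id, author, date):
    """Hypothetical helper: swap `run` for a <w:del>/<w:ins> pair in place."""
    meta = {q("author"): author, q("date"): date}

    # Deletion: keep the original run (and its rPr), but deleted text
    # must live in <w:delText> rather than <w:t>.
    deleted = ET.Element(q("del"), {q("id"): str(rev_id), **meta})
    old_run = copy.deepcopy(run)
    for t in list(old_run.iter(q("t"))):
        t.tag = q("delText")
    deleted.append(old_run)

    # Insertion: a fresh run that inherits a copy of the original rPr.
    inserted = ET.Element(q("ins"), {q("id"): str(rev_id + 1), **meta})
    new_run = ET.SubElement(inserted, q("r"))
    rpr = run.find(q("rPr"))
    if rpr is not None:
        new_run.append(copy.deepcopy(rpr))
    t = ET.SubElement(new_run, q("t"), {XML_SPACE: "preserve"})
    t.text = new_text

    # Put the pair exactly where the original run was.
    idx = list(paragraph).index(run)
    paragraph.remove(run)
    paragraph.insert(idx, deleted)
    paragraph.insert(idx + 1, inserted)
    return paragraph


p = ET.fromstring(
    f'<w:p xmlns:w="{W}"><w:r><w:rPr><w:b/></w:rPr>'
    f"<w:t>thirty days</w:t></w:r></w:p>"
)
tracked_replace(p, p.find(q("r")), "sixty days",
                rev_id=101, author="Assistant", date="2024-01-01T00:00:00Z")
print(ET.tostring(p, encoding="unicode"))
```

The bold formatting appears in both the deleted and the inserted run, so Word would render the redline in the same style as the surrounding text.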
flowchart LR
llm["LLM: edit_document<br/>old_text → new_text"] --> clone["auto-clone original → _v2<br/>(first mutation)"]
clone --> match["match old_text:<br/>exact → quote-norm → case-insens<br/>→ cross-run → cross-paragraph"]
match --> markup["wrap in <w:del>/<w:ins><br/>preserve <w:rPr> · alloc revision id"]
markup --> simplify["merge adjacent same-author<br/>redlines"]
simplify --> light["light validation:<br/>well-formed · delText/ins rules"]
light --> save["repack DOCX · session.save()"]
save --> finalize{"accept/reject<br/>requested?"}
finalize -->|yes| lo["libreoffice-service :8002<br/>.uno:Accept/RejectAllTrackedChanges"]
finalize -->|no| done["redlined DOCX"]
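The matching step in the flowchart can be illustrated with a small sketch. It covers only the first three passes (exact, quote-normalized, case-insensitive) and then maps the matched span back to the runs it touches, which is the essence of the cross-run pass. The function names `find_span` and `runs_for_span` are hypothetical, and the real engine's normalization rules and ambiguity handling are not shown in the source, so treat this as one plausible shape rather than the actual algorithm.

```python
# Map curly quotes to straight ones (1:1, so string offsets are unchanged).
_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize_quotes(s: str) -> str:
    return s.translate(_QUOTES)


def find_span(haystack: str, needle: str):
    """Try progressively looser passes; return (pass_name, start, end) or None."""
    passes = [
        ("exact", lambda s: s),
        ("quote-norm", normalize_quotes),
        # lower() can change length for a few characters (e.g. U+0130);
        # a production version would need an explicit offset map.
        ("case-insens", lambda s: normalize_quotes(s).lower()),
    ]
    for name, norm in passes:
        start = norm(haystack).find(norm(needle))
        if start != -1:
            return name, start, start + len(needle)
    return None


def runs_for_span(run_texts, start, end):
    """Indices of the runs whose text overlaps the half-open span [start, end)."""
    hits, pos = [], 0
    for i, text in enumerate(run_texts):
        run_start, run_end = pos, pos + len(text)
        if run_start < end and start < run_end:
            hits.append(i)
        pos = run_end
    return hits


# The target phrase is split across two runs and uses curly quotes.
runs = ["The term is ", "\u201cthirty", " days\u201d", " from signing."]
result = find_span("".join(runs), '"Thirty days"')
print(result)                                   # ('case-insens', 12, 25)
print(runs_for_span(runs, result[1], result[2]))  # [1, 2]
```

Ordering the passes from strict to loose means an exact match is always preferred, and looser passes only run when the stricter ones fail.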
Highlights, all verified in the source:
- Five-pass matching so the model's plain-text target is found even across
  run and paragraph boundaries, with `near_text` disambiguation when a match
  is ambiguous (`docx_xml_service.py`).
- Formatting preservation: each matched run's `<w:rPr>` is carried into the
  `<w:del>`/`<w:ins>`; markdown `**bold**` in the insertion becomes `<w:b/>`.
- A validation cascade: fast per-edit checks (`validators/docx_validator.py`)
  plus full XSD validation against a vendored ISO/IEC 29500 schema set
  (39 XSDs under `validators/schemas`) with auto-repair in `xsd_docx.py`.
- Comments: `add_comment` writes the full four-file Word comment apparatus
  (`comments.xml`, `commentsExtended.xml`, `commentsIds.xml`,
  `commentsExtensible.xml`) plus range markers (`comment_tools.py`, templates
  in `comment_templates/`).
- Finalization is offloaded to LibreOffice via UNO macros
  (`.uno:AcceptAllTrackedChanges`, `.uno:RejectAllTrackedChanges`,
  `.uno:CompareDocuments`) rather than hand-rolled lxml
  (`libreoffice-service/main.py`).
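As an illustration of what a fast per-edit check might look like, the sketch below tests well-formedness and two text-element rules: `<w:t>` should not appear inside `<w:del>`, and `<w:delText>` should only appear inside `<w:del>`. The function name `light_validate` and the exact rule set are assumptions; the checks in `validators/docx_validator.py` may differ.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def light_validate(document_xml: str) -> list[str]:
    """Hypothetical fast check: return a list of problems (empty means OK)."""
    try:
        root = ET.fromstring(document_xml)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors: list[str] = []

    def walk(node, in_del: bool) -> None:
        # Track whether any ancestor (or the node itself) is a <w:del>.
        in_del = in_del or node.tag == q("del")
        if node.tag == q("t") and in_del:
            errors.append("<w:t> inside <w:del>; expected <w:delText>")
        if node.tag == q("delText") and not in_del:
            errors.append("<w:delText> outside <w:del>")
        for child in node:
            walk(child, in_del)

    walk(root, False)
    return errors


good = (
    f'<w:p xmlns:w="{W}">'
    '<w:del w:id="1"><w:r><w:delText>old</w:delText></w:r></w:del>'
    '<w:ins w:id="2"><w:r><w:t>new</w:t></w:r></w:ins>'
    "</w:p>"
)
bad = good.replace("delText", "t")  # deleted text wrongly stored in <w:t>

print(light_validate(good))
print(light_validate(bad))
print(light_validate("<w:p>"))
```

A check like this is cheap enough to run after every edit, leaving the slower XSD pass for cases where full schema conformance matters.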