The Atlas AnyLegal OSS — documentation bound to its code

The DOCX tracked-change pipeline

How a plain-text edit from the model becomes surgical Word tracked-change markup that preserves formatting — from the skill the model reads to the OOXML engine and LibreOffice finalization.

4 · The DOCX tracked-change pipeline

The most distinctive subsystem treats OOXML as a compilation target: the LLM edits in plain text, and the backend generates surgical Word tracked-change markup that preserves the original run formatting. The dispatcher is document_tools.py (edit_document); the OOXML engine is docx_xml_service.py (~2,455 LOC).

flowchart LR
  llm["LLM: edit_document<br/>old_text → new_text"] --> clone["auto-clone original → _v2<br/>(first mutation)"]
  clone --> match["match old_text:<br/>exact → quote-norm → case-insens<br/>→ cross-run → cross-paragraph"]
  match --> markup["wrap in &lt;w:del&gt;/&lt;w:ins&gt;<br/>preserve &lt;w:rPr&gt; · alloc revision id"]
  markup --> simplify["merge adjacent same-author<br/>redlines"]
  simplify --> light["light validation:<br/>well-formed · delText/ins rules"]
  light --> save["repack DOCX · session.save()"]
  save --> finalize{"accept/reject<br/>requested?"}
  finalize -->|yes| lo["libreoffice-service :8002<br/>.uno:Accept/RejectAllTrackedChanges"]
  finalize -->|no| done["redlined DOCX"]
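
The first passes of the match step in the flowchart can be sketched in a few lines. This is an illustrative sketch, not the code in docx_xml_service.py: the names find_span and runs_touched are hypothetical, runs are modeled as plain strings, the cross-run pass is approximated by searching the joined run text, and the cross-paragraph pass and near_text disambiguation are left out.

```python
import re
from typing import List, Optional, Tuple

# Typographic quotes mapped one-to-one onto ASCII, so an offset in the
# normalized text is also a valid offset in the original text.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def find_span(runs: List[str], old_text: str) -> Optional[Tuple[str, int, int]]:
    """Return (pass_name, start, end) as offsets into the joined run text.

    Hypothetical helper: tries exact, then quote-normalized, then
    case-insensitive matching, and returns None if every pass fails.
    """
    if not old_text:
        return None
    text = "".join(runs)  # joining runs turns a cross-run match into a substring hit

    idx = text.find(old_text)
    if idx != -1:
        return "exact", idx, idx + len(old_text)

    norm_text = text.translate(_QUOTES)
    norm_old = old_text.translate(_QUOTES)
    idx = norm_text.find(norm_old)
    if idx != -1:
        return "quote-norm", idx, idx + len(norm_old)

    # re.IGNORECASE keeps offsets aligned with the original text, which
    # lowercasing both strings would not guarantee for every character.
    m = re.search(re.escape(norm_old), norm_text, re.IGNORECASE)
    if m:
        return "case-insens", m.start(), m.end()
    return None


def runs_touched(runs: List[str], start: int, end: int) -> List[int]:
    """Indices of the runs that overlap the half-open span [start, end)."""
    touched, pos = [], 0
    for i, run in enumerate(runs):
        run_end = pos + len(run)
        if pos < end and run_end > start:
            touched.append(i)
        pos = run_end
    return touched


if __name__ == "__main__":
    runs = ["The ", "Seller\u2019s", " obligations"]
    hit = find_span(runs, "Seller's obligations")
    print(hit)
    if hit:
        print(runs_touched(runs, hit[1], hit[2]))
```

Returning the name of the pass that matched, along with the offsets, lets a caller log how far down the cascade it had to go; the run indices identify which runs would need to be split or wrapped.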

Highlights, all verified in the source:

  • Five-pass matching so the model's plain-text target is found even across run and paragraph boundaries, with near_text disambiguation when a match is ambiguous (docx_xml_service.py).
  • Formatting preservation — each matched run's <w:rPr> is carried into the <w:del>/<w:ins>; markdown **bold** in the insertion becomes <w:b/>.
  • A validation cascade — fast per-edit checks (validators/docx_validator.py) plus full XSD validation against a vendored ISO/IEC 29500 schema set (39 XSDs under validators/schemas) with auto-repair in xsd_docx.py.
  • Comments: add_comment writes the full four-file Word comment apparatus (comments.xml, commentsExtended.xml, commentsIds.xml, commentsExtensible.xml) plus range markers (comment_tools.py, templates in comment_templates/).
  • Finalization is offloaded to LibreOffice via UNO macros (.uno:AcceptAllTrackedChanges, .uno:RejectAllTrackedChanges, .uno:CompareDocuments) rather than hand-rolled lxml (libreoffice-service/main.py).
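
To make the formatting-preservation point concrete, here is a minimal sketch of wrapping a single matched run in <w:del>/<w:ins> while copying its <w:rPr> into both. It uses the standard library's xml.etree.ElementTree rather than the project's own engine, and the function name redline_run, its parameters, and the way revision ids are allocated are assumptions made for illustration. It also shows the delText rule that the light validation step refers to: text inside <w:del> is carried in <w:delText>, not <w:t>.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def q(tag: str) -> str:
    """Qualify a local tag name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def redline_run(run: ET.Element, new_text: str, rev_id: int, author: str, date: str):
    """Hypothetical helper: return (<w:del>, <w:ins>) elements replacing `run`.

    The run's <w:rPr>, if present, is deep-copied into both new runs so the
    deleted and inserted text keep the original formatting.
    """
    rpr = run.find(q("rPr"))
    old_text = "".join(t.text or "" for t in run.findall(q("t")))

    def build(wrapper_tag: str, text_tag: str, text: str, wid: int) -> ET.Element:
        wrapper = ET.Element(
            q(wrapper_tag),
            {q("id"): str(wid), q("author"): author, q("date"): date},
        )
        new_run = ET.SubElement(wrapper, q("r"))
        if rpr is not None:
            new_run.append(copy.deepcopy(rpr))
        text_el = ET.SubElement(new_run, q(text_tag))
        text_el.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
        text_el.text = text
        return wrapper

    # Deleted text goes in <w:delText>; inserted text goes in <w:t>.
    # Assumption: the two revisions simply take consecutive ids.
    return build("del", "delText", old_text, rev_id), build("ins", "t", new_text, rev_id + 1)


if __name__ == "__main__":
    run = ET.fromstring(
        f'<w:r xmlns:w="{W}"><w:rPr><w:b/></w:rPr><w:t>thirty</w:t></w:r>'
    )
    deleted, inserted = redline_run(
        run, "sixty", rev_id=101, author="Reviewer", date="2024-01-01T00:00:00Z"
    )
    print(ET.tostring(deleted, encoding="unicode"))
    print(ET.tostring(inserted, encoding="unicode"))
```

A real edit that covers only part of a run, or spans several runs, would first need to split runs at the match boundaries; the sketch assumes the match covers exactly one whole run.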