"We use GPT" is not an architecture

Most regulatory AI vendors today describe their product as a wrapper around a frontier model (GPT, Claude, or Gemini), sometimes paired with a vector database for retrieval. That works for general-purpose Q&A. It does not work for producing a 200-page clinical study report (CSR) that a regulator will read line by line.

The reason is that frontier models, by themselves, are missing the four things that actually make a regulatory draft acceptable: a structured document model, evidence-grounded retrieval, planning over writer-specified sources, and verifiable provenance for every claim.

What's wrong with "frontier model + vector DB"

Generic RAG (chunk the source documents, embed them, retrieve the top-k chunks for each prompt) has well-known limits in regulated writing; the sketch after this list makes them concrete:

  • Chunk loss. A safety table broken across pages becomes two unrelated chunks. The narrative the writer needs cannot be reconstructed from either chunk alone.
  • No structural awareness. The system doesn't know that section 12.4.2 of a CSR depends on data summarized in section 11.3.1. Chunks don't carry document semantics.
  • Citation drift. Even when retrieval returns the right chunk, the LLM rephrases it loosely. The "citation" points to a chunk that no longer matches the sentence it grounds.
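
A minimal sketch of that pipeline, in Python, shows how little it knows about the document. Every name here is hypothetical: embed() stands in for whatever embedding model you like, and the chunker is the fixed-size splitter most generic pipelines start from.

    from dataclasses import dataclass

    CHUNK_SIZE = 400  # characters: naive chunkers split on size, not structure


    @dataclass
    class Chunk:
        doc_id: str
        start: int   # a character offset is the only provenance this design keeps
        text: str


    def chunk_document(doc_id: str, text: str) -> list[Chunk]:
        # Fixed-size splitting: a safety table that straddles a boundary
        # is cut into two chunks, neither readable on its own (chunk loss).
        return [Chunk(doc_id, i, text[i:i + CHUNK_SIZE])
                for i in range(0, len(text), CHUNK_SIZE)]


    def embed(text: str) -> list[float]:
        # Toy stand-in for an embedding model.
        return [float(len(text) % 7),
                float(text.count("table")),
                float(text.count("adverse"))]


    def top_k(query: str, index: list[Chunk], k: int = 5) -> list[Chunk]:
        # Similarity is the only signal. Nothing here knows that CSR
        # section 12.4.2 depends on data summarized in 11.3.1.
        q = embed(query)

        def score(c: Chunk) -> float:
            return sum(a * b for a, b in zip(q, embed(c.text)))

        return sorted(index, key=score, reverse=True)[:k]

Nothing in this pipeline can express "keep the table whole" or "cite the exact span" — which is exactly where the four-layer stack below picks up.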

For an internal-tools use case, this is acceptable. For a CSR, it is not: every paragraph that does not survive a writer's verification has to be rewritten by hand, and the AI's time savings disappear.

The four-layer stack

In our technical whitepaper we describe the four layers a purpose-built regulatory AI system needs; a sketch of how they might be typed follows the list:

  1. Document model. A typed representation of the document being produced — sections, dependencies, ICH guidance for each section type, conventions for how content flows between sections.
  2. Evidence layer. Retrieval that respects document structure: tables stay intact, figures remain linked to their captions, and a writer can specify "section 12.4.2 should draw from these three CSR sections of this prior study."
  3. Planning layer. An agent that takes the writer's section-level intent and produces a plan — what to retrieve, what to draft, what to verify — before any text is generated. The plan is itself reviewable.
  4. Verification layer. Every generated sentence carries a citation back to the file, page, and span that supports it. The writer can audit any sentence in one click.
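
Here is one way those layers might be typed, again in Python and again with illustrative names rather than a production schema. The point is what each layer knows that a bare model call does not:

    from dataclasses import dataclass, field


    @dataclass
    class Section:                       # 1. document model
        number: str                      # e.g. "12.4.2"
        title: str
        depends_on: list[str]            # e.g. ["11.3.1"]: structural dependencies
        guidance: str                    # ICH guidance for this section type


    @dataclass
    class EvidenceRef:                   # 2. evidence layer
        source_file: str
        section: str                     # the retrieval unit is a section, not a chunk
        keep_tables_intact: bool = True  # tables and captioned figures stay whole


    @dataclass
    class PlanStep:                      # 3. planning layer
        action: str                      # "retrieve" | "draft" | "verify"
        target: str                      # the section or evidence this step touches


    @dataclass
    class Plan:
        section: str
        steps: list[PlanStep] = field(default_factory=list)  # reviewable before drafting


    @dataclass
    class Citation:                      # 4. verification layer
        file: str
        page: int
        span: tuple[int, int]            # character offsets of the supporting text


    @dataclass
    class DraftSentence:
        text: str
        citation: Citation               # every sentence carries its provenance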

Each layer is doing work that the frontier model alone cannot do. Removing any one of them reintroduces a class of failure that regulatory review will surface.
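
To make "audit any sentence in one click" concrete, here is a toy example using the types above. PAGES stands in for a real document store keyed by file and page; the text and offsets are invented for the example.

    # Toy document store: (file, page) -> page text.
    PAGES = {
        ("study-001-csr.pdf", 112): "... 3 subjects discontinued due to adverse events ...",
    }


    def audit(sentence: DraftSentence) -> tuple[str, str]:
        # Resolve the citation to the exact cited span so the writer can
        # compare it against the generated sentence. The system surfaces
        # the evidence; the writer makes the acceptability judgment.
        c = sentence.citation
        source = PAGES[(c.file, c.page)][c.span[0]:c.span[1]]
        return sentence.text, source


    claim = DraftSentence(
        text="Three subjects discontinued treatment because of adverse events.",
        citation=Citation("study-001-csr.pdf", 112, (4, 49)),
    )
    print(audit(claim))
    # -> ('Three subjects discontinued treatment because of adverse events.',
    #     '3 subjects discontinued due to adverse events')

Remove the verification layer and this audit has nothing to resolve against; remove the document model and the evidence layer has no structure to respect.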

Why this matters when picking a tool

Our recommendation in the whitepaper is blunt: don't pick a regulatory writing tool by which LLM it uses. Pick it by what stack sits between the LLM and the document. The frontier model will get better every quarter regardless of who you choose. The stack is what determines whether your draft is reviewable.

What's in the whitepaper

The full technical whitepaper covers:

  • The specific failure modes of generic RAG in CSR, PSUR, and CMC writing, with examples
  • Where prompt engineering hits its ceiling in regulated outputs
  • How the four-layer stack composes — and what each layer must guarantee
  • Customer outcomes: where the stack measurably changes drafting time and review effort

If you're evaluating regulatory writing tools, or building one, the whitepaper is the architectural argument for why "which model" is the wrong first question.

Read the full whitepaper →