Why LLMs fail at document automation

If you've evaluated AI solutions for document-heavy workflows in the past two years, you've encountered the same story multiple times: a vendor demonstrates GPT-5 or a fine-tuned LLM extracting data from your documents. It looks impressive. You run a pilot. Accuracy comes back at 75-85%. The vendor says they need more time, more data, and more iteration. Six months later, you still need a human checking every output.

The reason for this is a fundamental mismatch between what large language models are designed to do and what enterprise automation actually requires.

What LLMs are actually built for

Large language models are remarkable technology. They were trained on a broad sweep of human-generated text — billions of documents, across thousands of domains — to develop generalised language understanding. That breadth is their strength and their limitation.

An LLM can read almost anything. It can summarise a contract, draft a response to an email, explain a complex concept, or generate code. It performs credibly across a huge range of inputs. What it cannot do reliably is achieve the precision that enterprise automation demands.

Here's why:

An LLM hasn't been trained on your supplier's specific invoice format. It hasn't learned that one of your key customers always sends orders with non-standard unit codes that need to be mapped to your ERP's product master. It hasn't internalised your organisation's specific exception rules, what to do when a claim arrives missing a required field, or when a broker statement doesn't reconcile.

LLMs can be prompted. They can be given context. But they cannot be effectively specialised for your specific workflows without retraining. And retraining a large language model from scratch on your proprietary documents would cost hundreds of millions of dollars and take years. It's not a practical option. It's why the fine-tuned LLM approach consistently hits a ceiling in production.

The gap between general models and real workflows

Manufacturing provides the clearest illustration. Consider a purchase order processing workflow. When a customer order arrives, the system needs to:

Identify the customer from the document content and map them to your ERP's customer master.
Extract product codes that may differ from your internal codes and reconcile them.
Pull shipping details and validate against available inventory.
Process order line items, apply pricing rules, and trigger fulfilment.
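As a rough illustration, the four steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not a description of any real system: the customer master, product mapping, inventory, and price tables are hypothetical in-memory stand-ins for ERP lookups, and `process_order` is an invented name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for ERP master data (illustrative values only).
CUSTOMER_MASTER: Dict[str, str] = {"acme gmbh": "C-1001"}
# (customer id, customer's product code) -> internal SKU
PRODUCT_MAP: Dict[Tuple[str, str], str] = {("C-1001", "AX-9"): "SKU-0009"}
INVENTORY: Dict[str, int] = {"SKU-0009": 40}
PRICES: Dict[str, float] = {"SKU-0009": 12.5}


@dataclass
class OrderResult:
    customer_id: Optional[str] = None
    lines: List[dict] = field(default_factory=list)
    exceptions: List[str] = field(default_factory=list)


def process_order(doc: dict) -> OrderResult:
    result = OrderResult()

    # Step 1: identify the customer and map them to the customer master.
    name = doc["customer_name"].strip().lower()
    result.customer_id = CUSTOMER_MASTER.get(name)
    if result.customer_id is None:
        result.exceptions.append(f"unknown customer: {doc['customer_name']}")
        return result

    for line in doc["lines"]:
        # Step 2: reconcile the customer's product code with the internal one.
        sku = PRODUCT_MAP.get((result.customer_id, line["code"]))
        if sku is None:
            result.exceptions.append(f"unmapped product code: {line['code']}")
            continue
        # Step 3: validate the requested quantity against available inventory.
        if INVENTORY.get(sku, 0) < line["qty"]:
            result.exceptions.append(f"insufficient stock for {sku}")
            continue
        # Step 4: apply pricing; fulfilment would be triggered from here.
        total = PRICES[sku] * line["qty"]
        result.lines.append({"sku": sku, "qty": line["qty"], "total": total})

    return result
```

Every branch that appends to `exceptions` is a place where a human ends up in the loop, which is why each lookup has to be right, not merely plausible.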

Each of these steps requires specialised knowledge: your customer database, your product master, your pricing logic, your exception rules. No off-the-shelf model carries this knowledge. And feeding it as context in a prompt is not reliable enough for production, because at the precision levels required to remove humans from the process, any ambiguity in how the model interprets that context translates directly into errors.

A one-percentage-point drop in accuracy doesn't sound significant. In a process running 500 documents a day, it means five additional manual exceptions, every day, permanently. That's a full-time role created by a single percentage point of model degradation.
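The arithmetic behind that figure, as a quick sketch. The function name is illustrative, and the 250 working days per year used for the annual figure is an assumption that does not come from the text; only the 500 documents a day and the one-point drop do.

```python
def extra_exceptions_per_day(docs_per_day: int, accuracy_drop_points: float) -> float:
    """Additional documents per day that need manual handling after an accuracy drop."""
    return docs_per_day * accuracy_drop_points / 100


per_day = extra_exceptions_per_day(500, 1)
print(per_day)        # 5.0

# Assumption: 250 working days per year.
print(per_day * 250)  # 1250.0
```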

The same problem compounds in insurance, where policy documents, broker statements, and claims arrive in hundreds of formats across a market of heterogeneous counterparties. Or in healthcare, where handling patient data involves applying evolving clinical and regulatory rules with full auditability.

Generic models are not built for this level of domain specificity. Specialised ones must be.

KAPTO’s entirely different architecture

KAPTO approaches this problem from a different direction entirely.

Rather than prompting a general model and hoping it extracts the right things, KAPTO transforms incoming documents into constrained knowledge graphs — structured, domain-specific representations of the information in each document, designed around the business rules and data structures of the specific workflow.

Think of it this way:

An LLM reads a document and produces text. KAPTO reads a document and produces a structured, validated, machine-executable knowledge object that already understands the difference between your internal product code and the code your supplier used, and has resolved that difference according to your business logic.
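To make the idea of a constrained graph concrete, here is a generic sketch. It is not KAPTO's internal representation, which the text does not describe; the schema, the node types, and the `ConstrainedGraph` class are all hypothetical. The point it illustrates is that the graph only accepts relations the schema explicitly allows, so a malformed extraction fails loudly instead of flowing downstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema: the only (source type, relation, target type) triples allowed.
ALLOWED_EDGES = {
    ("Order", "placed_by", "Customer"),
    ("Order", "has_line", "OrderLine"),
    ("OrderLine", "refers_to", "Product"),
}


@dataclass
class ConstrainedGraph:
    nodes: Dict[str, str] = field(default_factory=dict)              # node id -> node type
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.nodes[node_id] = node_type

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # Reject any relation the schema does not explicitly allow.
        triple = (self.nodes[source], relation, self.nodes[target])
        if triple not in ALLOWED_EDGES:
            raise ValueError(f"edge not allowed by schema: {triple}")
        self.edges.append((source, relation, target))


graph = ConstrainedGraph()
graph.add_node("order-1", "Order")
graph.add_node("cust-1", "Customer")
graph.add_edge("order-1", "placed_by", "cust-1")  # accepted: allowed by the schema
```

An attempt to link an `Order` directly to a `Product` would raise `ValueError` here, because the schema routes that relation through an `OrderLine`.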

Those knowledge graphs feed KAPTO's execution layer, which then takes action directly in your systems: updating the ERP record, triggering the workflow, routing the exception, closing the loop.

The models underlying this process are not large. They are specifically sized for the task: right-sized architectures trained on customer document corpora, with different model designs for different document types. An insurance broker statement processing model looks nothing like a manufacturing order model. Both are purpose-built, production-tested, and maintained as proprietary assets.

The build vs. buy question in automation

Enterprise organisations have wrestled with the build vs. buy question in automation for decades. Build something internally and it matches your exact needs but creates long-term dependency on internal teams. Or buy an off-the-shelf product and it's fast to deploy but never quite fits your workflows.

KAPTO resolves this tension differently. The platform deploys as a managed external solution: no internal development required, no model engineering team needed. But once implemented, it operates as a deeply integrated operational asset: trained on your documents, embedded in your systems, continuously learning from your specific process patterns.

In procurement, it's a SaaS deployment with a six-week implementation timeline to first live process. In daily operations, it functions as a domain expert that has been built from your data: specialised, predictable, and auditable in a way that a general-purpose model never can be.

Nobody builds their own database engine from scratch anymore. At some point in the nineties, enterprises stopped debating whether to build their own Oracle and started buying infrastructure, then customising on top of it. AI automation is at the same inflection point. The question is not whether to build or buy. It's whether to buy the right thing.

What KAPTO looks like in production

Technical leaders evaluating automation platforms ask the right question: does this actually work on our documents, at our volumes, within our compliance requirements?

KAPTO's answer is grounded in what it's built for:

Proprietary models trained on customer document types, not generic extraction.
Rich, flexible extraction: text, tables, forms, signatures, stamps, barcodes, checkmarks, whatever the document contains.
Continuous learning with human review loops, confidence rules, and verification thresholds.
Full API integration with any stack: Oracle, SAP, Gmail, document repositories, legacy ERPs.
Security by design: isolated environments, end-to-end encryption, GDPR and AI Act compliant.
Integration assistance: In addition to providing a tool, KAPTO helps customers connect the automation to their existing systems, which is usually where the real complexity lives.
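In their simplest form, the confidence rules and verification thresholds mentioned in the list above could look something like the sketch below. The threshold values, route names, and field names are assumptions chosen for illustration, not KAPTO's actual configuration.

```python
from typing import Dict

# Hypothetical thresholds; real values would be tuned per document type.
AUTO_APPROVE = 0.98
REVIEW_FLOOR = 0.80


def route(field_confidences: Dict[str, float]) -> str:
    """Route a document based on its least confident extracted field."""
    if not field_confidences:
        return "reject"  # nothing was extracted, so nothing can be trusted
    lowest = min(field_confidences.values())
    if lowest >= AUTO_APPROVE:
        return "auto"          # straight-through processing, no human touch
    if lowest >= REVIEW_FLOOR:
        return "human_review"  # a reviewer's correction can feed back into training
    return "reject"


print(route({"customer": 0.99, "total": 0.995}))  # auto
print(route({"customer": 0.99, "total": 0.91}))   # human_review
```

Routing on the weakest field rather than the average is a deliberate choice in this sketch: one wrong value is enough to corrupt a downstream record.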

AI that extracts data is not the same as AI that executes work. The difference is everything.

Gabriel De Dominicis

Gabriel is a co-founder of KAPTO and serves as its Managing Director and Head of AI. He leads KAPTO’s vision and technology. A mathematician turned serial entrepreneur with 25+ years in enterprise IT, he focuses on execution-first AI for complex, regulated operations.

Have some questions about the topic?
Send me a message on LinkedIn.