Large language models are brilliant generalists. But they know nothing about your business. Closing that gap is easier than most people think.
A large language model can draft a client report in seconds. The structure is sound. The grammar is clean. And the terminology is wrong in ways that would embarrass you in front of a customer.
This is the core tension with off-the-shelf LLMs. They are extraordinary at generalising. They know language, they know structure, they know how to reason. What they do not know is your business. Your product names. Your internal terminology. The way your team writes to clients versus how it writes internally. The difference between how you talk about pricing in a proposal and how you talk about it in a board deck.
Most people assume that closing this gap is a large, expensive project. Retrain the model. Hire ML engineers. Spend months on data preparation. That assumption is outdated.
A technique called LoRA — Low-Rank Adaptation — changed the economics of fine-tuning almost overnight. Instead of retraining every parameter in a model with hundreds of billions of weights, LoRA freezes the original model and injects a small set of trainable layers. How small? Typically less than one percent of the total parameters. The resulting file is a few megabytes, not tens of gigabytes.
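The idea can be sketched in a few lines of NumPy. This is an illustration of the mechanism, not a training library: the pretrained weight matrix W is frozen, and two small low-rank matrices A and B carry the trainable update. The dimensions, rank, and scaling factor below are hypothetical but typical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 4096, 4096   # one transformer projection layer (illustrative size)
r = 8                      # LoRA rank: far smaller than the layer dimensions
alpha = 16                 # scaling factor applied to the low-rank update

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, initialised small
B = np.zeros((d_out, r))                   # trainable, starts at zero

def forward(x):
    # Base path plus low-rank update. Because B starts at zero, the
    # fine-tuned model initially behaves exactly like the base model.
    return W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained: a tiny fraction of the frozen parameters.
frozen = W.size
trainable = A.size + B.size
print(f"trainable share: {trainable / frozen:.4%}")
```

For this layer the trainable share works out to well under one percent, which is why the resulting adapter file is so small: only A and B need to be saved.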
The results are disproportionate to the change. A model fine-tuned with LoRA on a thousand well-chosen examples from your domain — real reports, real emails, real classifications — can match the quality of a full fine-tune that cost orders of magnitude more. Research consistently shows that a small number of high-quality examples outperforms a large number of generic ones.
This matters because fine-tuning is not about teaching the model new facts. The knowledge is already there, absorbed during pretraining on vast amounts of text. Fine-tuning is about alignment — steering the model to use the right language, follow the right structure, and meet the right standard for your specific context.
Think of it this way. A new hire with strong general skills joins your company. They are smart and capable, but they do not know your processes, your terminology, or your standards. You do not send them back to university. You show them examples of good work. You give them feedback. Within weeks, they are producing output that sounds like it came from someone who has been there for years.
LoRA does the same thing for a language model. A few hundred examples of how your team actually writes — the tone, the structure, the domain-specific language — and the model stops sounding like a generalist and starts sounding like a colleague.
The practical implications are significant. Fine-tuning a model with LoRA takes hours, not months. It runs on a single GPU, not a cluster. The fine-tuned weights are small enough to store dozens of specialised models for the cost of one. And because the base model is unchanged, you can update it independently when a better foundation model is released.
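A back-of-envelope calculation shows why storing dozens of adapters is cheap. The figures below are assumptions chosen for illustration: a 7B-parameter base model in 16-bit floats, with LoRA applied at rank 8 to two 4096-by-4096 projection matrices in each of 32 layers.

```python
# Illustrative storage comparison: full base model vs one LoRA adapter.
BYTES_PER_PARAM = 2          # fp16
base_params = 7_000_000_000  # hypothetical 7B base model

# Hypothetical adapter: rank-8 LoRA on 2 projection matrices per layer,
# 32 layers. Each adapter pair A, B holds r * (d_in + d_out) parameters.
d, r, layers, mats = 4096, 8, 32, 2
adapter_params = layers * mats * r * (d + d)

base_gb = base_params * BYTES_PER_PARAM / 1e9
adapter_mb = adapter_params * BYTES_PER_PARAM / 1e6

print(f"base model:  {base_gb:.0f} GB")   # ~14 GB
print(f"one adapter: {adapter_mb:.1f} MB")
print(f"adapters per base-model footprint: {base_gb * 1000 / adapter_mb:.0f}")
```

Under these assumptions a single adapter is megabytes, so hundreds of specialised variants fit in the storage footprint of one base model, and each can be swapped in without touching the frozen weights.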
Where does it fall short? Tasks that require genuinely new reasoning — complex mathematics, novel logic — benefit less from LoRA. But for the business tasks where most companies feel the gap — drafting, classification, extraction, summarisation, anything where tone and domain language matter — it is the right tool.
The question is no longer whether fine-tuning is worth it. The barrier has dropped far enough that the real question is whether you have a hundred good examples of the output you want. If you do, the last mile is shorter than you think.