LLMs in Enterprise Automation

By Lina Petrauskas · 2026-01-14 · 10 min read

Hardware is a moving target. The Jetson Orin we benchmarked in January was outperformed by an off-the-shelf mini-PC by August. We re-run the benchmark matrix every quarter and have stopped making long-term hardware commitments.

Results

Tool definitions should read like API documentation written for a careful junior engineer. The model behaves better when each parameter has a concrete example, a unit, and an explicit statement of what happens when the value is omitted.
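As a rough illustration of that style, here is a minimal sketch of a tool definition in which every parameter description carries an example, a unit, and the behaviour when the value is omitted. The tool name `search_invoices`, its parameters, and the `Example:` / `Unit:` / `If omitted:` markers are all hypothetical conventions invented for this sketch, not taken from any particular framework. The small checker shows one way such a convention could be enforced in review.

```python
# Hypothetical tool definition: the tool, its fields, and the marker
# convention are illustrative assumptions, not a real API.
SEARCH_INVOICES_TOOL = {
    "name": "search_invoices",
    "description": "Search invoices for one customer, optionally filtered by amount.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": (
                    "Customer identifier. Example: 'CUST-00172'. "
                    "Unit: none (opaque ID). "
                    "If omitted: the call is rejected, because this field is required."
                ),
            },
            "min_amount": {
                "type": "number",
                "description": (
                    "Minimum invoice total. Example: 250.0. "
                    "Unit: euros. "
                    "If omitted: no lower bound is applied."
                ),
            },
            "max_results": {
                "type": "integer",
                "description": (
                    "Maximum number of invoices to return. Example: 20. "
                    "Unit: invoices. "
                    "If omitted: defaults to 10."
                ),
            },
        },
        "required": ["customer_id"],
    },
}

# Markers every parameter description is expected to contain (assumed convention).
REQUIRED_MARKERS = ("Example:", "Unit:", "If omitted:")


def missing_markers(tool: dict) -> dict[str, list[str]]:
    """Return, per parameter, the documentation markers its description lacks."""
    problems: dict[str, list[str]] = {}
    for name, spec in tool["parameters"]["properties"].items():
        description = spec.get("description", "")
        missing = [m for m in REQUIRED_MARKERS if m not in description]
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    # An empty dict means every parameter is fully documented.
    print(missing_markers(SEARCH_INVOICES_TOOL))
```

A check like this could run in CI so that an under-documented parameter fails the build before the model ever sees it.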

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a shared style guide and a lightweight review process.

Background

Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume. Surprises in the monthly invoice have dropped to nearly zero.
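The estimate itself is simple arithmetic. The sketch below shows one way it might be computed for a PR description, assuming pricing is quoted per million tokens with separate input and output rates. The class name, function names, token counts, volume, and prices are placeholder assumptions for illustration, not figures from our system or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEstimate:
    """Expected shape of one LLM call, as estimated by the PR author."""
    input_tokens: int      # prompt tokens per request
    output_tokens: int     # completion tokens per request
    requests_per_day: int  # expected daily volume


def cost_per_request(est: CallEstimate,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Cost of a single request, given prices per million tokens."""
    total = (est.input_tokens * input_price_per_mtok
             + est.output_tokens * output_price_per_mtok)
    return total / 1_000_000


def daily_cost(est: CallEstimate,
               input_price_per_mtok: float,
               output_price_per_mtok: float) -> float:
    """Expected daily spend: per-request cost times daily volume."""
    per_request = cost_per_request(est, input_price_per_mtok, output_price_per_mtok)
    return per_request * est.requests_per_day


if __name__ == "__main__":
    # Placeholder numbers; substitute the real prompt size, volume, and prices.
    estimate = CallEstimate(input_tokens=1200, output_tokens=300, requests_per_day=5000)
    per_request = cost_per_request(estimate, 3.0, 15.0)
    per_day = daily_cost(estimate, 3.0, 15.0)
    print(f"per request: {per_request:.4f}, per day: {per_day:.2f}")
```

Keeping input and output prices separate matters because output tokens are often priced higher than input tokens, so a change that lengthens responses can cost more than one that lengthens prompts.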

The first version of this system was deliberately simple. We wanted a baseline we could measure against, rather than an architecture that anticipated every possible failure mode. That decision paid off: most of the issues we eventually hit were unrelated to the ones we had originally feared.

Evaluation suites grow faster than the codebase they cover. We treat them as first-class artefacts: versioned, reviewed, and regenerated on a schedule. The team that owns the model owns the eval set, not a separate QA group.
