All articles
35 posts across 10 categories.
Edge Deployment of Small Language Models on Industrial Devices
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
A Practical Guide to Evaluating LLM Outputs
Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…
Building a RAG Pipeline That Actually Works in Production
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
Building a Knowledge Base That LLMs Can Actually Search
Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…
Headless Browser Fingerprinting and How We Mitigate It
Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…
From PDF Chaos to Structured Data with Vision Models
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
Streaming Inference at Scale With Kafka and Triton
Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…
Real-Time Anomaly Detection on Manufacturing Lines
The first version of this system was deliberately simple. We wanted a baseline that could be measured against, rather than an architecture that anticipated every possible failure…
How AI Agents Automate Business Processes
In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…
Lessons From Shipping 40 AI Agents in 18 Months
The first version of this system was deliberately simple. We wanted a baseline that could be measured against, rather than an architecture that anticipated every possible failure…
Document Intelligence for Public Sector Procurement
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
Cost Control Strategies for High-Volume LLM Applications
In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…
Continuous Evaluation Pipelines for Generative Models
When the system is wrong, the user should be able to understand why in under thirty seconds. Citation links, confidence scores, and the exact retrieved passages are surfaced in…
Building Internal Copilots Without Leaking Confidential Data
Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…
Synthetic Data Generation for Document Understanding
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
n8n vs Airflow vs Temporal for AI Workflows
Tool definitions should read like API documentation written for a careful junior engineer. The model behaves better when each parameter has a concrete example, a unit, and an…
Securing LLM Endpoints Against Prompt Injection
In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…
Computer Vision for Transport Monitoring
Hardware is a moving target. The Jetson Orin we benchmarked in January was outperformed by an off-the-shelf mini-PC by August. We re-run the benchmark matrix every quarter and…
Migrating From OpenAI to Self-Hosted Llama Models
Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…
Building a Compliance-Ready Audit Trail for AI Decisions
Hardware is a moving target. The Jetson Orin we benchmarked in January was outperformed by an off-the-shelf mini-PC by August. We re-run the benchmark matrix every quarter and…
Real-World Latency Numbers for GPT-5, Claude, and Gemini
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
Cold-Start Recommendation Systems for E-Commerce
Evaluation suites grow faster than the codebase they cover. We treat them as first-class artefacts: versioned, reviewed, and regenerated on a schedule. The team that owns the…
Reducing Hallucinations With Citation-First Retrieval
When the system is wrong, the user should be able to understand why in under thirty seconds. Citation links, confidence scores, and the exact retrieved passages are surfaced in…
Designing Tool-Using Agents for Customer Support
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
Why We Stopped Using LangChain (and What Replaced It)
In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…
Scraping at Scale Without Getting Blocked
Tool definitions should read like API documentation written for a careful junior engineer. The model behaves better when each parameter has a concrete example, a unit, and an…
How We Cut OCR Error Rates by 60% With Layout-Aware Models
Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…
Why Most AI Agent Demos Break in Real Workflows
Retrieval quality is the lever that moves the most weight. No amount of prompt engineering compensates for a retriever that consistently surfaces the wrong passages. We spent two…
Multi-Agent Orchestration Patterns We Use in Production
Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…
Fine-Tuning vs Prompting: Choosing the Right Approach
Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…
Forecasting Demand With Hybrid Statistical and ML Models
Tool definitions should read like API documentation written for a careful junior engineer. The model behaves better when each parameter has a concrete example, a unit, and an…
Vector Databases Compared: pgvector, Qdrant, Weaviate, Milvus
Retrieval quality is the lever that moves the most weight. No amount of prompt engineering compensates for a retriever that consistently surfaces the wrong passages. We spent two…
LLMs in Enterprise Automation
In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…
Voice Agents That Actually Understand Lithuanian
Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…
Image Classification on a Raspberry Pi 5 With ONNX Runtime
Retrieval quality is the lever that moves the most weight. No amount of prompt engineering compensates for a retriever that consistently surfaces the wrong passages. We spent two…