All articles

35 posts across 10 categories.

MLOps · 2026-05-23 · 5 min

Edge Deployment of Small Language Models on Industrial Devices

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

MLOps · 2026-05-15 · 9 min

A Practical Guide to Evaluating LLM Outputs

Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…

LLMs · 2026-05-14 · 4 min

Building a RAG Pipeline That Actually Works in Production

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

Automation · 2026-05-13 · 10 min

Building a Knowledge Base That LLMs Can Actually Search

Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…

Automation · 2026-05-06 · 8 min

Headless Browser Fingerprinting and How We Mitigate It

Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…

AI Agents · 2026-05-02 · 13 min

From PDF Chaos to Structured Data with Vision Models

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

Automation · 2026-04-22 · 12 min

Streaming Inference at Scale With Kafka and Triton

Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…

Edge AI · 2026-04-22 · 9 min

Real-Time Anomaly Detection on Manufacturing Lines

The first version of this system was deliberately simple. We wanted a baseline that could be measured against, rather than an architecture that anticipated every possible failure…

MLOps · 2026-04-21 · 4 min

How AI Agents Automate Business Processes

In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…

RAG · 2026-04-21 · 10 min

Lessons From Shipping 40 AI Agents in 18 Months

The first version of this system was deliberately simple. We wanted a baseline that could be measured against, rather than an architecture that anticipated every possible failure…

Automation · 2026-04-18 · 9 min

Document Intelligence for Public Sector Procurement

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

MLOps · 2026-04-14 · 4 min

Cost Control Strategies for High-Volume LLM Applications

In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…

Observability · 2026-04-04 · 5 min

Continuous Evaluation Pipelines for Generative Models

When the system is wrong, the user should be able to understand why in under thirty seconds. Citation links, confidence scores, and the exact retrieved passages are surfaced in…

Forecasting · 2026-03-24 · 11 min

Building Internal Copilots Without Leaking Confidential Data

Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…

Forecasting · 2026-03-23 · 9 min

Synthetic Data Generation for Document Understanding

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

Data Pipelines · 2026-03-22 · 10 min

n8n vs Airflow vs Temporal for AI Workflows

Tool definitions should read like API documentation written for a careful junior engineer. The model behaves better when each parameter has a concrete example, a unit, and an…

Forecasting · 2026-03-20 · 4 min

Securing LLM Endpoints Against Prompt Injection

In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…

Forecasting · 2026-03-19 · 4 min

Computer Vision for Transport Monitoring

Hardware is a moving target. The Jetson Orin we benchmarked in January was outperformed by an off-the-shelf mini-PC by August. We re-run the benchmark matrix every quarter and…

Automation · 2026-03-05 · 7 min

Migrating From OpenAI to Self-Hosted Llama Models

Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…

Forecasting · 2026-02-27 · 10 min

Building a Compliance-Ready Audit Trail for AI Decisions

Hardware is a moving target. The Jetson Orin we benchmarked in January was outperformed by an off-the-shelf mini-PC by August. We re-run the benchmark matrix every quarter and…

Observability · 2026-02-23 · 12 min

Real-World Latency Numbers for GPT-5, Claude, and Gemini

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

LLMs · 2026-02-21 · 8 min

Cold-Start Recommendation Systems for E-Commerce

Evaluation suites grow faster than the codebase they cover. We treat them as first-class artefacts: versioned, reviewed, and regenerated on a schedule. The team that owns the…

Edge AI · 2026-02-20 · 5 min

Reducing Hallucinations With Citation-First Retrieval

When the system is wrong, the user should be able to understand why in under thirty seconds. Citation links, confidence scores, and the exact retrieved passages are surfaced in…

Data Pipelines · 2026-02-18 · 5 min

Designing Tool-Using Agents for Customer Support

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

LLMs · 2026-02-15 · 6 min

Why We Stopped Using LangChain (and What Replaced It)

In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…

Computer Vision · 2026-02-05 · 7 min

Scraping at Scale Without Getting Blocked

Tool definitions should read like API documentation written for a careful junior engineer. The model behaves better when each parameter has a concrete example, a unit, and an…

LLMs · 2026-02-03 · 13 min

How We Cut OCR Error Rates by 60% With Layout-Aware Models

Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…

RAG · 2026-02-02 · 12 min

Why Most AI Agent Demos Break in Real Workflows

Retrieval quality is the lever that moves the most weight. No amount of prompt engineering compensates for a retriever that consistently surfaces the wrong passages. We spent two…

Edge AI · 2026-01-27 · 5 min

Multi-Agent Orchestration Patterns We Use in Production

Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume.…

Edge AI · 2026-01-24 · 8 min

Fine-Tuning vs Prompting: Choosing the Right Approach

Documentation written by the team that builds the system tends to be more useful than documentation written by anyone else. The trade-off is consistency, which we address with a…

Data Pipelines · 2026-01-19 · 13 min

Forecasting Demand With Hybrid Statistical and ML Models

Tool definitions should read like API documentation written for a careful junior engineer. The model behaves better when each parameter has a concrete example, a unit, and an…

Observability · 2026-01-17 · 10 min

Vector Databases Compared: pgvector, Qdrant, Weaviate, Milvus

Retrieval quality is the lever that moves the most weight. No amount of prompt engineering compensates for a retriever that consistently surfaces the wrong passages. We spent two…

LLMs · 2026-01-14 · 10 min

LLMs in Enterprise Automation

In production, latency distributions matter far more than averages. A pipeline whose mean response time looks acceptable can still feel sluggish if the 95th percentile drifts…

RAG · 2026-01-06 · 11 min

Voice Agents That Actually Understand Lithuanian

Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure…

MLOps · 2026-01-01 · 13 min

Image Classification on a Raspberry Pi 5 With ONNX Runtime

Retrieval quality is the lever that moves the most weight. No amount of prompt engineering compensates for a retriever that consistently surfaces the wrong passages. We spent two…