Cold-Start Recommendation Systems for E-Commerce

By Ieva Ramanauskaite, 2026-02-21, 8 min read

Cost modelling is now part of our pre-merge checklist. Every PR that touches an LLM call includes an estimate of the per-request token spend and the expected daily volume. Surprises in the monthly invoice have dropped to nearly zero.
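As a rough illustration of what such a per-PR estimate might contain, here is a minimal sketch. The `CallEstimate` class, its field names, and the prices are all hypothetical placeholders; the checklist described above does not prescribe any particular format, and real per-token prices depend on the provider and model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEstimate:
    """Hypothetical cost estimate for one LLM call site touched by a PR."""

    input_tokens: int           # expected prompt tokens per request
    output_tokens: int          # expected completion tokens per request
    requests_per_day: int       # expected daily volume
    # Prices in USD per million tokens. Placeholders, not real vendor rates.
    input_price_per_mtok: float
    output_price_per_mtok: float

    def cost_per_request(self) -> float:
        total = (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        )
        return total / 1_000_000

    def cost_per_day(self) -> float:
        return self.cost_per_request() * self.requests_per_day

    def cost_per_month(self, days: int = 30) -> float:
        return self.cost_per_day() * days


# Illustrative numbers only.
estimate = CallEstimate(
    input_tokens=2_000,
    output_tokens=500,
    requests_per_day=10_000,
    input_price_per_mtok=1.0,
    output_price_per_mtok=4.0,
)
print(f"per request: ${estimate.cost_per_request():.4f}")
print(f"per day:     ${estimate.cost_per_day():.2f}")
print(f"per month:   ${estimate.cost_per_month():.2f}")
```

Keeping the three inputs (tokens in, tokens out, daily volume) explicit makes it easy for a reviewer to challenge whichever assumption looks off.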

Trade-offs

Retrieval quality is the biggest lever we have. No amount of prompt engineering compensates for a retriever that consistently surfaces the wrong passages. We spent two weeks tuning chunking and reranking before touching the prompt template.
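To make "tuning chunking" concrete, here is a sketch of one common approach: fixed-size word windows with overlap. This is not necessarily the scheme we settled on; the function name `chunk_words` and the default `size` and `overlap` values are assumptions chosen for illustration, and they are exactly the kind of knobs that get tuned.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to `size` words, each sharing
    `overlap` words with the previous chunk."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(words):
            break
    return chunks


sample = "a b c d e f g h i j"
for chunk in chunk_words(sample, size=4, overlap=2):
    print(chunk)
```

Overlap trades index size for recall: a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some words twice.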

Observability for agent runs is qualitatively different from traditional APM. A single user request can spawn dozens of tool calls, each with its own latency, cost, and failure mode. Flat traces become unreadable; we render them as collapsible trees.
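The tree idea can be sketched in a few lines. The `Span` class and `render` function below are hypothetical and exist only to illustrate the shape of the data; they are not our tracing code, and a real system would more likely build on an existing tracing library. Each node carries its own latency, cost, and error, and subtrees below a chosen depth are collapsed into a single line.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One step in an agent run (hypothetical structure)."""

    name: str
    latency_ms: float
    cost_usd: float = 0.0
    error: Optional[str] = None
    children: list["Span"] = field(default_factory=list)

    def total_cost(self) -> float:
        # Cost of this span plus everything beneath it.
        return self.cost_usd + sum(c.total_cost() for c in self.children)

    def descendant_count(self) -> int:
        return sum(1 + c.descendant_count() for c in self.children)


def render(span: Span, depth: int = 0, max_depth: Optional[int] = None) -> list[str]:
    """Return one text line per visible span; collapse below max_depth."""
    collapsed = (
        max_depth is not None and depth >= max_depth and bool(span.children)
    )
    marker = "+" if collapsed else "-"
    status = "FAIL" if span.error else "ok"
    line = (
        f"{'  ' * depth}{marker} {span.name} [{status}] "
        f"{span.latency_ms:.0f} ms ${span.total_cost():.4f}"
    )
    if collapsed:
        return [line + f" ({span.descendant_count()} hidden)"]
    lines = [line]
    for child in span.children:
        lines.extend(render(child, depth + 1, max_depth))
    return lines


root = Span("request", 1200, children=[
    Span("retrieve", 300),
    Span("llm_call", 800, cost_usd=0.004, children=[
        Span("tool:search", 200, cost_usd=0.001, error="timeout"),
    ]),
])
print("\n".join(render(root, max_depth=1)))
```

Rolling cost up to the parent line means a collapsed subtree still shows what it cost, so expensive branches stand out before anyone expands them.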

What we changed

When the system is wrong, the user should be able to understand why in under thirty seconds. Citation links, confidence scores, and the exact retrieved passages are surfaced in the UI for every generated answer.
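One way to make sure the UI always has what it needs is to make the evidence part of the answer type itself. The sketch below is an assumption about how such a payload could look, not our actual schema; `AnswerPayload`, `Citation`, their field names, and the example URL are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Citation:
    source_url: str   # link shown to the user
    passage: str      # exact retrieved text the answer relied on


@dataclass(frozen=True)
class AnswerPayload:
    """Hypothetical response shape: an answer never travels without evidence."""

    answer: str
    confidence: float                 # assumed to be normalised to [0, 1]
    citations: tuple[Citation, ...]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)


payload = AnswerPayload(
    answer="Returns are accepted within 30 days.",
    confidence=0.82,
    citations=(
        Citation(
            source_url="https://example.com/returns",  # placeholder URL
            passage="Items may be returned within 30 days of delivery.",
        ),
    ),
)
print(payload.to_json())
```

Because the citations and confidence are required fields, a code path that forgets to attach them fails at construction time rather than silently rendering an unexplained answer.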

Evaluation suites grow faster than the codebase they cover. We treat them as first-class artefacts: versioned, reviewed, and regenerated on a schedule. The team that owns the model owns the eval set, not a separate QA group.

Hardware is a moving target. The Jetson Orin we benchmarked in January was outperformed by an off-the-shelf mini-PC by August. We re-run the benchmark matrix every quarter and have stopped making long-term hardware commitments.
