Writing

Notes from the team.

How to read AI conversations at scale, why behavioural groups matter, and what to do with the five that actually explain the rest.

Featured·May 4, 2026·22 min read

LLM Agent Observability and Product Analytics in 2026

Every LLM agent observability, evals, and product analytics tool compared in 2026: LangSmith, Langfuse, Braintrust, Galileo, Arize Phoenix, Datadog, Amplitude, PostHog, Pendo, and more. Plus the user value layer no existing tool fills.

May 4, 2026·10 min read

How do you measure success of an AI customer support agent?

Deflection rate measures cost savings, not user value. Here are the five metrics that tell you whether your AI support agent is actually helping customers.

May 2, 2026·8 min read

What is shadow rework in AI products?

Shadow rework is when a user accepts an AI agent's output then redoes the work elsewhere. Invisible to evals, traces, and dashboards. Here is how to detect it.

May 2, 2026·9 min read

How do you measure trust in AI agent outputs?

Trust in AI agents shows up in what users do after the output, not in what they say. Here are four signals that measure it and one metric that rolls them up.

May 1, 2026·9 min read

What Locus is, and why we built it.

Locus is product analytics for AI agents. It reads every conversation your agent has had, groups users by what they actually do, and shows the picture in one page.

May 1, 2026·8 min read

Observability vs evals vs product analytics.

AI agent observability, evals, and product analytics answer different questions. Here is what each layer measures, where it stops, and which one tells you if users got value.

May 1, 2026·8 min read

Why AI agents pass evals but still fail users.

AI agents can score 95% on evals and still lose user trust in production. Here is why eval suites miss silent failure, shadow rework, and intent drift, and what to measure instead.

April 22, 2026·5 min read

The data is already there. Your team just can't read it.

Every AI agent logs every conversation. The volume is unreadable by hand and no traditional analytics tool can parse free text. Here's what that costs product teams and how to fix it with product analytics for AI agents.

April 18, 2026·4 min read

Plan tier is not a behaviour.

The user groups that matter for product decisions in AI agents are behavioural, not demographic. Here is why plan-tier segmentation misses the point and what AI product managers should use instead.

April 14, 2026·6 min read

Why the sample of twenty fails.

A human can read about twenty AI conversations a week. Most product decisions on AI agents are made on that sample. Here is why the sample is biased, why it misses agent value drift and silent failure, and what an AI product team can do about it.
