
WebSocket reconnection in AI agents: transport recovery vs. session recovery

Your AI agent is mid-task, waiting on the result of a search tool call it made 30 seconds ago. The user is watching a spinner. Then a network blip drops the connection. The application reconnects in under a second, fast enough that most monitoring wouldn't flag it. But the tool call result that came back during the gap is gone, and so are the 200 tokens the agent generated before the silence began. The reconnect succeeded, but the session didn't.

Durable Execution meets Durable Sessions: Resilient AI Agents with Temporal and Ably

Most teams building agents with Temporal have solved the backend problem: crashed workflows restart, LLM call failures retry automatically, and long-running tasks complete reliably. What they haven't solved is the client side: what happens to the stream when the user's connection drops, when they switch devices, or when two sub-agents are working concurrently and the client needs a single coherent view.

Temporal made execution durable. Ably makes sessions durable.

When Temporal launched, a lot of people had the same reaction: "We have queues and retries. We don't need this." (Temporal's own blog addressed this directly.) That reaction made sense. Queues solve queue problems and they do it well. What Temporal gave you was something different: a named execution context that survives a server restart and picks up from its last checkpoint. Not a better queue. A different abstraction entirely. If you built with it, you couldn't imagine going back.

AI agent streaming in action: barge-in, human handover, and session continuity

You're mid-conversation with an AI support agent. You've explained the problem, the agent is halfway through a response, and the connection drops. When you reconnect, the response is gone. You type the same question again. The agent asks the same clarifying questions again. Three minutes of context, gone. Not because the model forgot it, but because the delivery layer stored nothing.

Durable Sessions: Why your AI UX keeps breaking and how to fix it

AI products today are being let down not by the models but by the delivery layer between the agent and the user. In this session, Fiona Corden, Technical Product Manager at Ably, breaks down why AI UX is eroding consumer trust, how to spot the delivery-layer problems hiding in your product data, and what the companies getting it right are doing differently. You'll come away knowing how to diagnose whether your AI product has a session layer problem, what durable sessions are, and why they're becoming the standard solution for resilient AI UX at scale.

Are WebSockets enough for AI chat?

WebSockets are the right protocol for production AI chat. But that fact doesn’t prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second, the agent keeps running server-side, and the client receives nothing from the gap. No tokens, no tool call results, no context. The reconnected socket has no view of what happened while it was down.
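One way to picture the difference between reconnecting the socket and recovering the session is a server-side log of numbered events that the client can resume from. The sketch below is a minimal in-memory illustration under stated assumptions, not Ably's implementation or any real SDK's API: `SessionLog`, `SessionEvent`, `Client`, and `resumeFrom` are all hypothetical names invented for this example.

```typescript
// Hypothetical sketch: every name here is illustrative, not a real SDK API.
type SessionEvent = {
  serial: number; // monotonically increasing position in the session
  kind: "token" | "tool_result";
  data: string;
};

// Server side: the session keeps every event, whether or not a client is connected.
class SessionLog {
  private events: SessionEvent[] = [];
  private nextSerial = 1;

  append(kind: SessionEvent["kind"], data: string): SessionEvent {
    const event: SessionEvent = { serial: this.nextSerial++, kind, data };
    this.events.push(event);
    return event;
  }

  // Return everything the client has not yet seen.
  resumeFrom(lastSeenSerial: number): SessionEvent[] {
    return this.events.filter((e) => e.serial > lastSeenSerial);
  }
}

// Client side: remember the last serial applied so a reconnect can ask for the gap.
class Client {
  lastSeenSerial = 0;
  received: string[] = [];

  receive(event: SessionEvent): void {
    if (event.serial <= this.lastSeenSerial) return; // ignore duplicates on replay
    this.received.push(event.data);
    this.lastSeenSerial = event.serial;
  }
}

const log = new SessionLog();
const client = new Client();

// Connected: the first token is delivered live.
client.receive(log.append("token", "Searching"));

// Connection drops: the agent keeps running and the log keeps growing.
log.append("tool_result", "3 results");
log.append("token", " done");

// Reconnected: the client replays the gap from its last seen serial.
for (const event of log.resumeFrom(client.lastSeenSerial)) {
  client.receive(event);
}
```

In this sketch the transport reconnect and the replay are separate steps. A bare reconnected socket only performs the first; the resume call is what fills in the tokens and tool results produced during the gap.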

We built a Custom Transport for Vercel's AI SDK

Ably is a realtime messaging platform: a pub/sub product where you publish messages to channels, and clients subscribed to those channels receive them in realtime. It turns out that the Ably realtime platform is well suited to being the transport that sits between your AI models and the clients receiving the generated responses.

Conversation tree branching in @ably/ai-transport

Picture a developer pair-programming with an AI assistant. The model returns a function that almost works. The developer asks it to try again. The second attempt is worse. They want the first one back. In a linear chat, that history is gone, or it's a third bubble in the thread that pollutes context for every future turn.
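The branching idea can be sketched as a tree in which each turn points at its parent, so a retry becomes a sibling rather than another bubble in a linear thread. This is a minimal illustration only, and it does not show the actual `@ably/ai-transport` API: `ConversationTree`, `TurnNode`, `add`, and `pathTo` are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch: these types and methods are illustrative, not a real library API.
type TurnNode = {
  id: string;
  parentId: string | null; // null marks the root turn
  role: "user" | "assistant";
  text: string;
};

class ConversationTree {
  private nodes = new Map<string, TurnNode>();
  private counter = 0;

  // Add a turn under a parent; adding twice under the same parent creates a branch.
  add(parentId: string | null, role: TurnNode["role"], text: string): string {
    if (parentId !== null && !this.nodes.has(parentId)) {
      throw new Error(`unknown parent: ${parentId}`);
    }
    const id = `n${++this.counter}`;
    this.nodes.set(id, { id, parentId, role, text });
    return id;
  }

  // Walk from a leaf back to the root: this is the context for the next model call.
  pathTo(leafId: string): TurnNode[] {
    const path: TurnNode[] = [];
    let current = this.nodes.get(leafId);
    while (current) {
      path.unshift(current);
      current = current.parentId === null ? undefined : this.nodes.get(current.parentId);
    }
    return path;
  }
}

const tree = new ConversationTree();
const ask = tree.add(null, "user", "Write a slugify function");
const first = tree.add(ask, "assistant", "attempt 1");
const second = tree.add(ask, "assistant", "attempt 2"); // retry: a sibling of attempt 1

// The developer goes back to the first attempt and continues from there.
const followUp = tree.add(first, "user", "Handle unicode too");
const context = tree.pathTo(followUp).map((n) => n.text);
const rejected = tree.pathTo(second).map((n) => n.text);
```

Here `context` contains only the turns on the chosen branch, so the rejected second attempt stays in the tree and can still be revisited, but it does not pollute the context sent to the model on later turns.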

The model is fine. The session is broken.

Take any AI agent demo from the last six months. It works. Now ship it to real users on real networks, real devices, real attention spans. A meaningful share of those users will never finish their first conversation cleanly. Not because the model gave a bad answer. Because the connection dropped, the tab refreshed, the phone took over from the laptop, or the spinner kept spinning forever.