WebSocket reconnection in AI agents: transport recovery vs. session recovery

Your AI agent is mid-task, waiting on the result of a search tool call it made 30 seconds ago. The user is watching a spinner. Then a network blip drops the connection. The application reconnects in under a second, fast enough that most monitoring wouldn't flag it. But the tool call result that came back during the gap is gone, and so are the 200 tokens the agent generated before the silence began. The reconnect succeeded, but the session didn't.

Temporal made execution durable. Ably makes sessions durable.

When Temporal launched, a lot of people had the same reaction: "We have queues and retries. We don't need this." (Temporal's own blog addressed this directly.) That reaction made sense. Queues solve queue problems and they do it well. What Temporal gave you was something different: a named execution context that survives a server restart and picks up from its last checkpoint. Not a better queue. A different abstraction entirely. If you built with it, you couldn't imagine going back.

AI agent streaming in action: barge-in, human handover, and session continuity

You're mid-conversation with an AI support agent. You've explained the problem, the agent is halfway through a response, and the connection drops. When you reconnect, the response is gone. You type the same question again. The agent asks the same clarifying questions again. Three minutes of context, gone. Not because the model forgot it, but because the delivery layer stored nothing.

Are WebSockets enough for AI chat?

WebSockets are the right protocol for production AI chat. But that fact doesn’t prevent the failure most teams hit first. An enterprise load balancer closes the idle connection at 60 seconds during a tool execution wait. Your reconnect logic fires in under a second, the agent keeps running server-side, and the client receives nothing from the gap. No tokens, no tool call results, no context. The reconnected socket has no view of what happened while it was down.

We built a Custom Transport for Vercel's AI SDK

Ably is a realtime messaging platform: a pub/sub product where you publish messages to channels, and clients subscribed to those channels receive them in realtime. It turns out that the Ably realtime platform is well suited to being the transport that sits between your AI models and the clients receiving the generated responses.

Conversation tree branching in @ably/ai-transport

Picture a developer pair-programming with an AI assistant. The model returns a function that almost works. The developer asks it to try again. The second attempt is worse. They want the first one back. In a linear chat, that history is gone, or it's a third bubble in the thread that pollutes context for every future turn.

The model is fine. The session is broken.

Take any AI agent demo from the last six months. It works. Now ship it to real users on real networks, real devices, real attention spans. A meaningful share of those users will never finish their first conversation cleanly. Not because the model gave a bad answer. Because the connection dropped, the tab refreshed, the phone took over from the laptop, or the spinner kept spinning forever.

Why we built a dedicated SDK for realtime AI streaming

If you've built a conversational AI feature, you know the pattern. Client sends a message, backend calls a model, response streams back over HTTP. SSE mostly, or WebSockets if you need bidirectional. For a single user on a single device, it works well. The trouble is the best AI products right now have moved well past that.