
Why we built a dedicated SDK for realtime AI streaming

If you've built a conversational AI feature, you know the pattern. Client sends a message, backend calls a model, response streams back over HTTP. SSE mostly, or WebSockets if you need bidirectional communication. For a single user on a single device, it works well. The trouble is that the best AI products right now have moved well past that.

Why production AI needs a session layer, not just a stream

I spoke at AI Engineer Europe last week, and came away with a clearer picture of where the industry actually is right now. My talk was about why AI user experience breaks at the transport layer. But the bigger takeaway wasn't from my own session. It was from watching what the rest of the room was building, and what problems they were running into.

The Durable Sessions stack is forming

By Matt O'Riordan, CEO and Co-Founder

Across AI infrastructure right now, one word is doing a lot of work: durable. It is attached to execution. To agents. To workflows. To sessions. To streams. To transports. To memory. Every few weeks, another product ships with "durable" in the name. This is not branding noise. The underlying observation is the same in every case. AI systems are long-lived. They can fail at any layer. They need infrastructure that assumes failure rather than hopes against it.

Ably Python SDK v3: realtime for Python, built for AI

Python dominates AI development. It's where teams build their agents, orchestration layers, and the backend systems that turn LLM calls into products people actually use. Over the past year, those systems have matured rapidly. What used to live in notebooks and prototypes is now running in production, serving real users with real expectations around reliability and performance. That maturity brings infrastructure requirements. Tokens need to stream in order.

Multi-device AI session continuity: how cross-device conversation sync works

You start a research task on your laptop, the network drops during a meeting, and when you open your phone to continue, the conversation is gone. You re-prompt, get partially duplicated results, and lose 30 minutes of work. The delivery layer dropped it. That's one of the most consistent problems teams hit when building AI applications. It's particularly acute in customer support, where a session belongs to the conversation, not to any single device, connection, or participant.

Why AI support fails in production: The infrastructure problem behind every incident

HTTP streaming – the default transport underneath every major agent framework – was never designed for sessions that survive a tab close or hand off cleanly between participants. Two failures surface consistently in production CX products because of this. Both generate support tickets about conversation state and prompt quality. Both trace to the transport layer. The scenario that illustrates them: a customer contacts support about an order that's partially shipped and partially stuck.

Stateful agents, stateless infrastructure: the transport gap AI teams are patching by hand

Every major layer of the AI stack now has a name. Model providers - OpenAI, Anthropic, Google - handle inference. Agent frameworks - Vercel AI SDK, LangGraph, CrewAI - handle orchestration. Durable execution platforms like Temporal make backend workflows crash-proof.

Does your AI stack need a session layer? A maturity framework for teams building AI agents

Most teams building AI agents start with HTTP streaming. It's the right starting point. Every major agent framework defaults to it, it gets tokens on screen fast, and for a single-user prompt-response interaction it works well. The question is when it stops being enough - and how to recognise that before it turns into user experience problems, engineering waste, and technical debt that constrains what your product can do.