
Cascading Failures Aren't Inevitable: Lessons from the AWS DNS Outage

AWS outages grab headlines because they affect millions, but the root cause often comes down to something invisible: DNS failures and cascading service dependencies. The complexity of modern cloud systems, combined with the advanced technology powering platforms like AWS, makes these outages particularly challenging to diagnose and resolve.

15 Best DevOps Tools for Software Teams [Free Evaluation Template]

DevOps is a practice, not a tool, but tools are needed to implement it. Breaking down walls of communication and creating visibility and trust across all the teams involved in delivering software and technology is challenging. The right tools make the automation and integrations needed across functional teams seamless, open, and scalable. This article looks at the top CI/CD, automation, orchestration, and other DevOps pipeline tools to give you a detailed list of our Top 15 DevOps tools in 2025.

Part 2: Building a Production-Grade Traffic Capture, Transform and Replay System

When developers try to build realistic mocks and automated tests from production network traffic, the real challenge isn’t just in the capturing; it’s in the data manipulation. Raw traffic is a chaotic sea of patterns, dynamic tokens, environment-specific secrets, and tangled dependencies that seem impossible to untangle by hand. Over my two decades of building these systems, I learned that solving this problem requires more than brute-force parsing or ad hoc scripts.

The Load Testing Start Guide! #speedscale #stresstest #loadtesting #mocking #startup

Are you ready to get serious about load and stress testing, but don't know where to start? This guide highlights the trap most serious engineers fall into: trying to build a custom DIY testing environment. The traditional path means signing your team up for maintaining load drivers, test case frameworks, ephemeral environments, and endless custom mocks, all of which is a massive drain on time and resources. There's a better, cheaper, and faster solution: Traffic Replay.

Stop Debugging Blindly! How Traffic Capture Can Help Your Code #speedscale #trafficcapture #ai

Is AI "slop" or new code pushing tons of bugs into production? You can't test everything forever. Learn how traffic capture is the most efficient way to understand how your code is actually running in the real world. By grabbing data from sidecars, packet captures, or logs, you get the context you need to prevent bugs and improve performance.

Part 1: Building a Production-Grade Traffic Capture and Replay System

A few years ago I was on call during the Super Bowl. At the time I was working for an observability vendor, and one of our customers had an outage caused by a surge in user traffic. But our monitoring system didn’t have enough data to show what went wrong, and I sat on a call for two hours, painfully listening as they spun up more servers and tried to catch up with the user load.

Mitmproxy vs Proxymock: Replaying Traffic for Realistic API Testing

Replaying traffic is a core tool in your toolbox when you need to reproduce a tricky bug or validate how your app behaves. Traffic replay is especially valuable for testing complex software applications that rely on APIs and microservices, where integration and functionality must be thoroughly validated.