
Cascading Failures Aren't Inevitable: Lessons from the AWS DNS Outage

AWS outages grab headlines because they affect millions, but the root cause often comes down to something invisible: DNS failures and cascading service dependencies. The complexity of modern cloud systems, and the deep web of interdependent services inside platforms like AWS, makes these outages particularly challenging to diagnose and resolve.

Part 2: Building a Production-Grade Traffic Capture, Transform and Replay System

When developers try to build realistic mocks and automated tests from production network traffic, the real challenge isn’t just the capturing—it’s the data manipulation. Raw traffic is a chaotic sea of patterns, dynamic tokens, environment-specific secrets, and tangled dependencies that seem impossible to untangle by hand. Over my two decades of building these systems, I learned that solving this problem requires more than brute-force parsing or ad hoc scripts.

Mitmproxy vs Proxymock: Replaying Traffic for Realistic API Testing

Replaying traffic is a core tool in your toolbox when you need to reproduce a tricky bug or validate how your app behaves. It is especially valuable for testing applications built on APIs and microservices, where integration behavior and functionality must be thoroughly validated.

Part 1: Building a Production-Grade Traffic Capture and Replay System

A few years ago I was on call during the Super Bowl. At the time I was working for an observability vendor, and one of our customers had an outage caused by a surge in user traffic. But our monitoring system didn’t have enough data to tell us what went wrong, and I sat on a call for two hours, painfully listening to them spin up more servers and try to catch up with the user load.

Debugging Without a Net: The Pain of Reproducing Production Issues

Every engineer has been there — a late-night page, a broken feature in production, and no clear way to reproduce it. The logs are vague. The metrics look normal. Your local environment works fine. Yet something somewhere is failing for real users. So begins the detective work — debugging a live system with almost no tools, no perfect test data, and no clone of production.

MySQL Mocking with Speedscale's Proxymock: A Complete Guide

Testing database-driven applications is notoriously painful. If your app depends on MySQL, you’ve probably spent hours setting up local databases, running migrations, loading data, and then cleaning everything up just to rerun your tests. This repetitive cycle slows development, breaks pipelines, and introduces inconsistency between local and production environments.
Sponsored Post

Testing AI Code in CI/CD Made Simple for Developers

Generative AI can produce code faster than humans, and developers feel more productive with it integrated into their IDEs. But that productivity is only real if CI/CD tests are solid and automated. Without appropriate testing, you may encounter production issues you’ve never seen before. According to the State of Software Delivery 2025 report, 67% of developers spend more time debugging and resolving security vulnerabilities in AI-generated code. That cancels out the efficiency gains they get from faster AI code generation.

QA Debt: The Silent Risk That Can Take Down Your Business

In engineering, we talk a lot about technical debt — the shortcuts and compromises made in code that pile up over time. But there’s another kind of debt that’s just as dangerous and far more invisible: QA debt. QA debt is what happens when testing isn’t given the same attention as features, architecture, or performance. It’s the accumulation of missed edge cases, outdated test suites, incomplete automation, or skipped regression checks.

A Developer's Guide to Improving AI Code Reliability

You’ve probably been there: your AI coding assistant just generated what looks like a perfect solution to your problem. Decent code quality, reasonable structure, and even some comments. You run it, and… it works. So you ship it. Three weeks later, your production logs are full of 500 errors from edge cases the AI never considered, or worse, you discover the code has been making unvalidated database calls that could have been prevented with basic input sanitization.

The Developer's Guide to Debugging AI-Generated Code

AI coding tools like ChatGPT, GitHub Copilot, and Claude have completely changed how we write software. From humble beginnings, when non-AI code assistants like IntelliSense offered intelligent code suggestions, the latest agentic tools can generate entire functions, suggest optimal algorithms, and even scaffold complete applications in minutes. However, as any developer who’s worked with AI-generated code knows, the output isn’t always perfect.