
Latest Posts

Life of PII for Apache Kafka

Several years ago when I was working on a big data project, I saw something a data engineer shouldn’t see. Curious to understand the level of detail in a new credit score dataset we’d received in our data lake, I queried it. I was surprised at how easily and suddenly my screen was flooded with the mortgage history, overdraft limits and year-end financial statements of my colleagues, and I felt deeply uneasy.

Black Friday deal: $350 free Managed Kafka credits

The Thanksgiving holiday is upon us. For many of our customers, this is one of the most important periods of the year, with more than 189.6 million U.S. shoppers buying up bargains from Thanksgiving Day through Cyber Monday last year. For them and for us, it’s crucial that internal systems can handle high traffic volume without downtime or performance degradation.

SELECT ApacheKafka WITH StreamingSQL FROM RealTimeData

In another life, I taught the Book of Genesis to high school students, including The Tower of Babel excerpt. It struck me as ironic that God’s wrath strikes down the tower, confounds the universal language and scatters humans around the globe to teach King Nimrod a lesson in hubris; meanwhile, the boys in my class were texting their girlfriends across the country and playing video games with friends in Europe and Asia.

New Apache Kafka to AWS S3 Connector

Many in the community have been asking us to develop a new Kafka to S3 connector for some time, so we’re pleased to announce it’s now available. It’s been designed to deliver a number of benefits over existing S3 connectors. Like our other Stream Reactors, the connector extends the standard Connect config by adding a parameter for a SQL command (Lenses Kafka Connect Query Language or “KCQL”). This defines how to map data from the source (in this case Kafka) to the target (S3).

Deploy turn-key DataOps for AWS MSK

Running your own Kafka is starting to feel like wading through oatmeal. We’re not the only ones thinking that. The majority of organizations we speak to have moved, or are in the process of moving, their Kafka to a managed service. If you’re already an AWS shop, Managed Streaming for Apache Kafka (MSK) is a no-brainer. It is the same Kafka that we know and love, and it is integrated with other AWS services such as IAM, CloudWatch, CloudTrail, KMS, VPC and more.

On the importance of load testing Kafka

Socrates preached, “To know thyself is the beginning of wisdom.” This ancient Greek maxim applies to your modern Apache Kafka project: developers, go forth and load test your real-time application to understand the capacity and limitations of your project before deployment. Failure to do so will cost you time and money (e.g. Robinhood’s outage on a historic trading day). Load testing your real-time applications has three main objectives.

Get your GitOps for real-time apps on Apache Kafka & Kubernetes

Infrastructure as code has been an important practice of DevOps for years. If you’re running an Apache Kafka data infrastructure on Kubernetes, chances are you’ve already nailed defining your infrastructure this way, and you’re likely using operators as part of your CI/CD toolchain to automate your deployments.

Introducing the Apache Kafka App Catalog

Working with Apache Kafka and real-time applications comes with challenges. Visibility into the deployed applications and their dependency on what we call the “data fabric” is one of them (for the purposes of this blog, that means Kafka and all its state and configuration). If you’ve built a multi-tenant real-time data platform with Kafka, where teams are deploying applications outside your jurisdiction, this is where the pain is particularly acute. It goes something like this.

The best of Kafka Summit 2020

After a self-isolated and event-free spring, some of us around the world welcomed a more promising summer. You might be taking some time away on a socially distanced holiday. You might be taking some time away from the day-to-day at home. But if a cold beer in the sun isn't enough to make up for these difficult months, the premier event for the Streaming Data Community is back! Kafka Summit has gone virtual this year and that means you can attend the event from anywhere.