Once upon a time (2017), in an office far far away, you may have been cornered in a conversation with someone from Legal about GDPR. It could have gone something like this: “You there, Data Engineer” “Yep, that’s me” “What PII do we have residing in this Apache Kafka database?” You probably mumbled something about Kafka not being a database. “And who can read/ write the data?
A decade ago, all developers could talk about was breaking down the monolith and event-driven architectures. Especially in the financial services industry, to become more nimble and accelerate their application delivery. They leveraged messaging systems to decouple the application, and specifically Apache Kafka has transitioned from being a data integration technology to the leading messaging system for microservices.
2021 is the year data engineers shift from analytics support to governing data flows. As data becomes an organization's most valuable commodity, it is those with the governance to confidently analyze and act on it that will develop the next big applications.
Splunk is a technology that made processing huge volumes and complex datasets accessible to security and IT teams. Despite its strengths for monitoring and investigation, Splunk is a bit of a one-way street. Once it's in Splunk, it's not that easy to stream the data elsewhere in great volume. And it doesn’t mean it’s the best technology for all IT and Security use cases. Or the cheapest.
In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.
If anyone's ever been to AWS ReInvent in Vegas before, you'll know it's a crazy ride. This year we missed out (at least we have cleaner consciences and healthier wallets). But the high quality of content hadn't changed. We've been binging on sessions ‘til the bitter end (it officially ended Friday). So for our community, here is a summary of a few talks related to Apache Kafka.
“We’ve seen two years’ worth of digital transformation in two months” said Microsoft’s Satya Nadella. Due to COVID-19, digital transformation roadmaps have been deleted, redrafted, doubled down and accelerated by up to a decade. Traditional companies are moving by osmosis towards streaming technologies such as Apache Kafka to kick off new digital services. But how much should it cost to experience 2030 in 2021?
At the heart of Kafka is real-time data. With data at the center of any Kafka environment, it should be the area that gets the most attention, but typically it gets the least. This happens because we see most organizations split their Kafka efforts into three areas: infrastructure, monitoring, and data operations.
Are you running your organization's Apache Kafka on-premise? If you are and you’re still reading this article, it’s more than likely that Kafka is or will be a keystone of your data infrastructure. But it’s also likely your teams are tired of the cost and complexity required to scale it, meaning your honeymoon with Kafka is coming to an end. So what does the imminent migration mean?