A slow car has never won a Formula One race. The Olympics doesn’t reward slow times in swimming, track or any other clock-timed sport. Likewise, slow data speeds don’t win over customers or colleagues in the real-time business world. Microsoft’s own research once reported that a person visiting a website on a connected device is likely to wait no more than 10 seconds to see it before moving to a competitor’s site.
Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With 100s of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture.
There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we took a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made along with how they impact reliability.
Leveraging the Internet of Things (IoT) allows you to improve processes and take your business in new directions. But it requires you to live on the edge. That’s where you find the ability to empower IoT devices to respond to events in real time by capturing and analyzing the relevant data.
There are two big gaps in the Apache Kafka project when we think of operating a cluster. The first is monitoring the cluster efficiently and the second is managing failures and changes in the cluster. There are no solutions for these inside the Kafka project but there are many good 3rd party tools for both problems. Cruise Control is one of the earliest open source tools to provide a solution for the failure management problem but lately for the monitoring problem as well.
Shared Data Experience (SDX) on Cloudera Data Platform (CDP) enables centralized data access control and audit for workloads in the Enterprise Data Cloud. The public cloud (CDP-PC) editions default to using cloud storage (S3 for AWS, ADLS-gen2 for Azure). This introduces new challenges around managing data access across teams and individual users. To solve these challenges for S3 and ADLS-gen2, Cloudera has introduced a new service — the Ranger Authorization Service (RAZ).
Across the federal government, agencies are struggling to identify, organize, analyze, and act on troves of data. It’s a problem that leaders are working actively to tackle, but they’re in a race against immeasurable volumes of data that is continuously being generated in perpetuity in stores known and unknown. At the Internal Revenue Service, decades’ worth of data exceeds even the most cutting-edge processing capabilities.