Systems | Development | Analytics | API | Testing

Automating Data Pipelines in CDP with CDE Managed Airflow Service

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation along with a first class job management API many of our customers have been able to quickly deploy, monitor and manage the life cycle of their spark jobs with ease. In addition, we allowed users to automate their jobs based on a time-based schedule.

Announcing the GA of Cloudera DataFlow for the Public Cloud

Are you ready to turbo-charge your data flows on the cloud for maximum speed and efficiency? We are excited to announce the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC) – a brand new experience on the Cloudera Data Platform (CDP) to address some of the key operational and monitoring challenges of standard Apache NiFi clusters that are overloaded with high-performant flows.

Cloudera DataFlow for the Public Cloud: A technical deep dive

We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform with a streamlined way to deploy NiFi data flows and a central monitoring dashboard making it easier than ever before to operate NiFi data flows at scale in the public cloud.

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Every enterprise is trying to collect and analyze data to get better insights into their business. Whether it is consuming log files, sensor metrics, and other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis. This whole architecture made a lot of sense when there was a consistent and predictable flow of data to process.

The Foundations of a Modern Data-Driven Organisation: Change from Within (part 2 of 2)

In my previous blog post, I shared examples of how data provides the foundation for a modern organization to understand and exceed customers’ expectations. However, the important role data occupies extends beyond customer experience and revenue, as it becomes increasingly central in optimizing internal processes for the long-term growth of an organization.

Five Reasons Why Platforms Beat Point Solutions in Every Business Case

Once upon an IT time, everything was a “point product,” a specific application designed to do a single job inside a desktop PC, server, storage array, network, or mobile device. Point solutions are still used every day in many enterprise systems, but as IT continues to evolve, the platform approach beats point solutions in almost every use case. A few years ago, there were several choices of data deduplication apps for storage, and now, it’s a standard function in every system.

What's New in CDP Private Cloud Base 7.1.7?

With the release of CDP Private Cloud (PvC) Base 7.1.7, you can look forward to new features, enhanced security, and better platform performance to help your business drive faster insights and value. We understand that migrating your data platform to the latest version can be an intricate task, and at Cloudera we’ve worked hard to simplify this process for all our customers.

Generating and Viewing Lineage through Apache Ozone

As businesses look to scale-out storage, they need a storage layer that is performant, reliable and scalable. With Apache Ozone on the Cloudera Data Platform (CDP), they can implement a scale-out model and build out their next generation storage architecture without sacrificing security, governance and lineage. CDP integrates its existing Shared Data Experience (SDX) with Ozone for an easy transition, so you can begin utilizing object storage on-prem.

The Foundations of a Modern Data-Driven Organisation: Gaining a Clear View of the Customer

Today’s organizations face rising customer expectations in a fragmented marketplace amidst stiff competition. This landscape is one that presents opportunities for a modern data-driven organization to thrive. At the nucleus of such an organization is the practice of accelerating time to insights, using data to make better business decisions at all levels and roles.

Choosing Your Upgrade or Migration Path to Cloudera Data Platform

In our previous blog, we talked about the four paths to Cloudera Data Platform. If you haven’t read that yet, we invite you to take a moment and run through the scenarios in that blog. The four strategies will be relevant throughout the rest of this discussion. Today, we’ll discuss an example of how you might make this decision for a cluster using a “round of elimination” process based on our decision workflow.