
September 2021

Migrate to CDP Private Cloud Base - A Step by Step Guide

Our recent blog discussed the four paths to get from legacy platforms to CDP Private Cloud Base. In this blog and accompanying video, we will dive deep into the mechanics of running an in-place upgrade from CDH5 or CDH6 to CDP Private Cloud Base. The overall upgrade follows a seven-step process illustrated below. In the video below we walk through a complete end-to-end upgrade of CDH to CDP Private Cloud Base.

Future of Data Meetup: CDP on Azure - Industrial Strength Data Engineering

Data Engineering is undergoing a huge evolution requiring faster and more reliable data pipelines. Apache Spark and Python are core foundational components of this new architecture enabling data engineers to quickly develop these pipelines. They also introduce challenges when moving to production. Come join us to ask questions and learn. We will also have a raffle of Cloudera swag.

Serving the Public Through Data

Digital transformation has been talked about for many years, but the pandemic has accelerated the digital transformation journeys for many enterprises. Forced to adapt to changes in the business landscape and customer behavior, businesses have adopted more digital tools and technologies to drive innovation and increase resilience.

Closing the Gap Between the Digital Haves and Have-Nots

The digital race is on. To pull ahead of the pack, a company needs to know what to do with its data. Without a data-driven strategy, you’re bound to lose ground to competitors who apply their data to operational improvements, product development, go-to-market strategies, and the customer experience. It isn’t enough to collect, interpret, and act on the data. You have to do it fast.

Group vs Fine-Grained Access Control in Cloudera Data Platform Public Cloud

Cloudera Data Platform (CDP) provides a Shared Data Experience (SDX) for centralized data access control and audit in the Enterprise Data Cloud. The Ranger Authorization Service (RAZ) is a new service added to help provide fine-grained access control (FGAC) for cloud storage. We covered the value this new capability provides in a previous blog.

Telecom Network Analytics: Transformation, Innovation, Automation

One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. Where does it stand today? What are its current challenges and opportunities? In a sense, there have been three phases of network analytics: the first was an appliance-based monitoring phase; the second was an open-source expansion phase; and the third, the one we are in right now, is a hybrid-data-cloud and governance phase. Let’s examine how we got here.

What's New in CDP Public Cloud? Hive and Impala Get a Facelift

Join us LIVE to discuss what’s new in CDP Public Cloud! Don’t miss the live Q&A as we learn about the new capabilities in Cloudera Data Warehouse. See how the Impala and Hive engines get a facelift. Also watch a demo of how you can run advanced analytics at scale using a few easy steps.

Speed Up Your Data Flow for Business Results

A slow car has never won a Formula One race. The Olympics doesn’t reward slow times in swimming, track or any other clock-timed sport. Likewise, slow data speeds don’t win over customers or colleagues in the real-time business world. Microsoft’s own research once reported that a person visiting a website on a connected device is likely to wait no more than 10 seconds to see it before moving to a competitor’s site.

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Many customers looking to modernize their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With hundreds of open-source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premises, in the cloud, and across cloud providers for a true hybrid architecture.

Apache Kafka Deployments and Systems Reliability - Part 1

Apache Kafka has been deployed in the field in many ways. In our Kafka Summit 2021 presentation, we gave a brief overview of the many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made, along with how they impact reliability.

Living on the Edge: How to Accelerate Your Business with Real-time Analytics

Leveraging the Internet of Things (IoT) allows you to improve processes and take your business in new directions. But it requires you to live on the edge. That’s where you find the ability to empower IoT devices to respond to events in real time by capturing and analyzing the relevant data.

Operating Apache Kafka with Cruise Control

There are two big gaps in the Apache Kafka project when we think of operating a cluster. The first is monitoring the cluster efficiently and the second is managing failures and changes in the cluster. There are no solutions for these inside the Kafka project, but there are many good third-party tools for both problems. Cruise Control is one of the earliest open-source tools to provide a solution for the failure management problem and, more recently, for the monitoring problem as well.

Streaming Analytics with SQL Stream Builder

SQL Stream Builder, part of Cloudera Streaming Analytics, allows developers, analysts, and data scientists to write streaming applications using industry-standard SQL. It provides an interactive experience, so the development process is quick, easy, and productive while taking advantage of Apache Flink’s streaming power. It provides an advanced materialized view engine to interface with applications, tooling, and services via REST API.

Enabling Multi-User Fine-Grained Access Control for Cloud Storage in CDP

Shared Data Experience (SDX) on Cloudera Data Platform (CDP) enables centralized data access control and audit for workloads in the Enterprise Data Cloud. The public cloud (CDP-PC) editions default to using cloud storage (S3 for AWS, ADLS-gen2 for Azure). This introduces new challenges around managing data access across teams and individual users. To solve these challenges for S3 and ADLS-gen2, Cloudera has introduced a new service — the Ranger Authorization Service (RAZ).

Cloudera and NVIDIA Help IRS Fight Fraud, Safeguard Taxpayers

Across the federal government, agencies are struggling to identify, organize, analyze, and act on troves of data. It’s a problem that leaders are working actively to tackle, but they’re in a race against immeasurable volumes of data that are continuously being generated in stores known and unknown. At the Internal Revenue Service, decades’ worth of data exceeds even the most cutting-edge processing capabilities.

Supporting Transformation with an Integrated Data Platform. Three Common Questions Answered.

In recent years there has been increased interest in how to safely and efficiently extend enterprise data platforms and workloads into the cloud. CDOs are under increasing pressure to reduce costs by moving data and workloads to the cloud, similar to what has happened with business applications during the last decade. Our upcoming webinar is centered on how an integrated data platform supports the data strategy and goals of becoming a data-driven company.

Optimizing Cloudera Data Engineering Autoscaling Performance

The shift to cloud has been accelerating, and with it, a push to modernize data pipelines that fuel key applications. That is why cloud-native solutions that take advantage of capabilities such as disaggregated storage and compute, elasticity, and containerization are more important than ever. At Cloudera, we introduced Cloudera Data Engineering (CDE) as part of our Enterprise Data Cloud product, Cloudera Data Platform (CDP), to meet these challenges.