
March 2023

No Average Patient - Leveraging Data for Precision Healthcare

The evolution of healthcare has come a long way since local physicians made house calls and homespun remedies were formulated using items from the kitchen spice rack. Today’s healthcare is driven as much by the promise of emerging technologies centered on data processing and advanced analytics as by developing new and specialized drugs.

Trusted Data: Alchemy For Misinformation

The best description of untrusted data I’ve ever heard is, “We all attend the QBR – Sales, Marketing, Finance – and present quarterly results, except the Sales reports and numbers don’t match Marketing numbers and neither match Finance reports. We argue about where the numbers came from, then after 45 minutes of digging for common ground, we chuck our shovels and abandon the call in disgust.” How would you go about fixing that situation?

Cloudera + Talend | Hybrid Cloud Heroes

Learn more about why we are partnering with Talend. Talend is a leader in the 2022 Gartner Magic Quadrant for data integration tools. Talend provides enterprises with high-quality data solutions to achieve data health. Talend’s Data Health and Lineage capability adds business context to data, enhancing our data governance, which helps enterprises accurately assess data risks.

Materialized Views in SQL Stream Builder

Cloudera SQL Stream Builder (SSB) gives the power of a unified stream processing engine to non-technical users so they can integrate, aggregate, query, and analyze both streaming and batch data sources in a single SQL interface. This allows business users to define events of interest that they need to monitor continuously and respond to quickly. There are many ways to distribute the results of SSB’s continuous queries to embed actionable insights into business processes.

Observe Everything

Over the past handful of years, systems architecture has evolved from monolithic approaches to applications and platforms that leverage containers, schedulers, lambda functions, and more across heterogeneous infrastructures. Cloudera Data Platform (CDP) is no different: it’s a hybrid data platform that meets organizations’ needs to get to grips with complex data anywhere, turning it into actionable insight quickly and easily.

Educating ChatGPT on Data Lakehouse

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics. ChatGPT is an excellent resource for gaining high-level insights and building awareness of any technology. However, caution is necessary when delving deeper into a particular technology.

Open Data Lakehouse powered by Apache Iceberg on Apache Ozone

With minimal setup, it is simple to get started with Iceberg on Ozone in CDP Private Cloud. This combination lets you reap the benefits of both a powerful exabyte-scale storage system and an optimized table format for petabyte-scale analytics. In this video, I demonstrate how to create, upgrade, and use Iceberg tables on Ozone in CDP Private Cloud. Iceberg is engine-agnostic and works with most analytic query engines, such as Hive, Impala, and Spark.

Reliable Data Exchange with the Outbox Pattern and Cloudera DiM

In this post, I will demonstrate how to use the Cloudera Data Platform (CDP) and its streaming solutions to set up reliable data exchange in modern applications between high-scale microservices, and ensure that the internal state will stay consistent even under the highest load.

Self Service is Simply Efficient - Cloudera DataFlow Designer GA announcement

We are thrilled to announce that the new DataFlow Designer is now generally available to all CDP Public Cloud customers. Data leaders will be able to simplify and accelerate the development and deployment of data pipelines, saving time and money by enabling true self-service.

Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

We just announced the general availability of Cloudera DataFlow Designer, bringing self-service data flow development to all CDP Public Cloud customers. In our previous DataFlow Designer blog post, we introduced you to the new user interface and highlighted its key capabilities. In this blog post we will put these capabilities in context and dive deeper into how the built-in, end-to-end data flow life cycle enables self-service data pipeline development.

Streaming Ingestion for Apache Iceberg With Cloudera Stream Processing

Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. Iceberg is a high-performance open table format for huge analytic data sets. It allows multiple data processing engines, such as Flink, NiFi, Spark, Hive, and Impala to access and analyze data in simple, familiar SQL tables.

A UI That Makes You Want to Stream

A graphical user interface helps you get the most out of any application by improving your efficiency, and data streaming is no exception. As the visible layer between your problem and its solution, a UI should guide you through the steps of an often-complex flow. Even the most hardcore back-end enthusiasts will admit that a UI is essential to a complete product. It has to be well organized and easy to understand, yet provide the right tools in the right place.