Latest Posts

Trusted Data: Alchemy For Misinformation

The best description of untrusted data I’ve ever heard is, “We all attend the QBR – Sales, Marketing, Finance – and present quarterly results, except the Sales reports and numbers don’t match Marketing numbers and neither match Finance reports. We argue about where the numbers came from, then after 45 minutes of digging for common ground, we chuck our shovels and abandon the call in disgust.” How would you go about fixing that situation?

Materialized Views in SQL Stream Builder

Cloudera SQL Stream Builder (SSB) gives the power of a unified stream processing engine to non-technical users so they can integrate, aggregate, query, and analyze both streaming and batch data sources in a single SQL interface. This allows business users to define events of interest that they need to monitor continuously and respond to quickly. There are many ways to distribute the results of SSB’s continuous queries to embed actionable insights into business processes.

Observe Everything

Over the past handful of years, systems architecture has evolved from monolithic approaches to applications and platforms that leverage containers, schedulers, lambda functions, and more across heterogeneous infrastructures. Cloudera Data Platform (CDP) is no different: it’s a hybrid data platform that meets organizations’ needs to get to grips with complex data anywhere, turning it into actionable insight quickly and easily.

Educating ChatGPT on Data Lakehouse

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics. ChatGPT is an excellent resource for gaining high-level insights and building awareness of any technology. However, caution is necessary when delving deeper into a particular technology.

Reliable Data Exchange with the Outbox Pattern and Cloudera DiM

In this post, I will demonstrate how to use the Cloudera Data Platform (CDP) and its streaming solutions to set up reliable data exchange between high-scale microservices in modern applications, and to ensure that their internal state stays consistent even under the highest load.

Self Service is Simply Efficient - Cloudera DataFlow Designer GA announcement

We are thrilled to announce that the new DataFlow Designer is now generally available to all CDP Public Cloud customers. Data leaders will be able to simplify and accelerate the development and deployment of data pipelines, saving time and money by enabling true self service.

Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

We just announced the general availability of Cloudera DataFlow Designer, bringing self-service data flow development to all CDP Public Cloud customers. In our previous DataFlow Designer blog post, we introduced you to the new user interface and highlighted its key capabilities. In this blog post we will put these capabilities in context and dive deeper into how the built-in, end-to-end data flow life cycle enables self-service data pipeline development.

Streaming Ingestion for Apache Iceberg With Cloudera Stream Processing

Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. Iceberg is a high-performance open table format for huge analytic data sets. It allows multiple data processing engines, such as Flink, NiFi, Spark, Hive, and Impala, to access and analyze data in simple, familiar SQL tables.

A UI That Makes You Want to Stream

A graphical user interface helps you get the most out of any application by improving your efficiency, and data streaming is no exception. A UI should help you through the steps of an often-complex flow as the visible layer between your problem and its solution. Even the most hardcore back-end enthusiasts will admit that a UI is essential to a complete product. It has to be well organized and easy to understand, yet able to provide the right tools in the right place.

Leveraging Data Analytics in the Fight Against Prescription Opioid Abuse

Every day in the US, thousands of legitimate prescriptions for the opioid class of pharmaceuticals are written to mitigate acute pain during post-operative recovery, chronic back and neck pain, and a host of other cases where patients experience moderate-to-severe discomfort.