Latest Posts

Fraud Detection With Cloudera Stream Processing Part 2: Real-Time Streaming Analytics

In part 1 of this blog we discussed how Cloudera DataFlow for the Public Cloud (CDF-PC), the universal data distribution service powered by Apache NiFi, can make it easy to acquire data from wherever it originates and move it efficiently to make it available to other applications in a streaming fashion.

Why Replicating HBase Data Using Replication Manager is the Best Choice

In this article we discuss the various methods to replicate HBase data and explore why Replication Manager is the best choice for the job with the help of a use case. Cloudera Replication Manager is a key Cloudera Data Platform (CDP) service, designed to copy and migrate data between environments and infrastructures across hybrid clouds.

Beyond Data Fabrics: Cloudera Modern Data Architectures

As Cloudera CMO David Moxey outlined in his blog, we live in a hybrid data world. Data is growing at an accelerating rate, changing in makeup, and appearing in ever more places. Driving insight and value from it all is as much an opportunity as a challenge. As a result, it’s getting progressively more complex for businesses to access, use, and create value from their data.

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, that helps users avoid vendor lock-in. Today’s general availability announcement covers Iceberg running within key CDP data services, including Cloudera Data Warehousing (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML).

Fraud Detection with Cloudera Stream Processing Part 1

In a previous blog of this series, Turning Streams Into Data Products, we talked about the increased need for reducing the latency between data generation/ingestion and producing analytical results and insights from this data. We discussed how Cloudera Stream Processing (CSP) with Apache Kafka and Apache Flink could be used to process this data in real time and at scale. In this blog we will show a real example of how that is done, looking at how we can use CSP to perform real-time fraud detection.

Making the World a Better Place with Data

Much of the hype around big data and analytics focuses on business value and bottom-line impacts. Those are enormously important in the private and public sectors alike. But for government agencies, there is a greater mission: improving people’s lives. Data makes the most ambitious and even idealistic goals—like making the world a better place—possible.

Build Hybrid Data Pipelines and Enable Universal Connectivity With CDF-PC Inbound Connections

In the second blog of the Universal Data Distribution blog series, we explored how Cloudera DataFlow for the Public Cloud (CDF-PC) can help you implement use cases like data lakehouse and data warehouse ingest, cybersecurity, and log optimization, as well as IoT and streaming data collection. A key requirement for these use cases is the ability to not only actively pull data from source systems but to receive data that is being pushed from various sources to the central distribution service.

The Future of the Data Lakehouse - Open

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.

Turning Streams Into Data Products

Every large enterprise is attempting to accelerate its digital transformation strategy to engage with its customers in a more personalized, relevant, and dynamic way. The ability to perform analytics on data as it is created and collected (a.k.a. real-time data streams) and to generate immediate insights for faster decision making gives organizations a competitive edge.