One Big Cluster Stuck: Environment Health Scorecard

Throughout the One Big Cluster Stuck series, we’ve explored impactful best practices to gain control of your Cloudera Data Platform (CDP) environment and significantly improve its health and performance. We’ve shared code, dashboards, and tools to help you on your health improvement journey. We’d like to provide one last tool.

From Hive Tables to Iceberg Tables: Hassle-Free

For more than a decade now, the Hive table format has been a ubiquitous presence in the big data ecosystem, managing petabytes of data with remarkable efficiency and scale. But as data volumes, data variety, and data usage grow, users face many challenges with Hive tables because of their antiquated directory-based table format. Common issues include constrained schema evolution, static partitioning of data, and long planning times caused by S3 directory listings.

12 Times Faster Query Planning With Iceberg Manifest Caching in Impala

Iceberg is an emerging open table format designed for large analytic workloads. The Apache Iceberg project continues to develop an implementation of the Iceberg specification in the form of a Java library. Several compute engines, such as Impala, Hive, Spark, and Trino, support querying data in the Iceberg table format by adopting this Java library.

Integrating Cloudera Data Warehouse with Kudu Clusters

Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data for time series and real-time data warehousing use cases. More than 200 Cloudera customers have implemented Apache Kudu with Apache Spark for ingestion and Apache Impala for real-time BI use cases successfully over the last decade, with thousands of nodes running Apache Kudu.

Cloudera Data Catalog | Data Stewardship, Data Lakes, & GDPR in Pharma

Explore the captivating world of Data Stewardship with a focus on Cloudera's Data Catalog. In this friendly and professional session, our esteemed speaker, Hemanth, will share his expertise and knowledge to foster collaboration and discussion among participants, as we delve into the intricacies of Data Lakes and GDPR compliance within the Pharma industry. During this interactive session, Hemanth will expertly guide participants through key concepts related to Cloudera Data Catalog, including.

Calving Apache Iceberg

Apache Iceberg is an open-source high-performance format for huge analytic tables that brings the reliability and simplicity of SQL tables to big data. It enables engines like Spark, Trino, Flink, Presto, Hive, and Impala to work with the same tables, simultaneously and safely. Discover how Apache Iceberg can transform the way you store and manage your big data, and take your analytics to the next level.

CDP Private Cloud | Cloud-native analytics on-premises

In this demo, you'll learn how CDP Private Cloud, Cloudera's on-premises private open data lakehouse, leverages Kubernetes technology to deliver cloud-native data storage, processing, and analytics capabilities in an air-gapped environment. We also delve into example modern data use cases that can run on CDP Private Cloud today, including large language models for training in-context enterprise AI, running an air-gapped data lakehouse with Apache Iceberg, and underpinning all your data with Apache Ozone for object storage akin to cloud storage.

How to Manage Risk with Modern Data Architectures

The recent failures of regional banks in the US, such as Silicon Valley Bank (SVB), Silvergate, Signature, and First Republic, were caused by multiple factors. To help ensure the stability of the US financial system, implementing advanced liquidity risk models and stress testing using ML/AI could serve as a protective measure.