How to Use Apache Iceberg in CDP's Open Lakehouse

In June 2022, Cloudera announced the general availability of Apache Iceberg in the Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, that helps users avoid vendor lock-in and implement an open lakehouse. The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML).

Introducing Applied Machine Learning Prototypes

Applied Machine Learning Prototypes (AMPs) are open source projects that will fundamentally change the way data scientists build, deploy, and monitor ML models. These fully developed prototypes are built around common industry use cases (such as Churn Prediction Monitoring, Anomaly Detection, and more) and can be customized to give you a significant head start. Available in Cloudera Machine Learning, AMPs are tested, trusted, and research-backed by Fast Forward Labs.

Monitoring in Edge Flow Manager | Observability with Grafana

This video explains how Edge Flow Manager (EFM) integrates with Prometheus and Grafana. Prometheus is installed and configured to scrape, and EFM is configured to expose metrics. Once the time series are in place, Grafana is installed and configured to visualize the exposed metrics. Some EFM-specific Grafana dashboards are publicly available and can be easily downloaded and imported into Grafana. When everything is configured correctly, agent-specific dashboards can be accessed from the EFM UI.
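To make the "expose, scrape, visualize" chain more concrete, here is a minimal sketch of what a scraper reads from a metrics endpoint in the Prometheus text exposition format. The metric names and label values are invented for illustration and are not taken from EFM. The parser is deliberately simplified: it assumes no trailing timestamps on sample lines.

```python
from typing import Dict

# Hypothetical payload a metrics endpoint might return.
# The metric names and labels below are made up for this sketch.
SAMPLE = """\
# HELP demo_agents_online Hypothetical gauge: agents currently online.
# TYPE demo_agents_online gauge
demo_agents_online{agent_class="demo"} 3
demo_heartbeats_total{agent_class="demo"} 1200
"""


def parse_exposition(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value.

    Simplification: comment lines (# HELP, # TYPE) are skipped and
    optional timestamps after the value are not supported.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


metrics = parse_exposition(SAMPLE)
```

Each key in `metrics` corresponds to one time series that Prometheus would store on every scrape, and a Grafana panel would then query those series by name and label.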

Applying Fine Grained Security to Apache Spark

Apache Spark, with its rich data APIs, has been the processing engine of choice in a wide range of applications, from data engineering to machine learning, but its security integration has been a pain point. Many enterprise customers need finer granularity of control, in particular at the column and row level (commonly known as Fine Grained Access Control or FGAC).
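As a rough illustration of what control "at the column and row level" means, the sketch below applies a row filter and a column mask to plain Python dictionaries. It is a conceptual toy, not how Spark or any policy engine implements FGAC. The function `apply_policy`, the sample data, and the mask string are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


def apply_policy(
    rows: Iterable[Row],
    allowed_columns: Set[str],
    row_filter: Callable[[Row], bool],
    mask: str = "***",
) -> List[Row]:
    """Return only rows passing row_filter, masking disallowed columns."""
    visible: List[Row] = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level control: drop rows the user may not see
        # column-level control: replace values of restricted columns
        visible.append(
            {col: (val if col in allowed_columns else mask)
             for col, val in row.items()}
        )
    return visible


rows = [
    {"name": "Ana", "region": "EU", "salary": 100},
    {"name": "Bo", "region": "US", "salary": 120},
]

# Hypothetical policy: this user sees EU rows only, and not the salary column.
result = apply_policy(rows, {"name", "region"}, lambda r: r["region"] == "EU")
```

Here `result` holds a single row for the EU region with the `salary` value masked, which is the kind of outcome an FGAC policy is meant to enforce regardless of which engine runs the query.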

Fine-Tune Fair to Capacity Scheduler in Weight Mode

Cloudera Data Platform (CDP) unifies the technologies from Cloudera Enterprise Data Hub (CDH) and Hortonworks Data Platform (HDP). As part of that unification process, Cloudera merged the YARN Scheduler functionality from the legacy platforms, creating a Capacity Scheduler that better serves all customers. In merging this scheduler functionality, Cloudera significantly reduced the time and effort needed to migrate from CDH and HDP.

Industry Impact | Data-Driven Digital Transformation

Data is more than ones and zeroes. If you can put it to work, data has the power to transform your entire company, even your entire industry. With more than 2000 customers in over 85 countries, Cloudera is helping companies across industries generate more revenue, build new products and understand their customers at scale and speed.

Driving Success With a Modern Data Architecture and a Hybrid Approach in the Financial Services and Telco Industries

Corporations are generating unprecedented volumes of data, especially in industries such as telecom and financial services (FSI). Many organizations are hoping to leverage these massive amounts of data by investing heavily in big data solutions that they hope can meet business goals such as increasing customer satisfaction, uncovering alternative revenue streams, or improving operational efficiency.

Commands: Debug and Property Update

Support for remote issue observation, investigation, and possibly resolution is a powerful new feature of Edge Flow Manager. This video shows a case where the user observes a problem in the Agent Manager UI and collects additional information with the Debug Command, which provides configuration, properties, and logs from the observed agent. In this particular case, the user is then able to resolve the issue by using the Property Update Command to reconfigure the agent remotely.

Flow Creation in Edge Flow Manager

This video shows how to use Edge Flow Manager’s flow designer and, through an example flow, explains the concepts of agent classes and publishing. It walks through the Dashboard view for agent classes and the flow designer canvas, where processors, remote process groups, and funnels are also explained. To show all of this in action, a very basic flow with two processors is created and published to the MiNiFi agents in the agent class the flow is designed for. The video also covers how to track the flow deployment progress after publishing.