Latest Videos

Introducing Applied Machine Learning Prototypes

Applied Machine Learning Prototypes (AMPs) are open source projects that will fundamentally change the way data scientists build, deploy, and monitor ML models. These fully developed prototypes are built around common industry use cases, such as Churn Prediction Monitoring, Anomaly Detection, and more, and can be customized to give you a significant head start. Available in Cloudera Machine Learning, AMPs are tested, trusted, and research-backed by Fast Forward Labs.

Monitoring in Edge Flow Manager | Observability with Grafana

This video explains how Edge Flow Manager (EFM) integrates with Prometheus and Grafana. Prometheus is installed and configured to scrape metrics, and EFM is configured to expose them. Once the time series are in place, Grafana is installed and configured to visualize the exposed metrics. Several EFM-specific Grafana dashboards are publicly available and can be downloaded and imported into Grafana. When everything is configured correctly, agent-specific dashboards can be accessed from the EFM UI.

Commands: Debug and Property Update

Remote observation, investigation, and possibly resolution of issues is a powerful new feature of Edge Flow Manager. This video shows a case where the user observes a problem in the Agent Manager UI and collects additional information with the Debug Command, which provides the configuration, properties, and logs of the observed agent. In this particular case, the user then resolves the issue by using the Property Update Command to reconfigure the agent remotely.

Flow Creation in Edge Flow Manager

This video shows how to use Edge Flow Manager’s flow designer and, using an example flow, explains the concepts of agent classes and publishing. It goes through the Dashboard view for agent classes and the flow designer canvas, explaining processors, remote process groups, and funnels along the way. To show all of this in action, a very basic flow with two processors is created and published to the MiNiFi agents in the agent class the flow was designed for. The video also covers how to track the progress of the flow deployment after publishing.

Differences between the C++ and Java MiNiFi agents

In this video we go through the differences between the C++ and Java MiNiFi agents. The video shows the differences visible in the Edge Flow Manager UI, from the information displayed to the buttons and dropdown elements that appear depending on the agent type. Differences in feature set and functionality are also highlighted. The two implementations also have different footprints (memory and CPU) and different sets of available components. This video will help you determine which MiNiFi agent best suits your use case.

The Chief Data Officer | Digital Transformation

Today, data isn't a cost center. It's a business driver. And Chief Data Officers are responsible for using data to create real results and transform their business. Meet Ray, the CDO at a high-tech global electronics manufacturer. Ray relies on the Cloudera Data Platform to bring multiple data sources together. Ray's company can connect supply chain, go-to-market, and product research data in one place while lowering the cost on their network.

Technology Spotlight: Apache Iceberg

At Cloudera, we are committed to staying true to our open source roots, and working well within the communities is critical to that. Since 2021, we have supported the growing Iceberg community with hundreds of contributions across Impala, Hive, Spark, and Iceberg. We look forward to continuing the momentum as companies embrace the open lakehouse. The general release is now available in the Cloudera Data Platform.

Introduction to Cloudera Edge Flow Manager

This video is a 101 introduction to Edge Flow Manager (EFM), the Cloudera Edge Management (CEM) solution for managing and monitoring Apache MiNiFi agents at scale. The video goes through the different views of the user interface to demonstrate and explain the features for designing flows, publishing flows to the agents, executing remote commands, monitoring the agents, and more.

Hello, Spark! An intro to Apache Spark using PySpark in the Cloud

If you’re new to the world of large-scale data analytics, this session is for you! We'll cover the basics of what problems Apache Spark can solve, why and when to use Spark, and how Spark enables efficient use of time and computing hardware. We’ll also demonstrate how easy it is to run a PySpark job in the public cloud using the Data Science Workbench and Cloudera Data Engineering products.