
August 2022

Breaking State and Local Data Silos with Modern Data Architectures

Data is the fuel that drives government, enables transparency, and powers citizen services. But while state and local governments seek to improve policies, decision making, and the services constituents rely upon, data silos create accessibility and sharing challenges that hinder public sector agencies from transforming their data into a strategic asset and leveraging it for the common good.

Incremental Strategies to Move Your Data Strategy Forward

Firms are burdened with tech debt and endless regulatory compliance, which often leaves innovation last in line for budget. Data-fuelled innovation requires a pragmatic strategy. This blog lays out some steps to help you incrementally advance your efforts to become a more data-driven, customer-centric organization.

Authentication and Authorization in Edge Flow Manager

This video covers the security aspects of Edge Flow Manager (EFM) and shows the differences between an admin and a regular user. The important thing to note is that authorization is based on Agent Classes: if a user has no policy defined on a particular Agent Class, the user won't see any class, agent, or event information that belongs to that class. For convenience, users can be grouped so that permissions are inherited from predefined groups.
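To make the visibility rule concrete, here is a minimal Python sketch of the logic described above: a user sees an Agent Class only if they hold a policy on it, either directly or inherited from a group. The `User` and `Group` classes and the `can_see` and `visible_events` helpers are hypothetical and are not part of the EFM API. The rule that admins see every class is also an assumption made for illustration.

```python
from dataclasses import dataclass, field


# Hypothetical data model for illustration only; not part of EFM.
@dataclass
class Group:
    name: str
    agent_classes: set[str] = field(default_factory=set)  # classes this group has policies on


@dataclass
class User:
    name: str
    is_admin: bool = False  # assumption: admins bypass per-class policies
    agent_classes: set[str] = field(default_factory=set)  # policies defined directly on the user
    groups: list[Group] = field(default_factory=list)


def effective_classes(user: User) -> set[str]:
    """Agent Classes the user holds a policy on, directly or inherited from a group."""
    classes = set(user.agent_classes)
    for group in user.groups:
        classes |= group.agent_classes
    return classes


def can_see(user: User, agent_class: str) -> bool:
    """Regular users only see classes they have a policy on (admin shortcut is assumed)."""
    return user.is_admin or agent_class in effective_classes(user)


def visible_events(user: User, events: list[dict]) -> list[dict]:
    """Drop any event whose Agent Class the user has no policy on."""
    return [e for e in events if can_see(user, e["agent_class"])]
```

For example, a user whose only permission comes from an `ops` group with a policy on `sensors` would see `sensors` events but nothing from a `gateways` class.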

Building Custom Runtimes with Editors in Cloudera Machine Learning

Cloudera Machine Learning (CML) is a cloud-native and hybrid-friendly machine learning platform. It unifies self-service data science and data engineering in a single, portable service as part of an enterprise data cloud for multi-function analytics on data anywhere. CML empowers organizations to build and deploy machine learning and AI capabilities for business at scale, efficiently and securely, anywhere they want.

Cloudera Data Platform (CDP) One

Data analytics is a big deal, with big goals and even bigger transformation. Unlocking that potential often requires big, complex projects, big teams, and big budgets. Introducing Cloudera Data Platform One, or CDP One: an all-in-one cloud service that radically simplifies the entire data lifecycle, from ingestion to analysis, while delivering the power of an enterprise data platform with the simplicity of a turnkey solution. CDP One integrates with all your existing tools, bringing your siloed data together in an open data lakehouse without the need for specialized ops and cloud expertise.

Fraud Detection with Cloudera Stream Processing

This video shows how Cloudera DataFlow, powered by Apache NiFi, solves the first-mile problem by making it easy and efficient to acquire, transform, and move data, so that we can enable streaming analytics use cases with very little effort. It also briefly discusses the advantages of running this flow in a cloud-native Kubernetes deployment of Cloudera DataFlow. Then, we explore how to run real-time streaming analytics using Apache Flink, using the Cloudera SQL Stream Builder GUI to create streaming jobs with SQL alone (no Java or Scala coding required).

How Universal Data Distribution Accelerates Complex DoD Missions

We’ve come a long way since 1778 when George Washington’s spies gathered and shared military intelligence on the British Army’s tactical operations in occupied New York. But information broadly, and the management of data specifically, is still “the” critical factor for situational awareness, streamlined operations, and a host of other use cases across today’s tech-driven battlefields.

Getting Started with Cloudera Stream Processing Community Edition

Cloudera has a strong track record of providing a comprehensive solution for stream processing. Cloudera Stream Processing (CSP), powered by Apache Flink and Apache Kafka, provides a complete stream management and stateful processing solution. In CSP, Kafka serves as the storage streaming substrate, and Flink as the core in-stream processing engine that supports SQL and REST interfaces.

An Introduction to Disaster Recovery with the Cloudera Data Platform

The previous decade has seen explosive growth in the integration of data and data-driven insight into a company’s ability to operate effectively, yielding an ever-growing competitive advantage to those that do it well. Our customers have become accustomed to the speed of decision making that comes from that insight. Data is integral for both long-term strategy and day-to-day, or even minute-to-minute operation.

The future of data architecture is hybrid: choosing your hybrid-first data strategy starts at Cloudera Now 2022

With all of the buzz around cloud computing, many companies have overlooked the importance of hybrid data. Many large enterprises went all-in on cloud without considering the costs and potential risks associated with a cloud-only approach. The truth is, the future of data architecture is all about hybrid.

How to Use Apache Iceberg in CDP's Open Lakehouse

In June 2022, Cloudera announced the general availability of Apache Iceberg in the Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, which helps users avoid vendor lock-in and implement an open lakehouse. The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML).

Introducing Applied Machine Learning Prototypes

Applied Machine Learning Prototypes (AMPs) are open source projects that will fundamentally change the way data scientists build, deploy, and monitor ML models. These fully developed prototypes are built around common industry use cases (such as Churn Prediction Monitoring, Anomaly Detection, and more) and can be customized to give you a significant head start. Available in Cloudera Machine Learning, AMPs are tested, trusted, and backed by research from Fast Forward Labs.

Monitoring in Edge Flow Manager | Observability with Grafana

This video explains how Edge Flow Manager (EFM) integrates with Prometheus and Grafana. After Prometheus is installed and configured to scrape EFM, EFM must also be configured to expose its metrics. Once the time series are in place, Grafana is installed and configured to visualize the exposed metrics. Several publicly available, EFM-specific Grafana dashboards can be easily downloaded and imported into Grafana. When everything is configured correctly, agent-specific dashboards can be accessed from the EFM UI.

Applying Fine Grained Security to Apache Spark

Apache Spark, with its rich data APIs, has been the processing engine of choice for a wide range of applications, from data engineering to machine learning, but its security integration has been a pain point. Many enterprise customers need finer-grained control, in particular at the column and row level (commonly known as Fine-Grained Access Control, or FGAC).
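As a rough illustration of what column- and row-level control means, the Python sketch below applies a policy to in-memory records: a row filter decides which rows are visible, and column rules decide whether each column is returned as-is, masked, or dropped. The `Policy` shape, the `apply_policy` helper, and the `"****"` mask value are hypothetical and are not the API of Spark or of any authorization product; real FGAC enforcement happens inside the query engine rather than on plain Python dicts.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]
MASK = "****"  # hypothetical redaction value


@dataclass
class Policy:
    # Hypothetical policy shape for illustration only.
    allowed_columns: set[str]          # column-level: returned unchanged
    masked_columns: set[str]           # column-level: returned but redacted
    row_filter: Callable[[Row], bool]  # row-level: which rows are visible


def apply_policy(rows: list[Row], policy: Policy) -> list[Row]:
    """Return only the rows and columns the policy permits, masking where required."""
    result = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row-level filter: skip rows the user may not see
        out: Row = {}
        for col, value in row.items():
            if col in policy.allowed_columns:
                out[col] = value
            elif col in policy.masked_columns:
                out[col] = MASK
            # any other column is omitted entirely
        result.append(out)
    return result
```

With a policy that allows `id` and `region`, masks `ssn`, and filters to `region == "EU"`, a table of EU and US employees would come back as EU rows only, with `ssn` redacted and `salary` removed.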

Fine-Tune Fair to Capacity Scheduler in Weight Mode

Cloudera Data Platform (CDP) unifies the technologies from Cloudera Enterprise Data Hub (CDH) and Hortonworks Data Platform (HDP). As part of that unification process, Cloudera merged the YARN Scheduler functionality from the legacy platforms, creating a Capacity Scheduler that better serves all customers. In merging this scheduler functionality, Cloudera significantly reduced the time and effort required to migrate from CDH and HDP.
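Weight mode is what makes Fair Scheduler weights carry over naturally: as we understand it, each queue's capacity is its weight divided by the sum of its siblings' weights (itself included), and a child's share of the cluster is that fraction of its parent's share. The Python sketch below works through that arithmetic for a nested queue tree. The `{queue: (weight, children)}` tree format and the helper names are assumptions for illustration; please check the YARN Capacity Scheduler documentation for the exact configuration syntax and semantics.

```python
from fractions import Fraction


def relative_capacities(weights: dict[str, float]) -> dict[str, Fraction]:
    """Each sibling queue's share of its parent: weight / sum of sibling weights."""
    total = sum(Fraction(w) for w in weights.values())
    if total <= 0:
        raise ValueError("sibling weights must sum to a positive value")
    return {queue: Fraction(w) / total for queue, w in weights.items()}


def absolute_capacities(tree: dict, parent_share: Fraction = Fraction(1),
                        prefix: str = "root") -> dict[str, Fraction]:
    """Walk a hypothetical {queue: (weight, children)} tree; return each queue's cluster share."""
    result: dict[str, Fraction] = {}
    rel = relative_capacities({name: w for name, (w, _) in tree.items()})
    for name, (_, children) in tree.items():
        path = f"{prefix}.{name}"
        share = parent_share * rel[name]
        result[path] = share
        if children:
            # A child's share of the cluster is its fraction of the parent's share.
            result.update(absolute_capacities(children, share, path))
    return result
```

For example, `prod` with weight 3 and `dev` with weight 1 split the cluster 3/4 and 1/4; if `prod` has two children of equal weight, each receives 3/8 of the cluster. Using `Fraction` keeps the shares exact, which makes it easy to confirm that siblings always sum to their parent's share.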

Industry Impact | Data-Driven Digital Transformation

Data is more than ones and zeroes. If you can put it to work, data has the power to transform your entire company, even your entire industry. With more than 2,000 customers in over 85 countries, Cloudera is helping companies across industries generate more revenue, build new products, and understand their customers at scale and speed.