Latest Posts

Cloudera Completes SOC 2 Type II Certification for CDP Public Cloud

We believe security is the cornerstone of any legitimate data platform, and we’re excited to announce that Cloudera has successfully achieved SOC 2 Type II certification for Cloudera Data Platform (CDP) Public Cloud. Achieving our SOC 2 certification is the culmination of significant work across our organization and demonstrates to independent auditors that we adhere to industry-standard security controls and processes.

Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime

Has your organization considered upgrading from Hortonworks DataFlow (HDF) to Cloudera Flow Management (CFM), but assumed the migration would be too disruptive to your mission-critical dataflows? In truth, many NiFi dataflows can be migrated from HDF to CFM quickly and easily, with no data loss and no service interruption. Here we explore three common use cases where a CFM cluster can take over an HDF cluster’s dataflows with minimal to no downtime.

Get Your Analytics Insights Instantly - Without Abandoning Central IT

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by long wait times for the resources you need to meet your business goals? If so, you are likely one of the growing group of Line of Business (LoB) professionals forced into building your own solution: your own Shadow IT.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

In this last installment, we’ll discuss a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. The model is then scored and served through a simple web application. For more context, this demo is based on concepts discussed in the blog post “How to deploy ML models to production.”

Digital Transformation is a Data Journey From Edge to Insight

Digital transformation is a hot topic across markets and industries because it is delivering value at explosive growth rates. Consider that manufacturing’s Industrial Internet of Things (IIoT) was valued at $161b with an impressive 25% growth rate, that the Connected Car market will be valued at $225b by 2027 with a 17% growth rate, or that in the first three months of 2020 retailers realized ten years of digital sales penetration.

How to configure clients to connect to Apache Kafka Clusters securely - Part 3: PAM authentication

In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world as an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

Finding digital transformation in high places - how a ski resort improved operational agility and customer experiences

Most of my previous blogs have focused on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate more than $11 trillion in economic value as connected manufacturing processes, operations, and their supply chains become more streamlined, efficient, and agile, and realize improved productivity, uptime, and product quality.

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. Because they can be provisioned quickly on demand and carry lower fixed and administrative costs, the cost of operating a cloud data warehouse is driven mostly by the price-performance of the specific data warehouse platform.

Brick and Mortar Stores are Now Built Brick by Brick with Digital Insights

In my last three blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance; Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising; and Maximizing Supply Chain Agility through the “Last Mile” Commitment), I painted a picture of an ever-changing retail landscape in which consumers are more in control than ever, mobile (at least digitally, considering the pandemic), and socially connected.