
January 2021

Industry X.0 - Made Real, Practical Insights Today enabling Profits Tomorrow

The growth of manufacturing’s digital transformation is impressive, and it is delivering value at explosive rates. Consider that Manufacturing’s Industrial Internet of Things (IIoT) was valued at $161b with a 25% growth rate, or that the Connected Car market will be valued at $225b by 2027 with a 17% growth rate. But then conflicting information arrives: VentureBeat reports that around 90 percent of machine learning models never make it into production.

Cloudera Flow Management Continuous Delivery Architecture

Having introduced the flow delivery challenges and corresponding resolutions in the first article, ‘Cloudera Flow Management Continuous Delivery while Minimizing Downtime’, we will now combine all the preceding solutions into an example flow management continuous delivery architecture. Across the whole DataFlow Continuous Delivery Architecture process, we can see the following steps.

Cloudera Completes SOC 2 Type II Certification for CDP Public Cloud

We believe security is the cornerstone of any legitimate data platform, and we’re excited to announce that Cloudera has successfully achieved SOC 2 Type II certification for Cloudera Data Platform (CDP) Public Cloud. Achieving our SOC 2 certification is the culmination of significant work across our organization and demonstrates to independent auditors that we adhere to industry-standard security controls and processes.

Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime

Has your organization considered upgrading from Hortonworks DataFlow (HDF) to Cloudera Flow Management (CFM), but thought the migration process would be too disruptive to your mission-critical dataflows? In truth, many NiFi dataflows can be migrated from HDF to CFM quickly and easily with no data loss and without any service interruption. Here we explore three common use cases where a CFM cluster can assume an HDF cluster’s dataflows with minimal to no downtime.

Get Your Analytics Insights Instantly - Without Abandoning Central IT

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by insanely long wait times for the resources needed to meet your business goals? If so – you are likely one of the growing group of Line of Business (LoB) professionals forced into creating your own solution – creating your own Shadow IT.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

In this last installment, we’ll discuss a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. The model is then scored and served through a simple web application. For more context, this demo is based on concepts discussed in the blog post ‘How to deploy ML models to production’.

Digital Transformation is a Data Journey From Edge to Insight

Digital transformation is a hot topic for all markets and industries as it’s delivering value with explosive growth rates. Consider that Manufacturing’s Industrial Internet of Things (IIoT) was valued at $161b with an impressive 25% growth rate, that the Connected Car market will be valued at $225b by 2027 with a 17% growth rate, or that in the first three months of 2020 retailers realized ten years of digital sales penetration.

How to configure clients to connect to Apache Kafka Clusters securely - Part 3: PAM authentication

In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world as an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

Finding digital transformation in high places - how a ski resort improved operational agility and customer experiences

Most of my past blogs focus on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate more than $11 trillion in economic value as connected manufacturing processes, operations, and their supply chains become more streamlined, efficient, and agile, and realize improved productivity, uptime, and product quality.

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. With the ability to quickly provision on-demand and the lower fixed and administrative costs, the costs of operating a cloud data warehouse are driven mostly by the price-performance of the specific data warehouse platform.

Brick and Mortar Stores are Now Built Brick by Brick with Digital Insights

In my last three blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance; Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising; and Maximizing Supply Chain Agility through the “Last Mile” Commitment) I painted a picture that showed an ever-changing landscape in retail, considering that consumers are more in control than ever, mobile (at least somewhat digitally mobile considering the pandemic) and socially connected.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 2: Querying/ Loading Data

In this installment, we’ll discuss how to do Get/Scan Operations and utilize PySpark SQL. Afterward, we’ll talk about Bulk Operations and then troubleshoot some errors you may come across while trying this yourself. Read the first blog here. Get/Scan Operations: In this example, let’s load the table ‘tblEmployee’ that we made in the “Put Operations” section of Part 1. I used the exact same catalog to load the table. Executing table.show() will give you:

Apache NiFi - the data movement enabler in a hybrid cloud environment

Cloudera provides its customers with a set of consistent solutions running on-premises and in the cloud to ensure customers are successful in their data journey for all of their use cases, regardless of where they are deployed. Cloudera DataFlow provides Apache NiFi in both the Cloudera Data Platform Private Cloud Base (on-premises) and Public Cloud (AWS, Azure, and Google Cloud) products in this hybrid cloud strategy.

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Requests to Central IT for data warehousing services can take weeks or months to deliver. Central IT teams at large organizations face a proliferation of IT projects arising from the complexities of markets and from the needs of internal lines of business (LoBs). At the same time, Central IT must juggle cost and risk.

Top 5 Questions about Apache NiFi

Over the last few weeks, I delivered four live NiFi demo sessions to 1,000 attendees in different geographic regions, showing how to use NiFi connectors and processors to connect to various systems. I want to thank you all for joining and attending these events! Interactive demo sessions and live Q&A are what we all need these days, now that working remotely from home is the norm. If you have not seen my live demo session, you can catch up by watching it here.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 1: The Set-Up & Basics

Introduction: Python is used extensively among Data Engineers and Data Scientists to solve all sorts of problems, from ETL/ELT pipelines to building machine learning models. Apache HBase is an effective data storage system for many workflows, but accessing this data specifically through Python can be a struggle. For data professionals who want to make use of data stored in HBase, the recent upstream project “hbase-connectors” can be used with PySpark for basic operations.

Maximizing Supply Chain Agility through the "Last Mile" Commitment

In my last two blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance, and Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising) we looked at the benefits to retail of building personalized interactions by accessing both structured and unstructured data from website clicks, email and SMS opens, in-store point-of-sale systems, and past purchase behaviors.