Systems | Development | Analytics | API | Testing

May 2021


The Ethics of AI Comes Down to Conscious Decisions

This blog post was written by Pedro Pereira as a guest author for Cloudera. Right now, someone somewhere is writing the next fake news story or editing a deepfake video. An authoritarian regime is manipulating an artificial intelligence (AI) system to spy on technology users. No matter how good the intentions behind the development of a technology, someone is bound to corrupt and manipulate it. Big data and AI amplify the problem. “If you have good intentions, you can make it very good.


Session-based Recommender Systems

Recommendation systems have become a cornerstone of modern life, spanning sectors that include online retail, music and video streaming, and even content publishing. These systems help us navigate the sheer volume of content on the internet, allowing us to discover what’s interesting or important to us. The classic modeling approaches to recommendation systems can be broadly categorized as content-based, as collaborative filtering-based, or as hybrid approaches that combine aspects of the two.


Shorten time to critical insights with Streaming SQL

Data and analytics have become second nature to most businesses, but merely having access to the vast volumes of data from these devices will no longer suffice. Leading enterprises realize that the speed of data presents a new frontier for competitive differentiation. It is imperative for organizations to reduce time-to-insights to gain a competitive advantage by responding decisively to competitors, fine-tuning operations, and serving fickle customers.


The Four Upgrade and Migration Paths to CDP from Legacy Distributions

The move into any new technology requires planning and coordinated effort to ensure a successful transition. This blog will describe the four paths to move from a legacy platform such as Cloudera CDH or HDP into CDP Public Cloud or CDP Private Cloud. The four paths are In-place Upgrade, Side-car Migration, Rolling Side-car Migration, and Migrate to Public Cloud.


Pushing Past Pilot Paralysis to Launch and Scale IIOT Use Cases

With billions of industrial IoT (IIOT) devices in place, generating massive volumes of data from “the edge,” the potential for proof of concept success for use cases in the factory can be paralyzing. While the value of this digital revolution, aka Industry 4.0, is clear, realizing the full promise has been slow. Research and real-life experience from Accenture shows that many manufacturers get stuck early on or can’t get beyond proof-of-concept pilots to scale.

Future of Data Meetup: Collect, Curate, Predict & Visualise your Streaming Data

How do you get your data from A to B? We take you on a journey with your data through: Join us to find out more about managing your data lifecycle, and see it in action during our demo. AGENDA 18:00 - Welcome 18:05 - Best Practice: Streaming Data & Analytics 18:20 - Demo: Collect, Curate, Predict & Visualise your Streaming Data 19:00 - Open Networking 19:30 - END

NVIDIA RAPIDS in Cloudera Machine Learning

In the previous blog post in this series, we walked through the steps for leveraging Deep Learning in your Cloudera Machine Learning (CML) projects. This year, we expanded our partnership with NVIDIA, enabling your data teams to dramatically speed up compute processes for data engineering and data science workloads with no code changes using RAPIDS AI.


Streaming Market Data with Flink SQL Part II: Intraday Value-at-Risk

This article is the second in a multipart series to showcase the power and expressibility of FlinkSQL applied to market data. In case you missed it, part I starts with a simple case of calculating streaming VWAP. Code and data for this series are available on github. Speed matters in financial markets. Whether the goal is to maximize alpha or minimize exposure, financial technologists invest heavily in having the most up-to-date insights on the state of the market and where it is going.


The value of CDP Public Cloud over legacy Hadoop-on-IaaS implementations

Prior the introduction of CDP Public Cloud, many organizations that wanted to leverage CDH, HDP or any other on-prem Hadoop runtime in the public cloud had to deploy the platform in a lift-and-shift fashion, commonly known as “Hadoop-on-IaaS” or simply the IaaS model.


Key considerations when making a decision on a Cloud Data Warehouse

Making a decision on a cloud data warehouse is a big deal. Beyond there being a number of choices each with very different strengths, the parameters for your decision have also changed. Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform.


Accelerate Moving to CDP with Workload Manager

Since my last blog, What you need to know to begin your journey to CDP, we received many requests for a tool from Cloudera to analyze the workloads and help upgrade or migrate to Cloudera Data Platform (CDP). The good news is Cloudera has a tried and tested tool, Workload Manager (WM) that meets your needs. WM saves time and reduces risks during upgrades or migrations.


5 Factors to Consider When Choosing a Stream Processing Engine

Are you using the right stream processing engine for the job at hand? You might think you are—and you very well might be!—but have you really examined the stream processing engines out there in a side-by-side comparison to make sure? Our Choose the Right Stream Processing Engine for Your Data Needs whitepaper makes those comparisons for you, so you can quickly and confidently determine which engine best meets your key business requirements.


Automating CDP Private Cloud Installations with Ansible

The introduction of CDP Public Cloud has dramatically reduced the time in which you can be up and running with Cloudera’s latest technologies, be it with containerised Data Warehouse, Machine Learning, Operational Database or Data Engineering experiences or the multi-purpose VM-based Data Hub style of deployment.


cdpcurl: Low-Level CDP API Access

Cloudera Data Platform (CDP) provides an API that enables you to access CDP functionality from a script, or to integrate CDP features with an application. In practice you can use the CDP API to script repetitive tasks, manage CDP resources, or even create custom applications. You can learn more about the API in its official documentation. There are multiple ways to access the API, including through a dedicated CLI, through a Java SDK, and through a low-level tool called cdpcurl.

Future of Data Meetup: Continuous SQL With SQL Stream Builder

Continuous SQL is using Structured Query Language (SQL) to create computations against unbounded streams of data, and show the results in a persistent storage. The result stored in a persistent storage can be connected to other applications to have an analytical visualization of your data. Compared to traditional SQL, in Continuous SQL the data has a start, but no end. This means that queries continuously process results to a sink or other target types. When you define your job in SQL, the SQL statement is interpreted and validated against a schema. After the statement is executed, the results that match the criteria are continuously returned.

Quantifying the value of multi-cloud deployment strategies with CDP Public Cloud

In this article, I will be focusing on the contribution that a multi-cloud strategy has towards these value drivers, and address a question that I regularly get from clients: Is there a quantifiable benefit to a multi-cloud deployment? That question is typically being asked when I explain the ability to leverage container technology that offers a consistent deployment environment across multiple clouds and form factors (public, private, or hybrid cloud).


Streaming Market Data with Flink SQL Part I: Streaming VWAP

Speed matters in financial markets. Whether the goal is to maximize alpha or minimize exposure, financial technologists invest heavily in having the most up-to-date insights on the state of the market and where it is going. Event-driven and streaming architectures enable complex processing on market events as they happen, making them a natural fit for financial market applications.


Driving Agility and Scalability through Smart Data

Last year presented business and organizational challenges that hadn’t been seen in a century and the troubling fact is that the challenges applied pains and gains unequally across industry segments. While brick-and-mortar retail was crushed a year ago with mandated store closures, digital commerce retailers realized ten years of digital sales penetration in only three months.