June 2021

Migrate Hive data from CDH to CDP public cloud

Many Cloudera customers are moving from fully on-prem deployments to the cloud, either by backing up their data in the cloud or by running multi-functional analytics on CDP Public Cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments.

Fraud Processing with SQL Stream Builder

SQL Stream Builder allows developers and analysts to write streaming applications using industry-standard SQL. In this video, you will see its interactive experience applied to fraud detection, covering syntax checking, error reporting, schema detection, query creation, and output creation through its interface and APIs.

From 0 to Dashboard with Cloudera Data Warehouse

Today you'll see a quick demo of how to start with any given dataset, reference it within Cloudera Data Warehouse, and then use the in-house Data Visualization to create a live dashboard from the data. We'll use some example shipping data and show how you can go from 0 to dashboard in no time at all.

Deploying applications on CDP Operational Database (COD)

CDP Operational Database Experience (COD) is a PaaS offering on the Cloudera Data Platform (CDP). COD enables you to create a new operational database with a few clicks and auto-scales based on your workload. Behind the scenes, COD automatically manages cluster deployment and configuration, reducing overheads related to setting up new database instances. Additionally, auto-scaling eliminates the need to size a cluster for your workloads.

Insurers - Be Aware of the Hidden Exposures in assessing the economic impact of Climate Risk

Climate change is a challenge for insurers in some obvious ways, such as stronger and more frequent natural disasters. Yet there are also more subtle risks to monitor, including changes to insured assets, risks, and exposures. Climate impacts the production quality and quantity of insured consumable goods, their location, and their supply chains.

Automated Deployment of CDP Private Cloud Clusters

At Cloudera, we have long believed that automation is key to delivering secure, ready-to-use, and well-configured platforms. Hence, we were pleased to announce the public release of Ansible-based automation to deploy CDP Private Cloud Base. By automating cluster deployment this way, you reduce the risk of misconfiguration, promote consistent deployments across multiple clusters in your environment, and help to deliver business value more quickly.

Telecommunications and the Hybrid Data Cloud

As the inexorable drive to cloud continues, telecommunications service providers (CSPs) around the world – often laggards in adopting disruptive technologies – are embracing virtualization. Not only that, but service providers have been deploying their own clouds, some developing IaaS offerings, and partnering with cloud native content providers like Netflix and Spotify to enhance core telco bundles.

How to use Apache Spark with CDP Operational Database Experience

Apache Spark is a very popular analytics engine used for large-scale data processing. It is widely used for many big data applications and use cases. CDP Operational Database Experience (COD) is a CDP Public Cloud service that lets you create and manage operational database instances, and it is powered by Apache HBase and Apache Phoenix.

Future of Data Meetup: Building Automated Machine Learning Workflows in the Cloud

In this meetup, we’re going to put ourselves in the shoes of an electric car manufacturer that produces all the parts for its cars in-house. First, we’ll show you an example of how this fictional car company could walk through the process of creating a prediction model based on part production data. We will then automate the creation of these models by making them depend on an upstream data collection process. To finish it off, we’ll deploy these models and make them accessible via an external API, all within a native cloud environment using the Cloudera Data Platform.

The 4 keys to a successful manufacturing IIOT pilot

If you have read our previous post focusing on the challenges of planning, launching and scaling IIOT use cases, you’ve narrowed down the business problems you’re trying to solve, and you have a plan that is both created by the implementation team and supported by executive management. Here’s a plan to make sure you’ve got it all down. Think of these success factors as the legs of a kitchen table, and of the results you desire as a bowl of homemade chicken soup.

What is new in Cloudera Streaming Analytics 1.4?

At the end of March, we released the first version of Cloudera SQL StreamBuilder as part of CSA 1.3. It enabled users to easily write, run and manage real-time SQL queries on streams from Apache Kafka with an exceptionally smooth user experience. Since then, we have been working hard to expose the full power of Apache Flink SQL and the existing Data Warehousing tools in CDP to combine it into a state-of-the-art real-time analytics platform.

Cloudera named a Strong Performer in The Forrester Wave: Streaming Analytics, Q2 2021

Cloudera has been named a Strong Performer in the Forrester Wave for Streaming Analytics, Q2 2021. We are excited to be recognized in this wave in what we consider to be such a strong position. We are proud to have been named as one of “The 14 providers that matter most” in streaming analytics. The report states that richness of analytics, development tool options and near-effortless scalability are what streaming analytics customers should look for in a provider.

Cloudera Streaming Analytics 1.4: the unification of SQL batch and streaming

In October of 2020, Cloudera acquired Eventador, and Cloudera Streaming Analytics (CSA) 1.3.0 was released early in 2021. It was the first release to incorporate SQL Stream Builder (SSB) from the acquisition, and brought rich SQL processing to the already robust Apache Flink offering. With that completed, the team’s focus turned to bringing Flink Data Definition Language (DDL) and the batch interface into SSB.

Validations - Cloudera Support's Predictive Alerting Program

Cloudera Support’s cluster validations proactively identify known problem signatures contained in customers’ diagnostic data with the goal of increasing cluster health, performance, and overall stability. Cluster validations are included in a customer’s enterprise subscription at no additional cost. All customers with access to the Support case portal will also be able to take advantage of cluster validations.

Fast Forward Live: Session-based Recommender Systems

Join us live with Fast Forward Labs to discuss the recently possible in Machine Learning and AI. Being able to recommend an item of interest to a user (based on their past preferences) is a highly relevant problem in practice. A key trend over the past few years has been session-based recommendation algorithms that provide recommendations solely based on a user’s interactions in an ongoing session, and which do not require the existence of user profiles or their entire historical preferences. This report explores a simple, yet powerful, NLP-based approach (word2vec) to recommend a next item to a user. While NLP-based approaches are generally employed for linguistic tasks, here we exploit them to learn the structure induced by a user’s behavior or an item’s nature.

Future of Data Meetup: The Power of "Yes" or: How I learned to Stop Worrying and Love Governance

Full data lifecycle projects hold tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing device data capture, data enrichment, data science, and analytics at scale to enterprises. This promise also comes with challenges for developers, admins, and consumers to continuously access new data and collaborate.

Modernizing Data Pipelines using Cloudera Data Platform - Part 1

Data pipelines are in high demand in today’s data-driven organizations. As critical elements in supplying trusted, curated, and usable data for end-to-end analytic and machine learning workflows, the role of data pipelines is becoming indispensable. To keep up, data pipelines are being vigorously reshaped with modern tools and techniques.

Apache Ozone Metadata Explained

Apache Ozone is a distributed object store built on top of the Hadoop Distributed Data Store service. It can manage billions of small and large files that other distributed file systems find difficult to handle. As an important part of achieving better scalability, Ozone separates metadata management among different services: the Ozone Manager (OM) service manages namespace metadata such as volumes, buckets, and keys.