October 2020

CDP Data Visualization: Self-Service Data Visualization For The Full Data Lifecycle

With the massive explosion of data across the enterprise — both structured and unstructured from existing sources and new innovations such as streaming and IoT — businesses have needed to find creative ways of managing their increasingly complex data lifecycle to speed time to insight.

Listening to the Customer in the 21st Century: It's All About Data

The customer has never been more right. Across industries, customers have become conditioned to demand not only near-instant responses to their needs but that their needs be anticipated in advance. Financial institutions are not given a pass, despite a competitive landscape flooded with regulation and privacy considerations. The customer still has expectations for a personalized, timely, and relevant experience.

New Applied ML Research: Meta-Learning & Structural Time Series

At Cloudera Fast Forward we work to make the recently possible useful. Our goal is to take the incredible data science and machine learning research developments we see emerging from academia and large industrial labs, and bridge the gap to products and processes that are useful to practitioners working across industries.

Data security vs usability: you can have it all

Growing up, were you ever told you can’t have it all? That you can’t eat all the snacks in one sitting? That you can’t watch the complete Back to the Future trilogy as well as study for your science exam in one evening? Over time, we learn to set priorities, make a decision for one thing over the other, and compromise. Just like when it comes to data access in business.

HBase Clusters Data Synchronization with HashTable/SyncTable tool

Replication (covered in this previous blog article) has been available for a while and is among the most used features of Apache HBase. Having clusters replicate data with different peers is a very common deployment, whether as a DR strategy or simply as a seamless way of replicating data between production/staging/development environments.

Re-thinking The Insurance Industry In Real-Time To Cope With Pandemic-scale Disruption

The Insurance industry is in uncharted waters and COVID-19 has taken us where no algorithm has gone before. Today’s models, norms, and averages are being re-written on the fly, with insurers forced to cope with the inevitable conflict between old standards and the new normal.

New Multithreading Model for Apache Impala

Today we are introducing a new series of blog posts that will take a look at recent enhancements to Apache Impala. Many of these are performance improvements, such as the feature described below which will give anywhere from a 2x to 7x performance improvement by taking better advantage of all the CPU cores. In addition, a lot of work has also been put into ensuring that Impala runs optimally in decoupled compute scenarios, where the data lives in object storage or remote HDFS.

How-to: Index Data from S3 via NiFi Using CDP Data Hubs

Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as it was in the previous blog but the ingest pipeline differs. Spark as the ingest pipeline tool for Search (i.e.

Validating Jet Engine Predictive Models Using Cloudera Machine Learning

In this video, we’ll go over how to use Cloudera Machine Learning (CML) to validate a complex predictive model. Using a publicly available NASA dataset that simulates how jet engines degrade over time, we’ll use machine learning concepts in a cloud environment to go from simulation data to a cost benefit analysis in just a few steps. We’ll also show how we can run experiments to track specific metrics from many different scenarios that our predictive model could possibly be implemented in.

Using Cloudera Machine Learning to Build a Predictive Maintenance Model for Jet Engines

Running a large commercial airline requires the complex management of critical components, including fuel futures contracts, aircraft maintenance, and customer expectations. Airlines in the U.S. alone average about 45,000 daily flights and transport over 10 million passengers a year (source: FAA). Airlines typically operate on very thin margins, and any schedule delay immediately angers or frustrates customers.

Apache Spark on Kubernetes: How Apache YuniKorn (Incubating) helps

Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. While Apache Spark provides a lot of capabilities to support diversified use cases, it comes with additional complexity and high maintenance costs for cluster administrators. Let’s look at some of the high-level requirements the underlying resource orchestrator must meet to empower Spark as one platform.

What you need to know to begin your journey to CDP

Recently, my colleague published a blog, “Build on your investment by Migrating or Upgrading to CDP Data Center,” which describes the key features of CDP Private Cloud Base. Existing CDH and HDP customers can immediately benefit from this new functionality. This blog focuses on the process to accelerate your journey to CDP Private Cloud Base for both professional services engagements and self-service upgrades.

Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds

We are thrilled to announce that Cloudera has acquired Eventador, a provider of cloud-native services for enterprise-grade stream processing. Eventador, based in Austin, TX, was founded by Erik Beebe and Kenny Gorman in 2016 to address a fundamental business problem: making it simpler to build streaming applications on real-time data. This typically involved a lot of coding with Java, Scala or similar technologies.

7 Requirements for Digital Transformation

Digital transformation is not just about the technological transformation of an organization; it’s about transforming its culture. It’s not enough to bolt technology onto an existing strategy and consider it transformed. That’s the message from our Chief Marketing Officer Mick Hollison discussing digital transformation with Charlene Li at Cloudera Now.

Collaboration is Key to Reducing Pain and Finding Value in Data

When it comes to cloud, being an early adopter does not necessarily put you ahead of the game. I know of companies that have been perpetually “doing cloud” for 10 years, but very few that have “done cloud” in a way that democratises data and makes it accessible with minimal pain points. Cloud is an enabler. It makes it easier to collect, analyse, and disseminate information.

Cloudera Supercharges the Enterprise Data Cloud with NVIDIA

Cloudera Data Platform Powered by NVIDIA RAPIDS Software Aims to Dramatically Increase Performance of the Data Lifecycle Across Public and Private Clouds

Cloudera announced today a new collaboration with NVIDIA that will help Cloudera customers accelerate data engineering, analytics, machine learning and deep learning performance with the power of NVIDIA GPU computing across public and private clouds.

UK Government: From cloud first to cloud appropriate?

Since 2013, the UK Government’s flagship ‘Cloud First’ policy has been at the forefront of enabling departments to shed their legacy IT architecture in order to meaningfully embrace digital transformation. The policy states that the cloud (and specifically, public cloud) should be the default position for any new services, unless it can be demonstrated that other alternatives offer better value for money.