
Latest Posts

Apache YuniKorn (Incubating) 0.8 release: What's new and upcoming?

Apache YuniKorn (Incubating) is a standalone resource scheduler that aims to bring advanced scheduling capabilities for Big Data workloads to containerized platforms. Please read YuniKorn: a universal resources scheduler to learn about the rationale and architecture. Since our last post, we are delighted to report that YuniKorn was accepted into the Apache Incubator in January 2020!

Introducing MLOps And SDX for Models in Cloudera Machine Learning

It seems everyone is talking about machine learning (ML) these days, and ML's use in the products and services we consume every day is increasingly ubiquitous. But for many enterprise organizations, the promise of embedding ML models across the business and scaling use cases remains elusive. So what makes ML so difficult for enterprises to adopt at scale?

Building an application to predict customer churn

Too often, companies find out only after the fact that customers have stopped using their product or service, with too little notice to do anything about it. The term customer churn describes the loss of existing customers: people or organizations that were using a company's products and/or services and have decided not to use them anymore, in favor of a competitor. Tracking customer churn is a key business metric for most companies.
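To make the metric concrete, here is a minimal sketch of how a churn rate might be computed from two snapshots of active customers. The customer IDs and period names are illustrative placeholders, not data from the post.

```python
# Toy churn-rate calculation: compare customer IDs active in two consecutive periods.
active_last_period = {"c001", "c002", "c003", "c004", "c005"}
active_this_period = {"c001", "c003", "c005"}

churned = active_last_period - active_this_period      # customers who left
churn_rate = len(churned) / len(active_last_period)    # fraction of the base lost

print(f"Churned customers: {sorted(churned)}")
print(f"Churn rate: {churn_rate:.0%}")                 # 40% in this toy example
```

A churn-prediction application like the one described in the post typically starts from exactly this kind of labeled signal (who stayed, who left) and then trains a model on customer attributes to predict it ahead of time.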

Paving the pathway to Cloudera Data Platform with Cloudera DataFlow

With the announcement of the availability of Cloudera Data Platform (CDP), our customers have been buzzing with excitement. While each of you has been trying out different aspects of CDP, we recognize that you are at different stages of maturity in your current product adoption or implementation. Our Cloudera DataFlow customers, in particular, span a wide range of adoption stages.

Bloor Research identifies what makes a Modern Data Warehouse champion

When speaking with customers, I often hear that they are committed to digital transformation and to becoming a data-driven enterprise. Those may sound like abstract, lofty aspirations, but the reality is much more practical. We have major banks that need a complete view of their customers so they can reduce churn through personalized service and offerings, and telecommunications giants that absolutely must maintain network health so there are no dropped calls or missed messages.

Driving Digital Transformation for Federal Agencies with CXaaS

For more than a decade, government CIOs have been gearing up for and championing digital transformation. And not a moment too soon: From federal headquarters to statehouses, agencies today are expected to mirror the near-seamless user experience of today’s commercial sector, delivering agile, efficient responsiveness to constituent needs.

Healthcare's Big Data Challenge: How a hybrid data platform can help

The healthcare industry is crumbling under the weight of disruption. Newly empowered patients expect procedure and price transparency and access to their personal health information so they can make informed treatment choices. Providers must deliver care faster and better, within a framework of rigorous quality, compliance, and cost-containment guidelines. Drug and medical device makers are under pressure to deliver critical therapies quickly while ensuring safety, efficacy, and affordability.

Operational Database Integrity

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This post provides an overview of the OpDB data integrity capabilities that help you achieve ACID transactions and data consistency. OpDB guarantees certain properties to ensure atomicity, durability, consistency, and visibility.
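As one concrete illustration (not an example from the post itself): OpDB is built on Apache HBase, which guarantees row-level atomicity, so all cells written in a single put to one row become visible together or not at all. The sketch below uses the happybase Thrift client; the host, table name, and column families are assumptions for illustration only.

```python
# Minimal sketch of row-level atomicity with Apache HBase via happybase.
import happybase

connection = happybase.Connection("localhost", port=9090)  # assumes an HBase Thrift server
table = connection.table("customer_profiles")              # hypothetical table name

# All cells in this single put to one row are applied atomically:
# a reader sees either all of these columns updated, or none of them.
table.put(
    b"customer-42",
    {
        b"account:status": b"active",
        b"account:tier": b"gold",
        b"audit:updated_by": b"billing-service",
    },
)

connection.close()
```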

The challenges you'll face deploying machine learning models (and how to solve them)

In 2019, organizations invested $28.5 billion in machine learning application development (Statista). Yet only 35% of organizations report having analytical models fully deployed in production (IDC). Taken together, those two statistics make it clear that a breadth of challenges must be overcome to get your models deployed and running.

One billion files in Ozone

Apache Hadoop Ozone is a distributed key-value store that can manage small and large files alike. Ozone was designed to address the scale limitations of HDFS with respect to small files. HDFS is designed to store large files, and the recommended limit is roughly 300 million files per NameNode; it does not scale well beyond that.
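To show what "key-value store" means in practice, here is a minimal sketch of writing and reading a key through Ozone's S3-compatible gateway using boto3. The endpoint URL, bucket name, and credentials are assumptions; they depend on how the Ozone S3 gateway (s3g) is configured in a given cluster.

```python
# Toy example: store and fetch one small object via Ozone's S3-compatible gateway.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9878",   # assumed s3g address
    aws_access_key_id="any",                # placeholder credentials
    aws_secret_access_key="any",
)

s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"one small file")

obj = s3.get_object(Bucket="demo-bucket", Key="hello.txt")
print(obj["Body"].read())   # b'one small file'
```

Because Ozone tracks each object as a key rather than as a NameNode file entry, the same pattern scales to the billion-file workloads discussed in the post.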