Cloudera

5 Steps to Making Better Business Decisions with Machine Learning

Most of the day-to-day work for knowledge workers is spent helping the business make better decisions, like choosing whether it’s worth expending the effort (or actual money) to achieve the desired business goal. The example I often use when talking about ML is churn prediction (and I’m starting to think I’m overusing it now). It costs money to retain a customer who is thinking of moving, but that is less than the cost of acquiring new customers.

Take Control of Your Destiny, Leave Retail Laggards in the Dust

Ongoing reports of the “Retail Apocalypse” were fueled once again in 2019 with more than a dozen well-known retail brands closing their doors forever. On the flip side, a “Retail Renaissance” is well underway – and signs indicate that retail leaders that have already invested in their digital transformation journey will continue to reap rewards well into the future.

Why Data Chain of Custody is Essential to Reducing Product Liability Risks

When a market grows as quickly as implantable medical devices, set to top a staggering $153.8 billion by 2026, the potential risk to patients can rise as well. As implantable medical devices proliferate, so does the number of costly, life-threatening, and reputation-tarnishing recalls. A single large recall can account for millions of device units.

Real-time log aggregation with Apache Flink Part 2

We are continuing our blog series about implementing real-time log aggregation with the help of Flink. In the first part of the series, we reviewed why it is important to gather and analyze logs from long-running distributed jobs in real time. We also looked at a fairly simple solution for storing logs in Kafka using configurable appenders only. As a reminder, let’s review our pipeline again.

Day in the Life of a Cloudera Data Platform Admin

Cloudera Data Platform (CDP) on Public Cloud makes being an admin for a big data platform even easier thanks to SDX. Watch me spend a day at a temp position for Aperture Cybertronics as their Data Admin. I'll quickly deploy clusters, grant users access, and change performance settings such as autoscaling for the Aperture Cybertronics staff.

Benchmarking Ozone: Cloudera's next-generation Storage for CDP

Apache Hadoop Ozone was designed to address the scale limitation of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects. Ozone’s architecture addresses these limitations[4]. This article compares the performance of Ozone with HDFS, the de facto big data file system.

Searcher Seismic is utilizing seismic data for the oil and gas industry providing a map to de-risk exploration

In today’s age of technology, the processing of seismic data requires powerful computers, talented researchers, software, and skills. For the oil and gas industry, it’s paramount to making strategic business decisions. Seismic data helps to plan wells accurately, reduce the need for further exploration, and minimize the impact on the environment.

Disk and Datanode Size in HDFS

This blog answers questions such as what the right disk size in a datanode is and what the right capacity for a datanode is. A few of our customers have asked us about using dense storage nodes. It is certainly possible to use dense nodes for archival storage because IO bandwidth requirements are usually lower for cold data. However, the decision to use denser nodes for hot data must be evaluated carefully, as it can have an impact on the performance of the cluster.

How Florida State University is Boosting Student Success and Addressing Data Challenges

For public universities, metrics such as retention rate and graduation rate are important indicators for standing out in the competitive landscape. These success metrics are paramount to bringing in more students, making them successful, and continuing to grow a strong alumni network.

Cloudera Data Warehouse - What You Should Know

Cloudera Data Warehouse is just one of the many experiences you can use on the Cloudera Data Platform (CDP). Cloudera Data Warehouse packages up projects you may already know and use, such as Impala and Hive, into a service. This service runs on Kubernetes, which gives it the ability to pause, resume, and scale up or down quickly and automatically.