Analytics

Day in the Life of a Cloudera Data Platform Admin

Cloudera Data Platform (CDP) on Public Cloud makes being an admin for a big data platform even easier thanks to SDX. Watch me spend a day at a temp position for Aperture Cybertronics as their Data Admin. I'll quickly deploy clusters, grant users access, and change performance settings such as autoscaling for the Aperture Cybertronics staff.

What is embedded analytics

Embedded analytics is the integration of analytical capabilities and data visualizations (real-time reports and dashboards) into another software application. This allows the end user to analyze the data held within the software application into which the analytics platform is embedded. With this analysis, the end user can identify and mitigate issues and spot opportunities to maximize.

Benchmarking Ozone: Cloudera's next-generation Storage for CDP

Apache Hadoop Ozone was designed to address the scale limitations of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects. Ozone’s architecture addresses these limitations[4]. This article compares the performance of Ozone with HDFS, the de facto big data file system.

Data Structure Zoo

Solving a problem programmatically often involves grouping data items together so they can be conveniently operated on or copied as a single unit – the items are collected in a data structure. Many different data structures have been designed over the past decades; some store individual items like phone numbers, while others store more complex objects like name/phone number pairs. Each has strengths and weaknesses and is more or less suitable for a specific use case.
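As a rough illustration of the phone number example above, here is a minimal Python sketch. The names, numbers, and helper functions are all made up for this illustration and do not come from the article. It stores the same name/phone number pairs in two structures and shows how that choice changes the way a lookup works.

```python
# Hypothetical sample data, invented for illustration only.

# A list of individual items: just phone numbers, kept in order.
numbers = ["555-0100", "555-0101", "555-0102"]

# A list of name/phone number pairs: simple to build, but a lookup scans it.
pairs = [("Ada", "555-0100"), ("Grace", "555-0101"), ("Linus", "555-0102")]

# A dictionary built from the same pairs: lookups by name are hash-based.
book = dict(pairs)


def lookup_in_list(entries, name):
    """Scan the pairs one by one; time grows with the number of entries."""
    for entry_name, number in entries:
        if entry_name == name:
            return number
    return None


def lookup_in_dict(mapping, name):
    """Ask the dictionary directly; average cost does not depend on size."""
    return mapping.get(name)


if __name__ == "__main__":
    print(lookup_in_list(pairs, "Grace"))
    print(lookup_in_dict(book, "Grace"))
```

Both lookups return the same number. The list keeps entries in insertion order and allows duplicate names, while the dictionary trades that away for fast lookup by name, which is one example of the strengths and weaknesses mentioned above.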

How to get value out of your embedded analytics

Over the years, we’ve worked with a lot of software vendors who have embedded analytics into their products, and there’s a range of reasons why they’ve chosen to do that. Some want to modernize existing analytics with a better solution, while others want to engage with more users or extend the use of their application to the C-Suite by delivering something of value to management, like reporting.

Searcher Seismic is utilizing seismic data for the oil and gas industry, providing a map to de-risk exploration

In today’s age of technology, the processing of seismic data requires powerful computers, talented researchers, software, and skills. For the oil and gas industry, it’s paramount to making strategic business decisions. Accurate seismic data helps to plan for wells, reduce the need for further exploration, and minimize the impact on the environment.

Fresh Features: first-rate filters

Filtering is an underappreciated feature of business intelligence and analytics. Yet filters are critical to data analysis. Of all the possible interaction types, filters will probably be the one that end users rely on most. Welcome to part 3 of Yellowfin 9 Fresh Features. If you missed part 2, check out the enhancements to Yellowfin's automated data discovery, Signals. Yellowfin has rich filter functionality that isn't available in some of the other leading analytics platforms.

Disk and Datanode Size in HDFS

This blog answers questions such as what the right disk size in a datanode is and what the right capacity for a datanode is. A few of our customers have asked us about using dense storage nodes. It is certainly possible to use dense nodes for archival storage because IO bandwidth requirements are usually lower for cold data. However, the decision to use denser nodes for hot data must be evaluated carefully, as it can have an impact on the performance of the cluster.