
Latest Posts

Data security vs usability: you can have it all

Growing up, were you ever told you can’t have it all? That you can’t eat all the snacks in one sitting? That you can’t watch the complete Back to the Future trilogy and also study for your science exam in one evening? Over time, we learn to set priorities, choose one thing over another, and compromise. The same trade-off seems to apply to data access in business.

HBase Clusters Data Synchronization with HashTable/SyncTable tool

Replication (covered in this previous blog article) has been available for a while and is among the most widely used features of Apache HBase. Having clusters replicate data to different peers is a very common deployment, whether as a DR strategy or simply as a seamless way of moving data between production/staging/development environments.
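For readers who want a feel for the tool before diving into the full post, here is a minimal sketch of the usual two-step HashTable/SyncTable workflow, driven from Python. It assumes the hbase launcher is on the PATH of each cluster; the table names, HDFS paths, and ZooKeeper quorum are placeholders rather than values from the article.

```python
# Minimal sketch: driving HBase's HashTable/SyncTable MapReduce jobs from Python.
# Assumes the `hbase` launcher is on the PATH; table names, HDFS paths, and the
# ZooKeeper quorum are placeholders, not values from the post.
import subprocess

SOURCE_TABLE = "my_table"      # hypothetical table on the source cluster
TARGET_TABLE = "my_table"      # hypothetical table on the target cluster
HASH_DIR = "/hashes/my_table"  # HDFS dir where HashTable writes its hashes
SOURCE_ZK = "zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase"

# Step 1 (run on the source cluster): hash batches of cells for the source table.
subprocess.run(
    ["hbase", "org.apache.hadoop.hbase.mapreduce.HashTable",
     "--batchsize=8000", SOURCE_TABLE, HASH_DIR],
    check=True,
)

# Step 2 (run on the target cluster): compare the target table against those
# hashes and report divergent ranges; --dryrun=true reports without repairing.
subprocess.run(
    ["hbase", "org.apache.hadoop.hbase.mapreduce.SyncTable",
     "--dryrun=true", f"--sourcezkcluster={SOURCE_ZK}",
     f"hdfs://source-namenode:8020{HASH_DIR}", SOURCE_TABLE, TARGET_TABLE],
    check=True,
)
```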

Re-thinking The Insurance Industry In Real-Time To Cope With Pandemic-scale Disruption

The insurance industry is in uncharted waters, and COVID-19 has taken us where no algorithm has gone before. Today’s models, norms, and averages are being rewritten on the fly, with insurers forced to cope with the inevitable conflict between old standards and the new normal.

New Multithreading Model for Apache Impala

Today we are introducing a new series of blog posts that will look at recent enhancements to Apache Impala. Many of these are performance improvements, such as the feature described below, which delivers anywhere from a 2x to 7x speedup by taking better advantage of all the CPU cores. In addition, a lot of work has gone into ensuring that Impala runs optimally in decoupled compute scenarios, where the data lives in object storage or remote HDFS.
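As a rough, unofficial illustration of what taking better advantage of all the CPU cores looks like from a client’s perspective: Impala’s degree of intra-node parallelism is governed by the MT_DOP query option, which can be set per query, for example from Python via the impyla package. The host, port, and table below are placeholders.

```python
# Rough illustration (not from the post): Impala's intra-node parallelism is
# controlled per query with the MT_DOP query option. Host, port, and table are
# placeholders; requires the impyla package.
from impala.dbapi import connect

conn = connect(host="impala-coordinator.example.com", port=21050)
cur = conn.cursor()

cur.execute("SET MT_DOP=4")  # allow up to 4 fragment instances per node for this session
cur.execute("""
    SELECT l_returnflag, COUNT(*) AS cnt
    FROM tpch.lineitem
    GROUP BY l_returnflag
""")
for row in cur.fetchall():
    print(row)
```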

How-to: Index Data from S3 via NiFi Using CDP Data Hubs

Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in the public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as in the previous blog, but the ingest pipeline differs: there, Spark served as the ingest tool for Search, whereas here NiFi takes that role.
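The ingest itself is built in NiFi, so there is no code to show for that step here; purely as an illustrative sketch, once documents land in the DDE Solr collection they could be spot-checked from Python with pysolr. The endpoint and collection name below are hypothetical, and the authentication a real DDE endpoint requires is omitted.

```python
# Sketch only: the ingest is done in NiFi, but documents landing in the DDE
# Solr collection can be spot-checked with pysolr. The URL and collection name
# are hypothetical; TLS and authentication setup are omitted.
import pysolr

solr = pysolr.Solr("https://dde-host.example.com:8985/solr/s3-docs", timeout=10)

results = solr.search("*:*", rows=5)  # fetch a handful of indexed docs
print(f"{results.hits} documents in the collection")
for doc in results:
    print(doc.get("id"))
```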

Using Cloudera Machine Learning to Build a Predictive Maintenance Model for Jet Engines

Running a large commercial airline requires the complex management of critical components, including fuel futures contracts, aircraft maintenance, and customer expectations. In the U.S. alone, airlines average about 45,000 daily flights, transporting over 10 million passengers a year (source: FAA). Airlines typically operate on very thin margins, and any schedule delay immediately angers or frustrates customers.
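The post builds its model in Cloudera Machine Learning; the sketch below is only meant to convey the general shape of such a predictive-maintenance model, training a classifier on hypothetical engine-sensor readings. The CSV file, column names, and features are made up for illustration.

```python
# Illustrative sketch only: a predictive-maintenance classifier of the general
# kind the post describes, trained on hypothetical engine-sensor readings.
# The CSV file, column names, and label are made up for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("engine_sensor_readings.csv")  # placeholder per-flight sensor snapshots
features = ["egt_margin", "oil_pressure", "vibration", "fuel_flow", "cycles"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["needs_maintenance"], test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```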

Apache Spark on Kubernetes: How Apache YuniKorn (Incubating) helps

Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. While Apache Spark provides a lot of capabilities to support diversified use cases, it comes with additional complexity and high maintenance costs for cluster administrators. Let’s look at some of the high-level requirements the underlying resource orchestrator must meet to empower Spark as that one platform.
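To make the orchestration question concrete, here is a minimal, hypothetical sketch of submitting a Spark application to Kubernetes with its driver and executor pods labeled for a YuniKorn queue. The API server, image, jar, and queue name are placeholders, and it assumes the cluster is already configured so that Spark pods are scheduled by YuniKorn (for example via its admission controller or a pod template that sets schedulerName: yunikorn).

```python
# Hypothetical sketch: submitting a Spark app to Kubernetes with driver and
# executor pods labeled for a YuniKorn queue. API server, image, jar, and queue
# are placeholders; assumes Spark pods are routed to the YuniKorn scheduler
# (e.g. via its admission controller or a pod template with schedulerName: yunikorn).
import subprocess

submit_cmd = [
    "spark-submit",
    "--master", "k8s://https://k8s-apiserver.example.com:6443",
    "--deploy-mode", "cluster",
    "--name", "spark-pi",
    "--class", "org.apache.spark.examples.SparkPi",
    "--conf", "spark.executor.instances=4",
    "--conf", "spark.kubernetes.container.image=registry.example.com/spark:latest",
    # Queue label that YuniKorn uses for placement (placeholder queue name).
    "--conf", "spark.kubernetes.driver.label.queue=root.sandbox",
    "--conf", "spark.kubernetes.executor.label.queue=root.sandbox",
    "local:///opt/spark/examples/jars/spark-examples.jar",
]
subprocess.run(submit_cmd, check=True)
```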

What you need to know to begin your journey to CDP

Recently, my colleague published a blog, "Build on your investment by Migrating or Upgrading to CDP Data Center," which highlights key CDP Private Cloud Base features. Existing CDH and HDP customers can immediately benefit from this new functionality. This blog focuses on how to accelerate your CDP journey to CDP Private Cloud Base, for both professional services engagements and self-service upgrades.

Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds

We are thrilled to announce that Cloudera has acquired Eventador, a provider of cloud-native services for enterprise-grade stream processing. Eventador, based in Austin, TX, was founded by Erik Beebe and Kenny Gorman in 2016 to address a fundamental business problem: making it simpler to build streaming applications on real-time data. This has typically involved a lot of coding in Java, Scala, or similar technologies.