July 2020

Faster Analytics with Cloudera Data Warehouse (CDW) Demo Highlight

The cloud-led journey to digital transformation requires organizations to become significantly more data-driven, yet traditional data warehouses have difficulty with new data volumes, new data types, and a variety of use cases. In this session, we will show you how Cloudera Data Warehouse guides your cloud journey with a modern hybrid cloud solution that scales to unprecedented levels and delivers insight to every part of your organization faster while saving costs.

Meeting Medical Device Data Privacy, Governance, and Security Challenges

Medical devices have become increasingly complex as technology evolves, and the sheer number of these devices now being worn or implanted has grown exponentially over the past few years. There are currently over 500,000 different types of smart, connected medical devices in use that have the ability to collect, share, or store private patient data and protected health information (PHI)(1).

The reinvention of the Telco: From Pipe to Processor

The next generation of 5G networks is unlocking a mind-bending array of new use cases. Blistering speed, super low latency, and access to more powerful mobile hardware bring VR, AR and ultra high-definition experiences into sharp focus for the near future. But there’s a bigger shift being driven by 5G, and it’s not actually about speed at all. It’s about re-thinking the modern telco business model.

Building a Scalable Process Using NiFi, Kafka and HBase on CDP

Navistar is a leading global manufacturer of commercial trucks. With a fleet of 350,000 vehicles, unscheduled maintenance and vehicle breakdowns created ongoing disruption to their business. Navistar required a diagnostics platform that would help them predict when a vehicle needed maintenance to minimize downtime.

Enabling high-speed Spark direct reader for Apache Hive ACID tables

Apache Hive supports transactional tables that provide ACID guarantees. A significant amount of work has gone into Hive to make these transactional tables highly performant. Apache Spark provides some capabilities to access Hive external tables, but it cannot access Hive managed tables. To access Hive managed tables from Spark, the Hive Warehouse Connector must be used.

Digital Transformation is Way More than Just Digital

Over the last 25 years, I have had an unparalleled front-row seat to the digital transformation that is now accelerating in the connected manufacturing and automotive industry. Not many people have had the opportunity to witness the transformation and be as active in this area as I have; I consider myself lucky.

The benefits of building an on-demand data lake in healthcare

This blog was written in partnership with Navdeep Alam, Senior Director, Global Data Warehouse, IQVIA. Healthcare is unique. It isn’t defined like other businesses by how much revenue can be generated, but more in terms of achieving positive health outcomes, better value, and saving lives through the rapid development of new treatments and therapies.

Cloudera Operational Database experience (dbPaaS) available as Technical Preview

The Cloudera Operational Database (COD) experience is a managed dbPaaS solution which abstracts the underlying cluster instance as a Database. It can auto-scale based on the workload utilization of the cluster and will be adding the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically) later this year.

The Rise Of Connected Manufacturing - How Data Is Driving Innovation Part II

A Shift Towards Industry 4.0 Is Improving Manufacturing Efficiency And Increasing Innovation. In Part II of our series with Michael Ger, Managing Director of Manufacturing and Automotive at Cloudera, he looks in greater detail at how AI, big data, and machine learning are impacting connected living and the evolution of autonomous driving.

Operational Database Scalability

Cloudera’s Operational Database provides unparalleled scale and flexibility for applications, enabling enterprises to bring together and process data of all types and from more sources, while providing developers with the flexibility they need. In this blog, we’ll look into capabilities that make Operational Database the right choice for hyperscale.

Minimizing Cloud Concentration Risk for Financial Services Institutions, Regulators and Cloud Service Providers

Since the financial crisis of 2008, regulators have been consistently working to identify emerging risks that can potentially result in financial stability events. The growth in cloud adoption across the Financial Services Industry (FSI) and the associated increase in reliance on third-party infrastructure providers has gained the attention of regulators at global, regional, and national levels.

Connected Manufacturing Insights from the Edge with Cloudera DataFlow

Connected Manufacturing’s Pivot to an Enterprise Data Solution: Connected Manufacturing is at a turning point, catalyzed by a real, measurable shift in data types. Real-time and time-series data is growing 50% faster than latent or static data forms, and streaming analytics is projected to grow at a 28% CAGR, leaving legacy data platforms that specialize in static historical data solutions, functioning on-prem or in discrete clouds, inadequate in addressing today’s rea

Building an effective data approach in a hybrid cloud world

“In today’s world of disruption and transformation, there are a few key things that all organizations are trying to figure out: how to remain relevant to their customer base, how to deal with the pressure of disruption in their industry and, undoubtedly, how to look to technology to help deliver a better service.” (Paul Mackay) Today we are sitting down with Marc Beierschoder, Analytics & Cognitive Offering Lead at Deloitte Germany, and Paul Mackay, the EMEA Cloud Lead at Cloudera, to dis

CDP Private Cloud ends the battle between agility & control in the data center

As a BI Analyst, have you ever encountered a dashboard that wouldn’t refresh because other teams were using it? As a data scientist, have you ever had to wait 6 months before you could access the latest version of Spark? As an application architect, have you ever been asked to wait 12 weeks before you could get hardware to onboard a new application?

Apache Hadoop YARN in CDP Data Center 7.1: What's new and how to upgrade

This blog post covers how customers can migrate clusters and workloads to the new Cloudera Data Platform – Data Center 7.1 (CDP DC 7.1 onwards), plus highlights of this new release. CDP DC 7.1 is the on-premises version of Cloudera Data Platform.

Overview of the Operational Database performance in CDP

This article gives you an overview of Cloudera’s Operational Database (OpDB) performance optimization techniques. Cloudera’s Operational Database can support high-speed transactions of up to 185K/second per table and a high of 440K/second per table. On average, the recorded transaction speed is about 100K-300K/second per node. The article shows how you can optimize your OpDB deployment in either Cloudera Data Platform (CDP) Public Cloud or Data Center.

Eliminate the pitfalls on your path to public cloud

As organizations look to get smarter and more agile in how they gain value and insight from their data, they are now able to take advantage of a fundamental shift in architecture. In the last decade, as an industry, we have gone from monolithic machines with direct-attached storage to VMs to cloud. The main attraction of cloud is its separation of compute and storage – a major architectural shift in the infrastructure layer that changes the way data can be stored and processed.

How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).
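As a rough illustration of the recurring tasks listed above, the Python sketch below assembles `CREATE SCHEDULED QUERY` statements of the kind Hive accepts. The statement shape (`CREATE SCHEDULED QUERY <name> CRON '<expr>' AS <query>`, with a Quartz-style cron expression) is assumed from the Apache Hive scheduled-query documentation and may differ between versions. The helper `scheduled_query_ddl` and every query, table, view, and database name are hypothetical, and whether each statement type can be scheduled in a given deployment is not confirmed here.

```python
def scheduled_query_ddl(name: str, cron: str, query: str) -> str:
    """Build a CREATE SCHEDULED QUERY statement as a string.

    Assumption: Hive accepts the form
        CREATE SCHEDULED QUERY <name> CRON '<quartz cron>' AS <query>;
    This helper only formats text; it does not talk to Hive.
    """
    if "'" in cron:
        raise ValueError("cron expression must not contain single quotes")
    # Drop surrounding whitespace and any trailing semicolon, then add exactly one.
    body = query.strip().rstrip(";")
    return f"CREATE SCHEDULED QUERY {name} CRON '{cron}' AS {body};"


# Quartz cron fields: second minute hour day-of-month month day-of-week year.
# All object names below are hypothetical placeholders.
TASKS = [
    # Data ingestion: every 10 minutes.
    ("ingest_events", "0 */10 * * * ? *",
     "INSERT INTO events SELECT * FROM events_stream"),
    # Query performance: rebuild a materialized view every hour.
    ("refresh_bi_mv", "0 0 * * * ? *",
     "ALTER MATERIALIZED VIEW bi_report_mv REBUILD"),
    # Warehouse maintenance: start a replication dump daily at 02:00.
    ("daily_repl", "0 0 2 * * ? *",
     "REPL DUMP sales_db"),
]

statements = [scheduled_query_ddl(name, cron, query) for name, cron, query in TASKS]

if __name__ == "__main__":
    for statement in statements:
        print(statement)
```

The generated statements would then be submitted through whatever Hive client is already in use (Beeline, for example). Keeping the schedule in the statement itself means no external scheduler has to be configured for these tasks.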

Introducing FlinkSQL in Cloudera Streaming Analytics

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu and the first iteration of the much-awaited FlinkSQL API. Flink’s SQL interface democratizes stream processing, as it caters to a much larger community than the currently widely used Java and Scala APIs focusing on the Data Engineering crowd.