
Latest Posts

Overview of the Operational Database performance in CDP

This article gives an overview of performance optimization techniques for Cloudera’s Operational Database (OpDB). Cloudera’s Operational Database can support high-speed transactions of up to 185K/second per table and a high of 440K/second per table. On average, the recorded transaction speed is about 100K-300K/second per node. The article also describes how you can optimize your OpDB deployment in either Cloudera Data Platform (CDP) Public Cloud or Data Center.

Eliminate the pitfalls on your path to public cloud

As organizations look to get smarter and more agile in how they gain value and insight from their data, they are now able to take advantage of a fundamental shift in architecture. In the last decade, as an industry, we have gone from monolithic machines with direct-attached storage to VMs to cloud. The main attraction of cloud is its separation of compute and storage – a major architectural shift in the infrastructure layer that changes the way data can be stored and processed.

How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).

Introducing FlinkSQL in Cloudera Streaming Analytics

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu and the first iteration of the much-awaited FlinkSQL API. Flink’s SQL interface democratizes stream processing, as it caters to a much larger community than the currently widely used Java and Scala APIs focusing on the Data Engineering crowd.

Are you prepared to mature to 'ready-made' data management?

When it comes to furnishing our living spaces, it seems we go through phases. When I was just setting out and leaving home, IKEA was my preferred furniture store. You make your choice, collect all the flat-pack boxes, lug them home, and after some hex key gymnastics: voilà. You’ve truly made it! Since then, I’ve drifted from the “some assembly required” phase to the “ready-made” one.

CDP Private Cloud ends the battle between agility & control in the data center

As a BI Analyst, have you ever encountered a dashboard that wouldn’t refresh because other teams were using it? As a data scientist, have you ever had to wait 6 months before you could access the latest version of Spark? As an application architect, have you ever been asked to wait 12 weeks before you could get hardware to onboard a new application?

Why an integrated analytics platform is the right choice

Companies realize that in order to grow, connect products and services, or protect their business, they need to become data-driven. In selecting the tools to realize these goals, organizations effectively have two choices: a self-selected combination of analytics tools and applications or a unified platform that handles all. In this blog we will discuss the challenges of the former choice that will provide justification for the latter.

Multi-Raft - Boost up write performance for Apache Hadoop-Ozone

Apache Hadoop-Ozone is a new-era object storage solution for Big Data platforms. It is scalable and strongly consistent. Ozone uses the Raft protocol, implemented by Apache Ratis (Incubating), to achieve high availability in its distributed system. My team at Tencent started to introduce Ozone as a backend object store in production a few months ago, and we’re onboarding more and more data warehouse users.

The Rise Of Connected Manufacturing And How Data Is Driving Innovation, Part I

This interview was conducted by Cindy Maike, VP Industry Solutions. The shift towards Industry 4.0 is improving manufacturing efficiency, and the factory of the future will increasingly be driven by technology like the Internet of Things (IoT), Automation, Artificial Intelligence (AI), and Cloud Computing.

Auto-TLS in Cloudera Data Platform Data Center

Wire encryption protects data in motion, and Transport Layer Security (TLS) is the most widely used security protocol for wire encryption. TLS provides authentication, privacy and data integrity between applications communicating over a network by encrypting the packets transmitted between endpoints. Users interact with Hadoop clusters via browser or command line tools, while applications use REST APIs or Thrift.