How to automate big data governance

Companies that deploy big data analytics to gain a competitive advantage can quickly sour their successes if they lack a big data governance strategy, turning their data assets into data liabilities. In this article, we dive into the field of information governance and information management and explore how to set up and automate a big data governance program for success. Big data governance is a set of processes and principles that ensure the high value of data throughout its lifecycle.

In the event-driven galaxy, which metadata matters most?

As a developer, you're no stranger to your vast and varied data environment… Or are you? The tremendous amount of data your organization collects is stored in various sources and formats. You need a way to understand where your data lives and what it contains so that you can do what you need to do: build amazing event-driven applications.

BI Tool Integrations for Heroku Postgres

Heroku is a powerful platform for application development. You can build and deploy on the cloud, and scale up effortlessly once your app takes off. And behind every app, you'll find an equally powerful database: Heroku Postgres. If you're building Heroku apps, you'll find them to be a rich source of operational and customer data. Add in the right Business Intelligence (BI) tools, and you'll be able to derive insights about the inner workings of your organization.

Cloudera Data Engineering - Integration steps to leverage Spark on Kubernetes

Cloudera Data Engineering is a serverless service for Cloudera Data Platform (CDP) that allows you to submit jobs to auto-scaling virtual clusters. CDE enables you to spend more time on your applications, and less time on infrastructure. CDE allows you to create, manage, and schedule Apache Spark jobs without the overhead of creating and maintaining Spark clusters.

No Data Loss and No Service Interruption - HDF to CFM Rolling Migration

The blog “Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime” detailed how many common NiFi dataflows can be easily migrated when the Hortonworks DataFlow and Cloudera Flow Management clusters are running side-by-side. But what if you lack the resources to run multiple NiFi clusters concurrently? Not a problem.

Five lessons in leadership from Snowflake CEO, Frank Slootman

Since the start of the pandemic nearly a year ago, there's been one word on the lips of every business leader, analyst, and investor around the world: cloud. COVID-19 fundamentally changed the way businesses operate. In response, organizations went all in on cloud, betting on the unmatched scale, speed, and security of SaaS applications to help them weather the storm. Nowhere was this shift more pronounced than in our own data and analytics industry.

Reverse ETL: What You Need to Know

Data integration has been around for decades in some form or fashion, as organizations are always looking for ways to combine their enterprise data and collect it in a centralized location. The most widely used type of data integration is ETL (extract, transform, load). ETL first extracts data from one or more source systems, transforms it as necessary, and then loads it into a target warehouse or data lake.
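
The three ETL stages described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ETL product works: the inline CSV text stands in for a source system, an in-memory SQLite database stands in for the target warehouse, and the `orders` table and the `extract`, `transform`, `load`, and `run_etl` function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system (assumption for illustration).
SOURCE_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,3
3, carol ,7.25
"""


def extract(raw):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Transform: trim and title-case names, convert strings to numbers."""
    return [
        (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        for row in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(raw, conn):
    """Run the stages in order: extract, then transform, then load."""
    load(conn, transform(extract(raw)))


# An in-memory SQLite database stands in for the warehouse here.
conn = sqlite3.connect(":memory:")
run_etl(SOURCE_CSV, conn)
```

Keeping each stage in its own function mirrors the order the acronym describes: the data is cleaned before it reaches the target, so the warehouse only ever sees transformed records.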