ETL Methodologies: A Guide to Our Data Warehouse Integration Platform

Without data integration processes like ETL, today's businesses would hardly be able to make sense of the constant streams of data flowing into their tools. Of course, even though ETL is widely recognized as mission-critical to data management and business intelligence (BI) initiatives, that doesn't mean it's the most straightforward process to implement. For those looking to better understand how ETL can help their business, or what similar integration methodologies exist, this is the guide you need.

How to Connect ThoughtSpot to Amazon Redshift Serverless

Many businesses rely on Amazon Redshift Serverless for their cloud data warehouse and on ThoughtSpot to derive insights from the data stored within it. In this post, I’m going to show you how to create a connection between Amazon Redshift Serverless and ThoughtSpot. Connecting Redshift to ThoughtSpot is easy whether you run Redshift as a provisioned cluster or serverless.

Best practices of migrating Hive ACID Tables to BigQuery

Are you looking to migrate a large number of Hive ACID tables to BigQuery? ACID-enabled Hive tables support transactions that accept update and delete DML operations. In this blog, we will explore migrating Hive ACID tables to BigQuery. The approach works for both compacted (major / minor) and non-compacted Hive tables. Let’s first understand the term ACID and how it works in Hive. ACID stands for four traits of database transactions: atomicity, consistency, isolation, and durability.

Fraud Detection with Cloudera Stream Processing

This video shows how Cloudera DataFlow, powered by Apache NiFi, solves the first-mile problem by making it easy and efficient to acquire, transform, and move data, so that streaming analytics use cases can be enabled with very little effort. It also briefly discusses the advantages of running this flow in a cloud-native Kubernetes deployment of Cloudera DataFlow. Then, we explore how to run real-time streaming analytics with Apache Flink, using the Cloudera SQL Stream Builder GUI to create streaming jobs in SQL alone (no Java/Scala coding required).