
Cloudera Data Engineering 2021 Year End Review

Since the release of Cloudera Data Engineering (CDE) more than a year ago, our number one goal has been operationalizing Spark pipelines at scale with first-class tooling designed to streamline automation and observability. In working with thousands of customers deploying Spark applications, we saw significant challenges in managing Spark as well as in automating, delivering, and optimizing secure data pipelines.

How To Use Change Data Capture with Integrate.io

Change data capture (CDC) is a crucial but widely underappreciated feature that forms the backbone of modern ETL workloads. Without knowing which data has changed since you last accessed it, you would be forced to extract all the data from a source table or database every time you perform data integration, which would be a tremendously inefficient process.
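To make the idea concrete, here is a minimal sketch of one common CDC approach: a timestamp (or "high-watermark") query that pulls only the rows modified since the last sync. It is an illustration only, not how Integrate.io implements CDC. The `customers` table, its `updated_at` column, and the `extract_changes` and `demo` functions are hypothetical names chosen for this example, and an in-memory SQLite database stands in for the real source.

```python
import sqlite3


def extract_changes(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark.

    Assumes a hypothetical `customers` table whose `updated_at` column is
    bumped on every insert or update.
    """
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Keep the old watermark if nothing changed since the last sync.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


def demo():
    # In-memory database standing in for the real source system.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", 100), (2, "Grace", 105)],
    )

    # First sync: watermark 0 means every row is new.
    first, watermark = extract_changes(conn, 0)

    # Only one row changes before the second sync.
    conn.execute(
        "UPDATE customers SET name = ?, updated_at = ? WHERE id = ?",
        ("Grace H.", 110, 2),
    )
    second, watermark2 = extract_changes(conn, watermark)
    conn.close()
    return first, watermark, second, watermark2
```

In this sketch the second sync reads a single row instead of the whole table. A timestamp query like this cannot see deleted rows, which is one reason log-based CDC is often preferred in practice.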

Recognizing Organizations Leading the Way in Data Security & Governance

The right set of tools helps businesses use data to drive insights and value. But balancing a strong layer of security and governance with easy access to data for all users is no easy task. Retrofitting existing solutions to ever-changing policy and security demands is one option. Another, more rewarding option is to build centralized data management, security, and governance into data projects from the start.

10 Best Practices for Building a Good API

APIs are being created faster than ever before with ever-advancing technologies such as Node.js and AngularJS. With the flexibility in design and integrations for APIs, there has never been a more exciting time to be an API developer. However, with so many new technologies and methods of creating APIs comes the question, "What makes a good API?" While the increase in API creation has many advantages for businesses in multiple areas, it also leaves more room for low-quality APIs.

Leveraging BigQuery Audit Log pipelines for Usage Analytics

In the BigQuery Spotlight series, we talked about Monitoring. This post focuses on using Audit Logs for deep dive monitoring. BigQuery Audit Logs are a collection of logs provided by Google Cloud that provide insight into operations related to your use of BigQuery. A wealth of information is available to you in the Audit Logs. Cloud Logging captures events which can show “who” performed “what” activity and “how” the system behaved.

Is SSIS a Good ETL Tool?

ETL (Extract, Transform, and Load) is a well-known data integration process. There is an overwhelming number of tools that you can use (one of which is SSIS), and it can be difficult to choose between them. What exactly is SSIS, and how can it help your company perform ETL better than you ever have before? This article will explain the major features of SSIS, demonstrate the pros and cons of implementing it, and advise as to when you might be better off with a different ETL tool.

Modern Data Stack using Integrate.io for the ELT

Integrate.io is a company that provides an ELT (Extract, Load, and Transform) data stack. Transformations can be done using dbt, which stands for data build tool. Integrate.io is then used again to push the data into systems like Salesforce. This setup gives you better control over your data and provides a cost-effective solution.