
September 2022

Complete ETL Process Overview (design, challenges and automation)

The Extract, Transform, and Load process (ETL for short) is a set of procedures in the data pipeline. It collects raw data from its sources (extracts), cleans and aggregates the data (transforms), and saves the data to a database or data warehouse (loads), where it is ready to be analyzed. A well-engineered ETL process provides real business value and benefits such as novel business insights: the entire ETL process brings structure to your company's information.
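To make the three stages concrete, here is a minimal sketch of an ETL job in Python. The source file orders.csv, its columns, and the local SQLite warehouse are hypothetical stand-ins for illustration, not part of any specific platform.

# Minimal ETL sketch (illustrative only). Assumes a hypothetical
# "orders.csv" source file with customer_id and amount columns,
# and a local SQLite database standing in for a data warehouse.
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from the source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: clean types and aggregate revenue per customer.
    totals = {}
    for row in rows:
        customer = row["customer_id"].strip()
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return list(totals.items())

def load(records, db_path="warehouse.db"):
    # Load: write the aggregated result into a warehouse table.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS revenue (customer_id TEXT, total REAL)")
    con.executemany("INSERT INTO revenue VALUES (?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))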

Star Schema vs Snowflake Schema and the 7 Critical Differences

Star schemas and snowflake schemas are the two predominant types of data warehouse schemas. A data warehouse schema refers to the shape your data takes - how you structure your tables and their mutual relationships within a database or data warehouse. Since the primary purpose of a data warehouse (and other Online Analytical Processing (OLAP) databases) is to provide a centralized view of all the enterprise data for analytics, data warehouse schemas help us achieve superior analytic results.
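As a rough illustration of the star shape, the sketch below builds a tiny star schema in SQLite: one central fact table surrounded by denormalized dimension tables. The table and column names are invented for the example.

# Star schema sketch: a fact table referencing denormalized dimension tables.
# Names are illustrative, not taken from the article.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (
    sale_id    INTEGER PRIMARY KEY,
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units      INTEGER,
    revenue    REAL
);
""")

# A snowflake schema would normalize the dimensions further, e.g. moving
# category out of dim_product into its own dim_category table.
query = """
SELECT d.year, p.category, SUM(f.revenue)
FROM fact_sales f
JOIN dim_date d    ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.year, p.category;
"""
print(con.execute(query).fetchall())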

8 Ways You Can Reduce the Costs of Your Data Operations

Don’t sacrifice scalability for savings - have it both ways. When left unchecked, the cumulative costs of your company’s data can ramp up fast, from training CPU-intensive machine learning algorithms that aren’t used in production to supporting enormous databases that store every minute event “just in case”. Letting your data operating costs run without checks and balances can quickly cause them to bloat beyond your allocated budgets.

Data Monetization: What it is & How to do it

Whether you are trying to establish a new niche, carve out a bigger share of a mature and competitive market, or increase the value of your existing products or services, data can be used to gain a competitive advantage and increase your revenue. In this article, we will establish what data monetization is and how it translates to business use cases, showcase how it impacts business performance, and offer guidance on how to start monetizing data today.

G2 Fall 2022 Reports: Keboola rated top in 7 categories

Since the beginning, Keboola has been designed to be a world-class data platform as a service. Based on this season’s G2 reports, we have succeeded yet again. The purpose of the Keboola platform is to make our customers’ data processing simple, reliable, and transparent throughout the company. We love our customers and value the feedback they've given our teams over the years.

4 Best Data Lineage Tools in 2022

The modern enterprise taps into over 400 different data sources to extract the insights that sharpen its competitive edge. The complexity, though, does not stop at the origin, where data is generated. To get valuable insights from raw data, enterprises must extract data from its sources, transform the data (clean and aggregate it), and finally load the data into a data warehouse or BI tool, where it is served to data scientists for analysis.

Data Mesh Architecture Through Different Perspectives

We previously wrote about how the data mesh architecture arose as an answer to the problems of the monolithic centralized data model. To recap, in centralized data models, ETL or ELT data pipelines collect data from various enterprise data sources and ingest it into a single central data lake or data warehouse. Data consumers and business intelligence tools access the data from that central storage to drive insights and inform decision-making.