
Data Pipelines

Automating Data Pipelines in CDP with CDE Managed Airflow Service

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation, along with a first-class job management API, many of our customers have been able to quickly deploy, monitor, and manage the life cycle of their Spark jobs with ease. In addition, we allowed users to automate their jobs based on a time-based schedule.

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Every enterprise is trying to collect and analyze data to get better insights into its business. Whether they are consuming log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis. This whole architecture made a lot of sense when there was a consistent and predictable flow of data to process.

CNC: The journey from Excel spreadsheets to automated data pipelines and fast, reliable insights

Founded in 1991, CNC (Czech News Center) is one of the largest media companies in the Czech Republic. They offer dozens of print and online publications to the Czech market, including Blesk, Aha!, and E15. A commitment to journalistic integrity has enabled their growth, and they now reach millions of readers. They are currently undergoing a vast digitalization process with the aim of becoming the fastest-growing and largest media house in the Czech Republic.

AWS Data Pipeline Best Practices

Knowing best practices for Amazon Web Services (AWS) data pipelines is essential for modern companies handling large datasets and requiring secure ETL (Extract, Transform, Load) processes. In this article, we discuss AWS data pipeline best practices to ensure top performance and streamlined processes — without complications that can impede the execution of data transfer.

Create a Salesforce ETL Pipeline in 30 Minutes

Salesforce is one of the world’s most popular CRM (customer relationship management) software platforms, helping businesses of all sizes and industries beat their competitors and better serve their clients. But instead of keeping your Salesforce data inside the CRM platform itself, you can make better use of this information by moving it into a target data warehouse.

Get control over your data pipelines with data orchestration

Enterprises are tapping into big data to get ahead of the competition. As Peter Sondergaard, ex-Executive Vice President at Gartner, said: "Information is the oil of the 21st century, and analytics is the combustion engine." The problem with the combustion engine is that it does not scale well. As companies grow, the data platforms they previously relied on for analytics start to break apart.

Modernizing Data Pipelines using Cloudera Data Platform - Part 1

Data pipelines are in high demand in today’s data-driven organizations. As critical elements in supplying trusted, curated, and usable data for end-to-end analytic and machine learning workflows, the role of data pipelines is becoming indispensable. To keep up, data pipelines are being vigorously reshaped with modern tools and techniques.