
September 2022

Complete ETL Process Overview (design, challenges and automation)

The Extract, Transform, and Load process (ETL for short) is a set of procedures in the data pipeline. It collects raw data from its sources (extracts), cleans and aggregates the data (transforms), and saves the data to a database or data warehouse (loads), where it is ready to be analyzed. A well-engineered ETL process provides true business value and benefits such as novel business insights: the entire ETL process brings structure to your company’s information.
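
To make the three stages concrete, here is a minimal Python sketch of the pattern (file, table, and column names are invented for illustration; a production pipeline would run on a proper orchestrator and warehouse):

```python
import csv
import sqlite3

# Extract: read raw order records from a CSV export (path is hypothetical).
with open("raw_orders.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: drop incomplete records and aggregate revenue per customer.
totals = {}
for row in rows:
    if not row.get("customer_id") or not row.get("amount"):
        continue
    totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + float(row["amount"])

# Load: write the aggregates into an analytics table, ready to be analyzed.
conn = sqlite3.connect("warehouse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS customer_revenue (customer_id TEXT PRIMARY KEY, revenue REAL)"
)
conn.executemany("INSERT OR REPLACE INTO customer_revenue VALUES (?, ?)", totals.items())
conn.commit()
conn.close()
```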

Star Schema vs Snowflake Schema and the 7 Critical Differences

Star schemas and snowflake schemas are the two predominant types of data warehouse schemas. A data warehouse schema refers to the shape your data takes - how you structure your tables and their mutual relationships within a database or data warehouse. Since the primary purpose of a data warehouse (and other Online Analytical Processing (OLAP) databases) is to provide a centralized view of all the enterprise data for analytics, data warehouse schemas help us achieve superior analytic results.
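
As a rough illustration of the difference, here is a hypothetical star schema sketched with SQLite (table and column names are invented): a central fact table surrounded by denormalized dimension tables. A snowflake schema would normalize the dimensions further, for example splitting category out of dim_product into its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, kept denormalized in a star schema.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: numeric measures plus a foreign key to each dimension,
-- forming the "star" with the fact table at the center.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units_sold  INTEGER,
    revenue     REAL
);
""")
```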

Data Governance and Strategy for the Global Enterprise

While the word “data” has been common since the 1940s, managing data’s growth, current use, and regulation is a relatively new frontier. Governments and enterprises are working hard today to figure out the structures and regulations needed around data collection and use. According to Gartner, by 2023 65% of the world’s population will have their personal data covered under modern privacy regulations.

8 Ways You Can Reduce the Costs of Your Data Operations

Don’t sacrifice scalability for savings - have it both ways. When left unchecked, the cumulative costs of your company’s data can ramp up fast, from training CPU-intensive machine learning algorithms that aren’t used in production to supporting enormous databases that store every minute event “just in case”. Letting your data operating costs run without checks and balances can quickly bloat them beyond your allocated budgets.

What You Should Know About Corporate Loyalty and IT

This is a guest post with exclusive content by Bill Inmon. Bill “is an American computer scientist recognized by many as the father of the data warehouse. Inmon wrote the first book, held the first conference, wrote the first column in a magazine, and was the first to offer classes in data warehousing.” -Wikipedia. Inmon lays out the five critical considerations for corporate loyalty.

Cloudera DataFlow Functions for Public Cloud powered by Apache NiFi

Since its initial release in 2021, Cloudera DataFlow for Public Cloud (CDF-PC) has been helping customers solve data distribution use cases that need high throughput and low latency and therefore require always-running clusters. CDF-PC’s DataFlow Deployments provide a cloud-native runtime for running your Apache NiFi flows on auto-scaling Kubernetes clusters, along with centralized monitoring and alerting and an improved SDLC for developers.

Domino's new secret sauce? Real-time data & analytics with Talend

Domino’s Pizza, one of the world’s top restaurant brands, already knows how to translate data into great customer experiences and stronger sales with Talend. Over the past five years, the company has used Talend to integrate hundreds of data sources into a single source of customer information — and has harnessed that data to improve everything from personalized promotions to logistics to financial forecasting. The latest ingredient to their success?

Achieving Product Analytics Maturity in Only 4 Steps

“What should you and your business focus on when trying to create better customer journeys and beat the competition?” That was the question we asked Countly data captains (also known as Countly customers) when trying to determine how well they collect customer experience metrics and how well they use that data to make data-driven decisions.

Serverless NiFi Flows with DataFlow Functions: The Next Step in the DataFlow Service Evolution

Cloudera DataFlow for the Public Cloud (CDF-PC) is a cloud-native service for Apache NiFi within the Cloudera Data Platform (CDP). CDF-PC enables organizations to take control of their data flows and eliminate ingestion silos by allowing developers to connect to any data source anywhere with any structure, process it, and deliver to any destination using a low-code authoring experience.

Announcing GA of DataFlow Functions

Today, we’re excited to announce that DataFlow Functions (DFF), a feature within Cloudera DataFlow for the Public Cloud, is now generally available for AWS, Microsoft Azure, and Google Cloud Platform. DFF provides an efficient, cost-optimized, scalable way to run NiFi flows in a completely serverless fashion. This is the first complete no-code, no-ops development experience for functions, allowing users to save time and resources.

How to Build a Google Sheets Sales Dashboard in 5 Easy Steps

Keeping track of your sales team’s numbers and metrics can be tedious and time-consuming. If you’re a small team with limited resources or just looking for more flexibility in your data management, Google Sheets can be a great place to host your sales data. Spreadsheets are customizable, scalable, and easily accessible.
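
As a taste of what that can look like, here is a small sketch using the gspread library; it assumes a Google service-account key file and a spreadsheet named "Sales Data" with Date, Account, Stage, and Amount columns (all invented for illustration):

```python
import gspread

# Authenticate with a Google service account (key path is hypothetical;
# the spreadsheet must be shared with the service account's email).
gc = gspread.service_account(filename="service_account.json")
ws = gc.open("Sales Data").sheet1

# Append a new deal, then read everything back for quick metrics.
ws.append_row(["2022-09-15", "Acme Corp", "Closed Won", 12500])
deals = ws.get_all_records()

# A simple dashboard number: total closed-won revenue.
won = sum(d["Amount"] for d in deals if d["Stage"] == "Closed Won")
print(f"Total closed-won revenue: {won}")
```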

Data Monetization: What it is & How to do it

Whether you are trying to establish a new niche, carve out a bigger share of a mature and competitive market, or increase the value of your existing products or services, data can be used to gain a competitive advantage and increase your revenue. In this article, we will establish what data monetization is, how it translates to business use cases, showcase how it impacts business performance, and offer guidance on how to start monetizing data today.

The Ultimate Guide to Choosing the Best JavaScript Charting Library

Charting libraries are in great demand, and their creation and use are becoming increasingly popular in languages such as JavaScript. As evidence, several JavaScript charting libraries are available, both commercial and open-source, with a wide range of functionalities to meet the demands of users. But how can a developer make an informed decision and choose the best JavaScript charting library? It's a difficult question, but we're here to assist!

How to add multiple charts to a report

In this video you will learn how to add multiple charts, or visualizations, to a single data table as you build a report. You'll learn about using the Auto Chart feature, as well as manually selecting your own chart types. In addition to adding charts, you will learn how to add text, graphics, and images to your report. Once you are finished adding charts and other visual elements, you will learn how to properly save your multi-chart report.

Simplify Data Access Control | infoSecur

In this episode of “Powered by Snowflake” host Daniel Myers sits down with infoSecur’s Founder and CEO, Michael Magalsky. infoSecur is a centralized tool, used across all structured data environments and database sources to manage data policies and access down to the cell level across your data cloud. The “Powered by Snowflake” video series features conversations with technology leaders who are building businesses and applications on top of Snowflake.

The Top Three Entangled Trends in Data Architectures: Data Mesh, Data Fabric, and Hybrid Architectures

Data teams have the impossible task of delivering everything (data and workloads) everywhere (on premise and in all clouds) all at once (with little to no latency). They are being bombarded with literature about seemingly independent new trends like data mesh and data fabric while dealing with the reality of having to work with hybrid architectures. Each of these trends claims to be a complete data architecture model that solves the “everything everywhere all at once” problem.

The Data Challenge Nobody's Talking About: An Interview from CDAO UK

Chief Data & Analytics Officer UK (CDAO UK) is the United Kingdom’s premier event for senior data and analytics executives. The three-day event, with more than 200 attendees and 50+ industry-leading speakers, was packed with case studies, thought leadership, and practical advice around data culture, data quality and governance, building a data workforce, data strategy, metadata management, AI/MLOps, self-service strategies, and more.

Modern Marketing Data Stack: Best Practices from Analyzing Snowflake's Customer Base

Are you efficiently unifying, modeling, analyzing, and activating all the data you need to drive impactful marketing campaigns and customer experiences? For years, marketing teams have struggled to operate from a single view of the customer and their business, essential to powering personalized experiences and measuring impact on KPIs such as sales, growth, and profitability. Today, only half of all marketers have a unified view of the customer.

Built with BigQuery: BigQuery ML enables Faraday to make predictions for any US consumer brand

In 2022, digital natives and traditional enterprises find themselves with a better understanding of data warehousing, protection, and governance. But machine learning and the ethical application of artificial intelligence and machine learning (AI/ML) remain open questions, promising to drive better results if only their power can be safely harnessed.

What Does Bad Data Cost You & How to Make Bad Data Healthier?

Data fuels the growth of any modern organization, but what happens when the fuel goes bad? The growth stops. Data-driven enterprises rely heavily on their collected information to make important business decisions, but if this information contains errors, the organization may have to suffer huge losses. In 2021, Gartner reported that organizations incur an average loss of USD 12.9 million due to poor-quality data.

Yellowfin 9.8 Release Highlights

Introduced in 9.7 as the simplest way to ask questions of your data, Yellowfin 9.8 delivers exciting updates to Guided NLQ, and additional improvements to our report builder. The latest release makes Guided NLQ even more powerful and simpler to use, with new question types (Cross-tab), more intuitive questions with field synonyms and default date periods, and protection against long-running queries - in addition to faster report building.

Accelerate data modernization initiatives with Talend Change Data Capture

In times of economic uncertainty, businesses need to get the most value from their data, while minimizing pressure on their systems and databases. But with the exponential growth of the variety and volume of data, extracting business value out of that data is only getting more difficult. The result is data transfer latency, data loss, and a high cost of managing data and data sources, leading to an inability to use data to make high-ROI business decisions.

G2 Fall 2022 Reports: Keboola rated top in 7 categories

Since the beginning, Keboola has been designed to be a world-class data platform as a service. Based on the results of this season’s G2 statistics, we have succeeded yet again. The purpose of the Keboola platform is to make our customers’ data processing simple, reliable, and transparent throughout the company. We love our customers and value the feedback they've given our teams over the years.

2023 CDP Platform Guide: What to Look for in a Customer Data Platform and Our Favorite CDPs

The better you understand your customers, the better you can optimize and focus your marketing and products. This is where a customer data platform (CDP platform) can help. A CDP platform automates the process of aggregating and analyzing historical and real-time customer data from the widest variety of sources. Then it transforms that data into accurate and actionable business insights.

Demo: Unravel Data - Automated Troubleshooting for Job Failures

For DataOps teams, job failures are common. But finding the issue is (traditionally) where things get even worse. It can take hours or days to troubleshoot a job failure. Unravel Data provides a single view where DataOps teams can locate exactly where–and why–a job failed, along with precise recommendations to troubleshoot the error. DataOps teams are now able to both diagnose and troubleshoot job failures in minutes instead of days or weeks.

Demo: Unravel Data - Data Pipeline Optimization (The Easy Way)

Data pipelines fail all the time for a variety of reasons: service downtime, data volume fluctuations, and more. Diagnosing these failures manually is very difficult and time-consuming. Unravel Data allows DataOps teams to troubleshoot pipeline failures automatically – showing exactly where and why a pipeline failed, and precise recommendations to remedy the issues. Using Unravel, DataOps teams can now diagnose and fix data pipeline failures in a fraction of the time.

Demo: Unravel Data - Code-Level Insights for DataOps Teams

To ensure that jobs are running optimally, DataOps teams need to look at the detailed code. But DataOps teams don’t have the right tools to easily examine problematic code - or a simple path to optimizing it. With Unravel Data, DataOps teams can quickly troubleshoot applications that are throwing errors - all the way down to a specific line of problematic code. All in a single view.

Demo: Unravel Data - Allocating Costs with Precision Using the Enhanced Chargeback Report

DataOps teams need to understand where costs are going. But the reports provided by cloud vendors aren’t very granular - and they only get the reports after excess costs have been racked up. Unravel allows DataOps teams to understand where costs are going at a detailed level: by user, by service, by department. This information is captured and available as soon as a cluster is detected – allowing DataOps teams to take action and optimize in real time.

Demo: Unravel Data - Automated Budget Tracking to Prevent Overruns

DataOps teams need to be able to set budgets at a specific scope - and know whether their various teams or departments are tracking to those budgets. But today, most DataOps teams only learn that a budget was overrun after it’s too late. With Unravel, establishing and tracking budgets to prevent overruns is easy.

Demo: Unravel Data - Keep DataOps Budgets On Track (Automatically)

DataOps teams are paying tremendous amounts of money for cloud instances that were spun up and then forgotten. For larger enterprises, this can equate to millions of dollars in waste. But even for smaller teams, this type of inefficiency is not acceptable. With Unravel, your DataOps team can set a budget/cost threshold (or a time duration), to ensure that you keep your budgets on track. Receive alerts as soon as an instance hits your predefined threshold - instead of discovering it in your monthly bills.

Demo: Unravel Data - A Unified View for Data App Performance Details

Today, DataOps teams have to correlate data from far too many point tools. DataOps observability is far too cumbersome; the manual effort to optimize data apps takes time that DataOps teams simply don’t have. With Unravel’s AI-enabled platform, all of this disparate data is pulled together into a unified view of data app performance; every detail in a single view. View configurations, logs, and errors… all in one place.

Demo: Unravel Data - Tuning Data App Performance Automatically

Optimizing data apps shouldn’t be trial and error. This takes nights and weekends away from DataOps teams - and it’s incredibly inefficient. Unravel provides an “expert in a box” feature, driven by AI, that provides DataOps teams with tangible insights and recommendations to optimize data apps. Need to fix a bottleneck to meet an SLA? Trying to improve the overall efficiency of data pipelines? Unravel makes this easy with specific, automated recommendations (all the way down to the code-level) to tune your data apps for better performance.

Demo: Unravel Data - Optimizing Cloud Costs at the Cluster Level

Most DataOps teams have a huge opportunity when it comes to optimizing their cloud costs. Today, the standard for success of many developers is ensuring that their jobs are running at all costs. The efficiency of those jobs isn’t the top priority. With Unravel, DataOps teams can optimize cloud costs by rightsizing their clusters. Unravel makes it easy to identify clusters that are consuming a large percentage of resources, and drill down to see automatic recommendations to improve the efficiency of those clusters.

Demo: Unravel Data - Map Your Workloads to the Cloud (and Calculate Costs)

When a data team is migrating applications to the cloud, they’ll need to anticipate how many resources those apps will consume. This can often take a DataOps team into unfamiliar territory, since on-prem applications are assessed very differently from a utilization standpoint. This information is critical to inform the cloud architecture - and to anticipate the total cost of ownership for the cloud migration.

Unravel: DataOps Observability Designed for Data Teams

Today every company is a data company. And even with all the great new data systems and technologies, it’s people—data teams—who unlock the power of data to drive business value. But today’s data teams are getting bogged down. They’re struggling to keep pace with the increased volume, velocity, variety, complexity—and cost—of the modern data stack. That’s where Unravel DataOps observability comes in. Designed specifically for data teams, Unravel gives you the observability, AI, and automation to help you understand, optimize and govern your data estate—for performance, cost, and quality.

Demo: Unravel Data - Preparing for Cloud Migration with Automated Cluster Discovery

One of the first steps of any cloud migration is creating an inventory of the applications and services that are currently being used. Today, that involves a lot of manual interviews with people from across the business to understand the needs behind each cluster. This process, as you can imagine, is incredibly prone to errors and miscommunications that can negatively impact migration planning efforts.

Demo: Unravel Data - How to Avoid Tuning & Replatforming Delays

How can your DataOps team anticipate bottlenecks that might occur during a cloud migration? One of the most common issues is version incompatibility. On-prem environments tend to run older instances of applications (vs. newer cloud environments) - which means that your team will need to consider any incompatible code before migrating.

Cost of Data Warehousing: Conventional Wisdom Versus Reality

This is a guest post with exclusive content by Bill Inmon. Bill Inmon is a prominent American computer scientist and prolific author, recognized by many as the father of data warehousing. Inmon has written over 60 books, including the first book exploring the core concepts of data warehouses. Inmon also held the first data warehousing conference and has written for many respected data management publications, as well as offering classes in data warehousing.

Improve Underwriting Using Data and Analytics

Insurance carriers are always looking to improve operational efficiency. We’ve previously highlighted opportunities to improve digital claims processing with data and AI. In this post, I’ll explore opportunities to enhance risk assessment and underwriting, especially in personal lines and small and medium-sized enterprises.

How to Become a Data Economy Leader: The Rise of the CDO (Chief Data Officer)

The catalyst of innovation and transformation is data. The companies that recognize the power of data and wield it to drive business transformation are seeing positive impacts on their business outcomes, as indicated in our report, How to Win in the Data Economy. We surveyed 1,000 senior business and technology executives to gauge the impact the data industry is having on their businesses, and to what extent companies are embracing the opportunity to become data leaders.

The case for a query modification language and why dashboards are dead

In 1895, a German physicist was trying to determine if he could observe cathode rays escaping from a glass tube and noticed an unexpected glow on a fluorescent screen several feet away. On further examination, it turned out to be a different kind of radiation that we now know as X-ray. Fast forward to today and you can’t even imagine diagnosing many medical problems without the X-ray.

Defending your customer's data - René Waslo

This episode features an interview with René Waslo, Risk and Financial Advisory Principal at Deloitte & Touche. She works as a cyber professional within the Energy, Resources and Industrials sector. In this episode, René talks about zero trust, trends in security breaches, sustainability in cyber, and encouraging women to enter the cyber industry.

Activate your data: How to get started with ThoughtSpot Sync

Every data team wants to make insights more actionable for frontline business users. The only question is how. You know they spend the majority of their time in business-critical tools like HubSpot, Slack, and Microsoft Teams. So why not bring the data-driven insights created in ThoughtSpot to the apps they use most? With ThoughtSpot Sync, you can. Starting today, ThoughtSpot customers will be able to send insights directly from ThoughtSpot to Google Sheets, Slack, and Microsoft Teams.

Where Is Your Customer Data Located?

Modern organizations have multiple touch points continuously collecting customer data. Data collection is essential for firms that use it for personalized marketing campaigns and improving customer experience. Analytics provided by this data help enterprises observe customer behavior and make critical business decisions. However, before firms can explore any use cases, it is crucial for them to recognize the data touch points where vital information is collected.

What's new in ThoughtSpot Analytics Cloud 8.7.0

Want to bring the data-driven insights created in ThoughtSpot to the apps your teams use most? With this month's release of ThoughtSpot Analytics Cloud 8.7.0.cl, we're launching ThoughtSpot Sync that lets you operationalize your insights by sending data directly to tools like Slack, Microsoft Teams, and Google Sheets. Watch this video to learn more about ThoughtSpot Sync, along with other new features like Liveboard tabs and threshold-based alerts in SpotIQ Monitor.

The Fascinating History of Data Visualization

Data visualization is an elementary component of human learning and understanding. It's also a significant capability of business intelligence (BI) and analytics solutions today. So, how did data visualization first come to be? The humble origins and gradual evolution of data visualization have led to an interesting timeline.

Building an Automated ML Pipeline with a Feature Store Using Iguazio & Snowflake

When operationalizing machine and deep learning, a production-first approach is essential for moving from research and development to scalable production pipelines in a much faster and more effective manner. Without the need to refactor code, add glue logic, and spend significant effort on data and ML engineering, more models will make it to production, with fewer issues like drift.

Data Vault Techniques on Snowflake: Streams and Tasks on Views

Snowflake removes the need to perform maintenance tasks on your data platform and provides you with the freedom to choose your data model methodology for the cloud. When attempting to keep the cost of data processing low, both data volume and velocity can make things challenging.

SCIM (System for Cross-domain Identity Management)

The identity team at Cloudera has been working to add System for Cross-domain Identity Management (SCIM) support to Cloudera Data Platform (CDP), and we’re happy to announce the general availability of SCIM on Azure Active Directory! Part One, CDP SCIM Support for Active Directory, covers the core elements of CDP’s SCIM support for Azure AD.
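
For readers new to the protocol, the sketch below shows a generic SCIM 2.0 user-provisioning request as defined by RFC 7643/7644; the base URL and token are placeholders, not CDP’s actual endpoint:

```python
import requests

BASE_URL = "https://example.com/scim/v2"  # placeholder SCIM endpoint
TOKEN = "your-bearer-token"               # placeholder credential

# A minimal user resource using the SCIM 2.0 core schema (RFC 7643).
user = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "jdoe@example.com",
    "name": {"givenName": "Jane", "familyName": "Doe"},
    "emails": [{"value": "jdoe@example.com", "primary": True}],
    "active": True,
}

# Identity providers such as Azure AD send requests like this to keep
# the downstream service's user directory in sync.
resp = requests.post(
    f"{BASE_URL}/Users",
    json=user,
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/scim+json"},
)
resp.raise_for_status()
print("Provisioned user id:", resp.json()["id"])
```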

A Complete Guide to ETL test automation

In these busy times that we live in, technologies are expanding, evolving, and transforming every day. We are frequently introduced to new technical terms, giving us a sense that digital space has become an integral part of our lives. Whenever we talk about technology, we may change the operating methods or the overall interaction process, but one thing remains constant: the data. In other words, whatever we do with technology, we are constantly generating data.
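
To make “ETL test automation” concrete before diving in, here is a minimal pytest sketch of three typical checks (completeness, integrity, and reconciliation); the database paths and table names are invented:

```python
import sqlite3

import pytest

SOURCE_DB = "source.db"     # hypothetical staging database
TARGET_DB = "warehouse.db"  # hypothetical warehouse

def q(db, sql):
    """Run a single-value query and return the result."""
    with sqlite3.connect(db) as conn:
        return conn.execute(sql).fetchone()[0]

def test_row_counts_match():
    # Completeness: every source record should land in the target.
    src = q(SOURCE_DB, "SELECT COUNT(*) FROM orders")
    tgt = q(TARGET_DB, "SELECT COUNT(*) FROM orders")
    assert src == tgt

def test_no_null_business_keys():
    # Integrity: business keys must survive the transform intact.
    assert q(TARGET_DB, "SELECT COUNT(*) FROM orders WHERE order_id IS NULL") == 0

def test_totals_reconcile():
    # Accuracy: aggregates should match to within floating-point error.
    src = q(SOURCE_DB, "SELECT SUM(amount) FROM orders")
    tgt = q(TARGET_DB, "SELECT SUM(amount) FROM orders")
    assert tgt == pytest.approx(src)
```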

About the State of Value Stream Management in 2022

Value Stream Management (VSM) is about empowering delivery organizations to measure, mitigate, and monitor complexity. Simply put, it aims at improving the flow of value in your organisation. The VSM Consortium recently released their highly anticipated report, “The State of Value Stream Management 2022”. In this post we recap some of the findings and look at them specifically from a software engineering and platform engineering point of view.

Why ETL is Critical for Ecommerce Data Success & How to Start

It’d be hard to find anyone who’d say that taking a data-driven approach to business decisions is not worthwhile. Yet, so many businesses aren’t doing it because, as simple as it may sound on paper, it takes a great deal of strategic planning to pull off. One of the most crucial tools when it comes to accomplishing a data-driven decision-making process is known as ETL.

How To Deploy a HuggingFace Model (Seamlessly)

What if I want to serve a Hugging Face model on ClearML? Where do I start? In general, machine learning engineers know by now that a good model serving engine is invaluable when serving models in production. These days, NVIDIA’s Triton inference engine is a popular option to do so, but it is lacking in some respects.
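
For orientation, serving ultimately means wrapping inference in an endpoint. The sketch below is a plain FastAPI wrapper around a public Hugging Face sentiment model, shown for illustration only; it is not ClearML Serving’s or Triton’s actual deployment API:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# Load the model once at startup. The checkpoint is a public example;
# swap in your own fine-tuned model as needed.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

app = FastAPI()

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest):
    # Returns e.g. {"label": "POSITIVE", "score": 0.99}.
    return classifier(req.text)[0]

# Run locally with: uvicorn serve:app --port 8080
```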

Our reflections on the 2022 Gartner Magic Quadrant for Data Integration Tools

In its 2022 Magic Quadrant™ for Data Integration Tools report, Gartner® observes that “organizations are increasingly seeking a comprehensive range of improved data integration capabilities to modernize their data, analytics and application infrastructures.”

The Biggest Mistake in E-Commerce: More Data Means More Business Value

This is a guest post for Integrate.io written by Bill Inmon, an American computer scientist recognized as the "father of the data warehouse." Inmon wrote the first book and first magazine column about data warehousing, held the first conference about this topic, and was the first person to teach data warehousing classes.

4 Best Data Lineage Tools in 2022

The modern enterprise taps into over 400 different data sources to extract the insights that sharpen its competitive edge. The complexity, though, does not stop at the origin, where data is generated. To get valuable insights from raw data, enterprises must extract data from its source, transform the data (clean and aggregate it), and finally load the data into a data warehouse or BI tool, where it is served to data scientists for analysis.

The Critical Elements of Effective BI Dashboards

A business intelligence (BI) dashboard is an essential tool for managing and monitoring your business data. Well-designed BI dashboards can help you quickly and easily understand your company’s performance, identify trends, and make informed decisions. In this article, we will discuss seven key elements of an effective business intelligence dashboard. By understanding these elements, you can create a powerful tool that helps you manage your business data effectively.

A Flexible and Efficient Storage System for Diverse Workloads

Apache Ozone is a distributed, scalable, and high-performance object store, available with Cloudera Data Platform (CDP), that can scale to billions of objects of varying sizes. It was designed as a native object store to provide extreme scale, performance, and reliability to handle multiple analytics workloads using either S3 API or the traditional Hadoop API.

9 best practices and tips to follow for effective data visualization

Visualizing data is an important aspect of presenting insights clearly. But it's not always easy to create an effective visualization that people will understand at first glance, or even the second. So how do you create the kinds of graphs and tables that leave key stakeholders thinking, "Wow! I need this information!"? In this post, we will discuss the top nine best practices for data visualization.

Trends and Emerging Technologies in Data Analytics for Manufacturing and Consumer Tech

More data is available than ever, challenging organizations to change how they interact with their data so they can get the most out of it. Ahmed Munir, Lead SAP Functional Technology Architect Manager at Whirlpool Corporation, has 16 years of SAP leadership experience. He joins us to share what he has learned about building great data teams, upcoming trends in data analytics to keep an eye on, and how data teams will evolve over the next 5-10 years.

Introducing Datastream for BigQuery

In today’s competitive environment, organizations need to quickly and easily make decisions based on real-time data. That’s why we’re announcing Datastream for BigQuery, now available in preview, featuring seamless replication from operational database sources such as AlloyDB for PostgreSQL, PostgreSQL, MySQL, and Oracle, directly into BigQuery, Google Cloud’s serverless data warehouse.

Data Mesh Architecture Through Different Perspectives

We previously wrote how the data mesh architecture rose as an answer to the problems of the monolithic centralized data model. To recap, in the centralized data models, ETL or ELT data pipelines collect data from various enterprise data sources and ingest it into a single central data lake or data warehouse. Data consumers and business intelligence tools access the data from the central storage to drive insights and inform decision-making.

DataOps Observability: The Missing Link for Data Teams

As organizations invest ever more heavily in modernizing their data stacks, data teams—the people who actually deliver the value of data to the business—are finding it increasingly difficult to manage the performance, cost, and quality of these complex systems. Data teams today find themselves in much the same boat as software teams were 10+ years ago. Software teams have dug themselves out of that hole with DevOps best practices and tools—chief among them full-stack observability.

Adverity is Powered by Snowflake - and Moving into New Markets with Confidence

What’s harder than finding the right data architecture? Finding the right dedicated partner. Adverity gets both with Snowflake. Learn how the two organizations are moving into new markets and supplying even more reliable marketing data to Adverity customers. When a fast-growing SaaS business looks to expand its client base, it normally encounters two major challenges. In many cases, an external data solution provider can only help solve the scalability challenge.

Introduction to Datastream for BigQuery

Datastream is a serverless and easy-to-use change data capture and replication service that makes it easy to replicate data from operational databases into BigQuery reliably and with minimal latency. In this video, Gabe Weiss, Developer Advocate at Google, discusses setting up real-time replication from Cloud SQL to BigQuery. Watch along and learn how to get started with Datastream for BigQuery!

Real-time Event Streaming For Customer Data | RudderStack

In this episode of “Powered by Snowflake,” host Daniel Myers sits down with RudderStack’s Head of Customer Engineering, Lewis Mbae. RudderStack helps customers ingest, transform, and integrate data into the Data Cloud. This conversation covers the value of the Data Cloud as a central source of truth, the challenges of building an enterprise-grade customer data platform, empowering data engineers, and more.

Why is Customer Feedback so Important for the FinTech Industry?

Some time ago, we covered the key metrics that a Product Manager in a fintech organization should make a top priority when determining their KPIs, breaking them down into five groups: Session-based data, Customer Feedback, Technical Metrics, Action Stats, and Revenue. With that in mind, we conducted a series of surveys on LinkedIn, asking PMs in the fintech industry which of those groups were the most important for them while running digital product analytics.

Demystifying Modern Data Platforms

July brings summer vacations, holiday gatherings, and for the first time in two years, the return of the Massachusetts Institute of Technology (MIT) Chief Data Officer symposium as an in-person event. The gathering in 2022 marked the sixteenth year for top data and analytics professionals to come to the MIT campus to explore current and future trends. A key area of focus for the symposium this year was the design and deployment of modern data platforms.

How a Tour Operation Company Used Data to Improve Their Customer Experience

The customer journey is the decision-making process each buyer goes through before converting to a paying customer of your business. Mastering this journey will require an in-depth understanding of each stage and how you can continually improve your efforts. To understand this journey, you need to take a customer-centric approach, putting yourself in your customer’s shoes to understand their point of view.

7 Key Benefits of Data Visualization Tools

Data visualization is one of the most important capabilities of any business intelligence (BI) and analytics solution. It helps people translate complex data into a visual context, like a chart or a graph, identify trends numbers alone can't easily reveal, and discover hidden patterns in your dashboard. Data visualization also provides a wealth of additional benefits, such as enabling easier understanding of the correlation between operations and results.

Chose Both: Data Fabric and Data Lakehouse

A key part of business is the drive for continual improvement, to always do better. “Better” can mean different things to different organizations. It could be about offering better products, better services, or the same product or service for a better price or any number of things. Fundamentally, to be “better” requires ongoing analysis of the current state and comparison to the previous or next one. It sounds straightforward: you just need data and the means to analyze it.

Three dbt data modeling mistakes and how to fix them

When I first started my role as an analytics engineer, I was tasked with rewriting a bunch of data models that were written in the past by contractors. These models were taking over 24 hours to run and often failed to run at all. They were poorly thought out and contained a bunch of “quick fix” code rather than being designed with the entire flow of the model in mind.

Why is Data Integration Important in a Data Management Process?

Our five key points: Your data management processes are only as effective as the quality of the data you collate. Gaining access to as much data as possible is vital if you want the business-critical insights that can set you apart from the crowd. For Ecommerce businesses, so many of the resources you use are online, such as cloud-based SaaS, ERPs, or CRMs. Integrate.io explains why data integration is such a big part of data management for Ecommerce and the benefits of an intuitive ETL and ELT tool.

Get to anomaly detection faster with Cloudera's Applied Machine Learning Prototypes

The Applied Machine Learning Prototype (AMP) for anomaly detection reduces implementation time by providing a reference model that you can build from. Built by Fast Forward Labs, and tested on AMD EPYC™ CPUs with Dell Technologies, this AMP enables data scientists across industries to truly practice predictive maintenance.

Flex your FitBit stats using OAuth 2 authentication and Talend

We’re back with another Job of the Week – but this time, we’re taking a step back to cover a concept we’ve skipped over in previous segments: OAuth2 authentication. Richard’s demonstrations often show simpler shortcuts to accessing data – but these shortcuts may not always be practical in real-world examples. Never fear! We’ll arm you with the know-how you need to make your data hacks just as impressive in real life.
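
As a preview of the flow, here is a hedged Python sketch of the OAuth2 authorization-code exchange against Fitbit’s Web API; the client credentials, authorization code, and redirect URI are placeholders, and the endpoints should be checked against Fitbit’s current documentation:

```python
import base64

import requests

CLIENT_ID = "YOUR_CLIENT_ID"          # placeholders: register an app at
CLIENT_SECRET = "YOUR_CLIENT_SECRET"  # dev.fitbit.com to get real values
REDIRECT_URI = "http://localhost:8080/callback"
AUTH_CODE = "code-returned-to-your-redirect-uri"

# Exchange the authorization code for an access token (OAuth2 step two).
basic = base64.b64encode(f"{CLIENT_ID}:{CLIENT_SECRET}".encode()).decode()
tokens = requests.post(
    "https://api.fitbit.com/oauth2/token",
    headers={"Authorization": f"Basic {basic}"},
    data={
        "grant_type": "authorization_code",
        "code": AUTH_CODE,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
    },
).json()

# Use the bearer token to pull one day of activity stats.
stats = requests.get(
    "https://api.fitbit.com/1/user/-/activities/date/2022-09-01.json",
    headers={"Authorization": f"Bearer {tokens['access_token']}"},
).json()
print("Steps:", stats["summary"]["steps"])
```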

The Modern Data Lakehouse: An Architectural Innovation

Imagine having self-service access to all business data, anywhere it may be, and being able to explore it all at once. Imagine quickly answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. Imagine independently discovering rich new business insights from both structured and unstructured data working together, without having to beg for data sets to be made available.

Blending Data in the Data Warehouse

This is a guest post with exclusive content by Bill Inmon. Bill Inmon “is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, first magazine column, held the first conference, and was the first to offer classes in data warehousing.” -Wikipedia. Our key points: One of the characteristics of most computing and analytical environments is that the environment consists of only one type of data.

Kubernetes Logs Collection with MiNiFi C++

The MiNiFi C++ agent provides many features for collecting and processing data at the edge. All the strengths of MiNiFi C++ make it a perfect candidate for collecting logs of cloud native applications running on Kubernetes. This video explains how to use the MiNiFi C++ agent as a side-car pod or as a DaemonSet to collect logs from Kubernetes applications. It goes through many examples and demonstrations to get you started with your own deployments. Don’t hesitate to reach out to Cloudera to get more details and discuss further options and integrations with Edge Flow Manager.

Top 10 Essential Types of Data Visualization

Data visualization helps people comprehend and attain insight into big data. It represents complex data in visually interesting ways that assist in our understanding, and paves the way for a greater link between the provided raw data, and our overall engagement with it. Nowadays, we accumulate data in ever-increasing sizes, so we need an intelligent way to understand such vast volumes of information. In analytics, we often use different types of data visualization to convey complex datasets.

Top 10 must-read books for data and analytics leaders in 2022

It’s that time of year - back to school, back to books, and our annual must-read books for data and analytics leaders. Given the pace of change in our industry, continuous learning is a must, whether through networking, podcasting, or reading. To cull this year’s list, I focused mainly on books published in the last two years with the themes of data, analytics and AI. I scoured lists and reviews on Amazon, solicited ideas from social networks and got to reading.

Large Scale Industrialization Key to Open Source Innovation

We are now well into 2022 and the megatrends that drove the last decade in data—The Apache Software Foundation as a primary innovation vehicle for big data, the arrival of cloud computing, and the debut of cheap distributed storage—have now converged and offer clear patterns for competitive advantage for vendors and value for customers.

Optimize Driver Behavior to Reduce Fuel Consumption | Integrate.io

While advancing technology is beginning to bring us electric vehicles, most vehicles on the road still operate on fossil fuels. This includes the commercial vehicles used in major fleet operations. Traditionally, gathering metrics regarding a driver's driving style, vehicle speed, cruise control usage, diagnostics, drive cycle, a vehicle's engine, and other factors has been quite difficult.

Three Must-Read Data and Analytics Books with Tim Harford, Zhamak Dehghani, and Brent Dykes

It is once again that time of year when our host, Cindi Howson shares her favorite data and analytics book recommendations. In this special annual episode, we feature three of the industry’s top data writers, thinkers, and fellow podcasters. Tim Harford comes to the conversation with his new book, The Data Detective, and big-picture ideas about how traits like curiosity serve data scientists so well. Zhamak Dehghani shares her concept of The Data Mesh, especially as it relates to sharing data across business verticals. Finally, Brent Dykes shares lessons from his book, Effective Data Storytelling.

7 Most Common Dashboard Design Mistakes to Avoid

Dashboards are an important data analytics tool for understanding business metrics and managing your business performance. However, if your dashboard is not designed well, it can be difficult to use for analysis, and ineffective for decision-making. Dashboard design should be simple and accessible, and reflect your company's branding and identity. Ensuring your analytical dashboard is readable, usable and accurate is heavily reliant on following best practice dashboard design.

Modern Data Architecture for Telecommunications

In the wake of the disruption caused by the world’s turbulence over the past few years, the telecommunications industry has come out reasonably unscathed. There remain challenges in workforce management, particularly in call centers, and order backlogs for fiber broadband and other physical infrastructure are being worked through. But digital transformation programs are accelerating, services innovation around 5G is continuing apace, and results to the stock market have been robust.

Rolling NIST's Cybersecurity Framework into Action

Data backup is the last line of defense when a cyberattack occurs, especially when the attack is ransomware. With robust data backup technologies and procedures, an organization can return to a point-in-time prior to the attack and return to operations relatively quickly. But as data volumes continue to explode, ransomware attacks are growing more sophisticated and beginning to target that precious backup data and administrator functions.

Managing agents in Edge Flow Manager

This video explains the Agent Manager view introduced with the 1.4 release. The main goal of this view was to give the user better understanding and more control over the agents in the system. Monitoring individual agents’ health becomes easier as you can see rich details about them. From the Agent Details view, you can also request and download debug logs from the agents, so in case of any issues you don’t need to log in to the agent’s environment. The highly customizable main table and the different tabs (details, alerts, commands and properties) are explained in detail.

Burying the Data Warehouse - Why? | Integrate.io

This is a guest post with exclusive content by Bill Inmon. Bill Inmon “is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, first magazine column, held the first conference, and was the first to offer classes in data warehousing.” — Wikipedia. Our critical points: Data warehouses are the whack-a-mole of technology.

5 Insights from Gartner's Hype Cycle for Data Management 2022 Report

As a global leader in technology research, Gartner supports enterprise organizations, non-profits, and government agencies by sharing information and in-depth analysis of emerging technological trends, tools, and products. With the continued growth of big data over the past decade, Gartner has been especially invested in helping data and analytics (D&A) leaders make the right decisions for managing and generating value from data within their organizations.

Five Reasons for Migrating HBase Applications to Cloudera Operational Database in the Public Cloud

Apache HBase has long been the database of choice for business-critical applications across industries. This is primarily because HBase provides unmatched scale, performance, and fault-tolerance that few other databases can come close to. Think petabytes of data spread across trillions of rows, ready for consumption in real-time.

The History of BI Dashboards

Technology is always evolving, and so is the way we use it to collect and analyze data. From humble spreadsheet beginnings to fully automated business monitoring and AI-powered analysis, the range of analytics tools on offer today is astounding to consider. The business intelligence dashboard is one such option. It has existed for decades as a tool for organizations to monitor and analyze operational data - and it's not quite dead yet.

Expert Panel: Challenges with Modern Data Pipelines

Modern data pipelines have become more business-critical than ever. Every company today is a data company, looking to leverage data analytics as a competitive advantage. But the complexity of the modern data stack imposes significant challenges that hinder organizations from achieving their goals and realizing the value of their data.

Tackling Cloud Complexity with Standardization at VMware Explore

Cloud complexity is an inevitability. Regardless of where an organization may be on their cloud journey – on-prem, in the public cloud, or managing an expanding hybrid cloud – the reality is managing the enterprise isn’t getting any easier. Demand continues to rise for greater access to more data across the organization to do things like run analytics and machine learning and to automate more processes.

8 Reasons to Build Your Cloud Data Lake on Snowflake

You want to enable analytics, data science, or applications with data so you can answer questions, predict outcomes, discover relationships, or grow your business. But to do any of that, data must be stored in a manner to support these outcomes. This may be a simple decision when supporting a small, well-known use case, but it quickly becomes complicated as you scale the data volume, variety, workloads, and use cases.

Slack Elevates the Customer Experience by Centralizing Marketing Data in Snowflake

Software company Slack is on a mission to make work simpler, more pleasant, and more productive. Millions of users across more than 150 countries use Slack to collaborate with team members, connect other tools and services, and access information. Marketers at Slack rely on large amounts of data to build custom audiences, manage subscriber consent preferences, and measure campaign performance.

How the modern data stack helps analysts avoid dashboard hell

The modern data stack promises greater agility for data teams, best-of-breed capabilities, and faster time to market. But does it elevate the careers of analysts and analytics engineers and bring joy back to their daily jobs? Dead-end dashboards on a modern cloud data platform pigeon-hole analysts into repetitive, low-value work. Join ThoughtSpot co-founder and CTO Amit Prakash and Chief Data Strategy Officer Cindi Howson for a real-time deep dive on how the modern data stack is elevating the analyst and empowering business leaders.