
June 2021

Mercury Rising in BigQuery with Multistatement Transactions

Mercury, the Roman god of commerce, is often depicted carrying a purse, symbolic of business transactions, and wearing winged sandals, illustrating his ability to move at great speed. Transactions power the world's business systems today, from millions of packages tracked in real time by logistics companies to global payments spanning personal loans, securities trading, and intergovernmental transfers, keeping goods and services flowing worldwide.

What is Customer Data Ingestion?

The verdict is in: The more you analyze your customer data, the better chance you have of outperforming your business rivals, attracting new prospects and providing excellent service. For example, a report by McKinsey & Company has good news for companies that are "intensive users of customer analytics": they are 23 times more likely to excel at new customer acquisition and 19 times more likely to be highly profitable than their competitors.

Pushing Data from a Data Warehouse to Salesforce

Salesforce is the world’s leading CRM (customer relationship management) software, with a 20 percent market share. The Salesforce CRM software is chock-full of features for business intelligence (BI) and analytics so that you can capture hidden insights and make smarter, data-driven decisions. The traditional ETL (extract, transform, load) process extracts data from one or more sources and then deposits it into a centralized data repository.

Agile Data in Financial Services

In financial services, data is essential for storing product information, capturing customer details, processing transactions and keeping records of accounts; the relationship between products and their underlying data has always been symbiotic. A significant amount of data infrastructure is static, fragmented across data silos or based on legacy platforms. This has created an impedance mismatch between products and the underlying data.

Qlik Cloud Data Services - Hybrid Data Delivery - Presentation and Demo - Do More with Qlik

This is a recording of the Do More with Qlik Webinar Series. To lead in the digital age, everyone in your business needs easy access to the latest and most accurate data. Join Mike Tarallo and Ola Mayer as they introduce you to Qlik's vision for its integration platform as a service - Qlik Cloud Data Services - covering one of the first data services to be generally available - Hybrid Data Delivery. Hybrid Data Delivery lets you build a data pipeline to your Qlik Sense apps. It collects data from disparate sources and continuously delivers analytics-ready data, so your Qlik Sense apps always have the latest information.

Shine on with user-friendly SQL capabilities in BigQuery

June is the month of the summer solstice, and (at least in the northern hemisphere) we enjoy the longest days of sunshine of the entire year. Just as the sun is making its longest trips across the sky, the BigQuery team is delighted to announce our next set of user-friendly SQL features.

ATB Financial boosts SAP data insights and business outcomes with BigQuery

When ATB Financial decided to migrate its vast SAP landscape to the cloud, the primary goal was to focus on things that matter to customers as opposed to IT infrastructure. Based in Alberta, Canada, ATB Financial serves over 800,000 customers through hundreds of branches as well as digital banking options. To keep pace with competition from large banks and FinTech startups and to meet the increasing 24/7 demands of customers, digital transformation was a must.

PII Pseudonymization is the New Normal

Businesses rely on personal data to better tailor their approach to customer relations and streamline a marketing strategy suited to their target audience. In today's business climate, holding onto the personally identifiable information (PII) of specific individuals for various marketing and customer services purposes requires secure storage and extraction. PII pseudonymization is the latest and greatest method for protecting personal data.

Maxa - AI for ERP and Point of Sale systems built on Snowflake

In today's episode, Daniel Myers from Snowflake interviews Raphael Steinman, CEO and Founder of maxa.ai -- an AI software platform that runs alongside ERP and point-of-sale systems to help monitor, forecast, and optimize businesses. Powered by Snowflake is a series where we interview technology leaders who are building businesses and applications on top of Snowflake.

What is ETL?

The ETL process involves moving data from a source, transforming the data in some way, and loading it into the same or a different destination. You may feel a little confused the first time you encounter an ETL process. With the right platform, though, you can adjust quickly and learn how to manipulate data to make it more valuable.
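
To make the three steps concrete, here is a minimal, hedged Python sketch of an ETL job; the CSV file, field names, and SQLite destination are hypothetical stand-ins, not any particular platform's API:

    import csv
    import sqlite3

    # Extract: read raw rows from a source file (hypothetical path and columns).
    with open("orders.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: normalize one field and derive another.
    for row in rows:
        row["country"] = row["country"].strip().upper()
        row["total"] = float(row["quantity"]) * float(row["unit_price"])

    # Load: write the transformed rows into a destination table (SQLite stands in for a warehouse).
    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (country TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders (country, total) VALUES (?, ?)",
        [(r["country"], r["total"]) for r in rows],
    )
    conn.commit()
    conn.close()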

AutoML Tables is now generally available in BigQuery ML

Google’s cloud data warehouse, BigQuery, has enabled organizations around the world to accelerate their digital transformation and empower their data analysts to unlock actionable insights from their data. Using BigQuery ML, data analysts are able to create sophisticated machine learning models with just SQL and uncover predictive insights from their data much faster.

Secure PII Pseudonymization: How to Do It Right

With news of a devastating data breach constantly in the headlines, you need to take proactive steps to safeguard the personally identifiable information (PII) that your organization stores and processes. Along with techniques such as PII masking, PII pseudonymization is one of the most popular and practical ways to protect sensitive data. But what is PII pseudonymization, exactly, and how can you pseudonymize PII? We’ll answer these questions and more in this article.
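
As a hedged illustration of one common pseudonymization technique (keyed hashing, which maps the same input to the same token but cannot be reversed without the secret), here is a minimal Python sketch; the field names and key handling are hypothetical examples, not taken from the article:

    import hmac
    import hashlib

    SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"  # hypothetical key handling

    def pseudonymize(value: str) -> str:
        # Replace a PII value with a stable, non-reversible token.
        return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

    record = {"customer_id": 42, "email": "jane.doe@example.com"}
    safe_record = {**record, "email": pseudonymize(record["email"])}
    print(safe_record)  # the email is now a consistent token, usable for joins but not identification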

SEMrush - Your End-to-end SEO Solution

In today’s digital age, keeping up with market trends is exactly what a business has to do to stay ahead. Creating a solid online brand image plays a key role in this task, and to do it, dedicated SEO efforts go a long way. Crafting targeted keywords that can direct traffic to your webpages can work wonders in capturing a widespread customer base. Now what if we were to tell you that instead of doing everything manually, you could rely on an automated tool to take care of things?

The BigQuery admin reference guide: Resource Hierarchy

Starting this week, we're adding new content to the BigQuery Spotlight YouTube series. Throughout the summer we'll be adding new videos and blog posts focused on helping new BigQuery architects and administrators master the fundamentals. You can find complementary material for the topics discussed in the official BigQuery documentation.

AWS Data Pipeline Best Practices

Knowing best practices for Amazon Web Services (AWS) data pipelines is essential for modern companies handling large datasets and requiring secure ETL (Extract, Transform, Load) processes. In this article, we discuss AWS data pipeline best practices to ensure top performance and streamlined processes — without complications that can impede the execution of data transfer.

5 Steps to Prevent PII Data Breaches

When it was revealed in September 2017, the massive Equifax data breach made international headlines. As one of the three major credit agencies in the United States, Equifax is responsible for processing personally identifiable information (PII) such as individuals’ names, addresses, and social security numbers. According to Equifax, 143 million people were affected by the data breach, making it one of the biggest cybersecurity disasters in history.

Migrate Hive data from CDH to CDP public cloud

Many Cloudera customers are making the transition from being completely on-prem to cloud, either by backing up their data in the cloud or by running multi-functional analytics on CDP Public Cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments.

Getting to Know the Apache Hadoop Technology Stack

With technology innovations raging at incredible speeds over the past few decades, new and exciting platforms for gathering, storing, transforming, and manipulating data are entering the market every day. Apache Hadoop was one of these disrupters when it entered the market in 2006, offering distributed storage and big data processing using a network of many computers.

The Ultimate Guide to HIPAA

The Health Insurance Portability and Accountability Act (HIPAA) has been an important federal law in healthcare since 1996. Part of its purpose was to create standards meant to protect sensitive patient information, and it took on even more importance once the digitalization of patient health records became widespread. Now certain types of businesses are required to protect patient health information—or face fines that range from $100 to $50,000 per violation.

Deploying applications on CDP Operational Database (COD)

CDP Operational Database Experience (COD) is a PaaS offering on the Cloudera Data Platform (CDP). COD enables you to create a new operational database with a few clicks and auto-scales based on your workload. Behind the scenes, COD automatically manages cluster deployment and configuration, reducing overheads related to setting up new database instances. Additionally, auto-scaling eliminates the need to size a cluster for your workloads.

Ecommerce analytics 101: The ultimate guide

To grow your ecommerce business ahead of your competitors, you need to rely on analytics. Ecommerce analytics is the compass that replaces your gut feelings as you scale your e-shop to higher ground and more online sales. In this ultimate guide to ecommerce analytics we will look at: What is ecommerce analytics? Why is ecommerce analytics crucial for the success of your store? What are the best metrics and KPIs to track for ecommerce?

Fraud Processing with SQL Stream Builder

SQL Stream Builder allows developers and analysts to write streaming applications using industry-standard SQL. In this video, you will see how its powerful interface and APIs support fraud detection, with an interactive experience that includes syntax checking, error reporting, schema detection, query creation, and creating outputs.

From 0 to Dashboard with Cloudera Data Warehouse

Today you'll see a quick demo on how to start with any given dataset, reference it within Cloudera Data Warehouse, and then use the built-in Data Visualization to create a live dashboard from the data. We'll use some example shipping data and show how you can go from 0 to dashboard in no time at all.

SaaS in 60 - Digest Notification Delivery Option

This week we introduce a new delivery option within the Qlik Sense SaaS notification settings. In addition to receiving Qlik Sense SaaS notifications as they happen via email, mobile devices, and the hub notification icon, you can now choose to receive them as an easy-to-view digest that bundles all your notifications into a single email.

How to structure your BigQuery resources

What are folders, projects, and datasets and how do they come together to support warehousing fundamentals like security and cost management? In this episode of BigQuery Spotlight, we’ll review the BigQuery resource model and how this resource hierarchy is reflected in the Cloud Console, where you’ll interact with and analyze your BigQuery data. Moreover, we’ll give you some helpful tips when it comes to structuring your own BigQuery resources. Watch to learn the best way to structure your BigQuery deployments!

Data Privacy: Are You Making These Mistakes?

Organizations have access to massive amounts of data, but they don’t always give enough thought to how they’re going to keep it private and protected. Dozens of data privacy regulations are in effect or in development globally, and the average consumer is learning more about how much of their data gets collected and used by businesses. For this reason, companies need to focus on keeping data safe while it's under their control, but it’s easy to make mistakes.

Prescriptive Analytics

Data analytics technology helps organizations make sense of an ever-increasing volume of data. As this technology matures, it gets better at delivering actionable insights and helping companies determine outcomes. Prescriptive analytics is a modern solution that builds upon other analytics technology and guides organizations to the right decisions for a particular situation.

Insurers - Be Aware of the Hidden Exposures in assessing the economic impact of Climate Risk

Climate change is a challenge for insurers in some obvious ways, such as stronger and more frequent natural disasters. Yet there are also more subtle risks to monitor, including changes to insured assets, risks, and exposures. Climate impacts the production quality and quantity of insured consumable goods, their location, and their supply chains.

The Rocket Behind Snowflake's Rocketship

Not a day goes by without questions from candidates, customers, and other interested parties about how we run Snowflake Engineering. I often hear: “Snowflake has been delivering a truly innovative, high quality product, and the pace of delivery is only accelerating. There must be a secret to it.” Indeed, we have a unique engineering team, and continue to hire world-class engineers. They are the driving force behind our products.

Looking for an ETL tool? Stop. Right. Here.

You have started your data journey. You know you need to somehow collect data from various sources and land it in a data warehouse or data lake of some sort. Right now you're browsing tools and calculating costs: there's one for extraction, another one for transformations, there's an ETL tool. What if we told you there's a better way?

Pulno - The Ideal Website Evaluator

If you’ve been working with SEO for some time, you’d know that although it’s a reliable way of improving your website’s searchability, it’s often marred by the downside of being very time-consuming. Moreover, manual SEO strategies could easily fail to achieve the desired results because of the ever-increasing level of competition. In such a scenario, an automated tool that helps you enhance your SEO efforts can prove to be a boon.

How to Grow Revenue with a 360-Degree Customer View

At the Modern Data Stack EMEA Conference, Fivetran Customer Success Manager Maeve Byrne is joined by Igor Chtivelband, Co-Founder and VP of Data & CRM at Billio.io and Bahadir Sahin, Director of Data & Analytics at Onfido. The panel shares their journeys toward better customer engagement, fueled by faster access to more data.

Discover Which Source Brings in The Most New Opportunities for Your Business

Calculating the return on your marketing investment can be challenging and time-consuming. As there are various marketing sources and channels that create new sales opportunities, it’s important to know which ones are working best to help you meet your business goals.

3 Reasons Extract, Load & Transform is a Bad Idea

Extract, Load, Transform (ELT) technology makes it easy for organizations to pull data from databases, applications, and other sources, and move it into a data lake. But companies pay for this convenience in many ways. ELT solutions can have a negative impact on data privacy, data quality, and data management.

Building a Single Pipeline for Data Integration and ML with Azure Synapse Analytics and Iguazio

Across organizations large and small, ML teams are still faced with data silos that slow down or halt innovation. Read on to learn how enterprises are tackling these challenges by integrating any data type into a single end-to-end pipeline and rapidly running AI/ML with Azure Synapse Analytics and Iguazio.

Get a Complete View of Salesforce Data with MongoDB

Teri will show you how you can incorporate Salesforce (relational data) into a MongoDB collection (non-relational data) to give your customers a unified customer experience. The webinar will focus on the piece of the puzzle where we read Salesforce data and format it into the shape needed to go into a Mongo collection (a collection is the term MongoDB uses for a data set like a table in a relational database). We’re showcasing the ability to go back and forth from NoSQL to SQL for a unified customer experience.
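
As a rough sketch of the Salesforce-to-MongoDB flow described above (not the webinar's actual code), the following assumes the simple_salesforce and pymongo libraries, with placeholder credentials, object names, and field mappings:

    from simple_salesforce import Salesforce
    from pymongo import MongoClient

    # Read relational data from Salesforce (credentials are placeholders).
    sf = Salesforce(username="user@example.com", password="...", security_token="...")
    result = sf.query("SELECT Id, Name, Email FROM Contact")

    # Reshape each Salesforce record into the document shape the collection expects.
    docs = [
        {"sf_id": rec["Id"], "name": rec["Name"], "email": rec["Email"]}
        for rec in result["records"]
    ]

    # Load the documents into a MongoDB collection (connection string is a placeholder).
    client = MongoClient("mongodb://localhost:27017")
    client["crm"]["contacts"].insert_many(docs)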

Build/Buy in MLOps for R&D: Does "off-the-shelf" exist yet?

What kind of tools and infrastructure does a company need in order to build, train, validate, and maintain data-based models as part of products? The straight answer is: "it depends." The longer one is: "MLOps." It is far too early to determine the "best" patterns and workflows for data science, machine learning, and deep learning products. Yet there are numerous examples of successful deployments from businesses both big and small.

Comprehending ClearML and MLOps - Enabling the New A-Z (ODSC East '21)

ClearML is an industry-leading MLOps suite, fully open source and free in the best sense. It is designed to ease the starting, running, and management of experiments and orchestration for everyday practitioners, and we will also see how it provides a clear path to deployment. Starting with a high-level overview of the parts built into ClearML, we will then journey into what is, and also, importantly, what is not, part of ClearML's mandate. Along the way we will demonstrate how to integrate it into your PyTorch code, as well as the capabilities of reporting and possible workflows that could be made easier by pipeline usage.

[MLOps] The Clear SHOW - S02E10 - Everything You Wanted To Know About Model Stores*

Ariel (ft. G. Raffa) discusses the reasoning behind model stores, why you might want to build one, and reviews a model store library versus ClearML to understand what needs to be built "on top" of our open-source MLOps Engine. Plus: Operator AI. ClearML is the only open-source tool to manage all your MLOps in a unified and robust platform, providing collaborative experiment management, powerful orchestration, easy-to-build data stores, and one-click model deployment.

BigQuery row-level security enables more granular access to data

Data security is an ongoing concern for anyone managing a data warehouse. Organizations need to control access to data, down to the granular level, for secure access to data both internally and externally. With the complexity of data platforms increasing day by day, it's become even more critical to identify and monitor access to sensitive data.
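
For context, BigQuery expresses this with row access policies defined in SQL; the following is a hedged sketch run through the Python client library, where the table, column, and group are made-up examples:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application-default credentials

    # Hypothetical policy: analysts in the EU group may only see rows where region = 'EU'.
    ddl = """
    CREATE OR REPLACE ROW ACCESS POLICY eu_only
    ON `my_project.sales.orders`
    GRANT TO ('group:eu-analysts@example.com')
    FILTER USING (region = 'EU')
    """
    client.query(ddl).result()  # subsequent queries by that group return only matching rows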

How to Modernize Your Analytics Department

At the Modern Data Stack EMEA Conference, Fivetran Solution Architect Niamh O'Brien is joined by Jonathan Rankin, Data Manager, Guardian News and Media; Donas Rashedi, Head of Data, Douglas; and Pete Williams, Chief Data Officer, Penguin Random House UK. This group discusses the future of the modern data stack and its implications for business in 2021.

Pinboard Design Best Practices - ThoughtSpot Success Series #6

Join ThoughtSpot's Customer Success team and other users like yourself as we discuss various topics in our new Success Series. During this event, we shared best practices you can follow when creating pinboards that will help you, your teams, and your customers get maximum value out of what their data is telling them.

Google's Page Experience Update: How to Better Prepare Your Agency and Clients According to Pepperland Marketing

In this episode of Metrics and Chill, Sean Henri, Founder and CEO at Pepperland Marketing, shares the latest Google Page Experience update details, including how his agency prepared themselves and their clients and the changes they implemented.

Parameta Solutions elevates its analysts with ThoughtSpot

At Parameta Solutions, clients have come to expect our data, and our data products, to be robust and reliable. The way we really stand out in this specialist arena is through the quality and sophistication of our client services. I was delighted recently to share my experiences of how ThoughtSpot is supporting us in both of these aims in a webinar hosted by Cindi Howson at the Chief Data & Analytics Officers UK 2021 event last February.

Why User-Level Security Is Crucial for Business Intelligence

Picking the right business intelligence (BI) tool is essential to helping you beat your competitors, better serve your customers, and make smarter data-driven decisions. However, there's no one-size-fits-all tool for every enterprise. Not all BI users are created equal, and not all users should have the same level of access to sensitive and confidential data.

ETL with Apache Airflow

Written in Python, Apache Airflow is an open-source workflow manager used to develop, schedule, and monitor workflows. Created by Airbnb, Apache Airflow is now widely adopted by many large companies, including Google and Slack. As a workflow management framework, Apache Airflow differs from other frameworks in that it does not require you to spell out an entire workflow structure up front. Instead, you only need to define the upstream (parent) dependencies between tasks, and Airflow automatically organizes them into a DAG (directed acyclic graph).
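
A minimal sketch of what an Airflow ETL workflow can look like (the task logic, DAG id, and schedule are hypothetical placeholders):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull rows from the source system")

    def transform():
        print("clean and reshape the extracted rows")

    def load():
        print("write the transformed rows to the warehouse")

    with DAG(
        dag_id="example_etl",
        start_date=datetime(2021, 6, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Declaring upstream (parent) relationships is all Airflow needs to build the DAG.
        t_extract >> t_transform >> t_load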

Simplifying Data Management at LinkedIn Part 2

In the second of this two-part episode of Data+AI Battlescars, Sandeep Uttamchandani, Unravel Data’s CDO, speaks with Kapil Surlaker, VP of Engineering and Head of Data at LinkedIn. In part one, they covered LinkedIn’s challenges related to metadata management and data access APIs. This second part dives deep into data quality.

Should You Leave Your Company's PII Data Unprotected?

Personally identifiable information (PII) is some of the most valuable data that organizations can have. It's also some of the most dangerous if you don't follow data security best practices. If you don't treat this data with care, you could end up in the headlines as the victim of the latest data breach, costing you money and damaging your reputation. Of course, you should never leave PII data unprotected. So what is the best way to protect the confidential and sensitive PII that you handle?

Simplifying Data Management at LinkedIn Part 1

In the first of this two-part episode of Data+AI Battlescars, Sandeep Uttamchandani, Unravel Data’s CDO, speaks with Kapil Surlaker, VP of Engineering and Head of Data at LinkedIn. In this first part, they cover LinkedIn’s challenges related to Metadata Management and Data Access APIs. Part 2 will dive deep into data quality.

Accelerating AI-based search in the cloud with ThoughtSpot for Snowpark

We’re all familiar with how Google Search revolutionized information processing for consumers. The ingenious combination of AI with a new way to organize content on the web created a user-friendly experience that forever changed how the world finds relevant information on the internet.

Organizations Grapple with Skyrocketing Cloud Costs, Anodot Survey Finds

The pandemic upended business for many, or at the very least cast a grim shade of uncertainty, so as many took to working from home, they were also tasked with cutting waste. Among the biggest sources of misspend in 2020: cloud services. And remote work may actually have spurred the problem, as organizations migrate more applications to the cloud to support these workers.

PII Substitution: 4 Ways to Protect Your Sensitive Data

News of the latest massive data breach is always in the headlines. How can you avoid being next on the list? In order to function, businesses of all sizes and industries need to collect personally identifiable information (PII) about their employees and customers—but they also need to take proactive steps to keep this information secure and defend against PII breaches. PII substitution is an effective tactic to shield your sensitive and confidential data from prying eyes.

Recruiting and Building the Data Science Team at Etsy

In this episode of Data+AI Battlescars (formerly CDO Battlescars), Sandeep Uttamchandani talks to Chu-Cheng, CDO at Etsy. This episode focuses on Chu-Cheng's battlescars related to recruiting and building a data science team. Chu-Cheng leads the global data organization at Etsy. He's responsible for data science, AI innovation, machine learning, and data infrastructure. Prior to Etsy, Chu-Cheng held various data roles at companies including Amazon, Intuit, Rakuten, and eBay.

The Complete Guide to Student Data Privacy

Are you handling students' education records or personally identifiable information (PII)? If so, it's crucial that you're familiar with what student privacy laws such as the Family Educational Rights and Privacy Act (FERPA) have to say. In this article, we'll go over what educators and administrators need to know about FERPA and student data privacy.

Create a Salesforce ETL Pipeline in 30 Minutes

Salesforce is one of the world’s most popular CRM (customer relationship management) software platforms, helping businesses of all sizes and industries beat their competitors and better serve their clients. But instead of keeping your Salesforce data inside the CRM platform itself, you can make better use of this information by moving it into a target data warehouse.

Automated Deployment of CDP Private Cloud Clusters

At Cloudera, we have long believed that automation is key to delivering secure, ready-to-use, and well-configured platforms. Hence, we were pleased to announce the public release of Ansible-based automation to deploy CDP Private Cloud Base. By automating cluster deployment this way, you reduce the risk of misconfiguration, promote consistent deployments across multiple clusters in your environment, and help to deliver business value more quickly.

Welcome to Snowpark: New Data Programmability for the Data Cloud

At Snowflake Summit 2021, we announced that Snowpark and Java functions were starting to roll out to customers. Today we're happy to announce that these features are available in preview to all customers on AWS. These features represent a major new foray into data programmability, enabling you to more easily make Snowflake's platform do more for you. Snowflake started its journey to the Data Cloud by completely rethinking the world of data warehousing to accommodate big data.

[MLOps] The Clear SHOW - S02E09 - All your "stores" are belong to us!

G. Raffa describes our new arc: We are going to build a "model store" using the open-source MLOps engine! Tell us what you think of his plan in the comments below! ClearML is the only open source tool to manage all your MLOps in a unified and robust platform providing collaborative experiment management, powerful orchestration, easy-to-build data stores, and one-click model deployment. ClearML is the foundation of your data science team. Don’t see the functionality you need? Build on top of it in a snap.

The full stack solution for data democratization

Speed and agility are vital in today’s dynamic economy. But moving fast in the dark is dangerous. Decision makers need the insights and guidance they can only get from reliable data. But despite massive efforts from their internal BI teams, getting that data when and how they need it has been problematic.

The day the dashboard died

For more than 20 years, dashboards served as a foundational element of business intelligence, helping leaders visualize and share valuable data across their organization. At inception, dashboards were the perfect vehicle for delivering key report KPIs without data workers needing a background in coding or IT. But much has changed over the last two decades, including the appetite and needs of your business users.

Amazon Redshift Database Developer Guide

Amazon Redshift is one of the most prominent data warehousing leaders across companies of all industries and sizes, supporting applications in analytics, reporting, business intelligence, and more. Using Amazon Redshift allows you to retrieve, compare, and evaluate large amounts of data in multiple-stage operations to deliver the desired outcome.

Telecommunications and the Hybrid Data Cloud

As the inexorable drive to cloud continues, communications service providers (CSPs) around the world – often laggards in adopting disruptive technologies – are embracing virtualization. Not only that, but service providers have been deploying their own clouds, some developing IaaS offerings, and partnering with cloud-native content providers like Netflix and Spotify to enhance core telco bundles.

Get control over your data pipelines with data orchestration

Enterprises are tapping and leveraging big data to get ahead of the competition. Peter Sondergaard, former Executive Vice President at Gartner, famously likened analytics to the combustion engine that turns the oil of information into power, but the problem with the combustion engine is that it does not scale well. As companies grow, the data platforms they previously relied on for analytics start to break apart.

Monitoring BigQuery reservations and slot utilization with INFORMATION_SCHEMA

BigQuery Reservations help manage your BigQuery workloads. With flat-rate pricing, you can purchase BigQuery slot commitments in 100-slot increments in either flex, monthly, or yearly plans instead of paying for queries on demand. You can then create/manage buckets of slots called reservations and assign projects, folders, or organizations to use the slots in these reservations. By default, queries running in a reservation automatically use idle slots from other reservations.
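
For example, here is a hedged sketch of the kind of slot-utilization query the article covers, run with the BigQuery Python client; the region qualifier, time window, and result limit are assumptions, so check the INFORMATION_SCHEMA documentation for the exact views available to your project:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Approximate average slot usage per query job over the last 24 hours.
    sql = """
    SELECT
      job_id,
      SAFE_DIVIDE(total_slot_ms,
                  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_slots
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      AND job_type = 'QUERY'
      AND state = 'DONE'
    ORDER BY avg_slots DESC
    LIMIT 10
    """
    for row in client.query(sql):
        print(row.job_id, row.avg_slots)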

SOA vs. Microservices

Most individuals who work in technology — and in particular, cloud computing — are likely aware of how service-oriented architecture (SOA) and microservices work. There is much discussion, however, regarding the best approaches for various situations. There are crucial differences in data governance, component sharing, and the architecture between SOA vs. microservices. In this article, you'll learn the basics of SOA vs. microservices.

Crashes in Neobank, eBank, and Crypto-trading Apps Are Unforgivable - Crash Analytics is the Answer

Regardless of how technically sound the engineering of an app is, bugs, errors, and crashes can happen. So when they do, you must recover by doing a deep analysis of the technical aspects and the impact on the overall customer experience. If your crypto-exchange or banking app is not getting you the insights you need from its crashes, your churn rate and your customers will definitely let you know sooner than you think.

The Official 2021 Checklist for HIPAA Compliance

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a U.S. federal law. It sets national standards for health care providers to maintain the privacy of patients' protected health information (PHI), including electronically protected health information (ePHI). If you collect, store, or process any kind of patient or medical data, you need to be aware of HIPAA and how it affects your operations. But what does it really mean to be HIPAA compliant?

Is your data healthy?

It’s no secret that what companies need from their data and what they can actually get from their data are two very different things. According to our recent survey, most executives work with data every day, but only 40% of them always trust the data they work with. We also discovered that 78% of them have challenges making data-driven decisions. Virtually every business is collecting more data than ever before, so lack of data can’t be the issue.

How to use Apache Spark with CDP Operational Database Experience

Apache Spark is a very popular analytics engine used for large-scale data processing. It is widely used for many big data applications and use cases. CDP Operational Database Experience (COD) is a CDP Public Cloud service that lets you create and manage operational database instances, and it is powered by Apache HBase and Apache Phoenix.

Display Moesif Reports Within Tableau

As a product leader, there's no better way to show the value of your API platform than by graphically displaying key metrics. If you're already working with Tableau, it's easy to extract key charts, or workspaces, from Moesif's dashboards and insert them into your visualization platform. The information from Moesif will be inserted into Tableau as a web page object.

What is PII Masking and How Can You Use It?

Imposter fraud is the second-most common type of fraud reported to the Federal Trade Commission, with around one-fifth of all cases resulting in financial loss to the victim. This often occurs because of a failure on the part of organizations to protect personally identifiable information (PII). Fraud is only one type of attack that may occur. Phishing is another exceptionally common data security threat. It often results from crawlers collecting email addresses, one type of PII, on the open web.

How Analytics is Becoming as Simple and Reliable as Electricity

At the Modern Data Stack Conference EMEA, Fivetran's VP of EMEA Nate Spohn delivers the keynote address, joined in discussion by Bob Muglia, Investor and former CEO at Snowflake; Alex Billar, VP of Platform, EMEA; and Magnus Carlsson, VP & Head of Innovation I&D Nordics, Capgemini.

A Guide to Data Privacy and Data Protection

Organizations collect and use personal data for a variety of purposes, often without considering the impact on data privacy. Individuals are increasingly aware of how their data is being used and the lack of say they have over the process. Data privacy and protection regulations are in place around the world to protect consumers and stop their personal information from being misused.

6 Mistakes to Avoid When Handling PII

Personally identifiable information, or PII, is sensitive information that can identify an individual. Industry or data protection laws often regulate this type of data, requiring that organizations handle PII according to specific practices. It’s all too easy to make mistakes when working with PII, so we've highlighted six common scenarios to look out for.

The 4 keys to a successful manufacturing IIoT pilot

If you have read our previous post on the challenges of planning, launching, and scaling IIoT use cases, you've narrowed down the business problems you're trying to solve, and you have a plan that is both created by the implementation team and supported by executive management. Here's how to make sure you've got it all down. Think of these success factors as the legs of a kitchen table, and the results you desire as a bowl of homemade chicken soup resting on top.

How to avoid BI vendor lock-in with open architecture

Many organizations have considered or experienced a desire to take advantage of a cheaper platform or more feature-rich software, or to divorce from a vendor who is not delivering. But the pain of moving is simply too great to consider doing it. This is the state of being locked in. In this blog, we explain what lock-in is, what it means for businesses searching for new embedded analytics solutions, and why it should be a consideration when choosing a platform.

Bringing the World's Data Together: Announcements from Snowflake Summit

At this year's Snowflake Data Cloud Summit: Data Together Now, customers and partners from around the world came together to explore the transformational power of data and Snowflake's vision for bringing the world's data together in the Data Cloud. Over the course of two days and 70 sessions, attendees were inspired by keynotes, learned from their peers in customer and partner sessions, and participated in hands-on labs and technical deep dives.

Future of Data Meetup: Building Automated Machine Learning Workflows in the Cloud

In this meetup, we're going to put ourselves in the shoes of an electric car manufacturer that produces all the parts for their cars in-house. First, we'll show you an example of how this fictional car company could walk through the process of creating a prediction model based on part production data. We will then automate the creation of these models by making them depend on an upstream data collection process. To finish it off, we'll deploy these models and make them accessible via an external API, all within a native cloud environment using the Cloudera Data Platform.

Explosive Data Growth on a Flat IT Budget

Data is growing exponentially. In 2020 alone, 64 zettabytes of data were created! At the same time, many IT organizations are struggling with flat IT budgets, driving the need for flexible data storage consumption. As businesses grow, their ability to consume capacity in a flexible manner becomes even more critical. Tom Christensen, Hitachi Vantara Global Technology Advisor, discusses on-prem, public cloud, and private cloud options, and the advantages of private cloud, consumption-based services.

Copado - A Complete Native DevOps Platform built on Snowflake

Powered by Snowflake is a series where we interview technology leaders who are building businesses and applications on top of Snowflake. In this episode, Daniel Myers from Snowflake interviews Gloria Ramchandani, Senior Director of Product at Copado, an end-to-end, native DevOps platform for Salesforce built on Snowflake.

Anecdotes - A Modern Compliance Platform Built on Snowflake

Powered by Snowflake is a series where we interview technology leaders who are building businesses and applications on top of Snowflake. In this episode, Daniel Myers from Snowflake interviews Yair Kuznitsov, CEO and co-founder of Anecdotes, a modern platform for compliance professionals that continuously collects and maps relevant data from hundreds of different systems into normalized, credible evidence and offers advanced visibility to ensure the best cross-team collaboration, built on Snowflake.

Rollbar Academy: Rollbar Analytics

This session focuses on revealing the operational data that is available for analysis within your Rollbar account and how to utilize it to better understand and improve your development processes. Learn how to take advantage of features like People tracking and RQL to explore error data in-depth and how to further automate these steps using the Rollbar REST API.

How T-Mobile Netherlands ditched its dashboards to dial up self-service business insights

In a telecoms industry differentiated mainly by service and threatened by churn, gaining access to timely customer data initially drove internal demand for analytics and BI (ABI) reporting. When demand grew beyond customer service, our IT team really struggled to keep up. This set us off on a journey to find ways to help our colleagues to find insights and the answers to their questions themselves.

From two years to 24 hours: How Mastercard taps into financial facts faster with ThoughtSpot

When dealing with payments, speed is king. Technology is increasing the speed at which innovation happens, and nowhere is that more apparent than how everyday commerce is transacted. Credit cards are now built with tap-to-pay capabilities, if you still even have a physical credit card. Phones have become the new credit card, with data from transactions and mobile banking adding to the overwhelming amount of data that can be collected.

What is new in Cloudera Streaming Analytics 1.4?

At the end of March, we released the first version of Cloudera SQL StreamBuilder as part of CSA 1.3. It enabled users to easily write, run and manage real-time SQL queries on streams from Apache Kafka with an exceptionally smooth user experience. Since then, we have been working hard to expose the full power of Apache Flink SQL and the existing Data Warehousing tools in CDP to combine it into a state-of-the-art real-time analytics platform.

Cloudera named a Strong Performer in The Forrester Wave: Streaming Analytics, Q2 2021

Cloudera has been named as a Strong Performer in the Forrester Wave for Streaming Analytics, Q2 2021. We are excited to be recognized in this wave at what we consider to be such a strong position. We are proud to have been named as one of "The 14 providers that matter most" in streaming analytics. The report states that richness of analytics, development tool options and near-effortless scalability are what streaming analytics customers should look for in a provider.

Cloudera Streaming Analytics 1.4: the unification of SQL batch and streaming

In October of 2020, Cloudera acquired Eventador, and Cloudera Streaming Analytics (CSA) 1.3.0 was released early in 2021. It was the first release to incorporate SQL Stream Builder (SSB) from the acquisition, and it brought rich SQL processing to the already robust Apache Flink offering. With that completed, the team's focus turned to bringing Flink Data Definition Language (DDL) and the batch interface into SSB.

How to Turn Your Data Into Insights: the Art and Science of E-Discovery

The thing about data is there’s no end to how much of it you can collect and keep. Each day, if you’re like most of the global banks I work with, you’re generating oceans of the stuff. Yet the bulk of it will never be very helpful or even relevant to your day-to-day business decisions.

The Clear SHOW - S02E08 - DataOps pt. III (Pipe it up!)

Finally! We write a reusable pipeline to wrap it all together into an automated workflow for R&D! Watch T. Guerre to find out how! First time hearing about us? Go to clear.ml! ClearML: one open-source suite of tools that automates preparing, executing, and analyzing machine learning experiments. Bring enterprise-grade data science tools to any ML project.

Iguazio Product tutorial 2021

The Iguazio Data Science Platform enables you to develop, deploy and manage real-time AI applications at scale. It provides data science, data engineering and DevOps teams with one platform to operationalize machine learning and rapidly deploy operational ML pipelines. The platform includes an online and offline feature store, fully integrated with automated model monitoring and drift detection, model serving and dynamic scaling capabilities, all packaged in an open and managed platform.

How Marketers Will Measure Campaign ROI After Third-Party Cookie Deprecation

Every marketer knows the old John Wanamaker quote, “Half the money I spend on advertising is wasted; the trouble is, I don’t know which half.” With the introduction and evolution of multi-touch attribution (MTA), the media industry has moved closer than ever to understanding the impact of each touchpoint on a customer’s behavior, from signing up for a newsletter to making a purchase.

BI Compliance: Can a Restructure Deliver Enhanced Data Privacy?

Every data-driven business is terrified of the prospect of a data breach. Exposing sensitive data could mean reputational damage, loss of clients, and heavy fines under emerging privacy laws. But every data-driven business also wants to make use of its data. Business intelligence (BI) platforms allow anyone to build complex and detailed dashboards that help them understand the organization’s current state. How do you resolve this tension? One approach is to build a privacy-first data structure.

ELT: Easy to Deploy, Easy to Outgrow

Extract, load, transform (ELT) technology is a type of data pipeline that ingests data from one or more sources, loads the data into its destination (typically a data lake), and then allows end-users to perform ad-hoc transformations on it as needed. ELT can perform mass extraction of all data types, including raw data, without the need to set up transformation rules and filters before data loading.
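
To make the ordering concrete, here is a minimal, hedged Python sketch of the ELT pattern: records are landed in the destination as-is, and the transformation happens afterwards as SQL inside the destination. SQLite and the table names stand in for a real data lake or warehouse.

    import sqlite3

    conn = sqlite3.connect("lake.db")

    # Extract + Load: land the records as raw text, with no upfront transformation rules or filters.
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (user TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?)",
        [("a", "19.99"), ("b", "5.00"), ("a", "4.01")],
    )

    # Transform: end users shape the data later, ad hoc, with SQL run in the destination itself.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS revenue_by_user AS
        SELECT user, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_events
        GROUP BY user
    """)
    conn.commit()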

Validations - Cloudera Support's Predictive Alerting Program

Cloudera Support’s cluster validations proactively identify known problem signatures contained in customers’ diagnostic data with the goal of increasing cluster health, performance, and overall stability. Cluster validations are included in a customer’s enterprise subscription at no additional cost. All customers with access to the Support case portal will also be able to take advantage of cluster validations.

Worksheet Best Practices - ThoughtSpot Success Series #5

Introducing the ThoughtSpot Success Series! Want to expand your knowledge of ThoughtSpot? Want to learn some great tips and tricks? Join ThoughtSpot's Customer Success team and other users like yourself as we discuss various topics in our new Success Series. During this event, we'll share how to build a ThoughtSpot use case pipeline that allows you to maximize your return on investment and maintain momentum.

Embedded analytics 2.0: Your secret weapon to empowering frontline workers and locking in customer loyalty

Last year, Harvard Business Review and ThoughtSpot published a groundbreaking survey on the business benefits of empowering frontline workers with data. Revenues are higher, operations more efficient, customer service better, and employees happier. And yet, few organizations deploy BI this way, historically held back by the technology, conflicting priorities, and mindset.

Introducing Continual - the missing AI layer for the modern data stack

I’m extremely excited to introduce Continual. Continual is the easiest way to maintain predictions – from customer churn to inventory forecasts – directly in your cloud data warehouse. It’s built for modern data teams that want to leverage machine learning to drive revenue, streamline operations, and power innovative products and services without complex engineering.

What Does Customer 360 Mean?

Collecting and analyzing data on your customers’ preferences and behavior is one of the best ways to improve your products and customer service. As your business grows, you need enterprise-class customer relationship management (CRM) software that can store and manage all of your customer data. Salesforce is a software as a service (SaaS) company that provides the Salesforce CRM software to more than 100,000 organizations worldwide.

PII Substitution May Be the Future of Data Privacy

Unfortunately, most of us have had our sensitive data or personal information compromised at one point or another. Whether the leaked data involves credit cards, a bank account number, a social security number, or an email address, nearly everyone has been a victim of a third-party data breach. In 2020, over 155 million people in the U.S. — nearly half the country's population — experienced unauthorized data exposure.

Modernizing Data Pipelines using Cloudera Data Platform - Part 1

Data pipelines are in high demand in today’s data-driven organizations. As critical elements in supplying trusted, curated, and usable data for end-to-end analytic and machine learning workflows, the role of data pipelines is becoming indispensable. To keep up, data pipelines are being vigorously reshaped with modern tools and techniques.

Apache Ozone Metadata Explained

Apache Ozone is a distributed object store built on top of Hadoop Distributed Data Store service. It can manage billions of small and large files that are difficult to handle by other distributed file systems. As an important part of achieving better scalability, Ozone separates the metadata management among different services: Ozone Manager (OM) service manages the metadata of the namespace such as volume, bucket and keys.

Fast Forward Live: Session-based Recommender Systems

Join us live with Fast Forward Labs to discuss the recently possible in Machine Learning and AI. Being able to recommend an item of interest to a user (based on their past preferences) is a highly relevant problem in practice. A key trend over the past few years has been session-based recommendation algorithms that provide recommendations solely based on a user’s interactions in an ongoing session, and which do not require the existence of user profiles or their entire historical preferences. This report explores a simple, yet powerful, NLP-based approach (word2vec) to recommend a next item to a user. While NLP-based approaches are generally employed for linguistic tasks, here we exploit them to learn the structure induced by a user’s behavior or an item’s nature.
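
A minimal sketch of that idea using gensim's word2vec, where each session is treated as a "sentence" and each item ID as a "word"; the sessions, item IDs, and hyperparameters are made-up examples, and the report's own implementation may differ:

    from gensim.models import Word2Vec

    # Hypothetical click sessions: ordered lists of item IDs.
    sessions = [
        ["item_1", "item_7", "item_3"],
        ["item_7", "item_3", "item_9"],
        ["item_2", "item_1", "item_7"],
    ]

    # Learn item embeddings from co-occurrence within sessions (assumes gensim 4.x).
    model = Word2Vec(sessions, vector_size=32, window=3, min_count=1, sg=1, epochs=50)

    # Recommend: items whose embeddings are closest to the last item in the current session.
    print(model.wv.most_similar("item_7", topn=3))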

Future of Data Meetup: The Power of "Yes" or: How I learned to Stop Worrying and Love Governance

Full data lifecycle projects hold tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing device data capture, data enrichment, data science, and analytics at scale to enterprises. This promise also comes with challenges for developers, admins, and consumers to continuously access new data and collaborate.

Data Transformation | Snowflake & Matillion | Rise of The Data Cloud

Data transformation, 2021 data trends, how Matillion is pushing the world of software forward, and how Matillion's partnership with Snowflake is advancing the industry are some of the topics covered in this episode of Rise of the Data Cloud, featuring Matthew Scullion, Founder and CEO of Matillion.

Realtime data replication into BigQuery with Datastream and Dataflow

How can you replicate data from a relational database in real time? In this video, we'll show you how you can combine Datastream with Dataflow templates to replicate data from a relational database. Watch to learn how you can use this streaming analytics service in unison with Datastream to easily replicate data from Oracle to BigQuery in real time!

7 Data Migration Best Practices and Tools

Data migration seems simple from a high-level point of view. After all, you’re simply moving data between two or more locations. In practice, however, migrating data can be one of your IT department’s trickiest data management initiatives. According to LogicWorks, 90 percent of CIOs in charge of data migrations moving from on-premises to the cloud have encountered problems during this process, with 75 percent missing planned deadlines.

PII Data Privacy: How to Stay Compliant

When people share their personal information with an organization, they’re performing an act of trust. They trust you to keep their data safe from hackers, and they trust you to use their data only for legitimate purposes. While many organizations honor this trust, others do not. As a result, governments worldwide are rushing to pass data protection legislation that puts the power back in the hands of people.

Why it's time to update your embedded analytics

There was a time when product teams could embed basic dashboards and data visualizations, and that was more than enough to satisfy the average user's business intelligence (BI) and analytics needs. Today, however? Not so much. From AI-assisted analysis to automated alerts to contextualized insights, several new sophisticated analytical capabilities have become core parts of modern embedded analytics solutions, and many users' expectations have risen accordingly.