
December 2020

Alooma vs. MuleSoft vs. Xplenty: Features, Support and Pricing

The main differences between Alooma, MuleSoft, and Xplenty: Data-driven organizations pull data from multiple locations such as in-house databases, SaaS, and cloud-based apps, making it difficult to determine accurate business insights. Moving all this information into a single location makes data analytics easier. This is where Extract, Transform, and Load (ETL) comes in.

Delivering faster analytics projects with Stitch in the EU

It’s no secret that businesses are undergoing a dramatic digital acceleration during the pandemic. A recent Gartner CFO Survey characterizes this acceleration as “from the pace of a multi-year marathon to a 12-month sprint,” while a McKinsey survey estimates that organizations have shortened the digitization of their customer and supply-chain interactions by three to four years. Lagging behind in turning data into answers is no longer an option.

Qlik Analytics 2020 - Alerting, Augmented Analytics, Active Intelligence and More

2020 was quite a year of innovation for Qlik analytics. We delivered key new augmented analytics capabilities with big updates to Insight Advisor, we integrated intelligent alerts fully into Qlik Sense in less than a year, we continued to expand our visualization capabilities to make it easier to showcase your data in exciting and compelling ways, and we made it even easier to execute analytics in the cloud.

How ThoughtSpot Navigates the Cloud | Part 2 | Snowflake Inc.

COVID-19 has pushed ThoughtSpot to emphasize its capabilities in cloud technology. CMO Scott Holden talks about how ThoughtSpot’s search engine can sit atop and leverage cloud technology, and how he’s shifted the company’s marketing efforts and campaigns. Rise of the Data Cloud is brought to you by Snowflake.

Amazon Kinesis vs. Kafka: A Detailed Comparison of Data Stream Services

The key differences between Amazon Kinesis and Kafka are: Introducing data streamers! These services validate and route messages from one application to another, managing workload and message queues effectively. The result? Users process messages through a centralized processor and handle large data streams more efficiently. Amazon Kinesis and Apache Kafka are two data stream services.

What It Actually Means to be HIPAA Compliant

The Health Insurance Portability and Accountability Act, or HIPAA, is a federal regulation in the United States that protects healthcare data containing personal health information, or PHI. It also covers Electronic PHI, or E-PHI, which refers to digital records of this information. The ability to use healthcare data effectively is essential for improving patient outcomes, quality of care, resource allocation, revenues, and other operations.

Top 7 Trends in Marketing Analytics for 2021

Powered by advances in machine learning, marketing analytics delivers more bottom-line impact with each passing year. It enables organizations to improve the targeting of ads and other content, optimize their ad spend through advanced marketing attribution, increase customer lifetime value, reduce churn, and more. While technology is making granular targeting and measurement possible, marketers are also doubling down on measures to ensure consumer privacy and data governance in their initiatives.

The road to data quality: Getting to customer 360 faster with Machine Learning

Read Part 1 here > Data analytics is a complex process that demands time and effort from data scientists. From cleaning and prepping data to performing data analysis, data scientists go through an extensive procedure to uncover hidden patterns, identify trends, and find correlations in data to make informed business decisions. The task of integrating, cleaning, and organizing data assets often takes up the bulk of the data scientist’s time.

How to Implement Xplenty In Your Data Stack

A Big Data stack has several layers that take your data from source to analytics tools. Extract, Transform, Load tools integrate data from sources into a data warehouse or lake. Business intelligence solutions use centralized data for their analytic needs. An ETL tool such as Xplenty offers a user-friendly experience for ingesting data from many sources, transforming it as needed, and sending it to the next layer. Here’s how you can implement this handy tool in your organization.

Jitterbit vs. MuleSoft vs. Xplenty: An ETL Tool Comparison

The major differences between Jitterbit, MuleSoft, and Xplenty: Extract, Transform, and Load (ETL) streamlines data integration by consolidating data from multiple sources, turning it into useful formats, and loading it into a centralized location. The world's most successful organizations use ETL to tame big data, produce visual data flows, and garner business-critical analytics. But with so many ETL tools on the market, which one should you choose?

An A-Z Data Adventure on Cloudera's Data Platform

In this blog we will take you through a persona-based data adventure, with short demos attached, to show you the A-Z data worker workflow expedited and made easier through self-service, seamless integration, and cloud-native technologies. You will learn all the parts of Cloudera’s Data Platform that together will accelerate your everyday Data Worker tasks.

How ASEAN Retailers Can Become Insight-Driven With a Hybrid Cloud Data Strategy

There has been an e-commerce explosion this year as consumers seek safety and convenience from the comfort of their own homes, using digital tools to purchase everything from durable hard goods to fashion accessories to daily-living consumables like perishable foods, cleaning products, and even school supplies.

Yellowfin 9.4 Release Highlights

In the latest release, the Yellowfin team focused on several new enhancements that expand the ability to share Stories and valuable insights more widely, and deliver greater consistency and governance in the user experience when building dashboards and reports. For the full list of updates, read the release notes and check out the video below to see some of these new enhancements in action.

Investment in Knoema Puts Global Public Data Sets At Snowflake Users' Fingertips

The world is rich with publicly available data sets that can provide immense value to businesses, in areas as diverse as economics, health, agriculture and transportation. But the data is highly fragmented, stored in different formats and databases around the world, making it very hard for businesses to consume and monetize.

Unlocking Value by Going All-in On The Data Cloud

Snowflake met with Mark Stange-Tregear, Vice President of Analytics for Rakuten Rewards, and Andrew Parry, Vice President of IT Application Development for Office Depot, at Data Cloud Summit 2020. The Data Cloud is unlocking new ways of delivering products and services to customers, managing supply chains, and collaborating globally. Rakuten Rewards and Office Depot are going “all-in” on the Snowflake Data Cloud to transform their businesses. Q. How does data affect your business?

Three Takeaways from Amazon CTO's Keynote at AWS re:Invent 2020

One of the most highly anticipated events every year is the keynote from Dr. Werner Vogels at the annual AWS re:Invent conference. As CTO of Amazon, Dr. Vogels has considerable influence on product and engineering innovation that directly impacts hundreds of millions of users and developers. Here are three takeaways from Dr. Vogels’ keynote this year.

2020: A Continuous Focus on Data and Analytics Innovation

Despite the challenges COVID-19 brought to the world this year, Qlik continued to innovate – integrating new capabilities directly to our core offerings in the areas of alerting, augmented analytics and conversational analytics. We also continued to enhance our solutions in the direction of making data less passive and more active, and made key acquisitions like Knarr Analytics and Blendr.io, continuing our journey of delivering a true end-to-end solution that enables Active Intelligence.

Geospatial data processing with streaming SQL for Apache Kafka

An old airport customer of mine (whilst I worked for another company) used to pop someone next to a busy runway with a stopwatch strapped round their neck. The unfortunate person had to manually log the time aircraft spent on the runway to measure runway occupancy. All very archaic. Even in those days.

Xplenty Workspaces

Today we are delighted to introduce our new Workspaces feature, which allows users to organize and group their packages together. It’s always been a bit challenging to organize packages within Xplenty, especially if you have hundreds of packages or many users on the account. The new Xplenty Workspaces feature finally addresses all of those issues.

"More than 60% of Our Pipelines Have SLAs," Say Unravel Customers at Untold

Unravel Data recently held its first-ever customer conference, Untold 2020. We promoted Untold 2020 as a convocation of #datalovers. And these #datalovers generated some valuable data – including the interesting fact that more than 60% of surveyed customers have SLAs for either “more than 50% of their pipelines” (42%) or “all of their pipelines” (21%). All of this ties together.

Data's Impact on Business & Technology | Part 2 | Snowflake Inc.

DataRobot’s quest to digitally transform companies has led them to revolutionize artificial intelligence and machine learning technology. Dan Wright, President and COO of DataRobot, discusses how his company uses Snowflake, and provides examples detailing DataRobot’s ability to track the spread of COVID-19 in vulnerable areas around the United States using artificial intelligence. Rise of the Data Cloud is brought to you by Snowflake.

Most popular public datasets to enrich your BigQuery analyses

From rice genomes to historical hurricane data, Google Cloud Public Datasets offer a world of exploration and insight. The more than 20 PB across 200+ datasets in our Public Dataset Program helps you explore big data and data analytics without a lot of cost, setup, or overhead. You can explore up to 1 TB per month at no cost, and you don’t even need a billing account to start using BigQuery sandbox.

Top 10 AI & Data Podcasts You Should Be Listening To

With the speed of change in artificial intelligence (AI) and big data, podcasts are an excellent way to stay up-to-date on recent developments and new innovations, and to gain exposure to experts’ personal opinions, regardless of whether they can be proven scientifically. Great examples of the thought-provoking topics that are perfect for a podcast’s longer-form, conversational format include the road to AGI, AI ethics and safety, and the technology’s overall impact on society.

Data Science vs. Data Engineering: What You Need to Know

According to The Economist, “the world’s most valuable resource is no longer oil, but data.” Despite the value of enterprise data, much has been written about the so-called “data science shortage”: the supposed lack of professionals with knowledge of how to use and manipulate big data. A 2018 study by LinkedIn estimated that there were more than 151,000 unfilled jobs in the U.S. requiring data science skills.

How to Build Real-Time Feature Engineering with a Feature Store

Simplifying feature engineering for building real-time ML pipelines might just be the next holy grail of data science. It’s incredibly difficult and highly complex, but it’s also desperately needed for multiple use cases across dozens of industries. Currently, feature engineering is siloed between data scientists, who search for and create the features, and data engineers, who rewrite the code for a production environment.

Enabling The Full ML Lifecycle For Scaling AI Use Cases

When it comes to machine learning (ML) in the enterprise, there are many misconceptions about what it actually takes to effectively employ machine learning models and scale AI use cases. When many businesses start their journey into ML and AI, it’s common to place a lot of energy and focus on the coding and data science algorithms themselves.

Spark APM - What is Spark Application Performance Management

Apache Spark is a fast and general-purpose engine for large-scale data processing. It’s most widely used to replace MapReduce for fast processing of data stored in Hadoop. Designed specifically for data science, Spark has evolved to support more use cases, including real-time stream event processing. Spark is also widely used in AI and machine learning applications.

How DataRobot Automates Artificial Intelligence | Part 1 | Snowflake Inc.

Dan Wright, President and COO of DataRobot, talks about how DataRobot is revolutionizing artificial intelligence by automating an end-to-end experience to market for consumer use. He discusses how DataRobot's new platforms unlock powerful applications of deep learning technology that help leverage open source technology and provide security to customers. Rise of the Data Cloud is brought to you by Snowflake.

How Has COVID-19 Impacted Data Science?

The COVID-19 pandemic disrupted supply chains and brought economies around the world to a standstill. In turn, businesses need access to accurate, timely data more than ever before. As a result, the demand for data analytics is skyrocketing as businesses try to navigate an uncertain future. However, the sudden surge in demand comes with its own set of challenges.

Cloudera Replication Plugin enables x-platform replication for Apache HBase

The Cloudera Data Platform (CDP) is the latest Big Data offering from Cloudera, and it includes Apache HBase and Phoenix as part of the platform. These two components are provided in three form factors. Cloudera’s Apache HBase customers typically run mission-critical applications that cannot afford downtime, so they need a way to migrate to a new deployment with no production outage or, at worst, a minimal one.

The role of data in COVID-19 vaccination record keeping

Now that the Pfizer vaccine has been approved by the FDA for use in the US, and the Moderna vaccine likely isn’t far behind, we are on the verge of being able to emerge from the social-distancing world that began earlier in 2020. Recent news has discussed distributing a vaccination record card to everyone who gets a COVID-19 vaccine.

How businesses use automated monitoring

One of the big trends we’ve seen this year is organizations going direct to consumer. Manufacturers who sold through retail outlets are moving online, and as a result a huge amount of digital transformation is occurring. A customer of ours has done exactly that. Kyowa is a Japanese cosmetics and health food company and they’ve moved from retail to online and digital, and Yellowfin has been a significant part of that journey. In particular, they’ve used Signals.

405% 3-year ROI Procuring Snowflake Through AWS Marketplace: New Forrester TEI Study

Snowflake is delighted to share the findings of a new Forrester Consulting Total Economic Impact™ (TEI) study that examines the potential return on investment for organizations that procure Snowflake through Amazon Web Services (AWS) Marketplace and then use Snowflake as a core part of their application architecture. We commissioned the study in partnership with AWS.

Managing Snowflake's Compute Resources

This is the 3rd blog in our series on Snowflake Resource Optimization. In parts 1 and 2 of this blog series, we showed you how Snowflake’s unique architecture allows for a virtually unlimited number of compute resources to be accessed near-instantaneously. We also provided best practices for administering these compute resources to optimize performance and reduce credit consumption.

Service & data integration: how to manage a multi-provider environment

To be able to deliver the latest and greatest services to customers and clients today, telcos must employ different vendors, subcontractors, and technology partners to fulfill market needs. While this allows organizations to cover all the bases, it also means disparate data sources, different technologies and schemas, and distinct internal workflows and processes—all of which can result in a disjointed customer experience, all the way from sales to service.

Why You Need DataOps in Your Organization

DataOps is the hot new trend in IT, following on from the rapid rise of DevOps over the last decade. The growth of AI, machine learning, and the move to cloud all contribute to the growing importance of DataOps. Kunal Agarwal, Unravel Data CEO, will take you through the rise of DataOps and show you how to implement a data culture in your organization.

Bringing transaction support to Cloudera Operational Database

We’re excited to share that after adding ANSI SQL, secondary indices, star schema, and view capabilities to Cloudera’s Operational Database, we will be introducing distributed transaction support in the coming months. The ACID model of database design is one of the most important concepts in databases. ACID stands for atomicity, consistency, isolation, and durability. For a very long time, strict adherence to these four properties was required for a commercially successful database.
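To make the "A" in ACID concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module (not Cloudera's Operational Database; the table and amounts are invented for illustration). Both halves of a transfer commit together or not at all:

```python
import sqlite3

# Atomicity sketch: both updates in a transfer commit together or not at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if conn.execute("SELECT balance FROM accounts WHERE name = ?",
                            (src,)).fetchone()[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # the debit above was rolled back, so balances are unchanged

transfer(conn, "alice", "bob", 500)  # fails: rolled back
transfer(conn, "alice", "bob", 30)   # succeeds atomically
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 70, 'bob': 80}
```

In a distributed database the same guarantee must hold across many nodes, which is what makes distributed transaction support a significant engineering effort.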

How does Apache Spark 3.0 increase the performance of your SQL workloads

Across nearly every sector working with complex data, Spark has quickly become the de-facto distributed computing framework for teams across the data and analytics lifecycle. One of the most awaited features of Spark 3.0 is the new Adaptive Query Execution framework (AQE), which fixes issues that have plagued many Spark SQL workloads. Those issues were documented in early 2018 in a blog post from a joint Intel and Baidu team.

Qlik Replicate - Real-Time Data Ingestion, Updates and So Much More

Qlik Replicate (formerly Attunity Replicate) empowers organizations to accelerate data replication, ingestion and streaming across a wide variety of heterogeneous databases, data warehouses, and big data platforms. Used by hundreds of enterprises worldwide, Qlik Replicate moves your data easily, securely and efficiently with minimal operational impact.

Exploding arrays in Kafka with lateral joins

In this article we are going to explore lateral joins. "What is a lateral join?" you may ask. It's a new kind of join that lets you extract and work with the individual elements found inside an array, as if the array were a normal table. Lenses 4.1 comes with a lot of new features that make your life easier when working with arrays: we introduced six new functions for arrays, better support for array literals, and lateral joins.
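The semantics of a lateral join can be sketched in plain Python (this is an illustration of the concept, not Lenses SQL itself; the records and field names are invented). Each array element becomes its own row, joined back to the parent record's other fields:

```python
records = [
    {"order_id": 1, "items": ["apple", "pear"]},
    {"order_id": 2, "items": ["plum"]},
]

def lateral_explode(rows, array_field):
    """Yield one output row per array element, keeping the parent's other fields."""
    for row in rows:
        for element in row[array_field]:
            out = {k: v for k, v in row.items() if k != array_field}
            out["item"] = element
            yield out

exploded = list(lateral_explode(records, "items"))
# [{'order_id': 1, 'item': 'apple'}, {'order_id': 1, 'item': 'pear'},
#  {'order_id': 2, 'item': 'plum'}]
```

The SQL version expresses the same idea declaratively, treating the exploded array as if it were a joinable table.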

Data-driven Software Engineering Management. Which Data??

Leading software organisations with data-driven insights? Sure! Bring it on! But how? And where do you get the right data? Learn how to unlock your software engineering teams' treasure trove of data for better decision making. It is easy to get behind the idea of data-driven decision making in the software engineering world.

Hadoop vs. SQL - Which is Better for Data Management?

The key differences between Hadoop and SQL: Organizations rely on big data to power their business, but many teams struggle with the complexities of data management. Thankfully, Hadoop and SQL both handle large data sets efficiently. These tools manage data in unique ways, which makes it difficult to compare them on a like-for-like basis. However, organizations looking to streamline their tech stacks might have reason to choose one over the other. In this article, we compare Hadoop and SQL head-to-head.

Top 4 Reasons Why You Should Upgrade Your Stream Processing Workloads To CDP

If there’s one thing enterprises have learned in 2020, it’s how to navigate through uncertain times, and in 2021 organizations will likely have to keep navigating a shifting landscape. One trend we’ve seen this year is that enterprises are leveraging streaming data to traverse unplanned disruptions and make the best business decisions for their stakeholders.

Snowflake Demo: Cross-Cloud Replication & Failover and Failback

Snowflake's cross-cloud replication & failover/failback support ensures high availability and quick recovery of data — no matter where or through which cloud provider your business operates. This demo video will walk you through how you would replicate a database across three clouds for business continuity purposes (from AWS US West in Oregon to Azure East US in Virginia to GCP Europe West in the Netherlands).

Chartio and Xplenty: Business Intelligence for Smart Companies

We're living in a data-driven age. In every sector, we've seen new companies emerge, executing lightning-fast strategies based on sophisticated analytics. These data mavericks have disrupted and sometimes even devoured their more traditional rivals. To stay afloat, you need a state-of-the-art data infrastructure. That means having the right platforms, the right data pipelines, and the right analytics engines. But when you have all that data, what do you actually do with it?

Covid Data: An anomalous blip, or the new normal?

COVID-19 has forced virtually every industry to embrace an acceleration in digital capabilities. While it can be argued that digital transformation was already underway, it’s hard to dispute that it has accelerated in recent months. A recent McKinsey survey, cited in CRN, shows that worldwide, 58 percent of customer interactions were digital as of July 2020.

3x better performance with CDP Data Warehouse compared to EMR in TPC-DS benchmark

In a previous blog post on CDW performance, we compared Azure HDInsight to CDW. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to EMR 6.0 (also powered by Apache Hive-LLAP) on Amazon using the TPC-DS 2.9 benchmark. Amazon recently announced their latest EMR version 6.1.0 with support for ACID transactions. This benchmark is run on EMR version 6.0 as we couldn’t get queries to run successfully on version 6.1.0.

Data Lake Export Public Preview Is Now Available on Snowflake

Public preview of the data lake export feature is now available. Snowflake announced a private preview of data lake export at the Snowflake virtual summit in June 2020. Data lake export is one of the key features of the data lake workload in the Snowflake Data Cloud. The feature makes Snowflake data accessible to the external data lake, and it enables customers to take advantage of Snowflake’s reliable and performant processing capabilities.

10X Engineering Leadership Series: 21 Playbooks to Lead in the Online Era

Managing online teams has become the new normal! In an online world, how do you give effective feedback, have a difficult conversation, increase team accountability, communicate to stakeholders effectively, and so on? At Unravel, we are a fast-growing AI startup with a globally distributed engineering team across the US, EMEA, and India. Even before the pandemic this year, the global nature of our team has prepared us for effectively leading outcomes across online engineering teams.

Structured vs Unstructured Data: A Short Guide

Data is the oil that fuels the growth of modern enterprises. But unless you have the tools to unlock its potential, you might be left stuck on the tracks as your competitors speed ahead. With the rise of Big Data, the nature of the data we work with has changed drastically. Data scientists like to refer to the ‘3 Vs’ of Big Data: volume, velocity, and variety. The 3 Vs reshaped the data landscape as we knew it.

Modernizing Data in a Cloud-Enabled World | Part 2 | Snowflake Inc.

Deloitte's partnership with Snowflake showcases how Snowflake's new Cloud Platform modernization of data helps companies migrate information & innovate within the Cloud. Frank Farrall, AI Ecosystems & Snowflake Alliance Leader at Deloitte, details how his organization uses Snowflake's cloud technology & artificial intelligence to problem solve and innovate quickly. Rise of the Data Cloud is brought to you by Snowflake.

How to configure clients to connect to Apache Kafka Clusters securely - Part 2: LDAP

In the previous post, we talked about Kerberos authentication and explained how to configure a Kafka client to authenticate using Kerberos credentials. In this post we will look into how to configure a Kafka client to authenticate using LDAP, instead of Kerberos. We will not cover the server-side configuration in this article but will add some references to it when required to make the examples clearer.
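From the client's point of view, LDAP authentication is just SASL/PLAIN over TLS; the broker validates the credentials against LDAP. A minimal sketch of the client settings follows, shown as a confluent-kafka-python / librdkafka config dict (the host, username, password, and file path are placeholders, and the broker-side LDAP setup is assumed to be in place):

```python
# Client-side settings for LDAP authentication (SASL/PLAIN over TLS).
ldap_client_config = {
    "bootstrap.servers": "broker-1.example.com:9093",
    "security.protocol": "SASL_SSL",   # TLS so the password isn't sent in clear text
    "sasl.mechanism": "PLAIN",         # LDAP auth uses the PLAIN mechanism
    "sasl.username": "alice",
    "sasl.password": "alice-ldap-password",
    "ssl.ca.location": "/path/to/truststore.pem",
}

# Usage (requires confluent-kafka and a reachable, LDAP-enabled broker):
# from confluent_kafka import Producer
# producer = Producer(ldap_client_config)
```

The same property names apply to any librdkafka-based client; Java clients use the analogous `sasl.jaas.config` settings instead.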

Cost Conscious Data Warehousing with Cloudera Data Platform

Have you been burned by the unexpected costs of a cloud data warehouse? If so, you know about the failed economics of some cloud-native solutions on the market today. If not, before adopting a cloud data warehouse, consider the true costs of a cloud-native data warehouse. Data warehouses have been broadly adopted to provide timely reports and valuable insights. However, traditional deployments are notoriously cumbersome and cost-prohibitive at large scales.

Extending Snowflake's External Functions with Serverless-Adding Driving Times from Mapbox to SQL

Data engineers love to use SQL to solve all kinds of data problems. For this and more, Snowflake is a perfect partner. Snowflake’s support for standard SQL and several SQL variations, combined with JavaScript stored procedures, has helped me solve complex data challenges. But sometimes you might have the need for custom code.

How Deloitte's Adopting AI Technology | Part 1 | Snowflake Inc.

Frank Farrall, AI Ecosystems & Snowflake Alliance Leader at Deloitte, describes how he's leading Deloitte's digital transformation by creating an AI ecosystem that helps clients navigate competitive marketplaces using data & technology. Rise of the Data Cloud is brought to you by Snowflake.

Kafka Is Not a Database

It's important to understand the uses and abuses of streaming infrastructure. Apache Kafka is a message broker that has rapidly grown in popularity in the last few years. Message brokers have been around for a long time; they're a type of datastore specialized for "buffering" messages between producer and consumer systems. Kafka has become popular because it's open-source and capable of scaling to very large numbers of messages.
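The "buffering" role of a broker can be sketched in a few lines of Python, with queue.Queue standing in for a topic (a deliberately simplified model; real Kafka adds durable, replayable, partitioned logs on top of this idea):

```python
import queue
import threading

broker = queue.Queue()  # stand-in for a topic buffering messages
results = []

def producer():
    for i in range(5):
        broker.put(f"event-{i}")  # fire and forget; never waits on the consumer

def consumer():
    while True:
        msg = broker.get()
        if msg is None:  # sentinel: no more messages
            break
        results.append(msg.upper())  # pretend "processing"

t = threading.Thread(target=consumer)
t.start()
producer()
broker.put(None)
t.join()
print(results)  # five processed events, in order
```

The decoupling is the point: a slow consumer never blocks a fast producer. What this buffer does not give you is what databases do, such as indexed lookups or ad-hoc queries over stored state.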

CaliberMind Onboards Customer Data With Fivetran

With automated data integration, CaliberMind uncovers data insights for customers. As a Customer Data Platform (CDP), CaliberMind delivers data-driven insights to its customers. To do so, it must connect to its customers’ data sources, extract, process and transform the data, run it through specially designed analytic models, and, finally, present data back to the customer as insights. CaliberMind uses Fivetran to offload the task of ingesting data from its customers’ applications.

Solutions Analyst: The Career for Innovative All-Rounders

Every business wants to stay agile. They invest in analytics to learn about their customers and their internal state, and they use these insights to make bold and innovative decisions. But then they run into a common problem: how to put those decisions into action. This is where a solutions analyst comes in. These multi-talented creative thinkers will look at the current state of play and identify the smartest path forward.

Federated Learning, Machine Learning, Decentralized Data

Two years ago we wrote a research report about Federated Learning. We’re pleased to make the report available to everyone, for free. You can read it online here: Federated Learning. Federated Learning is a paradigm in which machine learning models are trained on decentralized data. Instead of collecting data on a single server or data lake, it remains in place—on smartphones, industrial sensing equipment, and other edge devices—and models are trained on-device.
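The core federated averaging loop can be sketched in a few lines: each device computes an update on its local data, and only model parameters, never the raw data, travel to the server, which averages them weighted by local dataset size. (A toy 1-parameter model with invented data; real systems train neural networks this way.)

```python
def local_update(w, local_data, lr=0.1):
    """One gradient-descent step for y = w*x, on the device's own data."""
    grad = sum(2 * x * (w * x - y) for x, y in local_data) / len(local_data)
    return w - lr * grad

def federated_average(updates, sizes):
    """Server step: weighted average of the devices' parameters."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(updates, sizes)) / total

global_w = 0.0
devices = [
    [(1.0, 2.1), (2.0, 3.9)],               # data stays on device 1
    [(1.0, 1.9), (3.0, 6.2), (2.0, 4.1)],   # data stays on device 2
]
for _ in range(50):  # communication rounds
    updates = [local_update(global_w, data) for data in devices]
    global_w = federated_average(updates, [len(d) for d in devices])
print(round(global_w, 2))  # converges near the true slope ~2
```

Only the scalar `w` crosses the network in each round, which is precisely the privacy property that motivates federated learning.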

How Cloudera Supports Government Data Encryption Standards

As part of our ongoing commitment to supporting Government regulations and standards in our enterprise solutions, including data protection, Cloudera recently introduced a version of our Cloudera Data Platform, Private Cloud Base product (7.1.5 release) that can be configured to use FIPS compliant cryptography.

Better Listening Through Customer Experience Insights

Snowflake connected with Margaret Sherman of Sonos at Data Cloud Summit 2020 to hear how the company is using the Data Cloud to understand customer preferences and enhance listening experiences. In a world where people are surrounded by a lot of noise, purity of sound in music and other content we seek out in the comfort of our homes can offer a welcome respite. There are lessons to be learned from a company reinventing home audio for today and tomorrow—and using the Data Cloud to do it.

Democratizing Machine Learning Capabilities With Qlik Sense and Amazon SageMaker

The ability to discover insights from past events, transactions and interactions is how many customers currently utilize Qlik. Qlik’s unique approach to Business Intelligence (BI) using an in-memory engine and intuitive interface has democratized BI for typical business users, who usually have little to no technical savvy. But, for many years, organizations have only been able to analyze metrics or KPIs of “what has happened” (i.e., descriptive analytics).

Outlier Detection: The Different Types of Outliers

Time series anomaly detection is a tool that detects unusual behavior, whether it's harmful or advantageous for the business. In either case, quick outlier detection and outlier analysis can enable you to adjust your course quickly, before you lose customers, revenue, or an opportunity. The first step is knowing what types of outliers you’re up against. Chief Data Scientist Ira Cohen, co-founder of Autonomous Business Monitoring platform Anodot, covers the three main categories of outliers and how you'll see them arise in a business context.
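As one simple illustration of flagging a global outlier (not Anodot's method, which is far more sophisticated; the data here is invented), a z-score check surfaces points that sit far from the rest of the series:

```python
import statistics

def zscore_outliers(series, threshold=3.0):
    """Return indices of points whose z-score exceeds the threshold --
    a simple way to surface global outliers in a series."""
    mean = statistics.fmean(series)
    stdev = statistics.pstdev(series)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mean) / stdev > threshold]

values = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
print(zscore_outliers(values, threshold=2.5))  # index of the spike
```

Contextual and collective outliers need more machinery (seasonality models, sequence models), since a value can be perfectly normal globally yet anomalous for a given time of day.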

Cloud Data Management Guide: Solutions & Best Practices

Your data can quickly get out of control when you’re working with multiple cloud storage services and applications throughout your organization. Complex cloud ecosystems can make it difficult to know what data you have, how it’s being managed, whether it’s safe, and how to use it effectively. Cloud data management platforms can stop this frustrating scenario in its tracks.

Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance

There are lessons to be learned from the brick-and-mortar and pure-play digital retailers that have been successful amid the Covid-19 chaos. The pandemic has stress-tested e-commerce, in-store insights, supply chain visibility, and fulfillment capabilities, revealing shortcomings and reshaping long-lasting consumer expectations. It has also allowed many companies to pivot to very successful strategies built on enterprise data and the digitization efforts that accompany it.

Global View Distributed File System with Mount Points

Apache Hadoop Distributed File System (HDFS) is the most popular file system in the big data world. The Hadoop FileSystem interface also integrates with many other popular storage systems, such as Apache Ozone, S3, and Azure Data Lake Storage. Some HDFS users want to extend Namenode capacity by configuring a federation of Namenodes; others prefer alternative file systems such as Apache Ozone or S3 for their scaling benefits.
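A global view over several backing stores can be sketched with Hadoop's client-side ViewFs mount table in core-site.xml. The cluster name, hostnames, and paths below are illustrative, not from the original post:

```xml
<!-- core-site.xml: a client-side mount table named "ClusterX" -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://ClusterX</value>
</property>
<!-- /data resolves to an HDFS Namenode namespace... -->
<property>
  <name>fs.viewfs.mounttable.ClusterX.link./data</name>
  <value>hdfs://nn1.example.com:8020/data</value>
</property>
<!-- ...while /archive is backed by an Apache Ozone bucket -->
<property>
  <name>fs.viewfs.mounttable.ClusterX.link./archive</name>
  <value>o3fs://bucket1.vol1.ozone1.example.com/archive</value>
</property>
```

With this in place, clients see a single `viewfs://ClusterX/` namespace and each top-level mount point transparently routes to its own storage system.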

The practical benefits of augmented analytics

Augmented analytics uses emerging technologies like automation, artificial intelligence (AI), machine learning (ML) and natural language generation (NLG) to automate data manipulation, monitoring and analysis tasks and enhance data literacy. In our previous blog, we covered what augmented analytics actually is and what it really means for modern business intelligence.

What Is a Data Pipeline?

A data pipeline is a series of actions that combine data from multiple sources for analysis or visualization. In today’s business landscape, making smarter decisions faster is a critical competitive advantage. Companies want their employees to make data-driven decisions, but harnessing timely insights from your company’s data can seem like a headache-inducing challenge.

What are ETL tools?

Thinking of building out an ETL process or refining your current one? Read on to learn how ETL tools give you time to focus on building data models. ETL stands for extract-transform-load and commonly refers to the process of data integration. Extract pulls data from a particular data source. Transform converts that data into a processable format. Load is the final step, dropping the data into the designated target.
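The three steps can be sketched in a few lines. This is a toy illustration with an invented schema, not any vendor's implementation; the CSV source and SQLite target stand in for a real data source and warehouse:

```python
import csv
import io
import sqlite3

def etl(csv_text, conn):
    # Extract: pull rows from the CSV source.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Transform: normalize types and drop malformed records.
    clean = [(r["email"].strip().lower(), int(r["amount"]))
             for r in rows if r["amount"].isdigit()]
    # Load: write into the designated target table.
    conn.execute("CREATE TABLE IF NOT EXISTS orders (email TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", clean)
    return len(clean)

source = "email,amount\nAna@Example.com,42\nbob@example.com,oops\n"
conn = sqlite3.connect(":memory:")
print(etl(source, conn))  # → 1 (the malformed row is dropped)
```

ETL tools take over exactly this plumbing (connectors, type handling, error rows) so teams can spend their time on the models built on top of the loaded data.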

Accelerate Application Development with the Operational Database Demo Highlight

Cloudera Operational Database is a fast, flexible, dbPaaS database that enables faster application development. It simplifies planning as an application grows in scale and importance, and is a great fit for many application types including mobile, web, gaming, ad-tech, IoT, and ML model serving.

Achieve Pin-Point Historical Analysis of Your Salesforce Data

Want to look at how data has changed over time? Simply enable history mode, a Fivetran feature that data analysts can turn on for specific tables to analyze historical data. The feature achieves Type 2 Slowly Changing Dimensions (Type 2 SCD), meaning a new timestamped row is added for every change made to a column. We launched history mode for Salesforce in May and have been delighted with the response.
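Type 2 SCD keeps full history by appending timestamped rows rather than updating in place. The sketch below illustrates the idea only; the column names (`_active`, `_valid_from`, `_valid_to`) are invented for this example and are not Fivetran's actual schema:

```python
from datetime import datetime, timezone

def scd2_update(history, record_id, column, new_value):
    """Type 2 SCD: instead of overwriting the current row, close it
    out and append a new timestamped row carrying the changed value."""
    now = datetime.now(timezone.utc).isoformat()
    current = next(r for r in history
                   if r["id"] == record_id and r["_active"])
    if current[column] == new_value:
        return history  # no change, no new version
    current["_active"] = False
    current["_valid_to"] = now
    new_row = {**current, column: new_value,
               "_active": True, "_valid_from": now, "_valid_to": None}
    history.append(new_row)
    return history

history = [{"id": 1, "stage": "Prospecting", "_active": True,
            "_valid_from": "2020-05-01T00:00:00+00:00", "_valid_to": None}]
scd2_update(history, 1, "stage", "Closed Won")
print(len(history))  # → 2: both the old and new versions are retained
```

Because every change adds a row instead of replacing one, an analyst can reconstruct the state of any record as of any point in time.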

Fivetran vs. MuleSoft vs. Xplenty: An ETL Comparison

The key differences between Fivetran, MuleSoft, and Xplenty: Hiring a data scientist or engineer can cost up to $140,000 per year, something many businesses can't afford. Still, organizations need to pull data from different locations into a data lake or warehouse for business insights. An Extract, Transform, and Load (ETL) platform makes this process easier, but few organizations have the technical or coding know-how to make it happen.

Moving Big Data and Streaming Data Workloads to AWS

Cloud migration may be the biggest challenge, and the biggest opportunity, facing IT departments today, especially if you use big data and streaming data technologies such as Cloudera, Hadoop, Spark, and Kafka. In this 55-minute webinar, Unravel Data product marketer Floyd Smith and Solutions Engineering Director Chris Santiago describe how to move workloads to AWS EMR, Databricks, and other destinations on AWS, fast and at the lowest possible cost.

Hive vs. SQL: Which One Performs Data Analysis Better?

Key differences between Hive and SQL: Big data requires powerful tools. Successful organizations query, manage and analyze thousands of data sets from hundreds of data sources. This is where tools like Hive and SQL come in. Although the two are very different, both are used to query and process big data. But which tool is right for your organization? In this review, we compare Hive vs. SQL on features, prices, support, user scores, and more.

How to configure clients to connect to Apache Kafka Clusters securely - Part 1: Kerberos

This is the first installment in a short series of blog posts about security in Apache Kafka. In this article we will explain how to configure clients to authenticate with clusters using different authentication mechanisms.
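A Kerberos-authenticated client boils down to a handful of properties. The sketch below shows the standard Kafka client settings for SASL/GSSAPI over TLS; the keytab path, principal, and truststore details are illustrative placeholders, not values from the post:

```
# client.properties — Kerberos (SASL/GSSAPI) over TLS
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
# Must match the service principal the brokers run under
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
    useKeyTab=true \
    keyTab="/etc/security/keytabs/client.keytab" \
    principal="client@EXAMPLE.COM";
ssl.truststore.location=/opt/certs/truststore.jks
ssl.truststore.password=changeit
```

A console client can then pass the file with `--consumer.config client.properties` (or `--producer.config`) and authenticate to the secured cluster without code changes.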

How leading organizations govern their data to find success

With the increased focus on delivering value to customers, it is imperative to build a next-generation customer hub that delivers high-quality, governed data. In this video we will share best practices for implementing a comprehensive data governance approach. Learn how to leverage the capabilities of the Talend Data Fabric to deploy a forward-looking data management architecture that detects and retrieves metadata from across databases and applications, builds data lineage, and adds traceability.

Solution Architect: Become the Ultimate Problem Solver

There's an old XKCD cartoon that describes a conversation between a manager and a software developer. This kind of conversation happens all the time. Business leaders know their strategic goals. IT people know what the tech can do. But aligning goals with technology is an ongoing challenge. This is where solution architects come in. They act as a bridge between the business and technical sides, and they figure out how to get things done.

Cloudera Operational Database Infrastructure Planning Considerations

In this blog post, let us take a look at the infrastructure planning you may have to do when deploying an operational database cluster on a CDP Private Cloud Base deployment. Note that you may have to make some planning assumptions when designing your initial infrastructure, and it must be flexible enough to scale up or down based on your future needs.

Beware of Creating a New Legacy of Artificial Intelligence Silos

Although the issue of silos in IT and data management is well known, companies appear to be falling back into this trap by not distributing their artificial intelligence (AI) and machine learning (ML) capabilities across their business. New research from Qlik and IDC revealed that just 20 percent of businesses widely distribute these capabilities across the organization.

5 ways Machine Learning can improve the data cataloging process

Data is an essential asset for any business, with comprehensive efforts made to generate, source, and prepare it for analytical use. But just as important as collection and cleaning is ensuring its accessibility for users across the organization. This highlights the need for an organized data inventory—a directory that makes it possible to easily sort, search, and find the data assets required. In other words, you need a data catalog, a core component of master and metadata management.