Analytics

Moving Big Data and Streaming Data Workloads to AWS

Cloud migration may be the biggest challenge, and the biggest opportunity, facing IT departments today, especially if you use big data and streaming data technologies such as Cloudera, Hadoop, Spark, and Kafka. In this 55-minute webinar, Unravel Data product marketer Floyd Smith and Solutions Engineering Director Chris Santiago describe how to move workloads to AWS EMR, Databricks, and other destinations on AWS, fast and at the lowest possible cost.

Hive vs. SQL: Which One Performs Data Analysis Better?

Key differences between Hive and SQL: Big data requires powerful tools. Successful organizations query, manage, and analyze thousands of data sets from hundreds of data sources. This is where tools like Hive and SQL come in. Although very different, both can be used to query and program big data. But which tool is right for your organization? In this review, we compare Hive vs. SQL on features, prices, support, user scores, and more.

How leading organizations govern their data to find success

With the increased focus on delivering value to customers, it is imperative to build a next-generation customer hub that delivers high-quality, governed data. In this video we will share best practices for implementing a comprehensive data governance approach. Learn how to leverage the capabilities of the Talend Data Fabric to deploy a forward-looking data management architecture that detects and retrieves metadata from across databases and applications, builds data lineage, and adds traceability.

How to configure clients to connect to Apache Kafka Clusters securely - Part 1: Kerberos

This is the first installment in a short series of blog posts about security in Apache Kafka. In this article we will explain how to configure clients to authenticate with clusters using different authentication mechanisms.

Beware of Creating a New Legacy of Artificial Intelligence Silos

Although the issue of silos in IT and data management is well known, companies appear to be falling back into this trap by not distributing their artificial intelligence (AI) and machine learning (ML) capabilities across their business. New research from Qlik and IDC revealed that just 20 percent of businesses widely distribute these capabilities across the organization.

5 ways Machine Learning can improve the data cataloging process

Data is an essential asset for any business, with comprehensive efforts made to generate, source, and prepare it for analytical use. But just as important as collection and cleaning is ensuring its accessibility for users across the organization. This highlights the need for an organized data inventory: a directory that makes it possible to easily sort, search, and find the data assets required. In other words, you need a data catalog, a core component of master data and metadata management.