Global View Distributed File System with Mount Points

The Apache Hadoop Distributed File System (HDFS) is the most popular file system in the big data world. The Apache Hadoop FileSystem interface has enabled integration with many other popular storage systems, such as Apache Ozone, S3, and Azure Data Lake Storage. Some HDFS users want to extend Namenode capacity by configuring a Federation of Namenodes; others prefer alternative file systems like Apache Ozone or S3 for their scaling benefits.
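A global view over multiple stores is typically wired up with a client-side mount table. The sketch below uses Hadoop's ViewFs mount-table properties in core-site.xml; the mount-table name, host names, and paths are placeholders, not a configuration from the article.

```xml
<!-- core-site.xml: illustrative mount table named "GlobalCluster".
     All host names and paths below are placeholders. -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://GlobalCluster/</value>
</property>
<!-- Mount an HDFS namespace under /hdfs-data -->
<property>
  <name>fs.viewfs.mounttable.GlobalCluster.link./hdfs-data</name>
  <value>hdfs://namenode1.example.com:8020/data</value>
</property>
<!-- Mount an Apache Ozone bucket under /ozone-data (URI scheme per Ozone's o3fs client) -->
<property>
  <name>fs.viewfs.mounttable.GlobalCluster.link./ozone-data</name>
  <value>o3fs://bucket.volume.om.example.com/data</value>
</property>
```

Clients then address everything through one namespace (for example `viewfs://GlobalCluster/ozone-data`), while each mount point resolves to its backing store.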

The practical benefits of augmented analytics

Augmented analytics uses emerging technologies like automation, artificial intelligence (AI), machine learning (ML) and natural language generation (NLG) to automate data manipulation, monitoring and analysis tasks and enhance data literacy. In our previous blog, we covered what augmented analytics actually is and what it really means for modern business intelligence.

Accelerate Application Development with the Operational Database Demo Highlight

Cloudera Operational Database is a fast, flexible, dbPaaS database that enables faster application development. It simplifies planning as applications grow in scale and importance, and is a great fit for many application types including mobile, web, gaming, ad-tech, IoT, and ML model serving.

What Is a Data Pipeline?

A data pipeline is a series of actions that combines data from multiple sources for analysis or visualization. In today’s business landscape, making smarter decisions faster is a critical competitive advantage. Companies want their employees to make data-driven decisions, but harnessing timely insights from your company’s data can seem like a headache-inducing challenge.
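The "combine data from multiple sources" step can be sketched in a few lines. The sources and keys here (a CRM feed and a billing feed joined on customer) are hypothetical, chosen only to illustrate the idea.

```python
# Minimal data-pipeline sketch: merge records from two hypothetical
# sources into one dataset that is ready for analysis.
crm_data = [{"customer": "acme", "deals": 3}]
billing_data = [{"customer": "acme", "revenue": 1200.0}]

def combine(crm, billing):
    """Join the two sources on the customer key."""
    revenue_by_customer = {r["customer"]: r["revenue"] for r in billing}
    return [
        {**row, "revenue": revenue_by_customer.get(row["customer"], 0.0)}
        for row in crm
    ]

combined = combine(crm_data, billing_data)
print(combined)  # [{'customer': 'acme', 'deals': 3, 'revenue': 1200.0}]
```

A real pipeline adds scheduling, error handling, and incremental loading around this core join, but the shape (pull, combine, hand off for analysis) is the same.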

What are ETL tools?

Thinking of building out an ETL process or refining your current one? Read on to learn how ETL tools give you time to focus on building data models. ETL stands for extract, transform, load, and commonly refers to the process of data integration. Extract pulls data from a particular data source; transform converts that data into a processable format; load is the final step, dropping the data into the designated target.
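The three steps above can be sketched as three small functions. All names and the sample records are illustrative, not tied to any particular ETL tool.

```python
# Minimal ETL sketch: extract -> transform -> load.
# Function names and sample data are hypothetical.

def extract(source):
    """Extract: pull raw records from a data source."""
    return source.copy()

def transform(records):
    """Transform: normalize raw records into a processable format."""
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in records
    ]

def load(records, target):
    """Load: drop the transformed data into the designated target."""
    target.extend(records)
    return target

raw_source = [{"name": "  alice ", "amount": "10.5"}, {"name": "BOB", "amount": "3"}]
warehouse = []
load(transform(extract(raw_source)), warehouse)
print(warehouse)
# [{'name': 'Alice', 'amount': 10.5}, {'name': 'Bob', 'amount': 3.0}]
```

ETL tools automate exactly this chain at scale (connectors for extract, mapping rules for transform, warehouse writers for load), which is why using one frees analysts to work on models instead of plumbing.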

Achieve Pinpoint Historical Analysis of Your Salesforce Data

Want to look at how data has changed over time? Simply enable history mode, a Fivetran feature that data analysts can turn on for specific tables to analyze historical data. The feature achieves Type 2 Slowly Changing Dimensions (Type 2 SCD), meaning a new timestamped row is added for every change made to a column. We launched history mode for Salesforce in May and have been delighted with the response.
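The Type 2 SCD mechanic described above (a new timestamped row for every change, rather than an overwrite) can be sketched as follows. The schema and function names are illustrative, not Fivetran's actual implementation.

```python
# Sketch of Type 2 Slowly Changing Dimensions: each change appends a
# new timestamped row; the previous row is closed out, never overwritten.
# Column names here are hypothetical.
from datetime import datetime, timezone

history = []  # the history table: one row per version of each record

def upsert(record_id, column_values):
    """Record a change: close out the current active row, append a new one."""
    now = datetime.now(timezone.utc)
    for row in history:
        if row["id"] == record_id and row["active"]:
            row["active"] = False
            row["valid_until"] = now
    history.append({
        "id": record_id,
        **column_values,
        "valid_from": now,
        "valid_until": None,   # open-ended: this is the current version
        "active": True,
    })

upsert(1, {"stage": "Prospecting"})
upsert(1, {"stage": "Closed Won"})
# Two rows now exist for id=1; only the latest is active.
active = [r for r in history if r["active"]]
print(len(history), active[0]["stage"])  # 2 Closed Won
```

Because old rows keep their `valid_from`/`valid_until` range, an analyst can reconstruct the state of any record as of any point in time.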

Moving Big Data and Streaming Data Workloads to AWS

Cloud migration may be the biggest challenge, and the biggest opportunity, facing IT departments today, especially if you use big data and streaming data technologies such as Cloudera, Hadoop, Spark, and Kafka. In this 55-minute webinar, Unravel Data product marketer Floyd Smith and Solutions Engineering Director Chris Santiago describe how to move workloads to AWS EMR, Databricks, and other destinations on AWS, quickly and at the lowest possible cost.

Fivetran vs. MuleSoft vs. Xplenty: An ETL Comparison

The key differences between Fivetran, MuleSoft, and Xplenty: Hiring a data scientist or engineer can cost up to $140,000 per year, something many businesses can't afford. Still, organizations need to pull data from different locations into a data lake or warehouse for business insights. An Extract, Transform, and Load (ETL) platform makes this process easier, but few organizations have the technical or coding know-how to make it happen.