Analytics

How-to: Index Data from S3 via NiFi Using CDP Data Hubs

Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as it was in the previous blog but the ingest pipeline differs. Spark as the ingest pipeline tool for Search (i.e.

You can now run projects in Keboola Connection for free

Over the past few months, we’ve been considering how to create a platform that’s accessible to everyone. With that said, we’re happy to announce that you can now use Keboola Connection for free! No contract, no talking to our (albeit incredibly lovely) sales team - just jump in and start building.

Redivis makes research data accessible, experiences collaborative with BigQuery

Understanding the data we collect is essential—it allows us to identify trends and uncover answers about our world. However, stories in our data frequently go untold. Large datasets are hard to share between research communities due to their size, security restraints, and complexity. Even if these datasets are accessible to users, the tools needed to query them often require deep technical knowledge.

Smile with new user-friendly SQL capabilities in BigQuery

October is the month of World Smile Day, which Harvey Ball, the inventor of the smiley face, declared to give people a reason to smile. This month, BigQuery users have plenty of new reasons to smile, with the release of new user-friendly SQL capabilities that are now generally available.

Using Cloudera Machine Learning to Build a Predictive Maintenance Model for Jet Engines

Running a large commercial airline requires the complex management of critical components, including fuel futures contracts, aircraft maintenance and customer expectations. In the U.S. alone, airlines average about 45,000 daily flights and transport over 10 million passengers a year (source: FAA). Airlines typically operate on very thin margins, and any schedule delay immediately angers or frustrates customers.

Apache Spark on Kubernetes: How Apache YuniKorn (Incubating) helps

Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. While Apache Spark provides many capabilities to support diverse use cases, it comes with additional complexity and high maintenance costs for cluster administrators. Let’s look at some of the high-level requirements that the underlying resource orchestrator must meet to support Spark as a unified platform.

George Fraser and Tristan Handy Discuss the Fivetran-Fishtown Partnership

In a Slack discussion, the two CEOs explain why the Fivetran-dbt integration is great for data analytics engineering enthusiasts. After the recent launch of Fivetran dbt Transformations, both the Fivetran and Fishtown Analytics teams received questions about the newly available feature. (Fishtown is the team behind dbt.) Fivetran CEO George Fraser and Fishtown Analytics CEO Tristan Handy addressed those questions on Slack, and discussed the harmonious relationship between the two companies.