Analytics

How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).

Ask questions to BigQuery and get instant answers through Data QnA

Today, we’re announcing Data QnA, a natural language interface for analytics on BigQuery data, now in private alpha. Data QnA lets your business users get answers to their analytical queries by asking natural language questions, without burdening business intelligence (BI) teams. This means that a business user like a sales manager can simply ask a question about their company’s dataset and get results back in the same way.

New Connector: YouTube Analytics

The value of YouTube has grown significantly for companies looking to bolster their brands with video content. The YouTube API is report-based, and its prebuilt reports fall into one of two categories: channel reporting and content owner reporting. Channel reports refer to the videos on a specific YouTube channel, while content owner reports contain data on all the channels owned by a particular individual.

Introducing FlinkSQL in Cloudera Streaming Analytics

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu, and the first iteration of the much-awaited FlinkSQL API. Flink’s SQL interface democratizes stream processing, as it caters to a much larger community than the widely used Java and Scala APIs, which are aimed at the data engineering crowd.

Massive growth in data today: 3 must-have skills for Data Science

In recent years, there’s been an increasing demand for data scientists left and right, across industries and across departments. In the same vein, companies are getting more and more data than they know what to do with. In fact, according to IBM, 90% of the data in the world today has been created in the last two years alone. To put this influx to good use, organizations are turning to data scientists.

A Message To You Kafka - The Advantages of Real-time Data Streaming

In these uncertain times of the COVID-19 crisis, one thing is certain – data is key to decision making, now more than ever. And, the need for speed in getting access to data as it changes has only accelerated. It’s no wonder, then, that organisations are looking to technologies that help solve the problem of streaming data continuously, so they can run their businesses in real-time.

Managing ML Projects - Allegro Trains vs GitHub

The resurrection of AI due to the drastic increase in computing power has allowed its loyal enthusiasts, casual spectators, and experts alike to experiment with ideas that were pure fantasies a mere two decades ago. The biggest beneficiary of this explosion in computing power and ungodly amounts of data (thank you, internet!) is none other than deep learning, the sub-field of machine learning (ML) tasked with extracting underlying features and patterns, and with identifying cat images.