
Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 1: The Set-Up & Basics

Python is used extensively by Data Engineers and Data Scientists to solve all sorts of problems, from ETL/ELT pipelines to building machine learning models. Apache HBase is an effective data storage system for many workflows, but accessing its data specifically through Python can be a struggle. Data professionals who want to use data stored in HBase can use the recent upstream project “hbase-connectors” with PySpark for basic operations.

A Three-Step Plan to Innovate Hadoop for the Cloud

How large is your Hadoop data lake? 500 terabytes? A petabyte? Even more? And it is certainly growing, bit by bit, day after day. What began as inexpensive big data infrastructure now demands ever more spending on storage and servers, while becoming increasingly unwieldy and expensive to manage. This ever-growing appetite makes it harder and harder to realize a proper return on investment from that Hadoop infrastructure.

2021 Trends - How You Ranked Them

On December 8th, it was time for Qlik's annual “State of the Union” on BI & Data Trends. Attendance was overwhelming, in the many thousands, and we received thousands of questions. To get that kind of engagement in a year when people have done nothing but virtual conferences is amazing. One person put it to me like this: “I just joined in on your webinar on the top data and analytics trends and it was truly fantastic.

How to build the dream analytics team

The problem with modern analytics is that it overpromises and underdelivers. We can even quantify the disappointment. This raises the question: why bother with analytics at all? Well, when analytics is done right, it pays back $13.01 for every dollar spent. In fact, data-driven companies outperform their competitors in almost every conceivable way. We have broached the topic of extracting value from data before, from how to set up the right data strategy to building a data-driven culture.

Cash Back on Your Data Stack with Rakuten Rewards | Snowflake Inc.

In this episode of Rise of the Data Cloud, Mark Stange-Tregear, Vice President of Analytics at Rakuten Rewards, talks about how to communicate successfully with both merchants and consumers, the nuances of analyzing consumer data, the future of cloud data analytics, and much more. Rise of the Data Cloud is brought to you by Snowflake.

Maximizing Supply Chain Agility through the "Last Mile" Commitment

In my last two blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance, and Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising) we looked at the benefits to retail of building personalized interactions by accessing both structured and unstructured data: website clicks, email and SMS opens, in-store point-of-sale systems, and past purchase behavior.

10 Predictions about Data Cloud Analytics in 2021

2021 is the year of the Data Cloud. Powered by the Snowflake platform, the Data Cloud will be the place where organizations across industries can converge to mobilize their data. Snowflake estimates that there are still hundreds of millions of data sets isolated in cloud data storage and on-premises data centers globally. The Data Cloud eliminates these silos, allowing you to seamlessly unify, analyze, and share your data to reach deeper insights and even open new revenue streams.

The Train Has Left the Station for the Last Time

We have three big announcements for our community today, and I wanted to talk to you about them. One, Allegro Trains is changing its name; two, we’re adding a completely new way to use Trains; and three, we’re announcing a bunch of features that make Trains an even better product for you! Read all about it on our blog at Clear.ml, our new website for our open source suite of tools.