February 2021

Sample applications for Cloudera Operational Database

Cloudera Operational Database is an operational database-as-a-service that brings ease of use and flexibility to Apache HBase. Cloudera Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution. In the previous blog posts, we looked at application development concepts and how Cloudera Operational Database (COD) interacts with other CDP services.

New in BigQuery BI Engine: faster insights across popular BI tools

Business analysts working with larger and larger data sets are finding traditional BI methods can't keep up with their need for speed. BigQuery BI Engine is designed to meet this need by accelerating the most popular dashboards and reports that connect to BigQuery. With the freshest data available, your analysts can identify trends faster, reduce risk, match the pace of customer demand, even improve operational efficiency in an ever-changing business climate.

Change The Way You Do ML With Applied ML Prototypes

Today’s enterprise data science teams have one of the most challenging, yet most important roles to play in your business’s ML strategy. In our current landscape, businesses that have adopted a successful ML strategy are outperforming their competitors by over 9%. The implications of ML on the future of business are clear. However, only 4% of enterprise executives today report seeing success from their ML investment.

5 key features of any modern embedded analytics platform

Start-ups founded on analytics have been shaking up every industry. Finance has been disrupted by Monzo's data focus, Netflix’s analytics has upended film entertainment, and Swyfft has used data to change the game for US home insurance. Today's users have come to expect analytics in their applications.

How to use a machine learning model from a Google Sheet using BigQuery ML

Spreadsheets are everywhere! They are one of the most useful productivity tools available. They make organizing, calculating, and presenting data a breeze. Google Sheets is the spreadsheet application included in Google Workspace, which has over 2 billion users. Machine learning, or ML for short, has also become an essential business tool. Making predictions with data at low cost and high accuracy has transformed industries.

Five Trends for the Financial Services Industry to Track in 2021

With a new year ahead, it’s time for financial services to pause, take stock of the “new normal,” and plan a path forward. COVID-19 forced nearly every industry to adapt to a new reality, and the financial services industry was no exception. Consumer habits shifted drastically. Suddenly, many people started working from home. Employee and customer needs changed. Adaptability was a necessity.

Becoming the Most Loved Baby Products Brand Globally With Qlik

I’ve been a Business Intelligence (BI) analyst and evangelist for over two decades now. As you can imagine I’ve worked with many different BI platforms throughout my career, especially during my time as a BI Consultant. In this role, I was product agnostic, so from Power BI to Tableau, you name it, I used it! However, Qlik Sense quickly stood out to me as the most powerful and intuitive platform on the market.

You're Not the Only One With Data Problems

I’ve met with lots of customers and prospects throughout my career. And, I’ve noticed that, when I’ve asked them to describe their current software situation, many would say the same things. “We should have updated this a long time ago.” “It’s embarrassing how long it takes to do a simple task.” “I bet other companies stopped doing things like this years ago.”

DataOps for Industrial IoT

The growth in IoT data collection and processing underscores the need for comprehensive data management strategies. The average enterprise today has deployed – and collects data from – nearly 4,000 IoT endpoints. And these organizations expect a 65% increase in the number of connected IoT endpoints over the next two years. Hear from 451 Research (part of S&P Global Market Intelligence) and Hitachi Vantara to assess the business impact of edge computing and IIoT on data management.

Creating a Data Strategy & Self-Service Data Platform in FinTech

In this episode of CDO Battlescars, Sandeep Uttamchandani, Unravel Data’s CDO, speaks with Keyur Desai, CDO of TD Ameritrade. They discuss battlescars in two areas: Building a Data Strategy and Pervasive Self-Service Analytics Platforms. Keyur is a data executive with over 30 years of experience managing and monetizing data and analytics.

Building loyalty with data and analytics

In 1969, my aunt graduated from university and joined IBM, the dominant player in the nascent tech industry at the time. She remained at “Big Blue” where she met and married my uncle, and rose up through the management ranks, until their joint semi-retirement exactly 30 years later. She recently told me, “the only way you could get fired in those days was to murder someone, embezzle or steal”.

The Multifaceted Value Proposition of the Cloudera Data Platform

The Cloudera Data Platform (CDP) represents a paradigm shift in modern data architecture by addressing all existing and future analytical needs. It builds on a foundation of technologies from CDH (Cloudera Data Hub) and HDP (Hortonworks Data Platform) and delivers a holistic, integrated data platform from Edge to AI, helping clients accelerate complex data pipelines and democratize data assets.

Forging a truly data-driven organization

In a 2020 study performed by Nature Research, 70 different teams of neuroimaging experts were asked to test nine hypotheses by looking at the same MRI data set. You may not be surprised to learn that these teams reached a wide range of different conclusions, in part because no two teams chose identical workflows to analyze the data. With 70 teams, there were 70 different workflows.

Peloton and Qlik: The Apple Watch Conundrum

Part of being a data professional is pretty simple... you notice when things don't add up. In my case, my Apple Watch and my Peloton aren't on the same data page when it comes to calorie tracking. In this blog, I'm going to deduce why I think it's happening and use Qlik and the Peloton/Apple metrics as the data to support my conclusions.

Discover Your Datasets - The Self-Service Data Roadmap, Session 1 of 4

In this session, Unravel CDO and VP Engineering Sandeep Uttamchandani describes the start of any large, data-driven project: the Discover phase. First you must identify the insights you want to generate from the project. Then you must discover, that is, identify, the current data assets you have and the new data assets you will need to generate those insights. Sandeep expertly guides you through this process, and shows you how to invest the right amount of time and effort to get the job done.

Express Cloudera POV on 2021 data trends in insurance

Almost a year into the pandemic, the accelerated digital transformation has begun to feel less abrupt and more sustained. 2021 looks likely to be defined by a new phase: Thriving on digital transformation, rather than just surviving through it. We’ve written about the changes forced on the traditionally risk-averse insurance industry by COVID-19.

Qlik - Gartner Magic Quadrant for BI and Analytics Leader Again!

The new 2021 Gartner Magic Quadrant for BI and Analytics report is out, and you can find it here! Gartner’s brand, alongside its breadth of research by its analysts, ensures that it’s a key reference document for clients in buying situations. No wonder, then, that every year the industry anxiously awaits where dots will fall on that famous 2x2 matrix. Therefore, I’m delighted to announce that Qlik is a Leader, again, for the 11th year in a row.

Cloudera DataFlow's key milestones and wins in 2020

Needless to say, 2020 was an unforgettable year in a lot of ways and we were all happy to say goodbye to it. The pandemic has ushered in new ways of conducting business, remote work cultures, telehealth, grocery/food deliveries, etc. While certain industries were hard-hit by this change, most businesses were able to adapt, pivot, and take this adversity in their stride.

Lost in the Cloud? Why Mapping YOUR Transformation Journey Is More Important Than Ever

The past 10 months have accelerated the race to cloud. That’s all the more reason to pause and check that you’re moving in the right direction. Cloud migration these days is something of a no-brainer. For most businesses, it’s no longer a question of whether to migrate to cloud. The real issues are around the how, when, what, where and even the why of cloud.

New Snowflake Features Released in January 2021

Snowflake continued expanding its platform capabilities at the start of the new year, adding updates to data sharing, Snowsight, and data pipelines that help customers and partners access, mobilize, and share their data for better data-driven outcomes. Here’s a brief rundown of some of the exciting announcements from January 2021.

Using other CDP services with Cloudera Operational Database

In the previous blog post, we looked at some of the application development concepts for the Cloudera Operational Database (COD). In this blog post, we’ll see how you can use other CDP services with COD. COD is an operational database-as-a-service that brings ease of use and flexibility to Apache HBase. Cloudera Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution.

The Global Health Crisis Is Accelerating Transportation's Need for Digital Transformation

The transportation industry has reached an inflection point – one in which nearly all forms of travel have been met with unprecedented challenges. Transit and airport revenues have been decimated with the lack of passengers, while freight and shipping companies have been overwhelmed with demand from an explosion of e-commerce orders. Despite facing unprecedented challenges, the industry is facing an equally unprecedented opportunity to innovate.

Moving Big Data and Streaming Data Workloads to Google Cloud Platform

Cloud migration may be the biggest challenge, and the biggest opportunity, facing IT departments today - especially if you use big data and streaming data technologies, such as Cloudera, Hadoop, Spark, and Kafka. In this 55-minute webinar, Unravel Data product marketer Floyd Smith and Solutions Engineering Director Chris Santiago describe how to move workloads to Google Dataproc, BigQuery, and other destinations on GCP, fast and at the lowest possible cost.

Why Verizon Media picked BigQuery for scale, performance and cost

As the owner of Analytics, Monetization and Growth Platforms at Yahoo, one of the core brands of Verizon Media, I'm entrusted to make sure that any solution we select is fully tested across real-world scenarios. Today, we just completed a massive migration of Hadoop and enterprise data warehouse (EDW) workloads to Google Cloud’s BigQuery and Looker.

Seven Data Resources: Our Valentine for House-Bound #datalovers

According to a recent press release by the National Retail Federation, “nearly seventy-three percent of consumers celebrating Valentine’s Day this year feel it’s important to do so given the current state of the pandemic.” The release also states that “consumers still feel it’s important to spoil their loved ones in light of the pandemic.” We couldn’t agree more on the importance of celebrating the day.

How to accelerate digital transformation with Automated Business Monitoring

With automation becoming more user-friendly and streamlined than ever before, it's understandable organizations across sectors are examining how it can enhance their analytics capability and accelerate their business shift toward digital transformation.

Fine-Grained Authorization with Apache Kudu and Apache Ranger

When Kudu was first introduced as a part of CDH in 2017, it didn’t support any kind of authorization so only air-gapped and non-secure use cases were satisfied. Coarse-grained authorization was added along with authentication in CDH 5.11 (Kudu 1.3.0) which made it possible to restrict access only to Apache Impala where Apache Sentry policies could be applied, enabling a lot more use cases.

How to trigger Cloud Run actions on BigQuery events

Many BigQuery users ask for database triggers—a way to run some procedural code in response to events on a particular BigQuery table, model, or dataset. Maybe you want to run an ELT job whenever a new table partition is created, or maybe you want to retrain your ML model whenever new rows are inserted into the table. In the general category of “Cloud gets easier”, this article will show how to quite simply and cleanly tie together BigQuery and Cloud Run.

How Emirates And Allianz Benelux Are Transforming Customer Service With The Data Cloud

Snowflake met with Jan Doumen, Head of Expertise for Allianz Benelux, and Naveed Memon, Program Director, Data and Analytics for Emirates, at Data Cloud Summit 2020. Read excerpts from the conversation to learn how capturing data insights in the Data Cloud brings value to their businesses. Data’s value in the 21st century is often compared to oil’s value in the 18th century. It can transform organizations, opening doors to unprecedented opportunities.

Data Enrichment Using Cloudera Data Engineering

In this video, we'll walk through an example on how you can use Cloudera Data Engineering to pull in multiple datasets from a Hive data warehouse and go through the process of enriching the data through the use of Apache Spark. We'll then run this Spark job from within Cloudera Data Engineering so that we can follow the progress and see details about the job's execution.

Stephanie Stillman Talks About Data Sharing And The Data Marketplace | Behind the Data Cloud

Today on Behind The Data Cloud, Daniel Meyers interviews Snowflake Product Manager Stephanie Stillman and they talk about how she entered the data industry, data sharing, and the data marketplace. Behind the Data Cloud is a builder-focused video series.

Architecting a data lineage system for BigQuery

Democratization of data within an organization is essential to help users derive innovative insights for growth. In a big data environment, traceability of where the data in the data warehouse originated and how it flows through a business is critical. This traceability information is called data lineage. Being able to track, manage, and view data lineage helps you to simplify tracking data errors, forensics, and data dependency identification.

15 of the Best Data Analytics Tools of 2021

The importance of effective data analytics within an organization is widely accepted by business leaders at this point. With use cases for data analysis spanning every department, from IT management to financial planning to marketing analytics, the right data analytics tools can have a significant impact on a company’s profitability and growth.

Stitch vs. Talend vs. Xplenty: A Head-to-Head Comparison

Five differences between Stitch, Talend, and Xplenty: Organizations store data in many destinations, making that data difficult to analyze. Legacy systems, SaaS locations, in-house databases, apps, you name it — by storing data in all kinds of places, companies can complicate data analytics considerably. Storing data in a warehouse or a lake makes more sense.

Cloudera Operational Database application development concepts

Cloudera Operational Database is now available in three different form-factors in Cloudera Data Platform (CDP). If you are new to Cloudera Operational Database, see this blog post. And, check out the documentation here. In this blog post, we’ll look at both Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database.

Joining the Data Cloud

Join executives from Allianz Benelux and Emirates to hear why their organizations are joining the Data Cloud. The Data Cloud is transforming companies across financial services, transportation, and other industries. As leaders develop strategies to support the next 3–5 years of innovation, the Data Cloud is becoming a critical enabler for the success of their enterprises. Learn how these companies are seizing the opportunity with Snowflake, and see the broader impact Snowflake’s cloud data platform is having on their organizations.

Using COD and CML to build applications that predict stock data

No, not really. You probably won’t be rich unless you work really hard… As nice as it would be, you can’t really predict a stock price based on ML solely, but now I have your attention! Continuing from my previous blog post about how awesome and easy it is to develop web-based applications backed by Cloudera Operational Database (COD), I started a small project to integrate COD with another CDP cloud experience, Cloudera Machine Learning (CML).

Data - the Octane Accelerating Intelligent Connected Vehicles

The digital revolution is making a deep impact on the automotive industry, offering practically unlimited possibilities for more efficient, convenient, and safe driving and travel experiences in connected vehicles. This revolution is just beginning to accelerate – in fact, according to a recent Applied Market Research study, the global connected car market was valued at $63.03 billion in 2019, and is projected to reach $225.16 billion by 2027, registering a CAGR of 17.1% from 2020 to 2027.

Snowflake, the Swiss Army Knife of Data for inReality

inReality provides an analytics platform that leverages IoT sensor data (for example, visual technologies) to bring operational excellence and exceptional customer experiences to all types of venues. The company’s clients range from public schools to major telecommunication companies with the goal being to make their spaces more secure and efficient, to solve problems, and to create better experiences for their patrons.

Cloudera wins Risk Markets Technology Award for Data Management Product of the year

Financial services institutions need the ability to analyze and act on massive volumes of data from diverse sources in order to monitor, model, and manage risk across the enterprise. They need a comprehensive data and analytics platform to model risk exposures on-demand. Cloudera is that platform. I am pleased to announce that Cloudera was just named the Risk Data Repository and Data Management Product of the Year in the Risk Markets Technology Awards 2021.

Augmented analytics: 3 key advantages for software vendors

Artificial intelligence (AI), automation and machine learning (ML) are rapidly transforming the analytical experience for everyday business users in 2021. Whether it’s automated visualizations, continuous analysis, or reduced time-to-insight, there are many practical benefits of augmented analytics that are well documented and fully realized today.

5 Lessons We Learned Validating Security Controls at Snowflake

You may have read about Snowflake’s IPO last year. But you probably didn’t hear about all the work that the Snowflake security team did in preparation. Our corporate security program went through a security analytics review to ensure that it satisfied the new security policy requirements resulting from the IPO. Here are a few lessons that we learned when setting up automated security control validation on our Snowflake security data lake.

High-Performance, Cost-Effective Move to Azure

Cloud migration may be the biggest challenge, and the biggest opportunity, facing IT departments today - especially if you use big data and streaming data technologies, such as Cloudera, Hadoop, Spark, and Kafka. In this 55-minute webinar, Unravel Data product marketer Floyd Smith and Solutions Engineering Director Chris Santiago describe how to move workloads to Azure HDInsight, Databricks, and other destinations on Azure, fast and at the lowest possible cost.

Introducing real-time data integration for BigQuery with Cloud Data Fusion

Businesses today have a growing demand for real-time data integration, analysis, and action. More often than not, the valuable data driving these actions—transactional and operational data—is stored either on-prem or in public clouds in traditional relational databases that aren’t suitable for continuous analytics.

Continuous model evaluation with BigQuery ML, Stored Procedures, and Cloud Scheduler

Continuous evaluation—the process of ensuring a production machine learning model is still performing well on new data—is an essential part in any ML workflow. Performing continuous evaluation can help you catch model drift, a phenomenon that occurs when the data used to train your model no longer reflects the current environment.
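
As a rough illustration of the idea in this excerpt, here is a minimal plain-Python sketch of a drift check. It is not the BigQuery ML, stored procedure, or Cloud Scheduler setup the article describes. The function names, the accuracy metric, and the 0.05 tolerance are all assumptions chosen for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def has_drifted(baseline_accuracy: float,
                current_accuracy: float,
                tolerance: float = 0.05) -> bool:
    """Flag drift when accuracy on new data falls more than
    `tolerance` below the accuracy measured at training time.
    The tolerance value is a hypothetical default, not a recommendation."""
    return (baseline_accuracy - current_accuracy) > tolerance


# Hypothetical numbers: the model scored 0.90 when it was trained,
# and we now score it on a fresh batch of labeled data.
new_labels = [1, 0, 1, 1]
new_predictions = [1, 0, 0, 1]
current = accuracy(new_labels, new_predictions)
needs_retraining = has_drifted(0.90, current)
```

Running a check like this on a schedule, against each new batch of labeled data, is the essence of continuous evaluation. What to do when drift is flagged (retrain, alert, roll back) depends on the workflow.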

Data, The Unsung Hero of the Covid-19 Solution

COVID-19 vaccines from various manufacturers are being approved by more countries, but that doesn’t mean that they will be available at your local pharmacy or mass vaccination centers anytime soon. Creating, scaling up, and manufacturing the vaccine is just the first step; now the world needs to coordinate an incredible and complex supply chain system to deliver more vaccines to more places than ever before.

Six Trends Driving Adoption of Lumada DataOps Suite

Innovative organizations need DataOps and new technologies because old-school data integration is no longer sufficient. The traditional approach creates monolithic, set-in-concrete data pipelines that can’t convert data into insights quickly enough to keep pace with business. The following trends are driving the adoption of Hitachi’s Lumada DataOps Suite.

CDP Public Cloud: SSH Key Deployment

This video covers how to deploy SSH keys in CDP Public Cloud. It touches on how to generate a new SSH key pair and steps through the process of deploying it for a workload user through the Cloudera Management Console Web UI, as well as using the CDP command-line tool. It discusses the security implications of using the Cloudbreak user for login on data hub hosts, and explains why workload user credentials should be used instead in most cases. It also demonstrates using the deployed SSH keys for login to data hub hosts.

How to configure clients to connect to Apache Kafka Clusters securely - Part 4: TLS Client Authentication

In the previous posts in this series, we have discussed Kerberos, LDAP and PAM authentication for Kafka. In this post we will look into how to configure a Kafka cluster and client to use TLS client authentication. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.
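
For orientation, here is a hedged sketch of the client-side settings that TLS client authentication typically involves, built as a Python dictionary and rendered as a properties file. The property keys are standard Apache Kafka client configuration names. The helper functions, file paths, and passwords are placeholders invented for illustration and are not taken from the post.

```python
from typing import Dict


def tls_client_auth_properties(keystore_path: str,
                               keystore_password: str,
                               key_password: str,
                               truststore_path: str,
                               truststore_password: str) -> Dict[str, str]:
    """Build Kafka client properties for TLS with a client certificate.

    The keystore holds the client's own certificate and private key
    (used to authenticate the client); the truststore holds the CA
    certificates used to verify the brokers.
    """
    return {
        "security.protocol": "SSL",
        # Authentication-related: the client's identity.
        "ssl.keystore.location": keystore_path,
        "ssl.keystore.password": keystore_password,
        "ssl.key.password": key_password,
        # Needed for any TLS connection: trust in the broker certificates.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }


def to_properties_text(props: Dict[str, str]) -> str:
    """Render the settings in key=value form, one per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Placeholder values only; substitute real paths and secrets.
client_props = tls_client_auth_properties(
    keystore_path="/path/to/client.keystore.jks",
    keystore_password="changeit",
    key_password="changeit",
    truststore_path="/path/to/truststore.jks",
    truststore_password="changeit",
)
properties_text = to_properties_text(client_props)
```

On the broker side, client certificates are only demanded when the broker setting `ssl.client.auth` is set to `required`. The rest of the cluster configuration is outside the scope of this sketch.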

3 things we learned embedding Yellowfin software

One of the key pieces of work that we've done this past year is to actually build a completely bespoke application, so that we could properly look at the different ways that we could embed Yellowfin. This has helped us create a really unique customer experience within a third-party application. Like all great stories, our vision fundamentally changed on that journey, and as we built this application we learned three valuable lessons that we want to share with you.

Bringing It All Together in 2021

As a result of overwhelming excitement (and pressure) from my fellow Qlikkies, I’m going to share with you the recent demo I did at our all-company annual kick-off which shows Active Intelligence in action. It was intended to be an “internal-only” demo because it mixes existing capabilities with near-term future ones, but, on reflection, I think you, too, will be just as excited.