
September 2020

Elevating Data Science Practices for the Media, Entertainment & Advertising Industries

As more and more companies embed AI projects into their systems, attracted by the promise of efficiencies and competitive advantages, data science teams are feeling the growing pains of a relatively immature practice that lacks widespread, established, and repeatable norms.

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Performance is one of the key deciding criteria, if not the most important one, in choosing a Cloud Data Warehouse service. In today’s fast-changing world, enterprises have to make data-driven decisions quickly, and for that they rely heavily on their data warehouse service. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure using the TPC-DS 2.9 benchmark.

From office to home: The new telco landscape in data, analytics & more

The telecommunications space plays a critical role in facilitating modern communication, especially during these uncertain times. With all the social distancing measures and quarantine restrictions, it has become essential to keep people and businesses connected to each other—be it with their families, colleagues, or customers. This surge in demand for connectivity has transformed the telco landscape in numerous ways.

Why Moving to Talend in the Cloud Is the Right Choice

What if you used your data to become more productive and profitable? What if you reduced your data TCO by 20%? What if you could see your data's quality in real time, and fix it just as fast? In this video, Talend Cloud Expert Thomas Steinborn and David Petrella explore the possibilities that come with better data integration and integrity in the cloud, and that drive the majority of new Talend customers to choose Talend in the cloud.

Creating a Data-Driven Strategy | Rise of the Data Cloud

Jon Hyman, Co-Founder and CTO of Braze, talks about migrating data to the cloud, why data-driven companies outperform competitors, improving email marketing with data, and much more. Rise of the Data Cloud is brought to you by Snowflake. To see how you can get secure and easy access to any data with near-infinite scalability, visit Snowflake.com/academy.

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers’ data. Cloudera has found that customers have spent many years investing in their big data assets and want to continue building on that investment by moving toward a more modern architecture that lets them take advantage of multiple form factors.

Salesforce Data Migrations & Attribute-Driven Design

A talk by David Masri, Founder & CEO of Gluon Digital. David Masri founded Gluon Digital in 2020 with the goal of promoting data migration and integration best practices to the Salesforce Ohana. Before founding Gluon Digital, Dave spent years working with data and with Salesforce. He has been involved in dozens of Salesforce data migration and integration projects, has used that experience to run numerous training programs for aspiring integration and migration specialists, and ultimately wrote a book on the subject.

Choosing the right Data Warehouse SQL Engine: Apache Hive LLAP vs Apache Impala

Some of the most powerful results come from combining complementary superpowers, and the “dynamic duo” of Apache Hive LLAP and Apache Impala, both included in Cloudera Data Warehouse, is further evidence of this. Both Impala and Hive can operate at massive scale, across many petabytes of data. Both are 100% open source, so you can avoid vendor lock-in while using your favorite BI tools and benefiting from community-driven innovation.

ETL Testing: What, Why, and How to Get Started

Companies use their data to accelerate business growth and overtake their competitors. To achieve this, they invest a lot in their ETL (extract-transform-load) operations, which take raw data and transform it into actionable information. It’s no wonder, then, that ETL testing is a crucial part of a well-functioning ETL process, since the ETL process generates mission-critical data.

Talend Data Inventory - Turning Data Quality into a Team Sport

Talend’s Data Inventory application enables your organization to easily collaborate across multiple business and technology functions and strengthen data integrity by centrally organizing datasets, consistently applying standardization rules, and proactively correcting data errors. In this video, you will learn how Data Inventory, combined with Talend Pipeline Designer and Talend Data Preparation, extends your collaboration capabilities and enables self-service across your organization.

Superhero Salesforce Records & Predictions with Einstein

Create a "Superhero" in a Google Form, which then generates a Salesforce record with a Total Power Prediction. We can look at how seemingly unrelated variables (such as eye color and hair color) affect the Total Power Score, and then view the results of the Superhero creations on a leaderboard!

Talend recognized for the first time in the 2020 Gartner Magic Quadrant for Enterprise iPaaS

What a pivotal year it’s been for the integration platform as a service (iPaaS) market! Today, improving customer centricity, bringing innovative new applications and systems to market, and optimizing supply chains to meet new digital demands have become critical to many organizations’ growth.

The Power of Data and Analytics to Help Reopen the Workplace

The world has gone remote. For those who can, working from home has become the new normal thanks to Covid-19. The gradual shift underway over the past several years has accelerated, and most organizations have adapted. This mass pivot has been enabled largely by technology, specifically the move to SaaS and cloud, which allows employees to work productively from almost anywhere.

Databases Demystified Lesson 10: Query Planning and Optimization

In this lesson, we talk about what a query planner is and does in the database. We talk about the difference between declarative and imperative programming languages, and we wrap up with a discussion of some common strategies for database optimization to improve query speed.
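To make the declarative/imperative distinction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in database (SQLite is our assumption; the lesson does not name an engine). The same declarative SQL query is planned twice, before and after an index exists, and `EXPLAIN QUERY PLAN` reports the strategy SQLite's planner chose. The table `t` and index `idx_t_a` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the planner's human-readable strategy lines for a query."""
    # EXPLAIN QUERY PLAN rows end with a 'detail' text column; keep only that.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(1000)])

# Declarative: the query states WHAT we want, not how to find it.
query = "SELECT b FROM t WHERE a = ?"
before = plan(conn, query, (42,))  # no index yet: the planner must scan the table
conn.execute("CREATE INDEX idx_t_a ON t(a)")
after = plan(conn, query, (42,))   # same query text; the planner now uses the index

# Imperative equivalent of the full-scan plan: we spell out HOW to search.
rows = conn.execute("SELECT a, b FROM t").fetchall()
manual = [b for a, b in rows if a == 42]

print(before)
print(after)
print(manual)  # ['row42']
```

The query text never changes: the planner, not the author, decides between a full scan and an index lookup, whereas the imperative loop hard-codes the full scan.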

Integrating Data and Business with Didier Le Tien | Rise of The Data Cloud

This episode features an interview with Didier Le Tien, Vice President of App Development at US Foods. Didier has nearly 20 years of experience strategizing and executing digital and Big Data transformations. In this episode Didier talks about integrating data into US Foods’ business process and innovation strategy, COVID-19’s impact on the restaurant industry, the new technology landscape the cloud provides, and much more.

Cloudera Data Platform in AWS Marketplace Simplifies and Accelerates Cloud Adoption

As organizations look to optimize the speed and cost of their cloud journey in today’s rapidly evolving economy, Cloudera is delighted to announce the availability of Cloudera Data Platform (CDP) Public Cloud in AWS Marketplace. Now customers can easily, confidently and cost-effectively discover, procure and deploy the world’s first Enterprise Data Cloud, powered by AWS, for faster time-to-insight from their advanced analytics and machine learning services.

Customers Rate Snowflake Experience 3x Higher Than Industry Average

At Snowflake, our number one company value is “put customers first.” We only succeed when our customers do, and how we help enable their success depends on how well we serve them as a technology provider. To understand if our efforts meet their needs, we conduct an annual Customer Experience Relational Survey. As we’ve done each year, we are pleased to share the findings of this year’s survey, conducted in May 2020 and produced in partnership with Walker.

Reflecting on the past six months

I know a lot of organizations have really struggled in the current environment, but the last six months have actually been quite exciting for us at Yellowfin. We've achieved a lot and have built a fantastic strategy for the future. We have really focused in on our sales organization, hiring a new VP of Global Sales, Josh Read, and appointing new sales leadership in the regions as well.

9 Key Areas to Cover in Your Anomaly Detection RFP

Evaluating a new, unknown technology is a complicated task. Although you can articulate the goals you’re trying to achieve, you’re probably faced with multiple solutions that approach the problem in different ways and highlight varying features. To cut through the clutter, you need to figure out what questions to ask in order to evaluate which technology has the optimal capabilities to get the job done in your unique setting.

How Correlation Analysis Boosts the Efficacy of eCommerce Promotions

In the first part of the blog series, we discussed how correlation analysis can be leveraged to reduce time to detection (TTD) and time to remediation (TTR) by guiding mitigation efforts early. Further, correlation analysis helps to reduce alert fatigue by filtering out irrelevant anomalies and grouping multiple anomalies stemming from a single incident into one alert. In this part, we shed light on how correlation analysis applies to eCommerce, specifically to promotions.
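As an illustration of grouping several anomalies into one alert, the sketch below merges anomalous metrics whose recent values are strongly correlated, using a plain Pearson coefficient and a union-find structure. It is a hypothetical sketch, not the method described in the blog series: the metric names, the `0.8` threshold, and the choice to also group negatively correlated metrics (via the absolute value) are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant series: treat as uncorrelated

def group_anomalies(anomalous, series, threshold=0.8):
    """Merge anomalous metrics whose series correlate above `threshold`.

    anomalous: names of metrics flagged as anomalous (hypothetical input).
    series: dict mapping metric name -> list of recent values.
    Returns sorted groups; each group would be reported as a single alert.
    """
    parent = {m: m for m in anomalous}  # union-find forest

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for i, a in enumerate(anomalous):
        for b in anomalous[i + 1:]:
            if abs(pearson(series[a], series[b])) >= threshold:
                parent[find(a)] = find(b)  # likely the same incident: merge groups

    groups = {}
    for m in anomalous:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical metrics: payment latency tracks checkout errors; page views do not.
series = {
    "checkout_errors": [1, 2, 3, 10, 12],
    "payment_latency": [2, 4, 6, 20, 24],
    "page_views": [5, 3, 5, 3, 5],
}
alerts = group_anomalies(["checkout_errors", "payment_latency", "page_views"], series)
print(alerts)  # [['checkout_errors', 'payment_latency'], ['page_views']]
```

Three anomalies collapse into two alerts here. A production system would also weigh timing overlap and topology, which this sketch ignores.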

How to get powerful and actionable insights from any and all of your data, without delay

A North American telecom company struggled for years to react quickly enough to new categories and new levels of spam texts and calls. It also lacked a good way to know when and why it would need additional capacity, either on its own network or on other telecom companies’ networks.

Introducing CDP Data Engineering: Purpose Built Tooling For Accelerating Data Pipelines

For enterprise organizations, managing and operationalizing increasingly complex data across the business has presented a significant challenge to staying competitive in analytics- and data-science-driven markets.

Doing business in Japan during COVID-19

The way that people do business in Japan has radically changed as a result of COVID-19. Historically, the Japanese have been very much about face-to-face relationship selling, where you establish relationships and build trust. But now everything has to be done remotely or online. Like everyone else, the Japanese are really keen to keep doing business, so they've actually embraced the remote way of doing things, which has been quite interesting to watch and has had some unexpected benefits.

The Snowflake IPO - What does it mean?

Today, Snowflake began life as a publicly traded company on the New York Stock Exchange. What does it mean? It depends on who you are. For employees, this is of course a huge milestone, especially for the longest serving employees who hired on at the company in 2013 when the company first started staffing beyond its core founding team.

How Neural Guard Built its X-Ray & CT Scanning AI Production Pipeline - Customer Story

Neural Guard produces automated threat detection solutions powered by AI for the security screening market. With the expansion of global trends like urbanization, aviation, mass transportation, and global trade, the associated security and commercial challenges have become ever more crucial.

Correlation Analysis: A Natural Next Step for Anomaly Detection

Over the last decade, data collection has become a commodity. Consequently, there has been a tremendous deluge of data in every area of industry. This trend is captured by recent research, which points to a growing volume of raw data and to the growth of market segments fueled by that data.

Building ML Pipelines Over Federated Data & Compute Environments

A Forbes survey shows that data scientists spend 19% of their time collecting data sets and 60% of their time cleaning and organizing data. All told, data scientists spend around 80% of their time preparing and managing data for analysis. One of the greatest obstacles to bringing data science initiatives to life is the lack of robust data management tools.

Addressing the data storm with the Enterprise Data Cloud

For some, this may look like a new category at this year’s Data Impact Awards. However, the Enterprise Data Cloud category marks the evolution of what was once the Data Anywhere category. The main reason for this change is that the new title better represents the move our customers are making: away from acknowledging the ability to have data ‘anywhere’.

Access control for Azure ADLS cloud object storage

Cloudera Data Platform 7.2.1 introduces fine-grained authorization for access to Azure Data Lake Storage using Apache Ranger policies. Cloudera and Microsoft have been working together closely on this integration, which greatly simplifies the security administration of access to ADLS-Gen2 cloud storage.

Dashboards vs automated business monitoring: What's the difference?

In 2020, however, continuing to rely solely on dashboards for your BI needs isn't enough. Why? Data is growing exponentially, in both size and complexity, within every business today. Manually keeping track of performance and searching for insights has become difficult for many users, and this has fostered new expectations: being able to do more with analytics, including keeping on top of changes or opportunities faster and more easily.

Yellowfin: the embedded BI platform of choice for dashboard designers

For our Head of Product Design and Creative Director, Tony Prysten, design is always top of mind. In analytics platforms, good design plays an important role in how people understand and use data. Here Tony shares how Yellowfin has been created with designers and developers in mind.

Using Augmented Intelligence To Drive Recovery and Growth Through COVID and Beyond

There seems to be universal acceptance that effective use of data can help maximize bottom line value. However, many businesses still aren’t successfully leveraging data to its full extent due to people, processes and technology roadblocks. Thankfully, there is a unifying approach to unlocking the immense opportunity for enterprises to more effectively leverage data to create new products, services and business models.

5 ways business intelligence helps telcos gain competitive advantage

As an industry that capitalizes on the transfer and exchange of data, the telecom sector has a wealth of data on its hands that it can use to stay ahead of the competition: network performance, product usage, customer information, billing details, and more. This constant influx of data presents a lot of opportunities for telcos, but only if organizations adopt strategies that aim to make this data accessible and useful.

EP.3 Developing a Long-Term Data Strategy with Minna Karha, Head of Data at Finnair

Minna Karha, Head of Data at Finnair, joins us for this episode of Rise of the Data Cloud. She came to Finnair two and a half years ago as the first person in that role. The company considers data expertise to be essential for its ongoing digital transformation. She built a team and developed a long-term data strategy through 2025. A key element of the strategy is moving much of Finnair’s data to the cloud so it can be more easily integrated. Right now, Finnair is beginning to share data via the cloud with business partners, including other carriers.

How to Run Spark Over Kubernetes to Power Your Data Science Lifecycle

Spark is known for its powerful engine, which enables distributed data processing. It provides unmatched functionality for handling petabytes of data across multiple servers, and its capabilities and performance unseated other technologies in the Hadoop world. Although Spark provides great power, it also comes with a high maintenance cost. In recent years, innovations have emerged that simplify the Spark infrastructure supporting these large data processing tasks.

Fundamentals for Success in Cloud Data Management

Everybody needs more data and more analytics, with many different and often conflicting needs. Data engineers need batch resources, while data scientists need to quickly onboard ephemeral users. Data architects deal with constantly evolving workloads, and business analysts must balance the urgency and importance of a concurrent user population that continues to grow.

Covid-19 Accelerates The Need for Retail, Manufacturing Supply Chains To Adapt

The ongoing disruption to critical supply chains in both the manufacturing and retail space has seen businesses having to respond quickly, turning to data, analytics, and new technologies to better predict and manage ‘real-time’ business disruptions.

Strength in Numbers: Why Crowdsourcing Works!

The heat of summer and the smell of fresh-cut grass trigger many memories. I feel a sense of yearning from those memories, particularly as I know that, in normal times, the college football season would have begun. It’s been many years (too many to mention here) since I last played. The sense of anticipation persists, as it is this time of year that the team would gather for camp.

Cloudera Named Leader in The Forrester Wave: Notebook-Based Predictive Analytics and Machine Learning, Q3 2020

Cloudera has been named a Leader in The Forrester Wave™: Notebook-Based Predictive Analytics and Machine Learning, Q3 2020. At Cloudera, we are committed to always staying at the forefront of data and analytics innovation — enabling enterprises to more optimally work with data to deliver analytic results across the business quickly and securely.

Data Champions: Balancing IT and Business Needs

Digital transformation has been on the agenda for a long time, but the sudden need to respond to the unprecedented challenges of 2020 has turned the buzzword into an executable reality for many enterprises. I recently came across a KPMG report revealing that 80% of executives are increasing investments in emerging technologies now, to drive higher realized value in the future. Underlying digital transformation and investment decisions is a precious asset: data.

What data tells us about crime during lockdown

Recently, we took another look at our Chicago crime dataset to see how crime changed as a result of the COVID-19 lockdown, and what we found was fascinating. This dataset, provided by the City of Chicago, tracks every type of reported crime, and we often use it to demonstrate the power of Yellowfin. The first thing we saw was that a lot of crime went down. This chart shows crime rates in Chicago over the past two years.

Building an effective data approach in a hybrid cloud world - part 3

In our last two posts, we talked with Deloitte’s Marc Beierschoder and Martin Mannion, respectively, about organizations’ need to deploy their data and analytics quickly into a hybrid environment. On top of that, there is the fundamental aspect of consistent security and governance across your enterprise data cloud, and the need for multiple users with different requirements to access data flexibly.

How-to: Index Data from S3 Using CDP Data Hub

This blog post presents a simple “hello world” example of how to get data stored in S3 indexed and served by an Apache Solr service hosted in a Data Discovery and Exploration (DDE) cluster in CDP. For the curious: DDE is a pre-templated, Solr-optimized cluster deployment option in CDP, recently released in tech preview. We will only cover AWS and S3 environments in this blog.

The Evolution of Insight Advisor - Qlik Sense September 2020

With the third generation of BI upon us, analytics solutions are leveraging AI to generate insights, automate tasks, and support new types of interactions. Qlik is consistently recognized as a leader in augmented analytics, and with the September 2020 release, we’ve set the bar even higher.

The Future Of The Telco Industry And Impact Of 5G & IoT - Part 1

Communication Service Providers (CSPs) are in the middle of a data-driven transformation. The current scale and pace of change in the Telecommunications sector is being driven by the rapid evolution of new technologies like the Internet of Things (IoT), 5G, advanced data analytics, and edge computing. This is opening up new revenue opportunities, use cases, and even the possibility for different types of business models within the sector, changing the way that CSPs operate.

EP.2 The Future of Data and Visualization with Francois Ajenstat, Chief Product Officer at Tableau

On this episode, Steve Hamm sits down with Francois Ajenstat, Chief Product Officer at Tableau and a leader in the data analytics industry. They discuss the ins and outs of data visualization, how Francois approaches data integration at Tableau, and the future of artificial intelligence.

The Future Of The Telco Industry And Impact Of 5G & IoT - Part 3

In the final installment in the series, Vijay Raja, Director of Industry & Solutions Marketing at Cloudera shares his views on how the telecom sector is changing and where it goes next. Hi Vijay, thank you so much for joining us again. To continue where we left off, how are ML and IoT influencing the Telecom sector, and how is Cloudera supporting this industry evolution?

The Ultimate Guide to Cluster Analysis

Cluster analysis is a process used in artificial intelligence and data mining to discover the hidden structure in your data. There is no single cluster analysis algorithm. Instead, data practitioners choose the algorithm which best fits their needs for structure discovery. Here, we present a comprehensive overview of cluster analysis, which can be used as a guide for both beginners and advanced data scientists.
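Since the right algorithm depends on the structure you want to discover, a common starting point is k-means. Below is a minimal, dependency-free sketch of Lloyd's k-means algorithm (requires Python 3.8+ for `math.dist`). Seeding the centroids with the first k points is a simplification chosen for reproducibility; libraries usually use k-means++ seeding or random restarts. The sample points are invented for illustration.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Seeds centroids with the first k points (a simplification for
    reproducibility). Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Two obvious groups of invented 2-D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

k-means assumes roughly spherical clusters of similar size; density-based or hierarchical methods suit other shapes, which is why there is no single cluster analysis algorithm.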

Automating data pipelines with BigQuery and Fivetran

Companies from every industry vertical, including finance, retail, logistics, and others, all share a common horizontal analytics challenge: How do they best understand the market for their products? Solving this problem requires companies to conduct a detailed marketing, sales, and finance analysis to understand their place within the larger market. These analyses are designed to unlock insights in a company's data that can help businesses run more efficiently.

BigQuery explained: An overview

Google BigQuery was released to general availability in 2011 and has since been positioned as a unique analytics data warehousing service. Its serverless architecture allows it to operate at scale and speed to provide incredibly fast SQL analytics over large datasets. Since its inception, numerous features and improvements have been added to improve performance, security, and reliability, and to make it easier for users to discover insights.

CDP Private Cloud is a Game-changer for Partners

Recently, Cloudera announced the release of Cloudera CDP Private Cloud, delivering the final component of our hybrid cloud strategy. There’s nothing comparable to it in the industry. CDP Private Cloud offers the benefits of a public cloud architecture (autoscaling, isolation, agile provisioning, etc.) in an on-premises environment.

Yellowfin Signals: What is a step change?

Comprehensive, comparative analysis of your data can be a highly time-consuming manual task for your users. But new-wave automation and machine learning (ML) tools make monitoring, alerting, and gleaning new insights a lot faster. One such example is Yellowfin Signals and the latest addition to its extensive algorithm library, called step change.
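Yellowfin's step-change algorithm is not described here, so the following is only a hypothetical sketch of the general idea: scan candidate split points in a series, compare the mean before and after each split, and report the split whose shift is largest relative to the within-segment noise. The `min_segment` and `threshold` parameters and the scoring rule are assumptions, not Yellowfin's.

```python
from statistics import mean, pstdev

def find_step_change(values, min_segment=3, threshold=3.0):
    """Return (split_index, shift) for the strongest step change, or None.

    For each candidate split i, compare mean(values[:i]) with mean(values[i:]).
    The shift is scored against the average within-segment standard deviation
    (a hypothetical scoring rule) and must reach `threshold` to count.
    """
    best = None
    for i in range(min_segment, len(values) - min_segment + 1):
        left, right = values[:i], values[i:]
        shift = mean(right) - mean(left)
        noise = (pstdev(left) + pstdev(right)) / 2 or 1e-9  # avoid dividing by zero
        score = abs(shift) / noise
        if score >= threshold and (best is None or score > best[2]):
            best = (i, shift, score)
    return None if best is None else best[:2]

# Invented daily values that jump from about 10 to about 20 at index 5.
daily = [10, 11, 10, 9, 10, 20, 21, 19, 20, 20]
print(find_step_change(daily))
print(find_step_change([5] * 10))  # None
```

Scoring against noise rather than raw shift keeps a volatile metric from triggering on ordinary fluctuation, while a quiet metric that moves by the same amount is flagged.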

3 Snowflake Features That Make Data Science Easier

Data science is proving to be a major competitive advantage for companies. While business intelligence (BI) helps companies with reporting and historical analysis, data science goes a step further and predicts the future. It can leverage much more data from many more sources, and using machine learning (ML) principles, it automatically identifies patterns and trends to model, predict, or forecast future outcomes.

Why SEO Experts Should Use these 10 SEO Tools in 2020

From tech enthusiasts to modern entrepreneurs, everyone is using Search Engine Optimization (SEO) to generate online traffic. Such is the importance of SEO that the entire industry is now worth $65 billion. People often find it confusing to implement search engine optimization strategies, but a few tools are making the process easier for digital marketers and SEO experts. Are you looking for the best SEO tools for your brand?

The Machine Learning Collaboration Tool You'll Want to Ride Solo - User Story

I’ll admit it. I am a gushing fan of this new product from Allegro AI called Allegro Trains. I’m not sure what to call it — what noun I should attach to this creature. “Framework” and “Platform” have become, to my ears, rather meaningless jargon designed to detach suit-wearing types from their money. “Harness” is close.

Discover and Explore Data Faster with the CDP DDE Template

If you have had previous experience setting up, sizing, and deploying a distributed search engine service, it is hard to believe that this is possible. Imagine how many times IT has lost valuable time spending hours trying to understand Apache Solr application requirements and map them to how best to size and deploy the Solr service. That time is lost to the line of business as well.

The Advantages Of Live Data-Streaming In The Competitive Financial Services Sector (Part III)

Live data-streaming offers businesses exciting new opportunities to transform the way they operate, leveraging real-time insights to drive better decision making and enhance operational efficiency. To find out more about how streaming data might impact the financial services sector I sat down for a chat with Dinesh Chandrasekhar, Head of Product Marketing in Cloudera’s Data-in-Motion Business Unit.

The unexpected benefits of COVID-19 in Japan

The way that people do business in Japan has radically changed as a result of COVID-19. Historically, the Japanese have been very much about face-to-face relationship selling, where you establish relationships and build trust, but now everything has to be done remotely or online. Like everyone else, the Japanese are really keen to keep doing business, so they've actually embraced the remote way of doing things, which has been quite interesting to watch and has had some unexpected benefits.

6 reasons why data integration matters in retail

To transform your retail organization and be more customer-centric, you need to improve in areas where it counts: areas that impact your customers’ experience. The lifeblood of any retail organization, your customers expect a lot from the industry, especially with all the recent changes in shopping habits and the economic landscape in general. Customers are more discerning these days, and it will be up to retail organizations to cater to their needs and expectations.

Episode 9: Introduction to Indexes

In this episode, we learn about a very important technique for making database queries faster: indexes! Indexes are powerful and are critical to understanding how databases work under the hood. In this lesson, we talk about how database indexes use the binary search algorithm to speed up queries. We also cover the trade-offs associated with database indexes, and why it might not be a good idea to use too many of them.
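A toy illustration of how an index turns a lookup into a binary search: the sketch keeps `(key, row_id)` pairs sorted and uses Python's `bisect` module to find matching rows in O(log n) comparisons instead of scanning every row. The `SortedIndex` class is hypothetical; real database engines typically store indexes in B-trees rather than a flat sorted list, but the search principle is the same.

```python
import bisect

class SortedIndex:
    """Hypothetical single-column index: (key, row_id) pairs kept sorted."""

    def __init__(self, rows, column):
        self.entries = sorted((row[column], rid) for rid, row in enumerate(rows))
        self.keys = [k for k, _ in self.entries]  # parallel list for bisect

    def lookup(self, key):
        """Return row ids whose column equals key, found by binary search."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return [rid for _, rid in self.entries[lo:hi]]

    def insert(self, key, rid):
        """Every write must also update the index to keep it sorted."""
        pos = bisect.bisect_left(self.entries, (key, rid))
        self.entries.insert(pos, (key, rid))
        self.keys.insert(pos, key)

rows = [{"name": "carol"}, {"name": "alice"}, {"name": "bob"}, {"name": "alice"}]
idx = SortedIndex(rows, "name")
alice_ids = idx.lookup("alice")
print(alice_ids)  # [1, 3]
idx.insert("bob", 4)  # a new row with id 4
bob_ids = idx.lookup("bob")
print(bob_ids)  # [2, 4]
```

The `insert` method shows the cost side of the trade-off: reads get faster, but every write now does extra work per index, which is why adding too many indexes can slow a database down.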

EP.1 The Past, Present, and Future of the Data Cloud w/ Frank Slootman, Chairman & CEO of Snowflake

As Chairman and CEO of Snowflake, Frank Slootman has had a first-hand look at the rise of the data cloud. In fact, he helped coin the very term "data cloud." He is a pioneer in helping organizations migrate their data to the cloud, manage it there, and make it accessible for machine learning and other sophisticated analytics techniques. On this episode, Frank speaks about some of the powerful new capabilities enabled by the cloud data platform, including the ability to share data with an ease and convenience that was not possible before. #RISEOFTHEDATACLOUD