January 2021

Bringing Users In: How Voice of the Customer Can Change Your Product's Development

There is a scene in a certain TV show in which a middle-aged man tells a younger one that he missed picking someone up from the train station because his device did not remind him about it. The younger man points out that, since he bought a new device, he must transfer the appointments.

Industry X.0 - Made Real, Practical Insights Today enabling Profits Tomorrow

The growth of manufacturing’s digital transformation is impressive, and it is delivering value at explosive growth rates. Consider that manufacturing’s Industrial Internet of Things (IIoT) market was valued at $161b with a 25% growth rate, or that the Connected Car market will be valued at $225b by 2027 with a 17% growth rate. But then conflicting information arrives: VentureBeat reports that around 90 percent of machine learning models never make it into production.

Masking Semi-Structured Data with Snowflake

Snowflake recently launched dynamic data masking, an incredibly useful feature for companies and data-centric organizations that have strict security and data governance requirements. This article demonstrates how we implemented data masking at Snowflake by introducing a data masking policy on a VARIANT data type field that holds data in JSON format. We implemented the policy on top of tables and views.
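
To make the idea of masking fields inside a JSON document concrete, here is a minimal sketch in plain Python. It is not Snowflake's masking policy syntax and does not reproduce the article's implementation; the key names, role names, and the helpers `mask_json` and `read_record` are all hypothetical, chosen only to illustrate role-dependent masking of nested JSON.

```python
import json

# Hypothetical set of keys treated as sensitive in this sketch.
SENSITIVE_KEYS = {"email", "ssn"}


def mask_json(value, sensitive_keys=SENSITIVE_KEYS, placeholder="***MASKED***"):
    """Return a copy of a parsed JSON value with sensitive keys masked at any depth."""
    if isinstance(value, dict):
        return {
            key: placeholder if key in sensitive_keys
            else mask_json(item, sensitive_keys, placeholder)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_json(item, sensitive_keys, placeholder) for item in value]
    return value  # scalars pass through unchanged


def read_record(raw_json, role, privileged_roles=("ADMIN",)):
    """Return the full document for privileged roles and a masked copy otherwise."""
    document = json.loads(raw_json)
    if role in privileged_roles:
        return document
    return mask_json(document)


raw = '{"name": "Ada", "contact": {"email": "ada@example.com"}, "ssn": "123-45-6789"}'
print(read_record(raw, role="ANALYST"))  # sensitive values replaced by the placeholder
print(read_record(raw, role="ADMIN"))    # original values returned
```

The sketch makes the masking decision at read time based on the caller's role, which is the general shape of a dynamic masking policy; the stored document itself is never modified.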

Sunny Bedi And His Most Important IT Initiatives | Rise of The Data Cloud | Part 2 | Snowflake

Sunny Bedi, CIO and CDO of Snowflake, talks about how Snowflake is a data-driven company, data security in the cloud, how to use AI to minimize data threats, and much more. Rise of the Data Cloud is brought to you by Snowflake.

Cloudera Flow Management Continuous Delivery Architecture

Having introduced the flow delivery challenges and corresponding resolutions in the first article ‘Cloudera Flow Management Continuous Delivery while Minimizing Downtime’, we will combine all the preceding solutions into an example of flow management continuous delivery architecture. DataFlow Continuous Delivery Architecture: In the whole process, we can see the following steps.

Data and Customer Privacy: What Companies Need to Do

Today’s Data Privacy Day offers consumers an opportunity to learn about how companies use, collect, and share their personal information. At the same time, it gives companies a chance to focus on and highlight how they are protecting customer data. Although most businesses view data privacy practices as a way to mitigate their risk, good practices around data privacy can actually differentiate your organization from your competitors.

Minding the gaps in your cloud migration strategy

As your organization begins planning and budgeting for 2021 initiatives, it’s time to take a critical look at your cloud migration strategy. If you’re planning to move your on-premises big data workloads to the cloud this year, you’re undoubtedly faced with a number of questions and challenges.

Sunny Bedi Explains His Many Roles At Snowflake | Rise of The Data Cloud | Part 1 | Snowflake

Sunny Bedi, CIO and CDO of Snowflake, talks about how Snowflake is a data-driven company, data security in the cloud, how to use AI to minimize data threats, and much more. Rise of the Data Cloud is brought to you by Snowflake.

Retailers find flexible demand forecasting models in BigQuery ML

Retail businesses understand the value of demand forecasting—using their intuition, product and market experience, and seasonal patterns and cycles to plan for future demand. Beyond the need for forecasts that are as accurate as possible, modern retailers also face the challenge of being able to perform demand planning at scale.

Cloudera Completes SOC 2 Type II Certification for CDP Public Cloud

We believe security is the cornerstone of any legitimate data platform, and we’re excited to announce that Cloudera has successfully achieved SOC 2 Type II certification for Cloudera Data Platform (CDP) Public Cloud. Achieving our SOC 2 certification is the culmination of significant work across our organization and demonstrates to independent auditors that we adhere to industry-standard security controls and processes.

Standardizing Business Metrics & Democratizing Experimentation at Intuit

CDO Battlescars is a podcast series hosted by Sandeep Uttamchandani, Unravel Data’s CDO. He talks to data leaders across data engineering, analytics, and data science about the challenges they encountered in their journey of transforming raw data into insights. The motivation of this podcast series is to give back to the data community the hard-learned lessons that Sandeep and his peer data leaders have learned over the years.

How our customers modernize business intelligence with BigQuery and Looker

Businesses increasingly gather data to better understand their customers, products, marketing, and more. But unlocking valuable and meaningful insights from that data requires powerful, reliable, and scalable solutions. We hear from our BigQuery and Looker customers that they’ve been able to modernize business intelligence (BI) and allow self-service discovery on the data the business collects.

The Biggest Threat to the Security of Healthcare Data

When cyberattacks take out business systems, organizations suffer from direct and indirect financial losses. When healthcare systems go down, it’s a matter of life and death. Healthcare organizations were already a frequent target of cybercriminals, and the pandemic has made this situation worse. Infosecurity Magazine reports that healthcare data breaches will increase by 3x in 2021, at a time when so many healthcare providers are burnt out and exhausted from battling the pandemic.

Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime

Has your organization considered upgrading from Hortonworks Data Flow (HDF) to Cloudera Flow Management (CFM), but thought the migration process would be too disruptive to your mission-critical dataflows? In truth, many NiFi dataflows can be migrated from HDF to CFM quickly and easily with no data loss and without any service interruption. Here we explore three common use cases where a CFM cluster can assume an HDF cluster’s dataflows with minimal to no downtime.

Use automated data collection to stay ahead of the competition

The world produces more data than it can consume. Every minute, we watch more than 5 million videos and send over 200 million messages and emails. You read that right. Every. Single. Minute. Companies that want to tap into data-driven decision-making to dominate their competition need to collect the vast amounts of data produced and extract valuable insights using data analysis. But data collection can be extremely challenging.

Engineering Analytics for Automotive Software Development

Embedded World continues to be the largest embedded conference and exhibition globally. We are delighted to be invited again to present at this forum. This year we will be jointly presenting with Synopsys, a leading driver in the automotive EDA and software space. We look forward to offering insights from Dr Dennis Oka, Principal Automotive Security Strategist at Synopsys, together with Dr Ralf Huuck, CEO of Logilica.

How To Report On Software Testing

Being able to write concise, easily comprehensible software testing reports is an important skill for software development team members to possess, particularly those in quality assurance, development, and support. Poorly written software testing reports can make the development process more difficult and less productive. Imagine a client asks if their app is ready for launch and based on your assessment, everything is working correctly.

Are you realizing the potential of your embedded analytics?

Yellowfin has done a lot of work with software vendors over the years, helping them to embed analytics into their applications, and we've learned a lot through that process. One of the things we have learned is that there are some recurring fundamental issues with how product managers bring analytics into their product, an approach which can prevent them from realizing the full potential of their embedded analytics.

Kafka to Splunk: Data mesh for security & IT

Splunk is a technology that made processing huge volumes and complex datasets accessible to security and IT teams. Despite its strengths for monitoring and investigation, Splunk is a bit of a one-way street. Once data is in Splunk, it's not that easy to stream it elsewhere in great volume. Nor is Splunk necessarily the best technology for all IT and security use cases. Or the cheapest.

External Tables Are Now Generally Available on Snowflake

Today, Snowflake is announcing the general availability (GA) of the External Tables feature. Snowflake launched the External Tables feature for public preview at the Snowflake Summit in June 2019. It is one of the key features of the data lake workload in the Snowflake Data Cloud.

How Data Is Helping Us Answer Life's Fundamental Questions

I’m a geneticist, which is really just a technical way of saying that I obsess about the minutiae of your family history. Now, while that sounds rather stalkerish, this is in fact a molecular and data-driven addiction, which is fuelled by my need to understand the nature of evolution and human behaviour via the data within our DNA. Our genomes are the single-most densely packed dataset that we have ever encountered.

Katalon TestOps - Test Orchestration and Quality Analytics Platform

The “Quality at Speed” movement – or delivering high-quality products in a short period – has expanded beyond the software industry: it appears in the standard playbook of companies in health care, finance, etc. This new movement pushes QA teams to continuously reinvent their software development cycle with advancing technological practices.

Get Your Analytics Insights Instantly - Without Abandoning Central IT

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by insanely long wait times for the resources needed to meet your business goals? If so – you are likely one of the growing group of Line of Business (LoB) professionals forced into creating your own solution – creating your own Shadow IT.

Analytical Applications: What are they?

These bundled analytics tools help organizations facilitate and increase the adoption of self-service BI practices among regular business users in a specific operational domain, such as finance, marketing and sales. They do so by improving the availability and measurement of important, relevant historical data for your end users’ decision-making.

Why Data Engineers Should Consider Microsoft Azure

Modern applications don’t function in isolation. To get the most out of the enterprise apps you build or buy, you’ll have to connect them to other applications. In other words, data engineers have to engage in effective application integration to achieve their business goals. Sometimes, this means connecting one application directly to another. But this is a rare occurrence in digitally transformed industries.

5 Ways to Process Small Data with Hadoop

From system logs to web scraping, there are many good reasons why you might have extremely large numbers of small data files at hand. But how can you efficiently process and analyze these files to uncover the hidden insights that they contain? You might think that you could process these small data files using a solution like Apache Hadoop, which has been specifically designed for handling large datasets.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

In this last installment, we’ll discuss a demo application that uses PySpark.ML to make a classification model based on training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. This model is then scored and served through a simple Web Application. For more context, this demo is based on concepts discussed in the blog post “How to deploy ML models to production”.

Digital Transformation is a Data Journey From Edge to Insight

Digital transformation is a hot topic for all markets and industries as it’s delivering value with explosive growth rates. Consider that manufacturing’s Industrial Internet of Things (IIoT) market was valued at $161b with an impressive 25% growth rate, that the Connected Car market will be valued at $225b by 2027 with a 17% growth rate, or that in the first three months of 2020, retailers realized ten years of digital sales penetration.

How to configure clients to connect to Apache Kafka Clusters securely - Part 3: PAM authentication

In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.

Goodbye 2020 - Hello 2021 Magic Quadrant for Analytics and BI Platforms

The wait is nearly over, and soon we’ll all be privy to this year’s Gartner Magic Quadrant for Analytics and BI Platforms. Qlik is proud of its 15-year history and ranking as a leader for the last decade in this signature research, and we are enthusiastic about sharing a complimentary copy of the full report when it publishes at this location: https://www.qlik.com/us/gartner-magic-quadrant-2021

Work at warp-speed in the BigQuery UI

Data analysts can spend hours writing SQL each day to get the right insights. So it’s crucial that the tools in the Google Cloud Console make that job as easy and as fast as possible. Now, we’re excited to show you how BigQuery’s Cloud Console UI has been updated with radical usability improvements for more efficient work, making it easier to find the data you need and write the right SQL quickly.

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world to facilitate an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

How Infutor Uses the Placekey External Function to Extend the Power of Snowflake

The Snowflake Data Cloud provides the unique ability for anyone to join their own data sets with thousands of live third-party data sets near-instantly, securely, and without moving data. Businesses operating in the Data Cloud gain a huge advantage over their competitors who are stuck in data silos and struggling with stale data sets downloaded from their legacy data providers weeks, months, or years ago.

How Customer Success Managers Drive Digital Transformation

Before the end of the year, I met with Christina McCoy, a Customer Success Manager for our federal region. We discussed some of her observations and proven practices for driving adoption in her accounts. Christina joined Qlik two years ago; she has over five years of CSM experience and is a graduate of Howard University, where she was a pitcher for the Bison’s softball program.

Prioritizing Your People with Randy Wigginton of Square | Snowflake Inc.

Randy Wigginton, Director of Platform Infrastructure Engineering at Square, talks about what it takes to produce world-changing innovations, how to use data to fully understand your customers, insights into how to compete with tech giants, and much more. Rise of the Data Cloud is brought to you by Snowflake.

Handling Large Datasets in Data Preparation & ML Training Using MLOps

Data science has become an important capability for enterprises looking to solve complex, real-world problems, and generate operational models that deliver business value across all domains. More and more businesses are investing in ML capabilities, putting together data science teams to develop innovative, predictive models that provide the enterprise with a competitive edge — be it providing better customer service or optimizing logistics and maintenance of systems or machinery.

How to create 3D Body Maps in Yellowfin BI

One feature our customers often request is 3D body mapping, along with an easy way to export 3D models and apply visualization and filtering to them. This technical walkthrough shows you how to integrate 3D models within Yellowfin and then use them to create a fully interactive display in your dashboard.

7 Best data management tools in 2021

Data is produced and consumed at volumes and speeds which were unimaginable just a decade ago. Top players have taken advantage of this growth. Tapping into data, aptly called the new oil, for actionable insights lets data-driven companies dominate their competition. But the proliferation of data can lead to growing pains. Companies find themselves increasingly incapacitated by the vast and messy nature of their in-house data.

Good Testing Data is All You Need - Guest Post

Building machine learning (ML) and deep learning (DL) models obviously requires plenty of data, both as a training set and as a test set against which the model is evaluated. Best practices related to the setup of training sets and test sets have evolved in academic circles. However, within the context of applied data science, organizations need to take into consideration a very different set of requirements and goals. Ultimately, any model that a company builds aims to address a business problem.

Finding digital transformation in high places - how a ski resort improved operational agility and customer experiences

Most blogs in my history are very focused on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate greater than $11 trillion in economic value as connected manufacturing processes, operations and their supply chains become more streamlined, efficient, agile and realize improved productivity, improved uptime and product quality.

The role of customer experience in digital transformation

There is an undeniable truth that nobody can unsee: 2020 accelerated the digitalization of the world like no other time in the past. Individuals shifted en masse to interact, shop, play, learn, and even go to the doctor online. On the same note, organizations migrated internal and customer-facing operations to a digital realm, regardless of their size, location, or goals.

Kafka Total Cost of Ownership: What are you missing?

“We’ve seen two years’ worth of digital transformation in two months” said Microsoft’s Satya Nadella. Due to COVID-19, digital transformation roadmaps have been deleted, redrafted, doubled down and accelerated by up to a decade. Traditional companies are moving by osmosis towards streaming technologies such as Apache Kafka to kick off new digital services. But how much should it cost to experience 2030 in 2021?

Common Regulations that Data-Driven Entities Need to Know

For public and private entities, data collection is a way of life. That fact has led to the proliferation of common regulations to protect consumers and individuals from unacceptable use or storage of their private data. But it's not just data collection laws companies have to adhere to. There are many US-based and international statutes that put constraints on how they do business. What follows summarizes the most common regulations and how they can affect the work you do, day to day.

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. With the ability to quickly provision on-demand and the lower fixed and administrative costs, the costs of operating a cloud data warehouse are driven mostly by the price-performance of the specific data warehouse platform.

Uncover Gold During an Economic Crisis: Five Steps to Monetizing Your Data

Because of the COVID-19 global pandemic, almost every industry is experiencing volatility, risks and changes to buying behavior. Nevertheless, with crisis often comes opportunity and a forcing factor for businesses to redefine themselves. Those looking to innovate after (or even during) this crisis should focus on two key concepts: data monetization and data modernization.

Loading complex CSV files into BigQuery using Google Sheets

BigQuery offers the ability to quickly import a CSV file, both from the web user interface and from the command line. However, loading a CSV file whose cells contain line breaks produces errors. This is because a row is spread across multiple lines, and so the starting quote on one line is never closed. This is not an easy problem to solve; lots of tools struggle with CSV files that have new lines inside cells. Google Sheets, on the other hand, has a much better CSV import mechanism.
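
The multi-line row problem described above can be illustrated with a small sketch using Python's standard `csv` module. This is not the article's BigQuery or Google Sheets workflow; the sample data is made up, and the sketch only shows why a quote-aware parser succeeds where line-by-line splitting fails.

```python
import csv
import io

# Hypothetical sample: the quoted cell in the first record contains a line break.
raw = 'id,comment\n1,"first line\nsecond line"\n2,"single line"\n'

# Naive splitting treats every physical line as a row, so the quoted
# cell is cut in half and its opening quote is never closed on that line.
naive_rows = raw.strip().split("\n")

# csv.reader tracks quote state, so the embedded newline stays inside the cell.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))    # number of physical lines
print(len(parsed_rows))   # number of logical rows, header included
print(parsed_rows[1])     # the record whose cell spans two lines
```

A quote-aware parser is one way to pre-process such a file into a form that stricter loaders accept, for example by rewriting the embedded line breaks before loading.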

Talend vs. MuleSoft vs. Xplenty: Which One Does ETL Better?

The key differences between Talend, MuleSoft, and Xplenty: Enterprise data volumes are increasing by 63 percent per month, according to a recent study. Twenty percent of organizations draw from 1,000 or more data sources. How do these companies extract and move all this data to a centralized destination for business analytics? As we know, Extract, Transform, and Load (ETL) streamlines this entire process. But smaller organizations lack the coding skills required for successful implementation.

Brick and Mortar Stores are Now Built Brick by Brick with Digital Insights

In my last three blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance; Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising; and Maximizing Supply Chain Agility through the “Last Mile” Commitment) I painted a picture that showed an ever-changing landscape in retail, considering that consumers are more in control than ever, mobile (at least somewhat digitally mobile considering the pandemic) and socially connected.

The telco industry isn't slowing down: what to expect and prioritize in 2021

2020 was all about problem-solving. And indeed, the industry was able to come up with creative and innovative solutions to address emerging threats and challenges, unexpected and not. So, what will this new year bring for telecommunications? Here are five trends predicted to shape the telco industry in 2021.

What is Low-Code? Low-Code vs. No-Code, Low-Code Development Tools, and More

A developer's primary job is to work seamlessly, rapidly, and accurately to create software, apps, or websites that match business requirements. Unfortunately, there is a huge margin for error if you have to write lines and lines of complex code. Additionally, many basic tasks in the use of data-related software and other solutions require extensive coding knowledge that many employees simply don't have. One solution to this is low-code software and development.

The role of the API in managing Big Data

Every time someone uses an app, information travels from a database to the user via an API. Single instances may not seem very important. As long as they perform the required task, people don’t think too much about how applications work. From a business perspective, though, the big data flowing through APIs could unlock important knowledge that helps tap into emerging trends and target customers better. To get the best results, though, companies need the best big data API management.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 2: Querying/ Loading Data

In this installment, we’ll discuss how to do Get/Scan Operations and utilize PySpark SQL. Afterward, we’ll talk about Bulk Operations and then some troubleshooting errors you may come across while trying this yourself. Read the first blog here. Get/Scan Operations: In this example, let’s load the table ‘tblEmployee’ that we made in the “Put Operations” section of Part 1. I used the same exact catalog in order to load the table. Executing table.show() will give you:

Apache NiFi - the data movement enabler in a hybrid cloud environment

Cloudera provides its customers with a set of consistent solutions running on-premises and in the cloud to ensure customers are successful in their data journey for all of their use cases, regardless of where they are deployed. Cloudera DataFlow provides Apache NiFi in both the Cloudera Data Platform Private Cloud Base (on-premises) and Public Cloud (AWS, Azure, and Google Cloud) products in this hybrid cloud strategy.

Managing Multiple Accounts with Snowflake Organizations Is Now in Public Preview

We are excited to announce that the new Snowflake Organizations feature is now available in public preview. Organizations enable customers to easily manage their data, storage, and compute across multiple Snowflake accounts and even across regions and clouds. Through a new ORGADMIN role, customers can now: We’re excited to hear how you use these new more powerful self-service capabilities to manage your Snowflake Data Cloud.

Credit Suisse AG Names Unravel Data A Disruptive Tech Winner

Unravel Data is a leader in the emerging field of DataOps, going beyond application performance monitoring (APM) to provide AI-powered recommendations for big data and streaming data applications. Now Unravel is being recognized by banking technology innovator Credit Suisse AG in their prestigious Disruptive Technology Recognition (DTR) program.

Peloton & Qlik: The Analytics of It All

Ok, I’ll admit it… I’m one of those people, I own a Peloton – and it’s awesome. But, as a data professional, I’ve struggled with getting decent metrics about how I’m doing and trying to see if I’m making progress with my fitness level. How can I discern performance stats to answer basic questions to gauge my performance over time?

Putting Data in Your Shopping Cart with Instacart | Snowflake Inc.

On this episode of Rise of the Data Cloud, Dustin Pearce, Vice President of Infrastructure at Instacart, talks about the advances online shopping has made during the pandemic, where startups should place their priorities, the future of infrastructure, & much more. Rise of the Data Cloud is brought to you by Snowflake.

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Requests to Central IT for data warehousing services can take weeks or months to deliver. Central IT teams at large organizations face a proliferation of IT projects arising from the complexities of markets and from the needs of internal lines of business (LoBs). At the same time, Central IT must juggle cost and risk.

Top 5 Questions about Apache NiFi

Over the last few weeks, I delivered four live NiFi demo sessions, showing how to use NiFi connectors and processors to connect to various systems, with 1000 attendees in different geographic regions. I want to thank you all for joining and attending these events! Interactive demo sessions and live Q&A are what we all need these days when working remotely from home is now a norm. If you have not seen my live demo session, you can catch up by watching it here.

Automated business monitoring: Why you need it now

One of the things we've done a lot of work on at Yellowfin is automated business monitoring (ABM), specifically with our product Yellowfin Signals. It can truly transform organizations and help them to deliver insights faster, insights they can act on. ABM has been in the market for about five years but we haven't seen it take off just yet. One reason is that automated business monitoring challenges the status quo of the data analyst.

Lessons Learned on Operationalizing Machine Learning at Scale with IHS Markit

According to Gartner, over 80% of data science projects never make it to production. This is the main problem that enterprises face today when bringing data science into their organization or scaling existing projects. In this session, Senior Data Scientist Nick Brown will share his lessons learned from operationalizing machine learning at IHS Markit. He will discuss the functional requirements for operationalizing machine learning at scale, and what you need to focus on to ensure you have a reliable solution for developing and deploying AI.

How to Comply with Sweden's PII Data Protection Act

Personally Identifiable Information (PII) has become a headache for most digital-first businesses in recent years. Everyone agrees we need rules to keep personal data safe, but there’s no universal PII Data Protection Act we can all follow. Instead, there is a worldwide patchwork of regulations, many of which have global implications. Sweden is one of the pioneers in data security laws.

Stitch vs. Jitterbit vs. Xplenty: What's the Difference?

The key differences between Stitch, Jitterbit, and Xplenty: The average business pulls data from 400 different locations, which makes it tricky to generate valuable data insights. Data-driven organizations use an Extract, Transform, and Load (ETL) platform to pull all this information into a data lake or warehouse for deeper analysis. However, many businesses lack the technical skills (like coding) to facilitate this process. The three tools in this review make ETL workflows easier.

5 Best Practices for Integrating Data Science Into Your Marketing Analytics

Personalization enables marketers to send hypertargeted content and offers that are more likely to drive purchases and cultivate brand loyalty. Research by Accenture from 2018 shows that 91% of consumers are more likely to shop with companies that provide relevant offers and recommendations. Though personalization helps marketers optimize ad spend and drive improvements in customer lifetime value, basket size, and retention, it’s still untenable at scale in many organizations.

What is natural language generation?

Natural language generation (NLG) is best described as a sub-type of artificial intelligence (AI) that generates linguistically rich descriptions of insights, both written and spoken, in plain English. It does this by automatically scanning and finding the most interesting and important concepts in structured data that resides in our databases or apps, and translating it into a consumable, text-based narrative that is easier for the average business user to access and understand.

Snowflake and Saturn Cloud Partner to Bring 100x Faster Data Science to Millions of Python Users

Snowflake and Saturn Cloud are thrilled to announce our partnership to provide the fastest data science and machine learning (ML) platform. Snowflake’s Data Cloud comprises a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Saturn Cloud’s platform provides lightning-fast data science. Combined, our solutions enable customers to maximize their ML and data science initiatives.

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 1: The Set-Up & Basics

Python is used extensively among Data Engineers and Data Scientists to solve all sorts of problems from ETL/ELT pipelines to building machine learning models. Apache HBase is an effective data storage system for many workflows but accessing this data specifically through Python can be a struggle. For data professionals that want to make use of data stored in HBase, the recent upstream project “hbase-connectors” can be used with PySpark for basic operations.

A Three-Step Plan to Innovate Hadoop for the Cloud

How large is your Hadoop data lake? 500 terabytes? A petabyte? Even more? And it is certainly growing, bit by bit, day after day. What began as inexpensive big data infrastructure now demands ever more expenditures on storage and servers while becoming increasingly unwieldy and expensive to manage. Such rapacity makes it ever harder to realize a proper return on investment from that Hadoop infrastructure.

2021 Trends - How You Ranked Them

On December 8th, it was time for Qlik's annual “State of the Union” on BI & Data Trends. Attendance ran into the many thousands, and we received thousands of questions. To get that type of engagement in a year where people have done nothing but virtual conferences is amazing. One person put it to me like this: “I just joined in on your webinar on the top data and analytics trends and it was truly fantastic.

How to build the dream analytics team

The problem with modern analytics is that it overpromises and underdelivers. We can even quantify the disappointment: This raises the question: why even bother with analytics? Well, when analytics is done right, it pays back $13.01 for every dollar spent. In fact, data-driven companies outperform their competitors in almost every conceivable way. We have broached the topic of extracting value from data before, from how to set up the right data strategy to building a data-driven culture.

Maximizing Supply Chain Agility through the "Last Mile" Commitment

In my last two blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance, and Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising) we looked at the benefits to retail of building personalized interactions by accessing both structured and unstructured data from website clicks, email and SMS opens, in-store point-of-sale systems, and past purchase behaviors.

10 Predictions about Data Cloud Analytics in 2021

2021 is the year of the Data Cloud. Powered by the Snowflake platform, the Data Cloud will be the place where organizations across industries can converge to mobilize their data. Snowflake estimates that there are still hundreds of millions of data sets isolated in cloud data storage and on-premises data centers globally. The Data Cloud eliminates these silos, allowing you to seamlessly unify, analyze, and share your data to reach deeper insights and even open new revenue streams.

The Train Has Left the Station for the Last Time

We have three big announcements for our community today, and I wanted to talk to you about them: one, Allegro Trains is changing its name; two, we’re adding a completely new way to use Trains; and three, we’re announcing a bunch of features that make Trains an even better product for you! Read all about it on our blog at Clear.ml, our new website for our open source suite of tools.

Cash Back on Your Data Stack with Rakuten Rewards | Snowflake Inc.

In this episode of Rise of the Data Cloud, Mark Stange-Tregear, Vice President of Analytics at Rakuten Rewards, talks about how to successfully communicate with both merchants and consumers, the nuances of analyzing consumer data, the future of cloud data analytics, and much more. Rise of the Data Cloud is brought to you by Snowflake.

The Importance of Data Storytelling in Shaping a Data Science Product

Artificial intelligence and machine learning are relentlessly revolutionizing marketplaces and ushering in radical, disruptive changes that threaten incumbent companies with obsolescence. To maintain a competitive edge and gain entry into new business segments, many companies are racing to build and deploy AI applications.

Top 5 Reasons to Implement Contextual Analytics

Once, dashboards and reports could only be embedded into our apps as standalone modules: essentially, as separately accessed product features. This limitation sometimes meant analytics was forgotten by users and, more often than not, underutilized. Today, contextual analytics makes it possible to embed analytics directly into the core workflow of your software.

Top 10 Thought Leaders in AI/ML We're Following

One of the best ways to stay current in the fast-evolving field of artificial intelligence and machine learning is by following thought leaders, evangelists, and influencers in the industry. In this article, we’ve selected 10 of the most influential thought leaders (listed alphabetically) who are helping drive the field forward.

10 Must-Read Data Analytics Websites

The field of data analytics is rapidly evolving alongside advances in technologies such as AI and machine learning. There are many valuable resources online that can help you stay up-to-date with the industry, from news sites and industry analysis to the latest scientific research. We’re listing the top 10 websites and blogs (listed alphabetically) for anyone interested in keeping up with recent industry developments.

Anodot Tutorial: Introducing Business Impact Alerts

Now there’s an easy way to measure the business impact of every incident. Anodot lets you set a monetary value for each measure you monitor. Once you set the Impact Value, future alerts will show you how much the anomaly has cost you thus far. Anodot is the only monitoring solution built from the ground up to find and fix key business incidents as they’re happening. Unlike most monitoring solutions, which focus on machine and system data to track performance, Anodot also monitors the more volatile and less predictable business metrics that directly impact your company’s bottom line.
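To make the "cost thus far" idea concrete, here is a minimal sketch of how a running impact figure could be derived from a per-unit monetary value. This is not Anodot's actual calculation: the function `estimate_impact`, its parameters, and the sample numbers are hypothetical, and the sketch assumes the impact is simply the shortfall against an expected baseline multiplied by the value you assigned to one unit of the measure.

```python
def estimate_impact(observed, expected, value_per_unit):
    """Estimate the money lost so far during an incident.

    observed: measured values for each interval since the anomaly began.
    expected: baseline values for the same intervals (assumed to be given).
    value_per_unit: monetary value assigned to one unit of the measure.
    """
    # Count only intervals where the metric fell below its baseline;
    # intervals at or above baseline contribute nothing.
    shortfall = sum(max(e - o, 0) for o, e in zip(observed, expected))
    return shortfall * value_per_unit


# Made-up example: purchases per interval drop below an expected 100,
# and each purchase is assumed to be worth 2.50.
cost_so_far = estimate_impact(
    observed=[80, 70, 90],
    expected=[100, 100, 100],
    value_per_unit=2.5,
)
print(f"Estimated impact so far: ${cost_so_far:,.2f}")
```

Re-running the function as each new interval arrives gives a figure that grows while the incident is open, which is the behavior the alert description implies.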