
June 2020

Ritual Improves Retention With a Modern Data Stack

A brittle ETL pipeline, a mix of programming languages, and degrading warehouse performance inhibited customer retention analysis. With a modern data stack, Ritual has achieved a 95% reduction in data pipeline issues, a 75% reduction in query times, and a threefold increase in data team velocity. By empowering the business with data, Ritual has seen a sustained improvement in retention.

Are you prepared to mature to 'ready-made' data management?

When it comes to furnishing our living spaces, it seems we go through phases. When I was just setting out and leaving home, IKEA was my preferred furniture store. You make your choice, collect all the flat-pack boxes, lug them home, and after some hex key gymnastics: voilà. You’ve truly made it! Since then, I’ve drifted from the “some assembly required” phase to the “ready-made” one.

Breaking the Silos Between Data Scientists, Eng & DevOps - MLOPs Live #6 - With Ecolab

Building scalable #AI applications that generate value in real business environments requires not just advanced technologies, but also better processes for #datascience, #engineering and #devops teams to collaborate effectively. We will be diving deep into this topic on our next #MLOpsLive webinar with Greg Hayes, Data Science Director at Ecolab, and Yaron Haviv, our Co-Founder and CTO.

Options for Embedding and Integrating with Qlik Sense

This ‘explainer’ video provides a detailed overview of the range of embedding and integration options for Qlik Sense. Beginning with the simplest embedding option and moving to the more sophisticated integration approaches, watch and learn through a complete step-by-step process illustrated with different use cases.

How Trigo Built a Scalable AI Development & Deployment Pipeline for Frictionless Retail

Trigo is a provider of AI- and computer vision-based checkout-free systems for the retail market, enabling frictionless checkout and a range of other in-store operational and marketing solutions, such as predictive inventory management, security and fraud prevention, pricing optimization, and event-driven marketing.

Data-driven marketing 101: benefits, examples and implementation

Back in the old days, marketing was riddled with guesswork. Sometimes, unexpected campaigns brought new leads and converted prospects into customers. Other times, the best-designed campaigns flopped, the market remained unmoved, and all you could hear after the launch of a campaign was silence. Data-driven marketing rose from the pains of this insecurity and took on the overwhelming growth of data for its support.

How to deliver exceptional CX with embedded data analytics

Your customers are surrounded by data all the time and draw on a range of experiences when looking for insights to fuel their decision-making. You can guarantee that each application they interact with has a varying degree of sophistication in how it uses and interprets data. The market is tough right now: you need to retain your customers and capture essential market share within your space, and you may have less time to innovate your product as you focus on your core offering.

A Guide to Autonomous Monetization Monitoring for the Gaming Industry

Similar to other companies in the entertainment industry, gaming companies typically drive revenue from three sources: in-app purchases, ads, and subscriptions. Examples of these sources include creating different in-app purchase options for each game and running various ad units from multiple ad networks. While this diversity in revenue streams may be advantageous from a business perspective, from a technical standpoint it creates numerous challenges.

CDP Private Cloud ends the battle between agility & control in the data center

As a BI Analyst, have you ever encountered a dashboard that wouldn’t refresh because other teams were using it? As a data scientist, have you ever had to wait 6 months before you could access the latest version of Spark? As an application architect, have you ever been asked to wait 12 weeks before you could get hardware to onboard a new application?

Why an integrated analytics platform is the right choice

Companies realize that in order to grow, connect products and services, or protect their business, they need to become data-driven. In selecting the tools to realize these goals, organizations effectively have two choices: a self-selected combination of analytics tools and applications, or a unified platform that handles it all. In this blog we will discuss the challenges of the former choice, which provide the justification for the latter.

Using Your Existing API to Become a Snowflake Data Marketplace Provider, Part 2

One thing nearly all such data providers have is a REST API. Snowflake’s recently announced external functions capability allows Snowflake accounts to call external APIs. By using external functions, data enrichment providers can fulfill requests for data from Snowflake Data Marketplace consumers.
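The external-function contract can be illustrated with a short sketch. Snowflake POSTs a JSON body whose "data" field holds rows, each prefixed with a row number, and the remote service must return results in the same shape and order. The `enrich` helper below is hypothetical, standing in for a call to the provider's existing REST API; this is a minimal sketch of the wire format, not a production service.

```python
import json

def enrich(value):
    # Hypothetical enrichment step: in a real service this would call
    # the provider's existing REST API for each input value.
    return value.upper()

def handle_external_function(event_body):
    # Snowflake sends a JSON body of the form
    # {"data": [[row_number, arg1, ...], ...]}
    rows = json.loads(event_body)["data"]
    # Each output row must echo the row number followed by the result,
    # in the same order the rows arrived.
    results = [[row[0], enrich(row[1])] for row in rows]
    return json.dumps({"data": results})
```

In practice this handler would sit behind an API gateway that Snowflake's external function is configured to call.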

The Path of Most Resilience Chapter 3: Data and Analytics

If a company hopes to remain competitive in today's hyper-connected world, it must speak to individuals, not broad segments. Businesses must consider all types of data generated by their customers (contextual and real-time, structured and unstructured) to deliver relevant and precise results on the most appropriate channel and device.

Aceable Switches From Alooma to Fivetran, Eliminates ETL Maintenance

After Alooma announced it was sunsetting its services for Redshift customers, Aceable moved to Fivetran for data integration. In one week, the business integrated all of its sources, including MongoDB — a project that was never completed with Alooma. With Fivetran, Aceable eliminates the need for back-end maintenance and adds Jira to its stack to track project progress across the entire org.

Multi-Raft - Boost up write performance for Apache Hadoop-Ozone

Apache Hadoop-Ozone is a new-era object storage solution for Big Data platforms. It is scalable with strong consistency. Ozone uses the Raft protocol, implemented by Apache Ratis (Incubating), to achieve high availability in its distributed system. My team at Tencent started introducing Ozone as a backend object store in production a few months ago, and we're onboarding more and more data warehouse users.

Speed Up Development With Powered by Fivetran

Powered by Fivetran (PBF) provides a simple framework for developers to go beyond internal analytics projects to build data pipelines into their applications within the Fivetran platform. With no engineering overhead, you can easily access hundreds of customer accounts across countless Fivetran-supported data sources, including advertising platforms, CRM systems, databases, web events and more.

The Rise Of Connected Manufacturing And How Data Is Driving Innovation, Part I

This interview was conducted by Cindy Maike, VP of Industry Solutions. The shift towards Industry 4.0 is improving manufacturing efficiency, and the factory of the future will increasingly be driven by technologies like the Internet of Things (IoT), automation, artificial intelligence (AI), and cloud computing.

Predictive Analytics: How to build machine learning models in 4 steps

Predictive analytics is a complex process that involves many steps and procedures—from collecting and preparing data to communicating findings through eye-catching dashboards. But there’s one stage that data scientists enjoy doing more than others: predictive modeling and algorithms. As an integral part of data science, modeling involves building a solution, mining the data for patterns, and refining algorithms.
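The modeling stage can be sketched in a few lines of plain Python: fit a simple linear trend to a handful of (hypothetical) historical observations, then use the mined pattern to predict a future value. In practice a library such as scikit-learn would do this, and the data points here are made up for illustration.

```python
# (feature, target) pairs: toy stand-in for prepared historical data.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Least-squares slope and intercept: the "pattern" mined from the data.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x

def predict(x):
    # Apply the fitted model to a new input.
    return intercept + slope * x
```

Refining the model then means comparing predictions against held-out data and iterating on features and algorithms.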

MLRun Functions DEMO: Python Jupyter (Open-Source Data Science Orchestration + Experiment Tracking)

MLRun is a generic and convenient mechanism for #data scientists and software developers to build, run, and monitor #machinelearning (ML) tasks and pipelines on a scalable cluster while automatically tracking executed code, metadata, inputs, and outputs. It runs on-premises or on bare metal, including edge AI/analytics deployments. Customers include NetApp, Quadient, Payoneer, and many more.

Git-based CI / CD for Machine Learning & MLOps

For decades, machine learning engineers have struggled to manage and automate ML pipelines in order to speed up model deployment in real business applications. Similar to how software developers leverage DevOps to increase efficiency and speed up release velocity, MLOps streamlines the ML development lifecycle by delivering automation, enabling collaboration across ML teams and improving the quality of ML models in production while addressing business requirements.

Auto-TLS in Cloudera Data Platform Data Center

Wire encryption protects data in motion, and Transport Layer Security (TLS) is the most widely used security protocol for wire encryption. TLS provides authentication, privacy and data integrity between applications communicating over a network by encrypting the packets transmitted between endpoints. Users interact with Hadoop clusters via browser or command line tools, while applications use REST APIs or Thrift.
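As a minimal client-side illustration of TLS (using Python's standard `ssl` module rather than any Hadoop-specific tooling), a secure-by-default context enforces certificate verification and hostname checking, and can be restricted to modern protocol versions:

```python
import ssl

# A client-side TLS context with secure defaults: certificate
# verification and hostname checking are enabled, protecting
# data in motion against eavesdropping and tampering.
context = ssl.create_default_context()

# Refuse legacy protocol versions; accept TLS 1.2 and newer only.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

A socket is then wrapped with `context.wrap_socket(sock, server_hostname="example.com")` before any application data is sent.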

Using Your Existing API to Become a Snowflake Data Marketplace Provider, Part 1

Many data providers who participate in Snowflake Data Marketplace are already using Snowflake Cloud Data Platform as their primary data store, and they can share secure slices of their data via Global Snowflake, Snowflake’s global data sharing feature, with any other Snowflake consumer regardless of which cloud or Snowflake region each is using. But other potential data providers, especially data enrichment companies, are not yet using Snowflake themselves.

How to use data to change behaviors

The government's COVID-19 response is a good example of failed leadership from a data perspective. While government officials have access to a huge amount of data to help them make decisions, they've fundamentally failed to use it to take people on a journey. There's a huge lesson that businesses can learn from this about how to use data to change behaviors. What we've seen is governments responding to the outbreak at a very simplistic level.

How Yellowfin is working with Health iPASS to help medical practices survive

In this Webinar, Health iPASS, a leader in medical practice management software, discusses how they are using Yellowfin to provide business insights to medical practices. These insights allow clients to optimize their patient collections and hold their front desk teams accountable for collecting information and capturing card on file. In an era where medical practices are struggling because of COVID-19, Health iPASS has used Yellowfin to create a set of products to help these practices adapt their workflow and build for the future.

How Unity analyzes petabytes of data in BigQuery for reporting and ML initiatives

Editor’s note: We’re hearing today from Unity Technologies, which offers a development platform for gaming, architecture, film and other industries. Here, Director of Engineering and Data Sampsa Jaatinen shares valuable insights for modern technology decision makers, whatever industry they’re in.

Build on your investment by Migrating or Upgrading to CDP Data Center

Cloudera Data Platform (CDP) Data Center (DC) is the on-premises release of Cloudera Data Platform. CDP DC combines the best services and components from Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise, along with new features and enhancements across the stack, to deliver the premier on-premises enterprise data platform. This unified distribution is a scalable and customizable platform where you can securely run many types of workloads.

Lumada for DataOps - Innovate with Data

DataOps is data management for the AI era. It offers new opportunities for emerging industry leaders by simultaneously instituting agility, improving quality, and increasing production success. Here, I will outline how you can solve some of your biggest data management issues with Lumada solutions, which power some of the top organizations in the world. Let’s first discuss data friction and how to remove it.

How to be 10x more productive than the average data scientist

Being more productive than your super competitive peer group is hard. Being 10 times more productive might sound like an impossibility, an exaggeration... or even a myth (a unicorn, you say?). A 10x data scientist is literally 10 times more productive than the average data scientist. The skillsets of these data scientists create better career opportunities, higher peer recognition, and more interesting projects to work on.

Databases Demystified Lesson Distributed Databases Part 3

In this episode of Michael Kaminsky's Databases Demystified, he explores what "consensus" means and why it is important, what the two-generals problem teaches us about reaching consensus, and the main consensus algorithms used in distributed databases: Raft and Paxos.
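At the heart of Raft-style consensus is the majority quorum: nothing is decided until more than half of the cluster agrees, which is what lets the system make progress despite the uncertainty the two-generals problem illustrates. A one-function sketch of the quorum rule:

```python
def has_quorum(votes_received, cluster_size):
    # Raft requires a strict majority: a candidate becomes leader
    # (and a log entry commits) only once more than half of the
    # cluster's nodes have agreed.
    return votes_received > cluster_size // 2
```

For example, 3 votes out of a 5-node cluster is a quorum, but 2 votes out of 4 is not: a strict majority of 4 nodes requires 3 votes.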

Data engineering in 2020

Data Engineers are forever flying the flag for open-source technology. But now that we’re safely locked away in our homes - potentially for the rest of the year - a new danger looms: That we get distracted by our new data tools and lose touch with delivering value to the business. Today most Data Engineers around the world are working from home, and at first glance it may seem like this works. After all, a solid internet connection is all we need to carry on doing what we were doing...

How we build custom data extractors to meet client ETL Needs

In one of our webinars in April 2020, we talked about the developer portal and how our developer community is pushing the Keboola Connection platform into places that often surprise our own core team. Our partners are often the creative ones, adding their knowledge and expertise to expand our platform in service of our shared customers and their varying needs. This is a guest post, written by Johnathan Brook, Solutions Architect at 4 Mile Analytics.

Are Your Machine Learning Models Wrong?

In addition to the very real negative impact on every person around the world, the COVID-19 pandemic is driving business disruptions and closures at an unprecedented scale. Enormous government stimulus programs are resulting in explosions in fiscal deficits, regulators are relaxing capital constraints on banks and central banks are supporting economic stability with a range of interest rate cuts and other stimulus measures.

5 Challenges of Building Data Applications

Fast-growing software companies are building data applications for a variety of uses, from marketing apps that provide customer insights, to IoT apps that handle device feedback, and data analytics apps that process both historical and near real-time data. But developers often face obstacles when building, designing, and supporting applications that need to parse large volumes of information.

Databases Demystified Lesson 1 Introduction to Databases and SQL

In the first episode of Databases Demystified with Michael Kaminsky, we give a high-level overview of the most important concepts in databases. We start with a brief history of databases, from the invention of relational databases through the present day, and we talk about the differences between analytical and transactional databases, distributed and single-node databases, and in-memory vs on-disk databases. We finish up by talking briefly about SQL and what makes it special.

Databases Demystified Lesson 4: Transactions Part 1

In this episode of Michael Kaminsky's Databases Demystified, we learn all about what a transaction is and what ACID means. Learn why database constraints are important, and what the commands "begin", "commit", and "rollback" mean. We talk about atomicity, consistency, isolation, and durability, and why transactions are so important.
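The begin/commit/rollback cycle can be sketched with SQLite via Python's standard library. In this toy transfer (account names and amounts are made up for illustration), a CHECK constraint rejects the first leg, so the whole transaction is rolled back and neither account changes, demonstrating atomicity:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so we
# can issue BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts (name TEXT, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")

try:
    conn.execute("BEGIN")
    # This leg would drive alice's balance to -100, violating the
    # CHECK constraint and raising an IntegrityError.
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bob'")
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    # Undo any partial work: both legs succeed or neither does.
    conn.execute("ROLLBACK")

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'"
).fetchone()[0]
```

After the rollback, alice's balance is still 100: the failed transfer left no trace.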

Databases Demystified Lesson 6: Distributed Databases Part 1

Welcome to episode 6 of Michael Kaminsky's Databases Demystified. In this lesson, we introduce a fascinating and incredibly important topic: distributed databases. We discuss "nodes" and "clusters" and we cover the two major paradigms in distributed databases: big-compute databases and high-availability databases.

Databases Demystified Lesson 7: Distributed Databases Part 2

Episode 7 of Michael Kaminsky's Databases Demystified. Learn about new issues we face in distributed databases and all about the CAP theorem. We'll talk about leader and follower nodes, what happens when distributed databases lose connection with a node, and what CAP stands for: consistency, availability, and partition tolerance.

Databases Demystified Lesson 3: Row vs Column Store

In Michael Kaminsky's third episode, we learn about the differences between row store and column store databases. This is a very important concept for understanding the difference between analytical and transactional databases, and we talk about the tradeoffs between using row and column stores for storing data. Michael gets into the weeds and talks about disk blocks and the different types of queries that work well for row and column stores.
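The tradeoff can be sketched with the same three toy records laid out both ways, together with the access pattern each layout favors (the records here are invented for illustration):

```python
# Row store: each record's fields live together, so fetching one
# whole record (a transactional lookup) touches one place.
rows = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
    {"id": 3, "region": "EU", "amount": 200},
]

# Column store: each column's values live together, so scanning one
# column (an analytical aggregate) never reads the other columns.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120, 80, 200],
}

# Transactional access pattern: fetch a complete record by key.
record = next(r for r in rows if r["id"] == 2)

# Analytical access pattern: aggregate a single column.
total = sum(columns["amount"])
```

On disk the difference is which values share a block: row stores pack whole records per block, column stores pack one column's values per block, which is why analytical scans read far fewer bytes from a column store.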

Enterprise Data Platforms - Should Organizations Build or Buy?

“Build vs Buy” is an important decision every technology strategist has to make. With the rise of open source and the wealth of freely available software, organizations have the flexibility to build custom solutions when off-the-shelf solutions don’t directly address their needs. In the domain of enterprise data platforms, many organizations have leveraged the open-source ecosystem to build tailored solutions, expending a lot of resources in the process.

Answer Data-Driven Digital Business Wake-Up Call With QlikWorld

If “necessity is the mother of invention,” COVID-19 forced many businesses around the world to rethink their digital transformation, data and analytics strategies. Out of every crisis, there will be opportunities. Although the current pandemic has certainly acted as a black swan event in some cases impeding progress, it has, nonetheless, accelerated several of the trends already pushing toward a digital future.

Anodot - Autonomous Business Monitoring

Business metrics are notoriously hard to monitor because of their unique context and volatile nature. Anodot’s Business Monitoring platform uses machine learning to constantly analyze and correlate every business parameter, providing real-time alerts and forecasts in their context. Anodot reduces detection and resolution for revenue-critical issues by as much as 70%. We have your back, so you’re free to play the offense and grow your business.

Talend Named a Leader in the Enterprise Data Fabric, Q2 2020 Forrester Wave™

We are happy to announce Talend is a Leader in The Forrester Wave™: Enterprise Data Fabric, Q2 2020. Talend’s unified approach to data management – combining data integration, integrity, and governance in a single platform – is the best way to gain clarity and confidence in your data. Since we launched Talend Data Fabric in 2015, we’ve been strong believers that data integration and management could not be solved with a static, siloed enterprise software solution.

Thanks to COVID-19 everyone is a data interpreter

One of the most interesting developments coming out of the current COVID-19 crisis is that people are looking at and interpreting data like never before. People that have never expressed an interest in data are now thinking about data and trying to understand what it means for them. There is a lot that businesses can learn from this. With COVID-19, people are taking the time to look at highly complex data, distill it, and assess what it means for their own behavior and lives.

Optimize Local and Global Decisions with Snowflake's Geospatial Support

Even in a global economy, businesses need a deep understanding of local markets. For example, marketing campaigns designed to attract buyers in a large metropolitan area won’t necessarily attract small-town customers. Noticing that buying patterns in one area are extending into a larger regional or nationwide trend can lead to decisions that increase profits. But accessing and analyzing a broad spectrum of geospatial data has been difficult and expensive. That is changing.

Analyzing key drivers behind life expectancy with Yellowfin

Presented at the recent Gartner Showfloor Showdown event, we analyzed health indicator data to discover the key drivers behind life expectancy in Australia. Along the way, we identified some interesting insights within the data. Watch this demo as we walk through Yellowfin and experience the platform through the eyes of a data analyst.

Introducing table-level access controls in BigQuery

We’re announcing a key capability to help organizations govern their data in Google Cloud. Our new BigQuery table-level access controls (table ACLs) are an important step that enables you to control your data and share it at an even finer granularity. Table ACLs also bring closer compatibility with other data warehouse systems where the base security primitives include tables—allowing migration of security policies more easily.

How Rabobank is Facilitating Financial Independence Through Real-time Data Insights

Banking and financial services organizations are all about customer relationships. By connecting with customers and assisting alongside their financial journeys, these organizations become trusted partners. Building trust and confidence increases the share of wallet and lifetime value. To achieve that on a global scale, you need to leverage big data and predictive analytics. As customers navigate their personal finances, they are looking for a bank they can trust.

Anticipate & adapt: 4 ways Predictive Analytics benefits manufacturing

In a world filled with volatility and unpredictability, organizations must be prepared to deal with disruption. This is especially true for the manufacturing sector, where the slightest error or variation can have a ripple effect across the production line and the organization. For example, as the global pandemic closed restaurants and crowded grocery stores, supply chain management had to be reimagined.

See for yourself: Lenses.io DataOps Workspace for Apache Kafka

Are you considering adopting Apache Kafka, or have you hit a wall in your application of real-time data? DataOps helps you efficiently build and scale a micro-service architecture and gain visibility into Apache Kafka's black box. This webinar will be a live demo of the Lenses.io DataOps platform. Viewers will learn how DataOps unlocks data observability, discovery and governance for Apache Kafka.

Enterprise data strategy: the right way to the cloud

Clive Humby stated, as far back as 2006, that "data is the new oil." The quote really took off following a 2017 report from The Economist. As a former chemical process engineer, oil immediately makes me think of refining it. Today's analytics platform for the complete data lifecycle does the same for data as a refinery's distillation columns do for crude oil: distilling value.

New Coronavirus Dashboards Reveal Which U.S. Counties May Start Spending First

Snowflake customer, Merkle Inc., has created a new set of COVID-19 interactive dashboards for businesses to use for free to determine which counties in the U.S. will most likely experience an economic recovery first. As economies reopen, states hit hardest by COVID-19, or states that relax social distancing measures sooner rather than later, will not reveal local market opportunities as they emerge.

Snowflake Service Account Security: Part 2

In Part 1, we covered the high-level objectives and methods for attacking service accounts. In Part 2 we discuss defense-in-depth mitigations to those methods. By the end of this blog, you will be able to apply secure-by-default mitigations to threats impacting Snowflake service accounts. The following table from Part 1 highlights the objectives and methods we want to mitigate: These secure-by-default mitigations help prevent and constrain credential misuse from theft and guessing attacks.

Setting up Allegro AI's Trains Platform

There’s a lot to track when training your ML models, and there’s no way around it; reviews and comparisons for best performance are virtually impossible without logging each experiment in detail. Yes, building models and experimenting with them is exciting work, but let’s agree that all that documentation can be laborious and error-prone – especially when you are essentially doing data entry grunt work, manually, using Excel spreadsheets.

WTF is a Convolutional Neural Network?

If you are a software engineer, there's a good chance that deep learning will inevitably become part of your job in the future. Even if you're not building the models that directly use CNNs, you might have to collaborate with data scientists or help business partners better understand what is going on under the hood. In this article, Julie Kent dives into the world of convolutional neural networks and explains it all in a not-so-scary way.

Apache Pulsar walks into a data bar

Some time ago, the concept of event streaming was not that widespread. In addition, the platforms available were far fewer than today, and they required more technical depth to set up and operate. However, as time passed, those platforms matured, the community grew, the documentation improved, and the rest of the world started to wake up to the advantages these platforms can bring to the real-time experiences businesses need. Apache Pulsar is a great example.

The importance of Collaborative BI in a more 'remote' working world

The world has turned upside down. You don’t need me to tell you that. And, thanks to weeks of working from home, a new way of working may be upon us when we flip back. Many are seeing this period as a pivotal time in changing the way many organisations will function. Business has seen first hand that it can run effectively with many of its people working remotely.

Rise of the Data Cloud

It’s only natural to edge forward incrementally. But every once in a while, there is a step-level change that really alters the game. The Data Cloud is exactly that – an opportunity to completely mobilize your data in the service of your business. The Data Cloud is a new type of cloud, so you can avoid bunkering and siloing your data across the infrastructure and application clouds, as well as your on-premise systems.

Snowflake's Product Innovations for 2020

At Snowflake, we are relentlessly focused on our customers and on creating innovative technology to better serve their needs. Today marks another milestone where we demonstrate such focus. In this blog, I detail the latest innovations to our cloud data platform. These new features make all six data workloads enabled by our platform – data warehouse, data lake, data exchange, data applications, data engineering, and data science – even more powerful.

The Imperative For Change

The ingredients of competitive advantage and differentiation tend to evolve over time. Throughout history they have included access to capital, raw materials, labor; they have been based on economies of scale and economies of scope. At IDC, it is our contention that the new basis for differentiation will be economies of intelligence.

Countly Application Performance Monitoring solution

Performance Monitoring provides a deeper understanding of your app's performance (Android, web, and iOS) and lets you optimize it. The Performance Monitoring SDK collects data from your application, and the Countly dashboard visualizes this data so that you can review and analyze your application's performance. With this tool you can find performance issues within your application and fix them for a better user experience.

How Snowflake's Cloud Architecture Scales Modern Data Analytics

Companies are moving workloads to the cloud as they seek to improve speed, scale, and agility. Today's data warehouse managers want to boost analytics productivity, increase the ability to scale instantly, and ingest and support a diverse set of data without bottleneck delays. In this white paper, we explain how Snowflake delivers the speed, scale, and agility organizations need for data-driven decision-making.

Little Book of Big Success with Snowflake - Financial Services

Financial institutions are embracing cloud-based data technologies to improve their service and product offerings, streamline operations, and gain deeper customer insights. This ebook features success stories about the many ways financial services companies are leveraging Snowflake Cloud Data Platform to build a 360 degree view of customers, accelerate financial analysis with unlimited scale, and keep sensitive and regulated data secure.