
August 2021

Buying and selling your home with data: A Q&A with Opendoor CTO Ian Wong

While many businesses struggled to keep pace with the changing economics of a global pandemic, the real estate industry was booming. The housing market reached record-breaking heights last month, with median existing-home prices rising 17.2% over the prior year. This increase in the average cost of a house was compounded by accelerated closing times, as the average house sold in 18 days, a record low.

Spark vs. Tez: What's the Difference?

Let's get started with this great debate. First, a step back; we’ve pointed out that Apache Spark and Hadoop MapReduce are two different Big Data beasts. The former is a high-performance in-memory data-processing framework, and the latter is a mature batch-processing platform for the petabyte scale. We also know that Apache Hive and HBase are two very different tools with similar functions. Hive is a SQL-like engine that runs MapReduce jobs, while HBase is a NoSQL key/value database on Hadoop.
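To make the contrast concrete, here is a word count sketched in plain Python in the map/shuffle/reduce style that MapReduce (and therefore Hive) executes in batches on disk, and that Spark keeps in memory across stages. The input lines and function names are illustrative, not any framework's actual API.

```python
from collections import defaultdict
from functools import reduce

def map_phase(lines):
    # Map: emit (word, 1) pairs, as a MapReduce mapper would
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values under their key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: reduce(lambda a, b: a + b, vals)
            for word, vals in groups.items()}

lines = ["spark reads data into memory", "mapreduce writes data to disk"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["data"])  # → 2
```

The model is the same either way; the performance gap comes from where the intermediate shuffle data lives — Spark holds it in memory, MapReduce writes it to disk between stages.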

Enter the World of Automated Data Management and Governance with Hitachi's Lumada Data Catalog

The era of manual data management and governance is rapidly coming to a close. The size of the trove of data at nearly every company has become so enormous that it cannot be maintained using manual cleaning, cataloging, governance and search methods. The release of Lumada Data Catalog 6.1 breaks new ground in automating data management, cleaning and governance processes, making it easier to find data and grant access to those who need it.

Data Science: The Future of Corporate Finance

Corporate finance must change. Across industries, an organization’s Finance team should shed light on what’s happening today with revenue and other financial indicators, while also predicting what the future may hold. And they must do the same for the entire organization. Until recently, it would have been impossible to meet these expectations. Excel-driven forecasting requires herculean efforts to wrangle data and report numbers by the end of each quarter.

Snowflake Data Marketplace In Action: Solving Real-World Problems with Public Data

Meet 3 data practitioners who used Snowflake Data Marketplace to answer interesting questions with public data - and gained real benefits for their companies and careers. Snowflake Data Marketplace allows data analysts, data scientists, and data engineers to access more than 650 live, ready-to-query data sets from more than 175 third-party data providers and data service providers.

The Importance of CDC for ETL

The growth of corporate data and the need for more corporate applications and systems are not trends that will soon slow down. Data has become an essential component of commercial success and a measure of the value of a company. Investing in platforms, processes, and people that can effectively protect, transform, and leverage data is the hallmark of a modern data-driven enterprise.

Using Automated Model Management for CPG Trade Success

CPG executives spend billions of dollars on trade and consumer promotions every year, investing as much as 15-20% of their total annual revenues in these initiatives. However, studies show that as many as 72% of these promotions don't break even, and 59% of them fail outright. Despite these troubling statistics, most CPG organizations continue to design and execute essentially the same promotions year after year with negligible hope of obtaining sustained ROI.

Ledger Bennett delivers a superior data app experience with ThoughtSpot Everywhere

Ledger Bennett is a B2B demand generation agency that uses sales and marketing know-how to help customers increase revenue. Learn how Ledger Bennett is leveraging ThoughtSpot Everywhere to give both developers and customers the best data app experience, and why they are completely retiring Tableau in the process.

What is the Best Way to Move My Data Securely?

Moving data from an organization’s systems into data warehouses and data lakes is essential to fuel business intelligence and analytics tools. These insights guide businesses in making decisions backed by data, allowing them to choose actions that have the best chance of positive growth. However, getting data from the source systems to these data stores can be a harrowing process.

The top 10 books every data and analytics leader must read

In the final episode of season two of The Data Chief podcast, we talk with authors of four must-read books for data and analytics leaders — two new and two time-tested. As you invest in your continuous learning, here is the full round up of the latest top books I recommend for today’s data and analytics leaders.

How Renault solved scaling and cost challenges on its Industrial Data platform using BigQuery and Dataflow

French multinational automotive manufacturer Renault Group has been investing in Industry 4.0 since the early days. A primary objective of this transformation has been to leverage manufacturing and industrial equipment data through a robust and scalable platform. Renault designed an industrial data acquisition layer and connected it to Google Cloud, using optimized big data products and services that together form Renault's Industrial Data Platform.

Troubleshooting Cloud Services and Infrastructure with Log Analytics

Troubleshooting cloud services and infrastructure is an ongoing challenge for organizations of all sizes. As organizations adopt more cloud services and their cloud environments grow more complex, they naturally produce more telemetry data – including application, system and security logs that document all types of events. All cloud services and infrastructure components generate their own, distinct logs.

What is REST API Design?

Modern business requires a range of digital components to communicate effectively when transferring data and delivering critical messages. Application programming interfaces, or APIs, are sets of rules that regulate exactly how certain apps or machines connect. If you work with data at all, you’ll have heard of REST or RESTful, and REST APIs — but what is REST API design? We explain below.
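The core of REST design is that URLs name resources (nouns) and HTTP methods supply the verbs. A minimal sketch of that mapping, with an in-memory store and illustrative paths and data rather than any particular framework's API:

```python
# REST-style routing sketch: resources are nouns in the URL,
# HTTP methods map to CRUD operations on them.
users = {1: {"name": "Ada"}}

def handle(method, path, body=None):
    parts = path.strip("/").split("/")
    if parts[0] != "users":
        return 404, None
    if method == "GET" and len(parts) == 1:
        return 200, list(users.values())             # GET /users: list the collection
    if method == "GET" and len(parts) == 2:
        user = users.get(int(parts[1]))
        return (200, user) if user else (404, None)  # GET /users/<id>: read one resource
    if method == "POST" and len(parts) == 1:
        new_id = max(users) + 1
        users[new_id] = body                         # POST /users: create a resource
        return 201, users[new_id]
    return 405, None                                 # verb not allowed on this resource

print(handle("GET", "/users/1"))  # → (200, {'name': 'Ada'})
```

Notice that the client never calls an operation like `/getUser` or `/createUser`; the uniform interface of methods plus resource URLs is what makes an API RESTful.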

Apache Ozone Powers Data Science in CDP Private Cloud

Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone natively provides Amazon S3 and Hadoop Filesystem compatible endpoints in addition to its own native object store API endpoint and is designed to work seamlessly with enterprise scale data warehousing, machine learning and streaming workloads. The object store is readily available alongside HDFS in CDP (Cloudera Data Platform) Private Cloud Base 7.1.3+.

Speed the Path to Vastly More Data Insights With Pentaho 9.2 and DataOps

In our modern world, accelerating the process of extracting insights from data is a complex challenge. Exacerbating this task are colossal data volumes, the expansion and use of multiple cloud platforms, and the increasing demands for self-service in a way that maintains compliance. Enterprises attempting to tackle the problem encounter various forms of friction everywhere they turn.

The Journey to Processing PII in the Data Cloud

During the process of turning data into insights, the most compelling data often comes with an added responsibility—the need to protect the people whose lives are caught up in that data. Plenty of data sets include sensitive information, and it’s the duty of every organization, down to each individual, to ensure that sensitive information is handled appropriately.

What is data ingestion?

We rely on advanced data platforms that extract data from multiple sources, clean it, and save it so data scientists and analysts can gain insights from data. Data seems to flow seamlessly from one location to another, supporting our data-driven decision-making. The entire system runs smoothly because the engineering operations under the hood are correctly set and maintained.

Monitoring in BigQuery

Want to ensure that your BigQuery environment stays cost effective and secure? In this episode of BigQuery Spotlight, we’ll examine how monitoring your data warehouse can optimize costs, help you pinpoint which queries need to be optimized, and audit both data-sharing and access. Watch to learn how BigQuery gives you the flexibility to export any of these data sources back into your data warehouse for custom reporting.

How To Transfer Flat Files Using SFTP With Xplenty

What is SFTP and What Does It Stand For? SFTP is a network protocol for securely transferring, accessing, and managing files on a remote computer. The SFTP protocol is intended as a more secure alternative to the traditional FTP protocol. The term SFTP stands for SSH File Transfer Protocol, where SSH is a cryptographic protocol that allows clients and servers to connect remotely. The files that you send or receive using SFTP are protected by SSH encryption in transit. This added layer of security means that SFTP is preferable to FTP in the vast majority of cases.

The Ethics of Data Exchange

COVID-19 vaccines were developed in record time. One of the main reasons for the accelerated development was the quick exchange of data between academia, healthcare institutions, government agencies, and nonprofit entities. “COVID research is a great example of where sharing data and having large quantities of data to analyze would be beneficial to us all,” said Renee Dvir, solutions engineering manager at Cloudera.

Snowflake and SK Inc. C&C Partner to Drive Innovation Powered by Data

According to Harvard Business Review, South Korea is one of the leading countries in the world for technology innovation, and it’s among the top producers of new data. Technology is so ingrained in the national identity that the country launched a “Digital New Deal” to lay the foundation for a digital economy that will facilitate growth and innovation, according to PR Newswire.

How to Handle HIPAA Concerns With Cloud Data Warehouses

How to use a cloud data warehouse to achieve HIPAA compliance, reduce risk, and offload some of the operational burden. How do you balance an accessible data warehouse with data protection and HIPAA compliance? To get the most value from your data, it should be available to everyone in your organization who can benefit from the analysis, insights, and value it holds.

How ThoughtSpot's product management team uses ThoughtSpot to drive user growth

Enabling customers and users to quickly find the value within a product is critical for many organizations and at the heart of being a product manager. The approach to driving user growth involves a growth mindset, combining qualitative and quantitative research methods, and driving impactful solutions.

10 Predictions for the Future of Data Governance

According to TechTarget, data governance is the management of the integrity, security, availability, and usability of data in an organization's systems. Effective and efficient data governance ensures data is accurate and consistent. Here are several predictions regarding data governance you need to know.

Transforming the Gaming Industry with AI Analytics

In 2020, the gaming market generated over 177 billion dollars, marking an astounding 23% growth from 2019. While it may be incredible how much revenue the industry generates, what’s more impressive is the massive amount of data generated by today’s games. There are more than 2 billion gamers globally, generating over 50 terabytes of data each day.

What is Data Portability and Why is It Important?

Businesses are now storing more personal data on their customers than ever before—from names, addresses, and credit card numbers to data such as IP addresses and browsing habits. Understandably, many consumers are speaking up and pushing back on how these businesses use their data—including an insistence on the “right to data portability.” Data portability is an essential issue for companies that must comply with regulations such as the GDPR and CCPA.

How Keboola benefits from using Keboola Connection - The story of the Lead

Greetings, my dear readers. It’s been some time since I posted my last article. This is the third chapter of the introduction to the internal data world of Keboola. In the previous chapters, I introduced our internal reporting and our communication with our users. Since then, a couple of things have happened.

Kindred: Transforming raw data into powerful insights

Kindred Group is a publicly-traded gambling operator with offices across four continents, offering entertainment options such as online poker, sports betting, and online casinos. Since its founding, Kindred has experienced fast growth, acquiring nine different gambling brands over the last 20 years. With over 30 million customers globally and numerous brands to manage, the Kindred team had a pressing need for a good data management system.

Data-driven competitive advantage in the financial services industry

There is an urgent need for banks to be nimble and adaptable in the thick of a multitude of industry challenges, ranging from the maze of regulatory compliance and sophisticated criminal activity to rising customer expectations and competition from traditional banks and new digital entrants. As banks find their bearings in this landscape, what appear to be insurmountable odds are in fact opportunities for growth and competitive differentiation.

Understanding Operational Analytics

Most companies have had to adjust to the big data push. Some have learned to fully leverage data to get a comprehensive view of their business and make long-term plans for their processes. However, it can be a long way from there to fueling minute-by-minute processes with quality data. Operational analytics allows your company to be at its most effective on a real-time basis. How does operational analytics (also called continuous analytics) offer an advantage to your company and how do you implement it?

How Cloudera Helps Realize and Accelerate Successful Data Product Strategies

In the first part of this series, I outlined the prerequisites for a modern Enterprise Data Platform to enable complex data product strategies that address the needs of multiple target segments and deliver strong profit margins as the data product portfolio expands in scope and complexity. With this article, I will dive into the specific capabilities of the Cloudera Data Platform (CDP) that have helped organizations meet those prerequisites and fulfill a successful data product strategy.

The Not-So-Secret Sauce for Successful Cloud Migration

Over the last year, perhaps unsurprisingly, increasing numbers of companies have made the jump to the cloud. It’s become a necessary move for so many businesses. But, as I discussed with Joe DosSantos on the latest episode of Data Brilliant – the rewards are abundant, but the journey is not always straightforward.

What is eventual consistency and why should you care about it?

Distributed systems have unlocked high performance at a large scale and low latency. You can run your applications worldwide from the comfort of your Amazon Web Services (AWS) platform in California, but the user adding an item to their shopping cart in Japan will not notice any delay or system faults. However, distributed systems - and specifically distributed database systems - also malfunction.
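The shopping-cart scenario above can be sketched in a few lines of Python: a write is acknowledged by the nearby replica immediately, and only later propagated to the remote one, so a read in between sees stale data until the replicas converge. Replica names and the replication mechanism here are purely illustrative.

```python
# Toy sketch of eventual consistency: a write lands on one replica
# and reaches the other asynchronously via a replication log.
replica_us = {}
replica_jp = {}
replication_log = []

def write(key, value):
    # The local replica acknowledges the write immediately...
    replica_us[key] = value
    # ...and the change is queued for the remote replica.
    replication_log.append((key, value))

def sync():
    # Later, the log is drained and the replicas converge.
    while replication_log:
        key, value = replication_log.pop(0)
        replica_jp[key] = value

write("cart", ["book"])
stale = replica_jp.get("cart")  # None: the remote replica hasn't caught up yet
sync()
fresh = replica_jp.get("cart")  # ['book']: the replicas have converged
```

The window between `write` and `sync` is exactly the "eventual" in eventual consistency: reads are fast and local, at the cost of briefly observing out-of-date values.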

What is the CAP theorem?

In the modern age, everything runs on the cloud. The majority of modern applications are written with cloud technologies - they use public cloud providers for DNS, distributed caching, and distributed data stores. Cloud solutions are so popular among engineers because of their many advantages. But distributed systems are not impervious to breaking. Foursquare’s example is testimony that even the great and mighty experience failure within distributed systems.
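The trade-off the CAP theorem describes shows up most clearly during a network partition: a consistency-first (CP) store refuses writes it cannot replicate, while an availability-first (AP) store accepts them and lets replicas diverge until the partition heals. A toy sketch, with all names and mechanics invented for illustration:

```python
# CAP trade-off during a partition: CP rejects writes (consistent but
# unavailable); AP accepts them (available but temporarily divergent).
class Store:
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.a, self.b = {}, {}     # two replicas
        self.partitioned = False

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                return "unavailable"    # CP: refuse rather than diverge
            self.a[key] = value         # AP: accept on the reachable side
            return "ok (divergent)"
        self.a[key] = self.b[key] = value
        return "ok"

cp, ap = Store("CP"), Store("AP")
cp.partitioned = ap.partitioned = True
print(cp.write("x", 1))  # → unavailable
print(ap.write("x", 1))  # → ok (divergent)
print(ap.a == ap.b)      # → False: the AP replicas now disagree
```

Real systems sit at points along this spectrum rather than at the extremes, but during a partition every distributed store has to pick which property to sacrifice.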

BigQuery Admin reference guide: API landscape

So far in this series, we’ve been focused on generic concepts and console-based workflows. However, when you’re working with huge amounts of data or surfacing information to lots of different stakeholders, leveraging BigQuery programmatically becomes essential. In today’s post, we’re going to take a tour of BigQuery’s API landscape - so you can better understand what each API does and what types of workflows you can automate with it.

IDC reveals 323% ROI for SAP customers using BigQuery

If the COVID-19 pandemic has taught us anything, it is that speed and intelligence are of the essence when it comes to making business decisions. Organizations must find ways of keeping ahead of competitors and disruptions by continually leveraging data to make smart decisions. The problem? Data may be everywhere, but it’s not always available in a form that businesses can use to generate analytics in real time.

How Influencing Events Impact the Accuracy of Business Monitoring

Businesses are flooded with constantly changing thresholds brought on by seasonality, special promotions and changes in consumer habits. Manual monitoring with static thresholds can’t account for events that do not occur in a regularly timed pattern. That’s why historical context of influencing events is critical in preventing false positives, wasted resources and disappointed customers.

The NetSuite integration guide

Now that we have understood what NetSuite integration is, some of its drawbacks, and why you should consider an integration, we are going to delve into how to do the integration itself. One of the most important things to consider is the cost of the platform coupled with the cost of the integration. The approach you take is determined by your technical expertise (if any), the application to be connected with NetSuite, and your budget.

Accelerate Time to Insights With Lumada DataOps Suite

As enterprises seek to accelerate the process of getting insights from their data, they face numerous sources of friction. Data sprawl across silos, diverse formats, the explosion of data volumes, and the fact that data is spread across many data centers and clouds and processed by many disparate tools, all act to slow the progress.

How Digital 22 Saves 2 Hours on Reporting Each Month While Offering Clients 100 Percent Transparency with Databox

With Databox, Digital 22 got a solution that enabled them to spend less time on reporting while continuing to offer clients transparency, along with the custom metrics they really care about.

The Ultimate Salesforce Developers Guide

Salesforce enables companies of all sizes to build amazing app experiences that drive stronger customer relationships. Heroku makes it easy to deliver engaging apps on the public cloud that integrate customer data. Heroku Connect is an easy way to keep your Salesforce data up-to-date with practically unlimited scaling, containers, and support for various application frameworks.

United Safety & Survivability Corporation Establishes a Unified Analytics Framework with Qlik Cloud

The manufacturing industry, like any other industry, is not immune to data challenges. Sourcing data, wrangling it and ensuring it’s being used in a governed, standardized way are not uncommon problems. Particularly in manufacturing, issues surface with inventory management, within the supply chain and with logistics.

The Citizen Integrator: Key to Business Agility

With the rapidly changing pace of innovative technology, companies must be able to pivot quickly or perish. The ability to adapt to change is critical to a company’s success. A key factor in the ability to pivot is access to real-time information to facilitate data-driven decisions. Traditionally, that data has existed across multiple systems with no simple method for bringing it all together meaningfully.

Automating Data Pipelines in CDP with CDE Managed Airflow Service

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation, along with a first-class job management API, many of our customers have been able to quickly deploy, monitor, and manage the life cycle of their Spark jobs with ease. In addition, we allowed users to automate their jobs based on a time-based schedule.

How to monetize BigQuery datasets using Apigee

Data harnessed through tools such as BigQuery allows organizations to deliver personalized experiences, make data-driven business decisions, and unlock new streams of revenue. In this video, we go through the challenges of putting data into action and show how APIs can help. Watch to learn how Apigee makes it simple and easy to package your business data as APIs!

Which Sources Drive The Highest Conversion Rates?

When it comes to conversion rates, channel sources should be at the center of your focus. After all, you want to know which of your marketing investments is driving the most conversions so you can double down or make adjustments to your strategy. In this episode of Data Snacks, we show you how to track contact and customer conversions by source, which other metrics matter when it comes to conversions, and what you can do to increase your conversion rates by source.

Dining with data: A Q&A with OpenTable's Senior Vice President of Data and Analytics Grant Parsamyan

For more than 20 years, OpenTable has connected foodies and novice diners with the restaurants they love. But how does its technology work on the back end? To make a long story short: data. Beyond the app and website, OpenTable provides restaurants with software that manages their floor plans, phone reservations, walk-ins, shift scheduling, turn times, and more.

Transforming Customer Data for Salesforce

CRM (customer relationship management) software is the lifeblood of any modern B2C company. By monitoring and storing all of your interactions with prospects and customers—from their first visit to your website to their most recent purchase—CRM software makes it dramatically easier to segment your customer base, identify hidden trends in the data, make smarter predictions and forecasts, and much more.

Announcing the GA of Cloudera DataFlow for the Public Cloud

Are you ready to turbo-charge your data flows on the cloud for maximum speed and efficiency? We are excited to announce the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC) – a brand new experience on the Cloudera Data Platform (CDP) to address some of the key operational and monitoring challenges of standard Apache NiFi clusters that are overloaded with high-performance flows.

Cloudera DataFlow for the Public Cloud: A technical deep dive

We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform with a streamlined way to deploy NiFi data flows and a central monitoring dashboard, making it easier than ever before to operate NiFi data flows at scale in the public cloud.

Cloudera DataFlow for the Public Cloud

Cloudera DataFlow for the Public Cloud takes away the operational and monitoring challenges by providing cloud-native flow management capabilities powered by Apache NiFi. It is a purpose-built framework that modernizes the data flow user experience, so that NiFi developers and administrators can easily handle sophisticated data flows in production.

Building an ETL Pipeline in Python

Thanks to its user-friendliness and popularity in the field of data science, Python is one of the best programming languages for ETL. Still, coding an ETL pipeline from scratch isn’t for the faint of heart — you’ll need to handle concerns such as database connections, parallelism, job scheduling, and logging yourself. The good news is that Python makes it easier to deal with these issues by offering dozens of ETL tools and packages.
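Even without a dedicated ETL package, the extract-transform-load shape is easy to express in standard-library Python. A minimal sketch using an in-memory CSV source and a SQLite destination; the column names and cleaning rule are illustrative:

```python
import csv
import io
import sqlite3

# Source data: in a real pipeline this would come from a file, an API,
# or a database rather than an inline string.
raw = "name,amount\nalice,100\nbob,\ncarol,250\n"

# Extract: parse rows from the source
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: drop rows with missing amounts, normalize names and types
clean = [(r["name"].title(), int(r["amount"])) for r in rows if r["amount"]]

# Load: write the cleaned rows into the destination table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)

total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # → 350
```

The concerns the paragraph mentions — connection management, parallelism, scheduling, logging — are exactly what grows around this skeleton in production, which is why the dedicated ETL tools and packages exist.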

What Is Homomorphic Encryption?

Data encryption is one of the smartest things any organization can do to protect the privacy and security of confidential and sensitive data. Using a unique encryption key, data is converted to an intermediate representation known as “ciphertext,” which usually appears as a jumbled mixture of letters and numbers to the human eye. This encrypted data will be meaningless to anyone without the corresponding decryption key—even malicious actors who breach an organization’s defenses.
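Homomorphic encryption goes a step further than ordinary encryption: it lets you compute on ciphertext without decrypting it. As a toy illustration, textbook RSA (with no padding) happens to be multiplicatively homomorphic — multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The key below is a deliberately tiny textbook example; real homomorphic schemes and real RSA deployments work very differently.

```python
# Toy demonstration of a homomorphic property using textbook RSA.
# n = 61 * 53; e and d are a matching public/private exponent pair.
n, e, d = 3233, 17, 2753

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(3)
product_cipher = (c1 * c2) % n   # multiply ciphertexts, never decrypting
print(decrypt(product_cipher))   # → 21, i.e. 7 * 3
```

The party doing the multiplication never sees 7, 3, or 21 — only the key holder can decrypt the result, which is the core promise of homomorphic computation.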

Fresh Insights from High-Quality Data: How Migros is Delivering on the Full Potential of Business Intelligence

Migros is the largest retailer in Turkey, with more than 2500 outlets selling fresh produce and groceries to millions of people. To maintain high-quality operations, the company depends on fresh, accurate data. And to ensure high data quality, Migros depends on Talend. The sheer volume of data managed by Migros is astonishing. The company’s data warehouse currently holds more than 200 terabytes, and Migros is running more than 7,000 ETL (extract, transform, load) jobs every day.

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Every enterprise is trying to collect and analyze data to get better insights into their business. Whether it is log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications like ETL tools, search engines, and databases for analysis. This whole architecture made a lot of sense when there was a consistent and predictable flow of data to process.

How to use Root Cause Analysis to Improve Engineering

Modern engineering has revolutionized almost every complex human endeavor. From lean manufacturing to globe-spanning telecommunications; from software and IT bringing the world to our fingertips to medical devices detecting previously invisible diseases, there is no human endeavor that engineering has not changed for the better. But engineers don’t only build the complex systems and tools that keep the world running. They’re also the first line of defense when things go south.

SFDC Integrations

Do you need to integrate your Salesforce data into other systems? SFDC Integrations can offer your company a reliable, secure data infrastructure to transfer SFDC data into other systems. Data integration is a key component for any business that wants to get ahead and stay competitive in today's marketplace. That's why so many companies have used ETL tools like SFDC integrations software from Xplenty. Integrating with SFDC has never been easier than it is now.

The Foundations of a Modern Data-Driven Organisation: Change from Within (part 2 of 2)

In my previous blog post, I shared examples of how data provides the foundation for a modern organization to understand and exceed customers’ expectations. However, the important role data occupies extends beyond customer experience and revenue, as it becomes increasingly central in optimizing internal processes for the long-term growth of an organization.

How DataOps Shines a Light on the Growing Dark Data Problem

The arrival of more and more data in all segments of the enterprise started out as an embarrassment of riches, but quickly transformed into something close to a nightmare of dark data. However, a raft of new technologies and the processes embodied in DataOps are charting a path forward in which a much higher percentage of data becomes useful. The challenge most companies face is how to manage and get access to all the data flooding in from all directions.

Driving Data Governance and Data Products at ING Bank France

In this episode of Data+AI Battlescars, Sandeep Uttamchandani, Unravel Data’s CDO, speaks with Samir Boualla, CDO at ING Bank France, one of the largest banks in the world. They cover his battlescars in Driving Data Governance Across Business Teams and Building Data Products. At ING Bank France, Samir is the Chief Data Officer. He’s responsible for several teams that govern, develop, and manage data infrastructure and data assets to deliver value to the business.

What Data Is Behind This Metric?

We all know data is the new oil. Both data and oil are valuable resources and share a common quality: if unprocessed, they cannot be used. Data and oil have to be broken down and built up again to create true value for the business. There is, however, one key difference. Whereas oil is tangible, data is not. This means that the flow of low-quality oil is traceable and will be noticed in the production process. But what happens if there is a bad data flow in your organization?

Understanding BigQuery data governance

Want everyone in your organization to be able to easily find the data they need, while minimizing overall risk, and ensuring regulatory compliance? In this episode of BigQuery Spotlight, we’ll examine BigQuery data governance so you can ensure your data is secure. We’ll also go over Cloud Data Loss Prevention and an open-source framework for data quality validation.

Is Data-First AI the Next Big Thing?

We are roughly a decade removed from the beginnings of the modern machine learning (ML) platform, inspired largely by the growing ecosystem of open-source Python-based technologies for data scientists. It’s a good time for us to reflect back upon the progress that has been made, highlight the major problems enterprises have with existing ML platforms, and discuss what the next generation of platforms will be like.

How to Implement Change Data Capture in SQL Server

Every organization wants to stay on the cutting edge of technology, making smart, data-driven decisions. However, keeping company information and data integration up to date can be a very time-consuming process. That is where CDC can make all the difference. Change data capture (CDC) captures data set changes in real time, ensuring that company data is always up to date. It can transform the way companies make data-driven decisions.

Make your data more secure than ever with Talend Data Fabric

At Talend, we believe achieving healthier data is essential for business success. Safeguarding private data and staying in compliance with global regulations leads to healthier data by significantly decreasing risk – reducing the potential of having to pay dizzying fines or suffering a data breach that destroys customer relationships and trust.

Five Reasons Why Platforms Beat Point Solutions in Every Business Case

Once upon an IT time, everything was a “point product,” a specific application designed to do a single job inside a desktop PC, server, storage array, network, or mobile device. Point solutions are still used every day in many enterprise systems, but as IT continues to evolve, the platform approach beats point solutions in almost every use case. A few years ago, there were several choices of data deduplication apps for storage, and now, it’s a standard function in every system.

Object Tagging Is Now Available in Public Preview

Snowflake is happy to announce the availability of the Object Tagging feature in public preview today! This feature makes it easier for enterprises to know and control their data by applying business context, such as tags that identify data objects as sensitive, PII, or belonging to a cost center. Object Tagging broadens Snowflake’s native data governance capabilities by adding to existing governance capabilities such as Snowflake’s Dynamic Data Masking and Row Access Policies.

Top 7 Paraphrasing Tools to Write SEO Friendly Articles

An effective SEO strategy requires you to create unique and SEO-friendly articles and blog posts consistently. This is the only way for you to make your blog seem relevant in the eyes of the search engines. But creating unique content is not always easy. Every content writer has to face at least some degree of writer’s block at some point in their writing journey. If you are going through the same problem and you are feeling stuck, then using paraphrasing tools can be quite helpful for you.

How Real-Time Data Will Revolutionize Decision Making | Business of Data

Courtney Stanley sits down with Qlik CEO Mike Capone to discuss the implications of decision making with real-time data. COVID dramatically increased every organization’s need for real-time data. As access to more real-time data becomes the norm, decision making will define the winners. In this session, Mike Capone will discuss how leaders are reimagining the way decisions are being made across their organizations, and how collaboration around data is crucial to driving success.

Creating personalized meals with data: A Q&A with Daily Harvest Chief Algorithms Officer, Brad Klingenberg

It is becoming increasingly difficult to standardize taste. The myriad culinary preferences and gastric demands of the American population are reflected in the $997B valuation of the U.S. packaged food market in 2020. There has also been a push in recent years to augment trips to the grocery store with at-home meal kits and food delivery services, a trend further accelerated by the onset of quarantine restrictions.

Feature-bundling for the Save: Why a Data Point-based Invoice Makes Sense

Choosing a product analytics vendor can be quite an ordeal, because there is no one-size-fits-all solution. Technical factors such as tech-stack compatibility and integrations with the product have to be balanced against the financial health of the business. You need to find the sweet spot between where you need to go and how much money you actually have to get there.

Think you need a data lakehouse?

In our Data Lake vs Data Warehouse blog, we explored the differences between two of the leading data management solutions for enterprises over the last decade. We highlighted the key capabilities of data lakes and data warehouses with real examples of enterprises using both solutions to support data analytics use cases in their daily operations.

What Is NetSuite Software? What Is NetSuite Database?

Streamlining and optimizing business workflows and processes is one of the most valuable things any organization can do behind the scenes. That’s where ERP (enterprise resource planning) software comes in. With use cases ranging from sales and finance to logistics and human resources, ERP platforms help integrate, standardize, and centralize all of your processes and data.

Keep your cloud close and your data closer

Everyone knows that more and more data is moving to the cloud. According to the latest research, 94% of all enterprises use cloud services and 48% of businesses store classified and important data in the cloud. While the cloud is ubiquitous, in practice it consists of data infrastructures in various locations around the world. The question of where the cloud data infrastructure storing your specific data is located is becoming increasingly important.

What's New in CDP Private Cloud Base 7.1.7?

With the release of CDP Private Cloud (PvC) Base 7.1.7, you can look forward to new features, enhanced security, and better platform performance to help your business drive faster insights and value. We understand that migrating your data platform to the latest version can be an intricate task, and at Cloudera we’ve worked hard to simplify this process for all our customers.

Generating and Viewing Lineage through Apache Ozone

As businesses look to scale-out storage, they need a storage layer that is performant, reliable and scalable. With Apache Ozone on the Cloudera Data Platform (CDP), they can implement a scale-out model and build out their next generation storage architecture without sacrificing security, governance and lineage. CDP integrates its existing Shared Data Experience (SDX) with Ozone for an easy transition, so you can begin utilizing object storage on-prem.

S&P Global Provides Instant Access to Curated Data

Snowflake connected with David Coluccio from S&P Global Market Intelligence at the Snowflake Data Cloud Tour to hear how the company is using the Snowflake Data Cloud to curate massive amounts of data and provide seamless access for its clients. S&P Global’s foundation is rooted in providing essential insights to make more-informed decisions.

Data Storytelling

One of the main challenges of analytics is making it accessible to more than just trained experts within an organization. Not everyone is data literate to the degree needed to consume, understand, and act on the data in a dashboard. Dashboards on their own are very data rich, but many critical events and influences behind the numbers can't be captured, and actions taken that impacted the numbers are simply not reportable; as a result, the data only tells you half the story.

Why You Need a REST API

Imagine you were suddenly transported to a foreign city where you don’t speak the language—in fact, every person you encounter speaks a different language, and you aren’t even sure which one they are. That’s the situation faced by many developers and users today as they try to integrate different software and systems. One of the greatest challenges of modern computing is its complexity.
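A REST API is the shared language in that analogy: every resource is addressed by a URL and manipulated with the same small set of verbs, so clients never need a bespoke dialect per system. A minimal sketch of that uniform interface (resource and field names here are made up):

```python
# A toy REST-style dispatcher: one resource collection ("users"),
# addressed by URL path, manipulated with standard HTTP verbs.
users = {1: {"name": "Ada"}}

def handle(method, path):
    parts = path.strip("/").split("/")       # "/users/1" -> ["users", "1"]
    if parts[0] != "users":
        return 404, None                     # unknown resource
    if method == "GET" and len(parts) == 2:
        user = users.get(int(parts[1]))
        return (200, user) if user else (404, None)
    if method == "DELETE" and len(parts) == 2:
        users.pop(int(parts[1]), None)
        return 204, None
    return 405, None                         # verb not allowed here

print(handle("GET", "/users/1"))   # -> (200, {'name': 'Ada'})
```

Because every integration speaks the same verbs over the same addressing scheme, adding a new client or a new resource doesn't require learning a new "language."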

The CIO's Future Vision for the Digital Core - Adopting Application-First Infrastructure

Amid the rapid pace of change felt by organizations this year, it’s no surprise that digital transformation projects have been high on the agenda for CIOs across the globe. To achieve both their organizational and transformation goals and become digitally agile as a result, CIOs are often tasked with creating the conditions needed to enable an intelligent and flexible digital core.

New Snowflake Features Released in June and July 2021

Building on the announcements made at this year’s Summit, Snowflake has released a number of new enhancements, especially in the areas of data programmability, global governance, and data sharing. Read on to learn more. For additional details and to see some of these new capabilities in action, be sure to check out the on-demand sessions from Summit.

Data management is ALL THE RAGE!

Everyone wants to manage their data, and if it’s in a feature store, even better! But optimal data management starts with lightweight tooling that has zero upfront setup costs and maximizes utility – which is where ClearML-data comes in. ClearML-data takes the lightweight feel of git (who doesn’t know git?) and gives it a spin for data: it is an open-source, highly efficient dataset management tool that reflects how we view DataOps and how it differs from git-like solutions.

Interview with Cybersecurity Specialist Mark Kerzner

For the newest installment in our series of interviews asking leading technology specialists about their achievements in their field, we’ve welcomed Mark Kerzner, software developer, thought leader in cybersecurity training, and VP at the training solutions company Elephant Scale. His company has taught tens of thousands of students at dozens of leading companies. Elephant Scale started by publishing a book called ‘Hadoop Illuminated’.

Complete Guide to NetSuite Development

NetSuite is a powerful, real-time, cloud-based ERP (enterprise resource planning) software. And NetSuite development is knowing how to use NetSuite efficiently. When NetSuite is used to its full capacity, it is a powerful tool for highlighting strengths and exposing weaknesses. It is capable of providing detailed reports in real-time for every department of your company.

Why Data Governance is the Future

Businesses today are powered by data. This data needs to be high quality and manageable but also compliant with rules and regulations. In order to ensure data is manageable and secure, data governance protocols provide better control and organization. The process of data governance refers to the effective management of technology, processes, and even people within a company or organization. Read on to learn why it's the future of business.

The Foundations of a Modern Data-Driven Organisation: Gaining a Clear View of the Customer

Today’s organizations face rising customer expectations in a fragmented marketplace amidst stiff competition. This landscape is one that presents opportunities for a modern data-driven organization to thrive. At the nucleus of such an organization is the practice of accelerating time to insights, using data to make better business decisions at all levels and roles.

Spark Troubleshooting, Part 1 - Ten Challenges

“The most difficult thing is finding out why your job is failing, which parameters to change. Most of the time, it’s OOM errors…” – Jagat Singh, on Quora. Spark has become one of the most important tools for processing data – especially non-relational data – and deriving value from it. And Spark serves as a platform for the creation and delivery of analytics, AI, and machine learning applications, among others.
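Since OOM errors come up so often, a checklist helps. The property names below are standard Spark configuration keys; the values are illustrative starting points for investigation, not recommendations.

```python
# Common Spark settings to inspect first when chasing OOM errors.
# Keys are standard Spark configuration properties; values are examples.
oom_checklist = {
    "spark.executor.memory": "4g",           # heap per executor
    "spark.executor.memoryOverhead": "1g",   # off-heap headroom (YARN/K8s)
    "spark.executor.cores": "2",             # fewer concurrent tasks -> more memory each
    "spark.sql.shuffle.partitions": "400",   # more partitions -> less memory per shuffle task
    "spark.memory.fraction": "0.6",          # share of heap for execution + storage
}

def spark_submit_flags(conf):
    """Render a config dict as spark-submit --conf flags."""
    return " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))

print(spark_submit_flags(oom_checklist))
```

Bumping executor memory is the reflex fix, but reducing per-task memory pressure (cores, shuffle partitions) is often the cheaper and more durable one.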

How to set up advertising analytics in 8 easy steps

The trouble with marketing initiatives is that it is almost impossible to tell how they impacted the business’s bottom line. As the marketing pioneer John Wanamaker said: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” A person scrolling through Twitter on their mobile app might have seen your ad, loved your brand, and then logged into their desktop to purchase your product. The gap between a need generated by marketing and the eventual purchase spans marketing channels and time.

Why dashboards don't deliver on promised business value

Modern data and analytics leaders know that every business user is different. No two marketers or finance managers will use data in exactly the same way because no two share the same contextual view or understanding of the business. Their challenges are as nuanced as they are complex. And they need insights tailored to their specific needs if they are to be successful at solving business problems with data. Unfortunately, traditional BI tools treat everyone like carbon copies.

BigQuery Admin reference guide: Query optimization

Last week in the BigQuery reference guide, we walked through query execution and how to leverage the query plan. This week, we’re going a bit deeper - covering more advanced queries and tactical optimization techniques. Here, we’ll walk through some query concepts and describe techniques for optimizing related SQL.
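Two of the classic tactics such a guide covers are pruning columns and filtering before joining. Because BigQuery is a columnar engine billed by bytes scanned, both directly cut cost as well as runtime. The table and column names below are illustrative:

```python
# Before: scans every column and joins before filtering.
costly = """
SELECT *
FROM `project.dataset.events` e
JOIN `project.dataset.users` u ON e.user_id = u.id
WHERE e.event_date = '2021-08-01'
"""

# After: only the needed columns, with the filter pushed below the join.
optimized = """
SELECT e.user_id, e.event_type, u.country
FROM (
  SELECT user_id, event_type
  FROM `project.dataset.events`
  WHERE event_date = '2021-08-01'   -- filter applied before the join
) e
JOIN `project.dataset.users` u ON e.user_id = u.id
"""

# Column pruning means "SELECT *" should disappear from the hot path.
assert "SELECT *" not in optimized
```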

All That Hype: Iguazio Listed in 5 Gartner Hype Cycles for 2021

We are proud to announce that Iguazio has been named a sample vendor in five 2021 Gartner Hype Cycles – the Hype Cycles for Data Science and Machine Learning; Artificial Intelligence; Analytics and Business Intelligence; Infrastructure Strategies; and Hybrid Infrastructure Services – alongside industry leaders such as Google, IBM, and Microsoft (who are also close partners of ours).

Choosing Your Upgrade or Migration Path to Cloudera Data Platform

In our previous blog, we talked about the four paths to Cloudera Data Platform. If you haven’t read that yet, we invite you to take a moment and run through the scenarios in that blog. The four strategies will be relevant throughout the rest of this discussion. Today, we’ll discuss an example of how you might make this decision for a cluster using a “round of elimination” process based on our decision workflow.

Four Frameworks for Optimizing Cloud Strategy and Deployment

“40% of all enterprise workloads will be deployed in CIPS [cloud infrastructure and platform services] by 2023, up from only 20% in 2020.” As the cloud permeates every aspect of business, decision-makers must make critical choices regarding infrastructure at every turn. Their answers will ultimately determine if every part of an organization is empowered to move forward in a cohesive way to reach business outcomes.

Run your jobs faster with Keboola's new feature: Dynamic Backend

Data transformations are the backbone of smooth-running data operations. Transformations are used in data replication between databases, data migration from cloud to on-premise, and data cleaning (aggregations, outlier removal, deduplication …) aka all the good stuff that goes into extracting insights from data. But as any data professional can attest, transformation can also be a painful bottleneck. Think scripts that run for an entire day and finish just before the next scheduled job.
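The cleaning steps named above – deduplication, outlier removal, aggregation – can be sketched as a single toy transformation. Field names and the outlier cap are made up for illustration:

```python
# A toy transformation step: deduplicate records by key, drop crude
# outliers, then aggregate what remains.
def clean_and_aggregate(rows, outlier_cap=10_000):
    seen, cleaned = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue                          # deduplication
        seen.add(row["order_id"])
        if row["amount"] > outlier_cap:
            continue                          # crude outlier removal
        cleaned.append(row)
    total = sum(r["amount"] for r in cleaned)  # aggregation
    return cleaned, total

rows = [
    {"order_id": 1, "amount": 120},
    {"order_id": 1, "amount": 120},    # duplicate
    {"order_id": 2, "amount": 99_999}, # outlier
    {"order_id": 3, "amount": 80},
]
cleaned, total = clean_and_aggregate(rows)
print(len(cleaned), total)  # -> 2 200
```

At toy scale this is instant; the bottleneck the post describes appears when the same logic runs over millions of rows on an undersized backend.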

Strategies for optimizing your BigQuery queries

Did you know that optimizing SQL queries can be cost efficient? In this episode of BigQuery Spotlight, we speak to some strategies for optimizing your BigQuery queries. We’ll walk through what happens behind the scenes for more complex queries, and show you specific tactics you can use to optimize your SQL. Watch to learn some great techniques on how to make your queries more performant!

ThoughtSpot for ServiceNow Analytics

With ThoughtSpot, you can deliver a modern, familiar search-driven analytics experience on all your ServiceNow data. Drill anywhere, and get granular insights instantly. ThoughtSpot for ServiceNow Analytics is compatible with the Snowflake Data Cloud and other cloud data warehouse platforms. It leverages the standard ServiceNow data model while remaining highly flexible and customizable. Stop living in canned reports.

Can you achieve self-service analytics amid low data literacy?

Customers wanting to drive self-service analytics as part of creating a data-driven organization will often ask, “Can we achieve self-service analytics when our workforce has low data literacy?” Or they might say they are not ready for self-service analytics, incorrectly thinking they need to first improve data literacy. But the two are inextricably linked. I liken it to teaching a child to read without giving them any books on which to build their skills.

What Is Needed for an SFTP Connection?

Along with its security benefits, an SFTP connection is the quickest and most efficient way to transfer files between two local or remote systems. When transferring files or data from one server to another, using an SFTP connection is one of the best options to ensure this data remains untampered. Utilizing an SFTP connection is especially beneficial for commonly used data integration systems like ETL and Reverse ETL. So what makes SFTP so great, and what is even needed for an SFTP connection?
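In practice, an SFTP connection needs a host, a port (22 by default), a username, and either a password or a private key. A sketch in Python follows; the transfer itself uses paramiko (a third-party library, `pip install paramiko`), and the host and paths are placeholders:

```python
# What an SFTP connection needs, sketched with placeholder values.
def sftp_params(host, username, port=22, password=None, key_path=None):
    """An SFTP connection needs a host, port (22 by default), a username,
    and either a password or a private key for authentication."""
    if not (password or key_path):
        raise ValueError("SFTP needs a password or a private key")
    return {"host": host, "port": port, "username": username,
            "password": password, "key_path": key_path}

def upload(params, local_path, remote_path):
    import paramiko  # imported lazily; third-party dependency
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(params["host"], port=params["port"],
                   username=params["username"], password=params["password"],
                   key_filename=params["key_path"])
    sftp = client.open_sftp()
    sftp.put(local_path, remote_path)   # transfer runs over the SSH channel
    sftp.close()
    client.close()

params = sftp_params("sftp.example.com", "etl_user", key_path="~/.ssh/id_rsa")
```

Key-based authentication is generally preferred for automated ETL jobs, since no password needs to live in the pipeline's configuration.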

Pillars of Knowledge, Best Practices for Data Governance

With hackers now working overtime to expose business data or implant ransomware processes, data security is largely IT managers’ top priority. And if data security tops IT concerns, data governance should be their second priority. Not only is it critical to protect data, but data governance is also the foundation for data-driven businesses and maximizing value from data analytics. Requirements, however, have changed significantly in recent years.

Accelerating Insight and Uptime: Predictive Maintenance

Historically, maintenance has been driven by a preventative schedule. Today, preventative maintenance, where actions are performed regardless of actual condition, is giving way to Predictive, or Condition-Based, maintenance, where actions are based on actual, real-time insights into operating conditions. While both are far superior to traditional Corrective maintenance (action only after a piece of equipment fails), Predictive is by far the most effective.
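The shift from schedule-based to condition-based maintenance can be reduced to a very small sketch: instead of servicing every N days, flag a machine when its readings drift past a threshold. The sensor values and threshold below are illustrative:

```python
# Condition-based maintenance in miniature: flag a machine when the
# average of its recent readings crosses a threshold, rather than
# servicing it on a fixed calendar schedule.
def needs_service(vibration_readings, threshold=7.0, window=3):
    """True when the mean of the last `window` readings exceeds threshold."""
    recent = vibration_readings[-window:]
    return sum(recent) / len(recent) > threshold

healthy = [5.1, 5.3, 5.0, 5.2, 5.1]
degrading = [5.1, 5.3, 6.8, 7.4, 7.9]

print(needs_service(healthy), needs_service(degrading))  # -> False True
```

Real predictive-maintenance systems replace the fixed threshold with learned models over many sensors, but the payoff is the same: act on actual condition, not the calendar.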

Data Lakehouses: Have You Built Yours?

In a traditional data warehouse, specific types of data are stored using a predefined database structure. Because of this “schema on write” approach, a significant transformation effort is required before all data sources can be consolidated into one warehouse. Enter the data lake.

Unlock Marketing Analytics Power with the Snowflake Data Cloud

Over the past two decades, marketers have faced an uphill battle in trying to turn marketing into a fully data-driven discipline. Our challenge is not that we don’t have enough data but that data has been difficult to access and use. Marketing, sales, and product data is scattered across different systems, and we can’t get a complete picture of what is going on in our businesses.

How to Operationalize Your Data Warehouse

More and more businesses are opting to use data lakes or, more likely, data warehouses these days, which allow them to store, analyze, and utilize their data from one convenient destination. But beyond creating reports and in-depth analytics, how can you truly operationalize your data warehouse into an even more vital part of your business's digital stack? Reverse ETL could provide some opportunities to do just that.
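The reverse-ETL idea is simple: read modeled rows back out of the warehouse and push them into the operational tools where work happens. A sketch, with sqlite3 standing in for the warehouse and made-up table and field names; in a real pipeline each payload would be POSTed to a CRM's API:

```python
import sqlite3

# A stand-in "warehouse" table of modeled customer scores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_scores (email TEXT, churn_risk REAL)")
conn.executemany("INSERT INTO customer_scores VALUES (?, ?)",
                 [("a@example.com", 0.82), ("b@example.com", 0.12)])

def to_crm_payloads(conn, risk_threshold=0.5):
    """Pull at-risk customers out of the warehouse, shaped for a CRM."""
    rows = conn.execute(
        "SELECT email, churn_risk FROM customer_scores WHERE churn_risk > ?",
        (risk_threshold,))
    return [{"email": email, "segment": "at_risk"} for email, _ in rows]

print(to_crm_payloads(conn))
# -> [{'email': 'a@example.com', 'segment': 'at_risk'}]
```

The point is the direction of flow: instead of analysts visiting dashboards, the warehouse's conclusions travel to the tools sales and support teams already live in.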

The San Francisco Municipal Transportation Agency gets riders where they're going, thanks to Talend, Disy, and geospatial data

Every day, hundreds of thousands of residents and commuters in San Francisco, California, use the public transportation services of the San Francisco Municipal Transportation Agency (SFMTA). In addition to the city’s buses, subway system, and famous cable cars, the SFMTA manages comprehensive services including bicycle and e-scooter rentals, as well as permits for road closures.

Minimizing Supply Chain Disruptions with Advanced Analytics

January 2020 is a distant memory, but for most, the early days of the pandemic were a time that will be ingrained in memories for decades, if not generations. Over the last 18 months, supply chain issues have dominated our nightly news, social feeds, and family conversations at the dinner table. Some, but not all, have stemmed from the pandemic.

Creating a COD Database

Cloudera Operational Database (COD) is an operational database as a service that brings ease of use and flexibility. Let’s see how easy it is to create a new database. Once you have created your environment, navigate to the COD web interface, which takes you to the Databases page. Click Create Database, select the applicable environment, provide a name for your database, and click Create Database again. The creation of your new database is now in progress; once its status becomes Available, it is ready to be used.

Preventing Shopping Cart Abandonment with Anomaly Detection

The global pandemic has changed B2C markets in many ways. In the U.S. market alone in 2020, consumers spent more than $860 billion with online retailers, driving up sales by 44% over the previous year. Ecommerce sales are likely to remain high long after the pandemic subsides, as people have grown accustomed to the convenience of ordering online and having their goods – even groceries – delivered to their door.
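One common way anomaly detection catches a spike in cart abandonment is a z-score: compare today's abandonment rate against recent history, measured in standard deviations. The rates and threshold below are illustrative:

```python
import statistics

# Flag a day whose abandonment rate sits far above recent history,
# measured in standard deviations (a simple z-score test).
def is_anomalous(history, today, z_threshold=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return (today - mean) / stdev > z_threshold

# Daily cart-abandonment rates hovering near their usual level...
history = [0.69, 0.70, 0.71, 0.70, 0.69, 0.71, 0.70]
print(is_anomalous(history, 0.70))  # a normal day
print(is_anomalous(history, 0.85))  # a broken checkout flow?
```

Production systems layer seasonality and segment-level baselines on top, but the core move is the same: quantify "unusual" so a broken checkout page pages someone before the revenue report does.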

Data Science + Cybersecurity

Cybersecurity is at a critical turning point, especially in the wake of the global lockdown that caused companies worldwide to conduct more online business than ever before. No organization is immune to data breaches, as hackers are using more sophisticated techniques — such as artificial intelligence — to perform these cyberattacks.

Replace and Boost your Apache Storm Topologies with Apache NiFi Flows

Recently, I worked with a large Fortune 500 customer on their migration from Apache Storm to Apache NiFi. If you’re asking yourself, “Isn’t Storm for complex event processing and NiFi for simple event processing?”, you’re correct. A few customers chose a complex event engine like Apache Storm for their simple event processing, even when Apache NiFi is the more practical choice, drastically cutting down on SDLC (software development lifecycle) time.

Major Brands Democratize Data with Snowpark Accelerated

Earlier this year at Snowflake Summit 2021, we announced Snowpark Accelerated, a new program for partners who integrate with Snowpark. It provides them with access to technical experts and additional exposure to Snowflake customers. It’s been incredibly exciting to watch what our partners have been building with the help of our new developer experience, which brings deeply integrated, DataFrame-style programming to the languages developers like to use.

Qlik and Fortune Launch "The Pandemic Effect on the Fortune Global 500" Data Analytics Site

The story of the last year and a half is one of disruption and change across every aspect of our lives. As we all navigated a ‘new norm,’ businesses naturally had to pivot as well, with some sectors finding new opportunities while others scrambled to reimagine their entire go-to-market strategies.