October 2022

The Importance of Business and IT Alignment to Build Successful Data Pipelines

Okay, I’ll admit, I am pretty biased when it comes to how people within organizations work together to ensure successful data projects. I have been involved in too many projects that failed to take into account the importance of collaboration across departments and functions. They were stuck on data and only the data.

Leverage or liability - transforming data chaos into data excellence with Waterstone Mortgage

Data is a double-edged sword. It has tremendous potential, but if mismanaged or misused, it can wreak havoc on the operations, costs, revenue, and reputation of an organization. In highly regulated industries, data can be an even bigger liability. Learn how Julia Fryk, Data Architect and Engineer at Waterstone Mortgage, championed superior data management, taking the company from data chaos to data excellence. The transformation from data that could not be trusted to high-quality assets that data consumers have confidence in has made a remarkable difference and unlocked more value from the company's data.

Snowflake's Retail Data Cloud

The retail and consumer packaged goods (CPG) industry is experiencing a global shift in how consumers, retailers, and brands interact. Long-term trends such as digitization and e-commerce, higher customer expectations, and supply chain transformation are accelerating. Businesses with a strong data foundation are rapidly adapting, and launching new digital capabilities and services.

The Power of Unlocking and Unifying Data

Every day, humans produce 2.5 quintillion bytes of data. Just to put that number into perspective, a quintillion has 18 zeros. The vast majority of organizations have collected nearly endless amounts of data, yet these same organizations are starving for information that can be used to make more informed decisions, because that information is often stored in databases that don’t talk to each other. Unifying your data allows you to actually take advantage of it and benefit your business.

Welcome to the decade of data

To quote Hemingway: change happens gradually, then suddenly. We see this in the world around us. Think back to 2019. There’s no denying how much the pandemic reshaped our professional and personal lives, with technology driving this change at massive scale. Yet these changes, despite their ubiquity, are really the culmination of trends like cloud and automation that were well underway.

Protect Your Assets and Your Reputation in the Cloud

A recent headline in Wired magazine read “Uber Hack’s Devastation Is Just Starting to Reveal Itself.” No corporation wants that headline, or the reputational damage and financial loss it may cause. In the case of Uber, it was a relatively simple attack using an approach called multi-factor authentication (MFA) fatigue, in which an attacker takes advantage of authentication systems that require account owners to approve a login.

6 Best Data Extraction Tools for 2022 (Pros, Cons, Best for)

A data extraction tool can help you speed up one of the most error-prone engineering processes: collecting raw data from different sources. In this article, we are going to analyze 6 market leaders in data extraction. Before we dive in, let’s look at all the problems you can avoid by implementing a data extraction tool.

Data Science Maturity and Understanding Data Architecture/Warehousing

This is a guest post for Integrate.io written by Bill Inmon, an American computer scientist recognized as the "father of the data warehouse." Inmon wrote the first book and magazine column about data warehousing, held the first conference about this topic, and was the first person to teach data warehousing classes. Data science is immature. This statement is not pejorative; it is simply a statement of historical fact. As such, it is not arguable.

Transformational triumph: eBay's data fabric modernization

In today’s economy, every business is eager to accelerate past the competition. Critical to this effort is the data your business runs on, the backbone of the operation, and a good team behind the magic. Join eBay, the world's leading online marketplace, to hear how it accelerated its operational data with impressive results. Not only did the company experience no downtime, it also reduced data volumes by a whopping 50% — resulting in less friction, smoother operations, and a team built to enable success and scale.

5 Magic Fixes for the Most Common CSV File Reader Problems | ThoughtSpot

I’ve encountered a thousand different problems with spreadsheets, data importing, and flat files over the last 20 years. While there are new tools that help make the most of this data, it's not always simple. I’ve distilled this list down to the most common issues among all the databases I’ve worked with. I’m giving you my favorite magic fixes here. (Well, okay, they aren’t really “magic” but some of them took me a long time to figure out.)

How to use Google Sheets for data analysis with ThoughtSpot

Businesses have been scaling rapidly in the cloud, driven by the pandemic and lured by the promise of agility and flexibility. But here’s a dirty little secret anyone who works in data knows. Despite the value of the cloud, tons of data hasn’t made it there. So, where is it? Spreadsheets. Still the stalwart workhorse, hero, and bane of the business world. We all love how Google revolutionized this world by bringing spreadsheets to the cloud.

Using Apache Solr REST API in CDP Public Cloud

The Apache Solr cluster is available in CDP Public Cloud, using the “Data exploration and analytics” data hub template. In this article we will investigate how to connect to the Solr REST API running in the Public Cloud, and highlight the performance impact of session cookie configurations when Apache Knox Gateway is used to proxy the traffic to Solr servers. Information in this blog post can be useful for engineers developing Apache Solr client applications.
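
To make that concrete, here is a minimal Python sketch (using the requests library) of the pattern in question: calling Solr's select handler through a Knox gateway while reusing the session cookie, so only the first request pays the full authentication cost. The gateway URL, collection name, and credentials below are placeholders, not values from the article.

    import requests

    # Hypothetical Knox endpoint proxying Solr; adjust to your CDP environment.
    BASE = "https://knox.example.com/gateway/cdp-proxy-api/solr"

    # A requests.Session carries the Knox session cookie across calls,
    # so repeated requests can skip re-authentication.
    session = requests.Session()
    session.auth = ("workload-user", "workload-password")  # placeholder credentials

    resp = session.get(
        f"{BASE}/my_collection/select",
        params={"q": "*:*", "rows": 10, "wt": "json"},
    )
    resp.raise_for_status()
    for doc in resp.json()["response"]["docs"]:
        print(doc)

Whether Knox actually issues a reusable session cookie depends on the gateway configuration, which is exactly the knob the post's performance discussion is about.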

7 Best Data Transformation Tools in 2022 (Pros, Cons, Best for)

The data transformation process reshapes data from a raw mess into a business goldmine. Using a data transformation tool streamlines the entire process and saves you time and energy on these valuable but tedious tasks. In this article, we’ll explore the 7 best tools on the market for data transformations. Each tool is evaluated with pros, cons, and a clear verdict on who it is best for. Are you in a hurry?

From Data Engineering To Data Science To Producing Results For Today's Top Brands.

The simplicity of Snowflake—especially when it comes to scaling—is described by Nauman Hafiz, CTO of Constellation, as a superpower. In this interview, Nauman sits down with Itamar Ben Hemo, CEO of Rivery, to discuss the impact of the data cloud on their business. They discuss moving from data engineering to data science, working with today's top brands, and freeing businesses to get the most value out of their data.

Future of Data Meetup: Enrich Your Data Inline with Apache NiFi

In this meetup, we’ll look at the different options for enriching your data using Apache NiFi. When and why would we prefer using NiFi for enrichment over a potentially more holistic solution, like Flink or Spark? What are the limitations? And how can we get the best of both worlds, performing data enrichment with NiFi when it makes sense and using our CEP engine when that makes the most sense? Join John Kuchmek and Mark Payne to find out!

What's new in ThoughtSpot Analytics Cloud 8.8.0.cl

In this release, we’re delighted to launch in beta ThoughtSpot metrics integration with dbt — the quickest way to move from data models to business insights. We’ve also added support for dbt models on Amazon Redshift and Google BigQuery, as well as new live query connectors for Presto and Trino. Check out the highlights in this video.

Demo: Unravel Data - Keep Cloud Data Budgets on Track (Automatically)

Data teams need to be able to set cloud data budgets at a specific scope, and to know whether their various teams or departments are tracking to those budgets. But today, most data teams only learn that the budget was overrun after it’s too late. With Unravel, establishing and tracking budgets to prevent overruns is easy.

How to solve four SQL data modeling issues

SQL is the universal language of data modeling. While it is not what everyone uses, it is what most analytics engineers use. SQL is found all across the most popular modern data stack tools: ThoughtSpot’s SearchIQ query engine translates natural language queries into complex SQL commands on the fly, and dbt built the entire premise of its tool around SQL and Jinja. And within your Snowflake data platform, it’s used to query all your tables.

Accelerating Projects in Machine Learning with Applied ML Prototypes

It’s no secret that advancements like AI and machine learning (ML) can have a major impact on business operations. In Cloudera’s recent report Limitless: The Positive Power of AI, we found that 87% of business decision makers are achieving success through existing ML programs. Among the top benefits of ML, 59% of decision makers cite time savings, 54% cite cost savings, and 42% believe ML enables employees to focus on innovation as opposed to manual tasks.

Does Your Company Need a Data Observability Framework?

You have been putting in the work, and your company has been growing manifold. Your client base is bigger than ever, and the projects are pouring in. So what comes next? It is now time to focus on the data that you are generating. When programming an application, DevOps engineers keep track of many things, such as bugs, fixes, and overall application performance. This ensures that the application operates with minimum downtime and that future errors can be predicted.

It is Time to Rebundle the Modern Data Stack

When you look closer at the Modern Data Stack (MDS), you need to brace yourself. The number of tools companies use for their databases, user administration, data extraction, data integration, security, machine learning, and a myriad of other use cases has grown astronomically. Matt Turck, VC at FirstMark, composes a yearly infographic of the hot tools in the data landscape, and even that is just a shortlist of the most popular and fastest-growing tools.

Power Up Your Data Operations with Templates & SpotApps

Gaining insights from your data can be time-consuming. Or as simple as a few clicks. It depends on whether you want to do everything by yourself, or let us help. For example, next time one of your stakeholders asks, “Can you deliver the dashboards by the end of next week?”, you can say yes with confidence, and in this blog we are going to show you why and how it is done.

Episode 1 | Data Pipelines | Data Journey | 7 Challenges of Big Data Analytics

Where does a data journey begin? With the data pipeline. Thomas Hazel begins this 7-part series on solving the biggest data analytics challenges today by starting with the building, scaling and maintenance of data pipelines. Avoid building the Taj Mahal … every 6 months … and join the data journey with us to learn about the unique approach ChaosSearch has taken.

Extracting Maximum Value From All Of Your Company's Data

How can you activate all your company's data to deliver the maximum marketing value? In this interview, Hightouch's Founder and Co-CEO Kashish Gupta and Warner Music Group's Customer Intelligence Platform Director Tom Dinneny weigh in on this question and more. They discuss the value of separating compute and storage, current industry trends, and their partnership with Snowflake.

How to Broadcast a Report in Yellowfin

In this video you will learn the basics of using Broadcast, including the differences between Broadcast, Smart Task, Personal Broadcast, and FTP Schedule. You will learn how to set up a Continuous Schedule broadcast with the correct time and frequency settings, or how to configure a broadcast triggered by an Alert, complete with delivery rules. You will also learn the differences between sending the broadcast in various formats, including a link that takes the viewer back to your live and updated report.

How to Easily Deploy Your Hugging Face Models to Production - MLOps Live #20- With Hugging Face

Watch Julien Simon (Hugging Face), Noah Gift (MLOps Expert) and Yaron Haviv (Iguazio) discuss how you can deploy models into real business environments, serve them continuously at scale, manage their lifecycle in production, and much more in this on-demand webinar!

10 Keys to a Secure Cloud Data Lakehouse

Enabling data and analytics in the cloud allows you to have infinite scale and unlimited possibilities to gain faster insights and make better decisions with data. The data lakehouse is gaining in popularity because it enables a single platform for all your enterprise data with the flexibility to run any analytic and machine learning (ML) use case. Cloud data lakehouses provide significant scaling, agility, and cost advantages compared to cloud data lakes and cloud data warehouses.

Supercharge Yellowfin with 100+ Interactive Charts from FusionCharts

Yellowfin converts your data into visual Reports and then helps you to create Dashboards and Stories to present these reports to your customers. The Reports are usually a combination of data fields and charts. Charts are a significant component of any report. They help customers understand the data quickly and give your report or presentation clarity and authority.

Successful Data Projects Start with Understanding Business Problems

Too many organizations start their data projects at the wrong end of the pipeline. Although challenges with data quality, integrity, access, and visibility are all important issues to address, a project should never start with the data. The reality is that all investments in data are meaningless if no business value can be gained. And this requires starting at the other end of the spectrum: evaluating the business problem to identify how data can help drive change within the organization.

Reskilling Against the Risk of Automation

Demand for both entry-level and highly skilled tech talent is at an all-time high, and companies across industries and geographies are struggling to find qualified employees. And, with 1.1 billion jobs liable to be radically transformed by technology in the next decade, a “reskilling revolution” is reaching a critical mass.

Consolidate Your Data on AlloyDB With Integrate.io in Minutes

The AlloyDB connector from Integrate.io empowers organizations to rapidly consolidate all of their data into AlloyDB—a high-powered, Google Cloud database that is 100% compatible with open-source PostgreSQL. By serving as an easy-to-set-up, high-speed data pipeline to AlloyDB, Integrate.io helps businesses modernize their legacy proprietary databases by migrating them to open-source, PostgreSQL-compatible systems.

Taming the Tech Stack: Leverage Your Existing Tech Stack Securely With an Analytics Layer

Analytics and data visualizations have the power to elevate a software product, making it a powerful tool that helps each user fulfill their mission more effectively. To stand apart from the competition, today’s software applications need to deliver a lot more than just transaction processing. They must also provide insights that help drive better decisions, alert users to matters that require their attention, and deliver up-to-the-minute information about the things that matter most.

Unifying Data, Optimizing Campaigns In Real Time, & Taking Marketers To New Heights.

Data has become the center of nearly all our marketing decision-making. In this interview, Snowflake CMO Denise Persson sits down with Data Cloud Now host Ryan Green to discuss the impact of the Modern Marketing Data Stack. They discuss how this report, which examined more than 6000 customers' use of technologies and applications, has led to invaluable findings—and how it can help marketers soar to heights previously thought unimaginable.

The Case for Embedded Analytics: How to Invest and Implement

In the past, most software applications were all about “data processing.” In the parlance of old-school management information systems, that meant an almost exclusive focus on keeping accurate transactional records alongside any master data necessary to complete that mission. Transaction processing is important, of course, but in today’s world, applications are expected to deliver a lot more than that.

Transformation for Analysis of Unintegrated Data-A Software Tautology

What, pray tell, is a tautology? A tautology is something that, under all conditions, is true. It is kind of like gravity. You can throw a ball in the air and, for a few seconds, it seems to be suspended. But soon gravity takes hold, and the ball falls back to earth.

How Keboola benefits from using Keboola Connection - There's no party like 3rd party

Oh boy, it’s been more than a year again since my last HKBFUKC article (yep, that’s a new standard abbreviation). This is the fourth article in the series. You can always check out the first, second and third on our blog. Again, loads of stuff has happened since the last time. I made the top 16 at the 4 Seasons MTG Legacy tournament in Bologna, I visited Lego House in Billund and I got married!

Cybersecurity: A Big Data Problem

Information technology has been at the heart of governments around the world, enabling them to deliver vital citizen services, such as healthcare, transportation, employment, and national security. All of these functions rest on technology and share a valuable commodity: data. Data is produced and consumed in ever-increasing amounts and therefore must be protected. After all, we believe everything that we see on our computer screens to be true, don’t we?

Planetly: Scaling companies' carbon management with data

Planetly uses technology to simplify carbon management for companies at scale. Their data-driven software solution helps companies reach net-zero emission targets in four steps, and the entire carbon management life cycle is powered and fueled by data. We talked to Cari Davidson, VP of Engineering, and Patricia Montag, Engineering Lead for Analytics, to better understand what role Keboola (and data as a whole) plays in the company’s operations and what that means for the engineering team.

Build data apps with Streamlit + ThoughtSpot APIs

I’ve been following the Streamlit framework for a while, since Snowflake announced that they would acquire it to enable data engineers to quickly spin up data apps. I decided to play around with it and see how we could leverage the speed of creating an app along with the benefits that ThoughtSpot provides, especially the ability to use NLP for search terms. Streamlit is built in Python.
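
As a rough illustration of the appeal, here is a minimal Streamlit sketch that forwards a natural-language search to a ThoughtSpot-style REST endpoint and renders the result as a table. The endpoint URL, payload shape, and token are hypothetical placeholders; the real interface lives in ThoughtSpot's REST API documentation.

    # app.py - run with: streamlit run app.py
    import pandas as pd
    import requests
    import streamlit as st

    st.title("Ask your data")

    # Hypothetical ThoughtSpot search endpoint and token; not the real API shape.
    TS_URL = "https://my-instance.thoughtspot.cloud/api/rest/2.0/searchdata"
    TOKEN = "<bearer-token>"

    query = st.text_input("Natural-language search", "sales by region")
    if query:
        resp = requests.post(
            TS_URL,
            json={"query_string": query},
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
        resp.raise_for_status()
        # Render whatever rows come back as an interactive table.
        st.dataframe(pd.DataFrame(resp.json().get("rows", [])))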

Build limitless workloads on BigQuery: New features beyond SQL

Our mission at Google Cloud is to help our customers fuel data-driven transformations. As a step towards this, BigQuery is moving beyond a SQL-only interface and providing new developer extensions for workloads that require programming beyond SQL. These flexible programming extensions are all offered without the limitations of running virtual servers.

Unlocking the value of unstructured data at scale using BigQuery ML and object tables

Data teams have most commonly worked with structured data. Unstructured data, which includes images, documents, and videos, will account for up to 80 percent of data by 2025, yet organizations currently use only a small percentage of it to derive useful insights. One of the main ways to extract value from unstructured data is by applying ML to it.
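
As a hedged sketch of what “applying ML to unstructured data” can look like, the snippet below uses the google-cloud-bigquery Python client to run an ML.PREDICT query over an object table of images. The project, dataset, model, and table names are hypothetical, and the exact object-table syntax should be checked against the current BigQuery ML documentation.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    # Illustrative only: score an imported vision model over an object table
    # that points at image files in Cloud Storage.
    sql = """
    SELECT *
    FROM ML.PREDICT(
      MODEL `my-project.vision.image_classifier`,
      TABLE `my-project.vision.product_images`
    )
    """
    for row in client.query(sql).result():
        print(dict(row))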

Integrating Observability into Your Security Data Lake Workflows

Today’s enterprise networks are complex. Potential attackers have a wide variety of access points, particularly in cloud-based or multi-cloud environments. Modern threat hunters have the challenge of wading through vast amounts of data in an effort to separate the signal from the noise. That’s where a security data lake can come into play.

Coherent Automates The Capture Of Spreadsheet Logic

Coherent Spark solves a common problem plaguing millions of Excel spreadsheet users: How to easily capture spreadsheet logic and bring it into the cloud to integrate it with modern systems? In this episode of “Powered by Snowflake,” Coherent Spark CTO Peter Roschke explains how the Spark platform takes advantage of the processing power of Snowflake to tackle that challenge. With Spark, the logic of Excel spreadsheets of any size or complexity, including enormous legacy spreadsheets containing millions of formulas, can be converted quickly into a cloud-compatible format capable of driving applications for all types of use cases.

Data Journey | 7 Challenges of Big Data Analytics | Episode 0

How do you truly solve the challenges of today’s ever-growing big data analytics needs? Join us on a data journey with ChaosSearch's CTO & Founder, Thomas Hazel, as he gets technical on how to solve 7 of the biggest data challenges teams are facing - from source to insights.

Where data strategies go wrong: Tales from the front lines

Anthony Palacio has built and executed successful data strategies for a diverse range of companies—from ExxonMobil and startups to his role as Talend’s Senior Manager of Strategic Planning and Analytics. In this video Anthony and other Talend data experts discuss how companies often get their data strategies wrong—and share insights on how your business can cut through the noise, focus on what matters, and pivot to a successful data strategy that gets results.

Public or On-Prem? Telco giants are optimizing the network with the Hybrid Cloud

The telecommunications industry continues to develop hybrid data architectures to support data workload virtualization and cloud migration. However, while the promise of the cloud remains essential—not just for data workloads but also for network virtualization and B2B offerings—the sheer volume and scale of data in the industry require careful management of the “journey to the cloud.”

Using Kafka Connect Securely in the Cloudera Data Platform

In this post I will demonstrate how Kafka Connect is integrated into the Cloudera Data Platform (CDP), allowing users to manage and monitor their connectors in Streams Messaging Manager, while also touching on security features such as role-based access control and sensitive information handling. If you are a developer moving data in or out of Kafka, an administrator, or a security expert, this post is for you. But before we get into the nitty-gritty, let’s start with the basics.
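
For readers who have not driven Kafka Connect directly: connectors are managed through Connect's standard REST API regardless of platform, and on CDP the same connectors then show up in Streams Messaging Manager. A minimal sketch, with a placeholder worker address and a deliberately simple file sink connector:

    import requests

    CONNECT = "http://connect-worker.example.com:8083"  # placeholder worker address

    # Register a connector via Kafka Connect's standard REST API.
    connector = {
        "name": "orders-file-sink",
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
            "topics": "orders",
            "file": "/tmp/orders.txt",
            "tasks.max": "1",
        },
    }
    requests.post(f"{CONNECT}/connectors", json=connector).raise_for_status()

    # Poll its status; CDP surfaces the same information in Streams Messaging Manager.
    print(requests.get(f"{CONNECT}/connectors/orders-file-sink/status").json())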

Diving Deep Into a Data Lake

The term “data lake” refers to a massive amount of data stored in structured, unstructured, semi-structured, or raw form. The purpose is simply to consolidate data into one destination and make it usable for data science and analytics algorithms. This data is used for observational, computational, and scientific purposes. A data lake also makes it easier for AI models to gather data from various sources and implement a system that can make informed decisions.

Fortune favors the prepared, Talend CEO keynote at Talend Connect '22

Pandemics, wars, the Great Resignation...doing business has become more complicated than ever. The No. 1 thing companies say they need to survive is to become data driven. It’s a great goal, but what does that really look like in action? Our opening keynote answers that question and more by outlining the three things you MUST have for your business to become truly, sustainably data driven. You’ll also get insights about how companies are making the change today from analyst and "CIO In the Know" host Tim Crawford, as well as Talend customers like eBay and financial services provider Harmoney.

Ep 60: The Modern Milkman's CSO, John Hughes on Using Data to Save Our Oceans from Plastic

Each year, around 8 million tons of plastic waste enter the ocean, killing wildlife and damaging the environment for future generations. Much of this pollution originates from the wasteful packaging used for everyday grocery products. But a greener grocery alternative is making waves from across the pond. Joining us today is John Hughes, Chief Strategy Officer at The Modern Milkman, a UK-based grocery delivery company that brings locally sourced goods directly to customers without any single-use plastics.

Breaking down marketing data silos with BigQuery

Welcome back to the Marketing Analytics Series! In this video, Kelci will demonstrate how to ingest and query marketing data with BigQuery. From an augmented understanding of your data to leveraging BigQuery’s public datasets, discover how you too can break down marketing data silos with BigQuery.

White Label Analytics: What It Is, Why It Matters & 5 Key Benefits

A key consideration when buying an embedded analytics solution is not only whether it supports embedding of charts and reports, but also whether it can integrate analytics in a way that is indistinguishable from the experience of your application. Learn what white-label BI is.

Cloudera Uses CDP to Reduce IT Cloud Spend by $12 Million

Like all of our customers, Cloudera depends on the Cloudera Data Platform (CDP) to manage our day-to-day analytics and operational insights. Many aspects of our business live within this modern data architecture, providing all Clouderans the ability to ask, and answer, important questions for the business. Clouderans continuously push for improvements in the system, with the goal of driving up confidence in the data.

Hevo vs Fivetran vs Integrate.io: An ETL Tool Comparison

In the competitive market of ETL solutions, platforms like Hevo, Fivetran, and Integrate.io are amongst the top contenders. While they are all ETL/ELT platforms, each has its own unique set of features to offer. The best ETL tool for your business is the one that is best aligned to your requirements. So how do you decide which tool meets your business needs?

The Denver Broncos score a better fan experience with Fivetran

The Denver Broncos are tapping into the #moderndatastack to score a touchdown in fan engagement. With the help of Fivetran, the team efficiently centralizes massive data from CRM, Qualtrics and many other sources into Snowflake — unlocking a 360-degree view of their fans and a winning fan experience.

6 most useful data visualization principles for analysts

The difference between consuming data and actioning it often comes down to one thing: effective data visualization. Case in point? John Snow’s famous cholera map. In 1854, John Snow (no, not that one) mapped cholera cases during an outbreak in London. Snow’s simple map uncovered a brand new pattern in the data—the cases all clustered around a shared water pump.

Data Lakes: The Achilles Heel of the Big Data Movement

Big Data started as a replacement for data warehouses. The Big Data vendors are loath to mention this fact today. But if you were around in the early days of Big Data, one of the central topics discussed was — if you have Big Data do you need a data warehouse? From a marketing standpoint, Big Data was sold as a replacement for a data warehouse. With Big Data, you were free from all that messy stuff that data warehouse architects were doing.

Using Time Series Charts to Explore API Usage

One major reason for digging into API and product analytics is to be able to easily identify trends in the data. Of course, trends can be very tough to see when looking at something like raw API call logs, but much easier when looking at a chart designed to help you visualize them. Enter the Time Series chart.
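
The idea is easy to demonstrate: bucket raw call logs into fixed time intervals and the trend appears. A small pandas sketch with made-up log rows (not data from the article):

    import pandas as pd

    # Toy log of raw API calls; real logs would come from your gateway or
    # product analytics pipeline.
    logs = pd.DataFrame({
        "timestamp": pd.to_datetime([
            "2022-10-01 09:05", "2022-10-01 09:20", "2022-10-01 10:02",
            "2022-10-01 10:40", "2022-10-01 11:15", "2022-10-01 11:50",
        ]),
        "endpoint": ["/users", "/users", "/orders", "/users", "/orders", "/orders"],
    })

    # Resample per hour: a trend invisible in raw rows becomes a series.
    calls_per_hour = logs.set_index("timestamp").resample("1H").size()
    print(calls_per_hour)
    calls_per_hour.plot()  # renders a basic time series chart (needs matplotlib)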

3 types of data models and when to use them

Data modeling is the process of organizing your data into a structure, to make it more accessible and useful. Essentially, you’re deciding how the data will move in and out of your database, and mapping the data so that it remains clean and consistent. ThoughtSpot can take advantage of many kinds of data models, as well as modeling languages. Since you know your data best, it’s usually a good idea to spend some time customizing the modeling settings.

Automated Financial Storytelling at Your Fingertips: Here's How

Every financial professional understands that the numbers matter a great deal when it comes to reporting financial results. Accuracy, consistency, and timeliness are important. Those same professionals also know that there’s substantive meaning behind those numbers and that it’s important to tell the stories that lend additional depth and context to the raw financial statements.

Choosing The Best Approach to Data Mesh and Data Warehousing

Data mesh is being talked about a lot as a way to describe how data is managed across the organization. But what does it really mean for your organization’s data management strategy, and how can its framework support your business needs and drive data pipeline success? At a high level, data mesh is about connecting and enabling data management across distributed systems.

Neustar Sets A New Bar For Accuracy In The Field Of Identity Resolutions

In this episode of “Powered by Snowflake” host Daniel Myers queries the mind of Neustar’s Head of Product and Customer Intelligence, Ryan Engle. Neustar is an Identity Resolutions Platform, responsible for powering more than 90% of caller ID in the United States. This conversation covers fascinating topics such as the challenges of sharing customer data in a highly regulated industry, how the Native Application Framework allows Neustar to work directly within their clients’ environments, and how Snowflake “auto-magically” keeps data fresh and up to date.

Universal Data Distribution with Cloudera DataFlow for the Public Cloud

The speed at which you move data throughout your organization can be your next competitive advantage. Cloudera DataFlow greatly simplifies your data flow infrastructure facilitating complex data collection and movement through a unified process that seamlessly transfers data throughout your organization. Even as you scale. With Cloudera DataFlow for Public Cloud you can collect and move any data (structured, unstructured, and semi-structured) from any source to any destination with any frequency (real-time streaming, batch, and micro-batch).

Qlik Expands Google BigQuery Solutions, Adding Mainframe to SAP Business Data for Modern Analytics

In April this year, we announced that Qlik had successfully achieved Google Cloud Ready – BigQuery Designation for its Qlik Sense® cloud analytics solution and Qlik Data Integration®. We continue increasing customer confidence by combining multiple Qlik solutions alongside Google Cloud BigQuery to both help activate SAP data, and now mainframe data as well.

5 Ways Data Lake Can Benefit Your Organization

Today, organizations are looking for better solutions to guarantee that their data and information are kept safe and structured. Using a data lake contributes to the creation of a centralized infrastructure for data management and enables any firm to manage, store, analyze, and efficiently categorize its data. Organizations find it extremely difficult to deal with data because the information is kept in silos and in multiple formats.

Changing Your ERP? Add Tax Tech That Works

Switching to a modern ERP software system affords many benefits, including increased efficiency, improved accuracy, and better control over your company’s finances. It is also an excellent opportunity to revisit many of the business processes that sit outside of your core ERP system. As you set out to improve your financial and operational procedures, you have an opportunity to rethink the way you perform tax planning, transfer pricing, budgeting, reporting, and analytics.

Reshape Your Year-Round Tax Function With Transfer Pricing Software

In many organizations, transfer pricing adjustments are like a lot of other last-minute activities. They seem to be ignored throughout most of the annual cycle. Then, they suddenly take on great importance at year-end. That leaves the tax team scrambling to address an entire year’s worth of transactions. It also leads to interdepartmental friction in many cases. If transfer pricing is changed retroactively for the entire year, that can have far-reaching implications.

Building a Sustainable Data Warehouse Design

Data plays a vital role in the growth of an organization. Companies spend large amounts of money on building data and big data infrastructures such as data vaults, data marts, data lakes, and data warehouses. These infrastructures are populated via multiple data sources using robust ETL pipelines that function throughout the day. A data infrastructure must operate 24/7 to provide real-time analysis and data-driven business insights.

AlloyDB Demo - Integrate.io

AlloyDB stands out among cloud databases with its higher scalability, 99.99% availability SLA, and full integration with Google’s suite of AI/ML products—which allow it to deliver the best of the cloud to its customers. One AlloyDB use case involves migrating on-premises or self-managed PostgreSQL—or other hosted cloud-based databases—to AlloyDB. Watch this simple demo on how to achieve this migration easily with Integrate.io ETL.

AI at Scale isn't Magic, it's Data - Hybrid Data

A recent VentureBeat article, “4 AI trends: It’s all about scale in 2022 (so far),” highlighted the importance of scalability. I recommend you read the entire piece, but to me the key takeaway – AI at scale isn’t magic, it’s data – is reminiscent of the 1992 presidential election, when political consultant James Carville succinctly summarized the key to winning – “it’s the economy”.

What is Self Service Analytics? The Role of Accessible BI Explained

Self-service analytics (also called self-service business intelligence, or self-service BI) is a term commonly used among analytics vendors and organizations adopting BI, often in the context of being the next big thing in driving more people to use data to find insights. But what is self-service analytics? How does it work? And why does it matter?

How to Run Workloads on Spark Operator with Dynamic Allocation Using MLRun

With the Apache Spark 3.1 release in early 2021, the Spark on Kubernetes project officially became production-ready after a few years of development. Spark on Kubernetes has since become the new standard for deploying Spark. In the Iguazio MLOps platform, we built the Spark Operator into the platform to make deploying Spark workloads much simpler.
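
For orientation, dynamic allocation itself is plain Spark configuration; how MLRun or the Spark Operator surfaces these properties is platform-specific, so treat this PySpark snippet as an illustrative sketch rather than the platform's exact recipe.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("dynamic-allocation-demo")
        # Let Spark grow and shrink the executor pool with the workload.
        .config("spark.dynamicAllocation.enabled", "true")
        # On Kubernetes there is no external shuffle service, so shuffle
        # tracking is what allows idle executors to be released safely.
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "1")
        .config("spark.dynamicAllocation.maxExecutors", "10")
        .getOrCreate()
    )

    spark.range(1_000_000).selectExpr("sum(id)").show()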

O'Reilly | Fundamentals of Data Observability

Quickly detect, troubleshoot, and prevent the propagation of a wide range of data incidents through Data Observability, a set of best practices that allow data teams to gain greater visibility of data and its usage. If you're a data engineer, ML engineer, or data architect, or if the quality of your work depends on the quality of your data, this book shows how to focus on the practical aspects of introducing Data Observability in your day-to-day work.

Pros & Cons of Using a Customer Data Platform as Your Data Warehouse

Does your Ecommerce business team understand the customer journey? By tracking the history of individual customer behavior and customer interactions across different channels, your organization can better understand what motivates your audience — and cater to them with the right marketing campaigns.

How to Accelerate HuggingFace Throughput by 193%

Deploying models is becoming easier every day, especially thanks to excellent tutorials like Transformers-Deploy, which explains how to convert and optimize a Hugging Face model and deploy it on the Nvidia Triton inference engine. Nvidia Triton is an exceptionally fast and solid tool and should be very high on the list when searching for ways to deploy a model. Our developers know this, of course, so ClearML Serving uses Nvidia Triton on the backend if a model needs GPU acceleration.
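
As a sketch of the first step in that kind of pipeline, exporting a Hugging Face model to ONNX so an engine like Triton can serve it, consider the following; the model name and export settings are illustrative, not the tutorial's exact recipe.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name).eval()

    enc = tokenizer("Triton will serve this model", return_tensors="pt")
    torch.onnx.export(
        model,
        (enc["input_ids"], enc["attention_mask"]),
        "model.onnx",  # this file goes into Triton's model repository
        input_names=["input_ids", "attention_mask"],
        output_names=["logits"],
        dynamic_axes={
            "input_ids": {0: "batch", 1: "sequence"},
            "attention_mask": {0: "batch", 1: "sequence"},
            "logits": {0: "batch"},
        },
        opset_version=13,
    )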

How Twitter maximizes performance with BigQuery

How does a tweet go from one person to hundreds of millions of people? How does the data process so quickly? In this episode of Architecting with Google Cloud, Priyanka chats with Gary and Saurabh from Twitter about how data from over 200 million users goes through the Twitter data center and Google Cloud. Watch along and learn how data stored across tens of thousands of BigQuery tables in Google Cloud runs millions of queries each month.

Keboola is now officially Powered by Snowflake

Over the years, Keboola and Snowflake have seen their own share of successes and incredible achievements. Now, we can proudly announce that Keboola has joined the Powered by Snowflake program. With both companies founded around the same time, Keboola and Snowflake have been working hand in hand for some time now.

Cloudera's Open Data Lakehouse Supercharged with dbt Core™

dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous development (CI/CD).

Credit Bureau Credibility - The Voice of the Customer

This is a guest post with exclusive content by Bill Inmon, Mary Levins, and Georgia Burleson. Bill “is an American computer scientist recognized by many as the father of the data warehouse. Inmon wrote the first book, held the first conference, wrote the first column in a magazine, and was the first to offer classes in data warehousing.” -Wikipedia.

Why Doesn't the Modern Data Stack Result in a Modern Data Experience?

The data landscape is exploding with tools. As data professionals we have at our fingertips specialized tools for anything: from specialized databases (graph, geo, you name it) to tools for SQL-driven transformations (looking at you, dbt). Yet, a lot of data work is about provisioning, selecting, administering, and just maintaining those tools. Which is just a pain. As Pavel Dolezal, CEO and co-founder of Keboola, said, the answer is in how the Modern Data Architecture is built.

6 Best Data Integration Tools of 2022

Data integration is the data engineering process of combining data across all the different sources in a company (CRM, SaaS apps like Salesforce, APIs, …) into a single unified view. The data integration process includes data extraction, data cleansing, data ingestion, data validation, modeling, and exposing ready-to-be-consumed data sets to other users and applications for business intelligence or data-driven activities.

How to build a self-service BI strategy

Think about the times you've wished you had more insight into your business data. Or all of the times you wished you could answer questions about your business performance without waiting for someone else to get back to you. Gone are the days when businesses relied solely on IT staff to provide reports and analytics. With self-service business intelligence (BI), users can create their own reports, dashboards, and data visualizations without relying on IT help.

Moving to Log Analytics for BigQuery export users

If you’ve already centralized your log analysis on BigQuery as your single pane of glass for logs & events…congratulations! With the introduction of Log Analytics (Public Preview), something great is now even better. It leverages BigQuery while also reducing your costs and accelerating your time to value with respect to exporting and analyzing your Google Cloud logs in BigQuery.

The Data Cloud & Public Sector With Deloitte

Welcome to Data Cloud Now, where we are dedicated to illustrating how the Data Cloud is pushing the possible forward, each and every day. In this episode, we explore the impact it’s having within the public sector with special guests Deloitte’s Monica McEwen and Frank Farrall, along with Snowflake’s Jeff Frazier.

Following your local happiness gradient - Dr. Catherine Williams

This episode features an interview with Dr. Catherine Williams, Global Head of IQ at Qualtrics, an experience management software platform. Catherine has extensive background in data science and quantitative analytics. Prior to Qualtrics, Catherine served as Chief Data Scientist and Chief Data and Marketplace Officer at AppNexus and Xandr, which is part of AT&T. Catherine holds a Ph.D. in Mathematics from the University of Washington. She has also held postdoctoral fellowships at Stanford and Columbia Universities. On this episode, Catherine discusses growing a machine learning approach to unstructured data, using data to get to the “why” of customer behavior, and following your local happiness gradient.

How ThoughtSpot Uses ThoughtSpot for Field Marketing

As ThoughtSpot’s SVP of Corporate Marketing I oversee a field marketing team that acts as the glue between our Marketing and Field Sales teams. When people talk about field marketing, they’re often just thinking of events — but we have a far broader remit than that. Each member of the Field Marketing team sits within a specific sales region, acting as a kind of regional CMO.

3-Minute Recap: Unlocking the Value of Cloud Data and Analytics

DBTA recently hosted a roundtable webinar with four industry experts on “Unlocking the Value of Cloud Data and Analytics.” Moderated by Stephen Faig, Research Director, Unisphere Research and DBTA, the webinar featured presentations from Progress, Ahana, Reltio, and Unravel. You can see the full 1-hour webinar “Unlocking the Value of Cloud Data and Analytics” below. Here’s a quick recap of what each presentation covered.

Get Ready for the Next Generation of DataOps Observability

I was chatting with Sanjeev Mohan, Principal and Founder of SanjMo Consulting and former Research Vice President at Gartner, about how the emergence of DataOps is changing people’s idea of what “data observability” means. Not in any semantic sense or a definitional war of words, but in terms of what data teams need to stay on top of an increasingly complex modern data stack.

What Challenges Are Hindering the Success of Your Data Lake Initiative?

Conventional databases are no longer the appropriate solution in a world where data volume is growing every second. Many modern businesses are adopting big data technologies like data lakes to counter data volume and velocity. Data lake infrastructures such as Apache Hadoop are designed to handle data in large capacities. These infrastructures offer benefits such as data replication for enhanced protection and multi-node computing for faster data processing.

Developing More Accurate and Complex Machine-Learning Models with Snowpark for Python

Sophos protects people online with a suite of cybersecurity products. Hear Konstantin Berlin, Head of Artificial Intelligence at Sophos, explain how the Snowflake Data Cloud helps Sophos increase the accuracy of their machine-learning models by allowing data scientists to process large and complex data sets independent of data engineers. Through Snowpark, data scientists can run Python scripts along with SQL without having to move data across environments, significantly increasing the pace of innovation.
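
A minimal sketch of that pattern, assuming a Snowpark-enabled account: dataframe logic written in Python is pushed down and executed inside Snowflake, and no data leaves the platform until results are fetched. The connection parameters and table below are placeholders, not details from Sophos.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import avg, col

    # Placeholder connection parameters.
    session = Session.builder.configs({
        "account": "<account>", "user": "<user>", "password": "<password>",
        "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
    }).create()

    # The filter and aggregation below compile to SQL and run inside Snowflake.
    features = (
        session.table("TELEMETRY_EVENTS")          # hypothetical table
        .filter(col("SEVERITY") >= 3)
        .group_by("ENDPOINT_ID")
        .agg(avg("RISK_SCORE").alias("AVG_RISK"))
    )
    features.show()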

Ep 59: New Zealand's Crown Research Institute CDAO, Jan Sheppard on Treating Data as a Treasure

Treating data as a treasure is a foundational principle for Jan Sheppard, the Chief Data and Analytics Officer at New Zealand’s Crown Research Institute of Environmental Science and Research (ESR). This agency leads ongoing research in public health, environmental health, and forensics for the country of New Zealand. Like many other CDAOs, her role is relatively new. But the unique values she applies to data can be traced back many hundreds of years to the indigenous Maori people of her country. Through her work, Jan recognizes the profound impact data can have on people and their environments for generations to come.

7 Best Data Pipeline Tools 2022

The data pipeline is at the heart of your company’s operations. It allows you to take control of your raw data and use it to generate revenue-driving insights. However, managing all the different types of data pipeline operations (data extractions, transformations, loading into databases, orchestration, monitoring, and more) can be a little daunting. Here, we present the 7 best data pipeline tools of 2022, with pros, cons, and who they are most suitable for. 1. Keboola 2. Stitch 3. Segment 4.

Introduction to Automated Data Analytics (With Examples)

Is repetitive and menial work impeding your data scientists, analysts, and engineers from delivering their best work? Consider automating your data analytics to free their hands from routine tasks so they can dedicate their time to more meaningful, creative work that requires human attention. In this blog, we are going to talk about exactly that. Now let’s dive in.

Yellowfin Named Embedded Business Intelligence Software Leader in G2 Fall Reports 2022

Yellowfin has again been recognized in the Leader quadrant in the 2022 G2 Fall Grid Reports for Embedded Business Intelligence (Enterprise and Small Business). This is Yellowfin's 13th quarter in a row to be named a leader in a G2 Grid Report. The Yellowfin team are grateful to our customers for the reviews they have provided for our embedded analytics capability and product suite on G2, a leading business software and service comparison source for trusted user ratings and peer-to-peer reviews.

Talend's contributions to Apache Beam

Apache Beam is an open-source, unified programming model for batch and streaming data processing pipelines that simplifies large-scale data processing dynamics. The Apache Beam model offers powerful abstractions that insulate you from low-level details of distributed data processing, such as coordinating individual workers, reading from sources and writing to sinks, etc.

Building an automated data pipeline from BigQuery to Earth Engine with Cloud Functions

Over the years, vast amounts of satellite data have been collected, and ever more granular data are being collected every day. Until recently, those data have been an untapped asset in the commercial space. This is largely because the tools required for large-scale analysis of this type of data were not readily available, and neither was the satellite imagery itself. Thanks to Earth Engine, a planetary-scale platform for Earth science data & analysis, that is no longer the case.

Analyzing satellite images in Google Earth Engine with BigQuery SQL

Google Earth Engine (GEE) is a groundbreaking product that has been available for research and government use for more than a decade. Google Cloud recently launched GEE to General Availability for commercial use. This blog post describes a method to utilize GEE from within BigQuery’s SQL allowing SQL speakers to get access to and value from the vast troves of data available within Earth Engine.

How to simplify and fast-track your data warehouse migrations using BigQuery Migration Service

Migrating data to the cloud can be a daunting task. Moving data out of warehouses and legacy environments, in particular, requires a systematic approach. These migrations usually need manual effort and can be error-prone. They are complex and involve several steps, such as planning, system setup, query translation, schema analysis, data movement, validation, and performance optimization.

Scaling Kafka Brokers in Cloudera Data Hub

This blog post provides guidance for administrators who currently use, or are interested in using, Kafka nodes: how to maintain cluster changes as they scale up or down to balance performance and cloud costs in production deployments. Kafka brokers contained within host groups enable administrators to more easily add and remove nodes. This creates the flexibility to handle real-time data feed volumes as they fluctuate.

Webinar: Unlocking the Value of Cloud Data and Analytics

From data lakes and data warehouses to data mesh and data fabric architectures, the world of analytics continues to evolve to meet the demand for fast, easy, wide-ranging data insights. Right now, nearly 50% of DBTA subscribers are using public cloud services, and many are investing further in staff, skills, and solutions to address key technical challenges. Even today, the amount of time and resources most organizations spend analyzing data pales in comparison to the effort expended in identifying, cleansing, rationalizing, consolidating, and transforming that data.

Editing and saving a dashboard

In this video you will learn how to edit one of your existing Yellowfin dashboards, such as adding a new report, and then how to save those edits by publishing the dashboard. You will also learn how to change the title of the dashboard, select the folders where the dashboard will be saved, and add tags to your dashboard. Finally, you will learn how to set the Dashboard Access to either Public or Private.

Enterprise data and analytics in the cloud with Microsoft Azure and Talend

The emergence of the cloud as a cost-effective solution to delivering compute power has caused a paradigm shift in how we approach designing, building, and delivering analytics to business users. Although forklifting an existing analytics environment into the cloud is possible, there’s substantial benefit for those that are willing to review and adjust their systems to capitalize on the strengths of the cloud.

A Guide to Principal Component Analysis (PCA) for Machine Learning

Principal Component Analysis (PCA) is one of the most commonly used unsupervised machine learning algorithms across a variety of applications: exploratory data analysis, dimensionality reduction, information compression, data de-noising, and plenty more. In this blog, we will go through it step by step. Before we delve into its inner workings, let’s first get a better understanding of PCA. Imagine we have a 2-dimensional dataset.
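
To make the 2-dimensional example concrete before the full walkthrough, here is a short scikit-learn sketch on synthetic data: two correlated features are reduced to a single principal component that retains most of the variance.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # A toy 2-dimensional dataset with two correlated features.
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    X = np.column_stack([x, 2 * x + rng.normal(scale=0.5, size=200)])

    # Standardize first: PCA is sensitive to feature scale.
    X_std = StandardScaler().fit_transform(X)

    pca = PCA(n_components=1)
    X_1d = pca.fit_transform(X_std)        # the 2D points projected to 1D

    # Most of the variance survives the 2D -> 1D projection.
    print(pca.explained_variance_ratio_)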

7 Best Change Data Capture (CDC) Tools of 2022

As your data volumes grow, your operations slow down. Data ingestion (extracting all underlying datasets, transforming them, and loading them into a storage destination such as a PostgreSQL or MySQL database) becomes sluggish, impacting processes down the line and affecting your data analytics and time to insights. Change Data Capture (CDC) makes data available faster, more efficiently, and without sacrificing data accuracy. In this blog we are going to overview the 7 best change data capture tools of 2022.
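
A toy contrast shows why. The sketch below (plain Python and SQLite, purely illustrative) compares re-extracting everything with pulling only rows changed since a high-water mark; production CDC tools typically go further and read the database's transaction log, which also captures deletes.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (1, 10.0, "2022-10-01"), (2, 25.0, "2022-10-02"), (3, 40.0, "2022-10-03"),
    ])

    # Full extraction: re-reads every row on every run, slowing as data grows.
    all_rows = conn.execute("SELECT * FROM orders").fetchall()

    # Incremental pull: only rows changed since the last high-water mark.
    last_sync = "2022-10-02"
    changed = conn.execute(
        "SELECT * FROM orders WHERE updated_at > ?", (last_sync,)
    ).fetchall()
    print(len(all_rows), "rows total;", len(changed), "changed since", last_sync)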

How to Do Data Labeling, Versioning, and Management for ML

It has been months since Toloka and ClearML came together to create this joint project. Our goal was to showcase to other ML practitioners how to first gather data, and then version and manage that data before it is fed to an ML model. We believe that following these best practices will help others build better and more robust AI solutions. If you are curious, have a look at the project we have created together.

How to Distribute Machine Learning Workloads with Dask

Tell us if this sounds familiar. You’ve found an awesome data set that you think will allow you to train a machine learning (ML) model that will accomplish the project goals; the only problem is the data is too big to fit in the compute environment that you’re using. In the day and age of “big data,” most might think this issue is trivial, but like anything in the world of data science, things are hardly ever as straightforward as they seem.
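
For a taste of how Dask tackles the too-big-for-memory problem: data is declared in chunks, operations build a lazy task graph, and only .compute() executes the graph, chunk by chunk, across the available cores or a cluster. A minimal sketch:

    import dask.array as da

    # A 40,000 x 10,000 array in 1,000 x 1,000 chunks: the full array never
    # needs to fit in memory at once.
    X = da.random.random((40_000, 10_000), chunks=(1_000, 1_000))

    # Nothing is computed yet; this just extends the task graph.
    col_means = X.mean(axis=0)

    # .compute() runs the graph in parallel and returns a NumPy result.
    print(col_means.compute()[:5])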

Keboola + ThoughtSpot = Automated insights in minutes

Keboola and ThoughtSpot partnered up to offer click-and-launch insights machines. With the original integration, you can already cut the time-to-insight. Keboola helps you get clean data and ThoughtSpot helps you turn it into insights. What’s new? The new solution builds out-of-the-box and ready-to-use data pipelines (Keboola Templates) and live self-serve analytic dashboards (ThoughtSpot SpotApps) from the ground up. You just need to click-and-launch your analytic use case.

Power Your Lead Scoring with ML for Near Real-Time Predictions

Every organization wants to identify the right sales leads at the right time to optimize conversions. Lead scoring is a popular method for ranking prospects through an assessment of perceived value and sales-readiness. Scores are used to determine the order in which high-value leads are contacted, thus ensuring the best use of a salesperson’s time. Of course, lead scoring is only as good as the information supplied.

How To Use a Customer Data Platform (CDP) as Your Data Warehouse

Here’s what you need to know about using your customer data platform (CDP) as your data warehouse. Whether you’re a mom-and-pop store or an ecommerce giant, understanding the customer journey is crucial to your organization’s success. When you collect data across a wide range of customer touchpoints, you can use this wealth of information for many different use cases: performing audience segmentation, improving your marketing campaigns, boosting customer engagement, and more.

[DEMO] How to manage Talend Studio updates from Talend Management Console?

Talend Cloud provides powerful graphical tools and 900+ connectors and components to connect databases, big data sources, on-premises, and cloud applications. Design cloud-to-cloud and hybrid integration workflows in Talend Studio and publish them to a fully managed cloud platform. If you are using Talend Cloud Management Console with Talend Studio, depending on your license, you can create executable tasks for Jobs, Data Services, and Routes published from Talend Studio and run them directly in the cloud or on Remote Engines, ensuring the security of your data.