May 2021

Jeeves Grows Up: How an AI Chatbot Became Part of Unravel Data

Jeeves is the stereotypical English butler – and an AI chatbot that answers pertinent and important questions about Spark jobs in production. Shivnath Babu, CTO and co-founder of Unravel Data, spoke yesterday at Data + AI Summit, formerly known as Spark Summit, about the evolution of Jeeves, and how the technology has become a key supporting pillar within Unravel Data’s software.

DataStore vs FeatureStore

I think it’s safe to say that one of the worst things in Machine Learning is the terminology. The maths and statistics are definitely part of the learning curve, but more than that, it feels like you are learning a new language. In some ways, you are. DataStore and FeatureStore are two of the current buzzwords that people are trying to understand. To be fair, DataStore and FeatureStore feel like family rather than strangers.

The Clear SHOW - S02E07 - Manual Orchestration (Pit Stop!)

Before we write the super-easy automation for our feature-store workflow, we have to make sure we all understand how to run a task on a clearml-agent! Join T. Guerre for a quick demo of what ClearML can do for manual orchestration of workflows, once you have used it to manage your experiments! ClearML - Your entire workflow in one MLOps platform

What is Data as a Service (DaaS)?

As the amount of data that companies face snowballs, the need for efficient data governance grows. An increasing number of organizations are turning to cloud service providers for data management. In this context, data as a service, often referred to as DaaS, is becoming an essential tool for managing data integration, data storage, and data analytics.

What is File Transfer Protocol?

Transferring files between two or more machines is an essential part of the ETL (extract, transform, load) process. Of course, there are multiple ways to move data, including flat file databases. For example, you can physically copy the data onto a USB drive or send it to the recipient via email. But methods like these are far less efficient than sending data via FTP. So what is FTP exactly, and how do you use it to transfer files and data? Keep reading for all the answers.

The Complete Guide to GDPR Compliance

The General Data Protection Regulation (GDPR) is a landmark piece of legislation that affects how organizations can handle, process, and store the personal data of European Union (EU) citizens and residents. But what does the GDPR require exactly, and how can you be sure that your organization complies with it? We go over everything you need to know in this all-in-one guide to GDPR compliance.

Say Goodbye to Data Quality with ELT

ELT is a three-step process that first extracts raw, structured, and unstructured data from source databases, applications, data stores, and other repositories. It then loads that data into a data lake and transforms it as needed by analysts. Since it doesn't move the data to an intermediate staging area or transform it before loading, the extraction process is speedy. You don’t need to pick and choose what data loads into the data lake or wait for it to be processed.
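The extract, load, transform ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the in-memory SQLite database stands in for a data lake, and the `SOURCE_RECORDS` data, the table names, and the function names are all hypothetical. The point is only that raw values are loaded untouched and the cleanup happens afterwards, inside the target store.

```python
import sqlite3

# Hypothetical raw records standing in for data pulled from a source system.
# Amounts are deliberately left as strings: nothing is cleaned before loading.
SOURCE_RECORDS = [
    ("1", "acme", "19.99"),
    ("2", "globex", "5.00"),
    ("3", "acme", "7.51"),
]


def extract():
    """Extract: return the raw records exactly as the source provides them."""
    return list(SOURCE_RECORDS)


def load(conn, records):
    """Load: write the records into the target store without transforming them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", records)
    conn.commit()


def transform(conn):
    """Transform: cast and aggregate with SQL inside the target store."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute(
        "CREATE TABLE revenue_by_customer AS "
        "SELECT customer, ROUND(SUM(CAST(amount AS REAL)), 2) AS total "
        "FROM raw_orders GROUP BY customer"
    )
    rows = conn.execute("SELECT customer, total FROM revenue_by_customer")
    return dict(rows.fetchall())


def run_elt():
    """Run the three steps in ELT order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, extract())
    return transform(conn)
```

Calling `run_elt()` returns a dictionary of revenue per customer. Because the raw table is kept, an analyst could later run a different transformation over the same loaded data without extracting it again.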

The Ethics of AI Comes Down to Conscious Decisions

This blog post was written by Pedro Pereira as a guest author for Cloudera. Right now, someone somewhere is writing the next fake news story or editing a deepfake video. An authoritarian regime is manipulating an artificial intelligence (AI) system to spy on technology users. No matter how good the intentions behind the development of a technology, someone is bound to corrupt and manipulate it. Big data and AI amplify the problem. “If you have good intentions, you can make it very good.

Real-time Change Data Capture for data replication into BigQuery

Businesses hoping to make timely, data-driven decisions know that the value of their data may degrade over time and can be perishable. This has created a growing demand to analyze and build insights from data the moment it becomes available, in real time.

PII Masking Can Protect Your Business

Businesses large and small depend on access to information in order to make smarter, data-driven decisions. And much of that data is personal, sensitive, or confidential. So how can you balance this demand for big data with the need to protect the individuals whom this data describes? When it comes to personally identifiable information (PII), there are multiple very good reasons why you should keep it securely under lock and key.

ETLT with Snowflake, dbt, and Xplenty

What do Xplenty, Snowflake software, and dbt (data build tool) have in common? When used together, they merge the best of ETL (extract, transform, load) and ELT (extract, load, transform) into a powerful, flexible and cost-effective ETLT (extract, transform, load, transform) strategy. In this guide, we’ll show you how to create an ETLT strategy with Xplenty, Snowflake software, and dbt. But first, we’ll explain why you'd want to use this strategy to build an ETLT data transformation stack.

AutoZone: Exceeding customer expectations with speed of service

“Talend is amazing because it’s open, flexible, and visual. The robustness and reliability of Talend have made it an integral part of our solution set. It’s easy to learn and fast to ramp up.” – Jason Vogel, IT Manager, AutoZone

AutoZone is America’s #1 vehicle solutions provider. It was founded in 1979 and has since expanded to more than 6,400 stores across three countries, with over 96,000 employees.

How to define your first business use case with ThoughtSpot

Companies today are faced with an analytics conundrum. On one hand, there’s a higher demand than ever for actionable business insights, but on the other there are limited resources to deliver BI content to non-technical business end-users. To fill this gap, the industry is increasingly turning to the next generation of self-service analytics tools. These tools reduce time to insight, speed up insight to action, and also allow BI teams to focus on more strategic analytics work.

3x Dataflow Throughput with Auto Sharding for BigQuery

Many of you rely on Dataflow to build and operate mission-critical streaming analytics pipelines. A key goal for us, the Dataflow team, is to make the technology work for users rather than the other way around. Autotuning, as a fundamental value proposition Dataflow offers, is a key part of making that goal a reality - it helps you focus on your use cases by eliminating the cost and burden of having to constantly tune and re-tune your applications as circumstances change.

How to Launch Your New Data Engineering Strategy

When you secure a new data engineering position, it’s important to get off on the right foot. You need to respond to new data sources, data types, data sets, and applications efficiently. In the era of Big Data, this can be more than challenging. During this early stage of your job transition, you need to impress, and you can do this by re-thinking your data engineering strategy.

Session-based Recommender Systems

Recommendation systems have become a cornerstone of modern life, spanning sectors that include online retail, music and video streaming, and even content publishing. These systems help us navigate the sheer volume of content on the internet, allowing us to discover what’s interesting or important to us. The classic modeling approaches to recommendation systems can be broadly categorized as content-based, as collaborative filtering-based, or as hybrid approaches that combine aspects of the two.

Shorten time to critical insights with Streaming SQL

Data and analytics have become second nature to most businesses, but merely having access to the vast volumes of data from these devices will no longer suffice. Leading enterprises realize that the speed of data presents a new frontier for competitive differentiation. It is imperative for organizations to reduce time-to-insights to gain a competitive advantage by responding decisively to competitors, fine-tuning operations, and serving fickle customers.

From Data Lake To Enterprise Data Platform: The Business Case Has Never Been More Compelling

Companies have had only mixed results in their decades-long quest to make better decisions by harnessing enterprise data. But as a new generation of technologies makes it easier than ever to unlock the value of business information, change is coming. We’ve already reaped gains at Hitachi Vantara, where I run a global IT team that supports 11,000 employees and helps more than 10,000 customers rapidly scale digital businesses.

The Future Belongs to the Data-Driven

I’m starting to hear questions like: “What comes next?” “Do things go back to the way they were?” “Are some of the changes wrought by the pandemic here to stay?” I think we all know part of the answer: there is no going (all the way) back. In the analog world, sure, we need some things to revert to bounce back. We need to revitalize retail, tourism and hospitality to get our economies moving again.

Remote work isn't going anywhere - have you addressed these cloud security risks?

It’s been over a year since enterprises around the world had to pivot and transition to work-from-home setups. While some employees are slowly trickling back into the office, the majority of organizations have people working both onsite and offsite. This modern workforce has brought out an increasing reliance on cloud infrastructure, an essential tool for collaboration and business continuity. Technology like this isn’t without its risks though.

The Clear SHOW - S02E06 - DataOps pt. II Whaaa, so easy?!

We have a feature store, but is it easy to use while developing? You bet! Join Ariel and T.Guerre to find out how! First time hearing about us? Go to - clear.ml! ClearML: One open-source suite of tools that automates preparing, executing, and analyzing machine learning experiments. Bring enterprise-grade data science tools to any ML project

Have a cool summer with BigQuery user-friendly SQL

With summer just around the corner, things are really heating up. But you’re in luck because this month BigQuery is supplying a cooler full of ice cold refreshments with this release of user-friendly SQL capabilities. We are pleased to announce three categories of BigQuery user-friendly SQL launches: Powerful Analytics Features, Flexible Schema Handling, and New Geospatial Tools.

10 Best Data Analysis Tools for Data Management

Data analysis is a key component for operating a successful business in today's tech-savvy world. When analyzing data sets, however, every business has its own needs. While some companies employ data scientists to work with complex big data, others have fewer and less complicated data sources that even non-technical users can navigate. Your specific needs will influence the type of tool your company chooses for data management.

Pushing Past Pilot Paralysis to Launch and Scale IIOT Use Cases

With billions of industrial IoT (IIOT) devices in place, generating massive volumes of data from “the edge,” the potential for proof of concept success for use cases in the factory can be paralyzing. While the value of this digital revolution, aka Industry 4.0, is clear, realizing the full promise has been slow. Research and real-life experience from Accenture show that many manufacturers get stuck early on or can’t get beyond proof-of-concept pilots to scale.

The Four Upgrade and Migration Paths to CDP from Legacy Distributions

The move into any new technology requires planning and coordinated effort to ensure a successful transition. This blog will describe the four paths to move from a legacy platform such as Cloudera CDH or HDP into CDP Public Cloud or CDP Private Cloud. The four paths are In-place Upgrade, Side-car Migration, Rolling Side-car Migration, and Migrate to Public Cloud.

How to be a data-driven organization: Key learnings from Chief Product Officer Summit

Many people consider data-driven culture exclusively as one that comprises a large data team (analysts, data scientists), perfect data-sets neatly organized in a structured data warehouse, and the perfect tools to both access and utilize data in every decision made. Ultimately, however, much of data-driven culture comes down to a clear focus on the end-goal. Many people forget data is just an implementation, something that helps us in what we do.

The Ultimate PII Checklist

Data breaches can happen to any company, regardless of size or technical resources. In April 2021, Facebook’s reputation took a massive hit when a data breach impacted more than half a billion users. The worst kind of data breach involves personally identifiable information (PII). PII is essentially any data that contains sensitive details about real people, such as customers and employees.

Product announcement: Say hello to the new and improved Storage UI

Keboola’s Storage UI now comes with a new - slicker - look which will improve the user experience for all Keboola veterans. (New to Keboola? Do not fear. Simply follow along with the guided tour when you sign up for free, and you can unlock all the new features after June 2nd).

Celebrating 9 Years of ThoughtSpot

Today, May 21st, marks ThoughtSpot’s nine-year anniversary as a company. We’ve come a long way from exchanging ideas at Starbucks and working from an office set-up inside LightSpeed Ventures for the initial few weeks. Today, we offer customers the most innovative cloud analytics platform in the world and help thousands of users ask and answer questions with data.

How to Establish an Effective BI Security Strategy

Business intelligence (BI) tools have been a shot in the arm of the enterprise. Teams can create their own visualizations and enjoy self-service analytics, without needing IT to compile reports or wrangle big data. But, of course, there’s a catch. BI tools expose data to a wider range of people, which means there are new issues of BI security (BISEC) and privacy to think about, especially in the age of GDPR. Here’s what you need to know.

Mapping Your Automation Journey in Financial Services

Automation will fail to achieve most of its potential if treated as a specialty, single-function tool applied only to accelerate narrow parts of business processes. In most industries, financial services included, automation done properly is a journey that delivers a steady stream of benefits resulting from building a broader and increasingly powerful multilayered stack of automated processes and analytics.

Humans and Data? Relationship Status: Complicated

Many people may know me as someone who aims to find a mathematical angle in almost everything. And they wouldn’t be wrong – on my journey to bring my passion for math to the masses, I’ve even shown the mathematical angle for finding love! Yes, really – feel free to read my book on it. So, if there’s just one thing that I want people to take away from my chat with Joe DosSantos on Data Brilliant, it’s that math and data really do touch every part of our lives.

Future of Data Meetup: Collect, Curate, Predict & Visualise your Streaming Data

How do you get your data from A to B? We take you on a journey with your data through: Join us to find out more about managing your data lifecycle, and see it in action during our demo.

AGENDA
18:00 - Welcome
18:05 - Best Practice: Streaming Data & Analytics
18:20 - Demo: Collect, Curate, Predict & Visualise your Streaming Data
19:00 - Open Networking
19:30 - END

Effective Cost and Performance Management Amazon EMR Webinar Recording

Amazon EMR is a go-to platform for those who want all the power of Hadoop and Spark in the cloud. However, cost and performance trade-offs can reduce the advantages of EMR over alternatives. Lack of visibility into the root cause of problems, right-sizing options, and cost allocation can add confusion and frustration for EMR users. Unravel Data gives you visibility into the minute-to-minute operations of your workloads on EMR. Get root cause analysis (RCA) of workload breakdowns and slowdowns; AI-powered recommendations; and proactive fixes for many problems. With Unravel Data, you can meet and beat your SLAs, saving thousands - even millions - of dollars per year in the process.

ThoughtSpot Success Series #4 - Data Modeling Best Practices

Introducing the ThoughtSpot Success Series! Want to expand your knowledge of ThoughtSpot? Want to learn some great tips and tricks? Join ThoughtSpot's Customer Success team and other users like yourself as we discuss various topics in our new Success Series. During this event, we'll share how to move beyond flat files and star schemas and into multi-fact table chasm traps and fan traps that help build search-anywhere experiences for your end-users.

Automated Anomaly Detection: The next step for CSPs

Today’s telecom engineers are expected to handle, manage, optimize, monitor and troubleshoot multi-technology and multi-vendor networks, in a competitive and unforgiving market with minimal time to resolution and high costs for errors. With the ongoing growth in operational complexities, effectively managing radio networks, current and legacy core networks, services, and transport and IT operations is becoming a radical challenge.

NVIDIA RAPIDS in Cloudera Machine Learning

In the previous blog post in this series, we walked through the steps for leveraging Deep Learning in your Cloudera Machine Learning (CML) projects. This year, we expanded our partnership with NVIDIA, enabling your data teams to dramatically speed up compute processes for data engineering and data science workloads with no code changes using RAPIDS AI.

The Big Banks Are in Danger of Becoming Utilities. Can Data Make a Difference?

When digital disruption first hit the finance and investment markets, it seemed to signal the death knell for the traditional banks and financial institutions. For a time, it looked as though the best-known brands were doomed to become utilities: only good for basic financial transactions, weighed down by legacy processes and technologies, and viewed as little more than digital dinosaurs.

How 84.51°/Kroger Cut Costs and Improved Efficiency with Unravel Data

84.51° is a wholly owned subsidiary of Kroger, the US retailing giant – the largest supermarket chain in America, and the fifth-largest retailer in the world. As an organization, 84.51° is a descendant of dunnhumby, analytics geniuses who revolutionized customer loyalty programs at Tesco in the UK decades ago.

How to get started with data lineage

Modern enterprises leverage over 400 data sources to stay ahead of the competition. The sheer volume and complexity of data operations raise several challenges for intrepid organizations: How to cut the complexity of data operations? Enterprises turn to data lineage for the answer. Data lineage is the process of recording and visualizing data assets as they flow along your system.

Building interactive data apps with ThoughtSpot

Businesses today run on apps, and those apps run on data. Too often, however, the technical complexity required to surface and explore that data for additional analysis prevents users from doing so. With ThoughtSpot Everywhere, organizations are easily building new data apps powered by the simplicity and ease of use of ThoughtSpot, or adding ThoughtSpot services to their existing SaaS offerings. This is giving them the unprecedented opportunity to create product experiences that stick, monetize data in new ways, and harness data right within existing tools.

How to Use Product Analytics for SaaS Sales Pitches

Imagine you are preparing to approach a prospective client. You have done all the market research needed to understand the edge your product or service has over your competitors. You have identified your niche for higher profitability and you have profiled the key decision-makers that will be targeted by your outreach campaign. Would you like your pitch to fall flat just because you did not dazzle the prospect?

5 Reasons to Use Heroku and ETL

ETL tools and Heroku Connect both offer bidirectional data connections to Salesforce. So it would be natural to assume that you only need one or the other for your Salesforce integration. But, in fact, each tool has its own particular strengths that make the two systems complementary. Heroku is a software development platform and cloud service provider that empowers developers who build, deploy and scale web applications.

Security and Business Intelligence: Why it Matters

Companies deal with high volumes of data every day. In fact, 51% of businesses realize a positive difference in their bottom line by using their business intelligence (BI) to predict customer trends. According to one source, the BI market may reach close to $30 billion before the end of 2022. With so much money going into data management and so much resulting from it, the need for effective cybersecurity measures continues to grow by the day.

Streaming Market Data with Flink SQL Part II: Intraday Value-at-Risk

This article is the second in a multipart series to showcase the power and expressibility of FlinkSQL applied to market data. In case you missed it, part I starts with a simple case of calculating streaming VWAP. Code and data for this series are available on GitHub. Speed matters in financial markets. Whether the goal is to maximize alpha or minimize exposure, financial technologists invest heavily in having the most up-to-date insights on the state of the market and where it is going.
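For readers new to the term, VWAP (volume-weighted average price) is the total traded value divided by the total traded volume. The series itself uses Flink SQL; the sketch below is plain Python and is not taken from the article's code. It only illustrates the running calculation, and the `RunningVWAP` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningVWAP:
    """Keeps a running volume-weighted average price over a stream of trades."""

    notional: float = 0.0  # running sum of price * size
    volume: float = 0.0    # running sum of size

    def update(self, price: float, size: float) -> float:
        """Fold one trade into the running totals and return the current VWAP."""
        if size <= 0:
            raise ValueError("trade size must be positive")
        self.notional += price * size
        self.volume += size
        return self.notional / self.volume


def vwap_series(trades):
    """Return the VWAP after each (price, size) trade in the input sequence."""
    state = RunningVWAP()
    return [state.update(price, size) for price, size in trades]
```

Keeping only two running sums means each new trade updates the result in constant time, which is the property that makes the metric a natural fit for streaming engines.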

The value of CDP Public Cloud over legacy Hadoop-on-IaaS implementations

Prior to the introduction of CDP Public Cloud, many organizations that wanted to leverage CDH, HDP or any other on-prem Hadoop runtime in the public cloud had to deploy the platform in a lift-and-shift fashion, commonly known as “Hadoop-on-IaaS” or simply the IaaS model.

Building a Global GCP Platform With Snowflake | Rise of The Data Cloud

Are you looking to build a best-in-class global GCP platform with the Snowflake Data Cloud? Then this episode of Rise of the Data Cloud is for you! In this episode, Mani Gopalakrishnan, VP of Digital Transformation at Kraft Heinz, talks about turning technical skills into commodities, how to digitally transform your company, and much more.

6 Tips for Configuring an ETL Solution in Salesforce

Salesforce is the world's #1 CRM (customer relationship management) platform. The service provides access to valuable data by logging and collecting customer interactions, regardless of the channel in which they take place. Whether it gets the information from phone calls, website transactions, or social media posts, Salesforce delivers customer data in real-time so business owners can gain essential insights.

What is HIPAA, and Why is It Important?

Healthcare information is perhaps the most important data in our lives. Your health records can contain your medical history, results of tests and scans, and details of current health insurance. This data is a special class of personally identifiable information, and HIPAA is the law that protects it.

Key considerations when making a decision on a Cloud Data Warehouse

Making a decision on a cloud data warehouse is a big deal. Beyond there being a number of choices each with very different strengths, the parameters for your decision have also changed. Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform.

The search for actionable insights: 4 must-have analytics features

Today, data is constantly growing in complexity and quantity. Ensuring your users consistently find reliable answers from their dashboards and reports for informed decision-making can be difficult if you don't actively guide them on how to optimally use the analytics tools you provide. However, the challenge is exacerbated by the fact that most BI solutions on the market today still tailor their tools to experienced analysts first.

How to use ThoughtSpot and Databricks SQL

Today, the rate of innovation around data processing has accelerated beyond what any of us previously thought possible. Databricks recently announced their Databricks SQL offering, which is the next step in this evolution and builds on the foundation of Delta Lake to deliver interactive analytics at scale. This offering pairs with ThoughtSpot’s Modern Analytics Cloud to empower everyone in an organization to find answers to their questions with simple access to the data lake.

Operationalize Your Insights - The Self-Service Data Roadmap, Session 4 of 4

In this webinar, Unravel CDO and VP Engineering Sandeep Uttamchandani describes the fourth and final step for any large, data-driven project: the Operationalize phase. You've found your data (Discover phase), readied it for processing (Prep phase), and built out your processing logic and machine learning model(s) (Build phase). Now you need to Operationalize all your work to date as a live project, in production.

Why Business Intelligence is a Security Risk

Business intelligence (BI) software, such as Microsoft Power BI, allows organizations to leverage big data and make better business decisions. Select Hub reports that 48 percent of companies place high or critical importance on these solutions. However, BI tools introduce a level of security risk that businesses must address.

Accelerate Moving to CDP with Workload Manager

Since my last blog, What you need to know to begin your journey to CDP, we received many requests for a tool from Cloudera to analyze the workloads and help upgrade or migrate to Cloudera Data Platform (CDP). The good news is Cloudera has a tried and tested tool, Workload Manager (WM), that meets your needs. WM saves time and reduces risks during upgrades or migrations.

5 Factors to Consider When Choosing a Stream Processing Engine

Are you using the right stream processing engine for the job at hand? You might think you are—and you very well might be!—but have you really examined the stream processing engines out there in a side-by-side comparison to make sure? Our Choose the Right Stream Processing Engine for Your Data Needs whitepaper makes those comparisons for you, so you can quickly and confidently determine which engine best meets your key business requirements.

Data insights made simple, flexible, and proactive? Cheers to that.

Stonegate is Britain’s largest pub company, with 1,200 managed pubs and bars across the UK, 3,200 tenanted sites, and multiple brands including Slug & Lettuce, Yates, Be At One, Walkabout, and Popworld. The hospitality business was difficult enough before COVID-19 arrived, but the pandemic forced Stonegate to fundamentally rethink its core business and operational models.

What is a KPI dashboard? 6 key benefits & best practice examples

KPI dashboards are a great way for executives to improve their management of strategic goals, and keep on top of changes, issues and trends in performance at a high level, with many useful applications when used correctly, alongside other modern analytics tools. This blog covers the role KPI dashboards can play in organizations today, their business benefits, and best practice examples to ensure your users can get the most out of them.

How to use CDC for database replication

In the age when data is the new oil, more than 80% of IT decision-makers delay their business decisions due to slow data processing. Architecting your ETL pipeline with database replication can speed up your data processes. Database replication creates an analytic database as a separate copy of your production database. This unburdens the transactional database from analytical queries while securing fresh data in the analytical database for faster time-to-insights.

Cycle Time: The Most Important Metric that Helps You Nip Problems in the Bud

The art of management is to act at the right moment. Good leaders allow team members to be autonomous when things are going in the right direction. At the same time, they are swift to fix issues while they are still minor, thus averting a crisis later on. Knowing the right time to act is not easy. This ability used to come from years of experience through trial and error. With modern source control tools (e.g.

The Five Types of Data Integration

Data integration is crucial in today’s business world. Business data comes via many sources, from internal databases to clicks on a website. Being able to access all your data in one place helps your business make better, faster decisions. But how do you integrate all your data, and what’s the best way to do it? Here we discuss five data integration methods, how they work and why businesses continue to choose them.

Solving the Right Data Problem. Finally.

In late 2020, a CEO at an American bank revealed the thinking that’s becoming common in many businesses these days. “We’re a 103-year-old bank,” their CEO told me. “We’re doing everything on spreadsheets. But we are trying to become a highly profitable, digital-first bank that anticipates financial needs and empowers our clients with frictionless experiences. We need to become a data company.”

The Clear SHOW - S02E04 - DataOps is All You Need (?)

Can you build your own feature store in two minutes? (sort of) Yes!!! DataOps is all you need. Join Ariel and T.Guerre to find out how! First time hearing about us? Go to - clear.ml! ClearML: One open-source suite of tools that automates preparing, executing, and analyzing machine learning experiments. Bring enterprise-grade data science tools to any ML project.

The 7 Best Reporting Tools for 2021

Reporting tools solve a key problem for businesses by enabling them to communicate data in a way that is accessible, easy-to-understand, and useful for both frontline staff and management teams. These tools take raw data and turn it into tables, charts, and graphs ready for consumption, turning complicated data into visuals that better enable users to spot patterns and trends.

Security and ELT - A Tragedy

Extract, Load, Transform, or ELT, is a process that extracts data from the source, loads it directly into a data warehouse or data lake, and then transforms it to make it available for business intelligence tools. It supports all data types, from raw to structured. ELT is a popular way to ingest large volumes of raw data quickly, but it brings many security concerns with it.

Automating CDP Private Cloud Installations with Ansible

The introduction of CDP Public Cloud has dramatically reduced the time in which you can be up and running with Cloudera’s latest technologies, be it with containerised Data Warehouse, Machine Learning, Operational Database or Data Engineering experiences or the multi-purpose VM-based Data Hub style of deployment.

cdpcurl: Low-Level CDP API Access

Cloudera Data Platform (CDP) provides an API that enables you to access CDP functionality from a script, or to integrate CDP features with an application. In practice you can use the CDP API to script repetitive tasks, manage CDP resources, or even create custom applications. You can learn more about the API in its official documentation. There are multiple ways to access the API, including through a dedicated CLI, through a Java SDK, and through a low-level tool called cdpcurl.

ThoughtSpot Analytics Cloud

Get consumer-grade analytics for your modern data stack. ThoughtSpot empowers everyone to create, consume, and operationalize data-driven insights. Our consumer-grade search and AI technology delivers true self-service analytics that anyone can use, while our developer-friendly platform ThoughtSpot Everywhere makes it easy to build interactive data apps that integrate with your existing cloud ecosystem.

5 Ways to Improve Data Quality with Teradata

In 1979, Teradata began life as a collaboration between Caltech and Citibank. Today, this enterprise software group is all about redefining business intelligence tools and data management. The Teradata Database is now the Teradata Vantage Advanced SQL Engine. The name not only highlights the evolution of the company but also recognizes that tech consumers now expect more from their tools.

MongoDB vs. MySQL: Detailed Comparison of Performance and Speed

MongoDB and MySQL are similar in some ways, but they also have some obvious differences. Perhaps the most obvious one is that MongoDB is a NoSQL database, while MySQL only responds to commands written in SQL. Potential users may want to examine MongoDB vs. MySQL in the areas of performance and speed. The following article will help you understand the differences, as well as the pros and cons of each database.

Quantifying the value of multi-cloud deployment strategies with CDP Public Cloud

In this article, I will be focusing on the contribution that a multi-cloud strategy has towards these value drivers, and address a question that I regularly get from clients: Is there a quantifiable benefit to a multi-cloud deployment? That question is typically being asked when I explain the ability to leverage container technology that offers a consistent deployment environment across multiple clouds and form factors (public, private, or hybrid cloud).

Mastercard Reduces MTTR and Improves Query Processing with Unravel Data

Mastercard is one of the world’s top payment processing platforms, with more than 700 million cards in use worldwide. In the US, nearly 40% of adults hold a Mastercard-branded card. And the company is going from strength to strength; despite a dip in valuation of more than a third when the pandemic hit, the company has doubled in value three times in the last nine years, recently reaching a market capitalization of more than $350 billion.

Unravel Data Featured in CRN's 2021 Big Data 100 List

In a press release delivered today, Unravel Data announced its appearance on CRN’s Big Data 100 list for 2021. Unravel’s entry appears in the Data Management and Integration category. Also featured in this category are other rising stars such as Confluent, Fivetran, Immuta, and Okera, all of whom spoke at new industry conference DataOps Unleashed, held in March.

"Reverse ETL" with Keboola

TL;DR: Yes, you can do it. And no, you don’t need a separate tool for it. “Reverse ETL” is a fairly recent addition to the data engineer’s dictionary. While you can read articles upon articles about it (there’s a pretty good ‘primer’ in the Memory Leak blog), it can be summarized as being the art and science of taking data from your data warehouse and sending it somewhere other than BI - generally into other tools and systems where it becomes operational.

Future of Data Meetup: Continuous SQL With SQL Stream Builder

Continuous SQL uses Structured Query Language (SQL) to run computations against unbounded streams of data and to store the results in persistent storage. The stored results can then be connected to other applications to visualize and analyze your data. Compared to traditional SQL, in Continuous SQL the data has a start, but no end. This means that queries continuously process results to a sink or other target types. When you define your job in SQL, the SQL statement is interpreted and validated against a schema. After the statement is executed, the results that match the criteria are continuously returned.
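
The idea of a query that never finishes can be sketched in a few lines of Python. This is not SQL Stream Builder code; it is a toy model in which a generator plays the unbounded stream, a second generator plays the continuous query, and a plain list plays the sink. All names (event_stream, continuous_query, sink) and the amount formula are hypothetical.

```python
import itertools


def event_stream():
    """Unbounded source: the data has a start but no end."""
    for i in itertools.count():
        # Hypothetical payload; the formula just produces varied amounts.
        yield {"id": i, "amount": (i * 37) % 100}


def continuous_query(stream, min_amount):
    """Roughly: SELECT id, amount FROM stream WHERE amount >= min_amount."""
    for event in stream:
        if event["amount"] >= min_amount:
            # Matching rows are emitted as soon as they arrive.
            yield event


# The sink stands in for persistent storage. islice stops the demo after
# three results; a real continuous job would keep running.
sink = []
for row in itertools.islice(continuous_query(event_stream(), 50), 3):
    sink.append(row)
```

Without the islice cut-off the loop would never terminate, which is the behavior the paragraph above describes: results keep flowing to the sink for as long as the stream produces events.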

The Data Chief Live - People Change Management in Driving Self-Service Analytics

Join the Data Chief Live, May 6, for best practices for data and analytics leaders to execute on your vision to create a data-driven organization. This is YOUR one-to-many coaching & advisory session for CDOs, CAOs, and analytics champions. This month, we talk people change management in driving self-service analytics & I am thrilled to be joined by Milad Toliyati, Director of Innovation at The Axis Group, and ThoughtSpot Customer Success Manager Ovi Bodnar, adviser to flagship customers such as Nationwide Building, CWT, and Just Eat.

The Data Chief Live - People Change Management for Driving Self-Service Analytics

Join the Data Chief Live, May 6, for best practices for data and analytics leaders to execute on your vision to create a data-driven organization. This is your one-to-many coaching & advisory session for CDOs, CAOs, and analytics champions. This month, we talk people change management in driving self-service analytics & I am thrilled to be joined by Milad Toliyati, Director of Innovation at Axis Group, and ThoughtSpot Customer Success Manager Josh Royse.

ThoughtSpot Success Series #3 - Use Case Prioritization

Introducing the ThoughtSpot Success Series! Want to expand your knowledge of ThoughtSpot? Want to learn some great tips and tricks? Join ThoughtSpot's Customer Success team and other users like yourself as we discuss various topics in our new Success Series. During this event, we'll share how to build a ThoughtSpot use case pipeline that allows you to maximize the value return on investment & maintain momentum.

5 Tips on Avoiding FTP Security Issues

Flat files are files that contain a representation of a database (aptly named flat file databases), usually in plain text with no markup. CSV files, which separate data fields using comma delimiters, are one common and well-known type of flat file; other types include XML and JSON. Thanks to their simple architecture and lightweight footprint, flat files are a popular choice for representing and storing information.

How Data Affects Healthcare | Rise of The Data Cloud | Snowflake

Data driven healthcare, anonymized data hackathons in a digital data sandbox, how to leverage the power of data for good, compute on demand, how the pandemic has affected digital adoption and how shifting to the cloud impacts patients are just some of the topics being covered in today's episode of Snowflake's Rise of the Data Cloud. Join us as Ashok Chennuru, Chief Data and Analytics Officer at Anthem gives us a peek into the world of AI and healthcare.

Automating and Governing AI over Production Data on Azure - MLOps Live #14 w/Microsoft

Many enterprises today face numerous challenges around handling data for AI/ML. They find themselves having to manually extract datasets from a variety of sources, which wastes time and resources. In this session, we discuss end-to-end automation of the production pipeline and how to govern AI in an automated way. We touch upon setting up a feedback loop, generating explainable AI and doing all of this — at scale.

Industrializing Enterprise AI with the Right Platform - MLOps Live #9 - With NVIDIA

We discuss how enterprises need a platform that brings together tools to streamline data science workflow with leading edge infrastructure that can tackle the most complex ML models — one that can bring innovative concepts into production sooner, integrated within your existing IT/DevOps-grounded approach.

Simplifying Deployment of ML in Federated Cloud and Edge Environments - MLOps Live #12 - with AWS

We discuss some common applications for machine learning at the edge and the main challenges associated with deploying distributed cloud and edge applications. We then wrap up the session with a live demo showing how to run a distributed cloud or edge application on Amazon Cloud and Outposts with the Iguazio Data Science Platform.

How Feature Stores Accelerate & Simplify Deployment of AI to Production - MLOps Live #13

The breakdown:

00:00 - Intro
02:15 - MLOps Overview
05:03 - Feature Engineering
07:44 - MLOps Workflow
10:44 - Solution: Feature Store
14:25 - Feature Store Competitive Landscape
17:03 - Features of a Feature Store
21:01 - CTO: Feature Store Sneak Peek
25:55 - Python Code example
27:57 - ML Pipeline example
30:07 - Covid-19 Patient Deterioration
33:26 - LIVE DEMO
52:45 - Q&A

Fivetran Launches Support for New Databricks + GCP Offering

Businesses can now use Fivetran with the Lakehouse Platform on Google Cloud. We are excited to launch Fivetran support for a newly available solution: Databricks on Google Cloud. We’ve partnered with both Databricks and Google Cloud for many years now, and understand the unique value they each deliver to Fivetran customers, so it was a priority for us to support their joint effort.

The ultimate Google Algorithm update checklist for your website

As we are all well aware, this month Google will be updating its algorithm with the aim of improving the user experience. With these changes, however, it’s reported that many of the top-ranking websites will be affected, meaning they need to take action now to ensure all of the hard SEO work they’ve done is not lost.

Streaming Market Data with Flink SQL Part I: Streaming VWAP

Speed matters in financial markets. Whether the goal is to maximize alpha or minimize exposure, financial technologists invest heavily in having the most up-to-date insights on the state of the market and where it is going. Event-driven and streaming architectures enable complex processing on market events as they happen, making them a natural fit for financial market applications.
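
For readers unfamiliar with the metric in the title, VWAP (volume-weighted average price) is the sum of price times volume divided by the total volume. The sketch below updates it one trade at a time in plain Python rather than Flink SQL; the function name and the sample trades are made up for illustration, and it assumes every trade has a positive volume.

```python
def streaming_vwap(trades):
    """Yield the running VWAP after each (price, volume) trade."""
    cumulative_price_volume = 0.0
    cumulative_volume = 0.0
    for price, volume in trades:
        cumulative_price_volume += price * volume
        cumulative_volume += volume
        # Assumes positive volumes, so the divisor is never zero.
        yield cumulative_price_volume / cumulative_volume


# Hypothetical trades as (price, volume) pairs.
trades = [(100.0, 10), (102.0, 30), (101.0, 60)]
vwaps = list(streaming_vwap(trades))
```

Only two running totals are kept, so each new market event updates the figure in constant time, which is what makes the calculation a natural fit for event-driven processing.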

New: Data scientists, run transformations and model data in R!

We’re happy to announce that we added R to our selection of transformation and workspace backends in our freemium accounts. Access a wide variety of statistical methods offered by R and model data straight from Keboola. No more context switching and working in a few different tools. Start modeling data and running transformations in R. Create a free account and take Keboola for a spin. R transformations complement Python and SQL where computations or other operations are too difficult.

Database replication techniques

When data is the lifeblood of business operations, data breaches, corruption, inaccessibility due to server downtime, and accidental deletions can halt business continuity and even put you out of business. In this article, we take a look at database replication techniques. That is, how to use data replication to keep data in your database management systems (DBMS) such as PostgreSQL or MySQL accessible and safe.

ThoughtSpot Everywhere - Build Interactive Data Apps

Create more engaging analytics experiences with search & AI. Businesses today run on apps, and those apps run on data. Too often, however, that data is presented in stale, static dashboards. Users want to be able to surface and explore insights on their own. ThoughtSpot Everywhere is a low-code platform that makes it easy to build interactive data apps or embed search and AI in your existing SaaS apps. And thanks to our flexible APIs, your customers will be able to automatically trigger actions and workflows from the analytical insights they uncover.

Enterprise Data Architecture: Time to Upgrade?

ChaosSearch is participating in the upcoming Gartner Data & Analytics Summit (May 4-6), a virtual conference for professionals and executive leaders in Data & Analytics (D&A). The summit will feature expert talks from Gartner analysts, engaging workshops, and the opportunity to participate in roundtable discussions with D&A professionals and executive leaders. This blog post was inspired by the tagline of this year’s Gartner Data & Analytics Summit: Learn, Unlearn, Relearn.

Talend vs. Xplenty: Comparison and Review

The Talend Product Suite tries to be a one-stop-shop for everything data. Whether you need an ESB, iPaaS, API Gateway, or ETL platform, Talend has a tool for it. Xplenty, on the other hand, is a targeted solution that focuses on ETL only. It's an enterprise-grade ETL solution that empowers non-tech-savvy users to build sophisticated data integrations. Both Talend and Xplenty are powerful ETL solutions with excellent reputations and high functionality.

Driving Agility and Scalability through Smart Data

Last year presented business and organizational challenges that hadn’t been seen in a century, and the troubling fact is that those challenges distributed pains and gains unequally across industry segments. While brick-and-mortar retail was crushed a year ago by mandated store closures, digital commerce retailers realized ten years of digital sales penetration in only three months.

ClearML hits 1.0

May 3rd 2021 – After more than 11 man-years of working and tinkering long into the night, I am pleased to announce we have hit version 1.0. Following quickly after the release of ClearML 0.17.5, we added the last remaining features we felt 1.0 needed, namely multi-model support and improved batch operations. With these in place, the choice was clear. The next version released should be the baseline moving forward.

Build interactive analytics in your React App with ThoughtSpot Everywhere

ThoughtSpot has revolutionized access to analytics for business users through search and AI. In addition to being a general purpose analytics tool that allows unprecedented access to business users, product builders can now use ThoughtSpot to deliver search-based analytics to customers. Today, we are launching a brand new SDK that allows you to embed ThoughtSpot into your own web app in literally minutes.

Iguazio Named A Fast Moving Leader by GigaOm in the 'Radar for MLOps' Report

At Iguazio, we’ve spoken and written at length about the challenges of bringing data science to production. The complexity of operationalizing ML can generate huge costs in terms of work hours and compute resources, especially as successful projects get scaled up and expanded. We’re proud to share that the Iguazio Data Science Platform has been named a fast moving leader in the GigaOm Radar for MLOps report.

Building the Modern Analytics and BI Team

We are living in an unprecedented time driven by rapidly changing economic scenarios, the rise of digital-native organizations and a growing digital revolution, and the emergence of transformative business models. At the heart of much of this revolution is data. Organizations are collecting, analyzing, and mining data at an accelerated rate, creating new opportunities for powerful insights that deliver significant business impact.

Three reasons your cloud data warehouse needs cloud analytics now

Today, just 24% of organizations say they've succeeded at becoming data-driven.* This is a challenge many data leaders are still struggling to solve despite increasing demand for data-driven insights from business users. Migrating to a cloud data warehouse is a good first step, and many have done so, but introducing new technology is not the same as ensuring adoption. To truly reap the benefits of your cloud data warehouse investment, you need an equally fast, scalable, and easy-to-adopt analytics solution to make your cloud data available to all.