April 2021

5 Tips to Use Heroku and ETL to Automate Reporting

Heroku is a cloud platform as a service (PaaS) for efficiently building, deploying, monitoring, and scaling applications. Originally created to work with the Ruby programming language, Heroku is now part of the Salesforce platform and supports languages such as Java, Node.js, PHP, Python, and Scala. While Heroku makes it easy to develop production-ready applications fast, one question remains: how can you integrate your Heroku app data with the rest of your data infrastructure and workflows?

Managing Python dependencies for Spark workloads in Cloudera Data Engineering

Apache Spark is now widely used in many enterprises for building high-performance ETL and machine learning pipelines. For users already familiar with Python, PySpark provides a Python API for Apache Spark. When working with PySpark, users often rely on existing or custom Python packages to extend and complement Apache Spark’s functionality. Apache Spark provides several options to manage these dependencies.

Reasons Why Cloud Migrations Fail & Ways to Succeed

Organizations are moving big data from on-premises to the cloud, using best-of-breed technologies like Databricks, Amazon EMR, Azure HDI, and Cloudera, to name a few. However, many cloud migrations fail. Why? And, how can you overcome the barriers and succeed? Join Chris Santiago, Director of Solution Engineering, as he describes the biggest pain points and how you can avoid them, and make your move to the cloud a success.

Data Science vs. Big Data Marketing

Data science and big data are essential in today’s world of marketing. You’ve probably already seen multiple instances of both being used for advertising and sales purposes, but you may not realize just how useful they are. If you own a business, you need to know how to use data for your own marketing programs.

The New Releases of Apache NiFi in Public Cloud and Private Cloud

Cloudera released a lot of things around Apache NiFi recently! We just released Cloudera Flow Management (CFM) 2.1.1 that provides Apache NiFi on top of Cloudera Data Platform (CDP) 7.1.6. This major release provides the latest and greatest of Apache NiFi as it includes Apache NiFi 1.13.2 and additional improvements, bug fixes, components, etc. Cloudera also released CDP 7.2.9 on all three major cloud platforms, and it also brings Flow Management on DataHub with Apache NiFi 1.13.2 and more.

Contextual analytics vs dashboards: What's the difference?

For 20 years, standalone BI tools have failed to penetrate more than 25% of the average organization, with most workers using them once a week, according to Eckerson Group. While many modern dashboards are sophisticated and user-friendly, they are still often accessed as standalone tools outside of line-of-business applications. This separation means it isn’t guaranteed that users will adopt BI, or gain insight from their data.

Neither Cloud nor SaaS Will Deliver Your Data's Full Potential

Your data now resides in the cloud, and you’ve chosen SaaS providers that use their own products (or drink their own champagne, as I like to say). Does that mean you’re getting the full value from your data? No. Chances are high your data is still siloed. This time, the culprits are your SaaS providers who collect and store your data, thus limiting the analytics you can perform on it.

Future of Data Meetup: Exploring Data and Creating Interactive Dashboards in the Cloud

In this meetup, we’re going to once again put ourselves in the shoes of an electric car manufacturer that is deploying a recently developed electric motor in its new cars. We’re going to show how to explore data that was previously collected from various sources and stored in Apache Hive within a data warehouse, with the goal of tracking down a specific set of potentially defective parts. We’ll then take the results of this data exploration and create an interactive dashboard that presents our results in a visually appealing way using a BI tool that’s integrated right into the same data warehouse.

Fast Forward Live: Few-Shot Text Classification

Join us for this month's Machine Learning research discussion with Cloudera Fast Forward Labs. We will discuss few-shot text classification, including a live demo and Q&A. This is an applied research report by Cloudera Fast Forward. We write reports about emerging technologies. Accompanying each report are working prototypes or code that exhibit the capabilities of the algorithm and offer detailed technical advice on its practical application.

iOS 14.5 and Countly: a Match Made in the Clouds

According to Statcounter, Apple’s iOS penetration in the global mobile scene is around 27%, which is more than considerable. However, this penetration is almost 50% in markets such as Europe and North America, which, coincidentally, are at the forefront of enacting strict data privacy policies. So when Apple announced new user data privacy regulations for app developers as part of its iOS 14.5 release, it was not too shocking.

10 Fivetran Competitors & Alternatives

As increasing aspects of business go digital, managing data has never been more crucial. According to Forbes, only one in four businesses has a "well-defined data management structure." If you’re looking to improve how you store, manage, and analyze your business data, it’s time to look at intelligent data integration tools. Fivetran is an ETL tool. ETL stands for "extract, transform, load".

Integrating Data to Build Emotional Health: How SU Queensland Uses Talend to Enrich Service Delivery

The mission statement is so direct and uncomplicated. SU Queensland, a non-profit organization based in Australia, is all about “bringing hope to a young generation.” The realities of delivering on this charter, of course, are multi-dimensional and complex.

4 Tips for Securing Business Intelligence Systems

Harnessing the power of big data is increasingly important not just for business intelligence (BI)—a descriptive model that reveals to enterprises the current state of their companies—but also for data analytics. Data analytics offer predictive models with insight into where a business might head under different scenarios. Your organization's data gives you the opportunity to collect dynamic business intelligence.

Cable Companies Are Growing Up

Cable and Satellite companies in the US have emerged from a decade of acquisitions, consolidation and shakeout and are beginning to assert themselves as full service providers in the communications and media space. With Comcast just announcing its new suite of cellphone plans this month, and Charter, Altice and Dish ramping up their offerings, the Big Three in wireless – AT&T, Verizon and T-Mobile/Sprint – are looking over their shoulders.

Cox Automotive Runs Robust Pipelines on Databricks with Unravel

Cox Automotive is a large, global business. It’s part of Cox Enterprises, a media conglomerate with a strong position in the Fortune 500, and a leader in diversity. Cox also has a strong history of technological innovation, with its core cable television business serving as a leader in the growth and democratization of media over the last several decades.

Why and when enterprises should care about Model Explainability

Machine learning models are often used for decision support: which products to recommend next, when a piece of equipment is due for maintenance, and even whether a patient is at risk. The question is, do organizations know how these models arrive at their predictions and outcomes? As the application of ML becomes more widespread, there are instances where an answer to this question becomes essential. This is called model explainability.

The Clear SHOW - S02E03 - Your Code == Feature Store

Ariel and T.Guerre discuss the reasoning behind feature stores. Should you get one for your production pipeline? First time hearing about us? Go to clear.ml! ClearML: One open-source suite of tools that automates preparing, executing, and analyzing machine learning experiments. Bring enterprise-grade data science tools to any ML project.

10 Tips to Help You Write a Flat File Database

Originally developed by IBM, flat file databases have been around since the 1970s. Because these files store data in plain text format, most people use MS Excel to create them. It’s an easy-to-use system that allows for the quick sorting of results. This is because each line of plain text holds just one record. Tabs, commas, or other delimiters separate the fields within each record. In this article, you’ll learn some tips for optimizing your flat file.
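
As a rough illustration of the one-record-per-line layout described above, here is a minimal Python sketch that reads a small comma-delimited flat file and sorts it by one column. The sample data, the column names, and the helper names `read_records` and `sort_records` are hypothetical and are not taken from the article.

```python
import csv
import io

# Hypothetical sample data: one record per line, comma-delimited values.
FLAT_FILE = "id,name,city\n1,Ada,London\n2,Grace,New York\n"


def read_records(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def sort_records(records, key):
    """Return the records sorted by one column (values compare as strings)."""
    return sorted(records, key=lambda row: row[key])


records = read_records(FLAT_FILE)
by_city = sort_records(records, "city")
```

Passing `delimiter="\t"` would handle a tab-delimited file in the same way.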

Announcing Iguazio Version 3.0: Breaking the Silos for Faster Deployment

We’re delighted to announce the release of the Iguazio Data Science Platform version 3.0. Data Engineers and Data Scientists can now deploy their data pipelines and models to production faster than ever with features that break down silos between Data Scientists, Data Engineers and ML Engineers and give you more deployment options. The development experience has been improved, offering better visibility of the artifacts and greater freedom of choice to develop with your IDE of choice.

Converting HBase ACLs to Ranger policies

CDP uses Apache Ranger for data security management. If you wish to use Ranger for centralized security administration, HBase ACLs need to be migrated to policies. This can be done via the Ranger web UI, accessible from Cloudera Manager. But first, let’s take a quick look at HBase's method for access control.

How to compete with analytics-first software vendors

These are a new class of vendors like Gainsight and C3 who are building applications based on the idea that data will drive a transaction, rather than transactions driving the data. The challenge for every enterprise software vendor is how to respond to this threat because it's going to be difficult. For big vendors, you're going to have vested interests internally who don't see this challenge coming or don't know how to respond to it. Some may even underestimate the threat of the change.

HDFS Data Encryption at Rest on Cloudera Data Platform

Encryption of Data at Rest is a highly desirable or sometimes mandatory requirement for data platforms in a range of industry verticals including HealthCare, Financial & Government organizations. The capability increases security and protects sensitive data from various kinds of attack that could be internal or external to the platform.

AI/ML without DataOps is just a pipe dream!

Let’s start with a real-world example from one of my past machine learning (ML) projects: We were building a customer churn model. “We urgently need an additional feature related to sentiment analysis of the customer support calls.” Creating the data pipeline to extract this dataset took about 4 months! Preparing, building, and scaling the Spark MLlib code took about 1.5-2 months!

Woopra: Your End-to-End Customer Journey Analytics Companion

Customers interact with your business multiple times before reaching any goal. These repeated digital interactions are what make up the customer journey. Your customers’ overall experience across the different channels as they engage with your organization (websites, social media, email, etc.) makes up the customer experience. Customer journey analytics refers to the process of analyzing the experience of customers across multiple touchpoints in the customer journey.

Cloudera Data Platform (CDP) Private Cloud on Red Hat OpenShift

Learn how Cloudera and Red Hat help enterprise companies securely manage the complete data lifecycle, putting data to work faster and reducing time to value. Cloudera Data Platform (CDP) Private Cloud on Red Hat® OpenShift® aggregates and visualizes data to derive actionable insights in a secure, hybrid, and open-source environment.

How Xplenty Simplifies Heroku PostgreSQL Data Integration

What can you do with data collected on Heroku PostgreSQL? How will you analyze it and integrate it? With Xplenty, of course! Xplenty lets you connect to a PostgreSQL database on Heroku, design a Dataflow via an intuitive user interface, aggregate the data, and even save it back to PostgreSQL on Heroku or other databases and cloud storage services.

Apache Ozone and Dense Data Nodes

Today’s enterprise data analytics teams are constantly looking to get the best out of their platforms. Storage plays one of the most important roles in a data platform's strategy: it provides the basis on which all compute engines and applications are built. Businesses are also looking to move to a scale-out storage model that provides dense storage along with reliability, scalability, and performance.

ThoughtSpot Success Series #2 - Defining a Use Case

Introducing the ThoughtSpot Success Series! Want to expand your knowledge of ThoughtSpot? Want to learn some great tips and tricks? Join ThoughtSpot's Customer Success team and other users like yourself as we discuss various topics in our new Success Series. In this session, we'll share how to best define your ThoughtSpot use case in your organization to maximize results & align key stakeholders.

The 6 Soft Skills Data Engineers Need to Succeed

Soft skills can be almost as important as data engineering skills when you apply for a job. They can make the difference between stress and efficiency, or between being unsatisfied with your position and earning a raise. When data engineers and data scientists earn bachelor’s degrees, they usually take classes in topics like data warehousing, programming languages, machine learning, and data science.

Achieving Energy Efficiency With Data Efficiency: Vermont Gas + Data Governance Leaders

Vermont Gas (VGS) is a leader in energy efficiency and innovation, offering a clean, safe, affordable choice for over 53,000 homes, businesses, and institutions in northwest Vermont. They pride themselves on providing timely, comprehensive service for all their customers, ensuring they have heat, hot water, and energy to get through the cold New England winter.

Drinking our own champagne - Cloudera upgrades to CDP Private Cloud

Like most of our customers, Cloudera’s internal operations rely heavily on data. For more than a decade, Cloudera has built internal tools and data analysis primarily on a single production CDH cluster. This cluster runs workloads for every department – from real-time user interfaces for Support to providing recommendations in the Cloudera Data Platform (CDP) Upgrade Advisor to analyzing our business and closing our books.

Construction feat. TF2 Object Detection API

Although the title might sound like a collaboration of two music bands with really bad names, this blog is all about understanding how computer vision and machine learning can be used to improve safety and security in the harsh and dangerous environment of a construction site. The construction industry is one of the most dangerous industries, according to commonly cited statistics from OSHA.

Future of Data Meetup: Nice to Meet You, NiFi!

You asked for it, and we are delivering the third in our “Hello:” series of introductory “Big Data” topics. Our next meetup covers using Apache NiFi. Lots of people want to be a data scientist... but what good is machine learning, artificial intelligence or advanced analytics if you don’t have data? Getting data is incredibly important, but getting data in real time or near real time helps you deliver near-real-time insight.

Why Leverage Heroku Connect with an ETL?

Ever since Salesforce acquired Heroku back in 2010, the two services have worked exceptionally well together. Businesses can use Heroku to build flexible and scalable applications while utilizing Salesforce to manage customer data and drive sales. And when you need to share data between these two platforms, there’s a dedicated add-on: Heroku Connect.

What is Streaming Analytics?

Streaming Analytics is a type of data analysis that processes data streams for real-time analytics. It continuously processes data from multiple streams and performs anything from simple calculations to complex event processing in order to deliver sophisticated use cases. The primary purpose is to present the most up-to-date operational events so the user can stay on top of business needs and take action as changes happen in real time.
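
To make the "simple calculations" end of that spectrum concrete, here is a minimal Python sketch of a sliding-window average that is updated as each event arrives. The class name `SlidingWindowAverage` and the window size are illustrative assumptions. Production streaming engines add time-based windows, multiple streams, and fault tolerance on top of this idea.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the most recent `size` values of a stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        """Add one event value and return the current windowed average."""
        # When the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Usage: feed events one at a time, as they would arrive from a stream.
avg = SlidingWindowAverage(size=3)
readings = [avg.update(v) for v in [1, 2, 3, 4]]
```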

Biba Helou of Capital One Asks 'What's In Your Data Cloud?' | Rise of The Data Cloud | Snowflake

On this episode of the Rise of the Data Cloud podcast, host Steve Hamm sits down with Biba Helou, Senior Vice President of Enterprise Data at Capital One, to talk about the agility the cloud provides, how data can help personalize customers' experiences, and much more. Inside the Data Cloud, organizations unite their siloed data, discover and securely share data, and execute diverse analytic workloads across multiple clouds.

Bigtable vs. BigQuery: What's the difference?

Many people wonder if they should use BigQuery or Bigtable. While these two services have a number of similarities, including "Big" in their names, they support very different use cases in your big data ecosystem. At a high level, Bigtable is a NoSQL wide-column database. It's optimized for low latency, large numbers of reads and writes, and maintaining performance at scale.

Xplenty PII & PHI transformations

Personally identifiable information (PII) and protected health information (PHI) are two types of sensitive data that fall under one or more data privacy regulations. HIPAA and GDPR are examples of the regulations that govern what organizations can and need to do with PII and PHI. When you work with large data sets, it can be challenging to maintain compliance with these regulations.
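
One common way to reduce exposure of such fields in a pipeline is to replace them with keyed hashes before the data is loaded downstream. The sketch below uses Python's standard `hmac` and `hashlib` modules. It is not Xplenty's implementation, and the field names, the key, and the helper names `pseudonymize` and `mask_record` are hypothetical. Whether this kind of pseudonymization satisfies a given regulation depends on the context.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"
# Hypothetical set of column names treated as sensitive.
PII_FIELDS = {"email", "ssn"}


def pseudonymize(value, key=SECRET_KEY):
    """Return a keyed SHA-256 digest (hex) of a string value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record, pii_fields=PII_FIELDS):
    """Copy a record, replacing sensitive fields with their digests."""
    return {
        field: pseudonymize(str(value)) if field in pii_fields else value
        for field, value in record.items()
    }


masked = mask_record({"id": 7, "email": "a@example.com"})
```

Because the digest is deterministic for a given key, masked values can still be used as join keys across tables.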

The rise of analytics-first software

We've moved from desktop to SaaS, to a real UX focus. Now we're seeing new vendors that are analytics-first. They’re creating new applications that are challenging the established players. Historically, applications were transaction-first; you build your software thinking about your workflow or the transactions that you want people to do.

The End of Facebook Analytics: Now What?

Facebook recently announced that it will effectively discontinue Facebook Analytics on June 30, 2021. The announcement was not particularly informative and was limited to pointing out ways of retaining the tool’s users by means of diverting business to other features that Facebook already offers. However, the reasons behind this decision were not addressed by Facebook and it brings up the question of what this means for the industry.

How to Debug in Xplenty

With its low-code and no-code features, Xplenty brings the power of ETL and data integration to the masses. But even with Xplenty’s tremendously user-friendly interface, it’s possible that the transformations you design don’t work exactly as you intended—which means you need to debug and resolve the issue fast. Fortunately, there are multiple debugging options in Xplenty for exactly this reason.

6 Data Cleansing Strategies For Your Organization

The success of data-driven initiatives for enterprise organizations depends largely on the quality of data available for analysis. This axiom can be summarized simply as garbage in, garbage out: low-quality data that is inaccurate, inconsistent, or incomplete often results in low-validity data analytics that can lead to poor business decision-making.

Simplify the MongoDB ETL Process

The faster you can extract, transform, and load data from MongoDB, the better it is for your business processes and business intelligence systems. The problem is, most ETL solutions struggle to manage MongoDB’s dynamic schemas, NoSQL support, and JSON data types. That’s not the case with Xplenty, which was optimized for easy, no-fuss MongoDB integrations: no custom code, no delays, no confusion.

What's new in CDP Private Cloud Base 7.1.6?

According to IDG, when customers consider updating to the latest release of a product, they expect new features, enhanced security, and better performance, but increasingly want a more streamlined upgrade process. With each new release of CDP Private Cloud, this is exactly what we strive to deliver. Along with a host of new features and capabilities, we are improving the upgrade process to be as painless as possible.

5 key business benefits of Automated Business Monitoring

Understandably, however, the many automation, AI and machine learning technologies that come with modern analytics solutions can sometimes be hard to keep up with. One area we get asked a lot about is ABM, which we offer with Yellowfin Signals, and what exact advantages it brings to the table for everyday analysis and insight generation.

DataOps vs DevOps

The exponential adoption of IT technologies over the past several decades has had a profound impact on organizations of all sizes. Whether it is a small, medium, or large enterprise, the need to create web applications while managing an extensive set of data effectively is high on every CIO’s priority list. As a result, there has been an ongoing effort to implement better approaches to software development, data analysis, and data management.

How to automate big data governance

Companies deploying big data analytics to gain competitive advantage can quickly sour their successes by lacking a big data governance strategy, which turns their data assets into data liabilities. In this article, we dive into the field of information governance and information management and explore how to set up and automate a big data governance program for success. Big data governance is a set of processes and principles that ensure the high value of data throughout its lifecycle.

Snowflake CEO Frank Slootman Talks Data Cloud Evolution | Rise of The Data Cloud | Snowflake

On the season 2 premiere of the Rise of the Data Cloud podcast, host Steve Hamm talks with Snowflake CEO Frank Slootman, who gives an update on the transition of Snowflake from a private startup to a public company, the impact of the Data Cloud over the past year for organizations across industries, the future of data sharing, and much more.

In the event-driven galaxy, which metadata matters most?

As a developer, you're no stranger to your vast and varied data environment… Or are you? The tremendous amount of data your organization collects is stored in various sources and formats. You need a way to understand where and what data is, to be able to do what you need to do: build amazing event-driven applications.

BI Tool Integrations for Heroku Postgres

Heroku is a powerful platform for application development. Users can build and deploy on the cloud, and you can effortlessly scale up once your app takes off. And behind every app, you'll find an equally powerful database: Heroku Postgres. If you're building Heroku apps, you'll find them to be a rich source of operational and customer data. Add in the right Business Intelligence (BI) tools, and you'll be able to derive insights about the inner workings of your organization.

Cloudera Data Engineering - Integration steps to leverage Spark on Kubernetes

Cloudera Data Engineering is a serverless service for Cloudera Data Platform (CDP) that allows you to submit jobs to auto-scaling virtual clusters. CDE enables you to spend more time on your applications, and less time on infrastructure. CDE allows you to create, manage, and schedule Apache Spark jobs without the overhead of creating and maintaining Spark clusters.

No Data Loss and No Service Interruption - HDF to CFM Rolling Migration

The blog “Migrating Apache NiFi Flows from HDF to CFM with Zero Downtime” detailed how many common NiFi dataflows can be easily migrated when the Hortonworks DataFlow and Cloudera Flow Management clusters are running side-by-side. But what if you lack the resources to run multiple NiFi clusters concurrently? Not a problem.

Five lessons in leadership from Snowflake CEO, Frank Slootman

Since the start of the pandemic nearly a year ago, there's been one word on the lips of every business leader, analyst, and investor around the world: cloud. COVID-19 fundamentally changed the way businesses operate. In response, organizations went all in on cloud, betting on the unmatched scale, speed, and security of SaaS applications to help them weather the storm. Nowhere was this shift more pronounced than in our own data and analytics industry.

Reverse ETL: What You Need to Know

Data integration has been around for decades in some form or fashion, as organizations are always looking for ways to combine their enterprise data and collect it in a centralized location. The most commonly used and dominant type of data integration is ETL (extract, transform, load). ETL first extracts data from one or more source systems, transforms it as necessary, and then loads it into a target warehouse or data lake.
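
The three stages described above can be sketched in a few lines of Python. This is a toy illustration only: the source rows are hard-coded, the "warehouse" is an in-memory SQLite database, and the table name `sales` and the function names are hypothetical.

```python
import sqlite3


def extract():
    """Extract: pull raw rows from a source system (hard-coded here)."""
    return [
        {"name": " Ada ", "amount": "10.50"},
        {"name": "Grace", "amount": "3"},
    ]


def transform(rows):
    """Transform: trim whitespace and convert amounts to floats."""
    return [(row["name"].strip(), float(row["amount"])) for row in rows]


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform, and load in order, then read back the result."""
    load(transform(extract()), conn)
    return conn.execute(
        "SELECT name, amount FROM sales ORDER BY name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
result = run_pipeline(conn)
```

Reverse ETL would run in the opposite direction, reading from the warehouse table and pushing rows out to an operational system.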

5 Success Stories That Show the Value of Enterprise Data Cloud

What’s the fastest and easiest path towards powerful cloud-native analytics that are secure and cost-efficient? In our humble opinion, we believe that’s Cloudera Data Platform (CDP). And sure, we’re a little biased—but only because we’ve seen firsthand how CDP helps our customers realize the full benefits of public cloud.

10 Steps to Achieve Enterprise Machine Learning Success

You’ve probably heard it more than once: Machine learning (ML) can take your digital transformation to another level. It’s a pie-in-the-sky statement that sounds great, right? And while you’d be forgiven for thinking that it might sound too good to be true, operational ML is, in fact, achievable and sustainable. You can get the very kind of ML you need to increase revenue, lower costs, and help teams work smarter and do things faster.

A huge chunk of machine learning models are never operationalized: here's why

As organizations refocus and restrategize this year, machine learning projects seem to be on the top of IT priority lists. Innovation is more important than ever, and this has led to higher spending, increased hiring budgets, and a wider range of ML use cases. Despite this, organizations are facing challenges in actually deploying machine learning models at scale. A lot of models are never operationalized, or if they are, the process to production takes too long.

What is a Flat File Database?

When it comes to data storage, there is almost as much diversity in the types of databases as there is in the data that they contain. Designing and implementing a strong enterprise data strategy means that you need to be aware of the different databases and how you might best apply them within your organization. In IT, the term "flat file" means something very different from the heavy-duty steel construction file cabinets that you might buy from Safco.

The Key to Unlocking IT Modernization's Power? Enterprise level Transformation

The United States Veterans Administration (VA) over the last decade underwent a massive enterprise-wide IT transformation, eliminating its fragmented shadow IT and adopting a centralized system capable of supporting the agency’s 400,000 employees and more effectively utilizing its $240 billion-plus annual budget. The result: A more reliable and modern IT environment that improves access, availability, and user experience, ultimately supporting the VA mission more effectively.

It's time for the augmented consumer

One of the changes that we've seen happening in the analyst space recently is a huge shift in thinking. Gartner in particular is now talking about augmented consumers and multi-experience analytics. To me, this is really interesting because they’re talking about the business user and how they want to work and consume data. In the past it was all about the data analyst, but focusing on users opens up an entirely new level of thinking.

Unleashing the "Power of Many" With Active Intelligence

From the Wright Brothers and Ada Lovelace, to Elon Musk and Steve Jobs, when we consider who is behind the most celebrated innovations and industry transformations, we often think about individual bright thinkers and disruptors. However, over the years, studies have shown that the greatest potential lies in the “power of many," fostered by a shift in how new generations work.

Enabling NVIDIA GPUs to accelerate model development in Cloudera Machine Learning

When working on complex or rigorous enterprise machine learning projects, Data Scientists and Machine Learning Engineers experience various degrees of processing lag when training models at scale. While model training on small data can typically take minutes, doing the same on large volumes of data can take hours or even weeks. To overcome this, practitioners often turn to NVIDIA GPUs to accelerate machine learning and deep learning workloads.

Next Stop - Predicting on Data with Cloudera Machine Learning

This blog series follows the manufacturing and operations data lifecycle stages of an electric car manufacturer – typically experienced in large, data-driven manufacturing companies. The first blog introduced a mock vehicle manufacturing company, The Electric Car Company (ECC) and focused on Data Collection. The second blog dealt with creating and managing Data Enrichment pipelines. The third video in the series highlighted Reporting and Data Visualization.

The Keys to Unlocking the Benefits of a Modern Data Analytics Platform

Many organizations are working to become more data-driven – increasing data use and leveraging data insights to improve decision-making, solve their most challenging problems and improve revenue and profitability. A February 2020 IDC survey showed a direct correlation between quality decision-making and strong data-to-insight capabilities; 57 percent of organizations with the best data analytics pipelines received the highest decision-making score.

Fintech startup, Branch makes data analytics easy with BigQuery

As a startup in the fintech sector, Branch helps redefine the future of work by building innovative, simple-to-use tech solutions. We’re an employer payments platform, helping businesses provide faster pay and fee-free digital banking to their employees. As head of the Behavioral and Data Science team, I was tapped last year to build out Branch’s team and data platform. I brought my enthusiasm for Google Cloud and its easy-to-use solutions to the first day on the job.

Where in the World is Xplenty?

In 2011, Pope John Paul II was beatified, Prince William married Kate Middleton, "Game of Thrones" premiered, and Xplenty was born. On a quiet sycamore tree-lined street in Tel Aviv, Israel, breathing distance from Kiryat Sefer Park, the then-startup had just launched a game-changing Extract, Transform, Load (ETL) tool to process, transform, and move data at speed and generate big data analytics at scale. It would become the most advanced data pipeline platform on the planet.

Yellowfin 9.5 release highlights

With 9.5, we've focused on providing new capabilities and enhancements for everyone involved in the data to design workflow - analysts, developers, users - that streamline processes, introduce functional improvements and enrich the analytic experience for all. For the full list of updates, please read the release notes and check out our release highlights video below to see some of these new enhancements in action for yourself.

Building Automated ML Pipelines in Cloudera Machine Learning

In this video, we'll walk through an example of how you can use Cloudera Machine Learning to run some Python code that creates specific machine learning models. We’ll then go through some features within Cloudera Machine Learning, such as job scheduling and model deployments, to see how you can carry out more advanced machine learning operations!

ThoughtSpot Success Series #1 - Introduction to ThoughtSpot Cloud

Introducing the ThoughtSpot Success Series! Want to expand your knowledge of ThoughtSpot? Want to learn some great tips and tricks? Join ThoughtSpot's Customer Success team and other users like yourself as we discuss various topics in our new Success Series. We'll provide a high-level overview of everything you need to know to get started with ThoughtSpot Cloud.

New flexibility: Run your Dataprep jobs with BigQuery or Dataflow

Cloud Dataprep by Trifacta is Google Cloud’s intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analytics and machine learning. Due to its serverless architecture, Dataprep does not need any infrastructure to deploy or manage, and is fully scalable.

Too Many Data Engineers? How to Get the Right Balance

As companies grow and become more data-dependent, data engineers find themselves in huge demand. Employers are snapping up all the best data engineering talent they can find, and some businesses have invested in fast-track professional development paths for DBAs and other more junior data positions. But here’s the thing — data engineers work best when they’re part of a balanced team, just like every other professional. Some organizations overlook this point.

Enabling kubectl for CDE

The kubectl tool provides direct administrative access to the Kubernetes cluster underlying a CDE service, which is useful for troubleshooting, among other things. This video will demonstrate how to set up kubectl access. To enable kubectl, we will need a couple of prerequisites. We will need the kubeconfig file from the CDE service. We will need to get and authorize the IAM user, and then make sure that everything is set up correctly, both for kubectl and some other tools like k9s.

Snowflake CEO Frank Slootman Talks Data Cloud Evolution | Rise of The Data Cloud | Snowflake

On the season 2 premiere of the Rise of The Data Cloud Podcast, host Steve Hamm talks with Snowflake CEO Frank Slootman. They give us an update on the transition of Snowflake from a private startup to a public company, the impact of the Data Cloud over the past year for organizations across industries, the future of data sharing, and much more.

Top 5 ETL to Snowflake Tools for 2021

Reports and records. Sales sheets and spreadsheets. Files and financials. Your team has more big data than you can comprehend spread across multiple data sources in more locations than a James Bond movie. Isn't it time you kept this data somewhere safe? Moving data to a data warehouse like Snowflake is like keeping thousands of books in a library or a trove of treasure in an underground vault. Big data, your most prized asset, will be safe and snug.

How to Tap into Higher-Level Abstraction, Efficiency & Automation to Simplify your AI/ML Journey

You’ve already figured out that your data science team cannot keep developing models on their laptops or a managed automated machine learning (AutoML) service and keep their models there. You want to put artificial intelligence (AI) and machine learning (ML) into action and solve real business problems.

Who are the best SaaS providers? Those that drink their own champagne

Today, technology drives organizations. We rely on it, which is why a misstep in selecting technology partners and their SaaS solutions can impact an organization in regrettable ways. It can impede your ability to scale in a graceful manner. It can slow down your journey of automation. It can even introduce risk related to compliance or security, depending on the solution. Whenever you decide to buy, a proper assessment of any SaaS provider is crucial.

Xplenty's MongoDB Connector

MongoDB is a popular non-relational (a.k.a. NoSQL) database. It is document-oriented and distributed in nature. MongoDB is known to be highly scalable and flexible. In this post, we'll demonstrate how you can utilize MongoDB in your ETL pipelines with Xplenty. To start with, let us briefly discuss why and when you'd want to use MongoDB over relational databases.

Cloudera Honored With 5-Star Rating in the 2021 CRN Partner Program Guide

Cloudera is being acknowledged by CRN®, a brand of The Channel Company, in its 2021 Partner Program Guide. This annual guide provides a conclusive list of the most distinguished partner programs from leading technology companies that provide products and services through the IT Channel. The 5-Star rating is awarded to an exclusive group of companies that offer solution providers the best of the best, going above and beyond in their partner programs.

Building a global software development team from Australia

Whilst not as well-known as other tech hotspots, many technology companies have been successfully launched and grown up here in Australia, with great economic and legal conditions, access to a talented and diversified skill-base, and a culture of innovation and adaptability acting as some key - and growing - attractions for leaders. So, why Australia? Sun, sand, surf, etc.

Stacking up against the Competition

One of the questions we receive most often is, “How does ClearML compare to...”. I am sure this is the same for any Open Source product. People always want to find the best. The sad truth is, of course, there usually is no “right answer”. What one person needs, another may not. I am sure that, whichever language you speak natively, there is some saying for this. In English it would be “one man's rubbish is another man's gold”.

Troubleshoot BigQuery performance with these dashboards

BigQuery is Google's flagship data analytics offering, enabling companies of all sizes to execute analytical workloads. To get the most out of BigQuery, it’s important to understand and monitor your workloads to keep your applications running reliably. Luckily, with Google’s INFORMATION_SCHEMA views, monitoring your organization’s use at scale has never been easier. Today, we’ll walk through how to monitor your BigQuery reservation and optimize performance.

Hybrid Cloud and Strategic Data Use Accelerate State, Army Missions

Some of the most forward-operational elements of the United States federal government are making strides in leveraging data through hybrid cloud environments—and they’re constantly evaluating progress and recalibrating their approaches along the way. At agencies including the Army and the State Department, work is well underway to find ways of employing emerging technologies that build on cloud services and data optimization to realize new levels of effectiveness.

Mastering Databricks Environments with Unravel Data

Databricks is a great solution for customers looking to unlock the powerful use cases that Spark enables, with the high performance of Databricks and the convenience of a managed service. Databricks is available in AWS, Microsoft Azure, and GCP clouds. If you are already a Databricks customer, you want to get the most out of your investment - and if you're considering Databricks, you'll be wondering how hard it is to move to the platform, and how to optimize your investment once you get there.

Unlock geospatial insights with Data Studio and BigQuery GIS

Chances are, your data contains information about geographic locations in some form, whether it’s addresses, postal codes, GPS coordinates, or regions that are meaningful to your business. Are you putting this data to work to understand your key metrics from every angle? In the past, you might’ve needed specialized Geographic Information System (GIS) software, but today, these capabilities are built into Google BigQuery.

Speeding up small queries in BigQuery with BI Engine

A quick and easy way to speed up small queries in BigQuery (such as to populate interactive applications or dashboards) is to use BI Engine. The New York Times, for example, uses the SQL interface to BI Engine to speed up their Data Reporting Engine. To illustrate, I’ll use three representative queries on tables between 100 MB and 3 GB, tables that are typically considered smallish by BigQuery standards.

How to strengthen your company's data literacy

Those who use data wisely have competitive advantages and more profits. As a result, companies are increasing their focus on improving their data literacy. For example, the importance of data has led companies like AppNexus and Chevron to conduct internal data science competitions to identify and hone analytical talent. But, as noted in the kickoff blog post to our series on data-driven organizations, merely having data does not ensure you have a useful interpretation of that data.

Engineering Industry Embracing Qlik's SaaS Analytics to Address Environmental and Sustainability Concerns

Working in the engineering field means navigating a variety of needs. Those range from meeting various local and national regulatory statutes, to measuring and monitoring delivery of essential outputs like drinking water and power supply, to understanding the data surrounding regional operations on both the supply and demand side. Organizations that serve this market operate behind the scenes, yet impact our daily life in the United States.

Fast Forward Live: Representation Learning & Image Analysis

Good representations of data (e.g., text, images) are critical for solving many tasks (e.g., search or recommendations). But what exactly are representations, how can they be built and why are deep learning models useful? In this livestream, we will discuss these questions from a software engineering perspective and walk through a live example!

The Future of Sports Data

I watch sports for a living. I couldn't tell you the last time I watched a baseball game from beginning to end. Data is one of the most valuable resources around. But data is no longer something that languishes in a database to be looked at later. Like sports events, data is now live. The sports industry can reap and build on innovations in the realtime data space. But this is no longer a nice-to-have. Driven by changing fan behaviour, this is now a commercial imperative.

6 Ways to Lose Sports & Gaming Customers Through Poor Realtime UX

Sports and gaming app users demand an uninterrupted, true realtime experience. Almost 90% of US adults now use a mobile device while watching sports. In competitive arenas with similar offerings, like betting or sports, you absolutely cannot afford to deliver poor mobile experiences. Customer experience is the new competitive battleground and realtime mobile experiences are an essential part of that.

Transform your business with cloud, search, and AI-driven analytics

Despite huge investments in data and analytics over the last two decades, many companies are still struggling with how to become truly data-driven. What are data leaders doing at the organizations that have figured it out? In this white paper, DATAcated Academy's Kate Strachnyi explores four key strategies for critically evaluating your entire data and analytics stack and systematically removing the barriers that exist between your business users and business-critical insights.

Six Top Trends and Predictions for Data, Analytics, and AI in 2021, and What to Do About Them

Although making predictions about the future is difficult even under the best of circumstances, it's never been more important for business leaders to focus, prioritize, and act in order to stay ahead of the technological curve and the competition. The strategies you used to innovate and grow your business in the past will not be the same ones you use today. Rethinking how you use data to react and proactively adapt to change will be critical to your bottom line.