
October 2021

Kimball vs Inmon: Which approach should you choose when designing your data warehouse architecture?

A data warehouse is a central data repository that allows enterprises to consolidate data, automate data operations, and support reporting, business intelligence (BI), analytics, and decision-making throughout the organization. But designing a data warehouse architecture can be quite challenging.

Building a startup: Three questions every startup must answer to beat their behemoths

When speaking about the historical trajectory of significant movements, Gandhi once said, “First they ignore you, then they laugh at you, then they fight you, then you win.” For startup founders, these words aptly illustrate the road ahead and the obstacles along it. Startups that succeed are destined, at some point, to face off against the most powerful incumbents in the world. Still, there has never been a better time to be a startup than right now.

Quickly, easily and affordably back up your data with BigQuery table snapshots

Mistakes are part of human nature. Who hasn’t left their car unlocked or accidentally hit “reply all” on an email intended to be private? But making mistakes in your enterprise data warehouse, such as accidentally deleting or modifying data, can have a major impact on your business.

What Is Data Integrity and Why Is It Important?

Your organization’s data is the source of opportunity for your innovation. However, 25 percent of executives surveyed by KPMG either distrust or have limited trust in their data. Without integrity, the information is essentially useless. So exactly what is data integrity? Let’s dive deeper into this topic and discuss why it is important for your organization.

These Are the Top 5 Snowflake Database Features for Salesforce Users Everywhere

Confusingly, there are multiple Snowflakes in data management. A Snowflake database is a data warehouse that integrates data from one or more sources for analytical purposes. Snowflake is also the company that delivers that database as Software as a Service (SaaS). For this article, we're going to focus on the Snowflake database (sometimes stylized as SNOWFLAKE) and explain the features that best serve Salesforce users.

Special Broadcasting Service (SBS): Knowing and serving audiences better through data integration

SBS is Australia’s most diverse broadcaster, offering a multi-platform portfolio of TV, radio, and digital services. The company’s data is also diverse and multi-platform, and SBS understands that its data is key to improving viewership, customer satisfaction, and loyalty. But what’s the key to using data more effectively? How do you bring together dozens of data sources into a single, central data platform?

High Availability (Multi-AZ) for CDP Operational Database

CDP Operational Database (COD) is an autonomous transactional database powered by Apache HBase and Apache Phoenix. It is one of the main Data Services that run on Cloudera Data Platform (CDP) Public Cloud. You can access COD directly from your CDP console. With COD, application developers can leverage the power of HBase and Phoenix without the deployment and management overhead often associated with them.

How to defend against Ransomware, the threat that holds your data and business hostage

With contributions by Anwar Haq and George Alifragis. Ransomware has grown to become a significant threat to organizations today, no matter the size or industry. Cybercriminals are exploiting vulnerabilities in small businesses and enterprises alike, creating short-term and long-term damage that can impact everything from your employees’ productivity to your relationship with customers.

Entrepreneurship Is Power

In today's video, we're sharing a fireside chat that aired at the 2021 Guardian Summit. Join Eve Besant, Snowflake's VP of WW Sales Engineering; Aaron Walker, Founder/CEO of Camelback; and three fellows from Camelback Ventures (Jerelyn Rodriguez, Co-Founder/CEO of The Knowledge; Jessica Santana, Co-Founder/CEO of America on Tech; and Reuben Ogbonna, Co-Founder/Executive Director of The Marcy Lab School) as they discuss how entrepreneurship is power.

Fueling financial freedom with data: A Q&A with Darren Pedroza, VP Enterprise Data and Analytics at First Command Financial Services

For members of the military, financial planning is a difficult and ever-changing process. Few businesses understand this better than First Command Financial Services, which focuses on serving the nation’s military families with flexible solutions to home mortgages, car loans, and wealth management.

The Modern Data Stack Ecosystem - Fall 2021 Edition

In our previous article, The Future of the Modern Data Stack, we examined the motivations behind the modern data stack and its current state, and looked optimistically into the future to see where it is headed. If you’re new to the modern data stack, we highly recommend reading that article. A question we often get from new adopters of the modern data stack is “What tech should we be looking into?”

New Report Reveals Best Practices for Hybrid & Multi-Cloud Data Management

As enterprises develop complex hybrid and multi-cloud environments to support futuristic use cases, a new report published by Big Data Quarterly (BDQ) aims to educate IT decision-makers and practitioners on the most up-to-date solutions and strategies for modern data management in hybrid and multi-cloud environments.

It Worked Fine in Jupyter. Now What?

You got through all the hurdles getting the data you need; you worked hard training that model, and you are confident it will work. You just need to run it with a more extensive data set, more memory and maybe GPUs. And then...well. Running your code at scale and in an environment other than yours can be a nightmare. You have probably experienced this or read about it in the ML community. How frustrating is that? All your hard work and nothing to show for it.

Commercial Lines Insurance: the End of the Line for All Data

I’ve had the pleasure of participating in a few Commercial Lines insurance industry events recently, and as a former Commercial Lines insurer myself, I am thrilled with the progress the industry is making using data and analytics. However, I do not think Commercial Lines insurance gets the credit it deserves for the industry-leading role it has played in analytics. Commercial Lines truly is an “uber industry” with respect to data.

Twelve Best Cloud & DataOps Articles

Interested in learning about different technologies and methodologies, such as Databricks, Amazon EMR, cloud computing and DataOps? A good place to start is reading articles that give tips, tricks, and best practices for working with these technologies. Here are some of our favorite articles from experts on cloud migration, cloud management, Spark, Databricks, Amazon EMR, and DataOps!

C40 Cities Continues To Advance Climate Action By Harnessing Data With Qlik

The latest United Nations IPCC report paints a sobering picture. Climate change looks likely to accelerate in all regions as we approach the critical global warming threshold of 1.5°C. Such an uptick in temperatures will accelerate sea level rise and increase the frequency and magnitude of extreme weather events. For cities, these changes will make governance more difficult in nearly every respect.

5 Reasons Why Consolidating Your Analytics Data Is A Good Investment

Data is the lifeblood of your organization. It powers automated workflows, gives customer service reps the full story every time the phone rings, drives every planned product upgrade, informs leaders on what to focus on next, and the list goes on. Wouldn’t it be amazing to have all your data in one place? Yes. Can you? Well… it’s complicated.

Data Transformation: Explained

Raw data—like unrefined gold buried deep in a mine—is a precious resource for modern businesses. However, before you can benefit from raw data, the process of data transformation is a necessity. Data transformation is the process where you extract data, sift through data, understand the data, and then transform it into something you can analyze. That’s where ETL (extract, transform, load) pipelines come into play.
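To make the extract, transform, load sequence concrete, here is a minimal sketch of the three stages in plain Python. It assumes a small hypothetical CSV export (`RAW_CSV`) as the source and an in-memory SQLite database as the destination; the function names and the cleaning rules (trimming names, casting types, dropping rows with no amount) are illustrative choices, not a description of any particular ETL product.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,bob,
3,Carol,5.00
"""


def extract(raw: str) -> list:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list) -> list:
    """Transform: trim and normalize names, cast types, drop rows missing an amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records rather than loading bad data
        clean.append((int(row["order_id"]), row["customer"].strip().title(), float(amount)))
    return clean


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into an analytics table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
```

Keeping each stage a separate function means a real source (an API, a database) or a real warehouse can be swapped in without touching the transformation rules.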

5 Ways to Integrate Salesforce With Other Platforms

Salesforce is the world’s leading CRM (customer relationship management) platform: More than 100,000 organizations use it worldwide. Thanks to its vast array of powerful features, it’s no surprise that Salesforce has become the most popular software as a service (SaaS) CRM, with an estimated 18 percent market share. Despite this functionality, Salesforce can’t do everything alone.

Is Your Business Ready for Digital Disruption?

It’s one of the great paradoxes of doing business in the 21st century: Customers expect more personalized products and services even as in-person customer contact decreases. This is just one of several disruptive trends facing every industry. On the first day of Hitachi Financial Services Summit 2021, I addressed these challenges in a keynote session with Marek Chlebicki of Raiffeisen Bank International.

Snowflake Announces Support for Google Cloud Private Service Connect

Snowflake was architected with cross-cloud security built into its core, providing multiple layers of robust protection from network access, to authentication and access control, to data protection using encryption (for more details on Snowflake security, check out the on-demand session from Snowflake Summit). For the most-regulated customers around the world, enabling private connectivity is a critical first line of defense.

The Customer Knows - Qlik's Leadership Highlighted in BARC BI & Analytics Survey 22

It’s great when analysts, media members and industry thought leaders tout your company’s leadership; it’s a point of pride for all of us behind the scenes working to continually improve Qlik Sense’s position as a leading and world-class analytics platform. However, the biggest praise (for me) comes from customers/users, the people Qlik Sense helps, whose recognition assures me we’ve really delivered.

The 11 Best Low-Code Development Platforms

When it comes to app development, low-code is the future. Many companies and organizations are already turning to low-code or no-code solutions for their business-related software needs. While low-code is changing the game when it comes to app development, is low-code really the way to go? And if low-code tools really are the right choice for your organization, how do you go about finding the right platform for your business goals? And what should you even be looking for in a low-code platform?

The Ultimate Map to finding Halloween candy surplus

As Halloween night quickly approaches, there is only one question on every kid’s mind: how can I maximize my candy haul this year with the best possible candy? This kind of question lends itself perfectly to data science approaches that enable quick and intuitive analysis of data across multiple sources.

Do you want to build an ETL pipeline?

Analysts and data scientists use SQL queries to pull data from the data storage underbelly of an enterprise. They mold the data, reshape it, and analyze it, so it can offer revenue-generating business insights to the company. But analytics is only as good as the material it works with. That is, if the underlying data is missing, compromised, incomplete, or wrong, so will be the analysis and the inferences derived from it.

Introducing Qlik Forts - Brief Overview and Demo

Qlik Forts is a highly scalable virtual appliance managed by Qlik and configured to run where your data resides, either on premises or in a public or private cloud in any region, and it’s designed to work with existing Qlik Cloud tenants. It can be deployed and managed from the Qlik Sense SaaS console and provides a seamless experience to your Qlik Sense users, who all have a single login through the Qlik Cloud hub and a consistent analytics experience, whether analytics are running in Qlik Cloud, Qlik Forts, or both.

Why Your Company Needs API Management

When you need to get new features and products out the door fast, API management is the driving force behind keeping everything on track. According to the "State of API Integration 2020" report by Cloud Elements, 83 percent of respondents say that API integration has become a critical part of their business strategy as they move forward with their digital transformation efforts. Leveraging integrations is what keeps these companies agile and able to respond quickly to customer demands.

How to Bring Breakthrough Performance and Productivity To AI/ML Projects

By Jean-Baptiste Thomas, Pure Storage, and Yaron Haviv, Co-Founder & CTO of Iguazio. You trained and built models using interactive tools over data samples, and are now working on building an application around them to bring tangible value to the business. However, a year later, you find that you have spent an endless amount of time and resources, but your application is still not fully operational, or isn’t performing as well as it did in the lab. Don’t worry, you are not alone.

Cloudera Machine Learning Workspace Provisioning Pre-Flight Checks

There are many good uses of data. With data, we can monitor our business, the overall business, or specific business units. We can segment based on the customer verticals or whether they run in the public or private cloud. We can understand customers better, see usage patterns and main consumption drivers. We can find customer pain points, see where they get stuck, and understand how different bugs affect them.

New Features in Cloudera Streams Messaging Public Cloud 7.2.12

With the launch of Cloudera Public Cloud 7.2.12, Streams Messaging for Data Hub deployments gained some interesting new features! Starting with this release, Streams Messaging templates support scaling with automatic rebalancing, allowing you to grow or shrink your Apache Kafka cluster based on demand.

The Business Case for Sustainable Supply Chains Is in the Data

Businesses and consumers are getting better at recognizing the direct carbon cost of the products they use. As such, we’re seeing an increased use of sustainable materials in consumer goods and global products. That is a big positive trend, but there’s a bigger picture to explore. Value chains make up 90% of an organization’s environmental impact, according to the Carbon Trust.

Top 9 Data Aggregation Tools

With all the data that exists today, understanding how to aggregate data is more important than ever. However, before you aggregate your data, you must understand it and learn which tools best suit your needs. Read on to learn more about data aggregation and to pick up some tips on how to select the best aggregation software for your business.

How to Automate Apache NiFi Data Flow Deployments in the Public Cloud

With the latest release of Cloudera DataFlow for the Public Cloud (CDF-PC) we added new CLI capabilities that allow you to automate data flow deployments, making it easier than ever before to incorporate Apache NiFi flow deployments into your CI/CD pipelines. This blog post walks you through the data flow development lifecycle and how you can use APIs in CDP Public Cloud to fully automate your flow deployments.

How to Transform Data using Xplenty

Xplenty is an interactive data transformation and integration platform that offers a wide range of low and no-code transformations to help transform data from any source to any destination. In this video, we will be transforming data using Xplenty's low and no-code transformations. In our demonstration today, we will bring two different database tables into one table destination and perform numerous types of common data transformations.

How to Set Up a Database Connection in Amazon RDS with Xplenty

In this video, we will set up a database connection in Amazon RDS with Xplenty. AWS RDS is a feature-rich and mature offering that makes it easy for even less tech-savvy users to operate their own relational databases. Xplenty can help you gain the most benefit from your Amazon RDS deployment. Xplenty offers comprehensive data analysis and management and can integrate your Amazon RDS database with dozens of other services and products. In this way, you can use the technology stack that gives your business the greatest advantage.

Quality Engineering Discussions: 5 Questions with James Espie

In this series, real (and really good) QA practitioners use their experience to support, or debunk, what you might know about software quality. James Espie is a test specialist, a quality engineering proponent, and a continuous learner from Auckland, New Zealand. He shares his insights and sporadic bursts of inspiration in a hilarious newsletter called Pie-mail. If you haven’t seen it, you should check it out.

How API Management Works

According to Paolo Malinverno, research vice president at Gartner, “We already live in an API economy where CIOs must look beyond APIs as technology and instead build their company's business models, digital strategies, and ecosystems on them." Application Programming Interfaces (APIs) represent reusable code that forms the backbone of these digital transformation strategies. Understanding how API management works can help you determine how best to leverage your systems to gain the most value.

How to Use Heroku Postgres to Migrate Data

Have you ever needed to move data from a local database to the cloud? Or even just between different databases? Heroku Postgres might be just the tool you need to help with all of your data migration needs. Read on to learn how you can utilize Heroku Connect to migrate data between Heroku Postgres and Salesforce.

M1 Democratizes Data Analytics with Snowflake as Part of Its Digital Transformation

Vibrant and dynamic digital-first telco M1, a subsidiary of Keppel Corporation, is focused on transforming telecommunications in Singapore. M1 provides a suite of services to more than 2 million customers and is Singapore’s first digital network operator. Data analytics have played an essential role in M1’s growth since the company launched commercial services in 1997.

Why Explainability Is the Foundation of Trust in Data

When you were a child, how many times did you use the argument with your parents “but everyone else is doing it” to rip holes in the knees of your jeans or dye your hair blonde or steal beer out of the fridge to take to a party? Only to be countered with “well, would you jump off a cliff just because all your friends did?” Wow, the frustration I felt at such a logical but basic argument.

ClearML-Data Lemonade: getting local datasets quickly and easily

Congratulations on creating a clean(ish) dataset to use for training! But while the dataset is stored where it’s accessible to everyone, distributing it is a hassle. Local workstations, local GPU machines, and cloud machines (which may be spun up and down without disk persistence) all need copies of the data… and to say it is annoying is an understatement!

Managing Cost & Resources Usage for Spark

Spark jobs require resources - and those resources? They can be pricey. If you're looking to speed up completion times, optimize costs, and reduce resource usage for your Spark jobs, this is the webinar for you. For Spark jobs running on-premises, optimizing resource usage is key. For Spark jobs running in the cloud, for example on Amazon EMR or Databricks, adding resources is a click away - but it’s an expensive click, so cost management is critical.

Xplenty's REST API Component Tutorial

In this video, we will set up a REST API connector using Xplenty. Combining your business data in one secure destination is desirable if you want to analyze and operationalize that data at scale. Xplenty's REST API connector helps you connect to many popular business SaaS applications and other digital services. Among the connector's advanced features are its pagination schemes: when retrieving large amounts of data, it is typical for an API to incorporate a paging mechanism. The commonly used methods for pagination are…
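As a rough illustration of why a connector has to handle paging, here is a minimal sketch of offset/limit pagination, which is one widely used scheme (others exist, such as cursor- or page-number-based paging). It is not Xplenty's implementation: `fetch_page` is a hypothetical in-memory stand-in for an HTTP endpoint that returns a slice of records plus a total count.

```python
from typing import Callable, Iterator

# Hypothetical "server-side" data: 23 records behind a paged endpoint.
DATA = [{"id": i} for i in range(1, 24)]


def fetch_page(offset: int, limit: int) -> dict:
    """Simulate one API response, e.g. GET /items?offset=..&limit=.."""
    return {"items": DATA[offset:offset + limit], "total": len(DATA)}


def paginate(fetch: Callable[[int, int], dict], limit: int = 10) -> Iterator[dict]:
    """Yield every record by requesting successive offset/limit pages."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        items = page["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page or once the reported total has been reached.
        if not items or offset >= page["total"]:
            break


if __name__ == "__main__":
    records = list(paginate(fetch_page, limit=10))
    print(len(records), records[-1])
```

Writing the pager as a generator lets downstream steps start processing the first page before the last one has been fetched.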

How to Set Up a Snowflake Connection within Xplenty

In this video, we will set up a Snowflake connection with Xplenty. Many companies are now strategically using their data assets as the core of their digital transformation efforts. Building a data pipeline can help companies combine information from all their systems to gain insights into their business. A recent study showed that 59.5% of corporations using advanced analytics saw measurable results. Using a warehouse such as Snowflake can help companies leverage this information to drive business strategy.

Operationalizing AI: Lessons from the Field

A casual stroll through the tech headlines of the past few years makes two things abundantly clear: investment in AI is at an all-time high, and companies really struggle to get value out of AI technology. At first glance, these ideas seem to be at odds with each other: why consider investing in a field that hasn’t lived up to the hype? If you dig into the details, you’ll notice that a gap exists between the development and production use of AI in many companies.

Real-Time Anomaly Detection: Solving Problems and Finding Opportunities

Success in today’s high-velocity business environments means having the correct information to make the right decisions at the right time. As marketplaces grow more competitive and customer expectations continually rise, the “right time” is often real-time. Every transaction generates a plethora of data. Anomalies within your company’s data set can represent opportunities and threats to the business.

5 Secrets to Integrating Snowflake

Many companies are now strategically using their data assets as the core of their digital transformation efforts. Building a data pipeline can help companies combine information from all their systems to gain insights into their business. A recent study showed that 59.5% of corporations using advanced analytics saw measurable results. Using a warehouse such as Snowflake can help companies leverage this information to drive business strategy.

How to Gain Greater Confidence in your Climate Risk Models

We are just over one week away from the UN Climate Change Conference of the Parties (COP26), which convenes in Glasgow. As governments gather to push forward climate and renewable energy initiatives aligned with the Paris Agreement and the UN Framework Convention on Climate Change, financial institutions and asset managers will monitor the event with keen interest.

Leveraging Automation Technologies for Data Governance

This is Part II of the three-part series Modern Data Foundation for AI-Driven Results. In Part I of this series, I noted the following: “With just a few clicks on my smart device, I can review data on every place I’ve been, how much I spent, each step I took, what the weather was like and who I was with. Businesses collect the same abundance of data. However, are we getting the benefit and insights from what’s collected?

5 Tips for Pushing Data from Your Warehouse to Salesforce

Salesforce is the most popular CRM (customer relationship management) platform in the world. At last count, the Salesforce CRM enjoyed a 20 percent market share, with more than 150,000 companies among its customers. One of Salesforce’s best features is its ability to run business intelligence (BI) and analytics workloads to identify hidden trends and insights and make smarter decisions for your sales, marketing, and customer support teams.

Introducing Self-Service, No-Code Airflow Authoring UI in Cloudera Data Engineering

Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. Today, customers have deployed hundreds of Airflow DAGs in production that perform various data transformation and preparation tasks of differing complexity.

The Snowflake Media Data Cloud Enables Disney Advertising Sales' Innovative Clean Room Data Solution

Snowflake’s newly announced Media Data Cloud unites Snowflake’s powerful data sharing technology, the highest standards of privacy and governance, Snowflake- and partner-delivered solutions, and industry-specific data sets to help marketers, publishers, and advertising technology businesses succeed in the rapidly changing media and entertainment industry.

Developing a Basic Web Application using an Operational DB on CDP

In this video, you'll see a simple demo of how you can build a web application on top of a Cloudera Operational Database. We'll leverage the Apache Phoenix integration to easily write SQL statements against our database and use the Python Flask library to power the backend API calls. The web application will be hosted within Cloudera Machine Learning, showcasing some of the benefits of having your data within a hybrid data platform.

Top 5 Informatica Alternatives

Informatica PowerCenter blends four data engineering products into one system, making it one of the most feature-rich but complicated ETL/ELT platforms on the market. With data management, app integration, API gateway, and iPaaS features, it can be hard to tame, and smaller teams that struggle with this all-encompassing tool might seek out an Informatica alternative instead.

Fueling Sustainable Power Starts With Data

When power company executives were asked to list the most important issues facing their organizations, 45% cited “renewables, sustainability or the environment” as their top concern. At the same time, global energy consumption continues to rise faster than the population. These two realities are reshaping the international energy sector. The push to produce more energy, in a greener manner, is propelling the industry to re-imagine the power grid of the future.

Don't settle for multi-cloud. Aspire to cross-cloud.

Organizations are more often running data and applications on multiple clouds, and that’s great. However, multi-cloud isn’t enough. To let loose the true power of data on your business, you must be cross-cloud. Cross-cloud means data moves easily between multiple public clouds without any additional work. It means never worrying about where your data and applications live or where your business and technical people are located.

New Snowflake Features Released in September 2021

Support for unstructured data is now in public preview! That’s one of many exciting announcements made in September, in addition to a new serverless tasks feature, expanded public cloud regions, enhanced business continuity capabilities, and several new providers on Snowflake Data Marketplace.

User Profiles Are Indispensable to Great CX (and They Must Be Privacy-Compliant)

Until recently, many organizations defined business strategies based on isolated figures used by isolated teams. Unfortunately for them, that approach just won’t cut it anymore. Instead, we are rapidly moving to a world where data will be the critical asset for businesses, and handling it accurately will be essential to survival.

The Complete Guide to Data Modeling Techniques

Every day, data analysts face the challenge of understanding and creating reports from their data warehouse. This difficulty may stem from the fact that data comes from multiple locations and often does not cohesively align. However, with a few tools at your disposal, you can efficiently create data modeling reports based on your organization's unique business needs. These techniques are some of the most effective data modeling techniques for data analytics teams everywhere.

Apache Ozone - A High Performance Object Store for CDP Private Cloud

As organizations grapple with the explosive growth in data volume they face today, efficient and scalable storage becomes pivotal to operating a successful data platform for driving business insight and value. Apache Ozone is a distributed, scalable, and high performance object store, available with Cloudera Data Platform Private Cloud.

How Do You Choose the Right Data Career Path?

I am a firm believer that data is shaping the world around us, and so I constantly strive to drive awareness about the value of data and how it can bring great benefits to businesses and people. And, as I explained to Joe DosSantos in the latest episode of Data Brilliant, that means helping people identify the right data career path for them, and helping organizations understand the roles and skills they need within their business to help them succeed.

Do you want to get the most out of your HubSpot data?

HubSpot is one of the leading CRMs for fast-growing companies. It allows you to run your marketing and sales pipelines smoothly from a single web application. But there is a problem. HubSpot offers limited tools to track and analyze your data. To answer your most pressing analytical queries you need to work with the raw data that is hidden in your CRM.

The Role of Data in Financial Service Mergers

Ned Lowe, CTO of Singlife, shares how the merger with Aviva Singapore created the need for a scalable data platform and a single view of more than one million customers. In bringing everything together, Singlife uses Snowflake to fuel future growth. Because Singlife operates in a regulated industry, Snowflake’s security and governance features have ensured its customers’ data is protected.

The Ultimate Guide to Building a Data Pipeline

Data is the new oil. Almost every industry is becoming more and more data-driven, and this trend will only continue to grow in the coming years. With so many organizations now relying on data for decision-making, they must be able to easily access and analyze their information through data pipelines. This article will get you started on how to build your own data pipeline.

Announcing CDP Public Cloud Regional Control Plane in Australia and Europe

We’re excited to announce CDP Public Cloud Regional Control Plane in Australia and Europe. This addition will extend CDP Hybrid capabilities to customers in industries with strict data protection requirements by allowing them to govern their data entirely in-region.

Disrupting the credit landscape with data: A Q&A with Credit Karma CTO, Ryan Graciano

Financing in America can be a confusing and complex process. The myriad offerings, rates, and forms are daunting for even the savviest consumer. Credit Karma simplifies the lending process by anonymizing individual borrower data and procuring multiple financing offers depending on what the consumer is looking to finance. Whether it be a sofa, a car, or a house, customers no longer need to fill out multiple forms; Credit Karma is their one-stop credit application.

AWS and CDC: A Dream Team for CDC

According to a study by Seagate, only 32% of data available to enterprises is put to work; the remaining 68% goes unleveraged. One of the challenges noted is making the different silos of collected data available. Using automation to bring together figures from disparate systems helps leaders make confident, reliable decisions backed by real-time information. This overview discusses how to use change data capture (CDC) to enable real-time analysis.
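The core idea behind CDC is to ship only the rows that changed, as insert, update, and delete events, rather than re-copying whole tables. Production CDC tools typically read the source database's transaction log. The minimal sketch below uses a simpler technique, comparing two snapshots keyed by primary key, purely to show the shape of the change events and how a downstream copy replays them. The function names and sample rows are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, int, Optional[Row]]  # (operation, primary key, new row or None)


def capture_changes(before: Dict[int, Row], after: Dict[int, Row]) -> List[Change]:
    """Snapshot diff (not log-based CDC): emit insert/update/delete events."""
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    for key in before:
        if key not in after:
            changes.append(("delete", key, None))
    return changes


def apply_changes(target: Dict[int, Row], changes: List[Change]) -> None:
    """Replay change events against a downstream copy, e.g. an analytics table."""
    for op, key, row in changes:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = row


if __name__ == "__main__":
    before = {1: {"name": "Ana", "tier": "gold"}, 2: {"name": "Ben", "tier": "silver"}}
    after = {1: {"name": "Ana", "tier": "platinum"}, 3: {"name": "Cy", "tier": "bronze"}}
    events = capture_changes(before, after)
    print(events)
    replica = dict(before)
    apply_changes(replica, events)
    print(replica == after)
```

Because only the three events travel downstream, the replica stays current without a full reload, which is what makes CDC suitable for near-real-time analysis.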

7 Tips for Building an API Management Strategy

With technology constantly innovating, various tools and software need to be able to work together. APIs are what make this possible: an API provides software with a way to interact and communicate. This is how different applications connect, whether it's two internal apps or an app integrated into another company's platform. However, with APIs, as with any software integration, many challenges are likely to arise. To mitigate these challenges, an API management strategy should be in place.

Your Parents Still Don't Know What a Hashtag Is. Let's Teach Them the Basics of Machine Learning and Streaming Data

Quite often, the digital natives of the family — you — have to explain to the analog fans of the family what PDFs are, how to use a hashtag, a phone camera, or a remote. Imagine if you had to explain what machine learning is and how to use it. There’s no need to panic. Cloudera produced a series of ebooks — Production Machine Learning For Dummies, Apache NiFi For Dummies, and Apache Flink For Dummies (coming soon) — to help simplify even the most complex tech topics.

How to Turn your Data Center into a True Private Cloud

According to Domo, on average, every human created at least 1.7 MB of data per second in 2020. That’s a lot of data. For enterprises, the net result is an intricate data management challenge that isn't about to get any less complex anytime soon. Enterprises need to find a way of getting insights from this vast treasure trove of data into the hands of the people who need them. For relatively low amounts of data, public cloud is a possible path for some organizations.

What is new in Cloudera Streaming Analytics 1.5?

At the end of May, we released the second version of Cloudera SQL Stream Builder (SSB) as part of Cloudera Streaming Analytics (CSA). Among other features, the 1.4 version of CSA surfaced the expressivity of Flink SQL in SQL Stream Builder by adding DDL and catalog support, and it greatly improved integration with other Cloudera Data Platform components, for example by enabling stream enrichment from Hive and Kudu.

The Road Ahead: Digital Infrastructure For the Data-Driven

When conversations turn to digital transformation, it is usually smartphone apps, dizzying feats of AI, and domestic robots that capture all the attention. But the unsung heroes of I.T. know the truth: digital infrastructure is the foundation on which data-driven transformations are built. And increasingly, we’re talking about two qualities that digital infrastructure must possess if it is to be an effective backbone for digital transformation.

5 Tips for Pushing Data from Your Warehouse to Marketo

Nurturing your customer prospects is crucial to increasing your conversion rates, so how can you make sure that valuable leads don’t fall through the cracks? That’s exactly where software like Marketo comes in. Marketo is a marketing automation tool that helps standardize and automate your marketing funnel, pushing your prospects along the buyer’s journey from awareness to acquisition.

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak Nabu

Modak, a leading provider of modern data engineering solutions, is now a certified solution partner with Cloudera. Through the integration of Cloudera Data Engineering (CDE) with Modak Nabu™, customers can seamlessly automate migration from on-prem deployments to CDP, Cloudera’s cloud-based enterprise platform, and dynamically auto-scale cloud services.

The Rubber-Band Effect: How Organizations Are Catching Up To Themselves

In 2020, in response to the pandemic, we saw an urgent shift to SaaS and various emerging technologies. It was covered at length in “Introducing Trends 2021 – 'The Great Digital Switch'.” Largely driven by necessity, organizations needed to make drastic moves “to keep the lights on” and adapt operations to a more virtual and remote style. This big leap forward dramatically changed the IT landscape and infrastructure of many organizations.

Do you want to create and automate a digital marketing report?

When your marketing team manages a myriad of social media platforms (from Facebook to TikTok), it is hard to keep an eye on the ROI of your marketing efforts. Each platform comes with its own set of dashboards that provide marketing analytics, but the trends and insights from those dashboards are limited to each specific platform. Knowing how you performed on Google Ads tells you very little about whether you should increase your Facebook advertising expenditures.

How to Transfer Data from Postgres to Salesforce

While Salesforce is an excellent tool for storing and managing information about your contacts, leads, accounts, and other metadata, the power of the system lies in its ability to integrate with other applications. One particularly powerful integration brings data from a Postgres database into Salesforce. Read on to learn more about Heroku Postgres, Salesforce, and how to transfer data most efficiently.

Admission Control Architecture for Cloudera Data Platform

Apache Impala is a massively parallel, in-memory SQL engine supported by Cloudera and designed for analytics and ad hoc queries against data stored in Apache Hive, Apache HBase, and Apache Kudu tables. Because it supports powerful queries and high levels of concurrency, Impala can use significant amounts of cluster resources. In multi-tenant environments, this can inadvertently impact adjacent services such as YARN, HBase, and even HDFS.
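Admission control protects a shared cluster by deciding whether an incoming query may run now, must wait in a queue, or is rejected. The class below is a simplified, hypothetical model of that idea, with per-pool limits on running queries, queued queries, and reserved memory. It is not Impala's implementation, and the names `AdmissionController`, `submit`, and `finish` are illustrative only.

```python
from collections import deque


class AdmissionController:
    """Simplified, hypothetical pool-level admission control."""

    def __init__(self, max_running, max_queued, mem_limit_mb):
        self.max_running = max_running
        self.max_queued = max_queued
        self.mem_limit_mb = mem_limit_mb
        self.running = {}     # query_id -> memory reserved (MB)
        self.queue = deque()  # (query_id, mem_mb) pairs waiting to run

    def _fits(self, mem_mb):
        # A query fits if a run slot is free and its memory stays in budget.
        return (len(self.running) < self.max_running
                and sum(self.running.values()) + mem_mb <= self.mem_limit_mb)

    def submit(self, query_id, mem_mb):
        # Only admit directly when nobody is waiting, to keep FIFO order.
        if not self.queue and self._fits(mem_mb):
            self.running[query_id] = mem_mb
            return "admitted"
        if len(self.queue) < self.max_queued:
            self.queue.append((query_id, mem_mb))
            return "queued"
        return "rejected"

    def finish(self, query_id):
        """Release a query's resources and admit waiting queries in order."""
        self.running.pop(query_id)
        admitted = []
        while self.queue and self._fits(self.queue[0][1]):
            qid, mem = self.queue.popleft()
            self.running[qid] = mem
            admitted.append(qid)
        return admitted


pool = AdmissionController(max_running=2, max_queued=1, mem_limit_mb=1000)
print(pool.submit("q1", 400))  # admitted
print(pool.submit("q2", 400))  # admitted
print(pool.submit("q3", 300))  # queued: two queries are already running
print(pool.submit("q4", 100))  # rejected: the queue is full
print(pool.finish("q1"))       # ['q3']
```

Checking `not self.queue` before admitting a new query keeps admission first-in, first-out, so a small late query cannot jump ahead of a larger one that is already waiting.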

Watch now: 3 secrets CROs need to know before going to market

Taking a product, service, or even company to market is the fulcrum that can tip the scales to success or failure for the entire organization. In this session from SaaStr Annual 2021, Talend CRO Ann-Christel Graham will lay out strategies that have shaped Talend's own path to transformation: the focus on continuous and maniacal research and validation of customer needs, focus on the customer lifecycle in its entirety to help create predictable revenue, and the need to plan for scale.

New Report Shares Best Practices for Modern Enterprise Data Management in Multi-Cloud World

A new report from Raconteur highlights the most important trends shaping the future of enterprise data management in 2021. The Raconteur Future of Data Report is packed with valuable insights that reveal how the world’s leading businesses are generating and collecting more data than ever before, and how they’re innovating to make better use of it.

Talend iPaaS momentum grows. Talend recognized in the 2021 Gartner Magic Quadrant for Enterprise iPaaS

As organizations continue to embrace cloud-based computing as the cornerstone of their digital transformation, the integration platform as a service (iPaaS) has become a critical component of their integration environments. An iPaaS solution simplifies the integration of data, applications, and systems, whether in the cloud or on-premises, through unified support for API, application, data, and B2B integration styles.

How Cloudera DataFlow Enables Successful Data Mesh Architectures

In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Within the context of a data mesh architecture, I will present industry settings and use cases where this architecture is relevant and highlight the business value it delivers across business and technology areas.

The Great Data Revolution Is Here, and Qlik Customers Are at the Heart of It

Data is one of the biggest challenges and opportunities we face in our lifetime: the amount we create, how we create it, how it is accessed (think both people and artificial intelligence/machines), and how we use it to inform, propel, and influence everyone and everything. And it’s driving enormous change.

The Data Chief Live: Beyond the Buzz in Data Mesh, Lakehouse, Data Warehouse

Join The Data Chief Live on October 7 to go beyond the buzz on all things data mesh, lakehouse, and data warehouse. Gain clarity on what is hype, what is real, and how others are delivering business value faster with modern data platforms and processes. You'll hear live from Darren Pedroza, VP Enterprise Data and Analytics, First Command Financial Services, Inc.; Zhamak Dehghani, Director of Emerging Technologies at Thoughtworks and author of The Data Mesh; Chris D'Agostino, Global Field CTO at Databricks; and me.

Processing DICOM Files With Spark on CDP Hybrid Cloud

In this video, you will see how you can use PySpark to process medical images from an MRI and convert them from DICOM format to PNG. The data is read from and written to AWS S3, and we use the numpy and pydicom libraries to perform the data transformation. We are using data from the "RSNA-MICCAI Brain Tumor Radiogenomic Classification" Kaggle competition, but this approach can be used for general-purpose DICOM processing.
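The core of a DICOM-to-PNG conversion is mapping raw pixel intensities, often stored with 12 or 16 bits, onto the 0 to 255 range of an 8-bit grayscale image. In a PySpark job the pixels would typically come from pydicom (`pydicom.dcmread(path).pixel_array` returns a NumPy array); the sketch below uses plain nested lists so it runs without extra dependencies. It assumes simple min-max scaling and ignores DICOM rescale and windowing attributes, which the video's code may handle differently; `to_uint8` is a hypothetical helper.

```python
def to_uint8(pixels):
    """Min-max scale a 2-D grid of raw pixel values into the 0..255 range
    used by 8-bit grayscale PNGs."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch; map it to black.
        return [[0 for _ in row] for row in pixels]
    return [[round((value - lo) * 255 / (hi - lo)) for value in row]
            for row in pixels]


# Hypothetical 12-bit sample values.
print(to_uint8([[0, 2048], [1024, 4095]]))  # [[0, 128], [64, 255]]
```

With NumPy, the same scaling is a single vectorized expression over the whole array, which is what you would want inside a Spark task processing many slices.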

Why I joined Continual

Today, I’m excited to share that I’ve joined Continual as Head of Marketing. Continual is radically simplifying the path to operational AI with the first continual AI platform built for the modern data stack. More in a bit on what that means, but the “so what?” is about opening the door for more organizations to embed AI across their business at scale.

How Did You Detect and Handle Change Data Capture from The Source?

“Businesses that manage their data effectively derive unique insights from it and tend to move quicker and be leaders in their industries.” That’s according to Kim Stevenson, Senior Vice President and General Manager of NetApp Foundational Data Services. However, that data is only helpful if you can find what you need when you need it and put it to good use. This guide discusses how change data capture (CDC) provides up-to-date information for real-time analytics.
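Detecting changes at the source can be as simple as comparing two snapshots of a table. The following is a minimal sketch of snapshot-diff change detection, assuming each snapshot is a Python dict keyed by primary key; `diff_snapshots` and the sample rows are hypothetical. Unlike polling a last-modified timestamp, a full comparison also catches deletes, at the cost of reading the whole table each time.

```python
def diff_snapshots(previous, current):
    """Compare two table snapshots keyed by primary key and return
    (operation, key, row) change events."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("insert", key, row))
        elif previous[key] != row:
            events.append(("update", key, row))
    # Keys that disappeared since the last snapshot were deleted at the source.
    for key, row in previous.items():
        if key not in current:
            events.append(("delete", key, row))
    return events


before = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"},
          4: {"email": "d@example.com"}}
after = {1: {"email": "a@example.com"}, 2: {"email": "b2@example.com"},
         3: {"email": "c@example.com"}}

for event in diff_snapshots(before, after):
    print(event)
```

Row 1 is unchanged and produces no event; row 2 yields an update, row 3 an insert, and row 4 a delete.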

Snowflake BUILD 2021: Opening Keynote

Did you miss the Snowflake BUILD Data Cloud Dev Summit keynote, or did you like it so much that you want to watch it again? Well, you’re in the right place. Join our SVP of Engineering and Support, Greg Czajkowski, as he kicks off this year’s Snowflake BUILD, shares his vision for the Data Cloud and the opportunity it presents for developers, and features unique applications built on Snowflake. Greg will be joined on stage by Chris Child, Sr. Director of Product, who will highlight the recently launched Powered by Snowflake partner program, including interviews with SK, VideoAmp, and Human, who will share their Snowflake stories. Chris will also be joined by Snowflake product experts to make exciting announcements and demo cool new products for developers.

5 Common API Management Tools and their Usage

Customers today expect feature-rich and user-friendly access to technology that makes their lives easier. They expect to use this technology to engage with companies anywhere, anytime, and with any device. Organizations must be able to not just meet, but exceed, these expectations. Failing to do so could result in lost revenue as customers move on to competitors.

Struggling to Manage your Multi-Tenant Environments? Use Chargeback!

If your organization is using multi-tenant big data clusters (and everyone should be), do you know how much of the cluster's resources each tenant uses, and how cost-efficiently? A chargeback or showback model allows IT to attribute costs and resource usage to the actual analytic users in the multi-tenant cluster, instead of to the platform (“overhead”) or the IT department. This lets you know the individual cost per tenant and set limits to control overall costs.
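A chargeback or showback model ultimately comes down to attributing a shared bill to tenants according to what they consumed. The sketch below assumes a single usage metric (vCore-hours) and a fixed cluster cost, both hypothetical; real models often blend CPU, memory, and storage. `chargeback` and `over_budget` are illustrative helpers, not part of any Cloudera tool.

```python
def chargeback(usage_by_tenant, total_cost):
    """Split a cluster's total cost across tenants in proportion to usage."""
    total_usage = sum(usage_by_tenant.values())
    if total_usage == 0:
        return {tenant: 0.0 for tenant in usage_by_tenant}
    return {tenant: round(total_cost * used / total_usage, 2)
            for tenant, used in usage_by_tenant.items()}


def over_budget(costs, budgets):
    """Return the tenants whose allocated cost exceeds their budget."""
    return sorted(tenant for tenant, cost in costs.items()
                  if cost > budgets.get(tenant, float("inf")))


# Hypothetical monthly vCore-hours per tenant and a $5,000 cluster bill.
usage = {"marketing": 300, "finance": 100, "data_science": 600}
costs = chargeback(usage, 5000)
print(costs)  # {'marketing': 1500.0, 'finance': 500.0, 'data_science': 3000.0}
print(over_budget(costs, {"marketing": 1000, "finance": 1000,
                          "data_science": 3500}))  # ['marketing']
```

Tenants without a budget entry are treated as unlimited here; a stricter policy could flag them instead.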

An Introduction to Ranger RMS

Cloudera Data Platform (CDP) has supported access controls on tables and columns, as well as on files and directories, via Apache Ranger since its first release. It is common to have different workloads using the same data: some require authorization at the table level (Apache Hive queries) and others at the level of the underlying files (Apache Spark jobs). Unfortunately, in such cases you would have to create and maintain separate, corresponding Ranger policies for Hive and HDFS.
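The underlying idea is that a single table-level policy can also govern direct file access once each file path is mapped back to the table that owns it. The sketch below is a conceptual, hypothetical illustration of that mapping, not Ranger RMS's implementation or API: `TABLE_LOCATIONS`, `TABLE_READERS`, and the helper functions are invented for the example.

```python
import posixpath

# Hypothetical mapping from Hive tables to their storage locations. A mapping
# service would keep this in sync with the metastore; here it is hard-coded.
TABLE_LOCATIONS = {
    "sales.orders": "/warehouse/sales.db/orders",
    "hr.salaries": "/warehouse/hr.db/salaries",
}

# Hypothetical table-level policies: table -> users allowed to read it.
TABLE_READERS = {
    "sales.orders": {"analyst1", "etl_job"},
    "hr.salaries": {"hr_admin"},
}


def table_for_path(path):
    """Resolve an HDFS file path to the table whose location contains it."""
    path = posixpath.normpath(path)
    for table, location in TABLE_LOCATIONS.items():
        # Require a path separator so "orders_archive" does not match "orders".
        if path == location or path.startswith(location + "/"):
            return table
    return None


def can_read_file(user, path):
    """Authorize file access (e.g. from a Spark job) using the table policy."""
    table = table_for_path(path)
    if table is None:
        return False  # unmapped path: a real system would check file policies
    return user in TABLE_READERS.get(table, set())


print(can_read_file("etl_job", "/warehouse/sales.db/orders/part-0000.parquet"))  # True
print(can_read_file("etl_job", "/warehouse/hr.db/salaries/part-0000.parquet"))   # False
```

Because both Hive queries and file reads resolve to the same `TABLE_READERS` entry, there is only one policy to maintain per table.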

Meet The Analyst - and Data Scientist - of the Future: Investec's Quaanitah Manique and Keotshepile Mosito

This blog is part of our ongoing "meet the analyst of the future" series, which profiles analysts who are transforming their organizations and supercharging their careers by embracing the future of analytics today.

Lenses.io joins forces with Celonis to bring streaming data to business execution

Today, I’m thrilled to announce that Lenses.io is joining Celonis, the leader in execution management. Together we will raise the bar in how businesses are run by driving them with real-time data, making the power of streaming open, operable and actionable for organizations across the world. When Lenses.io began, we could never have imagined we’d reach this moment.

Introducing the Kafka to Celonis Sink Connector

Apache Kafka has grown from an obscure open-source project into a mass-adopted streaming technology, supporting all kinds of organizations and use cases. Many began their Apache Kafka journey by feeding a data warehouse for analytics, then moved on to building event-driven applications and breaking down entire monoliths. Now, we move to the next chapter. Joining Celonis means we’re pleased to open up the possibility of real-time process mining and business execution with Kafka.

5 Tips for Pushing Data From Your Warehouse to Zendesk

For decades, ETL (extract, transform, load) has been a mainstay of organizations’ data integration workflows, moving information from various sources to a centralized data warehouse. More recently, however, businesses have been shaking things up with reverse ETL, which flips the sources and targets by sending data from your warehouse to third-party systems like Zendesk.
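In reverse ETL the warehouse becomes the source: modeled tables are read and reshaped into records that the operational tool understands. The sketch below shows only that extract-and-shape step, assuming a hypothetical `customer_summary` table and an in-memory SQLite database standing in for the warehouse. The payload fields are illustrative and are not Zendesk's API schema; actually sending them would require the target system's own client or REST API.

```python
import sqlite3


def build_sync_payloads(conn):
    """Reverse ETL extract step: read modeled rows from the warehouse and
    shape them into records for a downstream tool (hypothetical fields)."""
    rows = conn.execute(
        "SELECT email, plan, lifetime_value FROM customer_summary ORDER BY email"
    ).fetchall()
    return [
        {"email": email, "custom_fields": {"plan": plan, "lifetime_value": ltv}}
        for email, plan, ltv in rows
    ]


# An in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_summary (email TEXT, plan TEXT, lifetime_value REAL)"
)
warehouse.executemany(
    "INSERT INTO customer_summary VALUES (?, ?, ?)",
    [("b@example.com", "free", 0.0), ("a@example.com", "pro", 1200.0)],
)

for payload in build_sync_payloads(warehouse):
    # A real sync would send each payload to the target system's API here.
    print(payload)
```

Keeping the shaping logic separate from the sending step makes it easy to test the mapping without touching the third-party system.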

What Is a Citizen Integrator?

Organizations at the forefront of their field leverage speed and agility to maintain their competitive edge. The key factor driving this agility is collaboration, which can only be achieved by providing every department with access to the data they need to perform their jobs more efficiently. According to Gartner, digital optimization blurs the classic ownership and activity boundaries within and beyond IT.

Xplenty Acquires FlyData, Adding Data Replication To Our Product Suite

We're excited to announce that Xplenty has acquired the fastest real-time data replication platform on the market, FlyData. This further expands our product suite and adds database replication to our customer offering. This was the most requested feature from our customers, so we're happy to announce that we now have CDC (change data capture) capabilities.

Our reflections on the 2021 Gartner Magic Quadrant for Data Quality Solutions

Success for any business starts with data that is easily discoverable, understandable, and of value to the people who need it. We call this type of data “healthy data.” You should look at a wide set of measures and metrics to determine whether data is healthy or not, but at the core of all healthy data is a high level of quality.

Four Pillars of an Agile Data Infrastructure

Forbes Insights defines the modernized data center as being built to change just as much as it is built to last. One of the key pillars of a modernized data center is an agile data infrastructure. The Forbes Insights briefing explains, “This means it’s not wedded to any specific deployment method or solution set.”

Business Forecasting with Cosmo, Chief Destiny Officer

For background on Cosmo, Chief Destiny Officer, and his alternative methods, read the earlier posts in this series about the role of a CDO and customer segmentation. Predicting the future always sounds a little magical, but we were intrigued to meet a CDO who says he actually makes business forecasts using magic instead of modeling. While at Talend we work exclusively with healthy data, we’ve always wondered what goes on at organizations that don’t rely on data for business decisions.

Everything You Need to Know About Netsuite and Salesforce Integration

Netsuite and Salesforce are both prominent SaaS providers. The two cloud-based software companies are home to powerful marketing automation, sales automation, and customer service tools. While Netsuite and Salesforce are both powerful as standalone entities, their tools become even more useful to companies when they are used together. That’s why finding the perfect data integration system to connect Netsuite and Salesforce is essential to your business.

Solving the Cloud Cost Paradox

“You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it,” quips a recent blog post from Andreessen Horowitz. Many companies are surprised by cloud bills that have shot past their expectations. Indeed, IDC predicts increased investment in public cloud cost management through 2022 as enterprises seek to cut cloud waste by 50%. This is the Cloud Cost Paradox.

Closing the Gap Between Data and Action - All in One Cloud

A favorite moment of mine is when I get to share Qlik’s vision for Active Intelligence with a customer for the first time. It usually goes like this: genuine excitement about the possibility – taking informed action in the moment from real-time data…invariably followed by many questions – where do I begin? What do I need? What about the tech stack I have already acquired?