Taming the HiPPO with data-driven decision making
What is the HiPPO (the Highest Paid Person's Opinion)? Learn how organizations can overcome this phenomenon through data-driven decisions.
Data fabric might seem like yet another data management innovation you have to learn about, but this one is different from the rest. While technology continues to evolve, the data problems that enterprises face are in many cases getting worse over time: ever more data is permeating the business.
We are announcing BigQuery ML inference engine, which allows you to run predictions on a broad range of models hosted across multiple locations.
New BigQuery pricing editions and autoscaling let you choose the right price-performance for your workloads, and pay for only what you use.
BigQuery data clean rooms can help organizations create and manage secure environments for privacy-centric data sharing, analysis, and collaboration.
Data management in the modern enterprise requires skill and stamina. With the mountainous rise in data volume and the complexity of managing disparate data sources across a wide variety of legacy systems and hybrid cloud environments, it can seem virtually impossible to climb past enterprise data obstacles. However, data fabric technology eliminates some of those obstacles. Is it the answer for your data strategy?
Your team needs the latest, cutting-edge upgrades for reporting so they can communicate and collaborate via a central platform. How can you ensure they can access real-time data and build the reports they need? Keep reading to see some benefits that upgrading to Hubble Enterprise can provide. Hubble Enterprise allows existing Hubble customers to achieve their business goals with easy, immediate access to business-critical data.
From DIY opportunity cost and pipeline maintenance to moving and transforming data, here’s how to judge the cost-effectiveness of ELT.
If you have managed a cloud data platform, you have undoubtedly gotten that call. You know the one: it's usually from finance or the office of the CFO, inquiring about your monthly spend. And it usually comes in one of two forms. While both are clear and present dangers to cloud data platform owners, they don't have to be.
We recently rolled out our very own GPU autoscaler in collaboration with Genesis Cloud, and it has been quite a success. Also recently, Ultralytics unveiled YOLOv8, the new king of object detection, segmentation, and classification. In this blog post we'll see that you can train a computer vision model using the ClearML/Genesis Cloud autoscaler at a fraction of the cost of competing cloud services like AWS or GCP. And it even runs 100% on green energy! 😎
As announced at Snowflake Summit 2022, Iceberg Tables combines unique Snowflake capabilities with Apache Iceberg and Apache Parquet open source projects to support your architecture of choice. As part of the latest Iceberg release, we’ve added catalog support to the Iceberg project to ensure that engines outside of Snowflake can interoperate with Iceberg Tables.
Are you considering venturing into the world of analytics engineering? Analytics engineers are the newest addition to data teams and sit somewhere between data engineers and data analysts. They are technical, business savvy, and love to learn. A huge part of an analytics engineer’s role is learning new modern data tools to implement within data stacks.
The evolution of healthcare has come a long way since local physicians made house calls and homespun remedies were formulated using items from the kitchen spice rack. Today's healthcare is driven as much by the promise of emerging technologies centered on data processing and advanced analytics as by the development of new and specialized drugs.
It’s been a decade since “connected” objects—commonly referred to as “the internet of things” (IoT)—reached broad audiences. Connected toothbrushes, sensors embedded in sneakers, and smart watches have started to change consumer behavior through a data-driven, gamified approach. Technology has rapidly evolved to handle large data volumes at high velocity and to support big data analytics. AI has become more democratized.
The best description of untrusted data I’ve ever heard is, “We all attend the QBR – Sales, Marketing, Finance – and present quarterly results, except the Sales reports and numbers don’t match Marketing numbers and neither match Finance reports. We argue about where the numbers came from, then after 45 minutes of digging for common ground, we chuck our shovels and abandon the call in disgust.” How would you go about fixing that situation?
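One pragmatic first step is to stop debating live and reconcile the conflicting reports programmatically, surfacing exactly which figures disagree. The sketch below is illustrative only; the team names and revenue figures are hypothetical.

```python
# Illustrative sketch: reconcile quarterly figures reported by two teams
# and surface the discrepancies instead of arguing about them on the call.
# All figures and team names are hypothetical.

def reconcile(report_a: dict, report_b: dict, tolerance: float = 0.01):
    """Return (key, value_a, value_b) tuples wherever the reports disagree."""
    mismatches = []
    for key in sorted(set(report_a) | set(report_b)):
        a = report_a.get(key)
        b = report_b.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches.append((key, a, b))
    return mismatches

sales = {"Q1": 1_200_000, "Q2": 1_350_000, "Q3": 1_500_000}
finance = {"Q1": 1_200_000, "Q2": 1_290_000, "Q3": 1_500_000}

for quarter, s, f in reconcile(sales, finance):
    print(f"{quarter}: Sales reports {s}, Finance reports {f}")
```

Running the diff before the QBR turns a 45-minute argument into a short list of known discrepancies to investigate at the source.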
Cloudera SQL Stream Builder (SSB) gives the power of a unified stream processing engine to non-technical users so they can integrate, aggregate, query, and analyze both streaming and batch data sources in a single SQL interface. This allows business users to define events of interest for which they need to continuously monitor and respond quickly. There are many ways to distribute the results of SSB’s continuous queries to embed actionable insights into business processes.
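SSB expresses these continuous queries in SQL, but the underlying pattern — incrementally aggregating an unbounded stream and flagging events of interest as they occur — can be sketched in plain Python. The event schema and threshold below are hypothetical, not SSB APIs.

```python
# Minimal sketch of the continuous-monitoring pattern a streaming SQL engine
# implements: consume an unbounded stream of events, keep a running aggregate
# per key, and emit an alert the first time a key crosses a threshold.
# The (key, amount) schema and the threshold are hypothetical.
from collections import defaultdict

def monitor(events, threshold):
    """Yield (key, running_total) the first time a key's total exceeds threshold."""
    totals = defaultdict(int)
    alerted = set()
    for key, amount in events:
        totals[key] += amount
        if totals[key] > threshold and key not in alerted:
            alerted.add(key)
            yield key, totals[key]

stream = [("sensor-a", 40), ("sensor-b", 10), ("sensor-a", 70), ("sensor-b", 20)]
alerts = list(monitor(stream, threshold=100))  # alerts fire as events arrive
```

The point of a tool like SSB is that business users get this behavior by writing a SQL statement rather than maintaining code like the above.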
Instead of relying on inaccurate, messy, and low-quality data, business leaders can drive innovation forward by turning to automated data management, creating a clean and steady data flow.
Recently, I published an article on whether self-service BI is attainable, and spoiler alert: it certainly is. Of course, anything of value usually does require a bit of planning, collaboration, and effort. After the article was published, I began having conversations with technical leaders, analysts, and analytics engineers, and the topic of data modeling for self-service analytics came up repeatedly.
As technology advances and digitization takes over, there is an expectation that our lives will become simpler. ‘Self-service’ capabilities like self-service BI are the manifestation of this expectation within many technologies. For most, ease of use is no longer enough. Now tools must be simple to use, yet flexible enough to cater to a wide range of skill levels and depths of analysis.
Over the past handful of years, systems architecture has evolved from monolithic approaches to applications and platforms that leverage containers, schedulers, lambda functions, and more across heterogeneous infrastructures. Cloudera Data Platform (CDP) is no different: it’s a hybrid data platform that meets organizations’ needs to get to grips with complex data anywhere, turning it into actionable insight quickly and easily.
How is the modern data stack evolving and what will it look like in the future? Experts from Andreessen Horowitz, Accenture and Fivetran weigh in.
The value of embedded analytics is unmistakable. Application teams that embed dashboards and reports drive revenue, reduce customer churn, and differentiate their software from the competition. While embedded dashboards create real value, they can also come with real costs. These costs are not always visible when companies plan for their analytics offering but can significantly impact production, scale, and the speed of bringing analytics to market.
Why and how Fivetran’s usage-based pricing model ties cost to value for our customers.
DuckDB is competitive with the best commercial systems for small and medium data sizes, but is not (yet) good at scaling up to many CPU cores.
Learn how leading companies such as Snowflake, SpotOn and Red Ventures have seen massive analytics efficiencies and boosted ROI using modern data tools, including Fivetran and dbt.
BigQuery’s serverless architecture features storage and query optimizations that deliver transformational data analytics performance.
As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics. ChatGPT is an excellent resource for gaining high-level insights and building awareness of any technology. However, caution is necessary when delving deeper into a particular technology.
Enterprise data. Whether you’re ready or not, it’s growing more and more important to organizational success every day. Business decisions, business insights, and business intelligence all depend on having the right data at the right time.
The Stitch team has released many new capabilities over the last few months, and we're excited to share all the new features and improvements with you. In this blog post, we'll take a closer look at what's new in this latest release and how it can benefit you. Here's a quick overview of what's new in Winter '23 for Stitch: Let's dive into each of these new capabilities in more detail.
ServiceNow, Inc. offers a well-known SaaS application, with companies in multiple industries using it to help manage digital workloads for a variety of departments and operations. What if it was as easy as just a few clicks to get ServiceNow data directly into your Snowflake account so you could combine it with other data sources, including ERP, HR, and CRM systems? Well, now it is.
In this post, I will demonstrate how to use the Cloudera Data Platform (CDP) and its streaming solutions to set up reliable data exchange in modern applications between high-scale microservices, and ensure that the internal state will stay consistent even under the highest load.
With the advent of cloud services, IT is transforming and evolving from being traditionally data center-centric to data-centric. The data center is no longer a physical location. It extends beyond the walls of the enterprise to the cloud and to the edge, where the majority of data is generated.
Whether you call it self-service analytics or self-service business intelligence (BI), there has been much discussion about the perils, myths, promises, and prospects of successfully building self-service capability. Going forward, I’ll use the phrase “self-service BI” but you are welcome to substitute the words “self-service analytics”. So, is self-service BI actually attainable or just snake oil?
A guide to finding the happy balance between enough governance to tame data chaos and enough self-service to empower stakeholders.
We are thrilled to announce that the new DataFlow Designer is now generally available to all CDP Public Cloud customers. Data leaders will be able to simplify and accelerate the development and deployment of data pipelines, saving time and money by enabling true self service.
We just announced the general availability of Cloudera DataFlow Designer, bringing self-service data flow development to all CDP Public Cloud customers. In our previous DataFlow Designer blog post, we introduced you to the new user interface and highlighted its key capabilities. In this blog post we will put these capabilities in context and dive deeper into how the built-in, end-to-end data flow life cycle enables self-service data pipeline development.
How Sift delivers fraud detection workflow backtesting at scale, powered by BigQuery.
Effective data governance builds a culture of trust and collaboration around data.
You don’t need to be a data science expert to make use of platform analytics to gain better insights into your business. From Google Analytics to deep-dive data analytics for your APIs, big data is your friend when it comes to understanding your company. The right analytics platform can open up your data in a user-friendly way that empowers you to fuel better-informed business decisions based on your performance, customer data and analytics.
The data we generate, store, and share is growing exponentially as the world inexorably digitizes. With the global data sphere expected to double in size by 2026 as organizations and consumers increasingly go online, automate, and digitize processes, the right tools are required to mine this massive trove of valuable data coming from a widening and diverse pool of sources globally. The competitive edge gained by rapidly converting complex data into business insights is a crucial growth driver.
The data catalog is a critical step in the movement toward becoming a data-driven business. Here are 4 table-stakes questions to ask yourself to deliver reliable, trusted data.
When was the last time you had all the data you needed to make a business decision? Hopefully, it was today. But if it wasn’t, you’re not alone. It’s increasingly difficult for people inside enterprise organizations to harness the power of their data. Even though good decisions are nearly impossible without good data, getting data into the right hands at the right time is easier said than done.
Recently I read a very informative article by Stephen Catanzano in TechTarget (Avoid data sprawl with a single source of truth). To be honest, this is an age-old challenge, and it's getting worse. IDC states that by 2025 the global datasphere will grow to 175 zettabytes and that 90% of the data in the world is a replica. Why does this matter? As Stephen points out in his article, a single source of truth is a fundamental concept in data management.
Today we’re excited to announce ThoughtSpot Sage, our new search experience that combines the power of GPT’s natural language processing and generative AI capabilities with the accuracy and security of our patented self-service analytics platform. With this new integration, data teams will be able to exponentially increase their impact across an organization as business users self-serve personalized, actionable, and trustworthy insights like never before.
When I was working at Google back in the mid-2000s, we dealt with tens of billions of ad impressions a day, trained several machine learning models on years' worth of historic data, and used frequently updated models in ranking ads. The whole system was an amazing feat of engineering, and there was no system out there that was even close to handling this much data. It took us years and hundreds of engineers to make this happen; today, the same scale can be achieved in any enterprise.
Every so often, different advocates across organizations ignore the Voice of the Customer. This may be due to changes in business priorities, redistribution of resources, a focus on new trends, or simply because they turn a profit regardless. This brings the value of the customer's voice into question: should we still allocate time and effort towards listening to customers when following new trends is the norm? The short answer is a resounding yes.
Data integration pipelines supply valuable data from producers to consumers, but even the best pipelines can break. Now what?
Where your ELT provider normalizes your data can dramatically increase or decrease your compute costs.
Most organizations spend at least 37% (sometimes over 50%) more than they need to on their cloud data workloads. A lot of costs are incurred down at the individual job level, and this is usually where there's the biggest chunk of overspending. Two of the biggest culprits are oversized resources and inefficient code. But for an organization running tens or hundreds of thousands of jobs, finding and fixing bad code or right-sizing resources by hand is shoveling sand against the tide.
Retail companies can easily visualize and analyze their geospatial data in BigQuery using the CARTO platform.
Today, data is the lifeblood of every organization. Without it, you’re left without key insights into business processes, consumers, employees, and the overall health of your entire organization. And here’s the scariest part: most organizations are on life support due to their inability to properly manage their data and turn it into actionable insights. Data fabric technology can help in a significant way, which is why it’s been getting so much buzz recently.
Unify data from diverse sources and formats using Aiven Kafka's open source streaming. Analyze with BigQuery for swift, accurate insights.
Elements of the Google Data Cloud including BigQuery, DataFlow, and Vertex AI are behind Glean’s personalized enterprise search platform.
Data fabrics are getting a lot of attention lately, and for good reason. But, for any topic with a lot of hype, there also tends to be a lot of confusion. If you are still trying to fully grasp where the concept of a data fabric architecture fits amongst all of the warehouses, lakes, lakehouses, and meshes of the data engineering world, let's set the record straight. What is a data fabric? A data fabric is a toolset that connects data across disparate sources to create a unified data model.
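That definition — connecting disparate sources into a unified model — can be made concrete with a toy example. The sketch below is illustrative only: the source systems, field names, and records are hypothetical, and a real data fabric does this through metadata-driven tooling rather than hand-written adapters.

```python
# Toy illustration of the "unified data model" idea behind a data fabric:
# records from disparate sources are normalized into one common schema.
# Source systems, field names, and records are all hypothetical.

def from_crm(record: dict) -> dict:
    """Map a CRM-shaped record onto the unified schema."""
    return {"customer_id": record["id"],
            "name": record["full_name"],
            "source": "crm"}

def from_billing(record: dict) -> dict:
    """Map a billing-shaped record onto the unified schema."""
    return {"customer_id": record["cust"],
            "name": record["customer_name"],
            "source": "billing"}

crm_rows = [{"id": 7, "full_name": "Ada Lovelace"}]
billing_rows = [{"cust": 7, "customer_name": "Ada Lovelace"}]

# One consistent view over both sources, queryable with a single schema.
unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

A consumer of `unified` no longer needs to know which warehouse, lake, or SaaS application each record came from — which is the promise the data fabric architecture generalizes.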
Monthly active rows enable data professionals and businesses of all sizes to maximize the value of Fivetran.
Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. Iceberg is a high-performance open table format for huge analytic data sets. It allows multiple data processing engines, such as Flink, NiFi, Spark, Hive, and Impala to access and analyze data in simple, familiar SQL tables.
As a product manager, there are many characteristics that you need to embody to successfully manage a product team and launch a product. The product development process requires a lot of upfront planning before it even moves to production. And even then, the actual management of the design and development process has a greater set of requirements.
Starting March 1, 2023, users can access HVR capabilities rebranded as Fivetran Local Data Processing as well as deploy Fivetran in the cloud, on-premises and in hybrid environments.
A well-designed graphical user interface helps you get the most out of any application and improves your efficiency with data streaming. A UI should guide you through the steps of an often-complex flow, serving as the visible layer between your problem and its solution. Even the most hardcore back-end enthusiasts will admit its significance is undeniable for a complete product. It has to be well organized and easy to understand, yet provide the right tools in the right place.