
May 2024

Contributing to Apache Kafka: How to Write a KIP

I’m brand new to writing KIPs (Kafka Improvement Proposals). I’ve written two so far, and my hands sweat every time I hit send on an email with ‘KIP’ in the title. But I’ve also learned a lot from the process: about Apache Kafka internals, the process of writing KIPs, the Kafka community, and the most important motivation for developing software: our end users. What did I actually write? Let’s review KIP-941 and KIP-1020.

Product Update: Boost Databricks productivity, performance, and efficiency

Today, 65% of IT decision-makers believe their company is falling behind the competition in using data and analytics. Why? Organizations want real-time insights, fraud/anomaly detection, trend analysis, and systems monitoring. The good news: data teams that use DataOps practices and tools will be 10 times more productive. With this in mind, Unravel is hosting a live event to share new capabilities to help you achieve productivity, performance, and cost efficiency with Databricks’ Data Intelligence Platform.

Exploring Data Provenance: Ensuring Data Integrity and Authenticity

Data provenance is a method of creating a documented trail that accounts for data’s origin, creation, movement, and dissemination. It involves storing the ownership and process history of data objects to answer questions like, “When was the data created?”, “Who created it?” and “Why was it created?” Data provenance is vital in establishing data lineage, which is essential for validating, debugging, auditing, and evaluating data quality and determining data reliability.
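As a rough illustration (not taken from the article, and with hypothetical field names), a provenance trail can be modeled as an append-only history of who did what to a data object, when, and why:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """One entry in a data object's documented trail (illustrative fields)."""
    actor: str       # who created or transformed the data
    action: str      # e.g. "created", "transformed", "moved"
    timestamp: str   # when the action happened
    reason: str      # why it was done

@dataclass
class DataObject:
    name: str
    history: list = field(default_factory=list)

    def record(self, actor, action, reason):
        # Append an audit entry rather than overwriting state, so the
        # full ownership and process history is preserved.
        self.history.append(ProvenanceRecord(
            actor=actor,
            action=action,
            timestamp=datetime.now(timezone.utc).isoformat(),
            reason=reason,
        ))

orders = DataObject("orders.csv")
orders.record("etl-job-7", "created", "nightly export from OLTP store")
orders.record("analyst-a", "transformed", "deduplicated rows for reporting")
# orders.history now answers "who", "when", and "why" for each step.
```

Real provenance systems add tamper-evidence and lineage links between objects; the point here is only the shape of the record.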

What Is Metadata and Why Is It Important?

Metadata refers to the information about data that gives it more context and relevance. It records essential aspects of the data (e.g., date, size, ownership, data type, or other data sources) to help users discover, identify, understand, organize, retrieve, and use it—transforming information into business-critical assets. Think of it as labels on a box that describe what’s inside. Metadata makes it easier to find and utilize the data that you need. Typical metadata elements include creation date, file size, ownership, data type, and source.
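To make the "labels on a box" idea concrete, here is a small sketch (illustrative keys, not from the article) of a metadata record and how it supports discovery without opening the underlying file:

```python
# Typical metadata elements for a document, as a simple key-value record.
doc_metadata = {
    "title": "quarterly_report.pdf",    # what the data is called
    "author": "finance-team",           # who owns or created it
    "created": "2024-05-01",            # when it was produced
    "size_bytes": 482_133,              # how big it is
    "format": "application/pdf",        # what type of data it holds
    "source": "erp-export",             # where it came from
    "tags": ["finance", "q1", "2024"],  # labels that aid discovery
}

# Metadata enables search and retrieval without touching the data itself:
def find_by_tag(records, tag):
    return [r["title"] for r in records if tag in r.get("tags", [])]

matches = find_by_tag([doc_metadata], "finance")
# matches == ["quarterly_report.pdf"]
```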

5 Ways Advertising, Media and Entertainment Companies are Using Gen AI

The emergence of generative AI (gen AI) heralds a new, groundbreaking era for advertising, media and entertainment. According to a recent Snowflake report, Advertising, Media and Entertainment Data + AI Predictions 2024, gen AI is going to transform the industry — from content creation to customer experience. The companies that will come out ahead during this time are those that most successfully and quickly supercharge their data strategy.

What Separates Hybrid Cloud and 'True' Hybrid Cloud?

Hybrid cloud plays a central role in many of today’s emerging innovations—most notably artificial intelligence (AI) and other emerging technologies that create new business value and improve operational efficiencies. But getting there requires data, and a lot of it. More than that, though, harnessing the potential of these technologies requires quality data—without it, the output from an AI implementation can end up inefficient or wholly inaccurate.

Insight With Eyesight: Qlik Introduces a New Era of Visualization

Our ability to tell stories is an art form as old as language itself. From ancient cave paintings to oral traditions passed through generations, the essence of stories has evolved alongside our communication methods. It began with visual tales etched on cave walls, transitioned into spoken narratives, and eventually found its way into written, printed, and typed forms.

What is Metadata Management? Benefits, Framework, Tools, Use Cases, Best Practices

Before shedding light on metadata management, it is crucial to understand what metadata is. Metadata refers to the information about your data: elements that describe its context, content, and characteristics. It helps you discover, access, use, store, and retrieve your data, and it comes in many variations. Let’s look at some of the metadata types below.

Graph API: Boost Your Data Skills

In today's data-driven world, the ability to seamlessly connect, manage, and manipulate vast amounts of data is paramount for businesses and developers alike. Graph API stands at the forefront of this technological frontier, offering robust tools that facilitate complex data interactions within applications. This powerful API provides a framework for accessing and integrating data points in an intuitive and effective manner, supporting dynamic data structures across various platforms.

Analyzing AWS Audit Logs in Real Time Using Confluent Cloud and Amazon EventBridge

Last year, we introduced the Connect with Confluent partner program, enabling our technology partners to develop native integrations with Confluent Cloud. This gives our customers access to Confluent data streams from within their favorite applications and allows them to extract maximum value from their data.

Preserving Data Privacy in Life Sciences: How Snowflake Data Clean Rooms Make It Happen

The pharmaceutical industry generates a great deal of identifiable data (such as clinical trial data and patient engagement data) that has guardrails around “use and access.” Data captured for the intended purpose of use described in a protocol is called “primary use.” However, once anonymized, this data can be used for other inferences in what we can collectively define as secondary analyses.

Q&A: Events, Eventing, and Event-Driven Architectures | The Duchess & The Doctor Show

For their inaugural episode, Anna McDonald (the Duchess), Matthias J. Sax (the Doctor), and their extinct friend, Phil, wax rhapsodic about all things eventing: you’ll learn why events are a mindset, why the Duchess thinks you’ll find event immutability relaxing, and why your event streams might need some windows. The Duchess & The Doctor Show features a question-driven format that delivers substantial, yet easily comprehensible answers to user-submitted questions on all things events and eventing, including Apache Kafka, its ecosystem, and beyond!

Revolutionizing The Data Cloud With Snowflake CEO Sridhar Ramaswamy

To kick off the fifth season of "The Data Cloud Podcast," host Steve Hamm is joined by Snowflake CEO Sridhar Ramaswamy. In this episode, Sridhar explains why organizations need to have a data strategy in order to implement a successful AI strategy. He also discusses the steps involved in creating a foundation model from scratch and why he believes AI is the glue that will bind enterprise software together.

Accelerate Your Time Series Analytics with Snowflake's ASOF JOIN, Now Generally Available

Time series data is everywhere. It captures how systems, behaviors and processes change over time. Enterprises across industries, such as Internet of Things (IoT), financial services, manufacturing and more, use this data to drive business and operational decisions. When using time series data to perform analytics and drive decisions, it’s often necessary to join several data sets.
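The article covers Snowflake's SQL syntax, but the underlying idea of an ASOF join is easy to sketch in plain Python (an illustration, not Snowflake's implementation): each row on the left is matched with the most recent right-hand row at or before its timestamp.

```python
from bisect import bisect_right

def asof_join(left, right, key):
    """Attach to each left row the latest right row with right[key] <= left[key].
    Both inputs must be sorted by `key`. A minimal sketch of the ASOF pattern."""
    right_keys = [r[key] for r in right]
    out = []
    for row in left:
        i = bisect_right(right_keys, row[key]) - 1  # last right row not after this one
        match = right[i] if i >= 0 else {}
        out.append({**match, **row})                # left columns win on conflict
    return out

# Trades and quotes rarely share exact timestamps, which is why an equality
# join fails and an ASOF join is needed.
trades = [{"ts": 3, "price": 101.2}, {"ts": 7, "price": 101.5}]
quotes = [{"ts": 1, "bid": 101.0}, {"ts": 5, "bid": 101.3}]
joined = asof_join(trades, quotes, "ts")
# joined[0]["bid"] == 101.0 (quote at ts=1), joined[1]["bid"] == 101.3 (quote at ts=5)
```

In a dataframe setting the same pattern is available as `pandas.merge_asof`.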

All You Need to Know About Data Aggregation

Data aggregation is the process of combining and summarizing data from disparate sources into a cohesive dataset. It prepares data for analysis, making it easier to identify patterns and insights that aren’t observable in isolated data points. Once aggregated, data is generally stored in a data warehouse. Then, you can leverage it to gain a holistic perspective on your operations and market trends, design effective risk management practices, and make more informed decisions overall.
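As a toy illustration of the combine-and-summarize step (invented sample data, not from the article), here two sales feeds are merged and rolled up per region:

```python
from collections import defaultdict

# Two disparate sources reporting the same kind of fact.
web_sales = [("north", 120), ("south", 80)]
store_sales = [("north", 200), ("south", 150), ("west", 90)]

def aggregate(*sources):
    totals = defaultdict(int)
    for source in sources:
        for region, amount in source:
            totals[region] += amount  # combine sources, summarize per region
    return dict(totals)

summary = aggregate(web_sales, store_sales)
# summary == {"north": 320, "south": 230, "west": 90}
```

Neither source alone shows that "north" outsells "south" overall; that pattern only appears in the aggregate, which is the point the paragraph makes.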

Core Infrastructure Requirements for Today's Data Workloads

There's no doubt that, as a technology provider/integrator, you're likely seeing many customers across all segments looking to advanced analytics and artificial intelligence to optimize their growth. Given the vast volume of data that these innovations consume or create, it's clearly important to be able to offer your customers reliable, secure, sustainable, and scalable data infrastructure solutions.

Demo | Snowflake Data Clean Rooms

In December 2023, Snowflake announced its acquisition of data clean room technology provider Samooha. Samooha’s intuitive UI and focus on reducing the complexity of sharing data led to it being named one of the most innovative data science companies of 2024 by Fast Company. Now, Samooha’s offering is integrated into Snowflake and launched as Snowflake Data Clean Rooms, a Snowflake Native App on Snowflake Marketplace.

Hevo for Enterprise Data Excellence

The needs of enterprise data systems are unique. With massive workloads and complex intertwined systems, data teams need partners who understand their requirements intimately. Hevo can be that partner for you! With a robust system that is seamlessly simple but infinitely scalable, we are ready to support your large data workloads. Our host of new features shows our commitment to understanding enterprise needs. With Hevo, enterprises can experience effortless data movement.

Introducing Cloudera's AI Assistants

In the last couple of years, AI has launched itself to the forefront of technology initiatives across industries. In fact, Gartner predicts the AI software market will grow from $124 billion in 2022 to $297 billion in 2027. As a data platform company, Cloudera has two very clear priorities. First, we need to help customers get AI models based on trusted data into production faster than ever.

Ensuring Comprehensive Cyber Resilience and Business Continuity

When a data breach occurs, your response is critical. What do you do first? Do you have a plan for communicating with business units, regulators and other concerned parties? The integrity and security of data infrastructure stand as paramount concerns for business leaders across all sectors. As technology evolves and threats become more sophisticated, the pursuit of an unbreakable data infrastructure remains an ongoing challenge.

What Is a Business Glossary? Definition, Components & Benefits

A solid understanding of internal technical and business terms is essential to manage and use data. Business glossaries are pivotal in this aspect, facilitating improved understanding and collaboration among teams. A business glossary breaks down complex terms into easy-to-understand definitions, ensuring that everyone in the organization, from the newest recruit to the CEO, is on the same page regarding business language.

What is a Data Pipeline?

A data pipeline is a series of processes that move raw data from one or more sources to one or more destinations, often transforming and processing the data along the way. Data pipelines are designed to automate the flow of data, enabling efficient and reliable data movement for various purposes, such as data analytics, reporting, or integration with other systems.
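The definition above maps directly onto the classic extract-transform-load shape. A minimal sketch (invented records and helper names, purely illustrative):

```python
# A pipeline: extract from sources, transform in flight, load to a destination.
def extract(sources):
    for source in sources:
        yield from source  # raw records from each source, one stream

def transform(records):
    for r in records:
        r = {**r, "name": r["name"].strip().title()}  # clean along the way
        if r.get("active"):                           # drop unwanted rows
            yield r

def load(records, destination):
    destination.extend(records)  # stand-in for a warehouse or API write

crm = [{"name": " ada lovelace ", "active": True}]
erp = [{"name": "charles babbage", "active": False}]
warehouse = []
load(transform(extract([crm, erp])), warehouse)
# warehouse == [{"name": "Ada Lovelace", "active": True}]
```

Because each stage is a generator, records flow through lazily; production pipelines add the same stages with scheduling, retries, and monitoring on top.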

Navigating the Enterprise Generative AI Journey: Cloudera's Three Pillars for Success

Generative AI (GenAI) has taken the world by storm, promising to revolutionize industries and transform the way businesses operate. From generating creative content to automating complex tasks, the potential applications of GenAI are vast and exciting. However, implementing GenAI in an enterprise setting comes with its own set of challenges. At Cloudera, we understand the complexities of enterprise GenAI adoption.

Logi Symphony Soars in Latest Dresner Business Intelligence Report

insightsoftware’s Logi Symphony, a leading embedded analytics solution, continues to impress. According to a recent Dresner Advisory Services’ Wisdom of Crowds® Business Intelligence Market Study, Logi Symphony has been recognized as a leader in the field. This recognition highlights Logi Symphony’s commitment to exceptional customer experience and its strong reputation within the BI and analytics industry.

How ClearML Helps Teams Get More out of Slurm

It is a fairly recent trend for companies to amass GPU firepower to build their own AI computing infrastructure and support the growing number of compute requests. Many recent AI tools now enable data scientists to work on data, run experiments, and train models seamlessly with the ability to submit their jobs and monitor their progress. However, for many organizations with mature supercomputing capabilities, Slurm has been the scheduling tool of choice for managing computing clusters.

ClearML Supports Seamless Orchestration and Infrastructure Management for Kubernetes, Slurm, PBS, and Bare Metal

Our early roadmap in 2024 has been largely focused on improving orchestration and compute infrastructure management capabilities. Last month we released a Resource Allocation Policy Management Control Center with a new, streamlined UI to help teams visualize their compute infrastructure and understand which users have access to what resources.

GenAI: Navigating the Risks That Come with Change

For enterprises, commercial use of AI is still in its early stages, and it’s a case of risk and reward, weighing up both and investigating the best way forward. Of course, there’s much to gain from the use of AI. Already, companies are providing better customer service, parsing complex information through natural language inputs, and generally making workflows faster.

Accelerating Deployments of Streaming Pipelines - Announcing Data in Motion on Kubernetes

Organizations are challenged today to become both more data driven and more nimble to adapt quickly to changing conditions. These challenges are the driving forces behind much of their digital transformation or “modernization” efforts.

What is Online Transaction Processing (OLTP)?

OLTP is a transaction-centric approach to data processing that typically follows a three-tier architecture. Every day, businesses worldwide perform millions of financial transactions. This fact brings to mind client-facing personnel such as bank tellers and supermarket cashiers tapping away on keyboards and at cash registers, and with good reason. According to ACI Worldwide, a payment systems company, there was a 42.2% growth in global real-time transaction volumes in 2023, amounting to 266.2 billion transactions.
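The defining property of OLTP workloads is many small, atomic transactions. A sketch using Python's built-in `sqlite3` (a stand-in for a real OLTP database; the schema and amounts are invented) shows the all-or-nothing behavior a bank transfer relies on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])

def transfer(conn, src, dst, amount):
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except ValueError:
        pass  # transaction rolled back; both balances unchanged

transfer(conn, "alice", "bob", 30)   # succeeds: alice 70, bob 80
transfer(conn, "alice", "bob", 500)  # fails atomically; balances unchanged
```

The debit and credit either both happen or neither does, which is exactly what lets real-time payment systems process billions of such transactions safely.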

Best Data Mining Tools in 2024

Data mining, also known as Knowledge Discovery in Data (KDD), is a powerful technique that analyzes and unlocks hidden insights from vast amounts of information and datasets. Data mining goes beyond simple analysis—leveraging extensive data processing and complex mathematical algorithms to detect underlying trends or calculate the probability of future events.

Snowflake Cortex LLM Functions Moves to General Availability with New LLMs, Improved Retrieval and Enhanced AI Safety

Snowflake Cortex, a fully managed service that enables access to industry-leading large language models (LLMs), is now generally available. You can use these LLMs in select regions directly via LLM Functions on Cortex, so you can bring generative AI securely to your governed data. Your team can focus on building AI applications, while we handle model optimization and GPU infrastructure to deliver cost-effective performance.

How Healthcare and Life Sciences Organizations Are Accelerating Data, Apps and AI Strategy in the Data Cloud

Accelerate Healthcare and Life Sciences is a one-day virtual event, featuring technology and business leaders from Elevance Health, Ginkgo Bioworks, Datavant and more, to discover executive priorities, best practices and potential data and AI challenges that are top of mind for 2024.

How to Create a Heat Grid in Yellowfin Dashboards

Welcome to the latest entry in Yellowfin Japan’s ‘How to?’ blog series! In our previous blog, we created a pie chart that aggregates on the basis of category. By using set analysis, we were able to represent the percentage of Books Only in the numeric display, based on the composition of the items ordered.

SaaS in 60 - Cyclic Group Dimensions

This week a new dimension type for Master Item dimensions is available: cyclic groups. Master Items offer a centralized, reusable, governed library of measures, expressions, visualizations, and both single and drillable dimensions. Now, with cyclic group dimensions, you can dynamically cycle through the dimensions in every chart at once with the click of a button or a simple selection. This offers new ways to analyze data, saves precious screen space, and enables multiple use cases on a single chart.

Financial Optimization for SAP Bundle Flyover

Is your finance team spending too much time on data management and not focusing enough on delivering valuable financial insights? Without a proper reporting tool, your finance team is bogged down by manual processes and inefficient reporting cycles. Shorten time-consuming tasks and reduce your dependency on IT with insightsoftware Financial Optimization for SAP. This user-friendly solution provides finance teams with end-to-end process optimization that bridges the SAP to Excel gap, ultimately creating a faster, streamlined, and less error-prone work environment.

Data Filtering: A Comprehensive Guide to Techniques, Benefits, and Best Practices

Data filtering plays an instrumental role in reducing computational time and enhancing the accuracy of AI models. Given the increasing need for organizations to manage large volumes of data, leveraging data filtering has become indispensable.
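As a small illustration of why filtering cuts compute time and improves model accuracy (invented sensor data, not from the article), dropping malformed rows before they reach a model means less work and less noise:

```python
# Raw readings often contain sentinels and gaps that would poison a model.
readings = [
    {"sensor": "t1", "value": 21.4},
    {"sensor": "t2", "value": -999.0},  # sentinel for a failed read
    {"sensor": "t3", "value": 22.1},
    {"sensor": "t4", "value": None},    # missing measurement
]

def is_valid(row):
    # Keep only rows with a present, physically plausible value.
    return row["value"] is not None and row["value"] > -100

clean = [r for r in readings if is_valid(r)]
# Only t1 and t3 survive; downstream training sees half the rows and no junk.
```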

Reimagine Batch and Streaming Data Pipelines with Dynamic Tables, Now Generally Available

Since Snowflake’s Dynamic Tables went into preview, we have worked with hundreds of customers to understand the challenges they faced producing high-quality data quickly and at scale. The No. 1 pain point: Data pipelines are becoming increasingly complex. This rising complexity is a result of myriad factors.

Better See and Control Your Snowflake Spend with the Cost Management Interface, Now Generally Available

Snowflake is dedicated to providing customers with intuitive solutions that streamline their operations and drive success. As part of our ongoing commitment to helping customers in this way, we’re introducing updates to the Cost Management Interface to make managing Snowflake spend easier at an organization level and accessible to more roles.

Data Accessibility: A Hurdle Before SAP's AI Integration

Unlocking the power of AI within SAP for your team requires overcoming a significant hurdle: data accessibility. SAP data’s complexity, spread across various modules, creates silos of information that your team might struggle to understand and utilize effectively. Inaccessible or misaligned SAP data will hinder your AI system’s ability to learn and deliver valuable results specific to your organization.

Data Prep for AI: Get Your Oracle House in Order

Despite the transformative potential of AI, a large number of finance teams are hesitating, waiting for this emerging technology to mature before investing. According to a recent Gartner report, a staggering 61% of finance organizations haven’t yet adopted AI. Finance has always been considered risk averse, so it is perhaps unsurprising to see that AI adoption in finance significantly lags other departments.

Get Your AI to Production Faster: Accelerators For ML Projects

One of the worst-kept secrets among data scientists and AI engineers is that no one starts a new project from scratch. In the age of information there are thousands of examples available when starting a new project. As a result, data scientists will often begin a project by developing an understanding of the data and the problem space and will then go out and find an example that is closest to what they are trying to accomplish.

Confluent Unveils New Capabilities to Apache Flink Offering to Simplify AI and Bring Stream Processing to Workloads Everywhere

Confluent's new AI Model Inference seamlessly integrates AI and ML capabilities into data pipelines. Confluent's new Freight clusters offer cost-savings for high-throughput use cases with relaxed latency requirements.

Introducing Confluent Cloud Freight Clusters

We’re excited to introduce Freight clusters—a new type of Confluent Cloud cluster designed for high-throughput, relaxed-latency workloads that is up to 90% cheaper than self-managing open source Apache Kafka®. Freight clusters utilize the latest innovations in Confluent Cloud’s cloud-native engine, Kora, to deliver low-cost networking by trading off ultra-low-latency performance.

Top Data Governance Tools for 2024

According to Gartner, 80% of companies worldwide are expected to have efficient data management systems in place by 2025. This projection highlights the growing recognition of data governance tools as essential enablers for maintaining and enhancing the quality and security of organizational data within these data management systems. In this blog, we will talk about some of the best data governance tools and software to consider in 2024.

Snowflake's Arctic-TILT: A State-of-the-Art Document Intelligence LLM in a Single A10 GPU

The volume of unstructured data — such as PDFs, images, video and audio files — is surging across enterprises today. Yet documents, which represent a substantial portion of this data and hold significant value, continue to be processed through inefficient and manual methods.

A Look At Gurucul's Threat Detection, Investigation, And Response Platform

In this episode of "Powered by Snowflake," host Julian Forero chats with Nilesh Dherange, Co-founder and CTO of Gurucul, about data, cybersecurity, and machine learning in the context of Gurucul's threat detection, investigation, and response platform. Founded in 2010, Gurucul's mission has always been to bring together different silos of cybersecurity data. In this conversation, Nilesh provides an in-depth demo showing how the platform works, while explaining how its capabilities have advanced over time through the use of machine learning models and the adoption of Snowflake as one of its supported data lakes.

Behind The Scenes Of Snowflake Open Source LLM Arctic

Snowflake CEO Sridhar Ramaswamy and Snowflake Head of AI Baris Gultekin join Adrien Treuille, Director of Product Management, to discuss the launch of Snowflake Arctic, the latest enterprise-grade, truly open large language model (LLM). Snowflake Arctic stands out in the competitive landscape with its exceptional efficiency and cost-effectiveness, emphasizing Snowflake's commitment to open-source development and the future of enterprise AI. Join us to explore how Arctic is set to revolutionize industries by making advanced AI more accessible and trustworthy.

Snowflake: Optimize performance with data observability

Right now, 55% of companies surveyed are failing to achieve time to value with their data and AI investments. Why? Their skilled engineers spend too much time on toilsome work, and optimizing data workloads for performance and efficiency is complicated. With this in mind, Unravel is hosting a live event to help you leverage Unravel to achieve productivity and performance with Snowflake. Watch this 15-minute live event on optimizing performance with data observability, with Clinton Ford, VP of Product Marketing at Unravel, and Eric Chu, VP of Product at Unravel.

The Ultimate Guide to Retail Data Analytics

Whether you love, hate or remain indifferent towards data, it’s impossible to deny its importance in today’s business landscape. Businesses across all sectors and industries collect data and perform data analysis, to better understand their customers and business processes, in an effort to boost productivity, reduce expenditure and gain competitive advantage.

Snowflake Data Clean Rooms for Marketing

In less than 5 minutes, Ankur Abhishek, Senior Product Manager at Snowflake, demonstrates how Snowflake Data Clean Rooms can be used for audience overlap, audience lookalike, and attribution analysis. As Kamakshi Sivaramakrishnan, Senior Director of Product Management at Snowflake, explains, "This is the full marketing lifecycle brought in its entirety in a Snowflake clean room, run securely with multiple parties collaborating with each other. This is demystifying clean rooms."