Optimization Strategies for Iceberg Tables

Apache Iceberg has recently grown in popularity because it adds data warehouse-like capabilities to your data lake, making it easier to analyze all your data, structured and unstructured. It offers benefits such as schema evolution, hidden partitioning, and time travel that improve the productivity of data engineers and data analysts. However, you need to maintain Iceberg tables regularly to keep them healthy so that read queries perform well.

New with Confluent Platform: Seamless Migration Off ZooKeeper, Arm64 Support, and More

With the increasing importance of real-time data in modern businesses, companies are leveraging distributed streaming platforms to process and analyze data streams in real time. Many companies are also transitioning to the cloud, which is often a gradual process that takes several years and involves incremental stages. During this transition, many companies adopt hybrid cloud architectures, either temporarily or permanently.

What is AI Analytics?

Imagine your software transforming from merely a tool into a strategic partner that can automatically alert your users to trends, explain data with a click, and help you ask the right questions of your datasets, in addition to delivering data-led insights. This is the power of AI analytics solutions for independent software vendors (ISVs). Today's users demand more than just functionality. They crave intelligent software that analyzes data, surfaces insights, and empowers them to act.

High Availability (Multi-AZ) for Cloudera Operational Database

In the previous blog post, we covered the high availability feature of Cloudera Operational Database (COD) on AWS. Cloudera recently released a new version of COD, which adds HA support for Microsoft Azure-based databases in the cloud. In this post, we’ll perform a similar test to validate that the feature works as expected in Azure, too.

Exploring the Top 7 Benefits of Self-hosted Analytics for Businesses

Imagine having the keys to a vault where every piece of data about your business is stored—not just any vault, but one that you built, control, and customize according to your precise specifications! This is the empowering reality of self-hosted analytics. It's like being the captain of your ship, navigating through the vast ocean of digital information with the confidence that comes from knowing every inch of your vessel.

DNS Zone Setup Best Practices on Azure

In Cloudera deployments on public cloud, one of the key configuration elements is the DNS. Get it wrong and your deployment may become wholly unusable, with users unable to access and use the Cloudera data services. If the DNS is set up less than ideally, connectivity and performance issues may arise. In this blog, we’ll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure.

Nuclio Demo

Nuclio is a high-performance serverless framework focused on data-, I/O-, and compute-intensive workloads. It is well integrated with popular data science tools, such as Jupyter and Kubeflow; supports a variety of data and streaming sources; and supports execution over CPUs and GPUs. The Nuclio project began in 2017 and is evolving rapidly; many start-ups and enterprises are now using Nuclio in production. In this video, Tomer takes you through a quick demo of Nuclio, triggering functions both from the UI and the CLI.