Systems | Development | Analytics | API | Testing

Latest Posts

AI at Scale isn't Magic, it's Data - Hybrid Data

A recent VentureBeat article , “4 AI trends: It’s all about scale in 2022 (so far),” highlighted the importance of scalability. I recommend you read the entire piece, but to me the key takeaway – AI at scale isn’t magic, it’s data – is reminiscent of the 1992 presidential election, when political consultant James Carville succinctly summarized the key to winning – “it’s the economy”.

Cloudera's Open Data Lakehouse Supercharged with dbt Core(tm)

dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous development (CI/CD).

Scaling Kafka Brokers in Cloudera Data Hub

This blog post will provide guidance to administrators currently using or interested in using Kafka nodes to maintain cluster changes as they scale up or down to balance performance and cloud costs in production deployments. Kafka brokers contained within host groups enable the administrators to more easily add and remove nodes. This creates flexibility to handle real-time data feed volumes as they fluctuate.

How to Distribute Machine Learning Workloads with Dask

Tell us if this sounds familiar. You’ve found an awesome data set that you think will allow you to train a machine learning (ML) model that will accomplish the project goals; the only problem is the data is too big to fit in the compute environment that you’re using. In the day and age of “big data,” most might think this issue is trivial, but like anything in the world of data science things are hardly ever as straightforward as they seem.

Data Governance and Strategy for the Global Enterprise

While the word “data” has been common since the 1940s, managing data’s growth, current use, and regulation is a relatively new frontier. Governments and enterprises are working hard today to figure out the structures and regulations needed around data collection and use. According to Gartner, by 2023 65% of the world’s population will have their personal data covered under modern privacy regulations.

Serverless NiFi Flows with DataFlow Functions: The Next Step in the DataFlow Service Evolution

Cloudera DataFlow for the Public Cloud (CDF-PC) is a cloud-native service for Apache NiFi within the Cloudera Data Platform (CDP). CDF-PC enables organizations to take control of their data flows and eliminate ingestion silos by allowing developers to connect to any data source anywhere with any structure, process it, and deliver to any destination using a low-code authoring experience.

Announcing GA of DataFlow Functions

Today, we’re excited to announce that DataFlow Functions (DFF), a feature within Cloudera DataFlow for the Public Cloud, is now generally available for AWS, Microsoft Azure, and Google Cloud Platform. DFF provides an efficient, cost optimized, scalable way to run NiFi flows in a completely serverless fashion. This is the first complete no-code, no-ops development experience for functions, allowing users to save time and resources.

The Top Three Entangled Trends in Data Architectures: Data Mesh, Data Fabric, and Hybrid Architectures

Data teams have the impossible task of delivering everything (data and workloads) everywhere (on premise and in all clouds) all at once (with little to no latency). They are being bombarded with literature about seemingly independent new trends like data mesh and data fabric while dealing with the reality of having to work with hybrid architectures. Each of these trends claim to be complete models for their data architectures to solve the “everything everywhere all at once” problem.