The Modern Data Lakehouse: An Architectural Innovation

Imagine having self-service access to all business data, wherever it lives, and being able to explore it all at once. Imagine answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. Imagine independently discovering rich new business insights from structured and unstructured data together, without having to beg for data sets to be made available.

Blending Data in the Data Warehouse

This is a guest post with exclusive content by Bill Inmon. Bill Inmon “is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, first magazine column, held the first conference, and was the first to offer classes in data warehousing.” (Wikipedia). Key point: most computing and analytical environments consist of only one type of data.

Kubernetes Logs Collection with MiNiFi C++

The MiNiFi C++ agent provides many features for collecting and processing data at the edge. These strengths make MiNiFi C++ a strong candidate for collecting logs from cloud-native applications running on Kubernetes. This video explains how to run the MiNiFi C++ agent as a sidecar container or as a DaemonSet to collect logs from Kubernetes applications. It walks through many examples and demonstrations to get you started with your own deployments. Don’t hesitate to reach out to Cloudera for more details and to discuss further options and integrations with Edge Flow Manager.

Top 10 must-read books for data and analytics leaders in 2022

It’s that time of year: back to school, back to books, and our annual list of must-read books for data and analytics leaders. Given the pace of change in our industry, continuous learning is a must, whether through networking, podcasts, or reading. To compile this year’s list, I focused mainly on books published in the last two years on the themes of data, analytics, and AI. I scoured lists and reviews on Amazon, solicited ideas from social networks, and got to reading.

Large Scale Industrialization Key to Open Source Innovation

We are now well into 2022, and the megatrends that drove the last decade in data (The Apache Software Foundation as a primary innovation vehicle for big data, the arrival of cloud computing, and the debut of cheap distributed storage) have converged and now offer clear patterns of competitive advantage for vendors and value for customers.