Keeping Small Queries Fast - Short query optimizations in Apache Impala
This is part of our series of blog posts on recent enhancements to Impala. The entire collection is available here.
Here at Cloudera, we’ve seen many large organizations struggle to meet ever-changing and ever-growing business demands. We see it everywhere. Traditional on-premises architectures, which provide a fixed, finite set of resources, force every business request for new insight into a crazy resource balancing act, coupled with long wait times or a straight-up “no, it cannot be done.”
Two of the more painful things in your everyday life as an analyst or SQL worker are not getting easy access to data when you need it and not having useful, easy-to-use tools that stay out of your way. As one of my dear customers, a data worker in Pharma, said to me: “I really don’t care about bells and whistles, I just want to get my task done.” This simple statement captures the essence of almost 10 years of SQL development with modern data warehousing.
Many organizations struggle to meet growing and variable data warehouse demands. No matter how much they pad their annual IT budgets, there never seems to be enough capacity to cover unexpected business requests. This leads to resource restrictions for the various business units that use the platform. When business units are not well served by central IT, “shadow IT” emerges.
It’s all about the Customer
Customers today expect services to be highly personalized. In a digital world tuned to understand your likes, dislikes, interests, and preferences, we expect a similar level of customization in all aspects of our lives. Insurance is no different. Insurance is not something the average consumer thinks about every day, but when a life-changing event happens, insurance becomes extremely important. It is in this “Moment of Truth” that insurers excel or fail.
Users today are asking ever more from their data warehouse. This demand is driving advances in what the technology provides and, in turn, shifting the art of the possible. As an example, in this post we look at Real Time Data Warehousing (RTDW), a category of use cases that customers are building on Cloudera and that is becoming more and more common.
With the massive explosion of data across the enterprise — both structured and unstructured from existing sources and new innovations such as streaming and IoT — businesses have needed to find creative ways of managing their increasingly complex data lifecycle to speed time to insight.
The customer has never been more right. Across industries, customers have become conditioned to demand not only near-instant responses to their needs but that their needs be anticipated in advance. Financial institutions are not given a pass, despite a competitive landscape flooded with regulation and privacy considerations. The customer still has expectations for a personalized, timely, and relevant experience.
Cloudera services logs offer a breadth of information to assist in cluster maintenance, from security checks and auditing tasks to validation for performance tuning and testing, to name a few. However, the log records generated by these services do not hold the same value for every organisation.
At Cloudera Fast Forward we work to make the recently possible useful. Our goal is to take the incredible data science and machine learning research developments we see emerging from academia and large industrial labs, and bridge the gap to products and processes that are useful to practitioners working across industries.