Data Discovery and Exploration (DDE) was recently released in tech preview in Cloudera Data Platform in public cloud. In this blog we will go through the process of indexing data from S3 into Solr in DDE with the help of NiFi in Data Flow. The scenario is the same as in the previous blog, but the ingest pipeline differs. Spark as the ingest pipeline tool for Search (i.e.
The new normal has changed the way we work and the way we conduct business. More and more employees are working from home, customers are shopping online, and everyone’s phone is still attached to their ears. Bottom line: everything we’re doing in business and in our personal lives is leaving a digital trail. In fact, now devices are getting in the game and creating more data than people, 277 times more, according to Cisco.
The COVID-19 pandemic has changed nearly everything. It’s affected nearly all Americans, and as such, it’s impacted every organization they interact with, both B2C and B2B. One industry that has had its operations turned upside down is the grocery industry. Grocery stores and their consumer packaged goods (CPG) suppliers and partners had to improvise and adapt nearly overnight to accommodate the changing demands of shoppers.
Understanding the data we collect is essential: it allows us to identify trends and uncover answers about our world. However, stories in our data frequently go untold. Large datasets are hard to share between research communities due to their size, security constraints, and complexity. Even if these datasets are accessible to users, the tools needed to query them often require deep technical knowledge.
October is the month of World Smile Day, which Harvey Ball, the inventor of the smiley face, declared to give people a reason to smile. This month, BigQuery users have plenty of new reasons to smile with the release of new user-friendly SQL capabilities, now generally available.
Running a large commercial airline requires the complex management of critical components, including fuel futures contracts, aircraft maintenance, and customer expectations. In the U.S. alone, airlines average about 45,000 daily flights and transport over 10 million passengers a year (source: FAA). Airlines typically operate on very thin margins, and any schedule delay immediately angers or frustrates customers.
Apache Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive query in one platform. While Apache Spark provides many capabilities to support diverse use cases, it comes with additional complexity and high maintenance costs for cluster administrators. Let’s look at some of the high-level requirements for the underlying resource orchestrator to empower Spark as a single platform.