Latest Posts

Beyond Connectivity - Top 5 Ways Data and Analytics Drive Transformation in Telecom

The telecommunications industry is in the midst of a fundamental reinvention and transformation. Faced with a range of emerging pressures – including consolidation, a changing competitive landscape, and commoditization of traditional services – communication service providers (CSPs) are seeking new revenue streams and novel business approaches.

Distributed model training using Dask and Scikit-learn

The theoretical bases for machine learning have existed for decades, yet it wasn’t until the early 2000s that the last AI winter came to an end. Since then, interest in and use of machine learning have exploded, and its development has been largely democratized. Perhaps not so coincidentally, the same period saw the rise of Big Data, carrying with it increased distributed data storage and distributed computing capabilities made popular by the Hadoop ecosystem.

The Real Role of Robotics in Retail

Automation and robotics are rapidly changing the retail landscape – so much so that there are clear winners and losers. I’m not talking about the war between brick-and-mortar stores and digital marketplaces, but about the retail digital revolution, where the winners are delivering greater than 4.5% comparable store/channel sales growth compared to their peers that have not embraced automation and robotics.

Maximizing performance of Apache Kudu block cache with Intel Optane DCPMM

Intel Optane DC persistent memory (Optane DCPMM) has higher bandwidth and lower latency than SSD and HDD storage drives. These characteristics provide a significant performance boost to big data storage platforms that can use it for caching. One such platform is Apache Kudu, which can use DCPMM for its internal block cache.

The Retail Renaissance - How data and analytics are reshaping retail

The retail landscape is in the midst of a dramatic, data-driven renaissance. New tools help to build new connections — between consumers and retailers, and across supply chains. Data analytics and machine learning further these connections to better understand and predict customer behavior and improve demand forecasting. In this emerging era of smart retail, organizations have access to a range of powerful new capabilities and tools.

5 Steps to Making Better Business Decisions with Machine Learning

Most of the day-to-day work of knowledge workers is spent helping the business make better decisions, like choosing whether it’s worth expending the effort (or actual money) to achieve the desired business goal. The example I often use when talking about ML is churn prediction (and I’m starting to think I’m overusing it now). It costs money to retain a customer who is thinking of moving, but this is less than the cost of acquiring a new customer.

Take Control of Your Destiny, Leave Retail Laggards in the Dust

Ongoing reports of the “Retail Apocalypse” were fueled once again in 2019 with more than a dozen well-known retail brands closing their doors forever. On the flip side, a “Retail Renaissance” is well underway – and signs indicate that retail leaders that have already invested in their digital transformation journey will continue to reap rewards well into the future.

Why Data Chain of Custody is Essential to Reducing Product Liability Risks

When a market grows as quickly as the one for implantable medical devices, set to top a staggering $153.8 billion by 2026, the potential risk to patients can rise as well. As implantable medical devices proliferate, so does the number of costly, life-threatening, and reputation-tarnishing recalls. A single large recall can account for millions of device units.

Real-time log aggregation with Apache Flink Part 2

We are continuing our blog series about implementing real-time log aggregation with the help of Flink. In the first part of the series, we reviewed why it is important to gather and analyze logs from long-running distributed jobs in real time. We also looked at a fairly simple solution for storing logs in Kafka using configurable appenders only. As a reminder, let’s review our pipeline again.

Benchmarking Ozone: Cloudera's next-generation Storage for CDP

Apache Hadoop Ozone was designed to address the scale limitations of HDFS with respect to small files and the total number of file system objects. On current data center hardware, HDFS has a limit of about 350 million files and 700 million file system objects. Ozone’s architecture addresses these limitations[4]. This article compares the performance of Ozone with HDFS, the de facto big data file system.