Machine Learning

Building ML Pipelines Over Federated Data & Compute Environments

According to a Forbes survey, data scientists spend 19% of their time collecting data sets and 60% of their time cleaning and organizing data. All told, that is roughly 80% of their time spent preparing and managing data for analysis. One of the greatest obstacles to bringing data science initiatives to life is the lack of robust data management tools.

How to Run Spark Over Kubernetes to Power Your Data Science Lifecycle

Spark is known for its powerful distributed data processing engine. It can handle petabytes of data across multiple servers, and its capabilities and performance have unseated other technologies in the Hadoop world. That power comes with a high maintenance cost, however. In recent years, new approaches have emerged that simplify the Spark infrastructure needed to support these large data processing tasks.

The Machine Learning Collaboration Tool You'll Want to Ride Solo - User Story

I’ll admit it. I am a gushing fan of this new product from Allegro AI called Allegro Trains. I’m not sure what to call it — what noun I should attach to this creature. “Framework” and “Platform” have become, to my ears, rather meaningless jargon designed to detach suit-wearing types from their money. “Harness” is close.

MLOps for Python: Real-Time Feature Analysis

Data scientists today have to choose from a massive toolbox in which every item has its pros and cons. We love the simplicity of Python tools like pandas and Scikit-learn, the operation-readiness of Kubernetes, and the scalability of Spark and Hadoop, so we just use all of them. What happens? Data scientists explore data using pandas, then data engineers use Spark to recode the same logic so that it scales or works with live streams and operational databases.

What's All the Hype About? Iguazio Listed in Five 2020 Gartner Hype Cycles

We are delighted to announce that Iguazio has been named a sample vendor in the 2020 Gartner Hype Cycle for Data Science and Machine Learning, as well as in four additional Gartner Hype Cycles: Infrastructure Strategies, Compute Infrastructure, Hybrid Infrastructure Services, and Analytics and Business Intelligence. We are listed among industry leaders such as DataRobot, Amazon Web Services, Google Cloud Platform, IBM and Microsoft Azure (some of whom are also close partners of ours).

Predicting 1st-Day Churn in Real-Time - MLOps Live #7 - With Product Madness (an Aristocrat co.)

Michael Leznik - Head of Data Science
Matthieu Glotz - Data Scientist
Yaron Haviv - CTO & Co-Founder

We discuss how technology and new work processes can help the gaming and mobile app industries predict and mitigate 1st-day (or D0) user churn in real time, down to minutes and seconds, using modern streaming data architectures such as Kappa. We also explore a feature engineering improvement to the RFM (Recency, Frequency, and Monetary) churn prediction framework: the Discrete Wavelet Transform (DWT).
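
To make the RFM and DWT terms concrete, here is a minimal Python sketch under stated assumptions. It is not the speakers' implementation: the function names `rfm` and `haar_dwt` are hypothetical, the event format (a list of timestamp and amount pairs per user) is invented for illustration, and the Haar wavelet is an assumed choice, since the session summary does not say which wavelet is used.

```python
from datetime import datetime
from math import sqrt


def rfm(events, now):
    """Compute (recency_seconds, frequency, monetary) for one user.

    `events` is a hypothetical list of (timestamp, amount) tuples.
    """
    if not events:
        return None, 0, 0.0
    last_seen = max(ts for ts, _ in events)
    recency = (now - last_seen).total_seconds()   # time since last activity
    frequency = len(events)                       # number of events
    monetary = sum(amount for _, amount in events)  # total spend
    return recency, frequency, monetary


def haar_dwt(signal):
    """Single-level Haar DWT of an even-length sequence.

    Returns (approximation, detail) coefficient lists. The sign of the
    detail coefficients is a convention and differs between libraries.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / sqrt(2) for a, b in pairs]  # local averages (scaled)
    detail = [(a - b) / sqrt(2) for a, b in pairs]  # local changes (scaled)
    return approx, detail


if __name__ == "__main__":
    events = [(datetime(2020, 1, 1, 12, 0), 0.0),
              (datetime(2020, 1, 1, 12, 5), 4.99)]
    print(rfm(events, now=datetime(2020, 1, 1, 12, 10)))
    # Per-interval activity counts for the same (hypothetical) user
    print(haar_dwt([4, 4, 2, 0]))
```

The approximation coefficients summarize the overall activity level, while the detail coefficients capture how activity changes between adjacent intervals, which is information a plain frequency count discards.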

Predicting 1st Day Churn in Real Time

Survival analysis is one of the most developed fields of statistical modeling, with many real-world applications. In the realm of mobile apps and games, retention is one of the publisher's first concerns once the app or game has been launched, and it remains a significant focus throughout most of the product's lifecycle.