Latest Posts

Growing AI Fast with ML-Ops: Breaking the barrier between research and production

AI models get smarter, more accurate, and therefore more useful over the course of their training on large datasets that have been painstakingly curated, often over a period of years. But in real-world applications, datasets start small. To design a new drug, for instance, researchers start by testing a compound and need to use the power of AI to predict the best possible permutation.

Building an MLOps infrastructure on OpenShift

Most data science projects don’t get past the PoC phase and hence never generate any business value. In 2019, Gartner estimated that “through 2022, only 20% of analytic insights will deliver business outcomes”. One of the main reasons is undoubtedly that data scientists often lack a clear vision of how to deploy their solutions into production, how to integrate them with existing systems and workflows, and how to operate and maintain them.

Enabling distributed NLP research at SIL

In my main position, as a data scientist at SIL International, I work on expanding language possibilities with AI. In practice, this includes applying recent advances in Natural Language Processing (NLP) to low-resource and multilingual contexts. We work on things like spoken language identification, multilingual dialogue systems, machine translation, and translation quality estimation.

ClearML-Data Lemonade: getting local datasets quickly and easily

Congratulations on creating a clean(ish) dataset to use for training! Now, while the dataset is stored where it’s accessible to everyone, the distribution itself is a hassle! Local workstations, local GPU machines, and cloud machines (that may be spun up and down without disk persistence) all need to fetch the data, so copies end up everywhere. To say it is annoying is an understatement!

Data management is ALL THE RAGE!

Everyone wants to manage their data, and if it’s a feature store, even better! But for optimal data management, we must first discuss keeping things lightweight, with zero upfront setup costs, and maximizing utility with ClearML-data. ClearML-data mimics the lightweight feel of git, applied to data (who doesn’t know git?), and gives it a spin. It is an open-source dataset management tool that is extremely efficient and conveys how we view DataOps and its distinction from git-like solutions, including…

DataStore vs FeatureStore

I think it’s safe to say that one of the worst things in Machine Learning is the terminology. The maths and statistics are definitely part of the learning curve, but more than that, it feels like you are learning a new language. In some ways, you are. DataStore and FeatureStore are two of the current buzzwords that people are trying to understand. To be fair, DataStore and FeatureStore feel like family rather than strangers.