Systems | Development | Analytics | API | Testing

Latest Posts

A Better Approach to Controlling Modern Data Cloud Costs

As anyone running modern data applications in the cloud knows, costs can mushroom out of control very quickly and easily. Getting these costs under control is really all about not spending more than you have to. Unfortunately, the common approach to managing these expenses—which looks at things only at an aggregated infrastructure level—helps control only about 5% of your cloud spend.

Big Data Meets the Cloud

With interest in big data and cloud increasing around the same time, it wasn’t long until big data began being deployed in the cloud. Big data comes with some challenges when deployed in traditional, on-premises settings. There’s significant operational complexity, and, worst of all, scaling deployments to meet the continued exponential growth of data is difficult, time-consuming, and costly.

A Primer on Hybrid Cloud and Edge Infrastructure

Thank you for your interest in the 451 Research Report, Living on the edge: A primer on hybrid cloud and edge infrastructure. You can download it here. 451 Research: Living on the edge: A primer on hybrid cloud and edge infrastructure Published Date: October 11, 2021 Introduction Without the internet, the cloud is nothing. But few of us really understand what is inside the internet. What is the so-called ‘edge’ of the internet, and why does it matter?

Twelve Best Cloud & DataOps Articles

Interested in learning about different technologies and methodologies, such as Databricks, Amazon EMR, cloud computing and DataOps? A good place to start is reading articles that give tips, tricks, and best practices for working with these technologies. Here are some of our favorite articles from experts on cloud migration, cloud management, Spark, Databricks, Amazon EMR, and DataOps!

Spark Troubleshooting Solutions - DataOps, Spark UI or logs, Platform or APM Tools

Spark is known for being extremely difficult to debug. But this is not all Spark’s fault. Problems in running a Spark job can be the result of problems with the infrastructure Spark is running on, inappropriate configuration of Spark, Spark issues, the currently running Spark job, other Spark jobs running at the same time – or interactions among these layers.

Migrating Data Pipelines from Enterprise Schedulers to Airflow

At Airflow Summit 2021, Unravel’s co-founder and CTO, Shivnath Babu and Hari Nyer, Senior Software Engineer, delivered a talk titled Lessons Learned while Migrating Data Pipelines from Enterprise Schedulers to Airflow. This story, along with the slides and videos included in it, comes from the presentation.

Driving Data Governance and Data Products at ING Bank France

In this episode of Data+AI Battlescars, Sandeep Uttamchandani, Unravel Data’s CDO, speaks with Samir Boualla, CDO at ING Bank France, one of the largest banks in the world. They cover his battlescars in Driving Data Governance Across Business Teams and Building Data Products. At ING Bank France, Samir is the Chief Data Officer. He’s responsible for several teams that govern, develop, and manage data infrastructure and data assets to deliver value to the business.

Spark Troubleshooting, Part 1 - Ten Challenges

“The most difficult thing is finding out why your job is failing, which parameters to change. Most of the time, it’s OOM errors…” Jagat Singh, Quora Spark has become one of the most important tools for processing data – especially non-relational data – and deriving value from it. And Spark serves as a platform for the creation and delivery of analytics, AI, and machine learning applications, among others.

Simplifying Data Management at LinkedIn Part 2

In the second of this two-part episode of Data+AI Battlescars, Sandeep Uttamchandani, Unravel Data’s CDO, speaks with Kapil Surlaker, VP of Engineering and Head of Data at LinkedIn. In part one, they covered LinkedIn’s challenges related to metadata management and data access APIs. This second part dives deep into data quality.