
Unravel

Managing Costs for Spark on Databricks Webinar

Are you looking to optimize costs and resource usage for your Spark jobs on Databricks? Then this is the webinar for you. Overallocating resources, such as memory, is a common fault when setting up Spark jobs. And for Spark jobs running on Databricks, adding resources is a click away - but it’s an expensive click, so cost management is critical.

A Primer on Hybrid Cloud and Edge Infrastructure

Thank you for your interest in the 451 Research Report, Living on the edge: A primer on hybrid cloud and edge infrastructure. You can download it here.

451 Research: Living on the edge: A primer on hybrid cloud and edge infrastructure
Published Date: October 11, 2021

Introduction: Without the internet, the cloud is nothing. But few of us really understand what is inside the internet. What is the so-called ‘edge’ of the internet, and why does it matter?

Twelve Best Cloud & DataOps Articles

Interested in learning about different technologies and methodologies, such as Databricks, Amazon EMR, cloud computing and DataOps? A good place to start is reading articles that give tips, tricks, and best practices for working with these technologies. Here are some of our favorite articles from experts on cloud migration, cloud management, Spark, Databricks, Amazon EMR, and DataOps!

Managing Cost & Resources Usage for Spark

Spark jobs require resources, and those resources can be pricey. If you're looking to speed up completion times, optimize costs, and reduce resource usage for your Spark jobs, this is the webinar for you. For Spark jobs running on-premises, optimizing resource usage is key. For Spark jobs running in the cloud, for example on Amazon EMR or Databricks, adding resources is a click away - but it’s an expensive click, so cost management is critical.

Troubleshooting Databricks

The popularity of Databricks is rocketing skyward, and it is now the leading multi-cloud platform for Spark and analytics workloads, offering fully managed Spark clusters in the cloud. Databricks is fast, and organizations generally refactor their applications when moving them to Databricks. The result is strong performance. However, as usage of Databricks grows, so does the importance of reliability for Databricks jobs - especially big data jobs such as Spark workloads. But the information you need for troubleshooting is scattered across multiple, voluminous log files.

Spark Troubleshooting Solutions - DataOps, Spark UI or logs, Platform or APM Tools

Spark is known for being extremely difficult to debug. But this is not all Spark’s fault. Problems in running a Spark job can be the result of problems with the infrastructure Spark is running on, inappropriate configuration of Spark, Spark issues, the currently running Spark job, other Spark jobs running at the same time – or interactions among these layers.

Migrating Data Pipelines from Enterprise Schedulers to Airflow

At Airflow Summit 2021, Unravel’s co-founder and CTO, Shivnath Babu, and Senior Software Engineer Hari Nyer delivered a talk titled Lessons Learned while Migrating Data Pipelines from Enterprise Schedulers to Airflow. This story, along with the slides and videos included in it, comes from the presentation.