Cost Saving Best Practices For Databricks Workflows
Introduction
Managing pipeline costs effectively is crucial when using Databricks Workflows. This article provides practical tips to help you reduce your total cost of ownership without sacrificing performance, as well as tips to help you better understand where your costs come from. These insights will guide you in optimizing resource usage, ensuring you get the most out of your Databricks environment.
1. Use job compute and spot instances to pay less for the same performance as all-purpose clusters
Job clusters are significantly less expensive than all-purpose clusters, consuming fewer DBUs, which is the portion of the compute cost paid to Databricks. Couple that with using spot instances when you build a fault-tolerant pipeline, and your costs can sometimes be nearly cut in half. For example, the all-purpose DS5 v2 costs $4.47 per hour, but if you use job compute, that drops to $3.42. Furthermore, if you add spot instances, that brings the cost down to $2.367, a nearly 50% discount.
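To illustrate, here is a minimal sketch of a Jobs API 2.1 payload that runs a task on ephemeral job compute backed by Azure spot instances (with fallback to on-demand). The workspace URL, token, notebook path, runtime version, and cluster sizing are placeholders you would swap for your own.

```python
import os
import requests

# Minimal sketch: create a job whose task runs on ephemeral job compute
# backed by Azure spot instances (falling back to on-demand if spot capacity
# is unavailable). Host, token, notebook path, and sizing are placeholders.
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/etl/main"},
            "new_cluster": {                      # job compute, not all-purpose
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS5_v2",
                "num_workers": 4,
                "azure_attributes": {
                    "availability": "SPOT_WITH_FALLBACK_AZURE",  # prefer spot VMs
                    "first_on_demand": 1,         # keep the driver on on-demand
                },
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}
```

Because the cluster is created for the run and terminated when it finishes, you only pay job-compute DBU rates for the time the pipeline actually needs.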
2. Leverage “Warnings” to alert you of pipelines taking longer than expected
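Workflows let you define an expected duration for a job and send a notification when a run exceeds it, without cancelling the run. Here is a minimal sketch of the job-level fields involved in a Jobs API 2.1 payload; the threshold and email address are placeholders.

```python
# Sketch of the job-level fields that drive duration warnings (Jobs API 2.1).
# The threshold and email address below are placeholders.
warning_settings = {
    "health": {
        "rules": [
            {
                "metric": "RUN_DURATION_SECONDS",  # warn when a run exceeds this duration
                "op": "GREATER_THAN",
                "value": 3600,                     # expected runtime: 1 hour
            }
        ]
    },
    "email_notifications": {
        "on_duration_warning_threshold_exceeded": ["data-team@example.com"]
    },
}
# Merge these keys into the job spec you send to /api/2.1/jobs/create
# (or /api/2.1/jobs/update for an existing job).
```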
3. Safeguard against worst-case scenarios with “Timeouts”
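Where warnings tell you a run is slow, timeouts make sure a runaway run cannot burn compute indefinitely. A minimal sketch of the timeout fields, which can be set at the job level and per task (the values are placeholders):

```python
# Sketch: hard stop for runaway runs. timeout_seconds can be set at the job
# level and per task; the values below are placeholders.
timeout_settings = {
    "timeout_seconds": 14400,  # cancel the whole run after 4 hours
    "tasks": [
        {
            "task_key": "etl",
            "timeout_seconds": 7200,  # cancel just this task after 2 hours
            # ... notebook_task / new_cluster as in the earlier sketch
        }
    ],
}
```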
4. Leverage task dependencies to orchestrate code execution efficiently
Utilize task dependencies in Databricks to streamline your workflows by ensuring tasks are executed in the correct order. By setting up dependencies, you can prevent downstream tasks from starting until all necessary upstream tasks have completed successfully. This approach reduces errors and unnecessary compute usage by avoiding execution of tasks whose prerequisites have not been met.
To implement this, map out your workflow to identify dependent tasks, and configure your job orchestrations accordingly. This will help you maintain a smooth, efficient execution sequence, minimizing bottlenecks and optimizing resource allocation throughout your data pipelines.
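As a rough sketch, dependencies are expressed with depends_on in the task definitions of a Jobs API 2.1 payload; the task keys and notebook paths below are placeholders.

```python
# Sketch: three chained tasks. "transform" waits for "ingest" to succeed,
# and "publish" waits for "transform". Keys and paths are placeholders.
tasks = [
    {
        "task_key": "ingest",
        "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
    },
    {
        "task_key": "transform",
        "depends_on": [{"task_key": "ingest"}],    # runs only after ingest succeeds
        "notebook_task": {"notebook_path": "/Repos/etl/transform"},
    },
    {
        "task_key": "publish",
        "depends_on": [{"task_key": "transform"}],
        "notebook_task": {"notebook_path": "/Repos/etl/publish"},
    },
]
```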
5. Use workflows to run code that takes a long time to execute, even if you don’t run it on a schedule
Running one-off tasks that you expect to take several hours or days inside a workflow means you can leverage job compute to make that work more affordable.
Additionally, if you are doing pipeline development, executing your code inside workflows is a great way to A/B test your pipeline’s execution time and configuration.
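For a one-off piece of work you don't even need to create a persistent job: a run can be submitted directly to job compute. A minimal sketch using the one-time run endpoint, with placeholder names, paths, and cluster sizing:

```python
import os
import requests

# Sketch: submit a one-off run on job compute without creating a scheduled job.
# Host, token, notebook path, and cluster sizing are placeholders.
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

one_off_run = {
    "run_name": "backfill-historical-data",
    "tasks": [
        {
            "task_key": "backfill",
            "notebook_task": {"notebook_path": "/Repos/etl/backfill"},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS5_v2",
                "num_workers": 8,
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {token}"},
    json=one_off_run,
)
resp.raise_for_status()
print(resp.json())  # {"run_id": ...}
```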
6. Autoscaling your job compute helps deliver the right power, at the right time
Implementing autoscaling lets your compute resources be rightsized in near real time to match workload demands, minimizing spend on unused capacity. This dynamic adjustment keeps costs down while maintaining performance as the demands of the code your workflow is executing change.
One key piece of advice: test different cluster configurations, including the minimum and maximum number of workers, to find the optimal performance-to-cost balance.
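As a starting point, here is a sketch of a job cluster definition that uses an autoscale range instead of a fixed worker count; the bounds are placeholders to tune per pipeline.

```python
# Sketch: replace a fixed num_workers with an autoscale range so the cluster
# grows and shrinks with the workload. The bounds are placeholders.
autoscaling_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS5_v2",
    "autoscale": {
        "min_workers": 2,   # floor you pay for even during quiet stages
        "max_workers": 8,   # cap that bounds worst-case cost
    },
}
```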
7. Understand your costs better by implementing tags
Incorporate tagging to achieve clearer insight into your Databricks workflow expenses. Tags allow you to assign metadata to jobs, clusters, and other resources, which simplifies the tracking of costs by department, project, etc. By understanding where and how your resources are being consumed, you can make informed decisions about budget allocation and cost optimization.
Start by defining a consistent tagging strategy across all resources to ensure that every element of your expenditure is accurately monitored. This practice not only aids in cost management but also enhances reporting and accountability within your team.
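As a sketch of what that can look like, the Jobs API lets you tag both the job itself and the job cluster it spins up; the keys and values below are just an example convention, not required names.

```python
# Sketch: tag the job and its job cluster so costs roll up cleanly in
# billing and usage reports. Keys/values are an example tagging convention.
tagged_job = {
    "name": "nightly-etl",
    "tags": {"team": "data-eng", "project": "customer-360", "env": "prod"},
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/etl/main"},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS5_v2",
                "num_workers": 4,
                "custom_tags": {"team": "data-eng", "project": "customer-360"},
            },
        }
    ],
}
```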
Closing
I hope you found this guide to lowering your Databricks costs using Workflows helpful. Looking ahead, Databricks’ introduction of “Serverless” compute will add new dynamics to Workflows and other areas, and we will be covering those very soon!