Databricks Myths vs. My Own Personal Experience

Introduction

As someone who has transitioned from a traditional data stack to Databricks, I want to debunk the myth that Databricks is too costly and complicated. Here’s a detailed comparison of my inherited stack versus the new stack with Databricks.

Inherited Stack:

  • Azure SQL Server + SSIS: Approximately $1.2K/month.

  • Azure Analysis Services + Power BI: Ranging from $4.5K to $9K/month. Even with a star schema in place, Power BI remained inefficient.

  • Data Refresh: Updated twice daily (ingestion + BI refresh).

  • Labor Costs: Around $25K to $30K/month for a team of three.

  • Challenges: Frequent ETL and BI layer breakdowns.

New Stack with Databricks:

  • Databricks: Notebooks, Workflows, All-Purpose Compute, and mounted Azure Data Lake Storage, costing about $80 to $150/month.

  • Power BI Pro Licenses: Approximately $300/month in total for about 30 users.

  • Data Refresh: Updated once daily (ingestion + BI refresh).

  • Labor Costs: Reduced to $10K to $12K/month (my salary).

  • Data Structure: 50 bronze tables, 50 silver tables, and 8 gold tables, with an improved star schema.

  • Row Counts: Tables ranged from 500 to 40M rows.

Key Benefits Obtained:

  • Cost Savings: Monthly savings of approximately $19K, primarily from labor cost reductions due to the streamlined Databricks platform.

  • Improved Efficiency: Simplified and reliable data pipelines and modeling in Databricks.

  • Enhanced Capabilities: The new platform enabled the development of additional data models, directly contributing to significant cost savings and improved customer service metrics.
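
As a rough check on the savings figure, the monthly ranges listed for both stacks can be totaled. The sketch below is only back-of-the-envelope arithmetic: it assumes the $300 Power BI Pro figure is a monthly cost, it ignores anything not listed above, and the dictionary and function names are purely illustrative.

```python
# Monthly cost ranges in USD as (low, high), taken from the two stack lists above.
inherited = {
    "Azure SQL Server + SSIS": (1_200, 1_200),
    "Azure Analysis Services + Power BI": (4_500, 9_000),
    "Labor (team of three)": (25_000, 30_000),
}
new = {
    "Databricks": (80, 150),
    "Power BI Pro licenses": (300, 300),  # assumption: $300 is per month
    "Labor (one person)": (10_000, 12_000),
}


def total(stack):
    """Return the (low, high) monthly total for a stack."""
    low = sum(lo for lo, _ in stack.values())
    high = sum(hi for _, hi in stack.values())
    return low, high


old_low, old_high = total(inherited)
new_low, new_high = total(new)

# Worst case: cheapest old stack versus most expensive new stack.
min_savings = old_low - new_high
# Best case: most expensive old stack versus cheapest new stack.
max_savings = old_high - new_low

print(f"Inherited stack: ${old_low:,} to ${old_high:,} per month")
print(f"New stack:       ${new_low:,} to ${new_high:,} per month")
print(f"Savings:         ${min_savings:,} to ${max_savings:,} per month")
```

Under those assumptions the savings work out to between $18,250 and $29,820 per month, so the roughly $19K quoted above sits near the conservative end of the range.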

Experience and Learning Curve:

  • I came from analytics roles and had little engineering experience.

  • I learned Databricks through 16 hours of training and on-the-job practice.

  • Because the platform covers so much in one place, I could spend more time on value-driven tasks and less on IT overhead.

Conclusion

The shift to Databricks brought not just cost savings but also operational efficiency and stronger analytical capabilities. I wanted to cut costs, improve reporting, and reduce pipeline failures, and Databricks let me do all three.
