Databricks Myths vs. My Own Personal Experience
Introduction
As someone who has transitioned from a traditional data stack to Databricks, I want to debunk the myth that Databricks is too costly and complicated. Here’s a detailed comparison of my inherited stack versus the new stack with Databricks.
Inherited Stack:
Azure SQL Server + SSIS: Approximately $1.2K/month.
Azure Analysis Services + Power BI: Ranging from $4.5K to $9K/month. Despite the STAR schema, inefficiencies persisted in Power BI.
Data Refresh: Updated twice daily (ingestion + BI refresh).
Labor Costs: Around $25K to $30K/month for a team of three.
Challenges: Frequent ETL and BI layer breakdowns.
New Stack with Databricks:
Databricks: Utilizing Notebooks, Workflows, All-Purpose Compute, and mounted Azure Data Lake Storage, costing about $80 - $150/month.
Power BI Pro Licenses: Totaling approximately $300 for about 30 users.
Data Refresh: Updated once daily (ingestion + BI refresh).
Labor Costs: Reduced to $10K - $12K/month (my salary).
Data Structure: 50 bronze tables, 50 silver tables, and 8 gold tables, with improved STAR schema.
Row Counts: Tables ranged from 500 to 40M rows.
Key Benefits Obtained:
Cost Savings: Monthly savings of approximately $19K, primarily from labor cost reductions due to the streamlined Databricks platform.
Improved Efficiency: Simplified and reliable data pipelines and modeling in Databricks.
Enhanced Capabilities: The new platform enabled the development of additional data models, directly contributing to significant cost savings and improved customer service metrics.
Experience and Learning Curve:
Initially, I was in analytics roles and lacked extensive engineering experience.
I learned Databricks through 16 hours of training and on-the-job practice.
The platform’s comprehensive nature allowed me to focus more on value-driven tasks rather than IT overhead.
Conclusion
The shift to Databricks provided not just cost savings but also operational efficiency and enhanced analytical capabilities. With a desire to cut costs, improve reporting, and reduce pipeline failures, Databricks empowered me to achieve these goals seamlessly.