5 Reasons Why We Recommend Databricks
Introduction
There are many data tools and platforms out there, each with its own advantages and disadvantages. When you combine many of their good qualities and avoid the bad ones, you get Databricks.
Unlike some of my other articles, which spend time highlighting both the strengths and weaknesses of Databricks and some of its competitors, this article focuses purely on the things that Databricks generally does better than most, if not all, of its competitors.
Quick Word
Before I get started, I’ll pre-emptively address a likely criticism: some will say I write this only because I am part of the Databricks Customer Product Advisory Board, MVP program, etc. That would ignore one key thing: I started off as, and remained, a small-time Databricks customer for nearly two years before I worked at a company that openly leverages Databricks, and for even longer before I personally had any recognition from Databricks.
Generally, I openly criticize Databricks when it is deserved, but the truth is: Databricks is the first option I would consider if I needed to store, retrieve, and get value out of data. Not the only option, but the one I would start any evaluation with as the one to beat. About 3 years ago, when I made the choice to use Databricks, a different platform held this spot, but not anymore. Even with its growth, I believe Databricks is underrated, and here are some of the reasons why.
Reasons
All Your Data, One Place
Cost-Efficient: Databricks is cost-efficient in terms of both storage costs and compute (both to read and write).
Lakehouse Federation: For data that resides in a few other platforms, you can query it inside of Databricks without the need for ETL or copying data using a built-in feature called Lakehouse Federation.
Broad Compatibility: If your data is in Delta, Iceberg, CSVs, Excel, and so on, Databricks works with your data. Just bring in your data or connect Databricks to where your data lives.
Why this should matter to you: The sprawl of data is very real and costs companies a lot more money than they realize in terms of tech and people hours: multiple copies of data residing across multiple systems, OR a lot of time spent by employees figuring out how to get data from point "A" to point "B". Databricks makes this less of a burden.
All-In-One + Rich Ecosystem
Bundled Tools: Out of the box, you get several generally “B” grade or better tools to orchestrate your pipelines, write code easily with an AI assistant, run SQL queries for exploratory data analysis, build dashboards for your business’s operations, etc.
Open Ecosystem: If your team prefers using other tools such as Airflow, Fivetran, Power BI, Sigma, dbt, etc., because of familiarity or feature set, Databricks plays really well with them, treating vendors as first-class citizens alongside its own built-in toolset, not suppressing the capabilities of either.
Why this should matter to you: Other platforms tend to be either heavily open to third-party tools but lacking in the selection/quality of bundled tools, OR equipped with some strong built-in tools mixed with some so-so ones in a generally restrictive ecosystem. Databricks gives you the freedom to choose the balance between built-in and third-party tools that fits your business needs.
Powerful & Affordable Analytics
Fast SQL: You want to spend your time getting the right data, not waiting for the query to finish, and Databricks’ SQL experience delivers very well in this area.
Keeping It Simple: Querying data is easy and affordable when you leverage the suite of SQL-focused offerings from Databricks, mainly the Serverless SQL Warehouses + SQL Editor + your BI tool of choice (which can be the included AI/BI Dashboards).
Big Data, Small Data: While Databricks generally performs extremely well with very large datasets (see link in comments to one of my performance tests), it also delivers affordable performance with small datasets.
Why this should matter to you: You don't want to spend more money than you should on your tech stack. With Databricks, you get what is arguably the highest granularity of compute options among the top contenders. With the right data modeling, your money spent on Databricks will grow at a healthy proportion. On top of that, you get a top-notch SQL experience that is as intuitive as it is feature-rich.
Ready For Your Industry
Right Elements: Whether you are in healthcare, finance, supply chain, or another industry, Databricks is likely to have most, if not all, of the elements your team needs to focus on business value, not complicated stacks.
Why this should matter to you: Databricks is a very broad platform in terms of industries covered, but with great depth of capabilities in each one of those industries. This leads to a healthy supply of talent for businesses/organizations AND ample opportunities for practitioners.
Engineering Led, Client Focused
User Feedback Is Gold: No product is perfect, and Databricks is no exception, but it is very good about listening to your feedback to make your tech stack better. I’ve openly criticized Databricks in a few areas in the past. The response? Lots and lots of user feedback sessions to understand the pain points better, followed by actions to make the product better.
Engineers At The Helm: The leadership team is made up of individuals with strong technical skills, not product portfolio managers. While making money is important, building a product that customers WANT to use vs one that customers feel stuck with is very important as well.
Why this should matter to you: The Databricks team wants you to genuinely feel good about using their platform. To quote Ali Ghodsi, after being challenged on the quality of a feature: "For me, if we are doing it we are doing it to be the best." Additionally, Databricks' push for you to own your data means that the only kind of "Vendor Lock" they promote is that of you actually finding the platform to be the most capable for your needs.
Closing
I hope you found this article insightful in understanding some of the reasons why I believe Databricks is currently the best data platform for most scenarios. Remember: Your business situation and requirements are unique, but start your data platform considerations with Databricks.
Josue Bogran is a Solutions Architect Manager @ Kythera Labs, as well as an Advisor to SunnyData, Lumel, and Sigma Computing.
SunnyData is a proud partner of Databricks, Prophecy, Fivetran, Sigma, and Monte Carlo.