Day 2 of Databricks vs Snowflake vs Fabric: Evaluating The Toolset

Considerations:

  • While I do believe Databricks is, generally speaking, the best of the three platforms, this series is written with the intent of being as objective as possible: providing valuable feedback to individuals and businesses evaluating their data stack choices, and giving vendors feedback, from both what I write and what others add in the comments, that can help them develop better products.

  • I've spent a good amount of time over the years with these platforms (and their components) in different capacities. Additional knowledge of the different areas has been gained primarily from reading vendor marketing and documentation sites, watching videos from platform experts, and spending more hands-on time.

  • This is the second article in the series. The first one can be read here. Each category lists the platforms in order from best to worst. Today’s post is lengthier because I wanted to capture some of the unique characteristics of each platform within each category.

Out of the Box Toolset

1) Databricks: Out of the box, Databricks offers the best orchestration of any data platform, strong ETL/ELT capabilities, governance (Unity Catalog), a coding assistant, AI/ML, and a mostly do-anything-with-data experience. It also offers basic dashboarding and promising, business-centric capabilities such as AI/BI Genie. Just about everything you need to confidently go into production is bundled, minus a few things such as strong ingestion connectors (they are working on this) or a mature BI experience. One downside to Databricks’ toolset is that at times the pieces don’t play well with each other; for example, DLT has not always played nicely with Unity Catalog. Overall though, it is hard to beat the quality and price of what Databricks packages into one platform. Generally, these are “B”-grade or better tools. (A minimal sketch of the bundled pipeline experience follows this list.)

2) Snowflake: Snowflake's best contribution is its fantastic engine. For core data engineering needs, it is heavily dependent on tools such as Airflow, dbt, and Fivetran (a sketch of that pattern also follows this list). It has made in-house investments in AI capabilities, and it will be interesting to see how these grow over time. Acquisitions will probably be their best path to catching up with Databricks on out-of-the-box capabilities. That said, vanilla Databricks is often better than vanilla Snowflake.

3) Fabric: Fabric includes Data Factory, a mature orchestrator with built-in ingestion connectors (which Databricks and Snowflake are short on), as well as Power BI, arguably the most powerful BI tool on the market. If we were evaluating only analytical needs, Synapse, I mean, Fabric, would overtake Snowflake. That said, the tools are generally not as good as what is built into Databricks (except for Power BI), and Fabric lacks the more niche business capabilities that Snowflake executes well on and Databricks executes well enough on. It feels very much like Synapse did, just with a Power BI-like UI.
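
To make the bundled Databricks pipeline-plus-governance point concrete, here is a minimal, hedged sketch of a Delta Live Tables pipeline publishing to a Unity Catalog schema. The catalog, schema, and storage path names (main.sales.*, the landing volume) are illustrative assumptions, not anything prescribed by the platform.

```python
# Minimal Delta Live Tables sketch; the Unity Catalog volume path and
# table names (main.sales.*) are illustrative placeholders.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested with Auto Loader")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/sales/landing/orders")  # hypothetical landing path
    )

@dlt.table(comment="Cleaned orders, governed and discoverable via Unity Catalog")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop rows failing the expectation
def orders_clean():
    return dlt.read_stream("orders_raw").where(col("order_id").isNotNull())
```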
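For contrast, this is the kind of external orchestration Snowflake commonly leans on: a hedged Airflow sketch that loads raw data into Snowflake and then runs dbt. The connection ID, stage name, and dbt project path are assumptions for illustration only.

```python
# Hypothetical Airflow DAG for the Snowflake + dbt pattern described above;
# connection IDs, stage names, and the dbt project path are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="snowflake_elt",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Land raw files from an external stage into a raw table.
    load_raw = SnowflakeOperator(
        task_id="load_raw_orders",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO raw.orders FROM @raw.orders_stage FILE_FORMAT = (TYPE = 'JSON')",
    )
    # Transform with dbt once the load finishes.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/analytics --target prod",
    )
    load_raw >> dbt_run
```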


Partner Tool Ecosystem

1) Snowflake: It has embraced the partner ecosystem more than any other data platform vendor. While this costs it in out-of-the-box readiness, there is little doubt that most mature 3rd party tools have historically tended to prioritize their relationship with Snowflake, including incorporating new features there first, before other vendors.

2) Databricks: Databricks has made great strides in encouraging 3rd party vendors to make their tools compatible with Databricks. Many mature tools support it already; others are considering support hesitantly or are trying to avoid alienating Snowflake; and many new tool developers are building for full Databricks compatibility because they see the platform’s rise as a market opportunity.

3) Fabric: Limited information is available in this category, and there are few partners. It is not a significant focus area for Fabric.

Flexibility

1) Databricks: Robust coding language support, diverse compute choices, and solid bundled tools/capabilities, plus the ability to use 3rd party tools if desired, make Databricks the most flexible platform, supporting both data engineering and AI use cases.

2) Snowflake: Snowflake's flexibility is driven by its very mature partner ecosystem which has enabled it to do many things it historically hasn't supported out of the box.

3) Fabric: Fabric supports a variety of languages and has a handful of connector options built-in with Data Factory, but the platform is very centered around dashboards. That said, the reality is that the world revolves around dashboards.

Ease of Use

1) Snowflake: Overall, Snowflake’s UI, SQL-heavy focus, and historical bundling with dbt/Airflow/Fivetran have earned it the reputation of being an easy-to-use platform. While it technically relies on these tools for that reputation, and the tools are also available for other platforms, it is hard to separate them from Snowflake. Additionally, as previously mentioned, Snowflake’s pricing makes it easier to understand the cost impact of your compute decisions. Documentation is arguably best-in-class.

2) Fabric: If you are a small business without heavily technical people and you care only about dashboards, Fabric makes it easy to build pipelines and develop dashboards quickly. You won’t be building best-in-class or resilient systems, but it will be better than building dashboards in Excel. The built-in pipeline capabilities are nice, with a UI-based toolset inspired by Power BI’s Power Query. It is significantly easier than what Databricks and Snowflake currently offer, but less adaptable and robust. A word of warning: Power BI is NOT considered an easy BI tool to master. It is extremely easy to build behemoth models that will end up costing you significant $s over time, and good PBI developers are costly.

3) Databricks: Databricks’ flexibility, along with its historically engineering- and AI-heavy focus, has earned it the reputation of being more difficult to learn, paired with what I’ve described before as less-than-ideal documentation. Databricks SQL and the SQL Editor have been game-changers in shifting this perception, and over time I expect you will see Databricks move to number 1 or 2 in this category if the documentation gets better, text-to-SQL continues to grow, and there is an influx of practical Databricks content. Serverless is also key to future improvements here, eliminating the need to manage clusters, previously one of Databricks’ biggest pain points in terms of complexity. (A sketch of querying a serverless SQL warehouse follows this list.)
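
As a hedged illustration of how Databricks SQL plus serverless lowers the barrier to entry, here is what querying a serverless SQL warehouse looks like with the databricks-sql-connector package; the hostname, HTTP path, token, and table name are placeholders, not values from the article.

```python
# Hedged sketch: querying a (serverless) Databricks SQL warehouse from Python.
# server_hostname, http_path, the token, and the table name are placeholders.
from databricks import sql  # pip install databricks-sql-connector

with sql.connect(
    server_hostname="your-workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT order_date, SUM(amount) AS total "
            "FROM main.sales.orders_clean GROUP BY order_date"
        )
        for row in cursor.fetchall():
            print(row)
```

The notable part is what is absent: there is no cluster to size or manage, which is exactly the pain point the serverless direction is meant to remove.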

Key Assessment: Business Value

1) Databricks: Databricks offers arguably the most robust data engineering and AI capabilities, along with a strong analytics experience that has nearly fully caught up to Snowflake. With Serverless now here to simplify compute management, and the potential success of AI/BI Dashboards, AI/BI Genie, and Lakeflow, Databricks has a strong pathway to cementing itself as the #1 data platform solution. Its biggest challenge now is to prove itself, in the eyes of customers evaluating it against Fabric, as a platform that is easy to implement, without removing the flexibility that its advanced users enjoy. It also has a path forward for democratizing data engineering, and somewhat of a path forward in doing the same for AI.

2) Snowflake: A best-in-class analytics (SQL) engine, the easiest-to-understand pricing, and a powerful partner ecosystem are the best cards in Snowflake’s favor. It is a formidable platform with plenty of practitioners experienced in running the Snowflake/dbt/Airflow/Fivetran stack. On the flip side, its heavy dependence on 3rd party vendors/tools greatly diminishes its value, especially as more and more of those same vendors/tools become available for Databricks as well. It is trying to play catch-up with Databricks in AI, and it will be interesting to see how it executes in this pursuit. For now, it is a good platform that should be considered, but no longer the first platform that should be considered.

3) Fabric: Fabric’s value is hard to evaluate. On one hand, it has Power BI, which is a fantastic BI tool; on the other hand, Power BI is known for being easy to get started with but hard to master. Now add to this a whole platform built around Power BI: the perceived simplicity of leveraging Fabric will hurt the sustainability of solutions built on it over time due to a compounding effect of bad practices. The solution will be to throw more money at capacity units and consultants. I’ve read a lot of people talk about how Fabric will get better, but I heard similar wishful thinking about Synapse in the past. At the end of the day, Power BI is central to Fabric because the rest of Fabric is not enough to stand as a powerhouse platform, at least as of now. That said, I love the competition that a 3rd major player brings to the table, and I hope that Fabric lives up to some of its expectations and raises the bar.

If you enjoyed this article, follow SunnyData on LinkedIn as well as Josue A. Bogran, the author of this article.
