Fabric Meets Databricks: A Preliminary Review for Data Practitioners

  • Disclosure: The purpose of this article is an initial analysis and not a comprehensive comparison between Fabric and Databricks. In my opinion, Fabric is not currently suitable for enterprise use. As a consultant dedicated to the analysis and sizing of projects, the shared conclusion we reached along with other engineers and architects is that Fabric does not yet meet the needs of most of our enterprise-sized clients.

Introduction

It is essential to understand OneLake and how it addresses data silos. OneLake is a SaaS data lake platform that leverages virtualization to enable a unified data repository for an entire organization while eliminating the need for data duplication. It uses "Shortcuts," which are symbolic links allowing real-time access to data from various locations without creating copies. 

Fabric is an all-in-one solution, a quasi-evolution of Synapse, so you pay for all the included services. If you're interested in Purview, Data Factory, and Power BI, it might be a good option. All these components are tightly integrated, and the pricing is simple: you contract a fixed capacity and forget about major problems. However, it is highly likely that you will pay for much more than you consume.

Initially, Fabric was said to be Microsoft's data platform to compete with Databricks, or at least to be inspired by it. I believe that in practice it is not, because the functionality available today indicates that it is aimed at a different type of client: one who needs an analytical tool that requires little administration, is easy to manage, and lets them exploit data without much complexity.

Perhaps Microsoft is looking to repeat the success of Power BI, and that makes a lot of sense. Fabric is a solution that fits small and medium-sized businesses, business teams with a limited scope, or users tied to the Microsoft ecosystem. Additionally, Copilot and the low-code capabilities add differentiating value considering the type of user that will operate on the platform.

Clearly, there is a market where this solution fits perfectly (and I have no doubt that the platform's functionality will increase). It is a service that neither cannibalizes Azure services like Synapse nor solutions like Databricks. It has a different target audience that can be empowered in the future with tools like Copilot or low-code development that facilitate their work.

Why Fabric is not Ideal for Enterprise

There are several reasons, but the main argument encompassing all of them is that Fabric bundles 'stripped-down' versions of other Microsoft products. Those products are very good and complete on their own but, in their Fabric version, they have a narrower scope and carry restrictions that generate friction or make it harder to develop certain types of projects under optimal and secure conditions.

For example, Data Factory is an excellent data integration service. However, the Fabric version has some limitations compared to the standalone one: it has fewer connectors and configuration options. Likewise, it does not support metadata-driven pipelines in integration runtimes, which is an additional and critical restriction for some requirements.

Furthermore, Fabric currently has limited capability to use Managed Identity, which represents a significant limitation in terms of security and access management. Another major disadvantage is the lack of integration with Azure Key Vault, which is essential for the secure management of secrets and keys and for maintaining high standards of security and control in enterprise data environments.

Additionally, if we move away from the purely technical scope, Fabric's business model (which is also its greatest virtue) is not cost-efficient. It is expensive and lacks flexibility when provisioning resources, which translates to idle capacity. In real scenarios, the companies I have suggested Fabric to (mainly due to Power BI licenses, company size, etc.) were consuming only 40-50% of what they were paying for.
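To make the idle-capacity point concrete, here is a minimal arithmetic sketch. It only assumes a fixed monthly price and a utilization ratio; the price used below is a made-up figure for illustration, not an actual Fabric rate.

```python
def effective_cost_multiplier(utilization: float) -> float:
    """How much more each consumed unit costs compared with 100% utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return 1 / utilization


def idle_spend(monthly_price: float, utilization: float) -> float:
    """Portion of a fixed monthly price that pays for capacity nobody used."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be in the range [0, 1]")
    return monthly_price * (1 - utilization)


# Hypothetical fixed capacity price, for illustration only.
MONTHLY_PRICE = 10_000.0

for utilization in (0.40, 0.50):
    multiplier = effective_cost_multiplier(utilization)
    idle = idle_spend(MONTHLY_PRICE, utilization)
    print(f"utilization={utilization:.0%}  cost per consumed unit x{multiplier:.2f}  idle spend={idle:,.0f}")
```

At 40-50% utilization, every unit actually consumed costs between 2 and 2.5 times what it would at full utilization, which is the gap a consumption-based model would largely avoid.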

Microsoft Fabric … vs Databricks?

Databricks is designed for professionals, offering a mature and robust platform aimed at clients with a serious commitment to data. In the world of data, Databricks is what Adobe Premiere Pro is to video editors: a distinctive tool that enables experienced data teams to solve complex problems and achieve exceptional results.

Let’s compare the two platforms across five core areas.

  1. Deployment Model: Fabric is fully managed by Microsoft (SaaS), whereas Databricks requires configuration (IaC) and is cloud-agnostic. Result: Databricks is the better option if you have a data and architecture team; otherwise, Fabric is suitable.

  2. Data Transformation and Management: Databricks has a clear advantage in handling complex data processing, and it offers data sharing capabilities and APIs for data consumption. A distinguishing feature of Fabric is its compatibility with T-SQL and stored procedures (beneficial if coming from SQL, for instance). Result: Databricks is the winner, although Fabric may be a suitable option for those new to data exploration with lower complexity in processes and transformations.

  3. Data Governance and Security: Databricks provides strong governance through Unity Catalog, while Fabric relies on its Purview integration, which is still in Preview. Purview's weakness is its lack of security controls and identity management mechanisms, which Databricks covers completely. Result: Databricks is the winner. This is a pending area for Fabric that is likely to improve.

  4. Visual Experience, AI Assistant, and Low-Code: Microsoft offers Copilot, while Databricks has implemented an AI assistant in its notebooks to facilitate development. Both are quite effective. Additionally, Microsoft has low-code capabilities in Azure and a more polished graphical experience (Databricks is working on this), which is ideal for Fabric's target audience. Result: Microsoft is slightly ahead at present due to its low-code capabilities, which provide additional value for newcomers.

  5. CI/CD: Databricks offers full compatibility with Git and DevOps, while Fabric has limited support. Again, Fabric seems geared towards an audience that is less focused on data engineering best practices. Result: Databricks is the winner. This is another pending area for Fabric.

Fabric and Databricks can be Complementary

Here’s a simple explanation of the two ways Databricks can integrate with OneLake:

Using Existing Data Lakes:

If your Databricks workspace is already connected to Azure Data Lake Storage Gen2 (ADLS Gen2), you can integrate it with OneLake by creating a "Shortcut" to the ADLS Gen2 storage account in OneLake. Since OneLake and ADLS Gen2 use the same APIs and data formats (Delta Parquet), you can easily update your Databricks notebooks to use OneLake endpoints. This integration keeps the data paths consistent, whether you’re accessing the data through Microsoft Fabric or directly in Databricks notebooks. Note, however, that this will introduce governance complexity.
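As a rough illustration of what "using OneLake endpoints" means in a notebook, the sketch below builds the two storage URIs side by side. It assumes the `abfss://` URI layouts documented for ADLS Gen2 and OneLake at the time of writing; the account, container, workspace, lakehouse, and shortcut names are all hypothetical, and the helper functions are mine, not part of any SDK.

```python
def adls_uri(account: str, container: str, path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def onelake_uri(workspace: str, item: str, path: str, item_type: str = "Lakehouse") -> str:
    """Build a OneLake URI: abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{item}.{item_type}/{path.lstrip('/')}"
    )


# Hypothetical names: the same file reached directly and through a OneLake shortcut.
direct = adls_uri("contosodata", "raw", "sales/2024/orders.parquet")
# Assumes a shortcut called "sales_raw" was created under the lakehouse's Files
# area and points at the "sales" folder of the storage account above.
via_onelake = onelake_uri("analytics_ws", "sales_lh", "Files/sales_raw/2024/orders.parquet")

print(direct)
print(via_onelake)
```

In a Databricks notebook, either string would be passed to the usual reader (for example `spark.read.load(...)`), provided the cluster is configured with credentials that OneLake accepts. That authentication setup is outside the scope of this sketch.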

Leveraging Data Directly in OneLake:

In this scenario, data is uploaded directly into OneLake. Databricks can then process this data using a standard medallion architecture, where data is cleaned and refined into different layers (bronze, silver, gold). This processed data can be used seamlessly by other components of Microsoft Fabric, like Power BI, without duplicating the data. Power BI can connect to this live data efficiently, benefiting from both live connection and high performance.
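For readers new to the medallion pattern, here is a deliberately tiny plain-Python sketch of the three layers. It is not how you would write it on Databricks (there the layers would be Spark DataFrames persisted as Delta tables in OneLake); the records, field names, and cleaning rules are invented for illustration.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested, including bad and duplicate rows (made-up data).
bronze = [
    {"order_id": "1", "country": "es", "amount": "10.5"},
    {"order_id": "1", "country": "es", "amount": "10.5"},  # duplicate row
    {"order_id": "2", "country": "US", "amount": "n/a"},   # unparseable amount
    {"order_id": "3", "country": "us", "amount": "4.5"},
]


def to_silver(rows):
    """Silver: drop invalid rows, cast types, normalize casing, deduplicate by order_id."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # skip duplicates
        seen.add(row["order_id"])
        silver.append(
            {"order_id": int(row["order_id"]), "country": row["country"].upper(), "amount": amount}
        )
    return silver


def to_gold(rows):
    """Gold: business-level aggregate, here total amount per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

The gold layer is the one a reporting tool such as Power BI would read. Because each layer lives in the same lake, nothing has to be copied out for reporting.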

Conclusions

In the market, there is room for many options, and competition is a welcome problem. Personally, I believe that Fabric has a place in the market for small businesses or institutions that want to venture into a defined Microsoft stack with its pros and cons. It is not a solution that competes with Databricks or Microsoft's traditional data stack (ADF, Synapse, etc.).

I believe Databricks can complement Fabric for the clients Fabric targets. Instead of being direct competitors, the two can operate together and leverage each other's strengths: Databricks provides a robust platform for big data analysis and machine learning, while Fabric offers seamless integration within the Microsoft environment.

This combination can provide businesses an interesting and flexible solution that spans from processing and analyzing large volumes of data to integration and automation within a familiar Microsoft ecosystem.
