Synapse and use of Databricks

azure_learner 900 Reputation points
2025-07-19T10:42:34.9866667+00:00

Hello experts, we are in the final stages of building our ADLS gold layer, and we aim to build a data lakehouse on top of it, along with Synapse Analytics, which would serve as the enterprise data warehouse for all analytics and reporting purposes.

Here, I really need your help and guidance. At the moment our technical landscape includes a virtualization tool with nearly 1,000 base and derived views, and it costs our company a lot. We need to move away from the virtualization tool and want all these views to be replicated or simulated in Azure. We also have a SQL Server on an Azure virtual machine, which ingests data from technologies that our company is going to sunset very soon.

We have two choices. The first is to replicate all the views in a silver layer with Databricks, using Databricks SQL and the %sql option and leveraging high-performance Spark clusters. The second is Synapse Analytics, which our company is quite keen on, but there is firm opposition to it from other quarters. They state that Synapse query retrieval has latency and throughput issues and is not on par with SQL Server, and that Synapse does not support cross-database queries, recursive CTEs, etc.

Hence they are of the view that Databricks would be an ideal choice for this.

Experts, please help with a comparison of Synapse and SQL Server backed by quantifiable metrics. I would appreciate your informed suggestions and help. Thank you.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Answer accepted by question author
  1. Marcin Policht 64,595 Reputation points MVP Volunteer Moderator
    2025-07-19T11:24:01.2866667+00:00

    Here is a high-level summary

    | Criteria | Azure Synapse Analytics | Databricks (with Delta Lake) | SQL Server on Azure VM |
    |---|---|---|---|
    | Best Fit For | Data warehousing, BI, enterprise-scale analytics | Big data processing, ML/AI, semi-structured data, streaming | OLTP/OLAP hybrid, legacy support |
    | Performance | Good for structured SQL workloads; issues with concurrency | Excellent for large-scale processing; better throughput & caching | Strong for traditional SQL; limited scalability |
    | Recursive CTE support | Limited (only 1 level supported currently) | Yes (via Spark SQL or Python workarounds) | Fully supported |
    | Cross-database queries | Not natively supported | Can simulate using Unity Catalog or views in lakehouse | Fully supported |
    | Cost | Moderate to High (DWU based) | More flexible (pay per job or cluster) | High (VM + SQL license) |
    | View Materialization | Yes (materialized views supported) | Yes (Delta Live Tables or manually cached views) | Yes |
    | Caching & Query Speed | Mixed; caching limited; slower for large joins | Excellent; advanced caching, Photon engine | Good for moderate workloads |
    | Concurrency | Limited concurrent query support without scaling tiers | High concurrency (esp. with serverless SQL + Photon) | Limited by VM size and configuration |
    | Integration with Azure Services | Tight (Power BI, Purview, ADF, Logic Apps) | Very good (Azure ML, Data Factory, Unity Catalog) | Legacy, less modern integrations |
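
    To make the "Python workarounds" for recursive logic more concrete, here is a minimal sketch of how a recursive CTE over a parent/child table can be emulated with a level-by-level loop. It is plain Python rather than Spark code, and the function name `expand_hierarchy` and the sample org-chart rows are assumptions made up for illustration. On a real cluster the same pattern would typically be expressed as repeated joins.

```python
from collections import defaultdict


def expand_hierarchy(edges, root):
    """Return (node, depth) rows reachable from root, like a recursive CTE.

    edges: iterable of (parent, child) pairs standing in for a table.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    result = [(root, 0)]      # plays the role of the anchor member
    frontier = [root]
    seen = {root}             # guards against cycles in the data
    depth = 0
    while frontier:           # each pass plays the role of the recursive member
        depth += 1
        next_frontier = []
        for node in frontier:
            for child in children[node]:
                if child not in seen:
                    seen.add(child)
                    result.append((child, depth))
                    next_frontier.append(child)
        frontier = next_frontier
    return result


# Hypothetical rows: (manager, employee)
edges = [("ceo", "vp1"), ("ceo", "vp2"), ("vp1", "mgr1"), ("mgr1", "dev1")]
rows = expand_hierarchy(edges, "ceo")
```

    Each loop pass corresponds to one level of recursion, so the number of passes is bounded by the depth of the hierarchy rather than by the number of rows.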

    Regarding performance benchmarks (per TPC-DS / Databricks internal testing), this should help:

    | Test Type | Databricks (Photon, Serverless SQL) | Synapse Dedicated Pool | SQL Server on VM |
    |---|---|---|---|
    | TPC-DS 1TB (99 queries) | ~30 min (w/ Photon enabled) | ~60–75 min | ~90–120 min |
    | Query response time (avg) | 0.5–2 sec | 1–5 sec | 1–3 sec |
    | Query concurrency (20 users) | Excellent (near-linear scaling) | Performance degrades after 10+ users | Limited by CPU/memory |

    Note that these metrics vary based on cluster config, optimizations, caching, and query patterns. Databricks generally outperforms Synapse for high-concurrency, semi-structured, or large datasets.
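
    Because these numbers depend so heavily on configuration, it may be worth measuring concurrency on your own workload. The sketch below is one possible harness, using only the Python standard library. The `run_query` callable is an assumption: you would replace `fake_run_query` with a real driver call for each platform, and the 20 simulated users simply mirror the concurrency row above.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def measure_concurrency(run_query, queries, users):
    """Run every query once per simulated user and report latency stats."""

    def one_call(sql):
        start = time.perf_counter()
        run_query(sql)
        return time.perf_counter() - start

    # One copy of the query list per simulated user.
    workload = [sql for _ in range(users) for sql in queries]
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(one_call, workload))
    wall = time.perf_counter() - wall_start

    ordered = sorted(latencies)
    return {
        "calls": len(latencies),
        "mean_s": statistics.mean(latencies),
        # Simple index-based approximation of the 95th percentile.
        "p95_s": ordered[int(0.95 * (len(ordered) - 1))],
        "wall_s": wall,
    }


def fake_run_query(sql):
    # Placeholder only: stands in for a real query against a platform.
    time.sleep(0.01)


stats = measure_concurrency(fake_run_query, ["SELECT 1", "SELECT 2"], users=20)
```

    Running the same query list against each candidate with increasing `users` values shows where latency starts to degrade on your data, which is more useful than any published figure.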

    From the functional support standpoint:

    | Feature | SQL Server | Synapse | Databricks |
    |---|---|---|---|
    | Recursive CTEs | Yes | Limited | Via Spark SQL logic |
    | Cross-database joins | Yes | No | Via Unity Catalog or views |
    | Materialized Views | Yes | Yes | Via Delta Live Tables, cache |
    | View dependencies (chained views) | Yes | Yes | Via notebooks / SQL views |
    | Stored Procedures | Yes | Limited | No (UDFs/UDF notebooks instead) |
    | ANSI SQL support | Yes | Yes | Via Spark SQL |
    | Integration with Power BI | Yes | Native | Via SQL endpoints |
    Effectively, your choice will depend on your primary concerns:

    1. Performance, scalability, modern platform
    • Choose Databricks
      • Photon engine, Delta caching, and Unity Catalog give you high performance and modularity.
      • Ideal for complex ETL pipelines, recursive logic, cross-schema joins.
      • Serverless SQL or interactive clusters can support 1,000+ view migration at scale.
    2. Traditional BI with deep Power BI integration and minimal transformations
    • Choose Synapse Analytics
      • If your queries are fairly static (simple joins, aggregates), Synapse’s SQL Pools or Serverless SQL can be efficient.
      • Use Synapse Pipelines for orchestration, and DirectQuery mode in Power BI.
    3. Existing skill sets & minimal migration effort
    • Use SQL Server as a transitional option (not long-term)
      • Suitable for legacy support and a smooth transition, but expensive and hard to scale.

    So, to conclude, the suggested approach would be a hybrid, phased one:

    Phase 1: View Replication & Performance Benchmarking

    • Pick 100 representative views from your 1,000.
    • Implement in:
      • Databricks SQL (%sql, Unity Catalog, Delta Live Tables)
      • Synapse Dedicated SQL Pool (Materialized Views, CTEs)
    • Compare:
      • Performance (runtime, caching)
      • Cost (per query/job)
      • Compatibility (recursive logic, joins)
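
    One way to pick the 100 representative views is a proportional stratified sample, so that every kind of view is covered in the benchmark. The sketch below assumes each view has already been tagged with a category. The categories ("base", "derived", "recursive") and their counts are made-up illustration values, not figures from your environment.

```python
import random
from collections import defaultdict


def pick_representative(views, sample_size, seed=0):
    """Proportional stratified sample.

    views: list of (view_name, category) pairs.
    Rounding means the result can differ slightly from sample_size.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_category = defaultdict(list)
    for name, category in views:
        by_category[category].append(name)

    total = len(views)
    sample = []
    for category, names in sorted(by_category.items()):
        # At least one view per category, otherwise a proportional share.
        quota = max(1, round(sample_size * len(names) / total))
        sample.extend(rng.sample(names, min(quota, len(names))))
    return sample


# Hypothetical inventory of 1,000 views.
views = (
    [(f"base_{i}", "base") for i in range(700)]
    + [(f"derived_{i}", "derived") for i in range(250)]
    + [(f"recursive_{i}", "recursive") for i in range(50)]
)
sample = pick_representative(views, sample_size=100)
```

    Sampling by category keeps the rarer, harder cases (such as recursive logic) in the benchmark instead of letting simple base views dominate the comparison.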

    Phase 2: Landing Zone Design

    • Leverage ADLS Gen2 Gold Layer (Delta format) as the single source of truth.
    • Use:
      • Databricks Silver Layer for standardized data and business logic.
      • Synapse Serverless SQL as a consumer if needed for BI teams.

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin
