Synapse and use of Databricks

Question

azure_learner 900

Hello experts, We are in the final stage of building our ADLS gold layer, and we aim to build a data lakehouse on top of it, along with Synapse Analytics, which would serve as the enterprise data warehouse for all analytics and reporting purposes.

Here I badly need your help and guidance. At the moment our technical landscape includes a virtualization tool with nearly 1,000 base and derived views, and it costs our company a lot. We need to move away from the virtualization tool and want all these views to be replicated or simulated in Azure. We also have a SQL Server on an Azure virtual machine that ingests data from technologies our company is going to sunset very soon.

We have two choices. One is to replicate all the views in a silver layer in Databricks, using Databricks SQL and the %sql option, and leverage high-performance Spark clusters. Our company is quite keen on using Synapse Analytics, but there is firm opposition from other quarters, who state that Synapse query retrieval has latency and throughput issues compared with SQL Server and is not on par with it. They also point out that Synapse does not support cross-database queries, recursive CTEs, etc.

Hence they are of the view that Databricks would be an ideal choice for this.

Experts, please help with a comparison of Synapse and SQL Server, with quantifiable metrics. I would appreciate your informed suggestions and help. Thank you.

Smaran Thoomu 32,155 Reputation points Microsoft External Staff Moderator

2025-07-21T03:27:08.2433333+00:00

azure_learner Just checking in to see if the below answer helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Answer accepted by question author

Answer 1

Here is a high-level summary:

Criteria	Azure Synapse Analytics	Databricks (with Delta Lake)	SQL Server on Azure VM
Best Fit For	Data warehousing, BI, enterprise-scale analytics	Big data processing, ML/AI, semi-structured data, streaming	OLTP/OLAP hybrid, legacy support
Performance	Good for structured SQL workloads; issues with concurrency	Excellent for large-scale processing; better throughput & caching	Strong for traditional SQL; limited scalability
Recursive CTE support	Limited (only 1 level supported currently)	Yes (via Spark SQL or Python workarounds)	Fully supported
Cross-database queries	Not natively supported	Can simulate using Unity Catalog or views in lakehouse	Fully supported
Cost	Moderate to High (DWU based)	More flexible (pay per job or cluster)	High (VM + SQL license)
View Materialization	Yes (materialized views supported)	Yes (Delta Live Tables or manually cached views)	Yes
Caching & Query Speed	Mixed; caching limited; slower for large joins	Excellent; advanced caching, Photon engine	Good for moderate workloads
Concurrency	Limited concurrent query support without scaling tiers	High concurrency (esp. with serverless SQL + Photon)	Limited by VM size and configuration
Integration with Azure Services	Tight (Power BI, Purview, ADF, Logic Apps)	Very good (Azure ML, Data Factory, Unity Catalog)	Legacy; fewer modern integrations

Regarding performance benchmarks (per TPC-DS / Databricks internal testing), this should help:

Test Type	Databricks (Photon, Serverless SQL)	Synapse Dedicated Pool	SQL Server on VM
TPC-DS 1TB (99 queries)	~30 min (w/ Photon enabled)	~60–75 min	~90–120 min
Query response time (avg)	0.5–2 sec	1–5 sec	1–3 sec
Query concurrency (20 users)	Excellent (near-linear scaling)	Performance degrades after 10+ users	Limited by CPU/memory

Note that these metrics vary based on cluster configuration, optimizations, caching, and query patterns. Databricks generally outperforms Synapse for high-concurrency workloads and for semi-structured or large datasets.
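Since these numbers depend so much on your own setup, it may be worth measuring concurrency yourself. Below is a minimal sketch, assuming you can wrap each platform's client in a plain Python callable that takes a SQL string. `fake_run_query` is a hypothetical stand-in (it only sleeps), so swap in your real Databricks SQL, Synapse, or SQL Server call before reading anything into the numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def measure_concurrency(run_query, sql, users):
    """Fire the same query from `users` threads at once.

    Returns the per-call latencies in seconds, one entry per simulated user.
    """
    def timed_call(_):
        start = time.perf_counter()
        run_query(sql)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(timed_call, range(users)))


def fake_run_query(sql):
    # Hypothetical placeholder: replace with a real client call.
    time.sleep(0.01)


if __name__ == "__main__":
    for users in (1, 5, 20):
        latencies = measure_concurrency(fake_run_query, "SELECT 1", users)
        print(f"{users} users: median latency {median(latencies):.3f}s")
```

If the median latency climbs sharply between 5 and 20 users on one platform but stays flat on another, that is the kind of degradation the table above describes.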

From the functional support standpoint:

Feature	Synapse	Databricks
Recursive CTEs	Limited	Yes (via Spark SQL logic)
Cross-database joins	Not natively supported	Simulated (via Unity Catalog or views)
Materialized Views	Yes	Yes (Delta Live Tables, cache)
View dependencies (chained views)		(via notebooks / SQL views)
Stored Procedures	Limited	(UDFs/UDF notebooks instead)
ANSI SQL support		(via Spark SQL)
Integration with Power BI	Native	(via SQL endpoints)
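Where a recursive CTE is not available, the usual workaround is to expand the hierarchy one level at a time until no new rows appear. Here is a rough sketch of that loop in plain Python, assuming the hierarchy is held as (parent, child) pairs; the names are made up for illustration. On Spark, each pass of the loop would be a join between the current frontier and the edge table rather than a dictionary lookup.

```python
def expand_hierarchy(edges, root):
    """Return (node, depth) for `root` and every descendant.

    `edges` is an iterable of (parent, child) pairs. This mirrors a recursive
    CTE: the anchor is the root row, and each loop pass is one recursive step.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    result = [(root, 0)]      # anchor member
    frontier = [root]
    seen = {root}             # guards against cycles in the data
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for node in frontier:
            for child in children.get(node, []):
                if child not in seen:
                    seen.add(child)
                    result.append((child, depth))
                    next_frontier.append(child)
        frontier = next_frontier
    return result


if __name__ == "__main__":
    # Hypothetical org chart.
    edges = [("CEO", "VP1"), ("CEO", "VP2"), ("VP1", "Mgr1"), ("Mgr1", "Eng1")]
    print(expand_hierarchy(edges, "CEO"))
    # [('CEO', 0), ('VP1', 1), ('VP2', 1), ('Mgr1', 2), ('Eng1', 3)]
```

The `seen` set matters in practice: real hierarchy tables sometimes contain loops, and without it the expansion would never terminate.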

Effectively, your choice will depend on your primary concerns:

Performance, scalability, modern platform

Choose Databricks
- Photon engine, Delta caching, and Unity Catalog give you high performance and modularity.
- Ideal for complex ETL pipelines, recursive logic, cross-schema joins.
- Serverless SQL or interactive clusters can support migrating 1,000+ views at scale.
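One practical point when moving around 1,000 base and derived views: derived views can only be created after the views they read from. A small sketch of how the creation order could be worked out, assuming you can export each view's dependencies from the virtualization tool into a dictionary. The view names are hypothetical, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter


def creation_order(dependencies):
    """Return object names ordered so that every dependency comes first.

    `dependencies` maps a view name to the set of tables/views it reads from.
    Raises graphlib.CycleError if the views reference each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical base tables and derived views.
    deps = {
        "v_customer_orders": {"base_customers", "base_orders"},
        "v_top_customers": {"v_customer_orders"},
    }
    try:
        for name in creation_order(deps):
            print(name)
    except CycleError as err:
        print("Circular view dependency:", err.args[1])
```

The same ordering works on either platform, so it can be computed once and reused for both the Databricks and the Synapse trial.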

Traditional BI with deep Power BI integration and minimal transformations

Choose Synapse Analytics
- If your queries are fairly static (simple joins, aggregates), Synapse’s SQL Pools or Serverless SQL can be efficient.
- Use Synapse Pipelines for orchestration, and DirectQuery mode in Power BI.

Existing skill sets & minimal migration effort

Use SQL Server as a transitional option (not long-term)
- Suitable for legacy support and a smooth transition, but expensive and hard to scale.

So, to conclude, the suggested approach would be hybrid/phased:

Phase 1: View Replication & Performance Benchmarking

Pick 100 representative views from your 1,000.
Implement in:
- Databricks SQL (%sql, Unity Catalog, Delta Live Tables)
- Synapse Dedicated SQL Pool (Materialized Views, CTEs)
Compare:
- Performance (runtime, caching)
- Cost (per query/job)
- Compatibility (recursive logic, joins)
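For the runtime part of that comparison, a simple harness that runs the same query against each platform a few times and keeps the median is usually enough to start with. The sketch below assumes each platform is wrapped in a callable that takes a SQL string; the runner names, the view names, and the `SELECT COUNT(*)` probe are placeholders, and the fake runners only sleep, so they say nothing about real performance.

```python
import time
from statistics import median


def benchmark(views, runners, repeats=3):
    """Time one probe query per view on each platform.

    `runners` maps a platform label to a callable that executes a SQL string.
    Returns {platform: {view: median seconds over `repeats` runs}}.
    """
    results = {}
    for platform, run_query in runners.items():
        results[platform] = {}
        for view in views:
            timings = []
            for _ in range(repeats):
                start = time.perf_counter()
                run_query(f"SELECT COUNT(*) FROM {view}")
                timings.append(time.perf_counter() - start)
            results[platform][view] = median(timings)
    return results


def make_fake_runner(delay_seconds):
    # Hypothetical stand-in for a real database client.
    def run_query(sql):
        time.sleep(delay_seconds)
    return run_query


if __name__ == "__main__":
    runners = {
        "databricks_sql": make_fake_runner(0.001),
        "synapse_dedicated": make_fake_runner(0.001),
    }
    for platform, timings in benchmark(["v_sample_a", "v_sample_b"], runners).items():
        for view, seconds in timings.items():
            print(f"{platform:18s} {view:12s} {seconds:.4f}s")
```

The first run on each platform is typically a cold-cache run, so you may want to record it separately from the later runs rather than fold it into the median.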

Phase 2: Landing Zone Design

Leverage ADLS Gen2 Gold Layer (Delta format) as the single source of truth.
Use:
- Databricks Silver Layer for standardized data and business logic.
- Synapse Serverless SQL as a consumer if needed for BI teams.

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin
