Here is a high-level summary
| Criteria | Azure Synapse Analytics | Databricks (with Delta Lake) | SQL Server on Azure VM |
|---|---|---|---|
| Best Fit For | Data warehousing, BI, enterprise-scale analytics | Big data processing, ML/AI, semi-structured data, streaming | OLTP/OLAP hybrid, legacy support |
| Performance | Good for structured SQL workloads; issues with concurrency | Excellent for large-scale processing; better throughput & caching | Strong for traditional SQL; limited scalability |
| Recursive CTE support | Limited (only 1 level supported currently) | Yes (via Spark SQL or Python workarounds) | Fully supported |
| Cross-database queries | Not natively supported | Can simulate using Unity Catalog or views in lakehouse | Fully supported |
| Cost | Moderate to High (DWU based) | More flexible (pay per job or cluster) | High (VM + SQL license) |
| View Materialization | Yes (materialized views supported) | Yes (Delta Live Tables or manually cached views) | Yes |
| Caching & Query Speed | Mixed; caching limited; slower for large joins | Excellent; advanced caching, Photon engine | Good for moderate workloads |
| Concurrency | Limited concurrent query support without scaling tiers | High concurrency (esp. with serverless SQL + Photon) | Limited by VM size and configuration |
| Integration with Azure Services | Tight (Power BI, Purview, ADF, Logic Apps) | Very good (Azure ML, Data Factory, Unity Catalog) | But legacy, less modern integrations |
Regarding performance benchmarks (per TPC-DS / Databricks internal testing), this should help:
| Test Type | Databricks (Photon, Serverless SQL) | Synapse Dedicated Pool | SQL Server on VM |
|---|---|---|---|
| TPC-DS 1TB (99 queries) | ~30 min (w/ Photon enabled) | ~60–75 min | ~90–120 min |
| Query response time (avg) | 0.5–2 sec | 1–5 sec | 1–3 sec |
| Query concurrency (20 users) | Excellent (near-linear scaling) | Performance degrades after 10+ users | Limited by CPU/memory |
Note that these metrics vary based on cluster config, optimizations, caching, and query patterns. Databricks generally outperforms Synapse for high-concurrency, semi-structured, or large datasets.
From the functional support standpoint:
| Feature | SQL Server | Synapse | Databricks |
|---|---|---|---|
| Recursive CTEs | (limited) | (via Spark SQL logic) | |
| Cross-database joins | (via Unity Catalog or views) | ||
| Materialized Views | (Delta Live Tables, cache) | ||
| View dependencies (chained views) | (via notebooks / SQL views) | ||
| Stored Procedures | Limited | (UDFs/UDF notebooks instead) | |
| ANSI SQL support | (via Spark SQL) | ||
| Integration with Power BI | Native | (via SQL endpoints) |
Effectively, your choice will depend on your primary concerns:
- Performance, scalability, modern platform
- Choose Databricks
- Photon engine, Delta caching, and Unity Catalog give you high performance and modularity.
- Ideal for complex ETL pipelines, recursive logic, cross-schema joins.
- Serverless SQL or interactive clusters can support 1,000+ view migration at scale.
- Traditional BI with deep Power BI integration and minimal transformations
- Choose Synapse Analytics
- If your queries are fairly static (simple joins, aggregates), Synapse’s SQL Pools or Serverless SQL can be efficient.
- Use Synapse Pipelines for orchestration, and DirectQuery mode in Power BI.
- Existing skill sets & minimal migration effort
- Use SQL Server as transitional option (not long-term)
- Suitable for legacy support and smooth transition. But expensive and hard to scale.
So, to conclude, the suggested approach would be hybrid/phased
Phase 1: View Replication & Performance Benchmarking
- Pick 100 representative views from your 1,000.
- Implement in:
- Databricks SQL (
%sql, Unity Catalog, Delta Live Tables) - Synapse Dedicated SQL Pool (Materialized Views, CTEs)
- Databricks SQL (
- Compare:
- Performance (runtime, caching)
- Cost (per query/job)
- Compatibility (recursive logic, joins)
Phase 2: Landing Zone Design
- Leverage ADLS Gen2 Gold Layer (Delta format) as the single source of truth.
- Use:
- Databricks Silver Layer for standardized data and business logic.
- Synapse Serverless SQL as a consumer if needed for BI teams.
If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.
hth
Marcin