Hi,
Thanks for reaching out to Microsoft Q&A.
Below is a technical implementation plan for managing multi-level derived views in Databricks SQL and Synapse Analytics, with emphasis on lineage, performance, and modularity, as you requested.
Databricks SQL:
-- vw_base_level_1 (Silver)
CREATE OR REPLACE VIEW vw_base_level_1 AS
SELECT * FROM cleaned_orders;
-- vw_derived_level_2 (Gold)
CREATE OR REPLACE VIEW vw_derived_level_2 AS
SELECT customer_id, SUM(total_amount) AS total_spend
FROM vw_base_level_1
GROUP BY customer_id;
-- vw_derived_level_3 (Gold)
CREATE OR REPLACE VIEW vw_derived_level_3 AS
SELECT d2.customer_id, d2.total_spend, c.customer_segment
FROM vw_derived_level_2 d2
JOIN dim_customers c ON d2.customer_id = c.customer_id;
- File layout: Run OPTIMIZE with ZORDER BY on the intermediate Delta tables if performance issues arise (views themselves cannot be optimized, only the tables they read).
- Materialized Views: Consider persisting deep levels as Delta tables with scheduled refresh via Jobs or Workflows.
- Use Lakehouse Federation to access upstream sources efficiently.
- Leverage Unity Catalog for view lineage.
- Databricks also supports view lineage via the Catalog Explorer UI.
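The "persist deep levels as tables" idea above can be sketched as a small refresh routine. This is only an illustration under stated assumptions: SQLite stands in for Databricks SQL so the snippet runs locally, the table name tbl_derived_level_2 and the helper refresh_snapshot are hypothetical, and the sample rows are made up. On Databricks the same step would typically be a CREATE OR REPLACE TABLE ... AS SELECT statement run by a scheduled job.

```python
import sqlite3


def refresh_snapshot(conn, view_name, table_name):
    """Rebuild table_name from the current contents of view_name.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM {view_name}")
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]


# Stand-in data: SQLite instead of Databricks SQL, invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cleaned_orders (customer_id INTEGER, total_amount REAL);
INSERT INTO cleaned_orders VALUES (1, 10.0), (2, 7.5);
CREATE VIEW vw_derived_level_2 AS
    SELECT customer_id, SUM(total_amount) AS total_spend
    FROM cleaned_orders GROUP BY customer_id;
""")

rows_before = refresh_snapshot(conn, "vw_derived_level_2", "tbl_derived_level_2")

# New data arrives; the persisted table stays stale until the next refresh.
conn.execute("INSERT INTO cleaned_orders VALUES (3, 4.0)")
rows_stale = conn.execute("SELECT COUNT(*) FROM tbl_derived_level_2").fetchone()[0]
rows_after = refresh_snapshot(conn, "vw_derived_level_2", "tbl_derived_level_2")

print(rows_before, rows_stale, rows_after)
```

The trade-off is the usual one: downstream queries read a precomputed table instead of re-evaluating the whole chain, at the cost of data that is only as fresh as the last scheduled refresh.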
Testing
- Create unit test notebooks that validate each view level with LIMIT and assert expected results.
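A minimal sketch of such a test, assuming a small fixture with known answers. SQLite stands in for Databricks SQL here so the snippet is runnable anywhere; in a notebook the queries would go through the workspace's own SQL interface instead. The sample rows and the helpers fetch and run_view_tests are hypothetical.

```python
import sqlite3

# Stand-in setup: SQLite replaces Databricks SQL, and the sample rows are made up.
SETUP_SQL = """
CREATE TABLE cleaned_orders (customer_id INTEGER, total_amount REAL);
CREATE TABLE dim_customers (customer_id INTEGER, customer_segment TEXT);
INSERT INTO cleaned_orders VALUES (1, 10.0), (1, 15.0), (2, 7.5);
INSERT INTO dim_customers VALUES (1, 'retail'), (2, 'wholesale');

CREATE VIEW vw_base_level_1 AS SELECT * FROM cleaned_orders;
CREATE VIEW vw_derived_level_2 AS
    SELECT customer_id, SUM(total_amount) AS total_spend
    FROM vw_base_level_1 GROUP BY customer_id;
CREATE VIEW vw_derived_level_3 AS
    SELECT d2.customer_id, d2.total_spend, c.customer_segment
    FROM vw_derived_level_2 d2
    JOIN dim_customers c ON d2.customer_id = c.customer_id;
"""


def fetch(conn, view, limit=10):
    # LIMIT keeps the check cheap; ORDER BY makes the result deterministic.
    return conn.execute(
        f"SELECT * FROM {view} ORDER BY 1, 2 LIMIT {int(limit)}"
    ).fetchall()


def run_view_tests():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    # Level 1: pass-through of the cleaned rows.
    assert fetch(conn, "vw_base_level_1") == [(1, 10.0), (1, 15.0), (2, 7.5)]
    # Level 2: one row per customer with the summed amount.
    assert fetch(conn, "vw_derived_level_2") == [(1, 25.0), (2, 7.5)]
    # Level 3: spend enriched with the customer segment.
    assert fetch(conn, "vw_derived_level_3") == [
        (1, 25.0, "retail"),
        (2, 7.5, "wholesale"),
    ]
    conn.close()
    return "all view levels passed"


print(run_view_tests())
```

Testing each level separately means a failure points at the view that introduced it, rather than only at the end of the chain.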
Synapse SQL (Dedicated/Serverless Pools)
-- Base View (Silver)
CREATE VIEW dbo.vw_orders_base AS
SELECT * FROM dbo.orders_cleaned;
-- Derived View Level 2 (Gold)
CREATE VIEW dbo.vw_customer_spend AS
SELECT customer_id, SUM(amount) AS spend
FROM dbo.vw_orders_base
GROUP BY customer_id;
-- Level 3
CREATE VIEW dbo.vw_customer_summary AS
SELECT c.customer_id, c.spend, s.segment
FROM dbo.vw_customer_spend c
JOIN dbo.customer_segment s ON c.customer_id = s.customer_id;
- Use materialized views for levels 3+ if query performance degrades (materialized views are available in dedicated pools only).
- Enable result set caching in dedicated pools, and use partitioned external data when using serverless pools.
- Avoid deeply nested view chains in serverless pools; flatten them if necessary.
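To illustrate what flattening means for the chain above, the sketch below compares the three-level view with a single query against the base tables and checks that both return the same rows. SQLite is used as a local stand-in for Synapse (so the dbo schema prefix is dropped), the sample rows are invented, and the equivalence assumes customer_segment holds one row per customer_id.

```python
import sqlite3

# Stand-in setup: SQLite instead of Synapse, no dbo schema, made-up rows.
SETUP_SQL = """
CREATE TABLE orders_cleaned (customer_id INTEGER, amount REAL);
CREATE TABLE customer_segment (customer_id INTEGER PRIMARY KEY, segment TEXT);
INSERT INTO orders_cleaned VALUES (1, 10.0), (1, 15.0), (2, 7.5);
INSERT INTO customer_segment VALUES (1, 'retail'), (2, 'wholesale');

CREATE VIEW vw_orders_base AS SELECT * FROM orders_cleaned;
CREATE VIEW vw_customer_spend AS
    SELECT customer_id, SUM(amount) AS spend
    FROM vw_orders_base GROUP BY customer_id;
CREATE VIEW vw_customer_summary AS
    SELECT c.customer_id, c.spend, s.segment
    FROM vw_customer_spend c
    JOIN customer_segment s ON c.customer_id = s.customer_id;
"""

# The nested form reads through all three view levels.
NESTED_SQL = """
SELECT customer_id, spend, segment
FROM vw_customer_summary
ORDER BY customer_id
"""

# The flattened form expresses the same logic directly on the base tables.
FLATTENED_SQL = """
SELECT o.customer_id, SUM(o.amount) AS spend, s.segment
FROM orders_cleaned o
JOIN customer_segment s ON o.customer_id = s.customer_id
GROUP BY o.customer_id, s.segment
ORDER BY o.customer_id
"""


def compare_nested_and_flattened():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    nested = conn.execute(NESTED_SQL).fetchall()
    flattened = conn.execute(FLATTENED_SQL).fetchall()
    conn.close()
    return nested, flattened


nested_rows, flattened_rows = compare_nested_and_flattened()
print(nested_rows == flattened_rows)
```

Running a check like this before replacing a nested view keeps the refactor safe: the flattened definition only goes live once it matches the chain on representative data.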
Dependency/Lineage
- Maintain a custom lineage table, or use Azure Purview (now Microsoft Purview) for full lineage.
- Use Synapse Studio's built-in view dependency tracker.
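A custom lineage table can be as simple as one row per (view, direct dependency) pair. The sketch below is a hypothetical illustration of how such rows could be walked to list everything a view depends on and how many levels away each dependency sits. The rows mirror the Synapse views above; the helper name upstream is an assumption, not part of any Synapse or Purview API.

```python
from collections import deque

# Hypothetical lineage rows: (object, direct dependency).
LINEAGE = [
    ("dbo.vw_orders_base", "dbo.orders_cleaned"),
    ("dbo.vw_customer_spend", "dbo.vw_orders_base"),
    ("dbo.vw_customer_summary", "dbo.vw_customer_spend"),
    ("dbo.vw_customer_summary", "dbo.customer_segment"),
]


def upstream(obj, edges):
    """Return {dependency: distance} for everything obj depends on.

    Uses breadth-first search, so each distance is the shortest number
    of hops from obj to that dependency.
    """
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, []).append(parent)

    seen = {}
    queue = deque([(obj, 0)])
    while queue:
        node, depth = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:  # also guards against cycles
                seen[parent] = depth + 1
                queue.append((parent, depth + 1))
    return seen


deps = upstream("dbo.vw_customer_summary", LINEAGE)
for name, distance in sorted(deps.items(), key=lambda item: item[1]):
    print(distance, name)
```

The largest distance is the depth of the chain behind a view, which is a useful signal for deciding where to materialize or flatten.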
For benchmarking:
Run equivalent TPC-DS queries in both systems across:
- Each view level.
- Full chain (V1 → V6).
- Different concurrency scenarios.
Capture:
- Query latency.
- CPU/memory utilization.
- Caching effect.
- View refresh cost (if materialized).
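For the latency part of the capture list, a small harness along these lines may be enough to get started. It is a sketch under assumptions: run_query is a hypothetical callable that you would wrap around whichever client you use for each system, and the stub below only sleeps so the example runs without a warehouse. CPU/memory utilization and refresh cost are not measured here and would come from each platform's own monitoring.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_query(run_query, sql):
    """Return the wall-clock seconds taken by one call to run_query(sql)."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def benchmark(run_query, sql, runs=5, concurrency=1):
    """Run sql `runs` times with up to `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(
            pool.map(lambda _: time_query(run_query, sql), range(runs))
        )
    latencies.sort()
    return {
        "runs": runs,
        "concurrency": concurrency,
        "min_s": latencies[0],
        "median_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }


def fake_run_query(sql):
    # Stub standing in for a real client call; replace with your own wrapper.
    time.sleep(0.01)


# One entry per view level and concurrency scenario.
for view in ["vw_base_level_1", "vw_derived_level_2", "vw_derived_level_3"]:
    for workers in (1, 4):
        stats = benchmark(
            fake_run_query, f"SELECT * FROM {view}", runs=4, concurrency=workers
        )
        print(view, stats)
```

With concurrency set to 1, comparing the first run against later runs gives a rough indication of the caching effect, since the first execution is usually the cold one.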
Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.