ADLS and optimal solution with historical data

azure_learner 900 Reputation points
2025-10-10T09:01:01.8166667+00:00

Hi @Marcin Policht Thank you again for your answer:

https://free.blessedness.top/en-us/answers/questions/5579939/adls-and-optimal-solution

My apologies. I missed an important and critical factor : ERP data in ADLS has data since 2016 but the Azure data residing in Managed Instance has historical data since the 1970s .

The main reason for combined KPIs is that business stakeholders want to get deeper insights of data going back to historical and recent data as per use cases .

Please suggest if the above solutions still hold good taking into consideration the new information.

I will be thankful for your suggestion.

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
{count} votes

Answer accepted by question author
  1. Manoj Kumar Boyini 420 Reputation points Microsoft External Staff Moderator
    2025-10-13T12:10:51.4666667+00:00

    Hi azure_learner,

    Thank you for reaching out to Microsoft Q&A. We’ve reviewed your question, and below is the detailed solution. The earlier suggestion (using Synapse or Power BI with PolyBase/DirectQuery) is still valid, but we need to tweak it for performance and usability.

    Still Works

    • Synapse can query both ADLS and Managed Instance.
    • Power BI can pull unified views from Synapse or use composite models.

    Need changes

    • Querying decades of data directly from Managed Instance for every KPI will be slow and costly.
    • You’ll need smart data model—aggregated tables or a semantic layer—to avoid scanning everything.
    • Consider tiering: keep recent data in a high-performance store and archive older data in ADLS, still accessible via Synapse.

    Step to follow -
    **
    Synapse as the Integration Hub

    •     Use serverless SQL pools for ADLS.
    •     Connect Managed Instance via PolyBase or Linked Service.
    •     Create materialized views or aggregated tables for KPIs.

    ETL vs Virtualization

    •        If KPIs require heavy joins across decades, move data into a curated layer (Synapse dedicated pool or Delta Lake).
    •         For lighter queries, virtualization works fine.

    Power BI

    •   Connect to Synapse for unified KPIs.
    •   Use composite models for real-time Managed Instance data.

    Performance Tips

    •   Partition historical data by decade/year.
    •   Precompute KPI aggregates for older data.

    Alternative suggestions

    If users often need both historical and recent data together:

    • Use Azure Data Factory to sync historical data from Managed Instance into ADLS in Parquet/Delta format.
    • Build a single semantic model in Synapse or Power BI.

    Hope the above given steps are helpful or if you have any further query do let us know.

    Thanks,
    Manoj Kumar Boyini


0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.