Improve connectivity between Azure Databricks and SAP Datasphere

LearnAndLearn 120 Reputation points
2025-10-19T18:23:09.9533333+00:00

We're ingesting data from Azure Databricks into SAP Datasphere remote tables via JDBC through a single, shared DP Agent. Under higher data volumes, or when multiple spaces run at once, the DP Agent becomes overloaded, disconnects for 10 minutes, and causes taskchains to fail. Because taskchains are dependent on one another, one failure cascades into many.

DSP consumes Databricks data as remote tables over JDBC using one DP Agent shared across all DSP spaces.

When the DP Agent is overloaded by data throughput/concurrency, it disconnects. The automatic reconnect takes 10 minutes.

Any running persistency at the moment of disconnect fails. Due to taskchain dependencies, subsequent chains start and fail as well while the agent remains offline.

We avoid running multiple taskchains simultaneously, but cross-space overlap still occurs because they share the same DP Agent, triggering the same overload/disconnect behavior.

What we need:

  • Stabilize DSP Databricks ingestion by reducing DP Agent overload risk.
  • Prevent cascading failures when a disconnect happens (retries/backoff/isolation).
  • Restore stakeholder trust with reliable, timely refreshes for SAC dashboards.

Is there any recommendation for this situation?

Thank you.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Vinodh247 39,201 Reputation points MVP Volunteer Moderator
    2025-10-20T06:48:25.0666667+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    It seems your issue is a single DP Agent bottleneck under concurrent, high-volume JDBC loads from Azure Databricks to SAP Datasphere (DSP). When the agent overloads, it disconnects for 10 minutes, causing cascading taskchain failures.

    Recommendations:

    1. Scale DP Agents: Deploy multiple DP Agents (ideally one per DSP space or space group) to distribute load and remove the single point of failure.

    2. Throttle Databricks concurrency: Reduce Spark parallelism using coalesce() or repartition() and control the number of concurrent JDBC writers (see the JDBC write sketch below).

    3. Use JDBC batching: Enable a batch size of 1,000 - 5,000 rows and disable auto-commit to reduce transaction overhead on the DP Agent (also covered in the JDBC write sketch below).

    4. Add retry and backoff logic: Implement exponential backoff retries (ex: 30s -> 90s -> 180s) in orchestrations to handle temporary disconnects gracefully (see the retry sketch below).

    5. Stagger taskchains: Schedule chain start times so executions do not overlap across spaces.

    6. Improve monitoring: Track DP Agent CPU, memory, and connection counts; alert on rising load before disconnections.

    7. Long-term: Decouple Databricks from DSP by staging data in ADLS and letting DSP import from there, ensuring scalability and resilience (see the ADLS staging sketch below).
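
    JDBC write sketch (points 2 and 3): a minimal PySpark example, assuming the Databricks side pushes data over JDBC as described above. The JDBC URL, secret scope, and table names are placeholders you would replace with your own; Spark manages commits per partition batch here, and if you open raw JDBC connections yourself you would disable auto-commit on the connection instead.

    ```python
    # Placeholder connection details and table names; adjust to your environment.
    jdbc_url = "jdbc:<vendor>://<host>:<port>"             # placeholder JDBC URL for the target
    source_df = spark.table("curated.sales")               # placeholder Databricks source table

    (source_df
        .repartition(4)                                    # cap the number of parallel writer tasks
        .write
        .format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "TARGET_SCHEMA.SALES")          # placeholder target table
        .option("user", dbutils.secrets.get("dsp-scope", "user"))          # placeholder secret scope/keys
        .option("password", dbutils.secrets.get("dsp-scope", "password"))
        .option("numPartitions", 4)                        # upper bound on concurrent JDBC connections
        .option("batchsize", 5000)                         # rows per batched INSERT (1,000 - 5,000 range)
        .mode("append")
        .save())
    ```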
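
    Retry sketch (point 4): a small wrapper with the suggested 30s -> 90s -> 180s backoff. Here `load_step` is a hypothetical zero-argument callable standing in for whatever triggers one load or persistency step in your orchestration.

    ```python
    import time

    def run_with_backoff(load_step, delays=(30, 90, 180)):
        """Run load_step(); on failure, wait and retry once per delay in `delays`."""
        last_error = None
        for attempt, wait_seconds in enumerate((0, *delays), start=1):
            if wait_seconds:
                time.sleep(wait_seconds)       # back off before the next attempt
            try:
                return load_step()             # success: return the step's result
            except Exception as err:           # narrow this to your connector's error type in practice
                last_error = err
                print(f"Attempt {attempt} failed: {err}")
        raise last_error                       # retries exhausted: fail loudly so the chain stops cleanly

    # Example usage with a hypothetical trigger function:
    # run_with_backoff(lambda: trigger_persistency("SALES"))
    ```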
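
    ADLS staging sketch (point 7): stage curated data in ADLS so DSP imports files through its own cloud-storage connection instead of pulling through the shared DP Agent. The storage account, container, and table name are placeholders, and authentication (for example a service principal) is assumed to be configured on the cluster.

    ```python
    # Placeholder ADLS Gen2 path; replace account, container, and folder with your own.
    staging_path = "abfss://staging@<storageaccount>.dfs.core.windows.net/dsp/sales"

    (spark.table("curated.sales")       # placeholder Databricks source table
        .write
        .format("delta")                # or "parquet" if the DSP-side import prefers plain files
        .mode("overwrite")
        .save(staging_path))
    ```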

    Expected Outcome: This approach stabilizes DSP ingestion, isolates failures, and restores reliability for SAC dashboards with minimal architectural disruption.

    Please 'Upvote' (Thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.

