Intermittent Spark Pool Failures and Increased Startup Time After Upgrading to Spark 3.5 in Azure Synapse

Nikhil Singh 50 Reputation points
2025-09-22T10:41:50.43+00:00

We recently upgraded our Synapse Spark pools from version 3.4 to 3.5. As part of the upgrade process, we followed these steps:

  • Removed custom packages from the pools
  • Upgraded the Spark pool version to 3.5
  • Re-attached the required packages

After the upgrade, everything worked fine for about four days. Suddenly, all Spark pools with attached packages started failing with random errors (e.g., “Interpreter died”). To resolve, we removed and re-added the packages, which temporarily fixed the issue. However, after another two days, the failures returned. The only workaround has been to reattach the packages each time the issue occurs.

Yesterday, we downgraded the Spark pool back to version 3.4, reattached the packages, and so far everything is working (we are still monitoring).

Additionally, we have observed that over the past month, Spark pool startup times have increased significantly—from 5–6 minutes to 15–20 minutes.

Questions:

  • Is this a known issue with Spark 3.5 or recent Synapse backend changes?
  • Has Microsoft made any updates in the past month that could explain these failures and increased startup times?
  • Are there any recommended best practices or workarounds for maintaining package stability and reducing startup delays?

Any insights or official guidance would be greatly appreciated.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Amira Bedhiafi 39,341 Reputation points Volunteer Moderator
    2025-09-22T17:53:05.0566667+00:00

    Hello Nikhil !

    Thank you for posting on Microsoft Learn Q&A.

    Synapse Spark 3.5 is still in public preview and ships a newer base stack (Python 3.11, Java 17, Delta 3.2, Azure Linux Mariner 3.0). Those jumps often break wheels compiled against older glibc/Python ABIs, which can surface as “Interpreter died” errors or random kernel exits, especially after a restart or when pools rehydrate and reinstall libraries. The same reinstall work also explains slower cold starts when the environment has to pull and resolve more, or newer, packages.

    https://free.blessedness.top/en-us/azure/synapse-analytics/spark/apache-spark-35-runtime

    The 3.5 runtime introduces new base images and language versions, so libraries must be reinstalled or re-resolved on session start. This increases cold-start times, especially if your requirements pull from public feeds at runtime.

    Try to rebuild or repin your Python wheels for Python 3.11, and avoid source distributions (.tar.gz) that have to compile during pool startup. Prefer manylinux wheels (.whl) and pin your environment to versions compatible with those bundled in the 3.5 runtime.
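    As a quick sanity check before repinning, you can run a small cell in a 3.5 notebook to confirm which interpreter and libc the pool actually ships. This is a generic Python sketch (not a Synapse-specific API); wheels built for a different Python minor version or an older glibc tend to crash the interpreter rather than fail cleanly at import time:

    ```python
    import platform
    import sys

    def runtime_report() -> dict:
        """Report the interpreter and libc versions the session is running on."""
        libc, libc_version = platform.libc_ver()
        return {
            "python": f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}",
            "implementation": platform.python_implementation(),
            "libc": f"{libc} {libc_version}".strip(),
            "machine": platform.machine(),
        }

    if __name__ == "__main__":
        # Compare these values against the tags of the wheels you attach
        # (e.g. cp311 / manylinux2014_x86_64).
        for key, value in runtime_report().items():
            print(f"{key}: {value}")
    ```

    If the reported Python version is 3.11 but your attached wheels were built for 3.10, that mismatch alone can explain the interpreter crashes.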

    Use workspace packages or a requirements file at the pool level instead of ad-hoc pip installs in notebooks: keep a single, pinned requirements.txt or environment.yml and update it deliberately.
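    For example, a fully pinned requirements.txt attached at the pool level (the package names and versions below are purely illustrative, not recommendations for the 3.5 runtime):

    ```
    # requirements.txt — attached at the pool level, resolved once per pool start.
    # Pin exact versions so every session resolves the same set.
    numpy==1.26.4
    pandas==2.1.4
    pyarrow==14.0.2
    ```

    Exact pins avoid re-resolution against public feeds on every session start, which is one of the contributors to slow cold starts.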


  2. VRISHABHANATH PATIL 1,380 Reputation points Microsoft External Staff Moderator
    2025-09-24T09:29:02.37+00:00

    Hi @Nikhil Singh

    It is advisable to continue using Spark 3.4 at this time if your workloads require consistent startup times and reliable package installations via requirements.txt.

    Additionally, if your team is not yet prepared to manage the potential instability or troubleshooting overhead associated with the Spark 3.5 preview, delaying the upgrade is recommended.

    Please monitor the https://free.blessedness.top/en-us/azure/synapse-analytics/spark/apache-spark-version-release-notes For future updates.

    Consider evaluating Spark 3.5 in a non-production environment to ensure readiness for eventual migration.

    Thanks,
    Vrishabh

