Intermittent Spark Pool Failures and Increased Startup Time After Upgrading to Spark 3.5 in Azure Synapse

Nikhil Singh 50 Reputation points
2025-09-22T10:41:50.43+00:00

We recently upgraded our Synapse Spark pools from version 3.4 to 3.5. As part of the upgrade process, we followed these steps:

  • Removed custom packages from the pools
  • Upgraded the Spark pool version to 3.5
  • Re-attached the required packages

After the upgrade, everything worked fine for about four days. Suddenly, all Spark pools with attached packages started failing with random errors (e.g., “Interpreter died”). To resolve, we removed and re-added the packages, which temporarily fixed the issue. However, after another two days, the failures returned. The only workaround has been to reattach the packages each time the issue occurs.

Yesterday, we downgraded the Spark pool back to version 3.4, reattached the packages, and so far everything is working (we are still monitoring).

Additionally, we have observed that over the past month, Spark pool startup times have increased significantly—from 5–6 minutes to 15–20 minutes.

Questions:

  • Is this a known issue with Spark 3.5 or recent Synapse backend changes?
  • Has Microsoft made any updates in the past month that could explain these failures and increased startup times?
  • Are there any recommended best practices or workarounds for maintaining package stability and reducing startup delays?

Any insights or official guidance would be greatly appreciated.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Amira Bedhiafi 39,341 Reputation points Volunteer Moderator
    2025-09-22T17:53:05.0566667+00:00

    Hello Nikhil !

    Thank you for posting on Microsoft Learn Q&A.

    Synapse Spark 3.5 is still in public preview and ships a newer base stack (Python 3.11, Java 17, Delta 3.2, Azure Linux Mariner 3.0). Those jumps often break wheels compiled against older glibc/Python ABIs, which can surface as “Interpreter died” errors or random kernel exits, especially after a restart or when pools rehydrate and reinstall libraries. The same reinstall work also explains slower cold starts when the environment has to pull and resolve more, or newer, packages.

    https://free.blessedness.top/en-us/azure/synapse-analytics/spark/apache-spark-35-runtime

    The 3.5 runtime introduces new base images and language versions, so libraries must be reinstalled or re-resolved on session start. This increases cold-start times, especially if your requirements pull from public feeds at runtime.

    Try to rebuild or repin your Python wheels for Python 3.11, and avoid source distributions (.tar.gz) that have to compile during pool startup. Prefer manylinux wheels (.whl) and pin your environment to versions compatible with those bundled in the 3.5 runtime.
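    As a quick sanity check before repinning, you can run a small cell in a 3.5 notebook to confirm which interpreter and libc the pool actually ships. This is a generic Python sketch (not a Synapse-specific API); wheels built for a different Python minor version or an older glibc tend to crash the interpreter rather than fail cleanly at import time:

    ```python
    import platform
    import sys

    def runtime_report() -> dict:
        """Report the interpreter and libc versions the session is running on."""
        libc, libc_version = platform.libc_ver()
        return {
            "python": f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}",
            "implementation": platform.python_implementation(),
            "libc": f"{libc} {libc_version}".strip(),
            "machine": platform.machine(),
        }

    if __name__ == "__main__":
        # Compare these values against the tags of the wheels you attach
        # (e.g. cp311 / manylinux2014_x86_64).
        for key, value in runtime_report().items():
            print(f"{key}: {value}")
    ```

    If the reported Python version is 3.11 but your attached wheels were built for 3.10, that mismatch alone can explain the interpreter crashes.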

    Use workspace packages or a requirements file at the pool level instead of ad-hoc pip installs in notebooks: keep a single, pinned requirements.txt or environment.yml and update it deliberately.
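    For example, a fully pinned requirements.txt attached at the pool level (the package names and versions below are purely illustrative, not recommendations for the 3.5 runtime):

    ```
    # requirements.txt — attached at the pool level, resolved once per pool start.
    # Pin exact versions so every session resolves the same set.
    numpy==1.26.4
    pandas==2.1.4
    pyarrow==14.0.2
    ```

    Exact pins avoid re-resolution against public feeds on every session start, which is one of the contributors to slow cold starts.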


  2. VRISHABHANATH PATIL 1,380 Reputation points Microsoft External Staff Moderator
    2025-09-24T09:29:02.37+00:00

    Hi @Nikhil Singh

    It is advisable to continue using Spark 3.4 at this time if your workloads require consistent startup times and reliable package installations via requirements.txt.

    Additionally, if your team is not yet prepared to manage the potential instability or troubleshooting overhead associated with the Spark 3.5 preview, delaying the upgrade is recommended.

    Please monitor the https://free.blessedness.top/en-us/azure/synapse-analytics/spark/apache-spark-version-release-notes For future updates.

    Consider evaluating Spark 3.5 in a non-production environment to ensure readiness for eventual migration.

    Thanks,
    Vrishabh

