Thank you for posting your question on Microsoft Q&A. Here are some troubleshooting steps that may help address your issue.
You are encountering a CONTAINER_LAUNCH_FAILURE error when starting a Databricks cluster with enable_elastic_disk set to false. This typically happens because the cluster cannot provision the required disk configuration when elastic disk is disabled.
Why This Happens
- Elastic Disk Behavior: By default, Databricks uses elastic disk to dynamically attach additional storage when the local disk runs out of space. Disabling it (enable_elastic_disk=false) forces the cluster to rely solely on the ephemeral local disk provided by the VM SKU.
- Impact of Disabling Elastic Disk: If the selected VM SKU does not have sufficient local storage for Databricks system containers and Spark workloads, the container launch will fail. This is especially common with spot instances or smaller VM sizes.
- Azure Container Setup Failure: The error indicates that the container could not be launched because the node did not meet the storage requirements of the Databricks Runtime once elastic disk was disabled.
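To illustrate the configuration being discussed, here is a minimal sketch of a Clusters API create payload with elastic disk disabled. The cluster name, SKU, runtime version, and worker count are hypothetical placeholders, not values from your environment:

```python
# Hedged sketch: a Clusters API create payload with elastic disk disabled.
# All values below (name, runtime, SKU, worker count) are illustrative.
cluster_spec = {
    "cluster_name": "example-cluster",       # hypothetical name
    "spark_version": "13.3.x-scala2.12",     # illustrative runtime version
    "node_type_id": "Standard_DS3_v2",       # smaller SKU: limited local SSD
    "num_workers": 2,
    "enable_elastic_disk": False,            # no extra managed disks attached
}

# With enable_elastic_disk set to False, the Databricks system containers
# and Spark scratch space must fit entirely on the SKU's ephemeral disk;
# if they do not, the node fails with CONTAINER_LAUNCH_FAILURE.
print(cluster_spec["enable_elastic_disk"])
```

With this spec, the node's only storage is whatever the VM SKU ships with, which is why smaller SKUs are the most common trigger for the failure.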
Suggested Solutions
- Re-enable Elastic Disk (Recommended): If your goal is to reduce costs, note that elastic disk charges are minimal compared to compute costs, and keeping it enabled ensures cluster stability. Reference: https://free.blessedness.top/azure/databricks/clusters/configure#elastic-disk
- Choose VM SKUs with Larger Local SSDs: If you must disable elastic disk, select VM types with sufficient local storage (e.g., Standard_D or L-series SKUs with large ephemeral disks). References:
- Instance Types – Pricing | Databricks
- Compute creation cheat sheet - Azure Databricks | Microsoft Learn
- Validate Pool Configuration: Ensure that both driver and worker pools use compatible VM SKUs with adequate local disk, and avoid mixing spot and on-demand pools if stability is critical. Reference: Instance Types – Pricing | Databricks
- Check Cluster Event Logs: Use the Cluster Event Log in the Databricks UI to confirm whether the failure is caused by disk constraints or by image pull issues. Reference: List cluster activity events | Clusters API | REST API reference | Azure Databricks
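Putting the recommended fix and the diagnostic step together, the sketch below builds the request payloads for the Clusters REST API endpoints `/api/2.0/clusters/edit` (which requires the full desired cluster spec, not just the changed field) and `/api/2.0/clusters/events`. The cluster ID, runtime, and SKU are hypothetical placeholders; you would POST these to your workspace URL with your preferred HTTP client and a valid token:

```python
# Hedged sketch: payloads for the Azure Databricks Clusters REST API.
# The cluster_id, runtime, and SKU below are illustrative placeholders.

# 1) Re-enable elastic disk on an existing cluster. Note that
#    clusters/edit expects the complete target spec for the cluster.
edit_payload = {
    "cluster_id": "0123-456789-abcde000",    # hypothetical cluster ID
    "spark_version": "13.3.x-scala2.12",     # illustrative runtime
    "node_type_id": "Standard_DS3_v2",       # illustrative SKU
    "num_workers": 2,
    "enable_elastic_disk": True,             # restore dynamic disk attachment
}

# 2) Pull recent cluster events to confirm why the launch failed
#    (look for disk-pressure or container setup messages).
events_payload = {
    "cluster_id": "0123-456789-abcde000",
    "limit": 50,                             # most recent 50 events
}

print(edit_payload["enable_elastic_disk"], events_payload["limit"])
```

Editing the cluster restarts it with the new spec, so schedule the change outside active workloads where possible.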
Takeaways
- Disabling enable_elastic_disk is only safe when you are certain the VM SKU provides enough local storage for Databricks system and Spark workloads.
- For most production and development scenarios, keep elastic disk enabled to avoid container launch failures.
Hope the above steps were helpful. If you have any other questions, please feel free to contact us.
Thanks,
Vrishabh