Dependency caching in AzureML pipeline

Marco Bignotti 50 Reputation points
2025-10-01T09:01:34.6833333+00:00

Hi!

I am using the uv python package manager to run my pipelines. For instance, I have a data prep component where the entry point is:

code: ../../../../
command: uv run --extra extra_group --no-dev --locked src/my_package/train.py
environment: azureml:my-dev-env@latest

In this way, I can submit the job and automatically sync both the dependencies and the changes made to my project/package. However, I lose uv's dependency caching, which is what makes it so fast: each time I re-submit a job, everything is re-installed (and that can take a while when heavy packages are involved). Do you know if there's a way to cache the dependencies and benefit from them across re-runs?

Thank you so much!

Azure Machine Learning

Answer accepted by question author
  1. Alex Burlachenko 18,390 Reputation points Volunteer Moderator
    2025-10-01T09:47:51.3633333+00:00

    Marco hey,

    you are right, losing that uv cache speed is a huge pain point for iterative development.

    the core issue is that each time azureml starts a new job, it provisions a fresh compute node. that node has a clean disk, so the uv cache from a previous job is gone. to keep the cache, you need a way to persist the ~/.cache/uv directory between job runs.

    you can mount an azure blob storage container or a file share to your training job. then, you can configure uv to use a custom cache directory located on that mounted storage.

    create a datastore that points to an azure storage account. then, in your job configuration, mount this datastore to a path inside the container, for example /uv_cache.

    you need to tell uv to use this mounted path. you can set the UV_CACHE_DIR environment variable in your job to point to /uv_cache. uv will then use this persistent location for its cache instead of the local volatile disk.

    your command would look something like this:

    UV_CACHE_DIR=/uv_cache uv run --extra extra_group --no-dev --locked src/my_package/train.py

    the first time the job runs, it will populate the cache in the blob storage. on subsequent runs, uv will find the already downloaded and compiled packages in that persistent location, making the dependency installation step much faster.
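    as a concrete sketch, a standalone command job yaml wiring this together could look like the following. note the datastore name, cache path, and compute target here are placeholders, not taken from your setup, so adjust them to your workspace.

    # job.yml (sketch; datastore name, paths, and compute are placeholders)
    $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
    code: ../../../../
    environment: azureml:my-dev-env@latest
    compute: azureml:my-cluster
    inputs:
      uv_cache:
        type: uri_folder
        path: azureml://datastores/my_cache_datastore/paths/uv_cache
        mode: rw_mount
    # azureml mounts the input at a runtime-generated path, so resolve it
    # in the command rather than hardcoding a mount point
    command: >-
      UV_CACHE_DIR=${{inputs.uv_cache}}
      uv run --extra extra_group --no-dev --locked src/my_package/train.py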

    this pattern of using a mounted storage for a package cache is a universal technique. you could do the same thing for pip's cache or conda packages to speed up any python environment setup.
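    for instance, the pip equivalent would be something along these lines (the path is just a placeholder for wherever your datastore is mounted):

    PIP_CACHE_DIR=/uv_cache/pip pip install -r requirements.txt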

    mount a persistent datastore to your job and set the UV_CACHE_DIR environment variable to a path on that mount. this should preserve your uv cache across pipeline runs.

    regards,

    Alex

    and "yes" if you would follow me at Q&A - personaly thx.
    P.S. If my answer help to you, please Accept my answer
    

    https://ctrlaltdel.blog/


Answer accepted by question author
  1. Nikhil Jha (Accenture International Limited) 2,220 Reputation points Microsoft External Staff Moderator
    2025-10-17T20:02:03.3166667+00:00

    Hello Marco Bignotti,

    The error message, Function not implemented (os error 38), is the key. This almost certainly means the underlying storage you're mounting with uri_folder is an Azure Blob Container. The rw_mount for Blob Storage is FUSE-based, and it does not support the full set of POSIX file system operations (like hard linking or certain file rename operations) that uv's cache needs to function.

    The good news is that your YAML structure for the component (inputs: cache_dir: type: uri_folder...) is perfectly correct for AzureML v2. The problem isn't the YAML; it's the type of storage you are passing to it.
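    For reference, a minimal component YAML following that structure might look like the sketch below (the name, code path, and environment are illustrative assumptions, not taken from your repo):

    # prep-component.yml (sketch; name, code path, and environment are assumptions)
    $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
    name: my_prep_component
    type: command
    inputs:
      cache_dir:
        type: uri_folder
    code: ../../../../
    environment: azureml:my-dev-env@latest
    # the mount mode (rw_mount) is set by the pipeline job that binds this input
    command: >-
      uv run --extra extra_group --no-dev --locked
      --cache-dir ${{inputs.cache_dir}}
      src/my_package/train.py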

    As Alex correctly pointed out, you must use a storage service that supports a full file system.

    Recommendation: Use an Azure File Share Datastore

    The fix is to create a new datastore in your workspace that points to an Azure File Share (which uses the SMB/CIFS protocol) instead of a Blob Container. This gives you a mounted file system with much more complete POSIX semantics, which uv's cache can work with.

    Steps:

    1. Create an Azure File Share: In the Azure portal, go to a Storage Account (or create a new one) and create a new File Share. Let's call it uvcache.
    2. Create a New Datastore: In your AzureML Workspace, go to Datastores and create a new one.
      • Datastore type: Select "Azure File Share".
      • Name: Give it a clear name, e.g., uv_cache_fileshare.
      • Point it to the Storage Account and File Share (uvcache) you just created (a declarative datastore YAML sketch is included after the pipeline example below).
    3. Update Your Pipeline Job YAML: Your component YAML is correct and doesn't need to change. You just need to update the pipeline job that calls this component to pass in the new File Share datastore.
    # pipeline-job.yml
    $schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
    type: pipeline
    jobs:
      my_prep_job:
        type: command
        component: azureml:my_prep_component@latest # the component that declares the cache_dir input
        inputs:
          # This is the key change:
          # Point to your NEW Azure File Share datastore
          cache_dir:
            type: uri_folder
            path: azureml://datastores/uv_cache_fileshare/paths/uv_cache_data
            mode: rw_mount
    

    Your component's command will then work as-is:

    uv run --extra extra_group --no-dev --locked --cache-dir ${{inputs.cache_dir}} src/my_package/train.py
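    For step 2, if you prefer to define the datastore declaratively, a minimal Azure File Share datastore YAML could look like the sketch below (the storage account name and key are placeholders); you would register it with az ml datastore create --file datastore.yml:

    # datastore.yml (sketch; account_name and account_key are placeholders)
    $schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
    name: uv_cache_fileshare
    type: azure_file
    description: Persistent uv cache for pipeline jobs
    account_name: mystorageaccount
    file_share_name: uvcache
    credentials:
      account_key: <storage-account-key>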



