hey Marco,
you are right, losing that uv cache speed is a huge pain point for iterative development.
the core issue is that each time azureml starts a new job, it provisions a fresh compute node. that node has a clean disk, so the uv cache from a previous job is gone. to keep the cache, you need a way to persist the ~/.cache/uv directory between job runs.
you can mount an azure blob storage container or a file share to your training job. then, you can configure uv to use a custom cache directory located on that mounted storage.
create a datastore that points to an azure storage account. then, in your job configuration, mount this datastore to a path inside the container, for example /uv_cache.
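for that first step, here is a sketch of a datastore definition you could register with the CLI v2 (`az ml datastore create --file datastore.yml`). the name, storage account, and container below are placeholders; use your own:

```yaml
# datastore.yml - registers a blob container as an azureml datastore
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: uv_cache_store            # placeholder datastore name
type: azure_blob
description: persistent cache for uv across jobs
account_name: <storage-account> # placeholder, your storage account
container_name: uv-cache        # placeholder, your blob container
```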
you then need to tell uv to use that mounted path: set the UV_CACHE_DIR environment variable in your job to /uv_cache, and uv will use that persistent location for its cache instead of the node's ephemeral local disk.
your command would look something like this:
UV_CACHE_DIR=/uv_cache uv run --extra extra_group --no-dev --locked src/my_package/train.py
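putting the mount and the environment variable together, a sketch of the command job yaml. the environment, compute, and datastore names are placeholders; the writable mount is declared as an output with rw_mount mode, and `${{outputs.uv_cache}}` resolves to the mounted path at runtime:

```yaml
# job.yml - command job with a persistent, writable uv cache mount
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: .
environment: azureml:my-env:1     # placeholder environment
compute: azureml:my-cluster       # placeholder compute target
command: >-
  UV_CACHE_DIR=${{outputs.uv_cache}}
  uv run --extra extra_group --no-dev --locked src/my_package/train.py
outputs:
  uv_cache:
    type: uri_folder
    mode: rw_mount                # read-write mount, so uv can write to the cache
    path: azureml://datastores/uv_cache_store/paths/uv_cache  # placeholder path
```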
the first time the job runs, it will populate the cache in blob storage. on subsequent runs, uv will find the previously downloaded and built packages in that persistent location, making the dependency installation step much faster.
this pattern of using a mounted storage for a package cache is a universal technique. you could do the same thing for pip's cache or conda packages to speed up any python environment setup.
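as a sketch of the same pattern for pip (the datastore path is again a placeholder): pip honors a PIP_CACHE_DIR environment variable in exactly the same way, so you can mount a second folder and point pip at it:

```yaml
# fragment of the same command job yaml, reusing the pattern for pip
command: >-
  PIP_CACHE_DIR=${{outputs.pip_cache}}
  pip install -r requirements.txt
outputs:
  pip_cache:
    type: uri_folder
    mode: rw_mount
    path: azureml://datastores/uv_cache_store/paths/pip_cache  # placeholder path
```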
in short: mount a persistent datastore into your job and set the UV_CACHE_DIR environment variable to a path on that mount. that should preserve your uv cache across pipeline runs.
regards,
Alex
p.s. if this answer helped, please accept it - and feel free to follow me here on Q&A. thanks!