Batch endpoint does not work with a specific model
Guilherme Matheus (120 reputation points, Microsoft Employee)
One of my models works fine when I run training followed by batch inference. When I do the same with another model, the environment image build fails with this error:
2025-10-14T17:05:53: #10 52.37 Collecting pyarrow<20,>=4.0.0 (from mlflow==2.22.2->-r /azureml-environment-setup/condaenv.byjt1f6w.requirements.txt (line 1))
2025-10-14T17:05:53: #10 52.37 Downloading pyarrow-6.0.0.tar.gz (769 kB)
2025-10-14T17:05:53: #10 52.37 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 769.6/769.6 kB 39.2 MB/s eta 0:00:00
2025-10-14T17:05:53: #10 52.37 Installing build dependencies: started
2025-10-14T17:05:53: #10 52.37 Installing build dependencies: finished with status 'error'
2025-10-14T17:05:53: #10 52.39
2025-10-14T17:05:53: #10 52.39 failed
2025-10-14T17:05:53: #10 52.39
2025-10-14T17:05:53: #10 52.39 CondaEnvException: Pip failed
2025-10-14T17:05:53: #10 52.39
2025-10-14T17:05:55: #10 ERROR: process "/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf \"$HOME/.cache/pip\" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf \"$CONDA_ROOT_DIR/pkgs\" && find \"$CONDA_ROOT_DIR\" -type d -name __pycache__ -exec rm -rf {} + && ldconfig" did not complete successfully: exit code: 1
2025-10-14T17:05:55: ------
2025-10-14T17:05:55: > [ 6/10] RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig:
2025-10-14T17:05:55: 52.37 Collecting pyarrow<20,>=4.0.0 (from mlflow==2.22.2->-r /azureml-environment-setup/condaenv.byjt1f6w.requirements.txt (line 1))
2025-10-14T17:05:55: 52.37 Downloading pyarrow-6.0.0.tar.gz (769 kB)
2025-10-14T17:05:55: 52.37 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 769.6/769.6 kB 39.2 MB/s eta 0:00:00
2025-10-14T17:05:55: 52.37 Installing build dependencies: started
2025-10-14T17:05:55: 52.37 Installing build dependencies: finished with status 'error'
2025-10-14T17:05:55: 52.39
2025-10-14T17:05:55: 52.39 failed
2025-10-14T17:05:55: 52.39
2025-10-14T17:05:55: 52.39 CondaEnvException: Pip failed
2025-10-14T17:05:55: 52.39
2025-10-14T17:05:55: ------
2025-10-14T17:05:55: Dockerfile:8
2025-10-14T17:05:55: --------------------
2025-10-14T17:05:55: 6 | RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi
2025-10-14T17:05:55: 7 | COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
2025-10-14T17:05:55: 8 | >>> RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig
2025-10-14T17:05:55: 9 | # AzureML Conda environment name: azureml_f43a770854f1e887af78e52cfb84206a
2025-10-14T17:05:55: 10 | ENV PATH /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a/bin:$PATH
2025-10-14T17:05:55: --------------------
2025-10-14T17:05:55: ERROR: failed to solve: process "/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf \"$HOME/.cache/pip\" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf \"$CONDA_ROOT_DIR/pkgs\" && find \"$CONDA_ROOT_DIR\" -type d -name __pycache__ -exec rm -rf {} + && ldconfig" did not complete successfully: exit code: 1
2025-10-14T17:05:55: CalledProcessError(1, ['docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '.', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4:1'])
2025-10-14T17:05:55: Building docker image failed with exit code: 1
2025-10-14T17:05:55: Logging out of Docker registry: gmatheus01rcrmlw.azurecr.io
2025-10-14T17:05:55: Removing login credentials for https://index.docker.io/v1/
2025-10-14T17:05:55: Traceback (most recent call last):
File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 152, in _docker_build_or_error
docker_execute_function(docker_command, build_command, print_command_args=True)
File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 23, in docker_execute_function
return killable_subprocess.check_call(command_args, *popen_args,
File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/killable_subprocess.py", line 261, in check_call
raise subprocess.CalledProcessError(process.returncode, cmd)
subprocess.CalledProcessError: Command '['docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '.', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4:1']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "script.py", line 162, in <module>
docker_utilities._docker_build_or_error(
File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 156, in _docker_build_or_error
_write_error_and_exit(error_msg, error_file_path=error_file_path)
File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 217, in _write_error_and_exit
sys.exit(1)
SystemExit: 1
I don't know if this is related, but before that I was getting a pyarrow version error:
After I updated my environment, I got the error shared above. The interesting part is that this batch deployment does not use my custom environment: we don't have a scoring script, so we wanted to rely on the auto-generated scoring script instead.
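To narrow down which requirement broke the image build, the log can be scanned for the pip step that failed. The following is only a minimal sketch, assuming the build log has been saved as plain text. `find_failed_builds` is a hypothetical helper written for illustration, not part of Azure ML or pip, and the patterns are based solely on the log lines pasted above.

```python
import re

# pip's "Collecting <requirement> (from <parent>->...)" lines; the parent is optional.
COLLECTING = re.compile(r"Collecting (\S+)(?: \(from (\S+?)(?:->|\s|\)))?")
# The source archive pip downloaded, e.g. "pyarrow-6.0.0.tar.gz".
SDIST = re.compile(r"Downloading (\S+\.tar\.gz)")
BUILD_ERROR = "Installing build dependencies: finished with status 'error'"


def find_failed_builds(log_text):
    """Return (requirement, parent, sdist) for each source build that failed."""
    failures = []
    requirement = parent = sdist = None
    for line in log_text.splitlines():
        match = COLLECTING.search(line)
        if match:
            # A new requirement starts; forget the previous download.
            requirement, parent = match.group(1), match.group(2)
            sdist = None
            continue
        match = SDIST.search(line)
        if match:
            sdist = match.group(1)
            continue
        if BUILD_ERROR in line and requirement is not None:
            entry = (requirement, parent, sdist)
            # Docker prints the failing step twice, so skip duplicates.
            if entry not in failures:
                failures.append(entry)
    return failures


if __name__ == "__main__":
    # Lines copied from the log above (shortened to the relevant ones).
    sample = "\n".join([
        "2025-10-14T17:05:53: #10 52.37 Collecting pyarrow<20,>=4.0.0 (from mlflow==2.22.2->-r /azureml-environment-setup/condaenv.byjt1f6w.requirements.txt (line 1))",
        "2025-10-14T17:05:53: #10 52.37 Downloading pyarrow-6.0.0.tar.gz (769 kB)",
        "2025-10-14T17:05:53: #10 52.37 Installing build dependencies: started",
        "2025-10-14T17:05:53: #10 52.37 Installing build dependencies: finished with status 'error'",
    ])
    for requirement, parent, sdist in find_failed_builds(sample):
        print(f"{requirement} (required by {parent}) failed while building {sdist}")
```

Run against the log above, this reports `pyarrow<20,>=4.0.0` (required by `mlflow==2.22.2`) failing while building `pyarrow-6.0.0.tar.gz`. The `.tar.gz` name shows that pip was attempting a source build of pyarrow 6.0.0 rather than installing a prebuilt wheel.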
My failed batch:
By the way, this job was submitted from ADF (Azure Data Factory) using the REST API.
Azure Machine Learning