Batch endpoint does not work with a specific model

Guilherme Matheus 120 Reputation points Microsoft Employee
2025-10-15T17:37:31.8966667+00:00

I have a model that works fine for training and batch inference. When I do the same with another model, the environment image build fails with this error:

2025-10-14T17:05:53: #10 52.37 Collecting pyarrow<20,>=4.0.0 (from mlflow==2.22.2->-r /azureml-environment-setup/condaenv.byjt1f6w.requirements.txt (line 1))
2025-10-14T17:05:53: #10 52.37   Downloading pyarrow-6.0.0.tar.gz (769 kB)
2025-10-14T17:05:53: #10 52.37      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 769.6/769.6 kB 39.2 MB/s eta 0:00:00
2025-10-14T17:05:53: #10 52.37   Installing build dependencies: started
2025-10-14T17:05:53: #10 52.37   Installing build dependencies: finished with status 'error'
2025-10-14T17:05:53: #10 52.39 
2025-10-14T17:05:53: #10 52.39 failed
2025-10-14T17:05:53: #10 52.39 
2025-10-14T17:05:53: #10 52.39 CondaEnvException: Pip failed
2025-10-14T17:05:53: #10 52.39 
2025-10-14T17:05:55: #10 ERROR: process "/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf \"$HOME/.cache/pip\" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf \"$CONDA_ROOT_DIR/pkgs\" && find \"$CONDA_ROOT_DIR\" -type d -name __pycache__ -exec rm -rf {} + && ldconfig" did not complete successfully: exit code: 1
2025-10-14T17:05:55: ------
2025-10-14T17:05:55:  > [ 6/10] RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig:
2025-10-14T17:05:55: 52.37 Collecting pyarrow<20,>=4.0.0 (from mlflow==2.22.2->-r /azureml-environment-setup/condaenv.byjt1f6w.requirements.txt (line 1))
2025-10-14T17:05:55: 52.37   Downloading pyarrow-6.0.0.tar.gz (769 kB)
2025-10-14T17:05:55: 52.37      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 769.6/769.6 kB 39.2 MB/s eta 0:00:00
2025-10-14T17:05:55: 52.37   Installing build dependencies: started
2025-10-14T17:05:55: 52.37   Installing build dependencies: finished with status 'error'
2025-10-14T17:05:55: 52.39 
2025-10-14T17:05:55: 52.39 failed
2025-10-14T17:05:55: 52.39 
2025-10-14T17:05:55: 52.39 CondaEnvException: Pip failed
2025-10-14T17:05:55: 52.39 
2025-10-14T17:05:55: ------
2025-10-14T17:05:55: Dockerfile:8
2025-10-14T17:05:55: --------------------
2025-10-14T17:05:55:    6 |     RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi
2025-10-14T17:05:55:    7 |     COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
2025-10-14T17:05:55:    8 | >>> RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig
2025-10-14T17:05:55:    9 |     # AzureML Conda environment name: azureml_f43a770854f1e887af78e52cfb84206a
2025-10-14T17:05:55:   10 |     ENV PATH /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a/bin:$PATH
2025-10-14T17:05:55: --------------------
2025-10-14T17:05:55: ERROR: failed to solve: process "/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf \"$HOME/.cache/pip\" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf \"$CONDA_ROOT_DIR/pkgs\" && find \"$CONDA_ROOT_DIR\" -type d -name __pycache__ -exec rm -rf {} + && ldconfig" did not complete successfully: exit code: 1


2025-10-14T17:05:55: CalledProcessError(1, ['docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '.', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4:1'])

2025-10-14T17:05:55: Building docker image failed with exit code: 1

2025-10-14T17:05:55: Logging out of Docker registry: gmatheus01rcrmlw.azurecr.io
2025-10-14T17:05:55: Removing login credentials for https://index.docker.io/v1/


2025-10-14T17:05:55: Traceback (most recent call last):
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 152, in _docker_build_or_error
    docker_execute_function(docker_command, build_command, print_command_args=True)
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 23, in docker_execute_function
    return killable_subprocess.check_call(command_args, *popen_args,
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/killable_subprocess.py", line 261, in check_call
    raise subprocess.CalledProcessError(process.returncode, cmd)
subprocess.CalledProcessError: Command '['docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '.', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4:1']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "script.py", line 162, in <module>
    docker_utilities._docker_build_or_error(
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 156, in _docker_build_or_error
    _write_error_and_exit(error_msg, error_file_path=error_file_path)
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 217, in _write_error_and_exit
    sys.exit(1)
SystemExit: 1
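
To narrow down a log like the one above, a small script can list which packages pip downloaded as source archives (`.tar.gz`) and which pip steps finished with an error. This is only a minimal sketch: the function name `summarize_build_log` and both regular expressions are my own illustrative assumptions, not part of Azure ML or pip, and they only cover the line shapes that appear in this log.

```python
import re

# Matches pip lines such as "Downloading pyarrow-6.0.0.tar.gz (769 kB)".
# A .tar.gz download is a source distribution, not a prebuilt wheel.
SDIST_RE = re.compile(
    r"Downloading (?P<name>[A-Za-z0-9_.-]+?)-(?P<version>\d[^\s-]*)\.tar\.gz"
)

# Matches pip steps that ended badly, for example
# "Installing build dependencies: finished with status 'error'".
FAILED_STEP_RE = re.compile(r"(?P<step>[A-Z][\w ]+): finished with status 'error'")


def summarize_build_log(log_text):
    """Return (sdists, failed_steps) found in a pip/conda build log.

    sdists is a list of (package, version) tuples, failed_steps a list of
    step names. Duplicates are dropped because the log repeats its output.
    """
    sdists = []
    failed_steps = []
    for line in log_text.splitlines():
        match = SDIST_RE.search(line)
        if match:
            item = (match.group("name"), match.group("version"))
            if item not in sdists:
                sdists.append(item)
        match = FAILED_STEP_RE.search(line)
        if match and match.group("step") not in failed_steps:
            failed_steps.append(match.group("step"))
    return sdists, failed_steps


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = (
        "2025-10-14T17:05:53: #10 52.37   Downloading pyarrow-6.0.0.tar.gz (769 kB)\n"
        "2025-10-14T17:05:53: #10 52.37   Installing build dependencies: started\n"
        "2025-10-14T17:05:53: #10 52.37   Installing build dependencies: "
        "finished with status 'error'\n"
    )
    sdists, failed_steps = summarize_build_log(sample)
    print(sdists)        # [('pyarrow', '6.0.0')]
    print(failed_steps)  # ['Installing build dependencies']
```

On this log it reports pyarrow 6.0.0 as a source download, which means pip has to build pyarrow inside the image, and that build step is where the log stops. Whether that is the actual root cause is an assumption on my part.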

I don't know whether it is related, but I was previously getting a pyarrow version error:

User's image

After I updated my environment, I got the error shared above. The interesting thing is that I am not using my custom environment in this batch deployment: we don't have a scoring script, so we wanted to use the auto-generated one instead.

My failed batch:

User's image

By the way, this job was submitted from ADF using the REST API.

Azure Machine Learning