This article describes how to log your trained machine learning models, or artifacts, as MLflow models. MLflow is an open-source framework for managing machine learning workflows. This article explores various options for customizing the way that MLflow packages and runs models.
Prerequisites
- The MLflow SDK mlflow package
Why log models instead of artifacts?
An MLflow model is a type of artifact. However, a model has a specific structure that serves as a contract between the person that creates the model and the person that intends to use it. This contract helps build a bridge between the artifacts themselves and their meanings.
For the difference between logging artifacts, or files, and logging MLflow models, see Artifacts and models in MLflow.
You can log your model's files as artifacts, but model logging offers the following advantages:
- You can use mlflow.<flavor>.load_model to directly load models for inference, and you can use the predict function.
- Pipeline inputs can use models directly.
- You can deploy models without specifying a scoring script or an environment.
- Swagger is automatically turned on in deployed endpoints. As a result, you can use the test feature in Azure Machine Learning studio to test models.
- You can use the Responsible AI dashboard. For more information, see Use the Responsible AI dashboard in Azure Machine Learning studio.
Use automatic logging to log models
You can use MLflow autolog functionality to automatically log models. When you use automatic logging, MLflow captures all relevant metrics, parameters, artifacts, and models in your framework. The data that's logged depends on the framework. By default, if automatic logging is turned on, most models are logged. In some situations, some flavors don't log models. For instance, the PySpark flavor doesn't log models that exceed a certain size.
Use either mlflow.autolog or mlflow.<flavor>.autolog to activate automatic logging. The following code uses autolog to log a classifier model that's trained with XGBoost:
import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
mlflow.autolog()
model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
Tip
If you use machine learning pipelines, for example scikit-learn pipelines, use the autolog functionality of that pipeline flavor to log models. Model logging is automatically run when the fit method is called on the pipeline object. For a notebook that logs a model, includes preprocessing, and uses pipelines, see Training and tracking an XGBoost classifier with MLflow.
Log models that use a custom signature, environment, or samples
You can use the MLflow mlflow.<flavor>.log_model method to manually log models. This workflow offers control over various aspects of model logging.
Use this method when:
- You want to indicate a Conda environment or pip packages that differ from the automatically detected packages or environment.
- You want to include input examples.
- You want to include specific artifacts in the package that you need.
- The autolog method doesn't correctly infer your signature. This case comes up when you work with tensor inputs, which require the signature to have a specific shape.
- The autolog method doesn't meet all your needs.
The following code logs an XGBoost classifier model:
import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature
from mlflow.utils.environment import _mlflow_conda_env
mlflow.autolog(log_models=False)
model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# Infer the signature from the model inputs and the predicted outputs.
signature = infer_signature(X_test, y_pred)
# Set up a Conda environment.
custom_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=["xgboost==1.5.2"],
    additional_conda_channels=None,
)
# Sample the data.
input_example = X_train.sample(n=1)
# Log the model manually.
mlflow.xgboost.log_model(model, 
                         artifact_path="classifier", 
                         conda_env=custom_env,
                         signature=signature,
                         input_example=input_example)
Note
- The call to autolog uses a configuration of log_models=False. This setting turns off automatic MLflow model logging. The log_model method is used later to manually log the model.
- The infer_signature method is used to try to infer the signature directly from inputs and outputs.
- The mlflow.utils.environment._mlflow_conda_env method is a private method in the MLflow SDK. In this example, it streamlines the code. But use this method with caution, because it might change in the future. As an alternative, you can generate the YAML definition manually as a Python dictionary.
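As a sketch of the dictionary alternative mentioned in the last point, the Conda environment can be written out by hand. The environment name, channel, Python version, and the mlflow entry are assumptions for illustration. Only the XGBoost pin comes from the preceding example.

```python
# A hand-written Conda environment definition, expressed as a Python dictionary.
# Assumptions: the name, channel, Python version, and mlflow entry are illustrative.
custom_env = {
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        # Packages installed with pip go in a nested "pip" list.
        {"pip": ["mlflow", "xgboost==1.5.2"]},
    ],
}
```

A dictionary like this one can be passed as the conda_env argument of log_model in place of the value that the private helper returns.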
Log models that use modified prediction behavior
When you use mlflow.autolog or mlflow.<flavor>.log_model to log a model, the model flavor determines how the inference is performed. The flavor also determines what the model returns. MLflow doesn't enforce specific behavior about the generation of predict results. In some scenarios, you might want to preprocess or post-process your data.
In this situation, you can implement machine learning pipelines that directly move from inputs to outputs. Although this type of implementation can sometimes improve performance, it can be challenging to achieve. In such cases, it can be helpful to customize how your model handles inference. For more information, see the next section, Log custom models.
Log custom models
MLflow supports many machine learning frameworks, including the following flavors:
- CatBoost
- FastAI
- h2o
- Keras
- LightGBM
- MLeap
- ONNX
- Prophet
- PyTorch
- scikit-learn
- spaCy
- Spark MLlib
- statsmodels
- TensorFlow
- XGBoost
For a complete list, see Built-In Model Flavors.
However, you might need to change the way a flavor works or log a model that MLflow doesn't natively support. Or you might need to log a model that uses multiple elements from various frameworks. In these cases, you can create a custom model flavor.
To cover these cases, MLflow offers the PyFunc flavor, a default model interface for Python models. This flavor can log any object as a model as long as that object satisfies two conditions:
- The object implements at least the predict method.
- The object inherits from the mlflow.pyfunc.PythonModel class.
Tip
Serializable models that implement the scikit-learn API can use the scikit-learn flavor to log the model, regardless of whether the model was built with scikit-learn. If you can persist your model in Pickle format, and the object has at least the predict and predict_proba methods, you can use mlflow.sklearn.log_model to log the model inside an MLflow run.
The easiest way to create a flavor for your custom model is to create a wrapper around your existing model object. MLflow serializes and packages your model for you. A Python object is serializable when it can be stored in the file system as a file, generally in Pickle format. At runtime, the object can be loaded from that file. Loading restores all the values, properties, and methods that were available when the object was saved.
Use this method when:
- You can serialize your model in Pickle format.
- You want to retain the state of the model just after training.
- You want to customize how the predict function works.
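The serialization behavior described above can be illustrated without MLflow. The following sketch uses the standard-library pickle module and a hypothetical ThresholdModel class as a stand-in for a trained model. MLflow might use a different serializer internally, so treat this only as a picture of the general idea.

```python
import pickle


class ThresholdModel:
    """Hypothetical stand-in for a trained model: its state is one threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        # Return 1 for values at or above the threshold, and 0 otherwise.
        return [1 if value >= self.threshold else 0 for value in values]


trained_model = ThresholdModel(threshold=0.5)

# Serialize the object to bytes, as it would be written to a file.
payload = pickle.dumps(trained_model)

# Deserialize: the state and the methods are available again.
restored_model = pickle.loads(payload)
restored_predictions = restored_model.predict([0.2, 0.5, 0.9])
```

The restored object carries the same threshold that was set before serialization, which is what it means to retain the state of the model just after training.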
The following code wraps a model created with XGBoost so that it behaves differently than the XGBoost flavor default implementation. It returns probabilities instead of classes.
from mlflow.pyfunc import PythonModel, PythonModelContext
class ModelWrapper(PythonModel):
    def __init__(self, model):
        self._model = model
    def predict(self, context: PythonModelContext, data):
        # The next line uses a prediction function. However, you could also use model.recommend(), model.forecast(), or a similar function instead.
        return self._model.predict_proba(data)
    # You can add extra functions if you need to. Because the model is serialized,
    # all of them are available when you load your model.
    def predict_batch(self, data):
        pass
Use the following code to log a custom model during a run:
import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature
mlflow.xgboost.autolog(log_models=False)
model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_probs = model.predict_proba(X_test)
accuracy = accuracy_score(y_test, y_probs.argmax(axis=1))
mlflow.log_metric("accuracy", accuracy)
signature = infer_signature(X_test, y_probs)
mlflow.pyfunc.log_model("classifier", 
                        python_model=ModelWrapper(model),
                        signature=signature)
Tip
In the preceding code, the infer_signature method uses y_probs to infer the signature. The target column contains the target class, but the wrapped model returns two probabilities for each sample, one per class.