This article describes how to train and track the iterations of a scikit-learn model. Scikit-learn is a popular open-source machine learning framework frequently used for supervised and unsupervised learning. The framework provides tools for model fitting, data preprocessing, model selection, model evaluation, and more.
Prerequisites
Install or upgrade scikit-learn in your notebook with the following command:
pip install --upgrade scikit-learn
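You can confirm which scikit-learn version the notebook environment picked up, for example after an upgrade, by querying the installed package metadata. The following minimal sketch uses Python's standard-library `importlib.metadata`. The helper name `installed_version` is illustrative only and is not part of scikit-learn.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "scikit-learn") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it's absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if scikit-learn isn't installed
print(installed_version())
```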
Set up a machine learning experiment
Create a machine learning experiment with the MLflow API. The MLflow set_experiment() function makes sample-sklearn the active experiment and creates it first if it doesn't already exist.
Run the following code to create the experiment:
import mlflow
mlflow.set_experiment("sample-sklearn")
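set_experiment() creates the experiment only when it's missing. Rerunning the cell is therefore safe: later runs are grouped under the same experiment, and no duplicate is created. The following toy model illustrates that get-or-create behavior. It is a hypothetical, in-memory sketch: `ExperimentStore` is not an MLflow class, and the real function works against the MLflow tracking backend.

```python
class ExperimentStore:
    """Hypothetical in-memory stand-in for a tracking server's experiment table."""

    def __init__(self):
        self._ids = {}  # experiment name -> experiment ID

    def set_experiment(self, name: str) -> str:
        # Get-or-create: reuse the ID if the name exists, otherwise allocate a new one
        if name not in self._ids:
            self._ids[name] = str(len(self._ids) + 1)
        return self._ids[name]


store = ExperimentStore()
first = store.set_experiment("sample-sklearn")
second = store.set_experiment("sample-sklearn")  # a rerun reuses the same experiment
print(first == second)  # True
```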
Train a scikit-learn model
After you set up the experiment, create a sample dataset and train a logistic regression model. The following code starts an MLflow run, trains the model, and logs its accuracy score as a metric, a parameter, and the fitted model as an artifact. It then registers the logged model in the MLflow model registry under the name sample-sklearn so that you can track its versions.
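The following pure-Python sketch gives some intuition for what lr.fit() and lr.score() compute on this toy data. It fits a one-feature logistic regression by plain batch gradient descent and reports accuracy, which is the metric that score() returns for scikit-learn classifiers. It is only an illustration of the technique. scikit-learn's LogisticRegression uses the lbfgs solver and applies L2 regularization by default, so its fitted coefficients differ from the ones computed here.

```python
import math

# Same toy dataset as the training code: one feature, binary labels
X = [-2.0, -1.0, 0.0, 1.0, 2.0, 1.0]
y = [0, 0, 1, 1, 1, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Unregularized logistic regression fitted by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log-loss: averages of (p - y) * x and (p - y)
        errors = [sigmoid(w * x + b) - t for x, t in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def accuracy(w, b, xs, ys):
    """Fraction of correct predictions, thresholding the probability at 0.5."""
    preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
    return sum(p == t for p, t in zip(preds, ys)) / len(ys)


w, b = fit_logistic(X, y)
print(f"w={w:.3f}, b={b:.3f}, accuracy={accuracy(w, b, X, y):.3f}")
```

The input x = 1 appears with both labels, so no decision threshold classifies every point correctly, and accuracy on this dataset stays below 1.0.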
Run the following code to create the sample dataset and train the logistic regression model:
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LogisticRegression
from mlflow.models.signature import infer_signature

with mlflow.start_run() as run:
    # Toy dataset: one feature, binary labels
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])
    # Train the model and compute its mean accuracy on the training data
    lr = LogisticRegression()
    lr.fit(X, y)
    score = lr.score(X, y)
    # Record the input/output schema with the logged model
    signature = infer_signature(X, y)
    print("log_metric.")
    mlflow.log_metric("score", score)
    print("log_params.")
    mlflow.log_param("C", lr.C)  # inverse regularization strength (default 1.0)
    print("log_model.")
    mlflow.sklearn.log_model(lr, "sklearn-model", signature=signature)
    print(f"Model saved in run_id={run.info.run_id}")
    print("register_model.")
    mlflow.register_model(f"runs:/{run.info.run_id}/sklearn-model", "sample-sklearn")
    print("All done")
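The string passed to mlflow.register_model() is an MLflow run-relative artifact URI of the form runs:/<run_id>/<artifact_path>. Its artifact_path matches the name given to log_model(), which is "sklearn-model" here. The helpers below are hypothetical, standard-library-only illustrations of that format and are not MLflow functions. In practice you format the string directly, as the training code does.

```python
from typing import NamedTuple


class RunsUri(NamedTuple):
    run_id: str
    artifact_path: str


def build_runs_uri(run_id: str, artifact_path: str) -> str:
    # Same shape as the string the training code passes to mlflow.register_model()
    return f"runs:/{run_id}/{artifact_path}"


def parse_runs_uri(uri: str) -> RunsUri:
    prefix = "runs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a runs:/ URI: {uri!r}")
    # Split on the first "/" only, so nested artifact paths stay intact
    run_id, _, artifact_path = uri[len(prefix):].partition("/")
    if not run_id or not artifact_path:
        raise ValueError(f"expected runs:/<run_id>/<artifact_path>, got {uri!r}")
    return RunsUri(run_id, artifact_path)


# "0123abcd" is a placeholder, not a real run ID
uri = build_runs_uri("0123abcd", "sklearn-model")
print(uri)  # runs:/0123abcd/sklearn-model
print(parse_runs_uri(uri))  # RunsUri(run_id='0123abcd', artifact_path='sklearn-model')
```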
Load and evaluate the model on a sample dataset
After the model is registered, you can load it from the registry by name and version for batch inference.
Run the following code in your notebook to load the model and generate predictions on a sample dataset:
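# Run batch inference with the registered model.
# Assumes `spark` is the SparkSession that the notebook provides.
import numpy as np
from synapse.ml.predict import MLFlowTransformer

# Enable the PREDICT integration for MLflow models
spark.conf.set("spark.synapse.ml.predict.enabled", "true")

# Wrap version 1 of the registered "sample-sklearn" model as a Spark transformer
model = MLFlowTransformer(
    inputCols=["x"],
    outputCol="prediction",
    modelName="sample-sklearn",
    modelVersion=1,
)

# Build a one-column Spark DataFrame from the same toy inputs used for training
test_spark = spark.createDataFrame(
    data=np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1).tolist(), schema=["x"]
)

# Append a "prediction" column and display the results
batch_predictions = model.transform(test_spark)
batch_predictions.show()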
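The inference code pins modelVersion=1, which is the first version registered under sample-sklearn. If you rerun the training cell, mlflow.register_model() registers the new run's model as the next version (2, 3, and so on) under the same name, and version 1 keeps pointing at the original run. The toy in-memory model of that numbering below is hypothetical: `ModelRegistry` is not an MLflow class, and the run IDs are placeholders. It only shows why you would change modelVersion to score a newer model.

```python
class ModelRegistry:
    """Hypothetical in-memory registry: each registration under a name adds a version."""

    def __init__(self):
        self._versions = {}  # model name -> list of source URIs; index 0 is version 1

    def register_model(self, source_uri: str, name: str) -> int:
        versions = self._versions.setdefault(name, [])
        versions.append(source_uri)
        return len(versions)  # new version number, starting at 1

    def get_source(self, name: str, version: int) -> str:
        versions = self._versions[name]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


registry = ModelRegistry()
v1 = registry.register_model("runs:/run-a/sklearn-model", "sample-sklearn")
v2 = registry.register_model("runs:/run-b/sklearn-model", "sample-sklearn")
print(v1, v2)  # 1 2
print(registry.get_source("sample-sklearn", 1))  # runs:/run-a/sklearn-model
```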
Related content
- Explore machine learning models.
- Create machine learning experiments.