FeaturizationConfig Class

Defines feature engineering configuration for automated machine learning experiments in Azure Machine Learning.

Use the FeaturizationConfig class in the featurization parameter of the AutoMLConfig class. For more information, see Configure automated ML experiments.

Create a FeaturizationConfig.

Constructor

FeaturizationConfig(blocked_transformers: List[str] | None = None, column_purposes: Dict[str, str] | None = None, transformer_params: Dict[str, List[Tuple[List[str], Dict[str, Any]]]] | None = None, drop_columns: List[str] | None = None, dataset_language: str | None = None, prediction_transform_type: str | None = None)

Parameters

Name	Description
blocked_transformers	list(str) A list of transformer names to block during featurization. Default value: None
column_purposes	dict A dictionary of column names and feature types, used to update column purposes. Default value: None
transformer_params	dict A dictionary of transformer names and their corresponding customization parameters. Default value: None
drop_columns	list(str) A list of columns to ignore during featurization. This setting is being deprecated. Drop columns from your datasets as part of your data preparation process, before providing the datasets to AutoML. Default value: None
dataset_language	str The three-character ISO 639-3 code for the language(s) contained in the dataset. Languages other than English are supported only on GPU-enabled compute. Use the language code 'mul' if the dataset contains multiple languages. For ISO 639-3 codes of different languages, see https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes. Default value: None
prediction_transform_type	str The target transform type used to cast the target column type. Default value: None
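The constructor accepts the same customizations that the methods under Remarks apply one at a time. The sketch below builds plain-Python values shaped after the constructor's type hints (column names borrowed from the regression example on this page) and runs two small checks. check_language_code and check_transformer_params are hypothetical helpers for illustration, not part of the SDK.

```python
from typing import Any, Dict, List, Optional, Tuple

# Values shaped after the FeaturizationConfig constructor's type hints.
# Column names are borrowed from the regression example on this page.
blocked_transformers: List[str] = ["LabelEncoder"]
column_purposes: Dict[str, str] = {"MYCT": "Numeric", "VendorName": "CategoricalHash"}
# transformer name -> list of (input columns, customization parameters) pairs
transformer_params: Dict[str, List[Tuple[List[str], Dict[str, Any]]]] = {
    "Imputer": [
        (["CACH"], {"strategy": "median"}),
        (["PRP"], {"strategy": "most_frequent"}),
    ]
}
dataset_language: Optional[str] = "eng"


def check_language_code(code: Optional[str]) -> None:
    """Hypothetical check: an ISO 639-3 code is three lowercase letters ('mul' = multiple)."""
    if code is None:
        return  # None is the documented default
    if len(code) != 3 or not code.isalpha() or not code.islower():
        raise ValueError(f"expected a three-letter ISO 639-3 code, got {code!r}")


def check_transformer_params(params: Dict[str, List[Tuple[List[str], Dict[str, Any]]]]) -> None:
    """Hypothetical check: every entry is a (list of column names, dict) pair."""
    for name, entries in params.items():
        for cols, custom in entries:
            if not isinstance(cols, list) or not all(isinstance(c, str) for c in cols):
                raise TypeError(f"{name}: columns must be a list of str, got {cols!r}")
            if not isinstance(custom, dict):
                raise TypeError(f"{name}: parameters must be a dict, got {custom!r}")


check_language_code(dataset_language)
check_transformer_params(transformer_params)
```

With the SDK installed, these values would be passed as keyword arguments, for example FeaturizationConfig(blocked_transformers=blocked_transformers, column_purposes=column_purposes, transformer_params=transformer_params, dataset_language=dataset_language). That the constructor expects transformer_params in exactly the form add_transformer_params records is an assumption drawn from the type hint.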

Remarks

Featurization customization has methods that allow you to:

Add or remove column purpose. With the add_column_purpose and remove_column_purpose methods, you can override the feature type for specified columns, for example, when the feature type of a column does not correctly reflect its purpose. The add method supports all the feature types given in the FULL_SET attribute of the FeatureType class.
Add or remove transformer parameters. With the add_transformer_params and remove_transformer_params methods, you can change the parameters of customizable transformers like Imputer, HashOneHotEncoder, and TfIdf. Customizable transformers are listed in the CUSTOMIZABLE_TRANSFORMERS attribute of the SupportedTransformers class. Use the get_transformer_params method to look up customization parameters.
Block transformers. Use the add_blocked_transformers method to block transformers from being used in the featurization process. The transformers must be among those listed in the BLOCKED_TRANSFORMERS attribute of the SupportedTransformers class.
Add drop columns. Use the add_drop_columns method to add columns to ignore during featurization and training, for example, a column that doesn't contain useful information.
Add or remove prediction transform type. With the add_prediction_transform_type and remove_prediction_transform_type methods, you can override the existing target column type. Prediction transform types are listed in the PredictionTransformTypes class.

The following code example shows how to customize featurization in automated ML for forecasting. The example sets a column purpose and adds Imputer parameters for several columns.


   from azureml.automl.core.featurization import FeaturizationConfig

   featurization_config = FeaturizationConfig()
   # Force the CPWVOL5 feature to be numeric type.
   featurization_config.add_column_purpose("CPWVOL5", "Numeric")
   # Fill missing values in the target column, Quantity, with zeros.
   featurization_config.add_transformer_params(
       "Imputer", ["Quantity"], {"strategy": "constant", "fill_value": 0}
   )
   # Fill missing values in the INCOME column with the median value.
   featurization_config.add_transformer_params(
       "Imputer", ["INCOME"], {"strategy": "median"}
   )
   # Fill missing values in the Price column with forward fill (last value carried forward).
   featurization_config.add_transformer_params("Imputer", ["Price"], {"strategy": "ffill"})

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb
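The Imputer strategies in the forecasting example have standard meanings. The plain-Python functions below illustrate what constant, median, and ffill filling do to a short series with gaps. They are illustrative stand-ins, not AutoML's implementation, and leaving a leading gap missing under ffill is an assumption of this sketch.

```python
import statistics
from typing import List, Optional


def impute_constant(values: List[Optional[float]], fill_value: float) -> List[float]:
    """'constant': replace every missing value with fill_value."""
    return [fill_value if v is None else v for v in values]


def impute_median(values: List[Optional[float]]) -> List[float]:
    """'median': replace missing values with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def impute_ffill(values: List[Optional[float]]) -> List[Optional[float]]:
    """'ffill': carry the last observed value forward; leading gaps stay missing."""
    result: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        result.append(last)
    return result


prices = [None, 2.5, None, 3.0, None]
print(impute_constant(prices, 0))  # [0, 2.5, 0, 3.0, 0]
print(impute_median(prices))       # [2.75, 2.5, 2.75, 3.0, 2.75]
print(impute_ffill(prices))        # [None, 2.5, 2.5, 3.0, 3.0]
```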

The next example shows customizing featurization in a regression problem using the Hardware Performance Dataset. In the example code, a blocked transformer is defined, column purposes are added, and transformer parameters are added.


   from azureml.automl.core.featurization import FeaturizationConfig

   featurization_config = FeaturizationConfig()
   featurization_config.blocked_transformers = ["LabelEncoder"]
   # featurization_config.drop_columns = ['MMIN']
   featurization_config.add_column_purpose("MYCT", "Numeric")
   featurization_config.add_column_purpose("VendorName", "CategoricalHash")
   # The default Imputer strategy is mean; add transformer params for 3 columns.
   featurization_config.add_transformer_params("Imputer", ["CACH"], {"strategy": "median"})
   featurization_config.add_transformer_params(
       "Imputer", ["CHMIN"], {"strategy": "median"}
   )
   featurization_config.add_transformer_params(
       "Imputer", ["PRP"], {"strategy": "most_frequent"}
   )
   # featurization_config.add_transformer_params('HashOneHotEncoder', [], {"number_of_bits": 3})

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb
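The regression example relies on mean as the default Imputer strategy and overrides the PRP column with most_frequent. The sketch below shows those two strategies in plain Python; impute_mean and impute_most_frequent are hypothetical helpers, not AutoML code, and the input values are made up.

```python
import statistics
from typing import Hashable, List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """'mean': replace missing values with the arithmetic mean of the observed values."""
    mean = statistics.fmean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


def impute_most_frequent(values: List[Optional[Hashable]]) -> List[Hashable]:
    """'most_frequent': replace missing values with the most common observed value."""
    mode = statistics.mode(v for v in values if v is not None)
    return [mode if v is None else v for v in values]


prp = [198, None, 270, 198, None]
print(impute_mean(prp))           # [198, 222.0, 270, 198, 222.0]
print(impute_most_frequent(prp))  # [198, 198, 270, 198, 198]
```

Unlike mean, most_frequent also works for non-numeric columns, since it only counts occurrences.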

The FeaturizationConfig defined in the code example above can then be used in the configuration of an automated ML experiment, as shown in the next code example.


   import logging

   from azureml.train.automl import AutoMLConfig

   # featurization_config comes from the example above; compute_target,
   # train_data, and label are defined earlier in the full sample.
   automl_settings = {
       "enable_early_stopping": True,
       "experiment_timeout_hours": 0.25,
       "max_concurrent_iterations": 4,
       "max_cores_per_iteration": -1,
       "n_cross_validations": 5,
       "primary_metric": "normalized_root_mean_squared_error",
       "verbosity": logging.INFO,
   }

   automl_config = AutoMLConfig(
       task="regression",
       debug_log="automl_errors.log",
       compute_target=compute_target,
       featurization=featurization_config,
       training_data=train_data,
       label_column_name=label,
       **automl_settings,
   )

Methods

add_blocked_transformers	Add transformers to be blocked.
add_column_purpose	Add a feature type for the specified column.
add_drop_columns	Add a column name or list of column names to ignore.
add_prediction_transform_type	Add a prediction transform type for the target column.
add_transformer_params	Add customized transformer parameters to the list of custom transformer parameters. The parameters apply to all columns if the column list is empty.
get_transformer_params	Retrieve transformer customization parameters for columns.
remove_column_purpose	Remove the feature type for the specified column. If no feature type is specified for a column, the detected default feature type is used.
remove_prediction_transform_type	Revert the prediction transform type of the target column to the default.
remove_transformer_params	Remove transformer customization parameters for a specific column or for all columns.

add_blocked_transformers

Add transformers to be blocked.

add_blocked_transformers(transformers: str | List[str]) -> None

Parameters

Name	Description
transformers Required	str or list[str] A transformer name or list of transformer names. Transformer names must be one of the transformers listed in the BLOCKED_TRANSFORMERS attribute of the SupportedTransformers class.
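add_blocked_transformers accepts either a single transformer name or a list of names. The following sketch models that convention in plain Python. merge_blocked and ALLOWED_TO_BLOCK are hypothetical; the allowed set is an illustrative subset, not the authoritative BLOCKED_TRANSFORMERS list of the SupportedTransformers class, and skipping duplicate names is an assumption about the SDK's behavior.

```python
from typing import List, Set, Union

# Hypothetical, illustrative subset; the real list is
# SupportedTransformers.BLOCKED_TRANSFORMERS in the SDK.
ALLOWED_TO_BLOCK: Set[str] = {"LabelEncoder", "TfIdf"}


def merge_blocked(current: List[str], transformers: Union[str, List[str]]) -> List[str]:
    """Accept one name or a list of names, validate them, and append new ones."""
    names = [transformers] if isinstance(transformers, str) else list(transformers)
    unknown = [n for n in names if n not in ALLOWED_TO_BLOCK]
    if unknown:
        raise ValueError(f"cannot block unknown transformer(s): {unknown}")
    merged = list(current)
    for name in names:
        if name not in merged:  # assumption: duplicates are skipped
            merged.append(name)
    return merged


blocked = merge_blocked([], "LabelEncoder")
blocked = merge_blocked(blocked, ["TfIdf", "LabelEncoder"])
print(blocked)  # ['LabelEncoder', 'TfIdf']
```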

add_column_purpose

Add a feature type for the specified column.

add_column_purpose(column_name: str, feature_type: str) -> None

Parameters

Name	Description
column_name Required	str A column name to update.
feature_type Required	FeatureType A feature type to use for the column. The feature type must be one of those given in the FULL_SET attribute of the FeatureType class.
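Column purpose overrides can be pictured as a dictionary that takes precedence over the automatically detected feature types. In this sketch, detected, overrides, and effective_purpose are hypothetical names and the detected types are invented; removing an override falls back to the detected default, as described for remove_column_purpose.

```python
from typing import Dict

# Hypothetical feature types that automatic detection might assign.
detected: Dict[str, str] = {"MYCT": "Categorical", "VendorName": "Text"}
# User overrides, in the spirit of add_column_purpose(column_name, feature_type).
overrides: Dict[str, str] = {}


def effective_purpose(column: str) -> str:
    """An override wins; otherwise fall back to the detected default."""
    return overrides.get(column, detected[column])


overrides["MYCT"] = "Numeric"     # like add_column_purpose("MYCT", "Numeric")
print(effective_purpose("MYCT"))  # Numeric
overrides.pop("MYCT", None)       # like remove_column_purpose("MYCT")
print(effective_purpose("MYCT"))  # Categorical
```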

add_drop_columns

Add a column name or list of column names to ignore.

add_drop_columns(drop_columns: str | List[str]) -> None

Parameters

Name	Description
drop_columns Required	str or list[str] A column name or list of column names.
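Because the drop_columns setting is being deprecated in favor of dropping columns during data preparation, here is a minimal standard-library sketch of that preparation step on rows represented as dictionaries. without_columns is a hypothetical helper and the rows are invented; with pandas, DataFrame.drop(columns=[...]) does the same job.

```python
from typing import Any, Dict, Iterable, List


def without_columns(rows: Iterable[Dict[str, Any]], columns: Iterable[str]) -> List[Dict[str, Any]]:
    """Return copies of rows without the given columns (absent columns are ignored)."""
    to_drop = set(columns)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


rows = [
    {"MYCT": 125, "MMIN": 256, "PRP": 198},
    {"MYCT": 29, "MMIN": 8000, "PRP": 269},
]
print(without_columns(rows, ["MMIN"]))
# [{'MYCT': 125, 'PRP': 198}, {'MYCT': 29, 'PRP': 269}]
```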

add_prediction_transform_type

Add a prediction transform type for target column.

add_prediction_transform_type(prediction_transform_type: str) -> None

Parameters

Name	Description
prediction_transform_type Required	str A prediction transform type used to cast the target column. The type must be one of those given in the FULL_SET attribute of the PredictionTransformTypes class.

add_transformer_params

Add customized transformer parameters to the list of custom transformer parameters.

The parameters apply to all columns if the column list is empty.

add_transformer_params(transformer: str, cols: List[str], params: Dict[str, Any]) -> None

Parameters

Name	Description
transformer Required	str The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class.
cols Required	list(str) Input columns for the specified transformer. Some transformers can take multiple columns as input, specified as a list.
params Required	dict A dictionary of parameter names and values for the transformer.
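The constructor type hint suggests that transformer_params maps each transformer name to a list of (columns, parameters) pairs. The sketch below models add_transformer_params on that structure. add_params is hypothetical, an empty column list stands for all columns as stated above, and replacing an earlier entry for the same column list is an assumption.

```python
from typing import Any, Dict, List, Tuple

# transformer name -> list of (columns, params), mirroring the constructor type hint
TransformerParams = Dict[str, List[Tuple[List[str], Dict[str, Any]]]]


def add_params(store: TransformerParams, transformer: str,
               cols: List[str], params: Dict[str, Any]) -> None:
    """Record params for cols; an empty cols list means 'all columns'.

    Assumption: a new entry replaces any earlier entry for the same column list.
    """
    entries = [e for e in store.get(transformer, []) if e[0] != cols]
    entries.append((list(cols), dict(params)))
    store[transformer] = entries


store: TransformerParams = {}
add_params(store, "Imputer", ["CACH"], {"strategy": "median"})
add_params(store, "Imputer", ["CACH"], {"strategy": "mean"})        # replaces the first entry
add_params(store, "HashOneHotEncoder", [], {"number_of_bits": 3})   # applies to all columns
print(store)
# {'Imputer': [(['CACH'], {'strategy': 'mean'})], 'HashOneHotEncoder': [([], {'number_of_bits': 3})]}
```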


get_transformer_params

Retrieve transformer customization parameters for columns.

get_transformer_params(transformer: str, cols: List[str]) -> Dict[str, Any]

Parameters

Name	Description
transformer Required	str The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class.
cols Required	list[str] The column names to get information for. Use an empty list to specify all columns.

Returns

Type	Description
dict	Transformer parameter settings.
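A lookup against the same (columns, parameters) structure can be sketched as follows. get_params is hypothetical, and matching the column list exactly (so that an empty list retrieves the entry registered for all columns) is an assumption about how get_transformer_params resolves columns.

```python
from typing import Any, Dict, List, Tuple

TransformerParams = Dict[str, List[Tuple[List[str], Dict[str, Any]]]]

store: TransformerParams = {
    "Imputer": [
        (["CACH"], {"strategy": "median"}),
        ([], {"strategy": "mean"}),  # empty column list: applies to all columns
    ]
}


def get_params(table: TransformerParams, transformer: str, cols: List[str]) -> Dict[str, Any]:
    """Return the params recorded for exactly this column list, or {} if none.

    Assumption: lookup matches the column list exactly.
    """
    for entry_cols, params in table.get(transformer, []):
        if entry_cols == cols:
            return params
    return {}


print(get_params(store, "Imputer", ["CACH"]))  # {'strategy': 'median'}
print(get_params(store, "Imputer", []))        # {'strategy': 'mean'}
print(get_params(store, "TfIdf", []))          # {}
```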

remove_column_purpose

Remove the feature type for the specified column.

If no feature type is specified for a column, the detected default feature type is used.

remove_column_purpose(column_name: str) -> None

Parameters

Name	Description
column_name Required	str The column name to update.

remove_prediction_transform_type

Revert the prediction transform type of the target column to the default.

remove_prediction_transform_type() -> None

remove_transformer_params

Remove transformer customization parameters for a specific column or for all columns.

remove_transformer_params(transformer: str, cols: List[str] | None = None) -> None

Parameters

Name	Description
transformer Required	str The transformer name. The transformer name must be one of the CUSTOMIZABLE_TRANSFORMERS listed in the SupportedTransformers class.
cols	list[str] or None The column names to remove customization parameters from. Specify None (the default) to remove all customization parameters for the specified transformer. Default value: None
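remove_transformer_params either removes the entry for one column list or, when cols is None, every customization for the transformer. The sketch below models both cases on the (columns, parameters) structure suggested by the constructor type hint. remove_params is hypothetical, and deleting the transformer key once no entries remain is an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple

TransformerParams = Dict[str, List[Tuple[List[str], Dict[str, Any]]]]


def remove_params(table: TransformerParams, transformer: str,
                  cols: Optional[List[str]] = None) -> None:
    """Remove params for one column list, or for every column list when cols is None."""
    if transformer not in table:
        return
    if cols is None:
        del table[transformer]
        return
    remaining = [e for e in table[transformer] if e[0] != cols]
    if remaining:
        table[transformer] = remaining
    else:
        del table[transformer]  # assumption: drop the key once it is empty


store: TransformerParams = {
    "Imputer": [(["CACH"], {"strategy": "median"}), (["PRP"], {"strategy": "most_frequent"})],
}
remove_params(store, "Imputer", ["CACH"])
print(store)  # {'Imputer': [(['PRP'], {'strategy': 'most_frequent'})]}
remove_params(store, "Imputer")  # cols=None: drop all Imputer customizations
print(store)  # {}
```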

Attributes

blocked_transformers

column_purposes

dataset_language

drop_columns

prediction_transform_type

transformer_params
