This article shows you how to use covariates, also known as external regressors, to improve AutoML forecasting models.
Covariates are additional variables outside the target time series that can improve forecasting models. For example, if you're forecasting hotel occupancy rates, knowing if it's the weekend could help predict customer behavior.
In this example, you:
- Create a randomized time-series dataset.
- Perform basic feature engineering work.
- Store the dataset as a Feature Store table.
- Use the Feature Store table as covariates in an AutoML forecasting experiment.
Create the data
This example uses randomly generated time-series data for hotel occupancy rates in January 2024. AutoML then predicts the occupancy_rate for the first day of February 2024.
Run the following code to generate the sample data.
df = spark.sql("""
    SELECT
        explode(sequence(to_date('2024-01-01'), to_date('2024-01-31'), interval 1 day)) AS date,
        rand() AS occupancy_rate
    FROM (SELECT 1 AS id) tmp
    ORDER BY date
""")
display(df)
Feature engineering
Use the sample dataset to engineer a feature called is_weekend, a binary indicator of whether or not a date falls on a weekend.
from pyspark.sql.functions import dayofweek, when
def compute_hotel_weekend_features(df):
    '''is_weekend feature computation; returns a DataFrame with 'date' as the primary key'''
    return df.select("date").withColumn(
        "is_weekend",
        # Spark's dayofweek() returns 1 for Sunday through 7 for Saturday
        when(dayofweek("date").isin(2, 3, 4, 5, 6), 0)  # Weekday (Mon-Fri)
        .when(dayofweek("date").isin(1, 7), 1)          # Weekend (Sun, Sat)
    )
hotel_weekend_feature_df = compute_hotel_weekend_features(df)
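To sanity-check the weekday/weekend mapping without a Spark session, the same logic can be reproduced in plain Python. Spark SQL's dayofweek() numbers days 1 (Sunday) through 7 (Saturday); the helpers below are hypothetical stand-ins for the Spark expressions, shown only to verify the mapping on known dates.

```python
from datetime import date

def spark_dayofweek(d: date) -> int:
    """Mimic Spark SQL's dayofweek(): 1 = Sunday, ..., 7 = Saturday."""
    return d.isoweekday() % 7 + 1  # isoweekday(): 1 = Monday, ..., 7 = Sunday

def is_weekend(d: date) -> int:
    """Binary weekend indicator matching the Spark feature logic."""
    return 1 if spark_dayofweek(d) in (1, 7) else 0

# January 6-7, 2024 fell on a Saturday and Sunday; January 8 was a Monday.
print(is_weekend(date(2024, 1, 6)), is_weekend(date(2024, 1, 7)), is_weekend(date(2024, 1, 8)))  # 1 1 0
```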
Create the Feature Store
To use covariates with AutoML, you must use a Feature Store to join one or more covariate feature tables with the primary training data.
Store the DataFrame hotel_weekend_feature_df as a feature table.
from databricks.feature_engineering import FeatureEngineeringClient
fe = FeatureEngineeringClient()
hotel_weekend_feature_table = fe.create_table(
name='ml.default.hotel_weekend_features', # change to desired location
primary_keys=['date'],
df=hotel_weekend_feature_df,
description='Hotel is_weekend features table'
)
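Feature tables require the primary key column to uniquely identify each row, so a duplicate date would cause problems when the table is joined back to the training data. A minimal pre-write check can be sketched in plain Python on a list of key values (the function and variable names here are illustrative, not part of the Feature Engineering API):

```python
from collections import Counter

def find_duplicate_keys(keys):
    """Return any primary-key values that appear more than once."""
    return [k for k, n in Counter(keys).items() if n > 1]

# Illustrative key column: one row per date, plus an injected duplicate.
dates = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-02"]
print(find_duplicate_keys(dates))  # a clean feature table would print []
```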
Note
This example uses the Python FeatureEngineeringClient to create and write tables. However, you can also use SQL or Delta Live Tables to create and write tables. See Work with feature tables in Unity Catalog for more options.
Configure the AutoML experiment
Use the feature_store_lookups parameter to pass the Feature Store to AutoML. feature_store_lookups contains a dictionary with two fields: table_name and lookup_key.
hotel_weekend_feature_lookup = {
"table_name": "ml.default.hotel_weekend_features", # change to location set above
"lookup_key": ["date"]
}
feature_lookups = [hotel_weekend_feature_lookup]
Note
feature_store_lookups can contain multiple feature table lookups.
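For instance, if a second covariate table keyed on date existed (the ml.default.hotel_weather_features name below is hypothetical), both lookups could be passed together:

```python
hotel_weekend_feature_lookup = {
    "table_name": "ml.default.hotel_weekend_features",  # as created above
    "lookup_key": ["date"],
}

# Hypothetical second covariate table, also keyed on 'date'.
hotel_weather_feature_lookup = {
    "table_name": "ml.default.hotel_weather_features",
    "lookup_key": ["date"],
}

feature_lookups = [hotel_weekend_feature_lookup, hotel_weather_feature_lookup]
print(len(feature_lookups))  # 2
```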
Run the AutoML experiment
Use the following code to pass feature_lookups to the AutoML experiment API call.
from databricks import automl
summary = automl.forecast(
    dataset=df,
    target_col="occupancy_rate",
    time_col="date",
    frequency="d",
    horizon=1,
    timeout_minutes=30,
    identity_col=None,
    feature_store_lookups=feature_lookups,
)