This article describes the operational semantics of triggered and continuous pipeline modes for Lakeflow Declarative Pipelines.
Pipeline mode is independent of the type of table being computed. Both materialized views and streaming tables can be updated in either pipeline mode.
To change between triggered and continuous, use the Pipeline mode option in the pipeline settings while creating or editing a pipeline. See Configure Lakeflow Declarative Pipelines.
Note
Refresh operations for materialized views and streaming tables defined in Databricks SQL always run using triggered pipeline mode.
What is triggered pipeline mode?
If the pipeline uses triggered mode, the system stops processing after successfully refreshing all tables or selected tables, ensuring each table in the update is refreshed based on the data available when the update starts.
What is continuous pipeline mode?
If the pipeline uses continuous mode, Lakeflow Declarative Pipelines processes new data as it arrives in data sources to keep tables throughout the pipeline fresh.
To avoid unnecessary processing in continuous mode, pipelines automatically monitor dependent Delta tables and perform an update only when the contents of those dependent tables have changed.
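The change-detection idea described above can be pictured with a small Python sketch. It is a conceptual illustration only: the function names, the use of integer table versions as a change signal, and the checkpoint dictionary are all assumptions made for this example, not the way Lakeflow Declarative Pipelines implements monitoring.

```python
from typing import Dict


def needs_update(last_seen: Dict[str, int], current: Dict[str, int]) -> bool:
    """Return True if any upstream table version differs from the one last processed.

    Hypothetical model: each upstream table is represented by a version number.
    """
    return any(current[name] != last_seen.get(name) for name in current)


def run_cycle(last_seen: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Do work only when an upstream table changed; return the new checkpoint."""
    if needs_update(last_seen, current):
        # A real pipeline would recompute the downstream table at this point.
        return dict(current)
    # Nothing changed upstream, so the cycle is skipped and the checkpoint is kept.
    return last_seen
```

In this model, a cycle in which every upstream version matches the checkpoint does no processing, which is the cost-saving behavior the paragraph above describes.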
Choose a data pipeline mode
The following table highlights the differences between triggered and continuous pipeline modes:
| Key questions | Triggered | Continuous | 
|---|---|---|
| When does the update stop? | Automatically once complete. | Runs continuously until manually stopped. | 
| What data is processed? | Data available when the update starts. | All data as it arrives at configured sources. | 
| What data freshness requirements is this best for? | Updates that run every 10 minutes, hourly, or daily. | Updates needed every 10 seconds to every few minutes. |
Triggered pipelines can reduce resource consumption and expense because the cluster runs only long enough to update the pipeline. However, new data won't be processed until the pipeline is triggered. Continuous pipelines require an always-running cluster, which is more expensive but reduces processing latency.
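As a rough aid for applying the freshness guidance in the table, the following Python sketch maps a freshness requirement to a suggested mode. The function name and the 10-minute cutoff are assumptions for illustration: the cutoff is simply the shortest cadence the table lists for triggered mode, and requirements between a few minutes and 10 minutes remain a judgment call that also depends on cost.

```python
from datetime import timedelta

# Assumption: 10 minutes is used as the cutoff because it is the shortest
# cadence the comparison table lists for triggered mode. Not an official rule.
TRIGGERED_MIN_CADENCE = timedelta(minutes=10)


def suggest_pipeline_mode(freshness: timedelta) -> str:
    """Suggest "triggered" or "continuous" for a required data freshness.

    Hypothetical helper. The table does not cover requirements under
    10 seconds; this sketch still returns "continuous" for them.
    """
    if freshness <= timedelta(0):
        raise ValueError("freshness must be a positive duration")
    if freshness >= TRIGGERED_MIN_CADENCE:
        return "triggered"
    return "continuous"
```

For example, an hourly requirement yields a triggered suggestion, while a 30-second requirement yields continuous.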
Set trigger interval for continuous pipelines
When configuring pipelines for continuous mode, you can set trigger intervals to control how frequently the pipeline starts an update for each flow.
You can use pipelines.trigger.interval to control the trigger interval for a flow updating a table or for an entire pipeline. Because a triggered pipeline processes each table once, pipelines.trigger.interval is used only with continuous pipelines.
Databricks recommends setting pipelines.trigger.interval on individual tables because streaming and batch queries have different defaults. Set the value on a pipeline only when processing requires controlling updates for the entire pipeline graph.
You set pipelines.trigger.interval on a table using spark_conf in Python or SET in SQL:
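# Python
from pyspark import pipelines as dp

# Set the trigger interval for the flow that updates this table.
@dp.table(
  spark_conf={"pipelines.trigger.interval": "10 seconds"}
)
def <function-name>():
    return (<query>)

-- SQL: set the trigger interval for the table defined below.
SET pipelines.trigger.interval=10 seconds;
CREATE OR REFRESH MATERIALIZED VIEW TABLE_NAME
AS SELECT ...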
To set pipelines.trigger.interval on a pipeline, add it to the configuration object in the pipeline settings:
{
  "configuration": {
    "pipelines.trigger.interval": "10 seconds"
  }
}
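If you generate pipeline settings programmatically, a small helper can keep the mode and the interval consistent. The sketch below is a minimal example under stated assumptions: `build_pipeline_settings` is a hypothetical helper, and the `continuous` boolean is assumed to be the settings field that corresponds to the Pipeline mode option, so verify it against your pipeline's settings JSON. Raising an error for a triggered pipeline is a design choice of this helper, made because the interval is used only with continuous pipelines.

```python
import json
from typing import Any, Dict, Optional


def build_pipeline_settings(
    continuous: bool, trigger_interval: Optional[str] = None
) -> Dict[str, Any]:
    """Build a pipeline settings fragment (hypothetical helper).

    Assumption: "continuous" is the settings field behind the Pipeline mode option.
    """
    if trigger_interval is not None and not continuous:
        # The interval is used only with continuous pipelines, so reject it here.
        raise ValueError(
            "pipelines.trigger.interval is used only with continuous pipelines"
        )
    settings: Dict[str, Any] = {"continuous": continuous}
    if trigger_interval is not None:
        # Pipeline-level interval, placed in the configuration object.
        settings["configuration"] = {"pipelines.trigger.interval": trigger_interval}
    return settings


if __name__ == "__main__":
    print(json.dumps(build_pipeline_settings(True, "10 seconds"), indent=2))
```

Keeping the check in one place avoids shipping a triggered pipeline with an interval that would have no effect.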