Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to:
Databricks SQL
Databricks Runtime
Merges a set of updates, insertions, and deletions based on a source table into a target Delta table.
This statement is supported only for Delta Lake tables.
This page contains details for using the correct syntax with the MERGE command. See Upsert into a Delta Lake table using merge for more guidance on how to use MERGE operations to manage your data.
Syntax
[ common_table_expression ]
MERGE [ WITH SCHEMA EVOLUTION ] INTO target_table_name [target_alias]
USING source_table_reference [source_alias]
ON merge_condition
{ WHEN MATCHED [ AND matched_condition ] THEN matched_action |
WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ] THEN not_matched_action |
WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } [...]
matched_action
{ DELETE |
UPDATE SET * |
UPDATE SET { column = { expr | DEFAULT } } [, ...] }
not_matched_action
{ INSERT * |
INSERT (column1 [, ...] ) VALUES ( expr | DEFAULT ] [, ...] )
not_matched_by_source_action
{ DELETE |
UPDATE SET { column = { expr | DEFAULT } } [, ...] }
Parameters
-
Common table expressions (CTE) are one or more named queries which can be reused multiple times within the main query block to avoid repeated computations or to improve readability of complex, nested queries.
WITH SCHEMA EVOLUTIONApplies to:
Databricks Runtime 15.2 and aboveEnables automatic schema evolution for this
MERGEoperation. When enabled, the schema of the target Delta table is automatically updated to match the schema of the source table.-
A Table name identifying the table being modified. The table referenced must be a Delta table.
The name must not include an options specification.
The table must not be a foreign table.
-
A Table alias for the target table. The alias must not include a column list.
-
A Table name identifying the source table to be merged into the target table.
-
A Table alias for the source table. The alias must not include a column list.
-
How the rows from one relation are combined with the rows of another relation. An expression with a return type of BOOLEAN.
WHEN MATCHED [ ANDmatched_condition]WHEN MATCHEDclauses are executed when a source row matches a target table row based on themerge_conditionand the optionalmatch_condition.matched_action
DELETEDeletes the matching target table row.
Multiple matches are allowed when matches are unconditionally deleted. An unconditional delete is not ambiguous, even if there are multiple matches.
UPDATEUpdates the matched target table row.
To update all the columns of the target Delta table with the corresponding columns of the source dataset, use
UPDATE SET *. This is equivalent toUPDATE SET col1 = source.col1 [, col2 = source.col2 ...]for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.Note
This behavior changes when automatic schema evolution is enabled. See Automatic schema evolution for Delta Lake merge for details.
Applies to:
Databricks SQL
Databricks Runtime 11.3 LTS and aboveYou can specify
DEFAULTasexprto explicitly update the column to its default value.
If there are multiple
WHEN MATCHEDclauses, then they are evaluated in the order they are specified. EachWHEN MATCHEDclause, except the last one, must have amatched_condition. Otherwise, the query returns a NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION error.If none of the
WHEN MATCHEDconditions evaluate to true for a source and target row pair that matches themerge_condition, then the target row is left unchanged.WHEN NOT MATCHED [BY TARGET] [ ANDnot_matched_condition]WHEN NOT MATCHEDclauses insert a row when a source row does not match any target row based on themerge_conditionand the optionalnot_matched_condition.Applies to:
Databricks SQL
Databricks Runtime 12.2 LTS and aboveWHEN NOT MATCHED BY TARGETcan be used as an alias forWHEN NOT MATCHED.not_matched_conditionmust be a Boolean expression.INSERT *Inserts all the columns of the target Delta table with the corresponding columns of the source dataset. This is equivalent to
INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...])for all the columns of the target Delta table. This action requires that the source table has the same columns as those in the target table.Note
This behavior changes when automatic schema evolution is enabled. See Automatic schema evolution for Delta Lake merge for details.
INSERT ( ... ) VALUES ( ... )The new row is generated based on the specified column and corresponding expressions. All the columns in the target table do not need to be specified. For unspecified target columns, the column default is inserted, or
NULLif none exists.Applies to:
Databricks SQL
Databricks Runtime 11.3 LTS and aboveYou can specify
DEFAULTas an expression to explicitly insert the column default for a target column.
If there are multiple
WHEN NOT MATCHEDclauses, then they are evaluated in the order they are specified. AllWHEN NOT MATCHEDclauses, except the last one, must havenot_matched_conditions. Otherwise, the query returns a NON_LAST_NOT_MATCHED_CLAUSE_OMIT_CONDITION error.WHEN NOT MATCHED BY SOURCE [ ANDnot_matched_by_source_condition]Applies to:
Databricks SQL
Databricks Runtime 12.2 LTS and aboveWHEN NOT MATCHED BY SOURCEclauses are executed when a target row does not match any rows in the source table based on themerge_conditionand the optionalnot_match_by_source_conditionevaluates to true.not_matched_by_source_conditionmust be a Boolean expression that only references columns from the target table.not_matched_by_source_action
DELETEDeletes the target table row.
UPDATEUpdates the target table row.
exprmay only reference columns from the target table, otherwise the query will throw an analysis error.Applies to:
Databricks SQL
Databricks Runtime 11.3 LTS and aboveYou can specify
DEFAULTasexprto explicitly update the column to its default value.
Important
Adding a
WHEN NOT MATCHED BY SOURCEclause to update or delete target rows when themerge_conditionevaluates to false can lead to a large number of target rows being modified. For best performance, applynot_matched_by_source_conditions to limit the number of target rows updated or deleted.If there are multiple
WHEN NOT MATCHED BY SOURCE clauses, then they are evaluated in the order they are specified. EachWHEN NOT MATCHED BY SOURCEclause, except the last one, must have anot_matched_by_source_condition. Otherwise, the query returns a NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION error.If none of the
WHEN NOT MATCHED BY SOURCEconditions evaluate to true for a target row that doesn't match any rows in the source table based on themerge_condition, then the target row is left unchanged.
Important
MERGE operations fail with a DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE error if more than one row in the source table matches the same row in the target table based on the conditions specified in the ON and WHEN MATCHED clauses. According to the SQL semantics of merge, this type of update operation is ambiguous because it is unclear which source row should be used to update the matched target row. You can preprocess the source table to eliminate the possibility of multiple matches. See the change data capture example. This example preprocesses the change dataset (the source dataset) to retain only the latest change for each key before applying that change into the target Delta table. In Databricks Runtime 15.4 LTS and below, MERGE only considers conditions in the ON clause before evaluating multiple matches.
Examples
You can use MERGE INTO for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Upsert into a Delta Lake table using merge for a few examples.
WHEN MATCHED
-- Delete all target rows that have a match in the source table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED THEN DELETE
-- Conditionally update target rows that have a match in the source table using the source value.
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED AND target.updated_at < source.updated_at THEN UPDATE SET *
-- Multiple MATCHED clauses conditionally deleting matched target rows and updating two columns for all other matched rows.
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED AND target.marked_for_deletion THEN DELETE
WHEN MATCHED THEN UPDATE SET target.updated_at = source.updated_at, target.value = DEFAULT
WHEN NOT MATCHED [BY TARGET]
-- Insert all rows from the source that are not already in the target table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED THEN INSERT *
-- Conditionally insert new rows in the target table using unmatched rows from the source table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY TARGET AND source.created_at > now() - INTERVAL “1” DAY THEN INSERT (created_at, value) VALUES (source.created_at, DEFAULT)
WHEN NOT MATCHED BY SOURCE
-- Delete all target rows that have no matches in the source table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY SOURCE THEN DELETE
-- Multiple NOT MATCHED BY SOURCE clauses conditionally deleting unmatched target rows and updating two columns for all other matched rows.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY SOURCE AND target.marked_for_deletion THEN DELETE
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET target.value = DEFAULT
WITH SCHEMA EVOLUTION
-- Multiple MATCHED and NOT MATCHED clauses with schema evolution enabled.
> MERGE WITH SCHEMA EVOLUTION INTO target USING source
ON source.key = target.key
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
WHEN NOT MATCHED BY SOURCE THEN DELETE