Share via


Column mask clause

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 12.2 LTS and above check marked yes Unity Catalog only

Specifies a function that is applied to a column whenever rows are fetched from the table. All subsequent queries from that column receive the result of evaluating that function over the column in place of the column's original value. This can be useful for fine-grained access control purposes where the function can inspect the identity or group memberships of the invoking user to determine whether to redact the value.

You can add column masks when you:

Important

The mask is applied as soon as each row is fetched from the data source. Any expressions, predicates, or ordering are applied after the masking. For example, joining between the masked column against another column from another table will use the masked values for the join comparison.

For more information on how to use column masks see Azure Databricks tables.

Syntax

MASK func_name [ USING COLUMNS ( other_column_name | constant_literal [, ...] ) ]

Parameters

  • func_name

    A scalar SQL UDF with at least one parameter.

    The first parameter of the SQL UDF maps 1:1 with the masked column. The masked column's type must be castable to the SQL UDF parameter type. If func_name requires more parameters, arguments must be provided by the USING COLUMNS clause.

    The return type of the function must be castable to the data type of the masked column.

  • other_column_name

    Optionally specifies additional columns of the masked column's table to pass to func_name. Each other_column_name must be castable to the corresponding parameter of func_name.

    Use a column mask to selectively anonymize the value of column_identifier based on the user executing a query against table_name, the value of column_identifier and the optional other_column.

  • constant_literal

    Specifies a constant parameter with the type matching a function parameter. The following types are supported: STRING, numeric (INTEGER, FLOAT, DOUBLE, DECIMAL …), BOOLEAN, INTERVAL, NULL.

Examples

You can find more examples in Azure Databricks tables.

-- Create a table with a masked column
> CREATE FUNCTION mask_ssn(ssn STRING) RETURN CASE WHEN is_member('HumanResourceDept') THEN ssn ELSE '***-**-****' END;
> CREATE TABLE persons(name STRING, ssn STRING MASK mask_ssn);
> INSERT INTO persons VALUES('James', '123-45-6789';

-- As a non-member of 'HumanResourceDept'
> SELECT * FROM persons;
  James  ***-**-****

-- As a member of 'HumanResourceDept'
> SELECT * FROM persons;
  James  123-45-6789

-- Create a masking function with multiple parameters. When applied first parameter will be the column that masking function applies to
> CREATE FUNCTION mask_pii_regional(value STRING, region STRING)
  RETURN IF(is_account_group_member(region || '_HumanResourceDept'), value, 'REDACTED');

-- Create a table with a masked column. Masking function first parameter will be the column that is masked.
-- The rest of the parameters should be specified in `USING COLUMNS (<columnList>)` clause
> CREATE TABLE persons(name STRING, address STRING MASK mask_pii_regional USING COLUMNS (region), region STRING);
> INSERT INTO persons('James', '160 Spear St, San Francisco', 'US')

-- As a non-member of 'US_HumanResourceDept'
> SELECT * FROM persons;
  James | REDACTED | US

-- As a member of 'US_HumanResourceDept'
> SELECT * FROM persons;
  James | 160 Spear St, San Francisco | US