Share via


Data classification system table reference

Important

This feature is in Public Preview.

This page outlines the data classification results table schema and includes sample queries. The table stores detections for sensitive data classes at the column level across enabled catalogs in your metastore.

Table path: system.data_classification.results

Data classification results table schema

The data classification results system table uses the following schema:

Column name Data type Description Example
latest_detected_time timestamp As-of time when the column was most recently scanned. 2025-06-27T12:34
first_detected_time timestamp Time when the column detection was first recorded. 2025-06-27T12:34
catalog_id string ID of the catalog. 3f1a7d6e-9c59-...
table_id string ID of the table. 3f1a7d6e-9c59-...
catalog_name string Catalog name. main_catalog
schema_name string Schema name. public
table_name string Table name. sales_data
column_name string Column name. customer_email
data_type string Data type of the column. Complex types include full struct definitions. struct<name:string, age:int>
class_tag string Tag for the detected entity or tag key and optional value. class.us_ssn or pii: confidential
samples array<string> Up to five sample values that matched the detection. ["a@b.com", ...]
confidence string Confidence of detection. Either HIGH or LOW HIGH
frequency float Estimation of the proportion of matching rows in the sample. Between 0 and 1. 0.87

Example queries

Replace parameter values with your own before running.

Get all detections for a table

SELECT *
FROM system.data_classification.results
WHERE
  catalog_name = "c"
  AND schema_name = "s"
  AND table_name = "t";

Get all high confidence detections

SELECT *
FROM system.data_classification.results
WHERE
  catalog_name = "c"
  AND schema_name = "s"
  AND table_name = "t"
  AND confidence = "HIGH";

Get number of tables impacted by a specific classification

SELECT
  class_tag,
  COUNT(DISTINCT catalog_name, schema_name, table_name) AS num_tables
FROM
  system.data_classification.results
WHERE
  class_tag IS NOT NULL
GROUP BY class_tag;

Get number of users who queried tables with sensitive data in last 30 days

WITH table_accesses AS (
  SELECT
    IFNULL(
      request_params.full_name_arg,
      CONCAT(request_params.catalog_name, '.', request_params.schema_name, '.', request_params.name)
    ) AS table_full_name,
    COUNT(DISTINCT user_identity.email) AS num_users
  FROM
    system.access.audit
  WHERE
    action_name IN ("createTable", "getTable", "updateTable", "deleteTable")
    AND (
      -- For performance, limit the blast radius of the audit log query to only the current catalog
      request_params.catalog_name = :catalog_name OR
      request_params.full_name_arg LIKE :catalog_name || '%'
    )
    AND event_time >= DATE_SUB(current_date(), 30)
  GROUP BY table_full_name
),
sensitive_tables AS (
  SELECT
    DISTINCT CONCAT(catalog_name, '.', schema_name, '.', table_name) AS table_full_name
  FROM
    system.data_classification.results
  WHERE class_tag IS NOT NULL
)

SELECT
  st.table_full_name,
  ta.num_users
FROM
  sensitive_tables st
  JOIN table_accesses ta
  ON st.table_full_name = ta.table_full_name