Share via


Data quality monitoring results system table reference

Important

This feature is in Beta.

This page outlines the data quality monitoring results system table schema and includes sample queries. The table stores results of freshness and completeness checks, as well as downstream impact and root cause analysis, across all tables enabled for data quality monitoring in your metastore.

Table path: system.data_quality_monitoring.table_results

Only account admins can access this table, and they must grant access to others as needed. The system table uses default storage. Because it contains sample values and downstream usage data, use caution when granting access to others.

Data quality monitoring results table schema

The system.data_quality_monitoring.table_results table uses the following schema:

Column name Contents (for struct data type) Data type Description Example data
event_time timestamp Time when the row was generated. 2025-06-27T12:00:00
catalog_name string Name of the catalog. Used to identify the table. main
schema_name string Name of the schema. Used to identify the table. default
table_name string Name of the table. Used to identify the table. events
catalog_id string Stable ID for the catalog. 3f1a7d6e-9c59-4b76-8c32-8d4c74e289fe
schema_id string Stable ID for the schema. 3f1a7d6e-9c59-4b76-8c32-8d4c74e289fe
table_id string Stable ID for the table. 3f1a7d6e-9c59-4b76-8c32-8d4c74e289fe
status string Consolidated health status at the table level. "Unhealthy" if any check or group is unhealthy. Healthy, Unhealthy, Unknown
freshness struct Freshness checks.
status string Overall freshness status. Unhealthy
commit_freshness struct Commit freshness check results.
completeness struct Completeness check results.
status string Status of completeness check. Unhealthy
total_row_count struct Total number of rows in the table over time.
daily_row_count struct Number of rows added each day.
downstream_impact struct Summary of downstream impact based on dependency graph.
impact_level int Severity indicator (0 = none, 1 = low, 2 = medium, 3 = high, 4 = very high). 2
num_downstream_tables int Number of downstream tables affected. 5
num_queries_on_affected_tables int Number of queries run on affected downstream tables over the last 30 days. 120
root_cause_analysis struct Information about upstream jobs contributing to the issue.
upstream_jobs array Metadata for each upstream job.

commit_freshness array structure

The commit_freshness struct contains the following:

Item name Data type Description Example data
status string Status of commit freshness check. Unhealthy
error_code string Error message encountered during check. FAILED_TO_FIT_MODEL
last_value timestamp Last commit timestamp. 2025-06-27T11:30:00
predicted_value timestamp Predicted time by which the table should have been updated. 2025-06-27T11:45:00

total_row_count and daily_row_count array structure

The total_row_count and daily_row_count structs contain the following:

Item name Data type Description Example data
status string Status of the check. Unhealthy
error_code string Error message encountered during check. FAILED_TO_FIT_MODEL
last_value int Number of rows observed in the last 24 hours. 500
min_predicted_value int Minimum expected number of rows in the last 24 hours. 10
max_predicted_value int Maximum expected number of rows in the last 24 hours. 1000

upstream_jobs array structure

The structure of the array shown in the upstream_jobs column is shown in the following table:

Item name Data type Description Example data
job_id string Job ID. 12345
workspace_id string Workspace ID. 6051921418418893
job_name string Job display name. daily_refresh
last_run_status string Status of the most recent run. SUCCESS
run_page_url string URL of Databricks job run page. https://.../runs/123

Downstream impact information

In the logged results table, the column downstream_impact is a struct with the following fields:

Field Type Description
impact_level int Integer value between 1 and 4 indicating the severity of the data quality issue. Higher values indicate greater disruption.
num_downstream_tables int Number of downstream tables that might be affected by the identified issue.
num_queries_on_affected_tables int Total number of queries that have referenced the affected and downstream tables in the past 30 days.

Example queries

Replace parameter values with your own before running.

Get all current incidents in a schema

WITH latest_rows AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY table_id
            ORDER BY event_time DESC
        ) AS rn
    FROM
      system.data_quality_monitoring.table_results
    WHERE
      catalog_name = "c"
      AND schema_name = "s"
)

SELECT *
FROM latest_rows
WHERE
  rn = 1
  AND status = "Unhealthy"

Get all incident tables in a schema that have a high downstream impact

WITH latest_rows AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY table_id
            ORDER BY event_time DESC
        ) AS rn
    FROM
      system.data_quality_monitoring.table_results
    WHERE
      catalog_name = "c"
      AND schema_name = "s"
)
SELECT *
FROM latest_rows
WHERE rn = 1
  AND downstream_impact.impact_level >= 3

Get all tables in a schema that are currently impacted by a freshness issue

WITH latest_rows AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY table_id
            ORDER BY event_time DESC
        ) AS rn
    FROM
      system.data_quality_monitoring.table_results
    WHERE
      catalog_name = "c"
      AND schema_name = "s"
)

SELECT *
FROM latest_rows
WHERE rn = 1
  AND freshness.status = "Unhealthy"

Get all historical records for a table

SELECT *
FROM system.data_quality_monitoring.table_results
WHERE
  catalog_name = "c"
  AND schema_name = "s"
  AND table_name = "t"