Data quality monitoring results system table reference

2025-10-07

Important

This feature is in Beta.

This page outlines the data quality monitoring results system table schema and includes sample queries. The table stores results of freshness and completeness checks, as well as downstream impact and root cause analysis, across all tables enabled for data quality monitoring in your metastore.

Table path: system.data_quality_monitoring.table_results

Only account admins can access this table, and they must grant access to others as needed. The system table uses default storage. Because it contains sample values and downstream usage data, use caution when granting access to others.

Data quality monitoring results table schema

The system.data_quality_monitoring.table_results table uses the following schema:

Column name	Contents (for `struct` data type)	Data type	Description	Example data
`event_time`		timestamp	Time when the row was generated.	`2025-06-27T12:00:00`
`catalog_name`		string	Name of the catalog. Used to identify the table.	`main`
`schema_name`		string	Name of the schema. Used to identify the table.	`default`
`table_name`		string	Name of the table. Used to identify the table.	`events`
`catalog_id`		string	Stable ID for the catalog.	`3f1a7d6e-9c59-4b76-8c32-8d4c74e289fe`
`schema_id`		string	Stable ID for the schema.	`3f1a7d6e-9c59-4b76-8c32-8d4c74e289fe`
`table_id`		string	Stable ID for the table.	`3f1a7d6e-9c59-4b76-8c32-8d4c74e289fe`
`status`		string	Consolidated health status at the table level. "Unhealthy" if any check or group is unhealthy.	`Healthy`, `Unhealthy`, `Unknown`
`freshness`		struct	Freshness checks.
	`status`	string	Overall freshness status.	`Unhealthy`
	`commit_freshness`	struct	Commit freshness check results.
`completeness`		struct	Completeness check results.
	`status`	string	Status of completeness check.	`Unhealthy`
	`total_row_count`	struct	Total number of rows in the table over time.
	`daily_row_count`	struct	Number of rows added each day.
`downstream_impact`		struct	Summary of downstream impact based on dependency graph.
	`impact_level`	int	Severity indicator (0 = none, 1 = low, 2 = medium, 3 = high, 4 = very high).	2
	`num_downstream_tables`	int	Number of downstream tables affected.	5
	`num_queries_on_affected_tables`	int	Number of queries run on affected downstream tables over the last 30 days.	120
`root_cause_analysis`		struct	Information about upstream jobs contributing to the issue.
	`upstream_jobs`	array	Metadata for each upstream job.

`commit_freshness` array structure

The commit_freshness struct contains the following:

Item name	Data type	Description	Example data
`status`	string	Status of commit freshness check.	`Unhealthy`
`error_code`	string	Error message encountered during check.	`FAILED_TO_FIT_MODEL`
`last_value`	timestamp	Last commit timestamp.	`2025-06-27T11:30:00`
`predicted_value`	timestamp	Predicted time by which the table should have been updated.	`2025-06-27T11:45:00`

`total_row_count` and `daily_row_count` array structure

The total_row_count and daily_row_count structs contain the following:

Item name	Data type	Description	Example data
`status`	string	Status of the check.	`Unhealthy`
`error_code`	string	Error message encountered during check.	`FAILED_TO_FIT_MODEL`
`last_value`	int	Number of rows observed in the last 24 hours.	`500`
`min_predicted_value`	int	Minimum expected number of rows in the last 24 hours.	`10`
`max_predicted_value`	int	Maximum expected number of rows in the last 24 hours.	`1000`

`upstream_jobs` array structure

The structure of the array shown in the upstream_jobs column is shown in the following table:

Item name	Data type	Description	Example data
`job_id`	string	Job ID.	`12345`
`workspace_id`	string	Workspace ID.	`6051921418418893`
`job_name`	string	Job display name.	`daily_refresh`
`last_run_status`	string	Status of the most recent run.	`SUCCESS`
`run_page_url`	string	URL of Databricks job run page.	`https://.../runs/123`

Downstream impact information

In the logged results table, the column downstream_impact is a struct with the following fields:

Field	Type	Description
`impact_level`	int	Integer value between 1 and 4 indicating the severity of the data quality issue. Higher values indicate greater disruption.
`num_downstream_tables`	int	Number of downstream tables that might be affected by the identified issue.
`num_queries_on_affected_tables`	int	Total number of queries that have referenced the affected and downstream tables in the past 30 days.

Example queries

Replace parameter values with your own before running.

Get all current incidents in a schema

WITH latest_rows AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY table_id
            ORDER BY event_time DESC
        ) AS rn
    FROM
      system.data_quality_monitoring.table_results
    WHERE
      catalog_name = "c"
      AND schema_name = "s"
)

SELECT *
FROM latest_rows
WHERE
  rn = 1
  AND status = "Unhealthy"

Get all incident tables in a schema that have a high downstream impact

WITH latest_rows AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY table_id
            ORDER BY event_time DESC
        ) AS rn
    FROM
      system.data_quality_monitoring.table_results
    WHERE
      catalog_name = "c"
      AND schema_name = "s"
)
SELECT *
FROM latest_rows
WHERE rn = 1
  AND downstream_impact.impact_level >= 3

Get all tables in a schema that are currently impacted by a freshness issue

WITH latest_rows AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY table_id
            ORDER BY event_time DESC
        ) AS rn
    FROM
      system.data_quality_monitoring.table_results
    WHERE
      catalog_name = "c"
      AND schema_name = "s"
)

SELECT *
FROM latest_rows
WHERE rn = 1
  AND freshness.status = "Unhealthy"

Get all historical records for a table

SELECT *
FROM system.data_quality_monitoring.table_results
WHERE
  catalog_name = "c"
  AND schema_name = "s"
  AND table_name = "t"

Feedback

Was this page helpful?