Anteckning
Åtkomst till den här sidan kräver auktorisering. Du kan prova att logga in eller ändra kataloger.
Åtkomst till den här sidan kräver auktorisering. Du kan prova att ändra kataloger.
Materialized views for dedicated SQL pools in Azure Synapse provide a low maintenance method for complex analytical queries to get fast performance without any query change. I den här artikeln beskrivs den allmänna vägledningen om hur du använder materialiserade vyer.
Materialiserade vyer jämfört med standardvyer
Dedicated SQL pool in Azure Synapse supports standard and materialized views. Båda är virtuella tabeller som skapas med SELECT-uttryck och presenteras för frågor som logiska tabeller. Views encapsulate the complexity of common data computation and add an abstraction layer to computation changes so there's no need to rewrite queries.
En standardvy beräknar sina data varje gång vyn används. Det finns inga data lagrade på disken. People typically use standard views as a tool that helps organize the logical objects and queries in a dedicated SQL pool. Om du vill använda en standardvy måste en fråga referera direkt till den.
A materialized view pre-computes, stores, and maintains its data in a dedicated SQL pool just like a table. There's no recomputation needed each time a materialized view is used. That's why queries that use all or a subset of the data in materialized views can get faster performance. Ännu bättre är att frågor kan använda en materialiserad vy utan att direkt referera till den, så det finns inget behov av att ändra programkoden.
Most of the requirements on a standard view still apply to a materialized view. For details on the materialized view syntax and other requirements, refer to CREATE MATERIALIZED VIEW AS SELECT
| Jämförelse | Utsikt | Materialiserad synpunkt |
|---|---|---|
| View definition | Stored in dedicated SQL pool. | Stored in dedicated SQL pool. |
| Visa innehåll | Genereras varje gång vyn används. | Pre-processed and stored in dedicated SQL pool during view creation. Uppdateras när data läggs till i de underliggande tabellerna. |
| Data refresh | Alltid uppdaterad | Alltid uppdaterad |
| Hastighet för att hämta visningsdata från komplexa frågor | Långsam | Snabbt |
| Extra lagringsutrymme | Nej | Ja |
| Syntax | CREATE VIEW | CREATE MATERIALIZED VIEW AS SELECT |
Benefits of using materialized views
A properly designed materialized view provides the following benefits:
- Reduce the execution time for complex queries with JOINs and aggregate functions. Ju mer komplex frågan är, desto större är potentialen för att spara körtid. Den största fördelen uppnås när en frågas beräkningskostnad är hög och den resulterande datauppsättningen är liten.
- The optimizer in dedicated SQL pool can automatically use deployed materialized views to improve query execution plans. Den här processen är transparent för användare som ger snabbare frågeprestanda och kräver inte frågor för att referera direkt till de materialiserade vyerna.
- Require low maintenance on the views. All incremental data changes from the base tables are automatically added to the materialized views in a synchronous manner, meaning both the base tables and the materialized views are updated in the same transaction. Med den här designen kan sökningar mot materialiserade vyer returnera samma data som när du söker i bas-tabellerna direkt.
- Data i en materialiserad vy kan distribueras på ett annat sätt än bastabellerna.
- Data i materialiserade vyer får samma fördelar med hög tillgänglighet och återhämtning som data i vanliga tabeller.
The materialized views implemented in dedicated SQL pool also provide the following benefits:
Compared to other data warehouse providers, the materialized views implemented in dedicated SQL pool also provide the following benefits:
- Brett stöd för mängdfunktioner. See CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL).
- Stöd för frågespecifik materialiserad vyrekommendering. Se EXPLAIN (Transact-SQL).
- Automatic and synchronous data refresh with data changes in base tables. Användaren behöver inte göra någonting.
Vanliga scenarier
Materialized views are typically used in the following scenarios:
Behöver förbättra prestandan för komplexa analysfrågor mot stora data i storlek
Complex analytical queries typically use more aggregate functions and table joins, causing more compute-heavy operations such as shuffles and joins in query execution. That's why complex analytical queries take longer to complete, especially on large tables.
Users can create materialized views for the data returned from common computations of queries, so there's no recomputation needed when this data is needed by queries, allowing lower compute cost and faster query response.
Behöver snabbare prestanda utan eller minsta frågeändringar
Schema and query changes in dedicated SQL pools are typically kept to a minimum to support regular ETL operations and reporting. People can use materialized views for query performance tuning, if the cost incurred by the views can be offset by the gain in query performance.
In comparison to other tuning options such as scaling and statistics management, it's a less impactful production change to create and maintain a materialized view and its potential performance gain is also higher.
- Creating or maintaining materialized views does not impact the queries running against the base tables.
- Frågeoptimeraren kan automatiskt använda de distribuerade materialiserade vyerna utan direkt vyreferens i en fråga. Den här funktionen minskar behovet av frågeändring vid prestandajustering.
Behöver olika datadistributionsstrategi för snabbare frågeprestanda
Dedicated SQL pool is a distributed query processing system. Data in a SQL table is distributed upto 60 nodes using one of three distribution strategies (hash, round_robin, or replicated).
The data distribution is specified at the table creation time and stays unchanged until the table is dropped. Materialized view, being a virtual table on disk, supports hash and round_robin data distributions. Users can choose a data distribution that is different from the base tables but optimal for the performance of queries that use the views.
Riktlinjer för design
Här är den allmänna vägledningen om hur du använder materialiserade vyer för att förbättra frågeprestanda:
Design för din arbetsbelastning
Innan du börjar skapa materialiserade vyer är det viktigt att du har en djup förståelse för din arbetsbelastning när det gäller frågemönster, prioritet, frekvens och storleken på resulterande data.
Users can run EXPLAIN WITH_RECOMMENDATIONS <SQL_statement> for the materialized views recommended by the query optimizer. Eftersom dessa rekommendationer är frågespecifika kanske en materialiserad vy som gynnar en enskild fråga inte är optimal för andra frågor i samma arbetsbelastning.
Utvärdera dessa rekommendationer med dina arbetsbelastningsbehov i åtanke. De ideala materialiserade vyerna är de som gynnar arbetsbelastningens prestanda.
Var medveten om kompromissen mellan snabbare frågor och kostnaden
För varje materialiserad vy finns det en datalagringskostnad och en kostnad för att underhålla vyn. As data changes in base tables, the size of the materialized view increases and its physical structure also changes. To avoid query performance degradation, each materialized view is maintained separately by the SQL engine.
The maintenance workload gets higher when the number of materialized views and base table changes increase. Användarna bör kontrollera om kostnaden för alla materialiserade vyer kan uppvägas av förbättringen i frågeprestanda.
You can run this query to generate a list of materialized views in a dedicated SQL pool:
SELECT V.name as materialized_view, V.object_id
FROM sys.views V
JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2;
Alternativ för att minska antalet materialiserade vyer:
Identifiera vanliga datauppsättningar som ofta används av komplexa frågor i din arbetsbelastning. Skapa materialiserade vyer för att lagra dessa datauppsättningar så att optimeraren kan använda dem som byggblock vid skapandet av exekveringsplaner.
Släpp de materialiserade vyer som har låg användning eller som inte längre behövs. A disabled materialized view is not maintained but it still incurs storage cost.
Combine materialized views created on the same or similar base tables even if their data doesn't overlap. Att kombinera materialiserade vyer kan resultera i en större vy i storlek än summan av de separata vyerna, men kostnaden för visningsunderhåll bör minska. Till exempel:
-- Query 1 would benefit from having a materialized view created with this SELECT statement
SELECT A, SUM(B)
FROM T
GROUP BY A
-- Query 2 would benefit from having a materialized view created with this SELECT statement
SELECT C, SUM(D)
FROM T
GROUP BY C
-- You could create a single materialized view of this form
SELECT A, C, SUM(B), SUM(D)
FROM T
GROUP BY A, C
Alla prestandaoptimeringar kräver inte ändringar i frågor
The SQL query optimizer can automatically use deployed materialized views to improve query performance. This support is applied transparently to queries that don't reference the views and queries that use aggregates unsupported in materialized views creation. Ingen frågeändring krävs. You can check a query's estimated execution plan to confirm if a materialized view is used.
Monitor materialized views
A materialized view is stored in the dedicated SQL pool just like a table with a clustered columnstore index (CCI). Reading data from a materialized view includes scanning the CCI index segments and applying any incremental changes from base tables. When the number of incremental changes is too high, resolving a query from a materialized view can take longer than directly querying the base tables.
To avoid query performance degradation, it's a good practice to run DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD to monitor the view's overhead_ratio (total_rows / max(1, base_view_row)). Users should REBUILD the materialized view if its overhead_ratio is too high.
Materialized view and result set caching
These two features in dedicated SQL pool are used for query performance tuning. Result set caching is used for getting high concurrency and fast response from repetitive queries against static data.
Om du vill använda det cachelagrade resultatet måste formatet för cachebegärandefrågan matcha med frågan som skapade cacheminnet. In addition, the cached result must apply to the entire query.
Materialiserade vyer tillåter dataändringar i bastabellerna. Data i materialiserade vyer kan tillämpas på en del av en fråga. Det här stödet gör att samma materialiserade vyer kan användas av olika frågor som delar viss beräkning för snabbare prestanda.
Exempel
This example uses a TPCDS-like query that finds customers who spend more money via catalog than in stores, identify the preferred customers and their country/region of origin. The query involves selecting TOP 100 records from the UNION of three sub-SELECT statements involving SUM() and GROUP BY.
WITH year_total AS (
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
,'s' sale_type
FROM customer
,store_sales
,date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
,'c' sale_type
FROM customer
,catalog_sales
,date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
UNION ALL
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
,'w' sale_type
FROM customer
,web_sales
,date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
SELECT TOP 100
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
FROM year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
WHERE t_s_secyear.customer_id = t_s_firstyear.customer_id
AND t_s_firstyear.customer_id = t_c_secyear.customer_id
AND t_s_firstyear.customer_id = t_c_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_firstyear.customer_id
AND t_s_firstyear.customer_id = t_w_secyear.customer_id
AND t_s_firstyear.sale_type = 's'
AND t_c_firstyear.sale_type = 'c'
AND t_w_firstyear.sale_type = 'w'
AND t_s_secyear.sale_type = 's'
AND t_c_secyear.sale_type = 'c'
AND t_w_secyear.sale_type = 'w'
AND t_s_firstyear.dyear+0 = 1999
AND t_s_secyear.dyear+0 = 1999+1
AND t_c_firstyear.dyear+0 = 1999
AND t_c_secyear.dyear+0 = 1999+1
AND t_w_firstyear.dyear+0 = 1999
AND t_w_secyear.dyear+0 = 1999+1
AND t_s_firstyear.year_total > 0
AND t_c_firstyear.year_total > 0
AND t_w_firstyear.year_total > 0
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_s_firstyear.year_total > 0 THEN t_s_secyear.year_total / t_s_firstyear.year_total ELSE NULL END
AND CASE WHEN t_c_firstyear.year_total > 0 THEN t_c_secyear.year_total / t_c_firstyear.year_total ELSE NULL END
> CASE WHEN t_w_firstyear.year_total > 0 THEN t_w_secyear.year_total / t_w_firstyear.year_total ELSE NULL END
ORDER BY t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_birth_country
OPTION ( LABEL = 'Query04-af359846-253-3');
Check the query's estimated execution plan. There are 18 shuffles and 17 joins operations, which take more time to execute. Now let's create one materialized view for each of the three sub-SELECT statements.
CREATE materialized view nbViewSS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt+ss_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.store_sales
,dbo.date_dim
WHERE c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewCS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt+cs_ext_sales_price, 0)/2) year_total
, count_big(*) as cb
FROM dbo.customer
,dbo.catalog_sales
,dbo.date_dim
WHERE c_customer_sk = cs_bill_customer_sk
AND cs_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
GO
CREATE materialized view nbViewWS WITH (DISTRIBUTION=HASH(customer_id)) AS
SELECT c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(isnull(ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt+ws_ext_sales_price, 0)/2) year_total
, count_big(*) AS cb
FROM dbo.customer
,dbo.web_sales
,dbo.date_dim
WHERE c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
GROUP BY c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
Check the execution plan of the original query again. Now the number of joins changes from 17 to 5 and there's no shuffle. Select the Filter operation icon in the plan, its Output List shows the data is read from the materialized views instead of the base tables.
With materialized views, the same query runs faster without a code change.
Nästa steg
For more development tips, see Dedicated SQL pool development overview.