Frequently asked questions about preparing data for AI

Note

You can now author Prep data for AI features in both the Power BI service and Power BI Desktop. Users can consume these features everywhere that Copilot is available.

Tooling features

What features does Power BI currently have to help me prepare my data for Copilot?

Today, Power BI offers four main tooling features to configure your model to be ready for natural language processing:

  • AI data schemas: Allows you to select a subset of your model's schema for Copilot to consume.
  • Verified answers: A response that a model author configures and validates for accuracy and reliability. Authors can assign specific visuals for Copilot to use in a verified answer when a user asks a question that falls into the assigned category.
  • AI instructions: Instructions you can set on your model to provide more context about its data. They guide Copilot on when to focus on which data, and help it understand mappings between the model's terms and the language users might use when they interact with Copilot.
  • Descriptions: Descriptions set on tables and columns to provide more detail and context about the data. Descriptions are currently used only in Data Analysis Expressions (DAX) queries and Copilot search capabilities.

In what order should I implement Copilot in Power BI tooling features?

To get the most value from Copilot in Power BI, we suggest implementing its tooling features in the following sequence:

  1. Define the AI data schema.

    Start by selecting the specific tables, fields, and measures that Copilot should reference when it answers data questions.

    During model development, you might include elements that aren't relevant for end-user queries. When you narrow the schema, you help Copilot focus on the most meaningful parts of your model, which reduces ambiguity. This practice is important for large datasets that have overlapping or similarly named fields.

    Here's an example of how AI data schemas can help Copilot focus on the right data:

    When you use the entire schema, Copilot doesn't always know what the user means when they say sales. In this case, Copilot returned gross profit margin (GPM), which is a legitimate interpretation of sales, but not the metric that this team typically uses to analyze sales.

    The model author uses the Prep data for AI feature and removes the Total GPM measure from the schema that's passed to Copilot.

    Now, when the user asks the same question, Copilot has more clarity on where to get the answer from and correctly interprets sales by the team's definition and measurement.

    Screenshot that shows an example of how users can refine an AI data schema to help Copilot focus on the correct data for user queries.

  2. Create verified answers.

    Set up verified answers for common or nuanced questions that users might ask.

    Select a visual, and then select Create verified answer. Then add trigger phrases that reflect how users are likely to phrase their questions. When users enter a matching or similar phrase in Copilot, it returns the trusted visual. This process helps to ensure consistent, high-quality responses across reports.

    The following example shows the benefit of a verified answer. The user asks for sales by area. Copilot interprets area as product area and returns a list of products and their sales. However, the user was looking for sales by region or location.

    The model author sets a verified answer by using a visual that includes sales by region. Then, the author adds trigger phrases; when a user's question matches one of them, Copilot returns this specific visual response.

    Now, when the user asks for sales by area, Copilot returns the verified answer that the model author approved.

    Screenshot that shows an example of how verified answers improve the accuracy of Copilot responses to user queries.

  3. Add AI instructions.

    After you define the schema and verified answers, you can use AI instructions to guide the behavior of Copilot at the model level.

    Instructions help clarify business logic, map user terminology to model fields, and tell Copilot how to interpret or analyze specific types of data. They're helpful because they provide context that Copilot wouldn't infer on its own.

    The following example shows how you can use AI instructions to provide more context to Copilot. The user asks about sales during busy season of 2012. Busy season is a well-defined, commonly used phrase within this organization. However, the semantic model has no indication of this term anywhere. The model author sets an instruction that defines busy season as June to August.

    Now, when the user asks the question about sales during busy season, Copilot understands this defined term and can provide the response.

    Screenshot that shows an example of how AI instructions provide additional context to Copilot for interpreting user queries.

  4. Add descriptions to tables and columns.

    Descriptions provide extra metadata that Copilot can use to understand your model.

    Although descriptions currently influence only some Copilot behaviors, they'll play a larger role in future capabilities. Adding them now helps build a strong foundation for long-term success with natural language interactions in Power BI.

Can I create tooling on a report instead of on the model?

Today, tooling and configuration features are only available on the model. Configuring these features differently for separate reports built from the same model isn't yet supported. The schema, verified answers, instructions, and descriptions are set on the semantic model, not on individual reports.

When I prepare my data for Copilot, which capabilities are affected?

Refer to the following table:

| Capability | AI data schemas | Verified answers | AI instructions | Descriptions |
| --- | --- | --- | --- | --- |
| Get a summary of my report | No | No | Yes | No |
| Ask a question about the visuals on my report | No | Yes | Yes | No |
| Ask a question about my semantic model | Yes | Yes | Yes | No |
| Create a report page | No | No | Yes | No |
| Search | No | Yes | No | Yes |
| DAX query | No | No | Yes | Yes |

Know which feature to use

I'm trying to get Copilot to select the right field. Which feature should I use?

  1. Define your AI data schema.

    Remove any tables, columns, or fields that are irrelevant to your users' needs. This action helps Copilot focus on the most relevant parts of your model, and helps to ensure that it selects the right fields when it responds to queries.

  2. Use verified answers for visuals in reports.

    If Copilot can use a visual in your report to derive an answer to a question, create a verified answer. This practice helps to ensure that Copilot consistently returns the correct visual when users ask questions with specific trigger phrases.

  3. Customize instructions for specific fields.

    After you set the schema and verified answers, you can use AI instructions to guide Copilot when it selects particular fields. We recommend that you use instructions for fine-tuning and for advanced scenarios, after the other Prep data for AI features are set. This sequence helps ensure that Copilot returns the most accurate and contextually relevant results, guided by your model's structure and your defined instructions.

I'm trying to get Copilot to understand the term I'm using. Which feature should I use?

If Copilot struggles to understand a term that always refers to the same single item in your model, you can provide an alternative name through AI instructions.

For example, if your team calls the people who sell your products closers, then you should provide a reference in your AI instructions. Set sellers to also be known as closers.
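In effect, an instruction like this gives Copilot a synonym for an existing model term. The following Python sketch is purely illustrative, with a hypothetical alias map and question; Copilot's actual term matching happens internally and is more sophisticated than a word-for-word substitution:

```python
# Hypothetical alias map, mirroring the instruction above:
# "sellers" are also known as "closers".
ALIASES = {"closers": "sellers", "closer": "seller"}

def normalize_question(question: str) -> str:
    """Replace team-specific terms with the model's own terminology."""
    words = question.lower().split()
    return " ".join(ALIASES.get(word, word) for word in words)

print(normalize_question("Who were our top closers last quarter?"))
# -> "who were our top sellers last quarter?"
```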

I'm trying to get Copilot to understand terms with conditions or groupings. Which feature should I use?

If your team uses terms that aren't an exact 1:1 match with tables or fields in your model, you can use AI instructions to define those terms with conditions or groupings.

For example, a sales team might classify anyone who meets 100% or more of their targets in any given month as a high performer. They should provide the following instruction to Copilot:

High performer means a seller who meets 100% or more of their monthly target.

Now, when a user asks, "Who were the high performers last month?" Copilot knows exactly what high performer means in your team and organization.
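To make the rule concrete, here's a minimal Python sketch that applies the same definition to hypothetical monthly data. The names and attainment figures are invented for illustration; in practice, Copilot applies the rule to the fields in your model:

```python
# Hypothetical monthly attainment: sales as a fraction of target.
monthly_attainment = {
    "Avery": 1.12,   # 112% of target
    "Jordan": 0.94,  # 94% of target
    "Riley": 1.00,   # exactly 100% of target
}

# "High performer means a seller who meets 100% or more of their monthly target."
high_performers = [name for name, pct in monthly_attainment.items() if pct >= 1.0]
print(high_performers)  # ['Avery', 'Riley']
```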

In another example, an organization might classify seasons. Your team might call January to May slow season. June to September might be busy season. October to December might be standard season.

Within AI instructions, you could set the following definitions:

  • Slow season means January to May.
  • Busy season means June to September.
  • Standard season means October to December.

Now, when a user asks, "What were the total sales for busy season last year?" Copilot understands what timeframe the user means by busy season.

I'm trying to get Copilot to return the correct answer to the most commonly asked questions. Which feature should I use?

Consumers of your report and data likely ask some questions more frequently than others. You can address this by applying verified answers to your model. To apply a verified answer, select a visual and set trigger phrases. When a user asks about the topic, Copilot responds by using the assigned visual.

For example, consumers of the report and model might often ask, "What product had the highest sales last week?" You can set a verified answer to help Copilot understand where to find the right information. This method helps authors and consumers trust that the answer is correct.

I'm trying to get Copilot to return different answers based on the domains or user groups. Which feature should I use?

These capabilities are currently limited to broad consumption. You can't create a glossary for different groups, or define a term in two different ways. For example, engineers might define usage as the number of times clicked, but product managers might define usage as paying customers in a given month. You can't currently give usage two different definitions in the same model.

Prep data for AI

I get an error that says, "Copilot is currently syncing with the data model." What does this mean?

For Copilot to be able to perform at its best, it's critical that it understands the underlying data in the semantic model. One way that Copilot in Power BI tries to understand the underlying data is by indexing the semantic model to accurately search for relevant values to match on. This process allows Copilot to effectively answer questions based on the user's prompt.

Consider a dataset related to Hawaii tourism. To answer questions like, "How did weather affect tourist visits on Maui?" Copilot needs to understand that Maui is an instance value in the semantic model in the Island name column of the Island table.

For Copilot to effectively search these instance values, the semantic model is indexed when Power BI Q&A is enabled. It's reindexed when Power BI detects changes to the model.
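Conceptually, the index works like an inverted lookup from instance values to the tables and columns that contain them. The following Python sketch is a simplified illustration with hypothetical data; the real index is built and maintained by the service:

```python
# Hypothetical instance values from a Hawaii tourism model.
model_values = {
    ("Island", "Island name"): ["Maui", "Oahu", "Kauai"],
    ("Weather", "Condition"): ["Sunny", "Rainy"],
}

# Build an inverted index: instance value -> (table, column).
index = {
    value: (table, column)
    for (table, column), values in model_values.items()
    for value in values
}

# Match a term from a user's prompt against the index.
print(index.get("Maui"))  # ('Island', 'Island name')
```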

Model indexing frequency

Indexing is done for all models that have the Q&A setting enabled.

Note

The Q&A setting is on by default for import and Direct Lake models. You can find more details about this setting in the Q&A settings documentation.

Reindexing occurs when one of the following actions takes place (a sketch of this decision logic in code follows the list):

  • For import models:
    • The model was published or republished to the service.
    • The model was refreshed via a manual or scheduled refresh, and Copilot or Q&A was used within the last 14 days.
  • For DirectQuery and Direct Lake models:
    • The model was published or republished to the service.
    • The index is older than 24 hours, and Copilot or Q&A was used within the last 14 days.
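As a rough mental model, these triggers can be expressed as a single decision function. The following Python sketch is illustrative only; the parameter names are invented, and the service evaluates these conditions internally:

```python
from datetime import timedelta

def needs_reindex(
    storage_mode: str,         # "import", "directquery", or "directlake"
    republished: bool,         # model was published or republished to the service
    refreshed: bool,           # a manual or scheduled refresh completed
    index_age: timedelta,      # time since the current index was built
    days_since_last_use: int,  # days since Copilot or Q&A was last used
) -> bool:
    """Approximates the reindexing triggers listed above."""
    if republished:
        return True
    recently_used = days_since_last_use <= 14
    if storage_mode == "import":
        return refreshed and recently_used
    # DirectQuery and Direct Lake models.
    return index_age > timedelta(hours=24) and recently_used

# A refreshed import model that was used yesterday gets reindexed.
print(needs_reindex("import", False, True, timedelta(hours=2), 1))  # True
```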

The following message in Copilot indicates that the model is currently in the process of indexing. The message should automatically resolve after indexing finishes.

Screenshot that shows a Copilot message that indicates that the model is currently indexing.

Note

This error doesn't mean that Copilot is unavailable to users. It indicates that any instance values that were added or changed in the model might not be reflected in Copilot responses until the indexing activity finishes.

Indexing methodology

Only text columns in the semantic model are indexed. Columns that are hidden in the AI data schema through the Prep data for AI feature aren't indexed.

Up to five million instance values are indexed, column by column, starting with the column that has the smallest cardinality. For import models, DISTINCTCOUNT determines a column's cardinality; for DirectQuery models, COLUMNSTATISTICS determines it. For DirectQuery sources that support it, the COLUMNSTATISTICS function uses the APPROXIMATEDISTINCTCOUNT function to efficiently determine approximate column cardinalities.

To avoid overloading the underlying system for DirectQuery models with an influx of indexing queries, the results of COLUMNSTATISTICS are cached, and the statistics are recomputed every seven days. During the indexing process, if indexing the next column would cross the five million instance value upper bound, that column is skipped entirely.
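Taken together, these rules behave like a greedy budget fill over text columns ordered by ascending cardinality. Here's an illustrative Python sketch; the column names and cardinalities are hypothetical, and in the service the cardinalities come from DISTINCTCOUNT or COLUMNSTATISTICS as described above:

```python
INDEX_BUDGET = 5_000_000  # upper bound on indexed instance values

# Hypothetical text columns with their (approximate) cardinalities.
text_columns = {
    "Island[Island name]": 137,
    "Customer[Name]": 1_200_000,
    "Transaction[Comment]": 4_500_000,
}

def columns_to_index(columns: dict[str, int]) -> list[str]:
    """Pick columns, smallest cardinality first; a column that would
    push the total past the budget is skipped entirely."""
    indexed, used = [], 0
    for name, cardinality in sorted(columns.items(), key=lambda kv: kv[1]):
        if used + cardinality > INDEX_BUDGET:
            continue  # skip the whole column, per the indexing rules
        indexed.append(name)
        used += cardinality
    return indexed

print(columns_to_index(text_columns))
# ['Island[Island name]', 'Customer[Name]'] -- Transaction[Comment]
# would cross the five million bound, so it's skipped.
```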

If the indexing limit is reached, Copilot still answers based on the index it built, which doesn't include all instance values. Users see the following warning when the semantic model in question hits the indexing limit.

Screenshot that shows a Copilot warning that indicates that the model reached the indexing limit.

Known limitations

  • Indexing has an upper bound limit of five million instance values or 1,000 model entities (tables/columns) for large semantic models.
  • Text values of 100 or more characters aren't indexed.
  • DirectQuery models only index columns for data sources that support APPROXIMATEDISTINCTCOUNT.
  • Indexing for DirectQuery and Direct Lake models occurs once during a 24-hour time period unless the model is republished.
  • If the underlying semantic model refresh fails, the data index might be stale until the next successful semantic model refresh.
  • The first data index generation for the semantic model might be delayed by 15 minutes to allow for backend activities to generate the index.