Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to:
Databricks SQL
Important
This feature is in Public Preview.
The vector_search() function allows you to query a Mosaic AI Vector Search index using SQL.
Requirements
- This function is not available on classic SQL warehouses.
- For more information, see Databricks SQL pricing page.
- This function is available in regions where Mosaic AI Vector Search is supported.
Syntax
In Databricks Runtime 15.3 and above, use query_text or query_vector to specify what to search for in the index.
SELECT * FROM vector_search(
index,
{ query_text | query_vector },
[ num_results ]
)
In Databricks Runtime 15.2 and below, use query to specify what to search for in the index.
SELECT * FROM vector_search(
index, query, num_results
)
Arguments
All arguments must be passed by name, like vector_search(index => indexName, query_text => queryText).
index: ASTRINGconstant, the fully qualified name of an existing vector search index in the same workspace for invocations. The definer must haveSelectpermission on the index.- Use one of the following to specify the expression to search for in the index:
- For Databricks Runtime 15.3 or above, use
query_textto search for a specific string of text in the embedding source column of your Delta Sync Index. The query must be aSTRINGexpression of the string to search for in the index. - For Databricks Runtime 15.3 or above, use
query_vectorto search for a specific vector in the embedding vector column of your Delta Sync Index. This argument is Required for searching a Delta Sync Index using self-managed vectors. The query must be anARRAY<FLOAT>, orARRAY<DOUBLE>, orARRAY<DECIMAL(_, _)>expression of the embedding vector to search for in the index. - For Databricks Runtime 15.2 or below, use
queryto specify the string to search for in your index.
- For Databricks Runtime 15.3 or above, use
num_results(optional): An integer constant, the max number of records to return. Defaults to 10.query_type(optional): The type of search you want to perform on your vector search index. Defaults toANNif not explicitly specified.- If the
query_typeisANN, then eitherquery_textorquery_vectormust be specified to perform an approximate nearest neighbor search or similarity search. - If the
query_typeisHYBRID, thenquery_textmust be specified. You can specify bothquery_textandquery_vectorfor a hybrid search. Hybrid search in this instance means a combination of similarity search and keyword search where a keyword search uses the literal text as the target.
- If the
The following table summarizes which arguments can be used when you have a Delta Sync index with an embedding model:
query_type |
query_text |
query_vector |
Description |
|---|---|---|---|
| ANN (also known as approximate nearest neighbor or similarity search) | ✓ |
|
|
| ANN (also known as approximate nearest neighbor or similarity search) | ✓ | Use query_vector for a similarity search. |
|
| Hybrid | ✓ |
|
|
| Hybrid | ✓ | ✓ | Use query_text for keyword search and use query_vector for similarity search. |
The following table describes the different scenarios and the arguments that can be used when you have a Delta sync index without an embedding model:
query_type |
query_text |
query_vector |
Description |
|---|---|---|---|
ANN (also known as approximate nearest neighbor or similarity search) |
✓ | Usequery_vector for a similarity search. Similarity search requires an embedding vector as a search target. Because an embedding model is not available in this scenario to calculate the embedding vector, you must provide it. |
|
| Hybrid | ✓ | ✓ | For a hybrid search for a specific string of text and a vector, specify both query_text and query_vector. |
Returns
A table of the top matching records from the index. All the columns of the index are included.
Examples
The following sections show example SQL queries for different index searches.
Hybrid search queries
The following hybrid search example combines the following search types to find the provided terms in text or metadata of the vector search index:
- Vector similarity search: To find similar semantic meaning for
Wi-Fi issues. - Keyword search: To find
Wi-Fi issues LMP-9R2on a keyword index.
SELECT * FROM vector_search(
index => 'main.support_docs.index',
query_text => 'Wi-Fi issues LMP-9R2',
query_type => 'HYBRID',
num_results => 3)
| doc_id | title | product_code |
|---|---|---|
| 1403 | Wi-Fi Troubleshooting Guide — LMP-9R2 | LMP-9R2 |
| 1332 | Known Connectivity Issues for LMP-9R2 Devices | LMP-9R2 |
| 1271 | General Wi-Fi Troubleshooting Guide | LMP-8R2 |
The following hybrid search example specifies both query_text and query_vector for the term, Wi-Fi issues LMP-9R2. In this example, keyword search performs better on proprietary terms unique to a company (like “LMP-9R2” in this case), whereas vector search, which are typically trained on public datasets, does not recognize terms like “LMP-9R2.”
SELECT * FROM vector_search(
index => 'main.support_docs.index',
query_text => 'Wi-Fi issues LMP-9R2',
query_vector => array( 0.0213, 0.1045, 0.0871, 0.0562, 0.1459, ... 0.0131),-- a self computed embedding of the `query_text` param
query_type => 'HYBRID',
num_results => 3 )
| doc_id | title | product_code |
|---|---|---|
| 1403 | Wi-Fi Troubleshooting Guide — LMP-9R2 | LMP-9R2 |
| 1332 | Known Connectivity Issues for LMP-9R2 Devices | LMP-9R2 |
| 1271 | General Wi-Fi Troubleshooting Guide | LMP-8R2 |
Text queries on indexes with embedding source columns
Search over an index of product SKUs to find similar products by name. The following example uses query_text which is only supported in Databricks Runtime 15.3 and above. For Databricks Runtime 15.2 and below, use query instead of query_text.
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_text => "iphone", num_results => 2)
| ID | Product name |
|---|---|
| 10 | iPhone |
| 20 | iPhone SE |
The following example searches for multiple terms at the same time by using a LATERAL subquery.
SELECT
query_txt,
query_id,
search.*
FROM
query_table,
LATERAL(
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_text => query_txt, num_results => 2)
) as search
query_txt |
query_id | search.id | search.product_name |
|---|---|---|---|
| iphone | 1 | 10 | iPhone 10 |
| iphone | 1 | 20 | iPhone SE |
| pixel 8 | 2 | 30 | Pixel 8 |
| pixel 8 | 2 | 40 | Pixel 8a |
Text queries on indexes with embedding source columns
Search over an index of images with pre-computed embeddings to find similar images by embedding. The following example uses query_vector which is only supported in Databricks Runtime 15.3 and above. For Databricks Runtime 15.2 and below, use query instead of query_vector.
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45, -0.35, 0.78, 0.22), num_results => 3)
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45F, -0.35F, 0.78F, 0.22F), num_results => 3)
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => ARRAY(0.45D, -0.35D, 0.78D, 0.22D), num_results => 3)
| id | image_name | image_embedding |
|---|---|---|
| 28 | horse.jpg | [0.46, -0.34, 0.77, 0.21] |
| 27 | donkey.jpg | [0.44, -0.36, 0.79, 0.23] |
| 5 | elk.jpg | [0.23, -0.44, 0.77, 0.80] |
The following example searches for multiple terms at the same time by using a LATERAL subquery.
SELECT
query_embedding,
search.*
FROM
query_table,
LATERAL(
SELECT * FROM VECTOR_SEARCH(index => "main.db.my_index", query_vector => image_embedding, num_results => 1)
) as search
| query_embedding | search.id | search.image_name | search_image_embedding |
|---|---|---|---|
| [0.45, -0.35, 0.78, 0.22] | 27 | donkey.jpg | [0.46, -0.34, 0.77, 0.21] |
| [0.14, 0.29, 0.30, -0.90] | 3 | landscape.jpg | [0.15, 0.30, 0.31, -0.91] |
| [0.23, -0.44, 0.77, 0.80] | 10 | golden_gate_bridge.jpg | [0.28, -0.40, 0.23, 0.81] |
| [0.88, 0.88, 0.88, 0.88] | 44 | blank.jpg | [0.88, 0.88, 0.88, 0.88] |
Limitations
The following limitations apply during the preview:
- Querying
DIRECT_ACCESSindex types are not supported. - Input parameters
filters_jsonorcolumnsare not supported. - Vector Search with
num_resultsgreater than 100 are not supported. vector_searchcannot be used with model serving endpoints using Foundation Model APIs provisioned throughput.