{"id":19100515,"url":"https://github.com/Azure-Samples/azure-sql-db-openai","last_synced_at":"2025-04-18T17:33:51.473Z","repository":{"id":171171075,"uuid":"644543746","full_name":"Azure-Samples/azure-sql-db-openai","owner":"Azure-Samples","description":"Samples on how to use Azure SQL database with Azure OpenAI","archived":false,"fork":false,"pushed_at":"2024-11-08T16:52:44.000Z","size":2179,"stargazers_count":73,"open_issues_count":2,"forks_count":31,"subscribers_count":17,"default_branch":"main","last_synced_at":"2024-11-08T17:42:07.659Z","etag":null,"topics":["azure-sql","azure-sql-database","cosine-distance","cosine-similarity","open-ai"],"latest_commit_sha":null,"homepage":"","language":"TSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Azure-Samples.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-23T18:36:40.000Z","updated_at":"2024-11-08T16:52:47.000Z","dependencies_parsed_at":"2024-02-02T18:29:55.618Z","dependency_job_id":"ea1d467a-5925-4555-8faa-203758687ed4","html_url":"https://github.com/Azure-Samples/azure-sql-db-openai","commit_stats":null,"previous_names":["azure-samples/azure-sql-db-openai"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samples%2Fazure-sql-db-openai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samples%2Fazure-sql-db-openai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samp
les%2Fazure-sql-db-openai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Azure-Samples%2Fazure-sql-db-openai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Azure-Samples","download_url":"https://codeload.github.com/Azure-Samples/azure-sql-db-openai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223783096,"owners_count":17201904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-sql","azure-sql-database","cosine-distance","cosine-similarity","open-ai"],"created_at":"2024-11-09T03:52:52.063Z","updated_at":"2025-04-18T17:33:51.466Z","avatar_url":"https://github.com/Azure-Samples.png","language":"TSQL","funding_links":[],"categories":["others","Code Samples \u0026 Workshops"],"sub_categories":[],"readme":"---\npage_type: sample\nlanguages:\n- sql\nproducts:\n- azure-openai\n- azure-sql-database\nurlFragment: azure-sql-db-openai\nname: Vector similarity search with Azure SQL \u0026 Azure OpenAI\ndescription: |\n  Use Azure OpenAI from Azure SQL database to get the vector embeddings of any chosen text, and then calculate the cosine similarity to find related topics\n---\n\n# Vector similarity search with Azure SQL \u0026 Azure OpenAI\n\nThis example shows how to use Azure OpenAI from Azure SQL database to get the vector embeddings of any chosen text, and then calculate the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) against the Wikipedia articles (for which vector embeddings have already been 
calculated) to find the articles that cover topics that are close - or similar - to the provided text.\n\nFor an introduction to text and code embeddings, check out this OpenAI article: [Introducing text and code embeddings](https://openai.com/blog/introducing-text-and-code-embeddings).\n\n## Native or Classic?\n\nAzure SQL database can be used to easily and quickly perform vector similarity search. There are two options for this: a native option and a classic option.\n\nThe **native option** is to use the new Vector Functions, recently introduced in Azure SQL database. Vector Functions are a set of functions that can be used to perform vector operations directly in the database.\n\n\u003e [!NOTE]  \n\u003e Vector Functions are in Public Preview. Learn the details about vectors in Azure SQL here: https://aka.ms/azure-sql-vector-public-preview\n\n![](_assets/azure-sql-cosine-similarity-vector-type.gif)\n\nThe **classic option** is to use classic T-SQL to perform vector operations, relying on columnstore indexes for good performance.\n\n\u003e [!IMPORTANT]  \n\u003e This branch (the `main` branch) uses the native vector support in Azure SQL. 
If you want to use classic T-SQL, switch to the `classic` branch.\n\n## Download and import the Wikipedia Articles with Vector Embeddings\n\nDownload the [Wikipedia embeddings from here](https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip), unzip the archive, and upload the CSV file (using [Azure Storage Explorer](https://learn.microsoft.com/azure/vs-azure-tools-storage-manage-with-storage-explorer?tabs=windows) for example) to an Azure Blob Storage container.\n\nIn the example, the unzipped CSV file `vector_database_wikipedia_articles_embedded.csv` is assumed to be uploaded to a blob container named `playground`, in a folder named `wikipedia`.\n\nOnce the file is uploaded, get the [SAS token](https://learn.microsoft.com/azure/storage/common/storage-sas-overview) to allow Azure SQL database to access it. (From Azure Storage Explorer, right-click on the `playground` container and then select `Get Shared Access Signature`. Set the expiration date to some time in the future and then click on \"Create\". Copy the generated query string somewhere, for example into Notepad, as it will be needed later.)\n\nUse a client tool like [Azure Data Studio](https://azure.microsoft.com/products/data-studio/) to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` script to create the `wikipedia_articles_embeddings` table, into which the uploaded CSV file will be imported.\n\nMake sure to replace the `\u003caccount\u003e` and `\u003csas-token\u003e` placeholders with the correct values for your environment:\n\n- `\u003caccount\u003e` is the name of the storage account where the CSV file has been uploaded\n- `\u003csas-token\u003e` is the Shared Access Signature obtained earlier\n\nRun each section (each section starts with a comment) separately. 
At the end of the process (which takes up to a couple of minutes), all the CSV data will be imported into the `wikipedia_articles_embeddings` table.\n\n## Add embeddings columns to table\n\nIn the imported data, vectors are stored as JSON arrays. To take advantage of vector processing, the arrays must be saved into a more compact and optimized binary format. Thanks to the new `VECTOR` type, converting a JSON array into a vector that can be saved into a column takes a single cast:\n\n```sql\nalter table wikipedia_articles_embeddings\nadd title_vector_ada2 vector(1536);\n\nupdate \n    wikipedia_articles_embeddings\nset \n    title_vector_ada2 = cast(title_vector as vector(1536));\n```\n\nThe script `./vector-embeddings/02-use-native-vectors.sql` does exactly that. It takes the existing columns with vectors stored in JSON arrays and turns them into vectors saved in binary format.\n\n## Find similar articles by calculating cosine distance\n\nMake sure to have an Azure OpenAI [embeddings model](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#embeddings-models) deployed and make sure it is using the `text-embedding-ada-002` model.\n\nOnce the Azure OpenAI model is deployed, it can be called from Azure SQL database using [sp_invoke_external_rest_endpoint](https://learn.microsoft.com/sql/relational-databases/system-stored-procedures/sp-invoke-external-rest-endpoint-transact-sql). For example, the following code gets the embedding vector for the text \"the foundation series by isaac asimov\" (make sure to replace the `\u003cyour-api-name\u003e`, `\u003cdeployment-id\u003e` and `\u003capi-key\u003e` placeholders with the values for your Azure OpenAI deployment):\n\n```sql\ndeclare @inputText nvarchar(max) = 'the foundation series by isaac asimov';\ndeclare @retval int, @response nvarchar(max);\ndeclare @payload nvarchar(max) = json_object('input': @inputText);\nexec @retval = sp_invoke_external_rest_endpoint\n    @url = 
'https://\u003cyour-api-name\u003e.openai.azure.com/openai/deployments/\u003cdeployment-id\u003e/embeddings?api-version=2023-03-15-preview',\n    @method = 'POST',\n    @headers = '{\"api-key\":\"\u003capi-key\u003e\"}',\n    @payload = @payload,\n    @response = @response output;\nselect @response;\n```\n\nThe vector returned in the response can be extracted using `json_query` and cast to the `VECTOR` type:\n\n```sql\ndeclare @embedding vector(1536) = cast(json_query(@response, '$.result.data[0].embedding') as vector(1536));\n```\n\nNow it is just a matter of taking the vector of the sample text and the vectors of all Wikipedia articles and calculating the cosine distance, which is 1 minus the cosine similarity, so smaller values mean more similar texts. This is a single function call in T-SQL:\n\n```sql\nvector_distance('cosine', @embedding, title_vector_ada2)\n```\n\n## Encapsulating logic to retrieve embeddings\n\nThe described process can be wrapped into stored procedures to make it easy to re-use. The scripts in the `./vector-embeddings/` directory show how to create a stored procedure to retrieve the embeddings from OpenAI:\n\n- `03-store-openai-credentials.sql`: stores the Azure OpenAI credentials in the Azure SQL database\n- `04-create-get-embeddings-procedure.sql`: creates a stored procedure that encapsulates the call to OpenAI\n\n## Finding similar articles\n\nThe script `05-find-similar-articles.sql` uses the created stored procedure and the process explained above to find articles similar to the provided text.\n\n## Alternative sample with Python and a local embedding model\n\nIf you don't want to, or can't, use OpenAI to generate embeddings, you can use a local model like `https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1` instead. 
The Python script `./python/hybrid_search.py` shows how to:\n\n- use Python to generate the embeddings\n- do similarity search in Azure SQL database\n- use [Fulltext search in Azure SQL database with BM25 ranking](https://learn.microsoft.com/en-us/sql/relational-databases/search/limit-search-results-with-rank?view=sql-server-ver16#ranking-of-freetexttable)\n- do re-ranking applying Reciprocal Rank Fusion (RRF) to combine the BM25 ranking with the cosine similarity ranking\n\nMake sure to set up the database for this sample using the `./python/00-setup-database.sql` script. The database can be either an Azure SQL DB or a SQL Server database.\n\n## Conclusions\n\nAzure SQL database now supports vector operations directly in the database, making it easy to perform vector similarity search. Combining vector search with fulltext search and BM25 ranking makes it possible to build powerful search engines that can be used in a variety of scenarios.\n\n\u003e [!NOTE]  \n\u003e Vector Functions are in Public Preview. Learn the details about vectors in Azure SQL here: https://aka.ms/azure-sql-vector-public-preview\n\n## More resources\n\n- [Azure SQL \u0026 AI](https://aka.ms/sql-ai)\n- [Azure SQL Vector Samples](https://github.com/Azure-Samples/azure-sql-db-vector-search)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAzure-Samples%2Fazure-sql-db-openai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAzure-Samples%2Fazure-sql-db-openai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAzure-Samples%2Fazure-sql-db-openai/lists"}