https://github.com/lazauk/aoai-assistants-vectorstore
Introduction to the process of uploading up to 10,000 files to the Vector Store object in Azure OpenAI's Assistants API.
- Host: GitHub
- URL: https://github.com/lazauk/aoai-assistants-vectorstore
- Owner: LazaUK
- License: mit
- Created: 2024-10-30T15:13:51.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-11-06T23:06:14.000Z (11 months ago)
- Last Synced: 2025-04-05T23:42:26.815Z (6 months ago)
- Topics: 10k, agent, ai, assistant, azure, entra-id, file-upload, openai, python, sdk, vector-store
- Language: Jupyter Notebook
- Homepage:
- Size: 25.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Azure OpenAI Assistants API: Creating your first 10K Vector Store
**Vector Store** is a new object in the Azure OpenAI (AOAI) Assistants API that makes uploaded files searchable by automatically parsing, chunking and embedding their content.
At the time of writing (```October 2024```), Vector Store supported the ingestion of up to **10,000** files.
> [!WARNING]
> Uploading thousands of files may fail due to timeouts or other API operation disruptions. Therefore, the upload process enforces two **maximum file** limits:
> - up to _100_ files when creating a new Vector Store;
> - up to _500_ files **per batch** when adding files to an existing Vector Store.

## Table of contents:
- [Pre-requisites](https://github.com/LazaUK/AOAI-Assistants-VectorStore#pre-requisites)
- [Scenario 1: Authenticating with API Key](https://github.com/LazaUK/AOAI-Assistants-VectorStore#scenario-1-authenticating-with-api-key)
- [Scenario 2: Authenticating with Entra ID](https://github.com/LazaUK/AOAI-Assistants-VectorStore#scenario-2-authenticating-with-entra-id)

## Pre-requisites
1. Upgrade the `openai` Python package to its latest supported version:
``` PowerShell
pip install --upgrade openai
```
2. Set the following 3 environment variables before running the notebooks:

| Environment Variable | Description |
| --- | --- |
| _AZURE_OPENAI_API_BASE_ | Base URL of the AOAI endpoint |
| _AZURE_OPENAI_API_VERSION_ | API version of the AOAI endpoint |
| _AZURE_OPENAI_API_KEY_ | API key of the AOAI endpoint (_required for Scenario 1 only_) |

## Scenario 1: Authenticating with API Key
1. Retrieve values of environment variables:
``` Python
import os

# Read the endpoint settings from the environment variables listed in Pre-requisites
AOAI_API_BASE = os.getenv("AZURE_OPENAI_API_BASE")
AOAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
AOAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
```
2. Instantiate Azure OpenAI client:
``` Python
from openai import AzureOpenAI

# Authenticate against the AOAI endpoint with the API key
client = AzureOpenAI(
    azure_endpoint = AOAI_API_BASE,
    api_version = AOAI_API_VERSION,
    api_key = AOAI_API_KEY
)
```
3. Instantiate new Vector Store:
``` Python
vector_store = client.beta.vector_stores.create(
    name = ""  # supply your own Vector Store name here
)
```
4. Populate the Vector Store with your files in batches:
``` Python
# file_streams is a list of files opened in binary mode, e.g. open(path, "rb")
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id = vector_store.id,
    files = file_streams
)
```
5. If successful, you should see a message like this:
``` JSON
Uploading files to the vector store from folder1...
Files upload status: completed
- cancelled: 0
- completed: 100
- failed: 0
- in progress: 0
----------------------------------------
Total: 100

Uploading files to the vector store from folder2...
Files upload status: completed
- cancelled: 0
- completed: 500
- failed: 0
- in progress: 0
----------------------------------------
Total: 500
```

## Scenario 2: Authenticating with Entra ID
1. Retrieve values of environment variables:
``` Python
import os

AOAI_API_BASE = os.getenv("AZURE_OPENAI_API_BASE")
AOAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION")
```
2. Define Entra ID as a token provider:
``` Python
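from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Requires the azure-identity package; tokens are requested for the Cognitive Services scope
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)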
```
3. Instantiate Azure OpenAI client:
``` Python
from openai import AzureOpenAI

# Authenticate with Entra ID tokens instead of an API key
client = AzureOpenAI(
    azure_endpoint = AOAI_API_BASE,
    api_version = AOAI_API_VERSION,
    azure_ad_token_provider = token_provider
)
```
4. Instantiate new Vector Store:
``` Python
vector_store = client.beta.vector_stores.create(
    name = ""  # supply your own Vector Store name here
)
```
5. Populate the Vector Store with your files in batches:
``` Python
# file_streams is a list of files opened in binary mode, e.g. open(path, "rb")
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id = vector_store.id,
    files = file_streams
)
```
6. If successful, you should see a message like this:
``` JSON
Uploading files to the vector store from folder1...
Files upload status: completed
- cancelled: 0
- completed: 100
- failed: 0
- in progress: 0
----------------------------------------
Total: 100

Uploading files to the vector store from folder2...
Files upload status: completed
- cancelled: 0
- completed: 500
- failed: 0
- in progress: 0
----------------------------------------
Total: 500
```
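
Both scenarios pass a ready-made `file_streams` list to `upload_and_poll` and then print a per-batch status report. The snippets above do not show how that list is built or how a folder is split to respect the per-batch limit, so the following is only a minimal sketch of one way to do it, not necessarily what the notebooks do. The helper names `chunked`, `upload_folder` and `format_report` are hypothetical, the 500-file limit is taken from the warning at the top of this README, and the report layout simply mimics the sample output. It assumes `client` is an `AzureOpenAI` instance created as in either scenario, and it only covers adding files to an existing Vector Store.

```python
from pathlib import Path
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-batch limit quoted in the warning at the top of this README.
MAX_FILES_PER_BATCH = 500


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def upload_folder(client, vector_store_id: str, folder: str,
                  batch_size: int = MAX_FILES_PER_BATCH) -> list:
    """Upload every file in `folder`, issuing one upload_and_poll call per chunk."""
    paths = sorted(p for p in Path(folder).iterdir() if p.is_file())
    batches = []
    for chunk in chunked(paths, batch_size):
        # Open the files of this chunk in binary mode, as upload_and_poll expects.
        streams = [path.open("rb") for path in chunk]
        try:
            batch = client.beta.vector_stores.file_batches.upload_and_poll(
                vector_store_id=vector_store_id,
                files=streams,
            )
        finally:
            # Close the handles whether or not the upload succeeded.
            for stream in streams:
                stream.close()
        batches.append(batch)
    return batches


def format_report(batch) -> str:
    """Render a batch's status and file counts in the style of the sample output."""
    counts = batch.file_counts
    return "\n".join([
        f"Files upload status: {batch.status}",
        f"- cancelled: {counts.cancelled}",
        f"- completed: {counts.completed}",
        f"- failed: {counts.failed}",
        f"- in progress: {counts.in_progress}",
        "-" * 40,
        f"Total: {counts.total}",
    ])
```

With a client and Vector Store in place, a call such as `upload_folder(client, vector_store.id, "folder1")` returns one batch object per chunk, and each can be printed with `format_report(batch)`. Opening a whole chunk of files at once keeps the sketch short; if the operating system restricts the number of open files, a smaller `batch_size` may be needed.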