{"id":28423172,"url":"https://github.com/neuralmagic/deepsparse-milvus","last_synced_at":"2025-06-24T23:31:15.326Z","repository":{"id":63101203,"uuid":"559595879","full_name":"neuralmagic/deepsparse-milvus","owner":"neuralmagic","description":"Example of DeepSparse Engine running with Milvus","archived":false,"fork":false,"pushed_at":"2022-11-14T20:08:16.000Z","size":16125,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-24T22:41:00.056Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neuralmagic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-30T15:52:50.000Z","updated_at":"2025-05-12T19:05:36.000Z","dependencies_parsed_at":"2022-11-12T22:46:02.951Z","dependency_job_id":null,"html_url":"https://github.com/neuralmagic/deepsparse-milvus","commit_stats":null,"previous_names":["rib-2/deepsparse-milvus","robertgshaw2-neuralmagic/deepsparse-milvus","robertgshaw2-redhat/deepsparse-milvus","neuralmagic/deepsparse-milvus"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/neuralmagic/deepsparse-milvus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fdeepsparse-milvus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fdeepsparse-milvus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fdeepsparse-milvus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fdeepsparse-milvus
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neuralmagic","download_url":"https://codeload.github.com/neuralmagic/deepsparse-milvus/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fdeepsparse-milvus/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261774660,"owners_count":23207775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-05T08:35:35.341Z","updated_at":"2025-06-24T23:31:15.320Z","avatar_url":"https://github.com/neuralmagic.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepSparse + Milvus: Text-Search with BERT\n\nThis example demonstrates how to create a semantic search engine using FastAPI, DeepSparse, Milvus, and MySQL.\n\nWe will create 4 services:\n- Milvus Server - vector database used to hold the embeddings of the article dataset and perform the search queries\n- MySQL Server - holds the mapping from Milvus ids to original article data\n- DeepSparse Server - inference runtime used to generate the embeddings for the queries\n- Application Server - endpoint called by the client side with queries for searching\n\nWe will demonstrate running on a local machine as well as in a VPC on AWS with independent-scaling of the App, Database, and Model Serving Components.\n\n## Application Architecture\n\nWe have provided a sample dataset in `client/example.csv`. These data are articles about various topics, in `(title,text)` pairs. 
We will create an application that will allow users to upload arbitrary `text` and find the 10 most similar articles using semantic search.\n\nThe app server is built on FastAPI and exposes both `/load` and `/search` endpoints. \n\nThe `/load` endpoint accepts a CSV file with `(title, text)` columns representing a series of articles. On `/load`, we project the `text` into the embedding space with BERT running on DeepSparse. We then store each embedding in Milvus with a primary key `id` and store the `(id,title,text)` triples in MySQL.\n\nThe `/search` endpoint enables clients to send `text` to the server. The app server sends the `text` to DeepSparse Server, which returns the embedding of the query. This embedding is sent to Milvus, which searches for the 10 most similar vectors in the database and returns their `ids` to the app server. The app server then looks up the `(title,text)` pairs in MySQL and returns them to the client.\n\nAs such, we can scale the app server, databases, and model service independently!\n\n## Running Locally\n\n### Start the Server\n\n#### Installation:\n- Milvus and MySQL are installed using Docker containers. [Install Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/linux/).\n- DeepSparse is installed via PyPI. Create a virtual environment and run `pip install -r server/deepsparse-requirements.txt`.\n- The App Server is based on FastAPI. Create a virtual environment and run `pip install -r server/app-requirements.txt`.\n\n#### 1. Start Milvus\n\nMilvus provides a convenient `docker-compose` file, downloadable with `wget`, that launches the services Milvus needs. 
\n\n``` bash\ncd server/database-server\nwget https://raw.githubusercontent.com/milvus-io/milvus/master/deployments/docker/standalone/docker-compose.yml -O docker-compose.yml\nsudo docker-compose up\ncd ..\n\n```\nThis command should create `milvus-etcd`, `milvus-minio`, and `milvus-standalone`.\n\n#### 2. Start MySQL\n\nMySQL can be started with the base MySQL image available on Docker Hub. Simply run the following command.\n\n```bash\ndocker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7\n```\n\n#### 3. Start DeepSparse Server\n\nDeepSparse not only includes a high-performance runtime on CPUs, but also comes with tooling that simplifies the process of adding inference to an application. One example of this is the Server functionality, which makes it trivial to stand up a model service using DeepSparse.\n\nWe have provided a configuration file in `/server/deepsparse-server/server-config-deepsparse.yaml`, which sets up an embedding extraction endpoint running a sparse version of BERT from SparseZoo. You can edit this file to adjust the number of workers you want (this is the number of concurrent inferences that can occur). 
Generally, `num_cores/2` is a fine starting point.\n\nHere's what the config file looks like.\n\n```yaml\nnum_workers: 4  # number of streams - should be tuned, num_cores / 2 is a good place to start\n\nendpoints: \n  - task: embedding_extraction\n    model: zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni\n    route: /predict\n    name: embedding_extraction_pipeline\n    kwargs:\n      return_numpy: False\n      extraction_strategy: reduce_mean\n      sequence_length: 512\n      engine_type: deepsparse\n```\n\nTo start DeepSparse, run the following:\n\n```bash\ndeepsparse.server --config_file server/deepsparse-server/server-config-deepsparse.yaml\n```\n\nTO BE REMOVED --- hack to remove bug in Server\n\n- Run `vim deepsparse-env/lib/python3.8/site-packages/deepsparse/server/server.py`\n- In `_add_pipeline_endpoint()`, update `app.add_api_route` by commenting out `response_model=output_schema`.\n\nIn vim, `i` enters insert mode, `ESC` exits insert mode, and `:wq` writes the file and quits.\n\n**Potential Improvements**\n\nThere are two kinds of steps: a throughput-focused step (`load`), where we need to process a large number of embeddings at once with no latency requirement, and a latency-focused step (`search`), where we need to process one embedding and return to the user as fast as possible. For simplicity, we currently only use one configuration of DeepSparse with `batch_size=1`, which is a latency-oriented setup.\n\nAn extension to this project would be configuring DeepSparse to have multiple endpoints or adding another DeepSparse Server instance with a configuration for high throughput.\n\n#### 4. Start The App Server\n\nThe App Server is built on `FastAPI` and `uvicorn` and orchestrates DeepSparse, Milvus, and MySQL to create a search engine. 
\n\nRun the following to launch.\n\n```bash\npython3 server/app-server/src/app.py\n```\n\n### Use the Search Engine!\n\nWe have provided both a Jupyter notebook and a latency testing script to interact with the server. \n\n#### Jupyter Notebook\nThe Jupyter notebook is self-documenting and is a good starting point to play around with the application.\n\nYou can run it with the following command:\n`jupyter notebook example-client.ipynb`\n\n#### Latency Testing Script\nThe latency testing script generates multiple clients to test response time from the server. It reports both the overall query latency and the model serving latency (the end-to-end time from the app server querying DeepSparse until a response is returned). \n\nYou can run it with the following command:\n```bash\npython3 client/latency-test-client.py --url http://localhost:5000/ --dataset_path client/example.csv --num_clients 8\n```\n- `--url` is the location of the app server\n- `--dataset_path` is the location of the dataset on the client side\n- `--num_clients` is the number of clients that will be created to send requests concurrently\n\n## Running in an AWS VPC with Independent-Scaling\n\n### Create a VPC\n\nFirst, we will create a VPC that houses our instances and enables us to communicate between the App Server, Milvus, MySQL, and DeepSparse.\n\n- Navigate to `Create VPC` in the AWS console\n- Select `VPC and more`. Name it `semantic-search-demo-vpc`\n- Make sure you have `IPv4 CIDR block` set. 
We use `10.0.0.0/16` in the example.\n- Set Number of AZs to 1, Number of Public Subnets to 1, and Number of Private Subnets to 0.\n\nWhen we create our services, we will add them to the VPC and only enable communication to the backend model service and databases from within the VPC, isolating the model and database services from the internet.\n\n### Create a Database Instance\n\nLaunch an EC2 Instance.\n- Navigate to EC2 \u003e Instances \u003e Launch an Instance\n- Name the instance `database-server`\n- Select Amazon Linux\n\nEdit the `Network Setting`.\n- Put the `database-server` into the `semantic-search-demo-vpc` VPC\n- Choose the public subnet\n- Set `Auto-Assign Public IP` to `Enabled`.\n- Add a `Custom TCP` security group rule with port `19530` with `source-type` of `Custom` and Source equal to the CIDR of the VPC (in our case `10.0.0.0/16`). This is how the App Server will talk to Milvus.\n- Add a `Custom TCP` security group rule with port `3306` with `source-type` of `Custom` and Source equal to the CIDR of the VPC (in our case `10.0.0.0/16`). 
This is how the App Server will talk to MySQL.\n\nLaunch the instance, then SSH into your newly created instance and start up the databases.\n```\nssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip\n```\nInstall Docker/Docker Compose and add group membership for the default ec2-user:\n```\nsudo yum update -y\nsudo yum install docker -y\nsudo usermod -a -G docker ec2-user\nid ec2-user\nnewgrp docker\npip3 install --user docker-compose\n```\n\nStart Docker and check that it is running with the following:\n```\nsudo service docker start\ndocker container ls\n```\n\nDownload the Milvus `docker-compose.yml` file and launch Milvus with `docker-compose`:\n```\nwget https://raw.githubusercontent.com/milvus-io/milvus/master/deployments/docker/standalone/docker-compose.yml -O docker-compose.yml\ndocker-compose up\n```\n\nSSH from another terminal into the same instance to set up MySQL.\n```\nssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip\n```\n\nRun the following to launch MySQL:\n```bash\ndocker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7\n```\n\nYour databases are up and running!\n\n### Create the Application Server\n\nLaunch an EC2 Instance.\n- Navigate to EC2 \u003e Instances \u003e Launch an Instance\n- Name the instance `app-server`\n- Select Amazon Linux\n\nEdit the `Network Setting` to expose the App Endpoint to the Internet while still giving access to the backend database and model service.\n- Put the `app-server` into the `semantic-search-demo-vpc` VPC\n- Choose the public subnet\n- Set `Auto-Assign Public IP` to `Enabled`.\n- Add a `Custom TCP` security group rule with port `5000` with `source-type` of `Anywhere`. 
This exposes the app to the internet.\n\nClick Launch Instance and SSH into your newly created instance and launch the app server.\n\nFrom the command line run:\n```\nssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip\n```\n\nClone this repo with Git:\n```bash\nsudo yum update -y\nsudo yum install git -y\nsudo git clone https://github.com/rsnm2/deepsparse-milvus.git\n```\n\nInstall the App requirements in a virtual environment.\n```bash\npython3 -m venv app-env\nsource app-env/bin/activate\npip3 install -r deepsparse-milvus/text-search-engine/server/app-requirements.txt\n```\n\nRun the following to launch the app server.\n```bash\npython3 deepsparse-milvus/text-search-engine/server/app-server/src/app.py --database host private.ip.of.database.server --model_host private.ip.of.model.server\n```\n\nYour App Server is up and running!\n\n### Create DeepSparse AWS Instance\n\nLaunch an EC2 Instance.\n- Navigate to EC2 \u003e Instances \u003e Launch an Instance\n- Name the instance (for example, `deepsparse-server`)\n- Select Amazon Linux and a `c6i.4xlarge` instance type\n\nEdit the `Network Setting` so that the DeepSparse Server is reachable only from within the VPC.\n- Put the instance into the `semantic-search-demo-vpc` VPC\n- Choose the public subnet\n- Set `Auto-Assign Public IP` to `Enabled`.\n- Add a `Custom TCP` security group rule with port `5543` with `source-type` of `Custom` and Source equal to the CIDR of the VPC (in our case `10.0.0.0/16`). 
This is how the App Server will talk to DeepSparse.\n\nClick Launch Instance and SSH into your newly created instance and launch the DeepSparse Server.\n```\nssh -i path/to/your/private-key.pem ec2-user@your-instance-public-ip\n```\n\nClone this repo with Git:\n```bash\nsudo yum update -y\nsudo yum install git -y\ngit clone https://github.com/rsnm2/deepsparse-milvus.git\n```\n\nInstall the DeepSparse requirements in a virtual environment.\n```bash\npython3 -m venv deepsparse-env\nsource deepsparse-env/bin/activate\npip3 install -r deepsparse-milvus/text-search-engine/server/deepsparse-requirements.txt\n```\n\nTO BE REMOVED --- hack to remove bug in Server\n\n- Run `vim deepsparse-env/lib/python3.7/site-packages/deepsparse/server/server.py`\n- In `_add_pipeline_endpoint()`, update `app.add_api_route` by commenting out `response_model=output_schema`.\n\n\nRun the following to start a model server with DeepSparse as the runtime engine. \n```bash\ndeepsparse.server --config-file deepsparse-milvus/text-search-engine/server/deepsparse-server/server-config-deepsparse.yaml\n```\n\nYou should see a Uvicorn server running!\n\nWe have also provided a config file with ONNX Runtime as the runtime engine for performance comparison. 
\nYou can launch a server with ONNX Runtime with the following:\n```bash\ndeepsparse.server --config-file deepsparse-milvus/text-search-engine/server/deepsparse-server/server-config-onnx.yaml\n```\n**Note: you should have either DeepSparse or ONNX Runtime running, but not both.**\n\n### Benchmark Performance\n\nFrom your local machine, run the following, which creates 4 clients that continuously make requests to the server.\n\n```bash\npython3 client/latency-test-client.py --url http://app-server-public-ip:5000/ --dataset_path client/example.csv --num_clients 4 --iters_per_client 25\n```\n\nWith DeepSparse running in the Model Server, the latency looks like this, where Model Latency is the time the Model Server takes to process\na request and Query Latency is the full end-to-end time on the client side (Network Latency + Model Latency + Database Latency).\n\n```\nModel Latency Stats:\n{'count': 100,\n 'mean': 97.6392858400186,\n 'median': 97.46583750006721,\n 'std': 0.7766356131548698}\n\nQuery Latency Stats:\n{'count': 100,\n 'mean': 425.1315195999632,\n 'median': 425.0526745017851,\n 'std': 34.73163016766087}\n```\n\n**RS Note: when scaling this out with more clients, the rest of the system becomes the bottleneck for scaling. So, need to investigate a bit more how to show off the performance of DeepSparse**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralmagic%2Fdeepsparse-milvus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneuralmagic%2Fdeepsparse-milvus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralmagic%2Fdeepsparse-milvus/lists"}