# embedding-server

An OpenAI Embeddings API-compatible server built from an open LLM.

Purpose:
- For an embedding model to produce well-separated embedding vectors, it must be trained with a good tokenizer on a large amount of data.
- However, finding a suitable embedding model for multilingual environments can be difficult.
- The quality of the embeddings obtained this way depends on the capability of the source model.

Limitations:
- Extracting an embedding model from an LLM in this way has inherent limitations.

Process:
## Model Extraction
- Use mergekit to extract layer 0 (the first layer) of the model.
- The extracted model typically includes the `embed_tokens` layer along with the RMSNorm layer.
- Run the extraction with mergekit:
```
# mergekit-yaml [config_file] [save_target_path]
mergekit-yaml ./example.yaml ./test
```

## Embedding Extraction
- The `embedding` function in `server.py` performs the embedding extraction. It loads the model with `AutoModel` and `AutoTokenizer` from the `transformers` library, tokenizes the input, and passes the tokens through the loaded layers to produce the embeddings.
- Positional encoding is not handled separately in this process. The code for it is omitted, on the assumption that an embedding model does not need it.

## Model Configuration
- In `server.py`, set each model name and its path:
```
models_list = {
  # Model Name: Model Path
  'text-embedding-3-small': './test/'
}
```
- The following line in the script sets the default model name, used when a request does not specify `model`:
```
model = data.get('model', 'text-embedding-3-small')
```

## Execution & Testing
```
# Start the server
python server.py

# Test
curl -X POST -H "Content-Type: application/json" -d '{"input": "hello", "model": "text-embedding-3-small"}' http://localhost:5000/v1/embeddings
```

Required Libraries:
- Install the required libraries:
```
pip install flask torch transformers
```

License:
- MIT

Warning: Testing
- The code has been tested for functionality, but the quality of the embeddings has not been evaluated.
- Tested models:
  - LLaMa v2 / v3
  - Gemma 7B 1.1
  - Mistral 7B

References:
- https://www.sbert.net/docs/pretrained_models.html
- https://platform.openai.com/docs/guides/embeddings
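The Model Configuration section above maps public model names to local model paths and falls back to `text-embedding-3-small` when a request omits `model`. The sketch below illustrates that lookup for an incoming request body. The helper name `resolve_request` and its validation rules are hypothetical and not taken from `server.py`. The helper also accepts only a string or a list of strings for `input`, whereas the OpenAI API additionally allows token arrays.

```python
from typing import Dict, List, Tuple

# Mirrors the README's models_list: public model name -> local model path.
models_list: Dict[str, str] = {
    "text-embedding-3-small": "./test/",
}

DEFAULT_MODEL = "text-embedding-3-small"


def resolve_request(data: dict) -> Tuple[str, str, List[str]]:
    """Return (model_name, model_path, inputs) for a /v1/embeddings request body.

    Hypothetical helper: the README confirms only the default-model lookup,
    not how server.py validates its input.
    """
    model = data.get("model", DEFAULT_MODEL)
    if model not in models_list:
        raise KeyError(f"unknown model: {model!r}")

    raw = data.get("input")
    # Normalize a single string into a one-element batch.
    if isinstance(raw, str):
        inputs = [raw]
    elif isinstance(raw, list) and raw and all(isinstance(s, str) for s in raw):
        inputs = list(raw)
    else:
        raise ValueError("'input' must be a string or a non-empty list of strings")
    return model, models_list[model], inputs


if __name__ == "__main__":
    # Same body as the curl test, minus "model", so the default is used.
    print(resolve_request({"input": "hello"}))
```

Normalizing every request to a list lets the rest of the handler treat single and batched inputs the same way.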
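The Embedding Extraction section says the tokens are passed through the loaded layers. It does not say how the per-token hidden states are reduced to a single vector per input. One common choice is mean pooling over the attention mask followed by L2 normalization. The sketch below assumes that approach, which may not match what `server.py` actually does. It uses plain Python lists instead of `torch` tensors so that it runs on its own.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the hidden-state vectors of real tokens (mask == 1), skipping padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is needed per token vector")
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


if __name__ == "__main__":
    hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is a padding token
    mask = [1, 1, 0]
    pooled = mean_pool(hidden, mask)
    print(pooled, l2_normalize(pooled))
```

With unit-length vectors, cosine similarity reduces to a dot product. This offers a quick way to check how well separated two embeddings are, given that embedding quality has not yet been evaluated.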
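The `curl` test in Execution & Testing expects a JSON reply. OpenAI clients read the `/v1/embeddings` response as a `list` object whose `data` entries each carry an `index` and an `embedding`, plus a `usage` block. The hypothetical `build_embeddings_response` helper below assembles that shape. The README does not state how `server.py` fills in `usage`, so here `prompt_tokens` is assumed to be the tokenizer's token count for the request.

```python
import json
from typing import List


def build_embeddings_response(model: str, vectors: List[List[float]],
                              prompt_tokens: int) -> dict:
    """Wrap embedding vectors in the response shape of OpenAI's POST /v1/embeddings."""
    return {
        "object": "list",
        "data": [
            # index ties each vector back to its position in the request's input list
            {"object": "embedding", "index": i, "embedding": vec}
            for i, vec in enumerate(vectors)
        ],
        "model": model,
        # Embedding requests have no completion tokens, so total == prompt.
        "usage": {"prompt_tokens": prompt_tokens, "total_tokens": prompt_tokens},
    }


if __name__ == "__main__":
    body = build_embeddings_response("text-embedding-3-small", [[0.6, 0.8]], prompt_tokens=1)
    print(json.dumps(body))
```

In a Flask handler, the resulting dict could be returned with `flask.jsonify`.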