{"id":20714374,"url":"https://github.com/ganymedenil/text2vec-onnx","last_synced_at":"2025-04-23T08:44:17.515Z","repository":{"id":245867500,"uuid":"816621981","full_name":"GanymedeNil/text2vec-onnx","owner":"GanymedeNil","description":"text2vec onnxruntime","archived":false,"fork":false,"pushed_at":"2024-06-24T14:38:57.000Z","size":16,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-11T13:58:41.350Z","etag":null,"topics":["bert","bert-embeddings","embedding","onnx","similarity","similarity-matrix","similarity-score","similarity-search","text2vec"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GanymedeNil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-18T05:25:01.000Z","updated_at":"2024-11-28T08:02:00.000Z","dependencies_parsed_at":"2024-06-24T15:05:33.122Z","dependency_job_id":"e218901c-aaea-4d41-b27e-1dcbbe27ce58","html_url":"https://github.com/GanymedeNil/text2vec-onnx","commit_stats":null,"previous_names":["ganymedenil/text2vec-onnx"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GanymedeNil%2Ftext2vec-onnx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GanymedeNil%2Ftext2vec-onnx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GanymedeNil%2Ftext2vec-onnx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/GanymedeNil%2Ftext2vec-onnx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GanymedeNil","download_url":"https://codeload.github.com/GanymedeNil/text2vec-onnx/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250400991,"owners_count":21424493,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","bert-embeddings","embedding","onnx","similarity","similarity-matrix","similarity-score","similarity-search","text2vec"],"created_at":"2024-11-17T02:31:30.371Z","updated_at":"2025-04-23T08:44:17.494Z","avatar_url":"https://github.com/GanymedeNil.png","language":"Python","funding_links":["https://www.buymeacoffee.com/ganymedenil"],"categories":[],"sub_categories":[],"readme":"# text2vec-onnx\n\nThis project is an onnxruntime inference version of the [text2vec](https://github.com/shibing624/text2vec) project. It implements embedding extraction and text-matching search. To keep the project lightweight, it depends on only three libraries: `onnxruntime`, `tokenizers`, and `numpy`.\n\nIt has mainly been tested on the [GanymedeNil/text2vec-base-chinese-onnx](https://huggingface.co/GanymedeNil/text2vec-base-chinese-onnx) model; in theory it supports BERT-family models.\n\n## Installation\n\n### CPU version\n```bash\npip install text2vec2onnx[cpu]\n```\n### GPU version\n```bash\npip install text2vec2onnx[gpu]\n```\n\n## Usage\n\n### Model download\nAs an example, download GanymedeNil/text2vec-base-chinese-onnx to a local directory.\n\n- Download from Hugging Face\n```bash\nhuggingface-cli download --resume-download GanymedeNil/text2vec-base-chinese-onnx --local-dir text2vec-base-chinese-onnx\n```\n\n### Getting embeddings\n\n```python\nfrom text2vec2onnx import SentenceModel\nembedder = SentenceModel(model_dir_path='local-dir')\nemb = 
embedder.encode(\"你好\")\n```\n\n### Text-matching search\n\n```python\nfrom text2vec2onnx import SentenceModel, semantic_search\n\nembedder = SentenceModel(model_dir_path='local-dir')\n\ncorpus = [\n    \"谢谢观看 下集再见\",\n    \"感谢您的观看\",\n    \"请勿模仿\",\n    \"记得订阅我们的频道哦\",\n    \"The following are sentences in English.\",\n    \"Thank you. Bye-bye.\",\n    \"It's true\",\n    \"I don't know.\",\n    \"Thank you for watching!\",\n]\ncorpus_embeddings = embedder.encode(corpus)\n\nqueries = [\n    'Thank you. Bye.',\n    '你干啥呢',\n    '感谢您的收听']\n\nfor query in queries:\n    query_embedding = embedder.encode(query)\n    hits = semantic_search(query_embedding, corpus_embeddings, top_k=1)\n    print(\"\\n\\n======================\\n\\n\")\n    print(\"Query:\", query)\n    print(\"\\nTop 1 most similar sentence in corpus:\")\n    hits = hits[0]  # Get the hits for the first query\n    for hit in hits:\n        print(corpus[hit['corpus_id']], \"(Score: {:.4f})\".format(hit['score']))\n\n\n```\n\n## License\n[Apache License 2.0](LICENSE)\n\n## References\n- [text2vec](https://github.com/shibing624/text2vec)\n\n\n## Buy me a coffee\n\u003cdiv align=\"center\"\u003e\n\u003ca href=\"https://www.buymeacoffee.com/ganymedenil\" target=\"_blank\"\u003e\u003cimg src=\"https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png\" alt=\"Buy Me A Coffee\" style=\"height: 60px !important;width: 217px !important;\" \u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003cdiv align=\"center\"\u003e\n\u003cimg height=\"360\" src=\"https://user-images.githubusercontent.com/9687786/224522468-eafb7042-d000-4799-9d16-450489e8efa4.png\"/\u003e\n\u003cimg height=\"360\" 
src=\"https://user-images.githubusercontent.com/9687786/224522477-46f3e80b-0733-4be9-a829-37928260038c.png\"/\u003e\n\u003c/div\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fganymedenil%2Ftext2vec-onnx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fganymedenil%2Ftext2vec-onnx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fganymedenil%2Ftext2vec-onnx/lists"}