https://github.com/ganymedenil/text2vec-onnx
text2vec onnxruntime
- Host: GitHub
- URL: https://github.com/ganymedenil/text2vec-onnx
- Owner: GanymedeNil
- License: apache-2.0
- Created: 2024-06-18T05:25:01.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-24T14:38:57.000Z (over 1 year ago)
- Last Synced: 2025-04-11T13:58:41.350Z (6 months ago)
- Topics: bert, bert-embeddings, embedding, onnx, similarity, similarity-matrix, similarity-score, similarity-search, text2vec
- Language: Python
- Homepage:
- Size: 15.6 KB
- Stars: 6
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# text2vec-onnx
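This project is the onnxruntime inference version of [text2vec](https://github.com/shibing624/text2vec). It implements embedding extraction and text matching search. To keep the project lightweight, it depends on only three libraries: `onnxruntime`, `tokenizers`, and `numpy`.
It has been tested mainly on the [GanymedeNil/text2vec-base-chinese-onnx](https://huggingface.co/GanymedeNil/text2vec-base-chinese-onnx) model; in theory it supports BERT-family models.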
## Installation
### CPU version
```bash
pip install text2vec2onnx[cpu]
```
### GPU version
```bash
pip install text2vec2onnx[gpu]
```
## Usage
### Model download
Taking GanymedeNil/text2vec-base-chinese-onnx as an example, download the model to a local directory.
- Hugging Face model download
```bash
huggingface-cli download --resume-download GanymedeNil/text2vec-base-chinese-onnx --local-dir text2vec-base-chinese-onnx
```
### Getting embeddings
```python
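from text2vec2onnx import SentenceModel

# 'local-dir' is the directory the model was downloaded to,
# e.g. text2vec-base-chinese-onnx from the download step.
embedder = SentenceModel(model_dir_path='local-dir')
emb = embedder.encode("你好")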
```
### Text matching search
```python
from text2vec2onnx import SentenceModel, semantic_search

# Load the ONNX model from the local directory it was downloaded to.
embedder = SentenceModel(model_dir_path='local-dir')

corpus = [
    "谢谢观看 下集再见",
    "感谢您的观看",
    "请勿模仿",
    "记得订阅我们的频道哦",
    "The following are sentences in English.",
    "Thank you. Bye-bye.",
    "It's true",
    "I don't know.",
    "Thank you for watching!",
]
# Encode the corpus once so the embeddings can be reused for every query.
corpus_embeddings = embedder.encode(corpus)

queries = [
    'Thank you. Bye.',
    '你干啥呢',
    '感谢您的收听',
]

for query in queries:
    query_embedding = embedder.encode(query)
    hits = semantic_search(query_embedding, corpus_embeddings, top_k=1)
    print("\n\n======================\n\n")
    print("Query:", query)
    # top_k=1 above, so only the single best match is returned.
    print("\nTop 1 most similar sentence in corpus:")
    hits = hits[0]  # Get the hits for the first query
    for hit in hits:
        print(corpus[hit['corpus_id']], "(Score: {:.4f})".format(hit['score']))
```
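
To make the shape of the search results easier to follow, here is a minimal sketch of a top-k similarity search over toy vectors. It is not the library's implementation: `toy_semantic_search` and `cosine_similarity` are hypothetical helpers, cosine similarity as the score is an assumption, and the sketch uses only the Python standard library, whereas the project itself relies on `numpy`. It mirrors the result layout used in the example above: one list of hits per query, each hit a dict with `corpus_id` and `score`.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Avoid division by zero for all-zero vectors.
    return dot / (norm_a * norm_b)


def toy_semantic_search(query_embeddings, corpus_embeddings, top_k=1):
    """Hypothetical stand-in for semantic_search (assumes cosine scoring).

    Returns one list of hits per query, best match first.
    """
    results = []
    for query in query_embeddings:
        scored = [
            {"corpus_id": idx, "score": cosine_similarity(query, doc)}
            for idx, doc in enumerate(corpus_embeddings)
        ]
        # Keep only the top_k highest-scoring corpus entries.
        results.append(heapq.nlargest(top_k, scored, key=lambda hit: hit["score"]))
    return results


# Toy 3-dimensional vectors; real embeddings would come from SentenceModel.encode.
corpus_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_embeddings = [[0.9, 0.1, 0.0]]

hits = toy_semantic_search(query_embeddings, corpus_embeddings, top_k=2)
for hit in hits[0]:  # Hits for the first (and only) query
    print(hit["corpus_id"], "(Score: {:.4f})".format(hit["score"]))
```

With these toy vectors the query is closest to corpus entry 0, followed by entry 2, so those two ids are printed in that order.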
## License
[Apache License 2.0](LICENSE)
## References
- [text2vec](https://github.com/shibing624/text2vec)