{"id":13412069,"url":"https://github.com/UKPLab/sentence-transformers","last_synced_at":"2025-03-14T18:30:20.999Z","repository":{"id":37405396,"uuid":"198616978","full_name":"UKPLab/sentence-transformers","owner":"UKPLab","description":"State-of-the-Art Text Embeddings","archived":false,"fork":false,"pushed_at":"2025-03-10T11:40:27.000Z","size":21432,"stargazers_count":16179,"open_issues_count":1251,"forks_count":2563,"subscribers_count":145,"default_branch":"master","last_synced_at":"2025-03-10T22:40:03.952Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.sbert.net","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UKPLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-24T10:53:51.000Z","updated_at":"2025-03-10T17:45:29.000Z","dependencies_parsed_at":"2023-02-18T09:45:43.182Z","dependency_job_id":"45e8e810-4141-40ad-a577-d51eb18006ff","html_url":"https://github.com/UKPLab/sentence-transformers","commit_stats":{"total_commits":1211,"total_committers":173,"mean_commits":7.0,"dds":"0.36333608587943844","last_synced_commit":"4f38caff0fe3c1b6fdbe73ce18d250dce32685fc"},"previous_names":[],"tags_count":50,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UKPLab%2Fsentence-transformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UKPLab%2Fsentence-transformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UKPLab%2Fsentence-transformers/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UKPLab%2Fsentence-transformers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UKPLab","download_url":"https://codeload.github.com/UKPLab/sentence-transformers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243624986,"owners_count":20321208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:01:20.697Z","updated_at":"2025-03-14T18:30:20.992Z","avatar_url":"https://github.com/UKPLab.png","language":"Python","funding_links":[],"categories":["BERT Text Match:","Python","Sentence Embeddings","文本数据和NLP","文本匹配 文本检索 文本相似度","Retrieval-Augmented Generation","Sdks \u0026 Libraries","🥡 Text Representation","Industry Strength Natural Language Processing","others","Document Representations","4. **Vector Databases and Embeddings**","Tools \u0026 Evaluation","📖 Natural Language Processing (NLP)","2. Libraries \u0026 Frameworks","Libraries","Embedding Models \u0026 Libraries","1. 
Core Frameworks \u0026 Libraries","🧱 Infrastructure and Building Blocks","Embedding Fine-tuning"],"sub_categories":["Single-Document-Summarization (as references)","其他_文本生成、文本对话","BERT and other Transformer Language Models","Benchmarking \u0026 Comparison","Tools","Python","Books","📥 Scientific Document Parsing and Scholarly Retrieval","Frameworks"],"readme":"\u003c!--- BADGES: START ---\u003e\n[![HF Models](https://img.shields.io/badge/%F0%9F%A4%97-models-yellow)](https://huggingface.co/models?library=sentence-transformers)\n[![GitHub - License](https://img.shields.io/github/license/UKPLab/sentence-transformers?logo=github\u0026style=flat\u0026color=green)][#github-license]\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sentence-transformers?logo=pypi\u0026style=flat\u0026color=blue)][#pypi-package]\n[![PyPI - Package Version](https://img.shields.io/pypi/v/sentence-transformers?logo=pypi\u0026style=flat\u0026color=orange)][#pypi-package]\n[![Docs - GitHub.io](https://img.shields.io/static/v1?logo=github\u0026style=flat\u0026color=pink\u0026label=docs\u0026message=sentence-transformers)][#docs-package]\n\u003c!-- [![PyPI - Downloads](https://img.shields.io/pypi/dm/sentence-transformers?logo=pypi\u0026style=flat\u0026color=green)][#pypi-package] --\u003e\n\n[#github-license]: https://github.com/UKPLab/sentence-transformers/blob/master/LICENSE\n[#pypi-package]: https://pypi.org/project/sentence-transformers/\n[#conda-forge-package]: https://anaconda.org/conda-forge/sentence-transformers\n[#docs-package]: https://www.sbert.net/\n\u003c!--- BADGES: END ---\u003e\n\n# Sentence Transformers: Multilingual Sentence, Paragraph, and Image Embeddings using BERT \u0026 Co.\n\nThis framework provides an easy method to compute dense vector representations for **sentences**, **paragraphs**, and **images**. The models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa etc. and achieve state-of-the-art performance in various tasks. 
Text is embedded in vector space such that similar texts are close together and can be found efficiently using cosine similarity.\n\nWe provide an increasing number of **[state-of-the-art pretrained models](https://www.sbert.net/docs/sentence_transformer/pretrained_models.html)** for more than 100 languages, fine-tuned for various use-cases.\n\nFurther, this framework allows easy **[fine-tuning of custom embedding models](https://www.sbert.net/docs/sentence_transformer/training_overview.html)** to achieve maximal performance on your specific task.\n\nFor the **full documentation**, see **[www.SBERT.net](https://www.sbert.net)**.\n\n## Installation\n\nWe recommend **Python 3.9+**, **[PyTorch 1.11.0+](https://pytorch.org/get-started/locally/)**, and **[transformers v4.34.0+](https://github.com/huggingface/transformers)**.\n\n**Install with pip**\n\n```\npip install -U sentence-transformers\n```\n\n**Install with conda**\n\n```\nconda install -c conda-forge sentence-transformers\n```\n\n**Install from sources**\n\nAlternatively, you can also clone the latest version from the [repository](https://github.com/UKPLab/sentence-transformers) and install it directly from the source code:\n\n````\npip install -e .\n```` \n\n**PyTorch with CUDA**\n\nIf you want to use a GPU / CUDA, you must install PyTorch with the matching CUDA version. 
Follow\n[PyTorch - Get Started](https://pytorch.org/get-started/locally/) for further details on how to install PyTorch.\n\n## Getting Started\n\nSee [Quickstart](https://www.sbert.net/docs/quickstart.html) in our documentation.\n\nFirst download a pretrained model.\n\n````python\nfrom sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer(\"all-MiniLM-L6-v2\")\n````\n\nThen provide some sentences to the model.\n\n````python\nsentences = [\n    \"The weather is lovely today.\",\n    \"It's so sunny outside!\",\n    \"He drove to the stadium.\",\n]\nembeddings = model.encode(sentences)\nprint(embeddings.shape)\n# =\u003e (3, 384)\n````\n\nAnd that's it. We now have a NumPy array with the embeddings, one row for each text. We can use these to compute similarities.\n\n````python\nsimilarities = model.similarity(embeddings, embeddings)\nprint(similarities)\n# tensor([[1.0000, 0.6660, 0.1046],\n#         [0.6660, 1.0000, 0.1411],\n#         [0.1046, 0.1411, 1.0000]])\n````\n\n## Pre-Trained Models\n\nWe provide a large list of [Pretrained Models](https://www.sbert.net/docs/sentence_transformer/pretrained_models.html) for more than 100 languages. Some models are general purpose models, while others produce embeddings for specific use cases. Pre-trained models can be loaded by just passing the model name: `SentenceTransformer('model_name')`.\n\n## Training\n\nThis framework allows you to fine-tune your own sentence embedding methods, so that you get task-specific sentence embeddings. You have various options to choose from in order to get perfect sentence embeddings for your specific task. \n\nSee [Training Overview](https://www.sbert.net/docs/sentence_transformer/training_overview.html) for an introduction on how to train your own embedding models. 
We provide [various examples](https://github.com/UKPLab/sentence-transformers/tree/master/examples/training) showing how to train models on various datasets.\n\nSome highlights are:\n- Support for various transformer networks including BERT, RoBERTa, XLM-R, DistilBERT, Electra, BART, ...\n- Multi-Lingual and multi-task learning\n- Evaluation during training to find the optimal model\n- [20+ loss-functions](https://www.sbert.net/docs/package_reference/sentence_transformer/losses.html) that allow you to tune models specifically for semantic search, paraphrase mining, semantic similarity comparison, clustering, triplet loss, contrastive loss, etc.\n\n## Application Examples\n\nYou can use this framework for:\n\n- [Computing Sentence Embeddings](https://www.sbert.net/examples/applications/computing-embeddings/README.html)\n- [Semantic Textual Similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html)\n- [Semantic Search](https://www.sbert.net/examples/applications/semantic-search/README.html)\n- [Retrieve \u0026 Re-Rank](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) \n- [Clustering](https://www.sbert.net/examples/applications/clustering/README.html)\n- [Paraphrase Mining](https://www.sbert.net/examples/applications/paraphrase-mining/README.html)\n- [Translated Sentence Mining](https://www.sbert.net/examples/applications/parallel-sentence-mining/README.html)\n- [Multilingual Image Search, Clustering \u0026 Duplicate Detection](https://www.sbert.net/examples/applications/image-search/README.html)\n\nand many more use-cases.\n\nFor all examples, see [examples/applications](https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications).\n\n## Development setup\n\nAfter cloning the repo (or a fork) to your machine, in a virtual environment, run:\n\n```\npython -m pip install -e \".[dev]\"\n\npre-commit install\n```\n\nTo test your changes, run:\n\n```\npytest\n```\n\n## Citing \u0026 Authors\n\nIf you find this repository 
helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):\n\n```bibtex \n@inproceedings{reimers-2019-sentence-bert,\n    title = \"Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks\",\n    author = \"Reimers, Nils and Gurevych, Iryna\",\n    booktitle = \"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing\",\n    month = \"11\",\n    year = \"2019\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://arxiv.org/abs/1908.10084\",\n}\n```\n\nIf you use one of the multilingual models, feel free to cite our publication [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813):\n\n```bibtex\n@inproceedings{reimers-2020-multilingual-sentence-bert,\n    title = \"Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation\",\n    author = \"Reimers, Nils and Gurevych, Iryna\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing\",\n    month = \"11\",\n    year = \"2020\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://arxiv.org/abs/2004.09813\",\n}\n```\n\nPlease have a look at [Publications](https://www.sbert.net/docs/publications.html) for our different publications that are integrated into SentenceTransformers.\n\nMaintainer: [Tom Aarsen](https://github.com/tomaarsen), 🤗 Hugging Face\n\nhttps://www.ukp.tu-darmstadt.de/\n\nDon't hesitate to open an issue if something is broken (and it shouldn't be) or if you have further questions.\n\n\u003e This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective 
publication.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUKPLab%2Fsentence-transformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FUKPLab%2Fsentence-transformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUKPLab%2Fsentence-transformers/lists"}