https://github.com/ulf1/simiscore-semantic
An ML API to compute semantic similarity scores between sentence examples.
- Host: GitHub
- URL: https://github.com/ulf1/simiscore-semantic
- Owner: ulf1
- License: apache-2.0
- Created: 2021-04-07T08:00:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-30T19:12:15.000Z (over 1 year ago)
- Last Synced: 2024-10-28T15:25:42.890Z (about 2 months ago)
- Topics: ml-api, sbert, sentence-bert, similarity-score
- Language: Python
- Homepage:
- Size: 70.3 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
[![DOI](https://zenodo.org/badge/355460576.svg)](https://zenodo.org/badge/latestdoi/355460576)
# simiscore-semantic
An ML API to compute semantic similarity scores between sentence examples.
The API is programmed with the [`fastapi` Python package](https://fastapi.tiangolo.com/),
and the semantic similarities are computed based on [SBert (`sentence-transformers` package)](https://github.com/UKPLab/sentence-transformers).
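
To illustrate what a semantic similarity score between sentences looks like, the sketch below computes pairwise cosine similarities between embedding vectors. It is not taken from this repository: cosine similarity is assumed here as the scoring function, the names `cosine_similarity` and `similarity_matrix` are hypothetical, and the small hand-written vectors only stand in for real SBert embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (assumed scoring function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise similarity scores for a list of embedding vectors."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy 3-dimensional vectors standing in for real sentence embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
matrix = similarity_matrix(toy_embeddings)
```

The result is a symmetric matrix with one row and one column per sentence. Vectors pointing in similar directions score close to 1, orthogonal ones score 0.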
The deployment is configured for Docker Compose.

## Docker Deployment
Call Docker Compose:

```sh
export API_PORT=8083
docker-compose -f docker-compose.yml up --build

# or as a one-liner:
API_PORT=8083 docker-compose -f docker-compose.yml up --build
```

(Start the Docker daemon first, e.g. `open /Applications/Docker.app` on macOS.)

Check that the API responds:

```sh
curl http://localhost:8083
```

Note: Only `main.py` is used in the `Dockerfile`.
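
The reachability check above can also be scripted. The following is a minimal sketch using only the Python standard library. It assumes the port is taken from the `API_PORT` variable exported above, and the helper names `api_base_url` and `api_is_up` are hypothetical, not part of this repository.

```python
import os
import urllib.error
import urllib.request
from typing import Mapping


def api_base_url(env: Mapping[str, str] = os.environ, default_port: str = "8083") -> str:
    """Build the local base URL from API_PORT, falling back to the default port."""
    return f"http://localhost:{env.get('API_PORT', default_port)}"


def api_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if any HTTP response comes back, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    url = api_base_url()
    print(url, "is reachable" if api_is_up(url) else "is not reachable")
```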
## Local Development
### Install a virtual environment
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
```

(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder `.venv`. Use an absolute path without whitespace instead.)
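
The whitespace caveat can be checked before creating the environment. The following is a small sketch using only the standard library. The helper name `has_whitespace` is hypothetical and not part of this repository.

```python
from pathlib import Path


def has_whitespace(path: Path) -> bool:
    """True if the absolute form of `path` contains any whitespace character."""
    return any(ch.isspace() for ch in str(path.absolute()))


venv_dir = Path(".venv")
if has_whitespace(venv_dir):
    print(f"'{venv_dir.absolute()}' contains whitespace; choose another location for the venv.")
else:
    print(f"'{venv_dir.absolute()}' is safe to use.")
```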
### Specify where to store the SBert model
SBert allows setting the `cache_folder` via the environment variable `SENTENCE_TRANSFORMERS_HOME` (see [here](https://github.com/UKPLab/sentence-transformers/blob/bd19871d99068f4824ff6ef213d91596885889f7/sentence_transformers/SentenceTransformer.py#L48)).

```sh
mkdir ./sbert-models
export SENTENCE_TRANSFORMERS_HOME="$(pwd)/sbert-models"
```

### Start Server
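
Before starting the server, it can be worth confirming that the model cache location from the previous step is actually set. This is a minimal sketch that relies only on the variable name `SENTENCE_TRANSFORMERS_HOME` mentioned above. The helper `resolve_model_cache` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_cache(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the configured cache folder if it is set and exists, else None."""
    value = env.get("SENTENCE_TRANSFORMERS_HOME")
    if not value:
        return None
    folder = Path(value)
    return folder if folder.is_dir() else None


cache = resolve_model_cache()
if cache is None:
    print("SENTENCE_TRANSFORMERS_HOME is unset or does not point to a folder.")
else:
    print(f"SBert models will be cached in {cache}")
```
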
```sh
source .venv/bin/activate
# uvicorn app.main:app --reload
gunicorn app.main:app --reload --bind=0.0.0.0:8083 \
--worker-class=uvicorn.workers.UvicornH11Worker \
--workers=1 --timeout=600
```

### Usage Examples
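
The requests in this section can also be built from Python. The sketch below uses only the standard library and assumes the server runs on `localhost:8083` as configured above. The helper names `with_uuid_keys`, `build_request`, and `post_similarities` are hypothetical, and the response is returned as parsed JSON without assuming anything about its structure.

```python
import json
import urllib.request
import uuid
from typing import Dict, List, Union

API_URL = "http://localhost:8083/similarities/"  # same endpoint as the curl examples

Payload = Union[List[str], Dict[str, str]]


def with_uuid_keys(sentences: List[str]) -> Dict[str, str]:
    """Assign a random UUID4 key to each sentence (input variant b)."""
    return {str(uuid.uuid4()): text for text in sentences}


def build_request(payload: Payload, url: str = API_URL) -> urllib.request.Request:
    """Create the POST request with the same headers as the curl examples."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def post_similarities(payload: Payload, url: str = API_URL, timeout: float = 60.0):
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


sentences = ["Der Film ist super.", "Der Spielfilm ist gut."]
request_a = build_request(sentences)                  # a) list of strings
request_b = build_request(with_uuid_keys(sentences))  # b) UUID4 -> text
# result = post_similarities(sentences)  # requires a running server
```

Separating `build_request` from `post_similarities` keeps the payload construction inspectable without a running server.
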
a) Send a list of strings.
```sh
curl -X 'POST' \
'http://localhost:8083/similarities/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '[
"Der Film ist super.",
"Der Spielfilm ist gut.",
"Der Film ist Müll.",
"Der Spielfilm ist schlecht."
]'
```

b) Send a JSON object with UUID4 strings as keys and texts as values.
```sh
curl -X 'POST' \
'http://localhost:8083/similarities/' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"80ba0456-1d26-4d22-8e80-113b919502ee": "Der Film ist super.",
"a47fe293-26e7-40f0-b0b5-202e0955458f": "Der Spielfilm ist gut.",
"86e356a3-5b42-4e03-91fc-cf69098b6dd2": "Der Film ist Müll.",
"779d0245-8f54-49ec-9f0f-8e29dc987b41": "Der Spielfilm ist schlecht."
}'
```

### Other commands and help
* Check syntax: `flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')`
* Run Unit Tests: `PYTHONPATH=. pytest`
* Show the docs: [http://localhost:8083/docs](http://localhost:8083/docs)
* Show Redoc: [http://localhost:8083/redoc](http://localhost:8083/redoc)

### Clean up
```sh
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
```

## Appendix
### Citation
```
@software{ulf_hamster_2022_7096002,
author = {Ulf Hamster and
Luise Köhler},
title = {simiscore-semantic: ML API for semantic similarities},
month = sep,
year = 2022,
publisher = {Zenodo},
version = {0.1.0},
doi = {10.5281/zenodo.7096002},
url = {https://doi.org/10.5281/zenodo.7096002}
}
```

### References
- Sebastián Ramírez, 2018, FastAPI, [https://github.com/tiangolo/fastapi](https://github.com/tiangolo/fastapi)
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics. [http://dx.doi.org/10.18653/v1/D19-1410](http://dx.doi.org/10.18653/v1/D19-1410)

### Support
Please [open an issue](https://github.com/satzbeleg/simiscore-semantic/issues) for support.

### Contributing
Please contribute using [GitHub Flow](https://guides.github.com/introduction/flow/). Create a branch, add commits, and [open a pull request](https://github.com/satzbeleg/simiscore-semantic/compare/).

### Acknowledgements
The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - [433249742](https://gepris.dfg.de/gepris/projekt/433249742) (GU 798/27-1; GE 1119/11-1).

### Maintenance
- Until 31.Aug.2023 (v0.1.0), the code repository was maintained within the DFG project [433249742](https://gepris.dfg.de/gepris/projekt/433249742).
- Since 01.Sep.2023 (v0.2.0), the code repository has been maintained by Ulf Hamster.