https://github.com/ulf1/simiscore-kshingle
An ML API to compute similarity scores between shingled sentence examples.
k-shingling ml-api similarity-score
- Host: GitHub
- URL: https://github.com/ulf1/simiscore-kshingle
- Owner: ulf1
- License: apache-2.0
- Created: 2021-04-07T08:09:06.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2023-07-30T19:11:30.000Z (almost 3 years ago)
- Last Synced: 2025-02-08T16:38:07.274Z (over 1 year ago)
- Topics: k-shingling, ml-api, similarity-score
- Language: Python
- Homepage:
- Size: 54.7 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
[DOI](https://zenodo.org/badge/latestdoi/355463288)
# simiscore-kshingle
An ML API to compute similarity scores between sentences based on k-shingled substrings.
The API is built with the [`fastapi` Python package](https://fastapi.tiangolo.com/)
and uses the packages [`datasketch`](http://ekzhu.com/datasketch/index.html) and [`kshingle`](https://github.com/ulf1/kshingle) to compute the similarity scores.
The deployment is configured for Docker Compose.
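
To illustrate the idea behind k-shingle similarity, here is a minimal pure-Python sketch. It is not the code of this API: the functions `kshingles`, `jaccard`, and `similarity` are hypothetical names, the shingle length `k=3` is an assumed default, and the score is an exact Jaccard index rather than whatever estimate the service derives via `datasketch` and `kshingle`.

```python
def kshingles(text: str, k: int) -> set:
    """Return the set of all contiguous character substrings of length k."""
    if k < 1:
        raise ValueError("k must be >= 1")
    # For strings shorter than k the range is empty, so the result is an empty set.
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        # Assumption for this sketch: two empty shingle sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def similarity(s1: str, s2: str, k: int = 3) -> float:
    """Similarity of two sentences based on their k-shingle sets."""
    return jaccard(kshingles(s1, k), kshingles(s2, k))


if __name__ == "__main__":
    print(similarity("Die Kuh macht muh.", "Die Muh macht kuh."))
```

Two sentences that share most of their character substrings score close to 1, while sentences with no common shingles score 0.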
## Docker Deployment
Start the service with Docker Compose:
```sh
export API_PORT=8082
docker-compose -f docker-compose.yml up --build
# or as a one-liner:
API_PORT=8082 docker-compose -f docker-compose.yml up --build
```
(Start the Docker daemon first, e.g. `open /Applications/Docker.app` on macOS.)
Check that the API responds:
```sh
curl http://localhost:8082
```
Note: The `Dockerfile` uses only `main.py`.
## Local Development
### Install a virtual environment
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
```
(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder `.venv`. Use an absolute path without whitespace instead.)
### Start Server
```sh
source .venv/bin/activate
# uvicorn app.main:app --reload
gunicorn app.main:app --reload --bind=0.0.0.0:8082 \
--worker-class=uvicorn.workers.UvicornH11Worker \
--workers=1 --timeout=600
```
### Run some requests
```sh
curl -X POST "http://localhost:8082/similarities/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '["Die Kuh macht muh.", "Die Muh macht kuh."]'
```
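
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `post_similarities` are hypothetical, and the base URL assumes the server started above is listening on port 8082. The structure of the JSON response is not documented here, so the sketch simply returns the decoded payload.

```python
import json
import urllib.request


def build_request(sentences, base_url="http://localhost:8082"):
    """Build a POST request for the /similarities/ endpoint without sending it."""
    payload = json.dumps(sentences).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/similarities/",
        data=payload,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_similarities(sentences, base_url="http://localhost:8082", timeout=30):
    """Send the sentences to a running server and return the decoded JSON response."""
    request = build_request(sentences, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_similarities(["Die Kuh macht muh.", "Die Muh macht kuh."])` issues the same request as the `curl` command.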
### Other commands and help
- Check syntax: `flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')`
- Run unit tests: `PYTHONPATH=. pytest`
- Show the docs: [http://localhost:8082/docs](http://localhost:8082/docs)
- Show Redoc: [http://localhost:8082/redoc](http://localhost:8082/redoc)
### Clean up
```sh
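# delete compiled files; -delete also works when nothing matches or paths contain spaces
find . -type f -name "*.pyc" -delete
# remove cache folders without descending into them afterwards
find . -type d -name "__pycache__" -prune -exec rm -r {} +
rm -r .pytest_cache
rm -r .venv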
```
## Appendix
### Citation
```
@software{ulf_hamster_2022_7096465,
author = {Ulf Hamster and
Luise Köhler},
title = {simiscore-kshingle},
month = sep,
year = 2022,
publisher = {Zenodo},
version = {0.1.0},
doi = {10.5281/zenodo.7096465},
url = {https://doi.org/10.5281/zenodo.7096465}
}
```
### References
- Sebastián Ramírez. (2018). FastAPI. [https://github.com/tiangolo/fastapi](https://github.com/tiangolo/fastapi)
- Eric Zhu, Vadim Markovtsev, aastafiev, Wojciech Łukasiewicz, ae-foster, Sinusoidal36, Ekevoo, Kevin Mann, Keyur Joshi, Peter Kubov, Qin TianHuan, Spandan Thakur, Stefano Ortolani, Titusz, Vojtech Letal, Zac Bentley, fpug, & oisincar. (2021). ekzhu/datasketch: v1.5.4 (v1.5.4). Zenodo. [https://doi.org/10.5281/zenodo.5758425](https://doi.org/10.5281/zenodo.5758425)
- Ulf Hamster. (2022). kshingle: Shingling text data (0.10.0). Zenodo. [https://doi.org/10.5281/zenodo.7096407](https://doi.org/10.5281/zenodo.7096407)
### Support
Please [open an issue](https://github.com/satzbeleg/simiscore-kshingle/issues) for support.
### Contributing
Please contribute using [GitHub Flow](https://guides.github.com/introduction/flow/). Create a branch, add commits, and [open a pull request](https://github.com/satzbeleg/simiscore-kshingle/compare/).
### Acknowledgements
The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - [433249742](https://gepris.dfg.de/gepris/projekt/433249742) (GU 798/27-1; GE 1119/11-1).
### Maintenance
- Until 31 Aug 2023 (v0.1.0), the code repository was maintained within the DFG project [433249742](https://gepris.dfg.de/gepris/projekt/433249742).
- Since 1 Sep 2023 (v0.2.0), the code repository has been maintained by Ulf Hamster.