{"id":15912053,"url":"https://github.com/ulf1/simiscore-kshingle","last_synced_at":"2025-10-19T15:20:08.400Z","repository":{"id":98707280,"uuid":"355463288","full_name":"ulf1/simiscore-kshingle","owner":"ulf1","description":"An ML API to compute similarity scores between shingled sentence examples.","archived":false,"fork":false,"pushed_at":"2023-07-30T19:11:30.000Z","size":56,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-08T16:38:07.274Z","etag":null,"topics":["k-shingling","ml-api","similarity-score"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ulf1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":["ulf1","knit-bee"]}},"created_at":"2021-04-07T08:09:06.000Z","updated_at":"2023-07-30T18:59:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"f2e94cbf-a703-4dff-8b39-35aead135816","html_url":"https://github.com/ulf1/simiscore-kshingle","commit_stats":null,"previous_names":["ulf1/simiscore-kshingle"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulf1%2Fsimiscore-kshingle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulf1%2Fsimiscore-kshingle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulf1%2Fsimiscore-kshingle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulf1%2Fsimiscore-kshingle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/owners/ulf1","download_url":"https://codeload.github.com/ulf1/simiscore-kshingle/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246923927,"owners_count":20855641,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["k-shingling","ml-api","similarity-score"],"created_at":"2024-10-06T16:01:33.699Z","updated_at":"2025-10-19T15:20:03.380Z","avatar_url":"https://github.com/ulf1.png","language":"Python","funding_links":["https://github.com/sponsors/ulf1","https://github.com/sponsors/knit-bee"],"categories":[],"sub_categories":[],"readme":"[![DOI](https://zenodo.org/badge/355463288.svg)](https://zenodo.org/badge/latestdoi/355463288)\n\n\n# simiscore-kshingle\nAn ML API to compute similarity scores between sentences based on k-shingled substrings.\nThe API is built with the [`fastapi` Python package](https://fastapi.tiangolo.com/)\nand uses the packages [`datasketch`](http://ekzhu.com/datasketch/index.html) and [`kshingle`](https://github.com/ulf1/kshingle) to compute the similarity scores.\nThe deployment is configured for Docker Compose.\n\n\n## Docker Deployment\nRun Docker Compose:\n\n```sh\nexport API_PORT=8082\ndocker-compose -f docker-compose.yml up --build\n\n# or as a one-liner:\nAPI_PORT=8082 docker-compose -f docker-compose.yml up --build\n```\n\n(Start the Docker daemon first, e.g. 
`open /Applications/Docker.app` on macOS).\n\nCheck that the API is running:\n\n```sh\ncurl http://localhost:8082\n```\n\nNote: Only `main.py` is used in the `Dockerfile`.\n\n\n## Local Development\n\n### Install a virtual environment\n\n```sh\npython3 -m venv .venv\nsource .venv/bin/activate\npip install --upgrade pip\npip install -r requirements.txt --no-cache-dir\npip install -r requirements-dev.txt --no-cache-dir\n```\n\n(If your git repository is stored in a folder whose path contains whitespace, do not create the virtual environment in the subfolder `.venv`; use an absolute path without whitespace instead.)\n\n\n### Start Server\n\n```sh\nsource .venv/bin/activate\n# uvicorn app.main:app --reload\ngunicorn app.main:app --reload --bind=0.0.0.0:8082 \\\n    --worker-class=uvicorn.workers.UvicornH11Worker \\\n    --workers=1 --timeout=600\n```\n\n### Run some requests\n\n```sh\ncurl -X POST \"http://localhost:8082/similarities/\" \\\n    -H \"accept: application/json\" \\\n    -H \"Content-Type: application/json\" \\\n    -d '[\"Die Kuh macht muh.\", \"Die Muh macht kuh.\"]'\n```\n\n### Other commands and help\n- Check syntax: `flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')`\n- Run unit tests: `PYTHONPATH=. pytest`\n- Show the docs: [http://localhost:8082/docs](http://localhost:8082/docs)\n- Show ReDoc: [http://localhost:8082/redoc](http://localhost:8082/redoc)\n\n\n### Clean up\n```sh\nfind . -type f -name \"*.pyc\" | xargs rm\nfind . 
-type d -name \"__pycache__\" | xargs rm -r\nrm -r .pytest_cache\nrm -r .venv\n```\n\n\n## Appendix\n\n### Citation\n\n```\n@software{ulf_hamster_2022_7096465,\n  author       = {Ulf Hamster and\n                  Luise Köhler},\n  title        = {simiscore-kshingle},\n  month        = sep,\n  year         = 2022,\n  publisher    = {Zenodo},\n  version      = {0.1.0},\n  doi          = {10.5281/zenodo.7096465},\n  url          = {https://doi.org/10.5281/zenodo.7096465}\n}\n```\n\n### References\n- Sebastián Ramírez. (2018). FastAPI. [https://github.com/tiangolo/fastapi](https://github.com/tiangolo/fastapi)\n- Eric Zhu, Vadim Markovtsev, aastafiev, Wojciech Łukasiewicz, ae-foster, Sinusoidal36, Ekevoo, Kevin Mann, Keyur Joshi, Peter Kubov, Qin TianHuan, Spandan Thakur, Stefano Ortolani, Titusz, Vojtech Letal, Zac Bentley, fpug, \u0026 oisincar. (2021). ekzhu/datasketch: v1.5.4 (v1.5.4). Zenodo. [https://doi.org/10.5281/zenodo.5758425](https://doi.org/10.5281/zenodo.5758425)\n- Ulf Hamster. (2022). kshingle: Shingling text data (0.10.0). Zenodo. [https://doi.org/10.5281/zenodo.7096407](https://doi.org/10.5281/zenodo.7096407)\n\n### Support\nPlease [open an issue](https://github.com/satzbeleg/simiscore-kshingle/issues) for support.\n\n\n### Contributing\nPlease contribute using [GitHub Flow](https://guides.github.com/introduction/flow/). 
Create a branch, add commits, and [open a pull request](https://github.com/satzbeleg/simiscore-kshingle/compare/).\n\n### Acknowledgements\nThe \"Evidence\" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - [433249742](https://gepris.dfg.de/gepris/projekt/433249742) (GU 798/27-1; GE 1119/11-1).\n\n### Maintenance\n- Until 31 Aug 2023 (v0.1.0), the code repository was maintained within the DFG project [433249742](https://gepris.dfg.de/gepris/projekt/433249742).\n- Since 1 Sep 2023 (v0.2.0), the code repository has been maintained by Ulf Hamster.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulf1%2Fsimiscore-kshingle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fulf1%2Fsimiscore-kshingle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulf1%2Fsimiscore-kshingle/lists"}