{"id":26241322,"url":"https://github.com/tomaston1996/doc-similarity-api","last_synced_at":"2026-04-07T07:47:48.010Z","repository":{"id":268173966,"uuid":"903539149","full_name":"TomAston1996/doc-similarity-api","owner":"TomAston1996","description":"📄 Document similarity matcher using NLP | FastAPI | SpaCy","archived":false,"fork":false,"pushed_at":"2025-02-13T14:04:12.000Z","size":69,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-02T00:40:01.933Z","etag":null,"topics":["docker","fastapi","nlp","postgresql","redis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TomAston1996.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-14T21:25:51.000Z","updated_at":"2025-08-27T16:32:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"70f7952b-624a-41e8-afb7-7a37d48982c3","html_url":"https://github.com/TomAston1996/doc-similarity-api","commit_stats":null,"previous_names":["tomaston1996/doc-similarity-api"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TomAston1996/doc-similarity-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TomAston1996%2Fdoc-similarity-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TomAston1996%2Fdoc-similarity-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/TomAston1996%2Fdoc-similarity-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TomAston1996%2Fdoc-similarity-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TomAston1996","download_url":"https://codeload.github.com/TomAston1996/doc-similarity-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TomAston1996%2Fdoc-similarity-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31504897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","fastapi","nlp","postgresql","redis"],"created_at":"2025-03-13T08:20:13.697Z","updated_at":"2026-04-07T07:47:47.987Z","avatar_url":"https://github.com/TomAston1996.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Contributors][contributors-shield]][contributors-url]\n[![Forks][forks-shield]][forks-url]\n[![Stargazers][stars-shield]][stars-url]\n[![Issues][issues-shield]][issues-url]\n[![MIT License][license-shield]][license-url]\n[![LinkedIn][linkedin-shield]][linkedin-url]\n\n# 📄 Document Similarity API\n\nThe goal of 
Document Similarity API is to use Natural Language Processing (NLP) to find similar documents based on the cosine similarity score of the document title and its textual contents.\nThe problem this is trying to solve is replication in the workplace. Often, work is replicated, and the context of historical work could aid in delivering new work more quickly.\n\nThe plan is to use the spaCy library to preprocess and calculate vector values for all documents uploaded to a Documents table. \nWhen a user searches for matches for a new document, the API should return any similar documents.\n\n## 🧑‍💻 Tech Stack\n\n![Python]\n![FastAPI]\n![Postgres]\n![Docker]\n\n## 🔧 Setup\n\n### 📋 Dependencies\nRun the command ```pip install -r requirements.txt``` to install dependencies.\n\n### 🐋 Docker\nDocker Engine is required to run the PostgreSQL database.\nDownload Docker Desktop [here](https://www.docker.com/products/docker-desktop/).\n\nRun ```docker-compose --env-file .env up --build``` from your root directory to build and run your Docker image from the Dockerfile.\n\n### ⚙️ Environment\nSet up your environment variables in a ```.env``` file, which should look similar to the following:\n```\nPOSTGRES_PASSWORD=\u003cdb_password\u003e\nPOSTGRES_DB=\u003cdb_name\u003e\nPOSTGRES_USER=\u003cdb_user\u003e\nPOSTGRES_HOST_PORT=\u003cdb_host_port\u003e\nPOSTGRES_HOST_NAME=\u003cdb_host_name\u003e\n```\n\n## 🧑‍🤝‍🧑 Developers \n\n| Name           | Email                      |\n| -------------- | -------------------------- |\n| Tom Aston      | mailto:mail@tomaston.dev     |\n\n\u003c!-- MARKDOWN LINKS \u0026 IMAGES --\u003e\n\u003c!-- https://www.markdownguide.org/basic-syntax/#reference-style-links --\u003e\n[contributors-shield]: https://img.shields.io/github/contributors/TomAston1996/doc-similarity-api.svg?style=for-the-badge\n[contributors-url]: https://github.com/TomAston1996/doc-similarity-api/graphs/contributors\n[forks-shield]: 
https://img.shields.io/github/forks/TomAston1996/doc-similarity-api.svg?style=for-the-badge\n[forks-url]: https://github.com/TomAston1996/doc-similarity-api/network/members\n[stars-shield]: https://img.shields.io/github/stars/TomAston1996/doc-similarity-api.svg?style=for-the-badge\n[stars-url]: https://github.com/TomAston1996/doc-similarity-api/stargazers\n[issues-shield]: https://img.shields.io/github/issues/TomAston1996/doc-similarity-api.svg?style=for-the-badge\n[issues-url]: https://github.com/TomAston1996/doc-similarity-api/issues\n[license-shield]: https://img.shields.io/github/license/TomAston1996/doc-similarity-api.svg?style=for-the-badge\n[license-url]: https://github.com/TomAston1996/doc-similarity-api/blob/master/LICENSE.txt\n[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge\u0026logo=linkedin\u0026colorB=555\n[linkedin-url]: https://linkedin.com/in/tomaston96\n[Python]: https://img.shields.io/badge/python-3670A0?style=for-the-badge\u0026logo=python\u0026logoColor=ffdd54\n[FastAPI]: https://img.shields.io/badge/FastAPI-005571?style=for-the-badge\u0026logo=fastapi\n[Postgres]: https://img.shields.io/badge/postgres-%23316192.svg?style=for-the-badge\u0026logo=postgresql\u0026logoColor=white\n[Docker]: https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge\u0026logo=docker\u0026logoColor=white\n[Redis]: https://img.shields.io/badge/redis-%23DD0031.svg?style=for-the-badge\u0026logo=redis\u0026logoColor=white\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaston1996%2Fdoc-similarity-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomaston1996%2Fdoc-similarity-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaston1996%2Fdoc-similarity-api/lists"}