{"id":16548662,"url":"https://github.com/george-gca/ai_papers_search_tool","last_synced_at":"2026-05-02T01:33:50.662Z","repository":{"id":150309488,"uuid":"553331237","full_name":"george-gca/ai_papers_search_tool","owner":"george-gca","description":"Automatic paper clustering and search tool by fastext from Facebook Research","archived":false,"fork":false,"pushed_at":"2024-11-25T19:25:20.000Z","size":81,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-01-14T16:05:04.269Z","etag":null,"topics":["fasttext","fasttext-embeddings","fasttext-python","nlp","python","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/george-gca.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-18T03:51:46.000Z","updated_at":"2024-11-25T19:25:24.000Z","dependencies_parsed_at":"2024-10-28T21:33:00.343Z","dependency_job_id":null,"html_url":"https://github.com/george-gca/ai_papers_search_tool","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/george-gca%2Fai_papers_search_tool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/george-gca%2Fai_papers_search_tool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/george-gca%2Fai_papers_search_tool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/george-gca%2Fai_papers_search_tool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/george-gca","download_url":"https://codeload.github.com/george-gca/ai_papers_search_tool/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241852158,"owners_count":20030969,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fasttext","fasttext-embeddings","fasttext-python","nlp","python","scikit-learn"],"created_at":"2024-10-11T19:26:38.907Z","updated_at":"2026-05-02T01:33:50.624Z","avatar_url":"https://github.com/george-gca.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Papers Search Tool\n\nAutomatic paper clustering and search tool by [fastText from Facebook Research](https://fasttext.cc/).\n\nBased on [CVPR_paper_search_tool by Jin Yamanaka](https://github.com/jiny2001/CVPR_paper_search_tool). 
I decided to split the code into multiple projects:\n\n- [AI Papers Scrapper](https://github.com/george-gca/ai_papers_scrapper) - Download paper PDFs and other information from the main AI conferences\n- [AI Papers Cleaner](https://github.com/george-gca/ai_papers_cleaner) - Extract text from paper PDFs and abstracts, and remove uninformative words\n- this project - Automatic paper clustering\n- [AI Papers Searcher](https://github.com/george-gca/ai_papers_searcher) - Web app to search papers by keywords or similar papers\n- [AI Conferences Info](https://github.com/george-gca/ai_conferences_info) - Contains the titles, abstracts, URLs, and author names extracted from the papers\n\nI also added support for more conferences in a single web app, customized it a little further, and hosted it on [PythonAnywhere](https://www.pythonanywhere.com/). You can see a running example of the web app [here](https://georgegca.pythonanywhere.com/).\n\n## Requirements\n\n[Docker](https://www.docker.com/) or, for local installation:\n\n- Python 3.10+\n- [Poetry](https://python-poetry.org/docs/)\n\n\u003e Note: Poetry installation is currently not working due to [a bug when installing fasttext](https://github.com/facebookresearch/fastText/pull/1292).\n\n## Usage\n\nTo make it easier to run the code, with or without Docker, I created a few helpers. Both approaches use `start_here.sh` as an entry point. Since calling the code directly has a few quirks, this file collects all the commands needed to run it. All you need to do is uncomment the relevant lines and run the script:\n\n```bash\ntrain_paper_finder=1\ncreate_for_app=1\n# skip_train_paper_finder=1\n```\n\n### Running without Docker\n\nYou first need to install [Python Poetry](https://python-poetry.org/docs/). 
Then, you can install the dependencies and run the code:\n\n```bash\npoetry install\nbash start_here.sh\n```\n\n### Running with Docker\n\nTo help with the Docker setup, I created a `Dockerfile` and a `Makefile`. The `Dockerfile` contains all the instructions to create the Docker image. The `Makefile` contains the commands to build the image, run the container, and run the code inside the container. To build the image, simply run:\n\n```bash\nmake\n```\n\nTo call `start_here.sh` inside the container, run:\n\n```bash\nmake run\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorge-gca%2Fai_papers_search_tool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgeorge-gca%2Fai_papers_search_tool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorge-gca%2Fai_papers_search_tool/lists"}