{"id":29555779,"url":"https://github.com/catoverflow/websearch","last_synced_at":"2025-10-06T21:51:58.754Z","repository":{"id":46301725,"uuid":"415460623","full_name":"Catoverflow/WebSearch","owner":"Catoverflow","description":"Simple search engine for class. Support bool search and semantic search.","archived":false,"fork":false,"pushed_at":"2021-12-27T01:55:28.000Z","size":1912,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-18T11:39:22.336Z","etag":null,"topics":["search","search-engine"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Catoverflow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-10T01:38:10.000Z","updated_at":"2021-12-27T01:55:31.000Z","dependencies_parsed_at":"2022-09-19T09:12:15.719Z","dependency_job_id":null,"html_url":"https://github.com/Catoverflow/WebSearch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Catoverflow/WebSearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Catoverflow%2FWebSearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Catoverflow%2FWebSearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Catoverflow%2FWebSearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Catoverflow%2FWebSearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Catoverflow","download_url":"https://codeload.github.com/Catoverflow/WebSearc
h/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Catoverflow%2FWebSearch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278686638,"owners_count":26028325,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["search","search-engine"],"created_at":"2025-07-18T09:30:31.028Z","updated_at":"2025-10-06T21:51:58.728Z","avatar_url":"https://github.com/Catoverflow.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WebSearch\n\nThe Chinese report for this experiment is [report.md](./report.md). The original crawled news data is ~1.9G and is not provided here, but you can download the [processed data](https://drive.google.com/file/d/1Snm6uNhU4FYfE4flS0hBDkuIjeR7Q4jO/view?usp=sharing).\n\nTo use this search engine, extract `output.zip` to the `./output` folder, then run `bool_search.py` or `semantic_search.py`. `process.py` is only used to process the raw data and generate the output.\n\n\u003e Note: The code quality is poor, including but not limited to ill-formed class methods, mixed OPP and OOP code, etc. 
Please open a pull request if you want to improve it; the authors are too busy or too lazy to fix these issues.\n\n## Dependencies\n\n- Install pip dependencies\n\n    `pip install -r requirements.txt`\n\n- Download NLTK data and set `NLTK_DATA` to the download path.\n\n    [wordnet](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/wordnet.zip) -\u003e `NLTK_DATA/corpora/wordnet`\n\n    [stopwords](https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip) -\u003e `NLTK_DATA/corpora/stopwords`\n\n- Download the pre-processed data (linked above)\n\nYou can adjust some parameters in the source code for a better search experience.\n\n## Credits\n\nThanks to [SuzanaK](https://github.com/SuzanaK) for the synonyms list (licensed under [BY-SA 3.0](http://creativecommons.org/licenses/by-sa/3.0/)), and to all the open source tools used in this project.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatoverflow%2Fwebsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcatoverflow%2Fwebsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatoverflow%2Fwebsearch/lists"}