{"id":22751717,"url":"https://github.com/snehawk20/elasticsearch-search-engine","last_synced_at":"2026-05-09T17:35:17.879Z","repository":{"id":109603741,"uuid":"555677113","full_name":"snehawk20/elasticsearch-search-engine","owner":"snehawk20","description":"A search engine which takes keywords as queries and retrieves a ranked list of results","archived":false,"fork":false,"pushed_at":"2023-02-18T22:47:04.000Z","size":135536,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-05T08:51:25.148Z","etag":null,"topics":["beautifulsoup","elasticsearch","flask","okapi-bm25"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snehawk20.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-22T04:08:55.000Z","updated_at":"2023-02-18T23:31:04.000Z","dependencies_parsed_at":"2023-03-21T05:06:02.432Z","dependency_job_id":null,"html_url":"https://github.com/snehawk20/elasticsearch-search-engine","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snehawk20%2Felasticsearch-search-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snehawk20%2Felasticsearch-search-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snehawk20%2Felasticsearch-search-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/snehawk20%2Felasticsearch-search-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snehawk20","download_url":"https://codeload.github.com/snehawk20/elasticsearch-search-engine/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246285666,"owners_count":20752953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","elasticsearch","flask","okapi-bm25"],"created_at":"2024-12-11T05:06:53.662Z","updated_at":"2026-05-09T17:35:12.841Z","avatar_url":"https://github.com/snehawk20.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Elasticsearch-based Search Engine\n\n## Salient features\n\n- Scraping\n    - Scraped ~7000 documents with `BeautifulSoup`, using ``https://en.wikipedia.org/wiki/Science_fiction_film`` as the seed\n    - Customizable depth\n    - Duplicate detection\n    - Saved in `.json` format with `paragraphs`, `table of contents`, `url` and `title` as fields\n\n- Tokenization\n    - Standard tokenizer\n    - Token filters: `stop`, `lowercase`, `snowball stemmer`\n\n- Support for `BM25` and `Jelinek-Mercer` Language Model\n\n- Retrieval of the top `k` relevant documents in ranked order\n\n- Support for `conjunctive` and `disjunctive` queries\n\n- User interface with the following features\n    - `Dropdown keyword suggestions` based on Levenshtein distance using fuzzy search\n    - `Snippets` that display the most relevant fragments, built using the `unified highlighter`\n    - Interface to change between the 
models and modes as per the user's requirements\n    - Displaying results as clickable links for better access\n\n## To run\n```\npython3 run.py\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnehawk20%2Felasticsearch-search-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnehawk20%2Felasticsearch-search-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnehawk20%2Felasticsearch-search-engine/lists"}