https://github.com/smile040501/search-engine
A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from a seed Wiki page and uses Elasticsearch as its full-text search engine.
- Host: GitHub
- URL: https://github.com/smile040501/search-engine
- Owner: Smile040501
- License: mit
- Created: 2023-01-07T18:41:21.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-13T14:36:39.000Z (over 2 years ago)
- Last Synced: 2025-01-16T10:35:27.154Z (9 months ago)
- Topics: dirichlet-smoothing, elasticsearch, elasticsearch-client, information-retrieval, lavenshtein, marvel, marvel-wiki, material-ui, nodejs-server, okapi-bm25, react, scrapy-crawler, search-engine, web-scraping, wikipedia-crawler
- Language: JavaScript
- Homepage:
- Size: 66.8 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Search Engine
A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from the seed Wiki page [List of Marvel Cinematic Universe films](https://en.wikipedia.org/wiki/List_of_Marvel_Cinematic_Universe_films) and uses [**Elasticsearch**](https://www.elastic.co/elasticsearch/) as its full-text search engine. On top of Elasticsearch, a search portal built with [React.js](https://reactjs.org/) and [Node.js](https://nodejs.org/en/) lets users enter a query and view the retrieved results.
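As a rough illustration of how a Node.js layer might turn a keyword query from the portal into an Elasticsearch request, here is a minimal sketch. The helper name `buildSearchBody` and the field name `content` are assumptions made for this example and are not taken from the repository. It relies on the `operator` option of Elasticsearch's `match` query, where `"and"` requires every keyword to match and `"or"` accepts any of them.

```javascript
// Hypothetical helper: "buildSearchBody" and the "content" field are
// illustrative assumptions, not the project's actual code or index mapping.
function buildSearchBody(keywords, { conjunctive = false, size = 10 } = {}) {
  return {
    // Number of ranked results to return
    size,
    query: {
      match: {
        content: {
          query: keywords,
          // "and": every keyword must match; "or": any keyword may match
          operator: conjunctive ? "and" : "or",
        },
      },
    },
  };
}

// Example: require both "iron" and "man", and return the top 5 hits
const body = buildSearchBody("iron man", { conjunctive: true, size: 5 });
console.log(JSON.stringify(body, null, 2));
```

The resulting object could be sent as the request body to an index's `_search` endpoint. Ranking would then follow whichever similarity the index is configured with.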
## Features
- Cleaning and pre-processing of the scraped data
- Proper visualization of the ranked list of pages that hold the relevant answers
- Support for the Okapi BM25 and LM-Dirichlet scoring models
- Query keyword suggestions based on Levenshtein edit distance
- Support for both disjunctive and conjunctive keyword queries
- A configuration window for users to choose any of the scoring models and the number of results to show on the result page

## License
[MIT](LICENSE)
## Author
**Mayank Singla**
- [**GitHub**][github]
- [**LinkedIn**][linkedin]

[github]: https://github.com/Smile040501
[linkedin]: https://www.linkedin.com/in/mayank-singla-001pt