https://github.com/snehawk20/elasticsearch-search-engine

A search engine which takes keywords as queries and retrieves a ranked list of results

beautifulsoup elasticsearch flask okapi-bm25


# Elasticsearch-based Search Engine

## Salient features

- Scraping
  - Scraped ~7000 documents with `BeautifulSoup`, using `https://en.wikipedia.org/wiki/Science_fiction_film` as the seed
  - Customizable depth
  - Duplicate detection
  - Saved in `.json` format with `paragraphs`, `table of contents`, `url` and `title` as fields
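
The depth limit and duplicate detection can be illustrated with a small breadth-first crawl. This is only a sketch, not the repository's code: the `crawl` function, the injected `get_links` callable, and the toy link graph are hypothetical, and duplicates are assumed to be detected at the URL level. In the real scraper, `get_links` would presumably fetch a page and extract its links with `BeautifulSoup`.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, max_depth: int,
          get_links: Callable[[str], Iterable[str]]) -> list[str]:
    """Breadth-first crawl from `seed`, visiting each URL at most once."""
    seen = {seed}                # duplicate detection: URLs already queued
    queue = deque([(seed, 0)])   # (url, depth) pairs still to visit
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:   # depth limit reached: do not expand this page
            continue
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph standing in for real pages (hypothetical names).
graph = {"seed": ["a", "b"], "a": ["b", "c"], "b": ["seed"], "c": ["d"]}
pages = crawl("seed", max_depth=2, get_links=lambda u: graph.get(u, []))
```

With `max_depth=2`, page `d` is never visited because it is three links away from the seed, and `b` is visited once even though two pages link to it.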

- Tokenization
  - Standard tokenizer
  - Token filters: `stop`, `lowercase`, `snowball stemmer`
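
An analysis chain like this is typically declared in the index settings. The dictionary below is a sketch of what such a configuration might look like, assuming a recent Elasticsearch version without mapping types. The names `article_text` and `english_snowball`, the English stemmer language, and the filter order (lowercasing before stop-word removal) are assumptions, not taken from the repository.

```python
# Index settings with a custom analyzer: standard tokenizer followed by
# lowercase, stop-word, and Snowball stemming filters.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Snowball stemmer configured for English (assumed language).
                "english_snowball": {"type": "snowball", "language": "English"},
            },
            "analyzer": {
                "article_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "english_snowball"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Field names follow the scraped JSON format described above.
            "title": {"type": "text", "analyzer": "article_text"},
            "paragraphs": {"type": "text", "analyzer": "article_text"},
            "url": {"type": "keyword"},
        }
    },
}
```

A body of this shape would be passed when creating the index, so that both indexing and querying go through the same analyzer.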

- Support for `BM25` and `Jelinek-Mercer` Language Model
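
Elasticsearch exposes both ranking models as similarity types, `BM25` and `LMJelinekMercer`. The helper below sketches one way to build the corresponding index settings. The function name `similarity_settings` and the model keys `"bm25"` and `"lm_jm"` are hypothetical, the parameter values are simply Elasticsearch's documented defaults, and how this project actually switches between models is not shown in the README.

```python
def similarity_settings(model: str) -> dict:
    """Return index settings selecting the default similarity for `model`."""
    if model == "bm25":
        # k1 controls term-frequency saturation, b controls length normalization.
        similarity = {"type": "BM25", "k1": 1.2, "b": 0.75}
    elif model == "lm_jm":
        # lambda is the Jelinek-Mercer smoothing weight.
        similarity = {"type": "LMJelinekMercer", "lambda": 0.1}
    else:
        raise ValueError(f"unknown model: {model!r}")
    return {"settings": {"index": {"similarity": {"default": similarity}}}}


bm25_settings = similarity_settings("bm25")
lm_settings = similarity_settings("lm_jm")
```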

- Retrieval of top `k` relevant documents in order

- Support for `conjunctive` and `disjunctive` queries
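
Top-`k` retrieval and the two query modes map naturally onto a `match` query: `size` limits the number of hits, and `operator` selects whether all terms (`and`) or any term (`or`) must match. The sketch below assumes the search runs against the `paragraphs` field; the function name `build_query` and its parameters are hypothetical.

```python
def build_query(text: str, k: int = 10, conjunctive: bool = False) -> dict:
    """Build a search body returning the top `k` hits for `text`."""
    return {
        "size": k,  # hits are returned in descending relevance order by default
        "query": {
            "match": {
                "paragraphs": {
                    "query": text,
                    # "and": every term must match; "or": any term may match
                    "operator": "and" if conjunctive else "or",
                }
            }
        },
    }


conjunctive_body = build_query("time travel", k=5, conjunctive=True)
disjunctive_body = build_query("time travel", k=5)
```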

- User interface with the following features
  - `Dropdown keyword suggestions` based on Levenshtein distance, using fuzzy search
  - `Snippets` that display the most relevant fragments, built using the `unified highlighter`
  - Interface to switch between the models and modes as per the user's requirements
  - Results displayed as clickable links for easier access
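
The suggestion and snippet features correspond to two standard pieces of the Elasticsearch query DSL: the `fuzziness` option of a `match` query, which tolerates a bounded edit distance, and the `highlight` section with `"type": "unified"`. The following is a sketch only: the helper names, the choice of `title` as the suggestion field, and the fragment count and size are assumptions.

```python
def suggestion_query(typed: str, limit: int = 5) -> dict:
    """Fuzzy-match what the user has typed against document titles."""
    return {
        "size": limit,
        "_source": ["title"],
        "query": {
            # "AUTO" picks the allowed edit distance from the term length.
            "match": {"title": {"query": typed, "fuzziness": "AUTO"}}
        },
    }


def with_snippets(search_body: dict, field: str = "paragraphs") -> dict:
    """Return a copy of `search_body` that also requests highlighted fragments."""
    body = dict(search_body)
    body["highlight"] = {
        "fields": {
            field: {"type": "unified", "number_of_fragments": 3, "fragment_size": 150}
        }
    }
    return body


suggest_body = suggestion_query("scince fiction")
search_body = with_snippets({"query": {"match": {"paragraphs": "alien invasion"}}})
```

Each hit in the response to `search_body` would then carry a `highlight` entry with the matching fragments, which the interface can render as the snippet.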

## To run
```
python3 run.py
```