https://github.com/pakio/EsBM25SemanticHybridComparison
(Demo) Elasticsearch with ML node and ingest pipeline for hybrid search (Lexical + Semantic)
- Host: GitHub
- URL: https://github.com/pakio/EsBM25SemanticHybridComparison
- Owner: pakio
- License: other
- Created: 2022-12-03T07:39:36.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2022-12-04T03:05:21.000Z (almost 2 years ago)
- Last Synced: 2024-08-02T13:17:37.109Z (3 months ago)
- Language: Python
- Homepage:
- Size: 1.58 MB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Elasticsearch BM25 vs KNN vs Hybrid comparison system
This is a ready-to-use demo repository for testing Elasticsearch semantic search, with a transformer model embedded in Elasticsearch and run through an ingest pipeline.
![Demo](https://github.com/pakio/EsBM25SemanticHybridComparison/blob/master/demo.gif?raw=true)
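
The three retrieval modes named in the title can be sketched as Elasticsearch search request bodies. This is only an illustration, not the code used in this repository: the field names `text` and `text_embedding.predicted_value`, the boost values, and the helper names are assumptions. It relies on the `knn` option of the search API (available in Elasticsearch 8.4 and later), where supplying both `query` and `knn` combines the two scores.

```python
from typing import Any


def bm25_body(text: str, field: str = "text", size: int = 10) -> dict[str, Any]:
    """Lexical search: a plain `match` query, scored with BM25 by default."""
    return {"size": size, "query": {"match": {field: {"query": text}}}}


def knn_body(
    query_vector: list[float],
    field: str = "text_embedding.predicted_value",  # assumed dense_vector field
    k: int = 10,
    num_candidates: int = 100,
) -> dict[str, Any]:
    """Semantic search: approximate kNN over a dense_vector field."""
    return {
        "size": k,
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
    }


def hybrid_body(
    text: str,
    query_vector: list[float],
    lexical_boost: float = 0.5,  # assumed weights, tune for your data
    knn_boost: float = 0.5,
) -> dict[str, Any]:
    """Hybrid search: `query` and `knn` in one request, weighted by boosts."""
    body = knn_body(query_vector)
    body["knn"]["boost"] = knn_boost
    body["query"] = {"match": {"text": {"query": text, "boost": lexical_boost}}}
    return body
```

In practice the query vector would come from the same model that embedded the documents at ingest time; any of these bodies can then be sent to the `_search` endpoint of the index.
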
- Data
  - Wikimedia enwiki 20221201 dump [url](https://dumps.wikimedia.org/enwiki/20221201/)
- Model
  - sentence-transformers/msmarco-MiniLM-L-12-v3 [link (Hugging Face)](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-12-v3)
# Prerequisites
This repository uses the software, tools, and frameworks below.
- docker
- docker-compose
- python (>3.10)
# How to run
## 1. Launch Elasticsearch and upload model
Run `./Es/setup.sh` to launch Elasticsearch, upload the model, and configure the ingest pipeline.
## 2. Ingest data
Run `./indexer/setup.sh` to download and index the data.
If you see a 429 error, reduce the batch size and retry.
## 3. Launch comparison tool
There is a GUI comparison tool under the `./eval` directory.
Go to the `./eval` directory and run `streamlit run main.py` to launch the comparison tool.
# Note
This repository enables the Elasticsearch trial license in order to use an ML node to run the embedding transformer model in the ingest pipeline.
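
For orientation, an ingest pipeline that embeds text with a deployed model generally takes the shape below. This is a sketch under assumptions rather than the pipeline that `./Es/setup.sh` creates: the model ID (written in the lowercase form Eland typically produces on upload), the source field `text`, and the target field `text_embedding` are all hypothetical. It uses the `inference` ingest processor, which maps a document field to the model's `text_field` input.

```python
import json
from typing import Any


def embedding_pipeline(
    model_id: str,
    source_field: str = "text",            # assumed name of the document text field
    target_field: str = "text_embedding",  # assumed field that receives the result
) -> dict[str, Any]:
    """Build an ingest pipeline body with a single inference processor."""
    return {
        "description": "Embed documents at ingest time",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    # NLP models read their input from `text_field`,
                    # so map the document's own field onto it.
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


# Assumed model ID; check the ID your upload step actually registered.
MODEL_ID = "sentence-transformers__msmarco-minilm-l-12-v3"
print(json.dumps(embedding_pipeline(MODEL_ID), indent=2))
```

The printed JSON is the kind of body that would be sent with `PUT _ingest/pipeline/<name>`. With this configuration the vector is written to `text_embedding.predicted_value`, which must be mapped as a `dense_vector` field for kNN search.
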