https://github.com/rid17pawar/semantic-search-model-experiments
Experiments in semantic search using the BM-25 algorithm, mean of word vectors, and state-of-the-art Transformer-based models, namely USE and SBERT.
- Host: GitHub
- URL: https://github.com/rid17pawar/semantic-search-model-experiments
- Owner: rid17pawar
- Created: 2023-07-02T06:03:50.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-29T10:18:12.000Z (almost 2 years ago)
- Last Synced: 2025-01-17T22:08:49.201Z (9 months ago)
- Topics: bm25, fasttext, fasttext-embeddings, glove, glove-embeddings, information-retrieval, sbert, semantic-search, universal-sentence-encoder, word2vec, word2vec-embeddinngs
- Language: Jupyter Notebook
- Homepage:
- Size: 298 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Semantic-Search-Model-Experiments
## Dataset Used For Semantic Search / Information Retrieval:
[CISI Dataset - Kaggle](https://www.kaggle.com/datasets/dmaso01dsta/cisi-a-dataset-for-information-retrieval)

## Experiments:
#### Experiment-1. Using BM-25 Algorithm and Parameter Tuning For Semantic Search
*BM-25 algorithm variations used:*
- BM25Okapi
- BM25L
- BM25Plus

#### Result:
*BEST MODEL: BM25Plus*
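The variant names above match the classes exposed by the `rank_bm25` Python library. As a rough illustration of what the parameter tuning adjusts, here is a minimal pure-Python sketch of the BM25Okapi scoring formula; the toy corpus, query, and the defaults `k1=1.5`, `b=0.75` are illustrative assumptions, not values from the notebooks.

```python
import math
from collections import Counter

def bm25_okapi_scores(query, corpus, k1=1.5, b=0.75):
    """Score each tokenized document in `corpus` against `query` with the
    BM25Okapi formula; k1 and b are the tunable parameters."""
    N = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / N
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

corpus = [doc.lower().split() for doc in [
    "information retrieval with ranking functions",
    "semantic search over documents",
    "cooking recipes for dinner",
]]
query = "semantic information retrieval".lower().split()
scores = bm25_okapi_scores(query, corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Raising `b` penalizes long documents more aggressively, while `k1` controls how quickly repeated terms saturate; tuning these two (and choosing among Okapi/L/Plus) is what the experiment explores.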
#### Experiment-2. Using Mean of Word Vectors (MWV) with Pretrained Embeddings For Semantic Search
*Word embedding models used:*
- word2vec
- GloVe
- FastText

#### Result:
*BEST MODEL: word2vec*
#### Experiment-3. Using LDA Topic Modelling For Semantic Search
#### Result:
*Performs worse than BM-25*

#### Experiment-4. Using Universal Sentence Encoder (USE) For Semantic Search
*USE Model variations used:*
- Transformer Encoder
- Deep Averaging Network (DAN) Encoder

#### Result:
*BEST MODEL: USE-Transformer*
#### Experiment-5. Using Pretrained and Finetuned Sentence Transformers (SBERT) For Semantic Search
#### Result:
*BEST MODEL: Finetuned SBERT*
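At retrieval time, USE and SBERT are used the same way: encode the corpus and the query into dense sentence embeddings, then rank documents by cosine similarity. The sketch below shows that shared pipeline with a deterministic hash-based stand-in encoder (an assumption for illustration only); in practice `encode` would be replaced by a real model, e.g. `SentenceTransformer(...).encode` from the `sentence-transformers` library or a `tensorflow_hub` USE module.

```python
import hashlib
import numpy as np

def encode(texts, dim=16):
    """Stand-in sentence encoder: deterministic hashed bag-of-words vectors.
    Replace with a real encoder such as
    sentence_transformers.SentenceTransformer(...).encode(texts)."""
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
            out[i, h % dim] += 1.0
    return out

def search(query, corpus, top_k=2):
    """Encode corpus and query, then rank documents by cosine similarity."""
    doc_emb = encode(corpus)
    q_emb = encode([query])[0]
    doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)
    q_emb /= np.linalg.norm(q_emb)
    sims = doc_emb @ q_emb
    order = np.argsort(-sims)[:top_k]
    return [(corpus[i], float(sims[i])) for i in order]

corpus = [
    "bm25 is a lexical ranking function",
    "sentence transformers produce dense embeddings",
    "dense embeddings power semantic search",
]
results = search("dense embeddings for semantic search", corpus)
```

Because corpus embeddings can be computed once and cached, only the query needs encoding at search time, which is what makes the fine-tuned SBERT setup practical as the overall winner below.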
#### Final Result:

**Overall Best Model: Finetuned SBERT**