https://github.com/rid17pawar/semantic-search-model-experiments
Experiments in semantic search using the BM-25 algorithm, mean of word vectors, and state-of-the-art Transformer-based models, namely USE and SBERT.
- Host: GitHub
- URL: https://github.com/rid17pawar/semantic-search-model-experiments
- Owner: rid17pawar
- Created: 2023-07-02T06:03:50.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-29T10:18:12.000Z (almost 2 years ago)
- Last Synced: 2025-01-17T22:08:49.201Z (9 months ago)
- Topics: bm25, fasttext, fasttext-embeddings, glove, glove-embeddings, information-retrieval, sbert, semantic-search, universal-sentence-encoder, word2vec, word2vec-embeddinngs
- Language: Jupyter Notebook
- Homepage:
- Size: 298 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Semantic-Search-Model-Experiments
## Dataset Used For Semantic Search / Information Retrieval:
[CISI Dataset - Kaggle](https://www.kaggle.com/datasets/dmaso01dsta/cisi-a-dataset-for-information-retrieval)

## Experiments:
#### Experiment-1. Using BM-25 Algorithm and Parameter Tuning For Semantic Search
*BM-25 algorithm variations used:*
- BM25Okapi
- BM25L
- BM25Plus

#### Result:
*BEST MODEL: BM25Plus*
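The variant names above match the classes exposed by the `rank_bm25` Python library. As a rough illustration of what the parameter tuning adjusts, here is a minimal pure-Python sketch of the BM25Okapi scoring formula; the toy corpus, query, and the defaults `k1=1.5`, `b=0.75` are illustrative assumptions, not values from the notebooks.

```python
import math
from collections import Counter

def bm25_okapi_scores(query, corpus, k1=1.5, b=0.75):
    """Score each tokenized document in `corpus` against `query` with the
    BM25Okapi formula; k1 and b are the tunable parameters."""
    N = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / N
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

corpus = [doc.lower().split() for doc in [
    "information retrieval with ranking functions",
    "semantic search over documents",
    "cooking recipes for dinner",
]]
query = "semantic information retrieval".lower().split()
scores = bm25_okapi_scores(query, corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Raising `b` penalizes long documents more aggressively, while `k1` controls how quickly repeated terms saturate; tuning these two (and choosing among Okapi/L/Plus) is what the experiment explores.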
#### Experiment-2. Using Mean of Word Vectors (MWV) with Pretrained Embeddings For Semantic Search
*Word embedding models used:*
- word2vec
- GloVe
- FastText

#### Result:
*BEST MODEL: word2vec*
#### Experiment-3. Using LDA Topic Modelling For Semantic Search
#### Result:
*Performs worse than BM-25*

#### Experiment-4. Using Universal Sentence Encoder (USE) For Semantic Search
*USE Model variations used:*
- Transformer Encoder
- Deep Averaging Network (DAN) Encoder

#### Result:
*BEST MODEL: USE-Transformer*
#### Experiment-5. Using Pretrained and Finetuned Sentence Transformers (SBERT) For Semantic Search
#### Result:
*BEST MODEL: Finetuned SBERT*
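At retrieval time, USE and SBERT are used the same way: encode the corpus and the query into dense sentence embeddings, then rank documents by cosine similarity. The sketch below shows that shared pipeline with a deterministic hash-based stand-in encoder (an assumption for illustration only); in practice `encode` would be replaced by a real model, e.g. `SentenceTransformer(...).encode` from the `sentence-transformers` library or a `tensorflow_hub` USE module.

```python
import hashlib
import numpy as np

def encode(texts, dim=16):
    """Stand-in sentence encoder: deterministic hashed bag-of-words vectors.
    Replace with a real encoder such as
    sentence_transformers.SentenceTransformer(...).encode(texts)."""
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
            out[i, h % dim] += 1.0
    return out

def search(query, corpus, top_k=2):
    """Encode corpus and query, then rank documents by cosine similarity."""
    doc_emb = encode(corpus)
    q_emb = encode([query])[0]
    doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)
    q_emb /= np.linalg.norm(q_emb)
    sims = doc_emb @ q_emb
    order = np.argsort(-sims)[:top_k]
    return [(corpus[i], float(sims[i])) for i in order]

corpus = [
    "bm25 is a lexical ranking function",
    "sentence transformers produce dense embeddings",
    "dense embeddings power semantic search",
]
results = search("dense embeddings for semantic search", corpus)
```

Because corpus embeddings can be computed once and cached, only the query needs encoding at search time, which is what makes the fine-tuned SBERT setup practical as the overall winner below.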
#### Final Result:

**Overall Best Model: Finetuned SBERT**