https://github.com/faisalahmedbijoy/document-similarity-using-doc2vec-and-gensim

Document Similarity Measurement Using Doc2Vec and Gensim Library
Host: GitHub
URL: https://github.com/faisalahmedbijoy/document-similarity-using-doc2vec-and-gensim
Owner: FaisalAhmedBijoy
Created: 2022-07-07T18:36:16.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-09-24T19:11:55.000Z (almost 2 years ago)
Last Synced: 2025-04-11T10:45:11.673Z (3 months ago)
Topics: doc2vec-model, doc2vec-word2vec, docuemntation-tool, gensim, gensim-library, ner, nlp, nsl
Language: Python
Homepage:
Size: 119 MB
Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Document-similarity-using-doc2vec-and-gensim
Python implementation of document similarity checking using Doc2Vec and the Gensim library.
## File structure
```text
Document-similarity-using-doc2vec-and-gensim/
├── data/
│   ├── 20news-bydate.tar.gz
│   ├── 20news-bydate-test
│   └── 20news-bydate-train
├── models/
│   ├── doc2vec_model.bin
│   ├── doc2vec_model.model
│   ├── doc2vec_vector.txt
│   └── doc2vec_model.bin.dv.vectors.npy
├── dataset_preprocess.py
├── inference.py
├── README.md
├── requirements.txt
└── train.py
```
- **data/train_data.txt**: Training data file
- **models/doc2vec_model.bin**: Trained Doc2Vec model file
- **models/doc2vec_model.bin.dv.vectors.npy**: Document vectors file for the trained model
- **README.md**: Project documentation file
- **requirements.txt**: Required Python packages
- **inference.py**: Script to check similarity between two documents
- **train.py**: Script to train the Doc2Vec model

## Installation
The pinned dependencies are listed in `requirements.txt`:
```text
gensim==4.2.0
nltk==3.5
numpy==1.23.2
pandas==1.2.0
scikit_learn==0.23.2
```
Install the required packages:

```bash
pip install -r requirements.txt
```

## Training the Doc2Vec model
```bash
python train.py
```
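
The contents of `train.py` are not reproduced here, so the following is only a minimal sketch of what training a Doc2Vec model with Gensim typically involves. The helper names `tokenize` and `train_doc2vec`, the regex tokenizer, and the hyperparameter values are assumptions made for illustration, not the repository's actual code.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def train_doc2vec(texts, model_path, vector_size=50, min_count=2, epochs=40):
    """Train a Doc2Vec model on a list of raw strings and save it to model_path.

    Hypothetical helper: the parameter values are illustrative defaults.
    """
    # Imported inside the function so that tokenize() works without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each document gets its list index as a tag.
    corpus = [
        TaggedDocument(words=tokenize(text), tags=[i])
        for i, text in enumerate(texts)
    ]

    model = Doc2Vec(vector_size=vector_size, min_count=min_count, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    model.save(model_path)
    return model
```

With `texts` holding the raw training documents as strings, a call such as `train_doc2vec(texts, "models/doc2vec_model.model")` would write the model to one of the paths shown in the file structure.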

## Inference
Check the similarity between two documents:
```bash
python inference.py
```
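
One common way to compare two documents with a trained Doc2Vec model is to infer a vector for each and take their cosine similarity. The sketch below assumes that approach; `inference.py` may work differently. The function names `cosine_similarity`, `document_similarity`, and `load_model`, and the whitespace tokenization, are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return float(dot / (norm_u * norm_v))


def document_similarity(model, text_a, text_b):
    """Infer a vector for each text and compare them.

    `model` is any object with an infer_vector(tokens) method,
    such as a loaded gensim Doc2Vec model.
    """
    vec_a = model.infer_vector(text_a.lower().split())
    vec_b = model.infer_vector(text_b.lower().split())
    return cosine_similarity(vec_a, vec_b)


def load_model(path):
    """Load a saved Doc2Vec model (requires gensim)."""
    from gensim.models.doc2vec import Doc2Vec

    return Doc2Vec.load(path)
```

A call might look like `document_similarity(load_model("models/doc2vec_model.model"), doc_a, doc_b)`, where `doc_a` and `doc_b` are plain strings. Scores closer to 1 indicate more similar documents, and because vector inference is itself a short training pass, the score may vary slightly between runs.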
