https://github.com/subhadipsinha722133/duplicate-question-nlp

The "Duplicate Question Pairs" task in Natural Language Processing (NLP) involves determining whether two questions have the same meaning, even if they are phrased differently. The goal is to identify semantic equivalence, not just word-for-word similarity.

deep-learning keras nlp-machine-learning sklearn tensorflow


# Duplicate-Question-NLP
The "Duplicate Question Pairs" task in Natural Language Processing (NLP) involves determining whether two questions have the same meaning, even if they are phrased differently. The goal is to identify semantic equivalence, not just word-for-word similarity.

![Project Sample](https://drive.google.com/file/d/1Kh53CAixypLCEINCyBtOMUJV8abLAMGp/view?usp=drive_link)
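
To illustrate why word-for-word similarity is not enough, here is a minimal sketch that scores two pairs by Jaccard word overlap. The example questions and the helper names `tokens` and `jaccard` are made up for illustration and are not taken from the project's notebooks.

```python
import re


def tokens(question):
    """Lowercase a question and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard(q1, q2):
    """Word-overlap score: shared words divided by all distinct words."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical pairs: the first has the same meaning but few shared words,
# the second shares most words but asks about different things.
paraphrase = ("How can I learn Python quickly?",
              "What is the fastest way to study Python?")
lookalike = ("How do I cook rice?", "How do I cook pasta?")

print("paraphrase overlap:", round(jaccard(*paraphrase), 3))
print("lookalike overlap:", round(jaccard(*lookalike), 3))
```

The pair with different meanings scores higher on word overlap than the pair with the same meaning, which is the gap that a semantic model is meant to close.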

## Project Features

- **Semantic Equivalence Detection:** Uses NLP models to decide whether two questions are duplicates.
- **Jupyter Notebook Implementation:** The main workflow and experiments are provided as notebooks.
- **Preprocessing Pipelines:** Includes text cleaning, tokenization, and feature engineering.
- **Model Training & Evaluation:** Trains and evaluates several machine learning models for duplicate detection.
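
The preprocessing steps listed above (cleaning, tokenization, feature engineering) might look roughly like the sketch below. The contraction table, the function names, and the three features are assumptions chosen for illustration; the notebooks may use different steps.

```python
import re

# Small, hypothetical contraction table; a real pipeline would use a longer one.
CONTRACTIONS = {
    "what's": "what is",
    "can't": "cannot",
    "don't": "do not",
    "i'm": "i am",
}


def clean(text):
    """Lowercase, expand a few contractions, and strip punctuation."""
    text = text.lower().strip()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # punctuation becomes spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def tokenize(text):
    """Split cleaned text into a list of word tokens."""
    return clean(text).split()


def pair_features(q1, q2):
    """Compute a few simple hand-crafted features for a question pair."""
    t1, t2 = tokenize(q1), tokenize(q2)
    return {
        "len_diff": abs(len(t1) - len(t2)),
        "common_words": len(set(t1) & set(t2)),
        "same_first_word": int(bool(t1) and bool(t2) and t1[0] == t2[0]),
    }


print(clean("What's  the BEST way?"))
print(pair_features("What's Python?", "What is Python used for?"))
```

Features like these can be passed to a classifier alongside, or instead of, vector representations of the text.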

## Main Files

- `data/`: Contains datasets used for training and evaluation.
- `notebooks/`: Jupyter Notebooks for data exploration, preprocessing, model training, and analysis.
- `requirements.txt`: List of Python dependencies.
- `README.md`: Project overview and instructions.

## Setup & Usage Instructions

1. **Clone the Repository**

   ```bash
   git clone https://github.com/subhadipsinha722133/Duplicate-Question-NLP.git
   cd Duplicate-Question-NLP
   ```

2. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Run Jupyter Notebooks**

   ```bash
   jupyter notebook
   ```

   Open any notebook in the `notebooks/` directory to get started.

## Dataset & Model Details

- **Dataset:** Uses the Quora Question Pairs dataset (or a similar dataset of question pairs) for training and evaluation. Place the dataset in the `data/` folder.
- **Models:** Implements classifiers such as Logistic Regression and Random Forest, with text represented using techniques such as TF-IDF and word embeddings.
- **Evaluation Metrics:** Accuracy, Precision, Recall, F1 Score.
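
As a rough sketch of how these pieces fit together, the example below trains a Logistic Regression classifier on TF-IDF vectors with scikit-learn and reports the four metrics. The question pairs are hand-written toy data, not Quora data, and concatenating the two TF-IDF vectors is only one assumed way to represent a pair; the notebooks may do it differently.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy (question1, question2, is_duplicate) rows, for illustration only.
TRAIN_PAIRS = [
    ("How do I learn Python?", "What is the best way to learn Python?", 1),
    ("How can I lose weight?", "What is a good way to lose weight?", 1),
    ("How do I bake bread?", "What is the way to bake bread at home?", 1),
    ("How do I learn Python?", "How do I bake bread?", 0),
    ("How can I lose weight?", "What is the capital of France?", 0),
    ("Who wrote Hamlet?", "How do I fix a flat tire?", 0),
]
TEST_PAIRS = [
    ("How do I learn to bake bread?", "What is the best way to bake bread?", 1),
    ("Who wrote Hamlet?", "How can I lose weight?", 0),
]


def pair_matrix(vectorizer, pairs):
    """Concatenate the TF-IDF vectors of both questions, one row per pair."""
    q1 = vectorizer.transform([p[0] for p in pairs])
    q2 = vectorizer.transform([p[1] for p in pairs])
    return hstack([q1, q2]).tocsr()


def train_and_evaluate(train_pairs, test_pairs):
    """Fit TF-IDF + Logistic Regression and score it on held-out pairs."""
    vectorizer = TfidfVectorizer(lowercase=True)
    vectorizer.fit([q for p in train_pairs for q in p[:2]])

    model = LogisticRegression(max_iter=1000)
    model.fit(pair_matrix(vectorizer, train_pairs), [p[2] for p in train_pairs])

    y_true = [p[2] for p in test_pairs]
    y_pred = model.predict(pair_matrix(vectorizer, test_pairs))
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
        "predictions": y_pred.tolist(),
    }


metrics = train_and_evaluate(TRAIN_PAIRS, TEST_PAIRS)
print(metrics)
```

With so few examples the scores carry no meaning; the point is the shape of the workflow. Swapping `LogisticRegression` for `sklearn.ensemble.RandomForestClassifier` leaves the rest of the code unchanged.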

## Author & Contact

- **Author:** Subhadip Sinha
- **GitHub:** [subhadipsinha722133](https://github.com/subhadipsinha722133)
- **Email:** [sinhasunhadip34@gmail.com](mailto:sinhasunhadip34@gmail.com)