https://github.com/subhadipsinha722133/duplicate-question-nlp
The "Duplicate Question Pairs" task in Natural Language Processing (NLP) involves determining whether two questions have the same meaning, even if they are phrased differently. The goal is to identify semantic equivalence, not just word-for-word similarity.
deep-learning keras nlp-machine-learning sklearn tensorflow
- Host: GitHub
- URL: https://github.com/subhadipsinha722133/duplicate-question-nlp
- Owner: subhadipsinha722133
- Created: 2025-08-20T11:02:14.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-09-19T09:59:26.000Z (9 months ago)
- Last Synced: 2025-09-19T11:48:37.250Z (9 months ago)
- Topics: deep-learning, keras, nlp-machine-learning, sklearn, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 2.66 MB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Duplicate-Question-NLP
The "Duplicate Question Pairs" task in Natural Language Processing (NLP) involves determining whether two questions have the same meaning, even if they are phrased differently. The goal is to identify semantic equivalence, not just word-for-word similarity.
![Project Sample](https://drive.google.com/file/d/1Kh53CAixypLCEINCyBtOMUJV8abLAMGp/view?usp=drive_link)
## Project Features
- **Semantic Equivalence Detection:** Uses NLP models to determine if a pair of questions are duplicates.
- **Jupyter Notebook Implementation:** Main workflow and experiments are provided as notebooks.
- **Preprocessing Pipelines:** Includes text cleaning, tokenization, and feature engineering.
- **Model Training & Evaluation:** Covers various machine learning models for duplicate detection.
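The preprocessing steps listed above (text cleaning, tokenization, feature engineering) could look roughly like the sketch below. It is an illustration only, not the code from the notebooks: the function names `clean_text`, `tokenize`, and `pair_features`, and the choice of features, are assumptions made for this example.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into whitespace-separated tokens."""
    return clean_text(text).split()


def pair_features(q1: str, q2: str) -> dict[str, float]:
    """Hand-crafted features for a question pair (hypothetical feature set)."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    common = s1 & s2
    union = s1 | s2
    return {
        # Absolute difference in token counts
        "len_diff": abs(len(t1) - len(t2)),
        # Number of distinct words shared by both questions
        "common_words": len(common),
        # Jaccard similarity: shared words divided by all distinct words
        "jaccard": len(common) / len(union) if union else 0.0,
    }


features = pair_features(
    "How do I learn Python?",
    "What is the best way to learn Python?",
)
print(features)
```

The two example questions ask much the same thing yet share only two words, so their Jaccard score is low. This is why word overlap alone is a weak signal and why such features are usually combined with richer text representations.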
## Main Files
- `data/`: Contains datasets used for training and evaluation.
- `notebooks/`: Jupyter Notebooks for data exploration, preprocessing, model training, and analysis.
- `requirements.txt`: List of Python dependencies.
- `README.md`: Project overview and instructions.
## Setup & Usage Instructions
1. **Clone the Repository**
```bash
git clone https://github.com/subhadipsinha722133/Duplicate-Question-NLP.git
cd Duplicate-Question-NLP
```
2. **Install Dependencies**
```bash
pip install -r requirements.txt
```
3. **Run Jupyter Notebooks**
```bash
jupyter notebook
```
- Open any notebook in the `notebooks/` directory to get started.
## Dataset & Model Details
- **Dataset:** Typically the Quora Question Pairs dataset, or a similar dataset of labeled question pairs, is used for training and evaluation. Place your dataset in the `data/` folder.
- **Models:** Implements classifiers such as Logistic Regression and Random Forest, trained on text representations such as TF-IDF and word embeddings.
- **Evaluation Metrics:** Accuracy, Precision, Recall, F1 Score.
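As a worked illustration of the evaluation metrics listed above, the sketch below computes them from scratch for binary labels, where `1` means "duplicate" and `0` means "not duplicate". The helper name `classification_metrics` and the toy labels are assumptions made for this example; they are not taken from the notebooks or from any dataset.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = duplicate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice, the `sklearn.metrics` functions `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` provide the same quantities. The manual version is shown only to make the definitions explicit.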
## Author & Contact
- **Author:** Subhadip Sinha
- **GitHub:** [subhadipsinha722133](https://github.com/subhadipsinha722133)
- **Email:** [sinhasunhadip34@gmail.com](mailto:sinhasunhadip34@gmail.com)