Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vivekkdagar/naivebayesclassifier
Multinomial Naive Bayes Language Classification model
- Host: GitHub
- URL: https://github.com/vivekkdagar/naivebayesclassifier
- Owner: vivekkdagar
- License: mit
- Created: 2024-01-30T06:08:26.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-01-30T06:55:44.000Z (12 months ago)
- Last Synced: 2024-10-31T17:44:39.605Z (3 months ago)
- Topics: artificial-intelligence, beautifulsoup4, college-project, github, joblib, kaggle, kaggle-dataset, linux, machine-learning, multinomial-naive-bayes, naive-bayes, naive-bayes-classifier, natural-language-processing, popos, pycharm, python3, scikit-learn, simple-project
- Language: Python
- Homepage:
- Size: 1.04 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Multinomial Naive Bayes Language Classification Model
This repository is a tutorial on language classification with the Multinomial Naive Bayes algorithm. It includes a Python implementation that detects the language of a given text. The code consists of two main files: `main.py`, which handles user interaction, and `detector.py`, which contains the `LanguageClassifier` class.
## Overview
The Multinomial Naive Bayes algorithm is widely used for text classification tasks, including language identification. This tutorial demonstrates how to train a language classifier using a provided dataset and then use the trained model to predict the language of input text.
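To make the idea concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes scoring with Laplace smoothing. It is not the code from `detector.py`, which relies on scikit-learn; the `train` and `predict` helpers, the whitespace tokenizer, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Count documents and tokens per label (hypothetical helper)."""
    class_docs = Counter()               # number of documents per label
    token_counts = defaultdict(Counter)  # token frequencies per label
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        class_docs[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, token_counts, vocab, alpha


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    class_docs, token_counts, vocab, alpha = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        # Log prior: share of training documents carrying this label.
        score = math.log(n_docs / total_docs)
        # Laplace-smoothed denominator shared by every token of this label.
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        for token in text.lower().split():
            if token in vocab:  # tokens never seen in training are skipped
                score += math.log((token_counts[label][token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, not taken from the repository's dataset.
samples = [
    ("the cat sat on the mat", "English"),
    ("where is the station", "English"),
    ("le chat est sur le tapis", "French"),
    ("où est la gare", "French"),
]
model = train(samples)
print(predict(model, "the cat is here"))  # English
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term `alpha` keeps a token that is absent from one language from zeroing out that language's score.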
## Prerequisites
Before running the code, ensure you have the following dependencies installed:
- Python 3
- Required libraries: `requests`, `bs4`, `pandas`, `scikit-learn`, `joblib`

Install the necessary dependencies using the following command:
```bash
pip install requests bs4 pandas scikit-learn joblib
```
## Usage
1. **Clone the Repository:**
```bash
git clone https://github.com/vivekkdagar/NaiveBayesClassifier.git
cd NaiveBayesClassifier
```
2. **Run the Main Script:**
```bash
python3 main.py
```
3. **Select the Data Source and Input Data:**
- Choose the mode ('raw', 'file', or 'website') to input text data.
4. **Results:**
- The predicted language for the provided text will be displayed.
## Code Structure
- `main.py`: Handles user interaction and data input.
- `detector.py`: Contains the `LanguageClassifier` class responsible for training and predicting languages.
## Data Preprocessing
The `LanguageClassifier` class preprocesses the training data by removing special characters and transforming the text into a bag-of-words representation using the `CountVectorizer` from scikit-learn.
## Training the Model
The tutorial trains the Multinomial Naive Bayes model on the provided dataset, "Language Detection.csv". The trained model is then serialized with the `joblib` library so it can be reloaded later without retraining.
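As a rough sketch of this train-then-serialize flow, the example below fits a scikit-learn pipeline and round-trips it through `joblib`. It uses a few in-memory toy sentences instead of the CSV so that it runs on its own; the file name `language_model.joblib` and the choice to store the vectorizer and classifier together in one pipeline are assumptions, not necessarily what `detector.py` does.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the rows of the training CSV.
texts = [
    "the cat sat on the mat",
    "where is the train station",
    "le chat est sur le tapis",
    "où est la gare",
]
labels = ["English", "English", "French", "French"]

# Bundling the vectorizer with the classifier keeps the vocabulary
# and the model in a single serialized object.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Hypothetical output path; a temporary directory keeps the example tidy.
model_path = os.path.join(tempfile.mkdtemp(), "language_model.joblib")
joblib.dump(model, model_path)

# Later, or in another process: reload and predict without retraining.
restored = joblib.load(model_path)
prediction = restored.predict(["where is the cat"])[0]
print(prediction)  # English
```

Files written by `joblib.dump` are pickle-based, so they should only be loaded from sources you trust.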
## Additional Notes
- To modify or extend the training dataset, edit the "Language Detection.csv" file.
- Adjust the HTML tag in the `scrape_website` function within `main.py` based on your specific use case.
## References
- [Language Detection Dataset on Kaggle](https://www.kaggle.com/datasets/basilb2s/language-detection)
- [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4)