https://github.com/vivekkdagar/naivebayesclassifier

Multinomial Naive Bayes Language Classification model



Host: GitHub
URL: https://github.com/vivekkdagar/naivebayesclassifier
Owner: vivekkdagar
License: mit
Created: 2024-01-30T06:08:26.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-01-30T06:55:44.000Z (over 1 year ago)
Last Synced: 2025-02-12T08:33:00.889Z (5 months ago)
Topics: artificial-intelligence, beautifulsoup4, college-project, github, joblib, kaggle, kaggle-dataset, linux, machine-learning, multinomial-naive-bayes, naive-bayes, naive-bayes-classifier, natural-language-processing, popos, pycharm, python3, scikit-learn, simple-project
Language: Python
Homepage:
Size: 1.04 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Multinomial Naive Bayes Language Classification Model

This repository is a tutorial on language classification with the Multinomial Naive Bayes algorithm. It includes a Python implementation that detects the language of a given text, split across two files: `main.py` for user interaction and `detector.py`, which contains the `LanguageClassifier` class.

## Overview

The Multinomial Naive Bayes algorithm is widely used for text classification tasks, including language identification. This tutorial demonstrates how to train a language classifier using a provided dataset and then use the trained model to predict the language of input text.
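To make the idea concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes scoring using only the standard library. It is not the code in `detector.py`; the `train` and `predict` helpers, the whitespace tokenization, and the two toy sentences are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Collect per-class word counts and class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab, alpha


def predict(model, doc):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    word_counts, class_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(class_counts[label] / total_docs)
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace smoothing keeps unseen class/word pairs non-zero.
                score += math.log(
                    (counts[word] + alpha) / (total_words + alpha * len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(
    ["the cat sat on the mat", "le chat est sur le tapis"],
    ["English", "French"],
)
guess = predict(model, "the cat is on the mat")
print(guess)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is why the scores are sums of logarithms.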

## Prerequisites

Before running the code, ensure you have the following dependencies installed:

- Python

- Required libraries: `requests`, `bs4`, `pandas`, `scikit-learn`, `joblib`

Install the necessary dependencies using the following command:

```bash
pip install requests bs4 pandas scikit-learn joblib
```

## Usage

1. **Clone the Repository:**

   ```bash
   git clone https://github.com/vivekkdagar/NaiveBayesClassifier.git
   cd NaiveBayesClassifier
   ```

2. **Run the Main Script:**

   ```bash
   python3 main.py
   ```

3. **Select a Data Source and Enter the Input:**

   - Choose the input mode (`raw`, `file`, or `website`), then provide the text data.

4. **Results:**

   - The predicted language for the provided text will be displayed.

## Code Structure

- `main.py`: Handles user interaction and data input.

- `detector.py`: Contains the `LanguageClassifier` class responsible for training and predicting languages.

## Data Preprocessing

The `LanguageClassifier` class preprocesses the training data by removing special characters and transforming the text into a bag-of-words representation using the `CountVectorizer` from scikit-learn.
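A minimal sketch of this preprocessing step is shown below. The `clean_text` helper and its regular expression are assumptions made for illustration; the actual cleaning rules in `detector.py` may differ.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer


def clean_text(text: str) -> str:
    """Lowercase the text and replace digits and punctuation with spaces."""
    # \W matches non-word characters; digits and underscores are added explicitly.
    # Unicode letters are kept, so accented and non-Latin text survives.
    return re.sub(r"[\W\d_]+", " ", text.lower()).strip()


samples = ["Hello, world! 123", "¿Cómo estás?", "Bonjour le monde..."]
cleaned = [clean_text(s) for s in samples]

# Bag-of-words: each row is a document, each column counts one token.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(cleaned)

print(cleaned)
print(X.shape)
```

The same fitted vectorizer has to be reused at prediction time so that new text maps onto the columns the model was trained on.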

## Training the Model

The tutorial uses a provided dataset, "Language Detection.csv," to train the Multinomial Naive Bayes model. The model is then serialized using the `joblib` library for future use.
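The train-and-serialize flow can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: a few toy sentences stand in for "Language Detection.csv", the vectorizer and classifier are bundled with `make_pipeline`, and the file name `language_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the CSV; the real dataset would be loaded with pandas.
texts = [
    "the cat sat on the mat", "where is the train station",
    "le chat est sur le tapis", "où est la gare s'il vous plaît",
    "el gato está en la alfombra", "dónde está la estación de tren",
]
labels = ["English", "English", "French", "French", "Spanish", "Spanish"]

# Bundling vectorizer and classifier means both are saved in one file.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Serialize, reload, and predict with the reloaded model.
model_path = os.path.join(tempfile.mkdtemp(), "language_model.joblib")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

prediction = restored.predict(["the station is near the cat"])[0]
print(prediction)
```

Saving the whole pipeline is one design option; saving the vectorizer and classifier separately also works as long as both are loaded before predicting.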

## Additional Notes

- To modify or extend the training dataset, edit the "Language Detection.csv" file.

- Adjust the HTML tag in the `scrape_website` function within `main.py` based on your specific use case.
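The tag adjustment mentioned in the last note can be sketched as follows. The `extract_text` and `fetch_text` helpers are hypothetical and only illustrate the idea; the real `scrape_website` function in `main.py` may be structured differently.

```python
import requests
from bs4 import BeautifulSoup


def extract_text(html: str, tag: str = "p") -> str:
    """Join the text of every `tag` element found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(el.get_text(" ", strip=True) for el in soup.find_all(tag))


def fetch_text(url: str, tag: str = "p") -> str:
    """Download a page and extract the text of the chosen tag."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract_text(response.text, tag)


# Offline demonstration on a literal HTML string.
sample_html = "<html><body><h1>Titre</h1><p>Bonjour</p><p>le monde</p></body></html>"
paragraph_text = extract_text(sample_html, "p")
heading_text = extract_text(sample_html, "h1")
print(paragraph_text)
print(heading_text)
```

Changing the `tag` argument (for example from `p` to `h1` or `div`) selects which part of the page is passed to the classifier.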

## References

- [Language Detection Dataset on Kaggle](https://www.kaggle.com/datasets/basilb2s/language-detection)

- [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4)
