https://github.com/dummumounika/ecommerce-sales-categorization
This repository contains Python code for text classification and analysis of e-commerce sales data. The script processes textual descriptions of products and categorizes them into predefined categories using a Naive Bayes classifier. It also includes various analysis and visualization methods to explore the dataset.
machine-learning matplotlib-pyplot ntlk numpy pandas python scikit-learn
- Host: GitHub
- URL: https://github.com/dummumounika/ecommerce-sales-categorization
- Owner: DummuMounika
- Created: 2024-05-08T05:38:52.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2024-05-08T23:47:06.000Z (8 months ago)
- Last Synced: 2024-11-08T23:32:55.559Z (2 months ago)
- Topics: machine-learning, matplotlib-pyplot, ntlk, numpy, pandas, python, scikit-learn
- Language: Python
- Homepage:
- Size: 20.9 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Ecommerce-Sales-Categorization
Python script for text classification and analysis of e-commerce sales data.
# Overview
This repository contains a Python script that processes textual descriptions of products from an e-commerce dataset and categorizes them into predefined categories using a Naive Bayes classifier. Additionally, the script provides various analysis and visualization methods to explore the dataset, including plotting category distribution, analyzing top customers, and visualizing sales by country and month.
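The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the code in text_classifier.py: the toy data, the variable names, and the choice of `TfidfVectorizer` with `MultinomialNB` are assumptions, and scikit-learn's built-in tokenizer stands in for the NLTK preprocessing the script uses. Only the 'Description' and 'Category' column names come from this README.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical); a real run would load a CSV
# that has 'Description' and 'Category' columns.
train = pd.DataFrame({
    "Description": [
        "red ceramic coffee mug",
        "blue ceramic tea cup",
        "cotton t-shirt with print",
        "wool winter sweater",
    ],
    "Category": ["Kitchen", "Kitchen", "Clothing", "Clothing"],
})

# Turn descriptions into TF-IDF features and fit a multinomial Naive Bayes model.
model = make_pipeline(TfidfVectorizer(lowercase=True), MultinomialNB())
model.fit(train["Description"], train["Category"])

# Predict a category for each unseen product description.
new_items = pd.DataFrame({"Description": ["green ceramic mug", "cotton sweater"]})
new_items["Category"] = model.predict(new_items["Description"])

# Count predictions per category, the kind of table a
# category-distribution plot would be drawn from.
category_counts = new_items["Category"].value_counts()
print(new_items)
print(category_counts)
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned during training tied to the model, so new descriptions are transformed the same way at prediction time.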
# Features
**Text Classification:** Utilizes a Naive Bayes classifier to categorize product descriptions.
**Natural Language Processing (NLP):** Preprocesses text data with tokenization and lemmatization, and filters out invalid words.
**Analysis and Visualization:** Provides insights into the dataset through various analysis and visualization methods.
**Error Handling:** Handles file loading errors and unexpected errors during execution.
# Usage
1. Ensure Python and required libraries are installed.
2. Clone this repository to your local machine.
3. Prepare the training dataset in CSV format with 'Description' and 'Category' columns.
4. Run the script (text_classifier.py), passing the paths to the input files as arguments.
5. Explore the output files, such as predicted_categories.json and predicted_categories.csv.
6. Analyze the results and visualizations generated by the script.
# Acknowledgements
1. scikit-learn: Library for machine learning in Python.
2. NLTK: Toolkit for natural language processing.
3. Matplotlib: Visualization library in Python.
# Author
MOUNIKA DUMMU
MANEESH SETTIPETA
VIKRAM SAMUDRALA