https://github.com/fyt3rp4til/fasttext-meta-ecommerce-category-classification
- Host: GitHub
- URL: https://github.com/fyt3rp4til/fasttext-meta-ecommerce-category-classification
- Owner: FYT3RP4TIL
- Created: 2024-09-04T14:24:08.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-09-08T05:04:02.000Z (8 months ago)
- Last Synced: 2025-01-22T19:12:02.081Z (4 months ago)
- Topics: fasttext-model, natural-language-processing, pandas-dataframe, regex, sklearn
- Language: Jupyter Notebook
- Size: 7.61 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# **FastText-(Meta)-Ecommerce-Category-Classification**
This project demonstrates how to perform text classification on e-commerce product descriptions using FastText.
## Dataset
The dataset used in this project contains e-commerce item descriptions categorized into four classes:
1. Household
2. Electronics
3. Clothing and Accessories
4. Books

Dataset source: [Kaggle - E-commerce Text Classification](https://www.kaggle.com/datasets/saurabhshahane/ecommerce-text-classification)
## Data Preparation
### Loading the Data
We use pandas to load and inspect the dataset:
```python
import pandas as pd

# The CSV has no header row, so column names are supplied explicitly.
df = pd.read_csv("ecommerce_dataset.csv", names=["category", "description"], header=None)
print(df.shape)
df.head(3)
```

Output:
```
(50425, 2)
    category                                        description
0  Household  Paper Plane Design Framed Wall Hanging Motivat...
1  Household  SAF 'Floral' Framed Painting (Wood, 30 inch x ...
2  Household  SAF 'UV Textured Modern Art Print Framed' Pain...
```

### Preparing Labels for FastText
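FastText expects every training line to begin with its label, prefixed with `__label__`. A label must be a single token without spaces, so a multi-word category such as Clothing and Accessories has to be collapsed first; the predictions later in this README show it as `__label__clothing_accessories`, although that normalization step is not shown here. We then create a new column combining the label and description: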
```python
df['category'] = '__label__' + df['category'].astype(str)
df['category_description'] = df['category'] + ' ' + df['description']
```

## Text Preprocessing
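We preprocess the text data using regular expressions to:
1. Replace punctuation with spaces, keeping word characters, whitespace, and apostrophes
2. Collapse runs of spaces into a single space
3. Strip leading and trailing whitespace and convert the text to lowercase

Because `\w` matches underscores, the `__label__` prefix survives this step; the label is lowercased along with the description.

```python
import re

def preprocess(text):
    # Replace everything except word characters, whitespace, and apostrophes with a space.
    text = re.sub(r'[^\w\s\']', ' ', text)
    # Collapse the repeated spaces left behind by the substitution.
    text = re.sub(' +', ' ', text)
    return text.strip().lower()

df['category_description'] = df['category_description'].map(preprocess)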
```

## Generating CSV for FastText
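We split the data into training and testing sets, then write each set to a plain-text file with one labeled example per line, which is the input format FastText reads. The split itself is not shown in this README; the `train_test_split` call below is an assumption, with `test_size=0.2` inferred from the 10,085 test examples reported in the evaluation (20% of the 50,425 rows):
```python
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2)  # assumed split; the original call is not shown
train.to_csv("ecommerce.train", columns=["category_description"], index=False, header=False)
test.to_csv("ecommerce.test", columns=["category_description"], index=False, header=False)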
```

## Training and Evaluation
We use FastText to train the model and evaluate its performance:
```python
import fasttext

# Train a supervised classifier with the default hyperparameters.
model = fasttext.train_supervised(input="ecommerce.train")
model.test("ecommerce.test")
```

Results:
```
(10085, 0.9682697074863659, 0.9682697074863659)
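```

The tuple holds the number of test examples, followed by precision and recall at one prediction per example. The model achieves approximately 96.83% on both across the 10,085 test descriptions.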
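
Precision and recall are identical here, which is expected when every test example carries exactly one label and the model returns exactly one prediction: both reduce to the fraction of correct predictions. The sketch below illustrates this with a hypothetical helper, `precision_recall_at_k`, and made-up toy labels; it restates the definitions and is not FastText's own implementation.

```python
def precision_recall_at_k(true_labels, predicted_labels):
    """Hypothetical helper.

    true_labels: list of sets, the gold labels of each example.
    predicted_labels: list of lists, the top-k predictions for each example.
    """
    n_correct = sum(
        len(set(pred) & truth) for truth, pred in zip(true_labels, predicted_labels)
    )
    n_predicted = sum(len(pred) for pred in predicted_labels)
    n_true = sum(len(truth) for truth in true_labels)
    # Precision divides by the number of predictions, recall by the number of gold labels.
    return n_correct / n_predicted, n_correct / n_true

# Toy data (made up): one gold label and one prediction per example.
truth = [{"books"}, {"household"}, {"electronics"}, {"books"}]
preds = [["books"], ["household"], ["household"], ["books"]]

precision, recall = precision_recall_at_k(truth, preds)
print(precision, recall)  # 0.75 0.75
```

Applied to the reported result, 0.9683 of 10,085 corresponds to about 9,765 correctly classified descriptions. For a per-category breakdown, the FastText Python bindings should also offer `model.test_label("ecommerce.test")`.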
## Predictions
We can use the trained model to make predictions on new product descriptions. Let's examine some examples:
### Electronics Prediction
```python
product_description = "wintech assemble desktop pc cpu 500 gb sata hdd 4 gb ram intel c2d processor 3"
prediction = model.predict(product_description)
print(f"Product: {product_description}")
print(f"Predicted Category: {prediction[0][0]}")
print(f"Confidence: {prediction[1][0]:.2%}")
```

Output:
```
Product: wintech assemble desktop pc cpu 500 gb sata hdd 4 gb ram intel c2d processor 3
Predicted Category: __label__electronics
Confidence: 98.56%
```

The model correctly identifies this as an electronics product with 98.56% confidence.
### Clothing and Accessories Prediction
```python
product_description = "ockey men's cotton t shirt fabric details 80 cotton 20 polyester super combed cotton rich fabric"
prediction = model.predict(product_description)
print(f"Product: {product_description}")
print(f"Predicted Category: {prediction[0][0]}")
print(f"Confidence: {prediction[1][0]:.2%}")
```

Output:
```
Product: ockey men's cotton t shirt fabric details 80 cotton 20 polyester super combed cotton rich fabric
Predicted Category: __label__clothing_accessories
Confidence: 100.00%
```

The model correctly classifies this as a clothing item, with a reported confidence that rounds to 100%.
### Books Prediction
```python
product_description = "think and grow rich deluxe edition"
prediction = model.predict(product_description)
print(f"Product: {product_description}")
print(f"Predicted Category: {prediction[0][0]}")
print(f"Confidence: {prediction[1][0]:.2%}")
```

Output:
```
Product: think and grow rich deluxe edition
Predicted Category: __label__books
Confidence: 100.00%
```

The model correctly identifies this as a book, again with a confidence that rounds to 100%.
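
The descriptions above are already lowercase and free of punctuation, matching the training text. Raw listings would need the same cleaning before being passed to `model.predict`. A minimal sketch, assuming the `preprocess` function from the Text Preprocessing section (repeated here so the block is self-contained) and a hypothetical helper named `predict_category`; `model` is any object exposing FastText's `predict(text, k=...)` interface:

```python
import re

def preprocess(text):
    # Same cleaning as applied to the training data.
    text = re.sub(r'[^\w\s\']', ' ', text)
    text = re.sub(' +', ' ', text)
    return text.strip().lower()

def predict_category(model, description, k=1):
    """Hypothetical helper: clean a raw description, return (category, probability) pairs."""
    # predict() handles one line at a time, so newlines and tabs are collapsed as well.
    cleaned = " ".join(preprocess(description).split())
    labels, probabilities = model.predict(cleaned, k=k)
    return [
        (label.replace("__label__", ""), float(p))
        for label, p in zip(labels, probabilities)
    ]

print(preprocess("Sony's 4K TV (55-inch), Black!"))  # sony's 4k tv 55 inch black
```

With the trained model, a call such as `predict_category(model, "Think and Grow Rich (Deluxe Edition)", k=2)` would return up to two candidate categories with their probabilities.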
## Word Similarities
We can also find similar words using the trained model:
```python
model.get_nearest_neighbors("painting")
```

Output:
```
[(0.9976388216018677, 'vacuum'),
(0.9968333840370178, 'guard'),
(0.9968314170837402, 'heating'),
(0.9966275095939636, 'lid'),
(0.9962871670722961, 'lamp'),
...]
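```

These neighbors are not synonyms of "painting"; they are other words typical of household listings. This suggests that the supervised model's word vectors are shaped by the classification objective and group words by product category rather than by meaning.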
```python
model.get_nearest_neighbors("sony")
```

Output:
```
[(0.9988397359848022, 'external'),
(0.998672366142273, 'binoculars'),
(0.9981507658958435, 'dvd'),
(0.9975149631500244, 'nikon'),
(0.9973592162132263, 'glossy'),
...]
```

The words closest to the brand "Sony" are likewise mostly terms from electronics listings, including another brand, Nikon.
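
`get_nearest_neighbors` ranks vocabulary words by the cosine similarity between their vectors and the query word's vector. The sketch below reproduces that ranking on made-up three-dimensional vectors; the `toy_vectors` values and the `nearest_neighbors` helper are hypothetical and stand in for the vectors a trained model would return from `model.get_word_vector`.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(word, vectors, k=2):
    """Hypothetical helper: return the k (similarity, word) pairs closest to `word`."""
    query = vectors[word]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, reverse=True)[:k]

# Made-up vectors for illustration only.
toy_vectors = {
    "sony": [0.9, 0.1, 0.0],
    "nikon": [0.8, 0.2, 0.1],
    "dvd": [0.7, 0.0, 0.3],
    "novel": [0.0, 0.1, 0.9],
}

for score, word in nearest_neighbors("sony", toy_vectors):
    print(f"{score:.3f} {word}")
```

In the toy data, "nikon" and "dvd" rank ahead of "novel" because their vectors point in nearly the same direction as the vector for "sony". In the real outputs above, every listed similarity exceeds 0.99, which is consistent with vectors that cluster tightly by category.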
## Conclusion
This project demonstrates the effectiveness of FastText in classifying e-commerce product descriptions. With roughly 96.8% precision and recall on the held-out test set and fast predictions, this model can be a valuable tool for automating product categorization on e-commerce platforms.
For further improvements, consider:
- Experimenting with different preprocessing techniques
- Fine-tuning FastText hyperparameters
- Exploring other deep learning models for comparison

Happy classifying!