https://github.com/olajuwondele/fruit_image_classification
A complete image classification pipeline using TensorFlow/Keras with CNN, MLP, LSTM, Autoencoder, and Vision Transformer models. Includes dataset scraping, preprocessing, augmentation, training, and evaluation.
autoencoder-classification cnn deep-learning keras lstm mlp neural-network tensorflow vision-transformer
- Host: GitHub
- URL: https://github.com/olajuwondele/fruit_image_classification
- Owner: OlajuwonDele
- Created: 2025-06-23T09:15:00.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-06-23T09:44:38.000Z (11 months ago)
- Last Synced: 2025-06-23T10:33:45.171Z (11 months ago)
- Topics: autoencoder-classification, cnn, deep-learning, keras, lstm, mlp, neural-network, tensorflow, vision-transformer
- Language: Jupyter Notebook
- Size: 7.22 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Fruit Image Classification with Deep Learning Models
## Overview
This Jupyter notebook implements a fruit image classification system using various deep learning architectures. The project demonstrates a complete pipeline from dataset collection to model evaluation.
## Key Features
Dataset Collection: Uses the DuckDuckGo Search API to scrape fruit images for six classes: grapes, grapefruit, apple, banana, mango, and orange.
## Data Preprocessing
- Downloads images and verifies their integrity
- Applies data augmentation (rotation, zoom, flipping)
- Splits the data into training and validation sets
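The integrity check and the train/validation split can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes Pillow is installed, and the helper names `is_valid_image` and `split_paths` are hypothetical.

```python
import random
from pathlib import Path

from PIL import Image


def is_valid_image(path):
    """Return True if Pillow can open the file and its contents pass verify()."""
    try:
        with Image.open(path) as img:
            img.verify()  # raises if the file is truncated or not an image
        return True
    except Exception:
        return False


def split_paths(paths, val_fraction=0.2, seed=0):
    """Shuffle deterministically, then split into (train, validation) lists."""
    paths = sorted(paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]
```

Filtering the downloaded files with `is_valid_image` before splitting keeps corrupt downloads out of both sets. Fixing the seed makes the split reproducible across runs.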
## Model Architectures
- CNN (Convolutional Neural Network)
- MLP (Multi-Layer Perceptron)
- LSTM (Long Short-Term Memory)
- Autoencoder Classifier
- Vision Transformer (ViT)
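As an illustration of the first architecture in the list, here is a small Keras CNN for six fruit classes at 150x150 input. The layer sizes, dropout rate, optimizer, and the name `build_cnn` are assumptions chosen for the sketch; the notebook's actual model may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 6  # grapes, grapefruit, apple, banana, mango, orange


def build_cnn(input_shape=(150, 150, 3), num_classes=NUM_CLASSES):
    """Build and compile a small CNN (hypothetical layer sizes)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),            # map pixel values to [0, 1]
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),                    # regularization for a small dataset
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Given `tf.data` datasets with integer labels (here called `train_ds` and `val_ds`, both hypothetical names), training would look like `build_cnn().fit(train_ds, validation_data=val_ds, epochs=15)`.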
Evaluation: Reports validation accuracy and confusion matrices.
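Both metrics can be computed from integer class labels with NumPy alone. The sketch below is illustrative; the function names are hypothetical, and the notebook may rely on a library routine instead.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated pairs
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correct predictions)."""
    return np.trace(cm) / cm.sum()


# Toy example with 3 classes: one sample of class 0 is predicted as class 1.
cm = confusion_matrix([0, 0, 1, 2], [0, 1, 1, 2], num_classes=3)
print(cm)            # [[1 1 0], [0 1 0], [0 0 1]]
print(accuracy(cm))  # 0.75
```

Off-diagonal entries show which classes are confused with each other, which accuracy alone does not reveal.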
## Performance
The models achieved the following validation accuracies:
- CNN: 70.36%
- Autoencoder Classifier: 68.70%
- Vision Transformer: 65.10%
- MLP: 39.06%
- LSTM: 29.64%
The CNN performed best, which is consistent with the strength of convolutional architectures for image classification. The Vision Transformer (ViT) showed promise, but its accuracy was limited by the relatively small dataset: transformer-based models typically need large amounts of training data to generalize well, so ViT would be expected to perform significantly better with more data.
## Technical Details
- Framework: TensorFlow/Keras
- Image Size: 150x150 pixels
- Batch Size: 32
- Epochs: 15
- Data Augmentation: Rotation, zoom, horizontal flip
- Validation Split: 20%
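These settings could be wired into a Keras input pipeline roughly as follows. This is a sketch under assumptions: it uses `image_dataset_from_directory` and Keras preprocessing layers, which may not be what the notebook uses, and the augmentation strengths and the name `make_datasets` are hypothetical.

```python
import tensorflow as tf

IMG_SIZE = (150, 150)
BATCH_SIZE = 32
VAL_SPLIT = 0.2
EPOCHS = 15


def make_datasets(data_dir, seed=42):
    """Load class-per-subfolder images and split 80/20 with a shared seed."""
    common = dict(
        validation_split=VAL_SPLIT,
        seed=seed,  # same seed for both subsets so they do not overlap
        image_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="training", **common
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="validation", **common
    )
    return train_ds, val_ds


# Augmentation applied to training batches only; strengths are assumptions.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),  # factor is a fraction of a full turn
    tf.keras.layers.RandomZoom(0.1),
])
```

The augmentation layers are active only when called with `training=True`, so validation images pass through unchanged.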
The notebook provides a solid foundation for image classification tasks and shows how different architectures perform on the same dataset. For small datasets such as this one, the CNN is the recommended choice for production use given its superior accuracy; ViT is better suited to larger datasets.