https://github.com/alecruces/optirice
Exploring optimization methods like Gradient Descent and BCGD in semi-supervised learning for rice seeds classification
- Host: GitHub
- URL: https://github.com/alecruces/optirice
- Owner: alecruces
- Created: 2024-04-08T14:55:50.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-13T16:05:34.000Z (over 1 year ago)
- Last Synced: 2025-04-14T10:57:57.666Z (11 months ago)
- Topics: classification, optimization-methods, semisupervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 727 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Gradient Descent and BCGD Methods with an Application on Rice Seeds Classification
> Semi-supervised learning applied to rice seeds classification, using Gradient Descent and Block Coordinate Gradient Descent (BCGD) methods for optimization.
## Keywords
Optimization Methods, Semi-Supervised Learning, Rice Seeds Classification
---
## Table of Contents
1. [About the Project](#about-the-project)
2. [Key Features](#key-features)
3. [Key Results](#key-results)
4. [Data Overview](#data-overview)
5. [Methodology](#methodology)
6. [Screenshots and Graphs](#screenshots-and-graphs)
7. [Technologies Used](#technologies-used)
8. [Setup & Installation](#setup--installation)
9. [Usage](#usage)
10. [Contributing](#contributing)
11. [License](#license)
12. [Contact](#contact)
---
### About the Project
This project explores semi-supervised learning using optimization methods for classifying rice seed types. We apply three optimization methods (Gradient Descent, Randomized Block Coordinate Gradient Descent (BCGD), and Gauss-Southwell BCGD) to both a synthetic dataset and a real-world rice seeds dataset. By minimizing an objective function, the study demonstrates how labeled data can guide label predictions for unlabeled instances, with performance measured by accuracy and computation time.
### Key Features
- **Semi-Supervised Learning**: Efficient label prediction for unlabeled data using a small set of labeled data.
- **Optimization Techniques**: Implementation of three optimization methods with convergence analysis and performance comparison.
- **Real-World Application**: Classification of rice seeds based on morphological attributes.
### Key Results
- **Convergence Behavior**: Gradient Descent converges in fewer iterations, while Gauss-Southwell BCGD is the fastest in terms of computational time.
- **Accuracy**: Achieved approximately 70% accuracy on true labels for the real rice seeds dataset, indicating effective label assignment.
- **Similarity Measure**: Exponential similarity (RBF kernel) yielded better performance compared to Euclidean distance for clustering.
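As an illustration of how an accuracy figure like the one above can be obtained from the continuous values produced by the optimizers, here is a minimal sketch. It assumes the two classes are encoded as -1 and +1 and that scores are thresholded at 0; both the encoding and the helper name `accuracy_from_scores` are assumptions made for this example, not details taken from the notebook.

```python
import numpy as np


def accuracy_from_scores(scores, true_labels, threshold=0.0):
    """Map continuous scores to -1/+1 labels and compare with the true labels."""
    predicted = np.where(np.asarray(scores) >= threshold, 1, -1)
    return float(np.mean(predicted == np.asarray(true_labels)))


# Hypothetical scores for four unlabeled points and their true classes.
scores = np.array([-0.8, 0.3, 0.1, -0.2])
true_labels = np.array([-1, 1, -1, -1])
print(accuracy_from_scores(scores, true_labels))
```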
### Data Overview
This project utilizes a synthetic dataset and a real-world rice seed dataset:
- **Rice Type Data Set**: [Rice Type Classification Dataset on Kaggle](https://www.kaggle.com/datasets/mssmartypants/rice-type-classification)
  Contains attributes of rice grains, including area, perimeter, axis lengths, eccentricity, and convex area, used for classification between Jasmine and Gonen rice types.
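For the synthetic dataset, a semi-supervised setup can be sketched by generating two point clouds and keeping the labels of only a small fraction of the points. The snippet below is a rough sketch under assumed settings: the Gaussian blobs, their centers, the 10% labeled fraction, and the function name `make_semi_supervised_blobs` are all illustrative choices, not the configuration used in the notebook.

```python
import numpy as np


def make_semi_supervised_blobs(n_per_class=100, labeled_fraction=0.1, seed=0):
    """Two Gaussian blobs with labels -1/+1 and a mask marking the labeled points."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(loc=(0.0, 0.0), scale=1.0, size=(n_per_class, 2)),
        rng.normal(loc=(4.0, 4.0), scale=1.0, size=(n_per_class, 2)),
    ])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])

    # Keep the same number of labels in each class so both classes are represented.
    n_labeled_per_class = max(1, round(labeled_fraction * n_per_class))
    labeled_mask = np.zeros(len(y), dtype=bool)
    for start in (0, n_per_class):
        idx = start + rng.choice(n_per_class, size=n_labeled_per_class, replace=False)
        labeled_mask[idx] = True
    return X, y, labeled_mask


X, y, labeled_mask = make_semi_supervised_blobs()
X_lab, y_lab = X[labeled_mask], y[labeled_mask]  # small labeled set
X_unl = X[~labeled_mask]                         # points whose labels must be predicted
print(X_lab.shape, X_unl.shape)
```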
### Methodology
The following optimization techniques were applied:
- **Gradient Descent**: Updates all variables at every iteration using the full gradient of the objective.
- **BCGD with Randomized Rule**: At each iteration, a block of coordinates is selected at random and only that block is updated.
- **BCGD with Gauss-Southwell Rule**: At each iteration, the block with the largest gradient magnitude is selected and updated.
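The three update rules above can be sketched as follows. This is a minimal illustration, not the notebook's implementation: it assumes a common quadratic semi-supervised objective (a weighted fit of unlabeled scores to labeled targets plus a smoothness term between unlabeled points), blocks of size one, and a fixed step size of 1/L. The function names, the objective, and the toy weight matrices are all assumptions made for this example.

```python
import numpy as np


def objective(y, W_lu, W_uu, y_lab):
    """Assumed loss: labeled-to-unlabeled fit plus smoothness among unlabeled points."""
    fit = np.sum(W_lu * (y[None, :] - y_lab[:, None]) ** 2)
    smooth = 0.5 * np.sum(W_uu * (y[:, None] - y[None, :]) ** 2)
    return fit + smooth


def gradient(y, W_lu, W_uu, y_lab):
    """Gradient of `objective` w.r.t. the unlabeled scores (W_uu must be symmetric)."""
    g_fit = 2.0 * (W_lu.sum(axis=0) * y - W_lu.T @ y_lab)
    g_smooth = 2.0 * (W_uu.sum(axis=1) * y - W_uu @ y)
    return g_fit + g_smooth


def minimize(W_lu, W_uu, y_lab, rule="gd", max_iter=500, tol=1e-6, seed=0):
    """rule: 'gd', 'random' (randomized BCGD) or 'gs' (Gauss-Southwell BCGD)."""
    rng = np.random.default_rng(seed)
    n = W_uu.shape[0]
    # Fixed step 1/L, with L the largest eigenvalue of the constant Hessian.
    hessian = 2.0 * (np.diag(W_lu.sum(axis=0) + W_uu.sum(axis=1)) - W_uu)
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    y = np.zeros(n)
    for _ in range(max_iter):
        g = gradient(y, W_lu, W_uu, y_lab)
        if np.linalg.norm(g) < tol:
            break
        if rule == "gd":
            y -= step * g                       # move every coordinate
        else:
            if rule == "random":
                i = rng.integers(n)             # coordinate picked uniformly at random
            else:
                i = int(np.argmax(np.abs(g)))   # coordinate with the largest |gradient|
            y[i] -= step * g[i]                 # move only the selected coordinate
    return y


# Toy weights: 2 labeled points (rows) and 4 unlabeled points in two tight groups.
W_lu = np.array([[0.95, 0.95, 0.01, 0.01],
                 [0.01, 0.01, 0.95, 0.95]])
W_uu = np.array([[0.00, 0.90, 0.01, 0.01],
                 [0.90, 0.00, 0.01, 0.01],
                 [0.01, 0.01, 0.00, 0.90],
                 [0.01, 0.01, 0.90, 0.00]])
y_lab = np.array([-1.0, 1.0])

for rule in ("gd", "random", "gs"):
    y = minimize(W_lu, W_uu, y_lab, rule=rule)
    print(rule, np.sign(y), objective(y, W_lu, W_uu, y_lab))
```

For readability, this sketch recomputes the full gradient at every iteration, which hides the per-iteration savings that make BCGD attractive. A practical implementation would typically update the stored gradient incrementally after each coordinate step.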
Additional methodology aspects include:
- **Similarity Measures**: RBF kernel used for calculating similarity weights between data points.
- **Stopping Criteria**: Convergence is determined by either a set iteration limit or the magnitude of the gradient.
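The two aspects above can be sketched in a few lines. The snippet is a hedged illustration: the kernel parameterization `exp(-gamma * ||a - b||^2)`, the default values of `gamma`, `max_iter`, and `tol`, and the helper names `rbf_weights` and `should_stop` are assumptions for this example and may differ from what the notebook uses.

```python
import numpy as np


def rbf_weights(A, B, gamma=1.0):
    """w[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for every pair of rows."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=-1))


def should_stop(grad, iteration, max_iter=1000, tol=1e-5):
    """Stop when the iteration budget is spent or the gradient norm is small."""
    return iteration >= max_iter or np.linalg.norm(grad) <= tol


# Hypothetical feature vectors: one labeled point and two unlabeled points.
X_lab = np.array([[0.0, 0.0]])
X_unl = np.array([[0.0, 0.0], [1.0, 0.0]])
print(rbf_weights(X_lab, X_unl))               # closer points get larger weights
print(should_stop(np.array([1e-7, 0.0]), 3))   # small gradient triggers the stop
```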
### Screenshots and Graphs
1. **Distribution of Synthetic Data (Scatter Plot)**
Visualization of labeled and unlabeled points before applying the optimization methods.

Distribution of examples for the synthetic dataset before applying the optimization algorithms. The + symbol denotes unlabeled data; dots denote labeled data (two classes: yellow and purple).
2. **Accuracy vs. Iterations for Synthetic Data (Line Chart)**
Comparison of accuracy across iterations for each optimization method on the synthetic dataset.
Accuracy as a function of iteration number based on continuous values (left) and on loss function (right).
3. **Distribution of Rice Seeds Data (Scatter Plot)**
Shows the labeled and unlabeled distribution of rice seeds, with attributes such as eccentricity and perimeter.

Distribution of examples for the real dataset after applying the Gradient Descent algorithm. The + symbol denotes the predicted unlabeled data (now labeled with a class); dots denote the true unlabeled data.
4. **Accuracy and Loss over Time (Line Charts)**
Shows accuracy as a function of CPU time for the rice seeds dataset, and loss as a function of iteration number and CPU time.
Accuracy as a function of CPU time based on continuous values (left) and on loss function (right) with a time window of 10,000 ms.
Loss as a function of iteration number (left) and as a function of CPU time (right) with time window of 10,000 ms.
### Technologies Used
> 🛠️ Highlighting essential tools and libraries.
- **Python**: Main programming language.
- **NumPy** and **Pandas**: Data manipulation and processing.
- **Matplotlib**: Visualization of data distribution, accuracy, and loss trends.
### Setup & Installation
Clone the repository and install dependencies to replicate the study:
```bash
# Clone the repository
git clone https://github.com/alecruces/OptiRice.git

# Navigate to the project directory
cd OptiRice

# Install dependencies
pip install -r requirements.txt
```
### Usage
The repository includes the following files:
- **`code.ipynb`**: Jupyter notebook with the complete workflow, from data loading to optimization and performance evaluation.
- **`Report.pdf`**: Detailed report covering the methodology, convergence analysis, and findings.
To run the project, open `code.ipynb` in Jupyter Notebook or view it on [nbviewer](https://nbviewer.org/github/alecruces/OptiRice/blob/main/code.ipynb).
### Contributing
Contributions are welcome! Please refer to the contributing guidelines for more details.