https://github.com/espacio-root/sem5-ml-project

semester 5 ML project

Host: GitHub
URL: https://github.com/espacio-root/sem5-ml-project
Owner: Espacio-root
Created: 2025-09-28T19:55:24.000Z (8 months ago)
Default Branch: master
Last Pushed: 2025-10-05T18:29:07.000Z (8 months ago)
Last Synced: 2025-10-06T10:17:27.169Z (8 months ago)
Language: Jupyter Notebook
Size: 108 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Prerequisites

### Set up virtual environment
```bash
python3 -m venv ./venv
source ./venv/bin/activate
pip install -r requirements.txt
```

### Get the datasets
1. Download the [first dataset](https://www.kaggle.com/datasets/techsash/waste-classification-data) and save it as `./data/d1.zip`
2. Download the [second dataset](https://www.kaggle.com/datasets/rayhanzamzamy/non-and-biodegradable-waste-dataset) and save it as `./data/d2.zip`
3. Download the [third dataset](https://www.kaggle.com/datasets/alistairking/recyclable-and-household-waste-classification) and save it as `./data/d3.zip`
4. Run the following commands from the project root (`./`):
```bash
unzip ./data/d1.zip -d ./data/d1
unzip ./data/d2.zip -d ./data/d2
unzip ./data/d3.zip -d ./data/d3
```
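
If `unzip` is not available on your system, the same extraction can be done from Python. This is a minimal sketch using only the standard library; the function name `extract_archives` is hypothetical and not part of the repository, and it assumes the archives are named `d1.zip`, `d2.zip`, and `d3.zip` as described above.

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir="data", names=("d1", "d2", "d3")):
    """Extract <data_dir>/<name>.zip into <data_dir>/<name>/.

    Returns the list of names that were actually extracted.
    """
    data_dir = Path(data_dir)
    extracted = []
    for name in names:
        archive = data_dir / f"{name}.zip"
        if not archive.is_file():
            # Report and skip, so a missing download is easy to spot.
            print(f"missing archive: {archive}")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir / name)
        extracted.append(name)
    return extracted
```

Calling `extract_archives()` from the project root mirrors the three `unzip` commands.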

---

# Project Structure

### Data Cleaning and Preprocessing

- **`preprocess.ipynb`** — Processes and combines the three datasets into one and stores it in CSV format as `trash.csv`.
- **`feature_extraction.ipynb`** — Uses a ResNet (CNN) model to extract features from each image in `trash.csv` and stores them in `final.csv`.
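
The notebooks are not reproduced here, so the following is only a rough sketch of how a combined CSV such as `trash.csv` could be assembled. It assumes that each image's label is the name of its parent folder, which may not match the actual dataset layouts. The function name `build_manifest`, the column names `path` and `label`, and the extension list are all hypothetical.

```python
import csv
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of image types


def build_manifest(dataset_dirs, out_csv):
    """Write one (path, label) row per image and return the row count.

    Assumption: the label is the name of the image's parent folder.
    """
    rows = []
    for root in dataset_dirs:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                rows.append((str(path), path.parent.name))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])  # hypothetical column names
        writer.writerows(rows)
    return len(rows)
```

For example, `build_manifest(["data/d1", "data/d2", "data/d3"], "data/trash.csv")` would index all three extracted datasets in one file. Any label harmonisation across datasets would still have to be done separately.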

---

### Models

- **`random_forest_model.ipynb`** — Uses a Random Forest classifier and achieves **99.17%** accuracy.
- **`xgboost_model.ipynb`** — Uses the XGBoost algorithm and achieves **99.61%** accuracy.
- **`lightgbm_model.ipynb`** — Uses the LightGBM algorithm and achieves **99.63%** accuracy.

---

**Note:**
Generating `final.csv` with `feature_extraction.ipynb` can take several hours. To skip that step, download the file from [Google Drive](https://drive.google.com/file/d/1afvtsZXpUFNodYUS1kl6pmtIf6xvAZgy/view) and save it as `./data/final.csv`.

---

# Inference

To test the model on your own images:

1. Download or provide an image.
2. Run the following command with the correct path to your image:
```bash
python3 inference.py path/to/image
```
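
To classify a whole folder of images, the command above can be repeated once per file. The helper below is a hypothetical sketch, not part of the repository: it only assumes that `inference.py` takes a single image path as its argument, as shown above, and that images use one of the listed extensions.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed image types


def build_commands(folder, script="inference.py"):
    """Return one `<python> inference.py <image>` command per image in folder."""
    images = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    return [[sys.executable, script, str(p)] for p in images]


def run_all(folder):
    """Run the inference script on every image, stopping at the first failure."""
    for cmd in build_commands(folder):
        subprocess.run(cmd, check=True)
```

Run `run_all("path/to/folder")` from the project root with the virtual environment active, so that `sys.executable` points at the interpreter with the requirements installed.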
