https://github.com/oislen/catclassifier
A machine learning pipeline for classifying images of cats and dogs.
aws-ec2 beautifulsoup cnn deep-learning docker image-classification kaggle keras machine-learning microsoft-playwright multiprocessing python pytorch scrapy web-scraping
- Host: GitHub
- URL: https://github.com/oislen/catclassifier
- Owner: oislen
- License: mit
- Created: 2022-08-08T07:03:04.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-03T13:31:13.000Z (3 months ago)
- Last Synced: 2024-12-07T03:15:47.739Z (2 months ago)
- Topics: aws-ec2, beautifulsoup, cnn, deep-learning, docker, image-classification, kaggle, keras, machine-learning, microsoft-playwright, multiprocessing, python, pytorch, scrapy, web-scraping
- Language: Python
- Homepage:
- Size: 2.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Cat Classification
## Overview
This git repository contains code and configurations for implementing a Convolutional Neural Network (CNN) to classify images containing cats or dogs. The data was sourced from the [dogs-vs-cats](https://www.kaggle.com/competitions/dogs-vs-cats/overview) Kaggle competition, and also from [freeimages.com](https://www.freeimages.com/) using a web scraper. Docker containers were used to deploy the application on EC2 spot instances in order to scale up hardware and computation power.
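As a rough illustration of the kind of data preparation a cat vs dog classifier needs, the sketch below derives a binary label from each image filename and makes a reproducible train / validation split. It is not taken from this repository: the function names `label_from_filename` and `train_validation_split` are hypothetical, the `<animal>.<index>.jpg` naming pattern is an assumption about how the training images are named, and the choice of 0 for cat and 1 for dog is arbitrary.

```python
import random
from pathlib import Path


def label_from_filename(path):
    """Return 0 for a cat image and 1 for a dog image (arbitrary encoding).

    Assumes filenames follow the pattern "<animal>.<index>.jpg".
    """
    prefix = Path(path).name.split(".")[0].lower()
    if prefix == "cat":
        return 0
    if prefix == "dog":
        return 1
    raise ValueError(f"Unrecognised filename prefix: {path!r}")


def train_validation_split(paths, validation_fraction=0.2, seed=42):
    """Shuffle the paths reproducibly and split them into two lists."""
    shuffled = sorted(paths)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical file names following the assumed naming pattern.
example_paths = [
    f"train/{animal}.{i}.jpg" for animal in ("cat", "dog") for i in range(5)
]
train_paths, validation_paths = train_validation_split(example_paths)
train_labels = [label_from_filename(p) for p in train_paths]
```

Fixing the seed keeps the split identical between runs, which makes model comparisons easier to reproduce.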
## Repo Contents
* The __aws__ subdirectory contains batch and shell scripts for configuring EC2 spot instances and deploying the Docker container remotely.
* The __conda__ subdirectory contains batch and shell scripts for creating a local conda environment for the project.
* The __data_prep__ subdirectory contains python utility scripts for cleansing and processing the data for modelling.
* The __kaggle__ subdirectory contains python scripts for downloading and unzipping competition data from Kaggle.
* The __model__ subdirectory contains python scripts for initiating and training CNN models.
* The __ref__ subdirectory contains previous analyses and kernels on dogs vs cats classification from Kaggle community members.
* The __report__ subdirectory contains reportable images and plots generated by the application.
* The __webscrapers__ subdirectory contains webscraping tools for downloading cats and dogs images from [freeimages.com](https://www.freeimages.com/).
## Application Scripts
The main dog and cat image classification application is contained within the root scripts:
* The __01_prg_kaggle_data.py__ script downloads and unzips the dogs vs cats competition data.
* The __02_prg_scrape_imgs.py__ script scrapes additional cat and dog images from [freeimages.com](https://www.freeimages.com/).
* The __03_prg_keras_model.py__ script trains a CNN model on the cat and dog images and makes image predictions with it.
* The __analysis_results.ipynb__ file contains a high level summary of the analysis results.
* The __cons.py__ script contains programme constants and configurations.
* The __Dockerfile__ builds the application container for deployment on EC2.
* The __exeDocker.bat__ executes the Docker build process locally on Windows.
* The __requirements.txt__ file contains the python package dependencies for the application.
## Analysis Results
See the analysis results notebook for a summary of the project, including image processing, CNN architecture and model performance.
* https://nbviewer.org/github/oislen/CatClassifier/blob/main/notebooks/torch_analysis_results.ipynb
## Docker Container
The application Docker container is available on Docker Hub here:
https://hub.docker.com/repository/docker/oislen/cat-classifier