https://github.com/oislen/catclassifier

A machine learning pipeline for classifying images of cats and dogs.

aws-ec2 beautifulsoup cnn deep-learning docker image-classification kaggle keras machine-learning microsoft-playwright multiprocessing python pytorch scrapy web-scraping

# Cat Classification

## Overview

This Git repository contains code and configurations for implementing a Convolutional Neural Network (CNN) to classify images as containing cats or dogs. The data was sourced from the [dogs-vs-cats](https://www.kaggle.com/competitions/dogs-vs-cats/overview) Kaggle competition, and also from [freeimages.com](https://www.freeimages.com/) using a web scraper. Docker containers were used to deploy the application on EC2 spot instances in order to scale up hardware and computation power.
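
As a rough illustration of the scraping step, the sketch below pulls image URLs out of a page of HTML using only Python's standard library. The sample markup, the `ImageLinkParser` class and the `extract_image_urls` helper are hypothetical; the repository's own scraper and the real structure of freeimages.com pages may well differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all images found in the HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls


# Made-up markup standing in for a search results page.
sample_html = """
<div class="results">
  <img src="/images/cat-001.jpg" alt="cat">
  <img src="https://example.com/images/dog-002.jpg" alt="dog">
  <img alt="no source">
</div>
"""

urls = extract_image_urls(sample_html, "https://example.com/search/cats")
print(urls)
```

A real scraper would also need to fetch the pages, respect the site's terms of use, and download each image; those parts are omitted here.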

## Repo Contents

* The __aws__ subdirectory contains batch and shell scripts for configuring EC2 spot instances and deploying the Docker container remotely.
* The __conda__ subdirectory contains batch and shell scripts for creating a local conda environment for the project.
* The __data_prep__ subdirectory contains Python utility scripts for cleansing and processing data for modelling.
* The __kaggle__ subdirectory contains python scripts for downloading and unzipping competition data from Kaggle.
* The __model__ subdirectory contains python scripts for initiating and training CNN models.
* The __ref__ subdirectory contains previous analysis and kernels on dogs vs cats classification from Kaggle community members.
* The __report__ subdirectory contains reportable images and plots generated by the application.
* The __webscrapers__ subdirectory contains webscraping tools for downloading cats and dogs images from [freeimages.com](https://www.freeimages.com/).
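
To make the data preparation step more concrete, here is a minimal sketch of labelling images by filename and splitting them into training and validation sets. It assumes the Kaggle naming convention of `cat.<n>.jpg` and `dog.<n>.jpg`; the function names, split ratio and seed are illustrative choices and are not taken from the __data_prep__ scripts.

```python
import random
from collections import defaultdict


def label_from_filename(filename):
    """Return 'cat' or 'dog' from a name like 'cat.12.jpg' (assumed convention)."""
    label = filename.split(".")[0].lower()
    if label not in {"cat", "dog"}:
        raise ValueError(f"unexpected filename: {filename}")
    return label


def stratified_split(filenames, valid_fraction=0.2, seed=42):
    """Split filenames into train/validation lists, keeping class balance."""
    by_label = defaultdict(list)
    for name in filenames:
        by_label[label_from_filename(name)].append(name)

    rng = random.Random(seed)
    train, valid = [], []
    for label in sorted(by_label):
        names = sorted(by_label[label])
        rng.shuffle(names)
        # Take the same fraction from each class for validation.
        n_valid = int(len(names) * valid_fraction)
        valid.extend(names[:n_valid])
        train.extend(names[n_valid:])
    return train, valid


# Hypothetical file listing; in practice this would come from the data directory.
files = [f"cat.{i}.jpg" for i in range(10)] + [f"dog.{i}.jpg" for i in range(10)]
train_files, valid_files = stratified_split(files)
print(len(train_files), len(valid_files))
```

Splitting per class keeps the cat-to-dog ratio the same in both sets, which matters less for a balanced dataset but becomes useful once scraped images are added in uneven numbers.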

## Application Scripts

The main dog and cat image classification application is contained within the root scripts:

* The __01_prg_kaggle_data.py__ script downloads and unzips the dogs vs cats competition data.
* The __02_prg_scrape_imgs.py__ script scrapes additional cat and dog images from [freeimages.com](https://www.freeimages.com/).
* The __03_prg_keras_model.py__ script trains a CNN model on the cat and dog images and uses it to make image predictions.
* The __analysis_results.ipynb__ file contains a high level summary of the analysis results.
* The __cons.py__ script contains programme constants and configurations.
* The __Dockerfile__ builds the application container for deployment on EC2.
* The __exeDocker.bat__ executes the Docker build process locally on Windows.
* The __requirements.txt__ file contains the python package dependencies for the application.
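
The unzip half of the download step can be sketched with the standard library alone. The archive name, member names and target directory below are placeholders created for the demonstration, and downloading from Kaggle (which requires API credentials) is deliberately left out.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Build a tiny stand-in archive so the example runs without the real data.
work_dir = Path(tempfile.mkdtemp())
demo_zip = work_dir / "train.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("train/cat.0.jpg", b"placeholder bytes")
    archive.writestr("train/dog.0.jpg", b"placeholder bytes")

extracted = unzip_archive(demo_zip, work_dir / "data")
print(extracted)
```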

## Analysis Results

See the analysis results notebook for a summary of the project, including image processing, CNN architecture and model performance.
* https://nbviewer.org/github/oislen/CatClassifier/blob/main/notebooks/torch_analysis_results.ipynb
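
For readers new to CNNs, the arithmetic behind a typical architecture can be traced in a few lines. The layer stack and the 128x128 input below are purely illustrative and are not taken from the notebook; they only show how convolution and pooling layers change the spatial size of a feature map.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack: (name, kernel, stride, padding, output channels).
# A channel count of None means the layer keeps the incoming channels (pooling).
layers = [
    ("conv1", 3, 1, 1, 32),
    ("pool1", 2, 2, 0, None),
    ("conv2", 3, 1, 1, 64),
    ("pool2", 2, 2, 0, None),
    ("conv3", 3, 1, 1, 128),
    ("pool3", 2, 2, 0, None),
]

size, channels = 128, 3  # assumed RGB input of 128x128 pixels
shapes = []
for name, kernel, stride, padding, out_channels in layers:
    size = output_size(size, kernel, stride, padding)
    channels = out_channels or channels
    shapes.append((name, channels, size, size))
    print(f"{name}: {channels} x {size} x {size}")

# Number of values handed to the first dense layer after flattening.
flattened = channels * size * size
print("features passed to the dense layers:", flattened)
```

With 3x3 kernels and a padding of 1, each convolution preserves the spatial size, so only the pooling layers halve it.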

## Docker Container

The application's Docker image is available on Docker Hub:

https://hub.docker.com/repository/docker/oislen/cat-classifier