https://github.com/hasnainroopawalla/captcha-classification

A MATLAB project that solves CAPTCHA images using an Image pre-processing pipeline and Decision Trees.

decision-trees imageprocessing knn matlab svm

Host: GitHub
URL: https://github.com/hasnainroopawalla/captcha-classification
Owner: hasnainroopawalla
Created: 2020-12-24T16:12:08.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2020-12-26T08:48:16.000Z (almost 5 years ago)
Last Synced: 2025-02-15T21:47:39.032Z (8 months ago)
Topics: decision-trees, imageprocessing, knn, matlab, svm
Language: MATLAB
Homepage:
Size: 2.14 MB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# Captcha Classification
This project was built for the course "Introduction to Image Analysis" (1MD110) at Uppsala University.

The objective is to accurately solve noisy CAPTCHA images (distorted images of letters and digits, used in cyber-security). In this task, each CAPTCHA image is extremely noisy and contains 3 digits in erratic orientations, along with several stray marks.

## Input Examples
![Example 1](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/ex1.png)
![Example 2](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/ex2.png)

## Pre-Processing Pipeline
![Pipeline](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/pipeline.PNG)

Result of Pre-Processing (Example):

![Pre-processing example](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/op1.png)

## Feature Selection
The following features are used to train the models:
* Circularity
* Area
* Centroid
* Orientation
* Solidity

## General Flow
Each training image is split into 3 distinct props (one per digit), and the features listed above are extracted for each prop. The result of splitting an image into 3 props is shown here:

![Prop 1](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/p1.png)
![Prop 2](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/p2.png)
![Prop 3](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/p3.png)

Each **prop** returns a `1 x 6` feature vector (the 5 features yield 6 values because the centroid has both an x and a y coordinate)

Each **image** returns a `3 x 1 x 6` feature array (the first dimension indexes the 3 digits)
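
The feature extraction described above could look roughly like the sketch below. It is not taken from the repository: the function name `extractDigitFeatures` is hypothetical, and it assumes that the pre-processed image is binary, that the three largest connected components are the digits, and that the digits read left to right. It uses `regionprops` and `bwareafilt` from the Image Processing Toolbox (the `Circularity` property requires R2019a or later).

```matlab
function F = extractDigitFeatures(BW)
% extractDigitFeatures  Hypothetical helper, not taken from the repository.
% BW is a pre-processed binary image in which the digits are foreground.
% Returns a 3 x 1 x 6 array: one 1 x 6 feature vector per digit.

    % Assumption: the three largest connected components are the digits.
    BW = bwareafilt(logical(BW), 3);

    stats = regionprops(BW, 'Circularity', 'Area', 'Centroid', ...
                            'Orientation', 'Solidity');
    assert(numel(stats) == 3, 'Expected exactly 3 connected components.');

    % Order the components left to right using the centroid x-coordinate.
    centroids = vertcat(stats.Centroid);      % 3 x 2, columns are [x y]
    [~, order] = sort(centroids(:, 1));
    stats = stats(order);

    F = zeros(3, 1, 6);
    for k = 1:3
        s = stats(k);
        % Centroid contributes two values, so 5 features give 6 numbers.
        F(k, 1, :) = [s.Circularity, s.Area, s.Centroid, ...
                      s.Orientation, s.Solidity];
    end
end
```

Digits that touch or overlap would merge into a single component and trip the assertion, which is the limitation noted under Future work.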

## Training and Evaluation
Training images: 1100

Validation images: 100

3 digits are extracted from each image, which gives 1100 x 3 = 3300 training samples.

The following 3 models were trained (results are reported in the next section):
* KNN (k=3)
* Linear SVM
* Decision Trees with Adaptive Boosting (maxSplits=30)
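
As a rough illustration, the three models could be trained with the Statistics and Machine Learning Toolbox as sketched below. The function name `trainDigitModels` is hypothetical and the code is not taken from the repository. Two choices are assumptions: the linear SVM is made multi-class through `fitcecoc`, and `AdaBoostM2` is used as the boosting method. The README does not state either.

```matlab
function models = trainDigitModels(X, Y)
% trainDigitModels  Hypothetical helper, not taken from the repository.
% X: N x 6 feature matrix (one row per digit), Y: N x 1 digit labels.

    % k-nearest neighbours with k = 3.
    models.knn = fitcknn(X, Y, 'NumNeighbors', 3);

    % Linear SVM. Assumption: multi-class handled with error-correcting
    % output codes, since a single SVM is a binary classifier.
    svmTemplate = templateSVM('KernelFunction', 'linear');
    models.svm = fitcecoc(X, Y, 'Learners', svmTemplate);

    % Boosted decision trees, each limited to 30 splits.
    % Assumption: AdaBoostM2 as the multi-class AdaBoost variant.
    treeTemplate = templateTree('MaxNumSplits', 30);
    models.boostedTrees = fitcensemble(X, Y, 'Method', 'AdaBoostM2', ...
                                       'Learners', treeTemplate);
end
```

If the per-image features are stacked into a `1100 x 3 x 6` array `F` with a matching `1100 x 3` label matrix `L` (both names hypothetical), then `reshape(F, [], 6)` and `L(:)` give the `3300 x 6` feature matrix and the `3300 x 1` label vector in the same row order.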

## Results
![Results 1](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/results.PNG)
![Results 2](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/results2.PNG)

Best results were obtained by using *Decision Trees with Adaptive Boosting (maxSplits=30)* with the following metrics:

* Training accuracy: ~97%
* Validation accuracy: ~82% (cross-validation would give a more reliable estimate)
* Accuracy on a hidden test set: ~61%
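
A cross-validated estimate for the boosted trees could be obtained along these lines. This is only a sketch: the function name `crossValidatedAccuracy` is hypothetical, `AdaBoostM2` is an assumed boosting method, and the number of folds is left to the caller.

```matlab
function acc = crossValidatedAccuracy(X, Y, numFolds)
% crossValidatedAccuracy  Hypothetical helper, not taken from the repository.
% X: N x 6 feature matrix, Y: N x 1 digit labels.

    treeTemplate = templateTree('MaxNumSplits', 30);

    % Passing 'KFold' makes fitcensemble return a cross-validated model
    % with one ensemble trained per fold.
    cvModel = fitcensemble(X, Y, 'Method', 'AdaBoostM2', ...
                           'Learners', treeTemplate, 'KFold', numFolds);

    % kfoldLoss returns the misclassification rate on the held-out folds.
    acc = 1 - kfoldLoss(cvModel);
end
```

Note that this splits at the digit level, so digits from the same CAPTCHA image can land in different folds.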

## Future work
* Improve digit splitting for overlapping digits by applying repeated (and controlled) erosion followed by dilation to break connected components
* Resize each digit image to a common size before feature extraction for consistency (or use the flattened image itself as the feature vector)
* Train a CNN architecture to improve accuracy and performance
* Perform cross-validation for better evaluation
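
The first item could be prototyped as below. This is a sketch of one possible approach, not an implementation from the repository: the function name `splitTouchingDigits` and the `maxRadius` parameter are hypothetical. It erodes and then dilates with a growing disk and stops at the first radius that yields at least 3 connected components.

```matlab
function BWsplit = splitTouchingDigits(BW, maxRadius)
% splitTouchingDigits  Hypothetical helper, not taken from the repository.
% Returns the first erosion-then-dilation result that has at least
% 3 connected components, or the original image if none is found.

    BW = logical(BW);
    BWsplit = BW;
    for r = 1:maxRadius
        se = strel('disk', r);
        % Erosion followed by dilation (equivalent to imopen(BW, se)).
        opened = imdilate(imerode(BW, se), se);
        cc = bwconncomp(opened);
        if cc.NumObjects >= 3
            BWsplit = opened;
            return;
        end
    end
end
```

This only breaks thin bridges between digits. Heavily overlapping digits can re-merge after dilation, and leftover noise specks can be counted as components, so the stopping rule would need tuning on real images.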
