{"id":25397369,"url":"https://github.com/hasnainroopawalla/captcha-classification","last_synced_at":"2026-06-23T04:31:30.272Z","repository":{"id":137986381,"uuid":"324193076","full_name":"hasnainroopawalla/Captcha-Classification","owner":"hasnainroopawalla","description":"A MATLAB project that solves CAPTCHA images using an Image pre-processing pipeline and Decision Trees.","archived":false,"fork":false,"pushed_at":"2020-12-26T08:48:16.000Z","size":2240,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-09T05:19:37.273Z","etag":null,"topics":["decision-trees","imageprocessing","knn","matlab","svm"],"latest_commit_sha":null,"homepage":"","language":"MATLAB","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hasnainroopawalla.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-12-24T16:12:08.000Z","updated_at":"2025-01-20T15:14:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"17952a96-00d0-490c-8e0c-a5ed81839a85","html_url":"https://github.com/hasnainroopawalla/Captcha-Classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hasnainroopawalla/Captcha-Classification","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasnainroopawalla%2FCaptcha-Classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasnainroopawalla%2FCaptcha-Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/hasnainroopawalla%2FCaptcha-Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasnainroopawalla%2FCaptcha-Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hasnainroopawalla","download_url":"https://codeload.github.com/hasnainroopawalla/Captcha-Classification/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasnainroopawalla%2FCaptcha-Classification/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34675970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decision-trees","imageprocessing","knn","matlab","svm"],"created_at":"2025-02-15T21:47:43.877Z","updated_at":"2026-06-23T04:31:30.253Z","avatar_url":"https://github.com/hasnainroopawalla.png","language":"MATLAB","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Captcha Classification\nThis project was built for the course - \"Introduction to Image Analysis\" (1MD110) at Uppsala University\n\nThe objective is to accurately solve noisy CAPTCHA images (distorted images containing letters and digits used in cyber-security). 
In this task, each CAPTCHA image is extremely noisy and contains 3 digits in erratic orientations along with several stray marks.\n\n## Input Examples\n![Example 1](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/ex1.png)\n![Example 2](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/ex2.png)\n\n\n## Pre-Processing Pipeline\n![Pipeline](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/pipeline.PNG)\n\nResult of Pre-Processing (Example):\n\n![Pre-processing example](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/op1.png)\n\n## Feature Selection\nThe following features are used to train the models:\n* Circularity\n* Area\n* Centroid\n* Orientation\n* Solidity\n\n\n## General Flow\nEach training image is split into 3 distinct props (digits) and the features listed above are extracted for each prop. The result of splitting an image into 3 props is shown below:\n\n![Prop 1](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/p1.png)\n![Prop 2](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/p2.png)\n![Prop 3](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/p3.png)\n\nEach **prop** returns a `1 x 6` feature vector\n\nEach **image** returns a `3 x 1 x 6` feature vector (one `1 x 6` vector per digit)\n\n## Training and Evaluation\nTraining images - 1100\n\nValidation images - 100\n\n3 digits are extracted from each image, which gives 3300 training samples (1100 x 3)\n\n3 models were trained and the results are reported below:\n* KNN (k=3)\n* Linear SVM\n* Decision Trees with Adaptive Boosting (maxSplits=30)\n\n## Results\n![Results 1](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/results.PNG)\n![Results 2](https://github.com/hasnainroopawalla/Captcha-Classification/blob/main/images/results2.PNG)\n\nBest results were obtained by 
using *Decision Trees with Adaptive Boosting (maxSplits=30)* with the following metrics:\n\n* A training accuracy of ~97% was obtained\n* Validation accuracy of ~82% was obtained (better evaluation can be performed using cross-validation)\n* Accuracy of ~61% was obtained on a Hidden Test Set\n\n## Future work\n* Splitting of Digits can be optimized for overlapping digits by conducting repeated (and controlled) Erosion followed by Dilation to break connected components\n* Resize image to the same size before feature extraction for consistency (or flatten the image itself)\n* Train a CNN architecture to improve accuracy and performance\n* Perform cross-validation for better evaluation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhasnainroopawalla%2Fcaptcha-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhasnainroopawalla%2Fcaptcha-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhasnainroopawalla%2Fcaptcha-classification/lists"}