{"id":13813289,"url":"https://github.com/jaysonsantos/captcha-breaker","last_synced_at":"2025-05-15T00:32:35.068Z","repository":{"id":141312273,"uuid":"68456229","full_name":"jaysonsantos/captcha-breaker","owner":"jaysonsantos","description":"A simple machine learning powered captcha breaker","archived":true,"fork":false,"pushed_at":"2016-09-17T16:16:39.000Z","size":33,"stargazers_count":43,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-08-04T04:03:43.063Z","etag":null,"topics":["captcha-breaker","scikit-image","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaysonsantos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-09-17T14:13:24.000Z","updated_at":"2024-08-04T04:03:48.235Z","dependencies_parsed_at":"2024-08-04T04:03:46.505Z","dependency_job_id":"128968e2-f709-447a-bf80-e6ea810bf004","html_url":"https://github.com/jaysonsantos/captcha-breaker","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaysonsantos%2Fcaptcha-breaker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaysonsantos%2Fcaptcha-breaker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaysonsantos%2Fcaptcha-breaker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaysonsantos%2Fcaptcha-breaker/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaysonsantos","download_url":"https://codeload.github.com/jaysonsantos/captcha-breaker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225319206,"owners_count":17455724,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["captcha-breaker","scikit-image","scikit-learn"],"created_at":"2024-08-04T04:01:11.655Z","updated_at":"2024-11-19T08:30:20.802Z","avatar_url":"https://github.com/jaysonsantos.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# captcha-breaker\nA simple machine-learning-powered captcha breaker built with scikit-learn.\nFor now the project lives in a Jupyter notebook so that each step is easy to visualize, since this is just a proof of concept.\n\n# Code from the notebook\n\n\n```python\nimport os\nimport re\n\nimport numpy as np\n\nfrom skimage import img_as_float, io\nfrom skimage.color import rgb2gray\nfrom sklearn import svm, metrics\n\n# Files are named <prefix>-<6-character captcha answer>.png, e.g. captcha-54f0d97919921-9ZAC1F.png\nconfirmed_images_re = re.compile(r'-([a-zA-Z0-9]{6})\\.png$')\n```\n\n\n```python\nfrom matplotlib import pyplot as plt\n%matplotlib inline\n```\n\n\n```python\ndef load_image(path):\n    # Convert to grayscale and crop a fixed window (rows 9-37, columns 10-176),\n    # then binarize: every non-black pixel becomes 1.0 (white).\n    img = img_as_float(rgb2gray(io.imread(path)))[9:38,10:177]\n    img[img != 0] = 1\n    return img\n\ndef get_letters(img, number=6, avg_size=29):\n    # Cut the image into `number` fixed-width slices of `avg_size` columns each;\n    # a slice that runs past the right edge is padded with white (1.0) columns.\n    for i in range(number):\n        start = i * avg_size\n        nimg = 
img.copy()[:,start:start + avg_size]\n        width_difference = avg_size - nimg.shape[1]\n        if width_difference != 0:\n            nimg = np.append(nimg, np.ones((nimg.shape[0], width_difference)), axis=1)\n        yield nimg\n\nnot_trained_captcha = load_image('captchas/captcha-54f0d97919921-9ZAC1F.png')\nfig, ax = plt.subplots(ncols=6)\n\nfor i, letter in enumerate(get_letters(not_trained_captcha)):\n    ax[i].imshow(letter)\n```\n\n\n![png](splitted_captcha.png)\n\n\n\n```python\nimgs = []\nlimit_images = 30000\ntotal_to_train = int(limit_images * 0.8)\nloaded_images = 0\nfor filename in os.listdir('captchas'):\n    match = confirmed_images_re.search(filename)\n    if not match:\n        continue\n    try:\n        imgs.append((match.group(1).lower(), load_image('captchas/{}'.format(filename))))\n    except (IndexError, OSError):  # skip images that Pillow fails to read\n        continue\n    loaded_images += 1\n    if loaded_images == limit_images:\n        break\nprint('{} images'.format(len(imgs)))\n\n# Flatten every letter slice into a feature vector labeled with its character\nletters_image = []\nletters_ascii = []\nfor letters, image in imgs:\n    for column, letter_image in enumerate(get_letters(image)):\n        letters_image.append(letter_image.flatten())\n        letters_ascii.append(letters[column])\n```\n\n    30000 images\n\n\n\n```python\nmodel = svm.SVC(C=10, gamma=0.001, probability=False)\n# total_to_train counts captchas (80% of 30000) but slices the per-letter lists,\n# so this trains on the first 24000 letters (4000 captchas) and tests on the other 156000.\nmodel.fit(letters_image[:total_to_train], letters_ascii[:total_to_train])\n```\n\n\n\n\n    SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,\n      decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',\n      max_iter=-1, probability=False, random_state=None, shrinking=True,\n      tol=0.001, verbose=False)\n\n\n\n\n```python\npredicted = model.predict(letters_image[total_to_train:])\nexpected = letters_ascii[total_to_train:]\nprint(metrics.classification_report(expected, predicted))\n```\n\n                 precision    recall  f1-score   support\n    \n              0       0.66      0.75      0.70      
1296\n              1       0.81      0.87      0.84      4128\n              2       0.94      0.95      0.95      4518\n              3       0.92      0.93      0.93      4420\n              4       0.96      0.98      0.97      4499\n              5       0.94      0.93      0.94      4271\n              6       0.90      0.94      0.92      4532\n              7       0.97      0.96      0.96      4578\n              8       0.87      0.90      0.88      4476\n              9       0.93      0.94      0.93      4593\n              a       0.98      0.97      0.97      4481\n              b       0.80      0.88      0.84      4339\n              c       0.93      0.89      0.91      4495\n              d       0.88      0.89      0.89      4548\n              e       0.90      0.90      0.90      4397\n              f       0.88      0.90      0.89      4359\n              g       0.89      0.90      0.90      4356\n              h       0.94      0.90      0.92      4371\n              i       0.81      0.83      0.82      4305\n              j       0.95      0.94      0.95      4363\n              k       0.93      0.92      0.92      4470\n              l       0.94      0.92      0.93      4267\n              m       0.97      0.95      0.96      4403\n              n       0.95      0.95      0.95      4501\n              o       0.84      0.78      0.81      4003\n              p       0.86      0.88      0.87      4457\n              q       0.95      0.96      0.96      4427\n              r       0.88      0.87      0.88      4399\n              s       0.95      0.89      0.92      4428\n              t       0.94      0.93      0.93      4537\n              u       0.95      0.90      0.92      4459\n              v       0.97      0.97      0.97      4412\n              w       0.97      0.96      0.96      4461\n              x       0.97      0.95      0.96      4508\n              y       0.97      0.94      0.95      4499\n              z       
0.95      0.93      0.94      4444\n    \n    avg / total       0.92      0.92      0.92    156000\n    \n\n\n\n```python\ndef decode_captcha(filename, func=None):\n    func = func or model.predict\n    return func([letter.flatten() for letter in get_letters(load_image(filename))])\n\nfilename = 'captchas/{}'.format(np.random.choice(os.listdir('captchas/')))\nprint(filename, ''.join(decode_captcha(filename, model.predict)))\n```\n\n    captchas/captcha-54f0d89a91c67-OFUS8R.png 0fus8r\n\n\n\n```python\nfrom sklearn.model_selection import GridSearchCV  # formerly sklearn.grid_search (removed in scikit-learn 0.20)\nparams = [\n    {'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]},\n    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},\n]\n\nclf = GridSearchCV(svm.SVC(), params, n_jobs=1)\n# Uncomment to search for the best hyperparameters for the model\n# clf.fit(letters_image[:total_to_train], letters_ascii[:total_to_train])\n```\n\n\n```python\n# Only works after clf.fit(...) in the previous cell has been run\nprint(clf.best_estimator_)\nclf.cv_results_  # replaces grid_scores_ (removed in scikit-learn 0.20)\n```\n\n\n```python\n%timeit decode_captcha('captchas/captcha-54f0d99253782-wh4ow7.png')\n```\n\n    10 loops, best of 3: 162 ms per loop\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaysonsantos%2Fcaptcha-breaker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaysonsantos%2Fcaptcha-breaker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaysonsantos%2Fcaptcha-breaker/lists"}