{"id":25664643,"url":"https://github.com/khasbilegt/numiner","last_synced_at":"2025-04-22T14:23:05.273Z","repository":{"id":57447293,"uuid":"253820528","full_name":"khasbilegt/numiner","owner":"khasbilegt","description":"MNIST like dataset creation tool for Handwritten Text Recognition.","archived":false,"fork":false,"pushed_at":"2020-05-20T17:28:51.000Z","size":664,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-13T14:45:09.115Z","etag":null,"topics":["dataset-generation","handwriting-recognition","handwritten-character-recognition","handwritten-digit-recognition","machine-learning","optical-character-recognition","pypi-package","python38"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/khasbilegt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-07T14:37:15.000Z","updated_at":"2023-03-24T11:28:12.000Z","dependencies_parsed_at":"2022-09-02T23:40:40.670Z","dependency_job_id":null,"html_url":"https://github.com/khasbilegt/numiner","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khasbilegt%2Fnuminer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khasbilegt%2Fnuminer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khasbilegt%2Fnuminer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/khasbilegt%2Fnuminer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/khasbilegt","download_url":"https://codeload.github.com/khasbilegt/numiner/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250256081,"owners_count":21400462,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset-generation","handwriting-recognition","handwritten-character-recognition","handwritten-digit-recognition","machine-learning","optical-character-recognition","pypi-package","python38"],"created_at":"2025-02-24T06:29:24.016Z","updated_at":"2025-04-22T14:23:05.252Z","avatar_url":"https://github.com/khasbilegt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  NUMiner\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://travis-ci.org/khasbilegt/numiner\"\u003e\n    \u003cimg src=\"https://travis-ci.org/khasbilegt/numiner.svg?branch=master\" alt=\"Build Status\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/PyCQA/bandit\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/security-bandit-yellow.svg\"\n         alt=\"security: bandit\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://badge.fury.io/py/numiner\"\u003e\n    \u003cimg src=\"https://badge.fury.io/py/numiner.svg\" alt=\"PyPI version\"\u003e\n  \u003c/a\u003e\n  \u003ca href='https://coveralls.io/github/khasbilegt/numiner?branch=master'\u003e\n    \u003cimg src='https://coveralls.io/repos/github/khasbilegt/numiner/badge.svg?branch=master' alt='Coverage Status' /\u003e\n  \u003c/a\u003e\n  \u003ca 
href='https://github.com/psf/black'\u003e\n    \u003cimg src='https://img.shields.io/badge/code%20style-black-000000.svg' alt='Code style: black' /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#installation\"\u003eInstallation\u003c/a\u003e •\n  \u003ca href=\"#how-to-use\"\u003eHow To Use\u003c/a\u003e •\n  \u003ca href=\"#sample-sheet-image\"\u003eSheet\u003c/a\u003e •\n  \u003ca href=\"#contributing\"\u003eContributing\u003c/a\u003e •\n  \u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003eA Python library that creates MNIST-like training datasets for handwritten text recognition research\u003c/p\u003e\n\n## Installation\n\nUse the package manager [pip](https://pip.pypa.io/en/stable/) to install numiner.\n\n```bash\n$ pip install numiner\n```\n\nAlternatively, install it with [pipenv](https://pypi.org/project/pipenv/).\n\n```bash\n$ pipenv install numiner\n```\n\nOr add it to a [poetry](https://pypi.org/project/poetry/) project.\n\n```bash\n$ poetry add numiner\n```\n\n## How To Use\n\nIn general, the package has two main modes. 
One is `sheet` and the other is `letter`.\n\n`sheet` - takes a `\u003csource\u003e` path, either a folder holding the scanned _sheet_ images or a single image file, and saves the processed images to the `\u003cresult\u003e` path\n\n```bash\n$ numiner -s/--sheet \u003csource\u003e \u003cresult\u003e\n```\n\n`letter` - takes a `\u003csource\u003e` path, either a folder holding the cropped raw images or a single image file, and saves the processed images to the `\u003cresult\u003e` path\n\n```bash\n$ numiner -l/--letter \u003csource\u003e \u003cresult\u003e\n```\n\nYou can also override the default sheet labels by passing a `json` file:\n\n```bash\n$ numiner --labels path/to/labels.json -s path/to/source path/to/result\n```\n\nThe built-in help lists all available options:\n\n```bash\n$ numiner --help\n\nusage: numiner [-h] [-v] [-s \u003csource\u003e \u003cresult\u003e] [-l \u003csource\u003e \u003cresult\u003e] [-c \u003cpath\u003e]\n\noptional arguments:\n  -h, --help                    show this help message and exit\n  -v, --version                 show program's version number and exit\n  --clean \u003cpath\u003e\n  -s/--sheet \u003csource\u003e \u003cresult\u003e  a path to a folder or file that's holding the \u003csource\u003e\n                                sheet image(s) \u0026 a path to a folder where all \u003cresult\u003e\n                                images will be saved\n  -l/--letter \u003csource\u003e \u003cresult\u003e a path to a folder or a file that's holding the cropped\n                                image(s) \u0026 a path to a folder where all \u003cresult\u003e images\n                                will be saved\n  --labels \u003cpath\u003e               a path to .json file that's holding top to bottom, left\n                                to right labels of the sheet with their ids\n```\n\n```bash\n$ numiner convert --help\n\nusage: numiner convert [-h] -p \u003csrc\u003e \u003cdest\u003e SIZE RATIO\n\npositional 
arguments:\n  SIZE                  number of images that each class contains\n  RATIO                 test, train or percentage of the test data\n                        in that case the rest of it will become\n                        train data\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -p \u003csrc\u003e \u003cdest\u003e, --paths \u003csrc\u003e \u003cdest\u003e\n                        source and destination paths\n```\n\n## Sample Sheet image\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/sample_sheet.jpg\" width=\"60%\"\u003e\n\u003c/p\u003e\n\nAn empty sheet file is also available [here](assets/sheet.pdf).\n\n## Extracted letters from the sheet\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/sheet_segmented.png\"\u003e\n\u003c/p\u003e\n\n## Final image processing order\n\nThe processing follows the same approach that EMNIST used when first creating its dataset from NIST SD images.\n\n1. The letter is extracted from the sheet\n2. The original image is converted to a binary version\n3. The letter is fitted into a square with a 2-pixel-wide border on each side, preserving its aspect ratio\n4. The image from the previous step is resized to 28x28 and thresholded to produce the final image\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/letter_a_original.png\" width=\"24%\"\u003e\n  \u003cimg src=\"assets/letter_a_binary.png\" width=\"24%\"\u003e\n  \u003cimg src=\"assets/letter_a_cropped.png\" width=\"24%\"\u003e\n  \u003cimg src=\"assets/letter_a_final.png\" width=\"24%\"\u003e\n\u003c/div\u003e\n\n## Contributing\n\nPull requests are welcome. 
For major changes, please open an issue first to discuss what you would like to change.\n\nPlease make sure to update tests as appropriate.\n\nIf you want to read more about how this project came to life, you can check out my [thesis report](https://github.com/khasbilegt/thesis-report/blob/master/main.pdf).\n\n## License\n\n[MIT](https://choosealicense.com/licenses/mit/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhasbilegt%2Fnuminer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkhasbilegt%2Fnuminer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkhasbilegt%2Fnuminer/lists"}