https://github.com/khasbilegt/numiner
MNIST like dataset creation tool for Handwritten Text Recognition.
- Host: GitHub
- URL: https://github.com/khasbilegt/numiner
- Owner: khasbilegt
- License: mit
- Created: 2020-04-07T14:37:15.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-05-20T17:28:51.000Z (over 5 years ago)
- Last Synced: 2025-04-13T14:45:09.115Z (6 months ago)
- Topics: dataset-generation, handwriting-recognition, handwritten-character-recognition, handwritten-digit-recognition, machine-learning, optical-character-recognition, pypi-package, python38
- Language: Python
- Homepage:
- Size: 648 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NUMiner
Installation • How To Use • Sheet • Contributing • License

This is a Python library that creates MNIST-like training datasets for Handwritten Text Recognition research.
## Installation
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install numiner.
```bash
$ pip install numiner
```
Use the package manager [pipenv](https://pypi.org/project/pipenv/) to install numiner.
```bash
$ pipenv install numiner
```
Use the package manager [poetry](https://pypi.org/project/poetry/) to install numiner.
```bash
$ poetry add numiner
```
## How To Use
The package has two main modes: `sheet` and `letter`.

`sheet` takes a source path (either a folder holding the scanned _sheet_ images or a single image file) and a result path, a folder where the processed images are saved.
```bash
$ numiner -s/--sheet
```
`letter` takes a source path (either a folder holding the cropped raw images or a single image file) and a result path, a folder where the processed images are saved.
```bash
$ numiner -l/--letter
```
You can also override the default sheet labels by passing a `json` file:
```bash
$ numiner --labels path/to/labels.json -s path/to/source path/to/result
```
You can also print the built-in help:
```bash
$ numiner --help
usage: numiner [-h] [-v] [-s ] [-l ] [-c ]

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit
  --clean
  -s/--sheet     a path to a folder or file that's holding the
                 sheet image(s) & a path to a folder where all
                 images will be saved
  -l/--letter    a path to a folder or a file that's holding the cropped
                 image(s) & a path to a folder where all images
                 will be saved
  --labels       a path to .json file that's holding top to bottom, left
                 to right labels of the sheet with their ids
```

```bash
$ numiner convert --help
usage: numiner convert [-h] -p SIZE RATIO

positional arguments:
  SIZE           number of images that each class contains
  RATIO          test, train or percentage of the test data
                 in that case the rest of it will become
                 train data

optional arguments:
  -h, --help     show this help message and exit
  -p , --paths
                 source and destination paths
```
## Sample Sheet image
![]()
You can also get the empty sheet file from [here](assets/sheet.pdf).
## Extracted letters from the sheet
![]()
## Final image processing order
The processing follows the same approach that the EMNIST authors used when they first created their dataset from NIST SD images.
1. The letter is extracted from the sheet
2. The original image is converted to a binary image
3. The letter is fitted into a square with a 2-pixel-wide border on each side, preserving the aspect ratio
4. The image from the previous step is resized to 28x28 and thresholded to produce the final image
![]()
![]()
![]()
![]()
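The four steps above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration and not NUMiner's actual code: the function names, the fixed threshold of 128, the box-average resize, and the assumption that the input is dark ink on light paper (output is white ink on black, as in MNIST) are all choices made for this example. A real pipeline would typically use an imaging library instead of nested lists.

```python
from typing import List

# Hypothetical representation: grayscale rows with values 0..255,
# dark ink on a light background (an assumption of this sketch).
Image = List[List[int]]


def binarize(img: Image, threshold: int = 128) -> Image:
    """Step 2: dark (ink) pixels become 255, background becomes 0."""
    return [[255 if px < threshold else 0 for px in row] for row in img]


def fit_to_square(img: Image, border: int = 2) -> Image:
    """Step 3: crop to the ink's bounding box and centre it on a square
    canvas, leaving `border` background pixels around the longer side."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("image contains no ink")
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    side = max(h, w) + 2 * border
    canvas = [[0] * side for _ in range(side)]
    off_r, off_c = (side - h) // 2, (side - w) // 2
    for r in range(h):
        for c in range(w):
            canvas[off_r + r][off_c + c] = img[top + r][left + c]
    return canvas


def resize_and_threshold(img: Image, size: int = 28, threshold: int = 128) -> Image:
    """Step 4: box-average each source block into one output pixel
    (nearest neighbour when upscaling), then threshold to 0 or 255."""
    side = len(img)
    out = []
    for i in range(size):
        r0 = i * side // size
        r1 = max(r0 + 1, (i + 1) * side // size)
        row = []
        for j in range(size):
            c0 = j * side // size
            c1 = max(c0 + 1, (j + 1) * side // size)
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(255 if sum(block) / len(block) >= threshold else 0)
        out.append(row)
    return out


def process_letter(img: Image) -> Image:
    """Steps 2 to 4 applied to an already extracted letter (step 1)."""
    return resize_and_threshold(fit_to_square(binarize(img)))


# Synthetic "letter": a dark vertical bar on a white 40x60 crop.
raw = [[255] * 60 for _ in range(40)]
for r in range(5, 35):
    for c in range(20, 30):
        raw[r][c] = 0

result = process_letter(raw)
print(len(result), len(result[0]))
```

Padding before resizing keeps the glyph from touching the image edge, and fitting the bounding box into a square first is what preserves the aspect ratio when the image is scaled to 28x28.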
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
If you want to read more about how this project came to life, you can check out my [thesis report](https://github.com/khasbilegt/thesis-report/blob/master/main.pdf).
## License
[MIT](https://choosealicense.com/licenses/mit/)