https://github.com/strcoder4007/ancient-ocr

Scripts for generating a dataset and converting ancient languages to modern languages.

## How to run [New]
1. Start by running ```extract_images.py``` to extract the images from the Excel file.
2. Then run ```crop_images.py``` to crop the images using contours.
3. Then run ```process_images.py``` to clean the images and generate augmentations.
4. Run ```split.py``` to split the data into training and validation sets.
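
The steps above can be chained with a small driver script. This is only a sketch: the script names come from the list above, but `run_pipeline` and `NEW_PIPELINE` are hypothetical names, and it assumes each script runs from the repository root without arguments.

```python
import subprocess
import sys

# Script names are taken from the steps above. Running them from the
# repository root with no command-line arguments is an assumption.
NEW_PIPELINE = [
    "extract_images.py",
    "crop_images.py",
    "process_images.py",
    "split.py",
]


def run_pipeline(scripts, runner=subprocess.run):
    """Run each script in order and stop at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete output.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline(NEW_PIPELINE)` from the repository root would run the four steps in sequence. Using `sys.executable` keeps every step on the same Python interpreter (and virtual environment) as the driver.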

## How to run [Old]
1. Start by running ```get_data.py```, which gets the data from the Hugging Face wiki and saves the top 50,000 words in ```kaithi_50000.txt```.
2. Then run ```process_images.py``` and ```generate_labels.py```.
3. Run ```split.py``` to split the data into training and validation sets.
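
For reference, a train/validation split like the one in the last step can be sketched as follows. This is not the actual content of `split.py`; the function name `train_val_split`, the 80/20 ratio, the seed, and the sample file names are all assumptions for illustration.

```python
import random


def train_val_split(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, val) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(items)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names, for illustration only.
samples = [f"img_{i:04d}.png" for i in range(10)]
train, val = train_val_split(samples, val_fraction=0.2, seed=42)
print(len(train), len(val))  # 8 2
```

Fixing the seed means the same images land in the validation set on every run, which keeps validation scores comparable between experiments.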