https://github.com/strcoder4007/ancient-ocr
Scripts for generating a dataset and converting ancient languages to modern languages.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/strcoder4007/ancient-ocr
- Owner: strcoder4007
- Created: 2025-01-03T12:24:18.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-02-17T08:44:26.000Z (2 months ago)
- Last Synced: 2025-02-17T09:34:21.541Z (2 months ago)
- Language: Python
- Size: 142 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
## How to run [New]
1. Start by running ```extract_images.py``` to extract the images from the Excel file.
2. Then run ```crop_images.py``` to crop the images using contours.
3. Then run ```process_images.py``` to clean the images and generate augmentations.
4. Run ```split.py``` to split the data into train and validation sets.

## How to run [OLD]
1. Start by running ```get_data.py```. This fetches the data from the Hugging Face wiki and saves the top 50,000 words in ```kaithi_50000.txt```.
2. Then run ```process_images.py``` and ```generate_labels.py```.
3. Run ```split.py``` to split the data into train and validation sets.
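
Both workflows end with a train/validation split. The contents of ```split.py``` are not shown here, so the following is only a minimal sketch of what such a step might look like. The function name `split_dataset`, the 80/20 ratio, the fixed seed, and the example file names are all assumptions for illustration, not the repository's actual code.

```python
import random


def split_dataset(items, val_fraction=0.2, seed=42):
    """Return (train, val) lists from a reproducible shuffle of items.

    Hypothetical helper: the name, default ratio, and seed are assumptions.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(items)
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Example with made-up file names; in practice these would be the
# processed image files produced by the earlier steps.
file_names = [f"img_{i:03d}.png" for i in range(10)]
train_files, val_files = split_dataset(file_names)
print(len(train_files), "train /", len(val_files), "validation")
```

Fixing the seed makes the split repeatable across runs, which helps when comparing models trained on the same data.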