https://github.com/strcoder4007/ancient-ocr

Scripts for generating a dataset and converting ancient languages to modern languages.

Host: GitHub
URL: https://github.com/strcoder4007/ancient-ocr
Owner: strcoder4007
Created: 2025-01-03T12:24:18.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-02-17T08:44:26.000Z (2 months ago)
Last Synced: 2025-02-17T09:34:21.541Z (2 months ago)
Language: Python
Size: 142 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md

README

## How to run [New]
1. Start by running ```extract_images.py``` to extract images from the Excel file.
2. Then run ```crop_images.py``` to crop the images using contours.
3. Then run ```process_images.py``` to clean the images and generate augmentations.
4. Run ```split.py``` to split the data into training and validation sets.
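
The four steps above can be chained in a small runner so they always execute in the listed order. This is only a sketch: it assumes each script takes no command-line arguments and is run from the repository root, neither of which the README confirms. `build_commands` and `run_pipeline` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys

# Script order taken from the "How to run [New]" steps.
PIPELINE = [
    "extract_images.py",
    "crop_images.py",
    "process_images.py",
    "split.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one command list per script, in pipeline order."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run each script in order; stop at the first one that fails.

    Assumption: the scripts need no arguments. With dry_run=True the
    commands are only printed, not executed.
    """
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run on incomplete output.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(dry_run=True)` prints the commands without executing them, which is a cheap way to check the order before a long run.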

## How to run [OLD]
1. Start by running ```get_data.py```. This gets the data from the Hugging Face wiki and saves the top 50,000 words in ```kaithi_50000.txt```.
2. Then run ```process_images.py``` and ```generate_labels.py```.
3. Run ```split.py``` to split the data into training and validation sets.
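
As an illustration of the word-list step in the old workflow, the sketch below picks the most frequent words from a set of text lines and writes them one per line. It rests on several assumptions the README does not state: that "top" means most frequent, that words are separated by whitespace, and that the output file holds one word per line. `top_words` and `save_words` are hypothetical names, and the download from Hugging Face is not covered here.

```python
from collections import Counter


def top_words(lines, limit=50_000):
    """Return up to `limit` words, most frequent first.

    Assumption: words are whitespace-separated tokens.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # most_common sorts by count, highest first.
    return [word for word, _ in counts.most_common(limit)]


def save_words(words, path="kaithi_50000.txt"):
    """Write one word per line as UTF-8 (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words))
```

For example, `top_words(["a b a", "c a b"], limit=2)` returns `["a", "b"]`, since `a` occurs three times and `b` twice.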
