{"id":16938608,"url":"https://github.com/shreeshrii/kraken_devanagari","last_synced_at":"2025-03-21T05:42:15.834Z","repository":{"id":85548858,"uuid":"242759996","full_name":"Shreeshrii/kraken_devanagari","owner":"Shreeshrii","description":"Kraken models for Devanagari","archived":false,"fork":false,"pushed_at":"2020-03-03T04:39:51.000Z","size":46249,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-26T02:32:02.523Z","etag":null,"topics":["devanagari","kraken","ocr","sanskrit","training-data"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Shreeshrii.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-24T14:39:12.000Z","updated_at":"2025-01-04T14:21:19.000Z","dependencies_parsed_at":"2023-03-17T19:01:27.525Z","dependency_job_id":null,"html_url":"https://github.com/Shreeshrii/kraken_devanagari","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Shreeshrii%2Fkraken_devanagari","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Shreeshrii%2Fkraken_devanagari/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Shreeshrii%2Fkraken_devanagari/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Shreeshrii%2Fkraken_devanagari/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/Shreeshrii","download_url":"https://codeload.github.com/Shreeshrii/kraken_devanagari/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244745715,"owners_count":20503048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["devanagari","kraken","ocr","sanskrit","training-data"],"created_at":"2024-10-13T21:01:11.174Z","updated_at":"2025-03-21T05:42:15.815Z","avatar_url":"https://github.com/Shreeshrii.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# kraken_devanagari\nExperimental Devanagari recognition model for [kraken](https://github.com/mittagessen/kraken).\n\n## devanew_best.mlmodel\nRecognizer for Devanagari script for kraken (uses the old `bbox` type segmentation).\n\n### Training\n\nThe model was trained using `kraken version 2.0.8` on synthetic training data (line images rendered from ground truth text files and fonts) generated using [tesseract's text2image](https://github.com/tesseract-ocr/tesseract) and [kraken's linegen](https://github.com/mittagessen/kraken/blob/master/kraken/linegen.py). See the [log](https://github.com/Shreeshrii/kraken_devanagari/blob/master/devanew.log) for details of training.\n\n* Training set 38761 lines\n* Validation set 4307 lines\n* Alphabet 133 symbols\n* Accuracy on validation set - 0.9795386542342217\n\nA sample of the training data used is available in the `devatrain` and `legacytrain` directories. 
\n\nThe complete manifest of training data is available in [devanew-manifest.txt](https://github.com/Shreeshrii/kraken_devanagari/blob/master/devanew-manifest.txt).\n\n### Evaluation\n\nThe model was evaluated on similar line images and had an average accuracy of approximately 95%.\n\n* devatest - 95.48% Accuracy\n* legacytest - 95.48% Accuracy\n\n### Conclusions\n\nThe segmentation algorithm of kraken is suited to Latin script and fails for certain types of Devanagari page images.\n\nThe accuracy on page images with typefaces unlike those in the training data will be lower.\n\nThe model can be fine-tuned further based on requirements, e.g. for one particular font or for one particular scanned book, which will give better accuracy.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreeshrii%2Fkraken_devanagari","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshreeshrii%2Fkraken_devanagari","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreeshrii%2Fkraken_devanagari/lists"}