{"id":13824804,"url":"https://github.com/Saeed-Biabani/Scene-Text-Recognition","last_synced_at":"2025-07-08T19:33:13.317Z","repository":{"id":231912616,"uuid":"773467757","full_name":"Saeed-Biabani/Scene-Text-Recognition","owner":"Saeed-Biabani","description":"Text recognition (optical character recognition) with deep learning methods in farsi.","archived":false,"fork":false,"pushed_at":"2024-07-11T17:17:51.000Z","size":589,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-08-04T09:02:44.143Z","etag":null,"topics":["crnn","deep-learning","farsi","ocr","persian","persian-ocr","python","pytorch","text-recognition"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Saeed-Biabani.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-17T18:34:34.000Z","updated_at":"2024-08-04T09:02:49.537Z","dependencies_parsed_at":"2024-04-06T19:25:35.096Z","dependency_job_id":"a28ae455-6b21-43df-80c5-18485d6e21e1","html_url":"https://github.com/Saeed-Biabani/Scene-Text-Recognition","commit_stats":null,"previous_names":["saeed-biabani/scene-text-recognition"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saeed-Biabani%2FScene-Text-Recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saeed-Biabani%2FScene-Text-Recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/Saeed-Biabani%2FScene-Text-Recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saeed-Biabani%2FScene-Text-Recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Saeed-Biabani","download_url":"https://codeload.github.com/Saeed-Biabani/Scene-Text-Recognition/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225457865,"owners_count":17477362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crnn","deep-learning","farsi","ocr","persian","persian-ocr","python","pytorch","text-recognition"],"created_at":"2024-08-04T09:01:09.691Z","updated_at":"2024-11-20T02:31:11.428Z","avatar_url":"https://github.com/Saeed-Biabani.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003eScene Text Recognition\u003c/h1\u003e\n\u003c/p\u003e\n\nScene Text Recognition With Deep Learning Methods In Farsi.\n\n#### **Quick Links**\n- [Dependencies](#Dependencies)\n- [Getting Started](#Getting-Started)\n- [Overview](#Overview)\n- [Training](#Training)\n- [Samples](#Samples)\n- [References](#References)\n- [License](#License)\n\n## Dependencies\n- Install Dependencies `$ pip install -r requirements.txt`\n- Download Pretrained Weights [Here](https://huggingface.co/ordaktaktak/Scene-Text-Recognition)\n\n## Getting Started\n\n\u003cp align=\"center\"\u003e\n  \u003cdiv align=\"center\"\u003e\u003cimg src=\"figures/crnn.png\" height = 500 \u003e\u003c/div\u003e\n  
\u003cdiv align=\"center\"\u003e\u003cfigcaption\u003e\u003cstrong\u003eFig. 1: Model architecture.\u003c/strong\u003e\u003c/figcaption\u003e\u003c/div\u003e\n\u003c/p\u003e\n\n- Project Structure\n```\n.\n├── src\n│   ├── nn\n│   │   ├── feature_extractor.py\n│   │   ├── layers.py\n│   │   └── ocr_model.py\n│   └── utils\n│       ├── dataset.py\n│       ├── labelConverter.py\n│       ├── loss_calculator.py\n│       ├── misc.py\n│       ├── trainUtils.py\n│       └── transforms.py\n├── config.py\n└── train.py\n```\n\n- Place the dataset paths in the `config.py` file.\n```python\nds_path = {\n    \"train_ds\" : \"path/to/train/dataset\",\n    \"test_ds\" : \"path/to/test/dataset\",\n}\n```\n\n- Dataset Structure (each image must contain a single word)\n```\n.\n├── Images\n│   ├── img_1.jpg\n│   ├── img_2.jpg\n│   ├── img_3.jpg\n│   ├── img_4.jpg\n│   └── img_5.jpg\n│   ...\n└── labels.json\n```\n\n- `labels.json` Contents\n```json\n{\"img_1\": \"بالا\", \"img_2\": \"و\", \"img_3\": \"بدانند\", \"img_4\": \"چندین\", \"img_5\": \"به\", ...}\n```\n## Overview\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figures/ROC.png\"\u003e\n\u003c/p\u003e\n\n## Training\n\n### Objective Function\nDenote the training dataset by $\\ TD = \\langle X_i , Y_i \\rangle$, where $\\ X_i$ is the training image and $\\ Y_i$ is the word label. Training is conducted by minimizing the objective function, which is the negative log-likelihood of the conditional probability of the word label.\n```math\nO = -\\sum_{(X_i, Y_i) \\in TD} \\log P(Y_i|X_i)\n```\nThis function calculates a cost from an image and its word label, and the modules in the framework are trained in an end-to-end manner.\n\n\u003cp align=\"center\"\u003e\n  \u003cdiv align=\"center\"\u003e\u003cimg src=\"figures/LearningCurve.png\"\u003e\u003c/div\u003e\n  \u003cdiv align=\"center\"\u003e\u003cfigcaption\u003e\u003cstrong\u003eFig. 
2: Model Training History.\u003c/strong\u003e\u003c/figcaption\u003e\u003c/div\u003e\n\u003c/p\u003e\n\n### CTC Loss\nCTC takes a sequence $\\ H = h_1 , \\ldots , h_T$, where $\\ T$ is the sequence length, and outputs the probability of a path $\\ \\pi$, which is defined as\n```math\nP(\\pi|H) = \\prod_{t = 1}^T y_{{\\pi}_t}^t\n```\nwhere $\\ y_{{\\pi}_t}^t$ is the probability of generating character $\\ \\pi_t$ at time step $\\ t$.\n\n\u003cdiv align = \"center\"\u003e\n  \u003ctable\u003e\n    \u003ctr\u003e\n      \u003cth\u003eModel\u003c/th\u003e\n      \u003cth\u003eInput Size\u003c/th\u003e\n      \u003cth\u003eRecall\u003c/th\u003e\n      \u003cth\u003ePrecision\u003c/th\u003e\n      \u003cth\u003eF1\u003c/th\u003e\n      \u003cth\u003eParams\u003c/th\u003e\n      \u003cth\u003eSpeed\u003csup\u003e(img/s)\u003c/sup\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e $\\ OCR-Base$ \u003c/td\u003e\n      \u003ctd\u003e $\\ 1$ $\\ \\times$ $\\ 64$ $\\ \\times$ $\\ 192$\u003c/td\u003e\n      \u003ctd\u003e $\\ 0.993$ \u003c/td\u003e\n      \u003ctd\u003e $\\ 0.997$ \u003c/td\u003e\n      \u003ctd\u003e $\\ 0.997$ \u003c/td\u003e\n      \u003ctd\u003e $\\ 35,023,143$ \u003c/td\u003e\n      \u003ctd\u003e $\\ 89.24$ \u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/table\u003e \n\u003c/div\u003e\n\n## Samples\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figures/samples.png\"\u003e\n\u003c/p\u003e\n\n## References\n- [What Is Wrong With Scene Text Recognition Model Comparisons? 
Dataset and Model Analysis](https://arxiv.org/abs/1904.01906)\n- [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/abs/1507.05717)\n- [Text recognition (optical character recognition) with deep learning methods, ICCV 2019](https://github.com/clovaai/deep-text-recognition-benchmark)\n\n## 🛡️ License \u003ca name=\"license\"\u003e\u003c/a\u003e\nThis project is distributed under the [MIT License](https://github.com/Saeed-Biabani/Scene-Text-Recognition/blob/main/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSaeed-Biabani%2FScene-Text-Recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSaeed-Biabani%2FScene-Text-Recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSaeed-Biabani%2FScene-Text-Recognition/lists"}