{"id":13936255,"url":"https://github.com/aryaminus/saram","last_synced_at":"2025-07-06T10:33:29.681Z","repository":{"id":29581145,"uuid":"120932555","full_name":"aryaminus/saram","owner":"aryaminus","description":"Get OCR in txt form from an image or pdf extension supporting multiple files from directory using pytesseract with auto rotation for wrong orientation. PYPI:","archived":false,"fork":false,"pushed_at":"2022-12-27T15:33:41.000Z","size":35,"stargazers_count":51,"open_issues_count":5,"forks_count":18,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-04-27T23:33:05.820Z","etag":null,"topics":["character-recognition","chmod","image","ocr","orientation-detection","pdf","pillow","pyocr","pytesseract","python","tesseract","wand"],"latest_commit_sha":null,"homepage":"https://pypi.python.org/pypi/saram","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aryaminus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-02-09T17:01:54.000Z","updated_at":"2024-03-07T20:47:51.000Z","dependencies_parsed_at":"2023-01-14T15:14:19.727Z","dependency_job_id":null,"html_url":"https://github.com/aryaminus/saram","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryaminus%2Fsaram","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryaminus%2Fsaram/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryaminus%2Fsaram/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aryaminus%2Fsaram/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aryaminus","download_url":"https://codeload.github.com/aryaminus/saram/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250379807,"owners_count":21420841,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["character-recognition","chmod","image","ocr","orientation-detection","pdf","pillow","pyocr","pytesseract","python","tesseract","wand"],"created_at":"2024-08-07T23:02:31.396Z","updated_at":"2025-04-23T06:11:24.960Z","avatar_url":"https://github.com/aryaminus.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Saram - Image/PDF OCR detection system\nGet OCR in txt form from an image or pdf extension supporting multiple files from directory using `pytesseract` with support for rotation in case of wrong orientation along.\n\n**Currently in beta state**\n\nFollow: \u003ca href=\"https://youtu.be/YF6Tf7qOXU4\" target=\"_blank\"\u003eDemo run\u003c/a\u003e\n\n[![Saram features](https://i.imgur.com/M9dAwPq.gif)](https://youtu.be/YF6Tf7qOXU4)\n\n**Note:**\nMake sure you have a OCR tool like `tesseract` and certain data value for comparing OCR, eg `tesseract-data-eng` along with `Pillow` and `Wand` for image conversion and loading which will be fetched during pip install.\n\n**For using in python**:\nRefer to the \u003ca href=\"https://github.com/aryaminus/saram/tree/py-module\" target=\"_blank\"\u003epy-module\u003c/a\u003e branch\n\n## Installation\n\nInstall using PIP:\n```\n$ pip install saram\n$ saram 
\u003cdirname\u003e\n```\n***else***\n\nClone the source locally:\n```\n$ git clone https://github.com/aryaminus/saram\n$ cd saram\n$ git checkout py-module\n$ python main.py \u003cdirname\u003e\n```\n\n## Todo\n- [x] Add support for PDF by PDF -\u003e Image -\u003e Txt with converted image deletion after processing\n- [x] Double check for orientation in case of image and PDF\n- [x] Make a PIP package\n- [ ] Add NLP to process the most frequently repeated characters to filter content\n- [ ] Add Cloud Vision support for effective character recognition\n- [ ] Support for GUI using tkinter\n\n## Reference\n1. \u003ca href=\"https://github.com/lucab85/PDFtoTXT\" target=\"_blank\"\u003epdf-to-txt\u003c/a\u003e\n2. \u003ca href=\"https://github.com/prabhakar267/ocr-convert-image-to-text\" target=\"_blank\"\u003eocr-convert-image-to-text\u003c/a\u003e\n3. \u003ca href=\"https://pastebin.com/QFMpp28T\" target=\"_blank\"\u003efix-image-rotation\u003c/a\u003e\n4. \u003ca href=\"https://python-packaging.readthedocs.io/en/latest/minimal.html\" target=\"_blank\"\u003epython-packaging\u003c/a\u003e\n\n\n-----------------------------------------------------------------------------------------------------------\n\n## Contributing\n\n1. Fork it (\u003chttps://github.com/aryaminus/saram/fork\u003e)\n2. Create your feature branch (`git checkout -b feature/fooBar`)\n3. Commit your changes (`git commit -am 'Add some fooBar'`)\n4. Push to the branch (`git push origin feature/fooBar`)\n5. Create a new Pull Request\n\n**Enjoy!**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryaminus%2Fsaram","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faryaminus%2Fsaram","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryaminus%2Fsaram/lists"}