{"id":30420615,"url":"https://github.com/macabeus/pyslibtesseract","last_synced_at":"2025-09-08T07:41:58.619Z","repository":{"id":62583541,"uuid":"48309079","full_name":"macabeus/pyslibtesseract","owner":"macabeus","description":"✏️  Integration of Tesseract for Python using a shared library","archived":false,"fork":false,"pushed_at":"2016-03-25T17:40:49.000Z","size":34,"stargazers_count":12,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-08-18T12:54:49.445Z","etag":null,"topics":["hocr","ocr","tesseract"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/pyslibtesseract/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/macabeus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-12-20T05:54:06.000Z","updated_at":"2024-11-19T13:17:32.000Z","dependencies_parsed_at":"2022-11-03T21:48:49.090Z","dependency_job_id":null,"html_url":"https://github.com/macabeus/pyslibtesseract","commit_stats":null,"previous_names":["brunomacabeusbr/pyslibtesseract"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/macabeus/pyslibtesseract","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macabeus%2Fpyslibtesseract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macabeus%2Fpyslibtesseract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macabeus%2Fpyslibtesseract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macabeus%2Fpyslibtesseract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m
acabeus","download_url":"https://codeload.github.com/macabeus/pyslibtesseract/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macabeus%2Fpyslibtesseract/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274152759,"owners_count":25231293,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hocr","ocr","tesseract"],"created_at":"2025-08-22T08:19:39.521Z","updated_at":"2025-09-08T07:41:58.601Z","avatar_url":"https://github.com/macabeus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pyslibtesseract\nIntegration of Tesseract for Python using a shared library\n\n## To install\n\nFrom PyPI\n\n    sudo pip3 install pyslibtesseract\n    \nFrom Github\n\n    sudo apt-get install libtesseract-dev\n    sudo apt-get install libleptonica-dev\n    git clone https://github.com/brunomacabeusbr/pyslibtesseract.git\n    cd pyslibtesseract\n    cd src/cppcode/ \u0026\u0026 cmake . \u0026\u0026 make \u0026\u0026 cd ../.. 
\u0026\u0026 sudo python3 setup.py install\n\n## To use\n### Start\n\nFirst, create a TesseractConfig object:\n\n    config_single_char = TesseractConfig(psm=PageSegMode.PSM_SINGLE_CHAR)\n    config_line = TesseractConfig(psm=PageSegMode.PSM_SINGLE_LINE)\n    config_line_portuguese_brazilian = TesseractConfig(psm=PageSegMode.PSM_SINGLE_LINE, lang='pt-br')\n\nThe possible PSM (page segmentation mode) values are:\n\n    PSM_OSD_ONLY\n    PSM_AUTO_OSD\n    PSM_AUTO_ONLY\n    PSM_AUTO\n    PSM_SINGLE_COLUMN\n    PSM_SINGLE_BLOCK_VERT_TEX\n    PSM_SINGLE_BLOCK\n    PSM_SINGLE_LINE\n    PSM_SINGLE_WORD\n    PSM_CIRCLE_WORD\n    PSM_SINGLE_CHAR\n    PSM_SPARSE_TEXT\n    PSM_SPARSE_TEXT_OSD\n    PSM_COUNT\n\nYou can set \u003ca href=\"http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version\"\u003evariables of Tesseract\u003c/a\u003e:\n\n    config_single_char.add_variable('tessedit_char_whitelist', 'QWERTYUIOPASDFGHJKLZXCVBNM')\n\n### Read\nThe first parameter is always the configuration and the second parameter is always the image path.\n\nRead a phrase\n\n\u003cimg src=\"http://i.imgur.com/BqO7Cqh.png\"\u003e\n\n    \u003e\u003e\u003e LibTesseract.simple_read(config_line, 'phrase.png')\n    the book is on the table\n\nRead a phrase and report the confidence of each word\n\n\u003cimg src=\"http://i.imgur.com/PInL9bB.png\"\u003e\n\n    \u003e\u003e\u003e LibTesseract.read_and_get_confidence_word(config_line, 'phrase.png')\n    [('he', 82.19984436035156), ('is', 84.98550415039062), ('readlnq', 75.25213623046875), ('the', 74.60755157470703), ('book', 85.8053207397461)]\n\nRead a char and report its confidence along with other possible characters\n\n\u003cimg src=\"http://i.imgur.com/J26XnmD.png\"\u003e\n\n    \u003e\u003e\u003e LibTesseract.read_and_get_confidence_char(config_single_char, 'char.png')\n    [('E', 58.27500915527344), ('Y', 56.93630599975586), ('F', 56.4453125), ('T', 51.12168884277344), ('Q', 47.19916534423828), ('W', 46.1181640625), ('V', 45.31656265258789), ('G', 
43.49636459350586)]\n\n### hOCR\nIf you want the result in \u003ca href=\"https://en.wikipedia.org/wiki/HOCR\"\u003ehOCR\u003c/a\u003e format, create a config with `hocr=True`\n\n    \u003e\u003e\u003e config_line_with_hocr = TesseractConfig(psm=PageSegMode.PSM_SINGLE_LINE, hocr=True)\n    \nor edit an already existing config\n\n    \u003e\u003e\u003e config_line.hocr = True\n    \nThen, use the `simple_read` method\n\n    \u003e\u003e\u003e LibTesseract.simple_read(config_line_with_hocr, 'phrase.png')\n      \u003cdiv class='ocr_page' id='page_1' title='image \"\"; bbox 0 0 319 33; ppageno 0'\u003e\n       \u003cdiv class='ocr_carea' id='block_1_1' title=\"bbox 0 0 319 33\"\u003e\n        \u003cp class='ocr_par' dir='ltr' id='par_1_1' title=\"bbox 10 13 276 25\"\u003e\n         \u003cspan class='ocr_line' id='line_1_1' title=\"bbox 10 13 276 25; baseline 0 0\"\u003e\u003cspan class='ocrx_word' id='word_1_1'     title='bbox 10 14 41 25; x_wconf 75' lang='eng' dir='ltr'\u003e\u003cstrong\u003ethe\u003c/strong\u003e\u003c/span\u003e \u003cspan class='ocrx_word' id='word_1_2' title='bbox 53 13 97 25; x_wconf 84' lang='eng' dir='ltr'\u003e\u003cstrong\u003ebook\u003c/strong\u003e\u003c/span\u003e \u003cspan class='ocrx_word' id='word_1_3' title='bbox 111 13 129 25; x_wconf 79' lang='eng' dir='ltr'\u003e\u003cstrong\u003eis\u003c/strong\u003e\u003c/span\u003e \u003cspan class='ocrx_word' id='word_1_4' title='bbox 143 17 164 25; x_wconf 83' lang='eng' dir='ltr'\u003eon\u003c/span\u003e \u003cspan class='ocrx_word' id='word_1_5' title='bbox 178 14 209 25; x_wconf 75' lang='eng' dir='ltr'\u003e\u003cstrong\u003ethe\u003c/strong\u003e\u003c/span\u003e \u003cspan class='ocrx_word' id='word_1_6' title='bbox 223 14 276 25; x_wconf 76' lang='eng' dir='ltr'\u003e\u003cstrong\u003etable\u003c/strong\u003e\u003c/span\u003e \n         \u003c/span\u003e\n        \u003c/p\u003e\n       \u003c/div\u003e\n      
\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacabeus%2Fpyslibtesseract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmacabeus%2Fpyslibtesseract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacabeus%2Fpyslibtesseract/lists"}