{"id":13545587,"url":"https://github.com/thiagoalessio/tesseract-ocr-for-php","last_synced_at":"2025-05-14T22:03:44.619Z","repository":{"id":2948836,"uuid":"3962354","full_name":"thiagoalessio/tesseract-ocr-for-php","owner":"thiagoalessio","description":"A wrapper to work with Tesseract OCR inside PHP.","archived":false,"fork":false,"pushed_at":"2025-03-25T08:54:04.000Z","size":1139,"stargazers_count":2961,"open_issues_count":5,"forks_count":552,"subscribers_count":117,"default_branch":"main","last_synced_at":"2025-05-07T21:13:14.947Z","etag":null,"topics":["image-to-text","ocr","php","tesseract","text-recognition"],"latest_commit_sha":null,"homepage":"https://packagist.org/packages/thiagoalessio/tesseract_ocr","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thiagoalessio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"MIT-LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-04-08T04:20:07.000Z","updated_at":"2025-05-07T01:34:06.000Z","dependencies_parsed_at":"2024-01-07T21:14:09.472Z","dependency_job_id":"e0127f06-c24f-480f-91bb-9680df9ccd13","html_url":"https://github.com/thiagoalessio/tesseract-ocr-for-php","commit_stats":{"total_commits":345,"total_committers":26,"mean_commits":13.26923076923077,"dds":0.2115942028985507,"last_synced_commit":"2de7ef79000f527ef633c1368c1dedbb12cd9d56"},"previous_names":[],"tags_count":49,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thiagoalessio%2Ftesseract-ocr-for-php","tags_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/thiagoalessio%2Ftesseract-ocr-for-php/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thiagoalessio%2Ftesseract-ocr-for-php/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thiagoalessio%2Ftesseract-ocr-for-php/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thiagoalessio","download_url":"https://codeload.github.com/thiagoalessio/tesseract-ocr-for-php/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254235686,"owners_count":22036962,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image-to-text","ocr","php","tesseract","text-recognition"],"created_at":"2024-08-01T11:01:06.228Z","updated_at":"2025-05-14T22:03:44.571Z","avatar_url":"https://github.com/thiagoalessio.png","language":"PHP","funding_links":[],"categories":["PHP","光学字符识别OCR","Software"],"sub_categories":["资源传输下载","OCR libraries by programming language"],"readme":"# Tesseract OCR for PHP\n\nA wrapper to work with Tesseract OCR inside PHP.\n\n[![CI][ci_badge]][ci]\n[![AppVeyor][appveyor_badge]][appveyor]\n[![Codacy][codacy_badge]][codacy]\n[![Test Coverage][test_coverage_badge]][test_coverage]\n\u003cbr/\u003e\n[![Latest Stable Version][stable_version_badge]][packagist]\n[![Total Downloads][total_downloads_badge]][packagist]\n[![Monthly Downloads][monthly_downloads_badge]][packagist]\n\n## Installation\n\nVia [Composer][]:\n\n    $ composer require thiagoalessio/tesseract_ocr\n\n:bangbang: **This library depends on [Tesseract 
OCR][], version _3.02_ or later.**\n\n\u003cbr/\u003e\n\n### ![][windows_icon] Note for Windows users\n\nThere are [many ways][tesseract_installation_on_windows] to install\n[Tesseract OCR][] on your system, but if you just want something quick to\nget up and running, I recommend installing the [Capture2Text][] package with\n[Chocolatey][].\n\n    choco install capture2text --version 3.9\n\n:warning: Recent versions of [Capture2Text][] stopped shipping the `tesseract` binary.\n\n\u003cbr/\u003e\n\n### ![][macos_icon] Note for macOS users\n\nWith [MacPorts][] you can install support for individual languages, like so:\n\n    $ sudo port install tesseract-\u003clangcode\u003e\n\nBut that is not possible with [Homebrew][]. It comes only with **English** support\nby default, so if you intend to use it for other languages, the quickest solution\nis to install them all:\n\n    $ brew install tesseract tesseract-lang\n\n\u003cbr/\u003e\n\n## Usage\n\n### Basic usage\n\n\u003cimg align=\"right\" width=\"50%\" title=\"The quick brown fox jumps over the lazy dog.\" src=\"./tests/EndToEnd/images/text.png\"/\u003e\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('text.png'))\n    -\u003erun();\n```\n\n```\nThe quick brown fox\njumps over\nthe lazy dog.\n```\n\n\u003cbr/\u003e\n\n### Other languages\n\n\u003cimg align=\"right\" width=\"50%\" title=\"Bülowstraße\" src=\"./tests/EndToEnd/images/german.png\"/\u003e\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('german.png'))\n    -\u003elang('deu')\n    -\u003erun();\n```\n\n```\nBülowstraße\n```\n\n\u003cbr/\u003e\n\n### Multiple languages\n\n\u003cimg align=\"right\" width=\"50%\" title=\"I eat すし y Pollo\" src=\"./tests/EndToEnd/images/mixed-languages.png\"/\u003e\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('mixed-languages.png'))\n    -\u003elang('eng', 'jpn', 'spa')\n    -\u003erun();\n```\n\n```\nI eat すし y 
Pollo\n```\n\n\u003cbr/\u003e\n\n### Inducing recognition\n\n\u003cimg align=\"right\" width=\"50%\" title=\"8055\" src=\"./tests/EndToEnd/images/8055.png\"/\u003e\n\n```php\nuse thiagoalessio\\TesseractOCR\\TesseractOCR;\necho (new TesseractOCR('8055.png'))\n    -\u003eallowlist(range('A', 'Z'))\n    -\u003erun();\n```\n\n```\nBOSS\n```\n\n\u003cbr/\u003e\n\n### Breaking CAPTCHAs\n\nYes, I know some of you might want to use this library for the *noble* purpose\nof breaking CAPTCHAs, so please take a look at this comment:\n\n\u003chttps://github.com/thiagoalessio/tesseract-ocr-for-php/issues/91#issuecomment-342290510\u003e\n\n## API\n\n### run\n\nExecutes a `tesseract` command, optionally receiving an integer as `timeout`,\nin case you experience stalled tesseract processes.\n\n```php\n$ocr = new TesseractOCR();\n$ocr-\u003erun();\n```\n```php\n$ocr = new TesseractOCR();\n$timeout = 500;\n$ocr-\u003erun($timeout);\n```\n\n### image\n\nDefine the path of an image to be recognized by `tesseract`.\n\n```php\n$ocr = new TesseractOCR();\n$ocr-\u003eimage('/path/to/image.png');\n$ocr-\u003erun();\n```\n\n### imageData\n\nSet the image to be recognized by `tesseract` from a string, with its size.\nThis can be useful when dealing with files that are already loaded in memory.\nYou can easily retrieve the image data and size of an image object:\n\n```php\n// Using Imagick\n$data = $img-\u003egetImageBlob();\n$size = $img-\u003egetImageLength();\n// Using GD\nob_start();\n// Note that you can use any format supported by tesseract\nimagepng($img, null, 0);\n$size = ob_get_length();\n$data = ob_get_clean();\n\n$ocr = new TesseractOCR();\n$ocr-\u003eimageData($data, $size);\n$ocr-\u003erun();\n```\n\n### executable\n\nDefine a custom location of the `tesseract` executable,\nif for any reason it is not present in the `$PATH`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003eexecutable('/path/to/tesseract')\n    -\u003erun();\n```\n\n### version\n\nReturns the current version 
of `tesseract`.\n\n```php\necho (new TesseractOCR())-\u003eversion();\n```\n\n### availableLanguages\n\nReturns a list of available languages/scripts.\n\n```php\nforeach((new TesseractOCR())-\u003eavailableLanguages() as $lang) echo $lang;\n```\n\n__More info:__ \u003chttps://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages-and-scripts\u003e\n\n### tessdataDir\n\nSpecify a custom location for the tessdata directory.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003etessdataDir('/path')\n    -\u003erun();\n```\n\n### userWords\n\nSpecify the location of the user words file.\n\nThis is a plain text file containing a list of words that you want\n`tesseract` to treat as normal dictionary words.\n\nUseful when dealing with contents that contain technical terminology, jargon,\netc.\n\n```\n$ cat /path/to/user-words.txt\nfoo\nbar\n```\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003euserWords('/path/to/user-words.txt')\n    -\u003erun();\n```\n\n### userPatterns\n\nSpecify the location of the user patterns file.\n\nIf the contents you are dealing with have known patterns, this option can\ngreatly improve tesseract's recognition accuracy.\n\n```\n$ cat /path/to/user-patterns.txt\n1-\\d\\d\\d-GOOG-441\nwww.\\n\\\\\\*.com\n```\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003euserPatterns('/path/to/user-patterns.txt')\n    -\u003erun();\n```\n\n### lang\n\nDefine one or more languages to be used during the recognition.\nA complete list of available languages can be found at:\n\u003chttps://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages\u003e\n\n__Tip from [@daijiale][]:__ Use the combination `-\u003elang('chi_sim', 'chi_tra')`\nfor proper recognition of Chinese.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003elang('lang1', 'lang2', 'lang3')\n    -\u003erun();\n```\n\n### psm\n\nSpecify the Page Segmentation Method, which instructs `tesseract` how to\ninterpret the given 
image.\n\n__More info:__ \u003chttps://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method\u003e\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003epsm(6)\n    -\u003erun();\n```\n\n### oem\n\nSpecify the OCR Engine Mode. (see `tesseract --help-oem`)\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003eoem(2)\n    -\u003erun();\n```\n\n### dpi\n\nSpecify the image DPI. It is useful if your image does not contain this information in its metadata.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003edpi(300)\n    -\u003erun();\n```\n\n### allowlist\n\nThis is a shortcut for `-\u003econfig('tessedit_char_whitelist', 'abcdef....')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003eallowlist(range('a', 'z'), range(0, 9), '-_@')\n    -\u003erun();\n```\n\n### configFile\n\nSpecify a config file to be used. It can either be the path to your own\nconfig file or the name of one of the predefined config files:\n\u003chttps://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs\u003e\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003econfigFile('hocr')\n    -\u003erun();\n```\n\n### setOutputFile\n\nSpecify an Outputfile to be used. 
Be aware: If you set an outputfile then\nthe option `withoutTempFiles` is ignored.\nTempfiles are written (and deleted) even if `withoutTempFiles = true`.\n\nIn combination with `configFile` you are able to get the `hocr`, `tsv` or\n`pdf` files.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003econfigFile('pdf')\n    -\u003esetOutputFile('/PATH_TO_MY_OUTPUTFILE/searchable.pdf')\n    -\u003erun();\n```\n\n### digits\n\nShortcut for `-\u003econfigFile('digits')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003edigits()\n    -\u003erun();\n```\n\n### hocr\n\nShortcut for `-\u003econfigFile('hocr')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003ehocr()\n    -\u003erun();\n```\n\n### pdf\n\nShortcut for `-\u003econfigFile('pdf')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003epdf()\n    -\u003erun();\n```\n\n### quiet\n\nShortcut for `-\u003econfigFile('quiet')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003equiet()\n    -\u003erun();\n```\n\n### tsv\n\nShortcut for `-\u003econfigFile('tsv')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003etsv()\n    -\u003erun();\n```\n\n### txt\n\nShortcut for `-\u003econfigFile('txt')`.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003etxt()\n    -\u003erun();\n```\n\n### tempDir\n\nDefine a custom directory to store temporary files generated by tesseract.\nMake sure the directory actually exists and the user running `php` is allowed\nto write in there.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003etempDir('./my/custom/temp/dir')\n    -\u003erun();\n```\n\n### withoutTempFiles\n\nSpecify that `tesseract` should output the recognized text without writing to temporary files.\nThe data is gathered from the standard output of `tesseract` instead.\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003ewithoutTempFiles()\n    -\u003erun();\n```\n\n### Other options\n\nAny configuration option offered by Tesseract can be used like that:\n\n```php\necho 
(new TesseractOCR('img.png'))\n    -\u003econfig('config_var', 'value')\n    -\u003econfig('other_config_var', 'other value')\n    -\u003erun();\n```\n\nOr like that:\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003econfigVar('value')\n    -\u003eotherConfigVar('other value')\n    -\u003erun();\n```\n\n__More info:__ \u003chttps://github.com/tesseract-ocr/tesseract/wiki/ControlParams\u003e\n\n### Thread-limit\n\nSometimes, it may be useful to limit the number of threads that tesseract is\nallowed to use (e.g. in [this case](https://github.com/tesseract-ocr/tesseract/issues/898)).\nSet the maximum number of threads with `threadLimit` before calling `run`:\n\n```php\necho (new TesseractOCR('img.png'))\n    -\u003ethreadLimit(1)\n    -\u003erun();\n```\n\n## How to contribute\n\nYou can contribute to this project by:\n\n* Opening an [Issue][] if you found a bug or wish to propose a new feature;\n* Placing a [Pull Request][] with code that fixes a bug, corrects missing/wrong documentation\n  or implements a new feature;\n\nJust make sure you take a look at our [Code of Conduct][] and [Contributing][]\ninstructions.\n\n## License\n\ntesseract-ocr-for-php is released under the [MIT License][].\n\n\n\u003ch2\u003e\u003c/h2\u003e\u003cp align=\"center\"\u003e\u003csub\u003eMade with \u003csub\u003e\u003ca href=\"#\"\u003e\u003cimg src=\"https://thiagoalessio.github.io/tesseract-ocr-for-php/images/heart.svg\" alt=\"love\" width=\"14px\"/\u003e\u003c/a\u003e\u003c/sub\u003e in Berlin\u003c/sub\u003e\u003c/p\u003e\n\n[ci_badge]: https://github.com/thiagoalessio/tesseract-ocr-for-php/workflows/CI/badge.svg?event=push\u0026branch=main\n[ci]: https://github.com/thiagoalessio/tesseract-ocr-for-php/actions?query=workflow%3ACI\n[appveyor_badge]: https://ci.appveyor.com/api/projects/status/xwy5ls0798iwcim3/branch/main?svg=true\n[appveyor]: https://ci.appveyor.com/project/thiagoalessio/tesseract-ocr-for-php/branch/main\n[codacy_badge]: 
https://app.codacy.com/project/badge/Grade/a81aa10012874f23a57df5b492d835f2\n[codacy]: https://app.codacy.com/gh/thiagoalessio/tesseract-ocr-for-php/dashboard\n[test_coverage_badge]: https://codecov.io/gh/thiagoalessio/tesseract-ocr-for-php/branch/main/graph/badge.svg?token=Y0VnrqiSIf\n[test_coverage]: https://codecov.io/gh/thiagoalessio/tesseract-ocr-for-php\n[stable_version_badge]: https://img.shields.io/packagist/v/thiagoalessio/tesseract_ocr.svg\n[packagist]: https://packagist.org/packages/thiagoalessio/tesseract_ocr\n[total_downloads_badge]: https://img.shields.io/packagist/dt/thiagoalessio/tesseract_ocr.svg\n[monthly_downloads_badge]: https://img.shields.io/packagist/dm/thiagoalessio/tesseract_ocr.svg\n[Tesseract OCR]: https://github.com/tesseract-ocr/tesseract\n[Composer]: http://getcomposer.org/\n[windows_icon]: https://thiagoalessio.github.io/tesseract-ocr-for-php/images/windows-18.svg\n[macos_icon]: https://thiagoalessio.github.io/tesseract-ocr-for-php/images/apple-18.svg\n[tesseract_installation_on_windows]: https://github.com/tesseract-ocr/tesseract/wiki#windows\n[Capture2Text]: https://chocolatey.org/packages/capture2text\n[Chocolatey]: https://chocolatey.org\n[MacPorts]: https://www.macports.org\n[Homebrew]: https://brew.sh\n[@daijiale]: https://github.com/daijiale\n[HOCR]: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#hocr-output\n[TSV]: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#tsv-output-currently-available-in-305-dev-in-master-branch-on-github\n[Issue]: https://github.com/thiagoalessio/tesseract-ocr-for-php/issues\n[Pull Request]: https://github.com/thiagoalessio/tesseract-ocr-for-php/pulls\n[Code of Conduct]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CODE_OF_CONDUCT.md\n[Contributing]: https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CONTRIBUTING.md\n[MIT License]: 
https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthiagoalessio%2Ftesseract-ocr-for-php","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthiagoalessio%2Ftesseract-ocr-for-php","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthiagoalessio%2Ftesseract-ocr-for-php/lists"}