{"id":13599817,"url":"https://github.com/robertknight/ocrs","last_synced_at":"2025-05-13T22:00:18.672Z","repository":{"id":215085639,"uuid":"738044381","full_name":"robertknight/ocrs","owner":"robertknight","description":"Rust library and CLI tool for OCR (extracting text from images)","archived":false,"fork":false,"pushed_at":"2025-05-08T06:51:22.000Z","size":987,"stargazers_count":1500,"open_issues_count":18,"forks_count":63,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-05-08T07:06:28.743Z","etag":null,"topics":["computer-vision","machine-learning","ocr"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robertknight.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE-APACHE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-02T09:32:23.000Z","updated_at":"2025-05-08T06:51:25.000Z","dependencies_parsed_at":"2024-04-28T10:22:48.543Z","dependency_job_id":"7666e831-91c7-442d-a47a-2005b6ccecd4","html_url":"https://github.com/robertknight/ocrs","commit_stats":{"total_commits":266,"total_committers":8,"mean_commits":33.25,"dds":0.09774436090225569,"last_synced_commit":"1613ffe7e5bdc7dba10591f863aec8fc62382c43"},"previous_names":["robertknight/ocrs"],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robertknight%2Focrs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robertknight%2Focrs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robertknigh
t%2Focrs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robertknight%2Focrs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/robertknight","download_url":"https://codeload.github.com/robertknight/ocrs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254036806,"owners_count":22003651,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","machine-learning","ocr"],"created_at":"2024-08-01T17:01:12.438Z","updated_at":"2025-05-13T22:00:18.582Z","avatar_url":"https://github.com/robertknight.png","language":"Rust","readme":"# Ocrs\n\n**ocrs** is a Rust library and CLI tool for extracting text from images, also known as OCR (Optical Character Recognition).\n\nThe goal is to create a modern OCR engine that:\n\n - Works well on a wide variety of images (scanned documents, photos containing\n   text, screenshots etc.) with zero or much less preprocessing effort compared\n   to earlier engines like [Tesseract][tesseract]. This is achieved by using\n   machine learning more extensively in the pipeline.\n - Is easy to compile and run across a variety of platforms, including\n   WebAssembly\n - Is trained on open and liberally licensed datasets\n - Has a codebase that is easy to understand and modify\n\nUnder the hood, the library uses neural network models trained in\n[PyTorch][pytorch], which are then exported to [ONNX][onnx] and executed using\nthe [RTen][rten] engine. 
See the [models](#models-and-datasets) section for\nmore details.\n\n[onnx]: https://onnx.ai\n[pytorch]: https://pytorch.org\n[rten]: https://github.com/robertknight/rten\n[tesseract]: https://github.com/tesseract-ocr/tesseract\n\n## Status\n\nocrs is currently in an early preview. Expect more errors than commercial OCR\nengines.\n\n## Language Support\n\nocrs currently recognizes the Latin alphabet only (e.g. English). Support for\nmore languages is [planned](https://github.com/robertknight/ocrs/issues/8).\n\n## CLI installation\n\nTo install the CLI tool, you will first need Rust and Cargo installed. Then\nrun:\n\n```sh\n$ cargo install ocrs-cli --locked\n```\n\n## CLI usage\n\nTo extract text from an image, run:\n\n```sh\n$ ocrs image.png\n```\n\nWhen the tool is run for the first time, it will download the required models\nautomatically and store them in `~/.cache/ocrs`.\n\n### Additional examples\n\nExtract text from an image and write to `content.txt`:\n\n```sh\n$ ocrs image.png -o content.txt\n```\n\nExtract text and layout information from the image in JSON format:\n\n```sh\n$ ocrs image.png --json -o content.json\n```\n\nAnnotate an image to show the location of detected words and lines:\n\n```sh\n$ ocrs image.png --png -o annotated.png\n```\n\n## Library usage\n\nSee the [ocrs crate README](ocrs/) for details on how to use ocrs as a Rust\nlibrary.\n\n## Models and datasets\n\nocrs uses neural network models written in PyTorch. See the\n[ocrs-models](https://github.com/robertknight/ocrs-models) repository for more\ndetails about the models and datasets, as well as tools for training custom\nmodels. These models are also available in ONNX format for use with other\nmachine learning runtimes.\n\n## Development\n\nTo build and run the ocrs library and CLI tool locally you will need a recent\nstable Rust version installed. 
Then run:\n\n```sh\ngit clone https://github.com/robertknight/ocrs.git\ncd ocrs\ncargo run -p ocrs-cli -r -- image.png\n```\n\n### Testing\n\nOcrs has unit tests for the code that runs before and after ML model processing,\nplus E2E tests which exercise the whole pipeline, including models.\n\nAfter making changes to the code, run unit tests and lint checks with:\n\n```sh\nmake check\n```\n\nYou can also run standard commands like `cargo test` directly.\n\nRun the E2E tests with:\n\n```sh\nmake test-e2e\n```\n\nFor details of how the ML models are evaluated, see the\n[ocrs-models](https://github.com/robertknight/ocrs-models) repository.\n","funding_links":[],"categories":["Rust"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobertknight%2Focrs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobertknight%2Focrs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobertknight%2Focrs/lists"}