{"id":20288216,"url":"https://github.com/tomodachi94/hydrus-ocr","last_synced_at":"2026-05-29T01:32:20.468Z","repository":{"id":220612791,"uuid":"752107106","full_name":"tomodachi94/hydrus-ocr","owner":"tomodachi94","description":"[Maintenance mode] Retrieve files from Hydrus Network and run them through OCR.","archived":false,"fork":false,"pushed_at":"2026-01-02T08:28:27.000Z","size":46,"stargazers_count":2,"open_issues_count":4,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-08T06:26:31.997Z","etag":null,"topics":["hydrus","hydrus-network","hydrusnetwork","ocr","optical-character-recognition"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomodachi94.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-03T03:37:08.000Z","updated_at":"2026-01-02T08:28:30.000Z","dependencies_parsed_at":"2024-10-23T02:06:59.219Z","dependency_job_id":null,"html_url":"https://github.com/tomodachi94/hydrus-ocr","commit_stats":null,"previous_names":["tomodachi94/hydrus-ocr"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/tomodachi94/hydrus-ocr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomodachi94%2Fhydrus-ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomodachi94%2Fhydrus-ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomodachi94%2Fhydrus-ocr/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/tomodachi94%2Fhydrus-ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomodachi94","download_url":"https://codeload.github.com/tomodachi94/hydrus-ocr/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomodachi94%2Fhydrus-ocr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33633468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hydrus","hydrus-network","hydrusnetwork","ocr","optical-character-recognition"],"created_at":"2024-11-14T14:44:55.131Z","updated_at":"2026-05-29T01:32:20.460Z","avatar_url":"https://github.com/tomodachi94.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003e [!WARNING]  \n\u003e I no longer use Hydrus Network. While I'll do my best to fix critical bugs, I won't be adding new features.\n\n# hydrus-ocr\n\nThis project runs [OCR](https://en.wikipedia.org/wiki/Optical_character_recognition) on images located in [Hydrus Network](https://hydrusnetwork.github.io/hydrus/) using an external daemon and a third-party library.\n\n\u003e [!CAUTION]\n\u003e I am not liable if this destroys your data. 
**[Make backups regularly](https://hydrusnetwork.github.io/hydrus/getting_started_installing.html#backing_up)**.\n\n\n## Setup\n\n### In Hydrus\n1. Create a tag service in Hydrus. It can be called whatever you like, but we recommend `ocr` so you remember what it's for. Save the service key for later.\n2. [Enable the client API](https://hydrusnetwork.github.io/hydrus/client_api.html#enabling_the_api).\n3. Create a client API access key (see the client API documentation linked above). Give it the `edit file notes`, `edit file tags`, and `search for and fetch files` permissions. Save the access key for later.\n\n### In your server environment\n1. Install `hydrus-ocr` and its Python dependencies:\n    - **Recommended:** `pipx install https://github.com/tomodachi94/hydrus-ocr/releases/download/v0.2.0/hydrus_ocr-0.2.0-py3-none-any.whl` to install using [pipx](https://pipx.pypa.io/latest/), which isolates hydrus-ocr in its own virtualenv.\n    - Alternatively, `pip install https://github.com/tomodachi94/hydrus-ocr/releases/download/v0.2.0/hydrus_ocr-0.2.0-py3-none-any.whl` to install using classic pip.\n2. Install either [`tesseract`/`libtesseract`](https://github.com/tesseract-ocr/tesseract?tab=readme-ov-file#installing-tesseract) or [`cuneiform`](https://launchpad.net/cuneiform-linux) and ensure it is available on your `$PATH`.\n3. Copy `env.example` to `.env` (or to another place where you can set environment variables) and fill in the values.\n4. Run the daemon using `python3 -m hydrus_ocr daemon`. If you want to get fancy, you can configure it to start automatically with `systemd`, but that is outside the scope of these docs.\n    * If you only want to run this once (e.g. from a `cron` job), run `python3 -m hydrus_ocr singular`.\n\n## Usage\n1. Select one or more files and right-click the selection. Select `manage \u003e tags`, select `ocr` (or the name you chose for the tag service), and add the `ocr wanted` tag to the file(s). Apply the changes.\n2. 
Wait for the daemon to do its job. Depending on the number of files queued, OCR may take a while to finish.\n3. Profit. Check the notes for the file; look for a note titled `ocr`.\n\n\n## Configuration\nThis program is configured entirely through environment variables. Here's what they do:\n* `HYDRUS_OCR_ACCESS_KEY`: The access key for the client API. This is a long hexadecimal string.\n* `HYDRUS_OCR_API_URL`: The base URL for the client API. By default, this is `http://localhost:45869`.\n* `HYDRUS_OCR_TAG_SERVICE_KEY`: The service key for the tag service. This is a long hexadecimal string.\n* `HYDRUS_OCR_LOOP_DELAY`: This controls how often the program checks for files to OCR. The default value causes a check every 10 seconds; increase or decrease it depending on how many requests your Hydrus server can handle at once.\n* `HYDRUS_OCR_LANGUAGE`: The language to OCR the text in (defaults to English). See the [Tesseract documentation](https://tesseract-ocr.github.io/tessdoc/Data-Files) for a full list of languages. Make sure to install the data for any language you want that isn't available by default. To use multiple languages, separate them with a plus sign (like `eng+deu+jpn`).\n\n## Errors\nThis is a glossary of all possible user-caused errors.\n\n### `MissingToolError`\nThe program couldn't find Tesseract or Cuneiform. See [§ Setup](#setup) for more information.\n\n### `MissingKeyError`\nThe program couldn't find the client API access key and/or the tag service key. See [§ Configuration](#configuration) for more information.\n\n## Changelog\nThe changelog is maintained in [`./CHANGELOG.md`](./CHANGELOG.md).\n\n## FAQ\n### Why should I trust you?\nYou shouldn't. You should read the source code yourself. 
I've tried to make the code as easy to read as possible, with docstrings for all (internal) functions and comments for ambiguous lines of code.\n\n### Why does this exist?\nI used Hydrus to store a large repository of screenshots of chat logs. I wanted to find a way to search their text, and this is the result.\n\n### Why is the quality of the text so bad?\nThis program uses Tesseract to do most of the heavy lifting. Tesseract is notoriously bad at OCRing certain types of images, as well as low-quality images.\n\n### Why is this separate from Hydrus?\nAside from the fact that this would likely be rejected in a PR, OCR can be a resource-intensive operation, and I didn't want to risk the stability of my Hydrus application.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomodachi94%2Fhydrus-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomodachi94%2Fhydrus-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomodachi94%2Fhydrus-ocr/lists"}