{"id":21040985,"url":"https://github.com/chfoo/tppocr","last_synced_at":"2025-08-01T12:09:06.393Z","repository":{"id":145909873,"uuid":"81492656","full_name":"chfoo/tppocr","owner":"chfoo","description":"Tesseract OCR of Pokemon dialog text on streaming video.","archived":false,"fork":false,"pushed_at":"2021-02-09T03:07:48.000Z","size":2104,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-03T11:43:53.610Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chfoo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-09T20:38:21.000Z","updated_at":"2023-12-31T01:25:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"5e170e85-1161-4455-86bb-5c6a2cb33715","html_url":"https://github.com/chfoo/tppocr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chfoo%2Ftppocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chfoo%2Ftppocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chfoo%2Ftppocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chfoo%2Ftppocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chfoo","download_url":"https://codeload.github.com/chfoo/tppocr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"http
s://github.com","kind":"github","repositories_count":254377502,"owners_count":22061159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T13:49:33.669Z","updated_at":"2025-05-15T16:33:32.520Z","avatar_url":"https://github.com/chfoo.png","language":"Python","readme":"TPPOCR\n======\n\nTesseract OCR of Pokemon dialog text on streaming video.\n\nThis project contains scripts and training data needed for running OCR on live streams such as TwitchPlaysPokemon.\n\nIt has two language data files:\n\n* `pkmngb_en`: English training data for Game Boy Pokemon games such as Red, Blue, Gold, Silver, Crystal.\n* `pkmngba_en`: English training data for Game Boy Advance / DS Pokemon games such as Ruby, Sapphire, Emerald, FireRed, Diamond, Pearl, Platinum.\n\nYou may be interested in [PokeTEXT](https://github.com/rctgamer3/poketext) too.\n\nIf you just want to use the training data, please skip to the bottom of this document.\n\n**Note**: This project was designed for Tesseract version 3. 
For version 4, please see ~~[tppocr2](https://github.com/chfoo/tppocr2)~~ [tppocr3](https://github.com/chfoo/tppocr3).\n\nInstall\n=======\n\nRequirements:\n\n* [Python](https://www.python.org/downloads/) 3.4+\n* [pip](https://pip.pypa.io/en/stable/installing/) (for installing Python modules)\n* [Pillow](https://pillow.readthedocs.io/en/4.0.x/installation.html) (PIL fork)\n* [redis-py](https://github.com/andymccurdy/redis-py)\n* [Tesseract](https://github.com/tesseract-ocr/tesseract/wiki/Downloads)\n* [tesserocr](https://github.com/sirfz/tesserocr) 3.04 (Python bindings to Tesseract)\n* [Redis](https://redis.io/download)\n* [FFmpeg](https://ffmpeg.org/download.html) 2.8+\n* [Livestreamer](http://docs.livestreamer.io/install.html)\n\nTo run TPPOCR, it's recommended to use a Linux OS.\n\nOn Debian/Ubuntu, run the following to install the stable versions provided by the distribution:\n\n        apt-get install build-essential tesseract-ocr libtesseract-dev libleptonica-dev cython3 python3 redis-server python3-pip python3-pil\n\nOn Debian/Ubuntu, run the following to install the latest Python library versions in your home directory:\n\n        pip3 install tesserocr redis livestreamer --user\n\nDownload a recent [static build](https://www.johnvansickle.com/ffmpeg/) of FFmpeg:\n\n        wget http://example.com/PUT_URL_HERE_TO/ffmpeg-release-64bit-static.tar.xz\n        tar -xJvf ffmpeg-release-64bit-static.tar.xz\n\nTPPOCR requires running Livestreamer and FFmpeg separately. Ensure these files are in the `PATH` environment variable:\n\n* `~/.local/bin/livestreamer`\n* `ffprobe`\n* `ffmpeg`\n\nYou can do this by editing your shell profile or by prefixing `PATH=$PATH:~/.local/bin:~/bin/` to commands.\n\nFor Twitch streams, Livestreamer requires a Client-ID or OAuth token. The OAuth token can be [specified in the config file](http://docs.livestreamer.io/twitch_oauth.html). You can generate one using `livestreamer --twitch-oauth-authenticate`. 
(Keep your token secret!)\n\nEnsure Redis is not exposed to the internet by checking `/etc/redis/6379.conf`. By default on Debian/Ubuntu, it uses `bind 127.0.0.1`. \n\nFinally, grab TPPOCR from git:\n\n        git clone https://URL_TO_GITHUB_HERE/USERNAME/tppocr\n\nSince TPPOCR is meant to run as a bunch of scripts, it does not currently have an install file.\n\n\nUsage\n=====\n\nThe basic structure of the command to start the OCR process is:\n\n        python3 -m tppocr config.ini\n\nIn addition, the command may need extra environment variables. For example, if tppocr is the current directory:\n\n        PYTHONPATH=./ TESSDATA_PREFIX=./ python3 -m tppocr config.ini\n\n* `PYTHONPATH` is the tppocr project directory. It should contain the tppocr package directory.\n* `TESSDATA_PREFIX` is the directory containing the `tessdata` directory. `tessdata` contains the TPPOCR training data files.\n\nSee the example configuration files for details on setting them.\n\nTo run the web interface, install [Tornado](http://www.tornadoweb.org/en/stable/) and run:\n\n        pip3 install tornado --user\n        python3 -m tppocr.web\n\nAdd `--help` to see available settings. If you want to expose this to the Internet, run it behind a web server with websocket support. Tornado has suggestions [here](http://www.tornadoweb.org/en/stable/guide/running.html). The Nginx configuration for enabling websockets is described [here](https://www.nginx.com/blog/websocket-nginx/).\n\nTo save the data, you can use the following:\n\n        python3 -m tppocr.pub.textfile log_dir/\n\n\nStandalone\n----------\n\nIf you simply want to use the training data with Tesseract, copy the traineddata file into the Tesseract data directory.\n\nOr you can run it by specifying the project directory. 
For example, to read a cropped image of a timestamp:\n\n        tesseract --tessdata-dir ~/Documents/tppocr/tessdata/ -l pkmngba_en screenshot_cropped.jpg stdout /usr/share/tesseract-ocr/tessdata/configs/digits\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchfoo%2Ftppocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchfoo%2Ftppocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchfoo%2Ftppocr/lists"}