{"id":26668830,"url":"https://github.com/inc44/manrad","last_synced_at":"2026-04-30T06:37:47.363Z","repository":{"id":281418203,"uuid":"945219774","full_name":"Inc44/ManRad","owner":"Inc44","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-08T23:35:59.000Z","size":0,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-09T00:23:23.137Z","etag":null,"topics":["gemini","google","manga","ocr","openai","reader","tts","video"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Inc44.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-08T23:32:32.000Z","updated_at":"2025-03-08T23:36:53.000Z","dependencies_parsed_at":"2025-03-09T00:23:25.416Z","dependency_job_id":"a0bfc668-3449-4566-9e81-0531bdaf51d3","html_url":"https://github.com/Inc44/ManRad","commit_stats":null,"previous_names":["inc44/manrad"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Inc44%2FManRad","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Inc44%2FManRad/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Inc44%2FManRad/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Inc44%2FManRad/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Inc44","download_url":"https://codeload.github.com/Inc44/ManRad/tar.gz/refs/heads/master",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245549893,"owners_count":20633865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gemini","google","manga","ocr","openai","reader","tts","video"],"created_at":"2025-03-25T21:35:13.078Z","updated_at":"2026-04-30T06:37:42.322Z","avatar_url":"https://github.com/Inc44.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ManRad\n\nManRad is an experimental AI project that attempts to read aloud manga, manhwa, and manhua from images (or, as I call it, \"bullshit spaghetti code that somehow works\"). The goal is to determine whether it is possible to replace voice acting and achieve comparable quality. To accomplish this, it leverages OCR tools such as PaddleOCR and Gemini for text detection, LLMs with vision support from DeepInfra, Gemini, OpenAI, and Mistral for text recognition, and TTS models such as Fish Speech, OpenAI, Kokoro, Hyperbolic, and Lemon, while exploring alternatives like CSM, Edge, Melo, OpenVoice, and Saifs (possibly Oute, Orpheus, and XTTS) to find the highest quality and most cost-effective solution. We are also exploring cloud computing opportunities using Hyperbolic and Vast.ai. 
This code may eventually be rewritten entirely in Python or C++ (possibly Rust or Go).\n\n## Install\n\nSet your API keys as environment variables:\n```batch\nsetx /M DEEPINFRA_API_KEY \"\"\nsetx /M GEMINI_API_KEY \"\"\nsetx /M LEMON_API_KEY \"\"\nsetx /M MELO_API_KEY \"\"\nsetx /M MISTRAL_API_KEY \"\"\nsetx /M OPENAI_API_KEY \"\"\nsetx /M OPENROUTER_API_KEY \"\"\n```\n\nVerify that your environment variables are set (open a new terminal first, since `setx` does not affect the current session):\n```batch\necho %DEEPINFRA_API_KEY%\necho %GEMINI_API_KEY%\necho %LEMON_API_KEY%\necho %MELO_API_KEY%\necho %MISTRAL_API_KEY%\necho %OPENAI_API_KEY%\necho %OPENROUTER_API_KEY%\n```\n\nClone the repository:\n```bash\ngit clone https://github.com/Inc44/ManRad.git\n```\n\nCreate and activate a new Conda environment, then install the required packages:\n```bash\nconda create --name ManRad python=3.10 -y\nconda activate ManRad\nconda install paddlepaddle-gpu==3.0.0b1 paddlepaddle-cuda=12.3 -c paddle -c nvidia -y\npip install -r ManRad/requirements.txt\n```\n\n## Usage\n\nNavigate to the project directory and run the scripts:\n```bash\ncd ManRad\npython -OO menu.py ACTION --source PATH --mode save/delete\n```\n\n## Fish Speech\n\n### Install\n\n#### Docker\n\n```\ndocker run -it --name fish-speech --gpus all -p 8080:8080 fishaudio/fish-speech:v1.5.0 zsh\n```\n\n##### Usage\n\n```\npython -m tools.api_server --listen 0.0.0.0:8080 --compile\n```\n\n#### Conda\n```\nconda create -n fish-speech python=3.10 -y\nconda activate fish-speech\ngit clone https://github.com/fishaudio/fish-speech.git\ncd fish-speech\npip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121\npip install -e .\npip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl\nhuggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5\n```\n\n##### Usage\n\n```\nconda activate fish-speech\ncd fish-speech\npython -O tools/api_server.py --listen 0.0.0.0:8080 
--compile\n```\n\n## Manga Source\n\n1. Use **Kotatsu** or **HakuNeko** (on Android or PC) to download the full manga.\n2. Navigate to the Android `data` folder or your PC's `Documents` folder.\n3. Locate the `.cbz` or image files.\n4. Copy the directory or files into the project directory.\n\n### Kotatsu\n\n```bash\njava -jar ./kotatsu-dl.jar mangadex.org --format=cbz|zip|dir --dest ManRad\n```\n\n### HakuNeko\n\n- Manga List \u003e Website \u003e ReManga\n- Settings:\n    - Manga Directory: `ManRad`\n    - Chapter File Format: `Comic Book Archive` or `Folder with Images`\n    - De-Scrambling Format: `JPEG`\n    - De-Scrambling Quality: `100`\n\n## Achievements\n\n- Achieved relatively good, inexpensive, and quite fast text recognition using LLaMA 4 Scout on DeepInfra, at just $0.35 for 15,000 images.\n- Achieved somewhat unstable, local, multilingual audio using Fish Speech 1.5; generating about 14 hours of audio takes approximately 5 hours and 30 minutes on an RTX 4060 Ti 16GB, which is a bit too slow.\n- Achieved stable but expensive audio using OpenAI TTS-1, costing about $8 for the same task.\n\n| Name        | Price      | Type       | Eng Only | Voice Clone |\n|-------------|------------|------------|----------|-------------|\n| ElevenLabs  | $100       | API        | NO       | YES         |\n| MINIMAX     | $18.5      | API        | NO       | YES         |\n| OpenAI      | $4         | API        | NO       | NO          |\n| GPT-4o      | $3         | API        | NO       | NO          |\n| Lemonfox    | $1.5       | API        | YES      | NO          |\n| Melo        | $1.3/Free  | API/Local  | YES      | NO          |\n| Kokoro      | $0.5/Free  | API/Local  | YES      | NO          |\n| Bark        | Free       | Local      | NO       | NO          |\n| CSM         | Free       | Local      | YES      | YES         |\n| E2/F5       | Free       | Local      | YES      | YES         |\n| Edge        | Free*      | API        | NO       
| NO          |\n| OpenVoice   | Free       | Local      | YES      | YES         |\n| Parler      | Free       | Local      | YES      | NO          |\n| Saifs       | Free*      | API        | NO       | NO          |\n| Spark       | Free       | Local      | YES      | YES         |\n| XTTS        | Free       | Local      | NO       | YES         |\n\n## Problems\n\n### Bugs\n\n- Image sorting issues need to be resolved, likely using natural sort and image extensions.\n- Scrolling depends on fade.\n\n### TODO/Not Implemented\n\n- Ensure transition duration works for scrolling.\n- Generate silent audio proportional to the estimated duration of missing audio, calculated based on the image text length.\n- Improve performance.\n- Provide an option to return full-page text detection instead of cropped sections (regression).\n- Specify the index of selected entries in lists.\n\n### Failed Features\n\n- Add translation functionality.\n- Automatically detect target height and width using the most common image dimensions.\n- Enable camera movement.\n- Fix zoom bouncing.\n- Implement a neural network classifier for manga pages.\n- Improve height and width detection without loading the entire image.\n\n## License\n\n[![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]\n\nThis work is licensed under a\n[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].\n\n[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]\n\n[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/\n[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png\n[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg\n\n## 
Support\n\n![BuyMeACoffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-ffdd00?style=for-the-badge\u0026logo=buy-me-a-coffee\u0026logoColor=black)\n![Ko-Fi](https://img.shields.io/badge/Ko--fi-F16061?style=for-the-badge\u0026logo=ko-fi\u0026logoColor=white)\n![Patreon](https://img.shields.io/badge/Patreon-F96854?style=for-the-badge\u0026logo=patreon\u0026logoColor=white)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finc44%2Fmanrad","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finc44%2Fmanrad","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finc44%2Fmanrad/lists"}