{"id":13825424,"url":"https://github.com/hay/audio2text","last_synced_at":"2025-04-14T19:53:13.365Z","repository":{"id":152571205,"uuid":"615274383","full_name":"hay/audio2text","owner":"hay","description":"Python command line utility wrappers for Whispercpp and other speech-to-text utilities","archived":false,"fork":false,"pushed_at":"2023-09-21T13:06:10.000Z","size":617,"stargazers_count":11,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-28T08:12:03.365Z","etag":null,"topics":["speech-recognition","speech-to-text","stt","whisper","whisper-cpp"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-17T10:30:25.000Z","updated_at":"2024-11-14T00:14:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"602ab350-2885-4c2d-aa8a-97e32b823b36","html_url":"https://github.com/hay/audio2text","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hay%2Faudio2text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hay%2Faudio2text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hay%2Faudio2text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hay%2Faudio2text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hay","download_url":"
https://codeload.github.com/hay/audio2text/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248951937,"owners_count":21188420,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["speech-recognition","speech-to-text","stt","whisper","whisper-cpp"],"created_at":"2024-08-04T09:01:20.635Z","updated_at":"2025-04-14T19:53:13.335Z","avatar_url":"https://github.com/hay.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# audio2text\n\u003e Python command line utility wrappers for Whispercpp and other speech-to-text utilities\n\n## Introduction\nThis is mainly a set of useful scripts to automate Whispercpp processing, including:\n* Automatic conversion of any video or audio format `ffmpeg` supports to the WAV format Whispercpp needs.\n* Conversion of the transcripts to CSV, SRT, TXT, VTT and WTS (karaoke subtitles).\n\n## Install\n1. You need to have a working executable version of [whisper.cpp](https://github.com/ggerganov/whisper.cpp),\n   you can either place that in the root of this repo as `whispercpp` or give the\n   path using the `-w` flag\n2. Place your models in the `models` folder. By default `audio2text.py` will\n   look for `ggml-large.bin` in the models folder. You can also use the `-m` flag to give a path to the model.\n3. `ffmpeg` is required and should be somewhere in your `$PATH`\n4. 
You might want to make a virtual environment and then install the `requirements.txt`, e.g.\n\n```bash\npython -m venv .env\nsource .env/bin/activate\npip install -U pip\npip install -r requirements.txt\n./audio2text.py\n```\n\n## Usage\n\n### `audio2text.py`\nTo convert the given `berliner.ogg` file in the test directory to a CSV file (note that you don't need to give an extension):\n```bash\n./audio2text.py -i test/berliner.ogg -o test/berliner -of csv\n```\n\nIf you want multiple output formats you can separate them with commas:\n```bash\n./audio2text.py -i test/berliner.ogg -o test/berliner -of srt,txt\n```\n\nWhen giving the argument `all` to the `-of/--output-format` flag, all Whisper-supported formats will be written:\n```bash\n./audio2text.py -i test/berliner.ogg -o test/berliner -of all\n```\n\nTo prevent duplication of all possible command line options for `whisper.cpp` you can use the `-wa` / `--whisper-args` flag to pass extra command line options to the whisper.cpp executable:\n\n```bash\n./audio2text.py -i test/berliner.ogg -o test/berliner -of csv -wa=\"--threads 8\"\n```\n\nYou can also use the `-u/--url` flag to give a URL to an MP3 file (or any other audio format `ffmpeg` supports). 
This will be downloaded to the `tmp` directory.\n\n```bash\n./audio2text.py -u https://www.bykr.org/test/berliner.mp3\n```\n\nTo enable logging to a file use the `-lf/--log-file` flag, optionally combined with the `-v/--verbose` flag:\n\n```bash\n./audio2text.py -u https://www.bykr.org/test/berliner.mp3 -v -lf log.txt\n```\n\nTo write all files and the log file to a non-existing directory you can use the `-od/--output-directory` flag:\n\n```bash\n./audio2text.py -u https://www.bykr.org/test/berliner.mp3 -of all -o out/text -v -lf out/log.log -od out\n```\n\n### `srtparse.py`\nConverts SRT files to JSON, CSV and TXT using [dataknead](https://github.com/hay/dataknead).\n```bash\n./srtparse.py -i test/berliner.srt -o test/berliner.csv\n```\n\n## Troubleshooting\nIf you add the `-v` (verbose) flag `audio2text` will give much more debug information.\n\n## All options\nYou'll get this when doing `audio2text.py -h`\n\n```\nusage: audio2text.py [-h] [-di] [-i INPUT] [-l LANGUAGE] [-lf LOG_FILE]\n                     [-m MODEL_PATH] [-o OUTPUT] [-od OUTPUT_DIRECTORY]\n                     [-of OUTPUT_FORMAT] [-kt] [-u URL] [-v] [-w WHISPER_PATH]\n                     [-wa WHISPER_ARGS]\n\noptions:\n  -h, --help            show this help message and exit\n  -di, --diarize        Diarize audio (only works for natural stereo audio)\n  -i INPUT, --input INPUT\n                        Audio file to transcribe, anything that ffmpeg\n                        supports will work\n  -l LANGUAGE, --language LANGUAGE\n                        Language of audio file, if not given Whisper will try\n                        to autodetect this\n  -lf LOG_FILE, --log-file LOG_FILE\n                        Log messages to a logging file with this path, will\n                        fail if the directory does not exist (use --od to\n                        prevent that)\n  -m MODEL_PATH, --model-path MODEL_PATH\n                        Path to model you want to use for transcribing\n  -o OUTPUT, 
--output OUTPUT\n                        Path to output file, you don't need to give an\n                        extension\n  -od OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY\n                        When giving this argument, a directory will be created\n                        before all other commands are run\n  -of OUTPUT_FORMAT, --output-format OUTPUT_FORMAT\n                        Output format, when giving 'all', all formats will be\n                        used\n  -kt, --keep-temp-files\n                        Keep temporary files after transcribing (default is to\n                        remove them)\n  -u URL, --url URL     Give a URL to an audio file to download (e.g. mp3)\n  -v, --verbose         Print debug information\n  -w WHISPER_PATH, --whisper-path WHISPER_PATH\n                        Path to the Whisper executable (defaults to\n                        ./whispercpp)\n  -wa WHISPER_ARGS, --whisper-args WHISPER_ARGS\n                        Give a string of extra parameters to give to the\n                        whisper executable\n ```\n\n## License\nMIT \u0026copy; [Hay Kranen](http://www.haykranen.nl)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhay%2Faudio2text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhay%2Faudio2text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhay%2Faudio2text/lists"}