{"id":15318439,"url":"https://github.com/appleboy/go-whisper","last_synced_at":"2026-03-16T17:01:24.841Z","repository":{"id":174145441,"uuid":"651834560","full_name":"appleboy/go-whisper","owner":"appleboy","description":"Speech o Text using docker image with ggerganov/whisper.cpp","archived":false,"fork":false,"pushed_at":"2025-02-24T13:11:25.000Z","size":478,"stargazers_count":75,"open_issues_count":4,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-10T09:24:06.689Z","etag":null,"topics":["golang","openai","whisper","whisper-ai","whisper-cpp"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/appleboy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":["https://www.paypal.me/appleboy46"]}},"created_at":"2023-06-10T08:34:03.000Z","updated_at":"2025-04-01T16:12:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"6fcfd56e-b68a-4a5a-b05b-df21b7e5ad9d","html_url":"https://github.com/appleboy/go-whisper","commit_stats":{"total_commits":129,"total_committers":3,"mean_commits":43.0,"dds":0.09302325581395354,"last_synced_commit":"53b9fb101f51bcf1d67ddd5a2a393f6de038606c"},"previous_names":["appleboy/go-whisper"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/appleboy%2Fgo-whisper","tags_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/appleboy%2Fgo-whisper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/appleboy%2Fgo-whisper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/appleboy%2Fgo-whisper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/appleboy","download_url":"https://codeload.github.com/appleboy/go-whisper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248325148,"owners_count":21084877,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","openai","whisper","whisper-ai","whisper-cpp"],"created_at":"2024-10-01T09:00:09.656Z","updated_at":"2026-03-16T17:01:24.752Z","avatar_url":"https://github.com/appleboy.png","language":"Go","funding_links":["https://www.paypal.me/appleboy46"],"categories":[],"sub_categories":[],"readme":"# go-whisper\n\nDocker Image for Speech-to-Text using [ggerganov/whisper.cpp][1].\n\nThis Docker image provides a ready-to-use environment for converting speech to text using the [ggerganov/whisper.cpp][1] library. The whisper.cpp library is an open-source project that enables efficient and accurate speech recognition. 
By utilizing this Docker image, users can easily set up and run the speech-to-text conversion process without worrying about installing dependencies or configuring the system.\n\nThe Docker image includes all necessary components and dependencies, ensuring a seamless experience for users who want to leverage the power of the whisper.cpp library for their speech recognition needs. Simply pull the Docker image, run the container, and start converting your audio files into text with minimal effort.\n\nIn summary, this Docker image offers a convenient and efficient way to utilize the [ggerganov/whisper.cpp][1] library for speech-to-text conversion, making it an excellent choice for those looking to implement speech recognition in their projects.\n\n[1]:https://github.com/ggerganov/whisper.cpp\n\n## OpenAI's Whisper models converted to ggml format\n\nSee the [Available models][2].\n\n| Model      | Disk    | Mem       | SHA                                          |\n|------------|---------|-----------|----------------------------------------------|\n| tiny       | 75 MB   | ~390 MB   | bd577a113a864445d4c299885e0cb97d4ba92b5f     |\n| tiny.en    | 75 MB   | ~390 MB   | c78c86eb1a8faa21b369bcd33207cc90d64ae9df     |\n| base       | 142 MB  | ~500 MB   | 465707469ff3a37a2b9b8d8f89f2f99de7299dac     |\n| base.en    | 142 MB  | ~500 MB   | 137c40403d78fd54d454da0f9bd998f78703390c     |\n| small      | 466 MB  | ~1.0 GB   | 55356645c2b361a969dfd0ef2c5a50d530afd8d5     |\n| small.en   | 466 MB  | ~1.0 GB   | db8a495a91d927739e50b3fc1cc4c6b8f6c2d022     |\n| medium     | 1.5 GB  | ~2.6 GB   | fd9727b6e1217c2f614f9b698455c4ffd82463b4     |\n| medium.en  | 1.5 GB  | ~2.6 GB   | 8c30f0e44ce9560643ebd10bbe50cd20eafd3723     |\n| large-v1   | 2.9 GB  | ~4.7 GB   | b1caaf735c4cc1429223d5a74f0f4d0b9b59a299     |\n| large      | 2.9 GB  | ~4.7 GB   | 0f4c8e34f21cf1a914c59d8b3ce882345ad349d6     |\n\nFor more information see [ggerganov/whisper.cpp][3].\n\n[2]: 
https://huggingface.co/ggerganov/whisper.cpp/tree/main\n[3]: https://github.com/ggerganov/whisper.cpp/tree/master/models\n\n## Prepare\n\nDownload the model you want to use and put it in the `models` directory.\n\n```sh\ncurl -LJ https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \\\n  --output models/ggml-small.bin\n```\n\n## Usage\n\nPlease follow these simplified instructions to transcribe the audio file using a Docker container:\n\n1. Ensure that you have a `testdata` directory containing the `jfk.wav` file.\n2. Mount both the `models` and `testdata` directories to the Docker container.\n3. Specify the model using the `--model` flag and the audio file path using the `--audio-path` flag.\n4. The transcript result file will be saved in the same directory as the audio file.\n\nTo transcribe the audio file, execute the command provided below.\n\n```sh\ndocker run \\\n  -v $PWD/models:/app/models \\\n  -v $PWD/testdata:/app/testdata \\\n  ghcr.io/appleboy/go-whisper:latest \\\n  --model /app/models/ggml-small.bin \\\n  --audio-path /app/testdata/jfk.wav\n```\n\nSee the following output:\n\n```sh\nwhisper_init_from_file_no_state: loading model from '/app/models/ggml-small.bin'\nwhisper_model_load: loading model\nwhisper_model_load: n_vocab       = 51865\nwhisper_model_load: n_audio_ctx   = 1500\nwhisper_model_load: n_audio_state = 768\nwhisper_model_load: n_audio_head  = 12\nwhisper_model_load: n_audio_layer = 12\nwhisper_model_load: n_text_ctx    = 448\nwhisper_model_load: n_text_state  = 768\nwhisper_model_load: n_text_head   = 12\nwhisper_model_load: n_text_layer  = 12\nwhisper_model_load: n_mels        = 80\nwhisper_model_load: ftype         = 1\nwhisper_model_load: qntvr         = 0\nwhisper_model_load: type          = 3\nwhisper_model_load: mem required  =  743.00 MB (+   16.00 MB per decoder)\nwhisper_model_load: adding 1608 extra tokens\nwhisper_model_load: model ctx     =  464.68 MB\nwhisper_model_load: model size    =  464.44 
MB\nwhisper_init_state: kv self size  =   15.75 MB\nwhisper_init_state: kv cross size =   52.73 MB\n1:46AM INF system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | COREML = 0 | \n module=transcript\nwhisper_full_with_state: auto-detected language: en (p = 0.967331)\n1:46AM INF [    0s -\u003e    11s] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country. module=transcript\n```\n\nCommand line arguments:\n\n| Options               | Description                                                | Default Value     |\n|-----------------------|------------------------------------------------------------|-------------------|\n| --model               | Model is the interface to a whisper model                    | [$PLUGIN_MODEL, $INPUT_MODEL] |\n| --audio-path          | audio path                                                 | [$PLUGIN_AUDIO_PATH, $INPUT_AUDIO_PATH] |\n| --output-folder       | output folder                                              | [$PLUGIN_OUTPUT_FOLDER, $INPUT_OUTPUT_FOLDER] |\n| --output-format       | output format, support txt, srt, csv                        | (default: \"txt\") [$PLUGIN_OUTPUT_FORMAT, $INPUT_OUTPUT_FORMAT] |\n| --output-filename     | output filename                                            | [$PLUGIN_OUTPUT_FILENAME, $INPUT_OUTPUT_FILENAME] |\n| --language            | Set the language to use for speech recognition             | (default: \"auto\") [$PLUGIN_LANGUAGE, $INPUT_LANGUAGE] |\n| --threads             | Set number of threads to use                                | (default: 8) [$PLUGIN_THREADS, $INPUT_THREADS] |\n| --debug               | enable debug mode                                          | (default: false) [$PLUGIN_DEBUG, $INPUT_DEBUG] |\n| --speedup             | speed up audio by x2 (reduced accuracy)                     | 
(default: false) [$PLUGIN_SPEEDUP, $INPUT_SPEEDUP] |\n| --translate           | translate from source language to english                   | (default: false) [$PLUGIN_TRANSLATE, $INPUT_TRANSLATE] |\n| --print-progress      | print progress                                             | (default: true) [$PLUGIN_PRINT_PROGRESS, $INPUT_PRINT_PROGRESS] |\n| --print-segment       | print segment                                              | (default: false) [$PLUGIN_PRINT_SEGMENT, $INPUT_PRINT_SEGMENT] |\n| --webhook-url         | webhook url                                                | [$PLUGIN_WEBHOOK_URL, $INPUT_WEBHOOK_URL] |\n| --webhook-insecure    | webhook insecure                                           | (default: false) [$PLUGIN_WEBHOOK_INSECURE, $INPUT_WEBHOOK_INSECURE] |\n| --webhook-headers     | webhook headers                                            | [$PLUGIN_WEBHOOK_HEADERS, $INPUT_WEBHOOK_HEADERS] |\n| --youtube-url         | youtube url                                                | [$PLUGIN_YOUTUBE_URL, $INPUT_YOUTUBE_URL] |\n| --youtube-insecure    | youtube insecure                                           | (default: false) [$PLUGIN_YOUTUBE_INSECURE, $INPUT_YOUTUBE_INSECURE] |\n| --youtube-retry-count | youtube retry count                                         | (default: 20) [$PLUGIN_YOUTUBE_RETRY_COUNT, $INPUT_YOUTUBE_RETRY_COUNT] |\n| --prompt              | initial prompt                                             | [$PLUGIN_PROMPT, $INPUT_PROMPT] |\n| --help, -h            | show help                                                  |                   |\n| --version, -v         | print the version                                          |                   
|\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fappleboy%2Fgo-whisper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fappleboy%2Fgo-whisper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fappleboy%2Fgo-whisper/lists"}