{"id":27147933,"url":"https://github.com/heimoshuiyu/whisper-fastapi","last_synced_at":"2025-10-06T08:07:31.486Z","repository":{"id":207346601,"uuid":"706177310","full_name":"heimoshuiyu/whisper-fastapi","owner":"heimoshuiyu","description":"A very simple whsper Python FastAPI for OpenAI API, Android voice-typing (konele), Home Assistant (wyoming), and a voice-typing script on Linux and MacOS!","archived":false,"fork":false,"pushed_at":"2025-04-27T03:53:15.000Z","size":135,"stargazers_count":29,"open_issues_count":0,"forks_count":9,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-09T11:39:45.387Z","etag":null,"topics":["fastapi","konele","openai","whisper"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/r/heimoshuiyu/whisper-fastapi","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heimoshuiyu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-17T12:58:50.000Z","updated_at":"2025-09-05T04:48:05.000Z","dependencies_parsed_at":"2023-11-15T10:43:50.516Z","dependency_job_id":"387db806-0ecd-470e-ad0e-07e595ff5335","html_url":"https://github.com/heimoshuiyu/whisper-fastapi","commit_stats":null,"previous_names":["heimoshuiyu/whisper-fastapi"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/heimoshuiyu/whisper-fastapi","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heimoshuiyu%2Fwhisper-fastapi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heimoshuiyu%2Fwhisper-fastapi/tags","releases_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/heimoshuiyu%2Fwhisper-fastapi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heimoshuiyu%2Fwhisper-fastapi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heimoshuiyu","download_url":"https://codeload.github.com/heimoshuiyu/whisper-fastapi/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heimoshuiyu%2Fwhisper-fastapi/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278577929,"owners_count":26009703,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","konele","openai","whisper"],"created_at":"2025-04-08T11:51:37.840Z","updated_at":"2025-10-06T08:07:31.480Z","avatar_url":"https://github.com/heimoshuiyu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Whisper-FastAPI\n\nWhisper-FastAPI is a very simple Python FastAPI interface for konele and OpenAI services. 
It is based on the `faster-whisper` project and provides a konele-like API, where translations and transcriptions can be obtained over websockets or POST requests.\n\n## Features\n\n- **Translation and Transcription**: The application provides an API for the konele service, where translations and transcriptions can be obtained over websockets or POST requests.\n- **Language Support**: If no language is specified, the language is detected automatically from the first 30 seconds of audio.\n- **Konele Support**: Konele (or k6nele) is an open-source voice typing application for Android. This project supports a websocket endpoint (`/konele/ws`) and a POST endpoint (`/konele/post`).\n- **Home Assistant Support**: By default, it listens on `tcp://0.0.0.0:3001` for the Wyoming protocol.\n- **Audio Transcriptions**: The `/v1/audio/transcriptions` endpoint lets users upload an audio file and receive a transcription in response, with an optional `response_type` parameter. The `response_type` can be 'json', 'text', 'tsv', 'srt', or 'vtt'.\n- **Simplified Chinese**: Traditional Chinese is automatically converted to Simplified Chinese for konele using the `opencc` library.\n\n## GPT Refine Result\n\nYou can optionally use an OpenAI GPT model to post-process transcription results. You can also provide context so that GPT can adjust the text accordingly.\n\nSet the environment variables `OPENAI_BASE_URL=https://api.openai.com/v1` and `OPENAI_API_KEY=your-sk` to enable this feature.\n\nWhen the client sends a request with `gpt_refine=True`, this feature will be activated. 
Specifically:\n\n- For `/v1/audio/transcriptions`, submit using `curl \u003capi_url\u003e -F file=@audio.mp4 -F gpt_refine=True`.\n- For `/v1/konele/ws` and `/v1/konele/post`, use the URL format `/v1/konele/ws/gpt_refine`.\n\nThe model defaults to `gpt-4o-mini` and can be changed with the `OPENAI_LLM_MODEL` environment variable.\n\nYou can easily edit the LLM prompt in the code to better fit your workflow. It's just a few lines of code. Give it a try, it's very simple!\n\n## Usage\n\n### Konele Voice Typing\n\nFor konele voice typing, you can use either the websocket endpoint or the POST endpoint.\n\n- **Websocket**: Connect to the websocket at `/konele/ws` (or `/v1/konele/ws`) and send audio data. The server will respond with the transcription or translation.\n- **POST Method**: Send a POST request to `/konele/post` (or `/v1/konele/post`) with the audio data in the body. The server will respond with the transcription or translation.\n\nYou can also use the demo I have created to quickly test the effect at \u003chttps://yongyuancv.cn/v1/konele/post\u003e\n\n### Home Assistant Service\n\nBy default, it listens on `tcp://0.0.0.0:3001` for the Wyoming protocol. You can pass `--wyoming-uri` (for example, `--wyoming-uri tcp://0.0.0.0:3001`) to change it.\n\nBesides the main program `whisper_fastapi.py`, there is another script, `wyoming-forward.py`, which provides the same Wyoming API but, instead of transcribing audio with a local model, forwards the audio request to any OpenAI-compatible endpoint. For example:\n\n```bash\npip install wyoming aiohttp  # There are only two dependencies.\nexport OPENAI_API_KEY=your-secret-key\nexport OPENAI_BASE_URL=https://api.openai.com/v1  # this is the default\npython wyoming-forward.py --wyoming-uri tcp://0.0.0.0:3001\n```\n\n### OpenAI Whisper Service\n\nTo use the service that matches the structure of the OpenAI Whisper service, send a POST request to `/v1/audio/transcriptions` with an audio file. 
The server will respond with the transcription in the format specified by the `response_type` parameter.\n\nYou can also use the demo I have created to quickly test the effect at \u003chttps://yongyuancv.cn/v1/audio/transcriptions\u003e\n\nMy demo uses the large-v2 model on an RTX 3060.\n\n## Getting Started\n\nTo run the application, you need to have Python installed on your machine. You can then clone the repository and install the required dependencies.\n\n```bash\ngit clone https://github.com/heimoshuiyu/whisper-fastapi.git\ncd whisper-fastapi\npip install -r requirements.txt\n```\n\nYou can then run the application using the following command (the model will be downloaded from Hugging Face if it is not already in the cache directory):\n\n```bash\npython whisper_fastapi.py --host 0.0.0.0 --port 5000 --model large-v2\n```\n\nThis will start the application on `http://\u003cyour-ip-address\u003e:5000`.\n\n### Voice Typing Script\n\n\u003e This script currently supports Linux and macOS. If you're familiar with Windows, feel free to contribute via a PR!\n\n\n1. Grant execution permissions to the `voice-typing` script using the command:\n   ```bash\n   chmod +x voice-typing\n   ```\n\n2. Optionally, place the script in a directory included in your system's PATH, such as `/usr/bin/`. This step is not mandatory.\n\n3. Bind a keyboard shortcut to execute this script in your KDE or GNOME settings.\n\nHow it works\n\n- **First Execution**: When you run the script for the first time, it creates a PID file in the `/tmp` directory and starts recording audio from your microphone.\n\n- **Second Execution**: Running the script again will send a termination signal to the recording process using the PID file. This stops the recording and initiates the transcription process via an API. The transcribed text is then copied to your clipboard.\n\nWhisper can improve transcription accuracy based on **context**. 
The script takes the context from the following source, depending on the environment:\n\n- **macOS and Wayland**: The script uses the current clipboard content as context.\n\n- **X11 Environment**: The script uses the text selected by the mouse as context for transcription.\n\n### Deploy with Docker\n\n```bash\ndocker run -d \\\n    --tmpfs /tmp \\\n    -v ~/.cache/huggingface:/root/.cache/huggingface \\\n    --gpus all --device nvidia.com/gpu=all --security-opt=label=disable \\\n    -e OPENAI_BASE_URL=https://api.openai.com/v1 -e OPENAI_API_KEY=key -e OPENAI_LLM_MODEL=gpt-4o \\\n    -p 5000:5000 -p 3001:3001 \\\n    docker.io/heimoshuiyu/whisper-fastapi:latest \\\n    --model large-v2\n```\n\nThe `--gpus all` flag passes all GPUs to the container. To select a specific GPU, use `--gpus device=0` or `--gpus device=1` instead.\n\nThe `OPENAI_*` environment variables are used for the GPT refine feature. If you are not using it, you can omit them.\n\n## Limitation\n\nBecause inference is synchronous, this API can only handle one request at a time.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheimoshuiyu%2Fwhisper-fastapi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheimoshuiyu%2Fwhisper-fastapi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheimoshuiyu%2Fwhisper-fastapi/lists"}