{"id":25726917,"url":"https://github.com/hithismani/audio-transcriber","last_synced_at":"2026-05-08T02:05:50.684Z","repository":{"id":279155800,"uuid":"937877828","full_name":"hithismani/audio-transcriber","owner":"hithismani","description":"A Streamlit-powered audio and video transcription tool using OpenAI's Whisper model","archived":false,"fork":false,"pushed_at":"2025-02-24T04:15:44.000Z","size":43,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-24T05:23:06.444Z","etag":null,"topics":["audio-transcription","openai","python","streamlit"],"latest_commit_sha":null,"homepage":"https://x.com/megabored/status/1893641574413742102","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hithismani.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-24T03:54:39.000Z","updated_at":"2025-02-24T04:15:47.000Z","dependencies_parsed_at":"2025-02-24T05:23:10.539Z","dependency_job_id":"5075dda1-4d87-4c18-9c75-1536743af802","html_url":"https://github.com/hithismani/audio-transcriber","commit_stats":null,"previous_names":["hithismani/audio-transcriber"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hithismani%2Faudio-transcriber","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hithismani%2Faudio-transcriber/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hithismani%2Faudio-transcriber/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hithismani%2Faudio-transcriber/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hithismani","download_url":"https://codeload.github.com/hithismani/audio-transcriber/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240762624,"owners_count":19853522,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-transcription","openai","python","streamlit"],"created_at":"2025-02-25T23:27:53.772Z","updated_at":"2026-05-08T02:05:50.676Z","avatar_url":"https://github.com/hithismani.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audio/Video Transcription App\n\nThis application uses AI models (OpenAI Whisper and AssemblyAI) to transcribe audio and video files with a simple interface using Streamlit. 
It also supports optional speaker diarization via pyannote.audio models hosted on Hugging Face.\n\n## Demo\n\nView the demo of the app on [X](https://x.com/megabored/status/1893641574413742102).\n\n## Features\n\n- Supports MP3, MP4, WAV, and M4A file formats\n- Automatically splits large files into chunks for processing (files up to 50 MB are supported)\n- Provides a simple web interface using Streamlit\n- Multiple transcription options:\n  * Full Transcription\n  * Timestamped Transcription\n  * Optional Transcription with Timestamps and Speaker Identification\n- Handles both audio and video file transcription\n- Robust error handling and logging\n- **New: API Key Management**\n  * Flexible key configuration via `.env` file or in-app input\n  * Secure, session-based API key handling\n- **New: Cost Estimation**\n  * Real-time transcription cost estimate\n  * Supports OpenAI and AssemblyAI pricing models\n  * Warns about potentially high-cost transcriptions\n\n## Prerequisites: Installing FFmpeg\n\nFFmpeg is a required dependency for this application. Follow the installation instructions for your operating system:\n\n### Windows, macOS, and Linux Installation Instructions\n\n#### Windows\n1. Download FFmpeg from [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)\n2. Extract the downloaded zip file\n3. Add the extracted `bin` folder to your system PATH\n\n#### macOS (using Homebrew)\n```bash\nbrew install ffmpeg\n```\n\n#### Linux (Ubuntu/Debian)\n```bash\nsudo apt-get update\nsudo apt-get install ffmpeg\n```\n\n## Installation\n\n1. Clone this repository:\n   ```bash\n   git clone https://github.com/hithismani/audio-transcriber.git\n   cd audio-transcriber\n   ```\n\n2. Create a virtual environment (recommended):\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows, use `venv\\Scripts\\activate`\n   ```\n\n3. Install the required packages:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. 
Create a `.env` file in the project root directory and add your API keys:\n   ```\n   OPENAI_API_KEY=your_openai_api_key_here\n   ASSEMBLY_API_KEY=your_assemblyai_api_key_here\n   HF_ACCESS_TOKEN=your_huggingface_token_here  # Optional, for speaker identification\n   ```\n\n## API Key Configuration\n\n### Flexible Key Management\n- API keys can be set in either of two ways:\n  1. Recommended: Add keys to the `.env` file\n  2. In-app: Manually enter keys for the current session\n\n### Supported API Keys\n- OpenAI API Key (required for OpenAI transcription)\n- AssemblyAI API Key (required for AssemblyAI transcription)\n- Hugging Face Access Token (optional, for speaker identification)\n\n## Hugging Face Authentication (Optional Speaker Identification)\n\n### Speaker Identification Setup\n1. Create a Hugging Face account: [https://huggingface.co/](https://huggingface.co/)\n2. Accept the user conditions for these models:\n   - [pyannote/speaker-diarization](https://huggingface.co/pyannote/speaker-diarization)\n   - [pyannote/segmentation](https://huggingface.co/pyannote/segmentation)\n3. Create a Hugging Face access token with the read role: [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)\n4. Add the token to your `.env` file: `HF_ACCESS_TOKEN=your_huggingface_token`\n\n**Note:** Speaker identification is an experimental feature and is optional. 
The app works fully without this token.\n\n## Usage\n\nRun the application:\n```bash\nstreamlit run transcribe.py\n```\n\n### Transcription Options\n- Upload audio or video files (MP3, MP4, WAV, M4A)\n- Choose a transcription type:\n  * Full Transcription\n  * Timestamped Transcription\n  * Optional Transcription with Timestamps and Speaker Identification\n- Automatic handling of large files by splitting them into chunks\n- Real-time cost estimation\n- Direct download of transcription results\n\n## Cost Estimation\n\nThe app provides a real-time cost estimate for each transcription, based on these rates:\n- OpenAI Whisper: $0.006 per minute\n- AssemblyAI: $0.00025 per second (equivalent to $0.015 per minute)\n- Displays the estimated cost before transcription starts\n- Warns about potentially expensive transcriptions\n\n**Note: These estimates are indicative only. If you are charged more than the estimate, please submit a pull request with the corrected pricing.**\n\n## Roadmap: Upcoming Features\n\n### ✅ Completed Features\n- [x] Flexible API Key Management\n  * Configure keys via `.env` or in-app\n  * Secure, session-based key handling\n\n- [x] Cost Estimation\n  * Real-time transcription cost calculation\n  * Supports multiple AI providers\n  * Warns about high-cost transcriptions\n\n- [x] Speaker Identification (Optional)\n  * Uses pyannote.audio for experimental speaker diarization\n  * Identifies distinct speakers in audio\n  * Works best with clear, separated speech\n  * Optional feature that can be enabled/disabled\n  * Provides speaker labels like \"SPEAKER_00\", \"SPEAKER_01\"\n\n- [x] Chunk Serialization\n  * Automatically splits large audio files (\u003e 24 MB)\n  * Preserves audio quality during splitting\n  * Adds short silence between chunks to prevent cut-off words\n  * Supports files up to 50 MB\n  * Handles various audio formats (MP3, WAV, M4A, MP4)\n  * Seamlessly transcribes split chunks\n  * Reconstructs the full transcription from individual chunks\n\n### 🚧 Planned Features\n\n1. 
**Multi-Model Support**\n   - Allow users to choose from multiple AI transcription models\n   - Support for:\n     * OpenAI Whisper\n     * Google Speech-to-Text\n     * Amazon Transcribe\n     * Local open-source models (Whisper, wav2vec, etc.)\n   - Ability to select specific sub-models within each provider\n   - Comparative analysis of transcription accuracy\n\n2. **Advanced Transcription Organization**\n   - Automatic chapter/section detection\n   - Manual chapter creation and editing\n   - Timestamp-based chapter segmentation\n   - Export chapters as separate files or with hierarchical structure\n\n## Privacy Considerations\n\nWhen using this application, keep the following in mind:\n- Audio/video files are uploaded to OpenAI's or AssemblyAI's servers for transcription\n- Ensure you have the necessary rights and permissions for the content\n- API data is not used for model training (check each provider's current data-usage policy to confirm)\n- Obtain consent before transcribing recordings of others\n\n## Contributing\n\nContributions are welcome! Please fork the repository and submit a pull request.\n\n## License\n\nMIT License - see the LICENSE file for details.\n\n## Acknowledgments\n\n- OpenAI for the Whisper transcription model\n- AssemblyAI for transcription services\n- Streamlit for the web interface\n- pyannote.audio for speaker diarization\n- Pydub for audio processing\n- MoviePy for video file handling\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhithismani%2Faudio-transcriber","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhithismani%2Faudio-transcriber","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhithismani%2Faudio-transcriber/lists"}