{"id":13910590,"url":"https://github.com/smaranjitghose/AIAudioTranscriber","last_synced_at":"2025-07-18T09:32:13.446Z","repository":{"id":64850897,"uuid":"578783066","full_name":"smaranjitghose/AIAudioTranscriber","owner":"smaranjitghose","description":"A minimalistic web app to generate transciption for audio built using Python","archived":false,"fork":false,"pushed_at":"2023-03-30T23:43:08.000Z","size":8830,"stargazers_count":29,"open_issues_count":0,"forks_count":12,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-08-08T00:43:18.090Z","etag":null,"topics":["docker","open-source","openai","python","python3","speech-recognition","speech-to-text","streamlit","streamlit-lottie","streamlit-webapp","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smaranjitghose.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"smaranjitghose","patreon":"smaranjitghose","ko_fi":"smaranjitghose"}},"created_at":"2022-12-15T21:48:52.000Z","updated_at":"2024-05-01T03:15:33.000Z","dependencies_parsed_at":"2023-02-04T06:17:15.328Z","dependency_job_id":null,"html_url":"https://github.com/smaranjitghose/AIAudioTranscriber","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smaranjitghose%2FAIAudioTranscriber","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smaranjitghose%2FAIAudioTranscriber/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smar
anjitghose%2FAIAudioTranscriber/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smaranjitghose%2FAIAudioTranscriber/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smaranjitghose","download_url":"https://codeload.github.com/smaranjitghose/AIAudioTranscriber/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226388657,"owners_count":17617313,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","open-source","openai","python","python3","speech-recognition","speech-to-text","streamlit","streamlit-lottie","streamlit-webapp","whisper"],"created_at":"2024-08-07T00:01:36.143Z","updated_at":"2024-11-25T19:31:22.819Z","avatar_url":"https://github.com/smaranjitghose.png","language":"Python","funding_links":["https://github.com/sponsors/smaranjitghose","https://patreon.com/smaranjitghose","https://ko-fi.com/smaranjitghose"],"categories":["Python"],"sub_categories":[],"readme":"# AI Audio Transcriber\n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/transcriber.gif\" height = 300 alt = \"Transcriber Icon\"\u003e\u003c/p\u003e\n\nA minimalistic application, built using Python, to generate transcriptions for audio\n\n## 🚀 Demo\n\n**v.0.0.1**\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/demo_snapshot_v1.png\" height = 300 width = 450 alt = \"AITranscriber Snapshot\"\u003e\u003c/p\u003e\n\n**v.0.0.2** (Transcribing a YouTube Video Explaining Whisper)\n\u003cp align = \"center\"\u003e\u003cimg src = 
\"./assets/doc_assets/demo_snapshot_v2_1.png\" height = 300 width = 450 alt = \"AITranscriber Snapshot v2\"\u003e\u003c/p\u003e\n\n**v.0.0.2** (Transcribing an English Song - Thinkin About It)\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/demo_snapshot_v2_2.png\" height = 300 width = 450 alt = \"AITranscriber Snapshot v2\"\u003e\u003c/p\u003e\n\n**v.0.0.3** (Transcribing a [clip](https://www.youtube.com/watch?v=A1UjHboEypk) from [Lex Fridman's podcast](https://lexfridman.com/podcast/))\n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/demo_snapshot_v3.png\" height = 300 width = 450 alt = \"AITranscriber Snapshot v3\"\u003e\u003c/p\u003e\n\n**v.0.0.4** (Transcribing another [clip](https://www.youtube.com/watch?v=zxqjlWNVGNM) from [Lex Fridman's podcast](https://lexfridman.com/podcast/))\n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/demo_snapshot_v4.png\" height = 300 width = 450 alt = \"AITranscriber Snapshot v4\"\u003e\u003c/p\u003e\n\n\n## 📝 Basic Application Workflow \n\n```mermaid\nflowchart LR \n    U([Client])\n    \n    I{Choose\\n Input Mode}\n    U -----\u003e I\n    \n    I1[YouTube Video URL] \n    I2[Upload Video File]\n    I3[Upload Audio File]\n    I ---\u003e I1 \u0026 I2 \u0026 I3\n\n    YTC{\"Check if\\n Audio is available?\"}\n    YTA(\"Download audio\\n from YouTube\")\n    YTV(\"Download video\\n from YouTube\")\n    \n    I1 ---\u003e YTC\n    YTC --yes---\u003e YTA\n    YTC --no---\u003e YTV\n\n    VTA[\"Convert Video to Audio\"]\n    YTV ---\u003e VTA\n    I2 ---\u003e VTA\n\n    LA[\"Load Audio File\"]\n    YTA \u0026 VTA \u0026 I3---\u003e LA\n    \n    M{\"Choose\\n Model Type\"}\n    U -----\u003e M\n\n    M1[(Ramanujan)]\n    M2[(Bose)]\n    M3[(Raman)]\n    M4[(Kalam)]\n    M ---\u003e M1 \u0026 M2 \u0026 M3 \u0026 M4\n\n    LM[Load Relevant Whisper Model]\n    M1 \u0026 M2 \u0026 M3 \u0026 M4 --\u003e LM\n\n    GT(\"Generate Transcripts\")\n    LA \u0026 LM 
---\u003e GT\n\n    O1([\"Detected \\n Language\"])\n    O2([\"Complete \\nSubtitle Text\"])\n    O3([\"Subtitles \\nwith Timestamps\"])\n    GT ---\u003e O1 \u0026 O2 \u0026 O3\n\n    OF([\"Original\\n Audio or Video\"])\n    D{{\"Display to Client\"}}\n    I ---\u003e OF\n    O1 \u0026 O2 \u0026 OF ---\u003e D\n\n    DO{\"Choose\\n Output Option\"}\n    D1[\"SRT\\n File\"]\n    D2[\"VTT\\n File\"]\n    D3[\"Text\\n File\"]\n    DP[\"Process Subtitle Object\"]\n    DN{{\"Download Button\"}}\n\n    O3 ---\u003e DP\n    U ---\u003e DO\n    DO ---\u003e D1 \u0026 D2 \u0026 D3 ---\u003e DP ---\u003e DN\n\n    subgraph Result\n        D\n        DN\n    end\n```\n\n## 🥊CI/CD\n\n(**Preferred Pipeline Using GitHub Actions for Docker Image**)\n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/Docker_CICD.png\" height = 300 width = 450 alt = \"Docker CI/CD\"\u003e\u003c/p\u003e\n\n\n## ⚒️ Set-Up Instructions\n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/setup.gif\" height = 300 alt = \"SetUp Icon\"\u003e\u003c/p\u003e\n\n\n\n- Open your terminal / command prompt. \n\n- Clone the repository \n    ```\n    git clone https://github.com/smaranjitghose/AIAudioTranscriber.git\n    ```\n- Change the directory to the cloned project\n    \n    ```\n    cd AIAudioTranscriber\n    ```\n\n#### **A. 
Without using Docker**\n\n- Ensure you have a version of [Python](https://www.python.org/downloads/) below 3.10 installed on your system and that the ``virtualenv`` package is installed\n\n    ```\n    python --version\n    ```\n\n    ```\n    pip install virtualenv\n    ```\n\n- Create a new virtual environment\n    ```\n    python -m venv env\n    ```\n\n- Activate the virtual environment\n    - On Mac/Linux\n        ```terminal\n        source env/bin/activate\n        ```\n    - On Windows\n        ```terminal\n        env/Scripts/Activate.ps1 \n        ```\n\n- Install ffmpeg on your local system\n    - On Windows using [Chocolatey](https://chocolatey.org/)\n        ```terminal\n        choco install ffmpeg\n        ```\n    - On macOS using [Homebrew](https://brew.sh/)\n        ```terminal\n        brew install ffmpeg \n        ```\n    - On Debian/Ubuntu\n        ```terminal\n        sudo apt update \u0026\u0026 sudo apt install ffmpeg\n        ```\n    - On Arch Linux\n        ```terminal\n        sudo pacman -S ffmpeg \n        ```\n\n- Install the dependencies\n\n    ```\n    pip install -r requirements.txt\n    ```\n\n- Download the model weights (this may take a few minutes, since the models add up to gigabytes in size)\n\n    ```\n    python get_model_weights.py\n    ```\n\n- Run the web application\n    ```\n    streamlit run Home.py\n    ```\n    \u003e **Note**:\n    \u003e - If the app does not open automatically in your default browser, open a browser of your choice and navigate to `http://localhost:8501`\n    \u003e - To stop the application, press `CTRL + C` in your terminal\n\n#### **B. Using Docker**\n\n- Make sure you have Docker installed on your system. 
Refer to the documentation [here](https://docs.docker.com/desktop/) if you need assistance setting it up.\n- Build a Docker image\n    ```\n    docker build -t aitranscriber:v0.0.4 .\n    ```\n    \u003e **Note**:\n    \u003e - You may use any name instead of aitranscriber and any tag instead of v0.0.4\n    \u003e - Depending on your system, building the image may take a few minutes\n- Once complete, check the Docker image\n    ```\n    docker images\n    ```\n- Create and run a Docker container from the image\n    ```\n    docker run -p 8501:8501 aitranscriber:v0.0.4\n    ```\n    \u003e **Note**:\n    \u003e - `docker run -p \u003chost_port\u003e:8501 \u003cimage_name\u003e:\u003ctag_name\u003e`\n    \u003e - In the above command, you can choose which port of your host system to map to port 8501 of the container\n    \u003e - If you used a different Docker image name and/or tag, make sure to update it in the command\n- Open your preferred web browser and navigate to ``http://localhost:8501`` \n    \u003e **Note**:\n    \u003e - If you used a different host port in the above command, navigate to that port instead: ``http://localhost:\u003chost_port\u003e``\n    \u003e - To stop the container, first find its name in the terminal: ``docker ps --all``\n    \u003e - Then use the container name with the command: ``docker stop \u003ccontainer_name\u003e``\n\n\n## 🌏 Deployment Options \n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/hosting.gif\" height = 300 alt = \"Hosting Icon\"\u003e\u003c/p\u003e\n\n\n- [Streamlit Cloud](https://streamlit.io/cloud)\n- [HuggingFace Spaces](https://huggingface.co/docs/hub/spaces)\n\n- [Fly](https://fly.io/)\n- [Railway](https://railway.app/)\n- [Render](https://render.com/)\n- [Cyclic](https://app.cyclic.sh/#/)\n\n- [Heroku](https://www.heroku.com/)\n- [Digital Ocean](https://www.digitalocean.com/)\n\n- Google Cloud Run\n    - Install the Google Cloud CLI\n    - 
Create an Account on Google Cloud\n    - Create a New Project\n    - Build and Push Docker Image to Google Container Registry\n        ```\n        gcloud builds submit --tag gcr.io/\u003cProjectName\u003e/\u003cAppName\u003e --project=\u003cProjectName\u003e\n        ```\n    - Deploy the Docker Container\n        ```\n        gcloud run deploy --image gcr.io/\u003cProjectName\u003e/\u003cAppName\u003e --platform managed --project=\u003cProjectName\u003e --allow-unauthenticated\n        ```\n\n- Amazon EC2 Instance\n- Azure App\n\n(**Using Google Colab/Kaggle as a temporary MVP server**)\n\n- [pyngrok](https://pyngrok.readthedocs.io/en/latest/index.html)\n    - Step 1: Install pyngrok in Google Colab\n\n        ```\n        ! pip install pyngrok\n        ```\n    \n    - Step 2: Sign up for [ngrok](https://ngrok.com/) and get an authentication token\n\n    - Step 3: Authenticate\n        \n        ```python\n        from pyngrok import ngrok\n        ngrok.set_auth_token(\"xxx\")\n        ```\n    - Step 4: Start the Streamlit app on port 8051, create a tunnel for it, and print the tunnel's public URL\n\n        ```python\n        !nohup streamlit run Home.py --server.port 8051 \u0026\n        url = ngrok.connect(8051).public_url\n        print(url)\n        ```\n    \n    - Step 5: Share the URL with the client\n     \n\n\n- [localtunnel](https://github.com/localtunnel/localtunnel)\n    - Step 1: Install localtunnel\n\n        ```\n        npm install -g localtunnel\n        ```\n    - Step 2: Run the app and expose port 8501 through localtunnel\n\n        ```\n        streamlit run Home.py \u0026 npx localtunnel --port 8501\n        ```\n    \n    - Step 3: Share the URL with the client\n\n\n(**Using a local server as a temporary MVP server**)\n\n- NGINX + Cloudflare/ngrok\n\n## 🏗️ Future Work \n\n- [x] Download and use audio from YouTube Video\n- [x] Download and use online audio file\n- [x] Use Session States and Caching for Better UX\n- [x] Display the detected language properly (without using the shortcode)\n- [x] 
Generate Dedicated SRT, VTT files for transcripts (in addition to txt)\n- [x] Update Model options to honour the names of prominent Indian Scientists\n- [x] Option to limit/increase input model file size\n- [x] Functionality to check the validity of the URL provided for a YouTube Video\n- [x] Add Custom Favicon File\n- [x] Add Scrollable Text Area for Generated Transcripts\n- [x] Containerize the Application with Docker\n- [x] Troubleshoot Docker Container locally\n- [x] Create Basic Workflow on GitHub Actions for Docker Image Build\n- [x] Create Comprehensive Workflow on GitHub Actions for Docker Image Build\n- [ ] Resolve bug: YouTube videos with multiple audio tracks should download the default audio track\n    - Example: This [clip](https://www.youtube.com/watch?v=93L6gDVRrUY) from Huberman Lab is in English, yet the script fetches the Spanish audio track from YouTube\n\n- [ ] Test Application by spinning up its Container on Google Cloud Run\n    - [ ] Push to a particular Docker Image Registry\n    - [ ] Set TTL\n    - [ ] Play around with system resources\n    - [ ] Test with custom domain\n- [ ] Add Google Cloud's CI/CD to repo on push/pull requests\n    - [ ] Use cloudbuild.yaml file\n    - [ ] Update build time to 2 hours\n- [ ] Optimize Docker Image Size\n- [ ] Better CI/CD\n- [ ] Kubernetes Upgrade\n- [ ] Better GitHub Actions\n\n**More Features**:\n\n- [ ] Burn transcripts into user-uploaded video\n    ```python\n    import subprocess\n\n    input_video = \"input.mp4\"      # placeholder: user-uploaded video\n    subtitle = \"transcript.srt\"    # placeholder: generated SRT/VTT file\n    output_video = \"final.mp4\"\n    # Burn (hard-code) the subtitles into the video frames with ffmpeg's subtitles filter\n    subprocess.run([\"ffmpeg\", \"-i\", input_video, \"-vf\", f\"subtitles={subtitle}\", output_video], check=True)\n    ```\n- [ ] Summarize subtitles\n- [ ] Sentiment analysis on video summary\n- [ ] Batch transcript generation + summary + sentiment analysis\n- [ ] Dashboard for video review(s)\n\n\n**Speaker Diarization: Only if Community requires**\n\n- [ ] Incorporate Speaker Diarization for Podcast/Vlog/Conversational Clips\n- [ ] Test it with burning transcripts into user-uploaded video\n- [ ] Test it with transcript 
summarization\n\n**More Aligned Subtitles: Only if Community requires**\n\n- [ ] Word Level Timestamps for transcripts + Generate ASS Transcript File\n- [ ] Test it with burning transcripts into user-uploaded video\n- [ ] Test it with previous speaker diarization\n- [ ] Test it with transcript summarization\n\n\n- [ ] Improve UI Natively in Streamlit\n\n**API Development: Only if Community requires**\n\n- [ ] Build API for model inference in FastAPI to handle requests asynchronously (on a different branch perhaps)\n- [ ] Containerize the API with Docker\n- [ ] Troubleshoot Docker Container for API\n- [ ] Host the API on Google/AWS/Linode/Heroku\n- [ ] Perform basic CI/CD for API \n- [ ] Rehost Streamlit Application on a different service (Reduce it to client side for most operations)\n- [ ] Play around with PyScript\n\n**Front End Development: Only if Community requires**\n\n- [ ] Build Basic React Front End\n- [ ] Connect React Front End to FastAPI\n- [ ] Add Loader Animation\n- [ ] Add Animations for model inference times\n- [ ] Handle Errors in Front End/API\n- [ ] Upload File Component\n- [ ] Download Button(s)\n- [ ] Feedback Form\n- [ ] Contact Page\n- [ ] About Page\n- [ ] Home Page\n- [ ] Stripe Integration\n- [ ] Improve Navbar UI\n- [ ] 404 Page\n- [ ] Footer UI\n- [ ] Scrollbar UI\n- [ ] SEO\n\n**CI/CD Pipeline (GitHub Actions)**\n- [ ] SAST (Optional)\n- [ ] Kubernetes Smoke Test (Optional)\n- [ ] Using Super Linter for Linting (Optional)\n- [ ] Unit Tests (Optional)\n- [ ] Integration Test (Optional)\n\n\n## ✏️ Note \n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/notes.gif\" height = 300 alt = \"Note Icon\"\u003e\u003c/p\u003e\n\n- To view the generated transcript file(s) in the VS Code IDE, install the [Subtitles Editor](https://marketplace.visualstudio.com/items?itemName=pepri.subtitles-editor) extension\n- To extensively edit/manipulate the generated transcript file(s), use the open-source tool [Subtitle
Edit](https://www.nikse.dk/subtitleedit) \n- For Streamlit Sharing, pinning module versions in requirements.txt sometimes throws errors\n- The large-v2 model outperforms all other versions of Whisper, especially in multilingual transcription. However, it requires roughly 10 times more VRAM than the base model and has a longer inference time\n- To quickly record audio from the system microphone, use [this](https://github.com/smaranjitghose/miscellaneous/blob/master/handypython/data/audio/audio_recording.py) Python script:\n    - Prerequisites (``wave`` is part of the Python standard library, so only PyAudio needs installing):\n\n        ```\n        pip install pyaudio\n        ```\n- Whisper is unable to read audio files from disk if the ``python-ffmpeg`` or ``ffmpeg`` Python packages are installed. It only works when the ``ffmpeg-python`` package is installed and neither of the other two is\n\n    ```\n    # Remove all ffmpeg-related Python packages\n    pip uninstall python-ffmpeg ffmpeg ffmpeg-python\n    # Install the appropriate package for ffmpeg\n    pip install ffmpeg-python\n    ```\n- [Pixabay](https://pixabay.com/) has a great collection of copyright-free, royalty-free songs that can be used for testing the application\n- The base model performs poorly on Kannada and Telugu songs (often even language detection fails). Example: the [Varaha Roopam](https://www.youtube.com/watch?v=gH_RYRwVrVM) song from the movie Kantara\n\n\u003cp align = \"center\"\u003e\u003cimg src = \"./assets/doc_assets/demo_snapshot_v2_3.png\" height = 300 width = 450 alt = \"AITranscriber Snapshot v2\"\u003e\u003c/p\u003e\n\n\n#### Docker Container and CI/CD\n\n- Exclude as many irrelevant files as possible with ``.dockerignore``, such as README.MD, LICENSE, snapshots, notebooks, input, output, logs, etc.\n\n- Minimize the number of layers (created by RUN, COPY and ADD instructions)\n- Always combine ``RUN apt-get update`` with ``apt-get install`` in the same RUN statement. 
Using ``apt-get update`` alone in a RUN statement causes caching issues and causes subsequent ``apt-get install`` instructions to fail.\n\n- Using ``RUN apt-get update \u0026\u0026 apt-get install -y`` ensures your Dockerfile installs the latest package versions with no further coding or manual intervention. This technique is known as “cache busting”. \n\n- In addition, cleaning up the apt cache by removing ``/var/lib/apt/lists`` reduces the image size, since the apt cache is not stored in a layer. \n\n- Python Docker Image Info:\n    - Image tags such as `jessie`/`stretch`/`buster`/`bullseye` are codenames for different [Debian Operating System](https://wiki.debian.org/DebianReleases) production releases.\n    - `bullseye` is version 11, `buster` is version 10, and so on (as of 2022). \n    - `bookworm`, `trixie` and `forky` are work-in-progress releases that may not be stable yet\n    - `-slim` variants install only the minimal packages needed to run the particular tool.\n\n- A base image with Python \u003c= 3.9 raises an [issue](https://stackoverflow.com/questions/71712258/error-could-not-build-wheels-for-backports-zoneinfo-which-is-required-to-insta/71735458#71735458) with the `backports.zoneinfo` module, and pip fails\n\n\n- To build and test multi-architecture Docker images locally:\n    - Create a new buildx instance\n        ```\n        docker buildx create --use\n        ```\n    - Build a new Docker image with multi-architecture support\n        ```\n        docker buildx build --platform linux/arm64,linux/amd64 -t aitranscriber:multi-architecture -f Dockerfile . 
\n        ```\n\n- Checking the multi-architecture Docker image build is too time-consuming for the current application, so it is disabled\n\n\n## 🛡️ License\n\nThis project is licensed under the GNU Affero General Public License v3.0 - see the [`LICENSE`](LICENSE) file for details.\n\n\n## 🙏 Acknowledgements \n\n\u003cp align = \"center\"\u003e\u003cimg src = \"https://media.giphy.com/media/1gQ6f5kJdBtGPSmdgS/giphy.gif\" height = 300 alt = \"Acknowledgment Icon\"\u003e\u003c/p\u003e\n\n\n- **General Purpose Speech Recognition Model**: [OpenAI Whisper](https://openai.com/blog/whisper/) \n    - [GitHub](https://github.com/openai/whisper)\n    - [Paper: Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/pdf/2212.04356.pdf)\n- **Animations**: [LottieFiles](https://lottiefiles.com)\n- **Favicon**: [PNG Repo](https://www.pngrepo.com/svg/235200/artificial-intelligence-brain)\n- **Sample Test Clip 1**: [\"Thinkin About It\"](https://pixabay.com/music/soft-house-setze-thinkin-about-you-radio-edit-129282/) by [Niklas Setzkorn](https://pixabay.com/users/setze-32054623/) from [Pixabay](https://pixabay.com/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmaranjitghose%2FAIAudioTranscriber","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmaranjitghose%2FAIAudioTranscriber","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmaranjitghose%2FAIAudioTranscriber/lists"}