https://github.com/labrijisaad/youtube-video-transcriptor
In this notebook, I implemented a script to transcribe YouTube videos (and audio files in general) using Google's speech-to-text API.
audio-split general-idea-summarization googletranslateapi puthon speech-recognition speech-to-text text-translation transcript youtube youtube-transcripts youtube-video
- Host: GitHub
- URL: https://github.com/labrijisaad/youtube-video-transcriptor
- Owner: labrijisaad
- Created: 2022-07-26T15:48:11.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-12-19T09:15:12.000Z (about 2 years ago)
- Last Synced: 2024-10-10T00:43:43.055Z (3 months ago)
- Topics: audio-split, general-idea-summarization, googletranslateapi, puthon, speech-recognition, speech-to-text, text-translation, transcript, youtube, youtube-transcripts, youtube-video
- Language: Jupyter Notebook
- Homepage:
- Size: 58.6 KB
- Stars: 10
- Watchers: 1
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
## 🎥 `Youtube-video-transcriptor in Python` 🐍
In this project, I developed a script in Python that uses Google's speech-to-text technology to transcribe audio from YouTube videos.
- ⚠️ Please note the following before using the script:
> - 1️⃣ **`The script is intended to be run on Google Colaboratory!`**
> - 2️⃣ The script may not always accurately transcribe text due to noise or the way the speaker talks in the video (e.g. speaking too fast or too slow).
> - 3️⃣ **`The summary model`** used in the script is a community model available on [Hugging Face](https://huggingface.co/) that **`only supports English text`**. It may not always accurately capture the general idea of the transcription, especially when there is little data to work with.
❓❓❓ HOW TO USE ❓❓❓
```
>>> 1️⃣ Run the notebook in Colab (make sure you are logged into Colab with your Google account).
>>> 2️⃣ Paste the URL of the YouTube video you want to transcribe into the `url` variable.
>>> 3️⃣ Replace the `lang` variable with the language spoken in the video (all instructions are provided in the notebook).
>>> 4️⃣ Run all cells (shortcut: `CTRL + F9`)
>>> 5️⃣ Download the generated TXT files (there will be two in total: one for the transcription and one for the translated transcription).
```

⚠️⚠️⚠️ UPDATE ⚠️⚠️⚠️
```
>>> To optimize transcription time, I have updated the script to use `Python threads`, which helps to fully utilize the CPU resources provided by Colab.
>>> As a result, the performance has significantly improved - a 30-minute video can now be transcribed in approximately 35 seconds, compared to the previous time of 2 minutes and 30 seconds.
>>> You can find the updated script with threads in the accompanying notebook. 😁
```

> - 🙌 Notebook made by [@labriji_saad](https://github.com/labrijisaad)
> - 🔗 LinkedIn [@labriji_saad](https://www.linkedin.com/in/labrijisaad/)
- 📫 Feel free to contact me if anything is wrong or if anything needs to be changed 😎! **[email protected]**
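
For readers curious about the thread-based speed-up mentioned in the UPDATE note, here is a minimal sketch of the general idea. It is not the notebook's actual code. It assumes the audio is cut into fixed-length chunks and that each chunk is transcribed by its own speech-to-text request. The names `chunk_bounds`, `transcribe_all`, and `fake_transcribe` are hypothetical, and `fake_transcribe` is only a stand-in so the example runs without audio files or network access.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(total_seconds, chunk_seconds):
    """Return (start, end) second offsets that cover the whole recording."""
    return [
        (start, min(start + chunk_seconds, total_seconds))
        for start in range(0, total_seconds, chunk_seconds)
    ]


def transcribe_all(bounds, transcribe_chunk, max_workers=8):
    """Transcribe every chunk in a thread pool, keeping the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() yields results in input order,
        # even when chunks finish out of order.
        return " ".join(pool.map(transcribe_chunk, bounds))


def fake_transcribe(bound):
    # Hypothetical placeholder: a real version would send this slice
    # of audio to a speech-to-text service and return the recognized text.
    start, end = bound
    return f"[{start}-{end}s]"


bounds = chunk_bounds(total_seconds=95, chunk_seconds=30)
transcript = transcribe_all(bounds, fake_transcribe)
print(transcript)
```

Under these assumptions, most of the time per chunk is spent waiting for the recognition request to return, which is the kind of workload where Python threads tend to help. Swapping `fake_transcribe` for a function that performs the real request keeps the rest of the sketch unchanged.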