https://github.com/vdutts7/youtube-scraper
Scripts for automating YouTube video transcripts, timecodes, summaries, and tags.
crawlers gpt scraping webscraping youtube
- Host: GitHub
- URL: https://github.com/vdutts7/youtube-scraper
- Owner: vdutts7
- Created: 2024-09-14T05:59:28.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-03T01:00:11.000Z (11 months ago)
- Last Synced: 2025-03-24T07:22:25.139Z (7 months ago)
- Topics: crawlers, gpt, scraping, webscraping, youtube
- Language: Python
- Homepage:
- Size: 2.12 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# YouTube Scraper
Scripts for automating YouTube video transcripts, timecodes, summaries, and tags.
[![Github][github]][github-url]
## 📝 About
This project provides a set of Python scripts for automating YouTube video analysis. It includes functionality for extracting transcripts, generating summaries, creating tags, and identifying key topics with timecodes. The scripts utilize the YouTube Transcript API and OpenAI's GPT models to process and analyze video content.
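
As a rough illustration of the timecode side of this, the sketch below turns transcript segments into `[M:SS] text` lines. It assumes each segment is a dict with `text` and `start` (seconds) keys; that shape, the function names, and the sample data are assumptions for illustration, not taken from this repo's scripts.

```python
def format_timecode(seconds):
    """Convert seconds to M:SS, or H:MM:SS once past one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def timecoded_lines(segments):
    """Render transcript segments as '[timecode] text' lines."""
    return [
        f"[{format_timecode(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


# Hypothetical sample data, not taken from a real video.
segments = [
    {"text": "welcome to the video", "start": 0.0, "duration": 2.5},
    {"text": "first topic", "start": 75.4, "duration": 3.1},
    {"text": "closing thoughts", "start": 3725.0, "duration": 4.0},
]

for line in timecoded_lines(segments):
    print(line)
# [0:00] welcome to the video
# [1:15] first topic
# [1:02:05] closing thoughts
```

Truncating to whole seconds keeps the timecodes clickable in a YouTube description; rounding instead would be an equally reasonable choice.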
## 💻 How to build
_Note: macOS version, adjust accordingly for Windows / Linux_
### Initial setup
1. Clone the repo.
2. Copy `.env.example` to `.env` and fill in your values:
   ```
   OPENAI_API_KEY=your_openai_api_key_here
   YOUTUBE_URL=https://www.youtube.com/watch?v=video_id_here
   ```
3. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```
### Usage
1. Set the `YOUTUBE_URL` in your `.env` file to the desired YouTube video.
2. Run the scripts:
   - For transcript extraction:
     ```
     python transcript.py
     ```
   - For summary and tags:
     ```
     python gpt.py
     ```
   - For topic timecodes:
     ```
     python timecode.py
     ```
3. Check the `output` folder for the generated files.
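
To make the role of `YOUTUBE_URL` concrete, here is a minimal standard-library sketch of reading `KEY=VALUE` settings and pulling the video ID out of the URL. The helper names `parse_env` and `extract_video_id` are hypothetical, and the scripts in this repo may handle configuration differently (for example with a dotenv package).

```python
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube watch pages in this sketch (an assumption, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def parse_env(text):
    """Parse simple KEY=VALUE lines into a dict (no quoting or export support)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def extract_video_id(url):
    """Return the video ID from a watch or youtu.be URL, or raise ValueError."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/")
    elif host in YOUTUBE_HOSTS:
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        video_id = ""
    if not video_id:
        raise ValueError(f"Could not find a video ID in {url!r}")
    return video_id


# Sample text mirroring the .env layout shown in the setup steps.
sample = (
    "OPENAI_API_KEY=your_openai_api_key_here\n"
    "YOUTUBE_URL=https://www.youtube.com/watch?v=6xlPJiNpCVw\n"
)
config = parse_env(sample)
print(extract_video_id(config["YOUTUBE_URL"]))  # prints 6xlPJiNpCVw
```

Raising `ValueError` on an unrecognized URL surfaces a bad `.env` entry before any API call is made.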
### Examples
Using this video from Fireship: https://www.youtube.com/watch?v=6xlPJiNpCVw

- Transcript extraction: [example output not shown]
- Summary and tags: [example output not shown]
- Topic timecodes: [example output not shown]

## 🚀 Next Steps
- Implement error handling and input validation
- Add support for batch processing multiple videos
- Create a user-friendly command-line interface
- Integrate with a web framework for a graphical user interface
- Implement caching to reduce API calls and improve performance
## 🔧 Tools Used
[![Python][python]][python-url]
[![OpenAI][openai]][openai-url]
[![YouTube Transcript API][youtube-transcript-api]][youtube-transcript-api-url]
## 👤 Contact
[![Email][email]][email-url]
[![Twitter][twitter]][twitter-url]

[Python]: https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54
[Python-url]: https://www.python.org/
[OpenAI]: https://img.shields.io/badge/OpenAI_GPT--4-0058A0?style=for-the-badge&logo=openai&logoColor=white&color=4aa481
[OpenAI-url]: https://openai.com/
[youtube-transcript-api]: https://img.shields.io/badge/YouTube_Transcript_API-FF0000?style=for-the-badge&logo=youtube&logoColor=white
[youtube-transcript-api-url]: https://github.com/jdepoix/youtube-transcript-api
[email]: https://img.shields.io/badge/Email-FFCA28?style=for-the-badge&logo=Gmail&logoColor=00bbff&color=black
[email-url]: mailto:me@vd7.io
[twitter]: https://img.shields.io/badge/Twitter-FFCA28?style=for-the-badge&logo=Twitter&logoColor=00bbff&color=black
[twitter-url]: https://twitter.com/vdutts7
[github]: https://img.shields.io/badge/Github-2496ED?style=for-the-badge&logo=github&logoColor=white&color=black
[github-url]: https://github.com/vdutts7/youtube-scraper