Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dimzachar/timecodes_issues_mlops
https://github.com/dimzachar/timecodes_issues_mlops
Last synced: about 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/dimzachar/timecodes_issues_mlops
- Owner: dimzachar
- Created: 2023-05-08T08:20:47.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-05-08T11:53:26.000Z (over 1 year ago)
- Last Synced: 2024-04-16T02:07:47.331Z (9 months ago)
- Language: Python
- Size: 4.88 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# YouTube Timecode Generation Script
This Python script was created to automate the creation of timestamps of [MLOps Zoomcamp playlist](
https://www.youtube.com/playlist?list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK) that are missing and given that a [Github issue](https://github.com/DataTalksClub/mlops-zoomcamp/issues?q=is%3Aopen+is%3Aissue) is open.It generates timecodes for YouTube videos in a playlist and adds them as comments to corresponding open GitHub issues. The script utilizes the YouTube API, Google API Client Library, OpenAI API, and PyGitHub to interact with YouTube and GitHub, BeautifulSoup for web scraping, and the youtube_transcript_api for handling video transcripts.
You could potentially use it for other playlists.
Please be aware that using OpenAI's API key will incur costs. Using OpenAI's GPT-3.5-turbo model seems cheaper than text-davinci-003. You can easily change it to OpenAI's GPT-3.5-turbo model.
## Dependencies
- youtube_transcript_api
- google-api-python-client
- BeautifulSoup4
- openai
- python-dotenv
- PyGithub## Usage
1. Set up API keys and access tokens for OpenAI, YouTube, and GitHub in your `.env` file.
2. Set the `REPO_NAME` and `playlist_url` variables to the GitHub repository and YouTube playlist URL you want to work with.
3. Run the script: `python main.py`## Functions
- `get_video_urls_and_titles(api_key, playlist_id)`: Retrieves video URLs and titles from a YouTube playlist.
- `get_video_titles_from_issues(issues_url)`: Scrapes video titles from open GitHub issues.
- `match_titles_and_urls(issues_titles, video_info)`: Matches video titles from GitHub issues with their URLs.
- `clean_text(text)`: Cleans up the text by removing timestamps.
- `process_chunk(current_text_chunk, chunk_start_time)`: Generates a timestamp description using OpenAI API.
- `process_transcript(whole_transcript)`: Processes a video transcript by splitting it into chunks and generating timestamp descriptions.
- `add_issue_comment_with_confirmation(issue_titles, comment_body)`: Adds a comment with generated timecodes to the corresponding GitHub issue after user confirmation.## Workflow
1. Load environment variables and authenticate with APIs.
2. Get video URLs and titles from the specified YouTube playlist.
3. Scrape video titles from open GitHub issues.
4. Match video titles from GitHub issues with their YouTube URLs.
5. Loop through matched_videos items and generate timestamp descriptions for video transcripts.
6. Add the generated timecodes as a comment to the corresponding GitHub issue after user confirmation.