{"id":24793930,"url":"https://github.com/mlsakiit/forgetube","last_synced_at":"2025-04-10T20:13:11.252Z","repository":{"id":274088533,"uuid":"912110560","full_name":"MLSAKIIT/ForgeTube","owner":"MLSAKIIT","description":"MLSA Project Wing 2025 - Machine Learning","archived":false,"fork":false,"pushed_at":"2025-03-10T12:47:43.000Z","size":137654,"stargazers_count":13,"open_issues_count":1,"forks_count":12,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T17:53:40.113Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MLSAKIIT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-04T16:25:48.000Z","updated_at":"2025-03-05T12:46:36.000Z","dependencies_parsed_at":"2025-03-10T13:35:04.783Z","dependency_job_id":"4985a49b-4715-4b71-a684-0c3979c3245b","html_url":"https://github.com/MLSAKIIT/ForgeTube","commit_stats":null,"previous_names":["mlsakiit/forgetube"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MLSAKIIT%2FForgeTube","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MLSAKIIT%2FForgeTube/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MLSAKIIT%2FForgeTube/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MLSAKIIT%2FForgeTube/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MLSAKIIT","download_url":"https://codeload.github.com/M
LSAKIIT/ForgeTube/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248288361,"owners_count":21078903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-29T22:30:02.224Z","updated_at":"2025-04-10T20:13:11.229Z","avatar_url":"https://github.com/MLSAKIIT.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ca\u003e\n  \u003ch1 align=\"center\"\u003e MLSA Project Wing: ML \u003c/h1\u003e\n\u003c/a\u003e\n\u003cp align=\"center\"\u003e \u003cimg src=\"https://avatars.githubusercontent.com/u/79008924?s=280\u0026v=4\"\u003e\n\u003c/p\u003e\n\n\u003ca\u003e\n  \u003ch1 align=\"center\"\u003e ForgeTube 
\u003c/h1\u003e\n\u003c/a\u003e\n\n[![GitHub](https://img.shields.io/badge/GitHub-MLSAKIIT-181717?style=for-the-badge\u0026logo=github)](https://github.com/MLSAKIIT)\n[![ForgeTube](https://img.shields.io/badge/ForgeTube-Repository-181717?style=for-the-badge\u0026logo=github)](https://github.com/MLSAKIIT/ForgeTube)\n[![YouTube](https://img.shields.io/badge/YouTube-ForgeTube-FF0000?style=for-the-badge\u0026logo=youtube\u0026logoColor=white)](https://www.youtube.com/channel/UCVgzYqxxY6wCIto-Nzx68Uw)\n[![X](https://img.shields.io/badge/X-mlsakiit-1DA1F2?style=for-the-badge\u0026logo=X\u0026logoColor=white)](https://x.com/mlsakiit)\n[![Instagram](https://img.shields.io/badge/Instagram-mlsakiit-E4405F?style=for-the-badge\u0026logo=instagram)](https://www.instagram.com/mlsakiit/)\n[![Discord](https://img.shields.io/badge/Discord-Join%20Us-5865F2?style=for-the-badge\u0026logo=discord)](https://discord.com/invite/P6VCP2Ry3q)\n\n\n## 🚧Our Project:\nOur project focuses on creating an automated video generation system using AI. It transforms text prompts into fully narrated videos by leveraging **large language models** for script generation, **diffusion models** for image creation, and **text to speech systems** for narration. The system processes inputs through multiple stages, from script generation to final video assembly, creating cohesive, engaging content automatically.\n\nThe video generator, designed for sequential content creation, dynamically adapts to different styles and tones while maintaining consistency across visual and audio elements. 
It can also add **subtitles**, either embedded in the video or as a separate **srt** file.\n\nThis project demonstrates the potential of combining multiple AI technologies to create an end-to-end content generation pipeline.\n\n## 🖥️Project Stack:\n   `Python 3.11`: Core programming language for the project.\n\n- **Content Generation:**\n   \n   `Gemini API`: To generate the script using the `Gemini 2.0 Flash Thinking` model and store it in `JSON` format with proper audio and visual prompts and their respective parameters.\n   \n   `Stable Diffusion XL Base 1.0`: For image generation using diffusion models, run either `locally` or hosted on `Modal`.\n\n   `Kokoro`: An open-weight TTS model to convert audio prompts into audio.\n\n- **Video Processing**\n    `MoviePy` : For adding text, intro, outro, transition effects, subtitles, audio processing, video processing and Final_Assembly by using  `FFmpeg` under the hood.\n\n- **ML Frameworks:**\n    \n    `PyTorch`: Deep learning framework for model inferencing.\n\n    `Diffusers with SDXL Base 1.0` : Utilize Hugging Face's Diffusers to generate stunning images using the SDXL Base 1.0 model. 
Enhance your creative projects with state-of-the-art diffusion techniques.\n\n- **Development Tools:**\n    \n    `Jupyter Notebooks`: For development and testing.\n\n    `Google Colab` : For free cloud GPU infrastructure for development and Testing.\n    \n    `Git`: For version control\n\n    `Modal` : For low cost high performance cloud GPU infrastructure.\n\n- **Package Management:**\n\n    `UV`: For fast and efficient dependency management and project setup\n\n## Features\n\n- **Multi-Modal Content Generation**: Seamlessly combines text, image, and audio generation\n- **Style Customization**: Supports different content styles and tones\n- **Modular Architecture**: Each component can be tested and improved independently\n- **Content Segmentation**: Automatically breaks down content into manageable segments\n- **Custom Voice Options**: Multiple TTS voices and emotional tones\n- **Format Flexibility**: Supports different video durations and formats (.mp4 and .mkv)\n- **Performance Metrics**: Tracks generation quality and consistency\n- **Error Handling**: Robust error management across the pipeline\n- **Resource Optimization**: Efficient resource usage during generation\n\n\n## Steps for deployment :\nClone the repo on your system, using : `git clone https://github.com/MLSAKIIT/ForgeTube.git`\n### 1. Using UV for Python Package Management\n\nFor more information, visit the [UV Documentation](https://docs.astral.sh/uv/).\n\nUV is a modern, high-performance Python package and project manager designed to streamline the development process. \n\nHere’s how you can use UV in this project:\n\n1. Install `uv`.\n\n```bash\npip install uv\n```\n2. Download `Python 3.11`\n```bash\nuv python install 3.11\n```\n3. Create a virtual environment \n```bash\nuv venv .venv\n```\n4. Activate your virtual environment\n```bash\n.venv\\scripts\\activate.ps1\n```\n5. Install all dependencies \n```bash\nuv sync\n```\n### 2. 
Setting up Modal\nFor more information visit the [Modal documentation](https://modal.com/docs/guide).\n\nModal is a cloud function platform that lets you attach high-performance GPUs with a single line of code.\n\nThe nicest thing about all of this is that you don’t have to set up any infrastructure. Just:\n\n1. Create an account at [modal.com](https://modal.com)\n2. Run `pip install modal` to install the modal Python package\n3. Run `modal setup` to authenticate (if this doesn’t work, try `python -m modal setup`)\n\n### 3. Get your Gemini-API Key :\nTo obtain a Gemini API key from Google AI Studio, follow these detailed steps:\n\n**Step 1: Sign In to Google AI Studio**\n\nNavigate to [Google AI Studio](https://aistudio.google.com/). Once signed in, locate and click on the \"Gemini API\" tab. This can typically be found in the main navigation menu or directly on the dashboard. On the Gemini API page, look for a button labeled \"Get API key in Google AI Studio\" and click on it.\n\n**Step 2: Review and Accept Terms of Service**\n\n1. **Review Terms**: A dialog box will appear presenting the Google APIs Terms of Service and the Gemini API Additional Terms of Service. It's essential to read and understand these terms before proceeding.\n2. **Provide Consent**: Check the box indicating your agreement to the terms. Optionally, you can also opt in to receive updates and participate in research studies related to Google AI.\n3. **Proceed**: Click the \"Continue\" button to move forward.\n\n**Step 3: Create and Secure Your API Key**\n\n1. **Generate API Key**: Click on the \"Create API key\" button. You'll be prompted to choose between creating a new project or selecting an existing one. Make your selection accordingly.\n2. **Retrieve the Key**: Once generated, your unique API key will be displayed. Ensure you copy and store it in a secure location.\n\n**Step 4: Add your Key in `main.py` or `local_main.py`**\n```python\n# 1. 
Generate the Script\ngem_api = \"Enter your Gemini API Key here\"\nserp_api = \"Enter your Serp API key here\"\n```\n\n\u003e [!IMPORTANT]  \n\u003e Always keep your API key confidential. Avoid sharing it publicly or embedding it directly into client-side code to prevent unauthorized access.\n\n### 4. Setting up Serp-Api\nSerpApi is used to scrape Google search results for the video topic and gather additional context for Retrieval Augmented Generation (RAG).\n1. Visit [serpapi.com/](https://serpapi.com/) and create an account.\n2. Go to the [dashboard](https://serpapi.com/dashboard), then select API key at the top left.\n3. Copy the API key and add it in `main.py` or `local_main.py`\n```py\n# 1. Generate the Script\ngem_api = \"Enter your Gemini API Key here\"\nserp_api = \"Enter your Serp API key here\"\n```\n### 5. `Kokoro` \nRun the following commands :\n```bash\npython -m pip install spacy # If not installed for some reason\npython -m spacy download en_core_web_sm\n```\n### 6. Download and set up FFmpeg\n1. Visit : https://github.com/BtbN/FFmpeg-Builds/releases\n2. Download the setup file for your OS.\n3. On Windows, download the win64 version and extract the files.\n4. Make a directory at `C:\\Program Files\\FFmpeg`.\n5. Copy all the extracted files into that directory.\n6. Add `C:\\Program Files\\FFmpeg\\bin` to your `PATH` environment variable.\n### 7. Start Generating :\nUse `main.py` to run the image generation on Modal, or use `main_local.py` to run Stable Diffusion XL locally.\n\n## Troubleshooting\n\u003e [!IMPORTANT]\n\u003e 1. Make sure all of the following paths are set correctly :\n```py\nscript_path = \"resources/scripts/\"\nscript_path += \"script.json\" # Name of the script file\nimages_path = \"resources/images/\"\naudio_path = \"resources/audio/\"\nfont_path = \"resources/font/font.ttf\" # Not recommended to change\n```\n\u003e[!IMPORTANT]\n\u003e 2. 
Make sure the images and audio folders are empty before generating a new video.\n\n3. The video file name is taken automatically from the video topic in the script. You may change the following variables to use custom names. If the file name is very long, the video file won't be generated, so set the name manually in such cases.\n\n```py\nsub_output_file = \"name of the subtitle file.srt\"\nvideo_file = \"name of the video.mp4 or .mkv\"\n```\n\n\n4. **`no module named pip found`** \nTry running the following :\n```bash\npython -m pip install spacy pydub kokoro soundfile torch\npython -m spacy download en_core_web_sm\n```\n\n5. **Serp API not returning any search results :** This is a known issue and is being investigated.\n\n\n\u003e [!IMPORTANT]  \n\u003e Ensure you have sufficient GPU resources for image generation and the proper model weights downloaded. It is recommended to use an **NVIDIA** GPU with at least **24 GB of VRAM** for running the image generation locally, and a CPU with high single-core performance for video assembly.\n\n\u003e [!NOTE]\n\u003e Video generation times may vary based on content length, complexity, and hardware used.\n\n## Contributors\n\n| CONTRIBUTORS | MENTORS | CONTENT WRITER |\n| :------:| :-----:| :-----: |\n| Kartikeya Trivedi | Soham Roy | [Name] |\n| Naman Singh | Yash Kumar Gupta | |\n| Soham Mukherjee |  | |\n| Sumedha Gunturi |  | |\n| Souryabrata Goswami|  | |\n| Harshit Agarwal |  | |\n| Rahul Sutradhar |  | |\n| Ayush Mohanty |  | |\n| Shopno Banerjee |  | |\n| Shubham Gupta |  | |\n| Sarthak Singh |  | |\n| Nancy |  | |\n\n\n\n\n## Version\n| Version | Date | Comments |\n| ------- | ---- | -------- |\n| 1.0     | 23/02/2025 | Initial release |\n\n## Future Roadmap\n\n### Part 1: Baseline\n- [x] Pipeline foundations\n- [x] LLM Agent Handling\n- [x] Diffusion Agent Handling\n- [x] TTS Handling\n- [x] Video Assembly Engine\n- [x] Initial Deployment\n\n### Part 2: Advanced\n- [ ] Advanced style transfer capabilities\n- [ ] 
In-Context Generation for Diffusion Model\n- [ ] Real time generation monitoring\n- [x] Enhanced video transitions\n- [ ] Better quality metrics\n- [ ] Multi language support\n- [ ] Custom character consistency\n- [ ] Animation effects\n\n## Acknowledgements\n- Hugging Face Transformers - https://huggingface.co/transformers\n- Hugging Face Diffusers - https://huggingface.co/diffusers\n- FFmpeg - https://ffmpeg.org/\n- UV - https://docs.astral.sh/uv/\n- MoviePy - https://zulko.github.io/moviepy/getting_started/index.html\n## Project References\n### 1. Large Language Models (LLMs) \u0026 Transformers\n\n* [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) - A visual, beginner-friendly introduction to transformer architecture.\n* [Attention Is All You Need](https://arxiv.org/abs/1706.03762) - The seminal paper on transformer architecture.\n* [Gemini 2.0 Flash Thinking](https://ai.google.dev/gemini-api/docs/thinking)\n---\n### 2. Multi-Agent Systems  \n  * [Introduction to Multi-Agent Systems](https://www.geeksforgeeks.org/what-is-a-multi-agent-system-in-ai/) - Fundamental concepts and principles.\n  * [ A Comprehensive Guide to Understanding LangChain Agents and Tools](https://medium.com/@piyushkashyap045/a-comprehensive-guide-to-understanding-langchain-agents-and-tools-43a187414f4c) - Practical implementation guide.\n* [kokoro](https://github.com/hexgrad/kokoro?tab=readme-ov-file#kokoro)\n  \n### 2. 
Image Generation \u0026 Processing\n* [Stable Diffusion XL Base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)\n* [Stable Diffusion: A Comprehensive End-to-End Guide with Examples](https://medium.com/@jagadeesan.ganesh/stable-diffusion-a-comprehensive-end-to-end-guide-with-examples-47b2c17f15cf)\n* [Stable Diffusion Explained](https://medium.com/@onkarmishra/stable-diffusion-explained-1f101284484d)\n* [Stable Diffusion Explained Step-by-Step with Visualization](https://medium.com/polo-club-of-data-science/stable-diffusion-explained-for-everyone-77b53f4f1c4)\n* [Understanding Stable Diffusion: The Magic Behind AI Image Generation](https://medium.com/@amanatulla1606/understanding-stable-diffusion-the-magic-behind-ai-image-generation-e834e8d92326)\n* [Stable Diffusion Paper](https://arxiv.org/pdf/2403.03206)\n\n---\n### 3. RAG\n* [Retrieval Augmented Generation](https://aiplanet.com/learn/llm-bootcamp/module-13/2380/retrieval-augmented-generation)\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlsakiit%2Fforgetube","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlsakiit%2Fforgetube","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlsakiit%2Fforgetube/lists"}