{"id":20127249,"url":"https://github.com/code2k13/emoji_vid_gen","last_synced_at":"2026-02-22T16:35:34.023Z","repository":{"id":225445899,"uuid":"766005456","full_name":"code2k13/emoji_vid_gen","owner":"code2k13","description":"A GenAI-powered script-to-video converter. Creates beautiful videos from text files. Automatically generates narration, images and audio effects.  Can run locally with or without GPUs.  This project is experimental in nature, crafted primarily for educational purposes","archived":false,"fork":false,"pushed_at":"2024-05-11T06:38:54.000Z","size":3830,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-06T18:46:26.815Z","etag":null,"topics":["content-generation","genai","text-to-speech","text-to-video","video-generation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code2k13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-03-02T04:40:37.000Z","updated_at":"2024-12-15T21:41:13.000Z","dependencies_parsed_at":"2024-05-11T07:33:20.866Z","dependency_job_id":null,"html_url":"https://github.com/code2k13/emoji_vid_gen","commit_stats":null,"previous_names":["code2k13/emoji_vid_gen"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/code2k13/emoji_vid_gen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code2k13%2Femoji_vid_gen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code2k13%2
Femoji_vid_gen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code2k13%2Femoji_vid_gen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code2k13%2Femoji_vid_gen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code2k13","download_url":"https://codeload.github.com/code2k13/emoji_vid_gen/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code2k13%2Femoji_vid_gen/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29718459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-22T15:10:41.462Z","status":"ssl_error","status_checked_at":"2026-02-22T15:10:04.636Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["content-generation","genai","text-to-speech","text-to-video","video-generation"],"created_at":"2024-11-13T20:19:57.674Z","updated_at":"2026-02-22T16:35:34.002Z","avatar_url":"https://github.com/code2k13.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Emoji Video Generator\n\n![sample script converted to video](docs/emoji_vid_generator.gif)\n\nEmojiVidGen is a fun tool that creates videos from text files. It takes input in the form of plain text files containing a script (similar to a story or dialogue). 
It then transforms this script into a stunning video. EmojiVidGen is based on a plugin system, which allows for experimenting with different models and languages. All you need is some imagination and typing skills!\n\nKey Features\n- Converts text files into visually appealing videos\n- Automatically generates narration, images, and audio effects\n- Designed to run smoothly on computers with 8 GB of memory, offering reasonable processing speeds even without GPUs\n- Utilizes various Generative AI models for its tasks\n- Built on a powerful plugin system, allowing for easy extensibility\n- Supports switching between different models and spoken languages\n\nWhile initially intended for entertainment with GenAI, `EmojiVidGen` holds significant potential for producing engaging and cool content, especially in capable hands. This project is experimental and primarily crafted for educational purposes, exploring the possibilities of AI-powered video creation.\n\n\u003e This software is intended solely for educational purposes. It is used at your own discretion and risk. Please be aware that the AI models utilized in this code may have restrictions against commercial usage.\n\n## Installation\n\n\n```bash\nsudo apt update\nsudo apt install espeak ffmpeg\n```\n\n```bash\ngit clone https://github.com/code2k13/emoji_vid_gen\ncd emoji_vid_gen\nwget https://github.com/googlefonts/noto-emoji/raw/main/fonts/NotoColorEmoji.ttf\n```\n\n\n```bash\npip install -r requirements.txt\n```\n\n## Sample script\n\n\u003e Note: A script should always start with an `Image:` directive\n\n```bash\nImage: Cartoon illustration showing a beautiful landscape with mountains and a road.\nAudio: Tranquil calm music occasional chirping of birds.\nTitle: EmojiVidGen\n🐼: Emoji vid gen is a tool to create videos from text files using AI.\n```\n\n\n## How to run\n\n```bash\npython generate_video.py stories/hello.txt hello.mp4\n```\n\n## A full featured example\n\n```bash\nImage:  A single trophy kept on table. 
comic book style.\nAudio: Upbeat introduction music for cartoon show.\nTitle: Emoji Quiz Showdown\n🎤: \"Welcome to the Emoji Quiz Showdown! Are you ready to test your knowledge?\"\n🐱: \"Meow! I'm ready!\"\n🐶: \"Woof! Let's do this!\"\nImage: Cartoon illustration of the Eiffel Tower.\n🎤: \"First question What is the capital of France?\"\nAudio: suspenseful music playing.\n🐱: \"Paris!\"\nAudio: people applauding sound\nImage: Cartoon illustration of Mount Everest.\n🎤: \"Correct! One point for the cat! Next question  What is the tallest mountain in the world?\"\nAudio: suspenseful music playing.\n🐶: \"Mount Everest!\"\nAudio: people applauding sound\nImage: Cartoon illustration of a water molecule.\n🎤: \"Right again! One point for the dog! Next question  What is the chemical symbol for water?\"\nAudio: suspenseful music playing.\n🐱: \"H2O!\"\nAudio: people applauding sound\nImage: Cartoon illustration of a globe with seven continents.\n🎤: \"Correct! Another point for the cat! Last question How many continents are there on Earth?\"\nAudio: suspenseful music playing.\n🐶: \"Seven!\"\nAudio: people applauding sound\n🎤: \"Correct! It's a tie! You both did great! Thanks for playing the Emoji Quiz Showdown!\"\n```\n\n## The Narrator\nThe emoji `🎙️` is reserved for the narrator. Using it at the start of a line causes the system to generate only sound, without placing any image on the background.\n\n## Using presets\n\nIf you've followed the earlier instructions for video generation, you might have noticed that the default setup uses `espeak` as the text-to-speech engine, resulting in a robotic-sounding output. EmojiVidGen is built on an internal plugin structure, in which each plugin can change how a task is executed or which model is used.\n\nFor instance, you can designate a specific plugin for each type of generation task, whether text-to-image, text-to-audio, or text-to-speech. 
Because each plugin operates with its own model and method, configuring these settings individually can be overwhelming. To simplify this process, I've introduced the concept of presets. You can apply a preset by passing the `--preset` option to the `generate_video.py` script.\n\nFor example, the command below uses a preset called `local_medium`:\n```bash\npython generate_video.py stories/hello.txt hello.mp4 --preset local_medium\n```\n\nAll presets are stored in the `./presets` folder. To create a new preset (say `custom_preset`), just create a new `custom_preset.yaml` file in the `./presets` folder and start using it like this:\n\n```bash\npython generate_video.py stories/hello.txt hello.mp4 --preset custom_preset\n```\n\nNote that the `voice`s used in the `characters` section should be supported by the selected `text_to_speech` provider. Images should ideally be PNG files with a square aspect ratio and a transparent background.\n\n## Available Presets\n\n| Preset Name | Description |\n|-----------------|-----------------|\n| openai_basic   | Uses OpenAI for text to speech (standard) and image generation (DALL-E 2 @ 512x512). Needs the `OPENAI_API_KEY` environment variable to be set.  |\n| openai_medium   | Similar to openai_basic but uses DALL-E 3 @ 1024x1024. Needs the `OPENAI_API_KEY` environment variable to be set.  |\n| local_basic   | Uses Huggingface's Stable Diffusion pipeline with the `stabilityai/sd-turbo` model for text to image. Uses `espeak` for text to speech and Huggingface's AudioLDM pipeline for text to audio.   |\n| local_basic_gpu    | Same as local_basic, but with CUDA support enabled.   |\n| local_medium    | Similar to local_basic but uses `brave` as the text to speech engine and the `stabilityai/sdxl-turbo` model for text to image.   |\n| local_medium    | Same as local_medium, but with CUDA support enabled.   |\n| eleven_medium    | Same as local_medium, but uses the `ElevenLabs` text to speech API. 
Needs an internet connection, an ElevenLabs account, and the `ELEVEN_API_KEY` variable defined in the `.env` file.   |\n| parler_medium    | Same as local_medium, but uses `parler` for text to speech.|\n\n## Configuring characters\nSometimes you may want to use custom images instead of emojis as characters in your video, or use a different voice for each character. This can now be achieved using the `characters` section in preset YAML files. Given below is an example of what such a section might look like:\n\n```yaml\nglobal:\n  width: 512\n  height: 512\n  use_cuda: \"false\"\n  characters:\n    - name: \"🎤\"\n      voice: \"fable\"\n\n    - name: \"🐱\"\n      image: \"/workspace/emoji_vid_gen/cat.png\"\n      voice: \"alloy\"\n\n    - name: \"🐶\"\n      image: \"/workspace/emoji_vid_gen/dog.png\"\n      voice: \"echo\"\n\ntext_to_speech:\n  provider: openai\n  voice: Nova\n```\n\n## Creating custom presets\n\nWIP\n\n\n## About Cache\n\nEmojiVidGen uses a cache to retain assets produced during video creation, each keyed by the specific 'prompt' used. This is especially useful when iteratively refining a video, because unchanged assets do not have to be regenerated. However, please be aware that the `.cache` directory is not automatically cleared. It's advisable to clear it after completing one video project and before beginning another.\n\n\u003e Tip: To force re-creation of cached assets, make minor alterations to the 'prompt', such as adding a space or punctuation\n\n## Using pre-created assets\n\nEnsure that the asset files are present in the `.cache` folder. 
Then reference them in the script like this:\n\n```bash\nImage: .cache/existing_background_hd.png\nAudio: Funny opening music jingle.\nTitle: EmojiVidGen\n🐼: .cache/existing_speech.wav\n```\n\n## Change default width and height of image\n\nCopy a suitable preset file and modify the following lines:\n\n```yaml\nglobal:\n  width: 1152\n  height: 896\n```\n\nNote: This setting does affect the output of Stable Diffusion. Not all resolutions work well. For more information, see https://replicate.com/guides/stable-diffusion/how-to-use/ . Stable Diffusion seems to work well with square aspect ratios.\n\n\n## Known issues\n\nYou will see this error message when using the `espeak` text to speech provider:\n\n```bash\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/pyttsx3/drivers/espeak.py\", line 171, in _onSynth\n    self._proxy.notify('finished-utterance', completed=True)\nReferenceError: weakly-referenced object no longer exists\n```\n\nIgnore this error for now as it does not affect the output.\n\n\nIf you receive the error below, delete the `.cache` directory:\n\n```bash\n  File \"plyvel/_plyvel.pyx\", line 247, in plyvel._plyvel.DB.__init__\n  File \"plyvel/_plyvel.pyx\", line 88, in plyvel._plyvel.raise_for_status\nplyvel._plyvel.IOError: b'IO error: lock .cache/asset/LOCK: Resource temporarily unavailable'\n```\n\n## Citation\n\n```\n@misc{lacombe-etal-2024-parler-tts,\n  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},\n  title = {Parler-TTS},\n  year = {2024},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/huggingface/parler-tts}}\n}\n```\n\n```\n@misc{lyth2024natural,\n      title={Natural language guidance of high-fidelity text-to-speech with synthetic annotations},\n      author={Dan Lyth and Simon King},\n      year={2024},\n      eprint={2402.01912},\n      archivePrefix={arXiv},\n      
primaryClass={cs.SD}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode2k13%2Femoji_vid_gen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode2k13%2Femoji_vid_gen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode2k13%2Femoji_vid_gen/lists"}