{"id":31977504,"url":"https://github.com/mett29/paper2video","last_synced_at":"2026-04-10T10:11:46.370Z","repository":{"id":254755836,"uuid":"846990462","full_name":"mett29/paper2video","owner":"mett29","description":"Tool designed to transform research papers from arXiv into engaging presentations and videos, ready for upload to YouTube.","archived":false,"fork":false,"pushed_at":"2024-08-26T20:53:36.000Z","size":218,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-08-26T23:55:25.545Z","etag":null,"topics":["gemini-api","lipsync","llm","video-generation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mett29.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-24T14:34:40.000Z","updated_at":"2024-08-26T20:58:21.000Z","dependencies_parsed_at":"2024-08-28T01:30:27.513Z","dependency_job_id":null,"html_url":"https://github.com/mett29/paper2video","commit_stats":null,"previous_names":["mett29/paper2video"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mett29/paper2video","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mett29%2Fpaper2video","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mett29%2Fpaper2video/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mett29%2Fpaper2video/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mett29%2Fpaper2video/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mett29","download_url":"https://codeload.github.com/mett29/paper2video/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mett29%2Fpaper2video/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279021374,"owners_count":26087023,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gemini-api","lipsync","llm","video-generation"],"created_at":"2025-10-14T21:48:31.877Z","updated_at":"2025-10-14T21:48:36.228Z","avatar_url":"https://github.com/mett29.png","language":"Python","funding_links":[],"categories":["📄 Paper→Poster / Slides / Graphical Abstract"],"sub_categories":["Video \u0026 Media Generation"],"readme":"# paper2video\n\nTool designed to transform research papers from arXiv into engaging presentations and videos, ready for upload to YouTube.\n\nThis repository automates the process of explaining complex papers, creating visually appealing presentations, and generating video content from those presentations.\n\nAlmost everything is AI generated:\n- the content of each slide, both title and main text (via Gemini API)\n- the narration for each slide, which is then converted to audio (via Gemini API)\n- the audio (via Text-to-Speech Google 
API)\n- the lip-synced animated character (via Wav2Vec2 using the [PyToon](https://github.com/lukerbs/pytoon) library)\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"/img/example_frame.png\" style=\"width: 80%;\" alt=\"Sample frame of the final video\"\u003e\n\u003c/p\u003e\n\nSome notes:\n- I used Gemini 1.5 Flash to reduce costs (it's also a bit faster).\n- I used [PyLaTeX](https://jeltef.github.io/PyLaTeX/current/) to generate the presentation. At first I tried python-pptx, but\nI had issues with text formatting, and converting its output to PDF was a nightmare.\n\nPrerequisites\n-------------\n\n```bash\nsudo apt-get install ffmpeg\nsudo apt-get install latexmk\nsudo apt-get install texlive-latex-extra\n```\n\nPoetry setup\n-------------\n```bash\npoetry install\n```\nIf you do not use Poetry, you can manually install the dependencies listed in `pyproject.toml` inside your virtual environment.\n\nHow to run\n-------------\nCreate a `.env` file in the project root directory and add the environment variables specified in the `.env.example` file.\n```bash\npython generate.py --arxiv_paper \u003carxiv_paper_id\u003e\n```\n\nAdd an animated speaking character\n-------------\nGoing a step further, we can add an animated speaking character to the video.\nI took inspiration from this YouTube video: https://www.youtube.com/watch?v=ItVnKlwyWwA, and while searching online I found this repository: https://github.com/lukerbs/pytoon.\n\nPyToon uses character images from https://github.com/carykh/lazykh, created by the author of that video.\n\nPyToon uses `ForceAlign` (https://github.com/lukerbs/forcealign), a library for forced alignment of English text to English audio.\nForceAlign uses PyTorch's `WAV2VEC2` pretrained model for acoustic feature extraction.\n\nMy setup does not have enough RAM to run the above model (I'm on WSL2 on an old PC), so I split the code and ran this part on Colab (the free tier, with a T4 GPU).\nYou can do the 
same or you can modify the `convert_to_video()` function in order to directly generate the final result.\n\nYou can find the notebook in `animate.ipynb`. It contains all the instructions to make it work.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmett29%2Fpaper2video","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmett29%2Fpaper2video","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmett29%2Fpaper2video/lists"}