{"id":25891808,"url":"https://github.com/lamm-mit/pdf2audio","last_synced_at":"2025-05-16T05:05:49.959Z","repository":{"id":274810213,"uuid":"861339384","full_name":"lamm-mit/PDF2Audio","owner":"lamm-mit","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-18T11:10:42.000Z","size":15870,"stargazers_count":1218,"open_issues_count":15,"forks_count":156,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-04-19T00:34:33.391Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lamm-mit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-22T16:25:39.000Z","updated_at":"2025-04-18T11:42:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"5faf4275-e28a-4551-9378-68a0f9fe2802","html_url":"https://github.com/lamm-mit/PDF2Audio","commit_stats":null,"previous_names":["lamm-mit/pdf2audio"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamm-mit%2FPDF2Audio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamm-mit%2FPDF2Audio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamm-mit%2FPDF2Audio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamm-mit%2FPDF2Audio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lamm-mit","download_url":"https://codeload.github.com/lamm-mit/PDF2Audio/tar.gz/refs/he
ads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471061,"owners_count":22076585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-02T20:31:22.328Z","updated_at":"2025-05-16T05:05:44.947Z","avatar_url":"https://github.com/lamm-mit.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF to Audio Converter\n\nThis code can be used to convert PDFs into audio podcasts, lectures, summaries, and more. It uses OpenAI's GPT models for text generation and text-to-speech conversion. You can also edit a draft transcript (multiple times) and provide specific comments, or overall directives on how it could be adapted or improved. \n\n![image](https://github.com/user-attachments/assets/ef8a5e84-d532-4e0e-b08b-fb7be2f98469)\n\n## Features\n\n- Upload multiple PDF files\n- Choose from different instruction templates (podcast, lecture, summary, etc.)\n- Customize text generation and audio models\n- Select different voices for speakers\n- Iterate on the draft via specific or general comments, and/or edits to the transcript and specific feedback to the model for improvements\n\n## Use in Colab\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lamm-mit/PDF2Audio/blob/main/PDF2Audio.ipynb)\n\n## Local Installation\n\nFollow these steps to set up PDF2Audio on your local machine using Conda:\n\n1. 
Clone the repository:\n   ```\n   git clone https://github.com/lamm-mit/PDF2Audio.git\n   cd PDF2Audio\n   ```\n\n2. Install Miniconda (if you haven't already):\n   - Download the installer from [Miniconda website](https://docs.conda.io/en/latest/miniconda.html)\n   - Follow the installation instructions for your operating system\n   - Verify the installation:\n   ```\n   conda --version\n   ```\n   \n3. Create a new Conda environment:\n   ```\n   conda create -n pdf2audio python=3.9\n   ```\n\n4. Activate the Conda environment:\n   ```\n   conda activate pdf2audio\n   ```\n\n5. Install the required dependencies:\n   ```\n   pip install -r requirements.txt\n   ```\n\n6. Set up your OpenAI API key:\n   Create a `.env` file in the project root directory and add your OpenAI API key:\n   ```\n   OPENAI_API_KEY=your_api_key_here\n   ```\n\n## Running the App\n\nTo run the PDF2Audio app:\n\n1. Ensure you're in the project directory and your Conda environment is activated:\n   ```\n   conda activate pdf2audio\n   ```\n\n2. Run the Python script that launches the Gradio interface:\n   ```\n   python app.py\n   ```\n\n3. Open your web browser and go to the URL provided in the terminal (typically `http://127.0.0.1:7860`).\n\n4. Use the Gradio interface to upload a PDF file and convert it to audio.\n\n## How to Use\n\n1. Upload one or more PDF files\n2. Select the desired instruction template\n3. Customize the instructions if needed\n4. 
Click \"Generate Audio\" to create your audio content\n\n## Access via 🤗 Hugging Face Spaces\n\n[lamm-mit/PDF2Audio](https://huggingface.co/spaces/lamm-mit/PDF2Audio)\n\n\n## Example result\n\n\u003caudio controls\u003e\n  \u003csource src=\"https://raw.githubusercontent.com/lamm-mit/PDF2Audio/main/SciAgents%20discovery%20summary%20-%20example.mp3\" type=\"audio/mpeg\"\u003e\n  Your browser does not support the audio element.\n\u003c/audio\u003e\n\n## Note\n\nThis app requires an OpenAI API key to function. \n\n## Credits\n\nThis project was inspired by and based on the code available at [https://github.com/knowsuchagency/pdf-to-podcast](https://github.com/knowsuchagency/pdf-to-podcast) and [https://github.com/knowsuchagency/promptic](https://github.com/knowsuchagency/promptic). \n\n```bibtex\n@article{ghafarollahi2024sciagentsautomatingscientificdiscovery,\n    title={SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning}, \n    author={Alireza Ghafarollahi and Markus J. Buehler},\n    year={2024},\n    eprint={2409.05556},\n    archivePrefix={arXiv},\n    primaryClass={cs.AI},\n    url={https://arxiv.org/abs/2409.05556}, \n}\n@article{buehler2024graphreasoning,\n    title={Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning},\n    author={Markus J. Buehler},\n    journal={Machine Learning: Science and Technology},\n    year={2024},\n    url={http://iopscience.iop.org/article/10.1088/2632-2153/ad7228},\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flamm-mit%2Fpdf2audio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flamm-mit%2Fpdf2audio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flamm-mit%2Fpdf2audio/lists"}