https://github.com/meiniki/paper2go

Paper2Go converts written documents to speech.

Host: GitHub
URL: https://github.com/meiniki/paper2go
Owner: meiniKi
License: mit
Created: 2025-01-10T21:17:14.000Z (5 days ago)
Default Branch: main
Last Pushed: 2025-01-12T23:27:33.000Z (3 days ago)
Last Synced: 2025-01-13T00:24:48.405Z (3 days ago)
Topics: ai, llm, ollama, self-hosted, tts
Language: Python
Homepage:
Size: 47.9 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE


# Paper2Go

![Paper2Go UI](doc/ui.png "Paper2Go")

Paper2Go converts written documents into speech. It primarily aims to convert research papers in PDF format to a summary that can be listened to on the go. All steps can be performed with self-hosted software.

The document is uploaded through the Web App. Paper2Go uses [Docling](https://github.com/DS4SD/docling) to extract the text and store it in Markdown format. Each section is then summarized and reformulated by an LLM (a local [Ollama](https://github.com/ollama/ollama) model) so that it can be understood without seeing the document (e.g., by explaining formulas and tables). Once done, the sections are individually converted to speech using [Fish-Speech](https://github.com/fishaudio/fish-speech) or [XTTSv2](https://huggingface.co/coqui/XTTS-v2), and the audio files are named after the enumerated section titles. The audio files are combined into a ZIP file that can be downloaded via the Web App.
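
The per-section flow described above can be illustrated with a small sketch. This is not Paper2Go's actual code: `split_sections`, `file_name`, and `build_archive` are hypothetical helpers, the `.wav` extension is an assumption, and the summarizer and speech synthesizer are passed in as plain callables standing in for the Ollama and Fish-Speech/XTTSv2 steps. The heading detection is deliberately naive and would also match `#` lines inside fenced code.

```python
from __future__ import annotations

import io
import re
import zipfile
from typing import Callable

# A Markdown ATX heading: one to six '#' characters followed by the title.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (title, body) pairs; sections without text are dropped."""
    sections: list[tuple[str, str]] = []
    title, lines = "Preamble", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            text = "\n".join(lines).strip()
            if text:
                sections.append((title, text))
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    text = "\n".join(lines).strip()
    if text:
        sections.append((title, text))
    return sections


def file_name(index: int, title: str) -> str:
    """Build an enumerated, filesystem-safe file name from a section title."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_") or "section"
    return f"{index:02d}_{slug}.wav"


def build_archive(
    markdown: str,
    summarize: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Summarize and synthesize each section, then pack the results into a ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, (title, text) in enumerate(split_sections(markdown), start=1):
            archive.writestr(file_name(index, title), synthesize(summarize(text)))
    return buffer.getvalue()
```

Keeping the summarizer and synthesizer as injected callables makes the flow easy to try with stand-ins before any model is installed.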

> [!IMPORTANT]
> :triangular_ruler: Please note that Paper2Go is a work in progress.

## 🎨 Key Features

🔊 Convert PDF documents to an AI-summarized audiobook

💿 Download the audio files separated by section names as a ZIP archive

⚙️ Adjust model parameters directly in the Web App

🎤 Upload voices or record your own voice directly in the Web App

💾 Download & restore configurations
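
As an illustration of what "model parameters" can mean for the summarization step, the sketch below builds a request for Ollama's `/api/generate` endpoint, where sampling settings such as `temperature` travel in the `options` field. The prompt wording, the default temperature, and the helper names `build_request` and `summarize` are assumptions made for this example; Paper2Go's own prompts and client code may differ.

```python
from __future__ import annotations

import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(
    text: str, model: str, temperature: float = 0.3, url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Prepare (but do not send) a non-streaming generate request."""
    payload = {
        "model": model,
        # Hypothetical prompt, written only for this sketch.
        "prompt": (
            "Summarize the following section so that it can be understood "
            "by listening alone. Explain formulas and tables in words.\n\n" + text
        ),
        "stream": False,
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text: str, model: str, temperature: float = 0.3) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(text, model, temperature), timeout=300) as reply:
        return json.loads(reply.read())["response"]
```

Calling `summarize(section_text, model="<your-model>")` requires an Ollama server with that model already pulled; `build_request` alone can be inspected offline.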

## 🚀 Quickstart

You can run the application in a Docker container using the provided `docker-compose.yml` file or run it natively. Either way, you need to clone the following repos first and agree to the model licenses.

Clone the repo.

```shell
git clone https://github.com/meiniKi/Paper2Go.git
```

Clone the models into `models`.

```shell
# install Git Large File Storage, see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage
git lfs install
cd models
git clone https://github.com/fishaudio/fish-speech.git
git clone https://huggingface.co/fishaudio/fish-speech-1.5
git clone https://huggingface.co/coqui/XTTS-v2
```
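
A quick way to confirm that the clones landed where expected is a small check like the one below. It is only a sketch: `missing_model_dirs` is a hypothetical helper, and the directory names assume the default names that `git clone` derives from the three URLs above.

```python
from __future__ import annotations

from pathlib import Path

# Default directory names produced by the three clone commands (assumption).
EXPECTED_DIRS = ("fish-speech", "fish-speech-1.5", "XTTS-v2")


def missing_model_dirs(models_root: str | Path = "models") -> list[str]:
    """Return the expected model directories that are absent under models_root."""
    root = Path(models_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All model directories found.")
```

Run it from the repository root so that the relative `models` path resolves correctly.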

Download the XTTS-v2 model. Run `xtts.py` to check your installation and download the model. You may have to accept the license agreement while running the script.

```shell
cd models
python xtts.py
```

### Docker

Make sure all necessary drivers and NVIDIA/CUDA tools are installed on your system. You may want to take a look at the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit) and the [CUDA downloads page](https://developer.nvidia.com/cuda-downloads). Then, restart Docker.

```bash
sudo apt install -y nvidia-cuda-toolkit
sudo apt install -y nvidia-open
sudo systemctl restart docker
```

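For Redis, memory overcommit must be enabled on the host. Add `vm.overcommit_memory = 1` to `/etc/sysctl.conf` and reboot, or run `sudo sysctl vm.overcommit_memory=1` for the setting to take effect immediately.

```bash
echo 'vm.overcommit_memory = 1' | sudo tee -a /etc/sysctl.conf
sudo sysctl vm.overcommit_memory=1
```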
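
To confirm that the overcommit setting is active, the current value can be read back from `/proc/sys/vm/overcommit_memory` on Linux. The following is a minimal sketch; `overcommit_enabled` and `check_host` are hypothetical names introduced here and are not part of Paper2Go.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

# Linux exposes the current vm.overcommit_memory value through this file.
OVERCOMMIT_FILE = Path("/proc/sys/vm/overcommit_memory")


def overcommit_enabled(raw: str) -> bool:
    """Return True when the raw file content is '1' (always overcommit)."""
    return raw.strip() == "1"


def check_host() -> Optional[bool]:
    """Check this machine; returns None where the /proc entry does not exist."""
    if not OVERCOMMIT_FILE.exists():
        return None
    return overcommit_enabled(OVERCOMMIT_FILE.read_text())


if __name__ == "__main__":
    print({True: "enabled", False: "not enabled", None: "unknown"}[check_host()])
```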

You can build the containers and start them.

```bash
docker compose up --build -d
```

To rebuild the containers, use:
```bash
docker compose up --build --force-recreate -d
```

You can now access the Web UI via `0.0.0.0:8501`, or via port `8501` on the host's IP address if you are accessing it remotely.
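
If the page does not load, it can help to check whether anything is listening on the port at all. The sketch below uses only the standard library; `web_ui_reachable` is a hypothetical helper, and the default host is an assumption that fits a check run on the Docker host itself.

```python
import socket


def web_ui_reachable(host: str = "127.0.0.1", port: int = 8501, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("Web UI reachable" if web_ui_reachable() else "Nothing is listening on port 8501")
```

A successful connection only shows that the port is open; it does not verify that the application behind it is healthy.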

### Native Installation

From the root of the cloned repository, create a virtual environment, activate it, and install the requirements.

```shell
cd
python3.10 -m venv .venv
source .venv/bin/activate
./install_dep.sh
```

Run Redis, e.g., as a Docker container.
```shell
cd redis
docker compose up -d
cd ..
```

Start the worker.
```shell
celery -A tasks worker --loglevel=warning
```

Start the application in another terminal.
```shell
streamlit run app.py
```

You should now be able to access the Web UI via the host's IP address and the port that Streamlit prints on startup (`8501` by default).

## 🔧 Troubleshooting

### The voice record feature does not record the microphone.
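Browsers only allow microphone access from secure origins. If you access the Web App without an HTTPS connection, take a look at the "treat insecure origin as secure" flag in Chrome (`chrome://flags/#unsafely-treat-insecure-origin-as-secure`).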