https://github.com/katagaki/lingus
PDF and Markdown conversion using Docling and LibreOffice
- Host: GitHub
- URL: https://github.com/katagaki/lingus
- Owner: katagaki
- License: unlicense
- Created: 2025-04-19T01:12:04.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-19T04:33:36.000Z (about 1 year ago)
- Last Synced: 2025-06-02T01:04:02.087Z (about 1 year ago)
- Topics: docling, libreoffice, python
- Language: Python
- Homepage:
- Size: 81.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Lingus
Converts Microsoft Office files into Markdown.
Uses LibreOffice for conversion to PDF,
then uses Docling for conversion to Markdown.
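The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names `libreoffice_pdf_command` and `office_to_markdown` and the directory arguments are hypothetical, and it assumes that `soffice` is on the `PATH` and that Docling is installed.

```python
import subprocess
from pathlib import Path


def libreoffice_pdf_command(source: Path, pdf_dir: Path) -> list[str]:
    """Build a headless LibreOffice command that converts one file to PDF."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(pdf_dir),
        str(source),
    ]


def office_to_markdown(source: Path, pdf_dir: Path, out_dir: Path) -> Path:
    """Convert an Office file to Markdown via an intermediate PDF (hypothetical helper)."""
    # Imported lazily so the command builder above works without Docling installed.
    from docling.document_converter import DocumentConverter

    pdf_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Stage 1: LibreOffice writes <stem>.pdf into pdf_dir.
    subprocess.run(libreoffice_pdf_command(source, pdf_dir), check=True)
    pdf_path = pdf_dir / f"{source.stem}.pdf"

    # Stage 2: Docling parses the PDF and exports it as Markdown.
    result = DocumentConverter().convert(pdf_path)
    md_path = out_dir / f"{source.stem}.md"
    md_path.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return md_path
```

Calling `office_to_markdown(Path("docs/report.docx"), Path("tmp"), Path("outputs"))` would, under these assumptions, leave `outputs/report.md` behind; the intermediate PDF is kept in `tmp` so a failed Docling run can be retried without repeating the LibreOffice step.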
## Running
1. Build the Docker image.
```zsh
docker build -t lingus .
```
2. Drop your files into the `docs` folder.
3. Create and run a new Docker container.
```zsh
docker run -v "$(pwd)/docs:/app/docs" -v "$(pwd)/outputs:/app/outputs" lingus
```
## Notes
- Docling is configured to only use the CPU.
To use the GPU, install the appropriate variant of PyTorch (good luck),
then attach your GPU to the Docker container.
- No file type checks are performed yet,
so non-Microsoft Office files and
hidden system files may be picked up.
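
One possible way to add the missing file type check is to filter on the file name before conversion. The sketch below is an assumption about how such a check could look, not something the project ships: `OFFICE_SUFFIXES`, `is_office_file`, and `office_files` are hypothetical names, and the extension list is only a guess at which formats should be accepted.

```python
from pathlib import Path

# Hypothetical allow-list; adjust to the formats you actually want converted.
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def is_office_file(path: Path) -> bool:
    """Return True for Office documents, skipping hidden and lock files."""
    name = path.name
    # Skip dotfiles such as .DS_Store and Office lock files such as ~$report.docx.
    if name.startswith(".") or name.startswith("~$"):
        return False
    return path.suffix.lower() in OFFICE_SUFFIXES


def office_files(folder: Path) -> list[Path]:
    """List the convertible files directly inside a folder, in a stable order."""
    return sorted(p for p in folder.iterdir() if p.is_file() and is_office_file(p))
```

Matching on the extension is cheap but only approximate, since a renamed file would still pass; checking the file's content signature would be stricter at the cost of extra code.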