https://github.com/dveselov/docsbox
Self-hosted document converting service with HTTP API
- Host: GitHub
- URL: https://github.com/dveselov/docsbox
- Owner: dveselov
- License: mit
- Archived: true
- Created: 2016-07-08T02:30:32.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-11-08T16:35:28.000Z (about 6 years ago)
- Last Synced: 2024-08-08T23:24:31.129Z (5 months ago)
- Topics: document-converting, libreoffice, microservice, self-hosted
- Language: Python
- Homepage:
- Size: 229 KB
- Stars: 250
- Watchers: 15
- Forks: 39
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# docsbox [![Build Status](https://travis-ci.org/dveselov/docsbox.svg?branch=master)](https://travis-ci.org/dveselov/docsbox)
`docsbox` is a standalone service that converts office documents, such as .docx and .pptx, into more useful file types: PDF, for viewing in the browser with PDF.js, or HTML, for full-text search over the document content.
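The curl session in this README follows an upload, poll, download cycle: a task is queued, its status is polled by id, and the finished task exposes a `result_url`. The polling step might look roughly like the Python sketch below. The function names, the `base_url` default, and the polling interval and timeout are assumptions made for illustration, and any status other than `finished` is simply treated as "not ready yet", since only `queued` and `finished` appear in this README.

```python
import json
import time
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_status_http(task_id, base_url="http://localhost"):
    """Fetch the task status JSON from /api/v1/<task_id> (hypothetical helper)."""
    with urlopen(f"{base_url}/api/v1/{task_id}") as resp:
        return json.load(resp)


def wait_for_result(fetch_status, task_id, base_url="http://localhost",
                    interval=1.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the task reports "finished", then return the absolute result URL.

    `fetch_status` is any callable taking a task id and returning the status
    dict, so the loop can be exercised without a running service.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(task_id)
        if status.get("status") == "finished":
            # result_url is a server-relative path such as /media/<id>.zip
            return urljoin(base_url, status["result_url"])
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout} s")
        sleep(interval)


if __name__ == "__main__":
    # Canned responses standing in for a real docsbox instance.
    task = "9b643d78-d0c8-4552-a0c5-111a89896176"
    canned = iter([
        {"id": task, "status": "queued"},
        {"id": task, "result_url": f"/media/{task}.zip", "status": "finished"},
    ])
    print(wait_for_result(lambda _id: next(canned), task, sleep=lambda s: None))
```

Against a live instance, `wait_for_result(fetch_status_http, task_id)` would issue the same GET requests as the curl status calls.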
`docsbox` uses **LibreOffice** (via **LibreOfficeKit**) for document conversion.

```bash
$ curl -F "[email protected]" http://localhost/api/v1/

{
    "id": "9b643d78-d0c8-4552-a0c5-111a89896176",
    "status": "queued"
}

$ curl http://localhost/api/v1/9b643d78-d0c8-4552-a0c5-111a89896176

{
    "id": "9b643d78-d0c8-4552-a0c5-111a89896176",
    "result_url": "/media/9b643d78-d0c8-4552-a0c5-111a89896176.zip",
    "status": "finished"
}

$ curl -O http://localhost/media/9b643d78-d0c8-4552-a0c5-111a89896176.zip

$ unzip -l 9b643d78-d0c8-4552-a0c5-111a89896176.zip

Archive:  9b643d78-d0c8-4552-a0c5-111a89896176.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    11135  2016-07-08 05:31   txt
   373984  2016-07-08 05:31   pdf
   147050  2016-07-08 05:31   html
---------                     -------
   532169                     3 files
```

```bash
$ cat options.json
{
    "formats": ["pdf"],
    "thumbnails": {
        "size": "640x480"
    }
}

$ curl -i -F "[email protected]" -F "options=80/tcp docsbox_nginx_1
f6b55773c71d docsbox_rqworker "rq worker -c docsbox" 15 minutes ago Up 8 minutes docsbox_rqworker_1
662b08daefea docsbox_rqscheduler "rqscheduler -H redis" 15 minutes ago Up 8 minutes docsbox_rqscheduler_1
0364df126b36 docsbox_web "gunicorn -b :8000 do" 15 minutes ago Up 8 minutes 8000/tcp docsbox_web_1
5e8c8481e288 redis:latest "docker-entrypoint.sh" 9 hours ago Up 8 minutes 0.0.0.0:6379->6379/tcp docsbox_redis_1
```

# Settings (env)
```
REDIS_URL - redis-server url (default: redis://redis:6379/0)
REDIS_JOB_TIMEOUT - job timeout (default: 10 minutes)
ORIGINAL_FILE_TTL - TTL for uploaded file in seconds (default: 10 minutes)
RESULT_FILE_TTL - TTL for result file in seconds (default: 24 hours)
THUMBNAILS_DPI - thumbnails dpi, for bigger thumbnails choose bigger values (default: 90)
LIBREOFFICE_PATH - path to libreoffice (default: /usr/lib/libreoffice/program/)
```

# Scaling
On a single physical server, docsbox can be scaled with docker-compose:
```bash
$ docker-compose scale web=4 rqworker=8
```
For multi-host deployment, you'll need to create a global synchronized volume (e.g. with flocker) and a global redis-server, then configure both in the `docker-compose.yml` file.

# Supported filetypes
| Input | Output | Thumbnails |
| ---------------------------------- | ------------------- | ---------- |
| Document `doc` `docx` `odt` `rtf` | `pdf` `txt` `html` | `yes` |
| Presentation `ppt` `pptx` `odp` | `pdf` `html` | `yes` |
| Spreadsheet `xls` `xlsx` `ods` | `pdf` `csv` `html` | `yes` |
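
The table above can be turned into a small client-side check that runs before an upload. The following is a minimal sketch, assuming the `formats` and `thumbnails.size` keys shown in `options.json`; the `SUPPORTED_OUTPUTS` mapping and the `build_options` helper are hypothetical names introduced here, not part of docsbox itself.

```python
import json
from pathlib import PurePath

# Input extension -> output formats, transcribed from the table above.
SUPPORTED_OUTPUTS = {
    **dict.fromkeys(("doc", "docx", "odt", "rtf"), {"pdf", "txt", "html"}),
    **dict.fromkeys(("ppt", "pptx", "odp"), {"pdf", "html"}),
    **dict.fromkeys(("xls", "xlsx", "ods"), {"pdf", "csv", "html"}),
}


def build_options(filename, formats, thumbnail_size=None):
    """Return an options JSON string, or raise ValueError for unsupported requests."""
    ext = PurePath(filename).suffix.lstrip(".").lower()
    allowed = SUPPORTED_OUTPUTS.get(ext)
    if allowed is None:
        raise ValueError(f"unsupported input type: {ext!r}")
    unsupported = [fmt for fmt in formats if fmt not in allowed]
    if unsupported:
        raise ValueError(f"{ext} cannot be converted to: {', '.join(unsupported)}")
    options = {"formats": list(formats)}
    if thumbnail_size is not None:
        # Same "WIDTHxHEIGHT" string form as in options.json, e.g. "640x480".
        options["thumbnails"] = {"size": thumbnail_size}
    return json.dumps(options)


if __name__ == "__main__":
    print(build_options("report.docx", ["pdf"], thumbnail_size="640x480"))
```

Rejecting an unsupported combination locally, such as `csv` output for a presentation, avoids queuing a task that the table says cannot succeed.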