https://github.com/derfenix/webarchive
Own webarchive service
- Host: GitHub
- URL: https://github.com/derfenix/webarchive
- Owner: derfenix
- License: bsd-3-clause
- Created: 2023-03-26T13:12:07.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-12-19T04:44:10.000Z (9 months ago)
- Last Synced: 2024-01-25T07:11:29.558Z (8 months ago)
- Language: Go
- Homepage:
- Size: 248 KB
- Stars: 71
- Watchers: 2
- Forks: 2
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Own Webarchive
Aims to be a simple, fast, and easy-to-use web archive for personal or home-network use.
## Supported store formats
* **headers**: save all headers from the response
* **pdf**: save the page as a PDF
* **single_file**: save the HTML and all its resources (CSS, JS, images) into one HTML file

## Requirements
* Go 1.19 or higher
* `wkhtmltopdf` binary in `$PATH` (needed to save pages as PDF)

## Configuration
The service is configured via environment variables. The available variables are:

* **DB**
  * **DB_PATH**: path for the database files (default `./db`)
* **LOGGING**
  * **LOGGING_DEBUG**: enable debug logs (default `false`)
* **API**
  * **API_ADDRESS**: address the API server listens on (default `0.0.0.0:5001`)
* **UI**
  * **UI_ENABLED**: enable the built-in web UI (default `true`)
  * **UI_PREFIX**: prefix for the web UI (default `/`)
  * **UI_THEME**: UI theme name (default `basic`); no other values are available yet
* **PDF**
  * **PDF_LANDSCAPE**: use landscape page orientation instead of portrait (default `false`)
  * **PDF_GRAYSCALE**: apply a grayscale filter to the output PDF (default `false`)
  * **PDF_MEDIA_PRINT**: use media type `print` for the request (default `true`)
  * **PDF_ZOOM**: page zoom factor (default `1.0`, i.e. no zoom)
  * **PDF_VIEWPORT**: use the specified viewport size (default `1280x720`)
  * **PDF_DPI**: use the specified DPI value for the output PDF (default `150`)
  * **PDF_FILENAME**: use the specified name for the output PDF file (default `page.pdf`)

*Note*: The prefix **WEBARCHIVE_** can be added to the environment variable names to avoid conflicts.

## Usage
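When the server is launched from another Go program, such as a test harness, the variables from the Configuration section can be assembled in code. The sketch below is illustrative only: `prefixedEnv` is a hypothetical helper, not part of the project. It applies the optional `WEBARCHIVE_` prefix to each variable name and returns entries in the `KEY=value` form accepted by `exec.Cmd.Env`.

```go
package main

import (
	"fmt"
	"sort"
)

// envPrefix is the optional prefix mentioned in the Configuration note.
const envPrefix = "WEBARCHIVE_"

// prefixedEnv is a hypothetical helper: it turns a map of setting names
// (for example "API_ADDRESS") into sorted "WEBARCHIVE_<NAME>=<value>"
// entries, the format expected by os/exec's Cmd.Env.
func prefixedEnv(settings map[string]string) []string {
	env := make([]string, 0, len(settings))
	for name, value := range settings {
		env = append(env, envPrefix+name+"="+value)
	}
	sort.Strings(env) // map iteration order is random; sort for stable output
	return env
}

func main() {
	env := prefixedEnv(map[string]string{
		"API_ADDRESS": "127.0.0.1:3001",
		"DB_PATH":     "./db",
	})
	for _, entry := range env {
		fmt.Println(entry)
	}
}
```

The returned slice could be appended to `os.Environ()` before starting the server process, so that the prefixed names do not collide with unrelated variables already set in the environment.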
### 1. Start the server
#### Start without docker
```shell
go run ./cmd/server/main.go
```

#### Change API address
```shell
API_ADDRESS=127.0.0.1:3001 go run ./cmd/server/main.go
```

#### Start in docker
```shell
docker compose up -d webarchive
```

### 2. Add a page
```shell
curl -X POST --location "http://localhost:5001/api/v1/pages" \
-H "Content-Type: application/json" \
-d "{
\"url\": \"https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937\",
\"formats\": [
\"pdf\",
\"headers\"
]
}" | jq .
```

or

```shell
curl -X POST --location \
"http://localhost:5001/api/v1/pages?url=https%3A%2F%2Fgithub.com%2Fwkhtmltopdf%2Fwkhtmltopdf%2Fissues%2F1937&formats=pdf%2Cheaders&description=Foo+Bar"
```

### 3. Get the page's info
```shell
curl -X GET --location "http://localhost:5001/api/v1/pages/$page_id" | jq .
```
where `$page_id` is the value of the `id` field from the previous command's response.
If the `status` field in the response is `success` (or `with_errors`), the `results` field
will contain all processed formats with the IDs of the stored files.

### 4. Open file in browser
```shell
xdg-open "http://localhost:5001/api/v1/pages/$page_id/file/$file_id"
```
Here `$page_id` is the value of the `id` field from the previous command's response, and
`$file_id` is the ID of the file you want to open.

### 5. List all stored pages
```shell
curl -X GET --location "http://localhost:5001/api/v1/pages" | jq .
```

## Roadmap
- [x] Save page to pdf
- [x] Save URL headers
- [x] Save page to the single-page html
- [ ] Save page to html with separate resource files (?)
- [ ] Basic web UI
- [ ] Optional authentication
- [ ] Multi-user access
- [ ] Support SQL database with or without separate files storage
- [ ] Tags/Categories
- [ ] Save page to markdown
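
The `curl` calls in the Usage section can also be prepared from Go. The following is a minimal sketch, not the project's own client code: the helper names (`addPageJSON`, `addPageURL`, `fileURL`, `processed`) are hypothetical, the base URL assumes the default `API_ADDRESS` port on localhost, and IDs are treated as opaque strings. It only builds the request body and URLs shown in steps 2 and 4 and checks the `status` values named in step 3; sending the requests with `net/http` is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// baseURL assumes the default API_ADDRESS port, as in the curl examples.
const baseURL = "http://localhost:5001/api/v1/pages"

// addPageRequest mirrors the JSON payload from the first curl example.
type addPageRequest struct {
	URL     string   `json:"url"`
	Formats []string `json:"formats"`
}

// addPageJSON builds the JSON body for POST /api/v1/pages.
func addPageJSON(pageURL string, formats []string) ([]byte, error) {
	return json.Marshal(addPageRequest{URL: pageURL, Formats: formats})
}

// addPageURL builds the query-string variant of the same request.
// Parameters are emitted in alphabetical order by url.Values.Encode.
func addPageURL(pageURL string, formats []string, description string) string {
	q := url.Values{}
	q.Set("url", pageURL)
	q.Set("formats", strings.Join(formats, ","))
	if description != "" {
		q.Set("description", description)
	}
	return baseURL + "?" + q.Encode()
}

// fileURL builds the address opened in step 4.
func fileURL(pageID, fileID string) string {
	return baseURL + "/" + url.PathEscape(pageID) + "/file/" + url.PathEscape(fileID)
}

// processed reports whether a status value means the results field is filled.
func processed(status string) bool {
	return status == "success" || status == "with_errors"
}

func main() {
	target := "https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937"
	formats := []string{"pdf", "headers"}

	body, err := addPageJSON(target, formats)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println(addPageURL(target, formats, "Foo Bar"))
	fmt.Println(processed("with_errors")) // true
}
```

Keeping the URL and body construction in small pure functions makes them easy to check without a running server; the actual HTTP calls can then be a thin layer on top.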