https://github.com/openbookpublishers/archive_urls_pdf
Archive the URLs reported in a PDF file to Wayback Machine
- Host: GitHub
- URL: https://github.com/openbookpublishers/archive_urls_pdf
- Owner: OpenBookPublishers
- License: gpl-3.0
- Created: 2020-02-25T13:07:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-04-07T07:47:18.000Z (about 5 years ago)
- Last Synced: 2025-06-12T22:35:46.995Z (about 1 year ago)
- Language: Dockerfile
- Size: 14.6 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# archive_urls_pdf
This software extracts URLs from PDF files and passes them on to [archiveurl](https://github.com/OpenBookPublishers/archiveurl), which archives them on the Wayback Machine.
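
As a rough illustration of the extract-then-archive idea, the sketch below pulls URLs out of text that has already been extracted from a PDF and builds the corresponding Wayback Machine "Save Page Now" address for each one. It is not taken from this project or from archiveurl: the regular expression, the function names `extract_urls` and `wayback_save_url`, and the sample text are all assumptions made for the example, and the PDF-to-text step is left out entirely.

```python
import re

# Assumed pattern (not from the project): an http(s) URL runs until
# whitespace or a common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

# Public "Save Page Now" prefix of the Wayback Machine.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop sentence punctuation that tends to stick to a trailing URL.
        url = match.rstrip(".,;:")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def wayback_save_url(url):
    """Build the address that asks the Wayback Machine to capture url."""
    return WAYBACK_SAVE_PREFIX + url


# Hypothetical text, standing in for the content of a PDF.
sample = (
    "See https://example.org/book.pdf, and http://example.com/a "
    "(also https://example.org/book.pdf)."
)

for url in extract_urls(sample):
    print(wayback_save_url(url))
```

The sketch only prints the capture addresses; actually requesting them, with retries and rate limiting, is the part this project delegates to archiveurl.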
# Run
Store the PDF file as `file.pdf` in the project folder and then run:
$ `docker build . -t openbookpublishers/archive_urls_pdf`
$ `docker run --rm \
-v /path/to/local/file.pdf:/archive_urls_pdf/file.pdf \
openbookpublishers/archive_urls_pdf`
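
Docker expects the host side of a `-v` bind mount to be an absolute path; a bare name such as `file.pdf` would be treated as a named volume instead. The sketch below shows one way to build the `docker run` arguments for an arbitrary PDF, assuming the image has already been built with the tag above. The helper name `docker_run_command` is hypothetical, and the snippet only prints the command rather than running it.

```python
import os

IMAGE = "openbookpublishers/archive_urls_pdf"
# Path inside the container, as used in the docker run command above.
CONTAINER_PDF = "/archive_urls_pdf/file.pdf"


def docker_run_command(pdf_path):
    """Return the docker run argument list for the given local PDF."""
    # Bind mounts need an absolute host path.
    host_path = os.path.abspath(pdf_path)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:{CONTAINER_PDF}",
        IMAGE,
    ]


# Print the command for a hypothetical file in the current directory.
print(" ".join(docker_run_command("book.pdf")))
```

The returned list can be passed to `subprocess.run(..., check=True)` to start the container for each PDF in turn.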