https://github.com/tonygrif/pdf-finder
A Python program to locate links to PDFs found within a webpage from the command line
- Host: GitHub
- URL: https://github.com/tonygrif/pdf-finder
- Owner: TonyGrif
- License: mit
- Created: 2023-01-26T08:42:57.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-07T03:53:20.000Z (about 1 year ago)
- Last Synced: 2025-01-10T20:53:09.953Z (9 months ago)
- Topics: docker, pdf-files, python, web-scraping
- Language: Python
- Homepage: https://hub.docker.com/r/tonygrif/pdf-finder
- Size: 93.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PDF-Finder
A Python program to locate links to PDFs found within a webpage from the command line.

## Running with Docker
First, pull the image using `docker pull tonygrif/pdf-finder`.
Once the pull completes, run the application with `docker run tonygrif/pdf-finder [URI]`.

## Running with Poetry
First, install the required packages with `poetry install`.
Then run the program with `poetry run python main.py [URI]`.

## Sample Execution
Running `./main.py https://www.cs.odu.edu/~mweigle/courses/cs532/pdfs.html` produces the following output for the first two PDFs:
```
URI: http://www.cs.odu.edu/~mln/pubs/ipres-2018/ipres-2018-jones-archiveit.pdf
Final URI: https://www.cs.odu.edu/~mln/pubs/ipres-2018/ipres-2018-jones-archiveit.pdf
Content Length: 2639215 Bytes

URI: http://www.cs.odu.edu/~mln/pubs/ipres-2018/ipres-2018-jones-off-topic.pdf
Final URI: https://www.cs.odu.edu/~mln/pubs/ipres-2018/ipres-2018-jones-off-topic.pdf
Content Length: 3119205 Bytes
```
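The README does not describe how links are found. The following is a minimal standard-library sketch of one way to produce output in the layout of the sample execution above; it is not the project's `main.py`. It assumes that a link counts as a PDF when a `HEAD` request (with redirects followed by `urlopen`) returns `Content-Type: application/pdf`, and the helper names (`LinkCollector`, `candidate_links`, `probe`, `format_result`, `find_pdfs`) are hypothetical.

```python
import sys
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def candidate_links(html, base_uri):
    """Resolve anchors against the page URI; keep unique http(s) links in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    absolute = (urljoin(base_uri, href) for href in parser.hrefs)
    web_links = (u for u in absolute if urlparse(u).scheme in ("http", "https"))
    return list(dict.fromkeys(web_links))  # de-duplicate while preserving order


def probe(uri, timeout=10):
    """Send a HEAD request (urlopen follows redirects); return final URI, type, length."""
    with urlopen(Request(uri, method="HEAD"), timeout=timeout) as resp:
        return (resp.geturl(),
                resp.headers.get_content_type(),
                resp.headers.get("Content-Length"))


def format_result(uri, final_uri, length):
    """Render one result in the layout of the sample execution."""
    shown = length if length is not None else "unknown"
    return f"URI: {uri}\nFinal URI: {final_uri}\nContent Length: {shown} Bytes\n"


def find_pdfs(page_uri, timeout=10):
    """Fetch the page, probe each link, and print the ones served as PDFs."""
    with urlopen(page_uri, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    for uri in candidate_links(html, page_uri):
        try:
            final_uri, content_type, length = probe(uri, timeout)
        except (OSError, ValueError, HTTPException):
            continue  # unreachable link, or a server that rejects HEAD: skip it
        # Assumption: the server's Content-Type header identifies a PDF.
        if content_type == "application/pdf":
            print(format_result(uri, final_uri, length))


if __name__ == "__main__" and len(sys.argv) > 1:
    find_pdfs(sys.argv[1])
```

Relying on the `Content-Type` header rather than a `.pdf` suffix catches PDFs served from extensionless URLs, at the cost of one request per link; servers that reject `HEAD` are simply skipped in this sketch.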