Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scottgriv/python-pdf_web_scraper
Scrape a web page for pdf files and download them all locally.
pdf pdf-download pdf-downloader pdf-scraper pdf-scraping python utility utility-app utility-application utility-script web-scraper web-scraping
- Host: GitHub
- URL: https://github.com/scottgriv/python-pdf_web_scraper
- Owner: scottgriv
- License: unlicense
- Created: 2023-01-02T23:08:21.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-16T17:58:03.000Z (11 months ago)
- Last Synced: 2024-12-08T16:46:50.925Z (28 days ago)
- Topics: pdf, pdf-download, pdf-downloader, pdf-scraper, pdf-scraping, python, utility, utility-app, utility-application, utility-script, web-scraper, web-scraping
- Language: Python
- Homepage:
- Size: 365 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
---------------
Python PDF Web Scraper
A simple Python script that scrapes web pages for PDF files and downloads them to a local directory.
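
The scrape-and-download flow described above can be sketched as follows. This is a minimal illustration, not the repository's actual `main.py`: it assumes the PDFs are linked through `<a href="...">` tags whose path ends in `.pdf`, and it uses only the Python standard library rather than the Beautiful Soup and urllib3 packages this project installs. The names `LinkCollector`, `find_pdf_links`, and `download_pdfs` are hypothetical.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_pdf_links(html, base_url):
    """Return absolute URLs for every link whose path ends in .pdf."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL.
        absolute = urljoin(base_url, href)
        # Check only the path, so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            links.append(absolute)
    return links


def download_pdfs(url, folder_location):
    """Fetch the page at url and save each linked PDF into folder_location."""
    # Create the output folder if it does not exist yet.
    os.makedirs(folder_location, exist_ok=True)
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for link in find_pdf_links(html, url):
        filename = os.path.basename(urlparse(link).path)
        target = os.path.join(folder_location, filename)
        with urlopen(link) as pdf, open(target, "wb") as out:
            out.write(pdf.read())
```

Calling `download_pdfs` with a page URL and an output folder would fetch the page once and then each PDF in turn. Link extraction is kept in its own function so it can be checked against a static HTML string without any network access.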
---------------
## Table of Contents
- [Getting Started](#getting-started)
- [Resources](#resources)
- [License](#license)
- [Credits](#credits)

## Getting Started
1. Clone this repository.
2. Install [Python](https://www.python.org/downloads/).
3. Install [Pip](https://pip.pypa.io/en/stable/installing/).
4. Install the dependencies by running `pip install beautifulsoup4` and `pip install urllib3` in your terminal.
5. Set the web page URL and the output folder location in the `main.py` file here:
```python
# Define your URL
url = "https://yourWebsiteURL"

# If there is no such folder, the script will create one automatically
folder_location = r'/YOUR/OUTPUT/FILE/PATH'
```
6. Run the script: `python main.py`
7. The PDF files will be downloaded to the output folder you specified.

## Resources
- [Python](https://www.python.org)
- [Pip](https://pip.pypa.io/en/stable/installing/)
- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib3](https://urllib3.readthedocs.io/en/latest/)

## License
This project is released under the terms of **The Unlicense**, which allows you to use, modify, and distribute the code as you see fit.
- [The Unlicense](https://choosealicense.com/licenses/unlicense/) removes traditional copyright restrictions, giving you the freedom to use the code in any way you choose.
- For more details, see the [LICENSE](LICENSE) file in this repository.

## Credits
**Author:** [Scott Grivner](https://github.com/scottgriv)
**Email:** [[email protected]](mailto:[email protected])
**Website:** [scottgrivner.dev](https://www.scottgrivner.dev)
**Reference:** [Main Branch](https://github.com/scottgriv/python-pdf_to_audio)
---------------