https://github.com/scottgriv/python-pdf_web_scraper
Scrape a web page for pdf files and download them all locally.
- Host: GitHub
- URL: https://github.com/scottgriv/python-pdf_web_scraper
- Owner: scottgriv
- License: unlicense
- Created: 2023-01-02T23:08:21.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-09T04:25:18.000Z (4 months ago)
- Last Synced: 2025-02-06T07:41:23.094Z (3 months ago)
- Topics: pdf, pdf-download, pdf-downloader, pdf-scraper, pdf-scraping, python, utility, utility-app, utility-application, utility-script, web-scraper, web-scraping
- Language: Python
- Homepage:
- Size: 366 KB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
---------------
Python PDF Web Scraper
A simple Python script that scrapes web pages for PDF files and downloads them to a local directory.
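The general idea can be sketched as follows. This is not the code in `main.py`; it is a minimal illustration that uses only the Python standard library (`html.parser` and `urllib`) instead of the Beautiful Soup and urllib3 packages listed under Resources. The names `PdfLinkParser`, `find_pdf_links`, `local_path`, and `download_pdfs` are hypothetical, and treating any link whose path ends in `.pdf` as a PDF is an assumption.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Assumption: a PDF link is one whose URL path ends in ".pdf".
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the PDF URLs found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


def local_path(pdf_url, folder_location):
    """Map a PDF URL to a file path inside folder_location."""
    return Path(folder_location) / Path(urlparse(pdf_url).path).name


def download_pdfs(url, folder_location="./downloads"):
    """Fetch the page at url and save every linked PDF into folder_location."""
    Path(folder_location).mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for pdf_url in find_pdf_links(html, url):
        urlretrieve(pdf_url, str(local_path(pdf_url, folder_location)))
```

Calling `download_pdfs("https://yourWebsiteURL", "./downloads")` would fetch the page and save each linked PDF. Link extraction is kept separate from downloading so it can be checked on a plain HTML string without network access.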
---------------
## Table of Contents
- [Getting Started](#getting-started)
- [Disclaimer](#disclaimer)
- [Resources](#resources)
- [License](#license)
- [Credits](#credits)

## Getting Started
1. Clone this repository.
2. Install [Python](https://www.python.org/downloads/).
3. Install [Pip](https://pip.pypa.io/en/stable/installing/).
4. Install the required packages using `pip install -r requirements.txt` in your terminal.
5. Set the web page URL and the output folder in `main.py`:
```python
# Define your URL
url = "https://yourWebsiteURL"

# By default, the script will download PDF files to the downloads folder.
# You can change the folder location by updating the folder_location variable.
# Example: folder_location = r'/Users/yourname/Documents'
folder_location = r'./downloads'
```
6. Run the script: `python main.py`
7. PDF files will be downloaded to your local directory.

## Disclaimer
> [!IMPORTANT]
> This tool is not intended to break copyright laws and is for personal use only. It merely automates the retrieval of publicly available data using standard web scraping techniques.
> The copyright of the data retrieved belongs to its respective owners, and I am not responsible for any illegal redistribution or misuse of data obtained using this tool.

> [!CAUTION]
> Use of this tool is at your own risk. By using this tool, you agree that you are solely responsible for any legal issues that may arise from your use of this tool.

## Resources
- [Python](https://www.python.org)
- [Pip](https://pip.pypa.io/en/stable/installing/)
- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib3](https://urllib3.readthedocs.io/en/latest/)

## License
This project is released under the terms of **The Unlicense**, which allows you to use, modify, and distribute the code as you see fit.
- [The Unlicense](https://choosealicense.com/licenses/unlicense/) removes traditional copyright restrictions, giving you the freedom to use the code in any way you choose.
- For more details, see the [LICENSE](LICENSE) file in this repository.

## Credits
**Author:** [Scott Grivner](https://github.com/scottgriv)
**Email:** [[email protected]](mailto:[email protected])
**Website:** [scottgrivner.dev](https://www.scottgrivner.dev)
**Reference:** [Main Branch](https://github.com/scottgriv/python-pdf_to_audio)
---------------