https://github.com/scottgriv/python-pdf_web_scraper

Scrape a web page for pdf files and download them all locally.

---------------

# Python PDF Web Scraper

A simple Python script that scrapes web pages for PDF files and downloads them to a local directory.

---------------

## Table of Contents

- [Getting Started](#getting-started)
- [Disclaimer](#disclaimer)
- [Resources](#resources)
- [License](#license)
- [Credits](#credits)

## Getting Started

1. Clone this repository.
2. Install [Python](https://www.python.org/downloads/).
3. Install [Pip](https://pip.pypa.io/en/stable/installing/).
4. Install the required packages using `pip install -r requirements.txt` in your terminal.
5. Set the web page URL and the output folder in `main.py`:
```python
# Define your URL
url = "https://yourWebsiteURL"

# By default, the script will download PDF files to the downloads folder.
# You can change the folder location by updating the folder_location variable.
# Example: folder_location = r'/Users/yourname/Documents'

folder_location = r'./downloads'
```
6. Run the script: `python main.py`
7. The PDF files are saved to the folder set in `folder_location` (`./downloads` by default).
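
This README does not show how `main.py` works internally, so the following is only a minimal sketch of the general technique: read a page, collect links that point to PDF files, and save each one to a folder. It is not the author's implementation. To stay dependency-free it uses the standard library's `html.parser` and `urllib.request` in place of the Beautiful Soup and urllib3 packages listed under Resources, and the names `PDFLinkParser`, `find_pdf_links`, and `download_pdfs` are hypothetical.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PDFLinkParser(HTMLParser):
    """Collects the target of every <a> tag whose URL path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the page being scraped.
        absolute = urljoin(self.base_url, href)
        # Check only the path so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in an HTML string."""
    parser = PDFLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


def download_pdfs(url, folder_location):
    """Download every PDF linked from `url` into `folder_location`."""
    os.makedirs(folder_location, exist_ok=True)
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")

    saved = []
    for link in find_pdf_links(html, url):
        # Name the local file after the last segment of the URL path.
        filename = os.path.basename(urlparse(link).path)
        target = os.path.join(folder_location, filename)
        with urlopen(link) as pdf, open(target, "wb") as out:
            out.write(pdf.read())
        saved.append(target)
    return saved
```

With the `url` and `folder_location` values from step 5, calling `download_pdfs(url, folder_location)` would fetch the page and write each linked PDF into that folder. The sketch does no error handling, so a failed request raises an exception and stops the run.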

## Disclaimer

> [!IMPORTANT]
> This tool is not intended to break copyright laws and is for personal use only. It merely automates the retrieval of publicly available data using standard web scraping techniques.
> The copyright of the data retrieved belongs to its respective owners, and I am not responsible for any illegal redistribution or misuse of data obtained using this tool.

> [!CAUTION]
> Use of this tool is at your own risk. By using this tool, you agree that you are solely responsible for any legal issues that may arise from your use of this tool.

## Resources

- [Python](https://www.python.org)
- [Pip](https://pip.pypa.io/en/stable/installing/)
- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib3](https://urllib3.readthedocs.io/en/latest/)

## License

This project is released under the terms of **The Unlicense**, which allows you to use, modify, and distribute the code as you see fit.
- [The Unlicense](https://choosealicense.com/licenses/unlicense/) removes traditional copyright restrictions, giving you the freedom to use the code in any way you choose.
- For more details, see the [LICENSE](LICENSE) file in this repository.

## Credits

**Author:** [Scott Grivner](https://github.com/scottgriv)

**Website:** [scottgrivner.dev](https://www.scottgrivner.dev)

**Reference:** [Main Branch](https://github.com/scottgriv/python-pdf_web_scraper)

---------------