https://github.com/scottgriv/python-pdf_web_scraper

Scrape a web page for pdf files and download them all locally.

Host: GitHub
URL: https://github.com/scottgriv/python-pdf_web_scraper
Owner: scottgriv
License: unlicense
Created: 2023-01-02T23:08:21.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2025-03-22T14:44:18.000Z (over 1 year ago)
Last Synced: 2025-06-26T21:44:30.512Z (about 1 year ago)
Topics: pdf, pdf-download, pdf-downloader, pdf-scraper, pdf-scraping, python, utility, utility-app, utility-application, utility-script, web-scraper, web-scraping
Language: Python
Homepage:
Size: 375 KB
Stars: 12
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

---------------

Python PDF Web Scraper


A simple Python script that scrapes web pages for PDF files and downloads them to a local directory.
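Below is one way the core scraping step could look. It is a minimal, hypothetical sketch, not the repository's `main.py`. The project lists Beautiful Soup for HTML parsing, but this example uses Python's built-in `html.parser` instead so it runs without extra packages. The names `PdfLinkParser` and `find_pdf_links` are illustrative, and the sketch assumes PDF links appear as `<a href>` values whose URL path ends in `.pdf`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_pdf_links(html, base_url):
    """Return absolute URLs of links whose path ends in .pdf, without duplicates."""
    parser = PdfLinkParser()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        # Check only the path so a query string such as ?v=2 does not hide the extension
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in links:
            links.append(absolute)
    return links


page = ('<a href="docs/guide.pdf">Guide</a> <a href="/about">About</a> '
        '<a href="https://example.com/a.PDF?v=2">A</a>')
print(find_pdf_links(page, "https://example.com/library/"))
# ['https://example.com/library/docs/guide.pdf', 'https://example.com/a.PDF?v=2']
```

Resolving every `href` with `urljoin` matters because many pages link to PDFs with relative paths, which cannot be downloaded until they are turned into full URLs.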

---------------

## Table of Contents

- [Getting Started](#getting-started)

- [Disclaimer](#disclaimer)

- [Resources](#resources)

- [License](#license)

- [Credits](#credits)

## Getting Started

1. Clone this repository.

2. Install [Python](https://www.python.org/downloads/).

3. Make sure [pip](https://pip.pypa.io/en/stable/installing/) is available. It ships with Python 3.4 and later, so a separate install is usually unnecessary.

4. Install the required packages using `pip install -r requirements.txt` in your terminal.

5. Set the web page URL and the output folder location in the `main.py` file here:

```python
# Define your URL
url = "https://yourWebsiteURL"

# By default, the script will download PDF files to the downloads folder.
# You can change the folder location by updating the folder_location variable.
# Example: folder_location = r'/Users/yourname/Documents'
folder_location = r'./downloads'
```

6. Run the script: `python main.py`

7. The PDF files will be downloaded to the folder set in `folder_location`.
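Steps 5 to 7 come down to fetching each PDF URL and writing it into `folder_location`. The sketch below is a hypothetical illustration, not the script's actual code: it uses the standard library's `urllib.request` rather than urllib3, and the helper names `local_path_for` and `download_pdf` are made up for this example. It assumes each file can be named after the last segment of its URL path.

```python
import os
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen

folder_location = r'./downloads'  # same default as in main.py


def local_path_for(pdf_url, folder):
    """Map a PDF URL to a file path inside folder (hypothetical helper)."""
    # Decode %20-style escapes first, then keep only the last path segment,
    # so an encoded slash cannot point the file outside the folder
    name = os.path.basename(unquote(urlparse(pdf_url).path))
    if name in ("", ".", ".."):
        name = "download.pdf"  # fallback when the URL path has no usable file name
    return os.path.join(folder, name)


def download_pdf(pdf_url, folder=folder_location, timeout=30):
    """Fetch pdf_url and save it under folder, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    path = local_path_for(pdf_url, folder)
    with urlopen(pdf_url, timeout=timeout) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of reading it all into memory
    return path
```

For example, `download_pdf("https://example.com/files/report.pdf")` would save the file as `report.pdf` inside `./downloads` and return that path. Note that two URLs ending in the same file name would overwrite each other under this naming scheme.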

## Disclaimer

> [!IMPORTANT]
> This tool is not intended to break copyright laws and is for personal use only. It merely automates the retrieval of publicly available data using standard web scraping techniques.
>
> The copyright of the data retrieved belongs to its respective owners, and I am not responsible for any illegal redistribution or misuse of data obtained using this tool.

> [!CAUTION]
> Use of this tool is at your own risk. By using this tool, you agree that you are solely responsible for any legal issues that may arise from your use of this tool.

## Resources

- [Python](https://www.python.org)

- [Pip](https://pip.pypa.io/en/stable/installing/)

- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

- [urllib3](https://urllib3.readthedocs.io/en/latest/)

## License

This project is released under the terms of **The Unlicense**, which allows you to use, modify, and distribute the code as you see fit. 

- [The Unlicense](https://choosealicense.com/licenses/unlicense/) removes traditional copyright restrictions, giving you the freedom to use the code in any way you choose.

- For more details, see the [LICENSE](LICENSE) file in this repository.

## Credits

**Author:** [Scott Grivner](https://github.com/scottgriv) 


**Email:** [scott.grivner@gmail.com](mailto:scott.grivner@gmail.com) 


**Website:** [scottgrivner.dev](https://www.scottgrivner.dev) 


**Reference:** [Main Branch](https://github.com/scottgriv/python-pdf_web_scraper) 


---------------
