https://github.com/scottgriv/python-pdf_web_scraper

Scrape a web page for PDF files and download them all locally.

---------------

# Python PDF Web Scraper

A simple Python script that scrapes web pages for PDF files and downloads them to a local directory.

---------------

## Table of Contents

- [Getting Started](#getting-started)
- [Resources](#resources)
- [License](#license)
- [Credits](#credits)

## Getting Started

1. Clone this repository.
2. Install [Python](https://www.python.org/downloads/).
3. Install [Pip](https://pip.pypa.io/en/stable/installing/).
4. Install the dependencies by running `pip install beautifulsoup4` and `pip install urllib3` in your terminal.
5. Set the web page URL and the output folder location in `main.py`:
   ```python
   # Define the web page to scan for PDF links
   url = "https://yourWebsiteURL"

   # Define the output folder; if it does not exist, the script creates it automatically
   folder_location = r'/YOUR/OUTPUT/FILE/PATH'
   ```
6. Run the script: `python main.py`
7. The PDF files linked from the page are downloaded into the folder set in `folder_location`.
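The steps above can be sketched in a few lines of Python. This is a minimal sketch, not the repository's `main.py`. It assumes the `beautifulsoup4` and `urllib3` packages from step 4 are installed. The function names `find_pdf_links` and `download_pdfs` are hypothetical. The sketch collects every `<a href>` whose URL path ends in `.pdf` and resolves relative links against the page URL. It then writes each file into `folder_location`, creating that folder first if needed.

```python
import os
from urllib.parse import urljoin, urlparse

import urllib3
from bs4 import BeautifulSoup

# Minimal sketch; find_pdf_links and download_pdfs are hypothetical names,
# not taken from the repository's main.py.


def find_pdf_links(html, base_url):
    """Return absolute URLs of every <a href> on the page whose path ends in .pdf."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links such as "docs/report.pdf" against the page URL.
        absolute = urljoin(base_url, anchor["href"])
        # Check only the URL path, so query strings like "?v=2" are ignored.
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in links:
            links.append(absolute)
    return links


def download_pdfs(url, folder_location):
    """Fetch the page at url and save each linked PDF into folder_location."""
    # Create the output folder if it does not exist yet.
    os.makedirs(folder_location, exist_ok=True)
    http = urllib3.PoolManager()
    page = http.request("GET", url)
    for pdf_url in find_pdf_links(page.data, url):
        # Use the last path segment (e.g. "report.pdf") as the local file name.
        filename = os.path.basename(urlparse(pdf_url).path)
        response = http.request("GET", pdf_url)
        if response.status != 200:
            continue  # skip failed downloads instead of saving an error page
        with open(os.path.join(folder_location, filename), "wb") as f:
            f.write(response.data)
```

To use it, call `download_pdfs(url, folder_location)` with the values from step 5. The sketch names each file after the last segment of its URL, so two PDFs with the same file name overwrite each other.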

## Resources

- [Python](https://www.python.org)
- [Pip](https://pip.pypa.io/en/stable/installing/)
- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib3](https://urllib3.readthedocs.io/en/latest/)

## License

This project is released under the terms of **The Unlicense**, which allows you to use, modify, and distribute the code as you see fit.
- [The Unlicense](https://choosealicense.com/licenses/unlicense/) removes traditional copyright restrictions, giving you the freedom to use the code in any way you choose.
- For more details, see the [LICENSE](LICENSE) file in this repository.

## Credits

**Author:** [Scott Grivner](https://github.com/scottgriv)

**Email:** [[email protected]](mailto:[email protected])

**Website:** [scottgrivner.dev](https://www.scottgrivner.dev)

**Reference:** [Main Branch](https://github.com/scottgriv/python-pdf_web_scraper)

---------------