<!-- Begin README -->

<div align="center">
    <a href="https://github.com/scottgriv/python-pdf_web_scraper" target="_blank">
        <img src="./docs/images/icon.png" width="125" height="125"/>
    </a>
</div>
<br>
<p align="center">
    <a href="https://www.python.org/"><img src="https://img.shields.io/badge/Python-3.10.13-3776AB?style=for-the-badge&logo=python" alt="Python Badge" /></a>
    <br>
    <a href="https://github.com/scottgriv"><img src="https://img.shields.io/badge/github-follow_me-181717?style=for-the-badge&logo=github&color=181717" alt="GitHub Badge" /></a>
    <a href="mailto:scott.grivner@gmail.com"><img src="https://img.shields.io/badge/gmail-contact_me-EA4335?style=for-the-badge&logo=gmail" alt="Email Badge" /></a>
    <a href="https://www.buymeacoffee.com/scottgriv"><img src="https://img.shields.io/badge/buy_me_a_coffee-support_me-FFDD00?style=for-the-badge&logo=buymeacoffee&color=FFDD00" alt="BuyMeACoffee Badge" /></a>
    <br>
    <a href="https://prgportfolio.com" target="_blank"><img src="https://img.shields.io/badge/PRG-Bronze Project-CD7F32?style=for-the-badge&logo=data:image/svg%2bxml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBzdGFuZGFsb25lPSJubyI/Pgo8IURPQ1RZUEUgc3ZnIFBVQkxJQyAiLS8vVzNDLy9EVEQgU1ZHIDIwMDEwOTA0Ly9FTiIKICJodHRwOi8vd3d3LnczLm9yZy9UUi8yMDAxL1JFQy1TVkctMjAwMTA5MDQvRFREL3N2ZzEwLmR0ZCI+CjxzdmcgdmVyc2lvbj0iMS4wIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciCiB3aWR0aD0iMjYuMDAwMDAwcHQiIGhlaWdodD0iMzQuMDAwMDAwcHQiIHZpZXdCb3g9IjAgMCAyNi4wMDAwMDAgMzQuMDAwMDAwIgogcHJlc2VydmVBc3BlY3RSYXRpbz0ieE1pZFlNaWQgbWVldCI+Cgo8ZyB0cmFuc2Zvcm09InRyYW5zbGF0ZSgwLjAwMDAwMCwzNC4wMDAwMDApIHNjYWxlKDAuMTAwMDAwLC0wLjEwMDAwMCkiCmZpbGw9IiNDRDdGMzIiIHN0cm9rZT0ibm9uZSI+CjxwYXRoIGQ9Ik0xMiAzMjggYy04IC04IC0xMiAtNTEgLTEyIC0xMzUgMCAtMTA5IDIgLTEyNSAxOSAtMTQwIDQyIC0zOCA0OAotNDIgNTkgLTMxIDcgNyAxNyA2IDMxIC0xIDEzIC03IDIxIC04IDIxIC0yIDAgNiAyOCAxMSA2MyAxMyBsNjIgMyAwIDE1MCAwCjE1MCAtMTE1IDMgYy04MSAyIC0xMTkgLTEgLTEyOCAtMTB6IG0xMDIgLTc0IGMtNiAtMzMgLTUgLTM2IDE3IC0zMiAxOCAyIDIzCjggMjEgMjUgLTMgMjQgMTUgNDAgMzAgMjUgMTQgLTE0IC0xNyAtNTkgLTQ4IC02NiAtMjAgLTUgLTIzIC0xMSAtMTggLTMyIDYKLTIxIDMgLTI1IC0xMSAtMjIgLTE2IDIgLTE4IDEzIC0xOCA2NiAxIDc3IDAgNzIgMTggNzIgMTMgMCAxNSAtNyA5IC0zNnoKbTExNiAtMTY5IGMwIC0yMyAtMyAtMjUgLTQ5IC0yNSAtNDAgMCAtNTAgMyAtNTQgMjAgLTMgMTQgLTE0IDIwIC0zMiAyMCAtMTgKMCAtMjkgLTYgLTMyIC0yMCAtNyAtMjUgLTIzIC0yNiAtMjMgLTIgMCAyOSA4IDMyIDEwMiAzMiA4NyAwIDg4IDAgODggLTI1eiIvPgo8L2c+Cjwvc3ZnPgo=" alt="Bronze" /></a>
</p>

---------------

<h1 align="center">Python PDF Web Scraper</h1>

A simple Python script that scrapes a web page for links to PDF files and downloads them to a local directory.

---------------

## Table of Contents

- [Getting Started](#getting-started)
- [Disclaimer](#disclaimer)
- [Resources](#resources)
- [License](#license)
- [Credits](#credits)

## Getting Started

1. Clone this repository.
2. Install [Python](https://www.python.org/downloads/).
3. Install [pip](https://pip.pypa.io/en/stable/installing/).
4. Install the required packages by running `pip install -r requirements.txt` in your terminal.
5. Set the web page URL and the output folder in `main.py`:
   ```python
   # Define your URL
   url = "https://yourWebsiteURL"

   # By default, the script will download PDF files to the downloads folder.
   # You can change the folder location by updating the folder_location variable.
   # Example: folder_location = r'/Users/yourname/Documents'

   folder_location = r'./downloads'
   ```
6. Run the script: `python main.py`
7. The PDF files are downloaded to the folder set in `folder_location`.

## Disclaimer

> [!IMPORTANT]
> This tool is not intended to infringe copyright and is for personal use only. It merely automates the retrieval of publicly available data using standard web scraping techniques.
> The copyright of the retrieved data belongs to its respective owners, and I am not responsible for any illegal redistribution or misuse of data obtained using this tool.

> [!CAUTION]
> Use of this tool is at your own risk. By using this tool, you agree that you are solely responsible for any legal issues that may arise from your use of it.

## Resources

- [Python](https://www.python.org)
- [pip](https://pip.pypa.io/en/stable/installing/)
- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [urllib3](https://urllib3.readthedocs.io/en/latest/)

## License

This project is released under the terms of **The Unlicense**, which allows you to use, modify, and distribute the code as you see fit.

- [The Unlicense](https://choosealicense.com/licenses/unlicense/) removes traditional copyright restrictions, giving you the freedom to use the code in any way you choose.
- For more details, see the [LICENSE](LICENSE) file in this repository.

## Credits

**Author:** [Scott Grivner](https://github.com/scottgriv) <br>
**Email:** [scott.grivner@gmail.com](mailto:scott.grivner@gmail.com) <br>
**Website:** [scottgrivner.dev](https://www.scottgrivner.dev) <br>
**Reference:** [Main Branch](https://github.com/scottgriv/python-pdf_web_scraper) <br>

---------------

<div align="center">
    <a href="https://scottgrivner.dev" target="_blank">
        <img src="./docs/images/footer.png" width="100" height="100"/>
    </a>
</div>

<!-- End README -->
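The README does not show how `main.py` finds and saves the files. The sketch below illustrates the general scrape-and-download technique under stated assumptions: it uses only Python's standard library (`html.parser` and `urllib.request`) in place of Beautiful Soup, treats any `<a href>` whose URL path ends in `.pdf` as a PDF link, and names each local file after the last segment of its URL path. `find_pdf_links`, `download_pdfs`, and `HEADERS` are hypothetical names, not code taken from this repository.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Assumption: some servers reject urllib's default User-Agent, so send a browser-like one.
HEADERS = {"User-Agent": "Mozilla/5.0"}


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links (e.g. "files/a.pdf") against the page URL.
        absolute = urljoin(self.base_url, href)
        # Check only the path, so "report.pdf?v=2" still counts as a PDF.
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in self.pdf_links:
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Hypothetical helper: return de-duplicated PDF URLs found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


def download_pdfs(url, folder_location="./downloads"):
    """Hypothetical helper: fetch a page and save every linked PDF into folder_location."""
    os.makedirs(folder_location, exist_ok=True)
    with urlopen(Request(url, headers=HEADERS)) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")

    saved = []
    for pdf_url in find_pdf_links(html, url):
        # Name the local file after the last path segment of the PDF URL.
        filename = os.path.basename(urlparse(pdf_url).path)
        path = os.path.join(folder_location, filename)
        with urlopen(Request(pdf_url, headers=HEADERS)) as pdf_response:
            with open(path, "wb") as out_file:
                out_file.write(pdf_response.read())
        saved.append(path)
    return saved
```

Calling `download_pdfs(url, folder_location)` with the `url` and `folder_location` values from step 5 of Getting Started performs the same kind of job the script is described as doing. Separating link extraction from downloading keeps the parsing step testable without network access. The sketch has no retries or rate limiting, and two PDFs with the same file name overwrite each other, so treat it as a starting point rather than a drop-in replacement.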