https://github.com/romangw/img-scrape
Code for an image-pulling bot for Python.
data-scraping image-scraper projects python web-scraping
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/romangw/img-scrape
- Owner: RomanGW
- Created: 2024-10-15T19:02:54.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-26T00:56:51.000Z (over 1 year ago)
- Last Synced: 2025-02-26T01:33:45.228Z (over 1 year ago)
- Topics: data-scraping, image-scraper, projects, python, web-scraping
- Language: Python
- Homepage:
- Size: 153 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# img-scrape
Code for an image-pulling bot using Python.
Requirements:
- datetime
- bs4 (BeautifulSoup, installed as beautifulsoup4)
- PIL (installed as Pillow)
- os
- urllib.parse
- requests
- Python
A little project to put something in my GitHub account. It is meant to show off some of my computer science skills with Python programming, as well as my data skills with scraping and organization. img-scrape pulls all images from a website entered by the user by collecting the `src` attribute of every `<img>` tag with the find_all() function in bs4 (BeautifulSoup). It uses requests to fetch the page at the URL entered. Besides bs4 and requests, it also uses Image from PIL to save image data, urlparse from urllib.parse to parse URLs, datetime to name the save folder, and os to create that save folder in /dump.
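The image-gathering step described above could look roughly like the sketch below. It is not the code from img-scrape.py: the function name `extract_image_urls` and the use of `urljoin` to resolve relative paths are assumptions made for illustration. It works on an HTML string so that it runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(html, page_url):
    """Return an absolute URL for every <img> tag that has a src attribute.

    Hypothetical helper for illustration; not taken from img-scrape.py.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")
        if not src:
            continue  # skip <img> tags without a usable src
        # Resolve relative paths such as "/logo.png" against the page URL.
        urls.append(urljoin(page_url, src))
    return urls


if __name__ == "__main__":
    sample = '<html><body><img src="/logo.png"><img alt="no source"></body></html>'
    print(extract_image_urls(sample, "https://example.com/index.html"))
```

In a real run, the HTML would presumably come from something like `requests.get(url, timeout=10).text` before being passed to the helper.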
# How to use:
Please download the entire directory, because images downloaded while using img-scrape.py are saved to a generated folder within the /dump folder in the directory.
Upon running img-scrape.py, you will be prompted to enter a full URL. This means including the scheme ("https://"), the subdomain, domain, and domain extension (e.g.: "www.google.com", "www.en.wikipedia.org"), and any path. The images found on the page entered will be added to a generated folder within the /dump folder, named after the time the program is run. If an error occurs, an error message will be displayed and an error log will be generated in that same folder within /dump. Please send this to me!
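As a rough illustration of the "full URL" requirement and the time-based folder inside /dump, here is a minimal sketch. The helper names `is_full_url` and `make_dump_folder`, and the `%Y-%m-%d_%H-%M-%S` folder name format, are assumptions. The naming scheme actually used by img-scrape.py may differ.

```python
import os
from datetime import datetime
from urllib.parse import urlparse


def is_full_url(url):
    """Check that the URL has an http(s) scheme and a host part (hypothetical helper)."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def make_dump_folder(base_dir, now=None):
    """Create and return a timestamped folder under <base_dir>/dump.

    The timestamp format is an assumption made for this sketch.
    """
    now = now or datetime.now()
    folder = os.path.join(base_dir, "dump", now.strftime("%Y-%m-%d_%H-%M-%S"))
    os.makedirs(folder, exist_ok=True)
    return folder


if __name__ == "__main__":
    print(is_full_url("https://www.google.com"))  # scheme and host present
    print(is_full_url("www.google.com"))          # scheme missing
```

Without the scheme, `urlparse` treats the whole string as a path, which is why an entry like "www.google.com" on its own would not count as a full URL here.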