https://github.com/codeofrahul/image-scraper-using-selenium-and-python
This project is a web image scraper that uses Selenium and Python to automate image extraction from Google Images. It scrolls through the page, collects high-quality images, and downloads them efficiently. The script is designed with error handling and retry mechanisms to ensure robustness.
https://github.com/codeofrahul/image-scraper-using-selenium-and-python
Last synced: 10 months ago
JSON representation
This project is a web image scraper that uses Selenium and Python to automate image extraction from Google Images. It scrolls through the page, collects high-quality images, and downloads them efficiently. The script is designed with error handling and retry mechanisms to ensure robustness.
- Host: GitHub
- URL: https://github.com/codeofrahul/image-scraper-using-selenium-and-python
- Owner: CodeofRahul
- License: mit
- Created: 2025-03-20T08:57:28.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-03-20T09:31:08.000Z (11 months ago)
- Last Synced: 2025-03-20T10:26:48.997Z (11 months ago)
- Language: Python
- Size: 2.09 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[](https://opensource.org/licenses/MIT)
[](https://www.selenium.dev/)
[](https://www.python.org/)
[](https://en.wikipedia.org/wiki/Automation)
# Image-Scraper-using-Selenium-and-Python
This project is a web image scraper that uses Selenium and Python to automate image extraction from Google Images. It scrolls through the page, collects high-quality images, and downloads them efficiently. The script is designed with error handling and retry mechanisms to ensure robustness.
✨ **Features**
* **Automated Web Scraping:** Uses Selenium to interact with the browser and fetch images dynamically.
* **Smart Scrolling:** Automatically scrolls to load more images.
* **Image Filtering:** Filters out low-resolution and placeholder images.
* **Base64 & Direct Image Support:** Handles both direct image URLs and base64-encoded images.
* **Retry Mechanism:** Retries failed downloads to improve success rate.
* **Customizable Settings:** Modify search queries, scroll limits, and number of images easily.
📂 **Project Structure**
🛠️ **Setup & Installation**
Follow these steps to get started:
1️⃣ **Install Dependencies**
Make sure you have Python 3.x installed, then run:
```bash
pip install selenium requests webdriver-manager
```
2️⃣ **Run the Script**
Modify the `search_term` in `Image_Scrape.py` to your desired query and execute:
```bash
Image_Scrape.py
```
3️⃣ **View Downloaded Images**
The images will be saved in the `scraped_images` folder inside the specified directory.
🏗️ **How It Works**
- **Initialize WebDriver:** Sets up ChromeDriver with necessary configurations.
- **Navigate to Google Images:** Searches for the specified term.
- **Scroll Down:** Loads more images dynamically.
- **Scrape Image URLs:** Extracts valid image links.
- **Download Images:** Saves images in a local folder.
- **Exit Browser:** Closes WebDriver after execution.
🔍 **Code Breakdown**
- `create_driver()`: Sets up a headless Chrome browser for scraping.
- `scroll_down()`: Scrolls the page to load more images.
- `scrape_all_images()`: Extracts image URLs while filtering out placeholders.
- `save_image()`: Downloads images and saves them efficiently.
- `scrap_and_save_images()`: Combines all functions to scrape and store images.