Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mohdsaqibhbi/easy_images
Download hundreds of images from Google. Do image post processing later.
beautifulsoup4 image-duplicate-detection opencv python python-magic requests selenium tabulate tqdm
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/mohdsaqibhbi/easy_images
- Owner: mohdsaqibhbi
- License: mit
- Created: 2022-03-13T06:29:11.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-03-15T14:33:54.000Z (over 2 years ago)
- Last Synced: 2024-06-27T11:30:52.791Z (5 months ago)
- Topics: beautifulsoup4, image-duplicate-detection, opencv, python, python-magic, requests, selenium, tabulate, tqdm
- Language: Python
- Homepage: https://pypi.org/project/easy-images-downloader/
- Size: 104 KB
- Stars: 11
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Easy Images
This repo contains a Python script that downloads images from Google for given keywords. It also provides some extra functionality for image post-processing.
Preparing an image dataset that is not publicly available is still a challenging task. Machine learning engineers need image data when building computer vision applications. When that data is unavailable, they are left with two choices: drop the idea or postpone it until data becomes available. Manually downloading images from Google could take forever.
With this Python script, you can download hundreds of images from Google within a couple of minutes and try out your computer vision idea. You can also remove duplicate images while downloading or later.
## Features
- Download hundreds of images within a couple of minutes in one go.
- Remove duplicate images while downloading.
- Provide a summary of the download.
- Remove duplicate images (later) irrespective of the image size or resolution.
- Resize all the images in a directory.
- Convert all the images in a directory to grayscale.
- Calculate the average image size of all the images in a directory.
- Run the 3 post-processing operations above in one go.
## Getting Started
### Prerequisites
Requires Python >= 3.8
### Installation
#### Using GitHub repo
1. Clone the [repo](https://github.com/mohdsaqibhbi/easy_images) using `git clone https://github.com/mohdsaqibhbi/easy_images.git`.
2. Install the dependencies by running `pip3 install -r requirements.txt`.
#### Using pip
`pip3 install easy-images-downloader`
### Usage
- To download images from Google.
```
from easy_images.easy_images import EasyImages

# Comma-separated keywords; images are downloaded for each one.
keywords = "dogs, cats, horse"
easy_response = EasyImages()
easy_response.download(keywords=keywords, max_limit=100)
```
- Post-processing on all the images in a directory, e.g. removing duplicate images.
```
from easy_images.easy_images import EasyImages

# Directory that holds the previously downloaded images.
image_dir = "easy_images/dogs"
easy_response = EasyImages()
easy_response.post_processing(image_dir=image_dir, remove_duplicates=True)
```
### Parameters
- **Class initialization**
```easy_response = EasyImages(browser_name="chrome", headless=True, loading_timeout=2)```
- ***browser_name*** : *(str), {"chrome", "brave"}, default="chrome"*
The browser to use.
- ***headless*** : *(boolean), default=True*
Whether to run the browser without opening a window while downloading. Set headless=False to open the browser.
- ***loading_timeout*** : *(float), default=2*
Page loading timeout. Use a lower value on a fast connection and a higher value on a slow one.
- **Download images**
```easy_response.download(keywords, output_dir="easy_images_dir", max_limit=10, image_formats={".jpg", ".jpeg", ".png"}, remove_duplicates=False)```
- ***keywords*** : *(str / dict), e.g. "dogs, cats" or {"dogs": 100, "cats": 200}, default=Required*
Keywords for which images will be downloaded.
- ***output_dir*** : *(str), default="easy_images_dir"*
Output directory where images will be downloaded for each keyword.
- ***max_limit*** : *(int), default=10*
Maximum number of images to download.
- ***image_formats*** : *(set), default={".jpg", ".jpeg", ".png"}*
Supported image formats.
- ***remove_duplicates*** : *(boolean), default=False*
Whether to remove duplicate images while downloading. Set remove_duplicates=True to remove duplicates.
- **Post processing on images**
```easy_response.post_processing(image_dir, remove_duplicates=False, resize=None, grayscale=False, avg_image_size=False)```
- ***image_dir*** : *(str), e.g. "easy_images/dogs", default=Required*
Directory containing the images to process.
- ***remove_duplicates*** : *(boolean), default=False*
Whether to remove duplicate images from the directory. Set remove_duplicates=True to remove them.
- ***resize*** : *(tuple), e.g. (200, 200), default=None*
Target image size. If resize is a tuple of ints, the images are resized to it.
- ***grayscale*** : *(boolean), default=False*
Whether to convert the images in the directory to grayscale. Set grayscale=True to convert.
- ***avg_image_size*** : *(boolean), default=False*
Whether to calculate the average image size of all the images in the directory. Set avg_image_size=True to calculate.
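The README does not say how `remove_duplicates` recognizes duplicates irrespective of image size or resolution. One common technique with that property is an average hash: shrink every image to the same small grid, then compare the resulting bit patterns. The sketch below is only an illustration of that idea in pure Python, not the package's actual implementation. The names `average_hash`, `hamming`, and `find_duplicates` are hypothetical, and images are assumed to be already decoded into rows of grayscale values.

```python
from typing import Dict, List, Sequence

# Assumption: an image is given as rows of 0-255 grayscale pixel values.
Gray = Sequence[Sequence[int]]


def average_hash(pixels: Gray, hash_size: int = 8) -> int:
    """Return a (hash_size * hash_size)-bit average hash of a grayscale image."""
    height, width = len(pixels), len(pixels[0])
    # Nearest-neighbour downsample to a fixed grid so the resolution drops out.
    small = [
        pixels[(r * height) // hash_size][(c * width) // hash_size]
        for r in range(hash_size)
        for c in range(hash_size)
    ]
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        # Each bit records whether a sampled pixel is brighter than the mean.
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_duplicates(images: Dict[str, Gray], max_distance: int = 0) -> List[str]:
    """Return names of images whose hash is within max_distance of an earlier one."""
    kept: List[int] = []
    duplicates: List[str] = []
    for name, pixels in images.items():
        image_hash = average_hash(pixels)
        if any(hamming(image_hash, seen) <= max_distance for seen in kept):
            duplicates.append(name)
        else:
            kept.append(image_hash)
    return duplicates
```

Raising `max_distance` above 0 would also flag near-duplicates, at the cost of more false positives.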
## Limitations
**Note: This script/package will not work in Colab.**
This script downloads images at a size of approximately 200 x 200. This is because Google only allows downloading images at their rendered size. Only a few images can be downloaded at their original size. The original URLs of the images are encrypted, and with the encryption the image size is changed to a fixed size that is smaller than the original.
Please share your ideas to overcome these limitations. Let's build a beautiful Python script together that can help lots of people.
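To gauge how many files in a download are affected by the rendered-size limit, one option is to read the dimensions straight from the file headers. The following is a minimal sketch that assumes PNG files only (JPEG needs a different parser). The helper names `png_size` and `count_rendered_size` and the 200-pixel threshold are hypothetical and not part of this package.

```python
import struct
from typing import Iterable, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are two big-endian 32-bit integers right after "IHDR".
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def count_rendered_size(sizes: Iterable[Tuple[int, int]], limit: int = 200) -> int:
    """Count images whose longer side is at most `limit` pixels."""
    return sum(1 for width, height in sizes if max(width, height) <= limit)
```

Only the first 24 bytes of each file are needed, so `png_size(open(path, "rb").read(24))` avoids loading whole images.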
## Next Steps
The following next steps would improve the script:
- Find a method to download the images with original size.
- Build the script without selenium for fast downloading. Selenium is a bit slower.
- Add image similarity factor so that more relevant images can be downloaded.
- Optimize the overall script with additional functionalities for faster downloading of images.
- Add some more generic OpenCV functionalities. Please share your ideas if you have any.
**Everyone is welcome to contribute to this script. If you want to contribute, please write to me on [LinkedIn](https://www.linkedin.com/in/mohdsaqibhbi) or [Email]([email protected]) me.**
## Disclaimer
This Python script allows you to download hundreds of Google images. Please do not download or use any image in violation of its copyright. Google indexes images and makes them searchable. It does not create the images, so it does not own their copyright. The original creator of each image owns the copyright.
## LICENSE
This project is licensed under the terms of the [MIT license](LICENSE).
## Follow me
- Follow me on LinkedIn: [mohdsaqibhbi](https://www.linkedin.com/in/mohdsaqibhbi)
- Subscribe to my YouTube channel: [StarrAI](https://www.youtube.com/channel/UCooZBjTCrnM3LH1nIqAmDQA)