https://github.com/thaoshibe/crawl-original-google-images
python scripts for crawling original image from Google Images
- Host: GitHub
- URL: https://github.com/thaoshibe/crawl-original-google-images
- Owner: thaoshibe
- Created: 2020-06-07T05:50:58.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-05-05T05:09:18.000Z (over 2 years ago)
- Last Synced: 2024-09-29T03:22:40.430Z (4 months ago)
- Topics: chrome-extension, crawler, crawling, crawling-python, google, google-images, pafy, scraper, youtube, youtube-dl, youtube-search
- Language: Python
- Homepage:
- Size: 15.6 KB
- Stars: 21
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# Crawl Original Google Images & YouTube Videos
---
This repo contains code to crawl images and videos:
- ORIGINAL images from Google Search
- ORIGINAL videos from YouTube

### Requirements
1. **ChromeDriver**
- [Check your current Google Chrome Version](https://www.businessinsider.com/what-version-of-google-chrome-do-i-have)
- Download the ChromeDriver release that matches your Chrome version from [ChromeDriver](https://chromedriver.chromium.org/downloads) and unzip it. For example, I'm using Chrome version `95.0.4638.69` on Linux, so I downloaded [`chromedriver_linux64.zip`](https://chromedriver.storage.googleapis.com/index.html?path=95.0.4638.69/).
1. **Environment**
`conda env create -f environment.yml`

### Crawl Images from Google Image Search
Download original images (not thumbnails) from Google Image Search with **multi-threading** :D
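
As a rough illustration of the multi-threaded download step, here is a minimal sketch built on Python's standard-library `ThreadPoolExecutor`. It is not the code in `crawl_data.py`: the function names (`fetch`, `download_all`), the `.jpg` file naming, and the default worker count are all assumptions made for this example.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(url, timeout=10):
    """Return the raw bytes behind one URL (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, save_dir, fetch_fn=fetch, max_workers=8):
    """Download every URL into save_dir using a pool of worker threads.

    Returns a dict mapping each URL to its saved path, or to None on failure.
    """
    os.makedirs(save_dir, exist_ok=True)

    def worker(index, url):
        data = fetch_fn(url)
        # Hypothetical naming scheme; the real extension may differ per image.
        path = os.path.join(save_dir, f"{index:06d}.jpg")
        with open(path, "wb") as f:
            f.write(data)
        return path

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, i, u): u for i, u in enumerate(urls)}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception:
                # One broken URL should not stop the remaining downloads.
                results[url] = None
    return results
```

Threads suit this workload because each download spends most of its time waiting on the network. Passing `fetch_fn` as a parameter makes it possible to try the function without touching the network.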
1. Get URLs by keywords
```
python crawl_url.py
```
1. Download images from URLs
```
python crawl_data.py
```

### Crawl Videos from YouTube
1. Get URLs by keywords
```
python crawl_youtube_link.py
```
1. Download videos from URLs
```
python crawl_videos.py
python crawl_videos.py --metadata --thumbnail # thumbnail and metadata only
```

##### To-do
- [x] Init
- [x] Multithreading
- [x] Requirements
- [x] Write Guideline
- [ ] Add an argument parser for save_dirs, chromedriver, etc.
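
The last to-do item could be approached with the standard-library `argparse` module. The sketch below is only one possible shape: the flag names (`--save_dir`, `--chromedriver`, `--num_threads`) and their defaults are hypothetical and do not exist in the current scripts.

```python
import argparse


def build_parser():
    """Build a command-line parser for the crawler scripts (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Crawl original images from Google Images"
    )
    parser.add_argument(
        "--save_dir", default="./data",
        help="directory where downloaded files are written (assumed default)",
    )
    parser.add_argument(
        "--chromedriver", default="./chromedriver",
        help="path to the unzipped ChromeDriver binary (assumed default)",
    )
    parser.add_argument(
        "--num_threads", type=int, default=8,
        help="number of download threads (assumed default)",
    )
    return parser


# Parse an explicit argument list here so the example is deterministic;
# a real script would call build_parser().parse_args() with no arguments.
args = build_parser().parse_args(["--save_dir", "images"])
print(args.save_dir, args.chromedriver, args.num_threads)
```

With a parser like this, paths that are presumably set inside the scripts today could be supplied on the command line instead.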