https://github.com/onyazuka/cool-images-scraper
php images scraper
- Host: GitHub
- URL: https://github.com/onyazuka/cool-images-scraper
- Owner: onyazuka
- License: mit
- Created: 2019-05-17T14:37:15.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-06-21T17:50:19.000Z (over 2 years ago)
- Last Synced: 2023-03-21T10:58:49.493Z (almost 2 years ago)
- Topics: image, images, php, scraper, web
- Language: PHP
- Size: 75.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Cool-Images-Scraper
PHP images scraper

## Description
PHP script for scraping web images.

## Usage
The script accepts one argument (in argv), the path to a JSON options file: `main.php options.json`

## Options
- urls [array], MANDATORY - list of URLs to scrape from;
- outputDir [string or array], MANDATORY - directory, or list of directories, in which downloaded images will be stored;
- recursive [boolean] - if set, follows all available links (in a href=...); if not, grabs images only from the given page;
- depth [integer] - maximum level of recursion;
- whiteList [array of regex] - patterns of allowed URLs;
- blackList [array of regex] - patterns of disallowed URLs;
- path [array of regex] - patterns describing the scraper's work path. If set, the scraper works in recursive mode, with depth = count($options['path']). On the first level of recursion path[0] is used, on the second path[1], and so on;
- imageNamePatters [array of regex] - patterns for the names of images that should be saved;
- maxImageSize [numeric] - an image file larger than this value will not be saved;
- minImageSize [numeric] - an image file smaller than this value will not be saved;
- fileRewrite [bool] - if set and true, overwrites a file in 'outputDir' on a name conflict; otherwise skips the new file and preserves the old one;
- createDirIfNotExists [string] - ONE OF 'simple' or 'recursive'; creates outputDir if it does not exist;
- initCookies [array or string] - either a key-value array or a string in the format "remixflash=32.0.0; remixscreen_depth=24;";
- cookieJar [string] - name of the file in which session cookies will be stored;
- additionalHeaders [array] - must be in the format array('Content-type: text/plain', 'Content-length: 100').

## Examples
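As a rough illustration of how the options above fit together, a minimal options file might look like the sketch below. Every value in it (the URL, directory name, patterns, cookie, header and size) is a hypothetical placeholder and is not taken from the repository's own example files. The slash-delimited regex style and the assumption that sizes are in bytes are guesses, so check them against the script before relying on them.

```json
{
  "urls": ["https://example.com/gallery"],
  "outputDir": "images",
  "recursive": true,
  "depth": 2,
  "whiteList": ["/example\\.com\\/gallery/"],
  "blackList": ["/\\/login/"],
  "minImageSize": 10240,
  "fileRewrite": false,
  "createDirIfNotExists": "recursive",
  "initCookies": {"session": "abc123"},
  "additionalHeaders": ["Accept-Language: en-US"]
}
```

Only urls and outputDir are mandatory; the remaining keys can be dropped to fall back on the script's default behaviour.
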
You can view example configuration files at examples/options%.json.

## Todo
Multithreading, JS support (if this can be done).