{"id":26694434,"url":"https://github.com/arutselvan/imgscrapy","last_synced_at":"2025-04-13T00:35:21.969Z","repository":{"id":45439237,"uuid":"81747442","full_name":"Arutselvan/ImgScrapy","owner":"Arutselvan","description":"A simple and fast CLI for multithreaded image scraping with support for headless scraping of dynamic websites.","archived":false,"fork":false,"pushed_at":"2023-05-22T23:27:22.000Z","size":25,"stargazers_count":3,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-13T00:34:48.854Z","etag":null,"topics":["cli","downloader","image-downloader","image-downloader-python","image-scraper","python","scrapper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Arutselvan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-02-12T18:32:33.000Z","updated_at":"2022-10-16T17:36:02.000Z","dependencies_parsed_at":"2022-08-31T08:52:52.630Z","dependency_job_id":null,"html_url":"https://github.com/Arutselvan/ImgScrapy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arutselvan%2FImgScrapy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arutselvan%2FImgScrapy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arutselvan%2FImgScrapy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arutselvan%2FImgScrapy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Arutselvan",
"download_url":"https://codeload.github.com/Arutselvan/ImgScrapy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248650456,"owners_count":21139670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","downloader","image-downloader","image-downloader-python","image-scraper","python","scrapper"],"created_at":"2025-03-26T18:29:31.455Z","updated_at":"2025-04-13T00:35:21.949Z","avatar_url":"https://github.com/Arutselvan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# imgscrapy\n![Version](https://img.shields.io/pypi/v/imgscrapy)\n![Downloads](https://img.shields.io/pypi/dm/imgscrapy?color=green)\n![License](https://img.shields.io/pypi/l/imgscrapy)\n\nA simple CLI image scraper written in python with support for headless scraping of dynamic websites.\n\n#### Installation\n##### Build from source\n+ `git clone https://github.com/arutselvan/ImgScrapy`\n+ `cd ImgScrapy`\n+ `python setup.py install`\n\n##### As a Python package\n```\npip install --user imgscrapy\n```\n\n#### Requirements\npython\u003e=3.6\n\n#### Usage\n```\nusage: imgscrapy [-h] [-d DIRECTORY] [-i] [-n NFIRST] [-t NTHREADS] [-hd] [-to TIMEOUT] target_url\n\nDownloads images from the given URL\n\npositional arguments:\n  target_url            URL to scrape images from\noptional arguments:\n  -h, --help            show this help message and exit\n  -d DIRECTORY, --directory DIRECTORY\n                        Directory in which images should be downloaded\n  -i, --injected        Scrape images from a dynamic 
website and JS injected images\n  -n NFIRST, --nfirst NFIRST\n                        Scrape the first n images\n  -t NTHREADS, --nthreads NTHREADS\n                        Maximum number of threads to use\n  -hd, --head           Open chromium for scraping JS injected source/images\n  -to TIMEOUT, --timeout TIMEOUT\n                        Timeout value for obtaining page source\n```\n#### Examples\n\n+ Download all images from a static website\n```\nimgscrapy \u003cTarget URL\u003e\n```\n+ Download the first 5 images from a dynamic website\n```\nimgscrapy \u003cTarget URL\u003e -i --nfirst 5\n```\n\n##### Note\nImgScrapy uses [pyppeteer](https://github.com/miyakogi/pyppeteer), which uses Chromium for headless scraping. When scraping a dynamic website for the first time, Chromium will be downloaded automatically, which might take some time.\n\n#### To Do\n+ Write tests\n+ Add support for Base64 images\n+ Add support for embedded/inline SVG files\n+ Fix issues with headless browsing of dynamic sites with modals/popups\n+ Fix issue with missing trailing slash in URL resolution\n+ Add option to dump URLs of downloaded/failed images\n\nLicense\n----\n\nMIT\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farutselvan%2Fimgscrapy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farutselvan%2Fimgscrapy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farutselvan%2Fimgscrapy/lists"}