{"id":21660303,"url":"https://github.com/slub/entityfactspicturesharvester","last_synced_at":"2026-05-18T17:44:30.387Z","repository":{"id":150196096,"uuid":"202367902","full_name":"slub/entityfactspicturesharvester","owner":"slub","description":"a commandline command (Python3 program) that reads depiction information (images URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves and stores the pictures and thumbnails contained in this information","archived":false,"fork":false,"pushed_at":"2019-08-15T12:58:03.000Z","size":18,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-01-25T07:09:04.820Z","etag":null,"topics":["command-line-tool","dnb","entityfacts","entityfacts-sheets","gnd","json","line-delimited-json","pictures","python","thumbnails","wikimedia-commons"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-14T14:32:44.000Z","updated_at":"2023-12-16T01:41:45.000Z","dependencies_parsed_at":"2023-04-20T02:46:39.123Z","dependency_job_id":null,"html_url":"https://github.com/slub/entityfactspicturesharvester","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fentityfactspicturesharvester","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fentity
factspicturesharvester/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fentityfactspicturesharvester/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fentityfactspicturesharvester/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slub","download_url":"https://codeload.github.com/slub/entityfactspicturesharvester/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244555979,"owners_count":20471543,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","dnb","entityfacts","entityfacts-sheets","gnd","json","line-delimited-json","pictures","python","thumbnails","wikimedia-commons"],"created_at":"2024-11-25T09:32:46.015Z","updated_at":"2026-05-18T17:44:25.342Z","avatar_url":"https://github.com/slub.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# entityfactspicturesharvester - EntityFacts pictures harvester\n\nentityfactspicturesharvester is a commandline command (Python3 program) that reads depiction information (images URLs) from given [EntityFacts](https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/Entity-Facts/entity-facts_node.html) sheets* (as line-delimited JSON records) and retrieves and stores the pictures and thumbnails contained in this information\n\n*) EntityFacts are \"fact sheets\" on entities of the Integrated Authority File ([GND](https://www.dnb.de/EN/Professionell/Standardisierung/GND/gnd_node.html)), which is provided by 
the German National Library ([DNB](https://www.dnb.de/EN/Home/home_node.html))\n\n## Usage\n\nIt eats EntityFacts sheets as line-delimited JSON records from *stdin*.\n\nIt retrieves the pictures (/thumbnails) linked in the depiction information of the EntityFacts sheets and stores them one by one as files into the given directory.\n\n```\nentityfactspicturesharvester\n\noptional arguments:\n  -h, --help                           show this help message and exit\n```\n\n* example:\n    ```\n    entityfactspicturesharvester \u003c [INPUT LINE-DELIMITED JSON FILE WITH ENTITYFACTS SHEETS]\n    ```\n\n### Note\n\nEach (found) picture will be stored with the following pattern: ```image_[GND IDENTIFIER].[ORIGINAL FILE ENDING]```, e.g., ```image_116458461.jpg``` (GND identifier = 116458461; file ending = jpg)\n\nEach (found) thumbnail will be stored with the following pattern: ```thumbnail_[GND IDENTIFIER].[ORIGINAL FILE ENDING]```, e.g., ```thumbnail_172323940.png``` (GND identifier = 172323940; file ending = png)\n\n#### 429 responses\n\nIf you run into '429' responses (\"too many requests\", see, e.g., [HTTP status code 429 at httpstatuses.com](https://httpstatuses.com/429)), then you may try to reduce the number of threads of the thread pool schedulers (lines 31 and 32) and/or enable (and optionally adjust) the time delays before emitting the picture/thumbnail URLs (lines 68 and 146) and/or before doing a request (line 157).\n\n## Run\n\n* clone this git repo or just download the [entityfactspicturesharvester.py](entityfactspicturesharvester/entityfactspicturesharvester.py) file\n* run ./entityfactspicturesharvester.py\n* for a hackish way to use entityfactspicturesharvester system-wide, copy to /usr/local/bin\n\n### Install system-wide via pip\n\n```\nsudo -H pip3 install --upgrade [ABSOLUTE PATH TO YOUR LOCAL GIT REPOSITORY OF ENTITYFACTSPICTURESHARVESTER]\n```\n(which provides you with ```entityfactspicturesharvester``` as a system-wide commandline command)\n\n## See Also\n\n* 
[entityfactssheetsharvester](https://github.com/slub/entityfactssheetsharvester) - a commandline command (Python3 program) that retrieves EntityFacts sheets from a given CSV with GND identifiers and returns them as line-delimited JSON records\n* [entityfactspicturesmetadataharvester](https://github.com/slub/entityfactspicturesmetadataharvester) - a commandline command (Python3 program) that reads depiction information (images URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves the (Wikimedia Commons file) metadata of these pictures (as line-delimited JSON records)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Fentityfactspicturesharvester","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslub%2Fentityfactspicturesharvester","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Fentityfactspicturesharvester/lists"}