https://github.com/slub/entityfactspicturesharvester
a commandline command (Python3 program) that reads depiction information (images URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves and stores the pictures and thumbnails contained in this information
command-line-tool dnb entityfacts entityfacts-sheets gnd json line-delimited-json pictures python thumbnails wikimedia-commons
- Host: GitHub
- URL: https://github.com/slub/entityfactspicturesharvester
- Owner: slub
- License: apache-2.0
- Created: 2019-08-14T14:32:44.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-08-15T12:58:03.000Z (almost 7 years ago)
- Last Synced: 2025-01-25T07:09:04.820Z (over 1 year ago)
- Topics: command-line-tool, dnb, entityfacts, entityfacts-sheets, gnd, json, line-delimited-json, pictures, python, thumbnails, wikimedia-commons
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 2
- Watchers: 8
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# entityfactspicturesharvester - EntityFacts pictures harvester
entityfactspicturesharvester is a command-line command (Python 3 program) that reads depiction information (image URLs) from given [EntityFacts](https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/Entity-Facts/entity-facts_node.html) sheets* (as line-delimited JSON records) and retrieves and stores the pictures and thumbnails contained in this information.
*) EntityFacts are "fact sheets" on entities of the Integrated Authority File ([GND](https://www.dnb.de/EN/Professionell/Standardisierung/GND/gnd_node.html)), which is provided by the German National Library ([DNB](https://www.dnb.de/EN/Home/home_node.html)).
## Usage
It reads EntityFacts sheets as line-delimited JSON records from *stdin*.
It retrieves the pictures (and thumbnails) linked in the depiction information of the EntityFacts sheets one by one and stores each of them as a file in the given directory.
```
entityfactspicturesharvester
optional arguments:
-h, --help show this help message and exit
```
* example:
```
entityfactspicturesharvester < [INPUT LINE-DELIMITED JSON FILE WITH ENTITYFACTS SHEETS]
```
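To illustrate the stdin-driven flow described above, here is a minimal Python sketch of the first step: turning line-delimited EntityFacts sheets into (GND identifier, picture URL, thumbnail URL) triples. It is not the tool's actual code. The field layout is an assumption: the sheet's `@id` is taken to be the GND URI ending in the identifier, `depiction` a list of objects whose `@id` is the picture URL, and `thumbnail` an object whose `@id` is the thumbnail URL. `extract_depictions` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def extract_depictions(
    lines: Iterable[str],
) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Yield (gnd_id, picture_url, thumbnail_url) for every depiction found.

    Assumed (not confirmed) EntityFacts layout:
      "@id": "https://d-nb.info/gnd/<GND IDENTIFIER>"
      "depiction": [{"@id": <picture URL>, "thumbnail": {"@id": <thumbnail URL>}}]
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        sheet = json.loads(line)
        # The GND identifier is assumed to be the last path segment of the sheet URI.
        gnd_id = sheet.get("@id", "").rstrip("/").rsplit("/", 1)[-1]
        depictions = sheet.get("depiction", [])
        if isinstance(depictions, dict):
            depictions = [depictions]  # tolerate a single object instead of a list
        for depiction in depictions:
            thumbnail = depiction.get("thumbnail") or {}
            yield gnd_id, depiction.get("@id"), thumbnail.get("@id")
```

Reading from a pipe would then be a matter of iterating over `extract_depictions(sys.stdin)`; sheets without a `depiction` field simply yield nothing.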
### Note
Each (found) picture will be stored with the following file name pattern: ```image_[GND IDENTIFIER].[ORIGINAL FILE ENDING]```, e.g., ```image_116458461.jpg``` (GND identifier = 116458461; file ending = jpg).
Each (found) thumbnail will be stored with the following file name pattern: ```thumbnail_[GND IDENTIFIER].[ORIGINAL FILE ENDING]```, e.g., ```thumbnail_172323940.png``` (GND identifier = 172323940; file ending = png).
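The naming pattern can be expressed as a small function. This is a sketch, assuming the file ending is read from the last path segment of the picture or thumbnail URL (so a query string such as `?width=270` is ignored); the real tool might derive it differently, e.g., from the response's content type. `target_filename` is a hypothetical name.

```python
import posixpath
from urllib.parse import urlparse


def target_filename(kind: str, gnd_id: str, url: str) -> str:
    """Build e.g. "image_116458461.jpg" or "thumbnail_172323940.png".

    kind is "image" or "thumbnail". The original file ending is assumed to be
    the extension of the URL path; the query string is not part of the path.
    """
    path = urlparse(url).path
    _, ext = posixpath.splitext(path)  # ext keeps its leading dot, e.g. ".jpg"
    return f"{kind}_{gnd_id}{ext}"
```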
#### 429 responses
If you run into '429' responses ("too many requests", see, e.g., [HTTP status code 429 at httpstatuses.com](https://httpstatuses.com/429)), you may try to reduce the number of threads of the thread pool schedulers (lines 31 and 32 of entityfactspicturesharvester.py), and/or enable (and optionally adjust) the time delays before emitting the picture/thumbnail URLs (lines 68 and 146) and/or before doing a request (line 157).
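The throttling advice above can be sketched as a request helper that waits before each request and backs off when the server answers 429. This is an illustration using only the Python standard library (`urllib.request`), not the tool's own scheduler-based implementation; the helper name `fetch_with_backoff`, the default delay, the retry count and the doubling strategy are all assumptions. The `opener` and `sleep` parameters exist only so the helper can be exercised without network access.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_with_backoff(
    url: str,
    delay: float = 1.0,
    max_retries: int = 5,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Fetch url, pausing before every request and backing off on HTTP 429.

    delay, max_retries and the doubling backoff are illustrative defaults,
    not settings taken from entityfactspicturesharvester.
    """
    wait = delay
    for attempt in range(max_retries + 1):
        sleep(wait)  # throttle before every request, including the first
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            if error.code != 429 or attempt == max_retries:
                raise  # other errors, or retries exhausted
            # Honour a numeric (delay-seconds) Retry-After header if present,
            # otherwise double the previous wait (exponential backoff).
            retry_after = error.headers.get("Retry-After") if error.headers else None
            wait = float(retry_after) if retry_after and retry_after.isdigit() else wait * 2
    raise ValueError("max_retries must be non-negative")
```

Letting the server's `Retry-After` value win over the local backoff avoids guessing how long to pause; a smaller thread pool additionally reduces how many such requests run in parallel in the first place.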
## Run
* clone this git repo or just download the [entityfactspicturesharvester.py](entityfactspicturesharvester/entityfactspicturesharvester.py) file
* run ./entityfactspicturesharvester.py
* for a hackish way to use entityfactspicturesharvester system-wide, copy it to /usr/local/bin
### Install system-wide via pip
```
sudo -H pip3 install --upgrade [ABSOLUTE PATH TO YOUR LOCAL GIT REPOSITORY OF ENTITYFACTSPICTURESHARVESTER]
```
(which provides you ```entityfactspicturesharvester``` as a system-wide command-line command)
## See Also
* [entityfactssheetsharvester](https://github.com/slub/entityfactssheetsharvester) - a commandline command (Python3 program) that retrieves EntityFacts sheets from a given CSV with GND identifiers and returns them as line-delimited JSON records
* [entityfactspicturesmetadataharvester](https://github.com/slub/entityfactspicturesmetadataharvester) - a commandline command (Python3 program) that reads depiction information (image URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves the (Wikimedia Commons file) metadata of these pictures (as line-delimited JSON records)