https://github.com/wragge/trove_newspaper_images
- Host: GitHub
- URL: https://github.com/wragge/trove_newspaper_images
- Owner: wragge
- License: mit
- Created: 2021-10-03T06:03:58.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-04-16T05:40:39.000Z (9 months ago)
- Last Synced: 2024-09-28T00:01:06.662Z (3 months ago)
- Language: Jupyter Notebook
- Size: 8.72 MB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# trove-newspaper-images
## Background and alternatives
There’s no reliable way of downloading an image of a Trove newspaper
article from the web interface. The image download option produces an
HTML page with embedded images, and the article is often sliced into
pieces to fit the page.

This package includes tools to download articles as complete JPEG
images. If an article is printed across multiple newspaper pages,
multiple images will be downloaded – one for each page. It’s intended
for integration into other tools and processing workflows, or for people
who like working on the command line.

If you just want to quickly download an article as an image without
installing anything, you can [use this web
app](https://glam-workbench.net/trove-newspapers/#save-a-trove-newspaper-article-as-an-image)
in the GLAM Workbench. To download images of all articles returned by a
search in Trove, you can also use the [Trove Newspaper and Gazette
Harvester](https://glam-workbench.net/trove-harvester/).

See the
[documentation](https://wragge.github.io/trove_newspaper_images/) for
more information.

## Install
`pip install trove-newspaper-images`
## Download articles as images
### Use as a library
``` python
from trove_newspaper_images.articles import download_images

images = download_images('107024751')
images
```

    ['nla.news-article107024751-11565831.jpg']

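
To fetch several articles in one go, the call above can be wrapped in a small loop. The sketch below is not part of the package: `download_many` is a hypothetical helper, and it assumes only what the example above shows, namely that `download_images` takes an article identifier and returns a list of image file names. The download function is passed in as an argument so the helper can be tried out with a stand-in.

```python
from typing import Callable, Dict, Iterable, List


def download_many(
    article_ids: Iterable[str],
    downloader: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: call `downloader` once per article id.

    Returns a dict mapping each article id to the list of file names
    reported by `downloader`. A failed download is recorded as an empty
    list so that one bad identifier does not stop the rest.
    """
    results: Dict[str, List[str]] = {}
    for article_id in article_ids:
        try:
            results[article_id] = list(downloader(article_id))
        except Exception as err:  # deliberately broad: this is only a sketch
            print(f"Could not download {article_id}: {err}")
            results[article_id] = []
    return results
```

With the package installed, `download_many(['107024751'], download_images)` should return a dictionary with one entry per identifier. An article printed across several newspaper pages would be expected to have several file names in its list.
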
### Use from the command line
Just call `trove_newspaper_images.download` from the command line with
an article identifier. You can use the `--output_dir` parameter to
specify a directory for the downloaded images. For example:

``` shell
trove_newspaper_images.download 107024751 --output_dir images
```

Add the `--masked` parameter to try and remove content from neighbouring
articles.

``` shell
trove_newspaper_images.download 107024751 --masked
```

------------------------------------------------------------------------

Created by [Tim Sherratt](https://timsherratt.org)
([@wragge](https://twitter.com/wragge)) for the [GLAM
Workbench](https://glam-workbench.net/).