https://github.com/living-with-machines/hmd_newspaper_dl
Bulk download British Library Heritage Made Digital Newspapers 📰
dataset glam nbdev newspapers
- Host: GitHub
- URL: https://github.com/living-with-machines/hmd_newspaper_dl
- Owner: Living-with-machines
- License: mit
- Created: 2021-02-23T09:30:49.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-16T14:54:12.000Z (over 2 years ago)
- Last Synced: 2024-07-30T18:48:41.139Z (6 months ago)
- Topics: dataset, glam, nbdev, newspapers
- Language: Jupyter Notebook
- Homepage:
- Size: 220 KB
- Stars: 3
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# hmd_download
> Bulk download Heritage Made Digital digitised newspapers from the British Library Research Repository

This [command line tool](https://en.wikipedia.org/wiki/Command-line_interface) is intended to make it easy to bulk download [Heritage Made Digital Newspapers](https://bl.iro.bl.uk/collections/353c908d-b495-4413-b047-87236d2573e3?utf8=%E2%9C%93&sort=score+desc%2C+system_create_dtsi+desc&per_page=100&locale=en) from the [British Library](https://www.bl.uk/) [Research Repository](https://bl.iro.bl.uk/).
The tool has been used by Living with Machines but may also be useful to others. Since it is intended to download the collection in bulk, it is most likely to be useful if you want either:
- all HMD newspapers
- a random sample, e.g. 10 newspapers

This tool was developed for internal use, so it might not be suitable for your needs. If you have problems with the tool or suggestions for improving it, please [open an issue](https://github.com/Living-with-machines/hmd_newspaper_dl/issues/new/choose).
## Install
The tool was developed using `nbdev`, so although all of its code lives inside a single Jupyter notebook, you can still install it as a Python package. At the moment this is done via GitHub:
```bash
python -m pip install git+https://github.com/Living-with-machines/hmd_newspaper_dl
```

It is recommended to install the package inside a virtual environment. Since this is a command line tool, one simple option is [pipx](https://pypa.github.io/pipx/), which will install the tool inside a new virtual environment for you:
```bash
pipx install git+https://github.com/Living-with-machines/hmd_newspaper_dl
```

## How to use
Once you have installed the package, a console script `hmd_download` will be available:

    usage: hmd_download [-h] [--n_threads N_THREADS] [--subset SUBSET] save_dir

    Download HMD newspaper from iro to `save_dir` using `n_threads`

    positional arguments:
      save_dir               Output Directory

    optional arguments:
      -h, --help             show this help message and exit
      --n_threads N_THREADS  Number threads to use (default: 8)
      --subset SUBSET        Download subset of HMD

By default, this will download all available newspaper titles. If you just want a subset, you can pass the `--subset` parameter to specify how many titles you want. At the moment this is just a random selection.
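
If you want to script repeated downloads, the command can also be assembled and run from Python. The sketch below is only an illustration and is not part of `hmd_newspaper_dl`: the helper names `build_command` and `run_download` are hypothetical, and it assumes `--subset` takes an integer number of titles, as the description above suggests. It uses only the arguments listed in the help text.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(
    save_dir: str, n_threads: int = 8, subset: Optional[int] = None
) -> List[str]:
    """Build the argument list for the `hmd_download` console script.

    Hypothetical helper: it only mirrors the flags shown in the help text.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    cmd = ["hmd_download", "--n_threads", str(n_threads)]
    if subset is not None:
        # Assumption: --subset is the number of titles to sample at random.
        cmd += ["--subset", str(subset)]
    cmd.append(save_dir)
    return cmd


def run_download(
    save_dir: str, n_threads: int = 8, subset: Optional[int] = None
) -> subprocess.CompletedProcess:
    """Run `hmd_download`, failing early if the script is not on PATH."""
    if shutil.which("hmd_download") is None:
        raise RuntimeError("hmd_download not found on PATH; install the package first")
    return subprocess.run(build_command(save_dir, n_threads, subset), check=True)


if __name__ == "__main__":
    # Show the command for a random sample of 10 titles using 4 threads.
    print(" ".join(build_command("hmd_newspapers", n_threads=4, subset=10)))
```

Running the script prints `hmd_download --n_threads 4 --subset 10 hmd_newspapers`, which is the same command you would type in a shell. Calling `run_download("hmd_newspapers", subset=10)` would start the download itself.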
## Feedback
This tool was put together for internal use within Living with Machines but is shared in case it is helpful for other people. If you have feedback or problems, or want to suggest changes, please open a [new issue](https://github.com/Living-with-machines/hmd_newspaper_dl/issues/new/choose).