Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/freedomofpress/gotham-grabber
Last synced: 7 days ago
- Host: GitHub
- URL: https://github.com/freedomofpress/gotham-grabber
- Owner: freedomofpress
- License: mit
- Created: 2017-11-08T01:15:01.000Z (about 7 years ago)
- Default Branch: main
- Last Pushed: 2024-02-24T20:29:30.000Z (9 months ago)
- Last Synced: 2024-02-24T21:34:34.129Z (9 months ago)
- Language: Python
- Homepage:
- Size: 59.6 KB
- Stars: 39
- Watchers: 7
- Forks: 7
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# gotham grabber
`gotham-grabber` is a set of scripts originally written to take the URL of a writer page on a site in the Gothamist/DNAinfo network and produce a collection of attractive PDFs, one per article. It was created after the sites were abruptly shut down on Thursday, November 2, 2017. The former editor-in-chief of LAist, one of the sites in the Gothamist network, has [written about the significance of that shutdown](https://www.citylab.com/life/2017/11/gothamist-dnainfo-joe-ricketts-shutdown/545069/).
Since the project's inception, the scripts have been expanded to support author pages from the following news sites:
- Gothamist (and other sites in the -ist network)
- DNAinfo
- LA Weekly
- Newsweek
- Kinja

An outer Python script, `gothamgrabber.py`, takes an author page URL as an argument with the flag `--url`, creates a directory in the `out` subfolder where it runs, and saves a list of article URLs. (If that list of URLs already exists, `gothamgrabber.py` can take it as input instead, using the `-t` or `--textfile` option.) It then invokes `grabber.js`, a Node script that drives a headless Chrome instance to capture and format articles as PDFs.
`grabber.js` can be invoked independently. It requires an argument with the flag `--url` and accepts an argument with the flag `--outdir`.
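
Because `grabber.js` can run on its own, a saved list of article URLs can also be fed to it directly. The following is a minimal sketch of that idea, not the code in `gothamgrabber.py`. The helper names (`read_urls`, `build_grabber_command`, `grab_all`) are hypothetical, and it assumes the script is started with `node grabber.js` from the repo root and that the text file holds one URL per line. Only the `--url` and `--outdir` flags come from the description above.

```python
import subprocess
from pathlib import Path


def read_urls(textfile):
    """Return the non-empty, stripped lines of a text file of article URLs.

    Assumes one URL per line; the real file format may differ.
    """
    lines = Path(textfile).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_grabber_command(url, outdir=None):
    """Build the argument list for a single grabber.js run.

    --url is required and --outdir is optional, as described in this README.
    Launching via `node grabber.js` is an assumption.
    """
    command = ["node", "grabber.js", "--url", url]
    if outdir is not None:
        command += ["--outdir", str(outdir)]
    return command


def grab_all(textfile, outdir):
    """Run grabber.js once per URL and return the URLs whose run failed."""
    failed = []
    for url in read_urls(textfile):
        result = subprocess.run(build_grabber_command(url, outdir))
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Passing the command as a list rather than a shell string means URLs containing characters such as `&` or `?` need no extra quoting.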
Each script has dependencies that must be installed before it will run. To install them, clone this repo and run:
```bash
npm install
pip install -r requirements.txt
```

The scripts should then be ready to run.