https://github.com/milesmcc/app-scraper

Scraping tool for UCSB's American Presidency Project

Host: GitHub
URL: https://github.com/milesmcc/app-scraper
Owner: milesmcc
License: gpl-3.0
Created: 2019-03-28T02:26:21.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-04-01T14:37:51.000Z (about 6 years ago)
Last Synced: 2025-04-09T22:55:20.687Z (3 months ago)
Language: Python
Size: 27.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# American Presidency Project Scraper
Unfortunately, no suitable tool exists for exporting search results from UCSB's database of presidential documents for aggregate use. This repository provides a collection of Python 3 scripts that, together, constitute a scraping tool for UCSB's [American Presidency Project](https://www.presidency.ucsb.edu) (APP).

Before using these scripts, note the following:

> * Scraping can put significant load on APP's servers. Choose a reasonable query delay.
> * Scraping is not a future-proof way of gathering data. These tools may break at any time.
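
As a rough illustration of what a "reasonable query delay" can look like, the sketch below pauses between consecutive requests. It is not taken from this repository's scripts: `fetch_url`, `fetch_with_delay`, and the 2-second default are hypothetical names and values chosen for the example, and only the Python standard library is assumed.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def fetch_url(url: str, timeout: float = 30.0) -> str:
    """Download one page and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_with_delay(
    urls: Iterable[str],
    delay_seconds: float = 2.0,  # assumed default, tune to keep server load low
    fetch: Callable[[str], str] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, body) pairs, sleeping between consecutive requests."""
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        yield url, fetch(url)
```

Passing `fetch` and `sleep` as parameters is a design choice for this sketch only: it lets the pacing logic be exercised without touching APP's servers.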

### How To Use

Use `gather_documents.py` to scrape the documents whose URLs are listed, one per line, in a text file located at `./documents.txt`. Use `gather_search_results.py` to scrape the search results page.

See each script's internal documentation for more information.
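
For orientation, the input format described above (one URL per line in `./documents.txt`) could be read along these lines. This is a minimal sketch, not the code in `gather_documents.py`: the function name `read_url_list` is hypothetical, and stripping whitespace and skipping blank lines are assumptions about how such a file is best handled.

```python
from pathlib import Path
from typing import List


def read_url_list(path: str = "./documents.txt") -> List[str]:
    """Return the URLs from a one-per-line text file.

    Assumption for this sketch: surrounding whitespace is stripped
    and blank lines are ignored.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

A caller would then loop over the returned list and request each document in turn, keeping a delay between requests as noted earlier in this README.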

---

This tool is licensed under the GNU General Public License v3. Created by Miles McCain for research with Sarah Kreps.
