https://github.com/milesmcc/app-scraper
Scraping tool for UCSB's American Presidency Project
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/milesmcc/app-scraper
- Owner: milesmcc
- License: gpl-3.0
- Created: 2019-03-28T02:26:21.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-04-01T14:37:51.000Z (about 6 years ago)
- Last Synced: 2025-04-09T22:55:20.687Z (3 months ago)
- Language: Python
- Size: 27.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# American Presidency Project Scraper
Unfortunately, no suitable tool exists for exporting search results from UCSB's database of presidential documents for aggregate use. This repository provides a collection of Python 3 scripts that, together, constitute a scraping tool for UCSB's [American Presidency Project](https://www.presidency.ucsb.edu) (APP).

Before using these scripts, note the following:
> * Scraping can put significant load on APP's servers. Choose a reasonable query delay.
> * Scraping is not a future-proof way of gathering data. These tools may break at any time.

### How To Use
Use `gather_documents.py` to scrape documents from a text file containing their URLs (one per line) located at `./documents.txt`. Use `gather_search_results.py` to scrape the search results page.
See each script's internal documentation for more information.
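
The workflow described above (read URLs from `./documents.txt`, then fetch each one with a delay between requests) can be sketched roughly as follows. This is an illustrative sketch only, not the code in `gather_documents.py`: the function names `read_urls`, `fetch_html`, and `fetch_all`, the `delay_seconds` parameter, and its 2-second default are all assumptions made for this example.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def read_urls(path: str = "./documents.txt") -> List[str]:
    """Return the non-empty lines of a URL list file (one URL per line)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_html(url: str) -> str:
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(
    urls: Iterable[str],
    delay_seconds: float = 2.0,  # hypothetical default; pick a polite value
    fetch: Callable[[str], str] = fetch_html,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in order, pausing between consecutive requests."""
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first to limit server load.
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

Under these assumptions, `fetch_all(read_urls(), delay_seconds=2.0)` would return a dictionary mapping each URL to its raw HTML. Passing `fetch` and `sleep` as parameters is a design choice for this sketch that lets the loop be exercised without touching the network.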
---
This tool is licensed under the GNU General Public License v3. Created by Miles McCain for research with Sarah Kreps.