https://github.com/nathan-lindstedt/coal_study

ProQuest Index Scraping w/ Selenium and Beautiful Soup

beatifulsoup climate-change coal selenium-python sql

Host: GitHub
URL: https://github.com/nathan-lindstedt/coal_study
Owner: nathan-lindstedt
License: mit
Created: 2019-12-17T04:33:41.000Z
Default Branch: master
Last Pushed: 2020-06-22T01:09:21.000Z
Last Synced: 2025-01-02T14:12:26.590Z
Topics: beatifulsoup, climate-change, coal, selenium-python, sql
Language: Python
Homepage:
Size: 7.75 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# coal_study

This project parses and scrapes text data on climate change and coal energy from the ProQuest Congressional and ProQuest Newspaper indexes, using the Selenium and Beautiful Soup packages for Python.

The project includes four main component files: **coal_query.sql**, **congress_parser.py**, **congress_scraper.py**, and **news_parser.py**.

The script **coal_query.sql** contains the PostgreSQL queries used to create the tables and views needed for storage and analysis of text data.
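As a rough illustration of the storage layer, the sketch below creates three tables through any Python DB-API connection. The table and column names (`congress_index`, `congress_text`, `news_text`, and their columns) are assumptions chosen to mirror the fields described in this README; the actual definitions, including the views, live in coal_query.sql and may differ.

```python
# Hypothetical schema: names are illustrative, not taken from coal_query.sql.
SCHEMA = (
    """
    CREATE TABLE IF NOT EXISTS congress_index (
        url TEXT PRIMARY KEY,
        pub_date DATE
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS congress_text (
        url TEXT PRIMARY KEY,
        title TEXT,
        committee TEXT,
        body TEXT
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS news_text (
        article_id BIGINT PRIMARY KEY,
        body TEXT
    )
    """,
)


def create_tables(conn):
    """Run each CREATE TABLE statement on a DB-API connection and commit."""
    cur = conn.cursor()
    for statement in SCHEMA:
        cur.execute(statement)
    conn.commit()
```

With psycopg2 installed, a connection could be opened with something like `psycopg2.connect(host="localhost", dbname="coal_study", user="...", password="...")` (the database name here is a placeholder) and passed to `create_tables`.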

The script **congress_parser.py** parses text data from ProQuest Congressional and inserts the publication date and URL of each Congressional hearing transcript into a PostgreSQL database on localhost.
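The parsing step might look roughly like the sketch below, which pulls a date and a link out of each record with Beautiful Soup. The markup in `SAMPLE_HTML`, the CSS class names, and the date format are all assumptions made for illustration; the real ProQuest Congressional export is not shown in this README and congress_parser.py may work differently.

```python
from datetime import datetime

# Hypothetical export fragment; the real ProQuest markup will differ.
SAMPLE_HTML = """
<div class="record">
  <span class="date">March 5, 2009</span>
  <a class="doc-link" href="https://example.org/hearing/1">Hearing on coal</a>
</div>
<div class="record">
  <span class="date">June 12, 2010</span>
  <a class="doc-link" href="https://example.org/hearing/2">Hearing on climate</a>
</div>
"""


def to_iso_date(text):
    """Convert a date such as 'March 5, 2009' to '2009-03-05'."""
    return datetime.strptime(text.strip(), "%B %d, %Y").date().isoformat()


def extract_hearings(html):
    """Return a list of (iso_date, url) pairs, one per record."""
    # Third-party dependency (beautifulsoup4), imported where it is used.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for record in soup.find_all("div", class_="record"):
        date_tag = record.find("span", class_="date")
        link_tag = record.find("a", href=True)
        if date_tag is None or link_tag is None:
            continue  # skip incomplete records rather than guessing
        rows.append((to_iso_date(date_tag.get_text()), link_tag["href"]))
    return rows
```

The resulting pairs can then be written to the database with a parameterized `INSERT` via `cursor.executemany`, which keeps the URL strings out of the SQL text itself.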

The script **congress_scraper.py** uses the table created by congress_parser.py to retrieve each URL, scrapes the text of the associated ProQuest Congressional webpage, and inserts the document title, committee, text, and URL into a PostgreSQL database on localhost. Note: to run, this script requires ChromeDriver to be installed in its local directory. For more information, see https://chromedriver.chromium.org/.
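A minimal sketch of the scraping loop follows, assuming the Selenium 4 API (`Service`) and a ChromeDriver binary at `./chromedriver`. The CSS selectors (`h1.title`, `.committee`, `#fulltext`) and the function names are hypothetical; the real page structure and the approach taken in congress_scraper.py may differ, and the repository's age suggests it may have targeted an older Selenium release.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return " ".join(text.split())


def parse_hearing_page(html, url):
    """Extract title, committee, and text from one page (hypothetical selectors)."""
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector):
        tag = soup.select_one(selector)
        return collapse_whitespace(tag.get_text()) if tag else None

    return {
        "title": text_of("h1.title"),
        "committee": text_of(".committee"),
        "text": text_of("#fulltext"),
        "url": url,
    }


def scrape(urls, driver_path="./chromedriver"):
    """Yield one parsed record per URL, closing the browser when done."""
    from selenium import webdriver  # third-party: selenium (4.x assumed)
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        for url in urls:
            driver.get(url)
            yield parse_hearing_page(driver.page_source, url)
    finally:
        driver.quit()
```

Separating `parse_hearing_page` from the browser loop means the extraction logic can be checked against saved HTML without launching Chrome.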

The script **news_parser.py** parses text data from ProQuest Newspaper and inserts each newspaper article's ID and text into a PostgreSQL database on localhost.
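To make the ID-and-text extraction concrete, here is a small sketch that splits a plain-text export into records and stores each one. The layout in `SAMPLE_EXPORT` (underscore separator lines, a `ProQuest document ID:` field, a `Full text:` field) is an assumption about the export format, and the `news_text` table and its columns are hypothetical names; news_parser.py may handle a different format entirely.

```python
import re

# Assumed export layout, for illustration only.
SAMPLE_EXPORT = """\
____________________
ProQuest document ID: 111
Full text: Coal plants face new rules.
More text here.
____________________
ProQuest document ID: 222
Full text: Lawmakers debate climate bill.
"""

RECORD_SEPARATOR = re.compile(r"^_{10,}[ \t]*$", re.MULTILINE)
DOC_ID = re.compile(r"^ProQuest document ID:[ \t]*(\d+)", re.MULTILINE)


def parse_articles(export_text):
    """Return a list of (article_id, text) pairs found in the export."""
    articles = []
    for chunk in RECORD_SEPARATOR.split(export_text):
        id_match = DOC_ID.search(chunk)
        _, marker, body = chunk.partition("Full text:")
        if id_match is None or not marker:
            continue  # not a complete record
        articles.append((int(id_match.group(1)), " ".join(body.split())))
    return articles


def store_articles(conn, articles, placeholder="%s"):
    """Insert the pairs through a DB-API connection using a parameterized query."""
    sql = (
        "INSERT INTO news_text (article_id, body) "
        f"VALUES ({placeholder}, {placeholder})"
    )
    cur = conn.cursor()
    cur.executemany(sql, articles)
    conn.commit()
```

The default `%s` placeholder matches psycopg2's parameter style; passing `placeholder="?"` lets the same function run against an in-memory `sqlite3` database for a quick local check.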
