https://github.com/codeforboston/cornerwise-scrapers
- Host: GitHub
- URL: https://github.com/codeforboston/cornerwise-scrapers
- Owner: codeforboston
- Created: 2017-09-27T02:39:09.000Z (over 8 years ago)
- Default Branch: main
- Last Pushed: 2022-12-08T11:48:32.000Z (over 3 years ago)
- Last Synced: 2025-10-08T18:46:32.359Z (8 months ago)
- Language: Python
- Size: 1.94 MB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.org
* Cornerwise Scrapers
** Introduction
Implementation of the "standard" [[https://github.com/codeforboston/cornerwise][Cornerwise]] scrapers using [[https://aws.amazon.com/lambda/][AWS Lambda]]
infrastructure.
** Setup
- Install ~Node.js~, preferably using [[https://github.com/creationix/nvm#installation][NVM]].
- ~npm install serverless~
  Installs [[https://serverless.com][Serverless]], a command-line utility that simplifies the deployment
  of services that run on AWS Lambda, Azure Functions, or other providers.
- ~npm install --save serverless-python-requirements~
  Installs a Serverless plugin that downloads the pip requirements
  specified in ~requirements.txt~ before deploying to AWS. (Note: Docker must
  be installed for this to run correctly.)
** Deploying
- To deploy to AWS, you'll need an AWS account. You should also configure a
  ~cornerwise~ profile in your AWS credentials. See the [[https://serverless.com/framework/docs/providers/aws/guide/credentials/][Serverless credentials guide]]
  for details about setting up a profile and the privileges the AWS user
  requires.
- Copy ~credentials.example.json~ to ~credentials.json~ and modify the
  variables to use your Socrata credentials.
- If everything is correctly configured, you should be able to ~cd~ to this
  directory and run ~serverless deploy -v~ to deploy the Lambda function and
  its corresponding API Gateway interface to AWS.
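
Before deploying, it can help to confirm that the ~cornerwise~ profile actually exists. The following is a minimal sketch, not part of this repository: the helper name ~has_aws_profile~ is hypothetical, and it assumes the profile is defined in the INI-style shared credentials file (typically =~/.aws/credentials=) rather than in =~/.aws/config=.

```python
import configparser
from pathlib import Path


def has_aws_profile(credentials_path: Path, profile: str = "cornerwise") -> bool:
    """Return True if the INI-style credentials file has a [profile] section.

    Hypothetical helper for illustration only. It checks that the section
    exists; it does not validate the keys or contact AWS.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips files that do not exist,
    # so a missing credentials file simply yields False below.
    parser.read(credentials_path)
    return parser.has_section(profile)


if __name__ == "__main__":
    path = Path.home() / ".aws" / "credentials"
    if has_aws_profile(path):
        print("Found the cornerwise profile.")
    else:
        print(f"No [cornerwise] section found in {path}.")
```

A check like this only confirms that the section is present. Whether the AWS user has the privileges Serverless needs is still determined at deploy time.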
* Scrapers
** Somerville, MA Reports and Decisions
- URL :: https://scraper.cornerwise.org/somervillema
- Types :: Cases
- Source :: [[./somervillema.py][somervillema.py]]
- Description :: Scrapes the OSPCD's [[https://www.somervillema.gov/departments/ospcd/planning-and-zoning/reports-and-decisions][Reports and Decisions]] page.
** Somerville, MA PB/ZBA Event Scraper
- URL :: https://scraper.cornerwise.org/somervillema_events
- Types :: Events
- Source :: [[file:somervillema_events.py][somervillema_events.py]]
- Description :: Scrapes the city's [[https://www.somervillema.gov/event-documents][events]] page, finds events for the
  Planning Board and Zoning Board of Appeals, and scrapes the
  attached Agenda for related case numbers.
** Cambridge, MA
- URL :: https://scraper.cornerwise.org/cambridgema
- Types :: Cases
- Source :: [[./cambridgema.py][cambridgema.py]]
** Somerville, MA Capital Projects
- URL :: https://scraper.cornerwise.org/somerville_projects
- Source :: [[./somervillema_projects.py][somervillema_projects.py]]
- Types :: Projects
- Description :: Published annually by the Somerville Capital Projects
  Committee, the dataset [[https://data.somervillema.gov/Finance/Capital-Investment-Plan-Projects-FY16-26/wz6k-gm5k][includes]] "infrastructure projects,
  building improvements, park redesigns, and equipment
  purchases."
** Green Line Extension
- URL :: https://scraper.cornerwise.org/greenline
- Types :: Events
- Description :: Scrapes the "Upcoming Meetings" section of the
  [[http://greenlineextension.org/][Green Line Extension]] home page.
* Using the Scrapers
The interface to the scrapers is intentionally simple. Send a GET request to
the scraper's URL. You may optionally supply a ~since~ query parameter
formatted as ~yyyymmdd~. The scraper responds with JSON conforming to the
[[http://lbovet.github.io/docson/index.html#https://raw.githubusercontent.com/codeforboston/cornerwise/master/docs/scraper-schema.json][Cornerwise scraper schema]].
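
As an illustration, a client for this interface might look like the sketch below. It uses only the Python standard library. The function names ~build_scraper_url~ and ~fetch_scraper~ are hypothetical and do not come from this repository, and the sketch assumes only what is described above: a plain GET request, an optional ~since~ parameter in ~yyyymmdd~ form, and a JSON response body.

```python
import json
from datetime import date
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_scraper_url(base_url: str, since: Optional[date] = None) -> str:
    """Return the scraper URL, appending ?since=yyyymmdd when a date is given."""
    if since is None:
        return base_url
    # strftime("%Y%m%d") produces the yyyymmdd form the scrapers expect.
    return base_url + "?" + urlencode({"since": since.strftime("%Y%m%d")})


def fetch_scraper(
    base_url: str, since: Optional[date] = None, timeout: float = 30.0
) -> Any:
    """Send a GET request to the scraper and decode its JSON response.

    Hypothetical helper for illustration. It performs a network request and
    does not validate the result against the scraper schema.
    """
    with urlopen(build_scraper_url(base_url, since), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = "https://scraper.cornerwise.org/somervillema"
    print(build_scraper_url(url, date(2017, 9, 27)))
```

Calling ~fetch_scraper(url, date(2017, 9, 27))~ would then return the decoded JSON, provided the endpoint is reachable. Keeping URL construction in its own function makes the date formatting easy to check without touching the network.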