Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/xmunoz/docfest_scraper
Scrape all of the docfest events and output scraped event data as json
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/xmunoz/docfest_scraper
- Owner: xmunoz
- Created: 2013-05-24T00:46:01.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2013-05-24T06:48:49.000Z (over 11 years ago)
- Last Synced: 2024-04-16T01:33:55.083Z (8 months ago)
- Language: Python
- Size: 129 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
SF Docfest scraper
==================

[SF IndieFest](http://sfindie.com/connect/) holds an annual documentary festival called [Docfest](http://sfindie.com/festivals/sf-docfest/). Unfortunately, the schedule on their website isn't available in any easily exportable format. I wrote this scraper to extract the film screening schedule and store the data as JSON.
The data can then be manipulated and imported into a variety of calendar programs, but I structured it specifically for use with the [Google Calendar API](https://gist.github.com/mcmguaba/5640569).
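
As a rough illustration of that last point, here is a minimal sketch of turning one scraped screening into an event body shaped like a Google Calendar API event (`summary`, `location`, `start`, `end`). The input field names (`title`, `venue`, `start`, `minutes`), the sample values, and the `to_calendar_event` helper are all placeholders made up for this example; the scraper's actual output may look different.

```python
import json
from datetime import datetime, timedelta

# Placeholder record: the field names and values are assumptions,
# not the scraper's real output format.
SAMPLE = (
    '[{"title": "Example Film", "venue": "Example Theater", '
    '"start": "2013-06-06T19:00:00", "minutes": 90}]'
)


def to_calendar_event(screening, tz="America/Los_Angeles"):
    """Map one screening dict to a Calendar-style event body (hypothetical helper)."""
    start = datetime.fromisoformat(screening["start"])
    end = start + timedelta(minutes=screening["minutes"])
    return {
        "summary": screening["title"],
        "location": screening["venue"],
        "start": {"dateTime": start.isoformat(), "timeZone": tz},
        "end": {"dateTime": end.isoformat(), "timeZone": tz},
    }


events = [to_calendar_event(s) for s in json.loads(SAMPLE)]
print(json.dumps(events[0], indent=2))
```

Each resulting dict could then be passed as the request body when inserting an event through a Calendar client.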
Ultimately, this was a quick and dirty project. There is no error handling beyond Scrapy's default exception handling, and there are quite a few hacks. The result is a public calendar that can be viewed by adding this address to your Google Calendar:

[email protected]

Want to run the scraper for yourself?
1. Generate a valid epguid param in your browser and paste it in [here](https://github.com/mcmguaba/docfest_scraper/blob/master/docfest/spiders/doc_screenings.py#L31).
2. In that same browser window, inspect your cookie and put that value in [here](https://github.com/mcmguaba/docfest_scraper/blob/master/docfest/spiders/doc_screenings.py#L48).
3. Run the scraper from the command line anywhere inside the scrapy project directory:

        root@artemis:~/docfest# scrapy crawl dfs

Happy scraping!
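
For context on steps 1 and 2, the sketch below shows in general terms how a query parameter such as `epguid` and a browser cookie end up attached to a request. It uses only the Python standard library rather than Scrapy, and the URL, the placeholder values, and the `build_request` helper are assumptions for illustration; it is not the code in `doc_screenings.py`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders: substitute the values copied from your own browser session.
EPGUID = "PASTE-YOUR-EPGUID-HERE"
COOKIE = "PASTE-YOUR-COOKIE-HERE"


def build_request(base_url, epguid, cookie):
    """Build (but do not send) a request carrying the epguid param and cookie."""
    url = base_url + "?" + urlencode({"epguid": epguid})
    return Request(url, headers={"Cookie": cookie})


# example.com stands in for the real schedule URL, which is not shown here.
req = build_request("https://example.com/schedule", EPGUID, COOKIE)
print(req.full_url)
print(req.get_header("Cookie"))
```

Both values are tied to a browser session, so they will probably need to be regenerated if the site starts rejecting requests.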