https://github.com/xmunoz/docfest_scraper

Scrape all of the docfest events and output scraped event data as json

SF Docfest scraper
===============

[SF IndieFest](http://sfindie.com/connect/) holds an annual documentary festival called [Docfest](http://sfindie.com/festivals/sf-docfest/). Unfortunately, the schedule on their website isn't available in any easily exportable format, so I wrote this scraper to collect the film screening schedule and store the data as JSON.

The data can then be manipulated and imported into a variety of calendar programs, but I structured it specifically for use with the [Google Calendar API](https://gist.github.com/mcmguaba/5640569).
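
As a rough illustration of that structure, here is a minimal sketch of turning one scraped screening into an event body for the Google Calendar API. The input field names (`title`, `venue`, `start`, `runtime_minutes`), the sample values, and the `to_calendar_event` helper are all assumptions made for this example, not the scraper's actual output. The `summary`, `location`, `start`, and `end` keys are the ones the Calendar API's event resource uses.

```python
from datetime import datetime, timedelta

# Hypothetical shape of one scraped item; the real field names may differ.
screening = {
    "title": "Example Documentary",
    "venue": "Example Theater",
    "start": "2014-06-05T19:00:00",
    "runtime_minutes": 90,
}


def to_calendar_event(item, tz="America/Los_Angeles"):
    """Build an event body shaped like a Google Calendar event resource."""
    start = datetime.fromisoformat(item["start"])
    # The end time is derived from the (assumed) runtime field.
    end = start + timedelta(minutes=item["runtime_minutes"])
    return {
        "summary": item["title"],
        "location": item["venue"],
        "start": {"dateTime": start.isoformat(), "timeZone": tz},
        "end": {"dateTime": end.isoformat(), "timeZone": tz},
    }


event = to_calendar_event(screening)
```

The resulting dictionary is the kind of body you would pass when inserting an event through a Calendar API client. Authentication and the insert call itself are left out here.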

Ultimately this was a quick and dirty project. There is no error handling beyond Scrapy's default exception handling, and there are quite a few hacks. The result is a public calendar that you can view by adding this address to your Google Calendar:
[email protected]

Want to run the scraper for yourself?

1. Generate a valid epguid parameter in your browser and paste it in [here](https://github.com/mcmguaba/docfest_scraper/blob/master/docfest/spiders/doc_screenings.py#L31).
2. In that same browser window, inspect your cookie and paste its value in [here](https://github.com/mcmguaba/docfest_scraper/blob/master/docfest/spiders/doc_screenings.py#L48).
3. Run the scraper from the command line anywhere inside the Scrapy project directory:

    root@artemis:~/docfest# scrapy crawl dfs
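
If you want the items in a file you can post-process, recent Scrapy versions accept a feed output option, for example `scrapy crawl dfs -o screenings.json`, which writes the scraped items as a JSON array. The sketch below assumes such a file exists and that each item carries a `start` field holding an ISO 8601 timestamp. The file name, the field name, and the `load_screenings` helper are hypothetical.

```python
import json


def load_screenings(path):
    """Read a JSON array of scraped items and sort them by start time."""
    with open(path, encoding="utf-8") as handle:
        items = json.load(handle)
    # "start" is a hypothetical field name holding an ISO 8601 timestamp;
    # ISO strings in the same format sort chronologically as plain text.
    return sorted(items, key=lambda item: item["start"])
```

Calling `load_screenings("screenings.json")` would then return the screenings in chronological order, ready to be converted into calendar entries.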

Happy scraping!