https://github.com/scrapinghub/pycon-speakers
Speakers Spider (PyCon 2014 sprint)
- Host: GitHub
- URL: https://github.com/scrapinghub/pycon-speakers
- Owner: scrapinghub
- Created: 2014-04-12T16:09:30.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2015-01-30T13:37:56.000Z (over 9 years ago)
- Last Synced: 2024-05-03T06:31:25.559Z (5 months ago)
- Language: Python
- Homepage:
- Size: 381 KB
- Stars: 30
- Watchers: 41
- Forks: 21
- Open Issues: 3
Metadata Files:
- Readme: README.md
Speakers Spider (PyCon 2014 sprint)
===================================

This project will:

1. scrape speakers' names from archived conference websites,
2. use [SexMachine](https://pypi.python.org/pypi/SexMachine/) to infer gender, and
3. plot gender ratios for different conferences over time.

The Scrapy team have built a spider that scrapes information about speakers at Python conferences since 2011; please see the [Scrapy installation guide](http://doc.scrapy.org/en/latest/intro/install.html) for installation instructions.

To get started with the sprint:

1. Pick a currently active conference that hasn't been scraped yet. To see which conferences have already been scraped, run `scrapy list`.
2. Create a Scrapy Spider for that conference in the `pycon_speakers/spiders/` directory. It should crawl as many years of the conference as possible and extract Speaker items.
3. Test your spider.
4. Submit a pull request.

Other tasks:

1. Improve the gender identification in `pycon_speakers/pipelines.py`.
2. Review crawled data and fix spiders when the data is incorrect.
3. Chart results.

Here is what we have so far:

![Gender Bar Chart](/gender_plot.png)

Running the Scrapy Code
-----------------------

List available spiders:

    scrapy list

Run a spider:

    scrapy crawl us.pycon.org

Run all spiders and generate a `data.csv` file:

    run.sh
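
Once a `data.csv` file exists, the per-conference gender ratios could be computed along the following lines. This is only a sketch: the column names (`conference`, `year`, `name`, `gender`) and the `male`/`female` labels are assumptions about the CSV layout, not something this README specifies, and the inline sample rows are made-up placeholders rather than real crawl output.

```python
import csv
import io
from collections import Counter, defaultdict


def gender_counts(rows):
    """Count speakers per (conference, year), broken down by gender label."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[(row["conference"], row["year"])][row["gender"]] += 1
    return counts


def female_share(counter):
    """Share of 'female' among speakers labelled 'male' or 'female'.

    Returns None when no speaker in the group has either label.
    """
    known = counter["female"] + counter["male"]
    return counter["female"] / known if known else None


# Inline sample standing in for data.csv; column names and rows are made up.
SAMPLE = """conference,year,name,gender
us.pycon.org,2013,Speaker A,female
us.pycon.org,2013,Speaker B,male
us.pycon.org,2013,Speaker C,male
us.pycon.org,2014,Speaker D,female
us.pycon.org,2014,Speaker E,unknown
"""

counts = gender_counts(csv.DictReader(io.StringIO(SAMPLE)))
for (conference, year), counter in sorted(counts.items()):
    print(conference, year, female_share(counter))
```

To run this against the real file, pass `csv.DictReader(open("data.csv", newline=""))` to `gender_counts` instead of the inline sample, after adjusting the column names to match the actual header row.
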

Scrapy Cloud Test Project
-------------------------

See https://dash.scrapinghub.com/p/2878/

- username: pycon2014
- password: pycon2014