https://github.com/kingkool68/scraping-sochi-2014-athlete-profiles
- Host: GitHub
- URL: https://github.com/kingkool68/scraping-sochi-2014-athlete-profiles
- Owner: kingkool68
- Created: 2014-02-14T23:33:56.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-02-24T18:15:15.000Z (over 10 years ago)
- Last Synced: 2024-04-13T11:56:18.398Z (7 months ago)
- Language: PHP
- Size: 1.69 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
There is a lot of data about the 2014 Winter Olympic athletes available at http://www.sochi2014.com/en/athletes-search, but it's not in a standardized format that is easy to work with. This scraper parses the athlete bio pages and stores the data in a simple database for better analysis.
The data was used to generate the findings in this post: http://www.pewresearch.org/fact-tank/2014/02/19/how-many-sochi-athletes-are-competing-for-a-country-that-is-not-their-birth-nation/
## I Just Want The Data ##
Check the `final data` folder for a `csv`, `json`, and a `sql dump` of the data. Or view the public [Google Doc version](https://docs.google.com/spreadsheet/ccc?key=0AqH3Ey7_dlREdEs4dk5OWHpxTnBMcDF5NkxvX1RKNnc&usp=sharing)

## How To Run the Scraper ##
0. Clone this repo
0. Set-up a MySQL database
0. Edit `/config/db-config.php` with your database credentials
0. Run the `create-tables.sql` file to create the table structure for storing the data in the database
0. Run the `index.php` file by visiting it in your browser

## Other Files ##
### athlete-bio-links.txt ###
This is a list of URLs to all of the athlete bios from Sochi's search engine. I got this list by running the following JavaScript in Chrome's Dev Tools console. When it's done, it copies a list of URLs separated by commas to your clipboard.
```javascript
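jQuery(document).ready(function($){
	var iterations = 0;
	var output = '';
	var intervalID; // declared here so it does not leak into the global scope

	// Click the "More" button 191 times, then collect every athlete link.
	function pushTheMoreButton() {
		if( iterations < 191 ) {
			$('#show-more-button a').trigger('click');
			iterations++;
		} else {
			window.clearInterval(intervalID);
			alert( 'all done!' );
			$('.athletes .athlete a').each( function() {
				output += this.href + ', ';
			});
			// copy() is a Dev Tools console utility that writes to the clipboard.
			copy(output);
		}
	}

	// Run the function every 500 ms.
	intervalID = window.setInterval(pushTheMoreButton, 500);
});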
```

### athletes-search-full.html ###
This is the full web page with all of the athlete links visible. Use this if you don't want to wait for the script above to click the "More" button 191 times.

Uses http://adodb.sourceforge.net/ and http://simplehtmldom.sourceforge.net/
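
The console script above leaves a single comma-separated string on the clipboard, with a trailing `, ` at the end. If you want to tidy that string up before saving it, here is a minimal sketch. It is not part of this repo: `parseBioLinks` is a hypothetical helper name, the sample URLs are placeholders, and it assumes that no athlete URL contains a comma.

```javascript
// Hypothetical helper (not part of this repo): split the clipboard text
// produced by the console script into a list of unique, trimmed URLs.
// Assumes that no URL contains a comma.
function parseBioLinks(clipboardText) {
	var seen = Object.create(null); // URLs already emitted
	return clipboardText
		.split(',')
		.map(function (url) { return url.trim(); })
		.filter(function (url) {
			// Drop the empty entry left by the trailing ', ' and any repeats.
			if (url === '' || seen[url]) {
				return false;
			}
			seen[url] = true;
			return true;
		});
}

// Placeholder input, not real athlete pages.
var sample = 'http://example.com/athlete-a, http://example.com/athlete-b, http://example.com/athlete-a, ';

// Print one URL per line.
console.log(parseBioLinks(sample).join('\n'));
```

You can paste this into the same Dev Tools console or run it with Node.js. Printing one URL per line is only a convenience for reading the list.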
Enjoy!