https://github.com/miroli/popscraper

Scrape some pop

Host: GitHub
URL: https://github.com/miroli/popscraper
Owner: miroli
License: mit
Created: 2014-10-19T13:20:52.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2014-10-19T18:20:13.000Z (over 10 years ago)
Last Synced: 2025-04-06T18:27:58.664Z (3 months ago)
Language: Python
Size: 172 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

PopScraper
==========

A Python scraper for www.popfakta.se, an ASP.NET-powered website. On initialization, the PopScraper object starts a session that persists cookies across subsequent requests. After each `200 OK` response, the scraper parses the `__VIEWSTATE`, `__VIEWSTATEGENERATOR` and `__EVENTVALIDATION` parameters out of the page and passes them along in the next request.
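The session-plus-hidden-fields round trip described above can be sketched with the Python 3 standard library alone. This is a minimal illustration, not PopScraper's own code (the project targets Python 2.7 and its implementation is not shown here): the helper names `extract_state`, `next_form_data` and `make_session` are hypothetical, and real requests to popfakta.se would also need the page's own search-field names, which are not documented in this README.

```python
from html.parser import HTMLParser
import http.cookiejar
import urllib.parse
import urllib.request

# The three ASP.NET state fields named in the README.
STATE_FIELDS = ("__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION")


class HiddenFieldParser(HTMLParser):
    """Collect the values of the ASP.NET state fields from <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also end up here, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name in STATE_FIELDS:
            self.fields[name] = attrs.get("value") or ""


def extract_state(html):
    """Return {field_name: value} for the state fields found in a page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def next_form_data(html, extra):
    """Build a URL-encoded POST body: the page's state fields plus the
    caller's own (hypothetical) search fields."""
    data = extract_state(html)
    data.update(extra)
    return urllib.parse.urlencode(data).encode("ascii")


def make_session():
    """An opener that stores cookies and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

Each response body would be fed to `next_form_data`, and the result passed as `data=` to `opener.open(url, data)` for the following request, so the server always receives the state values it issued last.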

### Installation
Download the zip file for this repository, or clone it at the command line:

    git clone git@github.com:vienno/PopScraper.git

Then install the required packages with pip:

    pip install -r requirements.txt

### Usage
PopScraper supports searching by artist, start year and end year. Results are saved to `results.csv` by default; this can be changed with the `filename` parameter on initialization. One "page" is equivalent to 50 records.

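```python
# -*- coding: utf-8 -*-
# The coding line is required in Python 2.7 for the non-ASCII artist name below.
# Assumes the PopScraper class is defined in PopScraper.py at the repository root.
from PopScraper import PopScraper

# Get all records by Håkan Hellström
scraper = PopScraper(artist='Håkan Hellström', filename='hakan.csv')
scraper.fetch_all()

# Get all records from 2006 and later
scraper = PopScraper(year_start='2006', filename='2006.csv')
scraper.fetch_all()

# Use fetch instead of fetch_all to grab just the first page (50 records)
scraper.fetch()
```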
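The README does not say how `fetch_all` decides when to stop or how records reach the CSV file. The following is a minimal Python 3 sketch under two assumptions: that paging continues until a page returns fewer than 50 records, and that each record is a dict of column values. The names `fetch_page`, `fetch_all_pages`, `pages_needed`, `records_to_csv`, `save_results` and the column names used with them are hypothetical, not part of PopScraper's API.

```python
import csv
import io

PAGE_SIZE = 50  # one "page" holds 50 records, per the README


def pages_needed(total_records, page_size=PAGE_SIZE):
    """Number of pages required for a known record count (ceiling division)."""
    return -(-total_records // page_size)


def fetch_all_pages(fetch_page, max_pages=1000):
    """Request pages 1, 2, 3, ... until one comes back short or empty.

    fetch_page is a hypothetical callable: fetch_page(n) -> list of record
    dicts for page n. max_pages guards against an endless loop.
    """
    records = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        records.extend(batch)
        if len(batch) < PAGE_SIZE:
            break  # a short page means there is nothing further to fetch
    return records


def records_to_csv(records, fieldnames):
    """Serialize record dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def save_results(records, fieldnames, filename="results.csv"):
    """Write records to disk, defaulting to results.csv like PopScraper."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        f.write(records_to_csv(records, fieldnames))
```

Stopping on a short page avoids needing the total record count up front; when the site does report a total, `pages_needed` gives the same page count directly (for example, 120 records span three pages).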

### Environment
PopScraper is tested with Python 2.7 on OS X Mavericks.
