https://github.com/jplusplus/vantetider-scraper
A scraper of statistical data from Vantetider.se built on top of Statscraper.
https://github.com/jplusplus/vantetider-scraper
Last synced: about 1 year ago
JSON representation
A scraper of statistical data from Vantetider.se built on top of Statscraper.
- Host: GitHub
- URL: https://github.com/jplusplus/vantetider-scraper
- Owner: jplusplus
- License: mit
- Created: 2017-11-06T07:04:15.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2021-02-08T08:21:46.000Z (over 5 years ago)
- Last Synced: 2025-03-15T14:03:13.569Z (over 1 year ago)
- Language: Python
- Size: 20.5 KB
- Stars: 1
- Watchers: 8
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
README
This is a scraper for statistical data from http://www.vantetider.se built on top of the `Statscraper package `.
Install
-------
pip install -r requirements.txt
The scraper has to do a lot of requests and uses `requests-cache ` to store queries.
Example usage
-------------
.. code:: python
from vantetider import VantetiderScraper
scraper = VantetiderScraper()
scraper.items # List _implemeted_ datasets
# [, , , , , ]
dataset = scraper.get("Overbelaggning") # Get a specific dataset
# List all available dimensions
print dataset.dimensions
print datatset.regions # List available region
print datatset.years # List available years
# Make a query, you have to explicitly define all dimension values you want
# to query. By default the scraper will fetch default values.
res = dataset.fetch({
"region": "Blekinge",
"year": "2016",
"period": "Februari",
# Currenty we can only query by id of dimension value
"type_of_overbelaggning": ["0", "1"], # "Somatik" and "Psykiatri"
})
# Do something with the result
df = res.pandas
Practical application, using dataset.py for storege.
.. code:: python
from vantetider import VantetiderScraper
from vantetider.allowed_values import TYPE_OF_OVERBELAGGNING, PERIODS
import dataset
db = dataset.connect('sqlite:///vantetider.db')
TOPIC = "Overbelaggning"
# Set up local db
table = db.create_table(TOPIC)
scraper = VantetiderScraper()
dataset = scraper.get(TOPIC)
# Get all available regions and years for query
years = [x.value for x in dataset.years]
regions = [x.value for x in dataset.regions]
# Query in chunks to be able to store to database on the run
for region in regions:
for year in years:
res = dataset.fetch({
"year": year,
"type_of_overbelaggning": [x[0] for x in TYPE_OF_OVERBELAGGNING],
"period": PERIODS,
"region": region,
})
df = res.pandas
data = res.list_of_dicts
table.insert_many(data)
TODO
----
- Implement scraping of "Aterbesok", "Undersokningar", "BUPdetalj", "BUP".
- Enable querying on label names on all dimensions
- Add more allowed values to `vantetider/allowed_values.py`
- Make requests-cache optional.
Devlop
------
Run tests:
make tests