https://github.com/jplusplus/siris_scraper
A scraper of statistical data from the Siris database of Skolverket, built on top of Statscraper.
- Host: GitHub
- URL: https://github.com/jplusplus/siris_scraper
- Owner: jplusplus
- License: mit
- Created: 2018-02-16T14:58:37.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2021-01-19T22:22:51.000Z (over 5 years ago)
- Last Synced: 2025-03-14T21:04:01.851Z (over 1 year ago)
- Language: Python
- Size: 39.1 KB
- Stars: 1
- Watchers: 9
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE
README
This is a scraper for statistical data from Skolverket's `SIRIS database <http://siris.skolverket.se/siris>`_, built on top of the Statscraper package.
The scraper is limited to the data available through http://siris.skolverket.se/siris/ris.export_stat.form.
Install
-------

.. code:: bash

    pip install siris_scraper

Example usage
-------------

.. code:: python

    from siris.scraper import SirisScraper

    # Init scraper
    scraper = SirisScraper()

    # List all school types ("verksamhetsformer")
    verksamhetsformer = scraper.items
    # [...]

    # Select a school type
    verksamhetsform = verksamhetsformer.get_by_label(u"Öppen förskola")

    # List all available datasets
    datasets = verksamhetsform.items
    # [...]

    # Select a dataset
    dataset = datasets.get_by_label("Kostnader per kommun")

    # Make a query
    res = dataset.fetch()  # Get the latest available data
    # res = dataset.fetch({"period": "2015"})  # Get data for a given period
    # res = dataset.fetch({"period": "*"})  # Get data for all periods

    # List all available periods
    print(dataset.periods)

    # Use the result
    # ...in Python Pandas, for example
    dataframe = res.pandas

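
The three query forms in the example (latest data, a single period, all periods) can be wrapped in one small helper. This is only a sketch: ``fetch_period`` is a hypothetical name and not part of siris_scraper. It assumes nothing beyond the ``fetch()`` and ``fetch({"period": ...})`` calls shown above.

```python
def fetch_period(dataset, period=None):
    """Fetch data from a dataset using the query forms from the example.

    Hypothetical helper, not part of siris_scraper.

    period=None -> dataset.fetch()                    (latest available data)
    period=2015 -> dataset.fetch({"period": "2015"})  (a single period)
    period="*"  -> dataset.fetch({"period": "*"})     (all periods)
    """
    if period is None:
        # No query at all, exactly as in the plain fetch() call above
        return dataset.fetch()
    # The example passes periods as strings, so normalise ints such as 2015
    return dataset.fetch({"period": str(period)})
```

With a dataset selected as in the example, ``fetch_period(dataset, 2015)`` would issue the same call as ``dataset.fetch({"period": "2015"})``.
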
Develop
-------

Set up:

.. code:: bash

    pip install -r requirements.txt

Run tests:

.. code:: bash

    make tests

Deploy
------

To PyPI:

.. code:: bash

    python3 deploy_to_pypi.py

Todo
----
- The scraper does not currently handle "uttag" (data extraction dates); it fetches the latest one by default.