https://github.com/jplusplus/vantetider-scraper

A scraper of statistical data from Vantetider.se built on top of Statscraper.

Host: GitHub
URL: https://github.com/jplusplus/vantetider-scraper
Owner: jplusplus
License: mit
Created: 2017-11-06T07:04:15.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2021-02-08T08:21:46.000Z (over 5 years ago)
Last Synced: 2025-03-15T14:03:13.569Z (over 1 year ago)
Language: Python
Size: 20.5 KB
Stars: 1
Watchers: 8
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE

README

          
This is a scraper for statistical data from http://www.vantetider.se, built on top of the Statscraper package.

Install
-------

Clone the repository and install the dependencies:

  pip install -r requirements.txt

The scraper makes a large number of HTTP requests, so it uses requests-cache to cache the responses locally.
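
requests-cache stores HTTP responses (by default in a SQLite file), so repeating a query reads from the cache instead of contacting the site again. As a rough, stdlib-only illustration of that idea, and not of how requests-cache itself is implemented, the sketch below memoizes a hypothetical `fetch(url, params)` function in memory. The names `cached` and `fake_fetch` and the example URL are illustrative assumptions only.

```python
from typing import Callable, Dict, Tuple

# Type of the wrapped function: fetch(url, params) -> response body
Fetch = Callable[[str, dict], str]


def cached(fetch: Fetch) -> Fetch:
    """Wrap a fetch(url, params) function with an in-memory response cache."""
    store: Dict[Tuple[str, Tuple[Tuple[str, str], ...]], str] = {}

    def wrapper(url: str, params: dict) -> str:
        # Hashable cache key; sorting makes it independent of parameter order
        key = (url, tuple(sorted((str(k), str(v)) for k, v in params.items())))
        if key not in store:
            store[key] = fetch(url, params)  # only call fetch on a cache miss
        return store[key]

    return wrapper


calls = []


def fake_fetch(url: str, params: dict) -> str:
    """Hypothetical stand-in for an HTTP GET; records every call it receives."""
    calls.append(url)
    return f"body of {url} for {sorted(params.items())}"


get = cached(fake_fetch)
first = get("https://example.org/stats", {"region": "Blekinge", "year": "2016"})
# Same query with the parameters in a different order: served from the cache
second = get("https://example.org/stats", {"year": "2016", "region": "Blekinge"})
```

Only the first call reaches `fake_fetch`; a persistent cache such as requests-cache additionally keeps these entries between runs, which matters for a scraper that issues many similar queries.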

Example usage
-------------

.. code:: python

  from vantetider import VantetiderScraper

  scraper = VantetiderScraper()
  scraper.items  # List implemented datasets

  dataset = scraper.get("Overbelaggning")  # Get a specific dataset

  # List all available dimensions
  print(dataset.dimensions)
  print(dataset.regions)  # List available regions
  print(dataset.years)  # List available years

  # Make a query. Explicitly define every dimension value you want to query;
  # dimensions that are left out fall back to their default values.
  res = dataset.fetch({
      "region": "Blekinge",
      "year": "2016",
      "period": "Februari",
      # Currently this dimension can only be queried by the id of its values
      "type_of_overbelaggning": ["0", "1"],  # "Somatik" and "Psykiatri"
  })

  # Do something with the result
  df = res.pandas
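
Until querying by label is supported, a small lookup can translate labels into ids before calling `fetch`. The sketch below assumes the allowed values are stored as `(id, label)` pairs, which is what `[x[0] for x in TYPE_OF_OVERBELAGGNING]` in the practical example below suggests. `labels_to_ids` and `TYPE_OF_OVERBELAGGNING_EXAMPLE` are hypothetical names, and the two pairs only mirror the ids commented in the query above.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical (id, label) pairs; they mirror the two ids commented in the
# query above and are not read from vantetider/allowed_values.py.
TYPE_OF_OVERBELAGGNING_EXAMPLE: List[Tuple[str, str]] = [
    ("0", "Somatik"),
    ("1", "Psykiatri"),
]


def labels_to_ids(labels: Sequence[str], allowed: Sequence[Tuple[str, str]]) -> List[str]:
    """Translate value labels into the ids that fetch() currently expects."""
    # Case-insensitive lookup from label to id
    lookup: Dict[str, str] = {label.lower(): value_id for value_id, label in allowed}
    ids: List[str] = []
    for label in labels:
        if label.lower() not in lookup:
            raise ValueError(f"Unknown label: {label!r}")
        ids.append(lookup[label.lower()])
    return ids


query = {
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    "type_of_overbelaggning": labels_to_ids(
        ["Somatik", "Psykiatri"], TYPE_OF_OVERBELAGGNING_EXAMPLE
    ),
}
# The resulting dict can then be passed on as dataset.fetch(query)
```

Raising on an unknown label, rather than silently skipping it, keeps a typo from quietly narrowing the query.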

A practical application, storing the results with the dataset library:

.. code:: python

  from vantetider import VantetiderScraper
  from vantetider.allowed_values import TYPE_OF_OVERBELAGGNING, PERIODS
  import dataset

  TOPIC = "Overbelaggning"

  # Set up a local SQLite database with one table for the topic
  db = dataset.connect("sqlite:///vantetider.db")
  table = db.create_table(TOPIC)

  scraper = VantetiderScraper()
  # Use a separate name so that the `dataset` module is not shadowed
  vt_dataset = scraper.get(TOPIC)

  # Get all available regions and years to query
  years = [x.value for x in vt_dataset.years]
  regions = [x.value for x in vt_dataset.regions]

  # Query in chunks so that each result can be stored in the database as we go
  for region in regions:
      for year in years:
          res = vt_dataset.fetch({
              "year": year,
              "type_of_overbelaggning": [x[0] for x in TYPE_OF_OVERBELAGGNING],
              "period": PERIODS,
              "region": region,
          })
          data = res.list_of_dicts
          table.insert_many(data)
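
If the dataset library is not available, the same chunk-by-chunk pattern can be written with Python's built-in `sqlite3` module. In this sketch `fake_fetch` is a hypothetical stand-in for `fetch(...).list_of_dicts`, the column names are assumptions about the shape of the result rows, and the values are left as NULL rather than invented.

```python
import sqlite3
from typing import Dict, List, Optional


def fake_fetch(region: str, year: str) -> List[Dict[str, Optional[str]]]:
    """Hypothetical stand-in for `fetch(...).list_of_dicts`.

    Returns one row per period with an empty (NULL) value; the column names
    are assumptions, not the scraper's actual output schema.
    """
    return [
        {"region": region, "year": year, "period": period, "value": None}
        for period in ("Januari", "Februari")
    ]


def store_in_chunks(conn: sqlite3.Connection, regions: List[str], years: List[str]) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS overbelaggning "
        "(region TEXT, year TEXT, period TEXT, value REAL)"
    )
    for region in regions:
        for year in years:
            rows = fake_fetch(region, year)
            # Named placeholders let each row dict be inserted as-is
            conn.executemany(
                "INSERT INTO overbelaggning VALUES (:region, :year, :period, :value)",
                rows,
            )
            # Commit after every chunk so completed chunks are kept
            # even if a later request fails
            conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path such as "vantetider.db" to persist
store_in_chunks(conn, ["Blekinge", "Kalmar"], ["2015", "2016"])
count = conn.execute("SELECT COUNT(*) FROM overbelaggning").fetchone()[0]
```

With two regions, two years and two periods per chunk, the table ends up with eight rows; committing per chunk trades some write speed for not losing finished work on a long scrape.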

TODO
----

- Implement scraping of "Aterbesok", "Undersokningar", "BUPdetalj", "BUP".

- Enable querying by label name on all dimensions.

- Add more allowed values to `vantetider/allowed_values.py`

- Make requests-cache optional.

Develop
-------

Run tests:

  make tests
