{"id":30949244,"url":"https://github.com/petrbouchal/czso","last_synced_at":"2025-09-11T03:05:26.559Z","repository":{"id":46864967,"uuid":"235595701","full_name":"petrbouchal/czso","owner":"petrbouchal","description":"Use Open Data from the Czech Statistical Office in R","archived":false,"fork":false,"pushed_at":"2025-09-09T21:37:00.000Z","size":2810,"stargazers_count":13,"open_issues_count":20,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-09-10T01:02:42.735Z","etag":null,"topics":["czech-republic","czech-statistical-office","czso","dataset","open-data","r","rstats","rstats-package","statistics"],"latest_commit_sha":null,"homepage":"https://petrbouchal.xyz/czso","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/petrbouchal.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-01-22T14:53:07.000Z","updated_at":"2025-09-09T21:34:29.000Z","dependencies_parsed_at":"2025-06-21T10:28:51.523Z","dependency_job_id":"c1eee5cc-6ae2-4b67-bf25-acf9f12ba2b1","html_url":"https://github.com/petrbouchal/czso","commit_stats":{"total_commits":251,"total_committers":2,"mean_commits":125.5,"dds":0.01195219123505975,"last_synced_commit":"e9a9cb4a60f8e760e050dd722d87d255feafb122"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/petrbouchal/czso","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/petrbouchal%2Fczso","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petrbouchal%2Fczso/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petrbouchal%2Fczso/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petrbouchal%2Fczso/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/petrbouchal","download_url":"https://codeload.github.com/petrbouchal/czso/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petrbouchal%2Fczso/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274572698,"owners_count":25310060,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-11T02:00:13.660Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["czech-republic","czech-statistical-office","czso","dataset","open-data","r","rstats","rstats-package","statistics"],"created_at":"2025-09-11T03:04:57.287Z","updated_at":"2025-09-11T03:05:26.547Z","avatar_url":"https://github.com/petrbouchal.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# czso \u003cimg src='man/figures/logo.png' align=\"right\" height=\"138\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/czso)](https://CRAN.R-project.org/package=czso)\n[![CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/czso)](https://CRAN.R-project.org/package=czso)\n[![CRAN monthly downloads](https://cranlogs.r-pkg.org/badges/last-month/czso)](https://CRAN.R-project.org/package=czso)\n[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)\n[![Mentioned in Awesome Official Statistics ](https://awesome.re/mentioned-badge.svg)](https://github.com/SNStatComp/awesome-official-statistics-software)\n[![R-CMD-check](https://github.com/petrbouchal/czso/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/petrbouchal/czso/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\nThe goal of czso is to provide direct, programmatic, hassle-free access from R to open data provided by the Czech Statistical Office (CZSO).\n\nThis is done by\n\n1. **providing direct access from R to the catalogue of open CZSO datasets**, eliminating the hassle from data discovery. Normally this is done through [the CZSO's product catalogue](https://www.czso.cz/csu/czso/otevrena-data-v-katalogu-produktu-csu), which is unfortunately a bit clunky, or [data.gov.cz](https://data.gov.cz), which is not a natural starting point for many. \n\n2. 
**providing a function to load a specific dataset to R** directly from the CZSO's datastore, eliminating the friction of copying a URL, downloading, unzipping etc.\n\nAdditionally, the package provides access to metadata on datasets and to codelists (číselníky) as a special case of datasets listed in the catalogue.\n\n## Installation\n\nYou can install the package from CRAN:\n\n``` r\ninstall.packages(\"czso\")\n```\n\n\nYou can install the latest in-development release from [GitHub](https://github.com/petrbouchal/czso) with:\n\n``` r\nremotes::install_github(\"petrbouchal/czso\", ref = remotes::github_release())\n```\n\nor the latest version with:\n\n``` r\nremotes::install_github(\"petrbouchal/czso\")\n```\n\nI also keep binaries in a `drat` repo, which you can access by\n\n``` r\ninstall.packages(\"czso\", repos = \"https://petrbouchal.xyz/drat\")\n```\n\n\n## Example\n\nSay you are looking for a dataset whose title refers to wages (mzda/mzdy).\n\nFirst, retrieve the list of available CZSO datasets:\n\n```{r example-catalogue}\nlibrary(czso)\nsuppressPackageStartupMessages(library(dplyr))\nsuppressPackageStartupMessages(library(stringr))\n\ncatalogue \u003c- czso_get_catalogue()\n```\n\nNow search for your terms of interest in the dataset titles:\n\n```{r example-filter}\ncatalogue %\u003e% \n  filter(str_detect(title, \"[Mm]zd[ay]\")) %\u003e% \n  select(dataset_id, title, description)\n```\n\nYou could also search in descriptions or keywords, which are also retrieved into the catalogue.\n\nWe can see the `dataset_id` for the required dataset - now use it to get the dataset:\n\n```{r example-cont}\nczso_get_table(\"110080\")\n```\n\nYou can retrieve the schema for the dataset:\n\n```{r example-schema}\nczso_get_table_schema(\"110080\")\n```\n\nand download the documentation in PDF:\n\n```{r example-doc}\nczso_get_dataset_doc(\"110080\", action = \"download\", format = \"pdf\")\n```\n\nIf you are interested in linking this data to different data, you might need the NUTS 
codes for regions. Seeing that the lines with regional breakdown list `uzemi_cis` as `\"100\"`, you can get that codelist (číselník):\n\n```{r example-codelist}\nczso_get_codelist(100)\n```\n\nYou would then need to do a bit of manual work to join this codelist onto the data.\n\n\n### A note about \"tables\" and \"datasets\"\n\nIn the parlance of the official open data catalogue, a `dataset` can have multiple distributions (typically multiple formats of the same data). These are called resources in the internals, and manifest as tables in this package. Some metainformation (the documentation) is a property of a dataset, while other metainformation (the schema) is a property of a table. Hence the function names in this package. This is to keep things organised even if the CZSO almost always provides only one table per dataset and appends new data to it over time.\n\n## Data sources\n\nThe catalogue is drawn from https://data.gov.cz through the [SPARQL endpoint](https://data.gov.cz/sparql).\n\nThe data and specific metadata are then accessed via the `package_show` endpoint of the CZSO API at (example) https://vdb.czso.cz/pll/eweb/package_show?id=290038r19.\n\n## Credit and notes\n\n- not created or endorsed by the Czech Statistical Office, though they, as well as [the open data team at the Ministry of Interior](https://data.gov.cz/), deserve credit for getting the data out there.\n- the package relies on the data.gov.cz catalogue of open data and on the CZSO's local catalogue\n- NB: The robots.txt at the domain hosting the CZSO's catalogue prohibits robots from accessing it; while this may be an inappropriate/erroneous setting for what is in essence a data API, this package tries to honor the spirit of that setting by only accessing the API once per `czso_get_table()` call, relying on a different system for `czso_get_catalogue()`. 
Hence, *do not use this package for harvesting large numbers of datasets from the CZSO.*\n\n### Acknowledgments\n\nThanks to @jakubklimek and @martinnecasky for [helping me figure out](https://github.com/datagov-cz/nkod/issues/19) the [SPARQL endpoint](https://data.gov.cz/sparql) on the Czech National Open Data Catalogue.\n\n### The logo\n\nAn homage to the CZSO's work in releasing its data in an open format, something that is not necessarily in its DNA.\n\nIt alludes to the shades of the country reflected in the tabular data provided. By interspersing the comma symbol into the name of the package, it refers to both the integration between statistics and open data and the slight disruption that the world of statistics undergoes when that integration happens.\n\n## See also\n\nThis package takes inspiration from the packages\n\n- [eurostat](https://github.com/rOpenGov/eurostat/)\n- [OECD](https://github.com/expersso/OECD)\n\nwhich are very useful in their own right - much recommended.\n\nFor Czech geospatial data, see [CzechData](https://github.com/JanCaha/CzechData/) by [JanCaha](https://github.com/JanCaha/).\n\nFor Czech fiscal data, see [statnipokladna](https://github.com/petrbouchal/statnipokladna).\n\nFor various transparency disclosures, see [Hlídač státu](https://www.hlidacstatu.cz/) and the [{hlidacr}](https://cran.r-project.org/package=hlidacr) package.\n\nFor access to some of Prague's open geospatial data in R, see [pragr](https://github.com/petrbouchal/pragr).\n\n## Contributing / code of conduct\n\nPlease note that the 'czso' project is released with a\n  [Contributor Code of Conduct](https://petrbouchal.xyz/czso/CODE_OF_CONDUCT.html).\n  By contributing to this project, you agree to abide by its 
terms.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetrbouchal%2Fczso","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpetrbouchal%2Fczso","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetrbouchal%2Fczso/lists"}