https://github.com/pacificcommunity/dotstat-compare-tables
When we perform big data updates, we need a way to report the changes at the indicator level, comparing before and after. This repository contains prototype code that tries to solve this problem.
- Host: GitHub
- URL: https://github.com/pacificcommunity/dotstat-compare-tables
- Owner: PacificCommunity
- Created: 2025-02-02T22:26:16.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-02-19T05:04:33.000Z (4 months ago)
- Last Synced: 2025-02-19T06:20:31.626Z (4 months ago)
- Language: R
- Size: 48.8 KB
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PDH SDD GitHub's template
A general template for SDD/PDH projects, incorporating some good practices for GitHub-based development.
## Usage
To use this template, create a new repository from it: click the green "Use this template" button in the top right corner of this page and follow the instructions. This creates a new repository with the same structure as this one. Then clone the new repository to your local machine and start working on your project.
## Current status
The code is functional. In `src/script.R` you can find a usage example, where we compare the staging and production versions of a variety of tables.
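As a rough illustration of this kind of before/after comparison (this is not the code in `src/script.R`), the sketch below assumes both versions of a table have already been downloaded into data frames. The helper `compare_tables()` and the column names (`INDICATOR`, `GEO_PICT`, `TIME_PERIOD`, `OBS_VALUE`) are hypothetical and only serve the example. It uses base R only.

```r
# Hypothetical helper: label each observation as added, removed,
# changed or unchanged between a base and a new version of a table.
# Assumes observation values themselves are never missing.
compare_tables <- function(base, new, keys, value = "OBS_VALUE") {
  # Guard against schema differences: both tables must share the key columns.
  stopifnot(all(c(keys, value) %in% names(base)),
            all(c(keys, value) %in% names(new)))

  # Full outer join on the dimension columns.
  merged <- merge(base[c(keys, value)], new[c(keys, value)],
                  by = keys, all = TRUE, suffixes = c("_base", "_new"))

  old <- merged[[paste0(value, "_base")]]
  cur <- merged[[paste0(value, "_new")]]

  merged$status <- ifelse(is.na(old), "added",
                   ifelse(is.na(cur), "removed",
                   ifelse(old != cur, "changed", "unchanged")))
  merged
}

# Made-up data standing in for the base and new versions.
base <- data.frame(
  INDICATOR   = c("POP", "POP", "GDP", "EDU"),
  GEO_PICT    = c("FJ", "TO", "FJ", "FJ"),
  TIME_PERIOD = "2020",
  OBS_VALUE   = c(100, 50, 4.2, 0.9)
)
new <- data.frame(
  INDICATOR   = c("POP", "POP", "GDP", "GDP"),
  GEO_PICT    = c("FJ", "TO", "FJ", "TO"),
  TIME_PERIOD = "2020",
  OBS_VALUE   = c(100, 55, 4.2, 1.1)
)

changes <- compare_tables(base, new,
                          keys = c("INDICATOR", "GEO_PICT", "TIME_PERIOD"))

# Report at indicator level: count of each status per indicator.
by_indicator <- table(changes$INDICATOR, changes$status)
print(by_indicator)
```

Because the per-indicator grouping happens locally after the join, a sketch like this needs only one table per version as input, however that table is retrieved.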
### Known limitations
- [ ] The current version depends on comparing tables across different instances of .Stat
(e.g., the base and new data versions are reached through different .Stat URLs)
rather than across different spaces (i.e., validate and disseminate).
Comparing across spaces is possible by changing the agency field.
- [ ] The current version performs `|dataflows| * |indicators| * |geographies|` API calls,
which is a lot if you are trying to compare many big, dense dataflows.
This can be improved by reducing the number of calls and
performing the groupings at a second stage
(eventually, it can be brought down to `|dataflows|` API calls).
- [ ] Changes in the DSD schema are not handled,
and I suspect they won't be handled that nicely if the dimensions differ between the base and new data versions.
- [ ] It might be nice to offer the possibility of directly generating the `.pdf` or `.md` versions of the diff tables.
This _should_ be possible thanks to `{kableExtra}`, but Windows is not playing nicely.
## Folder structure
There are four main folders in this repository:
- `docs`: Contains the documentation of the project.
- `src`: Contains the source code of the project.
- `raw_data`: Contains temporary local copies of the raw data used in the project. This folder won't be uploaded to the repository.
- `output`: Contains the temporary output files generated by the project (PNGs, PDFs, small data units). This folder won't be uploaded to the repository.
## gitignore
The `.gitignore` file is configured to ignore the most common temporary development files for Python, R, and Stata. It also ignores most file formats in the `/temp/` subdirectories.