https://github.com/uva-bi-sdad/auditor
A repository created to help double check and edit them before pushing them downstream
https://github.com/uva-bi-sdad/auditor
Last synced: about 2 months ago
JSON representation
A repository created to help double check and edit them before pushing them downstream
- Host: GitHub
- URL: https://github.com/uva-bi-sdad/auditor
- Owner: uva-bi-sdad
- Created: 2022-08-29T13:10:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-11T02:17:58.000Z (over 2 years ago)
- Last Synced: 2025-02-07T10:29:52.588Z (4 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 1.71 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Audit: audit.py
Awesome Lists containing this project
README
# Auditor
This repo is canonically different than the testing that happends downstream because it is more intrusive and contains code that can edit/ clean up repositories before they get pushed to the public site
1. Loops through all folders that match ```**/distribution/``` and creates a ```manifest.json``` at the root directory with the: hash, file size (bytes), and file path of each file.
2. For each file added, search if there is a ```measures_info.json``` in the same directory, and check for a string match of the measure and the file name. If there is a match, append the data match from the ```measures_info.json``` on to the element in ```manifest.json```.Thought process:
---
| Design | Pros | Cons | Mitigation
| --- | --- | --- | ---|
| One public branch, default to main, with a development branch with data | Users can pull the data directly | Developers need to switch a branch before developing | Preventing a push to main would signal them to switch branches |
|One public branch, default to development, with a distribution branch | Developers can develop on the main branch directly ||
- Users need to switch to the distribution branch to pull the repo without the extra data
- Developers can accidentally push to the main branch
|
- Can probably include a link in the README to the main branch
- Can set a restriction on who pushes to the distribution branch to prevent accidental pushing and needing to merge backwards
| Two repos both public | Developers can develop directly, and users can pull directly from the repo ||
- Users can see both branches and accidentally pull the one with data
- Two branches are created for each repository
|
- Can add in the README link to the data-less repo, or have a website that just points to the repos separately, or the name of the repo can include the type of repo it is
|Two repos, one public one private | Users can pull the data directly, and developers can develop on the main branch directly | Loses transparency | The public repo can have a development branch that is one-to-one with the main branch of the private repo |Timeline:
---
- **2022-09-14**: Updating to track md5 on all files, but not check for measures unless it matches suffixes in the settings
- **2022-09-07**: Auditor bug-fixes; split on ':' inside measure_info, and split on '.' for filename
- **2022-08-29**: Add auditor and test one repo versus two repo models
- **2022-08-23**: In the generation of the manifest, add in measure info data
- **2022-08-04**: Check the data repository is created in the right format and create a manifest.json fileNotes:
---
- Overleaf notes: https://www.overleaf.com/project/6306378e38071b727de6293eAction references:
---
- Dataverse-uploader action: https://github.com/IQSS/dataverse-uploader
- Copy files to other repos action: https://github.com/derberg/copy-files-to-other-repositories