https://github.com/emilyriederer/convo
R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-name-contracts/)
controlled-vocabulary data-quality data-validation r-package schema-design variable-names variable-naming
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/emilyriederer/convo
- Owner: emilyriederer
- License: other
- Created: 2020-11-18T03:51:41.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-10-02T17:57:34.000Z (over 3 years ago)
- Last Synced: 2024-12-01T00:33:48.351Z (4 months ago)
- Topics: controlled-vocabulary, data-quality, data-validation, r-package, schema-design, variable-names, variable-naming
- Language: R
- Homepage: https://emilyriederer.github.io/convo/
- Size: 1.2 MB
- Stars: 31
- Watchers: 4
- Forks: 1
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - emilyriederer/convo - R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-name-contracts/) (R)
README
# convo
The goal of `convo` is to enable the creation of a controlled vocabulary for naming columns in a relational dataset, as described in my blog post [Column Names as Contracts](https://emilyriederer.netlify.app/post/column-name-contracts/). This controlled vocabulary can then be used to check a set of names for adherence, to automate documentation, and to generate data checks via the `pointblank` package.
## Installation
You can install the development version of convo from GitHub with:
``` r
devtools::install_github("emilyriederer/convo")
```
## Features
### Available
- Define a controlled vocabulary (a `convo`) in R or YAML, including valid name stubs at different levels of the ontology and optional descriptions or validation checks
- Parse stub lists (candidate `convo`s) from a set of variables
- Evaluate if a set of names adheres to a `convo` and identify violations
- Compare `convo` objects and/or stub lists with set-like operations (union, intersect, setdiff) to identify new candidates for inclusion
- Generate a `pointblank` validation agent or YAML file from a `convo` object for data validation
- Document a dataset with network diagrams or a table
### Current Limitations / Future Enhancements
- Define overall metadata for a controlled vocabulary, such as:
+ overall descriptor string
+ human-readable names describing each level
- Richer control over levels. Currently can only evaluate starting from the front, but in the future could:
+ allow some levels to be optional
+ work from both front and the back
- Current levels are independent of one another
+ could allow for truly hierarchical ontologies where allowed level 2 stubs vary by level 1 stub used
- Current assumption is that realizations of a controlled vocabulary are all delimited by the same separator
+ to work better with filepaths, might want to enable multiple types of delimiters
- Current regex support is slightly unreliable and needs to be better documented and expanded
- More aesthetic documentation (`describe_*()` functions)
- Better set operations for combining instead of overwriting full `convo` specifications (not just stub lists)
- Explore integration with `dm` package to validate names across a schema
## Example
The main pieces of functionality are illustrated in the [Quick Start Guide](https://emilyriederer.github.io/convo/articles/quickstart-guide.html) on the package website.
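For a rough sense of the underlying idea, the sketch below checks a set of column names against a controlled vocabulary using base R only. It deliberately does not use the `convo` API: `vocab`, `parse_levels()`, and `find_violations()` are hypothetical names invented for this illustration, and the stubs and column names are made-up sample values. Like the package (per the limitations above), it assumes a single separator and evaluates levels starting from the front of each name.

``` r
# Hypothetical controlled vocabulary: allowed stubs at each level of a name.
# These stubs are illustrative only and are not taken from the package.
vocab <- list(
  c("ID", "IND", "AMT", "CAT"),  # level 1: type of measure
  c("ACCT", "CUST")              # level 2: entity being measured
)

# Split names on a single separator and collect the unique stubs at each level.
parse_levels <- function(col_names, sep = "_") {
  parts <- strsplit(col_names, sep, fixed = TRUE)
  n_levels <- max(lengths(parts))
  lapply(seq_len(n_levels), function(i) {
    stubs <- vapply(
      parts,
      function(p) if (length(p) >= i) p[i] else NA_character_,
      character(1)
    )
    unique(stubs[!is.na(stubs)])
  })
}

# For each level defined in the vocabulary, return the observed stubs
# that the vocabulary does not allow. Levels beyond the vocabulary are ignored.
find_violations <- function(vocab, col_names, sep = "_") {
  observed <- parse_levels(col_names, sep)
  n <- min(length(vocab), length(observed))
  lapply(seq_len(n), function(i) setdiff(observed[[i]], vocab[[i]]))
}

col_names <- c("ID_ACCT", "AMT_CUST", "IND_ACCT", "QTY_ACCT", "AMT_USER")
violations <- find_violations(vocab, col_names)
print(violations)
```

With these sample names, the first level flags `QTY` and the second level flags `USER`. The Quick Start Guide shows how the package itself handles this, along with descriptions, validation checks, and `pointblank` integration.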