Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hauselin/docdata
R package to generate dataset documentation semi-automatically https://hauselin.github.io/docdata/
- Host: GitHub
- URL: https://github.com/hauselin/docdata
- Owner: hauselin
- License: other
- Created: 2019-12-07T23:23:02.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2019-12-24T18:48:43.000Z (about 5 years ago)
- Last Synced: 2024-11-05T08:42:33.510Z (3 months ago)
- Topics: data-docs, data-management, data-sharing, documentation, documentation-tool, open-science
- Language: R
- Homepage: https://hauselin.github.io/docdata/
- Size: 271 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures",
out.width = "100%"
)
```

# docdata
docdata is an R package that **generates documentation for datasets semi-automatically**. It streamlines recording the what, who, when, where, how, and why of a dataset, and it **standardizes documentation** across datasets.
Ideally, every dataset (e.g., csv/txt file) with tabular data should have a corresponding documentation file that describes the rows and columns of that dataset and other information about the dataset. `docdata` helps you accomplish all that.
`docdata` aims to make data documentation and sharing easier. It helps you avoid being **that** person who shares data that no one else can use because nothing was documented.
[![Travis build status](https://travis-ci.org/hauselin/docdata.svg?branch=master)](https://travis-ci.org/hauselin/docdata)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/hauselin/docdata?branch=master&svg=true)](https://ci.appveyor.com/project/hauselin/docdata)

## Examples
Below are examples of documentation generated by `docdata`:
* [Data from experimental research](https://github.com/hauselin/depletion_bayes/tree/master/Data)
* Cognitive task data in [GitHub repository](https://github.com/hauselin/depletion_bayes/blob/master/Data/stroop_single_trial.md) and as a [raw markdown file](https://raw.githubusercontent.com/hauselin/depletion_bayes/master/Data/stroop_single_trial.md)

## Installation
To install the package, type the following commands into the R console:
``` r
# install.packages("devtools")
devtools::install_github("hauselin/docdata") # you might have to install devtools first (see above)
```

## How to use docdata?
**Step 1: use `doc_data()` to generate documentation (a markdown file)**
* Example: `doc_data("mtcars.csv")` (assuming `mtcars.csv` is a dataset in your working directory.)
**Step 2: use `disp_doc()` to print the doc in your console**
* Example: `disp_doc("mtcars.csv")` or `disp_doc("mtcars.md")`
**Step 3: use `doc_open()` to open the doc to edit it**
* Example: `doc_open("mtcars.csv")` or `doc_open("mtcars.md")`
**Step 4: use `doc_refresh()` to refresh/update your documentation**
* Example: `doc_refresh("mtcars.csv")` or `doc_refresh("mtcars.md")`
**Step 5: share your dataset and documentation file with others or your future self(!)**
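The steps above can be sketched as one short script. This is a minimal sketch under stated assumptions, not an official example: it writes a four-column subset of R's built-in `mtcars` data to a temporary folder (the folder and the variable names `work_dir` and `csv_path` are hypothetical), and it calls the `docdata` functions only if the package is installed. It assumes the functions are exported and run without prompting for input, which this README does not state.

```r
# Create a small example dataset with base R only.
work_dir <- tempfile("docdata_demo_")   # hypothetical scratch folder
dir.create(work_dir)
csv_path <- file.path(work_dir, "mtcars.csv")
write.csv(datasets::mtcars[, c("mpg", "cyl", "disp", "hp")],
          csv_path, row.names = FALSE)

# Run the docdata steps only when the package is available.
if (requireNamespace("docdata", quietly = TRUE)) {
  old_wd <- setwd(work_dir)            # the README assumes the working directory
  docdata::doc_data("mtcars.csv")      # Step 1: generate mtcars.md
  docdata::disp_doc("mtcars.md")       # Step 2: print the documentation
  if (interactive()) {
    docdata::doc_open("mtcars.md")     # Step 3: open for editing (interactive only)
  }
  docdata::doc_refresh("mtcars.md")    # Step 4: tidy and update the documentation
  setwd(old_wd)                        # restore the previous working directory
}
```

Working in a temporary folder keeps the generated markdown file away from your real project until you are happy with the result.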
### Step 1: `doc_data()`
`doc_data()` generates a markdown file that looks like the one shown below. If your dataset is `mtcars.csv`, the markdown file will be named `mtcars.md` and will be located in the same directory as `mtcars.csv`.
Example usage: `doc_data("mtcars.csv")` (assuming `mtcars.csv` is a dataset in your working directory.)
```
A GitHub flavored Markdown textfile documenting a dataset.

Generated using [docdata package](https://hauselin.github.io/docdata/) on 2019-12-08 18:16:46.
To cite this package, type citations("docdata") in console.

## Data source
mtcars.csv
## About this file
* What (is the data):
* Who (generated this documentation):
* Who (collected the data):
* When (was the data collected):
* Where (was the data collected):
* How (was the data collected):
* Why (was the data collected):

## Additional information
* Contact: [email protected]
* Registration: https://osf.io

## Columns
* Rows: 32
* Columns: 4

| Column | Type | Description |
| ------- | -------- | ----------- |
| mpg | numeric | |
| cyl | numeric | |
| disp | numeric | |
| hp | numeric | |

End of documentation.
```
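To make the "Columns" section of the generated file more concrete, here is a rough base-R sketch of how the row count, column count, and column types could be collected and laid out as a markdown table. It illustrates the idea only and is not `docdata`'s actual implementation; the names `dat`, `col_info`, and `md_lines` are hypothetical, and the data comes from R's built-in `mtcars` rather than a CSV file.

```r
# Use four columns of the built-in mtcars data as the example dataset.
dat <- datasets::mtcars[, c("mpg", "cyl", "disp", "hp")]

# One row per column: its name, its class, and an empty description to fill in.
col_info <- data.frame(
  Column = names(dat),
  Type = unname(vapply(dat, function(x) class(x)[1], character(1))),
  Description = "",
  stringsAsFactors = FALSE
)

# Assemble the summary lines and the markdown table.
md_lines <- c(
  paste("* Rows:", nrow(dat)),
  paste("* Columns:", ncol(dat)),
  "",
  "| Column | Type | Description |",
  "| ------ | ---- | ----------- |",
  paste0("| ", col_info$Column, " | ", col_info$Type, " | ",
         col_info$Description, " |")
)
cat(md_lines, sep = "\n")
```

The Description cells start empty because only a person who knows the data can fill them in; everything else can be read from the dataset itself.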
### Step 2: `disp_doc()`
`disp_doc()` prints the documentation in your console. An example (truncated) output is shown below.
Example usage: `disp_doc("mtcars.csv")` or `disp_doc("mtcars.md")`
```
--- DOCUMENTATION BEGIN ---
1 A GitHub flavored Markdown textfile documenting a dataset.
2
3 Generated using docdata package on 2019-12-08 12:50:50.
4 To cite this package, type citations("docdata") in console.
5
6 ## Data source
7
8 mtcars.csv
9
10 ## About this file
...
--- DOCUMENTATION END ---
```

### Step 3: `doc_open()`
`doc_open()` opens the documentation in R or RStudio so you can edit it and fill in the details.
Example usage: `doc_open("mtcars.csv")` or `doc_open("mtcars.md")`
### Step 4: `doc_refresh()`
If your documentation looks messy after you've edited it (for example, the description column is no longer aligned), run `doc_refresh()` to clean it up. If the columns or rows of your dataset have changed since the documentation was last generated, run `doc_refresh()` as well: it merges your existing documentation with a refreshed version.
Example usage: `doc_refresh("mtcars.csv")` or `doc_refresh("mtcars.md")`
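The merge step can be pictured with a small base-R sketch. This is an illustration of the idea only, not how `docdata` implements `doc_refresh()`: the names `old_doc`, `current`, and `new_doc` are hypothetical, and the example data mirrors the before/after tables in this section.

```r
# Descriptions already written in the existing documentation.
old_doc <- data.frame(
  Column = c("mpg", "cyl", "disp", "fakecolumn"),
  Description = c("miles per gallon", "number of cylinders",
                  "displacement (cu.in.)", "non-existent column"),
  stringsAsFactors = FALSE
)

# The dataset as it looks now (built-in mtcars, five columns).
current <- datasets::mtcars[, c("mpg", "cyl", "disp", "hp", "drat")]

# Start from the current columns so stale entries drop out automatically.
new_doc <- data.frame(
  Column = names(current),
  Type = unname(vapply(current, function(x) class(x)[1], character(1))),
  stringsAsFactors = FALSE
)

# Carry over existing descriptions; columns new to the dataset get an empty one.
idx <- match(new_doc$Column, old_doc$Column)
new_doc$Description <- ifelse(is.na(idx), "", old_doc$Description[idx])

print(new_doc)
```

Building the table from the current columns, rather than editing the old one, means removed columns disappear and new columns appear without any special handling.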
* Before (messy)
```
| Column | Type | Description |
| ------- | -------- | --------------------- |
| mpg | numeric | miles per gallon |
| cyl | numeric | number of cylinders |
| disp | numeric | displacement (cu.in.) |
| fakecolumn | numeric | non-existent column |
```

* After running `doc_refresh()`: spacing is cleaned up, columns that no longer exist in the dataset are removed, and new columns are added
```
| Column | Type | Description |
| ------- | -------- | ---------------------- |
| mpg | numeric | miles per gallon |
| cyl | numeric | number of cylinders |
| disp | numeric | displacement (cu.in.) |
| hp | numeric | |
| drat | numeric | |
```

### Step 5: Share your dataset + documentation
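Before sharing a folder, it can help to check that every dataset in it has a documentation file. The sketch below uses base R only and relies on the naming convention described in Step 1 (`mtcars.csv` is documented by `mtcars.md`). The helper `find_undocumented()` is hypothetical and not part of `docdata`, and the demo files are placeholders created in a temporary folder.

```r
# Hypothetical helper: list CSV files that have no matching .md file.
find_undocumented <- function(dir = ".") {
  csv_files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  md_files <- sub("\\.csv$", ".md", csv_files)
  csv_files[!file.exists(md_files)]
}

# Demo: two datasets, only one of which has documentation.
demo_dir <- tempfile("docdata_share_")
dir.create(demo_dir)
write.csv(datasets::mtcars, file.path(demo_dir, "mtcars.csv"), row.names = FALSE)
write.csv(datasets::iris, file.path(demo_dir, "iris.csv"), row.names = FALSE)
writeLines("placeholder documentation", file.path(demo_dir, "mtcars.md"))

undocumented <- find_undocumented(demo_dir)
basename(undocumented)
```

Any file the helper reports is a candidate for `doc_data()` before the folder is shared.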