{"id":16282958,"url":"https://github.com/hauselin/docdata","last_synced_at":"2025-07-26T08:38:03.183Z","repository":{"id":92236466,"uuid":"226588816","full_name":"hauselin/docdata","owner":"hauselin","description":"R package to generate dataset documentation semi-automatically https://hauselin.github.io/docdata/","archived":false,"fork":false,"pushed_at":"2019-12-24T18:48:43.000Z","size":278,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-08T18:24:48.496Z","etag":null,"topics":["data-docs","data-management","data-sharing","documentation","documentation-tool","open-science"],"latest_commit_sha":null,"homepage":"https://hauselin.github.io/docdata/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hauselin.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-07T23:23:02.000Z","updated_at":"2019-12-24T18:48:46.000Z","dependencies_parsed_at":"2023-06-08T02:30:19.123Z","dependency_job_id":null,"html_url":"https://github.com/hauselin/docdata","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hauselin/docdata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hauselin%2Fdocdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hauselin%2Fdocdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hauselin%2Fdocdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/hauselin%2Fdocdata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hauselin","download_url":"https://codeload.github.com/hauselin/docdata/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hauselin%2Fdocdata/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267141109,"owners_count":24041980,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-docs","data-management","data-sharing","documentation","documentation-tool","open-science"],"created_at":"2024-10-10T19:12:08.888Z","updated_at":"2025-07-26T08:38:03.158Z","avatar_url":"https://github.com/hauselin.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures\",\n  out.width = \"100%\"\n)\n```\n\n# docdata\n\ndocdata is an R package that **generates documentation for datasets semi-automatically**. It streamlines the process of documenting when/where/who etc. a dataset is from. It also **standardizes documentation**. 
\n\nIdeally, every dataset (e.g., csv/txt file) with tabular data should have a corresponding documentation file that describes the rows and columns of that dataset and other information about the dataset. `docdata` helps you accomplish all that.\n\n`docdata` aims to make data documentation and sharing easier. It helps you avoid being **that** person who shares data that no one else can use because nothing was documented.\n\n\u003c!-- badges: start --\u003e\n[![Travis build status](https://travis-ci.org/hauselin/docdata.svg?branch=master)](https://travis-ci.org/hauselin/docdata)\n[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/hauselin/docdata?branch=master\u0026svg=true)](https://ci.appveyor.com/project/hauselin/docdata)\n\u003c!-- badges: end --\u003e\n\n## Examples\n\nBelow are examples of documentation generated by `docdata`:\n\n* [Data from experimental research](https://github.com/hauselin/depletion_bayes/tree/master/Data)\n* Cognitive task data in [GitHub repository](https://github.com/hauselin/depletion_bayes/blob/master/Data/stroop_single_trial.md) and as a [raw markdown file](https://raw.githubusercontent.com/hauselin/depletion_bayes/master/Data/stroop_single_trial.md)\n\n## Installation\n\nTo install the package, type the following commands into the R console:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"hauselin/docdata\") # you might have to install devtools first (see above)\n```\n\n## How to use docdata?\n\n**Step 1: use `doc_data()` to generate a documentation file (markdown)**\n\n* Example: `doc_data(\"mtcars.csv\")` (assuming `mtcars.csv` is a dataset in your working directory)\n\n**Step 2: use `disp_doc()` to print the doc in your console**\n\n* Example: `disp_doc(\"mtcars.csv\")` or `disp_doc(\"mtcars.md\")`\n\n**Step 3: use `doc_open()` to open the doc to edit it**\n\n* Example: `doc_open(\"mtcars.csv\")` or `doc_open(\"mtcars.md\")`\n\n**Step 4: use `doc_refresh()` to refresh/update your 
documentation**\n\n* Example: `doc_refresh(\"mtcars.csv\")` or `doc_refresh(\"mtcars.md\")`\n\n**Step 5: share your dataset and documentation file with others or your future self(!)**\n\n### Step 1: `doc_data()`\n\n`doc_data()` generates a markdown file that looks like the one shown below. If your dataset is `mtcars.csv`, the markdown file will be named `mtcars.md` and will be located in the same directory as `mtcars.csv`. \n\nExample usage: `doc_data(\"mtcars.csv\")` (assuming `mtcars.csv` is a dataset in your working directory)\n\n```\nA GitHub flavored Markdown textfile documenting a dataset.\n\nGenerated using [docdata package](https://hauselin.github.io/docdata/) on 2019-12-08 18:16:46.\nTo cite this package, type citations(\"docdata\") in console.\n\n## Data source\n\nmtcars.csv\n\n## About this file\n\n* What (is the data): \n* Who (generated this documentation): \n* Who (collected the data):\n* When (was the data collected): \n* Where (was the data collected):\n* How (was the data collected):\n* Why (was the data collected): \n\n## Additional information\n\n* Contact: XXX@XXX.com\n* Registration: https://osf.io\n\n## Columns\n\n* Rows: 32\n* Columns: 4\n\n| Column  | Type     | Description |\n| ------- | -------- | ----------- |\n| mpg     | numeric  |             |\n| cyl     | numeric  |             |\n| disp    | numeric  |             |\n| hp      | numeric  |             |\n\nEnd of documentation.\n\n```\n\n### Step 2: `disp_doc()`\n\n`disp_doc()` prints the documentation in your console. 
A truncated example of the output is shown below.\n\nExample usage: `disp_doc(\"mtcars.csv\")` or `disp_doc(\"mtcars.md\")`\n\n```\n--- DOCUMENTATION BEGIN ---\n    1     A GitHub flavored Markdown textfile documenting a dataset.\n    2     \n    3     Generated using docdata package on 2019-12-08 12:50:50.\n    4     To cite this package, type citations(\"docdata\") in console.\n    5     \n    6     ## Data source\n    7     \n    8     mtcars.csv\n    9     \n   10     ## About this file\n   ...\n--- DOCUMENTATION END ---\n```\n\n### Step 3: `doc_open()`\n\n`doc_open()` opens the documentation in R or RStudio so you can edit it and fill in the details.\n\nExample usage: `doc_open(\"mtcars.csv\")` or `doc_open(\"mtcars.md\")`\n\n### Step 4: `doc_refresh()`\n\nIf your documentation looks messy after you've edited it (especially if the description column isn't aligned), run `doc_refresh()` to clean it up. If the columns or rows of your dataset have changed since the documentation was last generated, run `doc_refresh()` to update it; the function merges your previous documentation with a refreshed version.\n\nExample usage: `doc_refresh(\"mtcars.csv\")` or `doc_refresh(\"mtcars.md\")`\n\n* Before (messy)\n\n```\n| Column  | Type     | Description           |\n| ------- | -------- | --------------------- |\n| mpg     | numeric  | miles per gallon           |\n| cyl     | numeric  | number of cylinders            |\n| disp    | numeric  |       displacement (cu.in.)           |\n| fakecolumn     | numeric  | non-existent column            |\n```\n\n* After running `doc_refresh()`: spacing is cleaned up, columns no longer in the dataset are removed, and new columns are added\n\n```\n| Column  | Type     | Description            |\n| ------- | -------- | ---------------------- |\n| mpg     | numeric  | miles per gallon       |\n| cyl     | numeric  | number of cylinders    |\n| disp    | numeric  | displacement (cu.in.)  
|\n| hp      | numeric  |                        |\n| drat    | numeric  |                        |\n```\n\n### Step 5: Share your dataset + documentation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhauselin%2Fdocdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhauselin%2Fdocdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhauselin%2Fdocdata/lists"}