{"id":28281185,"url":"https://github.com/louis-heraut/dataverseur","last_synced_at":"2025-10-24T21:04:22.704Z","repository":{"id":274657056,"uuid":"923621933","full_name":"louis-heraut/dataverseuR","owner":"louis-heraut","description":"🫖 A dataverse API R wrapper to enhance the deposit procedure using only R variable declarations","archived":false,"fork":false,"pushed_at":"2025-06-06T15:55:06.000Z","size":6311,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-06T16:45:35.926Z","etag":null,"topics":["data","data-repository","data-science","datascience","dataset","dataverse","dataverse-api","json","metadata","metadata-management","metadata-parser","r"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/louis-heraut.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2025-01-28T15:23:11.000Z","updated_at":"2025-06-06T15:55:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"f5818531-9793-4a63-93eb-60a71325cb0f","html_url":"https://github.com/louis-heraut/dataverseuR","commit_stats":null,"previous_names":["super-lou/dataverseur","louis-heraut/dataverseur"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/louis-heraut/dataverseuR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louis-heraut%2FdataverseuR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louis-heraut%2FdataverseuR/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louis-heraut%2FdataverseuR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louis-heraut%2FdataverseuR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/louis-heraut","download_url":"https://codeload.github.com/louis-heraut/dataverseuR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/louis-heraut%2FdataverseuR/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260251150,"owners_count":22980977,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-repository","data-science","datascience","dataset","dataverse","dataverse-api","json","metadata","metadata-management","metadata-parser","r"],"created_at":"2025-05-21T11:15:50.545Z","updated_at":"2025-10-24T21:04:22.698Z","avatar_url":"https://github.com/louis-heraut.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dataverseuR \u003cimg src=\"https://github.com/louis-heraut/dataverseuR/blob/e50a7ce8e819978059d891105334d19b61d813d4/man/figures/logo_dataverseuR_small.png\" align=\"right\" width=160 height=160 alt=\"\"/\u003e\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/louis-heraut/dataverseuR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/louis-heraut/dataverseuR/actions/workflows/R-CMD-check.yaml)\n[![Lifecycle: 
stable](https://img.shields.io/badge/lifecycle-stable-green)](https://lifecycle.r-lib.org/articles/stages.html)\n![](https://img.shields.io/github/last-commit/louis-heraut/dataverseuR)\n[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](code_of_conduct.md) \n\u003c!-- badges: end --\u003e\n\n**dataverseuR** is a Dataverse API R wrapper to enhance the deposit procedure using a simpler YAML metadata file.\n\nThis project was carried out for the National Research Institute for Agriculture, Food and the Environment (Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, [INRAE](https://agriculture.gouv.fr/inrae-linstitut-national-de-recherche-pour-lagriculture-lalimentation-et-lenvironnement)).\n\n\n## Installation\nFor the latest development version:\n``` r\nremotes::install_github(\"louis-heraut/dataverseuR\")\n```\n\n\n## Documentation\ndataverseuR has two separate components: one simplifies Dataverse API actions using simple R functions with `dplyr::tibble` formatting, and the other simplifies metadata generation, which can be complex with JSON files.\n\n\n### Authentication\nThe first step is to authenticate with the Dataverse instance. The easiest way is to use a `.env` file in your working directory.\n\n\u003e ⚠️ Warning: NEVER SHARE YOUR CREDENTIALS (for example, through a Git repository).\n\ndataverseuR has a built-in function for this step. Simply run:\n``` R\ncreate_dotenv()\n```\nA `dist.env` file will be created in your working directory. The next step is to fill in your credentials.  \nTo do this, go to your Dataverse instance and create a token. 
For example, for the demo Recherche Data Gouv instance with `BASE_URL`: [https://demo.recherche.data.gouv.fr](https://demo.recherche.data.gouv.fr), click on your account name, find the [API token tab](https://demo.recherche.data.gouv.fr/dataverseuser.xhtml?selectTab=apiTokenTab), and copy your token to the `API_TOKEN` variable in the `dist.env` file, like this:\n``` bash\n# .env\n\nAPI_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\nBASE_URL=https://demo.recherche.data.gouv.fr\n\n```\nThen, rename the file to `.env`, and if you are in a Git project, add `.env` to your `.gitignore` file.  \n\nNow you should be able to use the API without issues by running this line in your working directory:\n``` R\ndotenv::load_dot_env()\n```\n\n\n### API Actions\nYou can find the full API documentation [here](https://guides.dataverse.org/en/latest/api/index.html). Not all API actions have been converted to R functions; only a subset has been included to simplify general use of the package. If you need another function, feel free to create it and open a pull request or create an issue.  \n\nBelow is a list of all available API actions.  
\n\n#### General API Actions\n- `search_datasets()`: Performs a search on Dataverse, like:\n``` R\n# Find all published datasets that contain the word \"climate\" in their title \ndatasets = search_datasets(query='title:\"climate\"',\n                           publication_status=\"RELEASED\",\n                           type=\"dataset\",\n                           dataverse=\"\",\n                           n_search=1000)\n```\nThis returns:\n``` R\n\u003e datasets\n# A tibble: 73 × 28\n   name             type  url   dataset_DOI description published_at publisher\n   \u003cchr\u003e            \u003cchr\u003e \u003cchr\u003e \u003cchr\u003e       \u003cchr\u003e       \u003cchr\u003e        \u003cchr\u003e    \n 1 Plot Climate In… data… http… doi:10.154… \"Climate d… 2020-11-03T… AMAP ECO…\n 2 European climat… data… http… doi:10.154… \"This data… 2022-09-21T… Landmark…\n 3 Species plots: … data… http… doi:10.154… \"The soil … 2020-11-25T… AMAP ECO…\n 4 Data for “Inter… data… http… doi:10.577… \"“Internat… 2022-10-26T… Experime…\n 5 Tree NSC, RAP a… data… http… doi:10.154… \"Data used… 2022-01-28T… AMAP     \n 6 Soil and crop m… data… http… doi:10.577… \"This work… 2022-07-27T… CLIMASOMA\n 7 Climate effect … data… http… doi:10.577… \"It holds … 2022-12-13T… URGI     \n 8 Atmospheric cli… data… http… doi:10.154… \"The datas… 2021-06-06T… AnaEE-Fr…\n 9 Growth and annu… data… http… doi:10.154… \"The prese… 2021-09-28T… Etude_Pr…\n10 R script to gen… data… http… doi:10.577… \"Ce dossie… 2023-02-13T… Data INR…\n# ℹ 63 more rows\n# ℹ 21 more variables: citationHtml \u003cchr\u003e, identifier_of_dataverse \u003cchr\u003e,\n#   name_of_dataverse \u003cchr\u003e, citation \u003cchr\u003e, storageIdentifier \u003cchr\u003e,\n#   subjects \u003clist\u003e, fileCount \u003cint\u003e, versionId \u003cint\u003e, versionState \u003cchr\u003e,\n#   majorVersion \u003cint\u003e, minorVersion \u003cint\u003e, createdAt \u003cchr\u003e, updatedAt \u003cchr\u003e,\n#   contacts 
\u003clist\u003e, authors \u003clist\u003e, keywords \u003clist\u003e, publications \u003clist\u003e,\n#   producers \u003clist\u003e, geographicCoverage \u003clist\u003e, dataSources \u003clist\u003e, …\n# ℹ Use `print(n = ...)` to see more rows\n```\n\n- `create_datasets()`: Creates datasets.\n``` R\ninitialise_metadata()\nsource(metadata_path)\nres = generate_metadata()\ndataset_DOI = create_datasets(dataverse=\"\",\n                              metadata_path=res$metadata_path)\n```\nFor more information about metadata creation, see the [Metadata Generation](#metadata-generation) section.  \nThe documentation for each function is self-explanatory.  \n\n- `modify_datasets()`: Modifies dataset metadata.\n- `add_datasets_files()`: Adds files to datasets.\n- `delete_datasets_files()`: Deletes files from datasets.\n- `delete_all_datasets_files()`: Deletes all files from datasets.\n- `publish_datasets()`: Publishes datasets.\n- `delete_datasets()`: Deletes datasets.\n\n#### Dataset Information\n- `list_datasets_files()`: Lists files in datasets, like:\n``` R\ndataset_DOI = \"doi:10.57745/LNBEGZ\"\nfiles = list_datasets_files(dataset_DOI)\n```\nThis returns:\n``` R\n\u003e files\n# A tibble: 69 × 24\n   dataset_DOI        label restricted directoryLabel version datasetVersionId\n   \u003cchr\u003e              \u003cchr\u003e \u003clgl\u003e      \u003cchr\u003e            \u003cint\u003e            \u003cint\u003e\n 1 doi:10.57745/LNBE… cent… FALSE      trendEX              1           276347\n 2 doi:10.57745/LNBE… cent… FALSE      dataEX               1           276347\n 3 doi:10.57745/LNBE… data… FALSE      NA                   1           276347\n 4 doi:10.57745/LNBE… dtLF… FALSE      dataEX               1           276347\n 5 doi:10.57745/LNBE… dtLF… FALSE      trendEX              1           276347\n 6 doi:10.57745/LNBE… EGU2… FALSE      NA                   1           276347\n 7 doi:10.57745/LNBE… endL… FALSE      dataEX               1           276347\n 8 
doi:10.57745/LNBE… endL… FALSE      trendEX              1           276347\n 9 doi:10.57745/LNBE… ETAL… FALSE      NA                   1           276347\n10 doi:10.57745/LNBE… meta… FALSE      NA                   1           276347\n# ℹ 59 more rows\n# ℹ 18 more variables: categories \u003clist\u003e, id \u003cint\u003e, file_DOI \u003cchr\u003e,\n#   pidURL \u003cchr\u003e, filename \u003cchr\u003e, contentType \u003cchr\u003e, filesize \u003cint\u003e,\n#   storageIdentifier \u003cchr\u003e, originalFileFormat \u003cchr\u003e,\n#   originalFormatLabel \u003cchr\u003e, originalFileSize \u003cint\u003e,\n#   originalFileName \u003cchr\u003e, UNF \u003cchr\u003e, rootDataFileId \u003cint\u003e, md5 \u003cchr\u003e,\n#   checksum \u003cdf[,2]\u003e, creationDate \u003cchr\u003e, description \u003cchr\u003e\n# ℹ Use `print(n = ...)` to see more rows\n```\n\n- `get_datasets_metadata()`: Retrieves the metadata list of datasets.  \nOnce you have the metadata as a nested `list`, it can be challenging to modify. To learn how to handle it through a simpler YAML file, refer to the [Metadata Generation](#metadata-generation) section. \n\n- `download_datasets_files()`: Downloads files from datasets.\n- `get_datasets_size()`: Retrieves the size of datasets.\n- `get_datasets_metrics()`: Retrieves metrics about datasets.\n\n#### Others\n- `convert_DOI_to_URL()`: Converts a DOI to a URL.\n\n\n### Metadata Generation\n#### Metadata Management\nThe idea behind this formalism is to create Dataverse metadata directly from a simpler, human-readable YAML file.\nThe base metadata file in Dataverse is a JSON file, which is represented by a complex nested list structure in R. 
To simplify this, every value entry in this JSON file (i.e., every metadata field in Dataverse) is converted into a simpler assignment in a YAML text file.\n\n```yml\n# Create metadata for the title of the future dataset in Dataverse\ntitle: Hydrological projections of discharge for the model {MODEL}\n```\n\nEvery piece of metadata must be clearly identified as such. Therefore:\n- The metadata name is precise and non-negotiable (you need to start from an [example](https://github.com/louis-heraut/dataverseuR_toolbox) or download metadata from Dataverse using the function `get_datasets_metadata()` to find a metadata name; see [Metadata Importation](#metadata-importation)).\n- Some metadata can be duplicated, such as author names, so you need to use an indented dash list format (see below).\n\nThis way, you can create a YAML file that gathers all these metadata, like this:\n\n\n``` yml\n# ░█▀▄░█▀▀▄░▀█▀░█▀▀▄░▄░░░▄░█▀▀░█▀▀▄░█▀▀░█▀▀░█░▒█░▒█▀▀▄\n# ░█░█░█▄▄█░░█░░█▄▄█░░█▄█░░█▀▀░█▄▄▀░▀▀▄░█▀▀░█░▒█░▒█▄▄▀\n# ░▀▀░░▀░░▀░░▀░░▀░░▀░░░▀░░░▀▀▀░▀░▀▀░▀▀▀░▀▀▀░░▀▀▀░▒█░▒█ _______________\n# GitHub : https://github.com/louis-heraut/dataverseuR\n# Author : Héraut, Louis\n# Affiliation : INRAE, UR RiverLy, Villeurbanne, France\n# ORCID : 0009-0006-4372-0923\n#\n# This file is a parameterization file used by the dataverseuR R\n# package to generate a metadata JSON file needed by the Dataverse API\n# to create or modify a dataset.\n\ntitle: Plane simulation trajectory for the model {MODEL}\n\nalternativeURL: https://other-datarepository.org\n\ndatasetContact:\n- datasetContactName: Locke, John\n  datasetContactAffiliation: Laboratory, Institut, Island\n  datasetContactEmail: dany.doe@institut.org\n\nauthor:\n- authorName: Locke, John\n  authorAffiliation: Laboratory, Institut, Island\n  authorIdentifierScheme: ORCID\n  authorIdentifier: xxxx-xxxx-xxxx-xxxx\n\ncontributor:\n- contributorType: Data Curator\n  contributorName: Reyes, Hugo\n  contributorAffiliation: Laboratory, Same Institut, 
Island\n  contributorIdentifierScheme: ORCID\n  contributorIdentifier: 4815-1623-4248-1516\n\nproducer:\n- producerName: Producer\n  producerURL: https://producer.org\n  producerLogoURL: https://producer.org/logo.png\n\ndistributor:\n- distributorName: dataverse instance\n  distributorURL: https://dataverse.org\n  distributorLogoURL: https://dataverse.org/logo.png\n\ndsDescription:\n- dsDescriptionValue: A collection of 815 simulated plane trajectories designed for\n    testing flight behavior under unusual navigational conditions. Includes data on\n    course deviations, atmospheric anomalies, and long-range displacement events.\n  dsDescriptionLanguage: English\n\nlanguage: English\n\nsubject: Earth and Environmental Sciences\n\nkeyword:\n- keywordValue: atmospheric boundary layer\n  keywordTermURL: http://opendata.inrae.fr/thesaurusINRAE/c_823\n  keywordVocabulary: INRAETHES\n  keywordVocabularyURI: http://opendata.inrae.fr/thesaurusINRAE/thesaurusINRAE\n- keywordValue: magnetic characteristic\n  keywordTermURL: http://opendata.inrae.fr/thesaurusINRAE/c_13144\n  keywordVocabulary: INRAETHES\n  keywordVocabularyURI: http://opendata.inrae.fr/thesaurusINRAE/thesaurusINRAE\n- keywordValue: plane\n\nkindOfData: Dataset\nkindOfDataOther: Flying simulation\n\ndataOrigin: simulation data\n\nsoftware:\n- softwareName: '{MODEL}'\n  softwareVersion: x\n\npublication:\n- publicationRelationType: IsSupplementTo\n  publicationCitation: futur publication\n  publicationIDType: doi\n  publicationIDNumber: doi\n  publicationURL: https://doi.org\n\nproject:\n- projectAcronym: Others Project\n  projectTitle: 'Others Project : long title'\n  projectURL: https://project.org\n\ntimePeriodCovered:\n- timePeriodCoveredStart: '2004-09-22'\n  timePeriodCoveredEnd: '2010-05-23'\n\ndepositor: Austen, Kate\n\ncountry: Fiji\n\ndateOfDeposit: '2020-03-19'\n```\n\nThis allows you to add a new author to the author list with:\n``` yml\n- authorName: Shephard, Jack\n  authorAffiliation: 
Laboratory, An other Institut, Island\n```\n\nThis way, you can also modify metadata containing placeholders like `{MODEL}` by using simple R code to read the YAML file as text:\n``` R\nmetadata = readLines(\"metadata.yml\")\nmetadata = gsub(\"\\\\{MODEL\\\\}\", \"AirDynamics\", metadata)\nwriteLines(metadata, \"metadata.yml\")\n```\n\nFor more complex situations, you can read and edit the YAML file inside a loop:\n``` R\nModels = c(\"AirDynamics\", \"PlaneSimulation\")\nfor (model in Models) {\n    # Load the template, set the model name, then write one file per model\n    metadata = yaml::read_yaml(\"metadata.yml\")\n    metadata$software[[1]]$softwareName = model\n    yaml::write_yaml(metadata, paste0(\"metadata_\", model, \".yml\"))\n}\n```\n\n#### Metadata Generation Workflow\nTo create a dataset from scratch:\n1. Initialize a YAML metadata template\n```R\ninitialise_metadata(\"path/to/metadata.yml\")\n```\n2. Modify the file as seen above\n3. Generate the JSON file\n```R\nmetadata_json_path = generate_metadata_json(\"path/to/metadata.yml\")\n```\n\nYou can now import this metadata JSON file into a Dataverse instance using the `create_datasets()` function mentioned earlier in the [General API Actions](#general-api-actions) section.\n\n#### Metadata Importation\nAlternatively, you can retrieve metadata from an existing dataset on Dataverse using `get_datasets_metadata()`. This imports the JSON equivalent of the metadata. From here, you can convert this JSON file to a YAML metadata file using the `convert_metadata_to_yml()` function:\n``` R\ndataset_DOI = \"doi:10.57745/LNBEGZ\"\nmetadata_json_path = \"metadata.json\"\nget_datasets_metadata(dataset_DOI, metadata_json_path)\nmetadata_yml_path = convert_metadata_to_yml(metadata_json_path)\n```\n\n\n### Workflow Examples\nA dedicated repository provides use cases in [dataverseuR_toolbox](https://github.com/louis-heraut/dataverseuR_toolbox).  
\nIf you need help creating a personal workflow and cannot find what you need in these examples, [open an issue](https://github.com/louis-heraut/dataverseuR_toolbox/issues).\n\n\n## FAQ\n📬 — **I would like an upgrade / I have a question / Need to reach me**  \nFeel free to [open an issue](https://github.com/louis-heraut/dataverseuR/issues) ! I’m actively maintaining this project, so I’ll do my best to respond quickly.  \nI’m also reachable on my institutional INRAE [email](mailto:louis.heraut@inrae.fr?subject=%5BdataverseuR%5D) for more in-depth discussions.\n\n🛠️ — **I found a bug**  \n- *Good Solution* : Search the existing issue list, and if no one has reported it, create a new issue !  \n- *Better Solution* : Along with the issue submission, provide a minimal reproducible code sample.  \n- *Best Solution* : Fix the issue and submit a pull request. This is the fastest way to get a bug fixed.\n\n🚀 — **Want to contribute ?**  \nIf you don't know where to start, [open an issue](https://github.com/louis-heraut/dataverseuR/issues).\n\nIf you want to try by yourself, why not start by also [opening an issue](https://github.com/louis-heraut/dataverseuR/issues) to let me know you're working on something ? Then:\n\n- Fork this repository  \n- Clone your fork locally and make changes (or even better, create a new branch for your modifications)\n- Push to your fork and verify everything works as expected\n- Open a Pull Request on GitHub and describe what you did and why\n- Wait for review\n- For future development, keep your fork updated using the GitHub “Sync fork” functionality or by pulling changes from the original repo (or even via remote upstream if you're comfortable with Git). Otherwise, feel free to delete your fork to keep things tidy ! 
\n\nIf we’re connected through work, why not reach out via email to see if we can collaborate more closely on this repo by adding you as a collaborator?\n\n\n## Code of Conduct\nPlease note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flouis-heraut%2Fdataverseur","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flouis-heraut%2Fdataverseur","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flouis-heraut%2Fdataverseur/lists"}