{"id":21427006,"url":"https://github.com/j535d165/datahugger","last_synced_at":"2025-07-14T10:31:14.518Z","repository":{"id":66129867,"uuid":"520916910","full_name":"J535D165/datahugger","owner":"J535D165","description":"One downloader for many scientific data and code repositories! DOI :open_hands: Data","archived":false,"fork":false,"pushed_at":"2025-06-23T19:58:30.000Z","size":3979,"stargazers_count":75,"open_issues_count":14,"forks_count":10,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-23T20:43:26.158Z","etag":null,"topics":["cli","data","datacite","dataone","dataverse","dryad","figshare","github","mendeley-data","python","rdm","repository","research","research-data-management","science","scientific","scientific-data","utrecht-university","zenodo"],"latest_commit_sha":null,"homepage":"https://J535D165.github.io/datahugger/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/J535D165.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-03T14:35:12.000Z","updated_at":"2025-06-17T09:30:36.000Z","dependencies_parsed_at":"2023-09-22T01:10:22.995Z","dependency_job_id":"226be42b-996c-48f6-8457-d1f4e387c847","html_url":"https://github.com/J535D165/datahugger","commit_stats":{"total_commits":145,"total_committers":6,"mean_commits":"24.166666666666668","dds":0.04137931034482756,"last_synced_commit":"338781bf45adb70cd99709e81343d0605e04a26d"},"previous_names":[],"tags_count":30,"template":false,"template_full_name":null,"purl":"pkg:github/J535D165/datahugger","reposito
ry_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdatahugger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdatahugger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdatahugger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdatahugger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/J535D165","download_url":"https://codeload.github.com/J535D165/datahugger/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdatahugger/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265280504,"owners_count":23739850,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","data","datacite","dataone","dataverse","dryad","figshare","github","mendeley-data","python","rdm","repository","research","research-data-management","science","scientific","scientific-data","utrecht-university","zenodo"],"created_at":"2024-11-22T21:43:36.188Z","updated_at":"2025-07-14T10:31:14.151Z","avatar_url":"https://github.com/J535D165.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg width=\"360px\" alt=\"Datahugger - Where DOI hugs Data\" src=\"https://github.com/J535D165/datahugger/raw/main/datahugger_logo.svg\"\u003e\n\u003c/p\u003e\n\n# Datahugger - Where DOI :open_hands: Data\n\nDatahugger is a tool to download scientific datasets, software, and code 
from a large number of repositories based on their DOI [(wiki)](https://en.wikipedia.org/wiki/Digital_object_identifier) or URL. With Datahugger, you can automate the downloading of data and improve the reproducibility of your research. Datahugger provides a straightforward [Python interface](#datahugger-with-python) as well as an intuitive [Command Line Interface](#datahugger-with-command-line) (CLI).\n\n## Supported repositories\n\nDatahugger offers support for more than [\u003c!-- count --\u003e377\u003c!-- count --\u003e generic and specific (scientific) repositories](https://j535d165.github.io/datahugger/repositories) (and more to come!).\n\n[![Datahugger supports Zenodo, Dataverse, DataOne, GitHub, FigShare, HuggingFace, Mendeley Data, Dryad, OSF, and many more](https://github.com/J535D165/datahugger/raw/main/docs/images/logos.png)](https://j535d165.github.io/datahugger/repositories)\n\nWe are still expanding Datahugger with support for more repositories. You can\nhelp by [requesting support for a repository](https://github.com/J535D165/datahugger/issues/new/choose) in the issue\ntracker. Pull Requests are very welcome as well.\n\n## Installation\n\n[![PyPI](https://img.shields.io/pypi/v/datahugger)](https://pypi.org/project/datahugger/)\n\nDatahugger requires Python 3.6 or later.\n\n```\npip install datahugger\n```\n\n## Getting started\n\n### Datahugger with Python\n\nLoad a dataset (or any digital asset) from a repository with the\n`datahugger.get()` function. 
The first argument is the DOI or URL,\nand the second is the folder name to store the dataset (it will be\ncreated if it does not exist).\n\nThe following code loads dataset [10.5061/dryad.mj8m0](https://doi.org/10.5061/dryad.mj8m0) into\nthe folder `data`.\n\n```python\nimport datahugger\n\n# download the dataset to the folder \"data\"\ndatahugger.get(\"10.5061/dryad.mj8m0\", \"data\")\n```\n\nFor an example of how this can integrate with your work, see the\n[example workflow notebook](https://github.com/J535D165/datahugger/blob/main/examples/example_datahugger_in_workflow.ipynb) or\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/J535D165/datahugger/blob/main/examples/example_datahugger_in_workflow.ipynb)\n\n\n### Datahugger with command line\n\nThe command line function `datahugger` provides an easy interface to download data. The first\nargument is the DOI or URL, and the second argument is the name of the folder to store the dataset (will be\ncreated if it does not exist).\n\n```bash\ndatahugger 10.5061/dryad.mj8m0 data\n```\n\n```bash\n% datahugger 10.5061/dryad.mj8m0 data\nCollecting...\nNestTemperatureData.csv            : 100%|████████████████████████████████████████| 607k/607k\nREADME_for_NestTemperatureData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k\nExternalTemps.csv                  : 100%|██████████████████████████████████████| 1.06k/1.06k\nREADME_for_ExternalTemps.txt       : 100%|██████████████████████████████████████| 2.82k/2.82k\nInternalEggTempData.csv            : 100%|██████████████████████████████████████████| 664/664\nREADME_for_InternalEggTempData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k\nSoilSimulation_Output.csv          : 100%|████████████████████████████████████████| 229M/229M\nREADME_for_SoilSimulation_[...].txt: 100%|██████████████████████████████████████| 2.82k/2.82k\nDataset successfully downloaded.\n```\n\n**Tip:** On some 
systems, you have to quote the DOI or URL. For example: `datahugger \"10.5061/dryad.mj8m0\" data`.\n\n## Tips and tricks\n\n- No need to struggle with DOIs versus DOI URLs. They both work (and more). Example: The values `10.5061/dryad.x3ffbg7m8`, `doi:10.5061/dryad.x3ffbg7m8`, [`https://doi.org/10.5061/dryad.x3ffbg7m8`](https://doi.org/10.5061/dryad.x3ffbg7m8), and [`https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7m8`](https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7m8) all point to the same dataset.\n- Do not republish the dataset when uploading your data to a scientific data repository. These storage resources can be used better :)\n\n## Contact\n\nPlease feel free to reach out with questions, comments, and suggestions. The\n[issue tracker](/issues) is a good starting point. You can also email me at\n[jonathandebruinos@gmail.com](mailto:jonathandebruinos@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj535d165%2Fdatahugger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fj535d165%2Fdatahugger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj535d165%2Fdatahugger/lists"}