{"id":13557718,"url":"https://github.com/weecology/retriever","last_synced_at":"2025-10-21T19:50:55.394Z","repository":{"id":945933,"uuid":"1968190","full_name":"weecology/retriever","owner":"weecology","description":"Quickly download, clean up, and install public datasets into a database management system","archived":false,"fork":false,"pushed_at":"2025-03-24T03:31:20.000Z","size":81151,"stargazers_count":312,"open_issues_count":52,"forks_count":139,"subscribers_count":30,"default_branch":"main","last_synced_at":"2025-03-24T04:28:42.050Z","etag":null,"topics":["data","data-retrieval","data-science","dataset","datasets","hacktobefest","python"],"latest_commit_sha":null,"homepage":"http://data-retriever.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/weecology.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"docs/code_of_conduct.rst","threat_model":null,"audit":null,"citation":"CITATION","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2011-06-28T19:01:15.000Z","updated_at":"2025-03-24T03:31:25.000Z","dependencies_parsed_at":"2024-02-07T13:08:26.522Z","dependency_job_id":"825be247-e58e-4bea-a293-9c01881be56f","html_url":"https://github.com/weecology/retriever","commit_stats":{"total_commits":2028,"total_committers":82,"mean_commits":24.73170731707317,"dds":0.6336291913214991,"last_synced_commit":"37982577eca010a03dd5b5e23fe30be8f42da9ed"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weecology%2Fretriever","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/we
ecology%2Fretriever/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weecology%2Fretriever/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weecology%2Fretriever/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/weecology","download_url":"https://codeload.github.com/weecology/retriever/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247002203,"owners_count":20867425,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-retrieval","data-science","dataset","datasets","hacktobefest","python"],"created_at":"2024-08-01T12:04:30.501Z","updated_at":"2025-10-21T19:50:50.352Z","avatar_url":"https://github.com/weecology.png","language":"Python","funding_links":[],"categories":["Python","data-science"],"sub_categories":[],"readme":"![Retriever logo](http://i.imgur.com/se7TtrK.png)\n\n\n[![Python package](https://github.com/weecology/retriever/actions/workflows/python-package.yml/badge.svg)](https://github.com/weecology/retriever/actions/workflows/python-package.yml)\n[![Build Status (windows)](https://ci.appveyor.com/api/projects/status/qetgo4jxa5769qtb/branch/main?svg=true)](https://ci.appveyor.com/project/ethanwhite/retriever/branch/main)\n[![Research software 
impact](http://depsy.org/api/package/pypi/retriever/badge.svg)](http://depsy.org/package/python/retriever)\n[![codecov.io](https://codecov.io/github/weecology/retriever/coverage.svg?branch=main)](https://codecov.io/github/weecology/retriever?branch=main)\n[![Documentation Status](https://readthedocs.org/projects/retriever/badge/?version=latest)](http://retriever.readthedocs.io/en/latest/?badge=latest)\n[![License](http://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/weecology/retriever/main/LICENSE)\n[![Join the chat at https://gitter.im/weecology/retriever](https://badges.gitter.im/weecology/retriever.svg)](https://gitter.im/weecology/retriever?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1038272.svg)](https://doi.org/10.5281/zenodo.1038272)\n[![JOSS Publication](http://joss.theoj.org/papers/10.21105/joss.00451/status.svg)](https://doi.org/10.21105/joss.00451)\n[![Anaconda-Server Badge](https://anaconda.org/conda-forge/retriever/badges/downloads.svg)](https://anaconda.org/conda-forge/retriever)\n[![Anaconda-Server Badge](https://anaconda.org/conda-forge/retriever/badges/version.svg)](https://anaconda.org/conda-forge/retriever)\n[![Version](https://img.shields.io/pypi/v/retriever.svg)](https://pypi.python.org/pypi/retriever)\n\u003ca href=\"https://numfocus.org/sponsored-projects\"\u003e\n\u003cimg alt=\"NumFOCUS\"\n   src=\"https://i0.wp.com/numfocus.org/wp-content/uploads/2019/06/AffiliatedProject.png\" width=\"100\" height=\"18\"\u003e\n\u003c/a\u003e\n\nFinding data is one thing. Getting it ready for analysis is another. Acquiring,\ncleaning, standardizing, and importing publicly available data is time-consuming\nbecause many datasets lack machine-readable metadata and do not conform to\nestablished data structures and formats. 
The Data Retriever automates the first\nsteps in the data analysis pipeline by downloading, cleaning, and standardizing\ndatasets, and importing them into relational databases, flat files, or\nprogramming languages. Automating this process can reduce the time it takes a\nuser to get most large datasets up and running by hours, and in some cases days.\n\n## Installing the Current Release\n\nIf you have Python installed, you can install the current release using either `pip`:\n\n```bash\npip install retriever\n```\n\nor `conda` after adding the `conda-forge` channel (`conda config --add channels conda-forge`):\n\n```bash\nconda install retriever\n```\n\nDepending on your system configuration, `pip` may require `sudo`:\n\n```bash\nsudo pip install retriever\n```\n\nPrecompiled binary installers are also available for Windows, OS X, and\nUbuntu/Debian on\nthe [releases page](https://github.com/weecology/retriever/releases). These do\nnot require a Python installation.\n\n[List of Available Datasets](https://retriever.readthedocs.io/en/latest/datasets_list.html)\n----------------------------\n\nInstalling From Source\n----------------------\n\nTo install the Data Retriever from source, you'll need Python 3.6.8+ with the following package installed:\n\n* xlrd\n\nThe following optional packages are needed only to interact with the corresponding\ndatabase management systems:\n\n* PyMySQL (for MySQL)\n* sqlite3 (for SQLite)\n* psycopg2-binary (for PostgreSQL; previously psycopg2)\n* pyodbc (for MS Access; only available on Windows)\n* Microsoft Access Driver (ODBC, for Windows)\n\n### To install from source\n\nEither use `pip` to install directly from GitHub:\n\n```shell\npip install git+https://github.com/weecology/retriever.git\n```\n\nor:\n\n1. Clone the repository.\n2. From the directory containing `setup.py`, run `pip install .`. 
You may need to include `sudo` at the beginning of the\n   command, depending on your system (i.e., `sudo pip install .`).\n\nMore extensive documentation for developers can be found in the [Data Retriever documentation](http://retriever.readthedocs.io/en/latest/?badge=latest).\n\nUsing the Command Line\n----------------------\nAfter installing, run `retriever update` to download all of the available dataset scripts.\nTo see the full list of command-line options, run `retriever --help`; to list the available datasets, run `retriever ls`.\nThe output of `retriever --help` will look like this:\n\n```shell\nusage: retriever [-h] [-v] [-q]\n                 {download,install,defaults,update,new,new_json,edit_json,delete_json,ls,citation,reset,help}\n                 ...\n\npositional arguments:\n  {download,install,defaults,update,new,new_json,edit_json,delete_json,ls,citation,reset,help}\n                        sub-command help\n    download            download raw data files for a dataset\n    install             download and install dataset\n    defaults            displays default options\n    update              download updated versions of scripts\n    new                 create a new sample retriever script\n    new_json            CLI to create retriever datapackage.json script\n    edit_json           CLI to edit retriever datapackage.json script\n    delete_json         CLI to remove retriever datapackage.json script\n    ls                  display a list all available dataset scripts\n    citation            view citation\n    reset               reset retriever: removes configuration settings,\n                        scripts, and cached data\n    help\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -v, --version         show program's version number and exit\n  -q, --quiet           suppress command-line output\n```\n\nTo install datasets, use `retriever install`:\n\n```shell\nusage: retriever install [-h] [--compile] [--debug]\n                         
{mysql,postgres,sqlite,msaccess,csv,json,xml} ...\n\npositional arguments:\n  {mysql,postgres,sqlite,msaccess,csv,json,xml}\n                        engine-specific help\n    mysql               MySQL\n    postgres            PostgreSQL\n    sqlite              SQLite\n    msaccess            Microsoft Access\n    csv                 CSV\n    json                JSON\n    xml                 XML\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --compile             force re-compile of script before downloading\n  --debug               run in debug mode\n```\n\n### Examples\n\nThese examples use the [*Iris* flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set).\nMore examples can be found in the Data Retriever documentation.\n\nUsing install\n\n```shell\nretriever install -h  # gives install options\n```\n\nUsing a specific database engine, `retriever install {Engine}`:\n\n```shell\nretriever install mysql -h  # gives install mysql options\nretriever install mysql --user myuser --password mypassword --host localhost --port 8888 --database_name testdbase iris\n```\n\nTo install data into an SQLite database named `iris.db`, use:\n\n```shell\nretriever install sqlite iris -f iris.db\n```\n\nUsing download\n\n```shell\nretriever download -h  # gives help options\nretriever download iris\nretriever download iris --path C:\\Users\\Documents\n```\n\nUsing citation\n\n```shell\nretriever citation  # citation of the retriever engine\nretriever citation iris  # citation for the iris data\n```\n\nSpatial Dataset Installation\n----------------------------\n\n**Set up spatial support**\n\nTo set up spatial support for PostgreSQL using PostGIS, please\nrefer to the [spatial set-up docs](https://retriever.readthedocs.io/en/latest/spatial_dbms.html).\n\n```shell\nretriever install postgres harvard-forest # Vector data\nretriever install postgres bioclim # Raster data\n# Install only the data of USGS elevation in the given 
extent\nretriever install postgres usgs-elevation -b -94.98704597353938 39.027001800158615 -94.3599408119917 40.69577051867074\n```\n\nWebsite\n-------\n\nFor more information, see the\n[Data Retriever website](http://www.data-retriever.org/).\n\nAcknowledgments\n---------------\n\nDevelopment of this software was funded by the [Gordon and Betty Moore\nFoundation's Data-Driven Discovery\nInitiative](https://www.moore.org/initiative-strategy-detail?initiativeId=data-driven-discovery) through\n[Grant GBMF4563](http://www.moore.org/grants/list/GBMF4563) to Ethan White and\nthe [National Science Foundation](http://nsf.gov/) as part of a [CAREER award to\nEthan White](http://nsf.gov/awardsearch/showAward.do?AwardNumber=0953694).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweecology%2Fretriever","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fweecology%2Fretriever","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweecology%2Fretriever/lists"}