{"id":16472934,"url":"https://github.com/elipousson/getdata","last_synced_at":"2025-09-10T01:36:21.608Z","repository":{"id":44985926,"uuid":"507762575","full_name":"elipousson/getdata","owner":"elipousson","description":"📍🌎 A R package to get location data from a variety of open sources","archived":false,"fork":false,"pushed_at":"2024-10-29T02:00:38.000Z","size":5473,"stargazers_count":12,"open_issues_count":3,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-04T18:58:04.136Z","etag":null,"topics":["r","r-package","rspatial","rstats"],"latest_commit_sha":null,"homepage":"https://elipousson.github.io/getdata/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elipousson.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2022-06-27T04:31:56.000Z","updated_at":"2024-10-29T01:58:20.000Z","dependencies_parsed_at":"2023-11-22T18:42:20.634Z","dependency_job_id":"b4583cbe-7a6b-4c3f-9513-d28392f883a9","html_url":"https://github.com/elipousson/getdata","commit_stats":{"total_commits":376,"total_committers":2,"mean_commits":188.0,"dds":"0.0026595744680850686","last_synced_commit":"93b84190e920d91fc916d4616869a7f8f6d7a95c"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/elipousson/getdata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elipousson%2Fgetdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elipousson%2Fgetdata/tags","releases_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elipousson%2Fgetdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elipousson%2Fgetdata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elipousson","download_url":"https://codeload.github.com/elipousson/getdata/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elipousson%2Fgetdata/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274396615,"owners_count":25277395,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-09T02:00:10.223Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","r-package","rspatial","rstats"],"created_at":"2024-10-11T12:19:03.834Z","updated_at":"2025-09-10T01:36:21.582Z","avatar_url":"https://github.com/elipousson.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# getdata \u003ca href=\"https://elipousson.github.io/getdata/\"\u003e\u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"118\" /\u003e\u003c/a\u003e\n\n\u003c!-- badges: start --\u003e\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n[![Codecov test coverage](https://codecov.io/gh/elipousson/getdata/branch/main/graph/badge.svg)](https://app.codecov.io/gh/elipousson/getdata?branch=main)\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\u003c!-- badges: end --\u003e\n\nThe goal of {getdata} is to make the experience of getting location data easier and more consistent across a wide variety of sources. {getdata} started as part of the [{overedge}](https://elipousson.github.io/overedge/) package along with [{maplayer}](https://elipousson.github.io/maplayer/) and [{sfext}](https://elipousson.github.io/sfext/).\n\n{getdata} is designed to work well with location-specific data packages such as [{mapmaryland}](https://elipousson.github.io/mapmaryland/) and [{mapbaltimore}](https://elipousson.github.io/mapbaltimore/) and to support reproducible approaches to map-making and place-based data analysis. Using data access functions from {sfext} and additional API wrapper functions, this package supports data access for sources including:\n\n-   ArcGIS FeatureServer and MapServer layers (using [{esri2sf}](https://github.com/yonghah/esri2sf))\n-   U.S. 
Census Bureau data (using [{tigris}](https://github.com/walkerke/tigris))\n-   OpenStreetMap (using [{osmdata}](https://docs.ropensci.org/osmdata/))\n-   Socrata Open Data resources (using [{RSocrata}](https://github.com/Chicago/RSocrata))\n-   Google Sheets (using [{googlesheets4}](https://googlesheets4.tidyverse.org/))\n-   Flickr photos (using [{FlickrAPI}](https://koki25ando.github.io/FlickrAPI/))\n-   Static map images from Mapbox (using [{mapboxapi}](https://walker-data.com/mapboxapi/))\n-   Airtable bases (using {httr2} and the [Airtable API](https://airtable.com/api))\n-   Wikipedia articles (using {httr2} and the [Wikipedia Geosearch API](https://www.mediawiki.org/wiki/Extension:GeoData))\n-   Other spatial data sources including Google MyMaps, GitHub gists, and any data source already supported by [sf::read_sf()](https://r-spatial.github.io/sf/reference/st_read.html) (see [sfext::read_sf_ext()](https://elipousson.github.io/sfext/reference/read_sf_ext.html) for more details)\n\nThe advantage of using {getdata} is that it provides a consistent interface for using a location to create a bounding box for spatial filtering. Many functions also support querying spatial data by name or id. Where possible, a spatial filter is used before importing or downloading data to avoid the need to load large data files when you only need a small area. The package also provides a consistent approach for handling API tokens and keys and for caching data locally (see [set_access_token()](https://elipousson.github.io/getdata/reference/set_access_token.html) or [filenamr::get_data_dir()](https://elipousson.github.io/sfext/reference/get_data_dir.html) for more details).\n\nThe related {sfext} package allows {getdata} to support the easy conversion of tabular data into spatial data. For example, if the source data has coordinates, you can convert the data into an sf object. 
If data has an address column, you can geocode the data using the [{tidygeocoder}](https://jessecambon.github.io/tidygeocoder/) package. If the data has a location name column, such as \"neighborhood\", you can join the data to a simple feature object with the related geometry. You also can turn off these options by setting `geometry = FALSE` for most data access functions.\n\nLastly, the [format_data()](https://elipousson.github.io/getdata/reference/format_data.html) and [format_sf_data()](https://elipousson.github.io/getdata/reference/format_sf_data.html) functions provide convenient options for working with the data after it is downloaded. While advanced R users may prefer to create more custom formatting scripts, these functions are designed to support the creation of custom data formatting and access functions such as [format_md_crash_data()](https://elipousson.github.io/mapmaryland/reference/format_md_sf.html) and [get_md_crash_data()](https://elipousson.github.io/mapmaryland/reference/get_md_open_data.html).\n\nFair warning: this package is *not* optimized for speed and I have no plans to submit it to CRAN. This package imports {rlang} for both non-standard evaluation and error handling and relies on {dplyr}, {purrr}, and other tidyverse packages. Suggestions for additional data sources to support, new functions, or improvements to existing functions are welcome.\n\n## Installation\n\nYou can install the development version of getdata like so:\n\n``` r\npak::pkg_install(\"elipousson/getdata\")\n```\n\n## Basic usage\n\n`get_location_data()` is a flexible function for reading and subsetting data. 
In this example, data is a file path but it can also be a URL, the name of a data set in another package, or a sf object.\n\n```{r}\nlibrary(getdata)\nlibrary(dplyr)\n\n# location is optional\nnc \u003c- get_location_data(data = system.file(\"shape/nc.shp\", package = \"sf\"))\n```\n\nYou can use [get_location()](https://elipousson.github.io/getdata/reference/) to get a specific location from a larger simple feature collection that includes a specific type of locations, such as counties in North Carolina. The most basic approach is filtering by name or id:\n\n```{r get_location}\n# get_location works with a type sf object and name and id values\nlocation \u003c- get_location(type = nc, name = \"Warren\", name_col = \"NAME\")\n```\n\nYou can then access data within or around this specific location. For example, `get_location_data()` can return all counties within a quarter-mile of Warren County.\n\n```{r}\nnearby_counties \u003c- get_location_data(\n  data = nc,\n  location = location,\n  dist = 0.25,\n  unit = \"mi\",\n  crop = FALSE\n)\n\nglimpse(nearby_counties)\n```\n\nThis same approach of using names as an attribute query or locations with buffers as a spatial filter works for most functions in this package. 
You can access data from OpenStreetMap:\n\n```{r}\ncounty_parks \u003c- get_osm_data(\n  location = nearby_counties[1, ],\n  asp = 1,\n  key = \"leisure\",\n  value = \"park\",\n  geometry = \"polygons\"\n)\n\nglimpse(county_parks)\n```\n\nYou can also access data from any public ArcGIS MapServer or FeatureServer layer:\n\n```{r}\nnps_park_url \u003c- \"https://carto.nationalmap.gov/arcgis/rest/services/govunits/MapServer/29\"\n\nnps_park \u003c- get_esri_data(\n  url = nps_park_url,\n  name = \"Cape Lookout National Seashore\",\n  name_col = \"NAME\",\n  quiet = TRUE\n)\n\nglimpse(nps_park)\n```\n\nIn some cases, an API key may be required for functions to work:\n\n```{r get_open_data, eval = FALSE}\n## Get Q2 2020 vehicle crash data for Cecil County, Maryland\nget_open_data(\n  source_url = \"https://opendata.maryland.gov\",\n  data = \"65du-s3qu\",\n  where = \"(year = '2020') AND (quarter = 'Q2')\",\n  name_col = \"county_desc\",\n  name = \"Cecil\",\n  token = Sys.getenv(\"MARYLAND_OPEN_DATA_API_KEY\")\n)\n```\n\nYou must set or provide an API token or key for `get_open_data()`, `get_airtable_data()`, and `get_flickr_photos()` to work. 
`get_gsheet_data()` will require user authentication (handled automatically by the {googlesheets4} package).\n\n## Helper and utility functions\n\nThe package also includes a handful of helper and wrapper functions that can be used for formatting, labelling, and other tasks.\n\nFor example, you can use `fix_epoch_date()` to convert columns with [UNIX time](https://en.wikipedia.org/wiki/Unix_time) numeric values to POSIXct values:\n\n```{r}\nnps_park[[\"loaddate\"]]\n\nnps_park \u003c- fix_epoch_date(nps_park)\n\nnps_park[[\"loaddate\"]]\n```\n\nYou can use `make_variable_dictionary()` to make a custom dictionary:\n\n```{r}\nmake_variable_dictionary(\n  nps_park[, c(10:12)],\n  .labels = c(\n    \"Geographic Names Information System identifier\",\n    \"Park name\",\n    \"Area (sq km)\",\n    \"Geometry\"\n  )\n)\n```\n\nOr you can use `rename_with_xwalk()` to rename columns:\n\n```{r}\nrename_with_xwalk(\n  nps_park[, c(10:12)],\n  xwalk = list(\n    \"gnis\" = \"gnis_id\",\n    \"sq_km\" = \"areasqkm\"\n  )\n)\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felipousson%2Fgetdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felipousson%2Fgetdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felipousson%2Fgetdata/lists"}