{"id":19227479,"url":"https://github.com/datastreamapp/datastreamr","last_synced_at":"2025-04-21T01:31:36.823Z","repository":{"id":47653475,"uuid":"259741688","full_name":"datastreamapp/datastreamr","owner":"datastreamapp","description":"DataSteam API helper in R","archived":false,"fork":false,"pushed_at":"2025-03-27T15:10:00.000Z","size":165,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-27T16:24:51.656Z","etag":null,"topics":["api-wrapper","r"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datastreamapp.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-28T20:15:00.000Z","updated_at":"2025-03-27T15:10:04.000Z","dependencies_parsed_at":"2024-06-27T05:25:41.164Z","dependency_job_id":"91ea7ea7-226b-43da-8061-4928ef0ba90e","html_url":"https://github.com/datastreamapp/datastreamr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastreamapp%2Fdatastreamr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastreamapp%2Fdatastreamr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastreamapp%2Fdatastreamr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datastreamapp%2Fdatastreamr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datastreamapp
","download_url":"https://codeload.github.com/datastreamapp/datastreamr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249982512,"owners_count":21355704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-wrapper","r"],"created_at":"2024-11-09T15:23:32.310Z","updated_at":"2025-04-21T01:31:36.817Z","avatar_url":"https://github.com/datastreamapp.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/gordonfn/datastreamr/master/docs/images/datastream.svg\" alt=\"DataStream Logo\" width=\"400\"\u003e\n  \u003cbr/\u003e\n  DataStream R Package\n  \u003cbr/\u003e\n  \u003cbr/\u003e\n\u003c/h1\u003e\n\n\nThis tool is useful for those who want to extract large volumes of data from \u003ca href=\"https://datastream.org\"\u003eDataStream\u003c/a\u003e. This R package allows users to call  DataStream's  \u003ca href=\"https://github.com/datastreamapp/api-docs\"\u003ePublic API\u003c/a\u003e using R functions and specific search queries. The package includes several functions which accept a selection of filtering queries and returns a dataframe with the desired data from DataStream. 
\n\n**You might use this tool, for example, if you want to:**\n*  Download data across datasets (e.g., all available pH data in Ontario on DataStream)\n*  Count how many sites in New Brunswick have cesium data on DataStream\n\n##\n  \n\u003ch3 align=\"center\"\u003e\n \u003ca href=\"https://docs.google.com/forms/d/1SjPVeblz2QFaghpiBZPZKOVNKXgw5UMnAtJLJS1tQYI\"\u003eRequest an API Token\u003c/a\u003e\n \u003c/h3\u003e\n\u003cp align=\"center\"\u003e\nTo have full API permissions, you must request an API token, which is required to call the API.\n\u003c/p\u003e\n\n\n## Installation\nTo install the most recent version in R:\n\n```R\n# install.packages(\"remotes\")\nremotes::install_github(\"datastreamapp/datastreamr\")\n```\n\n## Attribution/Citation\nThank you in advance for using this data responsibly and for providing the appropriate citations when presenting work to external parties. Dataset citations must be accompanied by a link to the DOI (https://doi.org/{value}). The dataset licence, citation, and DOI can be retrieved from the `/Metadata` endpoint.\n\n### Licence representations\nThe API returns the URL for a dataset's licence; it should be mapped to the full licence name, with a link to the full licence details.\n- `Attribution Licence`: \n  - EN: Attribution Licence (ODC-By) v1.0\n  - FR: Licence d'attribution (ODC-By) v1.0\n  - URL: https://opendatacommons.org/licenses/by/1-0/\n- `Public Domain Dedication and Licence`: \n  - EN: Public Domain Dedication and Licence (ODC-PDDL) v1.0\n  - FR: Dédicace et licence du domaine public (ODC-PDDL) v1.0\n  - URL: https://opendatacommons.org/licenses/pddl/1-0/\n- `Open Government Licence`:\n  - EN: Open Government Licence (OGL)\n  - FR: Licence du gouvernement ouvert (OGL)\n  - URL: Dataset-dependent, entered by data provider (e.g., 
https://open.canada.ca/en/open-government-licence-canada) \n\n## The Functions \nThe following functions call DataStream's API and pull the desired information.  \n\n### setAPIKey():  \n**Description** \n \u003cbr/\u003e\nBy default, the API key is read from the environment variable \"DATASTREAM_API_KEY\". Click \u003ca href=\"https://docs.google.com/forms/d/1SjPVeblz2QFaghpiBZPZKOVNKXgw5UMnAtJLJS1tQYI\"\u003ehere\u003c/a\u003e to request an API token. \u003cbr/\u003e\n \u003cbr/\u003e\n  \u003cbr/\u003e\n**Usage**\n```R\nlibrary(datastreamr)\n# To set the API key for the current session:\nsetAPIKey('xxxxxxxxxx')\n\n# Preferably, save the API key as an environment variable:\nusethis::edit_r_environ()\n# Add DATASTREAM_API_KEY=\"xxxxxxxxxx\" to the file, save it, and restart R. You then no longer need to call `setAPIKey()` in your scripts.\n\n# Storing the API key as an environment variable keeps it private while keeping it available when needed.\n```\n\n### metadata():  \n**Description** \n \u003cbr/\u003e\nPulls only dataset-level metadata, including dataset name, citation, licence, abstract, etc. 
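\n\nThe attribution rules above can be applied to `metadata()` results with a couple of small helpers. The sketch below is hypothetical: neither `doi_link()` nor `licence_info()` is part of datastreamr. It also assumes the licence value matches one of the names listed under **Licence representations**.\n\n```R\n# Hypothetical helper: build the DOI link that must accompany a citation.\ndoi_link \u003c- function(doi) paste0(\"https://doi.org/\", doi)\n\n# Hypothetical helper: map a licence name to its full English name and URL.\n# Assumes the API value matches one of the names listed above; returns NULL otherwise.\n# The Open Government Licence URL is dataset-dependent, so it is NA here.\nlicence_info \u003c- function(licence) {\n  licences \u003c- list(\n    \"Attribution Licence\" = c(\n      name = \"Attribution Licence (ODC-By) v1.0\",\n      url = \"https://opendatacommons.org/licenses/by/1-0/\"),\n    \"Public Domain Dedication and Licence\" = c(\n      name = \"Public Domain Dedication and Licence (ODC-PDDL) v1.0\",\n      url = \"https://opendatacommons.org/licenses/pddl/1-0/\"),\n    \"Open Government Licence\" = c(\n      name = \"Open Government Licence (OGL)\",\n      url = NA_character_)\n  )\n  licences[[licence]]\n}\n\n# Usage (requires an API key; the DOI is a placeholder):\n# meta \u003c- metadata(list(`$select` = \"DOI,DatasetName\", `$filter` = \"DOI eq '10.25976/xxxx-xx00'\"))\n# doi_link(meta$DOI)\n```\n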
\n \u003cbr/\u003e\n  \u003cbr/\u003e\n**Usage**\n```R\nmetadata( \n  list(\n    `$select` = NULL,\n    `$filter` = NULL,\n    `$top` = NULL,\n    `$count` = \"false\"\n  )\n)\n```\n\n  \n### locations():  \n**Description** \n \u003cbr/\u003e\nPulls only location data, including Location ID, Location Name, Latitude, and Longitude.\n \u003cbr/\u003e\n  \u003cbr/\u003e\n**Usage**\n```R\nlocations( \n  list(\n    `$select` = NULL,\n    `$filter` = NULL,\n    `$top` = NULL,\n    `$count` = \"false\"\n  )\n)\n```\n  \n### records(): \n**Description** \n \u003cbr/\u003e\nPulls data in the same format as the DataStream CSV downloads, including all columns listed in the DataStream \u003ca href=\"https://github.com/datastreamapp/schema\"\u003eschema\u003c/a\u003e.\n \u003cbr/\u003e\n  \u003cbr/\u003e\n  **Usage**\n* This function takes longer than `observations()`, but returns all available columns in one request. \u003cbr/\u003e\n* Use this function if you want to pull all location and parameter data in one call. \u003cbr/\u003e\n\n```R\nrecords( \n  list(\n    `$select` = NULL,\n    `$filter` = NULL,\n    `$top` = NULL,\n    `$count` = \"false\"\n  )\n)\n```\n  \n### observations():  \n**Description** \n \u003cbr/\u003e\nPulls data in a condensed format that must be joined with other endpoints to create a full dataset with all the DataStream columns. Specifically, location details are not included; instead, each observation carries a `LocationId` that can be matched against the results of `locations()`. 
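\n\nThe join described above can be done in base R. The sketch below uses a hypothetical helper, `join_locations()` (not part of datastreamr), built on `merge()`. It assumes the `observations()` result has a `LocationId` column and the `locations()` result has an `Id` column, as in the location examples further down.\n\n```R\n# Hypothetical helper: attach location details to each observation.\n# Assumes `obs` has a `LocationId` column and `locs` has an `Id` column.\n# all.x = TRUE keeps observations even when their location is missing (a left join).\njoin_locations \u003c- function(obs, locs) {\n  merge(obs, locs, by.x = \"LocationId\", by.y = \"Id\", all.x = TRUE)\n}\n\n# Usage (requires an API key; the DOI is a placeholder):\n# obs  \u003c- observations(list(`$filter` = \"DOI eq '10.25976/xxxx-xx00'\"))\n# locs \u003c- locations(list(`$select` = \"Id,Name,Latitude,Longitude\", `$filter` = \"DOI eq '10.25976/xxxx-xx00'\"))\n# full \u003c- join_locations(obs, locs)\n```\n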
\n \u003cbr/\u003e\n  \u003cbr/\u003e\n  **Usage**\n* This function is quicker than `records()`, but if you need location details, it must be paired with `locations()`. \u003cbr/\u003e\n* Use this function if you do not need location coordinates, or in combination with `locations()` when you plan to pull more than 200,000 rows of data. \u003cbr/\u003e\n\n```R\nobservations( \n  list(\n    `$select` = NULL,\n    `$filter` = NULL,\n    `$top` = NULL,\n    `$count` = \"false\"\n  )\n)\n```\n\n## Function Inputs \nAll of the functions above accept the following query parameters:  \n\n##\n- **select:** *A comma-separated list of columns to return* \u003cbr/\u003e\n\n  - Fields to be selected are entered as a single comma-separated string.\n  - Example: `select=\"DatasetName,Abstract\"`\n  - Default: All available columns.\n \n$`\\color{blue}{\\text{Note}}`$: refer to the **Allowed Values** section below for available select fields\n\n##\n- **filter:** *A list of conditions to filter by* \u003cbr/\u003e\n\n  - Available operators:\n    - eq: Used for exact matches.\n    - ne: Used for not equal to.\n    - gt: Used for greater than.\n    - lt: Used for less than.\n    - ge: Used for greater than or equal to.\n    - le: Used for less than or equal to.\n    - in: Used to match any value in a list (e.g., `DOI in ('10.25976/xxxx-xx00', '10.25976/xxxx-xx11')`).\n    - and: Used to combine multiple filters with an “and” condition.\n  - Grouping: `filter=\"CharacteristicName eq 'Dissolved oxygen saturation' and DOI eq '10.25976/n02z-mm23'\"`\n  - Temporal (Dataset creation): `filter=\"CreateTimestamp gt 2020-03-23\"`\n  - Temporal (Data date-range): `filter=\"ActivityStartYear gt '2019'\"`\n  - Spatial: `filter=\"RegionId eq 'hub.atlantic'\"`\n    - `RegionId` values (subject to change):\n      - **DataStream Hubs**: `hub.{atlantic,lakewinnipeg,mackenzie,greatlakes,pacific}`\n      - **Countries**: `admin.2.{ca}`\n      - **Provinces/Territories**: `admin.4.ca.{ab,bc,mb,nb,nl,ns,nt,nu,on,pe,qc,sk,yt}`\n\n$`\\color{blue}{\\text{Note}}`$: refer to the 
**Allowed Values** section below for available filter fields\n    \n##\n- **top:** *The maximum number of results to return* \u003cbr/\u003e\n\n  - Maximum: 10000\n  - Example: `top=10`\n\n##\n- **count:** *When `true`, returns the number of matching results rather than the data itself* \u003cbr/\u003e\n\n  - Returns only the count for the request. When the count is large enough, it becomes an estimate (~0.0005% accurate).\n  - Example: `count=true`\n  - Default: `false`\n\n### Performance Tips\n- Use `select` to request only the fields you need. This reduces the amount of data to process and transfer.\n\n  \n## Allowed Values\nThe allowed `select` and `filter` options for each of the functions are listed **\u003ca href=\"https://github.com/datastreamapp/api-docs?tab=readme-ov-file#endpoints\"\u003eHERE\u003c/a\u003e**. \n\n\n\n$`\\color{green}{\\text{Note:}}`$ When using the `filter` field, a useful resource is the \"allowed values\" tab of our \u003ca href=\"https://datastreamorg.sharepoint.com/:x:/s/Datastream/EaqcNGHom7BFlRi6bRY4VDoBy6ECq6v3bbUyeb0B3S3HGg?e=75aBTl\"\u003eupload template\u003c/a\u003e. 
It lists the allowed strings for: \n* `MonitoringLocationType`\n* `ActivityMediaName`\n* `CharacteristicName`\n\n\n## Full examples\n\n### Locations\n\n**Get locations from a dataset**\n```R\nsetAPIKey('xxxxxxxxxx')\n\nqs \u003c- list(\n    `$select` = \"Id,DOI,Name,Latitude,Longitude\",\n    `$filter` = \"DOI eq '10.25976/xxxx-xx00'\",\n    `$top` = 10000\n  )\nresult = locations(qs)\n```\n\n**Get locations from multiple datasets**\n```R\nqs \u003c- list(\n    `$select` = \"Id,DOI,Name,Latitude,Longitude\",\n    `$filter` = \"DOI in ('10.25976/xxxx-xx00', '10.25976/xxxx-xx11', '10.25976/xxxx-xx22')\",\n    `$top` = 10000)\nresult = locations(qs)\n```\n\n### Observations\n\n**Get `Temperature` and `pH` observations from multiple datasets**\n\n```R\nqs \u003c- list(\n    `$select` = \"DOI,ActivityType,ActivityMediaName,ActivityStartDate,ActivityStartTime,SampleCollectionEquipmentName,CharacteristicName,MethodSpeciation,ResultSampleFraction,ResultValue,ResultUnit,ResultValueType\",\n    `$filter` = \"DOI in ('10.25976/xxxx-xx00', '10.25976/xxxx-xx11', '10.25976/xxxx-xx22') and CharacteristicName in ('Temperature, water', 'pH')\",\n    `$top` = 10000)\nresult = observations(qs)\n```\n\n\n### Records\n**Get selected fields from a dataset**\n\n```R\nqs \u003c- list(\n    `$select` = \"DOI,ActivityType,ActivityMediaName,ActivityStartDate,ActivityStartTime,SampleCollectionEquipmentName,CharacteristicName,MethodSpeciation,ResultSampleFraction,ResultValue,ResultUnit,ResultValueType\",\n    `$filter` = \"DOI eq '10.25976/xxxx-xx00'\",\n    `$top` = 10000)\nresult = records(qs)\n```\n\n### Metadata\n**Get the `DOI`, `Version`, and `DatasetName` for a dataset**\n```R\nqs \u003c- list(\n    `$select` = \"DOI,Version,DatasetName\",\n    `$filter` = \"DOI eq '10.25976/xxxx-xx00'\",\n    `$top` = 10000)\nresult = metadata(qs)\n```\n\n### Get Result Count\n```R\nqs \u003c- list(\n    `$filter` = \"DOI eq '10.25976/xxxx-xx00'\",\n    `$count` = \"true\")\ncount = 
observations(qs)\n```\n\n\n## Tests\nA Dockerfile is provided for running the unit and integration tests. To build the Docker image (used for running the tests and for debugging), run: \n```Bash\ndocker build -t datastreamr .\n```\n\nThe commands below pass your API key to the container by reading it from a local `api_key.txt` file.\n\nTo run the unit tests:\n```Bash\ndocker run --rm -e DATASTREAM_API_KEY=$(cat api_key.txt) datastreamr R -e \"library(testthat); test_file('tests/testthat/test_unit.R')\"\n```\n\nTo run the integration tests:\n```Bash\ndocker run --rm -e DATASTREAM_API_KEY=$(cat api_key.txt) datastreamr R -e \"library(testthat); test_file('tests/testthat/test_integration.R')\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatastreamapp%2Fdatastreamr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatastreamapp%2Fdatastreamr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatastreamapp%2Fdatastreamr/lists"}