https://github.com/pepijn-devries/ECOTOXr

R package that creates a local SQLite build of (and allows querying of) the US EPA ECOTOX database.

---
output: github_document
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
library(ECOTOXr)
```

# `ECOTOXr`
> Harness information from the [US EPA ECOTOXicology Knowledgebase](https://cfpub.epa.gov/ecotox/)

[![R build status](https://github.com/pepijn-devries/ECOTOXr/workflows/R-CMD-check/badge.svg)](https://github.com/pepijn-devries/ECOTOXr/actions)
[![version](https://www.r-pkg.org/badges/version/ECOTOXr)](https://CRAN.R-project.org/package=ECOTOXr)
![cranlogs](https://cranlogs.r-pkg.org/badges/ECOTOXr)

## Overview

`ECOTOXr` can be used to explore and analyse data from the [US EPA ECOTOX database](https://cfpub.epa.gov/ecotox/).
More specifically, you can:

* Build a local SQLite copy of the [US EPA ECOTOX database](https://cfpub.epa.gov/ecotox/)
* Search and extract data from the local database
* Use experimental features to search the on-line dashboards: [ECOTOX](https://cfpub.epa.gov/ecotox/search.cfm) and
[CompTox](https://comptox.epa.gov/dashboard/batch-search)

## Why use `ECOTOXr`?

The `ECOTOXr` package allows you to search and extract data from the [ECOTOXicological Knowledgebase](https://cfpub.epa.gov/ecotox/)
and import it directly into `R`. This lets you formalize and document your search and extraction procedures in `R` code,
which makes it easier to share and reproduce those procedures and their results. Moreover, you can directly apply any statistical
analysis offered in `R`.

## Installation

> Get CRAN version
```{r eval=FALSE}
install.packages("ECOTOXr")
```

> Get development version from r-universe
```{r eval=FALSE}
install.packages("ECOTOXr", repos = c("https://pepijn-devries.r-universe.dev", "https://cloud.r-project.org"))
```

## Usage

### Preparing the database

Although `ECOTOXr` has experimental features to search the on-line database, the package
reaches its full potential when you build a copy of the database on your local machine.

> Download and build a local copy of the latest ASCII export of the US EPA ECOTOX database

```{r eval=FALSE}
download_ecotox_data()
```
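
Since the build only needs to happen once, it can be convenient to trigger it only when no local copy exists yet. The following is a minimal sketch, assuming that `check_ecotox_availability()` (exported by `ECOTOXr`) reports whether a local build is present in the default location. The wrapper name `ensure_ecotox_db` is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of ECOTOXr): build the local database
# only when no local copy is found in the default location.
ensure_ecotox_db <- function() {
  if (!ECOTOXr::check_ecotox_availability()) {
    # Downloads the ASCII export and builds the local SQLite file
    ECOTOXr::download_ecotox_data()
  }
  # Report (invisibly) whether a local build is available afterwards
  invisible(ECOTOXr::check_ecotox_availability())
}
```

Calling `ensure_ecotox_db()` at the top of an analysis script keeps the script reproducible on machines where the database has not been built yet.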

### Searching the local database for species and substances

Searching the local database is only possible after the download and build have
completed (see previous section).

> Search the local database for tests of water flea Daphnia magna exposed to benzene

```{r eval=FALSE}
search_ecotox(
  list(
    latin_name    = list(terms = "Daphnia magna", method = "exact"),
    chemical_name = list(terms = "benzene", method = "exact")
  )
)
```
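
The search above matches terms exactly. As a sketch of a looser search, the example below assumes that `method = "contains"` is accepted for partial matches and that `terms` may hold more than one value; the genus names are only illustrative. The call is guarded so that it runs only when the package and a local build are available.

```r
# Search specification: partial match on genus names, exact match on substance
search <- list(
  latin_name    = list(terms = c("Daphnia", "Skeletonema"), method = "contains"),
  chemical_name = list(terms = "benzene", method = "exact")
)

# Only query when ECOTOXr is installed and a local build exists
if (requireNamespace("ECOTOXr", quietly = TRUE) &&
    ECOTOXr::check_ecotox_availability()) {
  hits <- ECOTOXr::search_ecotox(search)
}
```

Storing the search specification in a variable makes it easy to document and reuse the exact criteria alongside the results.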

### Three ways of querying the local database

Let's have a look at three different approaches for retrieving a specific record from the local database, using the
unique identifier `result_id`. The first option is to use the built-in `search_ecotox` function. It uses
simple `R` syntax and allows you to search and collect any field from any table in the database. Furthermore,
all requested output fields are automatically joined to the result, so the end-user does not need to know anything
about the database structure.

> Using the prefab function `search_ecotox` packaged by `ECOTOXr`

```{r warning = FALSE}
search_ecotox(
  list(
    result_id = list(terms = "401386", method = "exact")
  ),
  as_data_frame = FALSE
)
```
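
Which fields end up in the output can be controlled as well. The sketch below assumes that `search_ecotox()` takes an `output_fields` argument and that `list_ecotox_fields("full")` returns a broader preset of field names than the default; both should be checked against the package manual. The helper name `get_result_full` is hypothetical.

```r
# Hypothetical helper (not part of ECOTOXr): fetch a single result
# with a broader set of output columns than the default.
get_result_full <- function(id) {
  ECOTOXr::search_ecotox(
    list(result_id = list(terms = as.character(id), method = "exact")),
    # "full" is assumed to be a valid preset of list_ecotox_fields()
    output_fields = ECOTOXr::list_ecotox_fields("full")
  )
}
```

With a local build in place, `get_result_full(401386)` would return the same record as above with additional columns.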

If you like to use [`dplyr`](https://dplyr.tidyverse.org/) verbs, you are in luck: SQLite databases can be queried using
`dplyr` verbs. This approach only returns information from the `results` table. The end-user has to join other information
(like test species and test substance) manually, which requires knowledge of the database structure.

> Using `dplyr` verbs

```{r warning = FALSE}
con <- dbConnectEcotox()
dplyr::tbl(con, "results") |>
  dplyr::filter(result_id == "401386") |>
  dplyr::collect()
```

If you prefer working with `SQL` directly, that is fine too. The [`RSQLite`](https://cran.r-project.org/package=RSQLite) package
allows you to run queries using `SQL` statements. The result is identical to that of the previous approach. Here too, the end-user
needs knowledge of the database structure in order to join additional data.

> Using `SQL` syntax

```{r warning = FALSE}
# `con` is the connection opened with dbConnectEcotox() in the previous chunk
DBI::dbGetQuery(con, "SELECT * FROM results WHERE result_id='401386'") |>
  dplyr::as_tibble()
```
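
Both the `dplyr` and the `SQL` approach leave the joins to the end-user. The sketch below illustrates what such a manual join looks like on a toy in-memory SQLite database, so it runs without the local ECOTOX build. All values are made up, and the table and column names (`results`, `tests`, `species`, `test_id`, `species_number`, `latin_name`) are assumptions about the ECOTOX schema that should be verified against the actual database.

```r
library(DBI)
library(dplyr)

# Toy in-memory database; all values are made up for illustration only
toy <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(toy, "results",
             data.frame(result_id = c("1", "2"),
                        test_id   = c("10", "11"),
                        endpoint  = c("LC50", "EC50")))
dbWriteTable(toy, "tests",
             data.frame(test_id        = c("10", "11"),
                        species_number = c("5", "6")))
dbWriteTable(toy, "species",
             data.frame(species_number = c("5", "6"),
                        latin_name     = c("Daphnia magna", "Danio rerio")))

# Start from one result, then join test and species information manually
joined <- tbl(toy, "results") |>
  filter(result_id == "1") |>
  left_join(tbl(toy, "tests"), by = "test_id") |>
  left_join(tbl(toy, "species"), by = "species_number") |>
  collect()

dbDisconnect(toy)
```

The same chain of joins could be applied to the connection returned by `dbConnectEcotox()`, provided the key columns match the assumptions above.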

## Disclaimers

It is the end-user's own responsibility to check the quality of collected data, using the original referenced source in order to evaluate its
fitness for use.

Note that the package maintainer is not affiliated with the US EPA; this package is therefore **not** official US EPA software.

## Resources

* De Vries, P. (2024) ECOTOXr: An R package for reproducible and transparent retrieval of data from EPA's ECOTOX database. _Chemosphere_ 364 143078
* [Manual of the CRAN release](https://CRAN.R-project.org/package=ECOTOXr)
* EPA ECOTOX help
* Olker, Jennifer H.; Elonen, Colleen M.; Pilli, Anne; Anderson, Arne; Kinziger, Brian; Erickson, Stephen; Skopinski, Michael;
Pomplun, Anita; LaLone, Carlie A.; Russom, Christine L.; Hoff, Dale. (2022): The ECOTOXicology Knowledgebase: A Curated Database of
Ecologically Relevant Toxicity Tests to Support Environmental Research and Risk Assessment. _Environmental Toxicology and Chemistry_
41(6) 1520-1539