Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/davidcarslaw/openair

Tools for air quality data analysis
https://github.com/davidcarslaw/openair

air-quality air-quality-data meteorology openair

Last synced: about 1 month ago
JSON representation

Tools for air quality data analysis

Awesome Lists containing this project

README

        

---
output: github_document
editor_options:
markdown:
wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = FALSE,
fig.retina = 2,
message = FALSE,
eval = FALSE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# openair: open source tools for air quality data analysis

[![R-CMD-check](https://github.com/davidcarslaw/openair/workflows/R-CMD-check/badge.svg)](https://github.com/davidcarslaw/openair/actions)
[![CRAN
status](https://www.r-pkg.org/badges/version/openair)](https://CRAN.R-project.org/package=openair)
![](http://cranlogs.r-pkg.org/badges/grand-total/openair)

**openair** is an R package developed for the purpose of analysing air
quality data --- or more generally atmospheric composition data. The
package is extensively used in academia, the public and private sectors.
The project was initially funded by the UK Natural Environment Research
Council ([NERC](https://www.ukri.org/councils/nerc/)), with additional funds from
Defra.

The most up to date information on `openair` can be found in the package
itself and at the book website
().

## Installation

Installation can be done in the normal way:

```{r eval=FALSE}
install.packages("openair")
```

The development version can be installed from GitHub. Installation of
`openair` from GitHub is easy using the `pak` package. Note,
because `openair` contains C++ code a compiler is also needed. For
Windows - for example,
[Rtools](https://cran.r-project.org/bin/windows/Rtools/) is needed.

``` r
# install.packages("pak")
pak::pak("davidcarslaw/openair")
```

## Description

`openair` has developed over several years to help analyse air quality
data.

This package continues to develop and input from other developers would
be welcome. A summary of some of the features are:

- **Access to data** from several hundred UK air pollution monitoring
sites through the `importAURN` and family functions.
- **Utility functions** such as `timeAverage` and `selectByDate` to
make it easier to manipulate atmospheric composition data.
- Flexible **wind and pollution roses** through `windRose` and
`pollutionRose`.
- Flexible plot conditioning to easily plot data by hour or the day,
day of the week, season etc. through the `openair` `type` option
available in most functions.
- More sophisticated **bivariate polar plots** and conditional
probability functions to help characterise different sources of
pollution. A paper on the latter is available
[here](https://www.sciencedirect.com/science/article/pii/S1364815214001339).
- Access to NOAA Hysplit pre-calculated annual 96-hour back
**trajectories** and many plotting and analysis functions e.g.
trajectory frequencies, Potential Source Contribution Function and
trajectory clustering.
- Many functions for air quality **model evaluation** using the
flexible methods described above e.g. the `type` option to easily
evaluate models by season, hour of the day etc. These include key
model statistics, Taylor Diagram, Conditional Quantile plots.

## Brief examples

### Import data from the UK Automatic Urban and Rural Network

It is easy to import hourly data from 100s of sites and to import
several sites at one time and several years of data.

```{r library, eval=TRUE}
library(openair)
kc1 <- importAURN(site = "kc1", year = 2020)
kc1
```

### Utility functions

Using the `selectByDate` function it is easy to select quite complex
time-based periods. For example, to select weekday (Monday to Friday)
data from June to September for 2012 *and* for the hours 7am to 7pm
inclusive:

```{r selectbydate, eval = TRUE}
sub <- selectByDate(kc1,
day = "weekday",
year = 2020,
month = 6:9,
hour = 7:19
)
sub
```

Similarly it is easy to time-average data in many flexible ways. For
example, 2-week means can be calculated as

```{r timeaverage, eval=TRUE}
sub2 <- timeAverage(kc1, avg.time = "2 week")
```

### The `type` option

One of the key aspects of `openair` is the use of the `type` option,
which is available for almost all `openair` functions. The `type` option
partitions data by different categories of variable. There are many
built-in options that `type` can take based on splitting your data by
different date values. A summary of in-built values of type are:

- "year" splits data by year
- "month" splits variables by month of the year
- "monthyear" splits data by year *and* month
- "season" splits variables by season. Note in this case the user can
also supply a `hemisphere` option that can be either "northern"
(default) or "southern"
- "weekday" splits variables by day of the week
- "weekend" splits variables by Saturday, Sunday, weekday
- "daylight" splits variables by nighttime/daytime. Note the user must
supply a `longitude` and `latitude`
- "dst" splits variables by daylight saving time and non-daylight
saving time (see manual for more details)
- "wd" if wind direction (`wd`) is available `type = "wd"` will split
the data up into 8 sectors: N, NE, E, SE, S, SW, W, NW.
- "seasonyear (or"yearseason") will split the data into year-season
intervals, keeping the months of a season together. For example,
December 2010 is considered as part of winter 2011 (with January and
February 2011). This makes it easier to consider contiguous seasons.
In contrast, `type = "season"` will just split the data into four
seasons regardless of the year.

If a categorical variable is present in a data frame e.g. `site` then
that variables can be used directly e.g. `type = "site"`.

`type` can also be a numeric variable. In this case the numeric variable
is split up into 4 *quantiles* i.e. four partitions containing equal
numbers of points. Note the user can supply the option `n.levels` to
indicate how many quantiles to use.

### Example directional analysis

`openair` can plot basic wind roses very easily provided the variables
`ws` (wind speed) and `wd` (wind direction) are available.

```{r windrose, eval = TRUE, fig.width = 4, fig.cap="A wind rose summarising the wind conditions at a monitoring station.", fig.height = 4.5, out.width = '50%', fig.alt="A polar bar chart showing the proportion of wind coming from 12 compass directions, where we show most wind at the monitoring station arrives from the South West."}
windRose(mydata)
```

However, the real flexibility comes from being able to use the `type`
option.

```{r windrose2, eval = TRUE, fig.width = 10, fig.height = 5, out.width = '100%', fig.cap="Wind roses summarising the wind conditions at a monitoring station per year, demonstrating the `{openair}` type option.", fig.alt="Polar bar charts showing the proportion of wind coming from 12 compass directions. There are 8 charts, each representing a year of data from 1998 to 2005. While there is a small amount of variation, the dominant wind direction for each year is from the south west."}
windRose(mydata,
type = "year",
layout = c(4, 2)
)
```

There are many flavours of bivariate polar plots, as described
[here](https://bookdown.org/david_carslaw/openair/sections/directional-analysis/polar-plots.html) that
are useful for understanding air pollution sources.

```{r polarCPF, eval = TRUE, fig.width = 4.5, fig.height = 4, out.width = '60%', fig.cap="A bivariate polar plot showing the wind conditions which give rise to elevated pollutant concentrations.", fig.alt="A polar heatmap with wind direction on the spoke axes and wind speed on the radial axes showing the probability of sulfur dioxide being higher than the 90th percentile. The chart indicates the highest probabilities occur when the wind is coming from the east and is blowing at between 0 and 12 metres per second."}
polarPlot(mydata,
pollutant = "so2",
statistic = "cpf",
percentile = 90,
cols = "YlGnBu"
)
```