https://github.com/rociojoo/rmovementpaperrep

Navigating through the R packages for movement: Supporting information
https://github.com/rociojoo/rmovementpaperrep
movement-ecology packages r tracking-data
Last synced: over 1 year ago
JSON representation
Navigating through the R packages for movement: Supporting information
Host: GitHub
URL: https://github.com/rociojoo/rmovementpaperrep
Owner: rociojoo
Created: 2018-10-15T01:24:41.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-09-02T22:44:04.000Z (almost 6 years ago)
Last Synced: 2025-01-25T19:43:47.939Z (over 1 year ago)
Topics: movement-ecology, packages, r, tracking-data
Homepage:
Size: 13.7 MB
Stars: 3
Watchers: 5
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project

README

          ---

title: "Navigating through the R packages for movement: Supporting information"

author: "Rocio Joo, Matthew E. Boone, Thomas A. Clay, Samantha C. Patrick, Susana Clusella-Trullas, and Mathieu Basille."

date: "May 16, 2019"

output:

  github_document:

    toc: true

    toc_depth: 3

---

```{r setup, include = FALSE}

knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, 

    fig.path = "figures/")

library("igraph")

library("scales")

library("Matrix")

library("dplyr")

library("cowplot")

library("viridis")

library("reshape")

library("RColorBrewer")

## library("kableExtra")

library("ggrepel")

library("printr")

```

```{r data}

data_dir <- "data/"

data <- read.csv(paste0(data_dir, "survey-responses.csv"), stringsAsFactors = FALSE)

data_all <- data %>% filter(completion == 100)

packages <- read.csv(paste0(data_dir, "pkg-list-survey.csv"), 

    stringsAsFactors = FALSE)

data_question_1 <- data_all[, grep("q1", colnames(data_all))]

colnames(data_question_1) <- t(packages)

# dropping trajr that was added in the end and only got 1

# response

data_question_1 <- data_question_1[, -grep("trajr", colnames(data_question_1))]

packages_new <- packages[-which(packages == "trajr"), ]

Total <- sapply(1:dim(data_question_1)[1], function(x) {

    sum(as.numeric(data_question_1[x, ] != "Never"))

})

discard_rows <- which(Total == 0)

data_all <- data_all[-discard_rows, ]

# Table with information of all packages from the survey

pkg_info <- read.csv(paste0(data_dir, "pkg-info.csv"), stringsAsFactors = FALSE)

```

[![DOI](https://zenodo.org/badge/153035894.svg)](https://zenodo.org/badge/latestdoi/153035894)

## Overview

This repository is a companion to the manuscript "*Navigating through

the R packages for movement: a review for users and developers*", from

Rocio Joo, Matthew E. Boone, Thomas A. Clay, Samantha C. Patrick,

Susana Clusella-Trullas, and Mathieu Basille (pre-print available on

[arXiv.org](https://arxiv.org/abs/1901.05935)). This document is

actually a dynamic R report, for which RMarkdown sources are available

[here](README.Rmd) with full code. The repository also serves to store

data about:

1. Information for [74 R packages](data/pkg-info.csv) related to

   tracking data processing and analysis. Information was collected

   between March and August 2018. **58** of the packages were described in

   the review, and **72** of those packages were the focus of a survey on

   their users about their use, relevance and quality of their

   documentation (see [packages included in the survey](#packages-included-in-the-survey) 

   for more details). Additional details about this data file are available

   [here](data/README_pkg-info.md).

2. [Responses to an anonymous survey](data/survey-responses.csv) about

   the use, relevance and quality of the documentation of 72 packages

   related to movement. The survey was executed in the Fall of 2018.

   Additional details about this data file are available

   [here](data/README_survey-responses.md).

This repository can be cited using its DOI: 10.5281/zenodo.3066226

## A large amount of R packages for movement

The manuscript presents a review of R packages for movement. R is one

of the most used programming softwares to process, visualize and

analyze data from tracking devices. The large amount of existing

packages makes it difficult to keep track of the spectrum of choices,

with an increasing number of available packages every year (this is

**Figure 2** of the manuscript):

```{r ms-fig-2, fig.width = 7, fig.height = 5}

# only keeping columns that we will use

data_pkg <- pkg_info[, c("Package", "Year")]

# Loading the names of packages we include in the review

mov_pac <- read.csv(paste0(data_dir, "pkg-list-paper.csv"), stringsAsFactors = FALSE)

mov_pac[mov_pac == "SGAT/TripEstimation"] <- "SGAT"

mov_pac[mov_pac == "TwGeos/BAStag"] <- "TwGeos"

mov_pac[mov_pac == "ukfsst/kfsst"] <- "ukfsst"

# Subsetting by packages in review

ind.mov.pac <- match(mov_pac$Package, data_pkg$Package)

data_pkg <- data_pkg[ind.mov.pac, ]

# counting packages per year

theTable <- as.data.frame(table(data_pkg$Year, useNA = "no"))

colnames(theTable) <- c("Year", "Total")

# in case there are some years without publication of

# packages

theTable$Year <- as.numeric(levels(theTable$Year))

# filter out 2018 which is not complete

theTable <- theTable %>% filter(Year < 2018)

range_year <- range(theTable$Year)

values_year <- range_year[1]:range_year[2]

missing_years <- setdiff(values_year, theTable$Year)

theTable <- rbind.data.frame(cbind.data.frame(Year = missing_years, 

    Total = rep(0, length(missing_years))), theTable)

theTable$Year <- factor(theTable$Year, levels = sort(theTable$Year))

ggplot(theTable, aes(x = Year, y = Total)) +

    geom_bar(stat = "identity", position = "identity") +

    xlab("Year of publication") + ylab("Number of packages") +

    scale_y_continuous(minor_breaks = seq(0, 12, 1), breaks = seq(0, 

        12, 3)) +

    background_grid(major = "y", minor = "none", colour.major = "grey80", 

        size.major = 0.5)

## ggsave("figures/Fig2.pdf", width = 7, height = 5)

```

Since the packages were reviewed between March and August 2018, this 

last year was incomplete and not included in the graph.

Many packages are actually not connected to each others, showing a very 

fragmented landscape of tracking packages in R. Here we show a network 

representation of the dependency and suggestion between tracking packages 

(this is **Figure 4** of the manuscript). The arrows go towards the package

the others suggest (dashed arrows) or depend on (solid arrows). Bold font

corresponds to active packages.  The size of the circle is proportional to 

the number of packages that suggest or depend on this one.

```{r ms-fig-4, fig.width = 12, fig.height = 12}

# loading the import + suggest information for each package

imports <- read.csv(paste0(data_dir, "pkg-import-suggest.csv"),

    stringsAsFactors = FALSE)

# getting the names of all packages that participate here as

# a dependent or dependency (or suggestion)

packages_dep <- unique(c(imports$Package, imports$Dependency))

imports$Dependency[which(imports$Dependency == "-")] <- NA

# accomodating everything as a matrix of counts

packages_dep <- packages_dep[-which(packages_dep == "-")]  # excluding packages with neither dependencies or suggestions

pack.id <- 1:length(packages_dep)

table.counts <- as.data.frame.matrix(table(imports$Package, imports$Dependency))  # table of counts with rows as list of packages and columns the packages they depend on/suggest

# but we only want to account for tracking packages.

# So in the end, we want a matrix of counts which would be

# square, with rows and columns of tracking packages.

# Problem? Not all tracking packages are counted as

# dependencies or suggestions of other packages, so they are

# not in the columns. We are going to add the missing ones

# first (with zeros) and then remove the columns that should

# not be there

new_col <- setdiff(t(mov_pac), colnames(table.counts))

zero_matrix <- matrix(0, ncol = length(new_col), nrow = dim(table.counts)[1])

row.names(zero_matrix) <- row.names(table.counts)

colnames(zero_matrix) <- new_col

table.counts <- cbind.data.frame(table.counts, zero_matrix)

ind.mov.row <- match(t(mov_pac), row.names(table.counts))

ind.mov.col <- match(t(mov_pac), colnames(table.counts))

table.counts <- table.counts[ind.mov.row, ind.mov.col]

# making sure that the order is right

col.names.df <- colnames(table.counts)

row.names.df <- row.names(table.counts)

table.counts <- table.counts[order(row.names.df, decreasing = FALSE),

    order(col.names.df, decreasing = FALSE)]

# only keeping columns that we will use

data_pkg <- pkg_info[, c("Package", "Active")]

data_pkg$Package[data_pkg$Package == "SGAT/TripEstimation"] <- "SGAT"

data_pkg$Package[data_pkg$Package == "TwGeos/BAStag"] <- "TwGeos"

data_pkg$Package[data_pkg$Package == "ukfsst/kfsst"] <- "ukfsst"

ind.mov.cran <- match(t(mov_pac), data_pkg$Package)

data_pkg <- data_pkg[ind.mov.cran, ]

table.mov <- t(table.counts)  # transposing for plotting

num_sugg <- apply(table.mov, 1, sum)  # total number of dep/sugg

g.matrix <- graph.adjacency(t(as.matrix(table.mov)), weighted = TRUE,

    mode = "directed", diag = FALSE)

g <- simplify(g.matrix)

font_text <- rep(1, nrow(data_pkg))

font_text[data_pkg$Active == "Yes"] <- 2  # bold for active packages

color_back <- "white"  #alpha('snow3',data_pkg$downloads/max(data_pkg$downloads))

# now, we want to make dashed and more transparent arrows for

# suggestion but darker and solid arrows for import

el <- as_edgelist(g)

edges_sugg <- data_frame(suggesting = V(g)[el[, 1]]$name, suggested = V(g)[el[,

    2]]$name, type = "suggestion")

edges_sugg$type <- as.character(edges_sugg$type)

imports <- read.csv(paste0(data_dir, "pkg-import.csv"), stringsAsFactors = FALSE)  # we need an only import file

imports <- imports[imports$Package %in% mov_pac$Package, ]

for (i in 1:nrow(edges_sugg)) {

    ind <- (imports$Package %in% as.character(edges_sugg$suggesting[i])) +

        (imports$Dependency %in% as.character(edges_sugg$suggested[i]))

    if (any(ind == 2)) {

        edges_sugg$type[i] <- "dependency"

    }

}

line_type <- rep(2, length(E(g)))

line_type[edges_sugg$type == "dependency"] <- 1

line_color <- rep(alpha("#ef8a62", 0.5), length(E(g)))

line_color[edges_sugg$type == "dependency"] <- alpha("#ef8a62",

    0.8)

# General options for plotting.

V(g)$label.family <- "Helvetica"

V(g)$label <- V(g)$name

V(g)$degree <- degree(g)

layout1 <- layout.fruchterman.reingold(g)

V(g)$label.color <- "darkblue"

V(g)$label.font <- font_text

V(g)$frame.color <- alpha("black", 0.7)

V(g)$label.cex <- 1.5

V(g)$color <- color_back

V(g)$size <- 40 * num_sugg/sum(num_sugg)

egam <- 3

E(g)$width <- egam * 1.5

E(g)$arrow.size <- egam/3

E(g)$lty <- line_type

E(g)$color <- line_color

# pdf('NetworkImportSuggestTrack.pdf',width = 18,height = 16)

plot(g, layout = layout1, vertex.label.dist = 0.5)

# dev.off()

```

## The survey

Our review aimed at an objective introduction to the packages

organized by the type of processing or analyzing they focused on, and

to provide feedback to developers from a user perspective. For the

second objective, we elaborated a survey for package users regarding:

1. How popular those packages are;

2. How well documented they are;

3. How relevant they are for users.

Those were the three questions that we asked about the packages, plus

one about the level as an R user of the survey participant.  In the

review we showed results regarding package documentation. In the

following, we present the complete results of the survey.

### Packages included in the survey

In theory, any package could be potentially useful for movement

analysis; either a time series package, a spatial analysis one or even

`ggplot2` to make more beautiful graphics! For the review, we

considered only what we referred to as **tracking packages**. Tracking

packages were those created to either analyze tracking data

(i.e. $(x,y,t)$) or to transform data from tagging devices into proper

tracking data. For instance, a package that would use accelerometer,

gyroscope and magnetometer data to reconstruct an animal's trajectory

via path integration, thus transforming those data into an $(x,y,t)$

format, would fit into the definition. But a package analyzing

accelerometry series to detect changes in behavior would not fit.

For this survey, we added packages that, though not tracking packages

*per se*, were created to process or analyze data extracted from

tracking devices in other formats (e.g. `accelerometry` for

accelerometry data, `diveMove` for time-depth recorders or

`pathtrackr` for video tracking data).  Packages from any public

repository (e.g. CRAN, GitHub, R-forge) were included in the

survey. Packages created for eye, computer-mouse or fishing vessel

movement were not considered here. A couple of packages that were

finally discarded from the review because of either being in early

stages of development (`movement`) or because of being archived in

CRAN due to unfixed problems (`sigloc`), were included in the

survey. Two packages, `lsmnsd` and `segclust2d`, were added for an

updated version of the review but did not make it in time for the

survey. The package `trajr` was added to the survey in a late stage,

but because of that, and the fact that it got only one response, we

filtered it out of the analysis.

A total of 72 packages were included in this survey: `acc`,

`accelerometry`, `adehabitatHR`, `adehabitatHS`, `adehabitatLT`,

`amt`, `animalTrack`, `anipaths`, `argosfilter`, `argosTrack`,

`BayesianAnimalTracker`, `BBMM`, `bcpa`, `bsam`, `caribou`, `crawl`,

`ctmcmove`, `ctmm`, `diveMove`, `drtracker`, `EMbC`, `feedR`,

`FLightR`, `GeoLight`, `GGIR`, `hab`, `HMMoce`, `Kftrack`, `m2b`,

`marcher`, `migrateR`, `mkde`, `momentuHMM`, `move`, `moveHMM`,

`movement`, `movementAnalysis`, `moveNT`, `moveVis`, `moveWindSpeed`,

`nparACT`, `pathtrackr`, `pawacc`, `PhysicalActivity`, `probgls`,

`rbl`, `recurse`, `rhr`, `rpostgisLT`, `rsMove`, `SDLfilter`,

`SGAT/TripEstimation`, `sigloc`, `SimilarityMeasures`, `SiMRiv`,

`smam`, `SwimR`, `T-LoCoH`, `telemetr`, `trackdem`, `trackeR`,

`Trackit`, `TrackReconstruction`, `TrajDataMining`, `trajectories`,

`trip`, `TwGeos`/`BAStag`, `TwilightFree`, `Ukfsst`/`kfsst`, `VTrack`

and `wildlifeDI`.

### Participation in the survey 

The survey was designed to be completely anonymous, meaning that we

had no way to know who participated. There was no previous selection

of the participants and no probabilistic sampling was involved. The

survey was advertised by Twitter, mailing lists (r-sig-geo and

r-sig-ecology), individual emails to researchers and the [lab's website](https://mablab.org/post/2018-08-31-r-movement-review/).

The survey got exemption from the Institutional Review Board at University of Florida 

(IRB02 Office, Box 112250, University of Florida, Gainesville, FL 32611-2250).

A total of `r data %>% filter(!is.na(completion)) %>% nrow()` people

participated in the survey, and `r data_all %>% nrow()` answered all

four questions. To answer all questions the participant had to have

tried at least one of the packages. In the following sections, we

analyze only completed surveys.

### Survey representativity

To get a rough idea of how representative the survey was of the

population of the package users, we compared the number of

participants that used each package to the number of monthly downloads

that each package has.

The number of downloads were calculated using the R package

`cran.stats`. It calculates the number of independent downloads by

each package (substracting downloads by dependencies) by day. It only

provides download statistics for packages on CRAN, downloaded using

the RStudio CRAN mirror—total downloads are likely to be an order of

magnitude higher. We computed the average number of downloads per

month, from September 2017 to August 2018; fewer months were

considered for packages that were younger than one year old.

There is no perfect match between the number of users and the number

of downloads per package, but a correlation of 0.85 for the 49

packages on CRAN provides evidence of an overall good representation

of the users of tracking packages in the survey. Moreover, most of the

packages with very few users in the survey regardless of their

relatively high download statistics were accelerometry packages for

human patients, which would be revealing that we did not reach that

subpopulation of users through Twitter and emails.

A log-log plot for both metrics is shown in the figure below.

```{r representativity, fig.width = 14, fig.height = 8}

data_question <- data_all[, grep("q1", colnames(data_all))]

colnames(data_question) <- t(packages)

# I'm dropping trajr that was added in the end and only got 1

# response

data_question <- data_question[, -grep("trajr", colnames(data_question))]

packages_new <- packages[-which(packages == "trajr"), ]

categories <- c("Never", "Rarely", "Sometimes", "Often")

use_counts <- t(sapply(1:ncol(data_question), function(x) {

    data_line <- factor(data_question[, x], levels = categories)

    count_p_use <- as.numeric(table(data_line))

    return(count_p_use)

}))

use_counts <- data.frame(use_counts)

colnames(use_counts) <- categories

rownames(use_counts) <- t(packages_new)

use_counts$Package <- row.names(use_counts)

use_counts <- use_counts %>% mutate(users = Rarely + Sometimes + 

    Often)

funciones <- read.csv(paste0(data_dir, "pkg-info.csv"))

matrix_fun <- left_join(use_counts, funciones)

matrix_fun <- matrix_fun[(!is.na(matrix_fun$monthly.downloads)), 

    ]

ggplot(matrix_fun, aes(x = users, y = monthly.downloads)) +

    geom_point(size = 3) +

    geom_text_repel(label = matrix_fun$Package, segment.size = 0.6, 

        force = 3, segment.alpha = 0.5, hjust = 0, box.padding = 0.5, 

        min.segment.length = 0.1, size = 6, direction = "both") +

    scale_x_continuous(trans = "log10") +

    scale_y_continuous(trans = "log10") +

    xlab("Number of users") + ylab("Monthly downloads")

```

## The questions

### User level

Let's see first the level of use in R of the participants. The options

were:

* Beginner: You only use existing packages and occasionally write some

  lines of code.

* Intermediate: You use existing packages but you also write and

  optimize your own functions.

* Advanced: You commonly use version control or contribute to develop

  packages.

```{r user-experience, fig.width=7, fig.height=5, fig.cap="Level of R use"}

data_question_4 <- data_all[, grep("q4", colnames(data_all))]

categories <- c("Beginner", "Intermediate", "Advanced")

data_question_4 <- factor(unlist(lapply(strsplit(data_question_4,

    "\\:"), "[[", 1)), levels = categories)

use_counts <- as.numeric(table(data_question_4))

prop <- round(as.numeric(prop.table(table(data_question_4))) *

    100, 1)

use_counts <- data.frame(levels = categories, total = use_counts)

use_counts$levels <- factor(as.character(use_counts$levels),

    levels = categories)

ggplot(data = use_counts, aes(x = levels, y = total)) +

    geom_bar(stat = "identity", position = position_dodge()) +

    geom_text(aes(label = total), size = 6, vjust = 1.5, hjust = .5, 

        color = "white") +

    xlab("Level of use") + ylab("Total respondents")

```

Most participants considered themselves in an intermediate level 

(`r prop[2]`%), meaning that they could write functions in R. Some others

were beginners (`r prop[1]`%) and advanced (`r prop[3]`%) R users.

### Package use

The first question about package use was: How often do you use each of

these packages? (Never, Rarely, Sometimes, Often)

The bar graphics below show that most packages were unknown (or at

least had never been used) by the survey participants. The

`adehabitat` packages (HR, LT and HS) were the most used

packages. These packages provide a collection of tools to estimate

home range, handle and analyze trajectories, and analyze habitat

selection, respectively. On the bottom of the graphic, `smam` (for

animal movement models), `PhysicalActivity`, `nparACT`, `GGIR` (these

three for accelerometry data on human patients) and `feedr` (to handle

radio telemetry data) had no users among the participants. For that

reason, those 5 packages will not appear in the analysis of the next

questions.

```{r relevance, fig.width = 16, fig.height = 15, fig.cap = "Package use"}

data_question <- data_all[, grep("q1", colnames(data_all))]

colnames(data_question) <- t(packages)

# I'm dropping trajr that was added in the end and only got 1

# response

data_question <- data_question[, -grep("trajr", colnames(data_question))]

packages_new <- packages[-which(packages == "trajr"), ]

categories <- c("Never", "Rarely", "Sometimes", "Often")

use_counts <- t(sapply(1:ncol(data_question), function(x) {

    data_line <- factor(data_question[, x], levels = categories)

    count_p_use <- as.numeric(table(data_line))

    return(count_p_use)

}))

use_counts <- data.frame(use_counts)

colnames(use_counts) <- categories

rownames(use_counts) <- t(packages_new)

use_counts$package <- row.names(use_counts)

df1 <- melt(use_counts, id.vars = "package", variable_name = "response")

g <- unlist(by(df1, df1$package, function(x) sum(x$value[x$response !=

    "Never"])))

df1$package <- factor(df1$package, levels = names(sort(g, decreasing = FALSE)))

color.pallete <- brewer.pal(5, "YlGnBu")

color.pallete[1] <- "lightgray"

ggplot(data = df1) +

    geom_col(aes(x = package, y = value, fill = response)) +

    coord_flip() +

    scale_fill_manual(values = color.pallete) +

    ylab("Count")

```

If you want to check the numbers for specific packages, the complete

table is below:

```{r relevance-table}

use_counts[, 1:length(categories)]

## kable(use_counts[, 1:length(categories)]) %>%

##   kable_styling(bootstrap_options = c("striped", "hover",

##                                       "condensed", "responsive"))

```

There is actually not much difference in the number of packages used

by the distinct levels of R users as you can see in the boxplots

below:

```{r boxplot-users-packages, fig.width = 7, fig.height = 5, fig.cap = "Packages per user level"}

Total <- Total[-discard_rows]

new_df <- cbind.data.frame(Total, user = data_question_4)

categories <- c("Beginner", "Intermediate", "Advanced")

new_df <- cbind.data.frame(Total, user = data_question_4)

new_df$user <- factor(as.character(new_df$user), levels = categories)

ggplot(new_df, aes(x = user, y = Total)) +

    geom_boxplot() +

    scale_y_continuous(breaks = 0:max(new_df$Total)) +

    xlab("User level") + ylab("Package count")

```

### Package documentation 

Without proper user testing and peer editing, package documentation

can lead to large gaps of understanding and limited usefulness of the

package. If functions and workflows are not expressly defined, a

package's capacity to help users is undermined.

In this survey we asked the participants how helpful was the

documentation provided for each of the packages they stated to

use. Documentation includes what is contained in the manual and help

pages, vignettes, published manuscripts, and other material about the

package provided by the authors. The participants had to answer using

one of the following options:

* Not enough: "It's not enough to let me know how to do what I need"

* Basic: "It's enough to let me get started with simple use of the

  functions but not to go further (e.g. use all arguments in the

  functions, or put extra variables)"

* Good: "I did everything I wanted and needed to do with it"

* Excellent: "I ended up doing even more than what I planned because

  of the excellent information in the documentation"

* Don't remember: "I honestly can't remember…"

```{r documentation, fig.width = 16, fig.height = 15, fig.cap = "Bar plots of absolute frequency of each category of package documentation"}

data_question <- data_all[, grep("q2", colnames(data_all))]

colnames(data_question) <- t(packages)

# I'm dropping trajr that was added in the end and only got 1

# response

data_question <- data_question[, -grep("trajr", colnames(data_question))]

packages_new <- packages[-which(packages == "trajr"), ]

categories <- c("Not enough", "Basic", "Good", "Excellent", "Don't remember")

use_counts <- t(sapply(1:ncol(data_question), function(x) {

    data_line <- factor(data_question[, x], levels = categories)

    count_p_use <- as.numeric(table(data_line, useNA = "no"))

    return(count_p_use)

}))

use_counts <- data.frame(use_counts)

colnames(use_counts) <- categories

rownames(use_counts) <- t(packages_new)

total_package <- rowSums(use_counts)

use_counts$package <- row.names(use_counts)

use_counts <- use_counts[total_package > 0, ]

df1 <- melt(use_counts, id.vars = "package", variable_name = "response")

g <- unlist(by(df1, df1$package, function(x) sum(x$value)))

color.pallete <- brewer.pal(5, "YlGnBu")

color.pallete[1] <- "lightgray"

df1$package <- factor(df1$package, levels = names(sort(g, decreasing = FALSE)))

df1$response <- factor(df1$response, levels = rev(c("Excellent", 

    "Good", "Basic", "Not enough", "Don't remember")))

ggplot(data = df1) +

    geom_col(aes(x = package, y = value, fill = response)) +

    coord_flip() +

    scale_fill_manual(values = color.pallete) +

    background_grid(major = "x", minor = "x", colour.major = "grey80", 

        colour.minor = "grey80", size.major = 0.5) +

    ylab("Count")

```

```{r useage counts}

df2 <- use_counts[, c("Not enough", "Basic", "Good", "Excellent")]

use_per <- df2/apply(df2, 1, sum) * 100

use_per$good_excellent <- signif(apply(use_per[, c("Good", "Excellent")], 

    1, sum), 4)

use_per$counts <- apply(df2[, c("Good", "Excellent")], 1, sum)

```

Remember that participants could only give their opinion on

documentation regarding the packages they had used. Hence, the

packages with many users got many documentation answers. The figure 

above allows for a closer look at the proportion of type of

response for each package.

To identify some packages with remarkably good documentation, let's

first only consider those packages with at least 10 responses on the

quality of documentation (regardless of the "Don't remember"). These

are 26 (you can see the table of responses below). Among them,

`momentuHMM` had more than 50% of the responses 

(`r round(use_per["momentuHMM","Excellent"],2)`%; 

`r use_counts["momentuHMM","Excellent"]`) as "excellent documentation",

meaning that the documentation was so good that thanks to it, more

than half of its users discovered additional features of the package

and were able to do more analyses than what they initially

planned. Moreover, 10 packages had more than 75% of the responses as

either "good" or "excellent": 

`momentuHMM` (`r use_per["momentuHMM","good_excellent"]`%; 

`r use_per["momentuHMM","counts"]`), 

`moveHMM` (`r use_per["moveHMM","good_excellent"]`%; 

`r use_per["moveHMM","counts"]`), 

`adehabitatLT` (`r use_per["adehabitatLT","good_excellent"]`%; 

`r use_per["adehabitatLT","counts"]`), 

`adehabitatHR` (`r use_per["adehabitatHR","good_excellent"]`%; 

`r use_per["adehabitatHR","counts"]`), 

`EMbC` (`r use_per["EMbC","good_excellent"]`%; `r use_per["EMbC","counts"]`),

`wildlifeDI` (`r use_per["wildlifeDI","good_excellent"]`%; 

`r use_per["wildlifeDI","counts"]`), 

`ctmm` (`r use_per["ctmm","good_excellent"]`%; `r use_per["ctmm","counts"]`),

`GeoLight` (`r use_per["GeoLight","good_excellent"]`%; 

`r use_per["GeoLight","counts"]`), 

`move` (`r use_per["move","good_excellent"]`%; `r use_per["move","counts"]`),

`recurse` (`r use_per["recurse","good_excellent"]`%; 

`r use_per["recurse","counts"]`). The two leading packages, `momentuHMM`

and `moveHMM`, focus on the use of Hidden Markov models which allow

identifying different patterns of behavior called states.

One way to visualize the quality of documentation is to relate the

rating to the number of respondents who declared using each package

(this is **Figure 3** of the manuscript). This figure shows the

proportion of good and excellent documentation for packages with at

least 10 respondents; light green corresponds to packages with

standard documentation only, blue is for packages with vignettes, and

purple is for packages that also have peer-reviewed articles

published:

```{r ms-fig-3, fig.width = 14, fig.height = 8}

# question about documentation

data_question <- data_all[, grep("q2", colnames(data_all))]

colnames(data_question) <- t(packages)

# I'm dropping trajr that was added in the end and only got 1

# response

data_question <- data_question[, -grep("trajr", colnames(data_question))]

packages_new <- packages[-which(packages == "trajr"), ]

categories <- c("Not enough", "Basic", "Good", "Excellent", "Don't remember")

use_counts <- t(sapply(1:ncol(data_question), function(x) {

    data_line <- factor(data_question[, x], levels = categories)

    count_p_use <- as.numeric(table(data_line, useNA = "no"))

    return(count_p_use)

}))

use_counts <- data.frame(use_counts)

colnames(use_counts) <- categories

rownames(use_counts) <- t(packages_new)

total_package <- rowSums(use_counts)

use_counts$Package <- row.names(use_counts)

use_counts <- use_counts[total_package > 0, ]

matrix_fun <- left_join(use_counts, funciones)

# not all of the packages from the survey are included in the

# review

packages_paper <- read.csv(paste0(data_dir, "pkg-list-paper.csv"))

matrix_fun <- inner_join(packages_paper, matrix_fun)

# clarifying documentation options

matrix_fun$documentation <- c("Manual")

matrix_fun$documentation[matrix_fun$Vignettes == "Yes" & matrix_fun$Papers == 

    "No"] <- c("Manual+Vignette")

matrix_fun$documentation[matrix_fun$Vignettes == "Yes" & matrix_fun$Papers == 

    "Yes"] <- c("Manual+Vignette+Paper")

matrix_fun$num_answers <- rowSums(matrix_fun[2:5])

matrix_fun$Good_Exc <- (rowSums(matrix_fun[4:5]))/matrix_fun$num_answers * 

    100

# discarding the packages we did not get users from:

matrix_fun_2 <- matrix_fun[matrix_fun$num_answers > 0, ]  # missing: 'feedR' 'smam'            

matrix_fun_3 <- matrix_fun[matrix_fun$num_answers >= 10, ]

ggplot(matrix_fun_3, aes(x = num_answers, y = Good_Exc, col = documentation)) + 

    geom_point(size = 3) +

    geom_text_repel(label = matrix_fun_3$Package, segment.size = 1, 

        force = 1, segment.alpha = 0.4, hjust = 0, box.padding = 0.4, 

        min.segment.length = 0.25, size = 6) + 

    xlab("Number of respondents") + ylab("Good or Excellent Rating") + 

    ## scale_colour_manual(values = color.pallete) +

    scale_color_viridis(discrete = TRUE, end = .75, option = "D", direction = -1)+

    background_grid(major = "xy", colour.major = "grey80") +

    theme(legend.position = "none")

## ggsave("figures/Fig3.pdf", width = 14, height = 8)

```

If you want to check the numbers for specific packages, the complete

table is below:

```{r documentation-table}

use_counts[, 1:length(categories)]

## kable(use_counts[, 1:length(categories)]) %>%

##   kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

```

### Package Relevance

Participants were asked how relevant was each of the packages they use

for their work. They had to answer using one of the following options:

* Not relevant: "I tried the package but really didn't find it a good

  use for my work"

* Slightly relevant: "It helps in my work, but not for the core of it"

* Important: "It's important for the core of my work, but if it didn't

  exist, there are other packages or solutions to obtain something

  similar"

* Essential: "I wouldn't have done the key part of my work without

  this package"

```{r importance, fig.width = 16, fig.height = 15, fig.cap = "Bar plots of absolute frequency of each category of package relevance"}

data_question <- data_all[, grep("q3", colnames(data_all))]

colnames(data_question) <- t(packages)

# I'm dropping trajr that was added in the end and only got 1

# response

data_question <- data_question[, -grep("trajr", colnames(data_question))]

packages_new <- packages[-which(packages == "trajr"), ]

categories <- c("Not relevant", "Slightly relevant", "Important", 

    "Essential")

use_counts <- t(sapply(1:ncol(data_question), function(x) {

    data_line <- factor(data_question[, x], levels = categories)

    count_p_use <- as.numeric(table(data_line, useNA = "no"))

    return(count_p_use)

}))

use_counts <- data.frame(use_counts)

colnames(use_counts) <- categories

rownames(use_counts) <- t(packages_new)

total_package <- rowSums(use_counts)

use_counts$package <- row.names(use_counts)

use_counts <- use_counts[total_package > 0, ]

df1 <- melt(use_counts, id.vars = "package", variable_name = "response")

g <- unlist(by(df1, df1$package, function(x) sum(x$value)))

color.pallete <- brewer.pal(5, "YlGnBu")

color.pallete[1] <- "lightgray"

df1$package <- factor(df1$package, levels = names(sort(g, decreasing = F)))

df1$response <- factor(df1$response, levels = rev(c("Essential", 

    "Important", "Slightly relevant", "Not relevant")))

ggplot(data = df1) +

    geom_col(aes(x = package, y = value, fill = response)) +

    coord_flip() +

    scale_fill_manual(values = color.pallete) +

    background_grid(major = "x", minor = "x", colour.major = "grey80", 

        colour.minor = "grey80", size.major = 0.5) +

    ylab("Count")

```

```{r importance-counts}

df2 <- use_counts[, c("Not relevant", "Slightly relevant", "Important", 

    "Essential")]

use_per <- df2/apply(df2, 1, sum) * 100

use_per$good_excellent <- signif(apply(use_per[, c("Important", 

    "Essential")], 1, sum), 4)

use_per$counts <- apply(df2[, c("Important", "Essential")], 1, 

    sum)

```

The two barplots show the absolute and relative frequency of the

answers for each package, respectively.  We identified the packages

that were highly relevant for their users, considering only those

packages with at least 10 responses. Among these 33 packages, three

were regarded as either "Important" or "Essential" for more than 75%

of their users: 

`bsam` (`r use_per['bsam','good_excellent']`%; `r use_per['bsam','counts']`), 

`adehabitatHR` (`r use_per['adehabitatHR','good_excellent']`%; 

`r use_per['adehabitatHR','counts']`), and 

`adehabitatLT` (`r use_per['adehabitatLT','good_excellent']`%; 

`r use_per['adehabitatLT','counts']`). `bsam` allows fitting Bayesian

state-space models to animal tracking data.

```{r importance-percentage,fig.width = 16, fig.height = 10, fig.cap = "Bar plots of relative frequency of each category of package relevance (for packages with more than 5 users)"}

package_levels <- row.names(use_per[order(use_per$good_excellent, 

    use_per$Essential, use_per$Important), ])

use_per2 <- use_per

use_per2[is.na(use_per2)] <- 0

use_per2$package <- row.names(use_per2)

use_per2 <- subset(use_per2, counts > 5)

df1 <- melt(subset(use_per2, select = -c(good_excellent, counts)), 

    id.vars = "package", variable_name = "response")

df1$package <- factor(df1$package, levels = package_levels)

df1$response <- factor(df1$response, levels = c("Not relevant", 

    "Slightly relevant", "Important", "Essential"))

color.pallete <- brewer.pal(4, "YlGnBu")

color.pallete[1:2] <- c("lightgray", "darkgray")

# color.pallete[1]<-c('lightgray')

ggplot(data = df1) +

    geom_col(aes(x = package, y = value, fill = response)) +

    coord_flip() +

    scale_fill_manual(values = color.pallete) +

    ylab("Percentage")

```

If you want to check the numbers for specific packages, the complete

table is below:

```{r importance-table}

use_counts[, 1:length(categories)]

## kable(use_counts[, 1:length(categories)]) %>%

##   kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

```

## Summary 

* Most packages had very few users among the participants. The vast

  landscape of packages could be leading users to: 1) rely on "old"

  and established packages (like the `adehabitat` family) that gather

  most functions for common analyses in movement and 2) search for

  other packages when doing other specific analyses. Moreover, many

  packages contain functions that other packages have implemented too

  (see more details in the review manuscript), so repetition could

  make users spread between packages.

* After the `adehabitat` family of packages, several packages for

  modeling animal movement (`momentuHMM`, `moveHMM`, `crawl` and

  `ctmm`) showed to be very popular, which could be an indicator of an

  increase in research on movement models.

* Few of the packages had remarkably good documentation (>75% of

  "good" or "excellent" documentation), and, on the other end of the

  spectrum, a couple of packages got less than 50% of "good" or

  "excellent" rates.

* Most packages were relevant for the work of their users, which is a

  positive feature!
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rociojoo/rmovementpaperrep

Awesome Lists containing this project

README