{"id":21727500,"url":"https://github.com/rociojoo/rmovementpaperrep","last_synced_at":"2025-03-20T23:17:40.307Z","repository":{"id":80721476,"uuid":"153035894","full_name":"rociojoo/RmovementPaperRep","owner":"rociojoo","description":"Navigating through the R packages for movement: Supporting information","archived":false,"fork":false,"pushed_at":"2020-09-02T22:44:04.000Z","size":14356,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-01-25T19:43:47.939Z","etag":null,"topics":["movement-ecology","packages","r","tracking-data"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rociojoo.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-15T01:24:41.000Z","updated_at":"2021-05-12T03:03:30.000Z","dependencies_parsed_at":"2023-06-05T11:45:16.356Z","dependency_job_id":null,"html_url":"https://github.com/rociojoo/RmovementPaperRep","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rociojoo%2FRmovementPaperRep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rociojoo%2FRmovementPaperRep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rociojoo%2FRmovementPaperRep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rociojoo%2FRmovementPaperRep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/rociojoo","download_url":"https://codeload.github.com/rociojoo/RmovementPaperRep/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244706544,"owners_count":20496571,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["movement-ecology","packages","r","tracking-data"],"created_at":"2024-11-26T03:51:07.237Z","updated_at":"2025-03-20T23:17:40.289Z","avatar_url":"https://github.com/rociojoo.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"Navigating through the R packages for movement: Supporting information\"\nauthor: \"Rocio Joo, Matthew E. Boone, Thomas A. Clay, Samantha C. 
Patrick, Susana Clusella-Trullas, and Mathieu Basille.\"\ndate: \"May 16, 2019\"\noutput:\n  github_document:\n    toc: true\n    toc_depth: 3\n---\n\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, \n    fig.path = \"figures/\")\nlibrary(\"igraph\")\nlibrary(\"scales\")\nlibrary(\"Matrix\")\nlibrary(\"dplyr\")\nlibrary(\"cowplot\")\nlibrary(\"viridis\")\nlibrary(\"reshape\")\nlibrary(\"RColorBrewer\")\n## library(\"kableExtra\")\nlibrary(\"ggrepel\")\nlibrary(\"printr\")\n\n```\n\n```{r data}\ndata_dir \u003c- \"data/\"\n\ndata \u003c- read.csv(paste0(data_dir, \"survey-responses.csv\"), stringsAsFactors = FALSE)\ndata_all \u003c- data %\u003e% filter(completion == 100)\npackages \u003c- read.csv(paste0(data_dir, \"pkg-list-survey.csv\"), \n    stringsAsFactors = FALSE)\n\ndata_question_1 \u003c- data_all[, grep(\"q1\", colnames(data_all))]\ncolnames(data_question_1) \u003c- t(packages)\n# dropping trajr that was added in the end and only got 1\n# response\ndata_question_1 \u003c- data_question_1[, -grep(\"trajr\", colnames(data_question_1))]\npackages_new \u003c- packages[-which(packages == \"trajr\"), ]\n\nTotal \u003c- sapply(1:dim(data_question_1)[1], function(x) {\n    sum(as.numeric(data_question_1[x, ] != \"Never\"))\n})\ndiscard_rows \u003c- which(Total == 0)\ndata_all \u003c- data_all[-discard_rows, ]\n\n# Table with information of all packages from the survey\npkg_info \u003c- read.csv(paste0(data_dir, \"pkg-info.csv\"), stringsAsFactors = FALSE)\n\n```\n[![DOI](https://zenodo.org/badge/153035894.svg)](https://zenodo.org/badge/latestdoi/153035894)\n\n## Overview\n\nThis repository is a companion to the manuscript \"*Navigating through\nthe R packages for movement: a review for users and developers*\", from\nRocio Joo, Matthew E. Boone, Thomas A. Clay, Samantha C. Patrick,\nSusana Clusella-Trullas, and Mathieu Basille (pre-print available on\n[arXiv.org](https://arxiv.org/abs/1901.05935)). 
This document is\nactually a dynamic R report, for which RMarkdown sources are available\n[here](README.Rmd) with full code. The repository also serves to store\ndata about:\n\n1. Information for [74 R packages](data/pkg-info.csv) related to\n   tracking data processing and analysis. Information was collected\n   between March and August 2018. **58** of the packages were described in\n   the review, and **72** of those packages were the focus of a user survey\n   about their use, their relevance and the quality of their\n   documentation (see [packages included in the survey](#packages-included-in-the-survey) \n   for more details). Additional details about this data file are available\n   [here](data/README_pkg-info.md).\n2. [Responses to an anonymous survey](data/survey-responses.csv) about\n   the use, relevance and quality of the documentation of 72 packages\n   related to movement. The survey was conducted in the fall of 2018.\n   Additional details about this data file are available\n   [here](data/README_survey-responses.md).\n\nThis repository can be cited using its DOI: 10.5281/zenodo.3066226\n\n## A large amount of R packages for movement\n\nThe manuscript presents a review of R packages for movement. R is one\nof the most widely used programming languages to process, visualize and\nanalyze data from tracking devices. 
The large amount of existing\npackages makes it difficult to keep track of the spectrum of choices,\nwith an increasing number of available packages every year (this is\n**Figure 2** of the manuscript):\n\n```{r ms-fig-2, fig.width = 7, fig.height = 5}\n# only keeping columns that we will use\ndata_pkg \u003c- pkg_info[, c(\"Package\", \"Year\")]\n# Loading the names of packages we include in the review\nmov_pac \u003c- read.csv(paste0(data_dir, \"pkg-list-paper.csv\"), stringsAsFactors = FALSE)\nmov_pac[mov_pac == \"SGAT/TripEstimation\"] \u003c- \"SGAT\"\nmov_pac[mov_pac == \"TwGeos/BAStag\"] \u003c- \"TwGeos\"\nmov_pac[mov_pac == \"ukfsst/kfsst\"] \u003c- \"ukfsst\"\n\n# Subsetting by packages in review\nind.mov.pac \u003c- match(mov_pac$Package, data_pkg$Package)\ndata_pkg \u003c- data_pkg[ind.mov.pac, ]\n\n# counting packages per year\ntheTable \u003c- as.data.frame(table(data_pkg$Year, useNA = \"no\"))\ncolnames(theTable) \u003c- c(\"Year\", \"Total\")\n\n# in case there are some years without publication of\n# packages\ntheTable$Year \u003c- as.numeric(levels(theTable$Year))\n# filter out 2018 which is not complete\ntheTable \u003c- theTable %\u003e% filter(Year \u003c 2018)\nrange_year \u003c- range(theTable$Year)\nvalues_year \u003c- range_year[1]:range_year[2]\nmissing_years \u003c- setdiff(values_year, theTable$Year)\n\ntheTable \u003c- rbind.data.frame(cbind.data.frame(Year = missing_years, \n    Total = rep(0, length(missing_years))), theTable)\ntheTable$Year \u003c- factor(theTable$Year, levels = sort(theTable$Year))\n\nggplot(theTable, aes(x = Year, y = Total)) +\n    geom_bar(stat = \"identity\", position = \"identity\") +\n    xlab(\"Year of publication\") + ylab(\"Number of packages\") +\n    scale_y_continuous(minor_breaks = seq(0, 12, 1), breaks = seq(0, \n        12, 3)) +\n    background_grid(major = \"y\", minor = \"none\", colour.major = \"grey80\", \n        size.major = 0.5)\n## ggsave(\"figures/Fig2.pdf\", width = 7, height = 
5)\n\n```\n\nSince the packages were reviewed between March and August 2018, this \nlast year was incomplete and is not included in the graph.\n\nMany packages are not connected to each other, revealing a very \nfragmented landscape of tracking packages in R. Here we show a network \nrepresentation of the dependencies and suggestions between tracking packages \n(this is **Figure 4** of the manuscript). The arrows go towards the package\nthe others suggest (dashed arrows) or depend on (solid arrows). Bold font\ncorresponds to active packages.  The size of the circle is proportional to \nthe number of packages that suggest or depend on this one.\n\n```{r ms-fig-4, fig.width = 12, fig.height = 12}\n# loading the import + suggest information for each package\nimports \u003c- read.csv(paste0(data_dir, \"pkg-import-suggest.csv\"),\n    stringsAsFactors = FALSE)\n\n# getting the names of all packages that participate here as\n# a dependent or dependency (or suggestion)\npackages_dep \u003c- unique(c(imports$Package, imports$Dependency))\nimports$Dependency[which(imports$Dependency == \"-\")] \u003c- NA\n\n# accommodating everything as a matrix of counts\npackages_dep \u003c- packages_dep[-which(packages_dep == \"-\")]  # excluding packages with neither dependencies nor suggestions\npack.id \u003c- 1:length(packages_dep)\ntable.counts \u003c- as.data.frame.matrix(table(imports$Package, imports$Dependency))  # table of counts with rows as list of packages and columns the packages they depend on/suggest\n# but we only want to account for tracking packages.\n\n# So in the end, we want a matrix of counts which would be\n# square, with rows and columns of tracking packages.\n# Problem? Not all tracking packages are counted as\n# dependencies or suggestions of other packages, so they are\n# not in the columns. 
We are going to add the missing ones\n# first (with zeros) and then remove the columns that should\n# not be there\n\nnew_col \u003c- setdiff(t(mov_pac), colnames(table.counts))\nzero_matrix \u003c- matrix(0, ncol = length(new_col), nrow = dim(table.counts)[1])\nrow.names(zero_matrix) \u003c- row.names(table.counts)\ncolnames(zero_matrix) \u003c- new_col\ntable.counts \u003c- cbind.data.frame(table.counts, zero_matrix)\n\nind.mov.row \u003c- match(t(mov_pac), row.names(table.counts))\nind.mov.col \u003c- match(t(mov_pac), colnames(table.counts))\n\ntable.counts \u003c- table.counts[ind.mov.row, ind.mov.col]\n# making sure that the order is right\ncol.names.df \u003c- colnames(table.counts)\nrow.names.df \u003c- row.names(table.counts)\ntable.counts \u003c- table.counts[order(row.names.df, decreasing = FALSE),\n    order(col.names.df, decreasing = FALSE)]\n\n# only keeping columns that we will use\ndata_pkg \u003c- pkg_info[, c(\"Package\", \"Active\")]\n\ndata_pkg$Package[data_pkg$Package == \"SGAT/TripEstimation\"] \u003c- \"SGAT\"\ndata_pkg$Package[data_pkg$Package == \"TwGeos/BAStag\"] \u003c- \"TwGeos\"\ndata_pkg$Package[data_pkg$Package == \"ukfsst/kfsst\"] \u003c- \"ukfsst\"\n\nind.mov.cran \u003c- match(t(mov_pac), data_pkg$Package)\ndata_pkg \u003c- data_pkg[ind.mov.cran, ]\n\ntable.mov \u003c- t(table.counts)  # transposing for plotting\n\nnum_sugg \u003c- apply(table.mov, 1, sum)  # total number of dep/sugg\n\ng.matrix \u003c- graph.adjacency(t(as.matrix(table.mov)), weighted = TRUE,\n    mode = \"directed\", diag = FALSE)\ng \u003c- simplify(g.matrix)\n\nfont_text \u003c- rep(1, nrow(data_pkg))\nfont_text[data_pkg$Active == \"Yes\"] \u003c- 2  # bold for active packages\ncolor_back \u003c- \"white\"  #alpha('snow3',data_pkg$downloads/max(data_pkg$downloads))\n\n# now, we want to make dashed and more transparent arrows for\n# suggestion but darker and solid arrows for import\nel \u003c- as_edgelist(g)\nedges_sugg \u003c- data_frame(suggesting = V(g)[el[, 
1]]$name, suggested = V(g)[el[,\n    2]]$name, type = \"suggestion\")\nedges_sugg$type \u003c- as.character(edges_sugg$type)\n\nimports \u003c- read.csv(paste0(data_dir, \"pkg-import.csv\"), stringsAsFactors = FALSE)  # we need an only import file\nimports \u003c- imports[imports$Package %in% mov_pac$Package, ]\nfor (i in 1:nrow(edges_sugg)) {\n    ind \u003c- (imports$Package %in% as.character(edges_sugg$suggesting[i])) +\n        (imports$Dependency %in% as.character(edges_sugg$suggested[i]))\n    if (any(ind == 2)) {\n        edges_sugg$type[i] \u003c- \"dependency\"\n    }\n}\n\nline_type \u003c- rep(2, length(E(g)))\nline_type[edges_sugg$type == \"dependency\"] \u003c- 1\nline_color \u003c- rep(alpha(\"#ef8a62\", 0.5), length(E(g)))\nline_color[edges_sugg$type == \"dependency\"] \u003c- alpha(\"#ef8a62\",\n    0.8)\n\n# General options for plotting.\nV(g)$label.family \u003c- \"Helvetica\"\nV(g)$label \u003c- V(g)$name\nV(g)$degree \u003c- degree(g)\nlayout1 \u003c- layout.fruchterman.reingold(g)\nV(g)$label.color \u003c- \"darkblue\"\nV(g)$label.font \u003c- font_text\nV(g)$frame.color \u003c- alpha(\"black\", 0.7)\nV(g)$label.cex \u003c- 1.5\nV(g)$color \u003c- color_back\nV(g)$size \u003c- 40 * num_sugg/sum(num_sugg)\negam \u003c- 3\nE(g)$width \u003c- egam * 1.5\nE(g)$arrow.size \u003c- egam/3\nE(g)$lty \u003c- line_type\nE(g)$color \u003c- line_color\n\n# pdf('NetworkImportSuggestTrack.pdf',width = 18,height = 16)\nplot(g, layout = layout1, vertex.label.dist = 0.5)\n# dev.off()\n\n```\n\n\n## The survey\n\nOur review aimed at an objective introduction to the packages\norganized by the type of processing or analyzing they focused on, and\nto provide feedback to developers from a user perspective. For the\nsecond objective, we elaborated a survey for package users regarding:\n\n1. How popular those packages are;\n2. How well documented they are;\n3. 
How relevant they are for users.\n\nThose were the three questions that we asked about the packages, plus\none about the level as an R user of the survey participant.  In the\nreview we showed results regarding package documentation. In the\nfollowing, we present the complete results of the survey.\n\n\n### Packages included in the survey\n\nIn theory, any package could be potentially useful for movement\nanalysis; either a time series package, a spatial analysis one or even\n`ggplot2` to make more beautiful graphics! For the review, we\nconsidered only what we referred to as **tracking packages**. Tracking\npackages were those created to either analyze tracking data\n(i.e. $(x,y,t)$) or to transform data from tagging devices into proper\ntracking data. For instance, a package that would use accelerometer,\ngyroscope and magnetometer data to reconstruct an animal's trajectory\nvia path integration, thus transforming those data into an $(x,y,t)$\nformat, would fit into the definition. But a package analyzing\naccelerometry series to detect changes in behavior would not fit.\n\nFor this survey, we added packages that, though not tracking packages\n*per se*, were created to process or analyze data extracted from\ntracking devices in other formats (e.g. `accelerometry` for\naccelerometry data, `diveMove` for time-depth recorders or\n`pathtrackr` for video tracking data).  Packages from any public\nrepository (e.g. CRAN, GitHub, R-forge) were included in the\nsurvey. Packages created for eye, computer-mouse or fishing vessel\nmovement were not considered here. A couple of packages that were\nfinally discarded from the review because of either being in early\nstages of development (`movement`) or because of being archived in\nCRAN due to unfixed problems (`sigloc`), were included in the\nsurvey. Two packages, `lsmnsd` and `segclust2d`, were added for an\nupdated version of the review but did not make it in time for the\nsurvey. 
The package `trajr` was added to the survey in a late stage,\nbut because of that, and the fact that it got only one response, we\nfiltered it out of the analysis.\n\nA total of 72 packages were included in this survey: `acc`,\n`accelerometry`, `adehabitatHR`, `adehabitatHS`, `adehabitatLT`,\n`amt`, `animalTrack`, `anipaths`, `argosfilter`, `argosTrack`,\n`BayesianAnimalTracker`, `BBMM`, `bcpa`, `bsam`, `caribou`, `crawl`,\n`ctmcmove`, `ctmm`, `diveMove`, `drtracker`, `EMbC`, `feedR`,\n`FLightR`, `GeoLight`, `GGIR`, `hab`, `HMMoce`, `Kftrack`, `m2b`,\n`marcher`, `migrateR`, `mkde`, `momentuHMM`, `move`, `moveHMM`,\n`movement`, `movementAnalysis`, `moveNT`, `moveVis`, `moveWindSpeed`,\n`nparACT`, `pathtrackr`, `pawacc`, `PhysicalActivity`, `probgls`,\n`rbl`, `recurse`, `rhr`, `rpostgisLT`, `rsMove`, `SDLfilter`,\n`SGAT/TripEstimation`, `sigloc`, `SimilarityMeasures`, `SiMRiv`,\n`smam`, `SwimR`, `T-LoCoH`, `telemetr`, `trackdem`, `trackeR`,\n`Trackit`, `TrackReconstruction`, `TrajDataMining`, `trajectories`,\n`trip`, `TwGeos`/`BAStag`, `TwilightFree`, `Ukfsst`/`kfsst`, `VTrack`\nand `wildlifeDI`.\n\n\n### Participation in the survey \n\nThe survey was designed to be completely anonymous, meaning that we\nhad no way to know who participated. There was no previous selection\nof the participants and no probabilistic sampling was involved. The\nsurvey was advertised by Twitter, mailing lists (r-sig-geo and\nr-sig-ecology), individual emails to researchers and the [lab's website](https://mablab.org/post/2018-08-31-r-movement-review/).\n\nThe survey got exemption from the Institutional Review Board at University of Florida \n(IRB02 Office, Box 112250, University of Florida, Gainesville, FL 32611-2250).\n\nA total of `r data %\u003e% filter(!is.na(completion)) %\u003e% nrow()` people\nparticipated in the survey, and `r data_all %\u003e% nrow()` answered all\nfour questions. To answer all questions the participant had to have\ntried at least one of the packages. 
In the following sections, we\nanalyze only completed surveys.\n\n\n### Survey representativity\n\nTo get a rough idea of how representative the survey was of the\npopulation of package users, we compared the number of\nparticipants that used each package to the number of monthly downloads\nthat each package has.\n\nThe number of downloads was calculated using the R package\n`cran.stats`, which computes the number of independent downloads of\neach package (subtracting downloads by dependencies) per day. It only\nprovides download statistics for packages on CRAN, downloaded using\nthe RStudio CRAN mirror; total downloads are likely to be an order of\nmagnitude higher. We computed the average number of downloads per\nmonth, from September 2017 to August 2018; fewer months were\nconsidered for packages that were less than one year old.\n\nThere is no perfect match between the number of users and the number\nof downloads per package, but a correlation of 0.85 for the 49\npackages on CRAN provides evidence of an overall good representation\nof the users of tracking packages in the survey. 
Moreover, most of the\npackages with very few users in the survey despite their\nrelatively high download statistics were accelerometry packages for\nhuman patients, which suggests that we did not reach that\nsubpopulation of users through Twitter and emails.\n\nA log-log plot for both metrics is shown in the figure below.\n\n```{r representativity, fig.width = 14, fig.height = 8}\ndata_question \u003c- data_all[, grep(\"q1\", colnames(data_all))]\ncolnames(data_question) \u003c- t(packages)\n# I'm dropping trajr that was added in the end and only got 1\n# response\ndata_question \u003c- data_question[, -grep(\"trajr\", colnames(data_question))]\npackages_new \u003c- packages[-which(packages == \"trajr\"), ]\ncategories \u003c- c(\"Never\", \"Rarely\", \"Sometimes\", \"Often\")\nuse_counts \u003c- t(sapply(1:ncol(data_question), function(x) {\n    data_line \u003c- factor(data_question[, x], levels = categories)\n    count_p_use \u003c- as.numeric(table(data_line))\n    return(count_p_use)\n}))\nuse_counts \u003c- data.frame(use_counts)\ncolnames(use_counts) \u003c- categories\nrownames(use_counts) \u003c- t(packages_new)\nuse_counts$Package \u003c- row.names(use_counts)\nuse_counts \u003c- use_counts %\u003e% mutate(users = Rarely + Sometimes + \n    Often)\n\nfunciones \u003c- read.csv(paste0(data_dir, \"pkg-info.csv\"))\nmatrix_fun \u003c- left_join(use_counts, funciones)\nmatrix_fun \u003c- matrix_fun[(!is.na(matrix_fun$monthly.downloads)), \n    ]\n\nggplot(matrix_fun, aes(x = users, y = monthly.downloads)) +\n    geom_point(size = 3) +\n    geom_text_repel(label = matrix_fun$Package, segment.size = 0.6, \n        force = 3, segment.alpha = 0.5, hjust = 0, box.padding = 0.5, \n        min.segment.length = 0.1, size = 6, direction = \"both\") +\n    scale_x_continuous(trans = \"log10\") +\n    scale_y_continuous(trans = \"log10\") +\n    xlab(\"Number of users\") + ylab(\"Monthly downloads\")\n\n```\n\n\n## The questions\n\n### User 
level\n\nLet's first look at the participants' level of R use. The options\nwere:\n\n* Beginner: You only use existing packages and occasionally write some\n  lines of code.\n* Intermediate: You use existing packages but you also write and\n  optimize your own functions.\n* Advanced: You commonly use version control or contribute to develop\n  packages.\n\n```{r user-experience, fig.width=7, fig.height=5, fig.cap=\"Level of R use\"}\ndata_question_4 \u003c- data_all[, grep(\"q4\", colnames(data_all))]\ncategories \u003c- c(\"Beginner\", \"Intermediate\", \"Advanced\")\ndata_question_4 \u003c- factor(unlist(lapply(strsplit(data_question_4,\n    \"\\\\:\"), \"[[\", 1)), levels = categories)\nuse_counts \u003c- as.numeric(table(data_question_4))\nprop \u003c- round(as.numeric(prop.table(table(data_question_4))) *\n    100, 1)\nuse_counts \u003c- data.frame(levels = categories, total = use_counts)\nuse_counts$levels \u003c- factor(as.character(use_counts$levels),\n    levels = categories)\n\nggplot(data = use_counts, aes(x = levels, y = total)) +\n    geom_bar(stat = \"identity\", position = position_dodge()) +\n    geom_text(aes(label = total), size = 6, vjust = 1.5, hjust = .5, \n        color = \"white\") +\n    xlab(\"Level of use\") + ylab(\"Total respondents\")\n\n```\n\nMost participants considered themselves at an intermediate level \n(`r prop[2]`%), meaning that they could write functions in R. The others\nwere beginner (`r prop[1]`%) or advanced (`r prop[3]`%) R users.\n\n\n### Package use\n\nThe first question about package use was: How often do you use each of\nthese packages? (Never, Rarely, Sometimes, Often)\n\nThe bar chart below shows that most packages were unknown to (or at\nleast had never been used by) the survey participants. The\n`adehabitat` packages (HR, LT and HS) were the most used\npackages. 
These packages provide a collection of tools to estimate\nhome range, handle and analyze trajectories, and analyze habitat\nselection, respectively. At the bottom of the chart, `smam` (for\nanimal movement models), `PhysicalActivity`, `nparACT`, `GGIR` (these\nthree for accelerometry data on human patients) and `feedR` (to handle\nradio telemetry data) had no users among the participants. For that\nreason, those 5 packages will not appear in the analysis of the next\nquestions.\n\n```{r relevance, fig.width = 16, fig.height = 15, fig.cap = \"Package use\"}\ndata_question \u003c- data_all[, grep(\"q1\", colnames(data_all))]\ncolnames(data_question) \u003c- t(packages)\n# I'm dropping trajr that was added in the end and only got 1\n# response\ndata_question \u003c- data_question[, -grep(\"trajr\", colnames(data_question))]\npackages_new \u003c- packages[-which(packages == \"trajr\"), ]\n\ncategories \u003c- c(\"Never\", \"Rarely\", \"Sometimes\", \"Often\")\nuse_counts \u003c- t(sapply(1:ncol(data_question), function(x) {\n    data_line \u003c- factor(data_question[, x], levels = categories)\n    count_p_use \u003c- as.numeric(table(data_line))\n    return(count_p_use)\n}))\nuse_counts \u003c- data.frame(use_counts)\ncolnames(use_counts) \u003c- categories\nrownames(use_counts) \u003c- t(packages_new)\nuse_counts$package \u003c- row.names(use_counts)\n\ndf1 \u003c- melt(use_counts, id.vars = \"package\", variable_name = \"response\")\ng \u003c- unlist(by(df1, df1$package, function(x) sum(x$value[x$response !=\n    \"Never\"])))\n\ndf1$package \u003c- factor(df1$package, levels = names(sort(g, decreasing = FALSE)))\ncolor.pallete \u003c- brewer.pal(5, \"YlGnBu\")\ncolor.pallete[1] \u003c- \"lightgray\"\n\nggplot(data = df1) +\n    geom_col(aes(x = package, y = value, fill = response)) +\n    coord_flip() +\n    scale_fill_manual(values = color.pallete) +\n    ylab(\"Count\")\n\n```\n\nIf you want to check the numbers for specific packages, the complete\ntable is 
below:\n\n```{r relevance-table}\nuse_counts[, 1:length(categories)]\n\n## kable(use_counts[, 1:length(categories)]) %\u003e%\n##   kable_styling(bootstrap_options = c(\"striped\", \"hover\",\n##                                       \"condensed\", \"responsive\"))\n\n```\n\nThere is not much difference in the number of packages used\nacross the different levels of R users, as you can see in the boxplots\nbelow:\n\n```{r boxplot-users-packages, fig.width = 7, fig.height = 5, fig.cap = \"Packages per user level\"}\nTotal \u003c- Total[-discard_rows]\ncategories \u003c- c(\"Beginner\", \"Intermediate\", \"Advanced\")\nnew_df \u003c- cbind.data.frame(Total, user = data_question_4)\nnew_df$user \u003c- factor(as.character(new_df$user), levels = categories)\n\nggplot(new_df, aes(x = user, y = Total)) +\n    geom_boxplot() +\n    scale_y_continuous(breaks = 0:max(new_df$Total)) +\n    xlab(\"User level\") + ylab(\"Package count\")\n\n```\n\n### Package documentation \n\nWithout proper user testing and peer editing, package documentation\ncan lead to large gaps in understanding and limited usefulness of the\npackage. If functions and workflows are not expressly defined, a\npackage's capacity to help users is undermined.\n\nIn this survey we asked the participants how helpful the\ndocumentation was for each of the packages they reported\nusing. Documentation includes what is contained in the manual and help\npages, vignettes, published manuscripts, and other material about the\npackage provided by the authors. The participants had to answer using\none of the following options:\n\n* Not enough: \"It's not enough to let me know how to do what I need\"\n* Basic: \"It's enough to let me get started with simple use of the\n  functions but not to go further (e.g. 
use all arguments in the\n  functions, or put extra variables)\"\n* Good: \"I did everything I wanted and needed to do with it\"\n* Excellent: \"I ended up doing even more than what I planned because\n  of the excellent information in the documentation\"\n* Don't remember: \"I honestly can't remember…\"\n\n```{r documentation, fig.width = 16, fig.height = 15, fig.cap = \"Bar plots of absolute frequency of each category of package documentation\"}\ndata_question \u003c- data_all[, grep(\"q2\", colnames(data_all))]\ncolnames(data_question) \u003c- t(packages)\n# I'm dropping trajr that was added in the end and only got 1\n# response\ndata_question \u003c- data_question[, -grep(\"trajr\", colnames(data_question))]\npackages_new \u003c- packages[-which(packages == \"trajr\"), ]\n\ncategories \u003c- c(\"Not enough\", \"Basic\", \"Good\", \"Excellent\", \"Don't remember\")\nuse_counts \u003c- t(sapply(1:ncol(data_question), function(x) {\n    data_line \u003c- factor(data_question[, x], levels = categories)\n    count_p_use \u003c- as.numeric(table(data_line, useNA = \"no\"))\n    return(count_p_use)\n}))\nuse_counts \u003c- data.frame(use_counts)\ncolnames(use_counts) \u003c- categories\nrownames(use_counts) \u003c- t(packages_new)\ntotal_package \u003c- rowSums(use_counts)\nuse_counts$package \u003c- row.names(use_counts)\nuse_counts \u003c- use_counts[total_package \u003e 0, ]\n\ndf1 \u003c- melt(use_counts, id.vars = \"package\", variable_name = \"response\")\ng \u003c- unlist(by(df1, df1$package, function(x) sum(x$value)))\n\ncolor.pallete \u003c- brewer.pal(5, \"YlGnBu\")\ncolor.pallete[1] \u003c- \"lightgray\"\ndf1$package \u003c- factor(df1$package, levels = names(sort(g, decreasing = FALSE)))\ndf1$response \u003c- factor(df1$response, levels = rev(c(\"Excellent\", \n    \"Good\", \"Basic\", \"Not enough\", \"Don't remember\")))\n\nggplot(data = df1) +\n    geom_col(aes(x = package, y = value, fill = response)) +\n    coord_flip() +\n    scale_fill_manual(values 
= color.pallete) +\n    background_grid(major = \"x\", minor = \"x\", colour.major = \"grey80\", \n        colour.minor = \"grey80\", size.major = 0.5) +\n    ylab(\"Count\")\n\n```\n\n```{r useage counts}\ndf2 \u003c- use_counts[, c(\"Not enough\", \"Basic\", \"Good\", \"Excellent\")]\nuse_per \u003c- df2/apply(df2, 1, sum) * 100\nuse_per$good_excellent \u003c- signif(apply(use_per[, c(\"Good\", \"Excellent\")], \n    1, sum), 4)\nuse_per$counts \u003c- apply(df2[, c(\"Good\", \"Excellent\")], 1, sum)\n\n```\n\nRemember that participants could only give their opinion on\ndocumentation regarding the packages they had used. Hence, the\npackages with many users got many documentation answers. The figure \nabove allows for a closer look at the proportion of type of\nresponse for each package.\n\nTo identify some packages with remarkably good documentation, let's\nfirst only consider those packages with at least 10 responses on the\nquality of documentation (regardless of the \"Don't remember\"). These\nare 26 (you can see the table of responses below). Among them,\n`momentuHMM` had more than 50% of the responses \n(`r round(use_per[\"momentuHMM\",\"Excellent\"],2)`%; \n`r use_counts[\"momentuHMM\",\"Excellent\"]`) as \"excellent documentation\",\nmeaning that the documentation was so good that thanks to it, more\nthan half of its users discovered additional features of the package\nand were able to do more analyses than what they initially\nplanned. 
Moreover, 10 packages had more than 75% of the responses as\neither \"good\" or \"excellent\": \n`momentuHMM` (`r use_per[\"momentuHMM\",\"good_excellent\"]`%; \n`r use_per[\"momentuHMM\",\"counts\"]`), \n`moveHMM` (`r use_per[\"moveHMM\",\"good_excellent\"]`%; \n`r use_per[\"moveHMM\",\"counts\"]`), \n`adehabitatLT` (`r use_per[\"adehabitatLT\",\"good_excellent\"]`%; \n`r use_per[\"adehabitatLT\",\"counts\"]`), \n`adehabitatHR` (`r use_per[\"adehabitatHR\",\"good_excellent\"]`%; \n`r use_per[\"adehabitatHR\",\"counts\"]`), \n`EMbC` (`r use_per[\"EMbC\",\"good_excellent\"]`%; `r use_per[\"EMbC\",\"counts\"]`),\n`wildlifeDI` (`r use_per[\"wildlifeDI\",\"good_excellent\"]`%; \n`r use_per[\"wildlifeDI\",\"counts\"]`), \n`ctmm` (`r use_per[\"ctmm\",\"good_excellent\"]`%; `r use_per[\"ctmm\",\"counts\"]`),\n`GeoLight` (`r use_per[\"GeoLight\",\"good_excellent\"]`%; \n`r use_per[\"GeoLight\",\"counts\"]`), \n`move` (`r use_per[\"move\",\"good_excellent\"]`%; `r use_per[\"move\",\"counts\"]`),\n`recurse` (`r use_per[\"recurse\",\"good_excellent\"]`%; \n`r use_per[\"recurse\",\"counts\"]`). The two leading packages, `momentuHMM`\nand `moveHMM`, focus on the use of Hidden Markov models which allow\nidentifying different patterns of behavior called states.\n\nOne way to visualize the quality of documentation is to relate the\nrating to the number of respondents who declared using each package\n(this is **Figure 3** of the manuscript). 
This figure shows the\nproportion of good and excellent documentation for packages with at\nleast 10 respondents; light green corresponds to packages with\nstandard documentation only, blue is for packages with vignettes, and\npurple is for packages that also have peer-reviewed articles\npublished:\n\n```{r ms-fig-3, fig.width = 14, fig.height = 8}\n# question about documentation\ndata_question \u003c- data_all[, grep(\"q2\", colnames(data_all))]\ncolnames(data_question) \u003c- t(packages)\n# I'm dropping trajr that was added in the end and only got 1\n# response\ndata_question \u003c- data_question[, -grep(\"trajr\", colnames(data_question))]\npackages_new \u003c- packages[-which(packages == \"trajr\"), ]\n\ncategories \u003c- c(\"Not enough\", \"Basic\", \"Good\", \"Excellent\", \"Don't remember\")\nuse_counts \u003c- t(sapply(1:ncol(data_question), function(x) {\n    data_line \u003c- factor(data_question[, x], levels = categories)\n    count_p_use \u003c- as.numeric(table(data_line, useNA = \"no\"))\n    return(count_p_use)\n}))\nuse_counts \u003c- data.frame(use_counts)\ncolnames(use_counts) \u003c- categories\nrownames(use_counts) \u003c- t(packages_new)\ntotal_package \u003c- rowSums(use_counts)\nuse_counts$Package \u003c- row.names(use_counts)\nuse_counts \u003c- use_counts[total_package \u003e 0, ]\n\nmatrix_fun \u003c- left_join(use_counts, funciones)\n# not all of the packages from the survey are included in the\n# review\npackages_paper \u003c- read.csv(paste0(data_dir, \"pkg-list-paper.csv\"))\nmatrix_fun \u003c- inner_join(packages_paper, matrix_fun)\n\n# clarifying documentation options\nmatrix_fun$documentation \u003c- c(\"Manual\")\nmatrix_fun$documentation[matrix_fun$Vignettes == \"Yes\" \u0026 matrix_fun$Papers == \n    \"No\"] \u003c- c(\"Manual+Vignette\")\nmatrix_fun$documentation[matrix_fun$Vignettes == \"Yes\" \u0026 matrix_fun$Papers == \n    \"Yes\"] \u003c- c(\"Manual+Vignette+Paper\")\n\nmatrix_fun$num_answers \u003c- 
rowSums(matrix_fun[2:5])\nmatrix_fun$Good_Exc \u003c- (rowSums(matrix_fun[4:5]))/matrix_fun$num_answers * \n    100\n\n# discarding the packages we did not get users from:\nmatrix_fun_2 \u003c- matrix_fun[matrix_fun$num_answers \u003e 0, ]  # missing: 'feedR' 'smam'            \nmatrix_fun_3 \u003c- matrix_fun[matrix_fun$num_answers \u003e= 10, ]\n\nggplot(matrix_fun_3, aes(x = num_answers, y = Good_Exc, col = documentation)) + \n    geom_point(size = 3) +\n    geom_text_repel(label = matrix_fun_3$Package, segment.size = 1, \n        force = 1, segment.alpha = 0.4, hjust = 0, box.padding = 0.4, \n        min.segment.length = 0.25, size = 6) + \n    xlab(\"Number of respondents\") + ylab(\"Good or Excellent Rating\") + \n    ## scale_colour_manual(values = color.pallete) +\n    scale_color_viridis(discrete = TRUE, end = .75, option = \"D\", direction = -1)+\n    background_grid(major = \"xy\", colour.major = \"grey80\") +\n    theme(legend.position = \"none\")\n## ggsave(\"figures/Fig3.pdf\", width = 14, height = 8)\n\n```\n\nIf you want to check the numbers for specific packages, the complete\ntable is below:\n\n```{r documentation-table}\nuse_counts[, 1:length(categories)]\n\n## kable(use_counts[, 1:length(categories)]) %\u003e%\n##   kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\", \"responsive\"))\n```\n\n\n### Package Relevance\n\nParticipants were asked how relevant each of the packages they use\nwas for their work. 
They had to answer using one of the following options:\n\n* Not relevant: \"I tried the package but really didn't find it a good\n  use for my work\"\n* Slightly relevant: \"It helps in my work, but not for the core of it\"\n* Important: \"It's important for the core of my work, but if it didn't\n  exist, there are other packages or solutions to obtain something\n  similar\"\n* Essential: \"I wouldn't have done the key part of my work without\n  this package\"\n\n```{r importance, fig.width = 16, fig.height = 15, fig.cap = \"Bar plots of absolute frequency of each category of package relevance\"}\ndata_question \u003c- data_all[, grep(\"q3\", colnames(data_all))]\ncolnames(data_question) \u003c- t(packages)\n# I'm dropping trajr that was added in the end and only got 1\n# response\ndata_question \u003c- data_question[, -grep(\"trajr\", colnames(data_question))]\npackages_new \u003c- packages[-which(packages == \"trajr\"), ]\n\ncategories \u003c- c(\"Not relevant\", \"Slightly relevant\", \"Important\", \n    \"Essential\")\nuse_counts \u003c- t(sapply(1:ncol(data_question), function(x) {\n    data_line \u003c- factor(data_question[, x], levels = categories)\n    count_p_use \u003c- as.numeric(table(data_line, useNA = \"no\"))\n    return(count_p_use)\n}))\nuse_counts \u003c- data.frame(use_counts)\ncolnames(use_counts) \u003c- categories\nrownames(use_counts) \u003c- t(packages_new)\ntotal_package \u003c- rowSums(use_counts)\nuse_counts$package \u003c- row.names(use_counts)\nuse_counts \u003c- use_counts[total_package \u003e 0, ]\n\ndf1 \u003c- melt(use_counts, id.vars = \"package\", variable_name = \"response\")\ng \u003c- unlist(by(df1, df1$package, function(x) sum(x$value)))\n\ncolor.pallete \u003c- brewer.pal(5, \"YlGnBu\")\ncolor.pallete[1] \u003c- \"lightgray\"\ndf1$package \u003c- factor(df1$package, levels = names(sort(g, decreasing = F)))\n\ndf1$response \u003c- factor(df1$response, levels = rev(c(\"Essential\", \n    \"Important\", \"Slightly relevant\", 
\"Not relevant\")))\n\nggplot(data = df1) +\n    geom_col(aes(x = package, y = value, fill = response)) +\n    coord_flip() +\n    scale_fill_manual(values = color.pallete) +\n    background_grid(major = \"x\", minor = \"x\", colour.major = \"grey80\", \n        colour.minor = \"grey80\", size.major = 0.5) +\n    ylab(\"Count\")\n\n```\n\n```{r importance-counts}\ndf2 \u003c- use_counts[, c(\"Not relevant\", \"Slightly relevant\", \"Important\", \n    \"Essential\")]\nuse_per \u003c- df2/apply(df2, 1, sum) * 100\nuse_per$good_excellent \u003c- signif(apply(use_per[, c(\"Important\", \n    \"Essential\")], 1, sum), 4)\nuse_per$counts \u003c- apply(df2[, c(\"Important\", \"Essential\")], 1, \n    sum)\n\n```\n\nThe two barplots show the absolute and relative frequency of the\nanswers for each package, respectively.  We identified the packages\nthat were highly relevant for their users, considering only those\npackages with at least 10 responses. Among these 33 packages, three\nwere regarded as either \"Important\" or \"Essential\" for more than 75%\nof their users: \n`bsam` (`r use_per['bsam','good_excellent']`%; `r use_per['bsam','counts']`), \n`adehabitatHR` (`r use_per['adehabitatHR','good_excellent']`%; \n`r use_per['adehabitatHR','counts']`), and \n`adehabitatLT` (`r use_per['adehabitatLT','good_excellent']`%; \n`r use_per['adehabitatLT','counts']`). 
`bsam` allows fitting Bayesian\nstate-space models to animal tracking data.\n\n```{r importance-percentage,fig.width = 16, fig.height = 10, fig.cap = \"Bar plots of relative frequency of each category of package relevance (for packages with more than 5 users)\"}\npackage_levels \u003c- row.names(use_per[order(use_per$good_excellent, \n    use_per$Essential, use_per$Important), ])\n\nuse_per2 \u003c- use_per\nuse_per2[is.na(use_per2)] \u003c- 0\nuse_per2$package \u003c- row.names(use_per2)\nuse_per2 \u003c- subset(use_per2, counts \u003e 5)\ndf1 \u003c- melt(subset(use_per2, select = -c(good_excellent, counts)), \n    id.vars = \"package\", variable_name = \"response\")\ndf1$package \u003c- factor(df1$package, levels = package_levels)\ndf1$response \u003c- factor(df1$response, levels = c(\"Not relevant\", \n    \"Slightly relevant\", \"Important\", \"Essential\"))\ncolor.pallete \u003c- brewer.pal(4, \"YlGnBu\")\ncolor.pallete[1:2] \u003c- c(\"lightgray\", \"darkgray\")\n# color.pallete[1]\u003c-c('lightgray')\n\nggplot(data = df1) +\n    geom_col(aes(x = package, y = value, fill = response)) +\n    coord_flip() +\n    scale_fill_manual(values = color.pallete) +\n    ylab(\"Percentage\")\n\n```\n\nIf you want to check the numbers for specific packages, the complete\ntable is below:\n\n```{r importance-table}\nuse_counts[, 1:length(categories)]\n\n## kable(use_counts[, 1:length(categories)]) %\u003e%\n##   kable_styling(bootstrap_options = c(\"striped\", \"hover\", \"condensed\", \"responsive\"))\n```\n\n\n## Summary \n\n* Most packages had very few users among the participants. The vast\n  landscape of packages could be leading users to: 1) rely on \"old\"\n  and established packages (like the `adehabitat` family) that gather\n  most functions for common analyses in movement and 2) search for\n  other packages when doing other specific analyses. 
Moreover, many\n  packages contain functions that other packages have implemented too\n  (see more details in the review manuscript), so this redundancy could\n  spread users across packages.\n* After the `adehabitat` family of packages, several packages for\n  modeling animal movement (`momentuHMM`, `moveHMM`, `crawl` and\n  `ctmm`) proved to be very popular, which could be an indicator of an\n  increase in research on movement models.\n* Few of the packages had remarkably good documentation (\u003e75% of\n  \"good\" or \"excellent\" ratings), and, on the other end of the\n  spectrum, a couple of packages got less than 50% of \"good\" or\n  \"excellent\" ratings.\n* Most packages were relevant for the work of their users, which is a\n  positive feature!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frociojoo%2Frmovementpaperrep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frociojoo%2Frmovementpaperrep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frociojoo%2Frmovementpaperrep/lists"}