https://github.com/bmaitner/r_citations



# Overview
This repository contains code for investigating how often manuscripts in Ecology and Evolutionary Biology that cite the R programming language make their R code available. The R scripts that this work relies upon are in the 'R_scripts' folder. The data generated by this work (which includes both scripted and manual components) are stored in the 'data' folder. The 'figures' folder contains figures produced for a related manuscript (in review). For more information, see the preprint at: https://www.authorea.com/doi/full/10.22541/au.170003886.68548206/v1

# Important Data

## Citation Data
The main data file in this repository is [cite_data.RDS](https://github.com/bmaitner/R_citations/blob/main/data/cite_data.RDS). This is an RDS file containing information on citation counts for R files and associated predictor variables. Many of the variables are returned by the rscopus package (https://cran.r-project.org/web/packages/rscopus/index.html), which queries the Scopus API. Metadata on the fields returned by the Scopus API is available at https://dev.elsevier.com/sc_apis.html. Below, we provide information on fields which are NOT returned by the Scopus API (i.e., data which we collected).

* uid = A unique ID assigned to each record.
* r_scripts_available = A binary variable (yes/no) describing whether any R code was shared as part of the publication.
* r_used = A binary variable (yes/no) describing whether R was used in the publication (as opposed to simply referenced without being used).
* data_available = A binary variable (yes/no) describing whether the full data underlying the publication were included.
* comments = Unstructured comments about the record. This may contain information about why a judgement was made or where code was found.
* code location = Text string describing where the code was located. Options include: NA, "SI", "figshare", "website", "appendix", "dryad", "github", "Github", "zenodo", "environmental data initiative", "sciencebase.gov", "mendeley data", "osf", "bitbucket"
* code format = Text string describing the format in which the code was shared. Options include: NA, "word", "pdf", "R", "typeset text", "rtf", "txt", "rmd"
* code license = Text string describing the license for the shared code, if any. Note that the string "NA" means that a license was not specified, whereas a true missing value (NA) means we did not check. Options include: NA, "NA", "GPL", "CC0", "CC-BY", "MIT", "Open", "copyright"
* n = A numeric index variable used to stratify randomization.
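
Because the code license field uses both a true missing value and the literal string "NA", the two need to be separated explicitly in R. The following is a minimal sketch on invented values; the vector name `code_license` is hypothetical and does not necessarily match the column name in cite_data.RDS.

```r
# Hypothetical toy values for the code license field (not real records).
code_license <- c(NA, "NA", "MIT", "GPL", "NA", NA)

# A true missing value (NA): the license was not checked.
not_checked <- is.na(code_license)

# The literal string "NA": checked, but no license was specified.
# The !is.na() guard keeps true NAs from propagating into the comparison.
not_specified <- !is.na(code_license) & code_license == "NA"

sum(not_checked)
sum(not_specified)

# Tabulate all states; useNA = "ifany" reports true NAs in a separate <NA> column.
table(code_license, useNA = "ifany")
```

The RDS format preserves this distinction. If the data are exported to CSV, note that `read.csv()` treats the string "NA" as missing by default (`na.strings = "NA"`), which would collapse the two states into one.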

See https://dev.elsevier.com/sc_apis.html for information on the following fields:
* title
* author
* year
* doi
* journal
* issn
* volume
* pages
* date
* display_date
* citations
* article_type
* open_access
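
As an illustration of how the fields documented above might be used together, the sketch below estimates the proportion of publications that shared R code, overall and by year. It runs on invented toy records written to a temporary RDS file rather than on the real data, and it assumes that the yes/no fields are stored as the strings "yes" and "no". With a local copy of the repository, the `readRDS()` call would presumably point at `data/cite_data.RDS` instead.

```r
# Hypothetical toy records using the documented field names (values are invented).
toy <- data.frame(
  uid = 1:6,
  year = c(2010, 2010, 2015, 2015, 2020, 2020),
  r_used = c("yes", "yes", "yes", "no", "yes", "yes"),
  r_scripts_available = c("no", "no", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Round-trip through an RDS file, mirroring how an RDS data file is read.
rds_path <- tempfile(fileext = ".RDS")
saveRDS(toy, rds_path)
cite_data <- readRDS(rds_path)

# Keep only publications in which R was actually used.
# %in% returns FALSE for missing values, so rows with NA in r_used are dropped.
used_r <- cite_data[cite_data$r_used %in% "yes", ]

# Proportion of those publications that shared R code, overall and by year.
shared <- used_r$r_scripts_available == "yes"
overall_share <- mean(shared, na.rm = TRUE)
share_by_year <- tapply(shared, used_r$year, mean, na.rm = TRUE)

overall_share
share_by_year
```

This is not the analysis used in the manuscript; the actual analyses are in 2_analyses_and_figures.R.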

## Impact Factor Data
The other important data file in this repository is [impact_factors.csv](https://github.com/bmaitner/R_citations/blob/main/data/manual_downloads/impact_factors.csv). This is a CSV file containing the impact factors of the journals used in this work, as recorded on June 16, 2023. The impact factor information was provided by the R package "scholar" (https://cran.r-project.org/web/packages/scholar/index.html). Below we provide information on the fields included.

* needed_journals = The list of journals submitted to the scholar R package. These were extracted from the "journal" field of the file cite_data.RDS (see above).
* Journal = The journal title matched by scholar.
* Cites = The number of citations of that journal.
* ImpactFactor = The journal's impact factor.
* Eigenfactor = The journal's Eigenfactor.
* dist = The distance between the submitted journal name and the returned journal name, as calculated by scholar.
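
Since needed_journals was extracted from the "journal" field of cite_data.RDS, the two files can be joined on those fields. The sketch below shows one way to do this with base R and to flag imperfect name matches for manual review. The journal names and values are invented, and treating a dist of 0 as an exact match is an assumption, not something this README states.

```r
# Hypothetical toy tables using the documented field names (values are invented).
cite_data <- data.frame(
  uid = 1:3,
  journal = c("Journal A", "Journal B", "Journal C"),
  stringsAsFactors = FALSE
)
impact <- data.frame(
  needed_journals = c("Journal A", "Journal B", "Journal C"),
  Journal = c("Journal A", "Journal B", "Journal of C"),
  ImpactFactor = c(1.5, 2.5, 3.5),
  dist = c(0, 0, 3),
  stringsAsFactors = FALSE
)

# Left join: keep every publication, attaching journal metrics where a match exists.
# R is case-sensitive, so "journal" (submitted) and "Journal" (matched) stay distinct.
merged <- merge(
  cite_data, impact,
  by.x = "journal", by.y = "needed_journals",
  all.x = TRUE
)

# Assumption: dist == 0 means the submitted and returned names matched exactly,
# so any larger value is worth checking by hand.
needs_review <- merged[
  !is.na(merged$dist) & merged$dist > 0,
  c("journal", "Journal", "dist")
]

needs_review
```

With a local copy of the repository, `impact` would presumably be read with `read.csv()` from `data/manual_downloads/impact_factors.csv`.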

# Important Code

There are two important R scripts in this repository: [1_data_collection.R](https://github.com/bmaitner/R_citations/blob/main/R_scripts/1_data_collection.R) and [2_analyses_and_figures.R](https://github.com/bmaitner/R_citations/blob/main/R_scripts/2_analyses_and_figures.R). The former file was used to select publications for the study (along with relevant metadata). The latter file contains code underlying analyses and visualizations.

[![DOI](https://zenodo.org/badge/526320931.svg)](https://zenodo.org/badge/latestdoi/526320931)