https://github.com/bmaitner/r_citations
- Host: GitHub
- URL: https://github.com/bmaitner/r_citations
- Owner: bmaitner
- License: cc0-1.0
- Created: 2022-08-18T18:02:22.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-08-22T20:37:16.000Z (almost 2 years ago)
- Last Synced: 2025-09-09T17:41:28.050Z (10 months ago)
- Language: R
- Size: 47.6 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Overview
This repository contains code for investigating how often manuscripts in Ecology and Evolutionary Biology that cite the R software language make their R code available. The R scripts that this work relies upon are contained in the folder 'R_scripts'. The data generated by this work (which includes both scripted and manual components) are stored in the 'data' folder. The 'figures' folder contains figures produced for a related manuscript (in review). For more information, see the preprint at: https://www.authorea.com/doi/full/10.22541/au.170003886.68548206/v1
# Important Data
## Citation Data
The main data file in this repository is [cite_data.RDS](https://github.com/bmaitner/R_citations/blob/main/data/cite_data.RDS). This is an RDS file containing information on citation counts for R files and associated predictor variables. Many of the variables are returned by the rscopus package (https://cran.r-project.org/web/packages/rscopus/index.html) using the Scopus API. Metadata on fields returned by the Scopus API is available at https://dev.elsevier.com/sc_apis.html. Below, we provide information on fields which are NOT returned by the Scopus API (i.e., data which we collected).
* uid = A unique ID assigned to each record.
* r_scripts_available = A binary variable (yes/no) describing whether any R code was shared as part of the publication.
* r_used = A binary variable (yes/no) describing whether R was used in the publication (as opposed to simply referenced without being used).
* data_available = A binary variable (yes/no) describing whether the full data underlying the publication were included.
* comments = Unstructured comments about the record. This may contain information about why a judgement was made or where code was found.
* code location = Text string describing where the code was located, options include: NA, "SI", "figshare", "website", "appendix", "dryad", "github", "Github", "zenodo", "environmental data initiative", "sciencebase.gov", "mendeley data", "osf", "bitbucket"
* code format = Text string describing the format in which the code was shared, options include: NA, "word", "pdf", "R", "typeset text", "rtf", "txt", "rmd"
* code license = Text string describing the license for the shared code, if any. Note that the string "NA" means that a license was not specified, whereas a missing value (NA) means we did not check. Options include: NA, "NA", "GPL", "CC0", "CC-BY", "MIT", "Open", "copyright"
* n = A numeric index variable used to stratify randomization.
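The difference between a missing value (NA) and the string "NA" in the license field is easy to lose during analysis. The following is a minimal base R sketch of how the two cases could be told apart. It uses invented values rather than the real data, and it assumes the field is stored as a character vector.

```r
# Invented license values for illustration only; not taken from cite_data.RDS.
licenses <- c("MIT", "NA", NA, "GPL", "NA", NA, "CC0")

# True NA: the license was not checked.
not_checked <- is.na(licenses)

# The string "NA": the license was checked, but none was specified.
not_specified <- !is.na(licenses) & licenses == "NA"

# Everything else: a named license was recorded.
specified <- !not_checked & !not_specified

license_summary <- c(
  not_checked   = sum(not_checked),
  not_specified = sum(not_specified),
  specified     = sum(specified)
)
print(license_summary)
```

The RDS format preserves this distinction. Exporting the data to CSV and reading it back with the default settings of `read.csv()` would likely merge the two cases, because "NA" is treated as a missing value by default.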
See https://dev.elsevier.com/sc_apis.html for information on the following fields:
* title
* author
* year
* doi
* journal
* issn
* volume
* pages
* date
* display_date
* citations
* article_type
* open_access
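As a rough illustration of how the fields in this file might be used, the sketch below computes the share of R-using records that made R code available. It makes several assumptions: that `cite_data.RDS` holds a data frame, that the binary fields are stored as the strings "yes" and "no", and that the repository has been cloned so the file is at `data/cite_data.RDS`. The helper `code_sharing_rate()` and the toy records are hypothetical and are not part of the repository's scripts.

```r
# Toy records mimicking two of the manually collected fields.
# Values are invented for illustration; they are not from cite_data.RDS.
toy <- data.frame(
  uid = 1:6,
  r_used = c("yes", "yes", "yes", "no", "yes", NA),
  r_scripts_available = c("yes", "no", "no", "no", "yes", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: among records where R was used,
# the proportion that shared any R code.
code_sharing_rate <- function(records) {
  used <- records[!is.na(records$r_used) & records$r_used == "yes", ]
  mean(used$r_scripts_available == "yes", na.rm = TRUE)
}

toy_rate <- code_sharing_rate(toy)
print(toy_rate)

# With a local clone, the same helper could be applied to the real file.
# The relative path assumes the working directory is the repository root.
path <- file.path("data", "cite_data.RDS")
if (file.exists(path)) {
  cite_data <- readRDS(path)
  print(code_sharing_rate(cite_data))
}
```

Filtering on `r_used` first keeps publications that merely cite R, without using it, out of the denominator.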
## Impact Factor Data
The other important data file in this repository is [impact_factors.csv](https://github.com/bmaitner/R_citations/blob/main/data/manual_downloads/impact_factors.csv). This is a CSV file containing the impact factors of journals used in this work, as recorded on June 16, 2023. The impact factor information was provided by the R package "scholar" (https://cran.r-project.org/web/packages/scholar/index.html). Below we provide information on the fields included.
* needed_journals = The list of journals submitted to the scholar R package. These were extracted from the "journal" field of the file cite_data.RDS (see above).
* Journal = The journal title matched by scholar.
* Cites = The number of citations of that journal.
* ImpactFactor = The journal's impact factor.
* Eigenfactor = The journal's Eigenfactor.
* dist = The distance between the submitted journal name and the returned journal name, as calculated by scholar.
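Since `needed_journals` was extracted from the `journal` field of cite_data.RDS, the two files can be linked on those columns. The sketch below shows one way this might be done in base R with invented values. The `max_dist` cutoff of 0.1 is a hypothetical choice for discarding poor name matches, not a threshold taken from the repository's scripts.

```r
# Invented citation records; not taken from cite_data.RDS.
cites <- data.frame(
  uid = 1:4,
  journal = c("Ecology", "Evolution", "Ecology", "Oikos"),
  stringsAsFactors = FALSE
)

# Invented impact factor table using the column names documented above.
impact <- data.frame(
  needed_journals = c("Ecology", "Evolution", "Oikos"),
  Journal = c("ECOLOGY", "EVOLUTION", "OIKOS"),
  ImpactFactor = c(1.1, 2.2, 3.3),
  dist = c(0, 0, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: keep only journal names that matched closely.
max_dist <- 0.1
impact_ok <- impact[impact$dist <= max_dist, ]

# Left join: every citation record is kept, and records whose journal
# has no sufficiently close match get NA for the impact factor columns.
joined <- merge(
  cites, impact_ok,
  by.x = "journal", by.y = "needed_journals",
  all.x = TRUE
)
print(joined)
```

Using `all.x = TRUE` keeps records with unmatched journals visible as NA rather than silently dropping them from the joined table.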
# Important Code
There are two important R scripts in this repository: [1_data_collection.R](https://github.com/bmaitner/R_citations/blob/main/R_scripts/1_data_collection.R) and [2_analyses_and_figures.R](https://github.com/bmaitner/R_citations/blob/main/R_scripts/2_analyses_and_figures.R). The former file was used to select publications for the study (along with relevant metadata). The latter file contains code underlying analyses and visualizations.
[DOI](https://zenodo.org/badge/latestdoi/526320931)