Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/KTH-Library/openalex

R package to provide data access to OpenAlex by way of REST API
https://github.com/KTH-Library/openalex

Last synced: 9 days ago
JSON representation

R package to provide data access to OpenAlex by way of REST API

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

# openalex

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/KTH-Library/openalex/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/KTH-Library/openalex/actions/workflows/R-CMD-check.yaml)

The goal of `openalex` is to provide access to data from [OpenAlex](https://openalex.org) - an open and comprehensive catalog of scholarly papers, authors, institutions and more ... - to R through the [Open Alex REST API](https://docs.openalex.org/api)...

## Installation

You can install the current version of `openalex` from [GitHub](https://github.com/kth-library/openalex) with:

``` r
#install.packages("devtools")
devtools::install_github("kth-library/openalex", dependencies = TRUE)
```

## Example

This is a basic example which shows you how to get information for papers and authors:

```{r example, eval=TRUE}

library(openalex)
library(dplyr)
suppressPackageStartupMessages(library(purrr))
library(knitr)

iid <-
openalex:::openalex_autocomplete(
query = "Royal Institute of Technology",
entity_type = "institution",
format = "table") |>
head(1) |>
pull("id")

data <-
openalex_crawl(entity = "works", verbose = TRUE, fmt = "tables",
query = openalex:::openalex_query(filter =
sprintf("institutions.id:%s,publication_year:2024", iid)))

res <- data |> map(head) # return only first six rows from each table

res
```

## Rate limits and using an API key

By providing an email address you enter the "polite pool" which provides even less of rate limiting for API requests.

You can provide it in `~/.Renviron` by setting `OPENALEX_USERAGENT=http://github.com/hadley/httr (mailto:your_email@your_institution.org)`.

You can also set it just for the session by using a helper fcn `openalex_polite()` to temporarily set or unset the email used in the user agent string when making API requests:

```{r polite}
library(openalex)

# set an email to use for the session

openalex_polite("[email protected]")

# unset, and use default user agent string...

openalex_polite("")

```

A premium subscription API key can be used by setting `OPENALEX_KEY=secret_premium_api_key` in your `.Renviron`, or temporarily in a session using:

```{r premium, eval = FALSE}
library(openalex)

# temporarily use a premium subscription API key
openalex_key("secret_premium_api_key")

# unset to not use the premium subscription API key
openalex_key("")

```

This will make it possible to make API calls that return the latest available records, for example based on recent creation dates or recent last modification timestamps.

```{r updates, eval = TRUE}

# we do not require an API key for the publish date
published_since_ <- openalex_works_published_since(since_days = 7)

# but an API key is needed when using "from_created_date" and "from_updated_date" fields.
created_since_7d <- openalex_works_created_since(since_days = 7)
updated_since_1h <- openalex_works_updated_since(since_minutes = 60)

# first few rows of each of these retrievals
created_since_7d |> _$work_ids |> head() |> knitr::kable()
updated_since_1h |> _$work_ids |> head() |> knitr::kable()

```

## Data source attribution

When data from `openalex` is displayed publicly, this attribution also needs to be displayed:

```{r attribution}
library(openalex)
openalex_attribution()
```