{"id":32204401,"url":"https://github.com/gastonbecerra/ojsr","last_synced_at":"2026-02-21T03:32:34.641Z","repository":{"id":114816504,"uuid":"231683909","full_name":"gastonbecerra/ojsr","owner":"gastonbecerra","description":"R package to crawl and scrape OJS (open journal system)","archived":false,"fork":false,"pushed_at":"2024-11-13T12:00:06.000Z","size":9661,"stargazers_count":3,"open_issues_count":4,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-09T08:55:28.738Z","etag":null,"topics":["oai-pmh","ojs","rstats","scraper","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gastonbecerra.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-03T23:53:44.000Z","updated_at":"2024-11-13T12:00:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"ef1c47b2-0fe4-4607-8641-63a2b852489a","html_url":"https://github.com/gastonbecerra/ojsr","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/gastonbecerra/ojsr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gastonbecerra%2Fojsr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gastonbecerra%2Fojsr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gastonbecerra%2Fojsr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gastonbecerra%2Fojsr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/gastonbecerra","download_url":"https://codeload.github.com/gastonbecerra/ojsr/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gastonbecerra%2Fojsr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29672704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T03:11:15.450Z","status":"ssl_error","status_checked_at":"2026-02-21T03:10:34.920Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["oai-pmh","ojs","rstats","scraper","web-scraping"],"created_at":"2025-10-22T04:54:48.508Z","updated_at":"2026-02-21T03:32:34.633Z","avatar_url":"https://github.com/gastonbecerra.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OJS Scraper for R\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/ojsr)](https://cran.r-project.org/package=ojsr)\n[![R-CMD-check](https://github.com/gastonbecerra/ojsr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gastonbecerra/ojsr/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\nThe aim of this package is to aid you in crawling OJS archives, issues, articles, galleys, and search results, and retrieving/scraping metadata from articles. 
**ojsr functions rely on OJS routing conventions** to compose the URLs for different scraping scenarios.\n\n# Installation\n\nFrom CRAN:\n\n```r\ninstall.packages('ojsr')\n```\n\nFrom GitHub:\n\n```r\ninstall.packages('devtools')\ndevtools::install_github(\"gastonbecerra/ojsr\")\n```\n\n# ojsr functions\n\n- **`get_issues_from_archive()`**: scrapes issue URLs from the OJS issues archive\n- **`get_articles_from_issue()`**: scrapes article URLs from the ToC of OJS issues\n- **`get_articles_from_search()`**: scrapes OJS search results for given criteria to retrieve article URLs\n- **`get_galleys_from_article()`**: scrapes galley URLs from OJS articles\n- **`get_html_meta_from_article()`**: scrapes metadata from the HTML of OJS articles\n- **`get_oai_meta_from_article()`**: retrieves OAI records for OJS articles\n- **`parse_base_url()`**: parses URLs against OJS routing conventions to retrieve the base URL\n- **`parse_oai_url()`**: parses URLs against OJS routing conventions to retrieve the OAI protocol URL\n\n# Example\n\nLet's say we want to collect metadata from some journals to compare their top keywords. 
We have the journals' names and URLs, and can use ojsr to scrape their issues, articles and metadata.\n\n```{r}\n\nlibrary(dplyr)\nlibrary(ojsr)\n\n# build the journals table directly; no cbind() needed\njournals \u003c- data.frame(\n    name = c(\"Revista Evaluar\", \"PSocial\"),\n    url = c(\"https://revistas.unc.edu.ar/index.php/revaluar\", \"https://publicaciones.sociales.uba.ar/index.php/psicologiasocial\"),\n    stringsAsFactors = FALSE\n)\n\n# use the journal URLs as input to retrieve the issues\nissues \u003c- ojsr::get_issues_from_archive(input_url = journals$url)\n\n# use the issue URLs we just scraped as input to retrieve the articles\narticles \u003c- ojsr::get_articles_from_issue(input_url = issues$output_url)\n\n# use the article URLs we just scraped as input to retrieve the metadata\nmetadata \u003c- ojsr::get_html_meta_from_article(input_url = articles$output_url)\n\n# parse the base URLs from journals and metadata, so we can join by journal\njournals$base_url \u003c- ojsr::parse_base_url(journals$url)\nmetadata$base_url \u003c- ojsr::parse_base_url(metadata$input_url)\n\nmetadata %\u003e% filter(meta_data_name == \"citation_keywords\") %\u003e% # keep only keywords\n  left_join(journals, by = \"base_url\") %\u003e% # include journal names\n  group_by(base_url, keyword = meta_data_content) %\u003e% tally(sort = TRUE)\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgastonbecerra%2Fojsr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgastonbecerra%2Fojsr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgastonbecerra%2Fojsr/lists"}