{"id":19756062,"url":"https://github.com/luistar/cs-journals-analysis-toolkit","last_synced_at":"2026-05-09T23:45:11.451Z","repository":{"id":211922300,"uuid":"730192606","full_name":"luistar/cs-journals-analysis-toolkit","owner":"luistar","description":"Scripts to merge data from multiple indices of computer science journals. Currently supports Scopus, Clarivate, CORE, DBLP and more.","archived":false,"fork":false,"pushed_at":"2023-12-11T15:27:09.000Z","size":5043,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-10T22:35:50.967Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luistar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-12-11T11:58:21.000Z","updated_at":"2023-12-11T14:51:54.000Z","dependencies_parsed_at":"2023-12-11T16:49:25.333Z","dependency_job_id":null,"html_url":"https://github.com/luistar/cs-journals-analysis-toolkit","commit_stats":null,"previous_names":["luistar/cs-journals-analysis-toolkit"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luistar%2Fcs-journals-analysis-toolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luistar%2Fcs-journals-analysis-toolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luistar%2Fcs-journals-analysis-toolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luistar%2Fcs-journals-analysis-toolkit/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luistar","download_url":"https://codeload.github.com/luistar/cs-journals-analysis-toolkit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241086621,"owners_count":19907305,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T03:14:37.799Z","updated_at":"2026-05-09T23:45:11.397Z","avatar_url":"https://github.com/luistar.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Computer Science Journal Analysis Toolkit\n\nThis repository contains R scripts and instructions to replicate the analysis of \ncomputer science journals from different sources presented at the GRIN annual general \nmeeting held in Rome on December 12, 2023.\n\nThe accompanying slide deck for this repository is available in PDF format within the \nrepository under the filename [`computer-science-journals-presentation.pdf`](./computer-science-journals-presentation.pdf).\n\nCurrently, **5 sources** are considered in the analysis:\n* [Scopus](https://www.scimagojr.com/journalrank.php)\n* [Clarivate's Web of Science (WoS) - Computer Science Group](https://jcr.clarivate.com/jcr/browse-categories)\n* [GEV01 List for VQR 2015-2019](https://www.anvur.it/attivita/vqr/vqr-2015-2019/gev/area-1-scienze-matematiche-e-informatiche/)\n* [CORE Journal List](http://portal.core.edu.au/jnl-ranks/?search=\u0026by=all\u0026source=CORE2020\u0026sort=atitle\u0026page=1)\n* [DBLP Computer Science Journals](https://dblp.org/db/journals/index.html)\n\n## 
Replication Instructions\n\n### 0. Prerequisites\nThis toolkit has been developed and tested using R 4.3.0 and the PyCharm IDE. \nRStudio should work out of the box as well.\n\n### 1. Data Preparation\nBefore performing the analysis, it is necessary to download raw data from the considered \nsources. Due to Intellectual Property concerns, raw data from Scopus, WoS, CORE and DBLP \nare not included in this repository and need to be downloaded manually as instructed below.\n\n#### 1.1 Scopus\nVisit the [Scimago Journal Rank website](https://www.scimagojr.com/journalrank.php), select \"Journals\" \nas the entry type from the dropdown menu on top, and click on the \"Download data\" button \nin the upper-right part of the web page.\nA download for a CSV file (e.g.: `scimagojr 2022.csv`) should automatically start.\nDownload the file and place it in the `./raw_data/` directory.\n\n#### 1.2 Web of Science\nVisit the [WoS Journal Citation Reports web page](https://jcr.clarivate.com/jcr/browse-categories),\nclick on the Computer Science group, and then click on \"See All 14\" to access a page listing\nall categories in the Computer Science group (as shown in the screen capture below).\n![wos_categories.png](screenshots/wos_categories.png)\nSubsequently, for each of the 28 categories (each of the 14 categories comes in two editions, \ne.g., SCIE, ESCI), open the category page by clicking on the category name.\n![wos_category_detail.png](screenshots/wos_category_detail.png)\nIn the category detail page, click on the \"Export\" link in the upper-right part of the web\npage to download, and then select the CSV download option, as shown in the screen capture below.\nNote that it is necessary to be logged in with a Clarivate account to download the data.\n![wos_download.png](screenshots/wos_download.png)\nRepeat the above process for each of the 28 categories, and place the downloaded CSV files\nin the `./raw_data/clarivate/` directory.\n\n#### 1.3 CORE\nVisit the [CORE Journal 
Ranks page](http://portal.core.edu.au/jnl-ranks/?search=\u0026by=all\u0026source=CORE2020\u0026sort=atitle\u0026page=1)\nand click on the \"Export\" button placed in the upper-right corner of the table.\nThis will trigger the download of a `CORE_journals.csv` file.\nPlace the downloaded file in the `./raw_data/` directory.\n\n#### 1.4 GEV01 List of Journals for VQR 2015-2019\nThe GEV01 List of Journals for VQR 2015-2019 is already included in the repository\nin the `./raw_data/GEV/Elenco INF01 - Informatica.xlsx` file, \nthus no additional step is needed. The original data can also be downloaded from [this link](https://www.anvur.it/attivita/vqr/vqr-2015-2019/gev/area-1-scienze-matematiche-e-informatiche/).\n\n#### 1.5 DBLP\nData from DBLP is scraped from the DBLP website. To automatically scrape DBLP Journal \ndata, run the `./raw_data/scrape_dblp_journals.R` script.\nThe script will generate a `./raw_data/dblp_journals.csv` file containing all journals \nlisted on the DBLP website. Note that the scraping script takes approximately 45 minutes to run,\ndue to limitations on the number of connections accepted by the DBLP website.\n\n### 2. Data Pre-processing\nWith all the raw data in place, the next step is data pre-processing.\nTo this end, it is necessary to run the following scripts.\n\n#### 2.1 WoS\nThis step is performed by the `00a_process_clarivate.R` script. \nThis script merges the CSV files for each category into a single dataset, \nwhich is saved in the `./data/clarivate.RDS` file.\nBefore running the script, make sure that the `clarivateDataPath` variable in \nLine 8 is properly initialized with the path to the directory containing the CSV\nfiles downloaded from WoS.\n\n#### 2.2 Scopus\nThis step is performed by the `00b_process_scimago.R` script. 
\nThis script does some pre-processing and normalization on ISSNs, and saves \nprocessed data in two separate files:\n* `./data/scimago.RDS`, containing the entire Scopus journal list, and\n* `./data/scimago_CS.RDS`, containing the subset of journals that are classified in \nthe Computer Science Area in Scopus.\n\nBefore running the script, make sure that on **Line 4** the correct path to the CSV file\nyou downloaded is provided.\n\n#### 2.3 CORE\nThis step is performed by the `00c_process_core.R` script. \nThis script processes raw data from CORE, and saves pre-processed data in \nthe `./data/core.RDS` file.\nBefore running the script, make sure that on **Line 4** the correct path to the CSV file\nyou downloaded is provided.\n\n#### 2.4 GEV\nThis step is performed by the `00d_process_gev.R` script. \nThis script processes raw data from GEV, \nand saves pre-processed data in the `./data/gev.RDS` file.\nBefore running the script, make sure that on **Line 3** the correct path to the XLSX file\nis provided.\n\n#### 2.5 DBLP\nThis step is performed by the `00e_process_dblp.R` script. \nThis script processes raw data from DBLP, \nand saves pre-processed data in the `./data/dblp.RDS` file.\nBefore running the script, make sure that on **Line 5** the correct path to the CSV file\ngenerated by the scraping script is provided.\n\n### 3. Joining Data\nAt this point, the `./data/` directory should contain the following pre-processed files:\n* `./data/clarivate.RDS`;\n* `./data/scimago.RDS` and `./data/scimago_CS.RDS`;\n* `./data/core.RDS`;\n* `./data/gev.RDS`;\n* `./data/dblp.RDS`.\n\nTo join the data, run the following scripts in order:\n\n1. `01a_join_data.R`. 
This script takes ~1 minute to perform the join, and creates the following files:\n   * `./data_joined/scimago.RDS` and `./data_joined/scimago_all.RDS`\n   * `./data_joined/clarivate.RDS`\n   * `./data_joined/core.RDS`\n   * `./data_joined/gev.RDS`\n   * `./data_joined/dblp.RDS`\n   \n   These files contain the original data from each source, in which the titles of the\n   journals are normalized (i.e., a journal appearing in multiple sources has the same\n   title in each of them). Moreover, an additional file containing all joined data is generated:\n   * `./data_joined/full_outer_join.RDS`\n\n2. `01b_prepare_joined_dataset.R`. This script reads the `./data_joined/all_outer_join.RDS`\nfile generated by the previous script and creates an XLSX and a JSON file containing the final \ndataset. These files are saved in:\n   * `./data_joined/dataset.xlsx`\n   * `./data_joined/dataset.json`\n\n### 4. Analysis and Visualization\nTo generate the Venn diagrams and the UpSet plots representing the intersections between\nthe considered sources, run the `02a_analysis.R` script.\nThe plots will be saved in the `./visualizations` directory.\nTo perform the pairwise analysis between the considered sources and generate \narea-proportional Euler diagrams, run the `02b_euler.R` script.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluistar%2Fcs-journals-analysis-toolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluistar%2Fcs-journals-analysis-toolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluistar%2Fcs-journals-analysis-toolkit/lists"}