{"id":23020440,"url":"https://github.com/fabulani/prisma-automator","last_synced_at":"2025-06-20T22:41:38.249Z","repository":{"id":176419579,"uuid":"433022150","full_name":"Fabulani/prisma-automator","owner":"Fabulani","description":"Automates the initial steps of the PRISMA statement by generating search strings, using them to search Scopus, and then saving screened results to an Excel sheet.","archived":false,"fork":false,"pushed_at":"2024-08-08T06:34:36.000Z","size":88,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-05T13:02:50.554Z","etag":null,"topics":["prisma","python","scopus-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Fabulani.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-29T11:50:07.000Z","updated_at":"2024-08-08T06:34:40.000Z","dependencies_parsed_at":"2024-08-08T08:19:34.174Z","dependency_job_id":"09ee2d03-7eaa-4503-8569-d0f0f1436a4a","html_url":"https://github.com/Fabulani/prisma-automator","commit_stats":null,"previous_names":["fabulani/prisma-automator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fabulani%2Fprisma-automator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fabulani%2Fprisma-automator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fabulani%2Fprisma-automator/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fabulani%2Fprisma-automator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Fabulani","download_url":"https://codeload.github.com/Fabulani/prisma-automator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fabulani%2Fprisma-automator/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":258228353,"owners_count":22668035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["prisma","python","scopus-api"],"created_at":"2024-12-15T12:14:04.833Z","updated_at":"2025-06-20T22:41:33.236Z","avatar_url":"https://github.com/Fabulani.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# prisma-automator\nAutomates the initial steps of the PRISMA statement: split-string generation, article search, search result screening, and data saving to an Excel sheet for further analysis.\n\n- [Requirements](#requirements)\n- [Quickstart](#quickstart)\n  - [Environment](#environment)\n  - [Elsevier API Key](#elsevier-api-key)\n  - [Keyword groups](#keyword-groups)\n- [Detailed Explanation](#detailed-explanation)\n  - [Splitter](#splitter)\n  - [Collector](#collector)\n  - [Use case](#use-case)\n  - [Test suite](#test-suite)\n- [Limitations](#limitations)\n\n## Requirements\n- Python 3.x\n\n**Note:** package developed with Python 3.9.1\n\n## Quickstart\n### Environment\n(Optional) Create a virtual environment:\n```sh\npython -m venv venv\n\n# Windows\n./venv/Scripts/Activate\n\n# 
Linux\nsource ./venv/bin/activate\n```\n\nInstall requirements:\n```sh\npip install -r requirements.txt\n```\n\n### Elsevier API Key\n\nFollow the instructions in [Pybliometrics: How to access Scopus](https://pybliometrics.readthedocs.io/en/stable/access.html).\n\nThe first time you run `main.py`, it'll request your API key when it attempts to search Scopus. Follow the instructions in the console.\n\nYou can also change your key by editing `APIKey` in `C:\\Users\\\u003cYOU\u003e\\.pybliometrics\\config.ini` under `[Authentication]`.\n\n### Keyword groups\n\nOpen `main.py` and look for the section marked ` \"\"\" Split string generation \"\"\"` and the comment `# Create keyword groups`. Edit, add, or delete keyword group variables to your liking.\n\nAfter setting your keyword groups, run\n```sh\npython main.py\n```\n\nFor an example and more details about keyword groups, see [Use case](#use-case).\n\n## Detailed Explanation\n\nThe project contains two main classes: `Splitter` and `Collector`.\n\nThe `Splitter` class is responsible for generating all possible search strings (splits) from the defined keyword groups.\n\nThe `Collector` class is responsible for interacting with the Scopus API and retrieving results, as well as cleaning them up before saving them to a local directory.\n\nEach class has one method that wraps all of its functionality and streamlines the process of acquiring search strings and Scopus results. Nonetheless, it's also possible to call the other methods directly and alter the default behaviour to suit any particular needs.\n\n`Splitter` has the `split()` method, and `Collector` has the `run()` method. Start by looking into these if you want to understand how everything works.\n\nThe following class diagram presents an overview of the system:\n\n![prisma_automator class diagram](./doc/cd_PrismaAutomator.png)\n\n### Splitter\nThe `Splitter` class uses a recursive depth-first search to generate all possible keyword combinations. 
Before that, it's necessary to generate an adjacency graph to represent the tree. The generated combinations are then parsed to generate splits that are searchable in Scopus.\n- `add_kwgroup()` and `add_kwgroups()`: add keyword groups to the Splitter. These keyword groups are then used in generating combinations and splits;\n- `generate_kwgraph()`: generates an adjacency graph from the added keywords for use in depth-first search;\n- `generate_combinations()`: uses recursive depth-first search to generate all possible keyword combinations;\n- `parse_combinations()`: parses keyword combinations into searchable strings;\n- `split()`: streamlines the split generation process by calling all other methods, as well as saving generated data to the local directory.\n\n### Collector\nThe `Collector` class comes with 3 methods: `search()`, `screen()`, and `run()`.\n- `search()`: takes the generated splits as input and searches Scopus. Results are saved in 3 different objects: a Pandas dataframe containing all data from search results (doi, title, etc.), and two lists containing the number of search results and their associated split;\n- `screen()`: takes the generated dataframe as input and screens it for duplicates, unnecessary columns (e.g. funding data), conference reviews, and rows without a doi;\n- `run()`: streamlines the whole process by calling `search()` and `screen()`, as well as saving the generated data to the local directory.\n\n### Use case\nSuppose you'd like to look for articles related to extended reality and its applications in brain-computer interfaces and gaming. You come up with the following keywords:\n- Virtual Reality\n- Augmented Reality\n- Extended Reality\n- Mixed Reality\n- BCI\n- Gaming\n\nThen you wonder if Digital Twins are being used alongside these technologies, but it's an optional keyword. 
Suppose you come up with a search string that looks something like this:\n\n```log\n(\"Virtual Reality\" OR \"Augmented Reality\" OR (\"Extended Reality\" OR \"Mixed Reality\")) AND (\"BCI\" OR \"Gaming\") AND \"Digital Twin\"\n```\n\nand the same search string, but without the `AND \"Digital Twin\"` part. Your keyword groups (separated by the `AND` operators) are:\n\n```txt\nGroup 1 = \"Virtual Reality\" OR \"Augmented Reality\" OR (\"Extended Reality\" OR \"Mixed Reality\")\nGroup 2 = \"BCI\" OR \"Gaming\"\nGroup 3 = \"Digital Twin\"\n```\n\nAs such, your keyword group variables can be defined as:\n\n```py\ngroup_1 = [\"Virtual Reality\", \"Augmented Reality\", \"Extended Reality || Mixed Reality\"]\ngroup_2 = [\"BCI\", \"Gaming\"]\ngroup_3 = [\"Digital Twin\", \"\"]\n```\n\nNote that `(\"Extended Reality\" OR \"Mixed Reality\")` became `\"Extended Reality || Mixed Reality\"` (drop the parentheses, replace `OR` with `||`), and that `group_3` has an empty string as one of the keywords. The empty string marks the keyword group as optional, so search strings without it are generated as well (i.e. without `AND \"Digital Twin\"`).\n\nOnce the variables are declared, you can add them to the Splitter in two ways:\n\n```py\n# Create Splitter first!\nsplitter = Splitter()\n\n# Add one group\nsplitter.add_kwgroup(group_1)\n\n# Add multiple groups\nkw_groups = [group_2, group_3]\nsplitter.add_kwgroups(kw_groups)\n```\n\nTo generate splits, call the `split()` method.\n```py\nsplits = splitter.split()\n```\n\nIn the `./out` folder, you'll find a new file called `splits.txt`. 
This file contains all the generated splits.\n\nTo search Scopus, all you need to do is call the `run()` method in the `Collector` class with `splits` as the parameter:\n```py\ncollector = Collector()\ncollector.run(splits)\n```\n\n**Note:** see [Limitations](#limitations) about subscriber access and the `run()` method.\n\nYou'll find 3 new files in the `./out` folder:\n- `search_results.txt`: contains the splits that had fewer than 1000 results (configurable through the `threshold` parameter in the `Collector.collect()` method, up to 5000) and whose results were saved, as well as the number of results found for each;\n- `excluded_results.txt`: contains the splits that were excluded from the search because they had too many results, along with the number of results for each split;\n- `final_results.xlsx`: contains data on the documents collected from Scopus, as well as the split used to find each one.\n\nOpen up `search_results.txt` and `excluded_results.txt` to analyse the effectiveness of your splits. Open `final_results.xlsx` to continue with the PRISMA statement: analyse which articles aren't relevant to your research, exclude them, and continue!\n\n### Test suite\nTo run the test suite, run\n```sh\npytest\n```\nwhile in the root of the project (where `test_suite.py` is).\n\n## Limitations\n\nThe Scopus API is limited for those without subscriber access. This means that, by default, you won't have access to these critical data:\n- abstract text;\n- author keywords.\n\n`prisma-automator` assumes non-subscriber access by default. If you have an API Key with subscriber access, pass `subscriber=True` to the `Collector` class's `run()` method: `collector.run(splits, subscriber=True)`. 
\n\nFor more on getting subscriber access, see [Pybliometrics: How to access Scopus](https://pybliometrics.readthedocs.io/en/stable/access.html).\n\nTo sum it up, assuming your institution has subscriber access, the quickest ways to get it are to:\n- connect from your institution’s network;\n- use your institution’s VPN.\n\nFor limitations on API Key quotas, see [Pybliometrics: API Key quotas and 429 error](https://pybliometrics.readthedocs.io/en/stable/access.html#api-key-quotas-and-429-error).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffabulani%2Fprisma-automator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffabulani%2Fprisma-automator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffabulani%2Fprisma-automator/lists"}