{"id":13761535,"url":"https://github.com/PLOS/allofplos","last_synced_at":"2025-05-10T12:32:53.662Z","repository":{"id":38043043,"uuid":"104312447","full_name":"PLOS/allofplos","owner":"PLOS","description":"Repository for the allofplos project.","archived":false,"fork":false,"pushed_at":"2024-08-24T10:08:51.000Z","size":4706,"stargazers_count":60,"open_issues_count":35,"forks_count":17,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-08-24T11:25:57.055Z","etag":null,"topics":["no-code-coverage","openaccess","plos","publishing","science"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PLOS.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.txt","contributing":"contributing.rst","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-21T06:54:40.000Z","updated_at":"2024-08-24T11:25:59.573Z","dependencies_parsed_at":"2024-01-15T03:59:21.405Z","dependency_job_id":"10169121-11ca-49d2-97d2-f08f33a6afdd","html_url":"https://github.com/PLOS/allofplos","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PLOS%2Fallofplos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PLOS%2Fallofplos/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PLOS%2Fallofplos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PLOS%2Fallofplos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PLOS",
"download_url":"https://codeload.github.com/PLOS/allofplos/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224962054,"owners_count":17399149,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["no-code-coverage","openaccess","plos","publishing","science"],"created_at":"2024-08-03T13:01:59.217Z","updated_at":"2024-11-16T19:30:54.149Z","avatar_url":"https://github.com/PLOS.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"[![Build Status](https://api.travis-ci.org/PLOS/allofplos.svg?branch=master)](https://travis-ci.org/PLOS/allofplos)\n\n# All of Plos (allofplos)\n\nCopyright (c) 2017-2022, Public Library of Science. MIT License, see\nLICENSE.txt for more information.\n\n## Why allofplos?\n\nThis is for downloading/updating/maintaining a repository of all PLOS\nXML article files. This can be used to have a copy of the PLOS text\ncorpus for further analysis. Use this program to download all PLOS XML\narticle files instead of doing web scraping.\n\n## Installation instructions\n\nThis program requires Python 3.8+.\n\nUsing pip:\n\n```\npip install allofplos\n```\n\nThis should install *allofplos* and requirements. 
At this stage you are\nready to go.\n\nIf you want to manually install from source (for example for development\npurposes), first clone the project repository:\n\n```\ngit clone git@github.com:PLOS/allofplos.git\n```\n\nThen install the Python dependencies into a new virtual environment\nwith pipenv:\n\n```\npipenv install\n```\n\n## How to run the program\n\nExecute the following command:\n\n```\npython -m allofplos.update\n```\n\nor, if running from source:\n\n```\npipenv run python -m allofplos.update\n```\n\nThe first time it runs, it downloads a zip file larger than 7 GB\n(**allofplos.zip**) containing all the XML files. **Note**: Make sure\nthat you have enough space on your device for the zip file and for its\ncontents before running this command (at least 30 GB). After the file\nis downloaded, the program extracts its contents into the allofplos_xml\ndirectory inside your installation of *allofplos*.\n\nTo see where the corpus directory is located on your file system,\nrun:\n\n```\npython -c \"from allofplos import get_corpus_dir; print(get_corpus_dir())\"\n```\n\nIf you have already downloaded the corpus, the program instead makes an\nincremental update to the existing corpus. 
The script:\n\n-   checks for and then downloads to a temporary folder individual new\n    articles that have been published\n\n-   of those new articles, checks whether they are corrections (and\n    whether the linked corrected article has been updated)\n\n-   checks whether there are VORs (Versions of Record) for uncorrected\n    proofs in the main articles directory and downloads those\n\n-   checks whether the newly downloaded articles are uncorrected\n    proofs\n\nAfter all of these checks, it moves the new articles into the main\narticles folder.\n\nHere’s what the output might look like on a typical run:\n\n```\n147 new articles to download.\n147 new articles downloaded.\n3 amended articles found.\n0 amended articles downloaded with new xml.\nCreating new text list of uncorrected proofs from scratch.\nNo new VOR articles indexed in Solr.\n17 VOR articles directly downloaded.\n17 uncorrected proofs updated to version of record. 44 uncorrected proofs remaining in uncorrected proof list.\n9 uncorrected proofs found. 53 total in list.\nCorpus started with 219792 articles.\nMoving new and updated files...\n164 files moved. Corpus now has 219939 articles.\n```\n\n## How to run the tests\n\nTo run the tests, you will need to install *allofplos* with its testing\ndependencies. These testing dependencies include `pytest`, which we will\nuse to run the tests.\n\n```\npipenv run python -m pytest\n```\n\n## Community guidelines\n\nIf you wish to contribute to this project, please open a ticket in the\nGitHub repo at \u003chttps://github.com/PLOS/allofplos/issues\u003e. For support\nrequests, write to \u003cmining@plos.org\u003e.\n\n## Citing This Library\n\n*allofplos* is published in the proceedings of SciPy 2018. 
DOI\n[10.25080/Majora-4af1f417-009](https://doi.org/10.25080/Majora-4af1f417-009)\nrefers to all versions of allofplos.\n\nTo cite allofplos using BibTeX:\n\n    @InProceedings{ elizabeth_seiver-proc-scipy-2018,\n      author    = { Elizabeth Seiver and M Pacer and Sebastian Bassi },\n      title     = { Text and data mining scientific articles with allofplos },\n      booktitle = { Proceedings of the 17th Python in Science Conference },\n      pages     = { 61 - 64 },\n      year      = { 2018 },\n      editor    = { Fatih Akici and David Lippa and Dillon Niederhut and M Pacer },\n      doi       = { 10.25080/Majora-4af1f417-009 }\n    }\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPLOS%2Fallofplos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPLOS%2Fallofplos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPLOS%2Fallofplos/lists"}