{"id":16638267,"url":"https://github.com/albertocuadra/doi_scraper","last_synced_at":"2025-06-30T11:03:50.138Z","repository":{"id":164590412,"uuid":"640054736","full_name":"AlbertoCuadra/doi_scraper","owner":"AlbertoCuadra","description":"Digital Object Identifier scraper written in Python","archived":false,"fork":false,"pushed_at":"2025-03-20T10:42:41.000Z","size":35,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-12T09:54:30.011Z","etag":null,"topics":["bibtex","crossref","crossref-api","doi","latex","python","research","scraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlbertoCuadra.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-05-12T21:43:27.000Z","updated_at":"2025-03-20T10:42:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"44f3e560-4afb-43e8-bebb-a592f0327702","html_url":"https://github.com/AlbertoCuadra/doi_scraper","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/AlbertoCuadra/doi_scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertoCuadra%2Fdoi_scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertoCuadra%2Fdoi_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertoCuadra%2Fdoi_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/AlbertoCuadra%2Fdoi_scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlbertoCuadra","download_url":"https://codeload.github.com/AlbertoCuadra/doi_scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlbertoCuadra%2Fdoi_scraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262762431,"owners_count":23360326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibtex","crossref","crossref-api","doi","latex","python","research","scraper"],"created_at":"2024-10-12T06:44:06.505Z","updated_at":"2025-06-30T11:03:50.127Z","avatar_url":"https://github.com/AlbertoCuadra.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DOI Scraper\r\n\r\nThe DOI Scraper is a Python script that reads a `.bib` file, searches for entries missing required fields (such as a DOI), retrieves the missing information using the [Crossref API](https://www.crossref.org/documentation/retrieve-metadata/rest-api/), and reformats the file with consistent indentation. The refactored design supports different entry types (e.g., articles, books, inproceedings, tech reports), with each type defining its own required fields.\r\n\r\n## Prerequisites\r\n\r\n- Python 3.x\r\n- `requests` library\r\n- `tqdm` library\r\n\r\n## Installation\r\n\r\n1. Clone the repository or download the `doi_scraper.py` file.\r\n\r\n2. 
Install the required dependencies by running the following command:\r\n\r\n```shell\r\npip install -r requirements.txt\r\n```\r\n\r\n# Usage\r\n\r\nPlace your input `.bib` file in the same directory as the `doi_scraper.py` script.\r\n\r\nOpen the `doi_scraper.py` file and modify the following variables according to your needs:\r\n\r\n```python\r\ninput_file = 'input.bib'   # Name of the input .bib file\r\noutput_file = 'output.bib' # Name of the output .bib file\r\nINDENT_PRE = 4             # Number of spaces before the field name\r\nINDENT_POST = 16           # Number of spaces after the field name\r\n```\r\n\r\nRun the script using the following command:\r\n\r\n```shell\r\npython doi_scraper.py\r\n```\r\n\r\nThe script searches for entries that are missing required fields (such as a DOI) and retrieves the missing information using the Crossref API. It then writes the completed entries to the output `.bib` file.\r\n\r\nOnce the script completes, you will find the updated `.bib` file in the same directory.\r\n\r\n## Optional Arguments\r\n\r\n* `--format-only`: Reformat the file without performing any Crossref lookups.\r\n\r\n# Example\r\n\r\n## Before\r\n\r\n```bibtex\r\n@article{Cuadra2020,\r\ntitle            = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},\r\nauthor   = {Cuadra, Alberto and Huete, C{\\'e}sar and Vera, Marcos},\r\npages= {A30 1--39}\r\n}\r\n```\r\n\r\n## After\r\n\r\n```bibtex\r\n@article{Cuadra2020,\r\n    title           = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},\r\n    author          = {Cuadra, Alberto and Huete, C{\\'e}sar and Vera, Marcos},\r\n    pages           = {A30 1--39},\r\n    year            = {2020},\r\n    journal         = {Journal of Fluid Mechanics},\r\n    volume          = {903},\r\n    doi             = {10.1017/jfm.2020.651},\r\n}\r\n```\r\n\r\n# License\r\n\r\nThis project is licensed under the [MIT 
License](LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falbertocuadra%2Fdoi_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falbertocuadra%2Fdoi_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falbertocuadra%2Fdoi_scraper/lists"}