{"id":25347356,"url":"https://github.com/michelecotrufo/pdf2doi","last_synced_at":"2025-05-16T16:05:00.901Z","repository":{"id":37851760,"uuid":"354169776","full_name":"MicheleCotrufo/pdf2doi","owner":"MicheleCotrufo","description":"A  python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a pdf file.","archived":false,"fork":false,"pushed_at":"2025-02-23T18:55:05.000Z","size":83620,"stargazers_count":114,"open_issues_count":8,"forks_count":22,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-16T16:04:29.262Z","etag":null,"topics":["arxiv","arxiv-identifiers","bibtex","bibtex-entry","doi","extract","extract-doi","identifiers","metadata","pdf","pdf-text","pypdf2","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MicheleCotrufo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-03T01:06:21.000Z","updated_at":"2025-05-13T08:44:23.000Z","dependencies_parsed_at":"2024-06-22T05:03:15.359Z","dependency_job_id":"312b6174-69e1-463d-b941-d9da3174710b","html_url":"https://github.com/MicheleCotrufo/pdf2doi","commit_stats":{"total_commits":252,"total_committers":4,"mean_commits":63.0,"dds":0.09126984126984128,"last_synced_commit":"6662ad7e08b5b05e779b2cf9174da5348c217557"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MicheleCotrufo%2Fpdf2doi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/MicheleCotrufo%2Fpdf2doi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MicheleCotrufo%2Fpdf2doi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MicheleCotrufo%2Fpdf2doi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MicheleCotrufo","download_url":"https://codeload.github.com/MicheleCotrufo/pdf2doi/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254564124,"owners_count":22092122,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arxiv","arxiv-identifiers","bibtex","bibtex-entry","doi","extract","extract-doi","identifiers","metadata","pdf","pdf-text","pypdf2","python"],"created_at":"2025-02-14T14:26:46.439Z","updated_at":"2025-05-16T16:05:00.878Z","avatar_url":"https://github.com/MicheleCotrufo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pdf2doi \n\n```pdf2doi``` is a Python library/command-line tool to automatically extract the DOI or other identifiers (e.g. arXiv ID) starting from the .pdf file of a publication \n(or from a folder containing several .pdf files), and to retrieve bibliographic information.\nIt exploits several methods (see below for detailed description) to find a valid identifier of a pdf file, and it validates any result\nvia web queries to public archives (e.g. http://dx.doi.org). 
\nThe validation process also returns raw BibTeX info, which can be used for further processing, such as generating BibTeX entries ([pdf2bib](https://github.com/MicheleCotrufo/pdf2bib)) or\nautomatically renaming pdf files ([pdf-renamer](https://github.com/MicheleCotrufo/pdf-renamer)).\n\npdf2doi can be used either from the [command line](#command-line-usage), or inside your [python script](#usage-inside-a-python-script) or, only for Windows, directly from the [right-click context menu](#installing-the-shortcuts-in-the-right-click-context-menu-of-windows) of a pdf file or a folder.\n\n[![Downloads](https://pepy.tech/badge/pdf2doi)](https://pepy.tech/project/pdf2doi?versions=1.4\u0026versions=1.5.post1\u0026versions=1.6)\n[![Downloads](https://pepy.tech/badge/pdf2doi/month)](https://pepy.tech/project/pdf2doi?versions=1.4\u0026versions=1.5.post1\u0026versions=1.6)\n[![Pip Package](https://img.shields.io/pypi/v/pdf2doi?logo=PyPI)](https://pypi.org/project/pdf2doi)\n\n## Warning\nVersions of ```pdf2doi``` prior to **1.6** are affected by a very annoying bug. By default, after finding the DOI of a pdf paper, ```pdf2doi``` stores the DOI in the metadata of the pdf file. Due to a bug, the size of the pdf file would double every time a metadata entry was added. This bug has been fixed in all versions \u003e= 1.6. \n\nIf you have pdf files that have been affected by this bug, you can use ```pdf2doi``` to fix them. After updating to a version \u003e= 1.6, run ```pdf2doi path/to/folder/containing/pdf/files -id ''```. This will restore the pdf files to their original size.\n\nThanks to Ole Steuernagel for pointing out this issue.\n\n## Latest stable version\nThe latest stable version of ```pdf2doi``` is **1.7**. 
See [here](https://github.com/MicheleCotrufo/pdf2doi/releases) for the full change log.\n\n### [v1.7] - 2024-11-10\n\n#### Main changes\n- Changed url for dx.doi.org validation (https://github.com/MicheleCotrufo/pdf2doi/issues/35)\n- Added 'r' in front of strings to suppress warnings in recent Python versions (https://github.com/MicheleCotrufo/pdf2doi/pull/36)\n- Changed ```pymupdf``` dependency to ```pymupdf\u003e=1.21.0``` (https://github.com/MicheleCotrufo/pdf2doi/issues/32 https://github.com/MicheleCotrufo/pdf2doi/issues/28 https://github.com/MicheleCotrufo/pdf2doi/issues/37)\n\n## Installation\n\nUse the package manager pip to install pdf2doi.\n\n```bash\npip install pdf2doi==1.7\n```\n\n- Many users have reported (https://github.com/MicheleCotrufo/pdf2doi/issues/32 https://github.com/MicheleCotrufo/pdf2doi/issues/28 https://github.com/MicheleCotrufo/pdf2doi/issues/37) that the installation fails because of some issue related to the installation of the library ```pymupdf```. We are still not sure what the issue is. A possible fix seems to be installing ```pymupdf``` separately (before installing ```pdf2doi```), via ```pip install pymupdf\u003e=1.21.0```.\n\n- The library ```textract``` provides additional ways to analyze pdf files, and it is sometimes more powerful than ```PyPDF2```, but it comes with a large overhead of additional required dependencies, and sometimes it generates version conflicts. \nThe user can decide whether to install it or not. ```pdf2doi``` will only try to use this library if it detects that it is installed.\nTo install it,\n```bash\npip install textract==1.6.4\npip install pdfminer.six==20191110\n```\n\nUnder Windows, after installation of ```pdf2doi``` it is also possible to add [shortcuts to the right-click context menu](#installing-the-shortcuts-in-the-right-click-context-menu-of-windows).\n\n## Used by\n\nHere is a list of applications/repositories that make use of ```pdf2doi```. 
If you use ```pdf2doi``` in your application and you wish to add it to this list, send me a message.\n\n* [file_organizer](https://github.com/codedai/file_organizer)\n* [pubmex](https://github.com/mmagnus/pubmex)\n* [mendeley-migration](https://github.com/newmanrs/mendeley-migration)\n* [pub2sum](https://github.com/SamuelKnaus/pub2sum)\n* [DataIngest](https://github.com/workfor-webapps/DataIngest)\n* [pdf2bib](https://github.com/MicheleCotrufo/pdf2bib)\n* [pdf-renamer](https://github.com/MicheleCotrufo/pdf-renamer)\n\n\n## Table of Contents\n - [Installation](#installation)\n - [Description](#description)\n - [Usage](#usage)\n    * [Command line usage](#command-line-usage)\n        + [Manually associate the correct identifier to a file from command line](#manually-associate-the-correct-identifier-to-a-file-from-command-line)\n    * [Usage inside a python script](#usage-inside-a-python-script)\n        + [Manually associate the correct identifier to a file](#manually-associate-the-correct-identifier-to-a-file)\n - [Installing the shortcuts in the right-click context menu of Windows](#installing-the-shortcuts-in-the-right-click-context-menu-of-windows)\n - [Contributing](#contributing)\n - [License](#license)\n - [Acknowledgment](#acknowledgment)\n - [Donating](#donating)\n\n## Description\nAutomatically associating a DOI or other identifiers (e.g. arXiv ID) to a pdf file can be either a very easy or a very difficult\n(sometimes nearly impossible) task, depending on how much care was placed in crafting the file. In the simplest case (which typically works with most recent publications)\nit is enough to look into the file metadata. For older publications, the identifier is often found within the pdf text and it can be\nextracted with the help of regular expressions. In the unluckiest cases, the only method left is to google some details of the publication\n(e.g. 
the title or parts of the text) and hope that a valid identifier is contained in one of the first results.\n\n```pdf2doi``` applies sequentially all these methods (starting from the simplest ones) until a valid identifier is found and validated.\nSpecifically, for a given .pdf file it will, in order,\n\n1. Look into the metadata of the .pdf file (extracted via the library [PyPDF2](https://github.com/mstamy2/PyPDF2)) and check if any of them contains a string that matches the pattern of \na DOI or an arXiv ID. Priority is given to metadata which contain the word 'doi' in their label.\n\n2. Check if the name of the pdf file contains any sub-string that matches the pattern of \na DOI or an arXiv ID.\n\n3. Scan the text inside the .pdf file, and check for any string that matches the pattern of \na DOI or an arXiv ID. The text is extracted with the libraries [PyPDF2](https://github.com/mstamy2/PyPDF2) and [pdfminer](https://github.com/pdfminer/pdfminer.six). If the library \n[textract](https://github.com/deanmalmgren/textract) is installed, ```pdf2doi``` will try to use that too.\n\n4. Try to find possible titles of the publication. In the current version, possible titles are identified via \nthe libraries [pdftitle](https://github.com/metebalci/pdftitle) and [PyMuPDF](https://github.com/pymupdf/PyMuPDF), and by the file name. For each possible title a google search \nis performed and the plain text of the first results is scanned for valid identifiers.\n\n5. As a last desperate attempt, the first N=1000 characters of the pdf text are used as a query for\na google search. The plain text of the first results is scanned for valid identifiers.\n\nAny time that a potential identifier is found, it is also validated by performing a query to a relevant website (e.g., http://dx.doi.org for DOIs and http://export.arxiv.org for arxiv IDs). \nThis validation process also returns raw [BibTeX](http://www.bibtex.org/) info when the identifier is valid. 
\n\nWhen a valid identifier is found with any method other than the first one, the identifier is also stored inside the metadata of\nthe pdf file. In this way, future lookups of this same file will be able to extract the identifier with the \nfirst method, speeding up the search (this feature can be disabled by the user, in case edits to the pdf file are not desired).\n\nThe library is far from perfect. Often, especially for old publications, none of the currently implemented methods will work. Other times the wrong DOI might be extracted: this can happen, for example, \nif the DOI of another paper is present in the pdf text and it appears before the correct DOI. A quick and dirty solution to this problem is to look up the identifier manually and then add it to the metadata\nof the file, with the methods shown [here](#manually-associate-the-correct-identifier-to-a-file) (from python console) or [here](#manually-associate-the-correct-identifier-to-a-file-from-command-line) (from command line). \nIn this way, ```pdf2doi``` will always retrieve the correct DOI when analyzing this same file in the future, which can be useful when ```pdf2doi``` is used to automate\n bibliographic procedures for a large number of files (e.g. 
via [pdf2bib](https://github.com/MicheleCotrufo/pdf2bib) or\n[pdf-renamer](https://github.com/MicheleCotrufo/pdf-renamer)).\n\nCurrently, only the format of arXiv identifiers in use after [1 April 2007](https://arxiv.org/help/arxiv_identifier) is supported.\n\n## Usage\n\npdf2doi can be used either as a [stand-alone application](#command-line-usage) invoked from the command line, or by [importing it in your python project](#usage-inside-a-python-script) or, only for Windows, \ndirectly from the [right-click context menu](#installing-the-shortcuts-in-the-right-click-context-menu-of-windows) of a pdf file or a folder.\n\n### Command line usage\n```pdf2doi``` can be invoked directly from the command line, without having to open a python console.\nThe simplest command-line invocation is\n\n```.\n$ pdf2doi 'path/to/target'\n```\nwhere ```target``` is either a valid pdf file or a directory containing pdf files. Adding the optional argument '-v' increases the output verbosity,\ndocumenting all steps.\nFor example, when targeting the folder [examples](/examples) we get the following output:\n\n```\n$ pdf2doi \".\\examples\" -v\n[pdf2doi]: Looking for pdf files in the folder ....\n[pdf2doi]: Found 4 pdf files.\n[pdf2doi]: ................\n[pdf2doi]: Trying to retrieve a DOI/identifier for the file: .\\examples\\chaumet_JAP_07.pdf\n[pdf2doi]: Method #1: Looking for a valid identifier in the document infos...\n[pdf2doi]: Could not find a valid identifier in the document info.\n[pdf2doi]: Method #2: Looking for a valid identifier in the file name...\n[pdf2doi]: Could not find a valid identifier in the file name.\n[pdf2doi]: Method #3: Looking for a valid identifier in the document text...\n[pdf2doi]: Extracting text with the library PyPdf...\n[pdf2doi]: Text extracted succesfully. 
Looking for an identifier in the text...\n[pdf2doi]: Validating the possible DOI 10.1063/1.2409490 via a query to dx.doi.org...\n[pdf2doi]: The DOI 10.1063/1.2409490 is validated by dx.doi.org.\n[pdf2doi]: A valid DOI was found in the document text.\n[pdf2doi]: Trying to add the tag '/pdf2doi_identifier'-\u003e '10.1063/1.2409490' into the metadata of the file '.\\chaumet_JAP_07.pdf'...\n[pdf2doi]: The tag '/pdf2doi_identifier'-\u003e '10.1063/1.2409490' was added succesfully to the metadata of the file '.\\chaumet_JAP_07.pdf'...\n[pdf2doi]: 10.1063/1.2409490\n[pdf2doi]: ................\n[pdf2doi]: Trying to retrieve a DOI/identifier for the file: .\\examples\\paper12.2009_unknown_040916_440842.pdf\n[pdf2doi]: Method #1: Looking for a valid identifier in the document infos...\n[pdf2doi]: Could not find a valid identifier in the document info.\n[pdf2doi]: Method #2: Looking for a valid identifier in the file name...\n[pdf2doi]: Could not find a valid identifier in the file name.\n[pdf2doi]: Method #3: Looking for a valid identifier in the document text...\n[pdf2doi]: Extracting text with the library PyPdf...\n[pdf2doi]: Text extracted succesfully. Looking for an identifier in the text...\n[pdf2doi]: Could not find a valid identifier in the document text extracted by PyPdf.\n[pdf2doi]: Extracting text with the library pdfminer...\n[pdf2doi]: Text extracted succesfully. 
Looking for an identifier in the text...\n[pdf2doi]: Validating the possible DOI 10.1037/a0015278 via a query to dx.doi.org...\n[pdf2doi]: The DOI 10.1037/a0015278 is validated by dx.doi.org.\n[pdf2doi]: A valid DOI was found in the document text.\n[pdf2doi]: Trying to add the tag '/pdf2doi_identifier'-\u003e '10.1037/a0015278' into the metadata of the file '.\\paper12.2009_unknown_040916_440842.pdf'...\n[pdf2doi]: The tag '/pdf2doi_identifier'-\u003e '10.1037/a0015278' was added succesfully to the metadata of the file '.\\paper12.2009_unknown_040916_440842.pdf'...\n[pdf2doi]: 10.1037/a0015278\n[pdf2doi]: ................\n[pdf2doi]: Trying to retrieve a DOI/identifier for the file: .\\examples\\PhysRevLett.116.061102.pdf\n[pdf2doi]: Method #1: Looking for a valid identifier in the document infos...\n[pdf2doi]: Could not find a valid identifier in the document info.\n[pdf2doi]: Method #2: Looking for a valid identifier in the file name...\n[pdf2doi]: Could not find a valid identifier in the file name.\n[pdf2doi]: Method #3: Looking for a valid identifier in the document text...\n[pdf2doi]: Extracting text with the library PyPdf...\n[pdf2doi]: Text extracted succesfully. 
Looking for an identifier in the text...\n[pdf2doi]: Standardised DOI: 10.1103/PhysRevLett.116.061102 -\u003e 10.1103/physrevlett.116.061102\n[pdf2doi]: Validating the possible DOI 10.1103/physrevlett.116.061102 via a query to dx.doi.org...\n[pdf2doi]: The DOI 10.1103/physrevlett.116.061102 is validated by dx.doi.org.\n[pdf2doi]: Standardised DOI: 10.1103/PhysRevLett.116.061102 -\u003e 10.1103/physrevlett.116.061102\n[pdf2doi]: A valid DOI was found in the document text.\n[pdf2doi]: Trying to add the tag '/pdf2doi_identifier'-\u003e '10.1103/physrevlett.116.061102' into the metadata of the file '.\\PhysRevLett.116.061102.pdf'...\n[pdf2doi]: The tag '/pdf2doi_identifier'-\u003e '10.1103/physrevlett.116.061102' was added succesfully to the metadata of the file '.\\PhysRevLett.116.061102.pdf'...\n[pdf2doi]: 10.1103/physrevlett.116.061102\n[pdf2doi]: ................\n[pdf2doi]: Trying to retrieve a DOI/identifier for the file: .\\examples\\s41586-019-1666-5.pdf\n[pdf2doi]: Method #1: Looking for a valid identifier in the document infos...\n[pdf2doi]: Could not find a valid identifier in the document info.\n[pdf2doi]: Method #2: Looking for a valid identifier in the file name...\n[pdf2doi]: Could not find a valid identifier in the file name.\n[pdf2doi]: Method #3: Looking for a valid identifier in the document text...\n[pdf2doi]: Extracting text with the library PyPdf...\n[pdf2doi]: Text extracted succesfully. 
Looking for an identifier in the text...\n[pdf2doi]: Validating the possible DOI 10.1038/s41586-019-1666-5 via a query to dx.doi.org...\n[pdf2doi]: The DOI 10.1038/s41586-019-1666-5 is validated by dx.doi.org.\n[pdf2doi]: A valid DOI was found in the document text.\n[pdf2doi]: Trying to add the tag '/pdf2doi_identifier'-\u003e '10.1038/s41586-019-1666-5' into the metadata of the file '.\\s41586-019-1666-5.pdf'...\n[pdf2doi]: The tag '/pdf2doi_identifier'-\u003e '10.1038/s41586-019-1666-5' was added succesfully to the metadata of the file '.\\s41586-019-1666-5.pdf'...\n[pdf2doi]: 10.1038/s41586-019-1666-5\n[pdf2doi]: ................\nDOI             10.1063/1.2409490                        .\\chaumet_JAP_07.pdf\n\nDOI             10.1037/a0015278                         .\\paper12.2009_unknown_040916_440842.pdf\n\nDOI             10.1103/physrevlett.116.061102           .\\PhysRevLett.116.061102.pdf\n\nDOI             10.1038/s41586-019-1666-5                .\\s41586-019-1666-5.pdf\n```\n\nEvery line which begins with ```[pdf2doi]``` is omitted when the optional command '-v' is absent.\nIn the final output, the first column specifies the kind of identifier (currently either 'DOI' or 'arxiv'), the second column contains the found DOI/identifier, and the third column contains the file path.\n\n\nA list of all optional arguments can be generated by ```pdf2doi --h```\n```\n$ pdf2doi --h\nusage: pdf2doi [-h] [-v] [-nws] [-nwv] [-nostore] [-no_arxiv2doi] [-id IDENTIFIER] [-google GOOGLE_RESULTS] [-s FILENAME_IDENTIFIERS] [-clip] [-install--right--click] [-uninstall--right--click] [path ...]\n\nRetrieves the DOI or other identifiers (e.g. arXiv) from pdf files of a publications.\n\npositional arguments:\n  path                  Relative path of the target pdf file or of the targe folder.\n\noptions:\n  -h, --help            show this help message and exit\n  -v, --verbose         Increase verbosity. By default (i.e. 
when not using -v), only a table with the found identifiers will be printed as output.\n  -nws, --no_web_search\n                        Disable any method to find identifiers which requires internet searches (e.g. queries to google).\n  -nwv, --no_web_validation\n                        Disable the online validation of identifiers (e.g., via queries to http://dx.doi.org/).\n  -nostore, --no_store_identifier_metadata\n                        By default, anytime an identifier is found it is added to the metadata of the pdf file (if not present yet). By using this additional option, the identifier is not stored in the file\n                        metadata.\n  -no_arxiv2doi         If a valid arXiv ID is found for a given .pdf file, by default pdf2doi will try to also look for a DOI (either because the paper has been published in a journal or because arXiv has\n                        assigned to it a DOI of the form \"10.48550/arXivID\"). By adding this command, the arXiv ID is instead always returned.\n  -id IDENTIFIER        Stores the string IDENTIFIER in the metadata of the target pdf file, with key '/pdf2doi_identifier'. Note: when this argument is passed, all other arguments (except for the path to the\n                        pdf file) are ignored.\n  -google GOOGLE_RESULTS\n                        Set how many results should be considered when doing a google search for the DOI (default=6).\n  -s FILENAME_IDENTIFIERS, --save_identifiers_file FILENAME_IDENTIFIERS\n                        Save all the identifiers found in the target folder in a text file inside the same folder with name specified by FILENAME_IDENTIFIERS. This option is only available when a folder is\n                        targeted.\n  -clip, --save_doi_clipboard\n                        Store all found DOI/identifiers into the clipboard.\n  -install--right--click\n                        Add a shortcut to pdf2doi in the right-click context menu of Windows. 
You can copy the identifier and/or bibtex entry of a pdf file (or all pdf files in a folder) into the clipboard by\n                        just right clicking on it! NOTE: this feature is only available on Windows.\n  -uninstall--right--click\n                        Uninstall the right-click context menu functionalities. NOTE: this feature is only available on Windows.\n```\n\n#### Manually associate the correct identifier to a file from command line\nSometimes it is not possible to retrieve a DOI/identifier automatically, or the one that is retrieved is not the correct one. In these (hopefully rare) occasions\nit is possible to manually add the correct DOI/identifier to the pdf metadata, by using the ```-id``` argument,\n```\n$ pdf2doi \"path\\to\\pdf\" -id \"doi1234\"\n```\nThis creates a new metadata entry in the pdf file, with label '/pdf2doi_identifier', containing the string ```doi1234```. Future lookups of this same file via ```pdf2doi``` (in particular when used by other tools such as [pdf2bib](https://github.com/MicheleCotrufo/pdf2bib) or\n[pdf-renamer](https://github.com/MicheleCotrufo/pdf-renamer)) will then return the correct identifier and BibTeX info.\n\n### Usage inside a python script\n```pdf2doi``` can also be used as a library within a python script. The function ```pdf2doi.pdf2doi``` is the main point of entry. The function looks for the identifier of a pdf file by applying all the available methods. \nThe first input argument must be a valid path (either absolute or relative) to a pdf file or to a folder containing pdf files. 
The path can be passed either as a string or as a ```pathlib.Path``` object. \nThe same optional arguments available in the command line operation are available via the methods ```set``` and ```get``` of the object ```pdf2doi.config```.\nFor example, we can scan the folder [examples](/examples) while suppressing output verbosity by \n\n```python\n\u003e\u003e\u003e import pdf2doi\n\u003e\u003e\u003e pdf2doi.config.set('verbose',False)\n\u003e\u003e\u003e results = pdf2doi.pdf2doi(r'.\\examples')\n```\nA full list of the library settings can be printed by the method ```pdf2doi.config.print()```\n```python\n\u003e\u003e\u003e import pdf2doi\n\u003e\u003e\u003e pdf2doi.config.print()\nverbose : True (bool)\nseparator : \\ (str)\nmethod_dxdoiorg : application/citeproc+json (str)\nwebvalidation : True (bool)\nwebsearch : True (bool)\nnumb_results_google_search : 6 (int)\nN_characters_in_pdf : 1000 (int)\nsave_identifier_metadata : True (bool)\nreplace_arxivID_by_DOI_when_available : True (bool)\n```\n\nThe output of the function ```pdf2doi``` is a list of dictionaries (or just a single dictionary if a single file was targeted). Each dictionary has the following keys:\n\n```python\nresult['identifier'] = DOI or other identifier (or None if nothing is found)\nresult['identifier_type'] = string specifying the type of identifier (e.g. 'doi' or 'arxiv')\nresult['validation_info'] = Additional info on the paper. If config.get('webvalidation') = True, then result['validation_info']\n                            will typically contain raw bibtex data for this paper. 
Otherwise it will just contain True \nresult['path'] = path of the pdf file\nresult['method'] = method used to find the identifier\n```\nFor example, the DOIs/identifiers of each file can be printed by\n```python\n\u003e\u003e\u003e for result in results:\n...     print(result['identifier'])\n10.1016/0021-9991(86)90093-8\n10.1063/1.2409490\n10.1103/PhysRevLett.116.061102\n10.1038/s41586-019-1666-5\n```\n\nBy default, every time a valid DOI/identifier is found, it is stored in the metadata of the pdf file. In this way, subsequent lookups of the same folder/file will be much faster.\nThis behaviour can be disabled (e.g. if the user does not want to or cannot edit the files) by setting save_identifier_metadata to False, via\n```python\n\u003e\u003e\u003e pdf2doi.config.set('save_identifier_metadata',False)\n```\n\n#### Manually associate the correct identifier to a file\nSimilarly to what is described [above](#manually-associate-the-correct-identifier-to-a-file-from-command-line), it is possible to associate a (manually found) \nidentifier to a pdf file also from within python, by using the function ```pdf2doi.add_found_identifier_to_metadata```:\n\n```python\n\u003e\u003e\u003e import pdf2doi\n\u003e\u003e\u003e pdf2doi.add_found_identifier_to_metadata(path_to_pdf_file, identifier)\n```\nThis creates a new metadata entry in the pdf file, with label '/pdf2doi_identifier', containing the string ```identifier```.  \n\n## Installing the shortcuts in the right-click context menu of Windows\nThis functionality is only available on Windows (and so far it has been tested only on Windows 10). 
It adds commands to the context menu of Windows\nwhich appears when right-clicking on a pdf file or on a folder.\n\u003c!--\n\u003cimg src=\"docs/ContextMenu_pdf.png\" width=\"550\" /\u003e\u003cimg src=\"docs/ContextMenu_folder.png\" width=\"550\" /\u003e\n--\u003e\nThe different menu commands allow you to copy the paper(s) identifier(s) into the system clipboard, or to manually\nset the identifier of a pdf file (see also [here](#manually-associate-the-correct-identifier-to-a-file-from-command-line)).\n\u003c!--\n\u003cimg src=\"docs/ContextMenu_pdf.gif\" width=\"500\" /\u003e\n--\u003e\nTo install this functionality, first install ```pdf2doi``` via pip (as described above), then open a command prompt **with administrator rights** and execute\n```\n$ pdf2doi  -install--right--click\n```\nTo remove it, simply run (again from a terminal with administrator rights)\n```\n$ pdf2doi  -uninstall--right--click\n```\nIf it is not possible to run this command from a terminal with administrator rights, the batch files\n[here](/right_click_menu_installation) can be used instead (see the readme.MD file in the same folder for instructions), although \nadministrator rights are still required.\n\nNOTE: when multiple pdf files are selected and the right-click context menu commands are used, ```pdf2doi``` will be called separately for each file, and thus\nonly the info of the last file will be stored in the clipboard. In order to copy the info of multiple files it is necessary to save them in a folder and right-click on the folder.\n\n## Contributing\nPull requests are welcome. 
For major changes, please open an issue first to discuss what you would like to change.\n\n## Acknowledgment\nI am thankful to my friend and colleague Yarden Mazor for leading the beta-testing efforts for this project.\n\n## Donating\nIf you find this library useful (or amazing!), please consider making donations on my behalf to organizations that advocate for and promote free dissemination of science, such as\n\n[arXiv](https://arxiv.org/about/donate)\n\n[Sci-Hub](https://sci-hub.se/donate)\n\n[Wikipedia](https://donate.wikimedia.org/)\n\n\n## License\n[MIT](https://choosealicense.com/licenses/mit/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelecotrufo%2Fpdf2doi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichelecotrufo%2Fpdf2doi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelecotrufo%2Fpdf2doi/lists"}