{"id":13586145,"url":"https://github.com/perrette/papers","last_synced_at":"2025-04-06T13:09:46.229Z","repository":{"id":26528095,"uuid":"109117658","full_name":"perrette/papers","owner":"perrette","description":"Command-line tool to manage bibliography (pdfs + bibtex) ","archived":false,"fork":false,"pushed_at":"2023-10-18T14:24:12.000Z","size":14023,"stargazers_count":137,"open_issues_count":12,"forks_count":22,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-04-14T11:15:07.823Z","etag":null,"topics":["bibtex","crossref","doi","filemanager","google-scholar","pdf"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/perrette.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-01T10:25:27.000Z","updated_at":"2024-05-20T10:46:44.843Z","dependencies_parsed_at":"2024-06-09T17:47:40.101Z","dependency_job_id":null,"html_url":"https://github.com/perrette/papers","commit_stats":{"total_commits":192,"total_committers":10,"mean_commits":19.2,"dds":0.21875,"last_synced_commit":"e12996431135ab9755b8dee7738f45095abdb74d"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fpapers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fpapers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fpapers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p
errette%2Fpapers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/perrette","download_url":"https://codeload.github.com/perrette/papers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247485287,"owners_count":20946398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibtex","crossref","doi","filemanager","google-scholar","pdf"],"created_at":"2024-08-01T15:05:21.214Z","updated_at":"2025-04-06T13:09:46.197Z","avatar_url":"https://github.com/perrette.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"[![test](https://github.com/perrette/papers/workflows/CI/badge.svg?query=branch%3Amaster)](https://github.com/perrette/papers/actions)\n[![python](https://img.shields.io/badge/python-3.9-blue.svg)]()\n[![python](https://img.shields.io/badge/python-3.10-blue.svg)]()\n[![python](https://img.shields.io/badge/python-3.11-blue.svg)]()\n[![python](https://img.shields.io/badge/python-3.12-blue.svg)]()\n[![python](https://img.shields.io/badge/python-3.13-blue.svg)]()\n\n# papers\n\nCommand-line tool to manage bibliography (pdfs + bibtex)\n\nDisclaimer: This tool requires further development and testing, and might never be fully production-ready (contributors welcome). \\\nThat said, it is becoming useful :)\n\n## Motivation\n\nThis project is an attempt to create a light-weight,\ncommand-line bibliography management tool. 
Aims:\n\n- maintain a PDF library (with appropriate naming)\n- maintain one or several bibtex-compatible collections, linked to PDFs\n- enough PDF-parsing capability to fetch metadata from the internet (e.g. [crossref](https://github.com/CrossRef/rest-api-doc) or google-scholar)\n\n\n## Dependencies\n\n- python 3.9+\n- [PyMuPDF](https://github.com/pymupdf/PyMuPDF) (preferred) or [poppler-utils](https://en.wikipedia.org/wiki/Poppler_(software)) (only `pdftotext`; deprecated): convert PDF to text for parsing\n- [bibtexparser](https://bibtexparser.readthedocs.io) : parse bibtex files\n- [crossrefapi](https://github.com/fabiobatalha/crossrefapi) : make polite requests to crossref API\n- [scholarly](https://github.com/OrganicIrradiation/scholarly) : interface for google scholar\n- [rapidfuzz](https://github.com/rhasspy/rapidfuzz) : calculate score to sort crossref requests\n- [unidecode](https://github.com/avian2/unidecode) : replace unicode with ascii equivalent\n\n## Install\n\n- `pip install papers-cli`\n\nNote that another project is registered on PyPI as `papers`, hence `papers-cli` (for command-line interface).\n\n## Getting started\n\nThis tool's interface is built like `git`, with main command `papers` and a range of subcommands.\n\n### Extract PDF metadata and add to library\n\nStart with a PDF of your choice (modern enough to have a DOI, e.g. anything from the Copernicus publications).\nFor the sake of the example, one of my own: https://www.earth-syst-dynam.net/4/11/2013/esd-4-11-2013.pdf\n\n- extract pdf metadata (doi-based if available, otherwise crossref, or google scholar if so specified)\n\n\t\t$\u003e papers extract esd-4-11-2013.pdf\n\t\t@article{Perrette_2013,\n\t\tdoi = {10.5194/esd-4-11-2013},\n\t\turl = {https://doi.org/10.5194%2Fesd-4-11-2013},\n\t\tyear = 2013,\n\t\tmonth = {jan},\n\t\tpublisher = {Copernicus {GmbH}},\n\t\tvolume = {4},\n\t\tnumber = {1},\n\t\tpages = {11--29},\n\t\tauthor = {M. Perrette and F. Landerer and R. Riva and K. 
Frieler and M. Meinshausen},\n\t\ttitle = {A scaling approach to project regional sea level rise and its uncertainties},\n\t\tjournal = {Earth System Dynamics}\n\t\t}\n\n- add pdf to `papers.bib`  library, and rename a copy of it in a files directory `files`.\n\n\t\t$\u003e papers add esd-4-11-2013.pdf --rename --copy --bibtex papers.bib --filesdir files --info\n\t\tINFO:papers:found doi:10.5194/esd-4-11-2013\n\t\tINFO:papers:new entry: perrette_landerer2013\n\t\tINFO:papers:mv /home/perrette/playground/papers/esd-4-11-2013.pdf files/perrette_et_al_2013_a-scaling-approach-to-project-regional-sea-level-rise-and-its-uncertainties.pdf\n\t\tINFO:papers:renamed file(s): 1\n\n(the `--info` argument asks for the above output information to be printed out to the terminal)\n\nThat is equivalent to doing:\n\n    papers extract esd-4-11-2013.pdf \u003e entry.bib\n    papers add entry.bib --bibtex papers.bib --attachment esd-4-11-2013.pdf --rename --copy\n\nSee [Control fields when renaming file](#control-fields-when-renaming-file) for how to specify file naming patterns.\n\n### Add library entry from its DOI\n\nIf you already know the DOI of a PDF, and don't want to gamble the fulltext search and match, you can indicate it via `--doi`:\n\n    papers add esd-4-11-2013.pdf --doi 10.5194/esd-4-11-2013 --bibtex papers.bib\n\nThe `add` command above also works without any PDF (create a bibtex entry without file attachment).\n\n    papers add --doi 10.5194/esd-4-11-2013 --bibtex papers.bib\n\n\n### Add entry without DOI from bibtex library + PDF\n\nSome old files don't have a DOI. Best is to add the entry from its bibtex:\n\n    papers add entry.bib --attachment esd-4-11-2013.pdf\n\n\n### List entries (and edit etc...)\n\nPretty listing by default (otherwise pass --plain for plain bibtex):\n\n    $\u003e papers list\n    Perrette2013: A scaling approach to project regional sea level rise and it... 
(doi:10.5194/esd-4-11-2013, file:1)\n\nSearch with any number of keywords:\n\n    $\u003e papers list perrette scaling approach sea level\n    ... (short list)\n    $\u003e papers list perrette scaling approach sea level --any\n    ... (long list)\n    $\u003e papers list --key perrette2013 --author perrette --year 2013 --title scaling approach sea level\n    ... (precise list)\n\nAdd tags to view papers by topic:\n\n    $\u003e papers list perrette2013 --add-tag sea-level projections\n    ...\n    $\u003e papers list --tag sea-level projections\n    Perrette2013: A scaling approach to project regional sea level rise and it... (doi:10.5194/esd-4-11-2013, file:1, sea-level | projections )\n\n`papers list` is a powerful command, inspired by unix's `find` and `grep`.\n\nIt lets you search your bibtex in a typical manner (including a number of special flags such as `--duplicates`, `--review-required`, `--broken-file`...),\nthen outputs the result in a number of formats (one-liner, raw bibtex, keys-only, selected fields) or lets you perform actions on it (currently `--edit`, `--delete`, `--add-tag`, `--fetch`).\nFor instance, it is possible to manually merge the duplicates with:\n\n    $\u003e papers list --duplicates --edit\n\n\n### Control fields when renaming file\n\n        $\u003e papers add --rename --info --name-template \"{AuthorX}{year}-{Title}\" --name-title-sep '' --name-author-sep '' esd-4-11-2013\n        INFO:papers:found doi:10.5194/esd-4-11-2013\n        INFO:papers:new entry: perrette2013scaling\n        INFO:papers:create directory: files/2013\n        INFO:papers:mv /home/perrette/playground/papers/esd-4-11-2013.pdf files/PerretteEtAl2013-AScalingApproachToProjectRegionalSeaLevelRiseAndItsUncertainties.pdf\n        INFO:papers:renamed file(s): 1\n\nwhere `--name-template` is a python template (formatted via the `.format()` method), with valid fields being any field available in the bibtex. 
Fields not in the bibtex will remain untouched.\n\nTo rename `esd-4-11-2013.pdf` as `perrette_2013.pdf`, the template should be `--name-template {author}_{year} --name-author-num 1`.\nIf that happens to be the entry ID, `ID` also works.\n\nTo rename `esd-4-11-2013.pdf` as `2013/Perrette2013-AScalingApproachToProjectRegionalSeaLevelRiseAndItsUncertainties.pdf`,\nthe template should be `--name-template {year}/{Author}{year}-{Title} --name-title-sep ''` (note the case).\n\nField names are case-sensitive, and a few more fields are added, so that:\n- 'author' generates 'perrette'\n- 'Author' generates 'Perrette'\n- 'AUTHOR' generates 'PERRETTE'\n- 'authorX' generates 'perrette', 'perrette_and_landerer' or 'perrette_et_al' depending on the number of authors\n- 'AuthorX' same as authorX but capitalized\n\nThe modifiers are:\n\n- `--name-title-sep` : separator for title words\n- `--name-title-length` : max title length\n- `--name-title-word-size` : min size to be considered a word\n- `--name-title-word-num` : max number of title words\n\nand similarly:\n\n- `--name-author-sep` : separator for authors\n- `--name-author-num` : number of authors to include (not relevant for `{authorX}`)\n\nThe same template and modifiers system applies to the bibtex key generation by replacing the prefix `--name-` with `--key-`, e.g. `--key-template`.\n\n\nIn the common case where the bibtex (`--bibtex`), files directory (`--filesdir`), and name and key formats (e.g. 
`--name-template`) do not change, it is convenient to\n[install](#install-make-bibtex-and-files-directory-persistent) `papers`.\n\n\n### install: make bibtex and files directory persistent\n\n    $\u003e papers install --bibtex papers.bib --filesdir files\n    papers configuration\n    * configuration file: /home/perrette/.config/papersconfig.json\n    * cache directory:    /home/perrette/.cache/papers\n    * absolute paths:     True\n    * files directory:    files (1 files, 5.8 MB)\n    * bibtex:            papers.bib (1 entries)\n\nThe configuration file is global (unless `--local` is specified), so from now on, any `papers`\ncommand will know about these settings: no need to specify the bibtex file or files directory.\n\nType `papers status -v` to check your configuration.\n\nYou will also notice a cache directory. All internet requests such as crossref requests are saved in the cache directory.\nThis happens regardless of whether `papers` is installed or not.\n\n\n#### local install\n\nSometimes it is desirable to have separate configurations. 
In that case a local install is the way to go:\n\n    $\u003e papers install --local\n    Bibtex file name [default to existing: papers.bib] [Enter/Yes/No]:\n    Files folder [default to new: papers] [Enter/Yes/No]: pdfs\n    papers configuration\n    * configuration file: papersconfig.json\n    * cache directory:    /home/perrette/.cache/papers\n    * absolute paths:     True\n    * git-tracked:        False\n    * files directory:    pdfs (90 files, 337.4 MB)\n    * bibtex:             papers.bib (82 entries)\n\n\nThis creates a local configuration file in a hidden `.papers` folder.\nBy default, it expects an existing (or creates a new) `papers.bib` bibliography and `papers` files folder in the local directory, though `papers` will ask first unless they are explicitly provided.\nNote that every call from a subfolder will also detect that configuration file (it has priority over global install).\n\nBy default, the local install is meant to be portable with bibtex and files, so the file paths are stored relative to the bibtex file.\nIf instead absolute paths make more sense (example use case: local bibtex file but central PDF folder), then simply specify the `--absolute-paths` option:\n\n    papers install --local --absolute-paths --filesdir /path/to/central/pdfs\n\n\n#### uninstall\n\nGetting confused with papers config files scattered in subfolders? Check the config with\n\n    papers status -v\n\nand remove the configuration file by hand (`rm ...`). 
Or use the `papers uninstall` command:\n\n    papers uninstall\n\nYou may repeat `papers status -v` and cleaning until a satisfying state is reached, or remove all config files recursively up to (and including) global install:\n\n    papers uninstall --recursive\n\n\n### Relative versus Absolute path\n\nBy default, the file paths in the bibtex are stored as absolute paths (starting with `/`), except for local installs.\nIt is possible to change this behaviour explicitly during install or on a case-by-case basis with the `--relative-paths` or `--absolute-paths` options,\nwith or without install.\n\n\n### Move library to a new location\n\nMoving a library can be tricky.\nSimple cases are:\n- files are stored in a central repository, and the bibtex contains absolute paths. Then moving the bibtex by hand is fine.\n- files are stored alongside the bibtex, and the bibtex contains relative paths. Just move around the folder containing bibtex and files.\n\nIn any other case, you risk breaking the file links.\n\nPapers tries to be as unopinionated as possible about how files are organized, and so it relies on your own judgement and use case.\nWhen loading a bibtex, it always interprets relative file links as being relative to the bibtex file.\nWhen saving a bibtex, it will save file links according to the default path setting (usually absolute, unless local install or unless you specify otherwise).\n\nIn any case, the following set of commands will always work provided the initial file links are valid (optional parameters in brackets):\n\n    touch new.bib\n    papers add /path/to/old.bib --bib new.bib [ --rename ] [ --relative-paths ] [ --filesdir newfilesdir ]\n    rm -f /path/to/old.bib\n\n\n### check\n\nIt's easy to end up with duplicates in your bibtex. After adding PDFs, or every once in a while, do:\n\n    papers check --duplicates\n\n\n### filecheck\n\nCheck for broken links, rename files etc. 
Example:\n\n    papers filecheck --rename\n\nThe command can be used to move the files directory:\n\n    papers filecheck --rename --filesdir newfilesdir\n\nThat command is also convenient to check on what's actually tracked and what is not. Example workflow:\n\n    papers filecheck --rename --filesdir tmp\n    # check what's left over in your initial files directory, e.g.\n    # papers extract files/leftover1.pdf\n    # papers add files/leftover1.pdf\n    # ...\n    papers filecheck --rename --filesdir files\n\nThere is also a command specifically designed to clean up the zombie files and folders:\n\n    papers filecheck --clean-filesdir\n\nThat command will ask before removing anything, unless `--force` is passed. Currently\nit ignores hidden files and folders, and will only consider folders that have a `.{folder}.bib` file inside,\nwhich is the convention `papers` follows to store multiple attachments. That command works best\nwhen the files are in their own folder, and not mixed up with other things, obviously.\n\n### Setup git-tracked library (optional)\n\nInstall comes with the option to git-track any change to the bibtex file (`--git` option).\n\n    $\u003e papers install --bibtex papers.bib --filesdir files --git  [ --git-lfs ]\n\nFrom now on, every change to the library will result in an automatic git commit.\nThe `papers git ...` command will work just like `git ...` executed from the bibtex directory.\nE.g. `papers git remote add origin *REMOTE URL*`; `papers git lfs track files`; `papers git add files`; `papers git push`\n\nIf `--git-lfs` is passed, the files will be backed up along with the bibtex.\nUnder the hood, bibtex and files (if applicable) are copied (hard-linked) to a back-up directory.\nDetails are described in [issue 51](https://github.com/perrette/papers/issues/51).\n\nBackup occurs in a subfolder of `~/.local/.share/papers` regardless of the type of installation. 
Type `papers status -v` to find out.\nFor local installs that are already git-tracked, the feature remains useful as it is the basis for `papers undo` and `papers redo`.\n\n### undo / redo\n\nDid a `papers add` and are unhappy with the result?\n\n    papers undo\n\nwill revert to the previous version. If repeated, it will jump back and forth between the latest and before-latest versions,\nunless `papers` is installed with the `--git` option, in which case `papers undo` and `papers redo` will have essentially infinite memory\n(doing undos and then making a new commit risks losing history, unless you keep track of the commit).\n\n\nConsult inline help for more detailed documentation!\n\n\nCurrent features\n----------------\n- parse PDF to extract DOI\n- fetch bibtex entry from DOI (using crossref API)\n- fetch bibtex entry by fulltext search (using crossref API or google scholar)\n- create and maintain bibtex file\n- add entry as PDF (`papers add ...`)\n- add entry as bibtex (`papers add ...`)\n- scan directory for PDFs (`papers add ...`)\n- rename PDFs according to bibtex key and year (`papers filecheck --rename [--copy]`)\n- some support for attachment\n- merging (`papers check --duplicates ...`)\n- fix entries (`papers check --format-name --encoding unicode --fix-doi --fix-key ...`)\n- configuration file with default bibtex and files directory (`papers install --bibtex BIB --filesdir DIR ...`)\n- integration with git\n- undo/redo command (`papers undo / redo`)\n- display / search / list entries : format as bibtex or key or whatever (`papers list ... [--key-only, -l]`)\n- list + edit or remove entry by key or otherwise (`papers list ... [--edit, --delete]`)\n- fix broken PDF links (`papers filecheck ...`):\n    - remove duplicate file names (always) or file copies (`--hash-check`)\n    - remove missing link (`--delete-missing`)\n    - fix file names after a Mendeley export (`--fix-mendeley`):\n        - a missing leading '/' is added\n        - latex characters, e.g. 
`{\\_}` or `{\\'{e}}` replaced with unicode\n\n\nTests\n-----\nTest coverage is improving (now 80%)\n\nCurrently covers:\n- `papers extract` (test on a handful of PDFs)\n    - parse pdf DOI\n    - fetch bibtex on crossref based on DOI\n    - fetch bibtex on crossref based on fulltext search\n    - fetch bibtex on google-scholar based on fulltext search\n- `papers add`\n    - add entry and manage conflict\n    - add pdf file, bibtex, directory\n    - add one pdf file with attachment (beta, API will change)\n    - conflict resolution\n- `papers install`\n- internals:\n    - duplicate test with levels `EXACT`, `GOOD`, `FAIR` (the default), `PARTIAL`\n- `papers list`\n- `papers undo / redo` (partial)\n- `papers filecheck --rename` (superficial)\n- `papers check --duplicate` (fix DOI etc.) (superficial)\n\n\nWhy not JabRef, Zotero or Mendeley (or...) ?\n--------------------------------------------\n- JabRef (2.10) is nice, light-weight, but is not so good at managing PDFs.\n- Zotero (5.0) features excellent PDF import capability, but it needs to be done manually, one by one, and is a little slow. 
Not very flexible.\n- Mendeley (1.17) is perfect at automatically extracting metadata from downloaded PDFs and managing your PDF library,\nbut it is not open source, and many issues remain (own experience, Ubuntu 14.04, Desktop Version 1.17):\n    - very unstable\n    - PDF automatic naming is too verbose, and sometimes the behaviour is unexpected (some PDFs remain in an obscure Downloaded folder, instead of in the main collection)\n    - somewhat heavy (it offers functions of online syncing, etc.)\n    - poor search capability (related to the point above)\n\nThe above-mentioned issues will no doubt be improved in future releases, but they are a starting point for this project.\nAnyway, a command-line tool is per se a good idea for faster development,\nas noted [here](https://forums.zotero.org/discussion/43386/zotero-cli-version),\nbut so far I could only find zotero clients for their online API\n(like [pyzotero](https://github.com/urschrei/pyzotero) or [zotero-cli](https://github.com/jbaiter/zotero-cli)).\nPlease contact me if you know another interesting project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrette%2Fpapers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fperrette%2Fpapers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrette%2Fpapers/lists"}