{"id":16779771,"url":"https://github.com/petermosmans/apdfhelper","last_synced_at":"2025-03-16T19:45:16.141Z","repository":{"id":205101378,"uuid":"713406476","full_name":"PeterMosmans/apdfhelper","owner":"PeterMosmans","description":"Fix links in PDF files, rewrite links, extract text annotations, remove pages","archived":false,"fork":false,"pushed_at":"2024-01-04T10:44:54.000Z","size":101,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-23T06:28:52.476Z","etag":null,"topics":["annotations","calendar","pdf","pdf-converter","pdf-extractor","pdf-parser","planner"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PeterMosmans.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-02T13:09:43.000Z","updated_at":"2023-11-02T14:52:48.000Z","dependencies_parsed_at":"2023-11-06T14:54:22.090Z","dependency_job_id":"9ab823de-c161-4fe8-bf64-9391dede49bf","html_url":"https://github.com/PeterMosmans/apdfhelper","commit_stats":null,"previous_names":["petermosmans/apdfhelper"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PeterMosmans%2Fapdfhelper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PeterMosmans%2Fapdfhelper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PeterMosmans%2Fapdfhelper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/PeterMosmans%2Fapdfhelper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PeterMosmans","download_url":"https://codeload.github.com/PeterMosmans/apdfhelper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243924039,"owners_count":20369644,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotations","calendar","pdf","pdf-converter","pdf-extractor","pdf-parser","planner"],"created_at":"2024-10-13T07:32:14.050Z","updated_at":"2025-03-16T19:45:16.120Z","avatar_url":"https://github.com/PeterMosmans.png","language":"Python","readme":"# apdfhelper: Annotated PDF Helper\n\nThis tool was originally meant to customize a PDF planner and enhance its usage.\nWith PDF files, it can:\n\n- remove pages\n\n  If you don't use certain pages (anymore), you can remove them.\n\n- display or add table of contents\n\n  Would you like to have a table of contents (bookmarks, or an outline in\n  PDF-parlance)? Title each page? With this tool you can view, edit and rewrite\n  page titles with a table of contents.\n\n- extract notes\n\n  Extract notes (text annotations) in text format, ordered per page. 
If there is\n  a table of contents title defined for that page, it will show the title of the\n  page on which the note(s) appear(s).\n\n- rewrite (broken) internal links\n\n  Rewire named links in a document to specific pages.\n\n- fix link types\n\n  Ensure that internal links show the correct page, fitted in a PDF viewer.\n\n- swap pages\n\n  Not happy with the current page ordering? Swap them around. Cut a page, and\n  insert it into another location.\n\n- split PDF into multiple single pages\n\n  Extract all pages from a PDF file as single PDF files.\n\n- inject pages from a PDF file into another PDF file\n\n  If you need some pages of another PDF file, you can copy and insert them into\n  your PDF file.\n\nSorry, currently only a command-line version of this tool is supplied, no\ngraphical interface exists (yet...).\n\n## Installation\n\nPython 3 is required.\n\n```bash\npip install -r requirements.txt\n```\n\n## Workflow\n\nTo reorganize a PDF file, say `calendar.pdf`, first ensure that\nthe pages themselves are in order, using the `cut`, `remove` and `swap`\ncommands. Then, create a text file with page titles, `toc.txt`, the table of\ncontents. Each line of the file has the format TITLE PAGENUMBER, for example:\n\n```\nOverview 2024-2025 3\nJanuary 14\nNovember 28\n Week 44 29\n```\n\nThis file creates four table of contents entries: 'Overview\n2024-2025' pointing to page 3, 'January' to page 14, 'November' to page 28,\nand the sub-item 'Week 44' to page 29. Entries support nesting, where leading\nspaces are used as the delimiter.\n\nThen, if any named links are defined in the document, extract them using\n`python apdfhelper.py links calendar.pdf \u003e links.txt`. This outputs all named\nlinks to `links.txt` with the page numbers they refer to.\n\nNext, edit `links.txt` and use the correct page numbers or use any of the titles\nthat are defined in the table of contents file `toc.txt`. 
When using titles,\ndon't forget to use quotes around them, for example:\n\n```\nmossery-dpln-2023_third-edition.indd:2023-03-2023\u00262024YC:244 \"Overview 2024-2025\"\nmossery-dpln-2023_third-edition.indd:2023-04-M-Jan:5 \"January\"\nmossery-dpln-2023_third-edition.indd:2023-04-WG-Week44:148 \"Week 44\"\nmossery-dpln-2023_third-edition.indd:2023-04-Note-02:183 6\n```\n\nNext, embed the table of contents in `calendar.pdf` and create or update the\nlinks using the `rewrite` command:\n\n```bash\npython apdfhelper.py rewrite calendar.pdf --tocfile toc.txt output.pdf links.txt\n```\n\nAnd voila, the file `output.pdf` will now contain the defined table of content\nentries, as well as links to the correct pages.\n\n### Remove one or more pages\n\nSpecify one page number, multiple page numbers (separated by a ','), or ranges\nof pages (separated by a '-') to be deleted.\n\n```bash\npython apdfhelper.py remove INFILE OUTFILE RANGES\n```\n\nExample to remove page 1, and page 189 up to and including 212:\n\n```\npython apdfhelper.py remove calendar.pdf output.pdf 1,189-212\n```\n\n### View table of content entries\n\n```bash\npython apdfhelper.py toc INFILE\n```\n\n### Add table of content entries\n\n```bash\npython apdfhelper.py toc --add --title \"Title of my page\" --page PAGENUMBER\n```\n\n### Extract notes (annotations) from a PDF file\n\nExtract all notes (text annotations) from a PDF file, and optionally show the\ntitle or page number where the annotation appears.\n\nExample:\n\n```\npython apdfhelper.py notes --headers calendar.pdf\n```\n\nThis will return a list of all text annotations in `calendar.pdf`, grouped per\npage. If there is a title defined for that page, it will show the title of the\npage instead.\n\n### Extract all named links from a PDF file\n\nInstead of directly linking to page numbers, PDF links can be named. `links`\nextracts all named links that are defined in a PDF file, with the page number\nit's pointing to. 
This can be useful as input when rewriting links. If the link\nsays `broken`, it's pointing to a non-existing page. Note that this can be fixed\nusing `rewrite`.\n\nExample:\n\n```\npython apdfhelper.py links calendar.pdf\n```\n\n### Rewrite links in a PDF file\n\nSometimes named links are broken: they point to non-existing pages. Or, you'd\nlike to rewire the location of a named link. Use as input a text file,\ncontaining the named link, followed by a space and a page number.\n\nExample contents of a link file:\n\n```\nmossery-dpln-2023_third-edition.indd:2023-02-Index:241 2\nmossery-dpln-2023_third-edition.indd:2023-03-YO-H1:3 29\n```\n\nThis rewrites the link named\n`mossery-dpln-2023_third-edition.indd:2023-02-Index:241` to page 2, and the link\nnamed `mossery-dpln-2023_third-edition.indd:2023-03-YO-H1:3` to page 29.\n\nAlternatively, you can supply a table of contents file, in order to map page\ntitles to page numbers. This can be easier when for instance a lot of links\npoint to the same page number, or when you often change the ordering of pages.\nEach entry in the table of contents file consists of a title and a page number.\nThen, in the link file, use that title instead of the page number. Don't forget\nto put double quotes around the title in the link file, for example:\n\n```\nmossery-dpln-2023_third-edition.indd:2023-04-M-Nov:147 \"November\"\nmossery-dpln-2023_third-edition.indd:2023-04-WG-Week43:144 27\nmossery-dpln-2023_third-edition.indd:2023-04-WG-Week44:148 \"Week 44\"\n```\n\n#### Usage\n\n```\napdfhelper.py rewrite [OPTIONS] INFILE OUTFILE LINKFILE\n\n  Rewrite links in a PDF file based on a configuration file.\n\n  If fit is given, rewrite type of link to 'Fit to page'. 
If tocfile is given,\n  parse page numbers from a table of contents file.\n\nArguments:\n  INFILE    [required]\n  OUTFILE   [required]\n  LINKFILE  [required]\n\nOptions:\n  --tocfile TEXT\n  --fit / --no-fit          [default: no-fit]\n  --verbose / --no-verbose  [default: no-verbose]\n```\n\n#### Example\n\n```\npython apdfhelper.py rewrite calendar.pdf output.pdf --tocfile toc.txt links.txt\n```\n\nNote that when `--tocfile` is supplied, existing table of contents entries are\nremoved before the new ones are imported.\n\n### Detailed link information\n\nIf you'd like to see which page contains links (clickable areas), and what the\nlink points to, use `page-links`. The result is the page number on which the\nlink occurs, with the coordinates of the link (left, top, right, bottom), the\n_type_ of link (internal or external), and what the link points to.\n\nOptionally you can see which page number a link points to, which can be useful\nfor troubleshooting broken links on pages.\n\n#### Usage\n\n```bash\napdfhelper.py page-links [OPTIONS] INFILE\n\nDisplay links on a specific page, or all pages.\n\nOutput format is: pagenumber left top right bottom [internal | external] link.\n\nWhen resolve is given, specify the page number of the link instead of the\nnamed link. Otherwise links might show up as broken.\n\nArguments:\nINFILE [required]\n\nOptions:\n--page INTEGER [default: 0]\n--resolve / --no-resolve [default: no-resolve]\n--detailed / --no-detailed [default: no-detailed]\n```\n\n### Split PDF\n\nSay you want to extract each page of a PDF file as a single PDF file. Use the\nsplit command to do exactly that. Naming of the extracted files can be set by\nspecifying a prefix, which will be followed by the page number.\n\n```bash\napdfhelper.py split [OPTIONS] INFILE PREFIX\n\nSplit one PDF into multiple single pages. 
The name uses prefix and the page\nnumber.\n\nArguments:\nINFILE [required]\nPREFIX [required]\n```\n\n## Advanced usage\n\nAs an advanced example, the PDF Mossery 2024 calendar that can be found on\nhttps://www.mossery.co/products/2024-digital-planner contains gridded, vertical\nand horizontal layouts. To remove the gridded and horizontal layouts in an\noriginal unmodified (!) calendar file, use the following commands:\n\n```bash\n./apdfhelper.py remove calendar-2024.pdf output.pdf 38,40,41,43,44,46,47,49,51,53,54,56,57,59,60,62,63,65,67,69,70,72,73,75,76,78,80,82,83,85,86,88,89,91,93,95,96,98,99,101,102,104,105,107,109,111,112,114,115,117,118,120,122,124,125,127,128,130,131,133,135,137,138,140,141,143,144,146,147,149,151,153,154,156,157,159,160,162,164,166,167,169,170,172,173,175,176,178,180,182,183,185,186,188,189,191,193,195,196,198,199,201,202,204,205,207\n```\n\nNote that this removes the pages, which will result in broken links. Create a\nfile with all named links:\n\n```\n./apdfhelper.py links output.pdf \u003e links.txt\n```\n\nThen use a text editor to fix the broken links in `links.txt` (replace them with\nvalid page numbers), and apply the new links to the modified file:\n\n```bash\n./apdfhelper.py rewrite output.pdf fixed.pdf links.txt --fit\n```\n\nNow the file `fixed.pdf` will contain the 2024 calendar, containing the vertical\nlayout, with working links.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetermosmans%2Fapdfhelper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpetermosmans%2Fapdfhelper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetermosmans%2Fapdfhelper/lists"}