{"id":15651868,"url":"https://github.com/aminya/tocpdf","last_synced_at":"2025-10-10T11:37:56.650Z","repository":{"id":66267132,"uuid":"194973237","full_name":"aminya/tocPDF","owner":"aminya","description":"Generates bookmarks from the table of contents already available at the beginning of pdf files.","archived":false,"fork":false,"pushed_at":"2025-06-29T08:56:38.000Z","size":126,"stargazers_count":39,"open_issues_count":2,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-10T11:37:55.893Z","etag":null,"topics":["bookmark","bookmarker","ocr","outline","pdf","table-of-contents","tableofcontents","toc"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aminya.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["aminya"],"polar":"aminya","patreon":"aminya"}},"created_at":"2019-07-03T03:30:30.000Z","updated_at":"2025-09-25T04:05:29.000Z","dependencies_parsed_at":"2023-03-13T20:30:14.892Z","dependency_job_id":"7a93c21b-1952-482b-b3d3-f3d76402893a","html_url":"https://github.com/aminya/tocPDF","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aminya/tocPDF","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aminya%2FtocPDF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aminya%2FtocPDF/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/aminya%2FtocPDF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aminya%2FtocPDF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aminya","download_url":"https://codeload.github.com/aminya/tocPDF/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aminya%2FtocPDF/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279003712,"owners_count":26083610,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bookmark","bookmarker","ocr","outline","pdf","table-of-contents","tableofcontents","toc"],"created_at":"2024-10-03T12:40:27.925Z","updated_at":"2025-10-10T11:37:56.645Z","avatar_url":"https://github.com/aminya.png","language":null,"funding_links":["https://github.com/sponsors/aminya","https://polar.sh/aminya","https://patreon.com/aminya"],"categories":[],"sub_categories":[],"readme":"# tocPDF\n*by Amin Yahyaabadi*\n\nGenerates bookmarks from the table of contents already available at the beginning of pdf files.\n\n The plan is to automate the whole procedure (https://github.com/aminya/tocPDF#automated).\n\n\n Until then here is the manual procedure:\n## Manual:\n### Step 1:  Extraction of toc pages from PDF:\nUse Chrome or 
software that you already have to extract the pages that contain the table of contents.\n\nTutorial for extracting pages using Chrome\nhttps://www.techadvisor.co.uk/how-to/software/how-extract-pages-from-pdf-3679232/\n\nWe refer to this file as tocPDF.\n\n### Step 2: Extract table of contents text\nHere we extract the text from tocPDF.\n\nEven if your PDF file is searchable, copying the text usually does not preserve the original layout (such as a table).\n\nPreferred Methods:\n\n* ####  Tabula technology\nFor searchable PDF only -  https://tabula.technology/\n\n\t* Download and run the software\n\t* Select table of contents and do this for each page\n\t* Hit preview and export extracted data\n\t* Export to csv format\n\n* #### Using OCR.space\nFor both scanned and searchable PDF -  https://ocr.space/\n\n\t* Upload tocPDF\n\t* Check \"Do receipt scanning and/or table recognition\" option\n\t* Use \"Just extract text and show overlay (fastest option)\" option.\n\t* Download or copy and paste the generated text.\n\n\nWe refer to the generated text as tocText.\n\n\n## Instead of the following steps, you can use this tool, which does a similar thing but with a GUI\nhttps://github.com/ifnoelse/pdf-bookmark/blob/master/README-EN.md\n\n### Step 3: Preparing the text of the table of contents\n\nOpen tocText (txt or csv) with a spreadsheet editor (MS Excel or Google Sheets) or a text editor.\n\nEdit the text such that each page number is at the beginning of a line, e.g.\n```\n1 Cover\n2 Table of Contents\n5 Chapter 1\n+6 Subchapter1\n++7 Sub-Subchapter1\n25 Chapter 2\n```\nDon't forget to add the offset to each page number (the page numbers in a PDF are usually offset from those of the printed document).\n\n### Step 4: Download k2pdfopt:\nhttp://willus.com/k2pdfopt/download/\n\n### Step 5: (only for Windows) Disabling the GUI:\n\nDisable the GUI using this tutorial:\nhttp://willus.com/k2pdfopt/help/nogui.shtml\n\nThen drag the original pdf file into your 
shortcut.\n\n\n### Step 6: Run the command:\n#### Windows:\nCopy toc.txt and the source pdf file into the folder of your shortcut for convenience.\n\nCopy-paste the following command in the terminal and press enter.\n```\n-mode copy -n -toclist toc.txt srcfile.pdf -o outfile.pdf\n```\nPress enter again to start bookmarking.\n\n#### OSX or Linux:\n```\nk2pdfopt -mode copy -n -toclist toc.txt srcfile.pdf -o outfile.pdf\n```\n\n## Other Manual Methods:\n#### Other method using JPdfBookmarks\nhttps://sourceforge.net/projects/jpdfbookmarks/\n\nfrom https://ebooks.stackexchange.com/a/7763/12921\n\n    Prepare the tocText file such that\n\n    Chapter 1. The Beginning/23\n        Para 1.1 Child of The Beginning/25,FitWidth,96\n            Para 1.1.1 Child of Child of The Beginning/26,FitHeight,43\n    Chapter 2. The Continue/30,TopLeft,120,42\n        Para 2.1 Child of The Beginning/32,FitPage\n\n    You can OCR the TOC and use regex to fix it.\n\n    Load that TOC\n\n    Expand all bookmarks (Ctrl + E), select all of them, then go to Tools \u003e Apply Page Offset\n\n    Enter the first pages that outmatch the page number in the TOC\n\nYou can read its manual (http://jpdfbookmarks.altervista.org/InsertBookmarks.html#1_3_1) or watch a quick video tutorial (https://youtu.be/7DUkvH7_wII?t=30). 
It has a command-line mode and also works on Linux and Mac.\n\n#### Other Methods for step 2:\n\n* Tesseract OCR:\n\thttps://github.com/tesseract-ocr/tesseract\n\n\tThere is a good tutorial for Extracting Table Data From PDFs with Tesseract OCR:\n\thttps://web.archive.org/web/20141022033241/http://craiget.com/extracting-table-data-from-pdfs-with-ocr/\n\n* Using OnlineOCR.net - Free up to a limit:\nhttps://www.onlineocr.net/\n\n\t* Register on the website (to remove the page number limitation) and log in\n\t* Select the txt file option, upload tocPDF, and convert your file\n\n* A related Stack Overflow question:\nhttps://stackoverflow.com/questions/6173439/can-ocr-software-reliably-read-values-from-a-table\n\n#### References:\n\nhttps://www.willus.com/k2pdfopt/help/k2menu.shtml\n\nhttps://www.willus.com/k2pdfopt/help/options.shtml\n\n\nhttps://ebooks.stackexchange.com/questions/107/how-to-create-clickable-table-of-contents-in-a-pdf\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faminya%2Ftocpdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faminya%2Ftocpdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faminya%2Ftocpdf/lists"}