# PDF-Bookmark

> Many PDF books downloaded from the internet have no outline (bookmark) information. You can add a table-of-contents structure by hand in Adobe Acrobat Reader, but this is very time-consuming when a book has many chapters. To address this, this project adds an outline structure to PDF books semi-automatically.

## Features

This project adds bookmarks to PDF books.

## Usage

1. Install the dependencies: `pip install -r requirements.txt`
2. Create a `.txt` index file that lists the book's table-of-contents entries and their page numbers.

    Format: `[indent]@[section title]@[page number]`. For example:

    ```text
    #@content@-7
    #@1 Introduction@1
    ##@1.1 A Universal Task: Pursuit of Low-Dimensionality@1
    #@2 Sparse Signal Models@33
    ```

    The number of `#` characters gives the heading level: `#` is a first-level heading, `##` a second-level heading, and so on. `@` is the separator: the section title follows the first `@`, and the page number follows the second `@`.

    **Note**: QQ's OCR feature can help you build this file, as described later.

3. Edit the configuration file `info.ini`:

    ```ini
    [info]
    pdf_path = C:\Users\ahou\Desktop\book.pdf
    bookmark_path = C:\Users\ahou\Desktop\bookmark.txt
    page_offset = 26
    new_pdf_path = C:\Users\ahou\Desktop\new_book.pdf
    ```

    `bookmark_path` is the path of the file created in step 2; `pdf_path` and `new_pdf_path` are the paths of the PDF to process and of the new file to save. `page_offset` is the difference between the printed page number and the actual page number in the PDF. A book's page numbering usually starts at the first page of the main text, but that page may be preceded by the cover, preface, table of contents, and so on; the number of those preceding pages is `page_offset`.

4. Run `main.py`.

## Using QQ OCR to help build the bookmark index

Use QQ's Ctrl+Alt+O hotkey to OCR the PDF's table of contents and produce the corresponding text. Then clean up the format in Notepad++, using regular-expression replacements to speed things up. Commonly used rules:

1. Replace the line break before a page number with `@`: `([a-z])\r\n([1-9][0-9])  -->  \1@\2`

    ```text
    # before
    chapter 1 Introduction
    50

    # after
    chapter 1 Introduction@50
    ```

2. Replace the line break before a section title with a space: `(\.[1-9])\r\n([A-Z])  -->  \1 \2`

    ```text
    # before
    1.1.1
    Background

    # after
    1.1.1 Background
    ```

3. Replace the line break after a first-level (chapter) number with a space: `(\r\n[1-9][0-9]*)\r\n([A-Z])  -->  \1 \2`

    ```text
    # before
    1
    Background

    # after
    1 Background
    ```

4. Add level markers to the headings:
   - First-level headings: `\r\n([1-9][0-9]* [A-Z])  -->  \r\n#@\1`
   - Second-level headings: `\r\n([0-9]+\.[0-9]+ [A-Z])  -->  \r\n##@\1`
   - Third-level headings: `\r\n([0-9]+\.[0-9]+\.[0-9]+ [A-Z])  -->  \r\n###@\1`
   - Give any remaining lines that still have no marker a first-level marker (`#@`).
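The bookmark index format from step 2 is simple enough to parse with the standard library. The sketch below only illustrates the format and is not the project's `main.py`, which this README does not show; `Entry` and `parse_bookmarks` are hypothetical names. It assumes that printed page + `page_offset` is the 1-based page number in the PDF, which follows from the definition of `page_offset` above (with `page_offset = 26`, printed page 1 is the 27th page of the file).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    level: int       # number of leading '#' characters (1 = top level)
    title: str       # text between the first and the last '@'
    page_index: int  # 0-based page index in the PDF file


def parse_bookmarks(text: str, page_offset: int) -> List[Entry]:
    """Parse '[#...]@title@page' lines into Entry objects.

    Assumes printed page + page_offset is the 1-based page number in the
    PDF, so the 0-based index is page + page_offset - 1.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        indent, sep, rest = line.partition("@")   # '#'s before the first '@'
        title, sep2, page = rest.rpartition("@")  # page number after the last '@'
        if not sep or not sep2 or not indent or indent.strip("#"):
            raise ValueError(f"line {lineno}: expected '[#...]@title@page', got {raw!r}")
        entries.append(Entry(len(indent), title.strip(), int(page) + page_offset - 1))
    return entries
```

Splitting on the first and last `@` lets a title contain `@` itself. When reading the real file, `encoding="utf-8-sig"` drops a byte-order mark that some Windows editors add, which would otherwise end up in front of the first `#`.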
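Reading the `info.ini` settings from step 3 needs only `configparser`. This is a minimal sketch; `load_info` is a hypothetical helper, not part of the project. It turns off interpolation so that every value is taken literally (a `%` in a path would otherwise be treated as an interpolation reference) and converts `page_offset` to an integer.

```python
import configparser


def load_info(ini_text: str) -> dict:
    """Return the settings from the [info] section of info.ini."""
    # interpolation=None: read values literally, so a '%' in a path is not
    # treated as an interpolation reference.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["info"]  # raises KeyError if the section is missing
    return {
        "pdf_path": section["pdf_path"],
        "bookmark_path": section["bookmark_path"],
        "page_offset": section.getint("page_offset"),  # parsed as an int
        "new_pdf_path": section["new_pdf_path"],
    }
```

For a file on disk, pass its contents, e.g. `load_info(Path("info.ini").read_text(encoding="utf-8"))` with `pathlib.Path`. Note that the keys must start at the beginning of the line in the real file; `configparser` treats indented lines as continuations of the previous value.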
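Turning the `#` levels into nested bookmarks means finding each entry's parent: the most recent entry one level higher. A stack of the currently open entries does this in one pass. `assign_parents` below is a hypothetical helper sketched under that assumption; it works on the list of levels only, so it does not depend on any PDF library.

```python
from typing import List, Optional


def assign_parents(levels: List[int]) -> List[Optional[int]]:
    """For each entry's level, return the index of its parent entry (None at top level).

    stack[k] holds the index of the most recent entry at level k + 1 that can
    still receive children.
    """
    parents: List[Optional[int]] = []
    stack: List[int] = []
    for i, level in enumerate(levels):
        if level < 1 or level > len(stack) + 1:
            # e.g. a '###' line directly after a '#' line has no '##' parent
            raise ValueError(f"entry {i}: level {level} is invalid after depth {len(stack)}")
        del stack[level - 1:]  # close open entries at this level or deeper
        parents.append(stack[-1] if stack else None)
        stack.append(i)  # this entry can now receive children
    return parents
```

For the step 2 example, the levels `[1, 1, 2, 1]` give `[None, None, 1, None]`: only `1.1 A Universal Task...` is nested, under `1 Introduction`. If the bookmarks were written with pypdf (an assumption; the README does not say which library `main.py` uses), each entry could be added with `PdfWriter.add_outline_item(title, page_number, parent=...)`, passing the object returned for the parent entry and a 0-based `page_number`.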
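Without Notepad++, the same replacement rules can be applied with Python's `re` module. This is a hedged port, not part of the project, and `ocr_to_bookmarks` is a hypothetical name. Three assumptions differ slightly from the Notepad++ rules: Windows line endings are normalized to `\n` first; `^` with `re.MULTILINE` stands in for the leading `\r\n`, so a chapter number on the very first line also matches; and chapter numbers use `[1-9][0-9]*` so that chapters 10, 20, and so on are included. The output still depends on the OCR text looking like the before/after examples, so check it by hand.

```python
import re

# (pattern, replacement) pairs, applied in order; numbers refer to the rules above.
RULES = [
    (r"([a-z])\n([1-9][0-9])", r"\1@\2"),             # 1: join title and page number
    (r"(\.[1-9])\n([A-Z])", r"\1 \2"),                # 2: join section number and title
    (r"^([1-9][0-9]*)\n([A-Z])", r"\1 \2"),           # 3: join chapter number and title
    (r"^([0-9]+\.[0-9]+\.[0-9]+ [A-Z])", r"###@\1"),  # 4: third-level headings
    (r"^([0-9]+\.[0-9]+ [A-Z])", r"##@\1"),           # 4: second-level headings
    (r"^([1-9][0-9]* [A-Z])", r"#@\1"),               # 4: first-level headings
]


def ocr_to_bookmarks(text: str) -> str:
    """Apply the replacement rules to OCR output and return the index text."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    for pattern, repl in RULES:
        # MULTILINE makes '^' match at the start of every line
        text = re.sub(pattern, repl, text, flags=re.MULTILINE)
    # Last part of rule 4: give every remaining unmarked, non-empty line '#@'.
    lines = [ln if not ln or ln.startswith("#") else "#@" + ln for ln in text.split("\n")]
    return "\n".join(lines)
```

For example, the OCR lines `1`, `Background`, `12`, `1.1`, `Motivation`, `15` become `#@1 Background@12` and `##@1.1 Motivation@15`. As in the Notepad++ version, rule 1 needs a page number of at least two digits.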