{"id":13435809,"url":"https://github.com/caj2pdf/caj2pdf","last_synced_at":"2025-05-14T21:09:12.649Z","repository":{"id":37660852,"uuid":"100870313","full_name":"caj2pdf/caj2pdf","owner":"caj2pdf","description":"Convert CAJ (China Academic Journals) files to PDF. 转换中国知网 CAJ 格式文献为 PDF。佛系转换，成功与否，皆是玄学。","archived":false,"fork":false,"pushed_at":"2024-03-20T02:09:01.000Z","size":308,"stargazers_count":3071,"open_issues_count":26,"forks_count":632,"subscribers_count":46,"default_branch":"master","last_synced_at":"2025-04-06T13:07:57.072Z","etag":null,"topics":["caj","cnki","pdf","python","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/caj2pdf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-20T15:24:03.000Z","updated_at":"2025-04-05T13:40:56.000Z","dependencies_parsed_at":"2023-02-15T23:46:06.196Z","dependency_job_id":"c369fd1d-aca2-450c-b389-9a1f780d9bd2","html_url":"https://github.com/caj2pdf/caj2pdf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caj2pdf%2Fcaj2pdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caj2pdf%2Fcaj2pdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caj2pdf%2Fcaj2pdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caj2pdf%2Fcaj2pdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/caj
2pdf","download_url":"https://codeload.github.com/caj2pdf/caj2pdf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248741202,"owners_count":21154255,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["caj","cnki","pdf","python","python3"],"created_at":"2024-07-31T03:00:39.456Z","updated_at":"2025-04-13T16:10:23.217Z","avatar_url":"https://github.com/caj2pdf.png","language":"Python","funding_links":[],"categories":["HarmonyOS","📦 Others (实用工具、媒体与其它)","Python"],"sub_categories":["Windows Manager"],"readme":"# caj2pdf\n\n## Why\n\nSome documents on [CNKI (中国知网)](http://cnki.net/), mostly theses and dissertations, can only be downloaded in its proprietary CAJ format, which can only be opened with software provided by CNKI (such as [CAJViewer](http://cajviewer.cnki.net/)). This makes reading and managing the literature inconvenient, especially on non-Windows systems.\n\nA CAJ file can be converted to PDF with CAJViewer's print function, but the pages of the resulting PDF are images, so the text cannot be selected, and the outline of the original document is lost. This project aims to solve both problems.\n\n## How far we've come\n\nFiles downloaded from CNKI with the `caj` extension actually come in two internal structures: the CAJ format and the HN format (there may be more; the samples examined so far are limited). This project currently supports converting CAJ-format files. HN-format conversion is still incomplete and requires building two additional shared libraries (except on Microsoft Windows, for which we provide 32-bit/64-bit DLLs; Mac OS users can download them from [extra libs build](https://github.com/caj2pdf/caj2pdf-extra-libs/releases/tag/BUILD-0.1) and run `chmod +x ...` on them). Build them as follows:\n\n```\ncc -Wall -fPIC --shared -o libjbigdec.so jbigdec.cc JBigDecode.cc\ncc -Wall `pkg-config --cflags poppler` -fPIC -shared -o libjbig2codec.so decode_jbig2data.cc `pkg-config --libs poppler`\n```\n\nAlternatively, if you prefer libjbig2dec to libpoppler, build against it instead:\n\n```\ncc -Wall -fPIC --shared -o libjbigdec.so jbigdec.cc JBigDecode.cc\ncc -Wall `pkg-config --cflags jbig2dec` -fPIC -shared -o 
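libjbig2codec.so decode_jbig2data_x.cc `pkg-config --libs jbig2dec`\n```\n\n**For progress on the analysis of the two file formats and the implementation details of this project, see the [project Wiki](https://github.com/JeziL/caj2pdf/wiki).**\n\n## How to contribute\n\nBecause the number of test samples is limited, even CAJ-format conversion may (or rather, almost certainly does) contain bugs. If you run into one, please report it in an [Issue](https://github.com/JeziL/caj2pdf/issues) **and provide a caj file that reproduces the bug**. You can upload the sample file to a cloud drive or similar service\u003cdel\u003e, or simply provide the CNKI link\u003c/del\u003e (the author no longer has campus network access, so please provide a downloadable caj file when opening an issue).\n\nIf you know something about one or more of binary file analysis, image/text compression algorithms, or reverse engineering, you are welcome to help improve this project. You can start by reading the [project Wiki](https://github.com/JeziL/caj2pdf/wiki) to see whether there is somewhere your skills fit. **Pull requests are always welcome**.\n\n## How to use\n\n### Environment and dependencies\n\n- Python 3.3+\n- [PyPDF2](https://github.com/mstamy2/PyPDF2)\n- [mutool](https://mupdf.com/index.html)\n\nExcept on Microsoft Windows, for which we provide 32-bit/64-bit DLLs, the HN format additionally requires:\n\n- A C/C++ compiler\n- The libpoppler development package, or the libjbig2dec development package\n\n### Usage\n\n```\n# Print basic file information (file type, page count, number of outline items)\ncaj2pdf show [input_file]\n\n# Convert a file\ncaj2pdf convert [input_file] -o/--output [output_file]\n\n# Extract the outline from a CAJ file and add it to a PDF file\n## When you hit an unsupported file type or a bug, print the file to PDF with CAJViewer and use this command to add the outline to it\ncaj2pdf outlines [input_file] -o/--output [pdf_file]\n```\n\n### Examples\n\n```\ncaj2pdf show test.caj\ncaj2pdf convert test.caj -o output.pdf\ncaj2pdf outlines test.caj -o printed.pdf\n```\n\n### Abnormal output (IMPORTANT!!!)\n\nAlthough this project has attracted a fair amount of attention, it **still supports converting only some caj files**, and admittedly it is by no means a mature project that is friendly enough for ordinary users. What is and is not supported is already described above, but many users do not seem to have noticed. So **if you get the following output, this project currently cannot help you**, and issues about it will no longer receive replies.\n\n- `Unknown file type.`: the file type is not recognized.\n\n## License\n\nThis project is open-sourced under the [GLWTPL](https://github.com/me-shaon/GLWTPL) (Good Luck With That Public License).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaj2pdf%2Fcaj2pdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcaj2pdf%2Fcaj2pdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaj2pdf%2Fcaj2pdf/lists"}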