{"id":21206640,"url":"https://github.com/oncomouse/openmodernism-tools","last_synced_at":"2025-10-20T03:58:20.079Z","repository":{"id":26661964,"uuid":"30118339","full_name":"oncomouse/openmodernism-tools","owner":"oncomouse","description":null,"archived":false,"fork":false,"pushed_at":"2015-02-05T18:25:58.000Z","size":1593,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-21T15:48:43.130Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oncomouse.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-01-31T16:40:45.000Z","updated_at":"2022-02-23T05:49:44.000Z","dependencies_parsed_at":"2022-07-24T11:45:05.201Z","dependency_job_id":null,"html_url":"https://github.com/oncomouse/openmodernism-tools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oncomouse%2Fopenmodernism-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oncomouse%2Fopenmodernism-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oncomouse%2Fopenmodernism-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oncomouse%2Fopenmodernism-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oncomouse","download_url":"https://codeload.github.com/oncomouse/openmodernism-tools/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","k
ind":"github","repositories_count":243658270,"owners_count":20326467,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-20T20:56:17.674Z","updated_at":"2025-10-20T03:58:15.053Z","avatar_url":"https://github.com/oncomouse.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Open Modernism Tools\n\nThis repository collects tools I'm working on for the Open Modernism project. At the moment, the major project is converting image PDFs into LaTeX files, which I discuss below.\n\n## Converting PDFs of scanned images into LaTeX files\n\nIn converting scanned PDF files (whether w/ OCR or w/o OCR) into Markdown for Open Modernism, we have expressed concerns over how to work in an ecology in which some documents have been corrected and converted to Markdown and some have not. This tool (`pdf-to-latex-partial.py`) attempts to bridge that gap by converting a PDF of scanned images into a LaTeX partial that can be incorporated into a larger LaTeX document (w/ TOC, etc.).\n\n### Requirements\n\n* Python 2.7+\n* [Poppler](http://poppler.freedesktop.org/)\n* XeTeX (optional; for compiling)\n\t* graphicx\n\t* geometry\n\t* grffile\n\n### How it works\n\nThe Python script harvests a scanned image PDF file (`Mina Loy - History of Religion of Eros.pdf` and `MrBennettAndMrsBrown.pdf` are provided as examples; one file has an OCR layer, the other doesn't (this matters, sadly)) and creates a directory of `.png` files. 
It then generates a LaTeX partial that resets the page geometry to have no margins, includes each image, and then restores the original geometry.\n\nThe images are generated using Poppler's `pdftocairo` program, which uses a complicated PDF rendering algorithm to extract the page images (Cairo is important because w/ OCR there isn't a stable image file that can just be pulled out of the file). For PDFs with OCR layers, this process can take several minutes.\n\nWhile this may seem like a useless feature (it's essentially doing a ton of work to reproduce something that already exists), the LaTeX partial can then be included in a larger TeX anthology, appear in a TOC, and keep proper page numbering.\n\n#### Dependencies for TeX\n\nI've only tried compiling this in `XeTeX`, so YMMV, but as documented in `base.pdf`, the minimum packages needed for this to work are:\n\n* graphicx\n* geometry\n* grffile\n\n### Use\n\n``` shell\npython pdf-to-latex-partial.py [\u003cFILE\u003e]+\n```\n\n## Hosting Platform Ideas\n\n* [Ikiwiki](http://ikiwiki.info/) --- Git backend, Markdown support, various account handlers\n\t* Custom plugin for firing Pandoc + plugins","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foncomouse%2Fopenmodernism-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foncomouse%2Fopenmodernism-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foncomouse%2Fopenmodernism-tools/lists"}