{"id":13761541,"url":"https://github.com/ktaaaki/paper2html","last_synced_at":"2025-05-10T12:32:51.900Z","repository":{"id":44336087,"uuid":"292438157","full_name":"ktaaaki/paper2html","owner":"ktaaaki","description":"Converts a single/double-column PDF formatted paper into a html page, which has the original view \u0026 the paragraph view extracted from the paper for translation from the browser.","archived":false,"fork":false,"pushed_at":"2022-05-09T05:38:11.000Z","size":585,"stargazers_count":21,"open_issues_count":13,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-08-04T13:06:51.970Z","etag":null,"topics":["academic-paper","paper-reading","pdf-miner","translation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ktaaaki.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-03T01:49:52.000Z","updated_at":"2024-07-11T09:37:06.000Z","dependencies_parsed_at":"2022-09-12T11:20:28.266Z","dependency_job_id":null,"html_url":"https://github.com/ktaaaki/paper2html","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktaaaki%2Fpaper2html","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktaaaki%2Fpaper2html/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktaaaki%2Fpaper2html/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktaaaki%2Fpaper2html/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/ktaaaki","download_url":"https://codeload.github.com/ktaaaki/paper2html/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224962055,"owners_count":17399149,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["academic-paper","paper-reading","pdf-miner","translation"],"created_at":"2024-08-03T13:01:59.411Z","updated_at":"2024-11-16T19:30:54.776Z","avatar_url":"https://github.com/ktaaaki.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# paper2html\n\n[![License: AGPL](https://img.shields.io/badge/license-AGPL-yellow)](https://opensource.org/licenses/AGPL-3.0)\n[![Python Version](https://img.shields.io/badge/python-3.6|3.7|3.8|3.9-blue)](https://github.com/ktaaaki/paper2html)\n[![Platform](https://img.shields.io/badge/platform-windows|macos|linux-blue)](https://github.com/ktaaaki/paper2html)\n\nConvert a PDF paper to an html page.  \nYou can then translate the paper easily with your browser's translation function, and view the original document and the translated document at the same time.\n\n![work_on_edge](https://user-images.githubusercontent.com/50911393/110310478-f2d3d600-8045-11eb-9f97-4f8bbfd5ec3a.gif)\n\nAlbanie, Samuel, Sébastien Ehrhardt, and Joao F. Henriques. 
\"Stopping gan violence: Generative unadversarial networks.\" arXiv preprint arXiv:1703.02528 (2017).\n\nIf you want to convert papers more accurately, you can also use [a good experimental service](https://papertohtml.org/) by the Allen Institute for AI.\n\n## Features\n\n- Convert PDF files on the Internet easily by using a bookmarklet.\n- Support for double-column papers.\n\n## Installing and running paper2html server\n\n### Docker\n\n```shell\n$ docker run --rm -it -p 6003:6003 ghcr.io/ktaaaki/paper2html\n```\n\nUse with care, as this exposes port 6003 on the host.\n\n### Debian GNU/Linux, Ubuntu\n\n```shell\n$ sudo apt install poppler-utils poppler-data\n$ git clone https://github.com/ktaaaki/paper2html.git\n$ pip install -e paper2html\n$ python3 ./paper2html/main.py\n```\n\n### macOS\n\n```shell\n$ brew install poppler\n$ git clone https://github.com/ktaaaki/paper2html.git\n$ pip install -e paper2html\n$ python3 ./paper2html/main.py\n```\n\n### Windows\n\nDownload the `Poppler for Windows` binary from \u003chttp://blog.alivate.com.au/poppler-windows/\u003e  \nAdd the `Poppler for Windows` `bin` directory (e.g. `C:\\Users\\YOUR_NAME\\Downloads\\poppler-0.68.0\\bin`) to the PATH environment variable.\n\nVerify that the following command prints the path to `pdfinfo`.\n\n```powershell\n\u003e where.exe pdfinfo\n```\n\nDownload the zip file or use the `git clone` command to save the paper2html code locally, and then install and run it with the following commands.\n\n```powershell\n\u003e py -m pip install -e paper2html\n\u003e python .\\paper2html\\main.py\n```\n\n## Usage\n\n### Converting a PDF on the web to html with the paper2html server\n\nSend the PDF to the server by using this bookmarklet.\n\n```js\njavascript:var esc=encodeURIComponent;var d=document;var subw=window.open('http://localhost:6003/paper2html/convert?url='+esc(location.href)).document;\n```\n\nClick the bookmarklet while a PDF paper is open in your browser.  
\nThe conversion then starts, and the generated html opens after a while.\n\nYou can see the list of converted documents on the index page `localhost:6003/paper2html/index.html`.\n\nNOTE👉 If you are running a paper2html server on Docker, you will not be able to convert PDF files stored on the host OS with the bookmarklet. See [docker image doc](docker/README.md).\n\n### Converting a local PDF to html with the CLI\n\nRun this command, then open the html file in your browser.\n\n```shell\n$ python paper2html/commands.py \"path-to-paper-file.pdf\"\n```\n\nIn IPython, do the following.\n\n```py\n\u003e\u003e\u003e import paper2html\n\u003e\u003e\u003e paper2html.open_paper_htmls(\"path-to-paper-file.pdf\")\n```\n\nYou can specify the browser to use.\n\n```shell\n$ python paper2html/commands.py \"path-to-paper-file.pdf\" --browser_path=\"/path/to/browser\"\n```\n\nYou can also convert without opening a browser.\n\n```py\n\u003e\u003e\u003e import paper2html\n\u003e\u003e\u003e paper2html.paper2html(\"path-to-paper-file or directory\")\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fktaaaki%2Fpaper2html","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fktaaaki%2Fpaper2html","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fktaaaki%2Fpaper2html/lists"}