Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ktaaaki/paper2html
Converts a single/double-column PDF formatted paper into a html page, which has the original view & the paragraph view extracted from the paper for translation from the browser.
https://github.com/ktaaaki/paper2html
academic-paper paper-reading pdf-miner translation
Last synced: 2 months ago
JSON representation
Converts a single/double-column PDF formatted paper into a html page, which has the original view & the paragraph view extracted from the paper for translation from the browser.
- Host: GitHub
- URL: https://github.com/ktaaaki/paper2html
- Owner: ktaaaki
- License: agpl-3.0
- Created: 2020-09-03T01:49:52.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-05-09T05:38:11.000Z (over 2 years ago)
- Last Synced: 2024-08-04T13:06:51.970Z (6 months ago)
- Topics: academic-paper, paper-reading, pdf-miner, translation
- Language: Python
- Homepage:
- Size: 571 KB
- Stars: 21
- Watchers: 1
- Forks: 5
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# paper2html
[![License: AGPL](https://img.shields.io/badge/license-AGPL-yellow)](https://opensource.org/licenses/AGPL-3.0)
[![Python Version](https://img.shields.io/badge/python-3.6|3.7|3.8|3.9-blue)](https://github.com/ktaaaki/paper2html)
[![Platform](https://img.shields.io/badge/platform-windows|macos|linux-blue)](https://github.com/ktaaaki/paper2html)Convert a PDF paper to html page.
You can translate the paper easily by browser functions, and you can view the original document and the translated document at the same time.![work_on_edge](https://user-images.githubusercontent.com/50911393/110310478-f2d3d600-8045-11eb-9f97-4f8bbfd5ec3a.gif)
Albanie, Samuel, Sébastien Ehrhardt, and Joao F. Henriques. "Stopping gan violence: Generative unadversarial networks." arXiv preprint arXiv:1703.02528 (2017).
If you want to convert papers more accurately, you can also use [a good experimental service](https://papertohtml.org/) by Allen Institute for AI.
## Features
- Convert PDF files on the Internet easily by using a bookmarklet.
- Support for double-column papers.## Installing and running paper2html server
### Docker
```shell
$ docker run --rm -it -p 6003:6003 ghcr.io/ktaaaki/paper2html
```Use with care as it opens up the port.
### Debian GNU/Linux, Ubuntu
```shell
$ sudo apt install poppler-utils poppler-data
$ git clone https://github.com/ktaaaki/paper2html.git
$ pip install -e paper2html
$ python3 ./paper2html/main.py
```### macOS
```shell
$ brew install poppler
$ git clone https://github.com/ktaaaki/paper2html.git
$ pip install -e paper2html
$ python3 ./paper2html/main.py
```### Windows
Download `Poppler for Windows` binary file from
Please set the `Poppler for Windows` path(ex.`C:\Users\YOUR_NAME\Downloads\poppler-0.68.0\bin`) in the PATH environment variable.Verify that the path is displayed with the following command.
```powershell
> where.exe pdfinfo
```Download the zip file or use `git clone` command to save the paper2html code locally, and then install it using the following command.
```powershell
> py -m pip -e paper2html
> python .\paper2html\main.py
```## Usage
### Conversion PDF on the web to html with paper2html server
Upload a PDF file to the server by using this bookmarklet.
```js
javascript:var esc=encodeURIComponent;var d=document;var subw=window.open('http://localhost:6003/paper2html/convert?url='+esc(location.href)).document;
```Click on the bookmarklet when you open a PDF paper in your browser.
Then the conversion will start and the generated html will be opened after a while.You can see the list of converted documents in the index page `localhost:6003/paper2html/index.html`
NOTE👉 If you are running a paper2html server on Docker, you will not be able to convert PDF file on the host OS with the bookmarklet. See [docker image doc](docker/README.md).
### Conversion local PDF to html with CLI
Run this command, then open the html file in your browser.
```shell
$ python paper2html/commands.py "path-to-paper-file.pdf"
```In IPython, do it like this.
```py
>>> import paper2html
>>> paper2html.open_paper_htmls("path-to-paper-file.pdf")
```You can use specific browser.
```shell
$ python paper2html/commands.py "path-to-paper-file.pdf" --browser_path="/path/to/browser"
```You can also only convert without opening a browser.
```py
>>> import paper2html
>>> paper2html.paper2html("path-to-paper-file or directory")
```