https://github.com/ktaaaki/paper2html
Converts a single/double-column PDF formatted paper into a html page, which has the original view & the paragraph view extracted from the paper for translation from the browser.
- Host: GitHub
- URL: https://github.com/ktaaaki/paper2html
- Owner: ktaaaki
- License: agpl-3.0
- Created: 2020-09-03T01:49:52.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-05-09T05:38:11.000Z (about 3 years ago)
- Last Synced: 2024-08-04T13:06:51.970Z (11 months ago)
- Topics: academic-paper, paper-reading, pdf-miner, translation
- Language: Python
- Homepage:
- Size: 571 KB
- Stars: 21
- Watchers: 1
- Forks: 5
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
# paper2html
Convert a PDF paper to an HTML page.
You can easily translate the paper with your browser's translation function, and view the original document and the translated document at the same time.
Albanie, Samuel, Sébastien Ehrhardt, and Joao F. Henriques. "Stopping gan violence: Generative unadversarial networks." arXiv preprint arXiv:1703.02528 (2017).
If you want to convert papers more accurately, you can also use [a good experimental service](https://papertohtml.org/) by the Allen Institute for AI.
## Features
- Convert PDF files on the Internet easily by using a bookmarklet.
- Support for double-column papers.
## Installing and running paper2html server
### Docker
```shell
$ docker run --rm -it -p 6003:6003 ghcr.io/ktaaaki/paper2html
```
Use with care: this publishes port 6003 on the host.
### Debian GNU/Linux, Ubuntu
```shell
$ sudo apt install poppler-utils poppler-data
$ git clone https://github.com/ktaaaki/paper2html.git
$ pip install -e paper2html
$ python3 ./paper2html/main.py
```
### macOS
```shell
$ brew install poppler
$ git clone https://github.com/ktaaaki/paper2html.git
$ pip install -e paper2html
$ python3 ./paper2html/main.py
```
### Windows
Download `Poppler for Windows` binary file from
Add the `Poppler for Windows` path (e.g. `C:\Users\YOUR_NAME\Downloads\poppler-0.68.0\bin`) to the PATH environment variable.
Verify that the path is displayed with the following command.
```powershell
> where.exe pdfinfo
```
Download the zip file or use the `git clone` command to save the paper2html code locally, then install and run it with the following commands.
```powershell
> py -m pip install -e paper2html
> python .\paper2html\main.py
```
## Usage
### Converting a PDF on the web to HTML with the paper2html server
Send the URL of a PDF file to the server by using this bookmarklet.
```js
javascript:var esc=encodeURIComponent;var d=document;var subw=window.open('http://localhost:6003/paper2html/convert?url='+esc(location.href)).document;
```
Click the bookmarklet while a PDF paper is open in your browser.
The conversion then starts, and the generated HTML opens after a while.
You can see the list of converted documents on the index page `localhost:6003/paper2html/index.html`.
NOTE👉 If you are running a paper2html server on Docker, you will not be able to convert PDF files on the host OS with the bookmarklet. See [docker image doc](docker/README.md).
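The bookmarklet only builds a request URL from the address of the current page, so the same URL can be built outside the browser. The following is a minimal sketch, assuming the server is running locally on port 6003 as in the bookmarklet; `build_convert_url` is a hypothetical helper and not part of paper2html, and the arXiv address is only an illustration.

```python
from urllib.parse import quote

# Endpoint copied from the bookmarklet; adjust it if your server runs elsewhere.
CONVERT_ENDPOINT = "http://localhost:6003/paper2html/convert"

# Characters that JavaScript's encodeURIComponent leaves unescaped,
# beyond the ones urllib.parse.quote already never escapes.
UNESCAPED = "!*'()"


def build_convert_url(pdf_url: str, endpoint: str = CONVERT_ENDPOINT) -> str:
    """Return the conversion URL the bookmarklet would open for pdf_url.

    Hypothetical helper for illustration; it is not provided by paper2html.
    """
    return endpoint + "?url=" + quote(pdf_url, safe=UNESCAPED)


if __name__ == "__main__":
    print(build_convert_url("https://arxiv.org/pdf/1703.02528"))
```

Opening the resulting URL in a browser (for example with Python's `webbrowser.open`) should have the same effect as clicking the bookmarklet on that page.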
### Converting a local PDF to HTML with the CLI
Run this command, then open the html file in your browser.
```shell
$ python paper2html/commands.py "path-to-paper-file.pdf"
```
In IPython, do it like this.
```py
>>> import paper2html
>>> paper2html.open_paper_htmls("path-to-paper-file.pdf")
```
You can specify which browser to use.
```shell
$ python paper2html/commands.py "path-to-paper-file.pdf" --browser_path="/path/to/browser"
```
You can also convert without opening a browser.
```py
>>> import paper2html
>>> paper2html.paper2html("path-to-paper-file or directory")
```