Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/guillaumegenthial/im2latex
Image to LaTeX (Seq2seq + Attention with Beam Search) - Tensorflow
https://github.com/guillaumegenthial/im2latex
attention-seq2seq beam-search im2latex imagecaptioning seq2seq seq2seq-attn show-and-tell tensorflow
Last synced: 6 days ago
JSON representation
Image to LaTeX (Seq2seq + Attention with Beam Search) - Tensorflow
- Host: GitHub
- URL: https://github.com/guillaumegenthial/im2latex
- Owner: guillaumegenthial
- License: apache-2.0
- Created: 2017-09-16T23:38:58.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-08-19T07:51:03.000Z (over 4 years ago)
- Last Synced: 2024-12-30T02:17:54.050Z (13 days ago)
- Topics: attention-seq2seq, beam-search, im2latex, imagecaptioning, seq2seq, seq2seq-attn, show-and-tell, tensorflow
- Language: Python
- Homepage:
- Size: 3.75 MB
- Stars: 454
- Watchers: 10
- Forks: 128
- Open Issues: 10
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# Im2Latex
Seq2Seq model with Attention + Beam Search for Image to LaTeX, similar to [Show, Attend and Tell](https://arxiv.org/abs/1502.03044) and [Harvard's paper and dataset](http://lstm.seas.harvard.edu/latex/).
Check the [blog post](https://guillaumegenthial.github.io/image-to-latex.html).
## Install
Install pdflatex (latex to pdf) and ghostsript + [magick](https://www.imagemagick.org/script/install-source.php
) (pdf to png) on Linux```
make install-linux
```(takes a while ~ 10 min, installs from source)
On Mac, assuming you already have a LaTeX distribution installed, you should have pdflatex and ghostscript installed, so you just need to install magick. You can try
```
make install-mac
```## Getting Started
We provide a small dataset just to check the pipeline. To build the images, train the model and evaluate
```
make small
```You should observe that the model starts to produce reasonable patterns of LaTeX after a few minutes.
## Data
We provide the pre-processed formulas from [Harvard](https://zenodo.org/record/56198#.V2p0KTXT6eA) but you'll need to produce the images from those formulas (a few hours on a laptop).
```
make build
```Alternatively, you can download the [prebuilt dataset from Harvard](https://zenodo.org/record/56198#.V2p0KTXT6eA) and use their preprocessing scripts found [here](https://github.com/harvardnlp/im2markup)
## Training on the full dataset
If you already did `make build` you can just train and evaluate the model with the following commands
```
make train
make eval
```Or, to build the images from the formulas, train the model and evaluate, run
```
make full
```## Details
1. Build the images from the formulas, write the matching file and extract the vocabulary. __Run only once__ for a dataset
```
python build.py --data=configs/data.json --vocab=configs/vocab.json
```2. Train
```
python train.py --data=configs/data.json --vocab=configs/vocab.json --training=configs/training.json --model=configs/model.json --output=results/full/
```3. Evaluate the text metrics
```
python evaluate_txt.py --results=results/full/
```4. Evaluate the image metrics
```
python evaluate_img.py --results=results/full/
```(To get more information on the arguments, run)
```
python file.py --help
```